Setting up Timing Routines
Welcome to a new Linux column focused on demonstrating and comparing the performance of the Linux and Windows 2000 operating systems. Columnist Ed Bradford compares operating system-level features rather than applications with the goal of providing an understanding of each operating system’s best performance features. Source code is included and represents “best programming practices” for each platform, in as impartial an environment as possible.
In this new series of articles, I’ll focus on high-performance programming techniques for the Linux and Windows 2000 operating systems. I’ll show you useful and efficient programming practices solving the same problems on Linux and Windows 2000. Once solved, we’ll measure at least one aspect of the performance of each platform. A variety of performance testing scripts and programs will demonstrate the speed of operating system features. The goal is to show how to get the best possible performance from each operating system and, as an aside, compare the performance of the two platforms.
Performance Test Overview
Our tests will examine memory speeds, system call speeds, IO throughput, context switching speeds, and a number of other programming facilities common to both platforms. Access to the Windows Registry will not be measured. In all cases the source code will either be published in the article or available for free download. Constructive criticism of the software programming is sought. My purpose is to demonstrate best programming practices first — and only then, to compare the performance. Comments from readers are heartily encouraged in the discussion forum for this article.
We’ll treat Linux as two operating systems: the Linux 2.2.16 kernel and the 2.4.2 kernel. Windows 2000 will be considered the Windows 2000 build 2195 system and, when released, the Windows XP kernel. All runs for a given test will be carried out on exactly the same hardware. Primarily the hardware will be a dual-booting IBM ThinkPad 600X with 576 MB of memory and two 12-GB disk drives.
Our programs will be written in C++ although very little object-oriented code will be used. C++ for our purposes is just C with strong type checking. On Linux we will be using the gcc that is included with the Red Hat 7.0 distribution. On Windows 2000, we will use the Microsoft C++ Version 12.00.8168 from Visual Studio 6.0.
Measurement Utilities
This first article will define the utility routines we will need for making and reporting measurements on Windows 2000 and Linux. Our list of tools is short: an interface for measuring time, a routine that returns a string describing the operating system, a simple and more efficient interface to the simple malloc() memory allocation routine, and an input routine for handling large numbers. (I’ll go into detail about the timing routines below).
Our memory allocator is called Malloc(int). All it does is call the malloc(int) routine, and if malloc(int) fails, Malloc() prints an error message and exits. When we test memory allocation performance, we won’t use this routine, but in the meantime, it streamlines coding. We use malloc() because it is available on both Windows 2000 and Linux. Malloc() is uninteresting and identical on both systems.
The next routine is atoik(char *). This routine is identical to the atoi() routine but also takes a trailing “k” or “m”. A trailing “k” or “m” means to multiply the already parsed number by 1024 for “k” or by 1024*1024 for “m”. Either “k” or “m” can be upper or lower case, and any number of them can be appended. Atoik() only returns 32 bits, so any number greater than or equal to 2 GB is probably not going to work. When that becomes an issue, we will invent atoik64(). This routine is identical on both systems
Measuring Time
How are we going to measure time on our two operating systems? Let’s look at our choices. On Windows 2000, there are two APIs for measuring time intervals. The first is GetTickCount(). This function reports the number of milliseconds elapsed since the system was booted. GetTickCount() has a “clock tick” granularity. That means that it is only updated when the system clock ticks. In Windows’ case, it is only updated every 10 milliseconds. Therefore its granularity is no better than 10 milliseconds or ten thousand microseconds.
Windows 2000 also has a QueryPerformanceCounter() API that retrieves the current value of the 64-bit high-resolution performance counter. The meaning of each “tick” included in the result of a call to QueryPerformanceCounter() depends on the value returned by QueryPerformanceFrequency(). The frequency is the number of counter increments per second, so seconds are expressed as:
Computation of seconds using QueryPerformanceCounter()
LARGE_INTEGER tim, freq;
double seconds;
QueryPerformanceCounter(&tim);
QeryPerformanceFrequency(&freq);
seconds = (double)tim / (double) freq;
Because of the resolution issues associated with GetTickCount() we will be using QueryPerformanceCounter exclusively. We must, however, be mindful that if a timing is as short as the overhead of the QueryPerformanceCounter() API, then our results are probably not credible. We will measure the overhead of our timing routines below.
On Linux we use the gettimeofday() API. It is the only API that meets our need to time at the sub-millisecond level.
Now that we have chosen the APIs to use, we need to define our own APIs so that our program can use them without having to know about the host operating system. We chose the following interface for this function:
Our timing routine interfaces
void tstart();
void tend();
double tval();
Tstart() records the time value in static memory of when it is called. Tend() records the time value in static memory of when it is called. Tval() takes the tstart and tend time values, converts them to double precision numbers, subtracts them, and returns the result as a double. The interface is simple to implement on Linux and Windows, and it performs the needed timing functions.
The implementation of timing routines for Linux and Windows 2000 follows. We cannot avoid system dependencies, but our goal will be to write optimal code while minimizing the number of conditional definitions. Here is the listing for the timing routines.
Timing routines
#ifdef _WIN32
static LARGE_INTEGER _tstart, _tend;
static LARGE_INTEGER freq;
void tstart(void)
{
static int first = 1;
if(first) {
QueryPerformanceFrequency(&freq);
first = 0;
}
QueryPerformanceCounter(&_tstart);
}
void tend(void)
{
QueryPerformanceCounter(&_tend);
}
double tval()
{
return ((double)_tend.QuadPart -
(double)_tstart.QuadPart)/((double)freq.QuadPart);
}
#else
static struct timeval _tstart, _tend;
static struct timezone tz;
void tstart(void)
{
gettimeofday(&_tstart, &tz);
}
void tend(void)
{
gettimeofday(&_tend,&tz);
}
double tval()
{
double t1, t2;
t1 = (double)_tstart.tv_sec + (double)_tstart.tv_usec/(1000*1000);
t2 = (double)_tend.tv_sec + (double)_tend.tv_usec/(1000*1000);
return t2-t1;
}
#endif
The last routine is “char *ver()”. This simple function returns a string describing the current operating system environment. As you can see in the source code, it is entirely different on each platform. Included at the end of the routine is a conditionally defined main() routine for testing it. Compilation is done as follows:
Compiling ver.cpp into a program
The ver.exe program will be used to log operating system version information to output files.
Ver.cpp – Prints Operating System Version
The timing functions defined above will meet our needs. Before we start using them, we should know how long they take to execute. In reality, we only need to know how long tstart() and tend() take. Of those two, since they are identical in form, we only need to time one of them. This we will do on both Windows 2000 and Linux, using the time-timers.cpp program to perform our timing analysis for the timing functions. Note that the listing has only the main() routine. The actual program has all of the timer functions and the atoik() source code preceding the code in the listing.
time-timers.cpp – Program to time the timers
char *applname;
int main(int ac, char *av[])
{
long count = 100000;
long i;
double t;
char *v = ver();
char *q;
applname = av[0];
if(strrchr(applname,SLASHC))
applname = strrchr(applname,SLASHC) + 1;
if(ac > 1) {
count = atoik(av[1]);
ac--;
av++;
if(count < 0)
count = 100000;
}
tstart();
for(i = 0; i < count; i++)
tend();
tend();
t = tval();
printf("%s: ",applname);
printf("%d calls to tend() = %8.3f seconds %8.3f usec/call\n",
count,
t,
(t/( (double) count ))*1E6);
return 0;
}
Almost everything is in place. We now compile our time-timers.cpp program as follows:
Compiling time-timers.cpp
on LINUX
gcc -O2 time-timers.cpp -o time-timers
on Windows 2000
cl -O2 time-timers.cpp -o time-timers.exe
The program was written to take a single optional argument. By default, the program calls tend() 100,000 times. Repeated runs of the program should guarantee that a reproducible time results. We ran the program with the default count 10 times. The results are shown in the table for Linux 2.2.16, Linux 2.4.2, and Windows 2000.
I ran the following script on Linux 2.2.16, Linux 2.4.2 and Windows 2000 on the same Thinkpad. Actually, initially I used a symmetric multiprocessing (SMP) version of Linux 2.4.2. I accidentally built and tested with the SMP version. Discovering this, I also built the single processor version and ran the tests with the single processor version. I have reported the results for both versions below (for interest only).
Script for Running Time-Timers
This produces 20 runs of 1 million calls to tend(). Here are the results:
Conclusion
In this article, I’ve defined the timing routines we will be using in this series, reviewed how we will identify the results using a simple ver.cpp program, and lastly, measured and documented the overhead of our timing routines. The measurements we produce will inevitably lead to a performance characterization of the operating systems were are discussing.
Using a program called time-timers.cpp, we learned that the overhead of the timing routines is less than 2 microseconds on Windows and less than 1 microsecond on Linux. Please feel free to make use of the discussion forum for your reactions, thoughts, questions, etc. on this article.
Resources
• Participate in the discussion forum on this article. (You can also click Discuss at the top or bottom of the article to access the forum.)
• Participate in the discussion forum on this article by clicking Discuss at the top or bottom of the article.
• Files referenced in this column:
• ver.cpp
• Performance comparisons between Linux and Windows NT have appeared in the trade press in the past. Mostly they focus on higher level operations. Ziff Davis publishes two free benchmarks that are relevant to this work:
• WebBench measures Web page accesses.
• NetBench measures network file accesses.
• Results from these benchmarks have been published and argued in the press and on Web sites. A recent comparison appears on Microsoft.com.
If you found this post useful you may also want to check these out:
