Results of some quick research on timing in Win32
by Ryan Geiss - 16 August 2002 (...with updates since then)
You might be thinking to yourself: this is a pretty simple thing
to be posting; what's the big deal? The deal is that somehow,
good timing code has eluded me for years. Finally, frustrated,
I dug in and did some formal experiments on a few different computers,
testing the timing precision they could offer, using various win32
functions. I was fairly surprised by the results!
I tested on three computers; here are their specs:
Gemini: 933 MHz desktop, win2k
Vaio: 333 MHz laptop, win98
HP: 733 MHz laptop, win2k
Also, abbreviations to be used hereafter:
ms: milliseconds, or 1/1,000 of a second
us: microseconds, or 1/1,000,000 of a second
timeGetTime - what they don't tell you
First, I tried to determine the precision of timeGetTime().
In order to do this, I simply ran a loop, constantly polling
timeGetTime() until the time changed, and then printing the
delta (between the previous time and the new time). I then looked
at the output, and for each computer, took the minimum of all
the deltas that occurred. (Usually, the minimum was very solid,
occurring about 90% of the time.) The results:
Resolution of timeGetTime()
Gemini: 10 ms
Vaio: 1 ms
HP: 10 ms
For now, I am assuming that it was the OS kernel that made the
difference: win2k offers a max. precision of 10 ms for timeGetTime(),
while win98 is much better, at 1 ms. I assume that WinXP would also
have a precision of 10 ms, and that Win95 would be ~1 ms, like Win98.
(If anyone tests this out, please let me know either way!)
(Note that using timeGetTime() unfortunately requires linking to
winmm.lib, which slightly increases your file size. You could use
GetTickCount() instead, which doesn't require linking to winmm.lib,
but it tends not to have as good a timer resolution... so I would
recommend sticking with timeGetTime().)
Next, I tested Sleep(). A while back I noticed that when you call
Sleep(1), it doesn't really sleep for 1 ms; it usually sleeps for longer
than that. I verified this by calling Sleep(1) ten times in a row,
and taking the difference in timeGetTime() readings from the beginning
to the end. Whatever delta there was for these ten sleeps, I just divided
it by 10 to get the average duration of Sleep(1). This turned out to be:
Average duration of Sleep(1)
Gemini: 10 ms (10 calls to Sleep(1) took exactly 100 ms)
Vaio: ~4 ms (10 calls to Sleep(1) took 35-45 ms)
HP: 10 ms (10 calls to Sleep(1) took exactly 100 ms)
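For reference, here is a minimal sketch of that measurement (it assumes you
link with winmm.lib so that timeGetTime() is available):

#include <stdio.h>
#include <windows.h>
#pragma comment(lib, "winmm.lib")   // timeGetTime lives in winmm.lib

int main()
{
    // measure the average duration of Sleep(1) by timing ten calls in a row
    DWORD start = timeGetTime();
    for (int i = 0; i < 10; i++)
        Sleep(1);
    DWORD elapsed = timeGetTime() - start;
    printf("10 calls to Sleep(1) took %lu ms (average %.1f ms per call)\n",
           elapsed, elapsed / 10.0);
    return 0;
}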
Now, this was disturbing, because it meant that if you call Sleep(1)
and Sleep(9) on a win2k machine, there is no difference - it still
sleeps for 10 ms! "So *this* is the reason all my timing code sucks,"
I sighed to myself.
Given that, I decided to give up on Sleep() and timeGetTime(). The
application I was working on required really good fps limiting, and
10ms Sleeps were not precise enough to do a good job. So I looked
elsewhere.
UPDATE: Matthijs de Boer points out that the timeGetTime function
returns a DWORD value, which wraps around to 0 every 2^32
milliseconds (about 49.71 days), so you should write your
code to be aware of this possibility.
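One handy property: if you keep your timestamps as DWORDs and subtract them as
unsigned values, the delta still comes out right even across a wrap, as long as
the interval being measured is itself under ~49.7 days. A minimal sketch:

// computing an elapsed time that survives the 49.7-day wrap of timeGetTime().
// because the subtraction is done on unsigned 32-bit values, the result is
// correct even if 'now' has wrapped past zero since 'then' was taken.
DWORD then = timeGetTime();
// ... later ...
DWORD now = timeGetTime();
DWORD elapsed_ms = now - then;   // valid for any interval under ~49.7 days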
timeBeginPeriod / timeEndPeriod
HOWEVER, I should not have given up so fast! It turns out that there
is a win32 function, timeBeginPeriod(), which solves our problem:
it lowers the granularity of Sleep() to whatever parameter you give it.
So if you're on windows 2000 and you call timeBeginPeriod(1) and then
Sleep(1), it will truly sleep for just 1 millisecond, rather than the
default 10!
timeBeginPeriod() actually raises the resolution of the system timer globally
(Windows uses the smallest period requested by any running program), but the
request only lasts as long as your program does, so don't worry about permanently
messing up the system with it. Also,
be sure you call timeEndPeriod() when your program exits, with the same
parameter you fed into timeBeginPeriod() when your program started (presumably
1). Both of these functions are in winmm.lib, so you'll have to link to it
if you want to lower your Sleep() granularity down to 1 ms.
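In skeleton form, the usage looks something like this (a sketch, not a complete
program):

// requires linking with winmm.lib
#include <windows.h>

int main()
{
    timeBeginPeriod(1);   // ask for 1 ms Sleep()/timeGetTime() granularity

    // ... your program's main loop; Sleep(1) now really sleeps ~1-2 ms ...

    timeEndPeriod(1);     // match every timeBeginPeriod() with a timeEndPeriod()
    return 0;
}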
How reliable is it? I have yet to find a system for which timeBeginPeriod(1)
does not drop the granularity of Sleep(1) to 1 or, at most, 2 milliseconds.
If anyone out there does, please let me know
(e-mail: );
I'd like to hear about it, and I will post a warning here.
Note that calling timeBeginPeriod() also affects the granularity of some
other timing calls, such as CreateWaitableTimer() and WaitForSingleObject();
however, some functions are still unaffected, such as _ftime(). (Special
thanks to Mark Epstein for pointing this out to me!)
some convenient test code
The following code will tell you:
1. what the granularity, or minimum resolution, of calls to timeGetTime() is
on your system. In other words, if you sit in a tight loop and call timeGetTime(),
only noting when the value returned changes, what value do you get? This
granularity tells you, more or less, what kind of potential error to expect in
the result when calling timeGetTime().
2. how long your machine really sleeps when you call Sleep(1).
Often this is actually 2 or more milliseconds, so be careful!
NOTE that these tests are performed after calling timeBeginPeriod(1), so if
you forget to call timeBeginPeriod(1) in your own init code, you might not get
granularity as good as you see from this test!
#include <stdio.h>
#include <windows.h>
#pragma comment(lib, "winmm.lib")   // timeGetTime/timeBeginPeriod live in winmm.lib

int main(int argc, char **argv)
{
    const int count = 64;

    timeBeginPeriod(1);

    printf("1. testing granularity of timeGetTime()...\n");
    int its = 0;
    long cur = 0, last = timeGetTime();
    while (its < count) {
        cur = timeGetTime();
        if (cur != last) {
            printf("%ld ", cur-last);
            last = cur;
            its++;
        }
    }

    printf("\n\n2. testing granularity of Sleep(1)...\n ");
    long first = timeGetTime();
    cur = first;
    last = first;
    for (int n=0; n<count; n++) {
        Sleep(1);
        cur = timeGetTime();
        printf("%ld ", cur-last);
        last = cur;
    }

    printf("\n");
    timeEndPeriod(1);
    return 0;
}
RDTSC: Eh, no thanks
On the web, I found several references to the "RDTSC" Pentium instruction,
which stands for "Read Time Stamp Counter." This assembly instruction returns
an unsigned 64-bit integer reading on the processor's internal high-precision
timer. In order to get the frequency of the timer (how much the timer return
value will increment in 1 second), you can read the registry for the machine's
speed (in MHz - millions of cycles per second), like this:
// WARNING: YOU DON'T REALLY WANT TO USE THIS FUNCTION
bool GetPentiumClockEstimateFromRegistry(unsigned __int64 *frequency)
{
    HKEY hKey;
    DWORD cbBuffer;
    LONG rc;

    *frequency = 0;

    rc = RegOpenKeyEx(
        HKEY_LOCAL_MACHINE,
        "Hardware\\Description\\System\\CentralProcessor\\0",
        0,
        KEY_READ,
        &hKey
    );

    if (rc == ERROR_SUCCESS)
    {
        cbBuffer = sizeof (DWORD);
        DWORD freq_mhz;
        rc = RegQueryValueEx
        (
            hKey,
            "~MHz",
            NULL,
            NULL,
            (LPBYTE)(&freq_mhz),
            &cbBuffer
        );
        if (rc == ERROR_SUCCESS)
            *frequency = freq_mhz*1024*1024;  // note: 1 MHz is really 1,000,000 Hz,
                                              //  so this overestimates by almost 5%
        RegCloseKey (hKey);
    }

    return (*frequency > 0);
}
Result of GetPentiumClockEstimateFromRegistry()
Gemini: 975,175,680 Hz
Vaio: FAILED.
HP: 573,571,072 Hz <-- strange...
Empirical tests: RDTSC delta after Sleep(1000)
Gemini: 931,440,000 Hz
Vaio: 331,500,000 Hz
HP: 13,401,287 Hz
However, as you can see, this failed on Vaio (the win98 laptop).
Worse still, on the HP, the value in the registry
does not match the MHz rating of the machine (733). That would
be okay if the value was actually the rate at which the timer
ticked; but, after doing some empirical testing, it turns out that
the HP's timer frequency is really 13 MHz. Trusting the
registry reading on the HP would be a big, big mistake!
So, one conclusion is: don't try to read the registry to get the
timer frequency; you're asking for trouble. Instead, do it yourself.
Just call Sleep(1000) to allow 1 second (plus or minus ~1%) to pass,
calling GetPentiumTimeRaw() (below) at the beginning and end, and then
simply subtract the two unsigned __int64's, and voila, you now know
the frequency of the timer that feeds RDTSC on the current system.
(*watch out for timer wraps during that 1 second, though...)
Note that you could easily do this in the background, though, using
timeGetTime() instead of Sleep(), so there wouldn't be a 1-second pause
when your program starts.
int GetPentiumTimeRaw(unsigned __int64 *ret)
{
    // returns 0 on failure, 1 on success
    // warning: watch out for wraparound!

    // get high-precision time:
    __try
    {
        unsigned __int64 *dest = (unsigned __int64 *)ret;
        __asm
        {
            _emit 0xf        // these two bytes form the 'rdtsc' asm instruction,
            _emit 0x31       //  available on Pentium I and later.
            mov esi, dest
            mov [esi  ], eax // lower 32 bits of tsc
            mov [esi+4], edx // upper 32 bits of tsc
        }
        return 1;
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        return 0;
    }
    return 0;
}
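Here is a rough sketch of that 1-second calibration, built on GetPentiumTimeRaw()
above (the name EstimateRdtscFrequency is just illustrative; as noted, you could
also spread the measurement out in the background instead of blocking):

// returns 0 on failure; otherwise, an estimate of the rdtsc frequency in Hz.
unsigned __int64 EstimateRdtscFrequency()
{
    unsigned __int64 t0, t1;
    if (!GetPentiumTimeRaw(&t0))
        return 0;
    Sleep(1000);                  // let ~1 second (plus or minus ~1%) pass
    if (!GetPentiumTimeRaw(&t1))
        return 0;
    if (t1 <= t0)                 // counter wrapped during the test - bail out
        return 0;
    return t1 - t0;               // ticks elapsed in ~1 second ~= frequency
}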
Once you figure out the frequency, using this 1-second test, you can now
translate readings from the cpu's timestamp counter directly into a real
'time' reading, in seconds:
double GetPentiumTimeAsDouble(unsigned __int64 frequency)
{
    // returns < 0 on failure; otherwise, returns current cpu time, in seconds.
    // warning: watch out for wraparound!

    if (frequency==0)
        return -1.0;

    // get high-precision time:
    __try
    {
        unsigned __int64 high_perf_time;
        unsigned __int64 *dest = &high_perf_time;
        __asm
        {
            _emit 0xf        // these two bytes form the 'rdtsc' asm instruction,
            _emit 0x31       //  available on Pentium I and later.
            mov esi, dest
            mov [esi  ], eax // lower 32 bits of tsc
            mov [esi+4], edx // upper 32 bits of tsc
        }

        __int64 time_s     = (__int64)(high_perf_time / frequency);  // unsigned->signed conversion should be safe here
        __int64 time_fract = (__int64)(high_perf_time % frequency);  // unsigned->signed conversion should be safe here

        // note: here, we wrap the timer more frequently (once per week)
        // than it otherwise would (VERY RARELY - once every 585 years on
        // a 1 GHz cpu), to alleviate floating-point precision errors that start
        // to occur when you get to very high counter values.
        double ret = (time_s % (60*60*24*7)) + (double)time_fract/(double)((__int64)frequency);
        return ret;
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        return -1.0;
    }

    return -1.0;
}
This works pretty well on ALL Pentium I and later processors, and offers
AMAZING precision. However, it can be messy, especially working that 1-second
test in there with all your other code, so that it runs in the background.
UPDATE: Ross Bencina was kind enough to point out to me that rdtsc "is a per-cpu
operation, so on multiprocessor systems you have to be careful that multiple calls
to rdtsc are actually executing on the same cpu." (You can do that using the
SetThreadAffinityMask() function.) Thanks Ross!
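A one-line sketch of that idea, pinning the calling thread to the first CPU
before doing any rdtsc-based timing:

// keep this thread on cpu 0 so successive rdtsc reads all come from
// the same processor's timestamp counter.
SetThreadAffinityMask(GetCurrentThread(), 1);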
QueryPerformanceFrequency & QueryPerformanceCounter: Nice
There is one more item in our bag of tricks. It is simple, elegant, and as far
as I can tell, extremely accurate and reliable. It is a pair of win32 functions:
QueryPerformanceFrequency and QueryPerformanceCounter.
QueryPerformanceFrequency returns the amount that the counter will increment over
1 second; QueryPerformanceCounter returns a LARGE_INTEGER (a 64-bit *signed* integer)
that is the current value of the counter.
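Measuring an interval with the pair is straightforward; here is a minimal sketch:

#include <stdio.h>
#include <windows.h>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    if (!QueryPerformanceFrequency(&freq))
        return 1;                  // no high-resolution counter available

    QueryPerformanceCounter(&t0);
    Sleep(100);                    // ...the thing you want to time...
    QueryPerformanceCounter(&t1);

    double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    printf("elapsed: %f seconds\n", seconds);
    return 0;
}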
Perhaps I am lucky, but it works flawlessly on my 3 machines. The MSDN library
says that it should work on Windows 95 and later.
Here are some results:
Return value of QueryPerformanceFrequency
Gemini: 3,579,545 Hz
Vaio: 1,193,000 Hz
HP: 3,579,545 Hz
Maximum # of unique readings I could get in 1 second
Gemini: 658,000 (-> 1.52 us resolution!)
Vaio: 174,300 (-> 5.73 us resolution!)
HP: 617,000 (-> 1.62 us resolution!)
I was pretty excited to see timing resolutions in the low-microsecond
range. Note that for the latter test, I avoided printing any text
during the 1-second interval, as it would drastically affect the outcome.
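(For the curious, the "unique readings" test is just a counting loop; a sketch
of roughly how such a test can be written:)

// spin for one second and count how many distinct counter values we observe.
LARGE_INTEGER freq, start, cur, last;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&start);
last = start;
int unique = 0;
do
{
    QueryPerformanceCounter(&cur);
    if (cur.QuadPart != last.QuadPart)
    {
        unique++;
        last = cur;
    }
} while (cur.QuadPart - start.QuadPart < freq.QuadPart);
printf("%d unique readings in 1 second\n", unique);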
Now, here is my question to you: do these two functions work for you?
What OS does the computer run, what is the MHz rating, and is it a laptop
or desktop? What was the result of QueryPerformanceFrequency?
What was the max. # of unique readings you could get in 1 second?
Can you find any computers that it doesn't work on? Let me know (e-mail: ), and
I'll collect & publish everyone's results here.
So, until I find some computers that QueryPerformanceFrequency &
QueryPerformanceCounter don't work on, I'm sticking with them. If they fail,
I've got backup code that will kick in, which uses timeGetTime(); I didn't
bother to use RDTSC because of the calibration issue, and I'm hopeful that
these two functions are highly reliable. I suppose only feedback from
readers like you will tell... =)
UPDATE: a few people have written e-mail pointing me to this Microsoft Knowledge
Base article which outlines some cases in which the QueryPerformanceCounter
function can unexpectedly jump forward by a few seconds.
UPDATE: Matthijs de Boer points out that you can use the SetThreadAffinityMask()
function to make your thread stick to one core or the other, so that 'rdtsc' and
QueryPerformanceCounter() don't have timing issues in dual core systems.
Accurate FPS Limiting / High-precision 'Sleeps'
So now, when I need to do FPS limiting (limiting the framerate to some
maximum), I don't just naively call Sleep() anymore. Instead, I use
QueryPerformanceCounter in a loop that runs Sleep(0). Sleep(0) simply
gives up your thread's current timeslice to another waiting thread; it
doesn't really sleep at all. So, if you just keep calling Sleep(0)
in a loop until QueryPerformanceCounter() says you've hit the right time,
you'll get ultra-accurate FPS limiting.
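A bare-bones sketch of that busy-wait approach (here, frame_start is assumed to
hold the QueryPerformanceCounter reading taken at the start of the frame, and
max_fps is your frame-rate cap):

// spin, yielding our timeslice, until 1/max_fps seconds have passed
// since frame_start.
LARGE_INTEGER freq, now;
QueryPerformanceFrequency(&freq);
__int64 ticks_per_frame = freq.QuadPart / max_fps;
do
{
    Sleep(0);                     // give up the rest of our timeslice
    QueryPerformanceCounter(&now);
} while (now.QuadPart - frame_start.QuadPart < ticks_per_frame);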
There is one problem with this kind of fps limiting: it will use up
100% of the CPU. Even though the computer WILL remain
quite responsive, because the app sucking up the idle time is being very
"nice", this will still look very bad on the CPU meter (which will stay
at 100%) and, much worse, it will drain the battery quite quickly on
laptops.
To get around this, I use a hybrid algorithm that uses Sleep() to do the
bulk of the waiting, and QueryPerformanceCounter() to do the finishing
touches, making it accurate to ~10 microseconds, but still wasting very
little processor.
My code for accurate FPS limiting looks something like this, and runs
at the end of each frame, immediately after the page flip:
// note: BE SURE YOU CALL timeBeginPeriod(1) at program startup!!!
// note: BE SURE YOU CALL timeEndPeriod(1) at program exit!!!
// note: that will require linking to winmm.lib
// note: never use static initializers (like this) with Winamp plug-ins!
static LARGE_INTEGER m_prev_end_of_frame = {0};
// (m_high_perf_timer_freq, below, is assumed to hold the result of a
//  QueryPerformanceFrequency() call made once at startup.)

int max_fps = 60;

LARGE_INTEGER t;
QueryPerformanceCounter(&t);

if (m_prev_end_of_frame.QuadPart != 0)
{
    int ticks_to_wait = (int)m_high_perf_timer_freq.QuadPart / max_fps;
    int done = 0;
    do
    {
        QueryPerformanceCounter(&t);

        int ticks_passed = (int)((__int64)t.QuadPart - (__int64)m_prev_end_of_frame.QuadPart);
        int ticks_left = ticks_to_wait - ticks_passed;

        if (t.QuadPart < m_prev_end_of_frame.QuadPart)    // time wrap
            done = 1;
        if (ticks_passed >= ticks_to_wait)
            done = 1;

        if (!done)
        {
            // if > 0.002s left, do Sleep(1), which will actually sleep some
            //   steady amount, probably 1-2 ms,
            //   and do so in a nice way (cpu meter drops; laptop battery spared).
            // otherwise, do a few Sleep(0)'s, which just give up the timeslice,
            //   but don't really save cpu or battery, but do pass a tiny
            //   amount of time.
            if (ticks_left > (int)m_high_perf_timer_freq.QuadPart*2/1000)
                Sleep(1);
            else
                for (int i=0; i<10; i++)
                    Sleep(0);  // causes thread to give up its timeslice
        }
    }
    while (!done);
}

m_prev_end_of_frame = t;
...which is trivial to convert into a high-precision Sleep() function.
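For example, here is a sketch of such a function (the name PreciseSleep is just
illustrative, and it assumes timeBeginPeriod(1) has already been called at
startup):

// sleeps for 'seconds' with roughly 10-microsecond accuracy, using Sleep(1)
// for the bulk of the wait and Sleep(0) yields to finish it off.
void PreciseSleep(double seconds)
{
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    __int64 ticks_to_wait = (__int64)(seconds * (double)freq.QuadPart);
    for (;;)
    {
        QueryPerformanceCounter(&now);
        __int64 ticks_left = ticks_to_wait - (now.QuadPart - start.QuadPart);
        if (ticks_left <= 0)
            break;
        if (ticks_left > freq.QuadPart*2/1000)   // more than ~2 ms left
            Sleep(1);                            // sleep nicely (saves cpu/battery)
        else
            Sleep(0);                            // just yield the timeslice
    }
}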
Conclusions & Summary
Using regular old timeGetTime() to do timing is not reliable on many Windows-based
operating systems because the granularity of the system timer can be as high as 10-15
milliseconds, meaning that timeGetTime() is only accurate to 10-15 milliseconds.
[Note that the high granularities occur on NT-based operating systems like Windows NT,
2000, and XP. Windows 95 and 98 tend to have much better granularity, around 1-5 ms.]
However, if you call timeBeginPeriod(1) at the beginning of your program (and
timeEndPeriod(1) at the end), timeGetTime() will usually become accurate to 1-2
milliseconds, and will provide you with extremely accurate timing information.
Sleep() behaves similarly; the length of time that Sleep() actually sleeps for
goes hand-in-hand with the granularity of timeGetTime(), so after calling
timeBeginPeriod(1) once, Sleep(1) will actually sleep for 1-2 milliseconds, Sleep(2)
for 2-3, and so on (instead of sleeping in increments as high as 10-15 ms).
For higher precision timing (sub-millisecond accuracy), you'll probably want to avoid
using the assembly mnemonic RDTSC because it is hard to calibrate; instead, use
QueryPerformanceFrequency and QueryPerformanceCounter, which are accurate to less
than 10 microseconds (0.00001 seconds).
For simple timing, both timeGetTime and QueryPerformanceCounter work well, and
QueryPerformanceCounter is obviously more accurate. However, if you need to do
any kind of "timed pauses" (such as those necessary for framerate limiting), you
need to be careful of sitting in a loop calling QueryPerformanceCounter, waiting
for it to reach a certain value; this will eat up 100% of your processor. Instead,
consider a hybrid scheme, where you call Sleep(1) (don't forget timeBeginPeriod(1)
first!) whenever you need to pass more than 1 ms of time, and then only enter the
QueryPerformanceCounter 100%-busy loop to finish off the last < 1/1000th of a
second of the delay you need. This will give you ultra-accurate delays (accurate
to 10 microseconds), with very minimal CPU usage. See the code above.
Please Note: Several people have written me over the years, offering additions
or new developments since I first wrote this article, and I've added 'update'
comments here and there. The general text of the article DOES NOT reflect the
'UPDATE' comments yet, so please keep that in mind, if you see any contradictions.
UPDATE: Matthijs de Boer points out that you should watch out for variable CPU speeds,
in general, when running on laptops or other power-conserving (perhaps even just
eco-friendly) devices. (Thanks Matthijs!)
This document copyright (c)2002+ Ryan M. Geiss.