Better never than late
Performance and Latency FAQ for audio developers on Windows
 
Last updated: Monday, September 13, 2010 
 
 
 
General 
 
 
I am writing an audio application or plugin and my audio stream hiccups and stutters, what can be the cause ? 
 
There can be several causes, such as poorly behaving drivers and hardware. But once you have ruled those out, the first thing you should suspect is a buffer underrun. 
 
 
 
 
 
What is an audio buffer underrun ? 
 
It means that audio buffers are not delivered in time. 
If your application or plugin delivers audio, it needs to calculate or at least provide an audio buffer to the audio subsystem, drivers and hardware. 
The samples are delivered in blocks. For instance, if your audio sample rate is 96 kHz and your block size is 1024 samples, there are theoretically about 10.7 ms 
available for each block to be delivered to the hardware. If that deadline is missed, the audio hardware does not get the correct samples and this will have audible consequences. 
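
The arithmetic above can be sketched in a few lines of C++. This is only an illustration: the function and parameter names (block_period_ms, missed_deadline, sample_rate_hz, block_size_frames) are made up for this example and are not part of any Windows API.

```cpp
#include <cstdint>

// Time budget, in milliseconds, for producing one block of audio.
// The parameter names are hypothetical, chosen for this sketch only.
double block_period_ms(std::uint32_t sample_rate_hz, std::uint32_t block_size_frames)
{
    return 1000.0 * static_cast<double>(block_size_frames)
                  / static_cast<double>(sample_rate_hz);
}

// True when the measured processing time of one block exceeded its budget,
// which is the condition that leads to a buffer underrun.
bool missed_deadline(double processing_ms,
                     std::uint32_t sample_rate_hz,
                     std::uint32_t block_size_frames)
{
    return processing_ms > block_period_ms(sample_rate_hz, block_size_frames);
}
```

For the figures used above, block_period_ms(96000, 1024) gives roughly 10.67 ms. In practice the usable budget is smaller, because the driver and the audio subsystem need part of that time as well.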
 
 
 
 
 
Why should I care only about audio latency and forget about performance ? 
 
In an audio application, we want to avoid anything that has audible consequences, such as a buffer underrun. What we care about is 
the maximum execution time (latency) of the audio process functions. It does not matter to anyone how many samples per second the audio application 
could calculate. Performance is important to other types of applications. If, for instance, a web server can on average serve more requests 
per hour, we talk about performance. We do not care if one of the many requests took much longer than the others. 
There is nothing wrong with measuring performance, but if this means calculating averages over a long period of time (say seconds), 
the values obtained are not of immediate interest. As an audio developer you don't particularly care about averages but about the number of times that you 
actually missed the boat (buffer underruns). 
 
 
 
 
About Windows 
 
 
Is Windows a real-time operating system ? 
 
No, all requests to the operating system are handled on a best effort basis. There are no guarantees whatsoever 
that a request completes within a certain time frame, and such guarantees are what characterizes a real-time operating system. 
That is bad news for audio applications (which are considered soft real-time) because they have to avoid 
buffer underruns at any price. 
 
 
 
 
 
What are ISRs ? 
 
ISRs (Interrupt Service Routines) are kernel routines which are part of drivers or the OS which execute when hardware devices 
interrupt a CPU. They run at elevated IRQL which means that no other thread (or program) can run on the current processor until the ISR has finished executing. 
Awareness about ISRs is important because they can be the cause of audio buffer underruns. You can get some insight into ISR execution times with the 
LatencyMon analysis tool. 
 
 
 
 
 
What are DPCs ? 
 
DPCs (Deferred Procedure Calls) are kernel routines which are part of device drivers or the OS kernel. They are normally requested and scheduled by ISRs (interrupt service routines) 
or they are associated with a kernel timer. They run at elevated IRQL which means that no other thread (or program) can run on the current processor until the DPC has finished executing. Awareness about DPCs is important 
because they can be the cause of audio buffer underruns. That is because an audio program, if interrupted by a DPC routine cannot continue until the DPC has completed execution. You can get some insight into DPC latencies (execution time) with the DPC latency checker tool 
or the Windows Performance Toolkit. 
 
 
 
 
 
Can my audio thread really get interrupted by the OS at any point in time ? 
 
Yes. An interrupt can occur on the processor that your program is running on. Execution of your program will temporarily halt. 
The interrupt service routine (ISR) is executed and may schedule a DPC (Deferred Procedure Call). The DPC will most likely 
run immediately on the same processor which means your program will halt until both the ISR and the DPC have finished executing. 
The same goes for program errors which throw exceptions. An interrupt will take place on the CPU on which you are running and the exception 
handler (provided by Windows) will handle the problem. Some examples of exceptions are pagefaults, FPU faults, stack faults and GPFs (general protection faults). 
 
 
 
 
About Pagefaults 
 
 
What are pagefaults ? 
 
Windows uses a concept of virtual memory which relies on the page translation system provided by 
the CPU. Whenever a memory address is requested which is not available in physical memory (not resident), 
an INT 14 will occur. The OS provided INT 14 handler will decide how to proceed next. If the page in 
which the address resides is known to Windows but not resident, Windows will read in the required 
page from the page file. That is known as a hard pagefault and can take a lot of time to complete. 
 
 
 
 
 
What is the difference between hard pagefaults and soft pagefaults ? 
 
Soft pagefaults are requests for pages which are resident in RAM but not immediately available to the current task. They 
will be resolved much faster than hard pagefaults which need to go all the way through the file system. 
 
 
 
 
Are hard pagefaults really that expensive ?
 
Yes, a single hard pagefault can literally take millions of instruction cycles. The pagefault handler needs to go 
through the file system, which in turn needs to access the disk to read in the requested page from the page file. 
That is very expensive and one of the most common causes of audio buffer underruns on Windows (heard as hiccups, clicks, pops and stutter). 
 
 
 
 
 
How can I find out about pagefaults occurring while my audio app/plugin is running ? 
 
You can start Task Manager, select your process and check the pagefaults and PF delta columns. Unfortunately 
these show hard and soft pagefaults combined. It's mostly hard pagefaults you want to know about because they can 
take a long time to get resolved. If you experience stutter in your audio stream and at the same time you 
see the PF delta value change, you have a very good indication that pagefaults are the main cause of the hiccups in your audio stream. 
For in-depth analysis of pagefaults, use the LatencyMon tool. 
 
 
 
 
 
What is the working set of a process and why should I care ? 
 
The working set of a process is the part of its memory which is resident in RAM. You should care because if the working set of your 
audio application (or the host process if you are a plugin) is smaller than the amount of memory it actually uses, 
some of that memory is not resident in RAM, which means it is paged out to disk. Accessing this memory causes hard pagefaults, which 
are very expensive in terms of execution time and a very common cause of audio buffer underruns. You should make sure the 
working set of your application is set to an acceptable minimum (by using the SetProcessWorkingSetSize API). 
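
A minimal sketch of such a call is shown below. The wrapper name request_working_set and its argument check are assumptions made for this example; only SetProcessWorkingSetSize and GetCurrentProcess are real Windows APIs. On non-Windows builds the function does nothing and returns false, so that the snippet compiles anywhere.

```cpp
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#endif

// Ask Windows to keep between min_bytes and max_bytes of this process resident.
// Hypothetical helper: returns false on invalid arguments, when the API call
// fails, or when built for a platform other than Windows.
bool request_working_set(std::size_t min_bytes, std::size_t max_bytes)
{
    if (min_bytes == 0 || min_bytes > max_bytes)
        return false;
#ifdef _WIN32
    return SetProcessWorkingSetSize(GetCurrentProcess(),
                                    static_cast<SIZE_T>(min_bytes),
                                    static_cast<SIZE_T>(max_bytes)) != 0;
#else
    return false;   // no equivalent in this sketch
#endif
}
```

The sizes to request depend entirely on how much memory your audio code touches. Measure the actual working set first, and check the return value, because Windows may refuse the request.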
 
 
 
 
 
How can I find out about the actual working set of my application ? 
 
Start Windows Task Manager, find your process and check the working set column. 
 
 
 
 
Measuring and Profiling 
 
 
Should I care about the percentage of CPU consumption of my audio application in Task Manager ? 
 
No. It's useless to look at it because it does not matter to audio applications or plugins. Also, this number is only 
an indication and is inaccurate for several reasons. At any point in time a CPU is either busy (100%) or idle (0%). 
The number displayed is an average over a long period of time. What you should care about is the maximum execution time of your audio processing functions 
and whether buffer underruns occur. 
 
 
 
 
 
Why does the CPU column in Task Manager display inaccurate information ? 
 
On each timer interrupt, Windows checks which thread was running and charges the full time slice to that thread. 
If multiple context switches took place, and possibly ISRs and DPCs were also running 
in a single quantum, only one of them will be charged with the full CPU time. Due to this "Monte Carlo" style of measuring it is easy to write a program that does 
nothing and remains idle most of the time but nonetheless displays 100% CPU usage (or the percentage of a full CPU) in Task Manager. This may not be true for certain versions of Windows 
which use another method of measuring. In any case, what is displayed is an average over a long period, which is not of concern if you are measuring maximum execution latencies. 
 
 
 
 
 
How can I measure the execution times of my audio plugin/app from within my code? 
 
Use the QueryPerformanceCounter function (and QueryPerformanceFrequency) to get a high resolution time stamp at the beginning and end 
of your audio process functions. There are some caveats to using this function. Because each CPU has its own time stamp counter and they are not necessarily synchronized, 
you need to make sure you always execute this function on the same CPU to get results that make sense. You can use 
SetThreadAffinityMask to make sure your thread will only run on a specific CPU. Use the GetCurrentProcessorNumber API to get the number of the CPU 
you are currently running on. Check out the code sample at the bottom of this page. 
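
As a rough sketch of the approach described above, the helpers below wrap QueryPerformanceCounter, QueryPerformanceFrequency and SetThreadAffinityMask. The helper names (now_ticks, ticks_per_second, elapsed_ms, pin_current_thread_to_cpu) are invented for this example. The std::chrono fallback is an assumption added only so that the snippet also builds on other platforms.

```cpp
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Raw high resolution time stamp, in ticks.
std::int64_t now_ticks()
{
#ifdef _WIN32
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return t.QuadPart;
#else
    // Fallback (assumption): steady_clock ticks on non-Windows builds.
    return std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}

// Number of ticks per second for the time stamps returned by now_ticks().
std::int64_t ticks_per_second()
{
#ifdef _WIN32
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return f.QuadPart;
#else
    using period = std::chrono::steady_clock::period;
    return static_cast<std::int64_t>(period::den / period::num);
#endif
}

// Convert a pair of time stamps to elapsed milliseconds.
double elapsed_ms(std::int64_t start_ticks, std::int64_t end_ticks)
{
    return 1000.0 * static_cast<double>(end_ticks - start_ticks)
                  / static_cast<double>(ticks_per_second());
}

// Restrict the calling thread to one CPU while measuring (testing only).
// Returns false if the CPU index is out of range or the call is unavailable.
bool pin_current_thread_to_cpu(unsigned cpu)
{
    if (cpu >= sizeof(std::uintptr_t) * 8)
        return false;
#ifdef _WIN32
    return SetThreadAffinityMask(GetCurrentThread(),
                                 static_cast<DWORD_PTR>(1) << cpu) != 0;
#else
    return false;
#endif
}
```

Take one time stamp with now_ticks() on entry to your audio process function and one on exit, then pass both to elapsed_ms().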
 
 
 
 
 
Why should I not use GetTickCount to measure execution times ? 
 
GetTickCount returns a value in milliseconds, but that value is only updated at each clock interrupt (typically every 10 to 16 ms). This means that 
consecutive calls within the same clock interval return the same value, which makes it useless for measuring execution times that are shorter than, or close to, that interval. 
 
 
 
 
 
Should I use a RDTSC instruction directly or not ? 
 
There is a long list of issues, some of which are outlined in the following article:
 
http://msdn.microsoft.com/en-us/library/ee417693
 
What that article does not mention is that RDTSC is not a serializing instruction: it is subject to out-of-order execution and cache effects, which lead to 
inaccurate results. The various implementations of QueryPerformanceCounter and KeQueryPerformanceCounter (kernel) do not 
add serializing instructions either, possibly because they come with a heavy price tag. 
 
 
 
 
 
Now that I know how to measure, what should I do with this information ? 
 
You should take a good look at the MAXIMUM execution times of your audio processing functions and how often they occur. As you wish 
to avoid buffer underruns at any price (and a deadline is either hit or missed), you do not care about AVERAGE execution times. 
You should care about the number of times your audio process function did not deliver its buffers in time. Unfortunately, 
all sorts of things can still happen after your application has delivered the buffer to the audio subsystem (interrupts, exceptions and so on). 
If possible, create a recording of your audio stream and do a graphical analysis to see if there are any hiccups in it.
 
In case you wish to measure execution times of a particular routine or section of code for performance optimization you should also check the minimum execution times 
so that you concentrate on your code and not the bad weather conditions. The difference between minimum and average execution times gives you an idea 
of the OS factors that impact the execution of your code. Some of these can be taken care of programmatically (pagefaults) while others depend on the system configuration 
(DPCs, ISRs and other OS factors). 
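
One possible way to collect these figures is sketched below. The LatencyStats type and its members are hypothetical names for this example. It assumes you already measure the elapsed time of each call in milliseconds and know the deadline of one block.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Running statistics over the execution times of an audio process function.
// Hypothetical helper type, not part of any Windows API.
struct LatencyStats {
    double min_ms = std::numeric_limits<double>::max();
    double max_ms = 0.0;
    double sum_ms = 0.0;
    std::uint64_t count = 0;    // number of measured calls
    std::uint64_t missed = 0;   // calls that took longer than the deadline

    // Record one measurement against the deadline of one audio block.
    void record(double elapsed_ms, double deadline_ms)
    {
        min_ms = std::min(min_ms, elapsed_ms);
        max_ms = std::max(max_ms, elapsed_ms);
        sum_ms += elapsed_ms;
        ++count;
        if (elapsed_ms > deadline_ms)
            ++missed;
    }

    // Average execution time, or 0 when nothing has been recorded yet.
    double average_ms() const
    {
        return count ? sum_ms / static_cast<double>(count) : 0.0;
    }
};
```

With a 10 ms deadline, three calls taking 2, 4 and 12 ms average out at a comfortable 6 ms, yet one of them missed its deadline. That is why the maximum and the missed count are the figures to watch. Do not print or log from inside the audio thread; read the statistics from another thread or after the stream has stopped.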
 
 
 
 
About the clock resolution 
 
 
Should I use the timeBeginPeriod API in my software to change the clock resolution ? 
 
Although changing the clock resolution may reduce the time it takes for your thread to get 
attention from the dispatcher, it also means that the thread which processes your audio 
will get a shorter time slice. The default time slice on most systems, 
about 16 ms, is a healthy window for your audio application to fill an audio buffer. 
Setting the clock resolution to a shorter interval means an audio thread may need to be scheduled multiple times 
in order to fill a single buffer because it uses up its quantum, all of which leads to higher latency. 
You also need to realize that this is a global, system-wide setting that affects everything else in the system, 
so it's not an option for a plugin which runs in a host application. 
Whether or not this is a good idea depends on the outcome of a complicated equation with many factors, including 
the threads that your audio application makes use of and the number of CPUs in the system. 
It should not be considered the holy grail for improving audio latency on Windows. The best way 
to find out is to measure maximum execution times and see whether it decreases the number of buffer underruns. 
Some applications (Windows Messenger, Windows Media Player, Borland Delphi) change the clock resolution themselves, so you cannot 
rely on the resolution staying at the value you expect. 
 
 
 
 
 
How can I check the current clock resolution ? 
 
Get the ClockRes utility from www.sysinternals.com 
 
 
 
 
Practices and recommendations 
 
 
Does it make sense to "reserve" a CPU for my audio application ? 
 
No. Although you could set affinities for your threads to make sure they execute on a particular CPU only, there is no way you can stop ISRs, DPCs and exceptions from 
interrupting your audio and executing on your CPU without hacking into the Windows kernel. And if another CPU is available, your thread cannot run there, so that's a chance missed. 
You can use the technique of setting an affinity for your thread/application for the purpose of testing and measuring, but 
it makes no sense to use this technique in production code unless you wish to hurt audio latency. 
 
 
 
 
 
How can floating point errors be the cause of buffer underruns ? 
 
Divisions by zero, numeric overflows, FPU stack overflows and denormalized numbers can all generate exceptions. This means your program 
will be interrupted by interrupt 16, which causes the FPU exception handler to handle the matter. If this 
happens many times it will dramatically impact the execution time of your audio app/plug-in. If such faults 
are not avoidable due to the nature of your calculations, you should programmatically mask off FPU exceptions. In Visual Studio, 
FPU exceptions are masked off by default. 
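
One portable way to mask floating point exceptions is the standard feholdexcept function from <cfenv>, sketched below. The wrapper names mask_fp_exceptions and restore_fp_environment are invented for this example. The sketch assumes your toolchain supports the non-stop mode that feholdexcept asks for. Visual C++ also offers the compiler-specific _controlfp_s, which is not shown here.

```cpp
#include <cfenv>

// Save the current floating point environment, clear the status flags and
// switch to non-stop mode, so that floating point exceptions no longer trap.
// Returns true on success. Hypothetical wrapper around std::feholdexcept.
bool mask_fp_exceptions(std::fenv_t* saved_env)
{
    return std::feholdexcept(saved_env) == 0;
}

// Restore the environment saved by mask_fp_exceptions(), for instance
// before returning control to a host application.
void restore_fp_environment(const std::fenv_t* saved_env)
{
    std::fesetenv(saved_env);
}
```

A plugin would typically call mask_fp_exceptions() on entry to its audio process function and restore_fp_environment() on exit, so that the host's own settings stay untouched. The floating point environment is per thread, so this must happen on the audio thread itself.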
 
 
 
 
 
How can I use LatencyMon tool to find out about hard pagefaults, DPCs and ISRs which interrupted my program ? 
 
You can download LatencyMon from here. Note: only Windows Vista and higher are supported. 
 
 
 
What can I do programmatically to improve audio latency for my audio plugin/application ? 
 
 
 
 
 
 
 
 
 
 
 
 
 
Code samples 
Check out this code sample to get an idea how to measure minimum, maximum and average execution times of a routine or portion of code with Visual C++.
 
Links 
 
 
Do you have some recommended reading and links to sites which discuss these topics ? 
 
book: Windows Internals by Russinovich, Solomon and Ionescu 
 
MSDN: http://msdn.microsoft.com 
 
Raymond Chen's blog: http://blogs.msdn.com/oldnewthing/ 
 
Windows Performance Analysis Toolkit (XPERF) and forums: http://msdn.microsoft.com/en-us/library/cc305187.aspx 
 
Pigs might Fly, Windows performance blog: http://blogs.msdn.com/pigscanfly/archive/2008/03/02/using-the-windows-sample-profiler-with-xperf.aspx 
 
The search tool at the Windows kernel newsgroups at http://www.osronline.com 
 
Copyright © 1997-2025 Resplendence Software Projects. All rights reserved.
Page generated on 11/4/2025 3:01:45 PM. Last updated on 8/24/2020 3:51:23 PM.