Voice Performance (Windows CE 5.0)

09/14/2012

Properly handling samples in the audio driver is key to achieving good sound quality and performance with audio sessions managed by RTC. These sessions can be a core part of a Voice over IP (VoIP) architecture.

The following best practices are useful to consider when optimizing voice performance:

A driver's audio capture characteristics have a greater influence on overall VoIP quality than a driver's audio rendering characteristics. These characteristics can have an effect on performance that is more important than, and even independent of, your device's raw processing power.
The smallest captured sound sample used in an RTC Client API system is about 20 milliseconds (ms) long. The actual sample size depends on the audio codec being used and is set in the sample's WAVEHDR structure.
Your audio driver should send an interrupt as soon as each sample is complete and the WAVEHDR structure is filled. This interrupt causes the captured audio data to be sent immediately to the encoder and then on to the receiving device.

Some audio hardware generates an interrupt for variable length captures. If your hardware does not support this, simulate this behavior in your device driver.
If your driver does not send an interrupt at the end of every sample, the samples accumulate in the DMA buffer until it is full. Only then does the audio hardware generate an interrupt, and all of the accumulated samples are sent over the network in one large burst.

This can lead to audio that is received and rendered in bursts because, depending on the specific hardware configuration, the DMA buffer can hold as much as 200 ms worth of audio data.
The overall latency for a VoIP receiving device is the sum of the times spent capturing the audio data, encoding it, sending it over the network, decoding it, and rendering it.
If the audio capture time is the sum of many samples filling the DMA buffer, the total end-to-end time of the process can exceed the acceptable latency limit for a phone call, which is about 200 ms.
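The latency sum described above can be sketched as a small budget check. This is only an illustration: the VoipLatency type, the function names, and any per-stage figures you pass in are hypothetical and not part of the RTC Client API; the only number taken from the text is the limit of about 200 ms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-stage latencies in milliseconds. The stage names follow
   the text; this type is an assumption made for illustration only. */
typedef struct {
    unsigned capture_ms;   /* time spent capturing the audio data */
    unsigned encode_ms;    /* time spent encoding it */
    unsigned network_ms;   /* time spent sending it over the network */
    unsigned decode_ms;    /* time spent decoding it */
    unsigned render_ms;    /* time spent rendering it */
} VoipLatency;

/* Approximate acceptable end-to-end latency for a phone call, per the text. */
#define MAX_CALL_LATENCY_MS 200u

/* Overall latency is the plain sum of the five stages. */
static unsigned total_latency_ms(const VoipLatency *l)
{
    assert(l != NULL);
    return l->capture_ms + l->encode_ms + l->network_ms +
           l->decode_ms + l->render_ms;
}

/* Returns nonzero when the summed latency stays within the call limit. */
static int within_call_budget(const VoipLatency *l)
{
    return total_latency_ms(l) <= MAX_CALL_LATENCY_MS;
}
```

With assumed figures of 5 ms to encode, 60 ms on the network, 5 ms to decode, and 20 ms to render, a 20 ms capture stage gives a total of 110 ms, which is within the limit. A capture stage that waits for a 200 ms DMA buffer to fill gives 290 ms, which is not.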

You can achieve low-latency audio capture by indicating the completion of small WAVEHDR structures as soon as each one is filled.
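One way to picture this is the following minimal sketch, which is not actual Windows CE driver code. CaptureHeader is a hypothetical, simplified stand-in for the WAVEHDR fields involved in capture (lpData, dwBufferLength, dwBytesRecorded, and the WHDR_DONE flag), and pcm_bytes_for_ms and deliver_captured are illustrative names. The sketch assumes uncompressed PCM audio and a queue of headers that are filled in order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the capture-related WAVEHDR fields. */
typedef struct {
    unsigned char *data;     /* plays the role of lpData */
    size_t buffer_length;    /* plays the role of dwBufferLength (bytes) */
    size_t bytes_recorded;   /* plays the role of dwBytesRecorded */
    int done;                /* plays the role of the WHDR_DONE flag */
} CaptureHeader;

/* Bytes needed to hold `ms` milliseconds of PCM audio. */
static size_t pcm_bytes_for_ms(unsigned sample_rate, unsigned channels,
                               unsigned bits_per_sample, unsigned ms)
{
    return (size_t)sample_rate * channels * (bits_per_sample / 8) * ms / 1000;
}

/* Copy newly captured bytes into the queued headers in order and mark each
   header done the moment it is full, instead of waiting for the whole DMA
   buffer. Returns the number of headers completed by this call. */
static size_t deliver_captured(CaptureHeader *queue, size_t count,
                               const unsigned char *src, size_t len)
{
    size_t completed = 0;
    assert(queue != NULL || count == 0);
    for (size_t i = 0; i < count && len > 0; ++i) {
        CaptureHeader *h = &queue[i];
        if (h->done)
            continue;                    /* already handed back */
        size_t room = h->buffer_length - h->bytes_recorded;
        size_t n = len < room ? len : room;
        memcpy(h->data + h->bytes_recorded, src, n);
        h->bytes_recorded += n;
        src += n;
        len -= n;
        if (h->bytes_recorded == h->buffer_length) {
            h->done = 1;                 /* a real driver would signal completion here */
            ++completed;
        }
    }
    return completed;
}
```

Under these assumptions, a 20 ms frame of 8 kHz, 16-bit mono PCM is 320 bytes, so a header of that size can be completed after every 320 captured bytes rather than once per full DMA buffer.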
