Voice Capture DSP
An object that encapsulates several DSPs related to voice capture.
CLSID
CLSID_CWMAudioAEC
Interfaces
- IMediaObject
- IPropertyStore
Properties
Property | Description |
---|---|
MFPKEY_WMAAECMA_DEVICE_INDEXES | Specifies which audio devices the DMO uses for capturing and rendering audio. |
MFPKEY_WMAAECMA_DEVICEPAIR_GUID | Identifies the combination of audio devices that the application is currently using. |
MFPKEY_WMAAECMA_DMO_SOURCE_MODE | Specifies whether the DMO uses source mode or filter mode. |
MFPKEY_WMAAECMA_FEATR_AES | Specifies how many times the DMO performs acoustic echo suppression (AES) on the residual signal. |
MFPKEY_WMAAECMA_FEATR_AGC | Specifies whether the DMO performs automatic gain control. |
MFPKEY_WMAAECMA_FEATR_CENTER_CLIP | Specifies whether the DMO performs center clipping. |
MFPKEY_WMAAECMA_FEATR_ECHO_LENGTH | Specifies the duration of echo that the acoustic echo cancellation (AEC) algorithm can handle. |
MFPKEY_WMAAECMA_FEATR_FRAME_SIZE | Specifies the audio frame size. |
MFPKEY_WMAAECMA_FEATR_MICARR_BEAM | Specifies which beam the DMO uses for microphone array processing. |
MFPKEY_WMAAECMA_FEATR_MICARR_MODE | Specifies how the DMO performs microphone array processing. |
MFPKEY_WMAAECMA_FEATR_MICARR_PREPROC | Specifies whether the DMO performs microphone array preprocessing. |
MFPKEY_WMAAECMA_FEATR_NOISE_FILL | Specifies whether the DMO performs noise filling. |
MFPKEY_WMAAECMA_FEATR_NS | Specifies whether the DMO performs noise suppression. |
MFPKEY_WMAAECMA_FEATR_VAD | Specifies the type of voice activity detection that the DMO performs. |
MFPKEY_WMAAECMA_FEATURE_MODE | Enables the application to override the default settings on various properties. |
MFPKEY_WMAAECMA_MIC_GAIN_BOUNDER | Specifies whether the DMO applies microphone gain bounding. |
MFPKEY_WMAAECMA_MICARRAY_DESCPTR | Specifies the microphone array geometry. |
MFPKEY_WMAAECMA_QUALITY_METRICS | Retrieves quality metrics for AEC. |
MFPKEY_WMAAECMA_RETRIEVE_TS_STATS | Specifies whether the DMO stores time stamp statistics in the registry. |
MFPKEY_WMAAECMA_SYSTEM_MODE | Sets the processing mode. |
Remarks
Unlike the other DSPs, the voice capture object encapsulates multiple DSPs in a single object and is a DMO only (it does not implement IMFTransform). The voice capture DMO includes the following DSP components:
- Acoustic echo cancellation (AEC)
- Microphone array processing
- Noise suppression
- Automatic gain control
- Voice activity detection
Applications can turn each component on and off individually.
The voice capture DMO supports two modes of operation: filter mode and source mode. In filter mode, the application sends audio samples from the microphone and from the speaker line to the DMO, and the DMO produces the processed output.
In source mode, the application does not need to deliver samples to the DMO. Instead, the DMO manages all of the operations on the audio devices, including initializing the devices, capturing and synchronizing the audio streams, calculating time stamps, and retrieving the geometry of the microphone array. Using source mode, the application simply configures the DMO, and the output from the DMO is a clean, processed microphone signal. Source mode is significantly easier to use than filter mode, and is recommended for most applications.
Currently the voice capture DMO supports only single-channel acoustic echo cancellation (AEC), so the output from the speaker line must be single-channel. If microphone array processing is disabled, multi-channel input is folded down to one channel for AEC processing. If both microphone array processing and AEC processing are enabled, AEC is performed on each microphone element before microphone array processing.
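To make the idea of folding multi-channel input down to one channel concrete, the following is a minimal sketch that averages the channels of each interleaved 16-bit frame. It is an illustration only: the function name foldDownToMono is hypothetical, and averaging is an assumption, since this page does not state how the DMO combines the channels internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: averages the channels of each interleaved frame.
// This is one common fold-down method, not necessarily the one the DMO uses.
inline std::vector<std::int16_t> foldDownToMono(
    const std::vector<std::int16_t>& interleaved, std::size_t channels)
{
    std::vector<std::int16_t> mono;
    if (channels == 0)
    {
        return mono;  // Nothing sensible to do without a channel count.
    }

    const std::size_t frames = interleaved.size() / channels;
    mono.reserve(frames);

    for (std::size_t f = 0; f < frames; ++f)
    {
        // Accumulate in 32 bits so the sum cannot overflow.
        std::int32_t sum = 0;
        for (std::size_t c = 0; c < channels; ++c)
        {
            sum += interleaved[f * channels + c];
        }
        mono.push_back(static_cast<std::int16_t>(
            sum / static_cast<std::int32_t>(channels)));
    }
    return mono;
}
```

An application does not need to do this itself; the sketch only shows what a fold-down to one channel amounts to.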
Microphone Array Processing
A microphone array is a set of closely positioned microphones. Microphone arrays achieve better directionality than a single microphone, because the acoustic waves arrive at each microphone at a slightly different time. For more information on microphone arrays, see the web articles "Microphone Array Support in Windows Vista" and "How to Build and Use Microphone Arrays for Windows Vista".
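The arrival-time difference between two microphones can be estimated with simple geometry. The sketch below assumes a far-field source (a plane wave), a speed of sound of 343 m/s, and an angle measured from broadside, that is, from the direction perpendicular to the line joining the two microphones. The function names are hypothetical and are not part of the DMO API.

```cpp
#include <cmath>

// Assumed speed of sound in air at roughly room temperature, in meters per second.
constexpr double kSpeedOfSoundMps = 343.0;

// Delay, in seconds, between the wavefront reaching two microphones that are
// spacingMeters apart, for a far-field source at angleRadians from broadside.
inline double arrivalDelaySeconds(double spacingMeters, double angleRadians)
{
    return spacingMeters * std::sin(angleRadians) / kSpeedOfSoundMps;
}

// The same delay expressed in sample periods at the given sample rate.
inline double arrivalDelaySamples(double spacingMeters, double angleRadians,
                                  double sampleRateHz)
{
    return arrivalDelaySeconds(spacingMeters, angleRadians) * sampleRateHz;
}
```

Under these assumptions, two microphones 10 cm apart see a delay of about 292 microseconds for a source along the array axis, which is roughly 4.7 samples at 16 kHz. A source directly in front of the array (angle 0) produces no delay.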
Using the Voice Capture DSP
To use the Voice Capture DSP, perform the following steps.
1. Initialize the DMO
Create the voice capture DMO by calling CoCreateInstance with the CLSID CLSID_CWMAudioAEC. The voice capture DSP exposes only the IMediaObject and IPropertyStore interfaces, so it can only be used as a DMO.
The DMO defaults to source mode. To select filter mode, set the MFPKEY_WMAAECMA_DMO_SOURCE_MODE property to VARIANT_FALSE.
Next, configure the internal properties of the DMO by using the IPropertyStore interface. The only property that an application must set is the MFPKEY_WMAAECMA_SYSTEM_MODE property. This property configures the processing pipeline within the DMO. The other properties are optional.
2. Set the Input and Output Formats
If you are using the DMO in filter mode, set the input format by calling IMediaObject::SetInputType. The input format can be almost any valid uncompressed PCM or IEEE floating-point audio type. If the input format does not match the output format, the DMO automatically performs sample-rate conversion.
If you are using the DMO in source mode, do not set the input format. The DMO automatically configures the input format based on the audio devices.
In either mode, set the output format by calling IMediaObject::SetOutputType. The DMO can accept the following output formats:
- Subtype: MEDIASUBTYPE_PCM or MEDIASUBTYPE_IEEE_FLOAT
- Format block: WAVEFORMAT or WAVEFORMATEX
- Samples per second: 8,000; 11,025; 16,000; or 22,050
- Channels: 1 for AEC-only mode, 2 or 4 for microphone array processing
- Bits per sample: 16
The following code sets the output type to 16-bit single-channel PCM audio:
// pDMO is the IMediaObject pointer obtained in step 1.
HRESULT hr = S_OK;
DMO_MEDIA_TYPE mt;  // Media type.
mt.majortype = MEDIATYPE_Audio;
mt.subtype = MEDIASUBTYPE_PCM;
mt.lSampleSize = 0;
mt.bFixedSizeSamples = TRUE;
mt.bTemporalCompression = FALSE;
mt.formattype = FORMAT_WaveFormatEx;

// Allocate the format block to hold the WAVEFORMATEX structure.
hr = MoInitMediaType(&mt, sizeof(WAVEFORMATEX));
if (SUCCEEDED(hr))
{
    WAVEFORMATEX *pwav = (WAVEFORMATEX*)mt.pbFormat;
    pwav->wFormatTag = WAVE_FORMAT_PCM;
    pwav->nChannels = 1;
    pwav->nSamplesPerSec = 16000;
    pwav->nAvgBytesPerSec = 32000;  // nSamplesPerSec * nBlockAlign
    pwav->nBlockAlign = 2;          // nChannels * wBitsPerSample / 8
    pwav->wBitsPerSample = 16;
    pwav->cbSize = 0;

    // Set the output type.
    hr = pDMO->SetOutputType(0, &mt, 0);

    // Free the format block.
    MoFreeMediaType(&mt);
}
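The derived WAVEFORMATEX fields in the code above follow from the channel count, sample rate, and bit depth. The sketch below shows that arithmetic and a check against the output formats listed on this page. It uses a hypothetical PcmFormat structure and hypothetical function names instead of the Windows types, so that it stands alone.

```cpp
#include <cstdint>

// Hypothetical stand-in for the WAVEFORMATEX fields that matter here.
struct PcmFormat
{
    std::uint32_t samplesPerSec;
    std::uint16_t channels;
    std::uint16_t bitsPerSample;
};

// Bytes per frame (one sample for every channel): the nBlockAlign value.
inline std::uint16_t blockAlign(const PcmFormat& f)
{
    return static_cast<std::uint16_t>(f.channels * f.bitsPerSample / 8);
}

// Bytes per second of audio: the nAvgBytesPerSec value.
inline std::uint32_t avgBytesPerSec(const PcmFormat& f)
{
    return f.samplesPerSec * blockAlign(f);
}

// Returns true if the format matches the sample rates, channel counts, and
// bit depth listed above as accepted output formats.
inline bool isListedOutputFormat(const PcmFormat& f)
{
    const bool rateOk = f.samplesPerSec == 8000 || f.samplesPerSec == 11025 ||
                        f.samplesPerSec == 16000 || f.samplesPerSec == 22050;
    const bool channelsOk = f.channels == 1 || f.channels == 2 || f.channels == 4;
    return rateOk && channelsOk && f.bitsPerSample == 16;
}
```

For 16 kHz, single-channel, 16-bit audio these helpers give a block alignment of 2 and 32,000 bytes per second, matching the values hard-coded in the example.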
3. Process Data
Before processing any data, it is recommended to call IMediaObject::AllocateStreamingResources. This method allocates the resources used internally by the DMO. Call AllocateStreamingResources after the steps listed previously, not before. If you do not call this method, the DMO automatically allocates resources when data processing starts.
If you are using the DMO in filter mode, you must pass input data to the DMO by calling IMediaObject::ProcessInput. The audio data from the microphone goes to stream 0, and the audio data from the speaker line goes to stream 1. If you are using the DMO in source mode, you do not need to call ProcessInput.
To get output data from the DSP, perform the following steps:
- Create a buffer object to hold the output data. The buffer object must implement the IMediaBuffer interface. The size of the buffer depends on the requirements of your application. Allocating a larger buffer can reduce the chances of glitches occurring.
- Declare a DMO_OUTPUT_DATA_BUFFER structure and set the pBuffer member to point to your buffer object.
- Pass the DMO_OUTPUT_DATA_BUFFER structure to the IMediaObject::ProcessOutput method.
- Continue to call this method for as long as the DMO has output data. The DSP signals that it has more output by setting the DMO_OUTPUT_DATA_BUFFERF_INCOMPLETE flag in the dwStatus member of the DMO_OUTPUT_DATA_BUFFER structure.
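The control flow of the steps above can be sketched without the Windows headers. In this simplified model, the pull callable stands in for one IMediaObject::ProcessOutput call, OutputChunk stands in for the buffer contents plus the dwStatus member, and kIncompleteFlag is a local placeholder for DMO_OUTPUT_DATA_BUFFERF_INCOMPLETE. All three names are hypothetical; real code should use the types and constants from the DMO headers and check the returned HRESULT on every call.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Placeholder for DMO_OUTPUT_DATA_BUFFERF_INCOMPLETE in this sketch.
constexpr std::uint32_t kIncompleteFlag = 0x01000000;

// Hypothetical result of one ProcessOutput-style call.
struct OutputChunk
{
    std::vector<std::uint8_t> bytes;  // Data the call wrote into the buffer.
    std::uint32_t status = 0;         // Mirrors the dwStatus member.
};

// Calls pull at least once, then again for as long as the previous call
// reported more pending output. Appends every chunk to sink and returns the
// number of calls made.
inline std::size_t drainOutput(const std::function<OutputChunk()>& pull,
                               std::vector<std::uint8_t>& sink)
{
    std::size_t calls = 0;
    OutputChunk chunk;
    do
    {
        chunk = pull();
        ++calls;
        sink.insert(sink.end(), chunk.bytes.begin(), chunk.bytes.end());
    } while ((chunk.status & kIncompleteFlag) != 0);
    return calls;
}
```

The do-while shape reflects that the status flag is only known after a call returns, so the first call is always made and the flag decides whether to call again.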
Requirements
Requirement | Value |
---|---|
Minimum supported client | Windows Vista [desktop apps only] |
Minimum supported server | Windows Server 2008 [desktop apps only] |
Header | |
DLL | |
See also