Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Microsoft Speech Platform
Initialize a Voice
To perform speech synthesis (TTS, text-to-speech) in the Microsoft Speech Platform, you first initialize a voice. A voice is an instance of a TTS engine that uses an installed Runtime Language to perform speech synthesis. A Runtime Language is represented in the registry by a token. See Speech Platform Overview for information about downloading Runtime Languages.
To initialize a TTS voice in the Speech Platform, you query the registry for the desired voice token, select the voice token, create a voice object, and set the voice token that voice object will use.
The Speech Platform provides helper functions that reduce the number of steps necessary to initialize a voice. You can use one or more helper functions to find, select, and create a voice using any of the following processes:
- Enumerate voice tokens that match specified attributes
- Find a single voice token that best matches specified attributes
- Select the default voice token
- Create a voice from the default voice token
Enumerate voice tokens that match specified attributes
SpEnumTokens returns a token enumerator containing all tokens from a specified category that match the specified required and optional attributes. When you specify the Voices category, SpEnumTokens returns a list of voices ordered with the best matches listed first. In the following snippet, Language=409 is a required attribute and Gender=Female is an optional attribute.
```cpp
CComPtr<IEnumSpObjectTokens> cpIEnum;
CComPtr<ISpObjectToken> cpToken;
CComPtr<ISpVoice> cpVoice;

// Enumerate voice tokens that speak US English, preferring a female voice.
HRESULT hr = SpEnumTokens(SPCAT_VOICES, L"Language=409", L"Gender=Female;", &cpIEnum);

// Get the best matching token (the first one in the enumerator).
if (SUCCEEDED(hr))
{
    hr = cpIEnum->Next(1, &cpToken, NULL);
}

// Create a voice and set its token to the one we just found.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpToken);
}
```
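To make the difference between required and optional attributes concrete, the following is a simplified, portable model of the filtering and ordering described for SpEnumTokens. It is not the Speech Platform implementation: `VoiceToken`, `Attributes`, `CountMatches`, `EnumerateTokens`, and the three sample tokens are hypothetical, and ranking by the number of matched optional attributes is an assumption about what "best match" means.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Attributes;

// Hypothetical stand-in for a voice token: a name plus its attributes.
struct VoiceToken {
    std::string name;
    Attributes attributes;
};

// Count how many requested attributes the token carries with the same value.
int CountMatches(const VoiceToken& token, const Attributes& requested) {
    int count = 0;
    for (const auto& entry : requested) {
        auto found = token.attributes.find(entry.first);
        if (found != token.attributes.end() && found->second == entry.second) {
            ++count;
        }
    }
    return count;
}

// Keep only tokens that satisfy every required attribute, then order them
// so that tokens matching more optional attributes come first.
// (Assumed ranking rule; the real platform may weigh attributes differently.)
std::vector<VoiceToken> EnumerateTokens(const std::vector<VoiceToken>& all,
                                        const Attributes& required,
                                        const Attributes& optional) {
    std::vector<VoiceToken> matches;
    for (const auto& token : all) {
        if (CountMatches(token, required) == static_cast<int>(required.size())) {
            matches.push_back(token);
        }
    }
    std::stable_sort(matches.begin(), matches.end(),
                     [&optional](const VoiceToken& a, const VoiceToken& b) {
                         return CountMatches(a, optional) > CountMatches(b, optional);
                     });
    return matches;
}

// Usage sketch with three made-up tokens.
void RunEnumerationExample() {
    const std::vector<VoiceToken> installed = {
        {"MaleUS",   {{"Language", "409"}, {"Gender", "Male"}}},
        {"FemaleFR", {{"Language", "40C"}, {"Gender", "Female"}}},
        {"FemaleUS", {{"Language", "409"}, {"Gender", "Female"}}},
    };
    const Attributes required = {{"Language", "409"}};
    const Attributes optional = {{"Gender", "Female"}};

    const std::vector<VoiceToken> ranked = EnumerateTokens(installed, required, optional);

    // The French token is filtered out; the female US voice ranks first.
    assert(ranked.size() == 2);
    assert(ranked[0].name == "FemaleUS");
    assert(ranked[1].name == "MaleUS");
}
```

In this model a token that fails a required attribute is never returned, while a token that misses an optional attribute is still returned but sorted later. That mirrors why the snippet above can still yield a male US English voice when no female one is installed.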
Find a single voice token that best matches specified attributes
SpFindBestToken returns a single token from a specified category, in this case the Voices category, that best matches the specified attributes. In the following snippet, Language=409 is a required attribute and VendorPreferred is an optional attribute.
```cpp
// Find the best token to use for a voice that speaks US English,
// preferably the vendor-preferred voice.
CComPtr<ISpObjectToken> cpVoiceToken;
HRESULT hr = SpFindBestToken(SPCAT_VOICES, L"Language=409", L"VendorPreferred", &cpVoiceToken);

// Create a voice and set its token to the one we just found.
CComPtr<ISpVoice> cpVoice;
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}
```
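Both snippets so far pass Language=409. The value is a language identifier written in hexadecimal without a 0x prefix: 409 hex is 1033 decimal, the identifier for English (United States), and 40C hex (1036) is French (France). A minimal sketch of converting between the two forms follows; `ParseLanguageId` and `FormatLanguageAttribute` are hypothetical helpers for illustration, not Speech Platform functions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parse the value part of a "Language=" attribute (hexadecimal, no 0x
// prefix) into a numeric language identifier.
unsigned long ParseLanguageId(const std::string& hexValue) {
    assert(!hexValue.empty());
    return std::stoul(hexValue, nullptr, 16);
}

// Build a "Language=..." attribute string from a numeric language identifier.
std::string FormatLanguageAttribute(unsigned long languageId) {
    std::ostringstream out;
    out << "Language=" << std::hex << std::uppercase << languageId;
    return out.str();
}
```

For example, `FormatLanguageAttribute(1033)` produces `Language=409`, the string used in the snippets above.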
Select the default voice token
SpGetDefaultTokenFromCategoryId creates a token object from the default token in a specified category, in this case the Voices category.
```cpp
HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;

// Get the default token for the Voices category.
if (SUCCEEDED(hr))
{
    hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpVoiceToken);
}

// Create a voice and set its token to the one we just found.
CComPtr<ISpVoice> cpVoice;
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}
```
Set the default voice token
SpGetDefaultTokenFromCategoryId gets the default token from a specified category, in this case Voices. The following example first sets a French-speaking token (Hortense) as the default for the Voices category using ISpObjectTokenCategory::SetDefaultTokenId, then retrieves that default token and uses it to create a voice.
```cpp
HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpVoiceToken;
CComPtr<ISpObjectTokenCategory> cpTokenCat;

// This is the category for which we want to set the default token.
if (SUCCEEDED(hr))
{
    hr = SpGetCategoryFromId(SPCAT_VOICES, &cpTokenCat);
}

// Set the default token for the VOICES category to Hortense (French).
// Backslashes in the registry path must be escaped in a C++ string literal.
if (SUCCEEDED(hr))
{
    hr = cpTokenCat->SetDefaultTokenId(L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0\\Voices\\Tokens\\TTS_MS_fr-FR_Hortense_11.0");
}

// Get the token we just set as the default.
if (SUCCEEDED(hr))
{
    hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpVoiceToken);
}

// Create a voice.
CComPtr<ISpVoice> cpVoice;
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Set the voice to the retrieved token.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(cpVoiceToken);
}
```
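The token ID passed to SetDefaultTokenId is a registry path, so each backslash has to be escaped in an ordinary C++ string literal; an unescaped `\v`, for example, would be read as a vertical tab. As one possible alternative, a C++11 raw string literal keeps the path readable. The sketch below is portable and does not call the Speech Platform; `kEscapedId`, `kRawId`, and `TokenKeyName` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>

// The token ID from the example above, written two equivalent ways.
// Escaped form: every backslash in the registry path is doubled.
const std::wstring kEscapedId =
    L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0\\Voices\\Tokens\\TTS_MS_fr-FR_Hortense_11.0";

// Raw string literal (C++11): backslashes are taken literally.
const std::wstring kRawId =
    LR"(HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices\Tokens\TTS_MS_fr-FR_Hortense_11.0)";

// Hypothetical helper: return the last path component of a token ID,
// which is the token's registry key name. If the ID contains no
// backslash, the whole ID is returned.
std::wstring TokenKeyName(const std::wstring& tokenId) {
    assert(!tokenId.empty());
    const std::wstring::size_type pos = tokenId.find_last_of(L'\\');
    return pos == std::wstring::npos ? tokenId : tokenId.substr(pos + 1);
}
```

Either literal can be passed to SetDefaultTokenId through `c_str()`. The helper is only a convenience for logging which voice was selected.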
Each of the examples above is ready to speak text after you set the output and give a speak command, as follows:
```cpp
// Set the output to the default audio device.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetOutput(NULL, TRUE);
}

// Speak a string directly.
if (SUCCEEDED(hr))
{
    hr = cpVoice->Speak(L"Hello world.", SPF_DEFAULT, NULL);
}
```
Note: Setting the output to the default audio device is useful for debugging. Typically, a production server application will write to a stream.