Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Microsoft Speech Platform
Object Tokens and Categories Overview
The Microsoft Speech Platform stores information in the registry about the resources installed on a computer that support speech functionality. The following is an overview of speech-specific information in the registry and how to use it to program speech applications.
Token
A token is an object that represents a resource installed on a computer by the Microsoft Speech Platform, such as a Recognizer or a Voice. Tokens give an application a way to inspect the attributes of a resource without instantiating it.
The Microsoft Speech Platform stores information about tokens in the registry. Generally, a token contains a name, a CLSID used to instantiate the object from the token, and a set of attributes. Application developers can query the registry for the installed tokens (SpEnumTokens, ISpObjectTokenCategory::EnumTokens), query for a token that has specific attributes (SpFindBestToken), and get the default token for a category (SpGetDefaultTokenFromCategoryId).
A token is represented in the registry by a key together with that key's subkeys and values. For example, the text-to-speech (TTS) voice "TTS_MS_en-US_Helen_11.0" is a token that represents a Microsoft US English speaking voice. Its TokenId is:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices\Tokens\TTS_MS_en-US_Helen_11.0
Categories
An Object Token Category (hereafter referred to as "category") is a class of tokens. A category is represented in the registry by a key that contains a subkey named Tokens, which in turn holds one key for each token that belongs to the category. Application developers can query for the default token in a category using SpGetDefaultTokenFromCategoryId. The Speech Platform creates categories in the registry at the following location:
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0
For example, the Speech Platform creates a category named Voices that contains tokens for the Runtime Languages for speech synthesis that are installed on the system. The CategoryId for the Voices category is HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices.
See Using Object Tokens and Categories for more information about categories and tokens associated with the Speech Platform.
Using Tokens and Categories
The following examples show how to query the registry for installed tokens, get the default token for a category, and find a token that has specific attributes.
Enumerate tokens
To enumerate tokens, the application can call the ISpObjectTokenCategory::EnumTokens method or the helper function SpEnumTokens. The following code examples request female voices using each technique. Gender=Female is passed as a required attribute and Age=Adult as an optional attribute, so in both cases only female voices are returned and adult voices are listed first.
The first example calls ISpObjectTokenCategory::EnumTokens.
```cpp
// Assumes <atlbase.h> (CComPtr), <sapi.h>, and <sphelper.h> are included.
HRESULT hr = S_OK;
CComPtr<ISpObjectTokenCategory> cpCategory;
CComPtr<IEnumSpObjectTokens> cpEnum;

if (SUCCEEDED(hr)) { hr = SpGetCategoryFromId(SPCAT_VOICES, &cpCategory); }
if (SUCCEEDED(hr)) { hr = cpCategory->EnumTokens(L"Gender=Female", L"Age=Adult", &cpEnum); }
```
The second example calls SpEnumTokens.
```cpp
HRESULT hr = S_OK;
CComPtr<IEnumSpObjectTokens> cpEnum;

if (SUCCEEDED(hr)) { hr = SpEnumTokens(SPCAT_VOICES, L"Gender=Female", L"Age=Adult", &cpEnum); }

// After getting an enumerator, use methods in IEnumSpObjectTokens to get tokens.
// Next returns S_FALSE (which still passes SUCCEEDED) when no token is left,
// so check for hr == S_OK before using cpToken.
CComPtr<ISpObjectToken> cpToken;
if (SUCCEEDED(hr)) { hr = cpEnum->Next(1, &cpToken, NULL); }
```
Get the default token for a category
The following code snippet gets the default token in the Voices category.
```cpp
HRESULT hr = S_OK;

// Get the default token for the Voices category.
CComPtr<ISpObjectToken> cpObjectToken;
if (SUCCEEDED(hr)) { hr = SpGetDefaultTokenFromCategoryId(SPCAT_VOICES, &cpObjectToken); }
```
Find a token that matches specific attributes
This example queries the registry for a token in the Recognizers category that supports US English (LANGID 409, written in hexadecimal).
```cpp
// Find the best matching installed en-US recognizer.
HRESULT hr = S_OK;
CComPtr<ISpObjectToken> cpRecognizerToken;

if (SUCCEEDED(hr)) { hr = SpFindBestToken(SPCAT_RECOGNIZERS, L"language=409", NULL, &cpRecognizerToken); }
```
See Using Object Tokens and Categories for more examples of using helper functions to work with tokens and categories, and see Helper Functions for a complete list of the helper functions provided by the Speech Platform.