Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Microsoft Speech Platform
Enumerate and Inspect Tokens
An application needs to enumerate and inspect tokens to gain access to resources in the Speech Platform, such as recognizers and voices.
The two primary ways to enumerate tokens are by the helper function SpEnumTokens or by the method ISpObjectTokenCategory::EnumTokens. Both methods allow the caller to specify a category and a set of required and optional attributes. The call then returns a token enumerator containing all the tokens matching those criteria. The method is defined as:
```
HRESULT EnumTokens(
    [in] const WCHAR *pszCatName,
    [in, string] const WCHAR *pReqAttrs,
    [in, string] const WCHAR *pOptAttrs,
    [out] IEnumSpObjectTokens **ppEnum);
```
When identifying matching tokens in a category, an application needs to specify a fully qualified category identifier (FQCID). An FQCID is the full registry path to a category, such as HKEY_CURRENT_USER\Software\Microsoft\Speech Server\v11.0\Voices. We recommend that applications reference these categories by using the constants defined in the sapi.idl file rather than the full string, because this minimizes typographical errors in commonly used registry paths. The Speech Platform maps the constant to the correct subtree in the registry and returns matching tokens from the category. For instance, the constant defined by the Speech Platform for Recognizers (from the sapi.idl file) is as follows:
```cpp
// Categories for speech resource management.
const WCHAR SPCAT_RECOGNIZERS[] = L"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0\\Recognizers";
```
Similarly, there are constants for the AudioInput, AudioOutput, and Voices categories.
In both SpEnumTokens and ISpObjectTokenCategory::EnumTokens, the following clauses (separated by semicolons) are permitted in the required-attribute and optional-attribute strings (pReqAttrs and pOptAttrs).
Condition | Example | Explanation |
---|---|---|
Exists | Language;Gender | The entries Language and Gender exist in the list of attributes for this token. |
One of | Language=409 | At least one of the values of the entry "Language" is 409. There may be other values, like 809, 512 as well. |
Not Equals | Age!=Child;Age!=Teen | Values of Age that are neither "Child" nor "Teen". |
Table 1: Query operators
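The three clause forms in Table 1 can be illustrated with a small, portable sketch. This is not the Speech Platform's implementation: the `Attributes` map and the functions `matchesClause` and `matchesAll` are hypothetical names, narrow strings stand in for the `WCHAR` strings the real API uses, and the sketch assumes that a "Not Equals" clause also passes when the attribute is absent.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a token's Attributes key:
// attribute name -> list of values (for example "Language" -> {"409", "809"}).
using Attributes = std::map<std::string, std::vector<std::string>>;

// Returns true if `attrs` has an entry `name` with `value` among its values.
static bool hasValue(const Attributes& attrs, const std::string& name,
                     const std::string& value) {
    auto it = attrs.find(name);
    if (it == attrs.end()) return false;
    for (const std::string& v : it->second)
        if (v == value) return true;
    return false;
}

// Evaluates one clause: "Name" (exists), "Name=Value" (one of),
// or "Name!=Value" (not equals).
// Assumption: "Name!=Value" is true when Name is missing altogether.
bool matchesClause(const Attributes& attrs, const std::string& clause) {
    std::string::size_type ne = clause.find("!=");
    if (ne != std::string::npos)
        return !hasValue(attrs, clause.substr(0, ne), clause.substr(ne + 2));
    std::string::size_type eq = clause.find('=');
    if (eq != std::string::npos)
        return hasValue(attrs, clause.substr(0, eq), clause.substr(eq + 1));
    return attrs.count(clause) != 0;
}

// Returns true only if every semicolon-separated clause in `query` matches.
bool matchesAll(const Attributes& attrs, const std::string& query) {
    std::istringstream in(query);
    std::string clause;
    while (std::getline(in, clause, ';'))
        if (!clause.empty() && !matchesClause(attrs, clause)) return false;
    return true;
}
```

In this sketch, required attributes would be checked with `matchesAll`, while each optional clause would be checked individually with `matchesClause` so that it can contribute to a token's ranking.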
The tokens returned by an enumeration are ordered with the best matches first, using the following rules:
- Only tokens matching the attributes are returned.
- Tokens matching both required and optional attributes will be before those that only match required attributes.
- If no required or optional attributes are specified (both are set to NULL), the first token returned is the default token for that category. If there is a valid DefaultTokenID in HKLMS/Category, that token is returned as the default. Otherwise, if there is a DefaultTokenID in HKCUS/Category, that token is returned. If neither exists, the Speech Platform searches for a DefaultDefaultTokenID in HKLMS/CategoryName and returns that token.
- For each optional attribute, a token scores 1 if it matches and 0 if it does not. Optional attributes that appear earlier in the query string are more significant. The scores are concatenated in query order, and the tokens are then placed in descending order of the resulting net score. This is illustrated in Tables 2 and 3.
- Tokens having the same score are returned in random order in the enumerator.
The following shows an example of enumerating tokens using the helper function SpEnumTokens.
```cpp
HRESULT hr = S_OK;

// Find the voice token that best matches the specified attributes.
LPCWSTR pszReqAttrs = L"LanguagesSupported=409";
LPCWSTR pszOptAttrs = L"Vendor=VoiceVendor1;Age=Child;Gender=Female";
CComPtr<IEnumSpObjectTokens> cpVoiceEnum;

// SPCAT_VOICES is defined in sapi.idl.
hr = SpEnumTokens(SPCAT_VOICES, pszReqAttrs, pszOptAttrs, &cpVoiceEnum);
```
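For example, if the voices listed below in Table 2 are installed on a computer, then the order of the voices returned in cpVoiceEnum will follow the scores shown in Table 3. Frank and Anna do not support language 409, so they fail the required attribute and are not returned.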
Voice | Vendor | Age | LanguagesSupported | Gender |
---|---|---|---|---|
Michelle | VoiceVendor1 | Child | 409; 411 | Female |
Mary | VoiceVendor1 | Adult | 409 | Female |
Jane | VoiceVendor2 | Child | 409 | Female |
Frank | VoiceVendor2 | Adult | 411 | Male |
Anna | VoiceVendor2 | Adult | 411 | Female |
Table 2. Voices installed on a computer
Optional Criteria -> | Vendor | Age | Gender | Net Score |
---|---|---|---|---|
Michelle | 1 | 1 | 1 | 111 |
Mary | 1 | 0 | 1 | 101 |
Jane | 0 | 1 | 1 | 011 |
Table 3. Scoring of tokens that match optional criteria
The final order is:
- Michelle (meets all required criteria, scores 111 on optional criteria)
- Mary (meets all required criteria, scores 101 on optional criteria)
- Jane (meets all required criteria, scores 011 on optional criteria)
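The ranking above can be reproduced with a short, portable sketch. It is an illustration of the scoring rule only, not the Speech Platform's code: `ScoredVoice`, `netScore`, and `rankVoices` are hypothetical names, and the match flags are entered by hand from Table 3 rather than computed from registry data.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: a voice name plus one flag per optional clause,
// in query order (true = the token matches that clause).
struct ScoredVoice {
    std::string name;
    std::vector<bool> optionalMatches;
};

// Concatenates the per-clause flags into a score string such as "101".
std::string netScore(const ScoredVoice& v) {
    std::string score;
    for (bool matched : v.optionalMatches) score += matched ? '1' : '0';
    return score;
}

// Orders voices by descending net score. Earlier clauses occupy the
// leftmost digits, so comparing the equal-length score strings makes
// those clauses more significant.
// Assumption: stable_sort keeps ties in input order for reproducibility;
// the platform itself returns tied tokens in random order.
std::vector<std::string> rankVoices(std::vector<ScoredVoice> voices) {
    std::stable_sort(voices.begin(), voices.end(),
                     [](const ScoredVoice& a, const ScoredVoice& b) {
                         return netScore(a) > netScore(b);
                     });
    std::vector<std::string> names;
    for (const ScoredVoice& v : voices) names.push_back(v.name);
    return names;
}
```

Given the three voices from Table 3 in any input order, `rankVoices` returns Michelle, Mary, Jane, because 111 > 101 > 011.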
If the enumeration is instead performed with no required or optional attributes, as follows:
hr = cpVoiceCat->EnumTokens(SPCAT_VOICES, NULL, NULL, &cpEnum);
...and the default token in HKEY_CURRENT_USER\Software\Microsoft\Speech Server\v11.0\Voices\DefaultTokenID is set to:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices\Tokens\Jane
...then the enumerator cpEnum will contain all the tokens, with Jane being the first token.
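The "default token first" behavior can be sketched in portable C++ as follows. This is an illustration under stated assumptions, not the platform's implementation: `firstConfigured` and `defaultFirst` are hypothetical helpers, an empty string stands for a registry value that is not set, short voice names stand in for full token IDs, and the order of the non-default tokens is simply left unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the first non-empty candidate, or "" if none is set.
// Candidates are passed in lookup order; "" means "value not present".
std::string firstConfigured(const std::vector<std::string>& candidates) {
    for (const std::string& id : candidates)
        if (!id.empty()) return id;
    return "";
}

// Moves the token whose ID equals `defaultId` to the front and keeps the
// relative order of the remaining tokens. If no token has that ID, the
// list is returned unchanged.
std::vector<std::string> defaultFirst(std::vector<std::string> tokenIds,
                                      const std::string& defaultId) {
    auto it = std::find(tokenIds.begin(), tokenIds.end(), defaultId);
    if (it != tokenIds.end()) std::rotate(tokenIds.begin(), it, it + 1);
    return tokenIds;
}
```

With the five voices from Table 2 and Jane configured as the default, `defaultFirst` yields a list that starts with Jane and still contains all five tokens.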
What does the Speech Platform do on a call to ISpObjectTokenCategory::EnumTokens?
Consider a fictitious category that contains tokens. When an application calls ISpObjectTokenCategory::EnumTokens, the following occur:
- The Speech Platform creates an IEnumSpObjectTokens enumerator that can enumerate all the matching tokens from the keys under HKLMS/Voices/Tokens.
- The Speech Platform applies the required attributes so that the IEnumSpObjectTokens enumerator contains only the tokens that match them, then sorts those tokens according to how well they match the optional attributes (using the ordering rules given earlier in this section).
- The application steps through the tokens until it finds an appropriate one, checking further attributes and strings of each token with the ISpObjectToken methods GetData, GetStringValue, and GetDWORD (inherited from ISpDataKey).
- The application identifies the token it is interested in and calls ISpObjectToken::CreateInstance. The newly created object is then queried (through QueryInterface) for the ISpObjectWithToken interface. If the object supports it, the Speech Platform calls ISpObjectWithToken::SetObjectToken to give the newly instantiated object a pointer to the token from which it was instantiated.
Instantiate an Object from a Token
Continuing with the example presented in the previous section, the application now has a pointer to the enumerator IEnumSpObjectTokens. An application may choose to step through the enumerator with the methods Next, Skip, or Reset to find an ISpObjectToken that best meets its needs.
Assume that the application is searching for a voice that sounds clear over a telephone. Also assume that such voices typically have a ValueName called SupportsTelephony, which is set to 1. There is no such protocol in the Speech Platform; this is for illustration only. Because this value is not under the Attributes key, it cannot be picked up by the standard query mechanism of required attributes. The variable pCurVoiceToken represents the token currently being examined. In the example below, the application steps through the tokens in cpVoiceEnum until it finds a voice that also supports telephony.
```cpp
// Find a Voice token that has a ValueName called SupportsTelephony, which is set to 1.
ISpObjectToken* pCurVoiceToken = NULL;
ISpObjectToken* pSelectedVoiceToken = NULL;
bool fFound = false;

while (S_OK == cpVoiceEnum->Next(1, &pCurVoiceToken, NULL))
{
    // At this point, all we know is that pCurVoiceToken is a pointer to a Voice token.
    LPWSTR pszValue = NULL;

    // Note, ISpObjectToken inherits from ISpDataKey.
    fFound = SUCCEEDED(pCurVoiceToken->GetStringValue(L"SupportsTelephony", &pszValue))
             && wcscmp(pszValue, L"1") == 0;

    // GetStringValue allocates the string; the caller frees it.
    CoTaskMemFree(pszValue);

    if (fFound)
    {
        // This is the token for the Voice we want. Keep its reference.
        pSelectedVoiceToken = pCurVoiceToken;
        break;
    }

    // Not a match: release this token before fetching the next one.
    pCurVoiceToken->Release();
    pCurVoiceToken = NULL;
}
```
At this point, the selected Voice token is stored in pSelectedVoiceToken. Now create the voice object from this token, so that Speak and other methods may be called on it. To do so, first create an ISpVoice instance, and then set its voice to the selected token.
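```cpp
CComPtr<ISpVoice> cpVoice;

// Create the object.
if (SUCCEEDED(hr))
{
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);
}

// Select the voice identified by the token, but only if creation succeeded.
if (SUCCEEDED(hr))
{
    hr = cpVoice->SetVoice(pSelectedVoiceToken);
}
```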
Now, the cpVoice object (of type ISpVoice) has been instantiated and is ready to speak, with a call such as the following:
```cpp
hr = cpVoice->Speak(
    L"This audio file was created using text-to-speech from the Speech Platform.",
    0,
    NULL);
```
Inspect Underlying Keys of a Token
In addition to using helper functions, you can inspect the keys under a token by opening the Attributes key under the token as a DataKey. Then all the ISpDataKey methods are available to inspect the values under the Attributes key.