Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

Using Object Tokens and Categories

This section helps developers of speech-enabled applications discover and use resources for speech synthesis and speech recognition on a computer that has the Microsoft Speech Platform installed.

This section answers the following questions:

  • What are tokens and categories in the Speech Platform?
  • Where is information about tokens stored in the registry?
  • How does an application find tokens and initialize resources (for example, Voices or Recognizers) from them?
  • What attributes does the Speech Platform define for tokens in the registry?

For more information about Object Tokens, see ISpObjectToken.

Tokens

A token is an object that represents a resource that was installed on a computer by the Microsoft Speech Platform, such as a voice, a recognizer, or an audio input device. Speech applications use tokens to discover and initialize the resource that a token represents. A token gives an application an easy way to inspect the attributes of a resource without instantiating it. For example, each token that represents a Recognizer or Voice resource has an attribute named "Language".

A token contains the following information:

  • The language-independent name. This is the name that should be displayed wherever the name of the token is displayed. It is marked as (Default) in the registry.
  • The CLSID used to instantiate the object from the token.
  • A set of attributes, which are the only values that an application can query in a token. The Speech Platform provides a mechanism to query for tokens whose attributes match certain values. See Helper Function Examples and Enumerate and Inspect Tokens for details about how to query for tokens that match a set of attributes.

Tokens in the registry

The Speech Platform stores information about tokens in the registry. A token is represented in the registry by a key and its underlying keys and values. When an application queries the Speech Platform for tokens of all the female voices on the computer, the Speech Platform will look in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices. This registry key represents the Voices category in the registry. See Categories later in this topic.

The following table shows the constituent parts of a token in the registry.

RegKey            Entry        Sample Value       Comments
SampleTokenKey                                    This is the registry key for the token.
                  (Default)    SampleTokenName    The language-independent name.
/Attributes                                       Attributes for the token are under this key.
                  Language     409                The language that this token supports.

Table 1. Parts of a Token in the Registry

The Attributes key contains all the values of the token that an application can query. See Registry Settings for more information about token attributes in the registry. See Enumerate and Inspect Tokens for more information about how an application queries a token.
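The following C++ sketch illustrates the idea of selecting tokens by required attribute values. It does not call the Speech Platform. TokenInfo, ParseQuery, MatchesAll, and FilterTokens are hypothetical names, and the "Name=Value;Name=Value" query format and the exact, case-sensitive comparison are assumptions made for illustration only.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a token: its (Default) name plus the
// values found under its Attributes key. Not a Speech Platform type.
struct TokenInfo {
    std::string name;
    std::map<std::string, std::string> attributes;
};

// Parses a query such as "Language=409" into name/value pairs.
// An entry without '=' is stored with an empty value, which this sketch
// treats as "the attribute must exist, with any value".
std::map<std::string, std::string> ParseQuery(const std::string& query) {
    std::map<std::string, std::string> result;
    std::istringstream stream(query);
    std::string item;
    while (std::getline(stream, item, ';')) {
        if (item.empty()) continue;
        const std::string::size_type eq = item.find('=');
        if (eq == std::string::npos) {
            result[item] = "";
        } else {
            result[item.substr(0, eq)] = item.substr(eq + 1);
        }
    }
    return result;
}

// Returns true when the token has every attribute named in the query and,
// where a value is given, that value matches exactly.
bool MatchesAll(const TokenInfo& token, const std::string& query) {
    const std::map<std::string, std::string> required = ParseQuery(query);
    for (const auto& entry : required) {
        const auto it = token.attributes.find(entry.first);
        if (it == token.attributes.end()) return false;
        if (!entry.second.empty() && it->second != entry.second) return false;
    }
    return true;
}

// Returns the tokens that satisfy the query, preserving input order.
std::vector<TokenInfo> FilterTokens(const std::vector<TokenInfo>& tokens,
                                    const std::string& query) {
    std::vector<TokenInfo> matches;
    for (const auto& token : tokens) {
        if (MatchesAll(token, query)) matches.push_back(token);
    }
    return matches;
}
```

With a TokenInfo built from the sample values in Table 1 (name SampleTokenName, attribute Language set to 409), MatchesAll(token, "Language=409") returns true. An actual application would use the Speech Platform helper functions or ISpObjectTokenCategory::EnumTokens rather than this model.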

Initialize resources using tokens

In many cases, applications can use the helper functions provided by the Speech Platform for working with tokens. For example, an application can use the SpFindBestObject helper function to rapidly find an object that best matches specified criteria. The application can also query for tokens that meet certain criteria without using the helper function. To do this, the application calls the EnumTokens method on the ISpObjectTokenCategory interface to get an enumerator, and then inspects the tokens in the enumerator.

Finally, the application selects one of the tokens in the enumerator to instantiate a resource. After the resource (such as a speech recognition engine) is instantiated, provided that it implements the ISpObjectWithToken interface, it is handed a pointer to the token that was used to create it. This way, the resource holds a handle to more information about itself.
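The hand-off described above can be pictured with a small, portable C++ model. This is only a sketch of the pattern and does not use the Speech Platform COM interfaces: Token, Resource, and CreateFromToken are hypothetical names. The method names SetObjectToken and GetObjectToken mirror those of ISpObjectWithToken, but the signatures here are simplified.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a token; not the ISpObjectToken interface.
struct Token {
    std::string id;        // for example, the token's registry key name
    std::string language;  // value of the "Language" attribute
};

// Hypothetical resource that mirrors the idea behind ISpObjectWithToken:
// it accepts the token it was created from and can return it later.
class Resource {
public:
    void SetObjectToken(std::shared_ptr<const Token> token) {
        token_ = std::move(token);
    }
    std::shared_ptr<const Token> GetObjectToken() const { return token_; }

private:
    std::shared_ptr<const Token> token_;
};

// Creates the resource, then hands it the token that was used to create it.
std::unique_ptr<Resource> CreateFromToken(std::shared_ptr<const Token> token) {
    std::unique_ptr<Resource> resource(new Resource());
    resource->SetObjectToken(std::move(token));
    return resource;
}
```

After CreateFromToken returns, the resource can call GetObjectToken to read attributes of its own token, such as the language, without a second registry lookup.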

See Helper Function Examples for examples of how to query and initialize a token using helper functions.

Categories

An ObjectTokenCategory (hereafter referred to as category) is the highest level of grouping of registry entries in the Speech Platform. A category is a class of tokens (or of resources, since each token represents an actual resource on the computer). All system-specific Speech Platform keys and values are stored in these categories. Examples include settings and files for Voices and Recognizers that are installed on a computer.

Categories in the registry

A category is represented in the registry by a key containing one or more token keys under it. The Speech Platform organizes tokens in the registry under the following seven token categories, located under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0. Figure 1 shows the default Speech Platform categories.

Figure 1. Token Categories for the Speech Platform in the Registry

Categories contain a single key called Tokens, which contains the keys for the tokens that belong to that category. For example, the Voices category may have a key for an English-speaking voice called TTS_MS_en-US_Helen_11.0. All the keys and values for TTS_MS_en-US_Helen_11.0 are located under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Voices\Tokens\TTS_MS_en-US_Helen_11.0.

Use categories to initialize resources

Applications work with categories and tokens using helper functions. Each category can have a default token. This can be set using ISpObjectTokenCategory::SetDefaultTokenId. An application can get the default token for the Recognizers category using the helper function SpGetDefaultTokenFromCategoryId, and use the found token to create an instance of the SpRecognizer class (see ISpRecognizer).

See also SpCreateDefaultObjectFromCategoryID, ISpObjectTokenCategory, and Helper Function Examples.

TokenIDs and CategoryIDs

A CategoryID uniquely identifies a category in the registry. For categories defined by the Speech Platform, CategoryIDs take the form of HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\{CategoryName}. For example, this is the ID for the Recognizers category:

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v11.0\Recognizers

All the Speech Platform CategoryIDs should be referenced using the constants defined in the sapi.idl file, listed below:

  • SPCAT_AUDIOIN
  • SPCAT_AUDIOOUT
  • SPCAT_RECOGNIZERS
  • SPCAT_VOICES
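To make the registry layout concrete, the following C++ sketch composes a CategoryID and the registry path of a token key from the pieces described in this topic. The names kSpeechRoot, CategoryId, and TokenKeyPath are hypothetical and exist only for illustration. An application should still refer to the Speech Platform categories through the SPCAT_ constants rather than build the strings itself.

```cpp
#include <string>

// Root key quoted in this topic for the Speech Platform categories.
const std::string kSpeechRoot =
    "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech Server\\v11.0";

// Builds a CategoryID of the form <root>\{CategoryName}.
std::string CategoryId(const std::string& category_name) {
    return kSpeechRoot + "\\" + category_name;
}

// Builds the registry path of a token key, which sits under the
// category's Tokens key: <root>\{CategoryName}\Tokens\{TokenKey}.
std::string TokenKeyPath(const std::string& category_name,
                         const std::string& token_key) {
    return CategoryId(category_name) + "\\Tokens\\" + token_key;
}
```

For example, CategoryId("Recognizers") yields the Recognizers CategoryID shown earlier, and TokenKeyPath("Voices", "TTS_MS_en-US_Helen_11.0") yields the path of the sample voice token.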

In This Section