Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
SPEVENTENUM
Microsoft Speech Platform
SPEVENTENUM lists the events possible from the Microsoft Speech Platform.
Developers are encouraged to use the CSpEvent helper class, which decodes events easily and clearly.
    typedef enum SPEVENTENUM
    {
        SPEI_UNDEFINED,

        //--- TTS engine
        SPEI_START_INPUT_STREAM,
        SPEI_END_INPUT_STREAM,
        SPEI_VOICE_CHANGE,
        SPEI_TTS_BOOKMARK,
        SPEI_WORD_BOUNDARY,
        SPEI_PHONEME,
        SPEI_SENTENCE_BOUNDARY,
        SPEI_VISEME,
        SPEI_TTS_AUDIO_LEVEL,

        //--- Engine vendors use these reserved bits
        SPEI_TTS_PRIVATE,
        SPEI_MIN_TTS,
        SPEI_MAX_TTS,

        //--- Speech Recognition
        SPEI_END_SR_STREAM,
        SPEI_SOUND_START,
        SPEI_SOUND_END,
        SPEI_PHRASE_START,
        SPEI_RECOGNITION,
        SPEI_HYPOTHESIS,
        SPEI_SR_BOOKMARK,
        SPEI_PROPERTY_NUM_CHANGE,
        SPEI_PROPERTY_STRING_CHANGE,
        SPEI_FALSE_RECOGNITION,
        SPEI_INTERFERENCE,
        SPEI_REQUEST_UI,
        SPEI_RECO_STATE_CHANGE,
        SPEI_START_SR_STREAM,
        SPEI_RECO_OTHER_CONTEXT,
        SPEI_SR_AUDIO_LEVEL,
        SPEI_SR_RETAINEDAUDIO,

        //--- Engine vendors use this reserved value.
        SPEI_SR_PRIVATE,
        SPEI_ACTIVE_CATEGORY_CHANGED,

        //--- Reserved for system use.
        SPEI_RESERVED5,
        SPEI_RESERVED6,
        SPEI_MIN_SR,
        SPEI_MAX_SR,

        //--- Reserved: Do not use
        SPEI_RESERVED1,
        SPEI_RESERVED2,
        SPEI_RESERVED3
    } SPEVENTENUM;
Elements
SPEI_START_INPUT_STREAM
The input stream (text or audio) from a Speak or SpeakStream call has begun synthesizing to the output. The event is fired by the Speech Platform.
SPEI_END_INPUT_STREAM
The input stream (text or audio) from a Speak or SpeakStream call has finished synthesizing to the output. The event is fired by the Speech Platform.
SPEI_VOICE_CHANGE
The Speech Platform fires this event for voice changes within a single input stream of a Speak call. wParam is SPF_PERSIST_XML if the current Speak call was made with SPF_PERSIST_XML, and zero otherwise. lParam is the current voice object token. elParamType must be SPET_LPARAM_IS_TOKEN.
SPEI_TTS_BOOKMARK
The bookmark element inserts a bookmark into the output stream. An application that specifies interest in bookmark events receives them during synthesis. wParam is the current bookmark name, read as a base-10 number and converted to a long integer. If the bookmark name is not an integer, wParam is zero. lParam is the bookmark string. elParamType must be SPET_LPARAM_IS_STRING.
SPEI_WORD_BOUNDARY
A word is beginning to synthesize. Markup language (XML) markers are counted in the boundaries and offsets. wParam is the character length of the word in the current input stream being synthesized. lParam is the character position within the current text input stream of the word being synthesized.
SPEI_PHONEME
A phoneme was returned by the TTS engine. The high word of wParam is the duration, in milliseconds, of the current phoneme element. The low word of wParam is the id of the next phoneme element. The high word of lParam is the phoneme element feature defined in SPVFEATURE. This value is zero if the current phoneme element is not a primary stress or emphasis. The low word of lParam is the id of the current phoneme element being synthesized.
When the engine synthesizes a phoneme composed of more than one phoneme element, it raises an event for each element. For example, when a Japanese TTS engine speaks the phoneme "KYA," which is composed of the phoneme elements "KI" and "XYA," it raises an SPEI_PHONEME event for each element. Because the element "KI" in this case modifies the sound of the element that follows it, rather than initiating a sound, the duration of its SPEI_PHONEME event is zero.
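The word packing described for SPEI_PHONEME can be unpacked with plain bit operations. The following is a minimal sketch, not part of the Speech Platform headers: the `PhonemeEventFields` struct and the `low_word`, `high_word`, and `decode_phoneme_event` helpers are hypothetical names introduced for illustration, and the parameters are taken as 64-bit integers as a stand-in for the Windows WPARAM and LPARAM types.

```cpp
#include <cstdint>

// Hypothetical holder for the four values packed into an SPEI_PHONEME event.
struct PhonemeEventFields {
    std::uint16_t duration_ms;      // high word of wParam
    std::uint16_t next_phoneme;     // low word of wParam
    std::uint16_t feature;          // high word of lParam (SPVFEATURE; zero if no stress or emphasis)
    std::uint16_t current_phoneme;  // low word of lParam
};

// Bits 0..15 of the value.
constexpr std::uint16_t low_word(std::uint64_t value) {
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}

// Bits 16..31 of the value.
constexpr std::uint16_t high_word(std::uint64_t value) {
    return static_cast<std::uint16_t>((value >> 16) & 0xFFFFu);
}

// Splits wParam and lParam into the fields described for SPEI_PHONEME.
inline PhonemeEventFields decode_phoneme_event(std::uint64_t wParam, std::uint64_t lParam) {
    PhonemeEventFields fields;
    fields.duration_ms = high_word(wParam);
    fields.next_phoneme = low_word(wParam);
    fields.feature = high_word(lParam);
    fields.current_phoneme = low_word(lParam);
    return fields;
}
```

For an element such as "KI" in the example above, `duration_ms` would come back as zero. In application code, the CSpEvent helper class is the recommended way to decode events; this sketch only makes the bit layout explicit.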
SPEI_SENTENCE_BOUNDARY
A sentence is beginning to synthesize. wParam is the character length of the sentence, including punctuation, in the current input stream being synthesized. lParam is the character position within the current text input stream of the sentence being synthesized.
SPEI_VISEME
A viseme was determined by the synthesis engine. The high word of wParam is the duration, in milliseconds, of the current viseme. The low word of wParam is the next viseme, of type SPVISEMES. The high word of lParam is the viseme feature defined in SPVFEATURE. This value is zero if the current viseme is not a primary stress or emphasis. The low word of lParam is the current viseme being synthesized.
SPEI_TTS_AUDIO_LEVEL
This event is fired by the Speech Platform. lParam is 0, and wParam is the current audio level from zero to 100.
SPEI_TTS_PRIVATE
Reserved for private/internal use by the TTS engine.
SPEI_MIN_TTS
Minimum event enumeration value for TTS events.
SPEI_MAX_TTS
Maximum event enumeration value for TTS events.
SPEI_END_SR_STREAM
The SR engine has finished receiving an audio input stream. lParam points to the SR engine's final HRESULT code (see CSpEvent::EndStreamResult). wParam points to a Boolean value signifying whether the audio input stream object was released (see CSpEvent::InputStreamReleased).
SPEI_SOUND_START
The SR engine has determined that audible sound is available through the input stream.
SPEI_SOUND_END
The SR engine has determined that audible sound is no longer available through the input stream, or that the sound stream has been inactive for a period.
SPEI_PHRASE_START
The SR engine is starting to recognize a phrase. Note that this MUST be followed by either an SPEI_FALSE_RECOGNITION or SPEI_RECOGNITION event.
SPEI_RECOGNITION
The SR engine is returning a full recognition: its best guess at a text representation of the audio data. lParam is a pointer to an ISpRecoResult object (see CSpEvent::RecoResult).
SPEI_HYPOTHESIS
The SR engine is returning a partial phrase recognition: effectively its best guess up to that point in the stream. lParam is a pointer to an ISpRecoResult object (see CSpEvent::RecoResult).
SPEI_SR_BOOKMARK
A bookmark event is returned when the SR engine has processed the stream up to the position of a bookmark. lParam is an application-specified value set using ISpRecoContext::Bookmark. wParam is SPREF_AutoPause if ISpRecoContext::Bookmark was called with SPBO_PAUSE, and NULL otherwise.
SPEI_PROPERTY_NUM_CHANGE
A numeric property supported by the SR engine was changed. lParam is a string pointer to the name of the property that changed (see CSpEvent::PropertyName). wParam contains the new value (see CSpEvent::PropertyNumValue).
SPEI_PROPERTY_STRING_CHANGE
A string property supported by the SR engine was changed. lParam is a string pointer to the name of the property that changed (see CSpEvent::PropertyName). The new property value immediately follows the NULL terminator of the property name (see CSpEvent::PropertyStringValue).
SPEI_FALSE_RECOGNITION
Apparent speech without valid recognition. An SR engine can optionally return a result object, which is referenced by lParam (see CSpEvent::RecoResult).
SPEI_INTERFERENCE
The SR engine has determined that something in the sound stream is hindering successful recognition. lParam is any combination of SPINTERFERENCE flags (see CSpEvent::Interference).
SPEI_REQUEST_UI
The SR engine is requesting that a specific user interface be displayed. lParam is a null-terminated string (see CSpEvent::RequestTypeOfUI). Microsoft engines do not support display of graphical user interfaces (GUIs) in the Speech Platform. Calls to any ::DisplayUI method will fail.
SPEI_RECO_STATE_CHANGE
The recognizer state has changed. wParam is the new recognizer state (see SPRECOSTATE and CSpEvent::RecoState).
SPEI_START_SR_STREAM
The SR engine has reached the start of a new audio stream.
SPEI_SR_AUDIO_LEVEL
The audio input stream object fires this event. wParam is the current audio level from zero to 100.
SPEI_SR_RETAINEDAUDIO
Returns the audio that was sent to the recognizer.
SPEI_RECO_OTHER_CONTEXT
A recognition was sent to another context.
SPEI_SR_PRIVATE
Reserved for private/internal use by the SR engine.
SPEI_ACTIVE_CATEGORY_CHANGED
The active category on the speech recognizer has changed. wParam and lParam are null.
SPEI_RESERVED5
Reserved for system use.
SPEI_RESERVED6
Reserved for system use.
SPEI_MIN_SR
Minimum event enumeration value for speech recognition events.
SPEI_MAX_SR
Maximum event enumeration value for speech recognition events.
SPEI_RESERVED1
Reserved for internal use by the Speech Platform. See SPFEI Remarks section.
SPEI_RESERVED2
Reserved for internal use by the Speech Platform. See SPFEI Remarks section.
SPEI_RESERVED3
Reserved for future use. Do not use.
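The SPEI_RESERVED1 and SPEI_RESERVED2 entries point to the SPFEI remarks, which this page does not reproduce. As a rough sketch of why the two ordinals are reserved, the code below mirrors how the SPFEI macro in the SAPI 5 header is understood to build a 64-bit event-interest flag: one bit for the event ordinal, plus the two reserved bits as a validity check. All of it rests on assumptions: the ordinals 30 and 33 are recalled from the SAPI 5 sapi.h header rather than stated here, and `event_interest_flag`, `has_flag_check`, and `is_interested` are hypothetical names. Verify against the header in your SDK before relying on any of it.

```cpp
#include <cstdint>

// ASSUMPTION: ordinals recalled from the SAPI 5 sapi.h header, not from this page.
constexpr unsigned kAssumedReserved1 = 30;  // SPEI_RESERVED1
constexpr unsigned kAssumedReserved2 = 33;  // SPEI_RESERVED2

// Both reserved bits set; understood to correspond to SPFEI_FLAGCHECK.
constexpr std::uint64_t kFlagCheck =
    (1ULL << kAssumedReserved1) | (1ULL << kAssumedReserved2);

// Hypothetical equivalent of SPFEI(ordinal): the event's bit plus the check bits.
constexpr std::uint64_t event_interest_flag(unsigned ordinal) {
    return (1ULL << ordinal) | kFlagCheck;
}

// True if the mask carries both reserved check bits.
constexpr bool has_flag_check(std::uint64_t mask) {
    return (mask & kFlagCheck) == kFlagCheck;
}

// True if the mask has the bit for the given event ordinal set.
constexpr bool is_interested(std::uint64_t mask, unsigned ordinal) {
    return (mask & (1ULL << ordinal)) != 0;
}
```

Under these assumptions, flags for several events can be combined with bitwise OR, and a mask built without the check bits would fail `has_flag_check`, which is one reason applications should not use the reserved ordinals as event ids.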