Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
SML Output Overview (Microsoft.Speech)
On recognition, the semantic interpretation mechanism of a speech recognition engine returns a semantic result to the speech application. The semantic result of a recognition is the value of the Root Rule Variable (RRV) in the grammar that the speech recognition engine used to perform the recognition.
A semantic result typically contains information that is more useful to the application than only the text of the utterance. For example, a grammar for specifying an airport might recognize the utterance "Heathrow Airport" and generate the airport code "LHR" as the semantic result. Scripts contained within tag elements inserted in the input grammar generate the content of a semantic result. The speech recognizer serializes the script products and generates the semantic result in the form of Semantic Markup Language (SML) output.
SML is a container format for holding the semantic information that semantic interpretation markup in the input grammar generates. SML output is a valid XML document in which the top-level element is named SML. The SML element can have zero, one, or more child elements, depending on whether the input grammar contains markup for semantic interpretation.
SML Element Content and Structure
The following list outlines the attribute-value pairs that the semantic interpreter automatically includes in the start tag of the SML element:
text: a string consisting of all recognized utterance tokens.
utteranceConfidence: a floating-point value between 0.000 and 1.000.
confidence: a floating-point value between 0.000 and 1.000.
The recognizer automatically sets the values of the text and utteranceConfidence attributes, which identify the recognized text of the utterance and the confidence score for the full utterance, respectively. Developers cannot change these values. The recognizer sets the value of utteranceConfidence to 1.000 for results derived from text parsing, or when the recognition engine cannot generate a confidence score for the full utterance.
By default, the recognizer sets the value of the confidence attribute to the value of the utteranceConfidence attribute. Developers can change the value of this attribute by using scripts in semantic interpretation markup. They might do so to propagate a score from a semantically relevant node up to the SML element, or to calculate an interpretive score for the semantic result based on criteria other than simple utterance confidence.
Developers can specify the text content of the SML element by assigning to the _value property of the RRV. Developers can also add attributes to the SML element's start tag by creating child properties of the RRV's _attributes property.
The following code block illustrates the general structure of the SML output for the top-level SML element. Strings enclosed in brackets ("[]") are placeholders that describe the values or names that appear at those positions.
<SML text="[utterance_tokens]"
utteranceConfidence="[full_utterance_confidence_score]"
confidence="[value_of_RRV_confidence_property]">
[value_of_RRV]
</SML>
When the input grammar contains no semantic interpretation markup, the recognizer generates SML output that contains only the top-level SML element and its contents. To view an example, see Serialization of the Root Rule Variable (Microsoft.Speech).
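As a rough illustration of the top-level structure described above, the following Python sketch reads the three automatic attributes and the text content of an SML element by using the standard library XML parser. The sample document is hypothetical: it reuses the "Heathrow Airport" example from the overview, and the confidence scores are invented. The helper name read_top_level is also an assumption made for this sketch and is not part of Microsoft.Speech.

```python
import xml.etree.ElementTree as ET

# Hypothetical SML output for the "Heathrow Airport" utterance; the
# confidence scores are invented for illustration.
SAMPLE_SML = (
    '<SML text="Heathrow Airport" '
    'utteranceConfidence="0.912" confidence="0.912">LHR</SML>'
)


def read_top_level(sml_text):
    """Return the recognized text, both confidence scores, and the RRV value."""
    root = ET.fromstring(sml_text)
    if root.tag != "SML":
        raise ValueError("top-level element is not SML")
    return {
        "text": root.get("text"),
        "utterance_confidence": float(root.get("utteranceConfidence")),
        "confidence": float(root.get("confidence")),
        # The text content is the serialized value of the Root Rule Variable.
        "value": (root.text or "").strip(),
    }


result = read_top_level(SAMPLE_SML)
print(result)
```

An application would typically compare the confidence value against a threshold of its own choosing before acting on the semantic value.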
Child Element Content and Structure
Using semantic interpretation tags, developers can obtain confidence scores and text content at the rule level. If a developer assigns variables for semantic interpretation to words or phrases in the input grammar, the SML output contains child elements corresponding to each word or phrase associated with the semantic interpretation markup. Including any semantic interpretation markup in the grammar disables the default behavior of the semantic interpreter, so the SML output contains only the elements that are marked for semantic interpretation.
Child elements take the name of the property that produces them. The semantic interpreter automatically includes a confidence attribute in the start tag of child elements. Additional attribute-value pairs can be added to the start tag of child elements using the _attributes property. Use the _value property to set the text content of the child element. The value of a child element can also be an array, in which case each item in the array is enclosed in an item element contained by the child element. For examples, see Serialization of Developer-defined Rule Variable Properties (Microsoft.Speech).
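The array case described above might be handled by a consumer as in the following Python sketch. The SML document here is hypothetical: the Toppings element name, its items, and all scores are invented for illustration, and the helper name child_value is an assumption. The sketch relies only on the rule stated above, that each array entry is wrapped in an item element inside the child element.

```python
import xml.etree.ElementTree as ET

# Hypothetical SML output: the element name "Toppings", its items, and all
# confidence scores are invented for illustration.
SAMPLE_SML = """
<SML text="mushrooms and olives" utteranceConfidence="0.900" confidence="0.900">
  <Toppings confidence="0.900">
    <item>mushrooms</item>
    <item>olives</item>
  </Toppings>
</SML>
"""


def child_value(element):
    """Return a list for array-valued elements, otherwise the text content."""
    items = element.findall("item")
    if items:
        return [(item.text or "").strip() for item in items]
    return (element.text or "").strip()


root = ET.fromstring(SAMPLE_SML.strip())
toppings = root.find("Toppings")
values = child_value(toppings)
print(values)
```

This sketch treats any direct child named item as an array entry, so it would misread a grammar that defines a property with that same name.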
The following example is SML output that illustrates the structure of three different child elements, one of which itself has child elements. The names of the three child elements are Weekday, Ordinal, and Month. The dayNumber and dayAbbrev elements are children of the Weekday element. Every element carries the confidence of the recognition in its confidence attribute, and each element that has no child elements also shows the semantic value associated with it. This example results from speech recognition using the grammar that appears in Referencing Grammar Rule Variables (Microsoft.Speech). The SML code shown here can be generated by using the ConstructSmlFromSemantics() method.
<SML text="the second Saturday in March" utteranceConfidence="0.950" confidence="0.950">
<Weekday confidence="0.998">
<dayNumber confidence="0.998">6</dayNumber>
<dayAbbrev confidence="0.998">Sat</dayAbbrev>
</Weekday>
<Ordinal confidence="0.998">2</Ordinal>
<Month confidence="0.998">3</Month>
</SML>
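One way an application might consume output like the example above is to convert it into nested dictionaries. The following Python sketch does this with the standard library XML parser. The helper name to_dict and the dictionary layout are assumptions made for this sketch, not part of Microsoft.Speech, and the sketch assumes that every element carries a confidence attribute, as in the example.

```python
import xml.etree.ElementTree as ET

# The SML output from the example above, indented for readability.
SML_OUTPUT = """
<SML text="the second Saturday in March" utteranceConfidence="0.950" confidence="0.950">
  <Weekday confidence="0.998">
    <dayNumber confidence="0.998">6</dayNumber>
    <dayAbbrev confidence="0.998">Sat</dayAbbrev>
  </Weekday>
  <Ordinal confidence="0.998">2</Ordinal>
  <Month confidence="0.998">3</Month>
</SML>
"""


def to_dict(element):
    """Convert an SML element into a dict with its confidence and either
    its child elements (keyed by element name) or its text value."""
    node = {"confidence": float(element.get("confidence"))}
    children = list(element)
    if children:
        node["children"] = {child.tag: to_dict(child) for child in children}
    else:
        node["value"] = (element.text or "").strip()
    return node


root = ET.fromstring(SML_OUTPUT.strip())
semantics = to_dict(root)

# Look up a nested value and a leaf element by property name.
print(semantics["children"]["Weekday"]["children"]["dayAbbrev"]["value"])
print(semantics["children"]["Month"])
```

Keying children by element name is convenient for lookups but assumes that sibling elements have distinct names, which holds for this example.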