Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Introduction to XML Grammar Elements (Microsoft.Speech)
An XML-format grammar consists of rule elements that define the speech input that a speech recognition engine will recognize. Rule elements contain the sets of words or phrases that the speech recognition engine uses to match user input, and they also specify the required sequence of phrases that a user can speak. Any grammar element or unmarked text sequence within a rule element is called a rule expansion.
A grammar rule must contain at least one rule expansion that contains text that a user can speak. You place elements, such as item elements, token elements, and ruleref elements (which contain references to other rules, including those in other grammars) in a specific sequential order. This allows grammars to offer multiple variations of word combinations that can be recognized.
Examples
The following information describes some commonly used grammar elements.
item element. May contain a word that can be spoken, a ruleref element, a tag element, or any logical combination of these.
When an item element contains a combination of rule expansions (for example, a combination of words), the sequence of the words in that item element must match the sequence of the words spoken by the user for recognition to be successful. For example, given the following grammar, the spoken input must contain the phrase "metallic red" for recognition to be successful.
<grammar version="1.0" xml:lang="en-US" mode="voice" root="ruleColors"
  xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ruleColors" scope="public">
    <item> metallic red </item>
  </rule>
</grammar>
See item Element (Microsoft.Speech) for more information.
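As a minimal sketch of an item element that combines spoken words with a tag element, the grammar below attaches a semantic value to the phrase. The value "metallic-red" is an illustrative assumption rather than something defined in this topic, and the script syntax assumes the tag-format of semantics/1.0 declared on the grammar element.

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="ruleColors"
  xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ruleColors" scope="public">
    <!-- The words must be spoken in this order; the tag content is not spoken. -->
    <!-- "metallic-red" is a hypothetical semantic value chosen for illustration. -->
    <item> metallic red <tag> out = "metallic-red"; </tag> </item>
  </rule>
</grammar>
```

The words inside the item are still matched in sequence; the tag only adds information to the recognition result.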
one-of element. Contains a set of alternative rule expansions, any of which can be used to recognize spoken input. This increases the flexibility of the grammar by requiring that the input match only one of the alternatives. For example, in the following grammar, the input must contain the initial phrase "I would like the car in" for recognition to be successful. However, the phrase can be completed by any of the words: "red", "white", or "green".
<grammar version="1.0" xml:lang="en-US" mode="voice" root="ruleColors"
  xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ruleColors" scope="public">
    <item> I would like the car in </item>
    <one-of>
      <item> red </item>
      <item> white </item>
      <item> green </item>
    </one-of>
  </rule>
</grammar>
See one-of Element (Microsoft.Speech) for more information.
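Because each alternative in a one-of element is itself a rule expansion, an alternative can be a multi-word phrase rather than a single word. The following sketch illustrates this; the phrases "pearl white" and "forest green" are hypothetical additions, not taken from the examples in this topic.

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="ruleColors"
  xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="ruleColors" scope="public">
    <item> I would like the car in </item>
    <one-of>
      <!-- Each item is one alternative; its words must be spoken in order. -->
      <item> metallic red </item>
      <item> pearl white </item>
      <item> forest green </item>
    </one-of>
  </rule>
</grammar>
```

With this grammar, "I would like the car in pearl white" should match, while "I would like the car in white pearl" should not, because the word order inside an item is fixed.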
ruleref element. Specifies a pointer to another rule that spoken input must match as part of a successful recognition of the current rule.
Grammars reference rules using ruleref elements. The following example defines a rule element named ruleColors that contains alternative selections for colors. The root rule, buyShirt, then uses ruleref elements to reference the ruleColors rule twice.
<grammar version="1.0" xml:lang="en-US" mode="voice" root="buyShirt"
  xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="buyShirt" scope="public">
    <item>
      Get me a <ruleref uri="#ruleColors" /> shirt
      and a <ruleref uri="#ruleColors" /> tie
    </item>
  </rule>
  <rule id="ruleColors" scope="public">
    <one-of>
      <item> red </item>
      <item> white </item>
      <item> green </item>
    </one-of>
  </rule>
</grammar>
The customer requests a color twice, but the grammar needs to define ruleColors only once.
See ruleref Element (Microsoft.Speech) for more information.
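A ruleref element can also point to a rule in another grammar. The sketch below assumes a separate grammar file named colors.grxml (a hypothetical file name) that sits next to this grammar and contains a rule with the id ruleColors declared with scope="public", so that it is visible from outside its own grammar.

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="buyShirt"
  xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">
  <rule id="buyShirt" scope="public">
    <item>
      <!-- colors.grxml is a hypothetical external grammar file. -->
      <!-- The part after "#" names the rule to use inside that file. -->
      Get me a <ruleref uri="colors.grxml#ruleColors" /> shirt
    </item>
  </rule>
</grammar>
```

The fragment after the "#" selects the rule within the referenced grammar, in the same way that "#ruleColors" selects a rule within the current grammar.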