Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Compile Grammar Input and Output File Format
Compile Grammar converts XML-format grammar files to binary grammar files with the “.cfg” extension. Input files may conform to the Speech Recognition Grammar Specification (SRGS) Version 1.0 or to the Advanced Research Projects Agency (ARPA) format.
The output of the Compile Grammar tool can be used as input to the Prepare Grammar tool, and as a convenient format for transporting grammar files.
Input File Format
The following describes the input file formats for Compile Grammar.
SRGS Format
The default input file format for Compile Grammar is an SRGS-compliant grammar file in XML format.
A valid XML-format grammar document consists of a legal header followed by a body that contains a set of legal rule definitions. The header must include the XML declaration and may include an optional DOCTYPE declaration, followed by the root grammar element. For a grammar to compile successfully, the root grammar element of a file used as input to the Compile Grammar tool must contain the following required attributes:
| Attribute | Description | Example |
|---|---|---|
| version | Attribute of the grammar element. The version of the specification implemented by the grammar. | version="1.0". A grammar that complies with SRGS Version 1.0 must declare the version to be "1.0". |
| xmlns | Attribute of the grammar element. The URI of the namespace for the grammar. | xmlns="http://www.w3.org/2001/06/grammar" |
| xml:lang | Attribute of the grammar element. The primary language contained by the document and optionally a country or other variation. | xml:lang="en-US". The xml:lang attribute is required if grammar mode="voice" or if the mode attribute is omitted, in which case the value defaults to voice. The xml:lang attribute is not required if grammar mode="dtmf". |
Note
Grammar files in Augmented Backus-Naur Form (ABNF) format are not supported.
Example
The following is an example of a valid XML-format, SRGS-compliant grammar document that contains only the minimum required elements and attributes in the document’s header, and a simple rule in the body of the document.
<?xml version="1.0"?>
<grammar version="1.0"
xmlns="http://www.w3.org/2001/06/grammar"
xml:lang="en-US">
<rule id="main">
<one-of>
<item>hello</item>
<item>world</item>
</one-of>
</rule>
</grammar>
The document header ends and the body of the grammar document begins with the first rule element. Grammar files used as input to the grammar compiler can contain optional elements and attributes in the document header; see SRGS Grammar XML Reference (Microsoft.Speech). For more information about elements and attributes in SRGS grammars, see Speech Recognition Grammar Specification Version 1.0.
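The header requirements above can be checked before a file is handed to Compile Grammar. The following Python sketch is only an illustration: the function name check_grammar_header and the wording of its messages are hypothetical, and it does not reproduce the validation that Compile Grammar itself performs. It uses the standard xml.etree.ElementTree module to inspect the three required attributes on the root grammar element.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The minimal grammar from the example above.
SAMPLE = """<?xml version="1.0"?>
<grammar version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US">
  <rule id="main">
    <one-of>
      <item>hello</item>
      <item>world</item>
    </one-of>
  </rule>
</grammar>"""


def check_grammar_header(xml_text):
    """Return a list of header problems (hypothetical helper; empty list if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    # ElementTree folds the xmlns declaration into the tag as {namespace}grammar.
    if root.tag != "{%s}grammar" % SRGS_NS:
        problems.append("root element must be grammar in namespace " + SRGS_NS)
    if root.get("version") != "1.0":
        problems.append('version must be "1.0"')
    # mode defaults to voice when omitted; xml:lang is optional only for dtmf.
    mode = root.get("mode", "voice")
    if mode != "dtmf" and root.get(XML_LANG) is None:
        problems.append("xml:lang is required unless mode is dtmf")
    return problems


print(check_grammar_header(SAMPLE))
```

An empty list means the header carries all three required attributes; it says nothing about whether the rules in the body are valid.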
ARPA Format
Optionally, Compile Grammar accepts input files that conform to the Advanced Research Projects Agency (ARPA) format. The ARPA format expresses language models using n-grams. N-gram language models are traditionally used in speech recognition systems based on large vocabularies. The following table lists and describes the components that make up an ARPA file:
| Tag | Description |
|---|---|
| <header> | Optional. Contains comments that describe the language model. Applications will ignore any information in it. |
| \data\ | Required. Marks the beginning of the language model data. This is followed immediately by the number of unigrams, bigrams, and trigrams that the language model contains. Compile Grammar ignores any content before the \data\ tag. |
| ngram 1 | Required. The number of unigrams in the language model. |
| ngram 2 | Required. The number of bigrams in the language model. |
| ngram 3 | Required. The number of trigrams in the language model. |
| \1-grams: | Required. Marks the beginning of the enumeration of unigrams in the language model. |
| \2-grams: | Optional. Marks the beginning of the enumeration of bigrams in the language model. |
| \3-grams: | Optional. Marks the beginning of the enumeration of trigrams in the language model. |
| <unk> | Optional. Represents an unknown word. Tag must be in lower case. |
| <s> | Optional. Represents the beginning of a sentence. This is counted as a word in the n-gram. For example, “<s> When” is a bigram. Tag must be in lower case. |
| </s> | Required. Represents the end of a sentence. This is counted as a word in the n-gram. For example, “go up </s>” is a trigram. Tag must be in lower case. |
| \end\ | Required. Marks the end of the language model data. |
Note
Tags other than those listed in the table cannot be used.
Example
The following is an example of an ARPA file. See the remarks for a discussion about the significance of the numbers.
<header - information ignored by applications>
\data\
ngram 1=9
ngram 2=11
ngram 3=3
\1-grams:
-0.8953 <unk> -0.7373
-0.7404 </s> -0.6515
-0.7861 <s> -0.1764
-1.0414 When -0.4754
-1.0414 will -0.1315
-0.9622 the 0.0080
-1.4393 Stock -0.3100
-1.0414 Go -0.3852
-0.9622 Up -0.1286
\2-grams:
-0.3626 <s> When 0.1736
-1.2765 <s> the 0.0000
-1.2765 <s> Up 0.0000
-0.2359 When will 0.1011
-1.0212 will </s> 0.0000
-0.4191 will the 0.0000
-1.1004 the </s> 0.0000
-1.1004 the Go 0.0000
-0.6232 Stock Go 0.0000
-0.2359 Go Up 0.0587
-0.4983 Up </s>
\3-grams:
-0.4260 <s> When will
-0.6601 When will the
-0.6601 Go Up </s>
\end\
Remarks
N-gram entries in ARPA files must include numbers that express recognition probabilities. The number to the left of the n-gram is called the probability, and must always be present. The number to the right is called the back-off probability, and may be omitted for some n-gram entries.
A probability in an ARPA file always appears as its base-10 logarithm. This is a value from -∞ to 0 that, when converted to its actual value, lies between 0 and 1. A back-off value is specified the same way.
Here is a 2-gram from the example above:
-0.2359 Go Up 0.0587
The number on the left (probability) is the log probability that the word “Up” will be recognized, given that the word “Go” has been recognized. The actual probability is 10^(-0.2359). Similarly, the probability to the left of a trigram “A B C” is the probability that the recognition engine will recognize “C”, given that it has just recognized an “A” followed by a “B”. This is written as P(C|A B). For unigrams, the number on the left is just the probability that the given word will be recognized.
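The conversion from the stored base-10 logarithm to an actual probability is a single exponentiation. The following Python sketch illustrates it for the “Go Up” bigram; the function name log10_to_probability is hypothetical and chosen only for this illustration.

```python
def log10_to_probability(log_value):
    """Convert a base-10 log value from an ARPA entry to a linear value."""
    return 10.0 ** log_value


# Left-hand number of the bigram entry "-0.2359 Go Up 0.0587".
p_up_given_go = log10_to_probability(-0.2359)
print("P(Up|Go) is about %.2f" % p_up_given_go)
```

Under this reading, the engine has a probability of roughly 0.58 of recognizing “Up” once “Go” has been recognized.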
The number on the right (back-off probability) is the probability that the following state for an n-gram is not defined by any of the (n+1)-grams. Consider the following entry from the ARPA example above:
-1.0414 Go -0.3852
There is only one state that follows this unigram. That is, there is only one bigram defined in the example that begins with “Go”, in this case “Go Up”. The back-off probability that the recognition engine will NOT recognize “Up”, given that it has already recognized “Go”, is 10^(-0.3852).
If the ARPA example above also included bigrams for “Go Down” and “Go Left”, then the back-off probability for “Go” would be the probability that the recognition engine will NOT recognize “Up” or “Down” or “Left” given that it has just recognized “Go”. The back-off probability for “Go” added to the probabilities for “Go Up”, “Go Down”, and “Go Left”, when converted to their actual values, will equal 1.
For a given n-gram, its back-off probability plus the probabilities of all the (n+1)-grams that start with the same word or phrase as the n-gram should ideally sum to 1. However, Compile Grammar does not verify this, so the values are essentially treated as weights.
It also follows that for any n-gram “A B C”, if there is no (n+1)-gram “A B C X”, then there is no need to specify a back-off probability for “A B C”. If you do specify one, it will be ignored. This is illustrated in the example above by the bigram “Up </s>” and by all of the trigrams.
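Because Compile Grammar does not verify this sum, it can be useful to check it separately. The following Python sketch applies the rule as stated in these remarks to the unigram “Go” and its single follower “Go Up” from the example. The function name follower_mass is hypothetical, and the check reflects only the simplified interpretation given here, not a full back-off computation.

```python
def follower_mass(backoff_log10, follower_log10_probs):
    """Sum the back-off value and the follower probabilities as linear values."""
    total = 10.0 ** backoff_log10
    for log_prob in follower_log10_probs:
        total += 10.0 ** log_prob
    return total


# Unigram "-1.0414 Go -0.3852" and its only follower "-0.2359 Go Up 0.0587".
total = follower_mass(-0.3852, [-0.2359])
print("back-off plus followers for Go: %.2f" % total)
```

For these values the sum comes out at about 0.99, which is close to 1 but not exact, consistent with the values being treated as weights.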
Back-off weights also need to be adjusted to account for the followers from the state to which you are backing off. The description of that process is outside the scope of these remarks.
ARPA files for consumption by Compile Grammar must also adhere to the following guidelines for structure and syntax:
The enumeration of unigrams must come before the enumeration of bigrams, and so forth.
N-grams of dimension greater than 3 (for example, 4-grams) can be specified but will be ignored by this version of Compile Grammar.
The enumerations within each n-gram set need not follow any specific order.
Lines must be separated by carriage returns. Empty lines will be ignored.
Words in n-grams must be separated from each other by one or more blanks or tabs or both.
N-gram entries use the lexical form of a word, and are case-sensitive. This matches the lexical form used in the dictionary file.
Compile Grammar supports ANSI and Unicode text formats.
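The structural guidelines above can be sketched as a small reader for ARPA text. The following Python example is a minimal illustration, not the parser used by Compile Grammar; the function name parse_arpa and the shape of its return value are assumptions made for this sketch. It skips everything before \data\, ignores empty lines and n-grams above trigrams, splits fields on blanks or tabs, and treats the back-off value as optional.

```python
import re

# Matches section markers such as \1-grams: (the colon is treated as optional).
SECTION = re.compile(r"^\\(\d+)-grams:?$")

SAMPLE = r"""header text, ignored
\data\
ngram 1=2
ngram 2=1

\1-grams:
-1.0414 Go -0.3852
-0.9622 Up -0.1286
\2-grams:
-0.2359 Go Up 0.0587
\end\
"""


def parse_arpa(text):
    """Return {order: [(log_prob, words, backoff_or_None), ...]} for orders 1 to 3."""
    ngrams = {}
    order = None
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines are ignored
        if not in_data:
            in_data = (line == "\\data\\")  # content before \data\ is ignored
            continue
        if line == "\\end\\":
            break
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            continue
        if order is None or order > 3:
            continue  # "ngram N=count" lines, or n-grams above trigrams
        fields = line.split()  # one or more blanks or tabs
        log_prob = float(fields[0])
        words = tuple(fields[1:1 + order])  # case is preserved
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        ngrams.setdefault(order, []).append((log_prob, words, backoff))
    return ngrams


model = parse_arpa(SAMPLE)
print(model[2])
```

The sketch does not check the declared n-gram counts against the entries it reads, and it does not validate the special tags; both would be reasonable additions.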
Output File Format
The output of Compile Grammar is a binary grammar file with the “.cfg” extension. Grammar files with the “.cfg” extension are optimized for the Microsoft Speech Platform Runtime 11 and are ready for conversion to grammar files with the “.cfgpp” extension, either by the Prepare Grammar tool or by a hosted speech recognizer.