Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Create Custom Pronunciations with Lexicons
Lexicons contain the mapping between the written representations and the pronunciations of words or short phrases. Speech engines have an internal lexicon that specifies pronunciations for a large number of words in a language. An application can use lexicons to create custom pronunciations that may improve the accuracy of speech recognition or speech synthesis for its specialized vocabulary. Typically, you will only need to create custom pronunciations for words that are not common to a language and that do not follow the normal pronunciation rules for the orthography of a language. See Using Custom Pronunciations for more information.
In the Microsoft Speech Platform, you create lexicons as XML documents that conform to the Pronunciation Lexicon Specification (PLS) Version 1.0. You can then link a PLS lexicon to an XML grammar that conforms to the Speech Recognition Grammar Specification (SRGS) Version 1.0 for speech recognition, or to an XML prompt that conforms to the Speech Synthesis Markup Language (SSML) Version 1.0 for speech synthesis. When your application loads a grammar or a prompt that contains a link to a lexicon, the speech engine will use the pronunciations specified in the custom application lexicon instead of those in its internal lexicon.
Examples
The following is an example of a simple PLS lexicon that defines the pronunciation for a single word, "blue". The pronunciation is specified in the phoneme element using characters from the Universal Phone Set (UPS), and corresponds to the spoken form "blee".
```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="x-microsoft-ups" xml:lang="en-US">
  <lexeme>
    <grapheme> blue </grapheme>
    <phoneme> B L I </phoneme>
  </lexeme>
</lexicon>
```
When the lexicon above is linked to an SRGS grammar or an SSML prompt, a speech engine will use the pronunciation "B L I" (blee) each time it encounters the word "blue". For example, given the following SSML prompt, a speech synthesis engine will speak: "My favorite color is blee."
```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <lexicon uri="c:\Test\Blue.pls" type="application/pls+xml"/>
  My favorite color is blue.
</speak>
```
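The article shows only the prompt markup, not the hosting application code that loads and speaks it. As a rough sketch, a native application could pass this prompt to the SAPI-style COM interfaces that the Speech Platform's native API mirrors; the header name, CLSID, and minimal error handling below are assumptions, not part of the original example.

```cpp
// Sketch only: speaks the SSML prompt above, including its <lexicon> link.
// Assumes the SAPI-style native API (sapi.h, CLSID_SpVoice); the Speech
// Platform SDK exposes equivalent interfaces from its own headers.
#include <sapi.h>

int main()
{
    if (FAILED(::CoInitialize(NULL))) return 1;

    ISpVoice* pVoice = NULL;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL,
                                    IID_ISpVoice, (void**)&pVoice);
    if (SUCCEEDED(hr))
    {
        // SPF_IS_XML tells the engine to parse the string as markup, so the
        // <lexicon> element is honored and "blue" is spoken as "blee".
        const wchar_t* ssml =
            L"<speak version=\"1.0\""
            L" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">"
            L"<lexicon uri=\"c:\\Test\\Blue.pls\" type=\"application/pls+xml\"/>"
            L"My favorite color is blue."
            L"</speak>";
        hr = pVoice->Speak(ssml, SPF_IS_XML, NULL);
        pVoice->Release();
    }

    ::CoUninitialize();
    return SUCCEEDED(hr) ? 0 : 1;
}
```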
When the lexicon above is linked to an SRGS grammar, as shown below, the speech recognition engine will recognize the speech input "blee", but will return "blue" as the recognized text. This grammar will probably not recognize the speech input "blue"; if it does, it will be with much lower confidence than when recognizing "blee".
```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" mode="voice" root="colors" xml:lang="en-US"
         tag-format="semantics/1.0" sapi:alphabet="x-microsoft-ups"
         xml:base="https://www.contoso.com/"
         xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:sapi="https://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
  <lexicon uri="C:\Test\Blue.pls" />
  <rule id="colors" scope="public">
    <one-of>
      <item> blue </item>
      <item> yellow </item>
    </one-of>
  </rule>
</grammar>
```
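As with the prompt, the article stops at the markup. The sketch below loads the grammar (saved here under the hypothetical file name C:\Test\Colors.grxml) through the SAPI-style recognition interfaces; the recognizer CLSID is an assumption, and audio input setup and recognition-event handling are omitted for brevity.

```cpp
// Sketch only: loads the SRGS grammar above so its <lexicon> link takes
// effect. Assumes SAPI-style interfaces; audio input and recognition
// events are omitted.
#include <sapi.h>

int main()
{
    if (FAILED(::CoInitialize(NULL))) return 1;

    ISpRecognizer* pRecognizer = NULL;
    ISpRecoContext* pContext = NULL;
    ISpRecoGrammar* pGrammar = NULL;

    HRESULT hr = ::CoCreateInstance(CLSID_SpInprocRecognizer, NULL, CLSCTX_ALL,
                                    IID_ISpRecognizer, (void**)&pRecognizer);
    if (SUCCEEDED(hr)) hr = pRecognizer->CreateRecoContext(&pContext);
    if (SUCCEEDED(hr)) hr = pContext->CreateGrammar(0, &pGrammar);

    // "Colors.grxml" is a hypothetical name for the grammar shown above;
    // the <lexicon> element inside it pulls in the custom pronunciation.
    if (SUCCEEDED(hr))
        hr = pGrammar->LoadCmdFromFile(L"C:\\Test\\Colors.grxml", SPLO_STATIC);
    if (SUCCEEDED(hr))
        hr = pGrammar->SetRuleState(NULL, NULL, SPRS_ACTIVE); // activate all rules

    if (pGrammar) pGrammar->Release();
    if (pContext) pContext->Release();
    if (pRecognizer) pRecognizer->Release();
    ::CoUninitialize();
    return SUCCEEDED(hr) ? 0 : 1;
}
```

With the grammar loaded and its rules active, an utterance of "blee" is returned as the recognized text "blue", matching the behavior described above.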