Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Phonetic Alphabet Reference (Microsoft.Speech)
A phonetic alphabet contains combinations of letters, numbers, and characters, known as “phones”. A phone represents a discrete sound in a spoken language. Phones are used to create phonetic spellings that determine how a word must be pronounced to be recognized, or how it is pronounced when spoken. Microsoft.Speech supports three phonetic alphabets:
International Phonetic Alphabet (IPA). A system of phonetic notation based primarily on the Latin alphabet, devised as a standardized representation of the sounds of spoken language. You can use this phonetic alphabet to specify pronunciations for any language.
Universal Phone Set (UPS). A machine-readable phonetic alphabet, created by Microsoft, which is based on the International Phonetic Alphabet (IPA). You can use this phonetic alphabet to specify pronunciations for any language except those that use the SAPI phone set (see the next item).
Speech API (SAPI) Phone Set. The pronunciation alphabet used in Microsoft.Speech for the following languages:
Language-Culture Code | Language Name | Language ID (hexadecimal) |
---|---|---|
zh-TW | Chinese (Taiwan) | 404 |
zh-CN | Chinese (PRC) | 804 |
en-US | English (United States) | 409 |
fr-FR | French (Standard) | 40c |
de-DE | German (Standard) | 407 |
ja-JP | Japanese | 411 |
es-ES | Spanish (Spain, Traditional Sort) | 40a |
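The choice of alphabet described above can be sketched in a few lines of Python. This is an illustration only, not part of Microsoft.Speech: the helper names `supported_alphabets` and `phoneme` are hypothetical, and the alphabet identifiers `x-microsoft-ups` and `x-microsoft-sapi` are assumed values for the `alphabet` attribute of the SSML `phoneme` element that should be checked against the `phoneme` element documentation. The sample pronunciation is illustrative as well.

```python
from xml.sax.saxutils import escape, quoteattr

# Rows of the table above: language-culture code -> language ID (hexadecimal).
SAPI_LANGUAGES = {
    "zh-TW": 0x404,
    "zh-CN": 0x804,
    "en-US": 0x409,
    "fr-FR": 0x40C,
    "de-DE": 0x407,
    "ja-JP": 0x411,  # Japanese; ja-JP is the standard culture name
    "es-ES": 0x40A,
}

# "ipa" is defined by the SSML specification. The two vendor-specific
# names below are ASSUMPTIONS, not confirmed by this page.
IPA = "ipa"
UPS = "x-microsoft-ups"
SAPI = "x-microsoft-sapi"


def supported_alphabets(culture: str) -> list:
    """Return the alphabets usable for a language-culture code.

    IPA applies to any language. Languages in the table use the SAPI
    phone set; every other language uses UPS instead.
    """
    if culture in SAPI_LANGUAGES:
        return [IPA, SAPI]
    return [IPA, UPS]


def phoneme(text: str, ph: str, alphabet: str) -> str:
    """Build an SSML phoneme element carrying a phonetic spelling."""
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


if __name__ == "__main__":
    print(supported_alphabets("en-US"))  # IPA or the SAPI phone set
    print(supported_alphabets("it-IT"))  # IPA or UPS
    print(SAPI_LANGUAGES["en-US"])       # language ID 409 hex as a decimal integer
    # Illustrative SAPI-style spelling; verify phones against the phone tables.
    print(phoneme("hello", "h eh l ow", SAPI))
```

Storing the language IDs as integers keeps the hexadecimal values from the table comparable with the decimal locale identifiers that other APIs may report.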
Phone Tables
Humans create speech sounds by generating airflow with one or more of the lungs, ribs, diaphragm, larynx, tongue, or cheeks and by modifying the airflow in the vocal tract. Typically, some part of the tongue moves relative to some part of the roof of the mouth to restrict the airflow in varying degrees.
From greatest to least stricture, speech sounds may be classified as stop consonants (with occlusion, or blocked airflow), fricative consonants (with partially blocked and therefore strongly turbulent airflow), approximants (with reduced airflow but no turbulence), and vowels (with full unimpeded airflow). Affricates are sequences of stop plus fricative that behave as a single phoneme.
This section contains lists of phones for each of the speech sound classifications. The tables encompass the phonetic alphabets that Microsoft.Speech supports, and include Unicode and ASCII equivalents, where applicable.
Consonants (Microsoft.Speech) are speech sounds that are articulated with complete or partial closure of the vocal tract.
Vowels (Microsoft.Speech) are speech sounds that are articulated with an open vocal tract.
Diacritics (Microsoft.Speech) are used to modify segmental phones (vowels, consonants, clicks, and ejectives) with additional phonetic detail.
Suprasegmentals (Microsoft.Speech) describe the features of a language above the level of individual consonants and vowels, such as prosody, tone, length, and stress.
Clicks and Ejectives (Microsoft.Speech) are voiceless consonants with specific velaric and glottalic airflow features. An example of a click in US English is the sound written as “tsk! tsk!”.
Tones (Microsoft.Speech) describe the use of pitch in speech sounds to distinguish lexical or grammatical meaning.
Other Phones (Microsoft.Speech) contains rare phones that are not included in the main IPA consonant table.