Note

Please see the Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Phonetic Alphabet Reference (Microsoft.Speech)

A phonetic alphabet consists of symbols (combinations of letters, numbers, and other characters) known as “phones”. Each phone represents a discrete sound in a spoken language. Phones are combined into phonetic spellings that specify how a word is pronounced, whether it is to be recognized or spoken. Microsoft.Speech supports three phonetic alphabets:

  • International Phonetic Alphabet (IPA). A system of phonetic notation based primarily on the Latin alphabet, devised as a standardized representation of the sounds of spoken language. You can use this phonetic alphabet to specify pronunciations for any language.

  • Universal Phone Set (UPS). A machine-readable phonetic alphabet, created by Microsoft, which is based on the International Phonetic Alphabet (IPA). You can use this phonetic alphabet to specify pronunciations for any language except those that use the SAPI phone set (see the next item).

  • Speech API (SAPI) Phone Set. The pronunciation alphabet used in Microsoft.Speech for the following languages:

    Language-Culture Code    Language Name                        Language ID (hex)
    zh-TW                    Chinese (Taiwan)                     404
    zh-CN                    Chinese (PRC)                        804
    en-US                    English (United States)              409
    fr-FR                    French (Standard)                    40c
    de-DE                    German (Standard)                    407
    ja-JP                    Japanese                             411
    es-ES                    Spanish (Spain, Traditional Sort)    40a
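As a rough illustration of how the list above might be applied, the following Python sketch records the languages that use the SAPI phone set and suggests which alphabets are candidates for a given language-culture code. The names SAPI_PHONE_SET_LANGUAGES, uses_sapi_phone_set, and candidate_alphabets are hypothetical helpers invented for this example; they are not part of Microsoft.Speech.

```python
# Hypothetical lookup built from the table above; not a Microsoft.Speech API.
# Language IDs are written as hexadecimal literals, matching the table.
SAPI_PHONE_SET_LANGUAGES = {
    "zh-TW": 0x404,  # Chinese (Taiwan)
    "zh-CN": 0x804,  # Chinese (PRC)
    "en-US": 0x409,  # English (United States)
    "fr-FR": 0x40C,  # French (Standard)
    "de-DE": 0x407,  # German (Standard)
    "ja-JP": 0x411,  # Japanese (ja-JP is the standard culture name)
    "es-ES": 0x40A,  # Spanish (Spain, Traditional Sort)
}


def uses_sapi_phone_set(culture_code: str) -> bool:
    """Return True if the culture is listed as using the SAPI phone set."""
    normalized = culture_code.strip().lower()
    return any(code.lower() == normalized for code in SAPI_PHONE_SET_LANGUAGES)


def candidate_alphabets(culture_code: str) -> list:
    """IPA applies to any language; UPS applies to languages not in the SAPI list."""
    if uses_sapi_phone_set(culture_code):
        return ["IPA", "SAPI"]
    return ["IPA", "UPS"]


if __name__ == "__main__":
    for culture in ("en-US", "it-IT"):
        print(culture, candidate_alphabets(culture))
```

The lookup is case-insensitive only as a convenience; whether an application should normalize culture codes this way is an assumption of the sketch.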

Phone Tables

Humans create speech sounds by generating airflow with one or more of the lungs, ribs, diaphragm, larynx, tongue, or cheeks and by modifying the airflow in the vocal tract. Typically, some part of the tongue moves relative to some part of the roof of the mouth to restrict the airflow in varying degrees.

From greatest to least stricture, speech sounds may be classified as stop consonants (with occlusion, or blocked airflow), fricative consonants (with partially blocked and therefore strongly turbulent airflow), approximants (with reduced airflow but no turbulence), and vowels (with full unimpeded airflow). Affricates are sequences of stop plus fricative that behave as a single phoneme.
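The ordering described above can be modeled as a small ordered enumeration. This is only a sketch: the numeric ranks are arbitrary values chosen for the example, and the names Stricture, AFFRICATE, and by_decreasing_stricture are hypothetical.

```python
from enum import IntEnum


class Stricture(IntEnum):
    """Sound classes ranked from least (1) to greatest (4) stricture.

    The numbers are illustrative ranks, not measurements.
    """
    VOWEL = 1        # full, unimpeded airflow
    APPROXIMANT = 2  # reduced airflow, no turbulence
    FRICATIVE = 3    # partially blocked, turbulent airflow
    STOP = 4         # occlusion: airflow is blocked


# An affricate is a stop followed by a fricative that behaves as one phoneme.
AFFRICATE = (Stricture.STOP, Stricture.FRICATIVE)


def by_decreasing_stricture(classes):
    """Sort sound classes from greatest to least stricture."""
    return sorted(classes, reverse=True)


if __name__ == "__main__":
    for sound_class in by_decreasing_stricture(Stricture):
        print(sound_class.name.lower())
```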

This section contains lists of phones for each of the speech sound classifications. The tables encompass the phonetic alphabets that Microsoft.Speech supports, and include Unicode and ASCII equivalents, where applicable.
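To show what a Unicode equivalent looks like in practice, the short Python sketch below lists the Unicode code point of each symbol in a phonetic spelling. The function name code_points and the sample spelling (a common IPA transcription of the English word "hello") are chosen for illustration only and do not come from the phone tables.

```python
def code_points(phonetic_spelling: str) -> list:
    """Return the Unicode code point of each symbol, formatted as U+XXXX."""
    return [f"U+{ord(symbol):04X}" for symbol in phonetic_spelling]


if __name__ == "__main__":
    # Illustrative IPA spelling of "hello": h, schwa, primary stress mark, l, o, upsilon.
    sample = "h\u0259\u02c8lo\u028a"
    print(code_points(sample))
    # ['U+0068', 'U+0259', 'U+02C8', 'U+006C', 'U+006F', 'U+028A']
```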