Note

Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.

Microsoft Speech Platform

Microsoft Speech Platform Overview

The Microsoft Speech Platform consists of an Application Runtime that provides speech functionality, an Application Programming Interface (API) for managing the runtime, and Runtime Languages that enable speech recognition and speech synthesis (text-to-speech, or TTS) in specific languages. Using the Microsoft Speech Platform, you can add speech recognition and TTS functionality to enhance users' interaction with your applications.

The following diagram presents an overview of the Speech Platform.

Speech Platform Overview Diagram

Figure 1. Overview of the Microsoft Speech Platform

Speech Platform features compared with SAPI

The Speech Platform API and runtime are derived from the Speech API (SAPI) and the speech runtime in Windows, but are primarily intended to support speech applications running as standalone services. As a result, there are some important differences. The following table lists key functional areas of the Speech Platform and describes the principal differences between the Speech Platform and the speech functionality in Windows, as provided by SAPI.

| Feature | Microsoft Speech Platform | Windows/SAPI |
| --- | --- | --- |
| Speech Runtime | Available for download from the Microsoft Download Center. Can be redistributed with your applications. Requires the Microsoft Speech Platform SDK and one or more Runtime Languages. | Included in Windows Vista, Windows 7, and Windows Server 2008. Not redistributable. |
| Language support and speech engines | Supports 26 languages for speech recognition and text-to-speech using redistributable Runtime Languages that you can download to enable speech recognition and TTS for specific languages. You can redistribute these Runtime Languages with your applications. Does not support the speech engines that are included in Windows. | Supports 8 languages for speech recognition and 3 languages for text-to-speech using the speech engines that are included in Windows Vista, Windows 7, and Windows Server 2008. Speech engines are not redistributable. |
| Speech engine access | Supports exclusive access to a speech engine by one application. | Supports shared access to a speech engine by multiple applications and exclusive access by one application. |
| Speech Recognition | Optimized to understand variations in speech patterns from a diverse population of users for any given language. | Optimized to train speech recognition for a specific user. |
| Grammars | Supports command-and-control grammars that define the vocabulary that is meaningful to the application. | Supports free-text dictation and command-and-control grammars. |
| Audio format | Supports 8-bit audio, automatically downsamples audio files of higher resolution. | Supports 16-bit audio. |
| Native-code Application Programming Interface (API) | Speech Platform Native-Code API, included in Microsoft Speech Platform SDK. Requires the Microsoft Speech Platform Runtime and one or more Runtime Languages. | SAPI 5.4 for Windows 7 and Windows Server 2008. SAPI 5.3 for Windows Vista. |
| Managed-code API | Microsoft.Speech namespaces, included in the Microsoft Speech Platform SDK, available for download from the Microsoft Download Center. Requires the Microsoft Speech Platform Runtime and one or more Runtime Languages. | System.Speech namespaces in the .NET Framework versions 3.0 and higher. |

Table 1. Comparison of Speech Platform with Windows/SAPI Speech Elements
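
The rows of Table 1 can also be read as a checklist when choosing between the two stacks. The following Python sketch is only an illustration: the `CAPABILITIES` dictionary and the `viable_stacks` function are hypothetical names, not part of either API, and they encode just three rows of the table (redistribution, free-text dictation, and shared engine access).

```python
# Illustrative only: a capability matrix transcribed from three rows of Table 1.
# These names are hypothetical and do not belong to the Speech Platform or SAPI.
CAPABILITIES = {
    "Microsoft Speech Platform": {
        "redistributable": True,   # runtime and Runtime Languages can ship with your app
        "dictation": False,        # command-and-control grammars only
        "shared_engine": False,    # exclusive access by one application
    },
    "Windows/SAPI": {
        "redistributable": False,  # runtime and engines are part of Windows
        "dictation": True,         # free-text dictation is supported
        "shared_engine": True,     # shared or exclusive engine access
    },
}


def viable_stacks(*required):
    """Return the stacks that provide every required capability."""
    return [
        name
        for name, caps in CAPABILITIES.items()
        if all(caps[r] for r in required)
    ]


if __name__ == "__main__":
    print(viable_stacks("redistributable"))
    print(viable_stacks("dictation", "shared_engine"))
    print(viable_stacks("redistributable", "dictation"))
```

An empty result, as in the last call, indicates that neither stack alone meets the stated requirements according to the rows encoded here.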

Runtime Languages

The Microsoft Speech Platform Runtime 11 and Microsoft Speech Platform SDK 11 do not include Runtime Languages for speech recognition or speech synthesis. A Runtime Language includes the language model, acoustic model, and other data necessary to provision a speech engine to perform speech recognition or speech synthesis (TTS, text-to-speech) in a particular language. You must download a Runtime Language for each language in which you want to perform speech recognition or to generate synthesized speech.

The Runtime Languages are different for each version of the Speech Platform Runtime. You must download the Runtime Language version that matches the version of the Speech Platform Runtime that you have installed. The Runtime Languages for the Speech Platform SDK 11 are redistributable and are different from the languages that ship with Windows Vista or Windows 7. Use the following link to download Runtime Languages for use with the Speech Platform Runtime 11 and Speech Platform SDK 11:

Language Support
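
The version-matching rule described above can be sketched as a simple filter. This Python example is a hypothetical illustration: the `RuntimeLanguage` record, its fields, and the sample catalog entries are assumptions made for the sketch, not data read from an actual installation or download page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeLanguage:
    """Hypothetical description of one downloadable Runtime Language."""
    culture: str           # for example "en-US"
    kind: str              # "SR" (speech recognition) or "TTS" (text-to-speech)
    platform_version: int  # Speech Platform Runtime version it was built for


def compatible_languages(installed_runtime, languages, culture, kind):
    """Keep only languages that match the installed runtime version, culture, and kind."""
    return [
        lang
        for lang in languages
        if lang.platform_version == installed_runtime
        and lang.culture == culture
        and lang.kind == kind
    ]


if __name__ == "__main__":
    # Made-up catalog entries, for illustration only.
    catalog = [
        RuntimeLanguage("en-US", "SR", 11),
        RuntimeLanguage("en-US", "TTS", 11),
        RuntimeLanguage("en-US", "SR", 10),
        RuntimeLanguage("fr-FR", "SR", 11),
    ]
    for lang in compatible_languages(11, catalog, "en-US", "SR"):
        print(lang)
```

Because recognition and synthesis are provisioned separately, the sketch treats the kind ("SR" or "TTS") as part of the match, so an application that needs both would run the filter once for each.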

To get started using the Speech Platform native-code API, see Speech Platform Programming Guide.

There is also a managed-code API for the Speech Platform for developers who prefer to program in C# with .NET Framework objects. The managed-code API omits some of the low-level functionality of the native-code API in return for more efficient programming of speech functionality, and provides ample programming control for all but the most rigorous speech application requirements. For documentation of the managed-code API for the Speech Platform, see Microsoft Speech Programming Guide.

Operating Systems for Development and Deployment

Development of Speech Platform applications is supported on:

  • Windows Vista or later
  • Windows Server 2003 or later
  • Windows Server 2008 or later

Deployment of Speech Platform applications is supported on:

  • Windows Server 2003 or later
  • Windows Server 2008 or later