Making Applications Easier to Build

The Microsoft Speech Application SDK (SASDK) enables developers to build speech-enabled Web applications faster and more easily. Two components make this possible:

  • The SASDK Visual Studio .NET 2003 Tool Suite: Richly featured tools for creating and debugging all components of a speech-enabled Web application.
  • ASP.NET Application Speech Controls: Robust ASP.NET controls that incorporate complex logic for common application scenarios.

The SASDK Visual Studio .NET 2003 Tool Suite

The components of the SASDK are integrated directly into Microsoft Visual Studio .NET 2003, resulting in a comprehensive development environment for building ASP.NET speech-enabled Web applications using Visual Basic .NET or Visual C# .NET. The SASDK contains three tools for creating grammars, prompts, and speech-enabled Web pages, plus a set of tools for debugging speech-enabled Web applications.

| Tool | Description |
|------|-------------|
| Speech Grammar Editor | A graphical grammar file editor. Use it to create grammars and then bind them to a Speech QA or Command control. It also validates grammar files without requiring a build of the entire application, verifying that they contain valid XML and comply with the W3C Speech Recognition Grammar Specification Version 1.0. |
| Speech Prompt Editor | A data entry and editing tool for prompt databases. Use it to create a prompt database containing recording scripts, .wav files, and prompt descriptions, and to track data for each prompt. It includes the Prompt Validation Tool, which checks prompt coverage and allows testing of extractions in sample combinations. It also includes a graphical Wave Editor for customizing .wav files: use Wave Editor to display the wave form, edit word boundaries, copy and paste individual sound segments within and across .wav files, and play the edited .wav files. |
| Speech Control Editor | A design tool for speech-enabled Web pages. Use it to place Speech Controls on a Web page and combine them with Web controls, grammars, prompts, answers, and confirms. |
| Tools for Debugging Speech Applications | Speech Debugging Console (which can run either within the Visual Studio .NET 2003 programming environment or as a stand-alone application outside Visual Studio), Speech Debugging Console Log Player, and Telephony Application Simulator. |
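
The kind of grammar validation described for Speech Grammar Editor can be pictured with a small stand-alone sketch. The Python below is not part of the SASDK and does not reflect how Speech Grammar Editor is implemented; the function name `check_grammar` and the sample grammar are illustrative assumptions. It checks only well-formedness and a few structural rules from the W3C Speech Recognition Grammar Specification Version 1.0 (the namespace, the `version` attribute, and that the `root` attribute names a defined rule). Full conformance checking would require validation against the complete specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Speech Recognition Grammar Specification 1.0.
SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Illustrative grammar (an assumption, not shipped with the SASDK):
# it accepts the single word "yes" or "no".
SAMPLE_GRAMMAR = """<grammar version="1.0" xml:lang="en-US" root="yesno"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
"""


def check_grammar(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The file is not even well-formed XML, so stop here.
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != f"{{{SRGS_NS}}}grammar":
        problems.append("root element is not an SRGS <grammar>")
    if root.get("version") != "1.0":
        problems.append("version attribute must be 1.0")

    # If a root rule is named, it must be defined in this grammar.
    rule_ids = {rule.get("id") for rule in root.findall(f"{{{SRGS_NS}}}rule")}
    root_rule = root.get("root")
    if root_rule is not None and root_rule not in rule_ids:
        problems.append(f"root rule '{root_rule}' is not defined")
    return problems


if __name__ == "__main__":
    print(check_grammar(SAMPLE_GRAMMAR))
```

Running the checks on a single grammar file in this way mirrors the benefit described in the table: a grammar error can surface without building the entire application.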

ASP.NET Application Speech Controls

Application Speech Controls are ASP.NET controls that contain linguistic information that applies to common Web application scenarios. Common scenarios include those in which users pick a date, input an amount in dollars, provide a ZIP Code, or select an item from a list. Application Speech Controls prompt the user with a relevant initial question, retrieve values, and perform all necessary confirmation in case of uncertainty.

Application Speech Controls are built from one or more Dialog Speech Controls, such as QA and Command, and use the Voice Mode Grammar Library included with the SASDK. The Speech QA control is the primary Dialog Speech Control. It models a single human-computer speech interaction. Its properties provide the mechanisms to prompt the user, recognize a response, and bind elements of the recognition results to semantic items. The Voice Mode Grammar Library contains the rules that are necessary to restrict and recognize speech input for many common application scenarios.

Although the QA control is the primary building block of many Application Speech Controls, Application Speech Controls model a more complex interaction than the QA control. In addition to the basic features of the QA control, Application Speech Controls contain built-in mechanisms to handle speech events such as mumbling or silence that may occur during a dialogue. When Application Speech Controls detect one of these events, they prompt the user with a prompt designed to respond to that specific event. They also prompt the user to confirm the accuracy of a speech recognition result when the confidence of that recognition result is below an acceptable threshold.
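
The dialogue behavior described above (an initial prompt, event-specific prompts for silence or mumbling, and confirmation when confidence is low) can be sketched as a simple loop. The Python below is a conceptual model only, not the SASDK API: `RecoResult`, `collect_value`, the prompt labels, and both threshold values are hypothetical names and numbers chosen for illustration, and the recognizer is simulated by a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RecoResult:
    """One simulated recognition attempt (hypothetical structure)."""
    text: Optional[str]       # None models silence (no speech detected)
    confidence: float = 0.0   # 0.0 to 1.0; ignored when text is None


def collect_value(
    recognize: Callable[[str], RecoResult],
    confirm: Callable[[str], bool],
    reject_threshold: float = 0.2,    # assumed value: below this, treat as mumble
    confirm_threshold: float = 0.7,   # assumed value: below this, ask to confirm
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Run one dialogue; return the accepted value and the prompts played."""
    prompts_played = []
    prompt = "main"
    for _ in range(max_attempts):
        prompts_played.append(prompt)
        result = recognize(prompt)
        if result.text is None:
            prompt = "silence"            # event-specific reprompt
        elif result.confidence < reject_threshold:
            prompt = "mumble"             # speech heard but not understood
        elif result.confidence < confirm_threshold:
            prompts_played.append("confirm")
            if confirm(result.text):
                return result.text, prompts_played
            prompt = "main"               # user denied; ask again
        else:
            return result.text, prompts_played  # confident; no confirmation
    return None, prompts_played           # gave up after max_attempts


if __name__ == "__main__":
    # Scripted user: silent, then mumbling, then a moderately confident answer.
    script = iter([RecoResult(None), RecoResult("boston", 0.1),
                   RecoResult("austin", 0.5)])
    print(collect_value(lambda p: next(script), lambda text: True))
```

A single QA control corresponds roughly to one pass through the loop body; the surrounding loop, the event-specific prompts, and the confirmation step are the parts that an Application Speech Control supplies ready-made.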

The following table lists Application Speech Controls that ship with the SASDK.

| Application Speech Control Name | Description |
|---------------------------------|-------------|
| DataTableNavigator | Supports navigation through and within tables of data by means of commands like "Next," "Back" and "Repeat." |
| ListSelector | Supports the selection of data from lists. |
| AlphaDigit | Collects a string of numbers and letters. |
| CreditCardDate | Collects a credit card date. |
| CreditCardNumber | Collects a credit card number and type. |
| Currency | Collects an amount in U.S. dollars. |
| Date | Collects a date. |
| NaturalNumber | Collects and validates a natural number within an upper and a lower boundary. |
| Phone | Collects a U.S. telephone number and extension. |
| SocialSecurityNumber | Collects a U.S. Social Security number. |
| YesNo | Collects a Yes or No answer. |
| ZipCode | Collects a U.S. ZIP Code and extension. |

For more details, see the Application Speech Controls section.

Why Use Application Speech Controls?

A primary design goal of Application Speech Controls is to extend the ease and speed of development provided by Dialog Speech Controls to more complex dialogues. For many application scenarios, implementing a speech-enabled user interface is more difficult than implementing the same scenario in a graphical user interface (GUI). Although a few QA controls, or even a single QA control, can speech-enable some scenarios, other scenarios are more complex and require a series of QA controls to properly model the complete dialogue of the scenario.

Consider a scenario in which the application must retrieve a date or range of dates from the user. Dates are ubiquitous in today's most popular Web applications: they are required data in airline booking, expense reporting, and calendar applications. Speech-enabling date retrieval is very complex because the application must manage potential misrecognition of each element of a date (day, month, year) as well as the confirmation of each element. The grammar for retrieving a date is also very complex: there are many ways to say a date, and recognizing all of them requires a comprehensive grammar.
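
To give a sense of why a date grammar grows large, the following Python sketch normalizes just a few spoken phrasings of a month and day to the same value. It is an illustration only and has no connection to the Voice Mode Grammar Library or the Date control; the function name `parse_spoken_date` and the word tables are assumptions. It ignores years, date ranges, numeric forms, relative dates such as "next Tuesday," and calendar validity, all of which a production grammar would need to cover.

```python
import re

# Month names mapped to month numbers 1-12.
MONTHS = {name: i + 1 for i, name in enumerate(
    "january february march april may june july august "
    "september october november december".split())}

# Ordinal day words mapped to day numbers 1-31.
_UNITS = "first second third fourth fifth sixth seventh eighth ninth".split()
_TEENS = ("tenth eleventh twelfth thirteenth fourteenth fifteenth "
          "sixteenth seventeenth eighteenth nineteenth").split()
ORDINALS = {word: i + 1 for i, word in enumerate(_UNITS + _TEENS)}
ORDINALS["twentieth"] = 20
ORDINALS["thirtieth"] = 30
ORDINALS["thirty first"] = 31
for i, word in enumerate(_UNITS):
    ORDINALS[f"twenty {word}"] = 21 + i


def parse_spoken_date(utterance):
    """Map a few spoken phrasings to (month, day); return None if unrecognized."""
    text = utterance.lower().replace("-", " ")
    # Drop filler words so "the third of March" and "March third" look alike.
    words = re.sub(r"\b(the|of|on)\b", " ", text).split()
    month = next((MONTHS[w] for w in words if w in MONTHS), None)
    # Whatever remains after removing the month must be a known ordinal.
    day = ORDINALS.get(" ".join(w for w in words if w not in MONTHS))
    if month is None or day is None:
        return None
    return month, day


if __name__ == "__main__":
    for phrase in ["March third", "the third of March",
                   "on March the twenty-first", "next Tuesday"]:
        print(phrase, "->", parse_spoken_date(phrase))
```

Even this narrow sketch needs dozens of table entries to accept three word orders for a month and day, which suggests how much larger a grammar must be once years, ranges, and relative expressions are included.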

Application Speech Controls extend the ease and speed of development by:

  • Incorporating built-in features that manage many of the frequent events occurring in human-computer speech interaction.

Application Speech Controls contain linguistic components such as built-in prompts and grammars that make them directly applicable to common Web application scenarios. They also contain built-in mechanisms to manage speech events and confirm input with the user. The built-in components enable developers to focus development effort on the function and flow of their applications. They help reduce or even eliminate the development time otherwise spent building and debugging complex grammars and prompting strategies.

  • Permitting easy customization in order to adapt controls to specific application needs.

Adjusting the properties of Application Speech Controls allows developers to customize the controls to fit the needs of their applications. Properties can be used to control prompting, modify the control's behavior when speech events occur, or change the way the control binds recognition results. Developers can tune the flow of the application by adjusting recognition time-out values.

  • Integrating seamlessly into the Visual Studio .NET 2003 development environment.

Learning to use and customize Application Speech Controls is fast and easy. Application Speech Controls share the advantages of ASP.NET controls:

  • They are available in the Visual Studio .NET 2003 Toolbox.
  • They can be dragged onto Web Forms.
  • Their properties are set using the Properties window.

Because of this integration, Visual Studio developers do not need to install and learn a new development environment or programming language in order to use these controls.

The following figure illustrates the Speech Toolbox for Visual Studio. The toolbox contains Dialog Speech Controls discussed previously, as well as Application Speech Controls.