Envisioning the Power of the Speech-Enabled Web
Developers who build ASP.NET Web applications with the Microsoft Speech Application SDK Version 1.1 (SASDK) can add the naturalness and power of speech interaction to their applications. Using the SASDK, developers can create applications that enable users to:
- Speak and listen to a speech-enabled ASP.NET Web application over a telephone connection (a voice-only interaction).
- Speak, listen, and press keys or tap on-screen objects to interact with a speech-enabled Web application running in a visual browser on a desktop PC or mobile device (a multimodal interaction).
Companies that build speech-enabled Web applications with the SASDK can reduce the cost of building and deploying services that run on both Interactive Voice Response (IVR) telephony systems and Internet browser-based systems. This cost reduction is possible because applications built with the SASDK for voice-only and multimodal systems can share a common code base, despite their different user interfaces. A common underlying code base reduces costs by shortening both application development time and application maintenance time.
This document describes a scenario that illustrates how the naturalness and power of a speech interface can add value to an organization's operations, and then outlines several types of applications to which speech can add value.
A User Experience Scenario
The following hypothetical scenario illustrates the power of the speech-enabled Web. It shows how a user can use both a simple telephone and a smart device with a browser to interact with a single, speech-enabled database application. Lucy is a busy sales executive for Northwind Traders and is constantly traveling. In this scenario, she uses her wireless-connected Windows Mobile-based Pocket PC 2003 (Pocket PC) and her cell phone.
While waiting in a meeting room, Lucy plans her next sales call. She logs on to her corporate intranet sales site, and navigates to the database search page of the Sales department. The Web page displays a speech icon that is associated with a speech-enabled control, and a button labeled Search Sales Database. The page also displays various controls: a text box labeled Company Name, a drop-down menu labeled Record Type, and a set of numbered radio buttons collectively labeled Number of Records. None of these controls displays any data.
Using her stylus, Lucy taps the speech icon and says, "Show the last ten sales for Wingtip." The application recognizes her request and parses the semantic values of the recognized speech. The application displays the name "Wingtip Toys" in the Company Name text box, displays the word "Sales" in the Record Type drop-down menu, and selects the Number of Records radio button labeled "10." Then Lucy clicks Search Sales Database, and the application displays the ten most recent sales to Wingtip Toys.
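The step in which the application turns one spoken sentence into three form values can be illustrated with a small sketch. The Python code below is not SASDK code and does not use any SASDK API; it is a toy keyword matcher, written under the assumption that the recognizer has already produced the text of the utterance. The lookup tables, the function name extract_form_values, and the field names are hypothetical and exist only for illustration.

```python
import re

# Hypothetical lookup tables. A real speech application would normally
# express these choices in a speech recognition grammar instead.
COMPANIES = {"wingtip": "Wingtip Toys", "tailspin": "Tailspin Toys"}
RECORD_TYPES = {"sale": "Sales", "sales": "Sales",
                "contact": "Contacts", "contacts": "Contacts"}
NUMBER_WORDS = {"five": 5, "ten": 10, "twenty": 20}


def extract_form_values(utterance):
    """Map recognized text to the three form fields used in the scenario."""
    values = {"company_name": None, "record_type": None,
              "number_of_records": None}
    # Lowercase the text and split it into alphanumeric words.
    for word in re.findall(r"[a-z0-9]+", utterance.lower()):
        if word in COMPANIES and values["company_name"] is None:
            values["company_name"] = COMPANIES[word]
        elif word in RECORD_TYPES and values["record_type"] is None:
            values["record_type"] = RECORD_TYPES[word]
        elif values["number_of_records"] is None:
            if word in NUMBER_WORDS:
                values["number_of_records"] = NUMBER_WORDS[word]
            elif word.isdigit():
                values["number_of_records"] = int(word)
    return values


if __name__ == "__main__":
    print(extract_form_values("Show the last ten sales for Wingtip"))
```

For Lucy's sentence, the sketch yields "Wingtip Toys" for the company name, "Sales" for the record type, and 10 for the number of records, which are the values the page displays in the scenario. Words that match no table, such as "show" or "last", are ignored.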
Lucy taps the speech icon again and requests a list of the contacts for the Wingtip Toys account. She adds some new information to the database, turns her smart device off, and leaves for lunch and her next appointment.
An hour later, as Lucy is driving in her car, she realizes that she has forgotten the address of the company for her next appointment. She uses her cell phone to call the same corporate sales site that she had contacted earlier with her Pocket PC. Her call is connected using Telephony Application Services. Because Lucy's telephone sends a caller ID number, the application knows that Lucy is the person calling. Telephony Application Services works together with the Web server and Microsoft Speech Server, and initiates the following dialogue with Lucy.
Telephony Server: | Hello Lucy. Welcome to the Northwind Traders database search portal. Please say or enter your passport identification number. |
Lucy: | 74532 |
Telephony Server: | Lucy, you have two new opportunities on an account, do you… |
Lucy: | Give me the address for Tailspin Toys |
Telephony Server: | Tailspin Toys. The main address is 1234 Rodeo Drive, Seattle, Washington. |
Lucy: | Goodbye. |
Business-to-Consumer Telephony Applications
Typically, business-to-consumer telephony applications perform at least one of four generic functions. These functions are described in the following table.
Function | Description |
Call Redirection | A customer wants a solution to a problem at a company, but is not sure who can help him. The customer calls the company, is presented with a welcome message, answers a few questions, and is directed to the appropriate department. A customer's call may even be directed by another speech application that is designed to serve the customer's specific needs. Throughout the interaction, the customer is able to contact an operator at any time. |
Information Retrieval | A customer wants a specific piece of information. The application might require customer identification. After being identified, the customer might answer a moderate number of questions. Based on the customer's input, the application retrieves a specific piece of information from a very large information database. |
Order Transaction | A customer wants to perform a transaction. The customer calls a company, is presented with a welcome message, and identifies himself, using a personal identification number (PIN), a credit card number, or some other form of identification. The customer answers a moderate number of questions to fill out a transaction form, and then submits the transaction. Throughout the interaction, the customer is able to contact an operator at any time. |
Alert Notification | A customer asks for notifications regarding specific topics of interest. For example, a customer may want to be notified when the balance in his checking account falls below a certain level. The application interacts with a notification service, such as a Web service, to obtain specific notifications. When the application receives a notification, it calls the customer and presents the information. The application may enable the customer to answer a small number of questions in order to receive additional information. |
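The Alert Notification function in the table above can be sketched in a few lines. The Python code below is only an illustration of the balance example and is not part of the SASDK. The BalanceAlertRule class, the build_alert_prompt function, and the sample phone number are hypothetical, and the sketch assumes that a separate notification service supplies the current balance and that a telephony component places the outbound call.

```python
from dataclasses import dataclass


@dataclass
class BalanceAlertRule:
    """A customer's request to be alerted when a balance drops too low."""
    customer_phone: str   # number the application would call (hypothetical)
    threshold: float      # alert when the balance falls below this amount


def build_alert_prompt(rule, current_balance):
    """Return the text to speak on an outbound call, or None if no alert is due."""
    if current_balance >= rule.threshold:
        return None
    return (
        f"Your checking account balance of {current_balance:.2f} dollars "
        f"has fallen below {rule.threshold:.2f} dollars."
    )


if __name__ == "__main__":
    rule = BalanceAlertRule(customer_phone="555-0100", threshold=500.0)
    print(build_alert_prompt(rule, 432.10))  # an alert prompt
    print(build_alert_prompt(rule, 900.0))   # None: no call is needed
```

Keeping the rule check separate from the call itself means the same check could feed a telephone prompt or an on-screen message.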
Many businesses have call centers with agents answering calls from customers who want to speak to a sales representative, a customer service representative, or a customer support representative. Running a call center can be very expensive. The Speech Platform enables businesses to automate many sales, service, and support calls, significantly reducing operating costs and increasing the amount of time that agents can devote to higher-value work. For example, call center agents at a large bank may spend a significant amount of time during their day answering questions regarding account status. An application based on the Speech Platform can handle callers' account balance inquiries quickly and naturally, and pass the caller to a human agent only if necessary.
Other businesses may want to extend the functionality available on their Web site so that customers can:
- Use cell phones
- Buy goods
- Check order status
- Receive notifications
An application based on the Speech Platform can enable businesses to implement this kind of extended functionality with a natural speech user interface and avoid the expense of running a 24-hour call center. For example, by extending functionality in this way, an airline company can telephone members of its executive club to notify them of a delay to a flight on which they have reservations that day.
Business-to-Employee Telephony or Multimodal Applications
Businesses with mobile work forces can greatly increase productivity by enabling their employees to access and enter crucial information when they are away from the office, using only a telephone or Pocket PC.
Mobile Intranet Access
Using a telephone connection, an intranet application can prompt an employee to answer questions that identify needed information, and then speak the requested information to the employee. For example, a sales person might want information about the most recent orders placed by a customer. Using an application built with the SASDK, the sales person can call from a mobile phone to a corporate portal number and receive information about that customer.
Using a multimodal device, employees can fill out multiple fields in on-screen forms by speaking a complete sentence, and then watch as the application populates all of the form fields with the correct data. For example, a service engineer might want to receive process information on a computer that is under repair. Using a wireless-connected Pocket PC, the engineer can browse to an intranet site and speak a failure diagnosis, the computer's network name, and its model number in a single utterance. An application built with the SASDK recognizes the speech, determines the meaning of what the engineer said, and then displays a set of service instructions that are appropriate for the specific computer and failure diagnosis.
Unified Communications/Messaging
Using a telephone connection, an employee can call an enterprise data center to manage voice mail, e-mail, and calendar appointments. For example, an insurance agent who is traveling might want to check his voice mail, and use the information in the voice mail to set a customer appointment. Imagine that a customer calls her insurance agent, and leaves a voice message telling the agent when she can meet with him. While the agent is out of his office, he can call his company's data center, hear the message from the customer, and then tell the application to enter an appointment into his calendar for the day and time that the customer requested. The agent can even tell the application to send an e-mail to his customer, confirming the date and time for the appointment.
Rich Client Multimodal Applications
Web Kiosk
An organization can place kiosks equipped with a headset and microphone throughout a location that it serves. Organization members, customers, clients, or visitors can use the kiosks to retrieve information or transact orders. For example, a university might place kiosks throughout the student union lobby. Students can use the kiosks to view course descriptions, view their schedules and grades, enroll in classes, or make tuition payments.