Speech-Enabled Fitch and Mather Stocks: Design and Implementation
Vertigo Software, Inc.
Updated August 2004
Applies to:
Microsoft Visual Studio .NET
Microsoft Speech Application SDK
Summary: This article describes the design and architectural decisions for the Speech-Enabled Fitch and Mather Stocks sample application. The document also includes a detailed review and explanation of the code. (36 printed pages)
Download the Speech-enabled Fitch and Mather Stocks sample
Contents
Purpose
Using the Web-Based Application as a Development Blueprint
Designing the Application
Prompt Databases
Lessons Learned
For More Information
Purpose
What Is the Speech-Enabled Fitch and Mather Stocks Application?
The Speech-Enabled Fitch and Mather Stocks application (referred to as "FMStocksVoice" for the rest of this article) is a voice-only version of the "Fitch and Mather Stocks" Web application for a fictitious online stock-trading company. The application is built using the Microsoft Speech Application SDK.
Goals:
- Show how to create a voice-only service from an existing Web application: FMStocksVoice leverages the existing business and data layers of the Fitch and Mather sample it is based on, making only minor modifications. To this end, the Web-based version is included with this sample to illustrate how the two presentation layers operate simultaneously on the same data. It is possible to place a trade in the voice-only application and see the account change immediately reflected in the Web version.
- Demonstrate best-practice programming and design techniques for using the Speech Application SDK: The Speech Application SDK provides a rich base of tools for developing speech applications. These tools also allow the programmer a great deal of flexibility in making design decisions. The developers of this sample have made it a priority to show a consistent set of best-practices for developing voice-only applications.
Key features:
- Buying a stock.
- Selling a stock from the user's portfolio.
- Getting current stock quotes.
- Browsing through stocks in "Buy Stock," "Sell Stock," and "Quotes" using a voice-based stock browser.
- Viewing current portfolio.
- Account login security using Windows Authentication.
- Leveraging pre-existing business-layer and data-layer code.
This article discusses the FMStocksVoice application in-depth, and provides insight from the perspective of the creators on the process of building voice-only applications in general. It includes lessons learned from the testing, design, and development stages, as well as thoughts about the differences between building visual applications for the Web and speech applications for telephony.
Using the Web-Based Application as a Development Blueprint
FMStocksVoice shares its business and data layers with the Web-based sample. More information on that application, including downloads and documentation, can be found at the Microsoft.com Downloads Center (search on FMStocks 7.0):
Fitch and Mather 7.0 is a fictitious Web-based stock brokerage application. The Web-based version showcases best-practice techniques for the design, implementation, and deployment of a robust enterprise application with excellent performance, scalability, and interoperability.
Code-Reuse
A voice-only version of an existing application is, in essence, a new presentation layer. The user interface is auditory rather than graphical, which means that the business logic and the data layer should remain essentially unchanged.
With a few exceptions, we have followed this concept as a development guideline. In the sample Microsoft Visual Studio .NET solution, note that the FMStocksVoice project file includes a reference to the Components folder in FMStocksWeb (the included Web-based version). Since the two applications share this same code, trades that occur in one interface are immediately reflected in the other.
Figure 1. Code reuse
Suggested Enhancements
While the FMStocksVoice sample application provides the implementation for the core features originally implemented in the Fitch and Mather Stocks sample application, the following are ideas for extending the functionality of the FMStocksVoice application:
- Support New User Registration: In the current FMStocksVoice implementation, the user must already have an account in order to use the voice-only application. Extend the FMStocksVoice application to allow the user to create an account. The challenge here would be how to handle the entry of the user's personal information (for example, name, username, password, PIN, and so on). One solution would be to limit the amount of information required to set up an account.
Designing the Application
The FMStocksVoice system was designed to target more advanced users, who use this system frequently and are familiar with the options and navigation. With this in mind, the system was designed to include advanced features to enable the typical user to navigate the system quickly, yet include enough help and explanation such that the user would not get lost.
Target User and Voice Personality
For the personality of the speaking voice, we had two goals:
- Speed: The recorded sentences should be spoken at a moderate-to-quick pace (since most of the time the user knows what will be said) yet not so quickly that the recordings are mumbled or too difficult to follow.
- Mood: The system's voice should be businesslike and factual. It should avoid a high pitch or an overly cheerful intonation, so that the system sounds professional and credible.
Navigation Design
Designing a voice-only system is much different from designing traditional GUI-based applications. Whereas a Web page is a two-dimensional interface, the voice medium is one-dimensional. For example, a table of data on a Web page needs to be read item by item over the phone. As one designer put it, the challenge becomes, "How do you chop things up to establish a coherent flow of information? How do you express content in a way that the user can digest, understand, and then act upon?"
Start with a User-Centered Design Approach
We started our design process by following our standard methodology of user-centered design. The 80/20 rule is a good guide: 80% of the users use 20% of the application. We focused on ideal scenarios and common user paths rather than considering exceptional cases in the preliminary stages. We acted out sample dialogues that helped us get a better sense of how a typical conversation might go.
From these sample dialogues, we began creating flow charts for each major component of the system. The following diagram illustrates the high level flow of the application:
Figure 2. Navigation flow diagram (click thumbnail for larger image)
In addition to the flow diagram above, several global commands are available to the user throughout the application:
- Main Menu: Returns the user to the main menu.
- Help: Provides the user with context-sensitive help text at any prompt.
- Instructions: Provides instructions on the basic usage of the system and the global commands available at any point.
- Repeat: Repeats the most relevant last prompt. If the last prompt informed the user that his/her input was invalid, the repeat text will provide the user with the previous question prompt instead of repeating the error message.
- Representative: Transfers the user to a customer service representative.
- Goodbye: Ends the call.
In order to buy stock, sell stock, or get a quote on a stock, the user must first choose a company from a large company list. To do this, they say a company name, such as "Microsoft Corporation," or a partial ticker symbol, such as "M." If there is more than one match, the user enters a speech navigation control and selects the company they want. In each of the three pages mentioned, we use the DataTableNavigator application control.
Prompt Design
The design team found the creation of a prompt specification document to be a challenge in itself. The number of paths available to the user at any one prompt leads to a complicated flow-chart diagram that, while technically accurate, loses a sense of the conversation flow that the designers had worked to achieve. The design team arrived at a compromise specification that allowed them to illustrate an ideal scenario while also handling exceptions. The following example illustrates the beginning of the "Buy Stock" scenario from the main menu:
Table 1. Prompt: Main Menu
Expected User Input | "Buy Stock" |
---|---|
Recognition | System Response |
Recognized Expected Input | Please say the name of the company or spell the ticker symbol of the company that you are interested in. You may also say Main Menu, Help, or Representative at any time. |
Recognized Alternate Input: "Help" | To help me direct your call, please say one of the following: Quotes, Buy Stock, Sell Stock, or Check Portfolio. |
Table 2. Prompt: Buy Stock
Expected User Input | "Microsoft Corporation" |
---|---|
Recognition | System Response |
Recognized Expected Input | I understood "Microsoft Corporation." Is that correct? |
Recognized Alternate Input: "Help" | Please say the name of the company or spell the ticker symbol, leaving clear pauses between letters. You may say Main Menu if you wish to cancel this transaction. |
This format of specifying functionality makes it very easy to conduct "Wizard-of-Oz" style testing. In this scenario, the test subject calls a tester who has the functional documents in front of him/her. The tester acts as the system, prompting the test subjects as the system would and responding to their input likewise. Trouble spots are easily identified and fixed using this style of testing.
How It Works
The following section is devoted to the architecture of the system. We start with an explanation of common user controls and common script files. Then we will go into detail on the Buy Stock feature, which provides a good encapsulation of many of the programming techniques used throughout the application. Finally, we'll review some of the coding conventions and practices we used as best-practice techniques for development.
Common Files: User Controls
Two ASP.NET user controls are included on almost every page in our application. Together they encapsulate much of the functionality of the site, and each deserves discussion. Implementing user controls, whether in a regular ASP.NET application or while using the ASP.NET Speech controls, can provide a consistent user experience while saving a great deal of code.
GlobalSpeechElements.ascx
The GlobalSpeechElements user control is used on every page of the application except for Goodbye.aspx and RepresentativeXfer.aspx, which do little more than read a prompt and transfer the user away. It contains the main speech settings control that defines common properties for the speech controls used throughout the application, as well as global command controls and references to the common script files that provide client-side functional components.
MainSpeechSettings: The Speech Application SDK settings control is a powerful way of defining global application settings and assigning globally scoped functionality. In FMStocksVoice we have five different settings items:
BaseCommandSettings: This style is applied to all command controls. Its one attribute sets the AcceptCommandThreshold at .6, meaning that any command must be recognized with at least a 60 percent confidence rating to be accepted.
<speech:SpeechControlSettingsItem ID="BaseCommandSettings"> <Command AcceptCommandThreshold="0.6"> <Grammar Lang="en-us"></Grammar> </Command> </speech:SpeechControlSettingsItem>
GlobalCommandSettings: This settings item is applied only to the six global commands contained in GlobalSpeechElements. It inherits the attributes of BaseCommandSettings and adds a dynamically set scope attribute. We want global commands to apply to all controls on any page they are included in, so we set the scope to the parent page's ID at runtime.
<speech:SpeechControlSettingsItem Settings="BaseCommandSettings" ID="GlobalCommandSettings">
  <Command Scope='<%# GetParentPageID() %>'></Command>
</speech:SpeechControlSettingsItem>
BaseQASettings: This settings item is applied to all QA controls that accept user input (QA controls that do not accept user input are called statements and use StatementQASettings, below). In addition to setting timeout and confidence thresholds, it also defines the OnClientActive event handler for all QA controls. HandleNoRecoAndSilence is a JScript event handler that monitors a user's unsuccessful attempts to say a valid response and transfers the user to customer service after enough unsuccessful events. It is described in the Common Files: Client-Side Scripting section below.
<speech:SpeechControlSettingsItem ID="BaseQASettings"> <QA OnClientActive="HandleNoRecoAndSilence"> <Reco InitialTimeout="5000"></Reco> <Answers Reject="0.2"></Answers> </QA> </speech:SpeechControlSettingsItem>
StatementQASettings: For QA controls that do not accept user input, we want to disable BargeIn (the ability to interrupt a prompt with a response before it ends) and turn on PlayOnce, which ensures the prompt is not repeated. Normal QA controls are activated when their semantic item is empty; since statement QA controls have no semantic item, the control would be played over and over again if PlayOnce were turned off.
<speech:SpeechControlSettingsItem ID="StatementQASettings"> <QA PlayOnce="True"> <Prompt BargeIn="False"></Prompt> </QA> </speech:SpeechControlSettingsItem>
NavStatementQASettings: DataTableNavigator controls within FMStocksVoice are preceded by an initial statement QA that gives a brief introduction to the DataTableNavigator content. Since the initial statement accepts no input, we immediately activate the DataTableNavigator after it completes. To do this, we set two timeouts: EndSilence indicates that the QA should wait only 100 milliseconds for a response, and BabbleTimeout stops recognition of any user input after 5 seconds.
<speech:SpeechControlSettingsItem ID="NavStatementQASettings"> <QA XpathDenyConfirms="" XpathAcceptConfirms="" AllowCommands="False"> <Reco EndSilence="100" BabbleTimeout="5000"></Reco> </QA> </speech:SpeechControlSettingsItem>
Global Commands: The global commands in GlobalSpeechElements (described in the Navigation Design section) each have a command grammar file associated with them that defines how the command is activated.
Figure 3. Global commands
Commands fall into two categories: those that affect the current prompt (HelpCmd, InstructionsCmd, RepeatCmd), and those that trigger an event (RepresentativeCmd, GoodbyeCmd, MainMenuCmd). For the former, the prompt function looks for a particular Type value in the History array parameter and generates an appropriate prompt. For the latter, the command's associated OnClientCommand event handler is executed.
<speech:command id="RepresentativeCmd" xpathtrigger="/SML/RepresentativeCmd" Settings="GlobalCommandSettings" type="Representative" runat="server" onclientcommand="OnRepresentativeCmd"> <Prompt ID="RepresentativeCmd_Prompt"></Prompt> <Grammar Src="Grammars/GlobalCommands.grxml" ID="RepresentativeCmd_Grammar1"> </Grammar> <DtmfGrammar ID="RepresentativeCmd_DtmfGrammar"></DtmfGrammar> </speech:command>
Common Script File Includes: GlobalSpeechElements is an ideal place to include references to all global script files. These files constitute all global client-side event handlers and prompt generation/formatting routines for the application. Since they are included in the control, individual pages can rely on their availability without explicitly including them.
<script type="text/jscript" src="routines.js"></script> <script type="text/jscript" src="speech.js"></script> <script type="text/jscript" src="debug.js"></script> <script type="text/jscript" src="PromptGenerator.js"></script> <script type="text/jscript" src="FMStocks7V.js"></script>
Use of DataTableNavigator Controls
We use the DataTableNavigator control to provide a dynamically generated list of items from which the user may browse or select. The DataTableNavigator application control provides most of the functionality we need automatically, including data binding, preset grammars, and item selection.
- InitialStatement: An initial statement QA usually precedes the DataTableNavigator in the call flow. We use it to introduce the list. Our original intent was to include this statement as part of the DataTableNavigator's first item prompt. We found that if a user mumbled during the initial statement, barge-in would stop playback and the first item was skipped. We separated the initial statement from the DataTableNavigator control to ensure that, even if the user mumbles during the initial statement, they will still hear the first item after the system recovers.
- DataTableNavigator: The DataTableNavigator control takes care of the tasks associated with reading and navigating through the list of items associated with the control. It also handles item selection, either by saying, "Select," or in some cases by saying the name of the item itself.
Selection is enabled by setting the DataTableNavigator's AccessMode property to Fetch. In this mode the user can select an item; when one is selected, the DataTableNavigator's semantic item is filled with information about the selected item. In contrast, Select mode is used to jump directly to an item in the list without leaving the DataTableNavigator control.
In FMStocksVoice, data binding is configured in the property builder for the DataTableNavigator control. A typed DataSet, pre-configured at design time with the data table structure, is bound to the DataTableNavigator. At run time, data from the database is merged into the typed DataSet before the control is bound:
dt.TableName= tickerDataSet1.Tables[0].TableName;
tickerDataSet1.Merge(dt);
tickerDataSet1.AcceptChanges();
CompanyOrTickerNav.DataBind();
CompanyOrTickerNav.Visible= true;
The following properties are configured for each DataTableNavigator control:
- DataSource: Stores a DataTable object that is bound to the control when we call DataBind().
- DataBindField: When the user selects an item, the DataTableNavigator's semantic item is populated with the value of this field for the selected item.
- DataTextField: If the user can say the name of an item to select it (rather than saying, "Select") DataTextField specifies which field provides this name.
- DataHeaderFields: Specifies which fields will be used to read the item in the list. Since we use prompt functions, this field is not used explicitly.
- DataContentFields: Specifies which fields will be used to read details of an item in the list. Since we use prompt functions, this field is not used explicitly.
After the DataTableNavigator completes, we determine the reason for completion in the OnClientComplete event handler. Typically, the event handler looks like this:
if(siCategory.value == "Exit")
{
// The user said "Cancel."
}
else if(siCategory.value != "")
{
// Get the value of DataBindField for the selected semantic item
var selectedItemValue = siCategory.value;
// Access other fields for the selected item by retrieving
// the selected item index (set upon item selection by the control)
var selectedItemIndex = parseInt(siCategory.attributes["index"]);
}
Common Files: Client-Side Scripting
The globally scoped client-side script files for the application are:
- Speech.js: NoReco/Silence event handler and object accessors
- Routines.js: String-formatting routines
- Debug.js: Client-side debugging utilities
- FMStocks7V.js: Global Navigation Event Handlers
- PromptGenerator.js: Prompt Generation Utility
A few of the more interesting functions of these scripts are outlined below.
HandleNoRecoAndSilence (Speech.js)
HandleNoRecoAndSilence takes care of handling cases where the user repeatedly responds to a prompt with silence or with an unrecognizable input. To avoid frustration, we don't want to repeat the same prompt over and over again. This function, executed each time a QA is made active, checks the command history for consecutive invalid inputs. If the number of invalid inputs exceeds a maximum (in this application, 3), we redirect the user to a Customer Service Representative.
This function is defined as the OnClientActive event handler in the BaseQASettings item of the GlobalSpeechElements control's MainSpeechSettings. Each QA that accepts user input must use these settings in order for the function to be called correctly.
function HandleNoRecoAndSilence()
{
    var History = RunSpeech.ActiveQA.History;
    if (History.length >= representativeXferCount)
    {
        var command;
        // Walk back through the most recent history entries.
        for (var i=1; i <= representativeXferCount; i++)
        {
            command = GetHistoryItem(History,i);
            if (command != "Silence" && command != "NoReco")
                break;
        }
        // If every entry checked was invalid input, transfer the caller.
        if (i == representativeXferCount+1)
            Goto(representativeXferPage,"");
    }
}
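GetHistoryItem, used above and throughout the client-side scripts, is one of the accessors defined in Speech.js. A plausible minimal implementation looks like the following (a sketch; the actual accessor may differ):

function GetHistoryItem(History, n)
{
    // Return the n-th most recent history entry ("NoReco", "Silence",
    // a command type, and so on), or "" when the history is shorter than n.
    if (History == null || History.length < n)
        return "";
    return History[History.length - n];
}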
DataTableNavigator Functions (Speech.js)
Speech.js contains the following functions to make working with the DataTableNavigator application control easier:
- GetNavigator(navigatorName): Returns a DataTableNavigator object reference given its name as a string.
- GetNavigatorCount(navigatorName): Returns the count of items in the given DataTableNavigator.
- GetNavigatorData(navigatorName, columnName, index): Returns the data in the DataTableNavigator named navigatorName at the row specified by index and the column specified by columnName.
- GetNavigatorDataAtIndex(navigatorName, columnName): Returns the data in the DataTableNavigator named navigatorName at the currently selected row and the column specified by columnName.
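For example, a client-side handler can read column values like this (using the CompanyOrTickerNav control and column names from the Buy Stock page described later):

// Number of companies currently loaded into the navigator.
var count = GetNavigatorCount("CompanyOrTickerNav");
// The "Company" column of the third row (index 2).
var name = GetNavigatorData("CompanyOrTickerNav", "Company", 2);
// The "Ticker" column of the currently selected row.
var ticker = GetNavigatorDataAtIndex("CompanyOrTickerNav", "Ticker");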
Prompt Generation (PromptGenerator.js)
Prompt generation is perhaps the most central element in creating a successful voice-only application: providing a consistent voice interface is essential to the user experience. PromptGenerator.js does just this by encapsulating all common prompt-generation functionality in one place.
A prompt function in a typical page always returns the result of a call to PromptGenerator.Generate() as its prompt:
return PromptGenerator.Generate(
RunSpeech.ActiveQA.History,
"Prompt Text Here",
"Help Text Here"
);
Notice that the prompt function passes both its main prompt and its help prompt into the function every time. PromptGenerator.Generate() decides the appropriate prompt to play given the current command history:
PromptGenerator.Generate = function(History, text, help)
{
help += " You can always say Instructions for more options."
var prevCommand = GetHistoryItem(History,2);
switch( GetHistoryItem(History,1) )
{
case "NoReco":
if (prevCommand == "Silence" || prevCommand == "NoReco")
return "Sorry, I still don't understand you. " + help;
else
return "Sorry, I am having trouble understanding you. " +
"If you need help, say help. " + text;
case "Silence":
if (prevCommand == "Silence" || prevCommand == "NoReco")
return "Sorry, I still don't hear you. " + help;
else
return "Sorry, I am having trouble hearing you. " +
"If you need help, say help. " + text;
case "Help":
PromptGenerator.RepeatPrompt = help;
return help;
case "Instructions":
var instructionsPrompt = "Okay, here are a few instructions...";
PromptGenerator.RepeatPrompt = instructionsPrompt + text;
return instructionsPrompt;
case "Repeat":
return "I repeat: " + PromptGenerator.RepeatPrompt;
default:
PromptGenerator.RepeatPrompt = text;
return text;
}
}
Note Some of the longer strings have been shortened in the above code sample to save space.
A note on "Repeat": The PromptGenerator.RepeatPrompt variable stores the current text that will be read if the user says "Repeat." The first time the function is executed for any prompt, the RepeatPrompt will be set to the standard text. The RepeatPrompt is then only reset when the user says "Help" or "Instructions."
Other PromptGenerator functions: PromptGenerator also includes a few other functions for generating prompts in the application. They include:
- GenerateNavigator(History, text, help): This function adds to the functionality of Generate() by including standard prompts commonly needed while in a DataTableNavigator control. These prompts include additional help text and messages for when the user tries to navigate beyond the boundaries of the DataTableNavigator.
- ConvertNumberToWords(number, isMoney): In order to generate recorded prompts for all possible number values, we must convert numbers (for example, 123,456) to a readable string (for example, "one hundred twenty three thousand four hundred fifty six"). This reduces the number of unique words that must be recorded to a manageable amount. A sketch of such a routine follows.
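The following is a minimal sketch of such a routine for whole numbers up to the millions (the actual PromptGenerator.js implementation may differ in coverage and wording):

function ConvertNumberToWords(number, isMoney)
{
    var ones = ["zero","one","two","three","four","five","six","seven",
        "eight","nine","ten","eleven","twelve","thirteen","fourteen",
        "fifteen","sixteen","seventeen","eighteen","nineteen"];
    var tens = ["","","twenty","thirty","forty","fifty","sixty",
        "seventy","eighty","ninety"];

    // Convert a value from 1 to 999 into words.
    function threeDigits(n)
    {
        var words = "";
        if (n >= 100)
        {
            words = ones[Math.floor(n / 100)] + " hundred";
            n = n % 100;
            if (n > 0) words += " ";
        }
        if (n >= 20)
        {
            words += tens[Math.floor(n / 10)];
            if (n % 10 > 0) words += " " + ones[n % 10];
        }
        else if (n > 0)
            words += ones[n];
        return words;
    }

    var parts = [];
    if (number == 0)
        parts.push("zero");
    if (number >= 1000000)
    {
        parts.push(threeDigits(Math.floor(number / 1000000)) + " million");
        number = number % 1000000;
    }
    if (number >= 1000)
    {
        parts.push(threeDigits(Math.floor(number / 1000)) + " thousand");
        number = number % 1000;
    }
    if (number > 0)
        parts.push(threeDigits(number));

    var result = parts.join(" ");
    if (isMoney)
        result += (result == "one") ? " dollar" : " dollars";
    return result;
}

For example, ConvertNumberToWords(123456, false) returns "one hundred twenty three thousand four hundred fifty six."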
Designing Your Grammar
Items in your grammar files define what words and phrases are recognized. When the speech engine matches an item from the grammar file, it returns SML (Semantic Markup Language), which your application uses to extract definitive values from the text that the user spoke. Too strict a grammar gives the user no flexibility in what they can say; however, too many unnecessary grammar items can lower speech recognition accuracy.
Preambles and Postambles
Very often, you will want to allow a generic "preamble" (text said before the main item) and "postamble" (text said after the main item). For instance, if the main command is "Buy Stock," you would want to allow the user to say "May I buy stock, please."
Typically, you can use one grammar (.grxml) file for your preambles and one for your postambles. Within your other grammar rules, you can then reference the preambles and postambles by using rule references (RuleRefs).
Tip Make the preambles and postambles generic and robust enough that you don't limit your users' experience, but keep them reasonable in size so that you don't risk lowering the speech recognition accuracy for your main elements.
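As a minimal SRGS-style sketch, a main rule can reference shared preamble and postamble rules as optional items, like this (the file and rule names here are hypothetical, and the semantic value assignment, added through the Grammar Editor's Assignments window, is omitted):

<grammar version="1.0" xml:lang="en-US" root="BuyStock"
    xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="BuyStock" scope="public">
    <!-- Optional preamble: "may I," "I would like to," and so on. -->
    <item repeat="0-1"><ruleref uri="Preambles.grxml#Preamble" /></item>
    <item>buy stock</item>
    <!-- Optional postamble: "please," "thank you," and so on. -->
    <item repeat="0-1"><ruleref uri="Postambles.grxml#Postamble" /></item>
  </rule>
</grammar>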
Static Grammar
Use the Grammar Editor tool to graphically set up grammar files. The basic task is to set up a text phrase or a list of phrases, and then assign a value that you want your application to use when each phrase is recognized.
Figure 4. Grammar editor tool
We found that the following strategies helped us in grammar development:
Typically, if we only need to recognize that a text phrase has been matched, especially in the case of commands, we create a semantic tag that adds a sub-property with the empty string value. For example, if you want to capture when the user says "Help," you can simply return the following SML:
<SML confidence="1.000" text="help" utteranceConfidence="1.000"> <HelpCmd></HelpCmd> </SML>
The control associated with this grammar file recognizes the phrase, and returns the SML element HelpCmd; the code-behind or client-side script makes a decision based on the SML element being returned, rather than the value.
Never match on the root node /SML alone. Since every matched grammar returns this node as its root, your semantic item would be matched in every case.
Use rule references within grammar files to avoid duplicating the same rule across different speech controls.
Tip You must make sure that a rule to be referenced is a public rule, which you can set through the properties pane.
Figure 5. Using rule references within a grammar
A common grammars file is included with the Speech Application SDK, both in an XML file version (cmnrules.grxml) and in a smaller, faster compiled version (cmnrules.cfg). We copied the compiled version into our project and used it for commonly used grammar elements, such as digits and letters in the alphabet.
Creating Grammar Files Programmatically
Because grammar files are simply XML files, it is possible to create grammars programmatically. This was especially helpful when creating the grammar for the traded companies: not only were there a large number of companies, but there also needed to be at least two grammar phrases for each company. For instance, if the company in question is "Microsoft Corporation," we want the grammar to recognize both "Microsoft" and "Microsoft Corporation."
We created two Web pages to be used as tools to dynamically create company grammar from the database, and also as a way to show how this can be done.
CreateCompanyGrammar.aspx: This is the main Web page to create the company grammar, and it resides in the Tools folder. It consists mainly of a button and a text area. When you run the page and press the button, you should see a printout of either the converted XML, for debugging purposes, or an error message if there was a problem. The XML is automatically saved into the grammar file Companies.grxml, so there is no need to copy and paste the XML.
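A minimal sketch of the idea follows (this is not the actual code-behind; the companies DataTable, the output path, and the omitted semantic value assignments are illustrative):

// Build the company grammar XML from a DataTable of company names and
// save it as Companies.grxml. The real tool also adds short-form
// phrases (for example, "Microsoft" as well as "Microsoft Corporation")
// and the semantic value assignment for each item.
System.Text.StringBuilder sb = new System.Text.StringBuilder();
sb.Append("<grammar version=\"1.0\" xml:lang=\"en-US\" root=\"Companies\"");
sb.Append(" xmlns=\"http://www.w3.org/2001/06/grammar\">");
sb.Append("<rule id=\"Companies\" scope=\"public\"><one-of>");
foreach (System.Data.DataRow row in companies.Rows)
{
    // Normalize markup characters such as '&' (see Markup, below).
    string name = row["Company"].ToString().Replace("&", "amp");
    sb.Append("<item>" + name.ToLower() + "</item>");
}
sb.Append("</one-of></rule></grammar>");
using (System.IO.StreamWriter sw =
    new System.IO.StreamWriter(Server.MapPath("Grammars/Companies.grxml")))
{
    sw.Write(sb.ToString());
}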
Database Stored Procedures: There are two stored procedures and one user-defined function installed from the database scripts that relate to dynamic grammar creation. Each has to do with string manipulation and/or loading the companies from the database, in order to most efficiently create the grammar.
Markup: Markup characters like '&' (ampersand), while common in company names, cannot be used literally within XML strings or within the grammar and prompt tools. Several string replacement functions are performed to normalize these company names for use in the grammar files.
The most common example is the ampersand. We replace the ampersand with the string 'amp' in the code-behind so that grammar entries and prompt recordings match. Our transcriptions in the prompt database also read 'amp,' again to match what is sent in by the prompt functions. However, when we record the company name, we say 'and,' not 'amp.'
Figure 6. Normalizing ampersands in the editing tools
Special Semantic Cases: In some rare cases, the speech recognition engine cannot match a company name with its correct pronunciation. We then have to manually add an extra grammar phrase in order to correctly recognize that company. For instance, the speech engine cannot understand 'Novo-Nordisk,' but will match correctly to 'No vo nor disk'. We enter a grammar element with the text 'no vo nor disk,' with a corresponding value of 'Novo-Nordisk'.
Coding Conventions
Server-Side Programming
Unlike traditional ASP.NET programming, the Speech Application SDK is primarily a client-side programming platform. Although its controls are instantiated and their properties manipulated on the server-side, controlling flow from one control to another is primarily a client-side task.
The controls offer opportunities to post back to the server automatically, including the SemanticItem's AutoPostBack property and an automatic postback when all QAs on a page are satisfied. As a convention, though, we chose to avoid postbacks except when we needed to access data-layer or business-layer functions. Most of our code is written in client-side event handlers, using SpeechCommon.Submit() to post back explicitly when data is needed from the server.
Client-Side Scripts
Because JScript lacks many of the scoping restrictions found in C# or Visual Basic .NET, client-side code can perform a given task in many different places. The SpeechCommon object is accessible from any client-side script, and its Submit() method can be executed from event handlers, prompt functions, or any helper routine. For this and other reasons, we have followed a set of guidelines for the usage of these various components:
Prompt Functions Are Only for Generating Prompts: Never perform an action inside a prompt function that is not directly related to generating and formatting a prompt: no navigation flow, no semantic item manipulation, and so on. Besides being good practice, the other key reason for this rule is the prompt validation tool. If prompt functions contain calls to SpeechCommon or other in-memory objects, those objects must be declared and their references included in the prompt function's "Validation References"; if these references are not included, validation will fail for the function. As a rule, the only functions referenced by prompt functions are in PromptGenerator.js.
One exception to this rule was necessary. DataTableNavigator application controls do not expose an event equivalent to OnClientActive, or one that fires each time a prompt function is about to be executed. For QA controls, we use OnClientActive to call HandleNoRecoAndSilence(), which monitors consecutive invalid inputs for a QA. We expect future versions of the SDK to expose this type of event in the DataTableNavigator control; until then, we call HandleNoRecoAndSilence() from PromptGenerator.GenerateNavigator(), as sketched below.
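A sketch of how GenerateNavigator can layer this workaround (along with the navigator-specific help text) on top of Generate(); the help wording here is illustrative rather than the sample's actual text:

PromptGenerator.GenerateNavigator = function(History, text, help)
{
    // Workaround: DataTableNavigator exposes no OnClientActive event,
    // so we monitor consecutive NoReco/Silence events here instead.
    HandleNoRecoAndSilence();
    // Add the standard navigator help before delegating to Generate().
    help += " You can say Next, Previous, or Select to move through the list.";
    return PromptGenerator.Generate(History, text, help);
};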
No Inline Prompts: Inline prompts are simple to configure, but they should only be used when the prompt associated with the control is static and will never change. Since most prompts in FMStocksVoice use PromptGenerator for error handling, we avoid inline prompts except where this functionality isn't needed (Goodbye.aspx is one example).
Control of Flow Is Handled in Event Handlers: Flow control is the most important function of event handlers and client activation functions. Most applications of any complexity require more complicated flow control than the standard question-and-answer format afforded by laying QA controls down in sequence on a page. For the most part, we achieved this control by manipulating semantic state within event handlers.
Naming Conventions
We used the following naming conventions for consistency throughout our application:
- QA Controls: The QA Control can be used for a variety of purposes. We distinguish these purposes by their functions; traditional question-and-answer controls fill a semantic item with the result of user input, confirmations confirm a pre-filled semantic item, and statements are output-only; they do not accept user input.
- Question-And-Answer: <Name>QA (in this case, CompanyOrTickerQA)
- Confirm: <Name>Confirm (in this case, NumberOfSharesConfirm)
- Statement: <Name>Statement (in this case, CompanyOrTickerNavStatement)
- DataTableNavigator Controls: <Name>Nav (in this case, CompanyOrTickerNav)
- Commands: <Name>Cmd (in this case, HelpCmd)
- Semantic Items: si<Name> (in this case, siTicker)
JScript client-side code and C# server-side code use the naming conventions standard in those environments.
In-Depth: Buy Stock Feature, Company Selection
We will tie together many of the features discussed in the "How It Works" section by examining the Company selection process in the Buy Stock feature of FMStocksVoice.
Overview
The Buy Stock feature allows a user to choose a company by speaking either a company name or ticker symbol, and to buy shares of stock from that company using the money in their account.
We take the entry that the user speaks and query for matches from the database. For example, if the user says "United," we would return "United Technologies" and "United Television" as choices for the user; if they say "M," we would return ten companies whose ticker symbol begins with "M."
If the database returns more than one match, we activate the DataTableNavigator control, to allow the user to browse through the companies and choose which one they want. Once the user makes a selection, we make sure they have enough money in their account to buy at least one share in the company, and then we confirm the choice with the user.
Speech Controls
We begin our page by determining which Speech elements to use, including controls and Semantic Items. Semantic items are the key to holding the answers that our users speak, and to controlling the flow of the page.
Tip Because we want users to be able to return to a previously asked QA during the Buy Stock process, we focus on manipulating the "semantic state" of the semantic items to determine which QA should be activated next. This is referred to as the page's "flow."
Generally, one semantic item corresponds to one user's "answer." Each QA expecting an answer is assigned a semantic item in which the value of the user's answer will be stored.
Figure 7. Functional flow: Company Selection Process
Step 1. Prompt functions and grammar
The first step in the Buy Stock process is to tell the user what we want. We need to set up some text to be read to the user, so they know what question they are expected to answer. We do this by entering our text and our logic into a client-side "prompt function."
Each prompt function is responsible for determining what is said to a user based on certain criteria. Specifically, we want to read something different if the user says "Help," if the user just entered the page for the first time, or if the user has returned to this QA from a later question.
For BuyStock.aspx, we have one prompt function file, named BuyStock.pf, that holds the prompt functions for all of the speech controls on the page. Our first function, CompanyOrTickerQA_prompt_inner, asks the user which company they would like.
function CompanyOrTickerQA_prompt_inner(History, missingEntry,
userCanceled, zeroMaxShares, company)
{
var help= "Please say the name of the company or spell the ticker " +
"symbol, leaving clear pauses between letters...";
var text= "Please say the name of the company or spell the ticker...";
if(userCanceled)
text= "You canceled. " + text;
else if(missingEntry != null)
{
text= "I did not find any matches for the entry " + missingEntry
+ ". Please make a new entry to search again.";
}
else if(zeroMaxShares)
{
text= "I understood " + company + " . You do not have sufficient "
+ "funds in your account to buy shares in this company...";
}
else
text= "To buy stock, " + text;
return PromptGenerator.Generate(History, text, help);
}
Note Some of the longer strings have been shortened in the above code sample to save space.
Since it is possible for the user to come back to this QA later, we have added some parameters to this prompt function so that we know which text to use to prompt the user (see Figure 7, "Functional flow"):
- missingEntry: Holds either null or a Ticker entry which the user entered, but which was not found (for example, "ZZZZ").
- userCanceled: Either true or false, indicating if the user is returning to this prompt after canceling out of the DataTableNavigator.
- zeroMaxShares: Either true or false. If the user picks a company, and the code-behind determines that the user does not have enough money to buy even one share, we send the user back to this first QA, indicate that they cannot buy any shares, and ask them to enter another company name.
- company: Holds a company name. Used with the zeroMaxShares parameter, to tell the user which company they don't have enough money to buy.
We choose which text to send based on the values of the parameters, and then we call the PromptGenerator.Generate function, which will determine if the user spoke one of the global commands, such as "Help," "Repeat," or "Instructions."
Next, we set up a Grammar file to define what phrases we expect the user to say. Our grammar file contains a list of acceptable companies and their corresponding values, as well as a reference to a grammar rule for ticker symbols.
Figure 8. Adding company names in the grammar file
When one of the choices is matched, the speech engine returns an SML document containing the name of the SML element matched, the text recognized, and the corresponding value. The SML element name and its corresponding value are set in the Assignments window of the grammar editor, as in the figure above.
We add an additional attribute to the Company element to indicate the type of the response (either a company name or a ticker symbol). When the speech engine makes a match, it will return one of the following types of responses:
<SML confidence="1.000" text="m. s. f. t." utteranceConfidence="1.000">
<Company confidence="1.000" responseType="Ticker">MSFT</Company>
</SML>
Ticker Match
<SML confidence="1.000" text="microsoft" utteranceConfidence="1.000">
<Company confidence="1.000" responseType="Company">
Microsoft Corporation
</Company>
</SML>
Company Match
Step 2. Parse user input: Client Normalization Function
After the grammar is matched and the SML document is returned, we next want our server-side code to retrieve a list of matching companies from the database. We execute different business logic functions based on which type of information we have: company name or ticker symbol. We need to transfer this semantic information from the SML returned by the grammar to the code-behind. We do this in the QA's Client Normalization Function:
Figure 9. CompanyOrTickerQA Property Pages
Client Normalization Functions are client-side script functions that run when the SML is recognized but before the semantic item is filled. They allow us to examine the SML returned by the grammar and determine the semantic item's value programmatically.
We use the JScript function SetResponseType to fetch the response-type semantic information from the SML string:
function SetResponseType(smlNode, semanticItem)
{
semanticItem.attributes["ResponseType"]=
smlNode.attributes.getNamedItem("responseType").value;
return smlNode.text;
}
Here we set an attribute of the semantic item siTicker, called ResponseType, to the value of the responseType attribute.
Step 3. Load the Company Matches
After the semantic item is filled, the CompanyOrTickerQA has been satisfied, but we are still running on the client side. We need to make a few more checks and then submit the page, so we specify in the property pages that when this QA is satisfied and the OnClientComplete event fires, a JScript function should run.
function SubmitTickerForSearch()
{
userCanceled= false;
if(GetHistoryItem(RunSpeech.ActiveQA.History,1) == "")
SpeechCommon.Submit();
}
In the function SubmitTickerForSearch, we do two things:
- Reset the userCanceled flag: If the user canceled out of the DataTableNavigator, they are sent back to the CompanyOrTickerQA with a slightly different prompt. This variable tells the CompanyOrTickerQA prompt function whether the user canceled out of the DataTableNavigator control; we reset it here.
- Manually submit page: Although we could have used the semantic item's AutoPostBack feature, the page would post back every time the semantic item's state changed (for example, Empty, NeedConfirmation or Confirmed). Instead, we manually submit if the history indicates the semantic item was filled on this iteration.
Once our semantic item has been filled (and therefore its state is no longer Empty), we want to retrieve matching values from the database. In the LoadCompanies() function on the server, we decide which method to call and retrieve the data.
switch(siTicker.Attributes["ResponseType"])
{
case "Company":
dt= tickerObj.ListByCompany(AccountID,
siTicker.Text.Replace(" amp ", " & "));
break;
case "Ticker":
dt= tickerObj.ListByTicker(AccountID, siTicker.Text);
break;
default:
throw new ApplicationException(...);
}
We use the semantic item's ResponseType attribute, which we set in the SetResponseType client normalization function, to determine whether we received a match on a company or on a ticker. If we get an unrecognized value in ResponseType, we manually throw an application exception.
Tip In server-side code, reference a semantic item's attributes collection with an upper-case A (siTicker.Attributes); in client-side script, however, remember that it's a lower-case a (siTicker.attributes).
Step 4. Determine number of companies selected
If more than one company is returned from the call to the database, we want to activate the DataTableNavigator control so that the user can navigate through the list of possible choices.
if(dt.Rows.Count > 1)
{
// Load data into the DataTableNavigator
dt.TableName= tickerDataSet1.Tables[0].TableName;
tickerDataSet1.Merge(dt);
tickerDataSet1.AcceptChanges();
CompanyOrTickerNav.DataBind();
CompanyOrTickerNav.Visible= true;
// Reset the starting index
CompanyOrTickerNav.StartingIndex= 0;
// Reset the DataTableNavigator's associated semantic item
// to cause activation.
siSelectedTicker.Attributes.Remove("index");
siSelectedTicker.Text= null;
siSelectedTicker.State= SemanticState.Empty;
siIgnored.State= SemanticState.Empty;
...
}
We initialize the DataTableNavigator in the server code, passing it the result set to use as its company list, as well as several other pieces of information, including the names of the OnSelect and OnCancel functions. Refer to the Use of DataTableNavigator Controls section for more detailed information.
As we continue with our example, we will assume that more than one company match was returned, and that the user is now sent to the DataTableNavigator control.
Step 5. DataTableNavigator
The DataTableNavigator allows the user to browse through a list of items—in our case, company names. They will hear the ticker symbol, and then the name of the company; they can navigate through the list by using the built-in commands "First," "Next," and "Previous." They can also cancel out of the list entirely by saying "Cancel," or they can make their selection by saying "Select" after an item has been read.
We handle exit-events for the DataTableNavigator control in the OnClientCompleteLast event handler:
function CheckForSelection()
{
if(siSelectedTicker.IsEmpty())
return true;
else
{
if(siSelectedTicker.attributes["index"] == null)
// cancel condition
StartNewSearch();
else
// Select condition
SelectTicker();
return false;
}
}
OnClientCompleteLast event handler
Step 6. Choosing the company
When the user either says the word "Select" after an item is read, or says the name of one of the items in the list, SelectTicker() is executed. We transfer information about the selected ticker into siTicker.
function SelectTicker()
{
siTicker.attributes["Company"]=
GetNavigatorData("CompanyOrTickerNav","Company",
siSelectedTicker.attributes["index"]).replace("&", "amp");
siTicker.attributes["CurrentPrice"]=
GetNavigatorData("CompanyOrTickerNav","CurrentPrice",
siSelectedTicker.attributes["index"]);
siTicker.attributes["MaxShares"]=
GetNavigatorData("CompanyOrTickerNav","MaxShares",
siSelectedTicker.attributes["index"]);
if(siTicker.attributes["MaxShares"] == "0")
{
siTicker.attributes["ZeroMaxShares"] = "true";
siTicker.Clear();
siIgnored.Clear();
siSelectedTicker.Clear();
}
else
{
siTicker.SetText(GetNavigatorData("CompanyOrTickerNav",
"Ticker",siSelectedTicker.attributes["index"]), false);
}
return true;
}
Our client function SelectTicker sets the semantic item and attributes with the selected company information, and logically decides if the user is allowed to continue:
- Set attributes: We first set some attributes for the semantic item to values from the current row in the DataTableNavigator's dataset. We will use these values in prompt functions later, to determine what to say to the user.
- Set Text value: The SetText method sets the value of the semantic item and stores that information to the viewstate, so that the values can be accessed after postback; call this method after any attributes are set, so that those values will also be saved.
- Check Max Shares: One of the pieces of information that is returned from the dataset is how many shares of the selected company the user could potentially buy. If they cannot buy any, because their account balance is too low, we clear siTicker and send them back to the first QA, with a special flag ("ZeroMaxShares") to tell the prompt function to change what is spoken to the user.
Step 7. Confirmation
Now that the company has been selected, and the siTicker semantic item has been set, we want to confirm with the user that we have the right company name.
Figure 10. The confirms tab
We use the Confirms tab of the SelectedTickerConfirm control both to confirm that the company name is correct and to accept a replacement answer if the user says "No." For example, if the user is prompted, "I understood Yahoo! Inc., is that correct?" and replies, "No, Microsoft Corporation," then "Microsoft Corporation" is filled into siTicker.
We do not use the Extra Answers tab here because we are not asking for an additional answer; rather, we are asking for a replacement for the answer we already have.
function SubmitOnDenyCompany()
{
if(siTicker.NeedsConfirmation() &&
GetHistoryItem(RunSpeech.ActiveQA.History,1) == "")
SpeechCommon.Submit();
}
SelectedTickerConfirm OnClientComplete
In the case where the user rejects the company confirmation, the page is submitted manually. The server-side LoadCompanies function begins again, as the user may have specified a different company or ticker, and the possible company match list will have to be retrieved from the database for these new criteria.
If the user accepts the company choice, the semantic item's state is set to "Confirmed," and we move on to the next QA while remaining on the client.
In-Depth: Buy Stock Feature, Extra Answers
In the Buy Stock process, after the user has placed the order, the user is given the option to buy more stock. At this point, we use the Speech Application SDK's Extra Answers feature to allow expert users to quickly place more orders.
Without the use of Extra Answers, the user repeats the entire Buy Stock process:
- [prompt] "Do you want to buy more stock?"
- [user] "Yes."
- [prompt] "Please say the name of the company or spell the ticker symbol you are interested in."
- [user] "Microsoft."
- [prompt] "I understood Microsoft Corporation, is this correct?"
- [user] "Yes."
- [prompt] "How many shares would you like to purchase?"
- [user] "Four."
- [prompt] "I understood Microsoft Corporation, is this correct?"
- [user] "Yes."
- [prompt] "I understood four shares, is this correct?"
- [user] "Yes."
- [prompt] "So you want to purchase four shares of Microsoft Corporation. Would you like to complete this order?"
- [user] "Yes."
With Extra Answers, the conversation is much simpler:
- [prompt] "Do you want to buy more stock?"
- [user] "Yes, I would like four shares of Microsoft Corporation."
- [prompt] "So you want to purchase four shares of Microsoft Corporation. Would you like to complete this order?"
- [user] "Yes."
The Extra Answers feature works like regular Answers, except that an extra answer does not have to be filled for the QA to complete:
Figure 11. The Extra Answers tab
Semantic items siTicker and siNumberOfShares are added to the Extra Answers collection of the BuyMoreQA control, so that when any of the SML elements Ticker, Company, or NumberOfShares is matched, the appropriate semantic item is automatically filled.
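In markup, the result looks something like the following sketch (the QA is configured through the property builder, so the exact attributes, xpath triggers, and IDs may differ from the sample's generated markup):

<speech:qa id="BuyMoreQA" runat="server"
    onclientcomplete="GoToMainMenuIfNo">
  <Answers>
    <speech:answer SemanticItem="siBuyMore" XpathTrigger="/SML/BuyMore" />
  </Answers>
  <ExtraAnswers>
    <speech:answer SemanticItem="siTicker" XpathTrigger="/SML/Ticker" />
    <speech:answer SemanticItem="siTicker" XpathTrigger="/SML/Company" />
    <speech:answer SemanticItem="siNumberOfShares"
        XpathTrigger="/SML/NumberOfShares" />
  </ExtraAnswers>
</speech:qa>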
We must also handle the case where the user indicates that they are done (by saying "No"), as well as set the confirmation status of the semantic items that they do fill in should they say "Yes." We add an OnClientComplete function for the BuyMoreQA control to handle these cases.
Important Because we are at the last QA on the page, the siTicker and siNumberOfShares semantic items are in a Confirmed state. If the user supplies these as extra answers, they will go from Confirmed to NeedConfirmation. This would activate the confirmation QAs for these semantic items, which we don't want; instead, we want to proceed immediately to the order confirmation QA. We set the semantic item states in the OnClientComplete event handler for the BuyMoreQA:
function GoToMainMenuIfNo()
{
if(GetHistoryItem(RunSpeech.ActiveQA.History,1) == "")
{
if(siBuyMore.value == "No")
Goto("MainMenu.aspx");
else
{
siBuyMore.Clear();
siOrder.Clear();
if(siNumberOfShares.IsConfirmed())
siNumberOfShares.Clear();
else
siNumberOfShares.Confirm();
if(siTicker.IsConfirmed())
siTicker.Clear();
else
{
siTicker.Confirm();
SpeechCommon.Submit();
}
}
}
}
BuyMoreQA OnClientComplete
In this code snippet, the "No" case is straightforward: if the user responds "No" to the question "Do you want to buy more stock?" the user is redirected to the main menu.
If the user says "yes," we take the following actions:
- Clear siBuyMore and siOrder: Clear out the last two QA's (by clearing their corresponding semantic items) so that they will run again.
- Check answer for siNumberOfShares: See if the user said the number of shares that they want. Because of the way the Extra Answers feature has been set up, we can do this by checking the semantic state.
- If the user did not say anything, siNumberOfShares would still be confirmed from the purchase that was just completed. We then manually clear the semantic item, so that on the next loop, the user will be prompted to enter the number of shares.
- If the user did give a number, the semantic item's state would change to NeedConfirmation. We then manually set the item's state to Confirmed because we do not want to run the control NumberOfSharesConfirm.
- Check answer for siTicker: Next, we see if the user has entered a new company, applying the same semantic-state logic as above.
- Manually submit the page: Finally, if the user did specify a new company name, submit the page so that the server-side LoadCompanies method can retrieve the possible company match list from the database using these new criteria.
Prompt Databases
The standard Text-To-Speech (TTS) engine may work well for development and debugging, but recorded prompts are what make a voice-only application truly user-friendly. Although recording prompts can be tedious, and is one of the biggest tasks in setting up a voice-enabled application, the Microsoft recording engine and prompt validation utilities make the process much easier.
Prompt Database Setup
Although setting up the prompt databases is rather straightforward, we would like to point out a couple of tips that we came across while setting up the FMStocksVoice prompt databases:
Keep your databases small: Within reason, keep your databases as small as possible so they are more manageable.
Use the import/export features: You can export the transcriptions to a comma-delimited file, and you can also import transcriptions and individual wave files. This comes in handy, especially if you record your prompts at a studio, outside of the prompt database recording tool.
Figure 12. The import/export features
Use the "Comment" column: If you have one database that holds prompts for multiple tasks, use the "Comment" column to keep track of which category your prompts belong to. You can then sort on the Comment column if you are trying to locate a particular prompt, or are trying to consolidate several similar prompts.
Try the wave normalization tool: If you have a large number of prompts, it is not uncommon to record them at different times. Keep in mind that the voice will probably sound different depending on the time of day, the mood the speaker is in, and so on.
The volume of the recordings will probably also differ, but this can be normalized by setting a property in the property pages of the Prompt Database project.
Note We had the most success by picking one wave file and normalizing to that.
Achieving Realistic Inflection
The following techniques allow us to make our prompts play as smoothly as possible when reading strings that involve combining many different recordings (for example, "[Microsoft Corporation] [is at] [one] [hundred] [one] [point] [twenty] [five] [dollars]").
Note Throughout this section, individual prompt extractions are identified with brackets, just as they are in the prompt editor.
Record Extractions in Context: Prompt extractions almost always sound more realistic when spoken in context. While it may be tempting to record common single words like "companies," "shares," and "dollars" as individual recordings, they will sound much better when recorded along with the text that will accompany them when they are used in a prompt: "one [share]," "two [shares]," and so on. In one highly effective example, we recorded all of our large number terms in one recording: "one [million] three [thousand] five [hundred] twenty five dollars."
Recognize and Group Common Word Pairings: When recording singular words like "dollar" and "share," we almost always group them with "one" as they will always be used this way. Our extractions become "[one dollar]" and "[one share]."
Use Prompt Tags: Although we did not use any prompt tags in the FMStocksVoice application, you can use tags when you have the same word with a different inflection (in this case, "Two dollars" versus "Two dollars and ten cents"). You implement these tags by adding a <withtag> element in your prompt function, as in the example below:
{
    var num = "two";
    if (isCents == true)
        return num + " <withtag tag='middleSentence'>dollars</withtag>" +
            " and zero cents";
    else
        return num + " dollars.";
}
Use 'Display Text' to Your Advantage: To achieve high-quality extractions when recording sentences, you can modify the display text column of your transcriptions to indicate where the extractions are and to insert very small pauses for clearer extractions. As an example, the transcription "[I understood] 25 [is that correct]?" would have the display text "I understood, 25, is that correct?" During recording, the voice talent can pause at the appropriate places so that the extractions are recorded cleanly. You can also manually align the wave files, as described in the next section.
Manual Wave Alignment
Figure 13. The wave alignment tool
The wave alignment tool is very handy if you want to cut and paste wave sections, refine pre-set alignments, insert new alignments, and insert or delete "pauses."
In the FMStocksVoice application, we used this tool mostly when recording the letters and numbers, so that when ticker symbols and prices were read back to the user, they were all uniformly spaced.
Validation
Once you have completed your first round of recordings, thorough validation is important to make sure that no prompts have been missed. A few general strategies enabled us to make sure that our prompt generation functions were being validated completely and accurately:
Figure 14. The validate tool
Validation Values: In each prompt function, a "Validation Value" must be filled in for each parameter, listing the value or values you wish to validate. For parameters with a large number of potential values (for example, numbers, dates, company names, and so on), provide stand-in validation values that represent as large a set as possible for the validator, without unduly slowing down the validation tool.
For instance, if we wish to validate a prompt that tells the user how many companies matched his or her selection, we might enter both 1 and 2 for an itemCount parameter. This way, we can test both the sentences "One company matched" and "Two companies matched," covering the singular and plural prompt paths.
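A sketch of what such a prompt function might look like (the function body and parameter name are our illustration, not code from the sample):
{
    // With validation values 1 and 2 entered for itemCount, the
    // validator renders both the singular and the plural sentence.
    // A production function would spell larger counts out as words.
    if (itemCount == 1)
        return "one company matched";

    return itemCount + " companies matched";
}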
No object references within prompt functions: Except for calls to PromptGenerator.js, we never make calls to script objects within the body of our prompt functions. Instead, our prompt function arguments are defined so that all function calls are made before the inner prompt function is executed. This avoids errors during validation.
Example In the snapshot below, note the call to insertSpaces(true) in the ticker variable declaration. A ticker symbol (in this case, "MSFT") must be separated into its component letters to be read correctly by recorded prompts. We make the call to the helper function that does this in the variable declaration and provide an already-formatted version of the ticker (in this case, "M, S, F, T") as the validation value.
Figure 15. Formatting ticker symbols for use as validation values
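The sample's insertSpaces helper has its own signature (note the Boolean argument above); the function below is only a rough approximation of the idea, showing how a ticker symbol can be broken into letters that each map to a recorded extraction:

// Rough approximation (not the sample's actual helper):
// "MSFT" -> "M, S, F, T", so each letter matches its own
// recorded extraction and is read with a slight pause.
function insertTickerSpaces(ticker)
{
    return ticker.toUpperCase().split("").join(", ");
}

Because the helper runs before the inner prompt function executes, the already-formatted string ("M, S, F, T") is also what we supply as the validation value, so the validator never needs to call the helper itself.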
Running the Application
Our user tests were designed with two main goals in mind:
- Verify that the system performs well in real-life scenarios: The main goal was simply to verify that testers could manage the basic tasks that real customers would want to perform.
- Exercise the full feature set of the application: Beyond the most common tasks, it was important to make sure that the complete feature set of the application was tested as well. Testers were guided to parts of the system that might not necessarily be on a most-likely-path scenario, in order to make sure that the entirety of the system worked as expected.
To accomplish these goals we gave our testers scenarios that included both common tasks and special cases designed to guide the user toward special situations. A sample script might look like this:
TASK ONE (Researching and Buying)
You are considering buying shares from IBM, Microsoft, or Grey Advertising, but you are not sure which one. Check the market value of each of these companies.
Once you know the market values, buy as many shares as you can of the least expensive stock with the money in your account.
TASK TWO (Checking a Portfolio)
Check your portfolio to verify that your purchase has been made.
TASK THREE (Searching for a Company)
You hear a hot stock tip about a company, but you can't remember the full name. You only remember that it starts with the word "American." Find companies that might match, select the correct one when you hear it, and buy ten shares. (Since you don't actually know the company, choose whichever one you want.)
TASK FOUR (Selling Stock)
After your purchase, you want to sell all of the shares of the two holdings with the least expensive per-share cost. Look up each company and sell your shares.
Test subjects were given account numbers and PINs to log into their account, but otherwise were left alone to complete the tasks. Tests were repeated with a number of different test subjects and over a number of successive product revisions.
Lessons Learned
We learned a great deal about building voice-only applications through the process of building these samples. Here we note some of the major points in the areas of user testing, design, and development.
Testing
The testing and tuning phase is important in any application, but it is especially important to the design of voice-only applications. We found that tuning our prompts, accept thresholds, and timeouts was key to making the application useful. Here are a few suggestions on how to conduct effective testing and tuning for voice-only systems.
Properly Configure Testing Equipment First
Many of our early user tests surfaced numerous usability problems that were actually due to improper configuration of the microphone. The microphone was too sensitive, picking up background noise, feedback from the speaker output, and slight utterances as user input. Each false detection interrupted playback, so users became increasingly frustrated as they found it difficult to hear a prompt in its entirety. This affected test results significantly.
Select Testers Carefully
We found that test subjects brought a variety of expectations to the testing process. Developers whom we used as subjects often made assumptions about the way the system worked and became confused by ambiguous prompts like, "Would you like to start shopping or review your previous orders?" They preferred more explicit choices: "Say start shopping to start shopping or review orders to review your account history." Testers with a less technical background preferred less structured prompting; they felt they were speaking with a friendlier system.
To conduct effective tests, make sure the user group you are testing matches the target user group for your application.
Design
The most important lesson in designing the application was the need to tune the prompt design throughout development. From the first stages of implementation through user testing of the completed system, we made changes to prompts to achieve a more fluid program flow. Our experience speaking with other teams that have attempted similar projects suggests that this tuning is a fundamental part of voice-only application development.
With that in mind, here are a few points that will make the tuning process much more efficient:
- Long Prompts Don't Equal Helpful Prompts: At the outset, our design team approached the goal of a friendly interface by writing friendly text. Testing quickly revealed that verbose prompts were a serious impediment to usability. When we kept prompts short, users understood better what to do.
- Express Sentiment with Tone/Inflection: We found that helpfulness is best expressed through intonation and inflection, rather than extra words. A prompt like, "I'm sorry. I still didn't understand you. My fault again," expresses an apologetic sentiment on paper quite well, but spoken, it becomes excessive. This prompt became, "I'm sorry. I still didn't understand you," and we let the inflection of the speaker express the emotion.
- Build Cases For Invalid (but Likely) Responses: Our tests surprised us when a majority of users answered "Yes" to the question, "Would you like to start shopping or review your previous orders?" We realized that part of the problem was the way in which the question was asked, but still, we built in a command to accept that answer and reply helpfully.
- Maintain a Prompt Style Guide: Design teams are used to maintaining style guides for their designs, and voice-only applications should be no exception. Having a consistent set of prompt styles and standard phrasings is paramount to creating a sense of familiarity for the user. Our team recommends an iterative process: modify the guide liberally in the early stages of a project as new cases arise. Then, toward the later stages, tweak new cases to fit the existing rules. This process should lead to a consistent user experience throughout your system.
Development
Several changes that we made to our development strategy are worth noting here.
Necessary Modifications to the Business and Data Layers
Building a voice-only presentation layer as a replacement for a GUI necessitated a few changes to the database and business logic layers that we did not foresee.
Account Balance: Instead of calculating each user's account balance from his or her portfolio transactions, we added a CurrentBalance field to the Accounts table. The stored procedures Ticker_ListByTicker and Ticker_ListByCompany were modified to accept the account number as a parameter, and now return the user's account balance in addition to the matching companies.
Limited Number of Companies: We chose only 100 of the original 7,950 companies, both because we wanted to keep the grammar manageable and because we felt it unrealistic to record over 7,000 company-name prompts.
Field for Grammar Names: We added a field to the TickerList table called CompanyGrammar, used to create a dynamic grammar file. This field contains slightly normalized company-name text so that it can be loaded into the grammar more easily. The stored procedure Speech_PopulateCompanyGrammarField was created to automatically read in the company names, normalize the text, and populate the CompanyGrammar field.
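The exact normalization rules live in the Speech_PopulateCompanyGrammarField stored procedure; the JScript function below is only a rough sketch of the kind of transformation involved, with rules chosen to reproduce the examples in Table 3 below.

// Rough sketch of the normalization (our illustration; the real
// rules live in the stored procedure).
function normalizeCompanyName(name)
{
    var text = name.toLowerCase();
    text = text.replace(/&/g, " and ");             // "&" -> "and"
    text = text.replace(/\bnat'l\b/g, "national");  // expand abbreviations
    text = text.replace(/\bco\b\.?/g, "company");
    text = text.replace(/\binsur\b\.?/g, "insurance");
    text = text.replace(/[^a-z ]/g, " ");           // strip punctuation
    text = text.replace(/\s+/g, " ");               // collapse spaces
    return text.replace(/^ | $/g, "");              // trim
}

Running this over the two company names in Table 3 reproduces the CompanyGrammar values shown there.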
Table 3. Company Grammar Field
Company                        CompanyGrammar
J.P. Morgan & Co.              j p morgan and company
Nat'l Western Life Insur.      national western life insurance

Different Login Information: The Web version of FMStocks accepts an e-mail address and password as its login information. Neither piece of information is easily expressed in a voice context. We replaced these fields with "Account Number" and "PIN" fields, a replacement that would typically also necessitate database changes.
For More Information
The complete documentation and source code for the FMStocksVoice application can be obtained from the Microsoft Download Center.