Constructing Natural-sounding Recorded Prompts

(Dialog Speech Control Sample)

This sample illustrates the importance of careful planning when recording elements to concatenate when constructing a prompt. The goal is to create a prompt that is as smooth and natural-sounding as possible.

The sample presents three prompts that were created by concatenating a series of recordings. The prompts are textually identical, but have been created using three different approaches to recording the constituent pieces. The text of the three final prompts and each of the constituent recordings is as follows:

Final Prompt Output:       "You can say previous, repeat, or next."
Constituent Recordings:    "you can say", "previous", "repeat", "or", "next"

Example Prompt 1
The first example prompt was constructed by recording the following sentence, which contains all of the constituent pieces, but in a different order:

          "You can say next, previous, or repeat."

The constituent pieces were then extracted, reordered, and finally concatenated to produce the final prompt.

This first example demonstrates how unnatural a prompt can sound when its constituent parts are recorded in an acoustic context that differs from the target acoustic context. The effect occurs because the coarticulatory effects produced by the word order in the original recording differ from those produced by the word order in the target prompt. Coarticulatory effects are the acoustic effects that one phone has on the production of a neighboring phone. For example, the effect that the final diphthong "ay" in the word "say" had on the initial "n" in the word "next" in the original recording is perceptibly different from the effect that the final retroflex "r" in the word "or" should have on the initial "n" of "next" in the target prompt.
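
The mismatch can be made concrete by comparing each piece's neighbors in the recorded sentence with its neighbors in the target prompt. The following Python sketch is illustrative only: the helper names `neighbors` and `context_mismatches` are hypothetical and are not part of the sample or of any speech API. It treats each constituent recording as a single unit and assumes that each piece occurs only once per sentence.

```python
def neighbors(pieces):
    """Map each piece to its (previous, next) neighbors; None marks a sentence edge."""
    padded = [None] + list(pieces) + [None]
    return {p: (padded[i], padded[i + 2]) for i, p in enumerate(pieces)}


def context_mismatches(recorded, target):
    """Return the pieces whose neighbors differ between the recorded and target orders."""
    rec, tgt = neighbors(recorded), neighbors(target)
    return {p: {"recorded": rec[p], "target": tgt[p]}
            for p in target if rec[p] != tgt[p]}


recorded = ["you can say", "next", "previous", "or", "repeat"]
target = ["you can say", "previous", "repeat", "or", "next"]

for piece, ctx in context_mismatches(recorded, target).items():
    print(f"{piece!r}: recorded between {ctx['recorded']}, needed between {ctx['target']}")
```

For this prompt, every piece has at least one neighbor that differs from the recording, including "next," which was recorded after "you can say" but is needed after "or."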

Example Prompt 2
Although the second example prompt is an improvement over the first example, it still sounds unnatural. This prompt was constructed by recording each constituent part in isolation, without the acoustical context provided by a preceding and a following word.

Example Prompt 3
The third example prompt sounds the most natural. It was constructed using a combination of two techniques designed to improve the naturalness of the final product.

First, a variant of the sentence that was used for example 1 ("You can say next, previous, or repeat.") was recorded. This time the word "Patrick" was inserted before each of the elements that were to be extracted. The following carrier sentence was recorded:

          "You can say Patrick Next, Patrick Repeat, Patrick, or Patrick Previous."

The word "Patrick" was chosen because it starts and ends with a plosive stop sound (it begins with a P sound and ends with a K sound). Recording the constituent pieces in the acoustic context of a plosive stop minimizes coarticulatory effects.

Second, additional recordings of the carrier sentence were made. The positions of the words "next," "previous," and "repeat" were swapped so that each of the three words appeared once in each of the three positions (in this case, the predicate-initial, predicate-middle, and predicate-final positions). Each occurrence of each word was extracted from the carrier sentences, stored in the prompt database, and tagged according to the position in which it occurred. As a result, the prompt database contained three recordings of each word. For example, it contained three recordings of the word "next," tagged as the predicate-initial, the predicate-middle, and the predicate-final instance of the word.
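
One way to plan such a recording session is to rotate the word list so that each word occupies each position exactly once. The Python sketch below assumes this rotation scheme, and it assumes a particular carrier wording and punctuation; the source specifies neither, and the function names are hypothetical.

```python
POSITIONS = ("predicate-initial", "predicate-middle", "predicate-final")


def rotations(words):
    """Yield each cyclic rotation of words, so every word visits every position once."""
    for shift in range(len(words)):
        yield words[shift:] + words[:shift]


def carrier_sentence(order, buffer_word="Patrick"):
    """Build a carrier sentence with the buffer word before each piece to extract.

    The wording and punctuation here are assumptions made for illustration.
    """
    first, middle, last = order
    return (f"You can say {buffer_word} {first}, {buffer_word} {middle}, "
            f"{buffer_word} or {buffer_word} {last}.")


def recording_plan(words):
    """Return (sentence, {word: position tag}) pairs, one pair per rotation."""
    plan = []
    for order in rotations(list(words)):
        tags = dict(zip(order, POSITIONS))
        plan.append((carrier_sentence(order), tags))
    return plan


for sentence, tags in recording_plan(["next", "previous", "repeat"]):
    print(sentence, tags)
```

Three rotations of a three-word list yield nine tagged extractions, one for each combination of word and position.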

When the final prompt ("You can say previous, repeat, or next.") was constructed, the initial phrase "You can say" was concatenated with the predicate-initial recording of "previous," the predicate-middle recording of "repeat," the recording of "or," and the predicate-final recording of "next."
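
The selection step for the final prompt "You can say previous, repeat, or next." can be sketched as a lookup keyed by word and position tag. This Python example is a sketch under stated assumptions: the in-memory dictionary, the .wav file names, and the `build_prompt` helper are hypothetical stand-ins, not the sample's prompt database or its prompt function.

```python
POSITIONS = ("predicate-initial", "predicate-middle", "predicate-final")
COMMANDS = ("next", "previous", "repeat")

# Hypothetical prompt database: (text, position tag) -> recording file name.
# Untagged pieces such as "you can say" and "or" use None as the tag.
PROMPT_DB = {(word, pos): f"{word}_{pos}.wav" for word in COMMANDS for pos in POSITIONS}
PROMPT_DB[("you can say", None)] = "you_can_say.wav"
PROMPT_DB[("or", None)] = "or.wav"


def build_prompt(commands):
    """Return the recordings to concatenate for 'You can say A, B, or C.'"""
    if len(commands) != len(POSITIONS):
        raise ValueError("expected exactly three commands")
    first, middle, last = commands
    return [
        PROMPT_DB[("you can say", None)],
        PROMPT_DB[(first, POSITIONS[0])],
        PROMPT_DB[(middle, POSITIONS[1])],
        PROMPT_DB[("or", None)],
        PROMPT_DB[(last, POSITIONS[2])],
    ]


print(build_prompt(["previous", "repeat", "next"]))
```

Because the position tag is derived from where a command falls in the list, the same helper can assemble any ordering of the three commands without new recordings.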

Although the effort required to create the prompt in this example was greater than the effort required to create the prompt in the other two examples, the resulting prompt sounds the most natural of the three.

The sample also demonstrates:

  • Recording prompt output.
  • Using prompt engine markup language tagging to select context-specific recordings.
  • Using a prompt function instead of inline prompt text.

Running the Sample

Open the sample. After listening to a short introductory description of the sample, navigate between the three example prompts by saying next or previous. Listen to the current prompt example again by saying repeat.

Remarks

This sample does not support:

  • BargeIn. The user cannot interrupt the prompt with a response.
  • Confirmation and correction of the user's responses using a mixture of Implicit Confirmation (IC), Short Time-out Confirmation (STC), and Explicit Confirmation (EC) strategies.
  • The implementation of the global command "Help."

See Also

Dialog Speech Controls Overview | Command Control | QA Control | InlinePrompt Property | PromptSelectFunction Property

Creating Prompt Functions | Tuning Alignments and Extractions | Voice-only Run-time Behavior