Accuracy scores for text-to-speech voices - MS in 4th - but not for long

How does one do accuracy testing of different text-to-speech voices? That’s a really interesting question. A voice can sound great on one particular string and then come out as junk on ‘tricky’ text. There are all sorts of text normalization issues (e.g., should the string “read” be pronounced as R EH D or R IY D?), which I’ll save for a later discussion. But having just posted a link to waveforms of all of the TTS voices, I was interested to see that ASRNews independently rated all of the contenders, assigning its own accuracy scores to voices based on a range of criteria. (Of course, the detailed results are for purchase.) Interestingly, Microsoft’s desktop voice (MS Sam) was rated 4th, behind ScanSoft, Loquendo, and IBM. We'll be closing in on the lead with the Longhorn TTS engine for sure.
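If you want to try this kind of informal testing yourself, here's a minimal sketch of one way to do it: loop a handful of 'tricky' test sentences through every SAPI voice installed on the machine and listen for mispronunciations. This assumes Windows with SAPI 5 and the pywin32 package; the test sentences and the idea of scoring by ear are my own illustration, not ASRNews's methodology.

```python
# Sketch: play "tricky" sentences through every installed SAPI voice.
# Assumes Windows + SAPI 5 + pywin32 (win32com). A real evaluation would
# have listeners (or a recognizer) score each utterance.
import win32com.client

# Homographs like "read" are classic text-normalization traps: the engine
# has to pick R EH D or R IY D from context. Abbreviations are another.
TRICKY_SENTENCES = [
    "I read the book yesterday.",        # past tense: should be R EH D
    "I will read the book tomorrow.",    # future: should be R IY D
    "Dr. Smith lives on St. John St.",   # "Dr."/"St." expansion
]

speaker = win32com.client.Dispatch("SAPI.SpVoice")
for voice in speaker.GetVoices():
    speaker.Voice = voice
    print("Voice:", voice.GetDescription())
    for sentence in TRICKY_SENTENCES:
        # Speaks synchronously; jot down a pass/fail for each sentence.
        speaker.Speak(sentence)
```

That only gets you anecdotes, of course; a rating like ASRNews's presumably runs a much larger corpus and a formal scoring rubric.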

Comments

  • Anonymous
    June 07, 2005
    To me, speech-to-text is much, much more interesting.

    Do you know of ratings for speech-to-text engines? I know that the Dragon/ScanSoft engine was pretty good, as was IBM ViaVoice, but I am not familiar with the accuracy of MS, nor with any others in the past couple of years.