speakSsmlAsync returns invalid audio file

Ben Carter 111 Reputation points
2021-11-20T00:29:24.073+00:00

We are using JavaScript to call speakSsmlAsync on the SpeechSynthesizer. We expect MP3 files, but most software (QuickTime, for example) won't play the files we get back.

We set the audioConfig like this:

const audioConfig = AudioConfig.fromAudioFileOutput(filename);

where filename is something like my-file.mp3

When I try to open the file in HandBrake, I get errors like this:

Input #0, wav, from 'my-file.mp3':
Duration: 00:00:02.31, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s
[16:24:03] hb_stream_open: open my-file.mp3 failed
[16:24:03] scan: unrecognized file type
[16:24:03] libhb: scan thread found 0 valid title(s)
[16:24:03] macgui: ScanCore scan done

That makes me think the file isn't encoded as MP3: the log reports a wav container holding pcm_s16le audio.

If I change the extension to .wav, it will play (although it still says it's invalid).
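One way to confirm what the log above suggests is to look at the first bytes of the file instead of trusting its extension. The following is a minimal sketch, not part of the Speech SDK: `sniffAudioFormat` and `sniffAudioFile` are hypothetical helper names, and the MP3 check is only a heuristic (an ID3 tag or an MPEG frame sync), so other formats with a similar sync pattern can be misreported.

```javascript
const fs = require("fs");

// Guess the audio container from the first bytes of a file.
// `bytes` must be a Node.js Buffer holding the start of the file.
function sniffAudioFormat(bytes) {
  const ascii = (start, end) => bytes.subarray(start, end).toString("latin1");

  // WAV: "RIFF" <4-byte size> "WAVE"
  if (bytes.length >= 12 && ascii(0, 4) === "RIFF" && ascii(8, 12) === "WAVE") {
    return "wav";
  }
  // MP3 that starts with an ID3v2 tag
  if (bytes.length >= 3 && ascii(0, 3) === "ID3") {
    return "mp3";
  }
  // Bare MPEG audio frame: the first 11 bits are all set (frame sync)
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) {
    return "mp3";
  }
  return "unknown";
}

// Read only the first 12 bytes of the file at `path` and classify them.
function sniffAudioFile(path) {
  const fd = fs.openSync(path, "r");
  try {
    const head = Buffer.alloc(12);
    const bytesRead = fs.readSync(fd, head, 0, 12, 0);
    return sniffAudioFormat(head.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `sniffAudioFile("my-file.mp3")` on a file like the one in the log would be expected to report a WAV container, which matches the 256 kb/s figure: 16000 Hz × 16 bits × 1 channel.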

So:

  1. What are we doing wrong?
  2. Is there a way to specify the output format / rate explicitly by creating our own AudioConfig? We couldn't figure out how to do that.

Thanks!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. Ben Carter 111 Reputation points
    2021-11-22T19:27:25.697+00:00

    Thanks.

    These functions don't appear to exist in JavaScript.

    It helped, though, to show that it should be possible. After some more digging through the docs and examples, we found that we could set the output format on the SpeechConfig like this:

    speechConfig.speechSynthesisOutputFormat = sdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3;
    

    This gave us playable mp3 files.
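For anyone landing here later, this is roughly how that setting fits into the whole call. It is a sketch under assumptions, not an official sample: `synthesizeSsmlToMp3` is a hypothetical wrapper name, the key and region are placeholders you supply, and the SDK module (for example `require("microsoft-cognitiveservices-speech-sdk")`) is passed in as the first argument. The important part is that the output format is set on the SpeechConfig before the SpeechSynthesizer is created; the file extension given to `fromAudioFileOutput` does not choose the encoding.

```javascript
// Synthesize SSML to an MP3 file.
// `sdk` is the Speech SDK module; key, region, ssml and filename come from the caller.
function synthesizeSsmlToMp3(sdk, { key, region, ssml, filename }) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);

  // Without this line the file contains 16 kHz 16-bit PCM in a WAV container,
  // which is what the log in the question shows.
  speechConfig.speechSynthesisOutputFormat =
    sdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3;

  const audioConfig = sdk.AudioConfig.fromAudioFileOutput(filename);
  const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

  // Wrap the callback-style API in a Promise and always release the synthesizer.
  return new Promise((resolve, reject) => {
    synthesizer.speakSsmlAsync(
      ssml,
      (result) => {
        synthesizer.close();
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
          resolve(result);
        } else {
          reject(new Error(result.errorDetails || "Speech synthesis did not complete"));
        }
      },
      (error) => {
        synthesizer.close();
        reject(new Error(String(error)));
      }
    );
  });
}
```

Usage would look something like `synthesizeSsmlToMp3(sdk, { key, region, ssml, filename: "my-file.mp3" })`, with the returned Promise resolving once the audio has been written.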

    Thanks again.


1 additional answer

  1. romungi-MSFT 46,831 Reputation points Microsoft Employee
    2021-11-22T09:17:36.197+00:00

    @Ben Carter In this case you need to set the output file format on the SpeechSynthesizer using SetOutputToWaveFile():

    using System.Speech.AudioFormat;
    using System.Speech.Synthesis;
    // Write synthesized speech to a 32 kHz, 16-bit, mono WAV file.
    SpeechSynthesizer synth = new SpeechSynthesizer();
    synth.SetOutputToWaveFile(@"C:\temp\test.wav",
              new SpeechAudioFormatInfo(32000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
    

