The Cognitive Services Speech SDK produces no sound in Safari on iPhone, but plays successfully in Safari on Mac. How should this be handled?

Accepted answer

navba-MSFT 19,655 Microsoft Employee

@jessebo Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

Plan 1:

Can you test with speakSsmlAsync instead of speakTextAsync and check whether that helps? I am sharing the sample code below:

// Assumes `synthesizer`, `text` and `SpeechSDK` are defined in the enclosing scope.
const synthesizeSpeech = () => {
    if (synthesizer) {
        // Log progress while audio chunks arrive. Use the public `reason` property;
        // `privReason` is an internal field of the SDK.
        synthesizer.synthesizing = (s, e) => {
            console.log(SpeechSDK.ResultReason[e.result.reason]);
        };
        let ssml = `<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='zh-CN-XiaochenNeural'>${text}</voice></speak>`;
        synthesizer.speakSsmlAsync(
            ssml,
            result => {
                console.log(result);
                if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
                    console.log('Synthesis completed.');
                } else {
                    console.error('Speech synthesis canceled, ' + result.errorDetails);
                }
                // Release the synthesizer once this request has finished.
                synthesizer.close();
            },
            error => {
                console.error('Error synthesizing speech: ', error);
                synthesizer.close();
            }
        );
    }
};

More info here.
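The SSML template in Plan 1 interpolates text directly, so input containing characters such as & or < would produce invalid SSML. A minimal sketch, assuming you want to keep the same SSML shape: escapeXml and buildSsml are hypothetical helper names (not Speech SDK APIs), and the default xml:lang simply mirrors the sample.

```javascript
// Hypothetical helper: escapes the five XML special characters so that
// arbitrary user text can be placed inside an SSML element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds the same SSML document as the Plan 1 sample,
// with the voice name and xml:lang passed in and the text escaped.
function buildSsml(text, voiceName, lang = 'en-US') {
  return (
    `<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='${lang}'>` +
    `<voice name='${voiceName}'>${escapeXml(text)}</voice></speak>`
  );
}
```

For example, synthesizer.speakSsmlAsync(buildSsml(text, 'zh-CN-XiaochenNeural'), ...) would take the place of the template string in the Plan 1 sample.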

Plan 2:

You can try this code sample and check whether it produces audio output in Safari on iOS.

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

jessebo 20 Reputation points

2024-06-22T23:58:35.0466667+00:00

Thank you very much for your response. Plan 1 works, and you have helped me a lot. I appreciate it. I'm sorry for only replying today; I had forgotten about this account, which is really embarrassing.
jessebo 20 Reputation points

2024-06-23T02:51:41.76+00:00
I found that the speakTextAsync function in code sample 2 works correctly on my iPhone. However, when I copied the same code into my React project, there was no audio output. It seems that the callback

let player = new window.SpeechSDK.SpeakerAudioDestination(); player.onAudioStart(...)

is never being called. Can you shed some light on this issue?

navba-MSFT 19,655 Microsoft Employee

@jessebo Thanks for getting back to me. I am glad that Plan 2 above helped.

Addressing the issue in your React project will require some debugging. You can add simple console.log statements to see which callbacks fire. I am also sharing a sample code snippet here.

Disclaimer: The code below has not been tested, so you will need to adapt it and test it for your use case.

import React, { useState, useEffect } from 'react';
import * as SpeechSDK from 'microsoft-cognitiveservices-speech-sdk';

const SpeechComponent = () => {
  const [synthesizer, setSynthesizer] = useState(null);
  const [player, setPlayer] = useState(null);
  const [subscriptionKey, setSubscriptionKey] = useState('YOUR_SUBSCRIPTION_KEY');
  const [region, setRegion] = useState('YOUR_REGION');
  const [voice, setVoice] = useState('en-US-JennyNeural');
  // Name of a SpeechSynthesisOutputFormat enum member; converted to the enum value below.
  const [format, setFormat] = useState('Audio16Khz32KBitRateMonoMp3');
  const [text, setText] = useState('Hello, this is a test.');

  // Attach event handlers whenever a new synthesizer is created.
  useEffect(() => {
    if (synthesizer) {
      synthesizer.synthesizing = (s, e) => {
        console.log(e); // Handle synthesizing event
      };
      synthesizer.synthesisStarted = (s, e) => {
        console.log(e); // Handle synthesis started event
      };
      synthesizer.synthesisCompleted = (s, e) => {
        console.log(e); // Handle synthesis completed event
      };
      // In the JavaScript SDK this event is spelled SynthesisCanceled (capital S).
      synthesizer.SynthesisCanceled = (s, e) => {
        const cancellationDetails = SpeechSDK.CancellationDetails.fromResult(e.result);
        let str = "(cancel) Reason: " + SpeechSDK.CancellationReason[cancellationDetails.reason];
        if (cancellationDetails.reason === SpeechSDK.CancellationReason.Error) {
          str += ": " + e.result.errorDetails;
        }
        console.log(str); // Handle synthesis canceled event
      };
    }
  }, [synthesizer]);

  const startSynthesis = () => {
    const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey, region);
    speechConfig.speechSynthesisVoiceName = voice;
    // speechSynthesisOutputFormat expects an enum value, not the member's name as a string.
    speechConfig.speechSynthesisOutputFormat = SpeechSDK.SpeechSynthesisOutputFormat[format];

    // SpeakerAudioDestination plays the synthesized audio in the browser.
    const newPlayer = new SpeechSDK.SpeakerAudioDestination();
    newPlayer.onAudioStart = () => {
      console.log('playback started');
    };
    newPlayer.onAudioEnd = () => {
      console.log('playback finished');
    };

    const audioConfig = SpeechSDK.AudioConfig.fromSpeakerOutput(newPlayer);
    const newSynthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig, audioConfig);
    setPlayer(newPlayer);
    setSynthesizer(newSynthesizer);

    newSynthesizer.speakTextAsync(
      text,
      result => {
        if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
          console.log('Synthesis finished.');
        } else {
          console.error('Speech synthesis canceled, ' + result.errorDetails);
        }
      },
      error => {
        console.error(error);
      }
    );
  };

  return (
    <div>
      <h1>Speech Synthesis Test</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={startSynthesis}>Start Synthesis</button>
    </div>
  );
};

export default SpeechComponent;

jessebo 20 Reputation points

2024-06-24T09:51:16.02+00:00

Thank you, you are really kind.
jessebo 20 Reputation points

2024-06-24T09:59:04.4666667+00:00

In my case, all the logs appear correctly except the player log console.log('playback started'). I think this may be because the SDK did not select the correct player on the phone. Actually, I am using the web version of React Native, which may make things more complex.

navba-MSFT 19,655 Microsoft Employee

@jessebo Given that you're using the web version of React Native, the problem could be related to how React Native for Web interacts with the browser's audio APIs and whether the Speech SDK's player works correctly in that setup. Here are a few steps to troubleshoot and potentially resolve the issue:

Ensure Correct Player Selection

Verify Player Initialization: Make sure the player is initialized correctly for the browser environment that React Native for Web runs in. In some environments, the player may not be recognized correctly.

Manual Player Check: Confirm that the player object is assigned and valid before passing it to SpeechSDK.AudioConfig.fromSpeakerOutput.
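One way to see what the player object actually exposes at runtime is to walk its prototype chain and log each property with its type. A minimal sketch: describeObject is a hypothetical debugging helper, not part of the Speech SDK, and it reports getters as accessors instead of invoking them.

```javascript
// Hypothetical debugging helper: lists the properties and methods an object
// exposes, including inherited ones (stopping before Object.prototype),
// together with their runtime type. Getters are reported as 'accessor' and
// are not invoked, since reading a getter can have side effects.
function describeObject(obj) {
  const seen = new Map();
  for (let proto = obj; proto && proto !== Object.prototype; proto = Object.getPrototypeOf(proto)) {
    for (const name of Object.getOwnPropertyNames(proto)) {
      // Properties closer to the object shadow those further up the chain.
      if (name === 'constructor' || seen.has(name)) continue;
      const desc = Object.getOwnPropertyDescriptor(proto, name);
      seen.set(name, desc.get || desc.set ? 'accessor' : typeof desc.value);
    }
  }
  // Return a plain object sorted by property name for readable logging.
  return Object.fromEntries([...seen].sort(([a], [b]) => a.localeCompare(b)));
}
```

In the browser, console.log(describeObject(player)) right after assigning onAudioStart should show that handler as a function; comparing the output on iPhone and Mac may reveal differences.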

Adjust for React Native Web Environment

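React Native for Web renders into a regular browser page, but your project setup can differ from a plain web app in how it loads modules and handles certain browser APIs. Make sure the browser APIs the SDK needs for audio playback are available and that the SDK module loads correctly.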
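To check what the environment actually exposes, you can log which audio-related constructors exist on window in both the iPhone and Mac browsers. A minimal sketch: detectAudioSupport is a hypothetical helper, and which of these APIs the SDK's SpeakerAudioDestination relies on in a given browser is not something this check establishes. ManagedMediaSource is the MediaSource variant that newer Safari versions provide; Safari on iPhone has historically not exposed the classic MediaSource.

```javascript
// Hypothetical helper: reports which audio-related browser APIs exist on a
// global object (pass `window` in the browser). It only checks that the
// constructors are present; it does not prove that playback will work.
function detectAudioSupport(g) {
  const has = (name) => typeof g[name] === 'function';
  return {
    audioContext: has('AudioContext') || has('webkitAudioContext'),
    mediaSource: has('MediaSource'),
    managedMediaSource: has('ManagedMediaSource'), // newer Safari variant of MediaSource
    audioElement: has('HTMLAudioElement'),
  };
}
```

For example, console.log(detectAudioSupport(window)) inside the React Native for Web page and in a plain HTML page on the same iPhone should give identical results if the environment is not the cause.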

Refactored Code for React Native Web

Here is a more tailored approach that takes the React Native for Web setup into account:

import React, { useState, useEffect } from 'react';
import * as SpeechSDK from 'microsoft-cognitiveservices-speech-sdk';

const SpeechComponent = () => {
  const [synthesizer, setSynthesizer] = useState(null);
  const [subscriptionKey, setSubscriptionKey] = useState('YOUR_SUBSCRIPTION_KEY');
  const [region, setRegion] = useState('YOUR_REGION');
  const [voice, setVoice] = useState('en-US-JennyNeural');
  // Name of a SpeechSynthesisOutputFormat enum member; converted to the enum value below.
  const [format, setFormat] = useState('Audio16Khz32KBitRateMonoMp3');
  const [text, setText] = useState('Hello, this is a test.');

  // Create the player and synthesizer on mount and whenever the configuration changes.
  useEffect(() => {
    const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(subscriptionKey, region);
    speechConfig.speechSynthesisVoiceName = voice;
    // speechSynthesisOutputFormat expects an enum value, not the member's name as a string.
    speechConfig.speechSynthesisOutputFormat = SpeechSDK.SpeechSynthesisOutputFormat[format];

    const player = new SpeechSDK.SpeakerAudioDestination();
    // A constructor either returns an object or throws, so an if-check cannot fail;
    // log the player itself so it can be inspected in the console instead.
    console.log('Player initialized:', player);
    player.onAudioStart = () => {
      console.log('playback started');
    };
    player.onAudioEnd = () => {
      console.log('playback finished');
    };

    const audioConfig = SpeechSDK.AudioConfig.fromSpeakerOutput(player);
    const newSynthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig, audioConfig);

    // Attach event handlers before the synthesizer is used.
    newSynthesizer.synthesizing = (s, e) => {
      console.log(e); // Handle synthesizing event
    };
    newSynthesizer.synthesisStarted = (s, e) => {
      console.log(e); // Handle synthesis started event
    };
    newSynthesizer.synthesisCompleted = (s, e) => {
      console.log(e); // Handle synthesis completed event
    };
    // In the JavaScript SDK this event is spelled SynthesisCanceled (capital S).
    newSynthesizer.SynthesisCanceled = (s, e) => {
      const cancellationDetails = SpeechSDK.CancellationDetails.fromResult(e.result);
      let str = "(cancel) Reason: " + SpeechSDK.CancellationReason[cancellationDetails.reason];
      if (cancellationDetails.reason === SpeechSDK.CancellationReason.Error) {
        str += ": " + e.result.errorDetails;
      }
      console.log(str); // Handle synthesis canceled event
    };

    setSynthesizer(newSynthesizer);

    // Release the synthesizer when the component unmounts or the configuration changes.
    return () => {
      newSynthesizer.close();
    };
  }, [subscriptionKey, region, voice, format]);

  const startSynthesis = () => {
    if (synthesizer) {
      synthesizer.speakTextAsync(
        text,
        result => {
          if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
            console.log('Synthesis finished.');
          } else {
            console.error('Speech synthesis canceled, ' + result.errorDetails);
          }
        },
        error => {
          console.error(error);
        }
      );
    } else {
      console.error('Synthesizer not initialized');
    }
  };

  return (
    <div>
      <h1>Speech Synthesis Test</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={startSynthesis}>Start Synthesis</button>
    </div>
  );
};

export default SpeechComponent;

Additional Debugging Tips

Check Audio Context: Ensure the browser allows audio playback, especially in the case of Safari on iPhone, which has strict autoplay policies.
User Interaction: Ensure the synthesis function is called in response to a user action, such as a button click, to comply with autoplay policies.
Environment Checks: Check for any specific differences in how React Native for Web handles audio elements compared to a standard web environment.
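For the autoplay points above, a widely used workaround on iOS Safari is to "unlock" a Web Audio context synchronously inside the click handler by resuming it and playing a tiny silent buffer. A minimal sketch: unlockAudioContext is a hypothetical helper name. Whether this also unlocks the audio element or context that the SDK's SpeakerAudioDestination creates internally is not established here, so it is most useful when you play the synthesized audio yourself on the same context.

```javascript
// Hypothetical helper: call it synchronously inside a click or touch handler.
// Resuming the context and playing one frame of silence while the user
// gesture is active is a common way to satisfy iOS Safari's autoplay policy
// for that particular AudioContext.
function unlockAudioContext(ctx) {
  // One channel, one sample frame, at the context's own sample rate: silence.
  const buffer = ctx.createBuffer(1, 1, ctx.sampleRate);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start(0);
  // resume() returns a Promise; calling it here keeps it inside the gesture.
  return ctx.state === 'suspended' ? ctx.resume() : Promise.resolve();
}
```

In the button's handler you would create the context once, for example with new (window.AudioContext || window.webkitAudioContext)(), call unlockAudioContext on it first, and only then start synthesis.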

Final Steps

Manual Player Initialization: If the player still doesn't work, debug it manually by logging its properties and methods to confirm that it was initialized correctly.
Fallback Options: Consider using a different audio output method or creating a more custom player if the default one fails on iOS Safari.
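One such fallback, sketched under assumptions: synthesize without SDK playback by passing null as the AudioConfig (the audio is then returned in result.audioData as an ArrayBuffer), and play it yourself through a Web Audio AudioContext that was created or resumed during the user's tap. playAudioData and synthesizeToArrayBuffer are hypothetical helper names; the SDK module is passed in as a parameter so the sketch needs no import of its own. It assumes an encoded output format such as MP3 or RIFF WAV that decodeAudioData can handle (raw PCM formats would need different handling), and it has not been tested on an iPhone.

```javascript
// Hypothetical helper: decodes encoded audio (for example MP3 or RIFF WAV
// bytes) and plays it on an AudioContext already unlocked by a user gesture.
// Resolves when playback ends; rejects if decoding fails.
function playAudioData(ctx, audioData) {
  return new Promise((resolve, reject) => {
    // decodeAudioData detaches the buffer it is given, so decode a copy.
    const pending = ctx.decodeAudioData(
      audioData.slice(0),
      (decoded) => {
        const source = ctx.createBufferSource();
        source.buffer = decoded;
        source.connect(ctx.destination);
        source.onended = () => resolve();
        source.start(0);
      },
      (err) => reject(err),
    );
    // Newer browsers also return a Promise; the callbacks above already handle
    // the outcome, so silence its rejection to avoid an unhandled-rejection warning.
    if (pending && typeof pending.catch === 'function') pending.catch(() => {});
  });
}

// Hypothetical helper: synthesizes `text` without SDK playback. Passing null
// as the AudioConfig keeps the audio in result.audioData (an ArrayBuffer).
// `sdk` is the imported microsoft-cognitiveservices-speech-sdk module.
function synthesizeToArrayBuffer(sdk, speechConfig, text) {
  return new Promise((resolve, reject) => {
    const synthesizer = new sdk.SpeechSynthesizer(speechConfig, null);
    synthesizer.speakTextAsync(
      text,
      (result) => {
        synthesizer.close();
        if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
          resolve(result.audioData);
        } else {
          reject(new Error('Speech synthesis canceled, ' + result.errorDetails));
        }
      },
      (error) => {
        synthesizer.close();
        reject(error);
      },
    );
  });
}
```

In a click handler you would create or resume the AudioContext first, then call synthesizeToArrayBuffer(SpeechSDK, speechConfig, text).then((data) => playAudioData(ctx, data)). Keeping the resume inside the gesture and the playback on that same context is the point of this design.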

By following these steps and adjustments, you should be able to narrow down the issue and find a working solution for integrating Azure Cognitive Services Speech SDK with React Native for Web.
