How to collect user voice in real-time from the browser and then send it to Azure Speech-to-Text via WebSocket?

CodeKidz 25 Reputation points
2023-12-15T05:06:24.9066667+00:00

This problem is driving me crazy. The audio stream I capture with MediaRecorder in Chrome only comes out in the WebM format, while the Azure API only accepts WAV and OGG.

And there is no complete example showing how to let users speak in real time in the browser and forward that audio to Azure Speech through my own backend service (to keep the subscription key from leaking).

All the examples just connect to Azure Speech directly from the frontend, which doesn't help here.
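For readers who do want the backend-proxy route described above, one workable pattern is to skip MediaRecorder entirely: capture raw PCM in the browser with an AudioWorklet, stream the Float32 frames over a WebSocket, and have the server convert them to 16-bit PCM for the Speech SDK's push stream. Below is a minimal server-side sketch; the handler shape, sample rate, and helper names are my assumptions, not from an official sample.

```python
# Sketch of a backend WebSocket proxy for Azure Speech-to-Text. Assumes the
# browser captures raw little-endian Float32 PCM at 16 kHz mono (via an
# AudioWorklet -- MediaRecorder only yields WebM) and sends each buffer as a
# binary WebSocket message. Handler shape and names are illustrative.
# Server deps: pip install azure-cognitiveservices-speech websockets
import struct


def float32_to_pcm16(frame: bytes) -> bytes:
    """Convert a little-endian Float32 PCM buffer to 16-bit signed PCM."""
    n = len(frame) // 4
    samples = struct.unpack(f"<{n}f", frame)
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{n}h", *(int(s * 32767) for s in clipped))


async def handle_client(ws, key: str, region: str):
    # Imported here so the conversion helper above stays dependency-free.
    import azure.cognitiveservices.speech as speechsdk

    # 16 kHz / 16-bit / mono is the input format the recognizer expects.
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000, bits_per_sample=16, channels=1)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speechsdk.SpeechConfig(subscription=key, region=region),
        audio_config=speechsdk.audio.AudioConfig(stream=push_stream))
    # In a real proxy, forward evt.result.text back to the browser over ws.
    recognizer.recognized.connect(lambda evt: print(evt.result.text))
    recognizer.start_continuous_recognition()
    try:
        async for frame in ws:  # one Float32 PCM buffer per binary message
            push_stream.write(float32_to_pcm16(frame))
    finally:
        push_stream.close()
        recognizer.stop_continuous_recognition()
```

The matching browser side would post each AudioWorklet buffer as a binary WebSocket message; resampling to 16 kHz is easiest to handle in the browser (e.g. by constructing the AudioContext with sampleRate: 16000).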

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
1,675 questions

2 answers

  1. CodeKidz 25 Reputation points
    2023-12-15T11:21:23.98+00:00

    Finally I found this project: https://github.com/Azure-Samples/AzureSpeechReactSample

    which shows the correct way to use the Speech SDK in the frontend.

    You don't need to connect through your own server via WebSocket at all: have your backend issue a temporary authorization token, then use the Azure Speech SDK directly in the frontend. The project provides a sample implementation in React.
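For reference, the temporary token this approach relies on comes from Azure's regional STS endpoint. Here is a minimal sketch of the server-side exchange; the helper names are my own, not from the linked sample.

```python
# Sketch of issuing a short-lived Speech token server-side, so the real
# subscription key never reaches the browser. The STS route below is Azure's
# documented token endpoint; token_url/issue_token are illustrative names.
import urllib.request


def token_url(region: str) -> str:
    """Build the regional STS issueToken endpoint URL."""
    return f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def issue_token(key: str, region: str) -> str:
    """Exchange the subscription key for a token valid for about 10 minutes."""
    req = urllib.request.Request(
        token_url(region),
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Length": "0"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

The frontend then fetches the token from a route like this and passes it to SpeechConfig.fromAuthorizationToken(token, region) in the JavaScript Speech SDK; since tokens expire after roughly ten minutes, the client should refresh them periodically.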

    1 person found this answer helpful.

  2. Kenneth Díaz González 0 Reputation points
    2024-07-13T07:52:58.36+00:00

    I've recently created sample code for this; feel free to check it out:
    https://stackoverflow.com/a/78743136/26354907

