Is there a way to make Speech service transcription with diarization (speaker differentiation) faster?

kk 0 Reputation points
2024-07-02T05:30:12.4666667+00:00

Currently, transcription seems to take about half the audio's duration for WAV files, and about the full duration (a 1:1 ratio) for MP4 files decoded with GStreamer.
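To be clear about what these ratios mean, here is a minimal sketch of the calculation: processing time divided by audio duration. The helper names (`wav_duration_seconds`, `real_time_factor`, `make_silent_wav`) are hypothetical, and the 5 s and 10 s timings are illustrative placeholders, not measurements. It uses only the Python standard library and does not call the Speech SDK.

```python
import io
import wave


def wav_duration_seconds(wav_source) -> float:
    """Return the duration of a PCM WAV file (path or file-like object)."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; 0.5 means 'half the time'."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def make_silent_wav(seconds: int, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence (demo input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    audio_seconds = wav_duration_seconds(make_silent_wav(10))
    # Illustrative timings (assumed, not measured): 5 s for WAV, 10 s for MP4.
    print("WAV ratio:", real_time_factor(5.0, audio_seconds))
    print("MP4 ratio:", real_time_factor(10.0, audio_seconds))
```

In a real measurement, `processing_seconds` would come from timing the transcription call itself, for example with `time.perf_counter()` around the start and the final session-stopped event.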

According to this post, half the audio duration appears to be the fastest a WAV file can be transcribed:

https://stackoverflow.com/questions/69845073/how-to-do-voice-recognition-in-azure-and-complete-immediately

If this is true, how can I make transcription of other file formats that go through GStreamer (such as MP4) at least as fast as transcription of a WAV file?

I am following the Python code from this doc:
https://video2.skills-academy.com/en-us/azure/ai-services/speech-service/how-to-use-codec-compressed-audio-input-streams?tabs=linux%2Cdebian%2Cjava-android%2Cterminal&pivots=programming-language-python

Thank you for your help!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.