Azure Speech to Text only transcribes about 50% of the audio

Dan Nguyen 0 Reputation points
2024-10-03T23:54:13.0833333+00:00

I am using Speech to Text but am only able to transcribe about 50% of the audio. (MS Word is able to transcribe the entire audio file.)

The audio file is a WAV, only 2 min 15 sec long, size: 2112 KB.

This is my first time using this. Is there any setting that could affect this? Below is my script.

import os
import requests

# Define the necessary information
subscription_key = "********"
region = "westus"

# Set up the headers once (the subscription key is sufficient; no bearer token is needed)
headers = {
    "Ocp-Apim-Subscription-Key": subscription_key,
    "Content-Type": "audio/wav",
}

folder_path = "**********"

for file_name in os.listdir(folder_path):
    if file_name.endswith(".wav"):
        audio_file_path = os.path.join(folder_path, file_name)
        file_name_without_extension = os.path.splitext(file_name)[0]
        # Short-audio REST endpoint, built from the region defined above
        uri = f"https://{region}.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US&format=detailed"

        with open(audio_file_path, "rb") as audio_file:
            response = requests.post(uri, headers=headers, data=audio_file)

        # Fail loudly on HTTP errors instead of silently writing an empty transcript
        response.raise_for_status()
        response_json = response.json()
        transcription = response_json.get("DisplayText", "")
        output_file_path = os.path.join(folder_path, f"{file_name_without_extension}.txt")

        with open(output_file_path, "w", encoding="utf-8") as output_file:
            output_file.write(transcription)

print("Transcription completed for all audio files in the folder.")
input("Press Enter to exit...")

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. romungi-MSFT 45,961 Reputation points Microsoft Employee
    2024-10-04T08:23:31.97+00:00

    @Dan Nguyen You are using the REST API for short audio, which can only process up to 60 seconds of audio per request. See the short audio REST API reference doc.

    Requests that use the REST API for short audio and transmit audio directly can contain no more than 60 seconds of audio. The input audio formats are more limited compared to the Speech SDK.
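The 60-second limit quoted above would account for the symptom in the question: for a 2 min 15 sec file, 60 of 135 seconds is roughly 44%. As a minimal sketch, assuming the input is an uncompressed PCM WAV file that Python's standard `wave` module can read, a script could check each file's duration before sending it to the short audio endpoint. The function names and the `SHORT_AUDIO_LIMIT_SECONDS` constant are illustrative, not part of any Azure API.

```python
import wave

# Limit taken from the documentation excerpt quoted in the answer.
SHORT_AUDIO_LIMIT_SECONDS = 60


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def fits_short_audio_api(path, limit=SHORT_AUDIO_LIMIT_SECONDS):
    """Return True if the file is short enough for a single short-audio request."""
    return wav_duration_seconds(path) <= limit
```

Files for which `fits_short_audio_api` returns False would need a different route, such as the Speech SDK, rather than a single REST request.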

    I would recommend using the SDK by following the samples from this page which can process longer files.
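As a rough illustration of the SDK route, the sketch below uses continuous recognition from the `azure-cognitiveservices-speech` Python package, on the assumption that the package is installed and that the same subscription key and region from the question are used. Continuous recognition is chosen here because a single-shot recognition call stops after the first utterance. `TranscriptCollector` and `transcribe_long_wav` are hypothetical helper names, not SDK APIs, and this is not necessarily how the linked samples structure it.

```python
import threading


class TranscriptCollector:
    """Accumulates finalized phrases from continuous-recognition events."""

    def __init__(self):
        self.phrases = []

    def on_recognized(self, evt):
        # Each 'recognized' event carries one finalized phrase; skip empty ones.
        text = evt.result.text
        if text:
            self.phrases.append(text)

    def transcript(self):
        return " ".join(self.phrases)


def transcribe_long_wav(path, subscription_key, region, language="en-US"):
    """Transcribe a whole WAV file with continuous recognition (sketch)."""
    # Imported lazily so the collector above is usable without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    collector = TranscriptCollector()
    done = threading.Event()

    # Collect each finalized phrase; stop waiting when the session ends or is canceled.
    recognizer.recognized.connect(collector.on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return collector.transcript()
```

In the original script, a call such as `transcribe_long_wav(audio_file_path, subscription_key, region)` could stand in for the `requests.post` step, with the returned string written to the `.txt` file as before.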

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

