Is there a way for speech diarization to run across multiple files while keeping the speaker IDs the same for each speaker?

Annie 0 Reputation points
2024-06-26T23:08:15.51+00:00

Let's say I have 5 large wav files of the same 4 speakers. The files are too large to concatenate into one wav file. Is there a way I can run diarization on these 5 files and keep the same speaker number for the respective voice across all files?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. navba-MSFT 19,480 Reputation points Microsoft Employee
    2024-06-27T09:27:55.72+00:00

    @Annie Welcome to the Microsoft Q&A forum, and thank you for posting your query here!


    Azure AI Speech provides real-time diarization, which distinguishes between the speakers participating in a conversation. However, the speaker ID is a generic identifier that the service assigns to each participant during recognition, as it tells the speakers apart in the provided audio. More info here.


    The ConversationTranscriber API, part of the Speech SDK, combines diarization with speech-to-text to produce transcription output that carries a speaker entry for each transcribed utterance. The output is tagged as GUEST1, GUEST2, GUEST3, etc., based on the number of speakers in the audio conversation. More info here.


    However, the current Azure AI Speech Service does not seem to support maintaining the same speaker IDs across multiple files. The diarization process identifies speakers in each individual audio file separately, and the speaker IDs are assigned within the context of each individual file.

    If you need to maintain the same speaker IDs across multiple files, you might need to implement a custom solution. This could involve using voice recognition techniques to match the speakers identified in each file against a set of known speaker profiles. Alternatively, you could add logic to your application that extracts the per-file speaker ID from each result and replaces it with a consistent ID of your own.
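As a rough illustration of that custom approach, here is a minimal Python sketch. It is not part of the Speech SDK. It assumes you already have one voice embedding (a list of floats) per speaker label in each file, produced by some speaker-embedding model of your choice; that step is not shown. The names `match_speakers` and `relabel`, the `Speaker-N` naming, and the 0.7 similarity threshold are all hypothetical and would need tuning for real audio.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_speakers(file_embeddings, profiles, threshold=0.7):
    """Map one file's local labels (e.g. "GUEST1") to global speaker IDs.

    file_embeddings: {local_label: embedding} for a single file (assumed input).
    profiles: {global_id: embedding}; grows in place when a new voice appears.
    Greedy matching: each local label takes the most similar unused profile.
    """
    mapping = {}
    used = set()  # a global ID can be assigned only once per file
    for local_label, embedding in file_embeddings.items():
        best_id, best_score = None, threshold
        for global_id, reference in profiles.items():
            if global_id in used:
                continue
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_id, best_score = global_id, score
        if best_id is None:
            # No profile is similar enough: register a new global speaker.
            best_id = f"Speaker-{len(profiles) + 1}"
            profiles[best_id] = embedding
        mapping[local_label] = best_id
        used.add(best_id)
    return mapping


def relabel(segments, mapping):
    """Replace local labels in (label, text) pairs with global IDs."""
    return [(mapping.get(label, label), text) for label, text in segments]


# Toy 2-dimensional embeddings; real ones would come from an embedding model.
profiles = {}  # start empty so the Speaker-N names stay unique
file1 = {"GUEST1": [1.0, 0.0], "GUEST2": [0.0, 1.0]}
file2 = {"GUEST1": [0.1, 0.9], "GUEST2": [0.9, 0.1]}  # same voices, labels swapped

map1 = match_speakers(file1, profiles)
map2 = match_speakers(file2, profiles)
print(map1)
print(map2)
print(relabel([("GUEST1", "hello"), ("GUEST2", "hi")], map2))
```

Processing the files one at a time and reusing the same `profiles` dictionary is what keeps the IDs stable, so no concatenation of the audio is needed. Greedy matching is the simplest choice here; with many speakers or noisy embeddings, an optimal assignment method would likely be more robust.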


    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
