Azure AI Speech

1 answer

Azure Text To Speech docker container throws an exception with viseme

I'm using the Azure Text to Speech docker image (mcr.microsoft.com/azure-cognitive-services/speechservices/neural-text-to-speech:3.3.0-amd64-en-us-jennyneural). I'm passing it SSML through the dotnet SDK. When asking for viseme (via <mstts:viseme…

asked

Jon Peterson 26

answered

dupammi 7,955 Microsoft Vendor

0 answers

Is there a way to make speech service transcription faster (diarization with speakers differentiated)?

Currently, transcription seems to take half the audio duration for WAV and run at a 1:1 ratio for MP4 with GStreamer. From this post, it seems half the time for a WAV file is the…

asked

kk 0

commented

santoshkc 6,380 Microsoft Vendor

0 answers

How to assign operation permissions to a resource

Hello, I am new to Azure and I want to use it to convert text to speech. When I create the resource, open Speech Studio, and try to start the service, the system raises an error saying "You don't have operation permissions to [New],…

asked

Jingxiong Wang 0

commented

santoshkc 6,380 Microsoft Vendor

1 answer

Random Words Detected by Azure Speech Recognizer in Silence

Hello Azure Support Team, I am currently using the Azure Speech Service to recognize speech inputs in my application. The setup of my speech recognizer is as follows: export const createSpeechRecognizer = () => { const speechRecognitionConfig =…

asked

Abdul Subhan 5

edited the question

Ryan Hill 26,866 Microsoft Employee

0 answers

Speech-to-Text batch transcribe API in germanycentralwest doesn't work

Last Friday (May 31 2024) we started getting the following errors on all transcripts sent to the batch transcription API on our speech resource in…

asked

Matej the Mete 20

edited the question

Ryan Hill 26,866 Microsoft Employee

0 answers

Azure pronunciation assessment time limit

I am using Azure pronunciation assessment to assess an audio file, but the assessment covers only the first minute of the speech and does not assess the rest of the audio. This is my code: const sdk =…

asked

Iheb Jandoubi 5

edited the question

Ryan Hill 26,866 Microsoft Employee

1 answer

Azure pronunciation assessment input video

Can I give Azure pronunciation assessment a video input?

asked

Iheb Jandoubi 5

edited the question

Ryan Hill 26,866 Microsoft Employee

1 answer

Azure pronunciation assessment async assessment

I am using the Azure Speech recognizer SDK to do the pronunciation assessment of an audio file. The problem is that when the speech is in French, the results are always low and not expressive: const language = await detectSingleSpeechLanguage(text) …

asked

Iheb Jandoubi 5

edited the question

Ryan Hill 26,866 Microsoft Employee

0 answers

Error while trying to train a 202240228 Whisper Large v2 baseline model

When trying to train a custom speech model using a dataset containing an audio file and its transcript, the model failed to train due to an internal error. Can anyone provide any insights on how to troubleshoot this issue?

asked

Engineering 0

edited the question

Ryan Hill 26,866 Microsoft Employee

0 answers

How to create a dataset for Azure custom speech using spx (speechCLI)

I am using the following command for creating a custom speech dataset in my Azure Speech service: spx csr dataset create --api-version v3.1 --kind "Acoustic" --name "My Custom Speech" --description "My Acoustic Dataset…

asked

Mikel Broström Zalba 20

commented

VasaviLankipalle-MSFT 15,836

1 answer

Azure Cognitive Services Speech: Unable to get Custom Translator model results from speech translation code

In test C# code that I created based on the speech translation code in the following sample (“Using custom translation in speech translation”), I’m having trouble getting Custom Translator model translation results. The code just returns a cancellation…

asked

Hirai, Tetu 0

edited a comment

Hirai, Tetu 0

1 answer

How to synchronize real world events happening while speech recognition is happening with individual spoken words

I am trying to synchronize real-world events that are occurring during live streaming of speech to Azure speech recognition services (e.g., eye gaze shifts, hardware device interactions, etc.). I note the time when I start speech recognition and record…

asked

Mark Miller (DevExpress) 0

commented

Mark Miller (DevExpress) 0
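
For the synchronization question above, one possible approach is sketched below. It relies on the Speech SDK reporting offsets and durations in ticks of 100 nanoseconds. Everything else is an assumption made for illustration: `audio_start` is taken to be the wall-clock time at which the first audio sample was captured, offsets are taken to be relative to that moment, and the `words` list is hand-written sample data whose keys only mirror the word-level timing fields. The helper names `ticks_to_timedelta` and `word_at` are hypothetical, not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

TICKS_PER_MICROSECOND = 10  # 1 tick = 100 ns, so 10 ticks = 1 microsecond


def ticks_to_timedelta(ticks: int) -> timedelta:
    """Convert a 100-ns tick count to a timedelta (microsecond resolution)."""
    return timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def word_at(event_time: datetime, audio_start: datetime, words: list):
    """Return the word being spoken at event_time, or None if no word matches.

    Assumes each word's Offset is measured from audio_start.
    """
    elapsed = event_time - audio_start
    for w in words:
        start = ticks_to_timedelta(w["Offset"])
        end = start + ticks_to_timedelta(w["Duration"])
        if start <= elapsed < end:
            return w["Word"]
    return None


# Hypothetical sample data, not real recognizer output.
audio_start = datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
words = [
    {"Word": "hello", "Offset": 5_000_000, "Duration": 4_000_000},
    {"Word": "world", "Offset": 10_000_000, "Duration": 5_000_000},
]

# An external event (for example, a gaze shift) logged 1.2 s after audio start.
gaze_shift = audio_start + timedelta(seconds=1.2)
print(word_at(gaze_shift, audio_start, words))
```

In practice the hard part is pinning down `audio_start`: any delay between starting the recognizer and the first captured sample shifts every word by the same amount, so that delay would need to be measured or calibrated separately.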

0 answers

Can my web app use a GPU for AI capabilities or will I need to use an Azure VM?

I am running a web app which I deployed through Docker. The web app works perfectly except for one important detail: the whisperx AI model I have takes forever to run a transcription (think hours). I run the same AI function on a "T4 GPU" using…

asked

Henrik Vlijter 0

commented

dupammi 7,955 Microsoft Vendor

1 answer

Is each voice in the voice gallery based on a clone of one specific natural person or is it synthetic?

I would like to understand whether: Each voice in the voice gallery is based on a clone of one specific natural person? Voices are synthetic (similar to those from 11Labs Voice Design) that cannot be traced back to an individual person? Thank you!

asked

mpsb 0

answered

santoshkc 6,380 Microsoft Vendor

0 answers

Microsoft: fix captioning by Speech Studio

The captioning functionality in the Speech Studio is an utter failure. This is typical output: I encourage Microsoft to implement the functionality that allows the user to specify the number of lines of text (typically one or two), and the maximum…

asked

Roy Jensen 20

commented

navba-MSFT 19,495 Microsoft Employee

1 answer

SpeakSsmlAsync is cancelled, but SpeakTextAsync is successful

I am trying out the Azure AI service to convert text to speech from a C# WPF application. My calls through SpeakTextAsync are successful, but my calls through SpeakSsmlAsync are returned with the Reason = Cancelled. I am on the free tier for South…

asked

One More Henry 0

commented

navba-MSFT 19,495 Microsoft Employee

0 answers

What are the HW or sound limitations for the echo cancellation algorithm in SpeechSDK

Hi, I'm having some issues with echo cancellation on my device, and I'm trying to use the Speech SDK. When I analyzed the sound that I record with the microphone, higher harmonics seem to be present which are 24 dB lower than the primary…

asked

Faris Lemes 50

commented

navba-MSFT 19,495 Microsoft Employee

1 answer

Create a basic voice-interactive dashboard

Hello Team, I need to create a basic voice-interactive dashboard using Azure Cognitive Services such as the Speech service, CLU (Conversational Language Understanding), and Power BI. Please also suggest any other way to achieve this. It would be really helpful.

asked

Vijayakumar Elumalai 105

commented

Vijayakumar Elumalai 105

1 answer

As a student, how can I use an Azure Speech resource?

I have a student subscription and want to create an Azure Speech resource, but there's a problem. Is it because of a student subscription limitation, and what can I do to use the Azure Speech service?

asked

Aleksei Zhukov 0

edited an answer

YutongTie-MSFT 47,991

0 answers

How can I set permissions on the resources?

Hello, I want to upload a text file to Speech Studio, but the system raised an error. Can anyone help me fix this and assign a proper role to myself? I already set my role to Cognitive Services User.

asked

Jingxiong Wang 0

commented

YutongTie-MSFT 47,991


1,506 questions with Azure AI Speech tags
