@prashantnigam-6347 Yes, the text-to-speech service supports other audio formats too. The supported formats are listed in the reference for each SDK. For example, if you are using the Python SDK, they are the members of the `SpeechSynthesisOutputFormat` enumeration.
You can set the format on your speech config:
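```python
import azure.cognitiveservices.speech as speechsdk
speech_key, service_region = "YOUR_SUBSCRIPTION_KEY", "YOUR_REGION"  # placeholders
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
# Request MP3 output (16 kHz, 32 kbps, mono)
speech_config.set_speech_synthesis_output_format(speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3)
file_name = "outputaudio.mp3"
file_config = speechsdk.audio.AudioOutputConfig(filename=file_name)
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=file_config)
# Synthesize the text and write the audio to outputaudio.mp3
result = speech_synthesizer.speak_text_async("my voice is my passport verify me").get()
```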
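To check that the output file really contains MP3 data (rather than, say, RIFF/WAV audio because the format was never set), you can look at its first few bytes. This is a minimal sketch using only the Python standard library; `sniff_audio_container` and `sniff_audio_file` are hypothetical helper names, and the check only recognizes a few common signatures (an ID3v2 tag or an MPEG Layer III frame header for MP3, `RIFF`/`WAVE` for WAV, `OggS` for Ogg) rather than validating the whole file.

```python
def sniff_audio_container(data: bytes) -> str:
    """Guess the audio container from its leading bytes (hypothetical helper).

    Only well-known signatures are checked; the rest of the file is not validated.
    """
    if data[:3] == b"ID3":
        return "mp3"  # MP3 preceded by an ID3v2 tag
    # MPEG audio frame: 11 sync bits set, and layer bits == 01 (Layer III).
    # The layer check keeps AAC ADTS streams (layer bits 00) from matching.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and (data[1] & 0x06) == 0x02:
        return "mp3"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"  # RIFF/WAVE container, as used by the PCM output formats
    if data[:4] == b"OggS":
        return "ogg"  # Ogg container, e.g. Opus output
    return "unknown"


def sniff_audio_file(path: str) -> str:
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_audio_container(f.read(12))
```

For example, once synthesis has finished, `sniff_audio_file("outputaudio.mp3")` would be expected to return `"mp3"`; a result of `"wav"` suggests the output format was not applied.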
The samples in the Speech SDK repository show how to set the output format in your own application.
If you are using the REST API, set the format with the `X-Microsoft-OutputFormat` header. For example:
```shell
curl --location --request POST 'https://INSERT_REGION_HERE.tts.speech.microsoft.com/cognitiveservices/v1' \
--header 'Ocp-Apim-Subscription-Key: INSERT_SUBSCRIPTION_KEY_HERE' \
--header 'Content-Type: application/ssml+xml' \
--header 'X-Microsoft-OutputFormat: audio-16khz-128kbitrate-mono-mp3' \
--header 'User-Agent: curl' \
--data-raw '<speak version='\''1.0'\'' xml:lang='\''en-US'\''>
<voice xml:lang='\''en-US'\'' xml:gender='\''Female'\'' name='\''en-US-JennyNeural'\''>
my voice is my passport verify me
</voice>
</speak>' > output.mp3
```
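If you would rather call the REST endpoint from Python than from curl, the same request can be built with the standard library. This is a minimal sketch, not an official client: `build_ssml`, `build_tts_request` and `synthesize_to_file` are hypothetical helper names, the region and key are placeholders you supply, and the SSML is a trimmed version of the one in the curl command (without `xml:gender`). The input text is XML-escaped so characters such as `&` or `<` cannot break the SSML.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document (hypothetical helper)."""
    return (
        f"<speak version='1.0' xml:lang={quoteattr(lang)}>"
        f"<voice xml:lang={quoteattr(lang)} name={quoteattr(voice)}>"
        f"{escape(text)}"  # escape &, < and > in the spoken text
        "</voice></speak>"
    )


def build_tts_request(region: str, key: str, ssml: str,
                      output_format: str = "audio-16khz-128kbitrate-mono-mp3") -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command.

    region and key are placeholders supplied by the caller.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,  # selects the audio format
        "User-Agent": "python-urllib",
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"), headers=headers, method="POST")


def synthesize_to_file(req: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

For example, `synthesize_to_file(build_tts_request("INSERT_REGION_HERE", "INSERT_SUBSCRIPTION_KEY_HERE", build_ssml("my voice is my passport verify me")), "output.mp3")` should write the MP3 returned by the service. Building the request separately from sending it makes the URL and headers easy to inspect without network access.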