Quickstart: Server-side Audio Streaming
Important
Functionality described in this article is currently in public preview. This preview version is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Get started with using audio streams through Azure Communication Services Audio Streaming API. This quickstart assumes you're already familiar with Call Automation APIs to build an automated call routing solution.
Prerequisites
- An Azure account with an active subscription. For details, see Create an account for free.
- An Azure Communication Services resource. See Create an Azure Communication Services resource.
- A new web service application created using the Call Automation SDK.
- The latest .NET library for your operating system.
- A websocket server that can receive media streams.
Set up a websocket server
Azure Communication Services requires your server application to set up a WebSocket server to stream audio in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.
Establish a call
Establish a call and provide streaming details
MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(
    new Uri("<WEBSOCKET URL>"),
    MediaStreamingContent.Audio,
    MediaStreamingAudioChannel.Mixed,
    MediaStreamingTransport.Websocket,
    false);

var createCallOptions = new CreateCallOptions(callInvite, callbackUri)
{
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri(cognitiveServiceEndpoint) },
    MediaStreamingOptions = mediaStreamingOptions,
};

CreateCallResult createCallResult = await callAutomationClient.CreateCallAsync(createCallOptions);
Start audio streaming
How to start audio streaming:
StartMediaStreamingOptions options = new StartMediaStreamingOptions()
{
    OperationCallbackUri = new Uri(callbackUriHost),
    OperationContext = "startMediaStreamingContext"
};

await callMedia.StartMediaStreamingAsync(options);
When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services has successfully connected to your WebSocket server and streaming has started, it sends the first data packet, which contains metadata about the incoming media packets.
The metadata packet looks like this:
{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": {
        "subscriptionId": <string>, // Unique identifier for a subscription request.
        "encoding": <string>, // Only PCM is supported.
        "sampleRate": <int>, // Default: 16000.
        "channels": <int>, // Default: 1.
        "length": <int> // Default: 640.
    }
}
Stop audio streaming
How to stop audio streaming:
StopMediaStreamingOptions stopOptions = new StopMediaStreamingOptions()
{
    OperationCallbackUri = new Uri(callbackUriHost)
};

await callMedia.StopMediaStreamingAsync(stopOptions);
Handling audio streams in your websocket server
The sample below demonstrates how to listen to audio streams using your websocket server.
using System;
using System.Net;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using Newtonsoft.Json;

HttpListener httpListener = new HttpListener();
httpListener.Prefixes.Add("http://localhost:80/");
httpListener.Start();

while (true)
{
    HttpListenerContext httpListenerContext = await httpListener.GetContextAsync();
    if (httpListenerContext.Request.IsWebSocketRequest)
    {
        WebSocketContext websocketContext;
        try
        {
            websocketContext = await httpListenerContext.AcceptWebSocketAsync(subProtocol: null);
        }
        catch (Exception)
        {
            return;
        }
        WebSocket webSocket = websocketContext.WebSocket;
        try
        {
            while (webSocket.State == WebSocketState.Open || webSocket.State == WebSocketState.CloseSent)
            {
                byte[] receiveBuffer = new byte[2048];
                var cancellationToken = new CancellationTokenSource(TimeSpan.FromSeconds(60)).Token;
                WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), cancellationToken);
                if (receiveResult.MessageType != WebSocketMessageType.Close)
                {
                    // Decode only the bytes actually received in this frame.
                    var data = Encoding.UTF8.GetString(receiveBuffer, 0, receiveResult.Count);
                    try
                    {
                        // AudioBaseClass is your own model class matching the audio streaming schema.
                        var eventData = JsonConvert.DeserializeObject<AudioBaseClass>(data);
                        if (eventData != null)
                        {
                            if (eventData.kind == "AudioMetadata")
                            {
                                // Process audio metadata
                            }
                            else if (eventData.kind == "AudioData")
                            {
                                // Process audio data
                                var byteArray = eventData.audioData.data;
                                // Use the audio byte array as you want
                            }
                        }
                    }
                    catch { } // Ignore malformed packets.
                }
            }
        }
        catch (Exception) { }
    }
}
Prerequisites
- An Azure account with an active subscription. For details, see Create an account for free.
- An Azure Communication Services resource. See Create an Azure Communication Services resource.
- A new web service application created using the Call Automation SDK.
- Java Development Kit version 8 or above.
- Apache Maven.
Set up a websocket server
Azure Communication Services requires your server application to set up a WebSocket server to stream audio in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.
Establish a call
Establish a call and provide streaming details
CallInvite callInvite = new CallInvite(target, caller);
CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint(appConfig.getCognitiveServiceEndpoint());
MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(appConfig.getWebSocketUrl(), MediaStreamingTransport.WEBSOCKET, MediaStreamingContentType.AUDIO, MediaStreamingAudioChannel.UNMIXED);
mediaStreamingOptions.setStartMediaStreaming(false);
CreateCallOptions createCallOptions = new CreateCallOptions(callInvite, appConfig.getCallBackUri());
createCallOptions.setCallIntelligenceOptions(callIntelligenceOptions);
createCallOptions.setMediaStreamingOptions(mediaStreamingOptions);
Response<CreateCallResult> result = client.createCallWithResponse(createCallOptions, Context.NONE);
return result.getValue().getCallConnectionProperties().getCallConnectionId();
Start audio streaming
How to start audio streaming:
StartMediaStreamingOptions startOptions = new StartMediaStreamingOptions()
.setOperationContext("startMediaStreamingContext")
.setOperationCallbackUrl(appConfig.getBasecallbackuri());
client.getCallConnection(callConnectionId)
.getCallMedia()
.startMediaStreamingWithResponse(startOptions, Context.NONE);
When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services has successfully connected to your WebSocket server and streaming has started, it sends the first data packet, which contains metadata about the incoming media packets.
The metadata packet looks like this:
{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": {
        "subscriptionId": <string>, // Unique identifier for a subscription request.
        "encoding": <string>, // Only PCM is supported.
        "sampleRate": <int>, // Default: 16000.
        "channels": <int>, // Default: 1.
        "length": <int> // Default: 640.
    }
}
Stop audio streaming
How to stop audio streaming:
StopMediaStreamingOptions stopOptions = new StopMediaStreamingOptions()
.setOperationCallbackUrl(appConfig.getBasecallbackuri());
client.getCallConnection(callConnectionId)
.getCallMedia()
.stopMediaStreamingWithResponse(stopOptions, Context.NONE);
Handling media streams in your websocket server
The sample below demonstrates how to listen to media streams using your websocket server. Two files need to be run: App.java and WebSocketServer.java.
package com.example;

import org.glassfish.tyrus.server.Server;

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class App {
    public static void main(String[] args) {
        Server server = new Server("localhost", 8081, "/ws", null, WebSocketServer.class);
        try {
            server.start();
            System.out.println("Web socket running on port 8081...");
            System.out.println("wss://localhost:8081/ws/server");
            BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
            reader.readLine();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            server.stop();
        }
    }
}
package com.example;

import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

import com.azure.communication.callautomation.models.streaming.StreamingData;
import com.azure.communication.callautomation.models.streaming.StreamingDataParser;
import com.azure.communication.callautomation.models.streaming.media.AudioData;
import com.azure.communication.callautomation.models.streaming.media.AudioMetadata;

@ServerEndpoint("/server")
public class WebSocketServer {
    @OnMessage
    public void onMessage(String message, Session session) {
        StreamingData data = StreamingDataParser.parse(message);
        if (data instanceof AudioMetadata) {
            AudioMetadata audioMetaData = (AudioMetadata) data;
            System.out.println("----------------------------------------------------------------");
            System.out.println("SUBSCRIPTION ID:-->" + audioMetaData.getMediaSubscriptionId());
            System.out.println("ENCODING:-->" + audioMetaData.getEncoding());
            System.out.println("SAMPLE RATE:-->" + audioMetaData.getSampleRate());
            System.out.println("CHANNELS:-->" + audioMetaData.getChannels());
            System.out.println("LENGTH:-->" + audioMetaData.getLength());
            System.out.println("----------------------------------------------------------------");
        }
        if (data instanceof AudioData) {
            AudioData audioData = (AudioData) data;
            System.out.println("----------------------------------------------------------------");
            System.out.println("DATA:-->" + audioData.getData());
            System.out.println("TIMESTAMP:-->" + audioData.getTimestamp());
            System.out.println("IS SILENT:-->" + audioData.isSilent());
            System.out.println("----------------------------------------------------------------");
        }
    }
}
Prerequisites
- An Azure account with an active subscription. For details, see Create an account for free.
- An Azure Communication Services resource. See Create an Azure Communication Services resource.
- A new web service application created using the Call Automation SDK.
- Node.js LTS installation
- A websocket server that can receive media streams.
Set up a websocket server
Azure Communication Services requires your server application to set up a WebSocket server to stream audio in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.
Establish a call
Establish a call and provide streaming details
const mediaStreamingOptions: MediaStreamingOptions = {
    transportUrl: "<WEBSOCKET URL>",
    transportType: "websocket",
    contentType: "audio",
    audioChannelType: "unmixed",
    startMediaStreaming: false
}

const options: CreateCallOptions = {
    callIntelligenceOptions: { cognitiveServicesEndpoint: process.env.COGNITIVE_SERVICES_ENDPOINT },
    mediaStreamingOptions: mediaStreamingOptions
};
Start audio streaming
How to start audio streaming:
const streamingOptions: StartMediaStreamingOptions = {
    operationContext: "startMediaStreamingContext",
    operationCallbackUrl: process.env.CALLBACK_URI + "/api/callbacks"
}

await callMedia.startMediaStreaming(streamingOptions);
When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services has successfully connected to your WebSocket server and streaming has started, it sends the first data packet, which contains metadata about the incoming media packets.
The metadata packet looks like this:
{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": {
        "subscriptionId": <string>, // Unique identifier for a subscription request.
        "encoding": <string>, // Only PCM is supported.
        "sampleRate": <int>, // Default: 16000.
        "channels": <int>, // Default: 1.
        "length": <int> // Default: 640.
    }
}
Stop audio streaming
How to stop audio streaming:
const stopMediaStreamingOptions: StopMediaStreamingOptions = {
    operationCallbackUrl: process.env.CALLBACK_URI + "/api/callbacks"
}

await callMedia.stopMediaStreaming(stopMediaStreamingOptions);
Handling audio streams in your websocket server
The sample below demonstrates how to listen to audio streams using your websocket server.
import WebSocket from 'ws';

const wss = new WebSocket.Server({ port: 8081 });

wss.on('connection', (ws: WebSocket) => {
    console.log('Client connected');
    ws.on('message', (packetData: ArrayBuffer) => {
        const decoder = new TextDecoder();
        const stringJson = decoder.decode(packetData);
        const response = JSON.parse(stringJson);
        if (response.kind === 'AudioMetadata') {
            console.log("Audio Metadata");
            console.log(response.audioMetadata.subscriptionId);
            console.log(response.audioMetadata.encoding);
            console.log(response.audioMetadata.sampleRate);
            console.log(response.audioMetadata.channels);
            console.log(response.audioMetadata.length);
        }
        if (response.kind === 'AudioData') {
            console.log("Audio Data");
            console.log(response.audioData.timestamp);
            console.log(response.audioData.participantRawID);
            console.log(response.audioData.silent);
            // The audio buffer is Base64 encoded; decode it before use.
            const byteArray = Buffer.from(response.audioData.data, 'base64');
        }
    });
    ws.on('close', () => {
        console.log('Client disconnected');
    });
});

console.log('WebSocket server running on port 8081');
Prerequisites
- An Azure account with an active subscription. For details, see Create an account for free.
- An Azure Communication Services resource. See Create an Azure Communication Services resource.
- A new web service application created using the Call Automation SDK.
- Python 3.7+.
- A websocket server that can receive media streams.
Set up a websocket server
Azure Communication Services requires your server application to set up a WebSocket server to stream audio in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.
Establish a call
Establish a call and provide streaming details
media_streaming_options = MediaStreamingOptions(
    transport_url="<WEBSOCKET URL>",
    transport_type=MediaStreamingTransportType.WEBSOCKET,
    content_type=MediaStreamingContentType.AUDIO,
    audio_channel_type=MediaStreamingAudioChannelType.UNMIXED,
    start_media_streaming=False
)

call_connection_properties = call_automation_client.create_call(
    target_participant,
    CALLBACK_EVENTS_URI,
    cognitive_services_endpoint=COGNITIVE_SERVICES_ENDPOINT,
    source_caller_id_number=source_caller,
    media_streaming=media_streaming_options
)
Start audio streaming
How to start audio streaming:
call_connection_client.start_media_streaming()
When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services has successfully connected to your WebSocket server and streaming has started, it sends the first data packet, which contains metadata about the incoming media packets.
The metadata packet looks like this:
{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": {
        "subscriptionId": <string>, // Unique identifier for a subscription request.
        "encoding": <string>, // Only PCM is supported.
        "sampleRate": <int>, // Default: 16000.
        "channels": <int>, // Default: 1.
        "length": <int> // Default: 640.
    }
}
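As a quick sanity check on these defaults, the packet size and sample rate determine how much audio each data packet carries. The sketch below parses an example metadata packet and derives the packet duration; it assumes 16-bit PCM samples (2 bytes per sample), which the default values imply, and the subscription ID shown is a placeholder.

```python
import json

# Example metadata packet, matching the schema above (values are the documented defaults).
metadata_json = """
{
  "kind": "AudioMetadata",
  "audioMetadata": {
    "subscriptionId": "example-subscription-id",
    "encoding": "PCM",
    "sampleRate": 16000,
    "channels": 1,
    "length": 640
  }
}
"""

packet = json.loads(metadata_json)
if packet["kind"] == "AudioMetadata":
    meta = packet["audioMetadata"]
    bytes_per_sample = 2  # assumption: 16-bit PCM samples
    samples_per_packet = meta["length"] // (bytes_per_sample * meta["channels"])
    duration_ms = 1000 * samples_per_packet / meta["sampleRate"]
    print(f"{samples_per_packet} samples per packet, {duration_ms:.0f} ms of audio")
```

With the defaults, each 640-byte buffer holds 320 samples, or 20 ms of audio per packet.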
Stop audio streaming
How to stop audio streaming:
call_connection_client.stop_media_streaming()
Handling audio streams in your websocket server
The sample below demonstrates how to listen to audio streams using your websocket server.
import asyncio
import json

import websockets

async def handle_client(websocket, path):
    print("Client connected")
    try:
        async for message in websocket:
            packet_data = json.loads(message)
            kind = packet_data.get("kind")
            if kind == "AudioMetadata":
                print("Audio metadata:", packet_data["audioMetadata"])
            elif kind == "AudioData":
                # audioData.data is a Base64-encoded audio buffer.
                print("Audio data packet received")
    except websockets.exceptions.ConnectionClosedOK:
        print("Client disconnected")

start_server = websockets.serve(handle_client, "localhost", 8081)
print('WebSocket server running on port 8081')
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
Audio streaming schema
After sending the metadata packet, Azure Communication Services starts streaming audio media to your WebSocket server. Below is an example of the media object your server receives.
{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioData": {
        "data": <string>, // Base64-encoded audio buffer data.
        "timestamp": <string>, // In ISO 8601 format (yyyy-mm-ddThh:mm:ssZ).
        "participantRawID": <string>,
        "silent": <boolean> // Indicates whether the received audio buffer contains only silence.
    }
}
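Because the data field is Base64 encoded, decode it before writing it to a file or passing it to downstream processing. A minimal sketch of handling one such packet follows; the field names come from the schema above, while the sample PCM bytes, timestamp, and participant ID are illustrative placeholders.

```python
import base64
import json

def handle_audio_packet(raw_json: str) -> bytes:
    """Decode one AudioData packet and return its raw PCM bytes."""
    packet = json.loads(raw_json)
    if packet.get("kind") != "AudioData":
        return b""  # not an audio data packet (e.g. AudioMetadata)
    audio = packet["audioData"]
    if audio["silent"]:
        return b""  # skip silence-only buffers
    return base64.b64decode(audio["data"])

# Illustrative packet: four 16-bit PCM samples, Base64 encoded.
pcm = bytes([0, 1, 0, 2, 0, 3, 0, 4])
packet = json.dumps({
    "kind": "AudioData",
    "audioData": {
        "data": base64.b64encode(pcm).decode(),
        "timestamp": "2024-01-01T00:00:00Z",
        "participantRawID": "example-participant",
        "silent": False,
    },
})
print(len(handle_audio_packet(packet)))  # prints 8: all PCM bytes recovered
```

In a real server, you would accumulate the decoded bytes per participant (using participantRawID when streaming unmixed audio) instead of printing their length.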
Clean up resources
If you want to clean up and remove a Communication Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it. Learn more about cleaning up resources.
Next steps
- Learn more about Audio Streaming.
- Learn more about Call Automation and its features.
- Learn more about Play action.
- Learn more about Recognize action.