Quickstart: Server-side Audio Streaming

Important

Functionality described in this article is currently in public preview. This preview version is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Get started with audio streams through the Azure Communication Services Audio Streaming API. This quickstart assumes you're already familiar with Call Automation APIs to build an automated call routing solution.

Prerequisites

Set up a websocket server

Azure Communication Services requires your server application to host a WebSocket server that receives the audio stream in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.

Establish a call

Establish a call and provide streaming details

MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(
    new Uri("<WEBSOCKET URL>"),
    MediaStreamingContent.Audio,
    MediaStreamingAudioChannel.Mixed,
    MediaStreamingTransport.Websocket,
    false);

var createCallOptions = new CreateCallOptions(callInvite, callbackUri)
{
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri(cognitiveServiceEndpoint) },
    MediaStreamingOptions = mediaStreamingOptions,
};

CreateCallResult createCallResult = await callAutomationClient.CreateCallAsync(createCallOptions);

Start audio streaming

How to start audio streaming:

StartMediaStreamingOptions options = new StartMediaStreamingOptions()
{
    OperationCallbackUri = new Uri(callbackUriHost),
    OperationContext = "startMediaStreamingContext"
};
await callMedia.StartMediaStreamingAsync(options);

When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services successfully connects to your WebSocket server and streaming is started, it will send through the first data packet, which contains metadata about the incoming media packets.

The metadata packet will look like this:

{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": { 
        "subscriptionId": <string>, // unique identifier for a subscription request 
        "encoding":<string>, // PCM only supported 
        "sampleRate": <int>, // 16000 default 
        "channels": <int>, // 1 default 
        "length": <int> // 640 default 
    } 
} 

Stop audio streaming

How to stop audio streaming:

StopMediaStreamingOptions stopOptions = new StopMediaStreamingOptions()
{
    OperationCallbackUri = new Uri(callbackUriHost)
};
await callMedia.StopMediaStreamingAsync(stopOptions);

Handling audio streams in your websocket server

The sample below demonstrates how to listen to audio streams using your websocket server.

using System;
using System.Net;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using Newtonsoft.Json;

HttpListener httpListener = new HttpListener();
httpListener.Prefixes.Add("http://localhost:80/");
httpListener.Start();

while (true)
{
    HttpListenerContext httpListenerContext = await httpListener.GetContextAsync();
    if (httpListenerContext.Request.IsWebSocketRequest)
    {
        WebSocketContext websocketContext;
        try
        {
            websocketContext = await httpListenerContext.AcceptWebSocketAsync(subProtocol: null);
        }
        catch (Exception)
        {
            // The WebSocket handshake failed; keep listening for the next connection.
            continue;
        }
        WebSocket webSocket = websocketContext.WebSocket;
        byte[] receiveBuffer = new byte[2048];
        try
        {
            while (webSocket.State == WebSocketState.Open || webSocket.State == WebSocketState.CloseSent)
            {
                // Give up on a read if nothing arrives within 60 seconds.
                var cancellationToken = new CancellationTokenSource(TimeSpan.FromSeconds(60)).Token;
                WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), cancellationToken);
                if (receiveResult.MessageType != WebSocketMessageType.Close)
                {
                    // Decode only the bytes received in this read.
                    var data = Encoding.UTF8.GetString(receiveBuffer, 0, receiveResult.Count);
                    try
                    {
                        // AudioBaseClass is assumed to be your own type that models the packet schema.
                        var eventData = JsonConvert.DeserializeObject<AudioBaseClass>(data);
                        if (eventData != null)
                        {
                            if (eventData.kind == "AudioMetadata")
                            {
                                // Process audio metadata
                            }
                            else if (eventData.kind == "AudioData")
                            {
                                // Process audio data
                                var byteArray = eventData.audioData.data;
                                // Use the audio byteArray as you want
                            }
                        }
                    }
                    catch (JsonException)
                    {
                        // Skip packets that aren't valid JSON.
                    }
                }
            }
        }
        catch (Exception)
        {
            // The connection dropped or the read timed out; wait for the next connection.
        }
    }
}

Prerequisites

Set up a websocket server

Azure Communication Services requires your server application to host a WebSocket server that receives the audio stream in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.

Establish a call

Establish a call and provide streaming details

CallInvite callInvite = new CallInvite(target, caller);

CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint(appConfig.getCognitiveServiceEndpoint());
MediaStreamingOptions mediaStreamingOptions = new MediaStreamingOptions(appConfig.getWebSocketUrl(), MediaStreamingTransport.WEBSOCKET, MediaStreamingContentType.AUDIO, MediaStreamingAudioChannel.UNMIXED);
mediaStreamingOptions.setStartMediaStreaming(false);

CreateCallOptions createCallOptions = new CreateCallOptions(callInvite, appConfig.getCallBackUri());
createCallOptions.setCallIntelligenceOptions(callIntelligenceOptions);
createCallOptions.setMediaStreamingOptions(mediaStreamingOptions);

Response<CreateCallResult> result = client.createCallWithResponse(createCallOptions, Context.NONE);
return result.getValue().getCallConnectionProperties().getCallConnectionId();

Start audio streaming

How to start audio streaming:

StartMediaStreamingOptions startOptions = new StartMediaStreamingOptions()
    .setOperationContext("startMediaStreamingContext")
    .setOperationCallbackUrl(appConfig.getBasecallbackuri());
client.getCallConnection(callConnectionId)
    .getCallMedia()
    .startMediaStreamingWithResponse(startOptions, Context.NONE);

When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services successfully connects to your WebSocket server and streaming is started, it will send through the first data packet, which contains metadata about the incoming media packets.

The metadata packet will look like this:

{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": { 
        "subscriptionId": <string>, // unique identifier for a subscription request 
        "encoding":<string>, // PCM only supported 
        "sampleRate": <int>, // 16000 default 
        "channels": <int>, // 1 default 
        "length": <int> // 640 default 
    } 
} 

Stop audio streaming

How to stop audio streaming:

StopMediaStreamingOptions stopOptions = new StopMediaStreamingOptions()
    .setOperationCallbackUrl(appConfig.getBasecallbackuri());
client.getCallConnection(callConnectionId)
    .getCallMedia()
    .stopMediaStreamingWithResponse(stopOptions, Context.NONE);

Handling audio streams in your websocket server

The sample below demonstrates how to listen to audio streams using your websocket server. It consists of two files: App.java, which starts the server, and WebSocketServer.java, which handles the incoming messages.

package com.example;

import org.glassfish.tyrus.server.Server;

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class App {
    public static void main(String[] args) {

        Server server = new Server("localhost", 8081, "/ws", null, WebSocketServer.class);

        try {
            server.start();
            System.out.println("Web socket running on port 8081...");
            System.out.println("ws://localhost:8081/ws/server");
            BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
            reader.readLine();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            server.stop();
        }
    }
}

WebSocketServer.java:

package com.example;

import javax.websocket.OnMessage;
import javax.websocket.Session;
import javax.websocket.server.ServerEndpoint;

import com.azure.communication.callautomation.models.streaming.StreamingData;
import com.azure.communication.callautomation.models.streaming.StreamingDataParser;
import com.azure.communication.callautomation.models.streaming.media.AudioData;
import com.azure.communication.callautomation.models.streaming.media.AudioMetadata;

@ServerEndpoint("/server")
public class WebSocketServer {
    @OnMessage
    public void onMessage(String message, Session session) {

        // System.out.println("Received message: " + message);

        StreamingData data = StreamingDataParser.parse(message);

        if (data instanceof AudioMetadata) {
            AudioMetadata audioMetaData = (AudioMetadata) data;
            System.out.println("----------------------------------------------------------------");
            System.out.println("SUBSCRIPTION ID:-->" + audioMetaData.getMediaSubscriptionId());
            System.out.println("ENCODING:-->" + audioMetaData.getEncoding());
            System.out.println("SAMPLE RATE:-->" + audioMetaData.getSampleRate());
            System.out.println("CHANNELS:-->" + audioMetaData.getChannels());
            System.out.println("LENGTH:-->" + audioMetaData.getLength());
            System.out.println("----------------------------------------------------------------");
        }
        if (data instanceof AudioData) {
            System.out.println("----------------------------------------------------------------");
            AudioData audioData = (AudioData) data;
            System.out.println("DATA:-->" + audioData.getData());
            System.out.println("TIMESTAMP:-->" + audioData.getTimestamp());
            // System.out.println("PARTICIPANT:-->" + audioData.getParticipant().getRawId()
            // != null
            // ? audioData.getParticipant().getRawId()
            // : "");
            System.out.println("IS SILENT:-->" + audioData.isSilent());
            System.out.println("----------------------------------------------------------------");
        }
    }
}

Prerequisites

Set up a websocket server

Azure Communication Services requires your server application to host a WebSocket server that receives the audio stream in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.

Establish a call

Establish a call and provide streaming details

const mediaStreamingOptions: MediaStreamingOptions = {
    transportUrl: "<WEBSOCKET URL>",
    transportType: "websocket",
    contentType: "audio",
    audioChannelType: "unmixed",
    startMediaStreaming: false
};
const options: CreateCallOptions = {
    callIntelligenceOptions: { cognitiveServicesEndpoint: process.env.COGNITIVE_SERVICES_ENDPOINT },
    mediaStreamingOptions: mediaStreamingOptions
};

Start audio streaming

How to start audio streaming:

const streamingOptions: StartMediaStreamingOptions = {
    operationContext: "startMediaStreamingContext",
    operationCallbackUrl: process.env.CALLBACK_URI + "/api/callbacks"
};
await callMedia.startMediaStreaming(streamingOptions);

When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services successfully connects to your WebSocket server and streaming is started, it will send through the first data packet, which contains metadata about the incoming media packets.

The metadata packet will look like this:

{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": { 
        "subscriptionId": <string>, // unique identifier for a subscription request 
        "encoding":<string>, // PCM only supported 
        "sampleRate": <int>, // 16000 default 
        "channels": <int>, // 1 default 
        "length": <int> // 640 default 
    } 
} 

Stop audio streaming

How to stop audio streaming:

const stopMediaStreamingOptions: StopMediaStreamingOptions = {
    operationCallbackUrl: process.env.CALLBACK_URI + "/api/callbacks"
};
await callMedia.stopMediaStreaming(stopMediaStreamingOptions);

Handling audio streams in your websocket server

The sample below demonstrates how to listen to audio streams using your websocket server.

import WebSocket from 'ws'; 
import { streamingData } from '@azure/communication-call-automation/src/utli/streamingDataParser' 
const wss = new WebSocket.Server({ port: 8081 }); 

wss.on('connection', (ws: WebSocket) => { 
    console.log('Client connected'); 
    ws.on('message', (packetData: ArrayBuffer) => { 
        const decoder = new TextDecoder(); 
        const stringJson = decoder.decode(packetData); 
        console.log("STRING JSON=>--" + stringJson) 

        //var response = streamingData(stringJson); 

        var response = streamingData(packetData); 
        if ('locale' in response) { 
            console.log("Transcription Metadata") 
            console.log(response.callConnectionId); 
            console.log(response.correlationId); 
            console.log(response.locale); 
            console.log(response.subscriptionId); 
        } 
        if ('text' in response) { 
            console.log("Transcription Data") 
            console.log(response.text); 
            console.log(response.format); 
            console.log(response.confidence); 
            console.log(response.offset); 
            console.log(response.duration); 
            console.log(response.resultStatus); 
            if ('phoneNumber' in response.participant) { 
                console.log(response.participant.phoneNumber); 
            } 
            response.words.forEach(element => { 
                console.log(element.text) 
                console.log(element.duration) 
                console.log(element.offset) 
            }); 
        } 
    }); 

    ws.on('close', () => { 
        console.log('Client disconnected'); 
    }); 
}); 

// function processData(data: ArrayBuffer) { 
//  const byteArray = new Uint8Array(data); 
// } 

console.log('WebSocket server running on port 8081'); 

Prerequisites

Set up a websocket server

Azure Communication Services requires your server application to host a WebSocket server that receives the audio stream in real time. WebSocket is a standardized protocol that provides a full-duplex communication channel over a single TCP connection. You can optionally use Azure Web Apps to create an application that receives audio streams over a WebSocket connection. Follow this quickstart.

Establish a call

Establish a call and provide streaming details

media_streaming_options = MediaStreamingOptions(
    transport_url="wss://e063-2409-40c2-4004-eced-9487-4dfb-b0e4-10fb.ngrok-free.app",
    transport_type=MediaStreamingTransportType.WEBSOCKET,
    content_type=MediaStreamingContentType.AUDIO,
    audio_channel_type=MediaStreamingAudioChannelType.UNMIXED,
    start_media_streaming=False
)

call_connection_properties = call_automation_client.create_call(
    target_participant,
    CALLBACK_EVENTS_URI,
    cognitive_services_endpoint=COGNITIVE_SERVICES_ENDPOINT,
    source_caller_id_number=source_caller,
    media_streaming=media_streaming_options
)

Start audio streaming

How to start audio streaming:

call_connection_client.start_media_streaming() 

When Azure Communication Services receives the URL for your WebSocket server, it creates a connection to it. Once Azure Communication Services successfully connects to your WebSocket server and streaming is started, it will send through the first data packet, which contains metadata about the incoming media packets.

The metadata packet will look like this:

{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioMetadata": { 
        "subscriptionId": <string>, // unique identifier for a subscription request 
        "encoding":<string>, // PCM only supported 
        "sampleRate": <int>, // 16000 default 
        "channels": <int>, // 1 default 
        "length": <int> // 640 default 
    } 
} 
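
As a rough illustration of how a handler might use these fields, the sketch below parses a metadata packet and derives the duration of each audio chunk. It assumes 16-bit PCM samples (2 bytes per sample), which the packet itself doesn't state. The function name and the sample values are hypothetical and aren't part of the Call Automation SDK.

```python
import json

# Assumption: 16-bit PCM. The metadata packet does not state the sample width.
BYTES_PER_SAMPLE = 2


def chunk_duration_ms(metadata_message: str) -> float:
    """Return the duration, in milliseconds, of one audio chunk described by a metadata packet."""
    packet = json.loads(metadata_message)
    if packet.get("kind") != "AudioMetadata":
        raise ValueError("not an AudioMetadata packet")
    meta = packet["audioMetadata"]
    bytes_per_second = meta["sampleRate"] * meta["channels"] * BYTES_PER_SAMPLE
    return meta["length"] * 1000 / bytes_per_second


# Hypothetical packet built from the default values listed in the schema above.
sample = json.dumps({
    "kind": "AudioMetadata",
    "audioMetadata": {
        "subscriptionId": "example-subscription-id",
        "encoding": "PCM",
        "sampleRate": 16000,
        "channels": 1,
        "length": 640,
    },
})

print(chunk_duration_ms(sample))
```

Under the 16-bit assumption, the default values correspond to 20 milliseconds of audio per chunk.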

Stop audio streaming

How to stop audio streaming:

call_connection_client.stop_media_streaming() 

Handling audio streams in your websocket server

The sample below demonstrates how to listen to audio streams using your websocket server.

import asyncio
import json
import websockets


async def handle_client(websocket, path=None):
    # `path` is optional so the handler works whether the library passes one argument or two.
    print("Client connected")
    try:
        async for message in websocket:
            # Each message is a JSON packet; "kind" tells you how to handle it.
            packet_data = json.loads(message)
            kind = packet_data.get("kind")
            if kind == "AudioMetadata":
                print("Audio metadata:-->", packet_data["audioMetadata"])
            elif kind == "AudioData":
                print("Audio data:-->", packet_data["audioData"])
    except websockets.exceptions.ConnectionClosedOK:
        print("Client disconnected")


async def main():
    async with websockets.serve(handle_client, "localhost", 8081):
        print('WebSocket server running on port 8081')
        await asyncio.Future()  # Run until the process is stopped.


asyncio.run(main())

Audio streaming schema

After sending the metadata packet, Azure Communication Services starts streaming audio media to your WebSocket server. The following example shows the media object your server receives.

{
    "kind": <string>, // What kind of data this is, e.g. AudioMetadata, AudioData.
    "audioData":{
        "data": <string>, // Base64 Encoded audio buffer data
        "timestamp": <string>, // In ISO 8601 format (yyyy-mm-ddThh:mm:ssZ) 
        "participantRawID": <string>, 
        "silent": <boolean> // Indicates if the received audio buffer contains only silence.
    }
}
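
One possible way to consume this object is sketched below: it decodes the Base64 payload into raw PCM bytes, skips packets marked as silent, and wraps the result in a WAV container using only the Python standard library. The function names and sample values are hypothetical, and the 16-bit sample width is an assumption, because the schema doesn't state it.

```python
import base64
import io
import json
import wave
from typing import Optional


def decode_audio_packet(message: str) -> Optional[bytes]:
    """Return raw PCM bytes from an AudioData packet, or None for other kinds and for silence."""
    packet = json.loads(message)
    if packet.get("kind") != "AudioData":
        return None
    audio = packet["audioData"]
    if audio.get("silent"):
        return None
    return base64.b64decode(audio["data"])


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the file contents."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # Assumption: 16-bit PCM samples.
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# Hypothetical packet: 640 bytes of zeroed PCM, Base64-encoded.
example = json.dumps({
    "kind": "AudioData",
    "audioData": {
        "data": base64.b64encode(bytes(640)).decode("ascii"),
        "timestamp": "2024-01-01T00:00:00Z",
        "participantRawID": "example-participant",
        "silent": False,
    },
})

pcm = decode_audio_packet(example)
wav_bytes = pcm_to_wav(pcm)
print(len(pcm), wav_bytes[:4])
```

In a real handler you would likely append the decoded bytes from many packets, keyed by participantRawID when the audio is unmixed, before writing a file or passing the audio to another service.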

Clean up resources

If you want to clean up and remove a Communication Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it. Learn more about cleaning up resources.

Next steps