How to receive a real-time audio stream over WebSocket in Spring Boot with the Speech SDK

김동윤
2024-06-09T09:41:21.25+00:00

Hello.

This is really driving me crazy.

I am sending an audio stream from the web client to the server.

The server must convert the stream to text using the SDK.

However, the WAV-format stream does not appear to be arriving at the server from the client.
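If the frames do start arriving, there is one more thing worth checking: the SDK's push stream expects raw PCM samples, while a WAV blob begins with a 44-byte RIFF header. A small helper along these lines (a sketch that assumes the canonical 44-byte header with no extra chunks) can strip it before the bytes are pushed:

```java
import java.util.Arrays;

public class WavUtil {

    // Size of the canonical PCM WAV (RIFF) header. Real files can carry
    // extra chunks, so this is a simplification.
    private static final int RIFF_HEADER_SIZE = 44;

    /** Returns the raw PCM payload, skipping a leading RIFF header if present. */
    public static byte[] stripRiffHeader(byte[] chunk) {
        if (chunk.length >= RIFF_HEADER_SIZE
                && chunk[0] == 'R' && chunk[1] == 'I'
                && chunk[2] == 'F' && chunk[3] == 'F') {
            return Arrays.copyOfRange(chunk, RIFF_HEADER_SIZE, chunk.length);
        }
        return chunk; // already raw PCM (e.g. a mid-stream chunk)
    }
}
```

Only the first frame of a WAV stream carries the header; later frames are raw PCM, which the helper passes through unchanged.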

I cannot find any example of using a WebSocket with the Speech SDK in Java anywhere.

Please let me know how I can send the live stream to the server over the socket and process it.

The client is React.js and the server is Spring Boot.

Here is my source code:
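For context, the handler shown below never receives a connection unless it is registered with Spring's WebSocket support. A minimal registration sketch (the `/audio` path and the open-origins setting are assumptions, not something from the original post):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.config.annotation.EnableWebSocket;
import org.springframework.web.socket.config.annotation.WebSocketConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketHandlerRegistry;

@Configuration
@EnableWebSocket
public class WebSocketConfig implements WebSocketConfigurer {

    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        // "/audio" is an assumed endpoint; it must match the URL the
        // React client opens, e.g. new WebSocket("ws://host:port/audio").
        registry.addHandler(new WebSocketHandler(), "/audio")
                .setAllowedOrigins("*"); // tighten for production
    }
}
```

The React client then sends each recorded chunk as a binary WebSocket message, which Spring delivers to `handleBinaryMessage`.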

import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;
import org.springframework.web.socket.BinaryMessage;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.handler.BinaryWebSocketHandler;

import java.nio.ByteBuffer;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WebSocketHandler extends BinaryWebSocketHandler {

    private final ExecutorService executorService = Executors.newSingleThreadExecutor();

    @Override
    protected void handleBinaryMessage(WebSocketSession session, BinaryMessage message) {
        ByteBuffer byteBuffer = message.getPayload();
        byte[] audioBytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(audioBytes);
//        System.out.println(Arrays.toString(audioBytes));
        recognizeSpeech(audioBytes);
    }

    @Override
    public void afterConnectionEstablished(WebSocketSession session) throws Exception {
        System.out.println("socket open");
        super.afterConnectionEstablished(session);
    }

    @Override
    public void afterConnectionClosed(WebSocketSession session, CloseStatus status) throws Exception {
        super.afterConnectionClosed(session, status);
        System.out.println("socket finish");
    }

    private void recognizeSpeech(byte[] audioBytes) {
        executorService.submit(() -> {
            SpeechConfig speechConfig = ~~~~
            speechConfig.setSpeechRecognitionLanguage("ko-KR");
            // 16 kHz, 16-bit, mono PCM
            PushAudioInputStream pushStream = AudioInputStream.createPushStream(
                    AudioStreamFormat.getWaveFormat(16000L, (short) 16, (short) 1, AudioStreamWaveFormat.PCM));
            AudioConfig audioConfig = AudioConfig.fromStreamInput(pushStream);
            SpeechRecognizer recognizer = new SpeechRecognizer(speechConfig, audioConfig);

            pushStream.write(audioBytes);

            try {
                SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
                System.out.println(result);
                if (result.getReason() == ResultReason.RecognizedSpeech) {
                    System.out.println("Recognized: " + result.getText());
                } else {
                    System.out.println("Error recognizing speech: " + result.getReason());
                }
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            } finally {
                pushStream.close();
                recognizer.close();
            }
        });
    }
}
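For reference, the code above creates a new recognizer and calls `recognizeOnceAsync` for every WebSocket frame, so each recognition attempt sees only one small chunk of audio. A common alternative is to keep one push stream and one recognizer per session and use continuous recognition. A sketch of that pattern (untested; the key/region placeholders and the per-session bookkeeping are assumptions):

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.*;
import org.springframework.web.socket.BinaryMessage;
import org.springframework.web.socket.CloseStatus;
import org.springframework.web.socket.WebSocketSession;
import org.springframework.web.socket.handler.BinaryWebSocketHandler;

import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StreamingSpeechHandler extends BinaryWebSocketHandler {

    // One push stream and one recognizer per WebSocket session.
    private final Map<String, PushAudioInputStream> streams = new ConcurrentHashMap<>();
    private final Map<String, SpeechRecognizer> recognizers = new ConcurrentHashMap<>();

    @Override
    public void afterConnectionEstablished(WebSocketSession session) throws Exception {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("<key>", "<region>"); // placeholders
        speechConfig.setSpeechRecognitionLanguage("ko-KR");

        // 16 kHz, 16-bit, mono PCM -- must match what the client actually sends.
        PushAudioInputStream pushStream = AudioInputStream.createPushStream(
                AudioStreamFormat.getWaveFormatPCM(16000L, (short) 16, (short) 1));
        SpeechRecognizer recognizer = new SpeechRecognizer(
                speechConfig, AudioConfig.fromStreamInput(pushStream));

        // Results arrive via events while audio keeps streaming in.
        recognizer.recognized.addEventListener((s, e) -> {
            if (e.getResult().getReason() == ResultReason.RecognizedSpeech) {
                System.out.println("Recognized: " + e.getResult().getText());
            }
        });
        recognizer.startContinuousRecognitionAsync().get();

        streams.put(session.getId(), pushStream);
        recognizers.put(session.getId(), recognizer);
    }

    @Override
    protected void handleBinaryMessage(WebSocketSession session, BinaryMessage message) {
        ByteBuffer payload = message.getPayload();
        byte[] audioBytes = new byte[payload.remaining()];
        payload.get(audioBytes);
        PushAudioInputStream stream = streams.get(session.getId());
        if (stream != null) {
            stream.write(audioBytes); // feed the chunk; recognition runs in the background
        }
    }

    @Override
    public void afterConnectionClosed(WebSocketSession session, CloseStatus status) throws Exception {
        PushAudioInputStream stream = streams.remove(session.getId());
        if (stream != null) {
            stream.close(); // signals end of audio to the recognizer
        }
        SpeechRecognizer recognizer = recognizers.remove(session.getId());
        if (recognizer != null) {
            recognizer.stopContinuousRecognitionAsync().get();
            recognizer.close();
        }
    }
}
```

With this shape the recognizer accumulates audio across frames instead of being torn down after each one, which is what a live stream needs.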


Thank you.

Azure AI Speech