Is batch transcription available in Microsoft Azure Speech to text (Cognitive Services) through the Java SDK or a Java REST API?

Ganesh P 40 Reputation points
2024-07-10T15:10:29.04+00:00

Speech to text for embedded speech (microphone input) is supported in Java, but I want to implement batch transcription with Microsoft Azure Cognitive Services in Java. Is there Java SDK or Java REST API support for batch transcription? If so, please share the GitHub link.

Thanks & Regards,

Ganesh P

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

2 answers

  1. VasaviLankipalle-MSFT 17,471 Reputation points
    2024-07-10T22:07:43.7066667+00:00

    Hello @Ganesh P, thanks for using the Microsoft Q&A platform.

    As mentioned in the documentation, the Speech to text REST API and Speech CLI support batch transcription.

    The sample code available on GitHub for batch transcription covers Python, Node.js, and C# clients that call the batch transcription REST API: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch/python

    Unfortunately, Java sample code related to this is not available.
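
    Because batch transcription is exposed through the REST API, it can in principle be called from Java with the JDK's built-in `java.net.http` classes. The following is only a minimal sketch, not an official sample: the class and method names are hypothetical, the v3.1 endpoint shape and the JSON fields (`contentUrls`, `locale`, `displayName`) are assumptions that should be checked against the current REST API reference, and the request is built but not sent.

    ```java
    import java.net.URI;
    import java.net.http.HttpRequest;
    import java.util.List;
    import java.util.stream.Collectors;

    // Hypothetical helper: builds the POST request that would create a batch transcription job.
    final class BatchTranscriptionRequest {

        // Builds (but does not send) the request. The endpoint format is an assumption.
        static HttpRequest build(String region, String key, List<String> audioUrls, String locale) {
            String endpoint = String.format(
                    "https://%s.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions", region);
            return HttpRequest.newBuilder()
                    .uri(URI.create(endpoint))
                    .header("Ocp-Apim-Subscription-Key", key)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(jsonBody(audioUrls, locale)))
                    .build();
        }

        // Assembles the JSON body by hand so the sketch needs no third-party JSON library.
        static String jsonBody(List<String> audioUrls, String locale) {
            String urls = audioUrls.stream()
                    .map(u -> "\"" + escape(u) + "\"")
                    .collect(Collectors.joining(","));
            return "{\"contentUrls\":[" + urls + "],"
                    + "\"locale\":\"" + escape(locale) + "\","
                    + "\"displayName\":\"Batch transcription from Java\"}";
        }

        // Escapes backslashes and double quotes; enough for URLs and locale codes.
        static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"");
        }
    }
    ```

    The request returned by `build` could then be passed to `HttpClient.send`. In a real application a JSON library would be a safer way to produce the body than string concatenation.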


  2. Ganesh P 40 Reputation points
    2024-07-12T04:23:41.83+00:00

    package audioTranslation;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URI;
    import java.net.URL;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    import org.json.JSONArray;
    import org.json.JSONObject;

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ArrayNode;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class MultipleAudioTranscription {

    // Your Azure Speech to text subscription key (placeholder)
    private static final String SUBSCRIPTION_KEY = "xxxxx";

    // Region of the Speech resource
    private static final String REGION = "southeastasia";

    // Public URLs of the audio files in cloud storage (S3 here; I think any cloud storage works)
    private static final String AUDIO_URL1 = "audiofile1publicURLfromtheCloudStorageS3";
    private static final String AUDIO_URL2 = "audiofile2publicURLfromtheCloudStorageS3";

    private static final String TRANSCRIPTIONS_URL = String.format("https://%s.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions", REGION);

    // Directory where the transcribed text files are saved
    private static final String transcribedTextDownloadingPath = "E:\\GTA-Application-Docs\\AudioTranslation\\1\\transcribedOutput\\";

    private static List<String> audioUrls = new ArrayList<>();
    
    public static void main(String[] args) throws Exception {

        audioUrls.add(AUDIO_URL1);
        audioUrls.add(AUDIO_URL2);

        HttpClient client = HttpClient.newHttpClient();
        ObjectMapper mapper = new ObjectMapper();

        // Build the JSON request body
        ObjectNode body = mapper.createObjectNode();
        ArrayNode contentUrlsArray = body.putArray("contentUrls");
        for (String url : audioUrls) {
            contentUrlsArray.add(url);
        }
        body.put("locale", "en-US");
        body.put("displayName", "Multiple audio transcription");

        ObjectNode properties = body.putObject("properties");
        properties.put("wordLevelTimestampsEnabled", false);
        properties.put("punctuationMode", "DictatedAndAutomatic");
        properties.put("profanityFilterMode", "Masked");

        // Step 1: Create the transcription job
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(TRANSCRIPTIONS_URL))
                .header("Ocp-Apim-Subscription-Key", SUBSCRIPTION_KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body.toString()))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 201 && response.statusCode() != 202) {
            throw new Exception("Failed to create transcription job: " + response.statusCode() + " - " + response.body());
        }

        // Prefer the Location header; fall back to the "self" link in the response body
        String statusUrl = response.headers().firstValue("Location").orElse(null);
        if (statusUrl == null) {
            JsonNode jsonResponse = mapper.readTree(response.body());
            statusUrl = jsonResponse.get("self").asText();
        }
        System.out.println("Transcription job created. Check status at: " + statusUrl);

        // Step 2: Periodically check the status of the transcription job
        while (true) {
            JsonNode statusResponse = checkTranscriptionStatus(client, statusUrl);
            // checkTranscriptionStatus returns null on a non-200 response; fail with a clear message
            if (statusResponse == null) {
                throw new IOException("Could not read transcription status from " + statusUrl);
            }

            String status = statusResponse.get("status").asText();
            System.out.println("Transcription status: " + status);

            if ("Succeeded".equals(status)) {
                System.out.println("Transcription succeeded!");
                List<String> transcribedTexts = getTranscribedTexts(statusUrl, SUBSCRIPTION_KEY);
                for (int i = 0; i < transcribedTexts.size(); i++) {
                    System.out.println("Transcription for Audio " + (i + 1) + ": " + transcribedTexts.get(i));
                    saveTranscriptionToFile(transcribedTexts.get(i), transcribedTextDownloadingPath + "transcription_" + (i + 1) + ".txt");
                }
                break;
            } else if ("Failed".equals(status)) {
                System.out.println("Transcription failed.");
                if (statusResponse.has("error")) {
                    System.out.println("Error details: " + statusResponse.get("error").toString());
                }
                break;
            }

            // Wait 30 seconds before polling again
            Thread.sleep(30000);
        }
    }
    
    // Fetches the transcription job as JSON; returns null when the response is not 200
    private static JsonNode checkTranscriptionStatus(HttpClient client, String statusUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(statusUrl))
                .header("Ocp-Apim-Subscription-Key", SUBSCRIPTION_KEY)
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 200) {
            ObjectMapper mapper = new ObjectMapper();
            return mapper.readTree(response.body());
        } else {
            System.out.println("Error checking transcription status: " + response.statusCode());
            return null;
        }
    }
    
    private static List<String> getTranscribedTexts(String statusUrl, String subscriptionKey) throws Exception {
        List<String> transcribedTexts = new ArrayList<>();
        String filesUrl = statusUrl + "/files";

        // List the result files of the transcription job
        HttpURLConnection connection = (HttpURLConnection) new URL(filesUrl).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("Ocp-Apim-Subscription-Key", subscriptionKey);

        int responseCode = connection.getResponseCode();
        if (responseCode != 200) {
            System.out.println("Error retrieving transcription files: " + responseCode);
            return transcribedTexts;
        }

        JSONObject jsonResponse = new JSONObject(readBody(connection));
        JSONArray valuesArray = jsonResponse.getJSONArray("values");

        for (int i = 0; i < valuesArray.length(); i++) {
            JSONObject fileObject = valuesArray.getJSONObject(i);
            if (!"Transcription".equals(fileObject.getString("kind"))) {
                continue;
            }

            // Download the transcription file from its contentUrl
            String transcriptionFileUrl = fileObject.getJSONObject("links").getString("contentUrl");
            HttpURLConnection transcriptionConnection = (HttpURLConnection) new URL(transcriptionFileUrl).openConnection();
            transcriptionConnection.setRequestMethod("GET");
            if (transcriptionConnection.getResponseCode() != 200) {
                continue;
            }

            JSONObject transcriptionContent = new JSONObject(readBody(transcriptionConnection));
            if (transcriptionContent.has("combinedRecognizedPhrases")) {
                JSONArray combinedRecognizedPhrases = transcriptionContent.getJSONArray("combinedRecognizedPhrases");
                // Only the first entry is used; loop over the array to collect every entry instead
                if (combinedRecognizedPhrases.length() > 0) {
                    transcribedTexts.add(combinedRecognizedPhrases.getJSONObject(0).getString("display").trim());
                }
            }
        }

        return transcribedTexts;
    }

    // Reads the whole response body of a connection as a UTF-8 string and closes the stream
    private static String readBody(HttpURLConnection connection) throws IOException {
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
        }
        return response.toString();
    }
    
    // Writes one transcription to a UTF-8 text file
    private static void saveTranscriptionToFile(String transcription, String fileName) {
        try {
            Path filePath = Paths.get(fileName);
            Files.write(filePath, transcription.getBytes(StandardCharsets.UTF_8));
            System.out.println("Transcription saved to: " + filePath.toAbsolutePath());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    

    }

    Hi team, I tried the approach above for batch transcription and it produced the expected result. If anybody knows a better or more optimized solution, please post it here so that we can all improve.
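
    One possible improvement: the polling loop in the code above runs until the job reports "Succeeded" or "Failed", so it never stops if the job stays in another state. A bounded poll is sketched below using only the JDK. The class name, the method name, and the "TimedOut" label are hypothetical, and the status supplier stands in for whatever call reads the `status` field of the job.

    ```java
    import java.time.Duration;
    import java.util.function.Supplier;

    // Hypothetical helper: polls a status source with a fixed interval and an overall timeout.
    final class TranscriptionPoller {

        // Returns "Succeeded" or "Failed" once the supplier reports it,
        // or "TimedOut" (a label invented for this sketch) when the deadline would be exceeded.
        static String pollUntilDone(Supplier<String> statusSupplier, Duration interval, Duration timeout)
                throws InterruptedException {
            long deadline = System.nanoTime() + timeout.toNanos();
            while (true) {
                String status = statusSupplier.get();
                // A null status (for example after a failed status request) is treated as "not done yet"
                if ("Succeeded".equals(status) || "Failed".equals(status)) {
                    return status;
                }
                // Stop if sleeping for another interval would pass the deadline
                if (deadline - System.nanoTime() < interval.toNanos()) {
                    return "TimedOut";
                }
                Thread.sleep(interval.toMillis());
            }
        }
    }
    ```

    In the code above, the supplier could wrap `checkTranscriptionStatus` and return the `status` text, with an interval of 30 seconds and a timeout chosen for the expected audio length.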

    Thanks & Regards,

    Ganesh
