How to prepare plain text data for speech service custom model training

hexarrior 40 Reputation points
2024-07-10T08:39:02.13+00:00

Hi, I'm trying to train my custom speech-to-text model to improve its accuracy in recognizing industry-specific (computer science) jargon.

Q1: For domain-specific terms like 'LinkedList' and 'HashMap', is it better to keep them as written or to split them into two words like 'linked list' and 'hash map'?

Q2: Also, is it better to use full sentences like 'can you explain the usage of dubbo framework', or just the term 'dubbo' on its own?

Azure AI Speech

Accepted answer
  1. santoshkc 7,865 Reputation points Microsoft Vendor
    2024-07-10T09:45:30.0933333+00:00

    Hi @hexarrior,

    Thank you for reaching out to the Microsoft Q&A forum!

    To train your Azure custom speech-to-text model to recognize industry-specific jargon, follow these guidelines:

    • Use the exact terminology as it is commonly written in the industry, e.g., 'LinkedList', 'HashMap'. This will help the model recognize these terms as single entities, which is crucial for technical jargon.
    • Incorporate jargon within sentences to provide context. For instance, use "Can you explain the usage of the Dubbo framework?" instead of just "Dubbo". This helps the model learn how the terms are used in context.
    • Provide diverse sentences that use the jargon in different contexts. This variation improves the model's ability to generalize and to recognize jargon accurately in various scenarios.

    For details on the file format and requirements, see "Plain-text data for training" in the Custom Speech documentation.
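    The guidelines above can be sketched as a small Python script that generates plain-text training sentences. This is a minimal sketch, not an official tool. The term list, the sentence templates, and the names build_sentences and write_plain_text are hypothetical. The output format (one utterance per line, UTF-8 with a byte-order mark) is our reading of the plain-text requirements, so verify it against the current Custom Speech documentation before uploading.

```python
from pathlib import Path

# Hypothetical examples: jargon kept in the form developers usually write it.
TERMS = ["LinkedList", "HashMap", "Dubbo"]

# Hypothetical templates that place each term inside a natural spoken sentence,
# varied so the model sees every term in more than one context.
TEMPLATES = [
    "Can you explain the usage of {term}?",
    "How does {term} work under the hood?",
    "What are the trade-offs of using {term}?",
    "We switched to {term} in the last release.",
]


def build_sentences(terms, templates):
    """Return one sentence per (term, template) pair, dropping exact duplicates."""
    seen = set()
    sentences = []
    for term in terms:
        for template in templates:
            sentence = template.format(term=term).strip()
            if sentence and sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    return sentences


def write_plain_text(sentences, path):
    """Write one utterance per line.

    Assumption: the training file should be UTF-8 encoded; "utf-8-sig"
    also writes a byte-order mark. Check the current docs for the exact rule.
    """
    text = "\n".join(sentences) + "\n"
    Path(path).write_text(text, encoding="utf-8-sig")
```

    For example, build_sentences(TERMS, TEMPLATES) yields 12 sentences (3 terms × 4 templates), and write_plain_text saves them to a file that can be uploaded as plain-text training data. Templates are only a starting point; real sentences taken from your own domain (meeting notes, support questions) will usually give more natural variety.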

    Hope this helps. Do let us know if you have any further queries.



