Hi, @Minseong
We have a preview feature that can probably handle your request.
To enable it, add the field "NBestPhonemeCount" to the JSON config, as below:
var pronAssessmentConfig = PronunciationAssessmentConfig.FromJson($"{{\"referenceText\":\"<reference text>\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"dimension\":\"Comprehensive\",\"enableMiscue\":\"False\",\"NBestPhonemeCount\":5}}");
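Escaping the JSON string by hand is easy to get wrong. Here is a minimal sketch that builds the same settings with System.Text.Json and produces a string you can pass to PronunciationAssessmentConfig.FromJson. The class and method names (PronConfigJson, BuildPronAssessmentJson) are hypothetical. The field names and values are copied from the line above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class PronConfigJson
{
    // Hypothetical helper: serializes the same settings as the hand-escaped
    // string above, including the preview "NBestPhonemeCount" field.
    public static string BuildPronAssessmentJson(string referenceText, int nBestPhonemeCount)
    {
        var settings = new Dictionary<string, object>
        {
            ["referenceText"] = referenceText,
            ["gradingSystem"] = "HundredMark",
            ["granularity"] = "Phoneme",
            ["dimension"] = "Comprehensive",
            ["enableMiscue"] = "False",
            // How many candidate phonemes to return per phoneme (preview feature).
            ["NBestPhonemeCount"] = nBestPhonemeCount,
        };
        // The serializer handles quoting and escaping of the reference text.
        return JsonSerializer.Serialize(settings);
    }
}
```

Usage would look like `PronunciationAssessmentConfig.FromJson(PronConfigJson.BuildPronAssessmentJson("good morning", 5))`.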
The "NBestPhonemeCount" field adds an "NBestPhonemes" section to the output JSON payload. For each phoneme, this section lists the phonemes the speaker most probably spoke, ranked by a score that indicates the probability.
You can then treat the top-1 candidate as the phoneme that was actually spoken.
For example:
"Words": [
  {
    "Word": "Good",
    "Offset": 500000,
    "Duration": 2700000,
    "PronunciationAssessment": {
      "AccuracyScore": 100.0,
      "ErrorType": "None"
    },
    "Syllables": [
      {
        "Syllable": "ɡuhd",
        "Offset": 500000,
        "Duration": 2700000,
        "PronunciationAssessment": {
          "AccuracyScore": 100.0
        }
      }
    ],
    "Phonemes": [
      {
        "Phoneme": "ɡ",
        "Offset": 500000,
        "Duration": 1200000,
        "PronunciationAssessment": {
          "AccuracyScore": 100.0,
          "NBestPhonemes": [
            {
              "Phoneme": "g",
              "Score": 100.0
            },
            {
              "Phoneme": "k",
              "Score": 5.0
            },
            ... // remaining n best phonemes
          ]
        }
      },
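Here is a minimal sketch of reading the top-1 candidate for each phoneme from a payload shaped like the excerpt above. It walks Words, then Phonemes, then PronunciationAssessment.NBestPhonemes, and picks the highest "Score" rather than relying on array order. The type and method names (PhonemeCheck, NBestPhonemeReader, ReadTopPhonemes) are hypothetical. The sketch also assumes you pass in the JSON object that directly contains the "Words" array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical result type: the expected phoneme and its top-1 N-best candidate.
public record PhonemeCheck(string Word, string Expected, string TopSpoken, double TopScore);

public static class NBestPhonemeReader
{
    // wordsParent is assumed to be the JSON object holding the "Words" array.
    public static List<PhonemeCheck> ReadTopPhonemes(JsonElement wordsParent)
    {
        var checks = new List<PhonemeCheck>();
        foreach (var word in wordsParent.GetProperty("Words").EnumerateArray())
        {
            string wordText = word.GetProperty("Word").GetString() ?? "";
            if (!word.TryGetProperty("Phonemes", out var phonemes)) continue;

            foreach (var phoneme in phonemes.EnumerateArray())
            {
                string expected = phoneme.GetProperty("Phoneme").GetString() ?? "";

                // Skip phonemes that carry no N-best candidates.
                if (!phoneme.TryGetProperty("PronunciationAssessment", out var pa) ||
                    !pa.TryGetProperty("NBestPhonemes", out var nbest) ||
                    nbest.GetArrayLength() == 0)
                    continue;

                // Take the highest-scoring candidate as the phoneme actually spoken.
                var top = nbest.EnumerateArray()
                    .OrderByDescending(c => c.GetProperty("Score").GetDouble())
                    .First();

                checks.Add(new PhonemeCheck(
                    wordText,
                    expected,
                    top.GetProperty("Phoneme").GetString() ?? "",
                    top.GetProperty("Score").GetDouble()));
            }
        }
        return checks;
    }
}
```

In the full recognition result, the "Words" array may sit inside a wrapper object rather than at the top level, so locate that object before calling the method. In the excerpt above, the expected phoneme is written "ɡ" (IPA script g) while the top candidate is an ASCII "g". A plain string comparison between Expected and TopSpoken would therefore report a difference for this word.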
Thanks,
Yinhe