Viseme Event time offsets in Custom Neural Voice are weird.

Question

The Viseme Event time offsets in Custom Neural Voice look strange to me (ko-KR in use).
The listings below show the Viseme Events output for speech synthesized from the same text with two different voice names (InJoonNeural and OHW_Neural).

Below is the output for InJoonNeural, one of the existing Neural Voices.

InJoonNeural

Viseme : 0, Time : 50ms
Viseme : 20, Time : 50ms
Viseme : 4, Time : 350ms
Viseme : 21, Time : 450ms
Viseme : 2, Time : 525ms
Viseme : 19, Time : 625ms
Viseme : 12, Time : 650ms
Viseme : 4, Time : 650ms
Viseme : 6, Time : 775ms
Viseme : 8, Time : 806ms
Viseme : 0, Time : 50ms

Below is the output for the Custom Neural Voice, OHW_Neural.

OHW_Neural

Viseme : 0, Time : 50ms
Viseme : 20, Time : 100ms
Viseme : 4, Time : 225ms
Viseme : 21, Time : 325ms
Viseme : 2, Time : 375ms
Viseme : 19, Time : 437ms
Viseme : 4, Time : 650ms
Viseme : 6, Time : 700ms
Viseme : 8, Time : 893ms
Viseme : 0, Time : 1087ms

In InJoonNeural, the gap between No. 6 and No. 8 is 25 ms, while the gap between the corresponding events in OHW_Neural (No. 6 and No. 7) is 213 ms (650 ms - 437 ms), which is a large difference.
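
To make the comparison easier to reproduce, here is a minimal sketch that computes the gap between two numbered events from the listings above. The helper name `gap_between` and the tuple layout are my own choices for illustration, not part of the Speech SDK; the values are copied as they appear in the listings.

```python
# Viseme events as (viseme_id, offset_ms), copied from the listings above.
injoon_neural = [(0, 50), (20, 50), (4, 350), (21, 450), (2, 525), (19, 625),
                 (12, 650), (4, 650), (6, 775), (8, 806), (0, 50)]
ohw_neural = [(0, 50), (20, 100), (4, 225), (21, 325), (2, 375),
              (19, 437), (4, 650), (6, 700), (8, 893), (0, 1087)]


def gap_between(events, first_no, second_no):
    """Return the offset difference in ms between two events.

    Event numbers are 1-based, matching "No. 6", "No. 7" in the text.
    """
    return events[second_no - 1][1] - events[first_no - 1][1]


def consecutive_gaps(events):
    """Return the gap in ms between each pair of consecutive events."""
    offsets = [offset for _, offset in events]
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print("InJoonNeural No. 6 -> No. 8:", gap_between(injoon_neural, 6, 8), "ms")
    print("OHW_Neural   No. 6 -> No. 7:", gap_between(ohw_neural, 6, 7), "ms")
    print("OHW_Neural consecutive gaps:", consecutive_gaps(ohw_neural))
```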

When I compared the events with the directly synthesized WAV file, I found that Viseme Events are still being output when the speech is almost finished and no Viseme Event should appear (from No. 7 of OHW_Neural onward).
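
A rough sketch of how this check against the WAV file could be automated is shown below. It uses only the Python standard library. The 1100 ms silent WAV and the 250 ms tail window are placeholders I made up so that the example runs on its own; in practice the real synthesized file would be passed to `wav_duration_ms` instead, and the tail window would need tuning.

```python
import io
import wave

# OHW_Neural viseme events as (viseme_id, offset_ms), copied from the listing above.
ohw_neural = [(0, 50), (20, 100), (4, 225), (21, 325), (2, 375),
              (19, 437), (4, 650), (6, 700), (8, 893), (0, 1087)]


def wav_duration_ms(wav_file):
    """Return the duration in ms of a PCM WAV file (path or file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return 1000.0 * wav.getnframes() / wav.getframerate()


def events_in_tail(events, duration_ms, tail_ms):
    """Return events whose offset lies in the last tail_ms of the audio or later."""
    return [(vid, t) for vid, t in events if t >= duration_ms - tail_ms]


def make_silent_wav(duration_ms, rate=16000):
    """Build an in-memory silent mono 16-bit WAV (stand-in for the real file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * duration_ms // 1000))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Placeholder audio: replace with the path of the actual synthesized WAV.
    duration = wav_duration_ms(make_silent_wav(1100))
    print("audio duration (ms):", duration)
    print("events near the end:", events_in_tail(ohw_neural, duration, tail_ms=250))
```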

Is there any way to fix the problems described above?
I also wonder whether this would improve if the pronunciation in the training data used for the Custom Neural Voice were corrected.

Accepted Answer

Hi, following up. Different voices have different speaking rates, so viseme times can't be compared directly between voices. For the issue where Viseme Events are still being output when the speech is almost finished (from No. 7 of OHW_Neural), please redeploy your endpoint. Redeploying the endpoint will pick up the latest code.
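
To illustrate the point about speaking rates, here is a minimal sketch that rescales each voice's offsets to a fraction of that voice's largest offset, so that the relative position of each viseme can be inspected instead of the raw milliseconds. This normalization is only an assumption made for illustration; it is not something the service does, and `normalize_offsets` is a hypothetical helper name.

```python
# Viseme events as (viseme_id, offset_ms), copied from the question.
injoon_neural = [(0, 50), (20, 50), (4, 350), (21, 450), (2, 525), (19, 625),
                 (12, 650), (4, 650), (6, 775), (8, 806), (0, 50)]
ohw_neural = [(0, 50), (20, 100), (4, 225), (21, 325), (2, 375),
              (19, 437), (4, 650), (6, 700), (8, 893), (0, 1087)]


def normalize_offsets(events):
    """Rescale offsets to the range 0..1 relative to the largest offset.

    This removes the overall speaking-rate difference between voices,
    but not differences in how individual sounds are timed.
    """
    longest = max(offset for _, offset in events)
    return [(vid, round(offset / longest, 3)) for vid, offset in events]


if __name__ == "__main__":
    for name, events in (("InJoonNeural", injoon_neural),
                         ("OHW_Neural", ohw_neural)):
        print(name)
        for vid, fraction in normalize_offsets(events):
            print(f"  Viseme : {vid}, Relative position : {fraction}")
```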

--- *Kindly Accept Answer if the information helps. Thanks.*
