How to fix an issue where my 3D Blendshapes do not align with the audio.

Ananchai Mankhong 0 Reputation points
2024-07-08T07:21:23.1533333+00:00

I'm trying to apply viseme 3D blend shapes to drive my 3D avatar.

When the result is returned, the audio starts playing before the frames from the response (FrameIndex and BlendShapes) are applied.

I receive event.animation and use it to set the weight for each blend shape name. However, for the first events it can't be parsed as JSON, and I'm not sure why. It looks like event.animation isn't returned for those events; only event.audioOffset and event.visemeID are set, and event.animation is empty. Below you can see the issue.

Error parsing JSON: Error Domain=NSCocoaErrorDomain Code=3840 "Unable to parse empty data." UserInfo={NSDebugDescription=Unable to parse empty data.}
Error parsing JSON: Error Domain=NSCocoaErrorDomain Code=3840 "Unable to parse empty data." UserInfo={NSDebugDescription=Unable to parse empty data.}
..
..
..
Error parsing JSON: Error Domain=NSCocoaErrorDomain Code=3840 "Unable to parse empty data." UserInfo={NSDebugDescription=Unable to parse empty data.}

And then I got:

index = 0 blendshapeName: eyeBlinkLeft value: 0.171
Morpher weight = 0.171 for blendshapeName: eyeBlinkLeft
index = 1 blendshapeName: eyeLookDownLeft value: 0.164
Morpher weight = 0.164 for blendshapeName: eyeLookDownLeft
..
..
..
index = 54 blendshapeName: rightEyeRoll value: 0.0
Morpher weight = 0.0 for blendshapeName: rightEyeRoll

After that, my 3D avatar's blend shapes animate, but they don't align with the audio.
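As for the JSON errors at the start, I assume I could just skip events whose payload is empty before parsing. This is only a sketch of what I mean, assuming event.animation comes back as an empty string (not nil) when an event carries no frames:

Swift

// Skip viseme events that carry no animation payload (only audioOffset / visemeID),
// so the parser is never handed an empty string.
synthesizer.addVisemeReceivedEventHandler { (synthesizer, event) in
    let animation = event.animation
    guard !animation.isEmpty else { return }
    self.mapBlendshapesToModel(jsonString: animation, node: self.contentNode)
}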

I read this description in your documentation:

“ Each viseme event includes a series of frames in the Animation SDK property. These frames are grouped to best align the facial positions with the audio. Your 3D engine should render each group of BlendShapes frames immediately before the corresponding audio chunk. The FrameIndex value indicates how many frames preceded the current list of frames.

The output json looks like the following sample. Each frame within BlendShapes contains an array of 55 facial positions represented as decimal values between 0 to 1.

JSON

{
    "FrameIndex":0,
    "BlendShapes":[
        [0.021,0.321,...,0.258],
        [0.045,0.234,...,0.288],
        ...
    ]
}

The decimal values in the json response are in the same order as described in the following facial positions table. The order of BlendShapes is as follows. “
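Based on that description, this is how I read each payload into a typed value. It's just a sketch using Codable; the struct and function names are my own:

Swift

import Foundation

// My own container for the documented payload: "FrameIndex" plus "BlendShapes",
// an array of frames, each holding 55 values between 0 and 1.
struct VisemeAnimationChunk: Decodable {
    let frameIndex: Int
    let blendShapes: [[Double]]

    enum CodingKeys: String, CodingKey {
        case frameIndex = "FrameIndex"
        case blendShapes = "BlendShapes"
    }
}

func decodeAnimationChunk(_ jsonString: String) -> VisemeAnimationChunk? {
    guard let data = jsonString.data(using: .utf8), !data.isEmpty else { return nil }
    return try? JSONDecoder().decode(VisemeAnimationChunk.self, from: data)
}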

I thought I could simply parse the JSON from event.animation right inside the addVisemeReceivedEventHandler closure and apply it immediately.

This is part of my code. Could you help me improve it or fix this issue?

Thank you very much.

 // Part of my class; sub, region, and contentNode are stored properties on it.
 // Requires import MicrosoftCognitiveServicesSpeech and import SceneKit.
 func synthesisToSpeaker() {
        guard let subscriptionKey = sub, let region = region else {
            print("Speech key and region are not set.")
            return
        }
        
        var speechConfig: SPXSpeechConfiguration?
        do {
            try speechConfig = SPXSpeechConfiguration(subscription: subscriptionKey, region: region)
        } catch {
            print("Error creating speech configuration: \(error)")
            return
        }
        
        speechConfig?.speechSynthesisVoiceName = "en-US-AvaMultilingualNeural"
        speechConfig?.setSpeechSynthesisOutputFormat(.raw16Khz16BitMonoPcm)
        
        guard let synthesizer = try? SPXSpeechSynthesizer(speechConfig!) else {
            print("Error creating speech synthesizer.")
            return
        }
        
        let ssml = """
            <speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis'
                   xmlns:mstts='http://www.w3.org/2001/mstts'>
                <voice name='en-US-CoraNeural'>
                    <mstts:viseme type='FacialExpression'/>
                    Hello World, May I help you?
                </voice>
            </speak>
            """
        
        // Subscribe to viseme received event
        synthesizer.addVisemeReceivedEventHandler { (synthesizer, event) in
            self.mapBlendshapesToModel(jsonString: event.animation,
                                       node: self.contentNode)
           //print("\(event.animation)")
        }
        
        do {
            let result = try synthesizer.speakSsml(ssml)
            
            switch result.reason {
            case .recognizingSpeech:
                print("Synthesis recognizingSpeech")
            case .recognizedSpeech:
                print("Synthesis recognizedSpeech")
            case .synthesizingAudioCompleted:
                print("Synthesis synthesizingAudioCompleted")
            default:
                print("Synthesis failed: \(result.description)")
            }
        } catch {
            debugPrint("speakSsml failed")
        }
    }

func mapBlendshapesToModel(jsonString: String, node: SCNNode?) {
        guard let jsonData = jsonString.data(using: .utf8) else {
            print("Invalid JSON Data")
            return
        }
        
        guard let node = node else {
            print("Node is nil")
            return
        }
        
        do {
            let json = try JSONSerialization.jsonObject(with: jsonData, options: [])
            if let dictionary = json as? [String: Any] {
                if let frameIndex = dictionary["FrameIndex"] as? Int,
                   let blendShapes = dictionary["BlendShapes"] as? [[Double]] {
                    //setup my 3d
                }
            }
        } catch {
            print("Error parsing JSON: \(error)")
        }
    }
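One idea I'm considering, instead of applying the weights immediately inside the callback, is to buffer each chunk together with its audio offset and apply the frames from a timer driven by the playback clock. This is only a rough sketch of that idea, reusing the VisemeAnimationChunk struct from the sketch above; the 60 fps frame duration and the 100-ns tick conversion for audioOffset are my assumptions:

Swift

import SceneKit

final class BlendShapeFrameQueue {
    struct TimedFrame {
        let time: TimeInterval    // seconds from the start of the utterance
        let weights: [Double]     // 55 blend shape values, in the documented order
    }

    private var frames: [TimedFrame] = []
    private let frameDuration = 1.0 / 60.0    // assumption: animation frames are ~60 fps

    // Called from the viseme handler; audioOffset is assumed to be in 100-ns ticks.
    func enqueue(_ chunk: VisemeAnimationChunk, audioOffsetTicks: UInt64) {
        let chunkStart = Double(audioOffsetTicks) / 10_000_000.0
        for (i, weights) in chunk.blendShapes.enumerated() {
            frames.append(TimedFrame(time: chunkStart + Double(i) * frameDuration,
                                     weights: weights))
        }
    }

    // Called from a CADisplayLink or timer synced to audio playback, on the main thread.
    func applyFrames(upTo playbackTime: TimeInterval, to node: SCNNode) {
        while let frame = frames.first, frame.time <= playbackTime {
            frames.removeFirst()
            for (index, value) in frame.weights.enumerated() {
                node.morpher?.setWeight(CGFloat(value), forTargetAt: index)
            }
        }
    }
}

Does that match what you describe with "render each group of BlendShapes frames immediately before the corresponding audio chunk", or am I misreading it?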
Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


Comments have been turned off.
