ISCA Archive Interspeech 2012

Speaker-adaptive visual speech synthesis in the HMM-framework

Dietmar Schabus, Michael Pucher, Gregor Hofer

In this paper we apply speaker-adaptive and speaker-dependent training of hidden Markov models (HMMs) to visual speech synthesis. In speaker-dependent training we use data from one speaker to train visual and acoustic HMMs. In speaker-adaptive training, a visual background model (average voice) is first trained from multiple speakers. This background model is then adapted to a new target speaker using (a small amount of) data from the target speaker. This concept has been successfully applied to acoustic speech synthesis. This paper demonstrates how model adaptation is applied to the visual domain to synthesize animations of talking faces. A perceptual evaluation is performed, showing that speaker-adaptive modeling outperforms speaker-dependent models for small amounts of training/adaptation data.
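To illustrate the adaptation idea described in the abstract, the following is a minimal sketch in Python/NumPy of MAP-style adaptation of HMM state mean vectors, where a small amount of target-speaker data pulls the average-voice parameters toward the new speaker. This is not the authors' HTS-based implementation; the function name, the hard state alignment, and the prior weight tau are assumptions made purely for illustration.

import numpy as np

def map_adapt_means(avg_means, target_obs, state_assignments, tau=10.0):
    """MAP-style adaptation of HMM state means (illustrative only).

    avg_means:         (S, D) mean vectors of the average-voice model
    target_obs:        (T, D) observation vectors from the target speaker
    state_assignments: (T,)   hard state alignment for each observation
    tau:               prior weight; larger values keep the result closer
                       to the average voice when target data is scarce
    """
    adapted = avg_means.copy()
    for s in range(avg_means.shape[0]):
        frames = target_obs[state_assignments == s]
        n = len(frames)
        if n == 0:
            continue  # no target data for this state: keep the average-voice mean
        sample_mean = frames.mean(axis=0)
        # Interpolate between the prior (average voice) and the target-speaker statistics
        adapted[s] = (tau * avg_means[s] + n * sample_mean) / (tau + n)
    return adapted

# Toy usage with random data (5 states, 40-dimensional visual feature vectors)
rng = np.random.default_rng(0)
avg = rng.normal(size=(5, 40))
obs = rng.normal(size=(200, 40))
align = rng.integers(0, 5, size=200)
adapted = map_adapt_means(avg, obs, align)

With little target data (small n), the adapted means stay close to the average voice; with more data, they move toward the target speaker, which is the behavior the abstract's evaluation compares against purely speaker-dependent training.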

Index Terms: Visual speech synthesis, speaker-adaptive training, facial animation


doi: 10.21437/Interspeech.2012-291

Cite as: Schabus, D., Pucher, M., Hofer, G. (2012) Speaker-adaptive visual speech synthesis in the HMM-framework. Proc. Interspeech 2012, 979-982, doi: 10.21437/Interspeech.2012-291

@inproceedings{schabus12_interspeech,
  author={Dietmar Schabus and Michael Pucher and Gregor Hofer},
  title={{Speaker-adaptive visual speech synthesis in the HMM-framework}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={979--982},
  doi={10.21437/Interspeech.2012-291}
}