Abstract
The MPEG-4 standard supports the composition of natural or synthetic video with facial animation. Based on this standard, an animated face can be inserted into natural or synthetic video to create new virtual working environments such as virtual meetings or virtual collaborative workspaces. For these applications, audio-to-visual conversion techniques can generate a talking face synchronized with the voice. In this paper, we address the audio-to-visual conversion problem by introducing a novel Hidden Markov Model Inversion (HMMI) method. In training audio-visual HMMs, the model parameters {λav} are chosen to optimize a criterion such as maximum likelihood. In inverting audio-visual HMMs, the visual parameters that optimize a criterion are found from the given speech and the trained model parameters {λav}. With the proposed HMMI technique, an animated talking face can be synchronized with audio and driven realistically. A virtual conference system named VIRTUAL-FACE, which combines the HMMI technique with the MPEG-4 standard, is introduced to demonstrate the role of HMMI in applications of MPEG-4 facial animation.
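To make the inversion idea concrete, the sketch below illustrates one common formulation of audio-to-visual HMM inversion: train a joint audio-visual HMM with diagonal covariances, compute state posteriors from the audio stream alone via forward-backward, then recover the visual trajectory as a posterior- and precision-weighted average of the per-state visual means. All model parameters here are toy numbers chosen for illustration, not values from the paper, and the single-Gaussian, two-state setup is a simplifying assumption.

```python
import numpy as np

# Toy joint audio-visual HMM (illustrative numbers, not from the paper):
# 2 states, each with a 1-D audio Gaussian and a 1-D visual Gaussian.
# Diagonal covariance means the joint density factors per dimension.
pi = np.array([0.5, 0.5])            # initial state probabilities
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])           # state transition matrix
mu_a  = np.array([0.0, 3.0])         # audio mean per state
var_a = np.array([1.0, 1.0])         # audio variance per state
mu_v  = np.array([-1.0, 1.0])        # visual mean per state (e.g. mouth opening)
var_v = np.array([0.5, 0.5])         # visual variance per state

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def state_posteriors(audio):
    """Forward-backward over the audio dimension only: gamma[t, i] = P(q_t = i | audio)."""
    T, N = len(audio), len(pi)
    B = gauss(audio[:, None], mu_a[None, :], var_a[None, :])  # emission likelihoods
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()               # scale to avoid underflow
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def invert(audio):
    """HMM inversion: the visual trajectory maximizing the expected joint
    log-likelihood given the audio.  With diagonal covariances the optimum
    is a posterior- and precision-weighted average of the visual means."""
    gamma = state_posteriors(audio)
    w = gamma / var_v[None, :]                   # precision-weighted posteriors
    return (w @ mu_v) / w.sum(axis=1)

audio = np.array([0.1, -0.2, 0.3, 2.8, 3.1, 2.9])  # low then high audio values
visual = invert(audio)
# Early frames track state 0's visual mean (negative); later frames state 1's (positive).
```

In a full system the audio features would be cepstral vectors, the visual features MPEG-4 facial animation parameters, and the per-state densities Gaussian mixtures re-estimated jointly with the inversion, but the posterior-weighted structure of the estimate stays the same.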
Cite this article
Choi, K., Luo, Y. & Hwang, JN. Hidden Markov Model Inversion for Audio-to-Visual Conversion in an MPEG-4 Facial Animation System. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 29, 51–61 (2001). https://doi.org/10.1023/A:1011171430700