A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion

Xie, Lei; Meng, Helen; Liu, Zhi-Qiang

doi:10.1007/11939993_64

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series: International Symposium on Chinese Spoken Language Processing

Abstract

This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings of a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. We then map this transcription to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. Given the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm generates the mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to evaluate the proposed talking face objectively. Results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
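The translingual mapping step can be pictured with a minimal sketch. The abstract does not give the authors' mapping table or alignment rule, so everything below is an assumption made for illustration: the `Segment` type, the two-entry `SYLLABLE_TO_PHONEMES` table (tone-less romanized syllables mapped to ARPAbet-style English phoneme symbols), and the choice to split each syllable's duration evenly among its mapped phonemes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """A labelled time span in seconds (hypothetical type for this sketch)."""
    label: str
    start: float
    end: float


# Hypothetical symbol-mapping table, for illustration only.
# It is NOT the table used in the paper.
SYLLABLE_TO_PHONEMES: Dict[str, List[str]] = {
    "nei": ["n", "ey"],
    "hou": ["hh", "ow"],
}


def map_transcription(syllables: List[Segment],
                      table: Dict[str, List[str]]) -> List[Segment]:
    """Convert timed Cantonese syllables into timed English phonemes.

    Symbol mapping: look each syllable up in `table`.
    Time alignment (assumed rule): divide the syllable's duration
    evenly among the phonemes it maps to.
    """
    phonemes: List[Segment] = []
    for syl in syllables:
        targets = table[syl.label]  # raises KeyError for unmapped syllables
        step = (syl.end - syl.start) / len(targets)
        for i, ph in enumerate(targets):
            start = syl.start + i * step
            # Pin the last phoneme to the syllable boundary so that
            # rounding cannot open a gap before the next syllable.
            is_last = i == len(targets) - 1
            end = syl.end if is_last else syl.start + (i + 1) * step
            phonemes.append(Segment(ph, start, end))
    return phonemes
```

For example, calling `map_transcription([Segment("nei", 0.0, 0.4), Segment("hou", 0.4, 1.0)], SYLLABLE_TO_PHONEMES)` yields four phoneme segments that cover the same 0.0 to 1.0 second span as the input syllables. A real system would presumably derive phoneme boundaries from the audio rather than from an even split.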

Author information

Authors and Affiliations

Human-Computer Communications Laboratory, Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong, Hong Kong
Lei Xie & Helen Meng
School of Creative Media, City University of Hong Kong, Hong Kong
Zhi-Qiang Liu

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

Cite this paper

Xie, L., Meng, H., Liu, ZQ. (2006). A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science (LNAI), vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_64

DOI: https://doi.org/10.1007/11939993_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)