A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion

  • Conference paper
Chinese Spoken Language Processing (ISCSLP 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Abstract

This paper proposes a novel approach towards a video-realistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. Then we map it to an English phoneme transcription via a translingual mapping scheme that involves symbol mapping and time alignment from Cantonese syllables to English phonemes. With the phoneme transcription, the input speech, and the audio-visual models for English, an EM-based conversion algorithm is adopted to generate mouth animation parameters associated with the input Cantonese audio. We have carried out audio-visual syllable recognition experiments to objectively evaluate the proposed talking face. Results show that the visual speech synthesized by the Cantonese talking face can effectively increase the accuracy of Cantonese syllable recognition under noisy acoustic conditions.
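
As a rough illustration of the translingual mapping step described in the abstract (symbol mapping plus time alignment from Cantonese syllables to English phonemes), the sketch below is one possible reading, not the authors' implementation. The mapping table `SYLLABLE_TO_PHONEMES`, the function `map_and_align`, the label conventions, and the choice to split each syllable's duration evenly among its phonemes are all assumptions made for this example; the abstract does not specify the actual table or alignment rule.

```python
from typing import Dict, List, Tuple

# Hypothetical symbol-mapping table: toneless Cantonese syllable labels
# mapped to English phoneme sequences. Entries are illustrative only and
# are not taken from the paper.
SYLLABLE_TO_PHONEMES: Dict[str, List[str]] = {
    "nei": ["n", "ey"],
    "hou": ["hh", "ow"],
    "maa": ["m", "aa"],
}

# A timed segment: (label, start time in seconds, end time in seconds).
Segment = Tuple[str, float, float]


def map_and_align(syllables: List[Segment]) -> List[Segment]:
    """Convert timed Cantonese syllables into timed English phonemes.

    Assumption: each syllable's time span is divided evenly among the
    phonemes it maps to. The real alignment scheme may differ.
    """
    phonemes: List[Segment] = []
    for label, start, end in syllables:
        targets = SYLLABLE_TO_PHONEMES[label]  # KeyError if unmapped
        step = (end - start) / len(targets)
        for i, phoneme in enumerate(targets):
            ph_start = start + i * step
            # Pin the last phoneme to the syllable end to avoid drift.
            is_last = i == len(targets) - 1
            ph_end = end if is_last else start + (i + 1) * step
            phonemes.append((phoneme, ph_start, ph_end))
    return phonemes


if __name__ == "__main__":
    recognized = [("nei", 0.0, 0.4), ("hou", 0.4, 1.0)]
    for segment in map_and_align(recognized):
        print(segment)
```

In this sketch the output phoneme track covers exactly the same time span as the input syllable track, which is the property a downstream audio-to-visual conversion stage would need in order to stay synchronized with the input Cantonese audio.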


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xie, L., Meng, H., Liu, ZQ. (2006). A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_64

  • DOI: https://doi.org/10.1007/11939993_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49665-6

  • Online ISBN: 978-3-540-49666-3

  • eBook Packages: Computer Science, Computer Science (R0)
