Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models

  • Conference paper

Abstract

Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of these systems degrades considerably in the real world in the presence of noise. Novel approaches are needed that draw on sources of information orthogonal to the acoustic input, which not only considerably improve performance in severely degraded conditions but are also independent of the type of noise and reverberation. Visual speech is one such source, as it is not perturbed by the acoustic environment or noise. This paper presents the author's own approach to lip tracking and to the fusion of audio and video signals for an audio-visual speech and speaker recognition system. It presents a video analysis of visual speech that extracts visual features from a talking person in color video sequences. A method is developed for automatic localization of the face, the eyes, the mouth region, and the corners and contour of the mouth. A synchronous method and two asynchronous methods for fusing the audio and video signals are proposed. Finally, the paper shows results of lip tracking under various conditions (lighting, beard) and results of speech and speaker recognition in noisy environments.
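The abstract does not describe the fusion methods in detail, so the following is only a generic sketch of the two families of audio-visual fusion commonly meant by these terms, not the paper's own algorithms. It assumes that synchronous fusion means concatenating time-aligned audio and video feature vectors before a single model, and it illustrates the alternative as a weighted combination of per-word log-likelihoods from separate audio and video models. All function names, the `audio_weight` parameter, and the example word labels are hypothetical.

```python
from typing import Dict, List, Sequence


def synchronous_fusion(audio: Sequence[Sequence[float]],
                       video: Sequence[Sequence[float]]) -> List[List[float]]:
    """Feature-level fusion (assumed form): align video to the audio frame
    rate by repeating video frames, then concatenate the two vectors."""
    if not audio or not video:
        raise ValueError("both feature streams must be non-empty")
    n_audio, n_video = len(audio), len(video)
    fused = []
    for i, audio_frame in enumerate(audio):
        # Index of the video frame covering audio frame i.
        j = min(i * n_video // n_audio, n_video - 1)
        fused.append(list(audio_frame) + list(video[j]))
    return fused


def fused_score(audio_loglik: float, video_loglik: float,
                audio_weight: float) -> float:
    """Decision-level fusion (assumed form): weighted sum of the
    log-likelihoods produced by separate audio and video models."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_loglik + (1.0 - audio_weight) * video_loglik


def recognise(audio_scores: Dict[str, float],
              video_scores: Dict[str, float],
              audio_weight: float) -> str:
    """Pick the word whose combined score is highest."""
    return max(
        audio_scores,
        key=lambda w: fused_score(audio_scores[w], video_scores[w], audio_weight),
    )


# Hypothetical log-likelihoods for two candidate words.
audio_scores = {"tak": -10.0, "nie": -12.0}
video_scores = {"tak": -9.0, "nie": -5.0}
clean_choice = recognise(audio_scores, video_scores, audio_weight=0.9)
noisy_choice = recognise(audio_scores, video_scores, audio_weight=0.2)
```

In this sketch, lowering `audio_weight` stands in for trusting the video stream more as acoustic noise increases, which is the usual motivation for combining the two modalities at the decision level.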

© 2006 Springer Science+Business Media, LLC

About this paper

Cite this paper

Kubanek, M. (2006). Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models. In: Saeed, K., Pejaś, J., Mosdorf, R. (eds) Biometrics, Computer Security Systems and Artificial Intelligence Applications. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-36503-9_5

  • DOI: https://doi.org/10.1007/978-0-387-36503-9_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-36232-8

  • Online ISBN: 978-0-387-36503-9

  • eBook Packages: Computer Science (R0)
