Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models

Kubanek, Mariusz

doi:10.1007/978-0-387-36503-9_5


Conference paper


Abstract

Mainstream automatic speech recognition has focused almost exclusively on the acoustic signal. The performance of such systems degrades considerably in the real world in the presence of noise. Novel approaches are needed that draw on sources of information orthogonal to the acoustic input, approaches that not only improve performance considerably in severely degraded conditions but are also independent of the type of noise and reverberation. Visual speech is one such source, since it is not perturbed by the acoustic environment or by noise. This paper presents the author's approach to lip tracking and to the fusion of audio and video signals for an audio-visual speech and speaker recognition system. It presents a video analysis of visual speech that extracts visual features from a talking person in color video sequences. A method is developed for automatic localization of the face, the eyes, the mouth region, and the corners and contour of the mouth. One synchronous and two asynchronous methods of fusing the audio and video signals are proposed. Finally, the paper shows results of lip tracking under various factors (lighting, beard) and results of speech and speaker recognition in noisy environments.
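The abstract does not say how the synchronous fusion is carried out. As a rough illustration only, the sketch below assumes one common form of synchronous fusion: the video features are resampled to the audio frame rate and concatenated with the audio features frame by frame, so that a single Hidden Markov Model could be trained on the joint vectors. The function name `fuse_synchronously`, the array shapes, and the use of linear interpolation are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def fuse_synchronously(audio_feats, video_feats):
    """Hypothetical feature-level (synchronous) fusion.

    audio_feats: array of shape (T_audio, D_audio), one row per audio frame.
    video_feats: array of shape (T_video, D_video), one row per video frame.
    Both sequences are assumed to cover the same utterance from start to end.
    Returns an array of shape (T_audio, D_audio + D_video).
    """
    audio_feats = np.asarray(audio_feats, dtype=float)
    video_feats = np.asarray(video_feats, dtype=float)

    # Normalised time axes: 0.0 is the first frame, 1.0 is the last frame.
    t_audio = np.linspace(0.0, 1.0, len(audio_feats))
    t_video = np.linspace(0.0, 1.0, len(video_feats))

    # Linearly interpolate each video feature dimension at the audio frame times.
    video_upsampled = np.column_stack([
        np.interp(t_audio, t_video, video_feats[:, d])
        for d in range(video_feats.shape[1])
    ])

    # One joint observation vector per audio frame.
    return np.hstack([audio_feats, video_upsampled])
```

Asynchronous schemes, by contrast, typically keep the two streams in separate models and combine them later; the abstract gives no detail on which variants the paper uses, so none is sketched here.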



Author information

Authors and Affiliations

Institute of Computer & Information Sciences, Czestochowa University of Technology, Dabrowskiego Street, 73, 42-200, Czestochowa, Poland
Mariusz Kubanek


Editor information

Editors and Affiliations

Faculty of Computer Science, Bialystok Technical University, Wiejska 45A, 15-351, Bialystok, Poland
Khalid Saeed
Faculty of Computer Science, Szczecin University of Technology, Zolnierska 49, 71 210, Szczecin, Poland
Jerzy Pejaś
University of Finance and Management in Bialystok, Ciepla 40, 15 472, Bialystok, Poland
Romuald Mosdorf

About this paper

Cite this paper

Kubanek, M. (2006). Method of Speech Recognition and Speaker Identification using Audio-Visual of Polish Speech and Hidden Markov Models. In: Saeed, K., Pejaś, J., Mosdorf, R. (eds) Biometrics, Computer Security Systems and Artificial Intelligence Applications. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-36503-9_5


DOI: https://doi.org/10.1007/978-0-387-36503-9_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-36232-8
Online ISBN: 978-0-387-36503-9
eBook Packages: Computer Science, Computer Science (R0)
