Abstract
This paper presents an in-depth analysis of effective strategies for integrating the audio and visual modalities for text-dependent speaker recognition. Our work is based on the well-known hidden Markov model (HMM) classifier framework for modelling speech. We propose a framework for handling the mismatch between training and test observation sets, so as to achieve effective combination of the acoustic and visual HMM classifiers. From this framework it can be shown that strategies for combining independent classifiers, such as the weighted product or weighted sum rules, emerge naturally depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speaker recognition applications can be attributed to train/test mismatches, we argue that the main goal of practical audio-visual integration is to dampen the independent errors resulting from the mismatch, rather than to model any bimodal speech dependencies. To this end, a strategy is recommended, based on theory and empirical evidence, that uses a hybrid of the weighted product and weighted sum rules in the presence of varying acoustic noise. Results are presented on the M2VTS database.
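The weighted product and weighted sum rules mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the modality weight `alpha`, and the blending parameter `beta` are assumptions chosen for illustration (the paper's hybrid varies its behaviour with acoustic noise, which is not modelled here). The sketch only shows how two independent classifier likelihoods might be combined under each rule and then blended.

```python
import math

def fuse_scores(log_p_audio, log_p_visual, alpha=0.7, beta=0.5):
    """Illustrative hybrid fusion of two classifier likelihoods.

    log_p_audio, log_p_visual: log-likelihoods from the acoustic and
    visual HMM classifiers for one candidate speaker.
    alpha: weight on the acoustic modality (assumed fixed here; in
    practice it would depend on the acoustic noise level).
    beta: blend between the product-rule and sum-rule scores
    (an illustrative assumption, not the paper's scheme).
    """
    # Weighted product rule: a log-linear combination of the two
    # likelihoods, computed in the log domain for numerical stability.
    log_product = alpha * log_p_audio + (1.0 - alpha) * log_p_visual

    # Weighted sum rule: a convex combination in the probability domain.
    sum_score = alpha * math.exp(log_p_audio) + (1.0 - alpha) * math.exp(log_p_visual)

    # Hybrid: blend the two combined scores.
    return beta * math.exp(log_product) + (1.0 - beta) * sum_score
```

With `alpha = 1.0` both rules reduce to the acoustic score alone, and with `beta = 1.0` the hybrid reduces to the pure weighted product rule; the interesting regime is in between, where the sum rule's averaging damps large independent errors in one modality while the product rule preserves agreement between them.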
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Lucey, S., Chen, T. (2003). Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy. In: Kittler, J., Nixon, M.S. (eds) Audio- and Video-Based Biometric Person Authentication. AVBPA 2003. Lecture Notes in Computer Science, vol 2688. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44887-X_108
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40302-9
Online ISBN: 978-3-540-44887-7