Abstract
Owing to several significant advantages, text-dependent speaker recognition is still widely used in biometric systems. Compared with text-independent systems, text-dependent systems are more accurate and more resistant to replay attacks. There are many approaches to text-dependent recognition. This paper introduces a combination of classifiers based on fractional distances, the biometric dispersion matcher, and dynamic time warping. The first two classifiers are based on a voice imprint; they have low memory requirements and a fast recognition procedure, which is especially advantageous in low-cost, battery-powered biometric systems. It is shown that trained score fusion makes it possible to reach a successful detection rate of 98.98%, or 92.19% in the case of microphone mismatch. During verification, the system reached an equal error rate of 2.55%, or 6.77% under microphone mismatch. The system was tested on a Catalan database consisting of 48 speakers (three 3 s training samples per speaker).
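To make the ingredients named in the abstract more concrete, the following is a minimal sketch of a fractional distance, a dynamic time warping (DTW) distance, and a weighted-sum score fusion. It is not the authors' implementation: the function names, the exponent `p = 0.5`, the Euclidean local cost in DTW, and the linear form of the fusion (with weights that would have to be learned on training scores) are all assumptions made for illustration. The biometric dispersion matcher is omitted.

```python
import math


def fractional_distance(x, y, p=0.5):
    """Minkowski-style distance with exponent p < 1 (assumed p = 0.5)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses the Euclidean distance as the local cost and the classic
    three-way step pattern (insertion, deletion, match).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # skip a frame of seq_b
                cost[i][j - 1],      # skip a frame of seq_a
                cost[i - 1][j - 1],  # align both frames
            )
    return cost[n][m]


def fuse_scores(scores, weights, bias=0.0):
    """Hypothetical trained fusion: a weighted sum of classifier scores."""
    return bias + sum(w * s for w, s in zip(weights, scores))
```

In such a sketch, each classifier would produce one score per trial, and the weights passed to `fuse_scores` would be fitted on held-out training scores before being applied to test trials.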
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mekyska, J., Faundez-Zanuy, M., Smékal, Z., Fàbregas, J. (2011). Score Fusion in Text-Dependent Speaker Recognition Systems. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_12
DOI: https://doi.org/10.1007/978-3-642-25775-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science, Computer Science (R0)