Score Fusion in Text-Dependent Speaker Recognition Systems

Mekyska, Jiří; Faundez-Zanuy, Marcos; Smékal, Zdeněk; Fàbregas, Joan

doi:10.1007/978-3-642-25775-9_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6800)

Abstract

Owing to several significant advantages, text-dependent speaker recognition is still widely used in biometric systems. Compared with text-independent systems, these systems are more accurate and more resistant to replay attacks. There are many approaches to text-dependent recognition. This paper introduces a combination of classifiers based on fractional distances, the biometric dispersion matcher, and dynamic time warping. The first two classifiers are based on a voice imprint; they have low memory requirements and a fast recognition procedure, which is especially advantageous in low-cost, battery-powered biometric systems. It is shown that, using trained score fusion, the system reaches a successful detection rate of 98.98%, or 92.19% under microphone mismatch. In verification, the system reaches an equal error rate of 2.55%, or 6.77% under microphone mismatch. The system was tested on a Catalan database of 48 speakers (three 3 s training samples per speaker).
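
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of combining distance-based classifiers by score fusion. It is not the authors' method. The function names, the exponent p = 0.5, the min-max normalisation and the fixed fusion weights are all illustrative assumptions; in a trained fusion the weights would be estimated from development data. The biometric dispersion matcher is not sketched here.

```python
from math import inf


def fractional_distance(x, y, p=0.5):
    """Minkowski-style distance with exponent 0 < p < 1 (assumed form)."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def min_max(scores):
    """Rescale scores to [0, 1] so different classifiers are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(score_lists, weights):
    """Weighted sum of normalised distance scores, one list per classifier.

    The weights are hypothetical placeholders for trained values.
    """
    normalised = [min_max(s) for s in score_lists]
    n_models = len(normalised[0])
    return [
        sum(w * scores[k] for w, scores in zip(weights, normalised))
        for k in range(n_models)
    ]


# Toy usage: distances from one test sample to three enrolled speakers,
# as produced by two different classifiers (made-up numbers).
fused = fuse([[1.0, 3.0, 5.0], [10.0, 20.0, 40.0]], weights=[0.5, 0.5])
best_speaker = min(range(len(fused)), key=fused.__getitem__)
```

Because both inputs are distances, the speaker with the smallest fused score is selected; for verification, the fused score would instead be compared with a threshold.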

Author information

Authors and Affiliations

Signal Processing Laboratory, Department of Telecommunications, Faculty of Electrical Engineering and Communication, Brno University of Technology, Brno, Czech Republic
Jiří Mekyska & Zdeněk Smékal
Escola Universitària Politècnica de Mataró, Barcelona, Spain
Marcos Faundez-Zanuy & Joan Fàbregas

Editor information

Editors and Affiliations

Dept. of Psychology and IIASS, International Institute for Advanced Scientific Studies, Second University of Naples, Vietri sul Mare, SA, Italy
Anna Esposito
School of Computing Science, University of Glasgow, Glasgow, UK
Alessandro Vinciarelli
Department of Telecommunication and Media Informatics, Laboratory of Speech Acoustics, Budapest University of Technology and Economics, 1117, Budapest, Hungary
Klára Vicsi
TELECOM ParisTech, CNRS-LTCI UMR 5141, 75014, Paris, France
Catherine Pelachaud
Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 7500 AE, Enschede, The Netherlands
Anton Nijholt

Cite this paper

Mekyska, J., Faundez-Zanuy, M., Smékal, Z., Fàbregas, J. (2011). Score Fusion in Text-Dependent Speaker Recognition Systems. In: Esposito, A., Vinciarelli, A., Vicsi, K., Pelachaud, C., Nijholt, A. (eds) Analysis of Verbal and Nonverbal Communication and Enactment. The Processing Issues. Lecture Notes in Computer Science, vol 6800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25775-9_12

DOI: https://doi.org/10.1007/978-3-642-25775-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25774-2
Online ISBN: 978-3-642-25775-9
eBook Packages: Computer Science, Computer Science (R0)