Developing Speaker Recognition System: From Prototype to Practical Application

Fränti, Pasi; Saastamoinen, Juhani; Kärkkäinen, Ismo; Kinnunen, Tomi; Hautamäki, Ville; Sidoroff, Ilja

doi:10.1007/978-3-642-02312-5_12

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 8)

Included in the following conference series:

International Conference on Forensics in Telecommunications, Information, and Multimedia

Abstract

In this paper, we summarize the main achievements of the four-year PUMS project (2003-2007). The emphasis is on practical implementation: how we moved from Matlab and Praat scripting to C/C++ applications for the Windows, UNIX, Linux and Symbian environments, with the aim of enhancing technology transfer. We summarize how the baseline methods were implemented in practice and how the results are used in forensic applications, and we compare the recognition results with the state of the art and with existing commercial products such as ASIS, FreeSpeech and VoiceNet.

The original version of this chapter was revised because the copyright line was incorrect; this has been corrected. The erratum to this chapter is available at DOI: 10.1007/978-3-642-02312-5_25

Author information

Authors and Affiliations

Speech & Image Processing Unit, Dept. of Computer Science and Statistics, University of Joensuu, Finland
Pasi Fränti, Juhani Saastamoinen, Tomi Kinnunen, Ville Hautamäki & Ilja Sidoroff
Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
Ismo Kärkkäinen

Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, University of Adelaide, P.O. Box, SA 5005, Australia
Matthew Sorell

About this paper

Cite this paper

Fränti, P., Saastamoinen, J., Kärkkäinen, I., Kinnunen, T., Hautamäki, V., Sidoroff, I. (2009). Developing Speaker Recognition System: From Prototype to Practical Application. In: Sorell, M. (ed.) Forensics in Telecommunications, Information and Multimedia. e-Forensics 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 8. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02312-5_12

DOI: https://doi.org/10.1007/978-3-642-02312-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02311-8
Online ISBN: 978-3-642-02312-5
eBook Packages: Computer Science, Computer Science (R0)