Speaker Recognition: Introduction

Zheng, Thomas Fang; Li, Lantian

doi:10.1007/978-981-10-3238-7_1

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSIGNAL)

Abstract

In ancient wartime, officers and soldiers could tell friend from foe by means of predetermined passwords. In daily life, we humans get in and out of a house using keys or e-cards. When surfing the Internet, users log in to websites or mail servers with an account and password.

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Division of Technical Innovation and Development, Department of Computer Science and Technology, Center for Speech and Language Technologies, Research Institute of Information Technology, Tsinghua University, Beijing, 100084, China
Thomas Fang Zheng & Lantian Li

About this chapter

Cite this chapter

Zheng, T.F., Li, L. (2017). Speaker Recognition: Introduction. In: Robustness-Related Issues in Speaker Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-10-3238-7_1

DOI: https://doi.org/10.1007/978-981-10-3238-7_1
Published: 07 April 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3237-0
Online ISBN: 978-981-10-3238-7
eBook Packages: Engineering, Engineering (R0)
