Abstract
For speaker recognition from audio, it is known that different speakers exhibit different error rates. In this work, we investigate predicting how prone a given speaker embedding is to false alarms and false rejects. Although exact prediction of biometric error behaviour appears to be a difficult problem, we find that the tendency towards false alarm and false reject errors can be predicted directly from embeddings by a neural network trained in a supervised manner. Such predictions may be useful in several applications, such as normalizing verification scores, incorporating these characteristics into embedding training, or serving as an adversarial objective. We use the predicted behaviour for a fast score normalization method. We compare our approach with s-norm, a frequently employed cohort-based biometric normalization method that accounts only for impostor calibration. The proposed normalization is not only faster than s-norm but also outperforms it by 8% and 3% for male and female speakers, respectively.
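For reference, the s-norm baseline can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes cosine scoring between embeddings and a plain list of impostor cohort embeddings, and the names `cosine`, `cohort_stats` and `s_norm` are hypothetical. Practical systems often use variants such as adaptive s-norm, which keeps only the top-scoring cohort members. The abstract does not give the formula of the proposed fast normalization, so only the baseline is shown here.

```python
import math
import statistics
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two speaker embeddings (assumed scoring backend)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cohort_stats(emb: Vector, cohort: Sequence[Vector]) -> tuple[float, float]:
    """Mean and population std of the scores of `emb` against an impostor cohort."""
    scores = [cosine(emb, c) for c in cohort]
    # Assumes the cohort scores are not all identical (std must be non-zero).
    return statistics.fmean(scores), statistics.pstdev(scores)


def s_norm(enroll: Vector, test: Vector, cohort: Sequence[Vector]) -> float:
    """Symmetric normalization: average of the enrollment-side and test-side z-scores."""
    raw = cosine(enroll, test)
    mu_e, sd_e = cohort_stats(enroll, cohort)  # enrollment side vs. impostor cohort
    mu_t, sd_t = cohort_stats(test, cohort)    # test side vs. impostor cohort
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```

Both normalization terms are built only from scores against impostors, which is the impostor-only calibration the abstract refers to. Each new test embedding must also be scored against every cohort member before its statistics are known, and this per-trial cost is what a faster scheme would aim to avoid.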
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ülgen, İ.R., Erden, M., Arslan, L.M. (2021). Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_74
DOI: https://doi.org/10.1007/978-3-030-87802-3_74
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)