Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme

Ülgen, İsmail Rasim; Erden, Mustafa; Arslan, Levent M.

doi:10.1007/978-3-030-87802-3_74

İsmail Rasim Ülgen, Mustafa Erden & Levent M. Arslan

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

In speaker recognition from audio, it is known that different speakers exhibit different error rates. In this work, we investigate predicting how prone a given speaker embedding is to false alarms and false rejects. Although exact prediction of biometric error behaviour appears to be a difficult problem, we find that the tendency towards false alarm and false reject errors can be predicted directly from embeddings by training a neural network in a supervised manner. Such predictions could be useful in several applications, such as normalizing verification scores, incorporating these characteristics into embedding training, or serving as an adversarial objective. We use the predicted behaviour for a fast score normalization method. Our approach is compared with s-norm, a frequently employed cohort-based biometric normalization method that accounts only for impostor calibration. The proposed normalization is not only faster than s-norm but also outperforms it by 8% and 3% for male and female speakers, respectively.
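The abstract contrasts the proposed normalization with s-norm and relies on a supervised predictor of per-embedding false-alarm and false-reject tendency. The NumPy sketch below illustrates two related pieces under stated assumptions: textbook s-norm using cosine scoring over a full (non-adaptive) cohort, and one possible way to derive per-speaker false-alarm and false-reject rates at a fixed threshold, the kind of target such a predictor could be trained to regress. The function names are hypothetical, cosine scoring is an assumption (the abstract does not name the scoring back-end), and the paper's own normalization formula is not reproduced because the abstract does not state it.

```python
import numpy as np


def cosine_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of a and every row of b (assumed back-end)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def s_norm(enroll: np.ndarray, test: np.ndarray, cohort: np.ndarray) -> float:
    """Symmetric score normalization (s-norm) of a single trial.

    The raw score is z-normalized once with the statistics of the enrollment
    embedding scored against the cohort and once with those of the test
    embedding, and the two values are averaged. Both cohort scorings cost
    O(len(cohort)) dot products.
    """
    raw = cosine_scores(enroll[None], test[None])[0, 0]
    e_scores = cosine_scores(enroll[None], cohort)[0]  # enrollment vs. cohort
    t_scores = cosine_scores(test[None], cohort)[0]    # test vs. cohort
    z = (raw - e_scores.mean()) / e_scores.std()
    t = (raw - t_scores.mean()) / t_scores.std()
    return float(0.5 * (z + t))


def per_speaker_error_rates(scores, enroll_ids, is_target, threshold):
    """Hypothetical per-speaker error tendencies at a fixed decision threshold.

    For each enrolled speaker, returns (false-alarm rate, false-reject rate):
    the fraction of that speaker's impostor trials accepted and the fraction
    of their target trials rejected. NaN marks a speaker with no trials of a kind.
    """
    rates = {}
    for spk in np.unique(enroll_ids):
        mask = enroll_ids == spk
        impostor = scores[mask & ~is_target]
        target = scores[mask & is_target]
        fa = float(np.mean(impostor >= threshold)) if impostor.size else float("nan")
        fr = float(np.mean(target < threshold)) if target.size else float("nan")
        rates[spk] = (fa, fr)
    return rates


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cohort = rng.normal(size=(200, 16))          # synthetic cohort embeddings
    enroll, test = rng.normal(size=16), rng.normal(size=16)
    print("s-norm score:", s_norm(enroll, test, cohort))

    scores = np.array([0.9, 0.6, 0.4, 0.1])
    ids = np.array(["a", "a", "b", "b"])
    target = np.array([True, False, True, False])
    print(per_speaker_error_rates(scores, ids, target, threshold=0.5))
```

Because s-norm needs every enrollment and test embedding scored against the whole cohort, its cost grows with cohort size. A predictor applied once per embedding avoids that cohort scoring altogether, which is one plausible source of the speed advantage the abstract reports.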

Author information

Authors and Affiliations

Sestek, Istanbul, Turkey
İsmail Rasim Ülgen, Mustafa Erden & Levent M. Arslan
Electrical and Electronics Engineering Department, Bogazici University, Istanbul, Turkey
İsmail Rasim Ülgen & Levent M. Arslan

Corresponding author

Correspondence to İsmail Rasim Ülgen.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Ülgen, İ.R., Erden, M., Arslan, L.M. (2021). Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_74

DOI: https://doi.org/10.1007/978-3-030-87802-3_74
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science (R0)
