Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Abstract

In speaker recognition from audio, it is well known that error rates vary from speaker to speaker. This work investigates predicting how prone a given speaker embedding is to false alarms and false rejects. Although exact prediction of biometric error behaviour appears to be difficult, the tendency towards false alarm and false reject errors can be predicted directly from embeddings by training a neural network in a supervised manner. Such a prediction could be useful in several applications, such as normalizing verification scores, incorporating these characteristics into embedding training, or serving as an adversarial objective. We use the predicted behaviour in a fast score normalization method. Our approach is compared with s-norm, a frequently employed cohort-based normalization technique that accounts only for impostor calibration. The proposed normalization is not only faster than s-norm but also outperforms it by 8% and 3% for male and female speakers, respectively.
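To make the comparison concrete, the following is a minimal sketch of cohort-based s-norm in its commonly used symmetric form, assuming cosine scoring over embeddings. The function names (`cosine`, `cohort_stats`, `s_norm`, `norm_from_stats`) and the toy vectors are illustrative assumptions and do not come from the paper. The last function only shows why normalization becomes fast once the per-embedding statistics are available without scoring a cohort; how the paper actually predicts and applies those quantities is not described in the abstract.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two embeddings given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cohort_stats(embedding, cohort):
    """Mean and standard deviation of an embedding's scores against a cohort.

    This is the expensive step of s-norm: one score per cohort member.
    """
    scores = [cosine(embedding, c) for c in cohort]
    return mean(scores), pstdev(scores)


def norm_from_stats(raw, enroll_stats, test_stats):
    """Average of the two z-normalized scores, given (mean, std) pairs.

    Assumption for illustration: if the pairs came from a predictor that reads
    the embedding directly, no cohort scoring would be needed at test time.
    """
    mu_e, sd_e = enroll_stats
    mu_t, sd_t = test_stats
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


def s_norm(enroll, test, cohort):
    """Symmetric normalization with statistics estimated from the cohort."""
    raw = cosine(enroll, test)
    return norm_from_stats(raw, cohort_stats(enroll, cohort), cohort_stats(test, cohort))


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely illustrative.
    enroll = [1.0, 0.0]
    test = [0.8, 0.6]
    cohort = [[0.0, 1.0], [0.6, 0.8], [1.0, 1.0]]
    print(s_norm(enroll, test, cohort))
```

The cost of `s_norm` grows with the cohort size because `cohort_stats` scores every cohort member for both the enrollment and the test embedding, whereas `norm_from_stats` is constant-time once the statistics are known.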

Corresponding author

Correspondence to İsmail Rasim Ülgen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ülgen, İ.R., Erden, M., Arslan, L.M. (2021). Predicting Biometric Error Behaviour from Speaker Embeddings and a Fast Score Normalization Scheme. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_74

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_74

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science, Computer Science (R0)
