The National Institute of Standards and Technology (NIST) 2012 speaker recognition evaluation posed several new challenges, including noisy data, varying test-sample lengths, varying numbers of enrollment samples, and a new metric. Target speakers were known during system development, so their data could be used for model training and score normalization. For the evaluation, SRI International (SRI) submitted a system consisting of six subsystems that use different low- and high-level features, some specifically designed for noise robustness, fused at the score and iVector levels. This paper presents SRI's submission along with a careful analysis of the approaches that provided gains for this challenging evaluation, including a multiclass voice-activity detection system, the use of noisy data in system training, and the fusion of subsystems using acoustic characterization metadata.
Cite as: Ferrer, L., McLaren, M., Scheffer, N., Lei, Y., Graciarena, M., Mitra, V. (2013) A noise-robust system for NIST 2012 speaker recognition evaluation. Proc. Interspeech 2013, 1981-1985, doi: 10.21437/Interspeech.2013-471
@inproceedings{ferrer13_interspeech,
  author={Luciana Ferrer and Mitchell McLaren and Nicolas Scheffer and Yun Lei and Martin Graciarena and Vikramjit Mitra},
  title={{A noise-robust system for NIST 2012 speaker recognition evaluation}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={1981--1985},
  doi={10.21437/Interspeech.2013-471}
}