Many spectrum estimation methods and speech enhancement algorithms have previously been evaluated for noise-robust speaker identification (SID). However, these techniques have mostly been evaluated over artificially noised, mismatched training tasks with GMM-UBM speaker models. It is therefore unclear whether performance improvements observed with these methods translate to a broader range of noisy SID tasks. This study compares selected spectrum estimation methods from three classes (cochlear filterbanks, alternative time-domain windowing, and linear prediction-based techniques), as well as a set of frequency-domain noise reduction algorithms, across a suite of eight evaluation tasks. The evaluation tasks are designed to expand upon the limited tasks addressed in past evaluations by exploring three research questions: performance on real noise versus artificial noise, performance on matched training tasks versus mismatched tasks, and performance when paired with an i-vector backend versus a GMM-UBM backend. We find that noise-robust spectrum estimation methods can improve the performance of SID systems over the range of noise tasks evaluated, including real noisy tasks, matched training tasks, and i-vector backends. However, performance on the typical GMM-UBM mismatched artificially noised case did not predict performance on other tasks. Finally, the matched enrollment case is a significantly different problem from the mismatched enrollment case.
Cite as: Godin, K.W., Sadjadi, S.O., Hansen, J.H.L. (2013) Impact of noise reduction and spectrum estimation on noise robust speaker identification. Proc. Interspeech 2013, 3656-3660, doi: 10.21437/Interspeech.2013-685
@inproceedings{godin13_interspeech,
  author={Keith W. Godin and Seyed Omid Sadjadi and John H. L. Hansen},
  title={{Impact of noise reduction and spectrum estimation on noise robust speaker identification}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3656--3660},
  doi={10.21437/Interspeech.2013-685}
}