Voice activity detection (VAD), i.e., the discrimination of speech from non-speech segments in a speech signal, is an important enabling technology for a variety of speech-based applications, including speaker recognition. In this work we evaluate the following supervised and unsupervised VAD algorithms in the context of text-dependent speaker recognition on the RSR2015 (Robust Speaker Recognition 2015) task: energy-based VAD with and without a hangover scheme and endpoint detection, vector quantization-based VAD, Gaussian mixture model (GMM)-based VAD (in both supervised and unsupervised variants), and sequential GMM-based VAD. Experimental results show that both the supervised and unsupervised GMM-based VADs perform better than the other VAD algorithms. Considering all three evaluation metrics (equal error rate and the old (SRE 2008) and new (SRE 2010) normalized detection cost functions), the unsupervised GMM-based VAD performed best.
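To make the simplest of the evaluated families concrete, the following is a minimal sketch of an energy-based VAD with a hangover scheme. It is an illustration under stated assumptions, not the configuration used in the paper: the function names, the 400-sample frames with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), the threshold placed 30 dB below the loudest frame, and the 5-frame hangover are all hypothetical choices.

```python
import numpy as np


def frame_log_energy(signal, frame_len=400, hop=160):
    """Return the log energy (in dB) of each frame of a 1-D signal."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # A small constant avoids log(0) on digitally silent frames.
        energies[i] = 10.0 * np.log10(np.sum(frame ** 2) + 1e-12)
    return energies


def energy_vad(signal, frame_len=400, hop=160, margin_db=30.0, hangover=5):
    """Label frames as speech (True) or non-speech (False).

    Assumed rule: a frame is speech if its log energy lies within
    `margin_db` of the loudest frame. The hangover then keeps the
    decision at "speech" for `hangover` extra frames after the energy
    drops, so that weak word endings are not clipped.
    """
    energies = frame_log_energy(signal, frame_len, hop)
    threshold = energies.max() - margin_db
    raw = energies > threshold

    decisions = raw.copy()
    countdown = 0
    for i, is_speech in enumerate(raw):
        if is_speech:
            countdown = hangover      # re-arm the hangover counter
        elif countdown > 0:
            decisions[i] = True       # extend the speech segment
            countdown -= 1
    return decisions
```

Calling `energy_vad(signal, hangover=0)` gives the variant without a hangover scheme, so the two energy-based configurations can be compared on the same input.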
Cite as: Kenny, P., Stafylakis, T., Ouellet, P., Alam, M.J., Dumouchel, P. (2014) Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014), 123-130, doi: 10.21437/Odyssey.2014-14
@inproceedings{kenny14_odyssey,
  author={Patrick Kenny and Themos Stafylakis and Pierre Ouellet and Md Jahangir Alam and Pierre Dumouchel},
  title={{Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus}},
  year=2014,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)},
  pages={123--130},
  doi={10.21437/Odyssey.2014-14}
}