ISCA Archive Odyssey 2014

Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus

Patrick Kenny, Themos Stafylakis, Pierre Ouellet, Md Jahangir Alam, Pierre Dumouchel

Voice activity detection (VAD), i.e., discrimination of the speech and non-speech segments in a speech signal, is an important enabling technology for a variety of speech-based applications, including speaker recognition. In this work we evaluate the following supervised and unsupervised VAD algorithms in the context of text-dependent speaker recognition on the RSR2015 (Robust Speaker Recognition 2015) task: energy-based VAD with and without a hangover scheme and endpoint detection, vector quantization-based VAD, Gaussian mixture model (GMM)-based VAD (in both supervised and unsupervised forms), and sequential GMM-based VAD. Experimental results show that both the supervised and unsupervised GMM-based VADs perform better than the other VAD algorithms. Considering all three evaluation metrics (equal error rate and the old (SRE 2008) and new (SRE 2010) normalized detection cost functions), the unsupervised GMM-based VAD performed best.
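As an illustration of the simplest family named in the abstract, the following is a minimal sketch of an energy-based VAD with an optional hangover scheme. It is not the authors' implementation: the function names, the frame length, the hop size, the 30 dB margin below the loudest frame, and the hangover length are all assumptions chosen for the example, and endpoint detection is not shown.

```python
import math


def frame_log_energies(samples, frame_len=200, hop=80):
    """Return the log energy (in dB) of each frame of `samples`.

    frame_len and hop are in samples; both defaults are illustrative.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(samples, frame_len=200, hop=80, margin_db=30.0, hangover=5):
    """Return one boolean per frame (True = speech).

    A frame is labelled speech if its energy is within `margin_db` of the
    loudest frame (an assumed thresholding rule). With hangover > 0, the
    `hangover` frames that follow a speech frame are also kept as speech,
    which protects low-energy word endings.
    """
    energies = frame_log_energies(samples, frame_len, hop)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    decisions = []
    remaining = 0  # hangover frames still to be granted
    for energy in energies:
        if energy > threshold:
            decisions.append(True)
            remaining = hangover
        elif remaining > 0:
            decisions.append(True)
            remaining -= 1
        else:
            decisions.append(False)
    return decisions


# Toy signal: silence, a sine burst standing in for speech, then silence.
signal = (
    [0.0] * 800
    + [math.sin(2 * math.pi * k / 20) for k in range(800)]
    + [0.0] * 1600
)
without_hangover = energy_vad(signal, frame_len=200, hop=200, hangover=0)
with_hangover = energy_vad(signal, frame_len=200, hop=200, hangover=2)
```

With `hangover=0` only the frames covering the burst are marked as speech; with `hangover=2` the two frames after the burst are retained as well, which is the difference between the "with" and "without hangover" variants in this sketch.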


doi: 10.21437/Odyssey.2014-14

Cite as: Kenny, P., Stafylakis, T., Ouellet, P., Alam, M.J., Dumouchel, P. (2014) Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014), 123-130, doi: 10.21437/Odyssey.2014-14

@inproceedings{kenny14_odyssey,
  author={Patrick Kenny and Themos Stafylakis and Pierre Ouellet and Md Jahangir Alam and Pierre Dumouchel},
  title={{Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus}},
  year=2014,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)},
  pages={123--130},
  doi={10.21437/Odyssey.2014-14}
}