Existing automatic speaker verification (ASV) systems perform with high accuracy when the speech signal is collected close to the mouth of the speaker (< 1 ft). However, their performance degrades significantly when the speech signal is collected at a distance (2–6 ft) from the speaker. The objective of this paper is to address some issues in processing speech signals collected at a distance from the speaker for a text-dependent ASV system. An acoustic feature derived from short segments of the speech signal is proposed for the ASV task. The key idea is to exploit the high signal-to-noise ratio (SNR) of short segments of speech in the vicinity of impulse-like excitations. We show that the proposed feature yields better speaker verification performance than mel-frequency cepstral coefficients (MFCCs). In addition, information from regions of high signal-to-reverberation ratio, together with duration and pitch information, is used to improve the performance of the ASV system for distant speech.
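The abstract does not say how the impulse-like excitations are located. As an illustration only, the sketch below assumes zero-frequency filtering, a known epoch-detection technique, to find these instants and then cuts a short segment starting at each one. The function names `zero_frequency_epochs` and `subsegments`, the 10 ms trend-removal window, the 2 ms segment length and the negative-going impulse polarity are hypothetical choices, not values taken from the paper. Features such as cepstral coefficients would then be computed on the returned segments.

```python
import numpy as np


def zero_frequency_epochs(x, fs, window_ms=10.0, passes=3):
    """Locate impulse-like excitations (epochs) by zero-frequency filtering.

    Hypothetical defaults: window_ms should span roughly 1-2 average pitch
    periods; passes is the number of trend-removal steps.
    Returns (epoch sample indices, trend-removed filter output).
    """
    x = np.asarray(x, dtype=float)
    # Differencing removes any DC offset in the recording.
    d = np.diff(x, prepend=x[0])
    # A 0 Hz resonator y[n] = x[n] + 2*y[n-1] - y[n-2] is two running sums,
    # so two cascaded resonators amount to four cumulative sums.
    y = d
    for _ in range(4):
        y = np.cumsum(y)
    # The output grows like a polynomial; remove that trend by repeatedly
    # subtracting a centred moving average over 2N+1 samples.
    n_half = int(round(window_ms * fs / 1000.0))
    kernel = np.ones(2 * n_half + 1) / (2 * n_half + 1)
    offset = 0
    for _ in range(passes):
        local_mean = np.convolve(y, kernel, mode="valid")
        y = y[n_half:len(y) - n_half] - local_mean
        offset += n_half  # samples lost at the start of the signal
    # Assumes negative-going excitation impulses, which appear as
    # negative-to-positive zero crossings of the filtered signal.
    crossings = np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
    return crossings + offset, y


def subsegments(x, epochs, fs, seg_ms=2.0):
    """Return short segments (hypothetical 2 ms) starting at each epoch."""
    x = np.asarray(x, dtype=float)
    seg_len = int(round(seg_ms * fs / 1000.0))
    segs = [x[e:e + seg_len] for e in epochs if e + seg_len <= len(x)]
    return np.array(segs).reshape(len(segs), seg_len)
```

For example, on a synthetic train of negative unit impulses every 80 samples at `fs = 8000`, the detected epochs fall within a couple of samples of the true impulse locations, except near the signal edges that trend removal trims away. Real distant speech would additionally need voicing decisions and a window matched to the speaker's pitch.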
Cite as: Avinash, B., Guruprasad, S., Yegnanarayana, B. (2010) Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals. Proc. Interspeech 2010, 1073-1076, doi: 10.21437/Interspeech.2010-141
@inproceedings{avinash10_interspeech,
  author={B. Avinash and S. Guruprasad and Bayya Yegnanarayana},
  title={{Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={1073--1076},
  doi={10.21437/Interspeech.2010-141}
}