In this paper, we present a novel approach to speaker clustering that uses a hetero-associative neural network (HANN) to compute very low-dimensional speaker-discriminatory features (in our case, 1-dimensional) in a data-driven manner. A HANN trained to map the input feature space onto speaker labels through a bottleneck hidden layer is expected to learn a very low-dimensional feature subspace that essentially contains the speaker information. These low-dimensional features are then used in a simple k-means clustering algorithm to obtain the speaker segmentation. Evaluation of this approach on a database of real-life conversational speech from call centers shows that the clustering performance achieved is similar to that of state-of-the-art systems, even though our approach uses just 1-dimensional features. Augmenting the traditional mel-frequency cepstral coefficient (MFCC) features in the state-of-the-art system with these features resulted in improved clustering performance.
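The pipeline described above (a network trained on speaker labels through a 1-unit bottleneck, followed by k-means on the bottleneck output) can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the authors' implementation: the abstract does not specify the HANN architecture or training procedure, so the single tanh bottleneck unit, the sigmoid output for two speakers, the plain SGD training, the synthetic 3-dimensional Gaussian frames standing in for acoustic features, and all function names (`train_hann`, `bottleneck`, `kmeans_1d`, `make_frames`) are hypothetical choices made for illustration.

```python
import math
import random

def bottleneck(x, w, b):
    """1-D bottleneck activation (assumed tanh) for one feature vector x."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_hann(frames, labels, epochs=20, lr=0.1, seed=0):
    """Toy net: input -> 1 tanh unit -> sigmoid speaker output, trained by SGD."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(len(frames[0]))]
    b, v, c = 0.0, 0.5, 0.0          # bottleneck bias, output weight, output bias
    order = list(range(len(frames)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = frames[i], labels[i]
            h = bottleneck(x, w, b)
            p = 1.0 / (1.0 + math.exp(-(v * h + c)))
            err = p - y                        # d(cross-entropy)/d(output logit)
            grad_a = err * v * (1.0 - h * h)   # gradient back through tanh
            v -= lr * err * h
            c -= lr * err
            w = [wi - lr * grad_a * xi for wi, xi in zip(w, x)]
            b -= lr * grad_a
    return w, b

def kmeans_1d(values, k=2, iters=50):
    """Plain k-means on scalars (k >= 2); centers start evenly spaced in range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: abs(val - centers[j]))
                  for val in values]
        for j in range(k):
            members = [val for val, a in zip(values, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign

def make_frames(rng, n_per_speaker):
    """Synthetic stand-in for acoustic frames: one Gaussian blob per speaker."""
    means = [(-1.5, -1.5, 0.0), (1.5, 1.5, 0.0)]
    frames, labels = [], []
    for spk, mean in enumerate(means):
        for _ in range(n_per_speaker):
            frames.append([rng.gauss(m, 0.5) for m in mean])
            labels.append(spk)
    return frames, labels

rng = random.Random(1)
train_x, train_y = make_frames(rng, 100)   # labeled frames for training the net
test_x, test_y = make_frames(rng, 50)      # labels used only for scoring below
w, b = train_hann(train_x, train_y)
features = [bottleneck(x, w, b) for x in test_x]   # 1-D speaker features
clusters = kmeans_1d(features, k=2)

# Cluster indices are arbitrary, so score against both label assignments.
agree = sum(int(a == y) for a, y in zip(clusters, test_y))
accuracy = max(agree, len(test_y) - agree) / len(test_y)
print(f"clustering accuracy: {accuracy:.2f}")
```

In this sketch the speaker labels are used only to train the bottleneck; the k-means step sees nothing but the scalar feature per frame, which mirrors the separation between feature learning and clustering described in the abstract.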
Cite as: Ikbal, S., Visweswariah, K. (2008) Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering. Proc. Interspeech 2008, 28-31, doi: 10.21437/Interspeech.2008-5
@inproceedings{ikbal08_interspeech,
  author={Shajith Ikbal and Karthik Visweswariah},
  title={{Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={28--31},
  doi={10.21437/Interspeech.2008-5}
}