Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering

Ikbal, Shajith; Visweswariah, Karthik

doi:10.21437/Interspeech.2008-5

In this paper, we present a novel approach to speaker clustering that uses a hetero-associative neural network (HANN) to compute very low-dimensional speaker-discriminatory features (in our case, 1-dimensional) in a data-driven manner. A HANN trained to map the input feature space onto speaker labels through a bottleneck hidden layer is expected to learn a very low-dimensional feature subspace that essentially contains the speaker information. These low-dimensional features are then used in a simple k-means clustering algorithm to obtain the speaker segmentation. Evaluation of this approach on a database of real-life conversational speech from call centers shows that the clustering performance achieved is similar to that of state-of-the-art systems, even though our approach uses only 1-dimensional features. Combining these features with the traditional mel-frequency cepstral coefficient (MFCC) features in the state-of-the-art system resulted in improved clustering performance.
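The pipeline described in the abstract (a network trained on speaker labels through a 1-unit bottleneck, followed by k-means on the bottleneck output) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the layer sizes, the tanh hidden layer, the plain gradient-descent training, the quantile-based k-means initialisation, and the synthetic 13-dimensional "MFCC-like" Gaussian data standing in for two speakers. The function names `train_hann`, `bottleneck_features`, `kmeans_1d`, and `make_data` are hypothetical.

```python
import numpy as np

def train_hann(X, y, n_hidden=16, n_epochs=500, lr=0.2, seed=0):
    """Train input -> tanh hidden -> 1-unit bottleneck -> softmax over speaker labels.
    Architecture and hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    Y = np.eye(k)[y]  # one-hot speaker targets
    W1 = rng.normal(0, 1 / np.sqrt(d), (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, 1)); b2 = np.zeros(1)
    W3 = rng.normal(0, 1.0, (1, k)); b3 = np.zeros(k)
    for _ in range(n_epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        Z = H @ W2 + b2                      # 1-dimensional bottleneck output
        logits = Z @ W3 + b3
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        # Backward pass for the mean cross-entropy loss
        dlogits = (P - Y) / n
        dZ = dlogits @ W3.T
        dH = (dZ @ W2.T) * (1 - H ** 2)
        W3 -= lr * (Z.T @ dlogits); b3 -= lr * dlogits.sum(0)
        W2 -= lr * (H.T @ dZ);      b2 -= lr * dZ.sum(0)
        W1 -= lr * (X.T @ dH);      b1 -= lr * dH.sum(0)
    return W1, b1, W2, b2

def bottleneck_features(X, params):
    """Project frames onto the learned 1-dimensional bottleneck."""
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()

def kmeans_1d(z, k, n_iter=50):
    """Plain k-means (Lloyd's algorithm) on scalar features, quantile-initialised."""
    centers = np.quantile(z, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(z[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep the old centre if a cluster is empty
                centers[j] = z[labels == j].mean()
    return np.argmin(np.abs(z[:, None] - centers[None, :]), axis=1)

def make_data(rng, n_per_speaker=200, dim=13):
    """Synthetic stand-in for two speakers: Gaussian blobs with opposite means."""
    y = np.repeat([0, 1], n_per_speaker)
    X = rng.normal(0, 1, (2 * n_per_speaker, dim)) + np.where(y[:, None] == 0, -1.0, 1.0)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng)
X_test, y_test = make_data(rng)

params = train_hann(X_train, y_train)
z_test = bottleneck_features(X_test, params)   # one scalar per frame
clusters = kmeans_1d(z_test, k=2)

# Cluster indices are arbitrary, so score agreement up to a label swap.
agreement = np.mean(clusters == y_test)
purity = max(agreement, 1 - agreement)
print(f"cluster purity on held-out toy data: {purity:.3f}")
```

This toy setup clusters held-out frames from the same two synthetic speakers used in training, which is far easier than the call-center task evaluated in the paper; it is meant only to show how a supervised bottleneck can yield a scalar feature that k-means can split.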

Cite as: Ikbal, S., Visweswariah, K. (2008) Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering. Proc. Interspeech 2008, 28-31, doi: 10.21437/Interspeech.2008-5

@inproceedings{ikbal08_interspeech,
  author={Shajith Ikbal and Karthik Visweswariah},
  title={{Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={28--31},
  doi={10.21437/Interspeech.2008-5}
}