Speaker model quantization for unsupervised speaker indexing

Kwon, Soonil; Narayanan, Shrikanth

doi:10.21437/Interspeech.2004-571

Speaker indexing sequentially detects the points at which speaker identity changes in a multi-speaker audio stream and classifies each detected segment according to the speaker's identity. In unsupervised speaker indexing scenarios, no prior information or data about the speakers in the target data is available. To address this issue, a predetermined generic "speaker-independent" model set, called Sample Speaker Models (SSM), was previously proposed. While this set can be useful for more accurate speaker modeling and clustering without any target speaker models, an optimal method for sampling the models from such a set is still required. To meet this need, the Speaker Model Quantization (SMQ) method, motivated by Tree Structured Vector Quantization, is proposed. Experiments were performed with telephone conversations and broadcast news. Results showed that our new sampling approach reduced the error rate relative to the baseline by 5.5% absolute (37.7% relative) on two-speaker telephone conversations and by 10.7% absolute (42.5% relative) on broadcast news.
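The absolute and relative figures above together imply approximate baseline error rates, which the abstract does not state directly. The following is a minimal sketch of that arithmetic, assuming the usual convention that relative reduction equals absolute reduction divided by the baseline error rate. The function name `implied_error_rates` is hypothetical, and the derived values are back-calculated estimates, not numbers reported by the authors.

```python
def implied_error_rates(absolute_pts: float, relative_pct: float) -> tuple[float, float]:
    """Back out (baseline, improved) error rates in percent.

    Assumes relative reduction = absolute reduction / baseline error rate.
    absolute_pts: absolute error reduction in percentage points.
    relative_pct: relative error reduction in percent.
    """
    baseline = absolute_pts / (relative_pct / 100.0)
    improved = baseline - absolute_pts
    return baseline, improved


# Figures quoted in the abstract (absolute points, relative percent).
reported = {
    "two-speaker telephone conversations": (5.5, 37.7),
    "broadcast news": (10.7, 42.5),
}

for task, (absolute_pts, relative_pct) in reported.items():
    baseline, improved = implied_error_rates(absolute_pts, relative_pct)
    print(f"{task}: baseline ~{baseline:.1f}%, with SMQ ~{improved:.1f}%")
```

Under this assumption, the baseline error rates come out to roughly 14.6% for telephone conversations and 25.2% for broadcast news. These are rounded estimates only, since the quoted percentages are themselves rounded.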

Cite as: Kwon, S., Narayanan, S. (2004) Speaker model quantization for unsupervised speaker indexing. Proc. Interspeech 2004, 1517-1520, doi: 10.21437/Interspeech.2004-571

@inproceedings{kwon04b_interspeech,
  author={Soonil Kwon and Shrikanth Narayanan},
  title={{Speaker model quantization for unsupervised speaker indexing}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={1517--1520},
  doi={10.21437/Interspeech.2004-571}
}