Speaker indexing sequentially detects the points where speaker identity changes in a multi-speaker audio stream and classifies each detected segment according to the speaker's identity. In unsupervised speaker indexing, no prior information or data about the speakers in the target data is available. To address this issue, a predetermined generic "speaker-independent" model set, called Sample Speaker Models (SSM), was previously proposed. While this set can be useful for more accurate speaker modeling and clustering without any target speaker models, an optimal method for sampling the models from such a set is still required. To solve this sampling problem, the Speaker Model Quantization (SMQ) method, motivated by Tree-Structured Vector Quantization, is proposed. Experiments were performed on telephone conversations and broadcast news. Results showed that our new sampling approach outperformed the baseline in error rate by 5.5% absolute (37.7% relative) on two-speaker telephone conversations and by 10.7% absolute (42.5% relative) on broadcast news.
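The abstract reports each improvement as both an absolute and a relative reduction in error rate. The sketch below shows how the two figures relate, assuming the usual definition that the relative reduction equals the absolute reduction divided by the baseline error rate. The function name `implied_error_rates` is hypothetical, and the baseline and improved rates it prints are back-calculated from the two reported numbers, not values stated in the abstract.

```python
def implied_error_rates(absolute_drop, relative_drop):
    """Back out (baseline, improved) error rates, in percent.

    Assumes relative_drop = absolute_drop / baseline, with absolute_drop
    in percentage points and relative_drop as a fraction (0.377 = 37.7%).
    """
    baseline = absolute_drop / relative_drop
    improved = baseline - absolute_drop
    return baseline, improved


# Reductions reported in the abstract: (task, absolute points, relative fraction).
reported = [
    ("two-speaker telephone conversations", 5.5, 0.377),
    ("broadcast news", 10.7, 0.425),
]

for task, abs_drop, rel_drop in reported:
    baseline, improved = implied_error_rates(abs_drop, rel_drop)
    print(f"{task}: baseline ~{baseline:.1f}%, improved ~{improved:.1f}%")
```

Under that assumption, the reported figures imply baseline error rates of roughly 14.6% for the telephone data and 25.2% for broadcast news. These are approximations only, since the published percentages are rounded.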
Cite as: Kwon, S., Narayanan, S. (2004) Speaker model quantization for unsupervised speaker indexing. Proc. Interspeech 2004, 1517-1520, doi: 10.21437/Interspeech.2004-571
@inproceedings{kwon04b_interspeech,
  author={Soonil Kwon and Shrikanth Narayanan},
  title={{Speaker model quantization for unsupervised speaker indexing}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={1517--1520},
  doi={10.21437/Interspeech.2004-571}
}