This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with a single central microphone. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters, and it determines the model complexity automatically. It adapts each segment model from a universal background model and uses the cross-likelihood ratio, instead of the Bayesian Information Criterion (BIC), to decide which clusters to merge. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from the 21.76% of the baseline system [1] to 17.21%.
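To make the merging criterion concrete, the following is a minimal sketch of the cross-likelihood ratio in the form commonly used in the diarization literature, not necessarily the exact variant used in this paper. It assumes, purely for illustration, that each cluster and the background are modelled by a single diagonal-covariance Gaussian rather than by mixture models adapted from a universal background model. The function names, the synthetic features, and the variance floor are all hypothetical.

```python
import numpy as np

def fit_gaussian(frames, var_floor=1e-3):
    """Fit a diagonal-covariance Gaussian (illustrative stand-in for an adapted GMM)."""
    return frames.mean(axis=0), np.maximum(frames.var(axis=0), var_floor)

def avg_log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal-covariance Gaussian."""
    diff = frames - mean
    per_frame = -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)
    return float(np.mean(per_frame))

def cross_likelihood_ratio(x_i, x_j, model_i, model_j, background):
    """CLR(i, j): how well each cluster's data is explained by the other
    cluster's model, relative to the background model, normalised per frame."""
    term_i = avg_log_likelihood(x_i, *model_j) - avg_log_likelihood(x_i, *background)
    term_j = avg_log_likelihood(x_j, *model_i) - avg_log_likelihood(x_j, *background)
    return term_i + term_j

# Synthetic features (assumption): a1 and a2 mimic the same speaker, b a different one.
rng = np.random.default_rng(0)
a1 = rng.normal(0.0, 1.0, size=(500, 4))
a2 = rng.normal(0.0, 1.0, size=(500, 4))
b = rng.normal(3.0, 1.0, size=(500, 4))

# Background model fitted on all frames pooled together.
background = fit_gaussian(np.vstack([a1, a2, b]))

clr_same = cross_likelihood_ratio(a1, a2, fit_gaussian(a1), fit_gaussian(a2), background)
clr_diff = cross_likelihood_ratio(a1, b, fit_gaussian(a1), fit_gaussian(b), background)
print(f"same speaker: {clr_same:.2f}, different speakers: {clr_diff:.2f}")
```

In an agglomerative scheme of this kind, the pair of clusters with the highest ratio would be the candidate for merging; here the same-speaker pair scores higher than the different-speaker pair. The stopping rule based on the intra-cluster/inter-cluster ratio is not sketched, since the abstract does not define it.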
Cite as: Fu, R., Benest, I.D. (2007) An improved speaker diarization system. Proc. Interspeech 2007, 2605-2608, doi: 10.21437/Interspeech.2007-587
@inproceedings{fu07_interspeech, author={Rong Fu and Ian D. Benest}, title={{An improved speaker diarization system}}, year=2007, booktitle={Proc. Interspeech 2007}, pages={2605--2608}, doi={10.21437/Interspeech.2007-587} }