An improved speaker diarization system

Fu, Rong; Benest, Ian D.

doi:10.21437/Interspeech.2007-587

This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with a single central microphone. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters, and it determines the model complexity automatically. It adapts each segment model from a universal background model and uses the cross-likelihood ratio instead of the Bayesian Information Criterion (BIC) for merging. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from 21.76% to 17.21% compared with the baseline system [1].
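
The merging step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the commonly used form of the cross-likelihood ratio, CLR(i, j) = (1/n_i) log[p(x_i | M_j) / p(x_i | B)] + (1/n_j) log[p(x_j | M_i) / p(x_j | B)], where B is the background model. For brevity each cluster model is a single diagonal Gaussian fitted directly to the cluster's frames, not a mixture adapted from a universal background model as in the paper. All function names are hypothetical, and the stopping criterion is not attempted because the abstract does not define it.

```python
import numpy as np


def fit_diag_gaussian(x, floor=1e-6):
    """Fit a diagonal Gaussian to frames x of shape (n_frames, n_dims).

    Assumption: a single Gaussian stands in for the adapted mixture model.
    """
    return x.mean(axis=0), x.var(axis=0) + floor


def total_loglik(x, mean, var):
    """Sum of per-frame log-likelihoods of x under a diagonal Gaussian."""
    n_dims = x.shape[1]
    diff = x - mean
    per_frame = -0.5 * (
        n_dims * np.log(2.0 * np.pi)
        + np.sum(np.log(var))
        + np.sum(diff ** 2 / var, axis=1)
    )
    return float(np.sum(per_frame))


def cross_likelihood_ratio(xi, xj, background):
    """Assumed CLR form: larger values mean the two clusters are more alike."""
    model_i = fit_diag_gaussian(xi)
    model_j = fit_diag_gaussian(xj)
    # Each cluster is scored by the OTHER cluster's model, normalized by the
    # background model and by the number of frames.
    term_i = (total_loglik(xi, *model_j) - total_loglik(xi, *background)) / len(xi)
    term_j = (total_loglik(xj, *model_i) - total_loglik(xj, *background)) / len(xj)
    return term_i + term_j


def best_merge_pair(clusters, background):
    """Return ((i, j), score) for the pair with the highest CLR."""
    best, best_score = None, -np.inf
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            score = cross_likelihood_ratio(clusters[i], clusters[j], background)
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic features: clusters 0 and 1 share a "speaker", cluster 2 differs.
    clusters = [
        rng.normal(0.0, 1.0, size=(500, 4)),
        rng.normal(0.0, 1.0, size=(500, 4)),
        rng.normal(5.0, 1.0, size=(500, 4)),
    ]
    background = fit_diag_gaussian(np.vstack(clusters))
    print(best_merge_pair(clusters, background))
```

In an agglomerative loop, the pair returned by `best_merge_pair` would be merged and the models refitted, repeating until a stopping criterion fires. Unlike BIC, this score carries no explicit penalty term to tune, which is one plausible reason for preferring it when no development set is available.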

Cite as: Fu, R., Benest, I.D. (2007) An improved speaker diarization system. Proc. Interspeech 2007, 2605-2608, doi: 10.21437/Interspeech.2007-587

@inproceedings{fu07_interspeech,
  author={Rong Fu and Ian D. Benest},
  title={{An improved speaker diarization system}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2605--2608},
  doi={10.21437/Interspeech.2007-587}
}