Phoneme background model for information bottleneck based speaker diarization

Yella, Sree Harsha; Motlicek, Petr; Bourlard, Hervé

doi:10.21437/Interspeech.2014-144

Acoustic variability among speakers arises from differences in their vocal tract characteristics. These speaker-specific characteristics are reflected in the speech signal when speakers pronounce a given phoneme. The current work hypothesizes that clusters within the acoustic realizations of a phoneme spoken by multiple speakers roughly correspond to different speakers. Based on this hypothesis, a Gaussian mixture model (GMM) based phoneme background model (PBM) is estimated. The components of this PBM are used as the set of relevance variables in an information bottleneck (IB) based speaker diarization system. To estimate the PBM, experiments use phone transcripts obtained both from the ground truth and from an automatic speech recognition (ASR) system. Diarization experiments on meeting recordings from the AMI and NIST RT corpora show that the proposed method achieves significant improvements over a system whose background model ignores phoneme information.
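The abstract describes fitting a GMM to the frames of each phoneme and pooling the components into a PBM whose components serve as the IB relevance variables Y. The following is a minimal pure-Python sketch of that idea under stated assumptions, not the authors' implementation. The function names are hypothetical. The following are all assumptions not taken from the paper: a fixed number of components per phoneme (`k_per_phone`), scaling weights by each phoneme's share of frames, farthest-point initialisation, the variance floor, and computing p(y|x) by averaging frame-level posteriors over a segment. The data are synthetic 2-D features standing in for real acoustic features.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# One diagonal-covariance Gaussian: (weight, mean vector, variance vector).
Component = Tuple[float, List[float], List[float]]
VAR_FLOOR = 1e-3  # assumed floor that stops variances from collapsing to zero


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x: Sequence[float], comps: List[Component]) -> List[float]:
    """Component posteriors p(y | frame), computed with log-sum-exp for stability."""
    logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in comps]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_diag_gmm(frames: List[List[float]], k: int, iters: int = 20) -> List[Component]:
    """Plain EM for a k-component diagonal GMM (needs at least k frames)."""
    dim = len(frames[0])
    # Deterministic farthest-point initialisation of the k means (assumed choice).
    means = [list(frames[0])]
    while len(means) < k:
        far = max(frames, key=lambda f: min(
            sum((a - b) ** 2 for a, b in zip(f, m)) for m in means))
        means.append(list(far))
    comps: List[Component] = [(1.0 / k, m, [1.0] * dim) for m in means]
    for _ in range(iters):
        resp = [posteriors(f, comps) for f in frames]  # E-step
        new: List[Component] = []
        for j in range(k):  # M-step: re-estimate weight, mean and variance
            nj = sum(r[j] for r in resp) + 1e-10
            mean = [sum(r[j] * f[d] for r, f in zip(resp, frames)) / nj
                    for d in range(dim)]
            var = [max(VAR_FLOOR, sum(r[j] * (f[d] - mean[d]) ** 2
                                      for r, f in zip(resp, frames)) / nj)
                   for d in range(dim)]
            new.append((nj / len(frames), mean, var))
        comps = new
    return comps


def build_pbm(frames_by_phone: Dict[str, List[List[float]]],
              k_per_phone: int) -> List[Component]:
    """Fit one GMM per phoneme and pool all components into a single PBM.

    Weights are scaled by each phoneme's share of the frames so that the
    pooled weights sum to one (an assumed weighting, not taken from the paper).
    """
    total = sum(len(f) for f in frames_by_phone.values())
    pbm: List[Component] = []
    for phone in sorted(frames_by_phone):
        frames = frames_by_phone[phone]
        share = len(frames) / total
        pbm.extend((w * share, m, v) for w, m, v in fit_diag_gmm(frames, k_per_phone))
    return pbm


def segment_relevance_dist(segment: List[List[float]],
                           pbm: List[Component]) -> List[float]:
    """p(y | x) for one segment x: frame-level PBM posteriors averaged over x."""
    acc = [0.0] * len(pbm)
    for f in segment:
        for j, p in enumerate(posteriors(f, pbm)):
            acc[j] += p
    return [a / len(segment) for a in acc]


# Usage on hypothetical 2-D features: two synthetic "speakers" (x offset 0 or 3),
# each producing two synthetic "phonemes" (y offset 0 or 5).
rng = random.Random(1)


def toy(cx: float, cy: float, n: int = 40) -> List[List[float]]:
    return [[rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)] for _ in range(n)]


frames_by_phone = {"aa": toy(0, 0) + toy(3, 0), "iy": toy(0, 5) + toy(3, 5)}
pbm = build_pbm(frames_by_phone, k_per_phone=2)
dist = segment_relevance_dist(toy(3, 0, n=10), pbm)  # "speaker B" saying "aa"
print(len(pbm), round(sum(dist), 6))  # prints: 4 1.0
```

In this toy setup each phoneme's GMM splits into one component per synthetic speaker, mirroring the hypothesis that clusters within a phoneme roughly track speakers. The segment from the second speaker concentrates its mass on the matching "aa" component. Per-segment distributions like `dist` would then be passed to the information bottleneck clustering step, which this sketch omits.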

Cite as: Yella, S.H., Motlicek, P., Bourlard, H. (2014) Phoneme background model for information bottleneck based speaker diarization. Proc. Interspeech 2014, 597-601, doi: 10.21437/Interspeech.2014-144

@inproceedings{yella14_interspeech,
  author={Sree Harsha Yella and Petr Motlicek and Hervé Bourlard},
  title={{Phoneme background model for information bottleneck based speaker diarization}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={597--601},
  doi={10.21437/Interspeech.2014-144}
}