Frame level likelihood normalization for text-independent speaker identification using Gaussian mixture models

Markov, Konstantin P.; Nakagawa, Seiichi

doi:10.21437/ICSLP.1996-448

In this paper we propose a new speaker identification system that introduces the likelihood normalization technique widely used for speaker verification. In the new system, which is based on Gaussian mixture models, every frame of the test utterance is input to all the reference models in parallel. Because likelihoods from all the models are then available for each frame, they can be normalized at every frame. A special kind of likelihood normalization, called Weighting Models Rank, is also proposed. Experiments were performed using two databases, TIMIT and NTT. The evaluation results clearly show that the frame-level likelihood normalization technique is superior to the standard accumulated likelihood approach.
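The parallel, frame-by-frame scoring described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not give the formulas, so several details are assumptions: diagonal-covariance GMMs; a "normalized" mode that divides each model's frame likelihood by the mean likelihood of the other models; and a "rank" mode that uses a placeholder weight exp(-rank) to stand in for Weighting Models Rank. The function names and the toy data are hypothetical.

```python
import numpy as np

def _logsumexp(a, axis=0):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def frame_log_likelihoods(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    frames: (T, D); weights: (M,); means, variances: (M, D). Returns (T,)."""
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))  # (T, M)
    return _logsumexp(log_comp + np.log(weights), axis=1)

def score_speakers(frames, models, mode="accumulated"):
    """Score every speaker model on the same frames; returns shape (S,)."""
    # Every frame is scored by all models in parallel: ll has shape (S, T).
    ll = np.stack([frame_log_likelihoods(frames, *m) for m in models])
    n = ll.shape[0]
    if mode == "accumulated":
        # Standard approach: sum the frame log-likelihoods per model.
        return ll.sum(axis=1)
    if mode == "normalized":
        # ASSUMPTION: normalize by the mean likelihood of the other models.
        others = np.stack([_logsumexp(np.delete(ll, i, axis=0)) for i in range(n)])
        return (ll - (others - np.log(n - 1))).sum(axis=1)
    if mode == "rank":
        # ASSUMPTION: placeholder weight exp(-rank); rank 0 is the best model
        # at that frame. The paper's actual weighting is not in the abstract.
        ranks = (-ll).argsort(axis=0).argsort(axis=0)
        return np.exp(-ranks).sum(axis=1)
    raise ValueError(f"unknown mode: {mode}")

# Toy data (hypothetical): three single-component speaker models in 2-D.
models = [(np.array([1.0]), np.array([[c, c]]), np.ones((1, 2)))
          for c in (0.0, 5.0, 10.0)]
frames = np.linspace(4.0, 6.0, 20).reshape(10, 2)   # frames near speaker 1
identified = {mode: int(np.argmax(score_speakers(frames, models, mode)))
              for mode in ("accumulated", "normalized", "rank")}
```

On this easy toy input all three modes select the same speaker; any difference between them would only appear on harder data. Note that dividing every model's likelihood by one shared per-frame constant would leave the ranking of accumulated log scores unchanged, which is why the sketch excludes the scored model from its own normalizer.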

Cite as: Markov, K.P., Nakagawa, S. (1996) Frame level likelihood normalization for text-independent speaker identification using Gaussian mixture models. Proc. 4th International Conference on Spoken Language Processing (ICSLP 1996), 1764-1767, doi: 10.21437/ICSLP.1996-448

@inproceedings{markov96_icslp,
  author={Konstantin P. Markov and Seiichi Nakagawa},
  title={{Frame level likelihood normalization for text-independent speaker identification using Gaussian mixture models}},
  year=1996,
  booktitle={Proc. 4th International Conference on Spoken Language Processing (ICSLP 1996)},
  pages={1764--1767},
  doi={10.21437/ICSLP.1996-448}
}