Speaker verification using target and background dependent linear transforms and multi-system fusion

Navratil, Jiri; Chaudhari, Upendra V.; Ramaswamy, Ganesh N.

doi:10.21437/Eurospeech.2001-359

This paper describes a GMM-based speaker verification system that uses speaker-dependent background models, transformed by speaker-specific maximum likelihood linear transforms, to achieve a sharper separation between the target and the nontarget acoustic regions. The effect of tying, or coupling, Gaussian components between the target and the background model is studied and shown to be a relevant factor with respect to the desired operating point. Also presented is a neural-network fusion of scores from multiple systems built on different acoustic features, which yields performance gains over linear combination. Results obtained on the 1999 speaker recognition evaluation set indicate reductions of the minimum detection cost of up to 13% for all tests and up to 25% for electret-only tests, as compared to a baseline GMM system. The neural fusion of three systems yields a further 5% cost reduction.
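The results above are reported as reductions of the minimum detection cost. As a rough illustration of how such a figure is typically computed from verification scores, here is a minimal sketch. It is not the authors' code: the function name is hypothetical, and the default cost parameters (miss cost 10, false-alarm cost 1, target prior 0.01) are an assumption based on the values commonly used in speaker recognition evaluations of that period, which the abstract does not state.

```python
def min_detection_cost(target_scores, nontarget_scores,
                       c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Return the lowest detection cost over all decision thresholds.

    A trial is accepted when its score is >= the threshold.
    The default costs and prior are assumed, not taken from the paper.
    """
    # Candidate thresholds: every observed score ...
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    # ... plus one above the maximum, so "reject every trial" is also tried.
    thresholds.append(thresholds[-1] + 1.0)

    best = float("inf")
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a nontarget trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, cost)
    return best


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = [2.1, 1.7, 0.4, 3.0]
    nontargets = [-1.2, 0.5, -0.3, 0.1]
    print(min_detection_cost(targets, nontargets))
```

A relative reduction such as the 13% quoted above would then be `(baseline_cost - new_cost) / baseline_cost`, computed from the minimum costs of the two systems on the same trial set.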

Cite as: Navratil, J., Chaudhari, U.V., Ramaswamy, G.N. (2001) Speaker verification using target and background dependent linear transforms and multi-system fusion. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1389-1392, doi: 10.21437/Eurospeech.2001-359

@inproceedings{navratil01_eurospeech,
  author={Jiri Navratil and Upendra V. Chaudhari and Ganesh N. Ramaswamy},
  title={{Speaker verification using target and background dependent linear transforms and multi-system fusion}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1389--1392},
  doi={10.21437/Eurospeech.2001-359}
}