ISCA Archive Eurospeech 2001

Rapid vocal tract length normalization using maximum likelihood estimation

Tadashi Emori, Koichi Shinoda

Recently, vocal tract length normalization (VTLN) techniques have been developed for speaker normalization in speech recognition. This paper proposes a new VTLN method in which the vocal tract length is normalized in the cepstrum space by means of a linear mapping whose parameter is derived using maximum-likelihood estimation. The computational cost of this method is much lower than that of conventional methods such as ML-VTLN, in which the mapping parameter is selected from among several candidate parameters. Further, the new method offers greater precision in determining parameters for individual speakers. In experiments, the method achieved an error reduction rate of 7.1%. A combination of the proposed method with the cepstrum mean normalization (CMN) method was also examined and found to reduce the error rate even more, by 14.6%.
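
The contrast drawn in the abstract, between selecting a mapping parameter from several candidates and deriving it directly by maximum-likelihood estimation, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a mapping that is linear in a single scalar parameter `alpha`, of the form `x_hat = x + alpha * (D @ x)`, diagonal-covariance Gaussian targets already aligned to each frame, and no Jacobian term in the likelihood. The matrix `D` and the names `estimate_alpha` and `grid_search_alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_likelihood(alpha, x, mu, var, D):
    """Diagonal-Gaussian log-likelihood of the mapped frames.

    x, mu, var: arrays of shape (T, K); D: array of shape (K, K).
    Assumption: the mapping is x_hat = x + alpha * (D @ x) per frame,
    and the Jacobian of the mapping is ignored.
    """
    x_hat = x + alpha * (x @ D.T)
    return -0.5 * np.sum((mu - x_hat) ** 2 / var + np.log(2.0 * np.pi * var))


def grid_search_alpha(x, mu, var, D, candidates):
    """Selection among several candidates: one likelihood pass per candidate."""
    scores = [log_likelihood(a, x, mu, var, D) for a in candidates]
    return candidates[int(np.argmax(scores))]


def estimate_alpha(x, mu, var, D):
    """Closed-form maximizer of the same likelihood (a weighted least-squares fit).

    Setting the derivative with respect to alpha to zero gives
    alpha = sum(d * (mu - x) / var) / sum(d * d / var), with d = D @ x.
    """
    d = x @ D.T
    return np.sum(d * (mu - x) / var) / np.sum(d * d / var)
```

Under these assumptions, the closed-form estimate needs a single pass over the data and is not restricted to a preset grid, whereas the grid search costs one likelihood evaluation per candidate and can only return one of the candidate values.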


doi: 10.21437/Eurospeech.2001-204

Cite as: Emori, T., Shinoda, K. (2001) Rapid vocal tract length normalization using maximum likelihood estimation. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1649-1652, doi: 10.21437/Eurospeech.2001-204

@inproceedings{emori01_eurospeech,
  author={Tadashi Emori and Koichi Shinoda},
  title={{Rapid vocal tract length normalization using maximum likelihood estimation}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1649--1652},
  doi={10.21437/Eurospeech.2001-204}
}