Transformation and combination of hidden Markov models for speaker selection training

Huang, Chao; Chen, Tao; Chang, Eric

doi:10.21437/Interspeech.2004-5

This paper presents a three-stage adaptation framework based on speaker selection training. First, a subset of cohort speakers is selected for the test speaker using Gaussian mixture models, which are more reliable when adaptation data is very limited. Next, the cohort models are linearly transformed to bring them closer to the test speaker. Finally, the adapted model for the test speaker is obtained by combining the transformed models; the combination weights and bias terms are learned from the adaptation data. Experiments show that transforming the models before combining them improves the robustness of the scheme. With only 30 s of adaptation data, a relative error rate reduction of about 14.9% is achieved on a large-vocabulary continuous speech recognition task.
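The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the transforms, weights, or bias are estimated, so the sketch assumes ordinary least squares on Gaussian mean vectors as a stand-in. All function names (`gmm_avg_loglik`, `select_cohort`, `fit_affine`, `apply_affine`, `combine`) and the data layout (each speaker model reduced to a matrix of Gaussian means, with target means observed for a subset of Gaussians) are hypothetical.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]                # (T, M, D)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances, axis=2))   # (T, M)
    peak = log_comp.max(axis=1, keepdims=True)                   # log-sum-exp
    log_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return log_frame.mean()

def select_cohort(frames, speaker_gmms, k):
    """Stage 1: indices of the k speakers whose GMMs best fit the frames."""
    scores = [gmm_avg_loglik(frames, *gmm) for gmm in speaker_gmms]
    return list(np.argsort(scores)[::-1][:k])

def fit_affine(src_means, target_means):
    """Stage 2 (assumed form): least-squares affine map from src to target."""
    X = np.hstack([src_means, np.ones((len(src_means), 1))])     # (G, D+1)
    W, *_ = np.linalg.lstsq(X, target_means, rcond=None)         # (D+1, D)
    return W

def apply_affine(W, means):
    """Apply the affine map estimated by fit_affine to a matrix of means."""
    return np.hstack([means, np.ones((len(means), 1))]) @ W

def combine(transformed, observed_idx, observed_means):
    """Stage 3 (assumed form): observed ~ sum_k w_k * transformed_k + bias."""
    K, D = len(transformed), observed_means.shape[1]
    # One equation per (Gaussian, dimension); unknowns are K weights + D biases.
    model_cols = np.stack([t[observed_idx].reshape(-1) for t in transformed],
                          axis=1)                                # (G*D, K)
    bias_cols = np.tile(np.eye(D), (len(observed_idx), 1))       # (G*D, D)
    X = np.hstack([model_cols, bias_cols])
    sol, *_ = np.linalg.lstsq(X, observed_means.reshape(-1), rcond=None)
    w, bias = sol[:K], sol[K:]
    adapted = sum(wk * t for wk, t in zip(w, transformed)) + bias
    return adapted, w, bias
```

In this sketch, `select_cohort` would be run on the adaptation frames, each selected cohort's means would be passed through `fit_affine` and `apply_affine` against the observed target means, and `combine` would then produce adapted means for all Gaussians, including those not seen in the adaptation data.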

Cite as: Huang, C., Chen, T., Chang, E. (2004) Transformation and combination of hidden Markov models for speaker selection training. Proc. Interspeech 2004, 9-12, doi: 10.21437/Interspeech.2004-5

@inproceedings{huang04_interspeech,
  author={Chao Huang and Tao Chen and Eric Chang},
  title={{Transformation and combination of hidden Markov models for speaker selection training}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={9--12},
  doi={10.21437/Interspeech.2004-5}
}