Speaker variability is one of the major sources of error in ASR systems. Speaker adaptation estimates speaker-specific models from speaker-independent ones in order to reduce the mismatch between training and testing conditions that arises from speaker variability. One commonly adopted approach is transformation-based adaptation. In this paper, discriminative input and output transforms for speaker adaptation in hybrid NN/HMM systems are compared and further investigated with both structural and data-driven constraints. Experimental results show that discriminative transforms with data-driven constraints are much more robust for unsupervised adaptation.
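The abstract does not describe the network or the transforms in detail, so the following is only a toy sketch under stated assumptions. It shows where an input transform and an output transform sit around a frozen speaker-independent (SI) network. The assumptions are a one-hidden-layer sigmoid MLP, an affine transform on the input features (similar in spirit to a linear input network), and an affine transform on the pre-softmax activations. Both transforms start at identity and are updated by frame-level cross-entropy gradient descent, which is taken here as the discriminative criterion. The class and function names (`AdaptedNet`, `input_step`, `output_step`), the layer sizes, the weights and the position of the output transform before the softmax are hypothetical, not the paper's configuration.

```python
import math


def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def affine(M, v, b):
    """Return M v + b."""
    return [s + bi for s, bi in zip(matvec(M, v), b)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def ce_grad(p, label):
    """Gradient of cross-entropy w.r.t. the transformed pre-softmax activations."""
    return [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]


class AdaptedNet:
    """Hypothetical toy: a frozen SI MLP wrapped by input and output affine transforms.

    W1, b1, W2, b2 are the SI weights and are never updated. (A, a) transforms
    the input features; (C, c) transforms the pre-softmax activations. Both
    start at identity, so the adapted net initially reproduces the SI net.
    """

    def __init__(self, W1, b1, W2, b2):
        self.W1, self.b1, self.W2, self.b2 = W1, b1, W2, b2
        n_in, n_out = len(W1[0]), len(W2)
        self.A, self.a = identity(n_in), [0.0] * n_in
        self.C, self.c = identity(n_out), [0.0] * n_out

    def forward(self, x):
        xt = affine(self.A, x, self.a)             # input transform
        h = sigmoid(affine(self.W1, xt, self.b1))  # frozen SI hidden layer
        z = affine(self.W2, h, self.b2)            # frozen SI pre-softmax
        p = softmax(affine(self.C, z, self.c))     # output transform, then softmax
        return h, z, p

    def output_step(self, x, label, lr=0.1):
        """One SGD step on (C, c) only; returns the loss before the update."""
        _, z, p = self.forward(x)
        g = ce_grad(p, label)
        for k, gk in enumerate(g):
            for j, zj in enumerate(z):
                self.C[k][j] -= lr * gk * zj
            self.c[k] -= lr * gk
        return -math.log(p[label])

    def input_step(self, x, label, lr=0.1):
        """One SGD step on (A, a) only, back-propagating through the frozen SI net."""
        h, _, p = self.forward(x)
        g = ce_grad(p, label)
        g_z = matvec(transpose(self.C), g)          # through the output transform
        g_h = matvec(transpose(self.W2), g_z)       # through frozen W2
        g_pre = [gh * hj * (1.0 - hj) for gh, hj in zip(g_h, h)]  # sigmoid derivative
        g_xt = matvec(transpose(self.W1), g_pre)    # through frozen W1
        for i, gi in enumerate(g_xt):
            for j, xj in enumerate(x):
                self.A[i][j] -= lr * gi * xj
            self.a[i] -= lr * gi
        return -math.log(p[label])


# Toy SI weights (hypothetical): 2 input features, 3 hidden units, 2 classes.
W1 = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5], [-0.5, 0.7, 0.3]]
b2 = [0.0, 0.0]
x, label = [0.4, -1.2], 1  # one adaptation frame and its label

in_net = AdaptedNet(W1, b1, W2, b2)
out_net = AdaptedNet(W1, b1, W2, b2)
in_losses = [in_net.input_step(x, label) for _ in range(20)]
out_losses = [out_net.output_step(x, label) for _ in range(20)]
```

In unsupervised adaptation the labels would come from a first-pass recognition hypothesis rather than a reference transcript. The structural and data-driven constraints studied in the paper are not modeled here. They would restrict how `A` or `C` may change; a block-diagonal `A`, for instance, is one possible structural constraint and is mentioned only as an illustrative assumption.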
Cite as: Li, B., Sim, K.C. (2010) Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems. Proc. Interspeech 2010, 526-529, doi: 10.21437/Interspeech.2010-214
@inproceedings{li10_interspeech, author={Bo Li and Khe Chai Sim}, title={{Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems}}, year=2010, booktitle={Proc. Interspeech 2010}, pages={526--529}, doi={10.21437/Interspeech.2010-214} }