ISCA Archive Interspeech 2014

Towards speaker adaptive training of deep neural network acoustic models

Yajie Miao, Hao Zhang, Florian Metze

We investigate the concept of speaker adaptive training (SAT) in the context of deep neural network (DNN) acoustic models. Previous studies have shown the success of speaker adaptation for DNNs in speech recognition. In this paper, we apply SAT to DNNs by learning two types of feature-mapping neural networks. Given an initial DNN model, these networks take speaker i-vectors as additional information and project DNN inputs into a speaker-normalized space. The final SAT model is obtained by updating the canonical DNN in the normalized feature space. Experiments on a Switchboard 110-hour setup show that, compared with the baseline DNN, the SAT-DNN model brings 7.5% and 6.0% relative improvements when the DNN inputs are speaker-independent and speaker-adapted features, respectively. Further evaluations on the more challenging BABEL datasets reveal significant word error rate reductions achieved by SAT-DNN.
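The core idea the abstract describes, a network that consumes a speaker i-vector and maps acoustic features into a speaker-normalized space before the canonical DNN sees them, can be sketched in a few lines of NumPy. This is a minimal illustration under assumed dimensions and an assumed additive-shift mapping; the paper's actual feature-mapping networks and training procedure differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper)
feat_dim, ivec_dim, hidden = 40, 100, 32

# Hypothetical adaptation network: a small MLP that maps a speaker
# i-vector to a per-speaker shift of the acoustic features, i.e. a
# projection into a speaker-normalized space.
W1 = rng.standard_normal((ivec_dim, hidden)) * 0.01
W2 = rng.standard_normal((hidden, feat_dim)) * 0.01

def normalize_features(frames, ivector):
    """Apply a speaker-dependent shift predicted from the i-vector."""
    shift = np.tanh(ivector @ W1) @ W2   # shape: (feat_dim,)
    return frames + shift                # speaker-normalized inputs

# Usage: all frames from one speaker share a single i-vector.
frames = rng.standard_normal((5, feat_dim))
ivec = rng.standard_normal(ivec_dim)
norm = normalize_features(frames, ivec)
```

In SAT, the canonical DNN would then be updated on `norm` rather than on the raw frames, so speaker variability is absorbed by the adaptation network.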


doi: 10.21437/Interspeech.2014-490

Cite as: Miao, Y., Zhang, H., Metze, F. (2014) Towards speaker adaptive training of deep neural network acoustic models. Proc. Interspeech 2014, 2189-2193, doi: 10.21437/Interspeech.2014-490

@inproceedings{miao14c_interspeech,
  author={Yajie Miao and Hao Zhang and Florian Metze},
  title={{Towards speaker adaptive training of deep neural network acoustic models}},
  year=2014,
  booktitle={Proc. Interspeech 2014},
  pages={2189--2193},
  doi={10.21437/Interspeech.2014-490}
}