fMLLR based feature-space speaker adaptation of DNN acoustic models

Parthasarathi, Sree Hari Krishnan; Hoffmeister, Bjorn; Matsoukas, Spyros; Mandal, Arindam; Strom, Nikko; Garimella, Sri

doi:10.21437/Interspeech.2015-720

We investigate the problem of speaker adaptation of DNN acoustic models in two settings: traditional unsupervised adaptation, and supervised adaptation (SuA), where a few minutes of transcribed speech are available. SuA presents additional difficulties when a test speaker's adaptation information does not match the registered speaker's information. Using feature-space maximum likelihood linear regression (fMLLR) transformed features as side-information to the DNN, we reintroduce some classical ideas for combining adapted and unadapted features: early and late fusion methods, and the estimation of fMLLR transforms using simple target models (STMs). Results show that early fusion helps DNNs generalize better when the features are combined after a non-linear bottleneck layer, while late fusion improves robustness, particularly in mismatched cases. STMs give consistent improvements in both settings.
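For readers unfamiliar with the terms, the following is a minimal, hedged sketch of the generic ideas named above. It is not the authors' implementation. An fMLLR transform is applied as an affine map W = [A | b] to each feature vector. "Early fusion" is shown here as plain concatenation of unadapted and adapted features at the DNN input, while the paper reports that combining features after a non-linear bottleneck layer worked better, and that is not modelled here. "Late fusion" is shown as linear interpolation of state posteriors from two separate models. The interpolation rule and the `weight` parameter are assumptions made for illustration. Transform estimation (including the STM approach) is also not shown.

```python
"""Toy sketch of fMLLR feature transforms and two generic fusion styles.

Assumptions (not from the paper): plain Python lists instead of a DNN
toolkit, concatenation at the input for early fusion, and linear posterior
interpolation with a hypothetical `weight` for late fusion.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_fmllr(x: Vector, W: Matrix) -> Vector:
    """Apply an fMLLR transform W = [A | b] (shape d x (d+1)) to x, giving A x + b."""
    d = len(x)
    if len(W) != d or any(len(row) != d + 1 for row in W):
        raise ValueError("W must have shape d x (d+1)")
    extended = x + [1.0]  # append 1 so the bias b is the last column of W
    return [sum(w * e for w, e in zip(row, extended)) for row in W]


def early_fusion(unadapted: Vector, adapted: Vector) -> Vector:
    """Early fusion (toy): concatenate unadapted and adapted features into one input."""
    return unadapted + adapted


def late_fusion(post_unadapted: Vector, post_adapted: Vector,
                weight: float = 0.5) -> Vector:
    """Late fusion (toy): interpolate posteriors from two separately trained models.

    `weight` is a hypothetical interpolation weight given to the adapted model;
    the result is renormalized so it sums to one.
    """
    combined = [(1.0 - weight) * pu + weight * pa
                for pu, pa in zip(post_unadapted, post_adapted)]
    total = sum(combined)
    return [c / total for c in combined]


if __name__ == "__main__":
    x = [1.0, 2.0]
    W = [[2.0, 0.0, 0.5],
         [0.0, 1.0, -1.0]]
    y = apply_fmllr(x, W)
    print(y)                       # [2.5, 1.0]
    print(early_fusion(x, y))      # [1.0, 2.0, 2.5, 1.0]
    print(late_fusion([0.2, 0.8], [0.6, 0.4], weight=0.5))
```

Keeping the bias inside an extended matrix mirrors the usual way fMLLR transforms are written. Concatenation versus posterior interpolation is the simplest contrast between combining information before the model and after it.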

Cite as: Parthasarathi, S.H.K., Hoffmeister, B., Matsoukas, S., Mandal, A., Strom, N., Garimella, S. (2015) fMLLR based feature-space speaker adaptation of DNN acoustic models. Proc. Interspeech 2015, 3630-3634, doi: 10.21437/Interspeech.2015-720

@inproceedings{parthasarathi15_interspeech,
  author={Sree Hari Krishnan Parthasarathi and Bjorn Hoffmeister and Spyros Matsoukas and Arindam Mandal and Nikko Strom and Sri Garimella},
  title={{fMLLR based feature-space speaker adaptation of DNN acoustic models}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3630--3634},
  doi={10.21437/Interspeech.2015-720}
}