Online speaker adaptation with pre-computed FMLLR transformations

Fischer, Volker; Kunzmann, Siegfried

doi:10.21437/Interspeech.2011-657

This paper presents a memory-efficient single-pass speech recognizer that uses pre-computed FMLLR transformations for online speaker adaptation. To this end, we apply unsupervised segment clustering to the training corpus, estimate a transformation matrix for each cluster, and train a text-independent Gaussian mixture classifier that selects the cluster at runtime. We evaluate the approach with the RWTH Aachen University open-source speech recognition toolkit and compare the results to a standard speaker-adaptive two-pass decoding strategy. The results indicate that the method improves single-pass recognition in VTLN feature space with almost no overhead from cluster selection, and that it yields a relative improvement of up to 15 percent over speaker-adaptive decoding when only little data is available for unsupervised online adaptation.
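The runtime step described in the abstract (pick a cluster with a Gaussian mixture classifier, then apply that cluster's pre-computed transformation) can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the RWTH toolkit's API: the function names, the diagonal-covariance mixtures, the selection by highest average frame log-likelihood, and the toy parameters are all assumptions made for illustration. The FMLLR transformation is assumed to be the usual affine feature-space map x' = A x + b.

```python
import numpy as np

def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means and variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # (M,)
    log_comp = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_comp = log_comp + np.log(weights)[None, :]                 # (T, M)
    # Numerically stable log-sum-exp over mixture components.
    peak = log_comp.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(log_frame.mean())

def select_cluster(frames, cluster_gmms):
    """Return the index of the cluster whose GMM scores the frames highest."""
    scores = [gmm_avg_log_likelihood(frames, *gmm) for gmm in cluster_gmms]
    return int(np.argmax(scores))

def apply_fmllr(frames, A, b):
    """Apply an affine feature-space transformation x' = A x + b to each frame."""
    return frames @ A.T + b

# Hypothetical toy setup: two clusters in a 2-dimensional feature space.
cluster_gmms = [
    (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    (np.array([0.5, 0.5]), np.array([[5.0, 5.0], [6.0, 6.0]]), np.ones((2, 2))),
]
# One pre-computed (A, b) pair per cluster; values are made up.
cluster_transforms = [
    (np.eye(2), np.zeros(2)),
    (2.0 * np.eye(2), np.array([1.0, 1.0])),
]

frames = np.array([[5.0, 5.0], [5.5, 5.5], [6.0, 6.0]])
cluster = select_cluster(frames, cluster_gmms)
A, b = cluster_transforms[cluster]
adapted = apply_fmllr(frames, A, b)
```

Because the transformations are computed offline, the only per-utterance cost in this sketch is scoring the frames against each cluster mixture, which is consistent with the low selection overhead the abstract reports.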

Cite as: Fischer, V., Kunzmann, S. (2011) Online speaker adaptation with pre-computed FMLLR transformations. Proc. Interspeech 2011, 2569-2572, doi: 10.21437/Interspeech.2011-657

@inproceedings{fischer11_interspeech,
  author={Volker Fischer and Siegfried Kunzmann},
  title={{Online speaker adaptation with pre-computed FMLLR transformations}},
  year=2011,
  booktitle={Proc. Interspeech 2011},
  pages={2569--2572},
  doi={10.21437/Interspeech.2011-657},
  issn={2308-457X}
}