This paper presents a memory-efficient single-pass speech recognizer that uses pre-computed FMLLR transformations for online speaker adaptation. To this end, we apply unsupervised segment clustering to the training corpus, estimate a transformation matrix for each cluster, and train a text-independent Gaussian mixture classifier that selects a cluster at runtime. We evaluate the approach with the RWTH Aachen University open-source speech recognition toolkit and compare the results to a standard speaker-adaptive two-pass decoding strategy. The results indicate that the method improves single-pass recognition in VTLN feature space with almost no overhead from cluster selection, and yields a relative improvement of up to 15 percent over speaker-adaptive decoding when only a small amount of data is available for unsupervised online adaptation.
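The runtime side of the scheme can be sketched as follows: score an incoming segment under each cluster's Gaussian mixture model, pick the best-scoring cluster, and apply that cluster's pre-computed affine FMLLR transform x' = A x + b to every frame. This is a minimal, stdlib-only sketch, not the authors' implementation or the RWTH toolkit API. It assumes diagonal-covariance GMMs and transforms that were already estimated offline, and the names `DiagGMM`, `FMLLRTransform`, `select_cluster`, and `adapt_segment` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class DiagGMM:
    """Hypothetical per-cluster classifier: diagonal-covariance Gaussian mixture."""
    weights: Vector
    means: List[Vector]
    variances: List[Vector]

    def frame_loglik(self, x: Sequence[float]) -> float:
        # log sum_k w_k N(x; mu_k, diag(var_k)), combined with log-sum-exp for stability
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comps.append(ll)
        m = max(comps)
        return m + math.log(sum(math.exp(c - m) for c in comps))


@dataclass
class FMLLRTransform:
    """Hypothetical container for a pre-computed feature transform x' = A x + b."""
    A: Matrix
    b: Vector

    def apply(self, x: Sequence[float]) -> Vector:
        # Matrix-vector product plus bias, one output dimension per row of A
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(self.A, self.b)]


def select_cluster(frames: Sequence[Sequence[float]], gmms: Sequence[DiagGMM]) -> int:
    # Text-independent selection: sum frame log-likelihoods over the segment,
    # then return the index of the highest-scoring cluster
    scores = [sum(g.frame_loglik(f) for f in frames) for g in gmms]
    return max(range(len(gmms)), key=scores.__getitem__)


def adapt_segment(frames: Sequence[Sequence[float]],
                  gmms: Sequence[DiagGMM],
                  transforms: Sequence[FMLLRTransform]) -> Tuple[int, List[Vector]]:
    # Choose a cluster, then map every frame through that cluster's transform
    k = select_cluster(frames, gmms)
    return k, [transforms[k].apply(f) for f in frames]
```

For example, with two 2-D clusters, one modelled near the origin and one centred at (5, 5) whose transform subtracts that offset, a segment of frames near (5, 5) selects the second cluster and is shifted back toward the origin. Because selection only needs GMM scoring and a table lookup, no second decoding pass is required to obtain adapted features, which matches the low-overhead, single-pass motivation above.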
Cite as: Fischer, V., Kunzmann, S. (2011) Online speaker adaptation with pre-computed FMLLR transformations. Proc. Interspeech 2011, 2569-2572, doi: 10.21437/Interspeech.2011-657
@inproceedings{fischer11_interspeech, author={Volker Fischer and Siegfried Kunzmann}, title={{Online speaker adaptation with pre-computed FMLLR transformations}}, year=2011, booktitle={Proc. Interspeech 2011}, pages={2569--2572}, doi={10.21437/Interspeech.2011-657}, issn={2308-457X} }