Comparison of subspace methods for Gaussian mixture models in speech recognition

Varjokallio, Matti; Kurimo, Mikko

doi:10.21437/Interspeech.2007-573

Speech recognizers typically use high-dimensional feature vectors to capture the essential cues for speech recognition. The acoustics are then commonly modeled with a Hidden Markov Model (HMM) that uses Gaussian Mixture Models (GMMs) as observation probability density functions. Unrestricted Gaussian parameters can make the models intolerably expensive to both evaluate and store, which limits their practical use to high-end systems. The classical approach to tackling these problems is to assume independent features and constrain the covariance matrices to be diagonal. This can be thought of as constraining the second-order parameters to lie in a fixed subspace consisting of rank-1 terms. In this paper we discuss the differences between recently proposed subspace methods for GMMs, with emphasis on the applicability of the models to a practical large-vocabulary continuous speech recognition (LVCSR) system.
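The remark about diagonal covariances and rank-1 terms can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, the second-order parameters are assumed here to be represented as a precision (inverse covariance) matrix, and the feature dimension 39 used in the usage note is only an assumed, typical value for speech features. The sketch builds a diagonal precision matrix as a weighted sum of the fixed rank-1 terms e_k e_k^T, counts second-order parameters for the full and diagonal cases, and evaluates a diagonal-covariance Gaussian log-density.

```python
import math


def outer(u, v):
    """Outer product u v^T of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def diagonal_precision(variances):
    """Build a diagonal precision matrix as a sum of weighted rank-1 terms.

    Assumption (not from the paper): the fixed basis is e_k e_k^T, where e_k
    is the k-th unit vector, and the weight of term k is 1 / variance_k.
    """
    d = len(variances)
    precision = [[0.0] * d for _ in range(d)]
    for k, var in enumerate(variances):
        e_k = [1.0 if i == k else 0.0 for i in range(d)]
        term = outer(e_k, e_k)  # rank-1 matrix with a single 1 at (k, k)
        weight = 1.0 / var
        for i in range(d):
            for j in range(d):
                precision[i][j] += weight * term[i][j]
    return precision


def second_order_param_count(d):
    """Free second-order parameters per Gaussian for dimension d."""
    return {"full": d * (d + 1) // 2, "diagonal": d}


def diag_log_density(x, mean, variances):
    """Log-density of a Gaussian with diagonal covariance, in O(d) time."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, variances)
    )
```

For example, `second_order_param_count(39)` gives 780 parameters per Gaussian for a full symmetric matrix against 39 for the diagonal case, which is the kind of storage and evaluation gap that motivates constraining the parameters to a subspace.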

Cite as: Varjokallio, M., Kurimo, M. (2007) Comparison of subspace methods for Gaussian mixture models in speech recognition. Proc. Interspeech 2007, 2121-2124, doi: 10.21437/Interspeech.2007-573

@inproceedings{varjokallio07_interspeech,
  author={Matti Varjokallio and Mikko Kurimo},
  title={{Comparison of subspace methods for Gaussian mixture models in speech recognition}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2121--2124},
  doi={10.21437/Interspeech.2007-573}
}