Group sparse hidden Markov models for speech recognition

Chien, Jen-Tzung; Chiang, Cheng-Chun

doi:10.21437/Interspeech.2012-508

This paper presents group sparse hidden Markov models (GS-HMMs), in which a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The group of common bases represents the features across states within an HMM. The group of individual bases compensates for the intra-state residual information. Importantly, the sparse prior for the sensing weights is controlled by the Laplacian scale mixture (LSM) distribution, which is obtained by multiplying a Laplacian variable by an inverse Gamma variable. The scale mixture parameter in the LSM makes the distribution even sparser. This parameter serves as an automatic relevance determination mechanism for selecting relevant bases from the two groups. The weights and two sets of bases in GS-HMMs are estimated via Bayesian learning. We apply this framework to acoustic modeling and show the robustness of GS-HMMs for speech recognition in the presence of different noise types and SNRs.
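
The LSM construction described in the abstract (a Laplacian variable multiplied by an inverse Gamma variable) can be illustrated with a minimal sampling sketch. It is not the authors' implementation and does not cover the Bayesian learning of weights or bases. The hyperparameters `shape=3.0` and `rate=1.0`, the function names, and the `tail_ratio` diagnostic are assumptions chosen for illustration only.

```python
import random


def sample_laplace(rng, scale=1.0):
    """Draw one zero-mean Laplacian sample as a randomly signed exponential."""
    magnitude = rng.expovariate(1.0 / scale)  # expovariate takes 1/mean
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_lsm(rng, shape=3.0, rate=1.0):
    """Draw one LSM sample: unit Laplacian times an inverse Gamma multiplier.

    shape and rate are assumed values, not taken from the paper.
    """
    # An inverse Gamma draw is the reciprocal of a Gamma(shape, rate) draw.
    # random.gammavariate(alpha, beta) uses beta as the scale, i.e. 1/rate.
    inv_gamma = 1.0 / rng.gammavariate(shape, 1.0 / rate)
    return sample_laplace(rng) * inv_gamma


def tail_ratio(samples, q=0.99):
    """Ratio of the q-th quantile of |x| to the median of |x|.

    A larger ratio means more mass near zero relative to the tails,
    which is used here as a rough, scale-free proxy for sparsity.
    """
    mags = sorted(abs(x) for x in samples)
    median = mags[len(mags) // 2]
    upper = mags[int(q * (len(mags) - 1))]
    return upper / median


rng = random.Random(0)
laplace_draws = [sample_laplace(rng) for _ in range(50000)]
lsm_draws = [sample_lsm(rng) for _ in range(50000)]

print("Laplace tail ratio:", tail_ratio(laplace_draws))
print("LSM tail ratio:    ", tail_ratio(lsm_draws))
```

Under these assumed settings the LSM draws should show a larger tail ratio than plain Laplacian draws, consistent with the abstract's statement that the scale mixture makes the distribution sparser.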

Index Terms: Bayesian learning, group sparsity, hidden Markov model, speech recognition

Cite as: Chien, J.-T., Chiang, C.-C. (2012) Group sparse hidden Markov models for speech recognition. Proc. Interspeech 2012, 2646-2649, doi: 10.21437/Interspeech.2012-508

@inproceedings{chien12b_interspeech,
  author={Jen-Tzung Chien and Cheng-Chun Chiang},
  title={{Group sparse hidden Markov models for speech recognition}},
  year=2012,
  booktitle={Proc. Interspeech 2012},
  pages={2646--2649},
  doi={10.21437/Interspeech.2012-508}
}