An HMM acoustic model incorporating various additional knowledge sources

Sakti, Sakriani; Markov, Konstantin; Nakamura, Satoshi

doi:10.21437/Interspeech.2007-572

We introduce a method for incorporating additional knowledge sources into an HMM-based statistical acoustic model. The probabilistic relationships among the information sources are first learned with a Bayesian network (BN), which makes it easy to integrate additional knowledge sources from any domain; the global joint probability density function (PDF) of the model is then formulated. Where the model becomes too complex and direct BN inference is intractable, we use a junction tree algorithm to decompose the global joint PDF into a linked set of local conditional PDFs. In this way, a simplified form of the model can be constructed and reliably estimated from a limited amount of training data. Here, we apply this framework to incorporate accent, gender, and wide-phonetic knowledge at the HMM phonetic model level. The proposed method was evaluated on an LVCSR task using two different types of accented English speech data. Experimental results show that our method improves word accuracy relative to a standard HMM.
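As a rough illustration of how a BN can tie extra knowledge sources to an HMM state's output density, the sketch below computes a state likelihood by summing over accent and gender when they are not observed. It is a toy under stated assumptions, not the paper's formulation: the names `P_ACCENT`, `P_GENDER`, `GAUSS`, and `state_likelihood` are hypothetical, all numbers are made up, accent and gender are assumed independent a priori, the observation is one-dimensional, and the junction tree decomposition itself is not shown.

```python
import math

# Hypothetical priors over the additional knowledge sources (illustrative values only).
P_ACCENT = {"us": 0.6, "au": 0.4}
P_GENDER = {"f": 0.5, "m": 0.5}

# Hypothetical (mean, variance) of a 1-D Gaussian p(x | q, accent, gender)
# for a single HMM state q.
GAUSS = {
    ("us", "f"): (1.0, 1.0),
    ("us", "m"): (-1.0, 1.0),
    ("au", "f"): (1.5, 2.0),
    ("au", "m"): (-0.5, 2.0),
}


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_likelihood(x, accent=None, gender=None):
    """p(x | q) for one state, marginalizing over any unobserved knowledge source.

    Assumes the joint factorizes as P(accent) * P(gender) * p(x | q, accent, gender).
    An observed source contributes weight 1.0 for its known value.
    """
    accent_weights = {accent: 1.0} if accent is not None else P_ACCENT
    gender_weights = {gender: 1.0} if gender is not None else P_GENDER
    return sum(
        pa * pg * gaussian_pdf(x, *GAUSS[(a, g)])
        for a, pa in accent_weights.items()
        for g, pg in gender_weights.items()
    )
```

Calling `state_likelihood(x)` gives the marginal output likelihood, while `state_likelihood(x, accent="us", gender="f")` conditions on known values of both sources. In a larger network with more sources, this direct sum grows quickly, which is the situation in which the abstract turns to the junction tree algorithm.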

Cite as: Sakti, S., Markov, K., Nakamura, S. (2007) An HMM acoustic model incorporating various additional knowledge sources. Proc. Interspeech 2007, 2117-2120, doi: 10.21437/Interspeech.2007-572

@inproceedings{sakti07_interspeech,
  author={Sakriani Sakti and Konstantin Markov and Satoshi Nakamura},
  title={{An HMM acoustic model incorporating various additional knowledge sources}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={2117--2120},
  doi={10.21437/Interspeech.2007-572}
}