ISCA Archive Interspeech 2007

Dynamic integration of multiple feature streams for robust real-time LVCSR

Shoei Sato, Kazuo Onoe, Akio Kobayashi, Shinich Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

We present a novel method of integrating the likelihoods of multiple feature streams for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a heavier weight is given to the stream that is more robust to the current noisy environment or speaking style. Such a robust stream is expected to provide stronger discriminative ability. The weight is calculated in real time from the mutual information between an input stream and the active HMM states in the search space. In this paper, we describe three features that are extracted through auditory filters, taking into account how the human auditory system extracts amplitude and frequency modulations. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments using field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9% relative to the best result obtained from a single stream.
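To make the frame-wise weighting idea concrete, the following is a minimal Python sketch of log-linear combination of per-stream likelihoods over the active HMM states, with weights derived from a per-stream confidence measure. The entropy-based confidence used here is a stand-in for the mutual-information criterion described in the paper, and all function and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def frame_stream_weights(stream_state_likelihoods, floor=1e-12):
    """Illustrative frame-wise weights for multiple feature streams.

    stream_state_likelihoods: list of arrays, one per stream, each of
    shape (n_active_states,) holding that stream's likelihoods for the
    currently active HMM states in the search space.

    NOTE: the confidence measure (negative normalized entropy of the
    per-stream state posterior) is an assumption standing in for the
    paper's mutual-information criterion.
    """
    confidences = []
    for lik in stream_state_likelihoods:
        post = lik / max(lik.sum(), floor)          # posterior over active states
        entropy = -np.sum(post * np.log(post + floor))
        max_entropy = np.log(len(post))             # entropy of a uniform posterior
        confidences.append(max_entropy - entropy)   # peaked posterior -> high confidence
    conf = np.asarray(confidences)
    return conf / max(conf.sum(), floor)            # normalize weights to sum to 1

def combined_log_likelihood(stream_state_likelihoods, weights, floor=1e-12):
    """Log-linear (weighted) combination of per-stream state likelihoods."""
    combined = np.zeros_like(stream_state_likelihoods[0], dtype=float)
    for w, lik in zip(weights, stream_state_likelihoods):
        combined += w * np.log(lik + floor)
    return combined

# Toy example: two streams scoring three active HMM states at one frame.
stream_a = np.array([0.70, 0.20, 0.10])   # peaked -> discriminative, gets a heavier weight
stream_b = np.array([0.35, 0.33, 0.32])   # flat -> less informative at this frame
w = frame_stream_weights([stream_a, stream_b])
scores = combined_log_likelihood([stream_a, stream_b], w)
print("weights:", w)
print("combined log-likelihoods:", scores)
```

In this sketch the weights are recomputed at every frame from the current search space, so a stream that happens to be degraded by noise or speaking style at that moment contributes less to the combined acoustic score.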


doi: 10.21437/Interspeech.2007-373

Cite as: Sato, S., Onoe, K., Kobayashi, A., Homma, S., Imai, T., Takagi, T., Kobayashi, T. (2007) Dynamic integration of multiple feature streams for robust real-time LVCSR. Proc. Interspeech 2007, 1146-1149, doi: 10.21437/Interspeech.2007-373

@inproceedings{sato07_interspeech,
  author={Shoei Sato and Kazuo Onoe and Akio Kobayashi and Shinich Homma and Toru Imai and Tohru Takagi and Tetsunori Kobayashi},
  title={{Dynamic integration of multiple feature streams for robust real-time LVCSR}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1146--1149},
  doi={10.21437/Interspeech.2007-373}
}