Speaking rate dependent acoustic modeling for spontaneous lecture speech recognition

Nanjo, Hiroaki; Kato, Kazuomi; Kawahara, Tatsuya

doi:10.21437/Eurospeech.2001-592

This paper addresses large-vocabulary spontaneous speech recognition, focusing on acoustic modeling that takes the speaking rate into account. Using the real lecture speech corpus collected under the priority research project in Japan, we built a baseline acoustic model and evaluated it on the automatic transcription of oral presentations by experienced speakers, obtaining a word accuracy of 58.2%. We observed that the speaking rate differs significantly from that of read speech. To handle fast and poorly articulated phone segments, we explore several extensions of the modeling: state-skipping modeling, speaking rate-dependent models, and syllable sub-word modeling. As a result, we reduced the word error rate by 0.8%-2.0% absolute. We also address language modeling, in particular the effective use of various large text corpora.
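As a rough illustration of why state skipping can help with fast phone segments, the sketch below compares the minimum number of frames needed to traverse a left-to-right HMM with and without skip transitions. The abstract does not specify the topology used in the paper, so the three-state model, the single-state skip, and the helper names `left_to_right_transitions` and `min_frames` are assumptions made only for this example.

```python
def left_to_right_transitions(n_states, allow_skip):
    """Allowed transitions of a left-to-right HMM (hypothetical topology).

    States 0..n_states-1 emit one frame each; state n_states is a
    non-emitting exit state.
    """
    arcs = {}
    for s in range(n_states):
        targets = [s, s + 1]  # self-loop and advance to the next state
        if allow_skip and s + 2 <= n_states:
            targets.append(s + 2)  # skip over the next state
        arcs[s] = targets
    return arcs


def min_frames(arcs, n_states):
    """Fewest frames consumed on any path from state 0 to the exit state."""
    inf = float("inf")
    best = [inf] * (n_states + 1)
    best[0] = 1  # entering state 0 consumes the first frame
    for s in range(n_states):
        for t in arcs[s]:
            if t == s:
                continue  # a self-loop never shortens the path
            # Entering an emitting state costs one frame; the exit costs none.
            cost = best[s] + (1 if t < n_states else 0)
            best[t] = min(best[t], cost)
    return best[n_states]


no_skip = min_frames(left_to_right_transitions(3, False), 3)
with_skip = min_frames(left_to_right_transitions(3, True), 3)
print(no_skip, with_skip)
```

Under these assumptions, the model without skips cannot match a segment shorter than three frames, while the model with skips can match a two-frame segment, which is the kind of short, fast segment the state-skipping extension is meant to accommodate.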

Cite as: Nanjo, H., Kato, K., Kawahara, T. (2001) Speaking rate dependent acoustic modeling for spontaneous lecture speech recognition. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 2531-2534, doi: 10.21437/Eurospeech.2001-592

@inproceedings{nanjo01_eurospeech,
  author={Hiroaki Nanjo and Kazuomi Kato and Tatsuya Kawahara},
  title={{Speaking rate dependent acoustic modeling for spontaneous lecture speech recognition}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={2531--2534},
  doi={10.21437/Eurospeech.2001-592}
}