Noise robust speech recognition using F0 contour extracted by Hough transform

Iwano, Koji; Seki, Takahiro; Furui, Sadaoki

doi:10.21437/ICSLP.2002-313

This paper proposes a noise-robust speech recognition method that uses prosodic information. In Japanese, the fundamental frequency (F0) contour represents phrase intonation and word accent, and consequently conveys information about prosodic phrase and word boundaries. The paper first proposes a noise-robust F0 extraction method based on the Hough transform, which achieves high extraction rates under various noise environments. It then proposes a robust speech recognition method using syllable HMMs that model both segmental spectral features and F0 contours. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers under various noise types and SNR conditions. Recognition accuracy improves in all noise conditions, and the best absolute improvement in digit accuracy is about 4.7%. The improvement is attributed to more precise digit boundary detection enabled by the robust prosodic information.
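The abstract does not describe how the Hough transform is applied, so the following is only a generic illustration of why Hough-style voting tolerates noise, not the authors' procedure. It is a minimal sketch that assumes hypothetical per-frame F0 candidates given as (frame, Hz) pairs, some of which are spurious, and uses the classic slope-intercept form of the transform. The function name `hough_line_f0`, the slope grid, and the 2 Hz intercept bin width are all assumptions made for the example.

```python
from collections import Counter


def hough_line_f0(candidates, slopes, intercept_step=2.0):
    """Find the line f = m*t + b supported by the most F0 candidates.

    candidates: list of (t, f) pairs (frame index, F0 candidate in Hz);
                may contain outliers caused by noise.
    slopes: candidate slopes m (Hz per frame) to test.
    intercept_step: bin width in Hz used to quantise the intercept b.
    Returns (m, b, votes) for the accumulator cell with the most votes.
    """
    accumulator = Counter()
    for t, f in candidates:
        for m in slopes:
            # Each point votes for every (slope, intercept) pair it lies on.
            b_bin = round((f - m * t) / intercept_step)
            accumulator[(m, b_bin)] += 1
    (m, b_bin), votes = max(accumulator.items(), key=lambda kv: kv[1])
    return m, b_bin * intercept_step, votes


# Synthetic example: 20 frames on f = 120 + 2*t plus four spurious candidates.
points = [(float(t), 120.0 + 2.0 * t) for t in range(20)]
points += [(3.0, 300.0), (7.0, 60.0), (12.0, 250.0), (15.0, 90.0)]
slopes = [i * 0.5 for i in range(-10, 11)]  # -5.0 to 5.0 Hz per frame

m, b, votes = hough_line_f0(points, slopes)
print(m, b, votes)
```

Because each outlier votes for cells scattered across the accumulator while collinear points pile up in a single cell, the peak is unaffected by the spurious candidates. This is the property that makes the transform attractive for noisy F0 tracks.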

Cite as: Iwano, K., Seki, T., Furui, S. (2002) Noise robust speech recognition using F0 contour extracted by Hough transform. Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002), 941-944, doi: 10.21437/ICSLP.2002-313

@inproceedings{iwano02_icslp,
  author={Koji Iwano and Takahiro Seki and Sadaoki Furui},
  title={{Noise robust speech recognition using F0 contour extracted by Hough transform}},
  year=2002,
  booktitle={Proc. 7th International Conference on Spoken Language Processing (ICSLP 2002)},
  pages={941--944},
  doi={10.21437/ICSLP.2002-313}
}