Unsupervised speaker adaptation based on HMM sufficient statistics in various noisy environments

Yamade, Shingo; Lee, Akinobu; Saruwatari, Hiroshi; Shikano, Kiyohiro

doi:10.21437/Eurospeech.2003-434

Noise and speaker adaptation techniques are essential for robust speech recognition in noisy environments. In this paper, we first implement a noise-robust speech recognition algorithm that superimposes a small amount of noise on spectrally subtracted input speech. Recognition experiments show that superimposing noise at 30 dB SNR on the input speech after spectral subtraction significantly increases robustness against different types of noise. Next, we apply this noise-robust recognition to unsupervised speaker adaptation based on HMM sufficient statistics in different noise environments. The HMM sufficient statistics for each speaker are calculated beforehand from a speech database to which office noise has been added at 25 dB SNR. We successfully evaluate the proposed unsupervised speaker adaptation algorithm in noisy environments on a 20k-word dictation task using 11 different kinds of noise, including office, car, exhibition, and crowd noise.
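The abstract names these processing steps without giving formulas. The following is a minimal Python sketch of the general ideas, not the authors' implementation, under stated assumptions: power spectral subtraction per frame with a spectral floor (the `floor` value is a hypothetical choice), noise superimposition scaled from mean signal powers to hit a target SNR, and re-estimation of Gaussian means by pooling the stored occupancies and first-order sums of the highest-scoring speakers. The `scores` argument stands in for speaker-selection likelihoods (for example from per-speaker GMMs) that are assumed to be computed elsewhere; variance and mixture-weight updates are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple


def spectral_subtract(power: Sequence[float], noise_power: Sequence[float],
                      floor: float = 0.01) -> List[float]:
    """Power spectral subtraction for one frame.

    Bins that would go below floor * original power are clamped there.
    `floor` is an illustrative value, not one taken from the paper.
    """
    return [max(p - n, floor * p) for p, n in zip(power, noise_power)]


def mean_power(x: Sequence[float]) -> float:
    """Mean power (average squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def superimpose_noise(speech: Sequence[float], noise: Sequence[float],
                      snr_db: float = 30.0) -> List[float]:
    """Add `noise` to `speech`, scaled so that the speech-to-noise
    power ratio of the added component equals `snr_db`."""
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


# Hypothetical layout: stats[speaker][gaussian] = (occupancy, first-order sum)
Stats = Dict[str, Dict[str, Tuple[float, List[float]]]]


def adapt_means(stats: Stats, scores: Dict[str, float],
                n_select: int) -> Dict[str, List[float]]:
    """Re-estimate each Gaussian mean from the pooled sufficient statistics
    of the `n_select` speakers with the highest (assumed) likelihood scores."""
    chosen = sorted(scores, key=scores.get, reverse=True)[:n_select]
    means: Dict[str, List[float]] = {}
    for g in stats[chosen[0]]:
        occ = sum(stats[s][g][0] for s in chosen)       # total occupancy
        dim = len(stats[chosen[0]][g][1])
        acc = [sum(stats[s][g][1][d] for s in chosen)   # pooled sums
               for d in range(dim)]
        means[g] = [a / occ for a in acc]
    return means
```

Pooling stored statistics means the adapted means come from a single weighted average rather than an iterative re-training pass over the selected speakers' audio, which is what makes this style of adaptation cheap at run time.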

Cite as: Yamade, S., Lee, A., Saruwatari, H., Shikano, K. (2003) Unsupervised speaker adaptation based on HMM sufficient statistics in various noisy environments. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), 1493-1496, doi: 10.21437/Eurospeech.2003-434

@inproceedings{yamade03_eurospeech,
  author={Shingo Yamade and Akinobu Lee and Hiroshi Saruwatari and Kiyohiro Shikano},
  title={{Unsupervised speaker adaptation based on HMM sufficient statistics in various noisy environments}},
  year=2003,
  booktitle={Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)},
  pages={1493--1496},
  doi={10.21437/Eurospeech.2003-434}
}