ISCA Archive Eurospeech 1997

Normalization of speaker variability by spectrum warping for robust speech recognition

Y.C. Chu, Charlie Jie, Vincent Tung, Ben Lin, Richard Lee

This paper examines techniques for normalizing unseen speakers in speech recognition. Two implementations of linear spectrum warping were examined: time-domain resampling and filter bank scaling. It is shown that, for seen speakers, models trained on unwarped utterances are less sensitive to spectrum warping by filter bank scaling than by resampling. A pitch-based scheme for warping factor estimation is proposed. The method is shown to be a cost-effective way to reduce the variability of unseen speakers compared with maximum-likelihood (ML) based methods. In particular, the combination of filter bank scaling with pitch-based warping factor estimation reduces the error rate of isolated Mandarin digit recognition by more than 30% for unseen speakers.
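The idea behind filter bank scaling can be illustrated with a small sketch. Instead of resampling the waveform, the center frequencies of the analysis filter bank are multiplied by a warping factor alpha, so a speaker whose formants sit higher by a factor alpha is analyzed as if their spectrum had been compressed back onto a reference scale. The abstract does not specify the filter bank layout or how pitch is mapped to alpha, so the mel-spaced layout, the clipping at Nyquist, and the function `pitch_based_alpha` (a clamped power law of the F0 ratio, with an arbitrary exponent and bounds) are all hypothetical illustrations, not the authors' method.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_bank_centers(num_filters, f_low, f_high):
    """Center frequencies (Hz) of a mel-spaced triangular filter bank (assumed layout)."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(num_filters)]


def warp_centers(centers, alpha, f_nyquist):
    """Linear warping by filter bank scaling: f -> alpha * f.

    Centers pushed past Nyquist are clipped; this is a simplification,
    since practical systems often use a piecewise-linear warp instead.
    """
    return [min(alpha * f, f_nyquist) for f in centers]


def pitch_based_alpha(speaker_f0, reference_f0, exponent=1.0 / 3.0, lo=0.88, hi=1.12):
    """HYPOTHETICAL pitch-to-warping-factor mapping.

    Higher average pitch is taken as a rough proxy for a shorter vocal
    tract (higher formants), giving alpha > 1. The exponent and the
    [lo, hi] clamp are illustrative choices, not values from the paper.
    """
    alpha = (speaker_f0 / reference_f0) ** exponent
    return max(lo, min(hi, alpha))


if __name__ == "__main__":
    centers = filter_bank_centers(20, 0.0, 4000.0)
    alpha = pitch_based_alpha(speaker_f0=220.0, reference_f0=120.0)
    warped = warp_centers(centers, alpha, f_nyquist=4000.0)
    print(f"alpha = {alpha:.3f}")
    for f, w in zip(centers[:5], warped[:5]):
        print(f"{f:8.1f} Hz -> {w:8.1f} Hz")
```

Because only the filter positions change, the same trained models and front end can be reused for every candidate alpha, which is one reason this style of warping is cheap compared with resampling each utterance, and a single pitch estimate avoids the repeated recognition passes of an ML grid search over alpha.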


doi: 10.21437/Eurospeech.1997-116

Cite as: Chu, Y.C., Jie, C., Tung, V., Lin, B., Lee, R. (1997) Normalization of speaker variability by spectrum warping for robust speech recognition. Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997), 1127-1130, doi: 10.21437/Eurospeech.1997-116

@inproceedings{chu97_eurospeech,
  author={Y.C. Chu and Charlie Jie and Vincent Tung and Ben Lin and Richard Lee},
  title={{Normalization of speaker variability by spectrum warping for robust speech recognition}},
  year=1997,
  booktitle={Proc. 5th European Conference on Speech Communication and Technology (Eurospeech 1997)},
  pages={1127--1130},
  doi={10.21437/Eurospeech.1997-116}
}