ISCA Archive Interspeech 2004

A new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering

Shang-nien Tsai, Lin-shan Lee

This paper proposes a new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering. Progressive histogram equalization (PHEQ) performs histogram equalization (HEQ) progressively with respect to a reference interval that moves with the frame currently being processed. Multi-eigenvector temporal filtering (m-eigen) uses a linear combination of the m eigenvectors corresponding to the largest eigenvalues in the PCA-based temporal filtering approach. Two-stage Wiener filtering (2WF) and SNR-dependent waveform processing (SWP) are first applied to remove noise and enhance the overall SNR. MFCC parameters are then extracted, followed by progressive histogram equalization. Multi-eigenvector temporal filtering is finally performed to produce robust features for speech recognition. Extensive experiments on the AURORA2 database under its testing conditions verified the effectiveness of each component and showed that the proposed front-end gives better overall performance than the Advanced Front-End recently announced by ETSI, especially under channel-mismatched conditions.
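
The two feature-domain steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the shape of the reference interval, the target distribution, or the weights of the eigenvector combination. The sketch assumes a reference interval centered on the current frame, a standard normal target distribution, and eigenvalue-proportional weights, and it estimates the eigenvectors from the same trajectory it filters. The names `progressive_heq`, `multi_eigen_filter`, `half_window`, `filter_len` and `m` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def progressive_heq(features, half_window=50):
    """Equalize each frame against a reference interval that moves with it.

    features: array of shape (num_frames, num_dims), e.g. MFCC vectors.
    Assumption: the interval is centered on the current frame and the
    target distribution is a standard normal.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    std_normal = NormalDist()
    out = np.empty_like(features)
    for t in range(num_frames):
        lo = max(0, t - half_window)
        hi = min(num_frames, t + half_window + 1)
        window = features[lo:hi]
        n = window.shape[0]
        # Rank of the current value inside the interval, per dimension.
        rank = (window < features[t]).sum(axis=0)
        # Empirical CDF value, kept strictly inside (0, 1).
        cdf = (rank + 0.5) / n
        out[t] = [std_normal.inv_cdf(p) for p in cdf]
    return out


def multi_eigen_filter(trajectory, filter_len=11, m=3):
    """Filter one feature trajectory with a combination of m eigenvectors.

    Assumption: the combination weights are proportional to the
    eigenvalues; the abstract does not state how they are chosen.
    """
    x = np.asarray(trajectory, dtype=float)
    # All length-filter_len segments of the trajectory, one per row.
    segments = np.lib.stride_tricks.sliding_window_view(x, filter_len)
    cov = np.cov(segments, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top_vals = eigvals[-m:]
    top_vecs = eigvecs[:, -m:]
    # Eigenvector signs are arbitrary; fix them so the combination is defined.
    signs = np.where(top_vecs.sum(axis=0) < 0, -1.0, 1.0)
    top_vecs = top_vecs * signs
    weights = top_vals / top_vals.sum()
    taps = top_vecs @ weights
    # Project every segment onto the combined filter taps.
    return segments @ taps
```

In this sketch `progressive_heq` would be applied to the full MFCC matrix and `multi_eigen_filter` to each equalized dimension separately; the filtered output is `filter_len - 1` frames shorter than the input because only complete segments are used.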


doi: 10.21437/Interspeech.2004-109

Cite as: Tsai, S.-n., Lee, L.-s. (2004) A new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering. Proc. Interspeech 2004, 165-168, doi: 10.21437/Interspeech.2004-109

@inproceedings{tsai04b_interspeech,
  author={Shang-nien Tsai and Lin-shan Lee},
  title={{A new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={165--168},
  doi={10.21437/Interspeech.2004-109}
}