ISCA Archive Eurospeech 2003

Noise-robust automatic speech recognition using orthogonalized distinctive phonetic feature vectors

Takashi Fukuda, Tsuneo Nitta

Various noise-robustness approaches, such as noise adaptation and noise reduction techniques, have been investigated with the aim of using automatic speech recognition (ASR) systems in practical environments. We have previously proposed a distinctive phonetic feature (DPF) parameter set for a noise-robust ASR system, which reduced the effect of high-level additive noise [1]. This paper describes an attempt to apply an orthogonalized DPF parameter set as the input to HMMs. In the proposed method, orthogonal bases are calculated from conventional DPF vectors that represent 38 Japanese phonemes; the Karhunen-Loeve transform (KLT) then uses these bases to orthogonalize the DPFs output from a multilayer neural network (MLN). In experiments, orthogonalized DPF parameters were first compared with the original DPF parameters on an isolated spoken-word recognition task with clean speech. Noise robustness was then tested with four types of additive noise. The proposed orthogonalized DPFs reduced the error rate in the isolated spoken-word recognition task both with clean speech and with speech contaminated by additive noise. Furthermore, combining the orthogonalized DPFs with conventional static MFCCs and ΔP yielded significant improvements over a baseline system using MFCCs and a dynamic feature set.
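
The orthogonalization step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives only the 38-phoneme count, so the number of DPF dimensions (`NUM_DPFS`), the random binary phoneme table, the random stand-in for MLN outputs, the mean subtraction, and the helper names `klt_basis` and `orthogonalize` are all assumptions made for illustration.

```python
import numpy as np

NUM_PHONEMES = 38  # from the abstract
NUM_DPFS = 15      # hypothetical dimensionality, not stated in the abstract


def klt_basis(phoneme_dpfs):
    """Derive KLT bases from a (phonemes x DPFs) table.

    Returns the mean vector and a matrix whose columns are orthonormal
    eigenvectors of the table's covariance, ordered by decreasing eigenvalue.
    Using the mean-centred covariance is an assumption of this sketch.
    """
    mean = phoneme_dpfs.mean(axis=0)
    centered = phoneme_dpfs - mean
    cov = centered.T @ centered / phoneme_dpfs.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def orthogonalize(dpf_frames, mean, basis):
    """Project frame-wise DPF vectors onto the KLT bases."""
    return (dpf_frames - mean) @ basis


rng = np.random.default_rng(0)

# Placeholder phoneme table: one binary DPF vector per phoneme (random data).
phoneme_dpfs = rng.integers(0, 2, size=(NUM_PHONEMES, NUM_DPFS)).astype(float)
mean, basis = klt_basis(phoneme_dpfs)

# Placeholder for MLN outputs: 100 frames of DPF scores in [0, 1).
mln_outputs = rng.random((100, NUM_DPFS))
orthogonalized = orthogonalize(mln_outputs, mean, basis)
```

In this sketch the bases are computed once from the phoneme table and then reused for every frame, so the projected features would serve as the HMM input in place of the raw MLN outputs.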


doi: 10.21437/Eurospeech.2003-515

Cite as: Fukuda, T., Nitta, T. (2003) Noise-robust automatic speech recognition using orthogonalized distinctive phonetic feature vectors. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), 2189-2192, doi: 10.21437/Eurospeech.2003-515

@inproceedings{fukuda03b_eurospeech,
  author={Takashi Fukuda and Tsuneo Nitta},
  title={{Noise-robust automatic speech recognition using orthogonalized distinctive phonetic feature vectors}},
  year=2003,
  booktitle={Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)},
  pages={2189--2192},
  doi={10.21437/Eurospeech.2003-515}
}