ISCA Archive Interspeech 2007

Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio

Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki

This paper proposes a front-end processing method for automatic speech recognition (ASR) that employs a voice activity detection (VAD) method based on the periodic to aperiodic component ratio (PAR). The proposed VAD method is called PARADE (PAR based Activity DEtection). By considering the powers of the periodic and aperiodic components of the observed signals simultaneously, PARADE can detect speech segments in the presence of noise more precisely than conventional VAD methods. In this paper, PARADE is applied to a front-end processing technique that employs a robust feature extraction method called SPADE (Subband based Periodicity and Aperiodicity DEcomposition). ASR performance in noise was evaluated on the CENSREC-1-C database, which consists of continuous connected-digit speech utterances drawn from CENSREC-1 (the Japanese version of AURORA-2). The results show that the SPADE front-end combined with PARADE achieves an average word accuracy of 74.22% at signal-to-noise ratios of 0 to 20 dB. This accuracy is significantly higher than that achieved by the ETSI ES 202 050 front-end (63.66%) and by the SPADE front-end without PARADE (64.28%). These results confirm that PARADE can improve the performance of front-end processing.
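
To make the idea of a periodic to aperiodic ratio concrete, the following is a minimal sketch of a frame-level detector. It is not the PARADE algorithm: the abstract does not specify how the periodic and aperiodic powers are estimated, so this sketch substitutes a simple stand-in, using the peak of the normalized autocorrelation over a plausible pitch-lag range as the periodic fraction of frame power. The function names, the 60 to 400 Hz pitch range, the 25 ms / 10 ms framing, and the threshold of 1.0 are all illustrative assumptions.

```python
import math


def normalized_autocorr(frame, lag):
    """Normalized cross-correlation between a frame and itself shifted by `lag`."""
    a, b = frame[:-lag], frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def frame_par(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Rough periodic-to-aperiodic power ratio of one frame (assumed estimator).

    The autocorrelation peak r in the pitch-lag range is treated as the
    periodic fraction of the frame power, so the ratio is r / (1 - r).
    """
    mean = sum(frame) / len(frame)
    frame = [x - mean for x in frame]
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        best = max(best, normalized_autocorr(frame, lag))
    best = min(best, 0.999)  # keep the ratio finite for near-perfect periodicity
    return best / (1.0 - best)


def par_vad(signal, sample_rate, frame_len=0.025, hop=0.010, threshold=1.0):
    """Return one boolean per frame: True where the ratio exceeds the threshold."""
    n = round(frame_len * sample_rate)
    h = round(hop * sample_rate)
    return [
        frame_par(signal[start:start + n], sample_rate) > threshold
        for start in range(0, len(signal) - n + 1, h)
    ]
```

Calling `par_vad(samples, 8000)` on a list of samples yields per-frame speech/non-speech decisions. Because this stand-in relies on periodicity alone, it would miss unvoiced speech; a practical detector would also need smoothing of the frame decisions over time.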


doi: 10.21437/Interspeech.2007-93

Cite as: Ishizuka, K., Nakatani, T., Fujimoto, M., Miyazaki, N. (2007) Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio. Proc. Interspeech 2007, 230-233, doi: 10.21437/Interspeech.2007-93

@inproceedings{ishizuka07_interspeech,
  author={Kentaro Ishizuka and Tomohiro Nakatani and Masakiyo Fujimoto and Noboru Miyazaki},
  title={{Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={230--233},
  doi={10.21437/Interspeech.2007-93},
  issn={2308-457X}
}