Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition

Yamakawa, Nobuhide; Kitahara, Tetsuro; Takahashi, Toru; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

doi:10.21437/Interspeech.2010-641

Research on environmental sound recognition has progressed less than research on speech and musical signals. One reason is that environmental sounds span a broad range of acoustic characteristics. We classified them in order to explore recognition techniques suited to each characteristic. We focus on impulsive sounds and their non-stationary behavior within and between analysis frames. We used matching pursuit as a framework for applying wavelet analysis to extract the temporal variation of audio features inside a frame. We also investigated the validity of modeling the decay patterns of sounds using hidden Markov models (HMMs). Experimental results indicate that sounds containing multiple impulsive signals are recognized better with time-frequency analysis bases than with frequency-domain analysis. Classification of sound classes with a long and clear decay pattern improves when multiple HMMs are applied.
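To make the within-frame analysis mentioned in the abstract more concrete, the following is a minimal sketch of generic matching pursuit over a small dictionary of time-frequency atoms. It is not the authors' implementation: the Gabor-style atom shape, the parameter grid, and the names `gabor_atom`, `build_dictionary`, and `matching_pursuit` are all assumptions made for illustration.

```python
import numpy as np

def gabor_atom(n, center, freq, width):
    """Unit-norm Gaussian-windowed cosine (assumed atom shape, not from the paper)."""
    t = np.arange(n)
    envelope = np.exp(-0.5 * ((t - center) / width) ** 2)
    atom = envelope * np.cos(2 * np.pi * freq * (t - center))
    return atom / np.linalg.norm(atom)

def build_dictionary(n, centers, freqs, widths):
    """Stack atoms for every (center, freq, width) combination; shape (num_atoms, n)."""
    atoms = [gabor_atom(n, c, f, w)
             for c in centers for f in freqs for w in widths]
    return np.stack(atoms)

def matching_pursuit(frame, dictionary, n_iter=5):
    """Greedily pick the atom most correlated with the residual, then subtract it."""
    residual = np.asarray(frame, dtype=float).copy()
    picks = []  # list of (atom index, coefficient)
    for _ in range(n_iter):
        corr = dictionary @ residual          # inner product with every atom
        k = int(np.argmax(np.abs(corr)))      # best-matching atom
        picks.append((k, float(corr[k])))
        residual -= corr[k] * dictionary[k]   # remove its contribution
    return picks, residual

# Hypothetical usage: decompose one 64-sample frame.
D = build_dictionary(64, centers=[16, 32, 48],
                     freqs=[0.05, 0.1, 0.2], widths=[4.0, 8.0])
frame = 2.0 * D[7]
picks, residual = matching_pursuit(frame, D, n_iter=1)
```

Because each atom is localized in time as well as frequency, the selected (index, coefficient) pairs indicate where inside the frame the energy sits, which a single magnitude spectrum of the frame cannot show. How such pairs would be turned into recognition features is not specified in the abstract and is left open here.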

Cite as: Yamakawa, N., Kitahara, T., Takahashi, T., Komatani, K., Ogata, T., Okuno, H.G. (2010) Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition. Proc. Interspeech 2010, 2342-2345, doi: 10.21437/Interspeech.2010-641

@inproceedings{yamakawa10_interspeech,
  author={Nobuhide Yamakawa and Tetsuro Kitahara and Toru Takahashi and Kazunori Komatani and Tetsuya Ogata and Hiroshi G. Okuno},
  title={{Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={2342--2345},
  doi={10.21437/Interspeech.2010-641}
}