We present a 2-D spectro-temporal Gabor filterbank based on the 2-D Fast Fourier Transform, and show how it may be used to analyze localized patches of a spectrogram. We argue that the 2-D Gabor filterbank has the capacity to decompose a patch into its underlying dominant spectro-temporal components, and we illustrate the response of our filterbank to different speech phenomena such as harmonicity, formants, vertical onsets/offsets, noise, and overlapping simultaneous speakers.
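As a rough illustration of the idea in the abstract, the Python sketch below filters a spectrogram patch with a small bank of complex 2-D Gabor filters by multiplying in the 2-D FFT domain, then reports the mean response energy per filter. It is not the authors' implementation. The function names (`gabor_kernel`, `filterbank_response`), the isotropic Gaussian envelope, and the parameters `freqs`, `thetas`, and `sigma` are all assumptions made for this example, and FFT-based filtering here implies circular convolution at the patch borders.

```python
import numpy as np

def gabor_kernel(shape, freq, theta, sigma):
    """Complex 2-D Gabor filter centered in an array of the given shape.

    Assumed form: isotropic Gaussian envelope times a complex sinusoid
    with spatial frequency `freq` (cycles/sample) along direction `theta`.
    """
    rows, cols = shape
    y, x = np.meshgrid(np.arange(rows) - rows // 2,
                       np.arange(cols) - cols // 2, indexing="ij")
    u = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the modulation direction
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(2j * np.pi * freq * u)
    return envelope * carrier

def filterbank_response(patch, freqs, thetas, sigma=4.0):
    """Mean squared magnitude of each Gabor filter's response to `patch`.

    Filtering is done as a pointwise product of 2-D FFTs, i.e. a circular
    convolution. Returns an array of shape (len(freqs), len(thetas)).
    """
    patch = patch - patch.mean()                # remove the DC component
    P = np.fft.fft2(patch)
    energy = np.zeros((len(freqs), len(thetas)))
    for i, f in enumerate(freqs):
        for j, t in enumerate(thetas):
            k = gabor_kernel(patch.shape, f, t, sigma)
            K = np.fft.fft2(np.fft.ifftshift(k))  # move the kernel center to index (0, 0)
            resp = np.fft.ifft2(P * K)
            energy[i, j] = np.mean(np.abs(resp) ** 2)
    return energy

# Synthetic "harmonic" patch (an assumption for illustration): rows are the
# frequency axis, columns are time, and the stripes vary only along rows.
rows = np.arange(32)[:, None]
patch = np.cos(2 * np.pi * 0.125 * rows) * np.ones((1, 32))
freqs = [0.0625, 0.125, 0.25]
thetas = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
energy = filterbank_response(patch, freqs, thetas)
```

In this toy case the energy map should be dominated by the filter whose frequency and orientation match the stripe pattern, which is the sense in which such a bank can pick out the dominant spectro-temporal component of a patch.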
Cite as: Ezzat, T., Bouvrie, J., Poggio, T. (2007) Spectro-temporal analysis of speech using 2-d Gabor filters. Proc. Interspeech 2007, 506-509, doi: 10.21437/Interspeech.2007-236
@inproceedings{ezzat07_interspeech,
  author={Tony Ezzat and Jake Bouvrie and Tomaso Poggio},
  title={{Spectro-temporal analysis of speech using 2-d Gabor filters}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={506--509},
  doi={10.21437/Interspeech.2007-236}
}