ISCA Archive Interspeech 2007

Model-driven detection of clean speech patches in noise

Jonathan Laidler, Martin Cooke, Neil D. Lawrence

Listeners may be able to recognise speech in adverse conditions by "glimpsing" time-frequency regions where the target speech is dominant. Previous computational attempts to identify such regions have been source-driven, relying on primitive cues. This paper describes a model-driven approach in which a generative model gives the likelihood that a spectro-temporal patch of a noisy mixture represents speech. The focus is on patch size and patch modelling. Small patches lead to poor discrimination, while large patches are more likely to contain contributions from other sources. A "cleanness" measure reveals that a good patch size is one that extends over a quarter of the speech frequency range and lasts 40 ms. Gaussian mixture models are used to represent patches. A compact representation based on a 2D discrete cosine transform leads to reasonable speech/background discrimination.
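The abstract gives no implementation details, so the following is only a minimal Python/NumPy sketch of the pipeline it outlines, not the authors' method. It cuts a spectrogram into patches, compresses each patch with a 2D DCT by keeping only low-order coefficients, and scores the result with Gaussian mixture models. All of the following are illustrative assumptions: the 64-channel spectrogram with 10 ms frames (which makes a quarter of the range 16 channels and 40 ms equal to 4 frames), the hop sizes, the number of retained coefficients, and the use of a speech-versus-background log-likelihood ratio as the decision statistic. The function names are hypothetical.

```python
import numpy as np

# Hypothetical settings (assumptions, not from the paper): a 64-channel
# spectrogram with 10 ms frames, so a quarter of the frequency range is
# 16 channels and 40 ms is 4 frames.
N_CHANNELS, PATCH_F, PATCH_T = 64, 16, 4


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (row k = basis k)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] *= np.sqrt(1.0 / n)
    m[1:, :] *= np.sqrt(2.0 / n)
    return m


def dct2(patch):
    """Separable 2D DCT-II: transform the frequency axis, then the time axis."""
    f, t = patch.shape
    return dct_matrix(f) @ patch @ dct_matrix(t).T


def patch_features(patch, kf=4, kt=2):
    """Compact descriptor: the kf x kt lowest-order 2D DCT coefficients."""
    return dct2(patch)[:kf, :kt].ravel()


def extract_patches(spec, pf=PATCH_F, pt=PATCH_T, hop_f=8, hop_t=2):
    """Slide a pf x pt window over a (frequency x time) spectrogram."""
    n_f, n_t = spec.shape
    patches, origins = [], []
    for f0 in range(0, n_f - pf + 1, hop_f):
        for t0 in range(0, n_t - pt + 1, hop_t):
            patches.append(spec[f0:f0 + pf, t0:t0 + pt])
            origins.append((f0, t0))
    return patches, origins


def gmm_loglik(x, weights, means, variances):
    """log p(x) under a diagonal-covariance GMM, computed via log-sum-exp."""
    diff = x[None, :] - means                      # shape (K, D)
    log_comp = -0.5 * (np.sum(np.log(2 * np.pi * variances), axis=1)
                       + np.sum(diff ** 2 / variances, axis=1))
    a = np.log(weights) + log_comp
    a_max = a.max()
    return a_max + np.log(np.exp(a - a_max).sum())


def speech_score(patch, speech_gmm, background_gmm):
    """Assumed decision statistic: speech-vs-background log-likelihood ratio."""
    x = patch_features(patch)
    return gmm_loglik(x, *speech_gmm) - gmm_loglik(x, *background_gmm)


def random_gmm(rng, d, k=3):
    """Random placeholder GMM; a real model would be trained (e.g. with EM)."""
    return (np.full(k, 1.0 / k), rng.normal(size=(k, d)),
            rng.uniform(0.5, 2.0, size=(k, d)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.normal(size=(N_CHANNELS, 100))      # stand-in for a log spectrogram
    d = 4 * 2                                      # kf * kt features per patch
    speech, background = random_gmm(rng, d), random_gmm(rng, d)
    patches, origins = extract_patches(spec)
    scores = [speech_score(p, speech, background) for p in patches]
    print(len(patches), "patches; best origin:", origins[int(np.argmax(scores))])
```

Running the script prints the number of patches and the origin of the highest-scoring one. With random placeholder models the result carries no meaning. Keeping only the low-order DCT coefficients is what makes the representation compact: here a 16 x 4 patch (64 values) is summarised by 8 numbers, which keeps the GMMs small.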


doi: 10.21437/Interspeech.2007-333

Cite as: Laidler, J., Cooke, M., Lawrence, N.D. (2007) Model-driven detection of clean speech patches in noise. Proc. Interspeech 2007, 922-925, doi: 10.21437/Interspeech.2007-333

@inproceedings{laidler07_interspeech,
  author={Jonathan Laidler and Martin Cooke and Neil D. Lawrence},
  title={{Model-driven detection of clean speech patches in noise}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={922--925},
  doi={10.21437/Interspeech.2007-333}
}