In this work, we demonstrate that the most widely used model for the relationship between noisy speech, clean speech and noise in the log-Mel domain is inaccurate because it disregards the phase. Moreover, we show how a more accurate model can be derived by averaging over the phase in the log-Mel domain, and how this model can profitably be applied to particle-filter-based sequential noise compensation. Experimental results confirm the superiority of the phase-averaged model both for clean speech estimation in general and for the particle filter in particular. Relative reductions in word error rate of up to 17% were obtained on a large-vocabulary task.
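Why the phase matters can be illustrated in the simplest setting, a single frequency bin. As a minimal sketch, assume Y = X + N with log powers x = log|X|², n = log|N|², y = log|Y|² and a relative phase θ between X and N. Then y = log(e^x + e^n + 2e^{(x+n)/2} cos θ). The common model drops the cosine term, which is equivalent to averaging over θ in the linear power domain, since cos θ averages to zero. The sketch below instead averages y itself over a uniformly distributed θ. The function names are hypothetical, and this is not the paper's log-Mel model: a Mel filter sums many bins whose phases vary independently, which the sketch does not attempt to capture.

```python
import math


def log_add(x: float, n: float) -> float:
    """Phase-insensitive model y = log(e^x + e^n), computed stably.

    Because cos(theta) averages to zero over a uniform phase, this equals the
    log of the phase-averaged *power*, i.e. averaging in the linear domain.
    """
    m = max(x, n)
    return m + math.log1p(math.exp(-abs(x - n)))


def log_power(x: float, n: float, theta: float) -> float:
    """Log power of Y = X + N in one frequency bin, given the log powers
    x = log|X|^2 and n = log|N|^2 and the relative phase theta."""
    power = (math.exp(x) + math.exp(n)
             + 2.0 * math.exp(0.5 * (x + n)) * math.cos(theta))
    return math.log(power)


def phase_averaged_log_power(x: float, n: float, num_points: int = 4096) -> float:
    """Average of log_power over theta ~ Uniform[0, 2*pi) via the midpoint rule.

    An even num_points keeps the nodes off theta = pi, where the integrand
    is singular when x == n (the singularity itself is integrable).
    """
    if num_points % 2:
        raise ValueError("num_points must be even")
    step = 2.0 * math.pi / num_points
    total = sum(log_power(x, n, (k + 0.5) * step) for k in range(num_points))
    return total / num_points


if __name__ == "__main__":
    # Compare the phase-insensitive model with the log-domain phase average.
    for x, n in [(0.0, -3.0), (0.0, -1.0), (0.0, 0.0), (0.0, 2.0)]:
        print(f"x={x:+.1f} n={n:+.1f}  "
              f"log-add={log_add(x, n):.4f}  "
              f"phase-avg={phase_averaged_log_power(x, n):.4f}  "
              f"max={max(x, n):.4f}")
```

In this single-bin case the log-domain average comes out at max(x, n), which never exceeds log(e^x + e^n) (Jensen's inequality); the gap is largest, log 2, when speech and noise have equal power.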
Cite as: Faubel, F., McDonough, J., Klakow, D. (2008) A phase-averaged model for the relationship between noisy speech, clean speech and noise in the log-mel domain. Proc. Interspeech 2008, 553-556, doi: 10.21437/Interspeech.2008-164
@inproceedings{faubel08_interspeech,
  author={Friedrich Faubel and John McDonough and Dietrich Klakow},
  title={{A phase-averaged model for the relationship between noisy speech, clean speech and noise in the log-mel domain}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={553--556},
  doi={10.21437/Interspeech.2008-164}
}