In this paper, we evaluate several model-based and data-driven algorithms for noise-robust speech recognition within the experimental framework of ETSI Aurora 2. Specifically, we focus on statistical linear approximation (SLA), sequential interacting multiple models (S-IMM), and histogram normalization (HN). The ETSI front-end serves as the baseline feature extraction scheme. Recognition tests on a subset of Aurora 2 show that, in absolute word accuracy, SLA outperforms HN by approximately 4%, while S-IMM falls behind HN by almost 3%. We also compare these algorithms with the ETSI advanced front-end (AFE). Although none of them outperforms AFE, we identify likely reasons for this outcome and point out potential directions for improvement.
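The abstract names histogram normalization (HN) but does not specify which variant was evaluated. A common form maps each feature dimension so that its empirical distribution over an utterance matches a reference distribution. The sketch below assumes a standard normal reference and a rank-based CDF estimate. It is an illustration of the general technique, not the authors' implementation. The function names `histogram_normalize` and `normalize_utterance` are hypothetical.

```python
from statistics import NormalDist


def histogram_normalize(values):
    """Map one feature dimension onto a standard normal reference distribution.

    Assumption (illustrative only): the empirical CDF is estimated from ranks
    as (r - 0.5) / N, then passed through the inverse Gaussian CDF.
    Tied values receive distinct consecutive ranks (stable sort order).
    """
    n = len(values)
    if n == 0:
        return []
    reference = NormalDist(mu=0.0, sigma=1.0)
    # Indices of the input sorted by feature value (ascending).
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        cdf = (rank - 0.5) / n  # stays strictly inside (0, 1)
        out[idx] = reference.inv_cdf(cdf)
    return out


def normalize_utterance(frames):
    """Apply histogram_normalize independently to each feature dimension.

    frames: list of per-frame feature vectors (e.g. cepstral coefficients),
    all of the same length.
    """
    if not frames:
        return []
    dims = len(frames[0])
    columns = [histogram_normalize([f[d] for f in frames]) for d in range(dims)]
    # Transpose back to one normalized vector per frame.
    return [[columns[d][t] for d in range(dims)] for t in range(len(frames))]
```

For example, `normalize_utterance([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])` maps each column to the Gaussian quantiles at 1/6, 1/2, and 5/6 according to the rank of each value. Only the ordering within a dimension matters, which is why this kind of mapping can compensate for nonlinear, monotonic distortions that noise introduces into the feature distribution.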
Cite as: Setiawan, P., Stan, S., Fingscheidt, T. (2004) Revisiting some model-based and data-driven denoising algorithms in Aurora 2 context. Proc. Interspeech 2004, 145-148, doi: 10.21437/Interspeech.2004-104
@inproceedings{setiawan04_interspeech, author={Panji Setiawan and Sorel Stan and Tim Fingscheidt}, title={{Revisiting some model-based and data-driven denoising algorithms in Aurora 2 context}}, year=2004, booktitle={Proc. Interspeech 2004}, pages={145--148}, doi={10.21437/Interspeech.2004-104} }