Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier

Salishev, Sergey; Klotchkov, Ilya; Barabanov, Andrey

doi:10.1007/978-3-319-66429-3_52

Conference paper
First Online: 13 August 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

We propose a novel, computationally efficient, real-time speech enhancement post-filter for microphone arrays. It has a small delay and takes into account the features of the speech signal and of recognition algorithms. The algorithm is efficient for small microphone arrays. The filter applies a binary classification model to the Log Short-Term Spectral Amplitude (Log-STSA). The proposed algorithm substantially improves recognition accuracy, with a minor increase in complexity compared to the Wiener post-filter and lower complexity than existing voice-model-based approaches. Objective tests using a dual-microphone array, the ETSI binaural noise database, the TIDIGITS database, and the CMU Sphinx 4 speech recognizer demonstrate an overall 41% error rate reduction for SNR from 15 dB to 0 dB. Subjective evaluation also demonstrates substantial noise reduction and improved intelligibility, without the musical noise artifacts common to Wiener and spectral-subtraction-based methods. Testing with the SiSEC10 database, recorded with a four-microphone linear equispaced array, shows that recognition accuracy improves as the array base and/or the number of microphones in the array increases.
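The abstract does not spell out the post-filter itself, so the following is only a background sketch: the classical single-channel Log-STSA gain of Ephraim and Malah (1985) next to the Wiener gain that the abstract uses as a baseline. The names `xi` (a priori SNR), `gamma` (a posteriori SNR), `exp1`, `wiener_gain`, and `log_stsa_gain` are illustrative assumptions, and the harmonic/noise classifier proposed in the paper is not reproduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("exp1 requires x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0  # holds (-x)^k / k! after k updates
        for k in range(1, 60):
            term *= -x / k
            total -= term / k
        return total
    # Modified Lentz evaluation of the continued fraction for E1(x)
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def wiener_gain(xi: float) -> float:
    """Wiener gain for one frequency bin, given the a priori SNR xi (linear scale)."""
    return xi / (1.0 + xi)


def log_stsa_gain(xi: float, gamma: float) -> float:
    """Ephraim-Malah Log-STSA gain: G = xi/(1+xi) * exp(0.5 * E1(v)), v = xi/(1+xi) * gamma."""
    v = max(wiener_gain(xi) * gamma, 1e-12)  # floor avoids log(0) in exp1
    return wiener_gain(xi) * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    # Hypothetical SNR values for a single bin, chosen only for illustration
    print(wiener_gain(1.0), log_stsa_gain(1.0, 2.0))
```

Because E1(v) is positive, the Log-STSA gain is never below the Wiener gain for the same a priori SNR, so it attenuates less aggressively at low SNR.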

Acknowledgements

The work was supported by Saint-Petersburg State University, grants 6.37.349.2015 and 6.38.230.2015.

Author information

Authors and Affiliations

Saint Petersburg State University, Saint Petersburg, Russia
Sergey Salishev & Andrey Barabanov
Intel Labs, Intel Corporation, Santa Clara, CA, 95054-1549, USA
Ilya Klotchkov

Corresponding author

Correspondence to Sergey Salishev.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Salishev, S., Klotchkov, I., Barabanov, A. (2017). Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_52

DOI: https://doi.org/10.1007/978-3-319-66429-3_52
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
