Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

We propose a novel, computationally efficient, real-time microphone array post-filter for speech enhancement. It introduces only a small delay and takes into account features of both the speech signal and speech recognition algorithms. The algorithm works well with small microphone arrays. The filter applies a binary classification model to the Log Short-Term Spectral Amplitude (Log-STSA). It substantially improves recognition accuracy at a minor increase in complexity compared to the Wiener post-filter, and at lower complexity than existing voice-model-based approaches. Objective tests using a dual-microphone array, the ETSI binaural noise database, the TIDIGITS database, and the CMU Sphinx 4 speech recognizer demonstrate an overall 41% error rate reduction for SNRs from 15 dB down to 0 dB. Subjective evaluation also demonstrates substantial noise reduction and improved intelligibility, without the musical noise artifacts common to Wiener and spectral subtraction based methods. Testing with the SiSEC10 four-microphone linear equispaced array database shows that recognition accuracy improves as the array base and/or the number of microphones increases.
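
To make the idea of a binary decision on the log-spectral amplitude concrete, here is a minimal illustrative sketch. It is not the authors' algorithm: the abstract does not specify the classifier, so a simple per-bin threshold against a given noise estimate stands in for it. The function name `log_stsa_binary_postfilter` and the parameters `threshold_db` and `floor_db` are hypothetical.

```python
import math


def log_stsa_binary_postfilter(amplitudes, noise_amplitudes,
                               threshold_db=6.0, floor_db=-20.0):
    """Pass bins classified as speech, attenuate bins classified as noise.

    amplitudes       -- short-time spectral amplitudes of one frame
    noise_amplitudes -- per-bin noise amplitude estimates (assumed given)
    threshold_db     -- hypothetical decision margin over the noise level
    floor_db         -- hypothetical gain applied to noise-classified bins
    """
    eps = 1e-12  # guards against log10(0)
    enhanced = []
    for amp, noise in zip(amplitudes, noise_amplitudes):
        log_amp = 20.0 * math.log10(max(amp, eps))
        log_noise = 20.0 * math.log10(max(noise, eps))
        # Binary decision in the log-amplitude domain (stand-in classifier).
        is_speech = (log_amp - log_noise) > threshold_db
        gain_db = 0.0 if is_speech else floor_db
        # Apply the gain in the log domain, then return to linear amplitude.
        enhanced.append(10.0 ** ((log_amp + gain_db) / 20.0))
    return enhanced


frame = [1.0, 0.1, 0.5, 0.05]   # toy amplitudes for four frequency bins
noise = [0.1, 0.1, 0.1, 0.1]    # toy noise estimate
enhanced = log_stsa_binary_postfilter(frame, noise)
```

In this toy frame, the first and third bins exceed the noise estimate by more than 6 dB and pass unchanged, while the other two are attenuated by 20 dB. Using a gain floor rather than zeroing the noise bins is a common way to limit artifacts, though whether the paper does this is not stated in the abstract.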

Acknowledgements

The work was supported by Saint-Petersburg State University, grants 6.37.349.2015 and 6.38.230.2015.

Author information

Corresponding author

Correspondence to Sergey Salishev.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Salishev, S., Klotchkov, I., Barabanov, A. (2017). Microphone Array Post-filter in Frequency Domain for Speech Recognition Using Short-Time Log-Spectral Amplitude Estimator and Spectral Harmonic/Noise Classifier. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_52

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
