Binaural Scene Analysis with Multidimensional Statistical Filters

The Technology of Binaural Listening

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

The segregation of concurrent speakers and other sound sources is an important aspect of improving the performance of audio technology, such as noise reduction and automatic speech recognition (ASR), in difficult acoustic conditions. This technology is relevant for applications such as hearing aids, mobile audio devices, robotics, hands-free audio communication, and speech-based computer interfaces. Computational auditory scene analysis (CASA) techniques simulate aspects of the processing properties of the human perceptual system using statistical signal-processing techniques to improve inferences about the causes of audio input received by the system. This study argues that CASA is a promising approach to source separation and outlines several theoretical arguments to support this hypothesis. With a focus on computational binaural scene analysis, principles of CASA techniques are reviewed. Furthermore, in an experimental approach, the applicability of a recent model of binaural interaction to improving ASR performance in multi-speaker conditions with spatially separated moving speakers is explored. The binaural model provides input to a statistical inference filter that employs a priori information on possible movements of the sources in order to track the positions of the speakers. The tracks are used to adapt a beamformer that selects a specific speaker. The output of the beamformer is subsequently used for an ASR task. Compared to the unprocessed (that is, mixed) data in a two-speaker condition, the word recognition rates obtained with the enhanced signals based on binaural information increased from 30.8 % to 88.4 %, demonstrating the potential of the proposed CASA-based approach.
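
To make the tracking step in the abstract more concrete, the following is a minimal sketch of a statistical inference filter that uses a priori information on source movement. It is a generic bootstrap particle filter of the kind covered in the tutorial cited as [1], not the filter used in the chapter. The one-dimensional azimuth state, the random-walk motion prior, the Gaussian observation model, and all names and parameter values (`track_azimuth`, `step_std`, `obs_std`, `demo`) are assumptions chosen for illustration only.

```python
import numpy as np


def track_azimuth(observations, n_particles=500, step_std=2.0,
                  obs_std=10.0, seed=0):
    """Bootstrap particle filter for a single source azimuth in degrees.

    All parameter values are illustrative assumptions, not taken from
    the chapter.
    """
    rng = np.random.default_rng(seed)
    # Initial belief: source is somewhere in the frontal half-plane.
    particles = rng.uniform(-90.0, 90.0, n_particles)
    estimates = []
    for z in observations:
        # Predict: a priori motion model, here a random walk in azimuth.
        particles = particles + rng.normal(0.0, step_std, n_particles)
        particles = np.clip(particles, -90.0, 90.0)
        # Update: weight each particle by a Gaussian observation likelihood.
        log_w = -0.5 * ((z - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract max for numerical safety
        w /= w.sum()
        # Point estimate: posterior mean of the azimuth.
        estimates.append(float(np.sum(w * particles)))
        # Resample (multinomial) to avoid weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return np.array(estimates)


def demo(seed=1):
    """Track a synthetic source moving from -40 to +40 degrees."""
    rng = np.random.default_rng(seed)
    true_az = np.linspace(-40.0, 40.0, 200)
    # Noisy per-frame direction estimates stand in for binaural model output.
    obs = true_az + rng.normal(0.0, 10.0, true_az.size)
    est = track_azimuth(obs)
    rmse_obs = float(np.sqrt(np.mean((obs - true_az) ** 2)))
    rmse_est = float(np.sqrt(np.mean((est - true_az) ** 2)))
    return rmse_obs, rmse_est


if __name__ == "__main__":
    rmse_obs, rmse_est = demo()
    print(f"RMSE of raw observations: {rmse_obs:.1f} deg")
    print(f"RMSE of filtered track:   {rmse_est:.1f} deg")
```

In this toy setting the filtered track should follow the true trajectory more closely than the raw per-frame observations, because the motion prior rules out implausibly large jumps between frames. A track of this kind is the sort of quantity that could be used to steer a beamformer towards one speaker.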

Notes

  1. Note that this approach can be extended to more inputs, for example, multiple microphones or audiovisual input, or might be restricted to a single input. The current study covers its application to binaural input signals like recordings from a dummy head.

  2. A demo folder containing the file exp_spille2013 used to run the IPD model and to generate Fig. 6 is available in the AMToolbox [56].

  3. The algorithm is part of a Matlab toolbox provided by [25].

References

  1. M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process., 50:174–188, 2002.

  2. R. Beutelmann and T. Brand. Prediction of speech intelligibility in spatial noise and reverberation for normal-hearing and hearing-impaired listeners. J. Acoust. Soc. Am., 120:331–342, 2006.

  3. J. Bitzer and K. U. Simmer. Superdirective microphone arrays. In M. Brandstein and D. Ward, editors, Microphone Arrays, chapter 2. Springer, 2001.

  4. A. Brand, O. Behrend, T. Marquardt, D. McAlpine, and B. Grothe. Precise inhibition is essential for microsecond interaural time difference coding. Nature, 417:543–547, 2002.

  5. J. Breebaart, S. van de Par, and A. Kohlrausch. Binaural processing model based on contralateral inhibition. I. Model structure. J. Acoust. Soc. Am., 110:1074–1088, 2001.

  6. A. S. Bregman. Auditory scene analysis: The perceptual organization of sound. MIT Press, 1990.

  7. K. O. Bushara, T. Hanakawa, I. Immisch, K. Toma, K. Kansaku, and M. Hallett. Neural correlates of cross-modal binding. Nat. Neurosci., 6:190–195, 2003.

  8. C. E. Carr and M. Konishi. Axonal delay lines for time measurement in the owl’s brainstem. Proc. Natl. Acad. Sci. U. S. A., 85:8311–8315, 1988.

  9. G. Casella and C. Robert. Rao-Blackwellisation of sampling schemes. Biometrika, 83:81–94, 1996.

  10. H. Christensen, N. Ma, S. N. Wrigley, and J. Barker. A speech fragment approach to localising multiple speakers in reverberant environments. In IEEE ICASSP, 2009.

  11. M. Cooke. Glimpsing speech. Journal of Phonetics, 31:579–584, 2003.

  12. H. Cox, R. Zeskind, and M. Owen. Robust adaptive beamforming. IEEE Trans. Acoust., Speech, Signal Process., 35:1365–1376, 1987.

  13. S. B. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust., Speech, Signal Process., 28:357–366, 1980.

  14. M. Dietz, S. D. Ewert, and V. Hohmann. Lateralization of stimuli with independent fine-structure and envelope-based temporal disparities. J. Acoust. Soc. Am., 125:1622–1635, 2009.

  15. M. Dietz, S. D. Ewert, and V. Hohmann. Auditory model based direction estimation of concurrent speakers from binaural signals. Speech Commun., 53:592–605, 2011.

  16. M. Dietz, S. D. Ewert, and V. Hohmann. Lateralization based on interaural differences in the second-order amplitude modulator. J. Acoust. Soc. Am., 131:398–408, 2012.

  17. M. Dietz, S. D. Ewert, V. Hohmann, and B. Kollmeier. Coding of temporally fluctuating interaural timing disparities in a binaural processing model based on phase differences. Brain Res., 1220:234–245, 2008.

  18. M. Dietz, T. Marquardt, D. Greenberg, and D. McAlpine. The influence of the envelope waveform on binaural tuning of neurons in the inferior colliculus and its relation to binaural perception. In B. C. J. Moore, R. Patterson, I. M. Winter, R. P. Carlyon, and H. E. Gockel, editors, Basic Aspects of Hearing: Physiology and Perception, chapter 25. Springer, New York, 2013.

  19. A. Doucet, N. de Freitas, and N. Gordon. An introduction to sequential Monte Carlo methods. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer, 2001.

  20. C. Faller and J. Merimaa. Source localization in complex listening situations: Selection of binaural cues based on interaural coherence. J. Acoust. Soc. Am., 116:3075–3089, 2004.

  21. K. Friston and S. Kiebel. Cortical circuits for perceptual inference. Neural Networks, 22:1093–1104, 2009.

  22. M. J. Goupell and W. M. Hartmann. Interaural fluctuations and the detection of interaural incoherence: Bandwidth effects. J. Acoust. Soc. Am., 119:3971–3986, 2006.

  23. S. Harding, J. P. Barker, and G. J. Brown. Mask estimation for missing data speech recognition based on statistics of binaural interaction. IEEE T. Audio. Speech., 14:58–67, 2006.

  24. J. Hartikainen and S. Särkkä. Optimal filtering with Kalman filters and smoothers: a manual for Matlab toolbox EKF/UKF. Technical report, Department of Biomedical Engineering and Computational Science, Helsinki University of Technology, 2008.

  25. J. Hartikainen and S. Särkkä. RBMCDAbox: Matlab toolbox of Rao-Blackwellized data association particle filters. Technical report, Department of Biomedical Engineering and Computational Science, Helsinki University of Technology, 2008.

  26. H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. J. Acoust. Soc. Am., 87:1738–1752, 1990.

  27. V. Hohmann. Frequency analysis and synthesis using a Gammatone filterbank. Acta Acustica united with Acustica, 88:433–442, 2002.

  28. L. A. Jeffress. A place theory of sound localization. J. Comp. Physiol. Psychol., 41:35–39, 1948.

  29. H. Kayser, S. D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, and B. Kollmeier. Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses. EURASIP Journal on Advances in Signal Processing, 2009:298605, 2009.

  30. M. Klein-Hennig, M. Dietz, V. Hohmann, and S. D. Ewert. The influence of different segments of the ongoing envelope on sensitivity to interaural time delays. J. Acoust. Soc. Am., 129:3856–3872, 2011.

  31. M. Kleinschmidt. Methods for capturing spectro-temporal modulations in automatic speech recognition. Acta Acustica united with Acustica, 88:416–422, 2002.

  32. D. Kolossa, F. Astudillo, A. Abad, S. Zeiler, R. Saeidi, P. Mowlaee, and R. Martin. CHiME challenge: Approaches to robustness using beamforming and uncertainty-of-observation techniques. Int. Workshop on Machine Listening in Multisource Environments, 1:6–11, 2011.

  33. A.-G. Lang and A. Buchner. Relative influence of interaural time and intensity differences on lateralization is modulated by attention to one or the other cue: 500-Hz sine tones. J. Acoust. Soc. Am., 126:2536–2542, 2009.

  34. N. Le Goff, J. Buchholz, and T. Dau. Modeling localization of complex sounds in the impaired and aided impaired auditory system. In J. Blauert, editor, The technology of binaural listening, chapter 5. Springer, Berlin-Heidelberg-New York NY, 2013.

  35. W. Lindemann. Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. J. Acoust. Soc. Am., 80:1608–1622, 1986.

  36. R. F. Lyon. A computational model of binaural localization and separation. In IEEE ICASSP, volume 8, pages 1148–1151, 1983.

  37. T. May, S. van de Par, and A. Kohlrausch. A probabilistic model for robust localization based on a binaural auditory front-end. IEEE T. Audio. Speech., 19:1–13, 2011.

  38. T. May, S. van de Par, and A. Kohlrausch. A binaural scene analyzer for joint localization and recognition of speakers in the presence of interfering noise sources and reverberation. IEEE T. Audio. Speech., 20:1–15, 2012.

  39. T. May, S. van de Par, and A. Kohlrausch. Noise-robust speaker recognition combining missing data techniques and universal background modeling. IEEE T. Audio. Speech., 20:108–121, 2012.

  40. T. May, S. van de Par, and A. Kohlrausch. Binaural localization and detection of speakers in complex acoustic scenes. In J. Blauert, editor, The technology of binaural listening, chapter 15. Springer, Berlin-Heidelberg-New York NY, 2013.

  41. D. McAlpine and B. Grothe. Sound localization and delay lines - do mammals fit the model? Trends Neurosci., 26:347–350, 2003.

  42. D. McAlpine, D. Jiang, and A. R. Palmer. A neural code for low-frequency sound localization in mammals. Nat. Neurosci., 4:396–401, 2001.

  43. J. Nix and V. Hohmann. Sound source localization in real sound fields based on empirical statistics of interaural parameters. J. Acoust. Soc. Am., 119:463–479, 2006.

  44. J. Nix and V. Hohmann. Combined estimation of spectral envelopes and sound source direction of concurrent voices by multidimensional statistical filtering. IEEE T. Audio. Speech., 15:995–1008, 2007.

  45. B. Opitz, A. Mecklinger, A. D. Friederici, and D. Y. von Cramon. The functional neuroanatomy of novelty processing: integrating ERP and fMRI results. Cereb. Cortex, 9:379–391, 1999.

  46. B. Osnes, K. Hugdahl, and K. Specht. Effective connectivity analysis demonstrates involvement of premotor cortex during speech perception. Neuroimage, 54:2437–2445, 2011.

  47. P. Paavilainen, M. Jaramillo, R. Näätänen, and I. Winkler. Neuronal populations in the human brain extracting invariant relationships from acoustic variance. Neurosci. Lett., 265:179–182, 1999.

  48. K. Palomäki and G. J. Brown. A computational model of binaural speech recognition: Role of across-frequency vs. within-frequency processing and internal noise. Speech Commun., 53:924–940, 2011.

  49. K. J. Palomäki, G. J. Brown, and D. Wang. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Commun., 43:361–378, 2004.

  50. D. P. Phillips. A perceptual architecture for sound lateralization in man. Hear. Res., 238:124–132, 2008.

  51. V. Pulkki and T. Hirvonen. Functional count-comparison model for binaural decoding. Acta Acustica united with Acustica, 95:883–900, 2009.

  52. L. Rayleigh. On our perception of sound direction. Philos. Mag., 13:214–232, 1907.

  53. H. Riedel and B. Kollmeier. Interaural delay-dependent changes in the binaural difference potential of the human auditory brain stem response. Hear. Res., 218:5–19, 2006.

  54. N. Roman, D. Wang, and G. J. Brown. Speech segregation based on sound localization. J. Acoust. Soc. Am., 114:2236–2252, 2003.

  55. S. Särkkä, A. Vehtari, and J. Lampinen. Rao-Blackwellized particle filter for multiple target tracking. Information Fusion, 8:2–15, 2007.

  56. P. Søndergaard and P. Majdak. The auditory-modeling toolbox. In J. Blauert, editor, The technology of binaural listening, chapter 2. Springer, Berlin-Heidelberg-New York NY, 2013.

  57. S. Spors and H. Wierstorf. Evaluation of perceptual properties of phase-mode beamforming in the context of data-based binaural synthesis. In 5th International Symposium on Communications Control and Signal Processing (ISCCSP), pages 1–4, 2012.

  58. R. Stern and N. Morgan. Hearing is believing: Biologically-inspired feature extraction for robust automatic speech recognition. IEEE Signal Processing Magazine, 29:34–43, 2012.

  59. R. Stern, A. Zeiberg, and C. Trahiotis. Lateralization of complex binaural stimuli: A weighted-image model. J. Acoust. Soc. Am., 84:156–165, 1988.

  60. R. M. Stern and H. S. Colburn. Theory of binaural interaction based in auditory-nerve data. IV. A model for subjective lateral position. J. Acoust. Soc. Am., 64:127–140, 1978.

  61. S. K. Thompson, K. von Kriegstein, A. Deane-Pratt, T. Marquardt, R. Deichmann, T. D. Griffiths, and D. McAlpine. Representation of interaural time delay in the human auditory midbrain. Nat. Neurosci., 9:1096–1098, 2006.

  62. S. P. Thompson. On binaural audition. Philos. Mag., 4:274–276, 1877.

  63. S. P. Thompson. On the function of the two ears in the perception of space. Philos. Mag., 13:406–416, 1882.

  64. M. van der Heijden and C. Trahiotis. Masking with interaurally delayed stimuli: the use of "internal" delays in binaural detection. J. Acoust. Soc. Am., 105:388–399, 1999.

  65. G. von Békésy. Zur Theorie des Hörens. Über das Richtungshören bei einer Zeitdifferenz oder Lautstärkenungleichheit der beiderseitigen Schalleinwirkungen. Phys. Z., 31:824–835, 1930.

  66. C. Wacongne, J. P. Changeux, and S. Dehaene. A neuronal model of predictive coding accounting for the mismatch negativity. J. Neurosci., 32:3665–3678, 2012.

  67. K. C. Wagener and T. Brand. Sentence intelligibility in noise for listeners with normal hearing and hearing impairment: influence of measurement procedure and masking parameters. Int. J. Audiol., 44:144–156, 2005.

  68. D. Wang and G. J. Brown. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE Press, 2006.

  69. S. Wilson, A. Saygin, M. Sereno, and M. Iacoboni. Listening to speech activates motor areas involved in speech production. Nat. Neurosci., 7:701–702, 2004.

  70. I. Winkler. Interpreting the Mismatch Negativity. J. Psychophysiol., 21:147–163, 2007.

  71. J. Woodruff and D. Wang. Binaural localization of multiple sources in reverberant and noisy environments. IEEE T. Audio. Speech., 20:1503–1512, 2012.

  72. S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland. The HTK book. Cambridge University Engineering Department, 3, 2002.

Acknowledgments

Supported by the DFG, SFB/TRR 31 "The active auditory system", URL: http://www.uni-oldenburg.de/sfbtr31. The authors would like to thank M. Klein-Hennig for casting the IPD model code in the AMToolbox format, D. Marquardt and G. Coleman for their contributions to the beamforming algorithm, M. R. Schädler for sharing the code of the OLSA recognition system, H. Kayser for support with the HRIR database, and two anonymous reviewers for constructive suggestions.

Author information

Corresponding author

Correspondence to V. Hohmann.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Spille, C., Meyer, B.T., Dietz, M., Hohmann, V. (2013). Binaural Scene Analysis with Multidimensional Statistical Filters. In: Blauert, J. (ed.) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_6

  • DOI: https://doi.org/10.1007/978-3-642-37762-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37761-7

  • Online ISBN: 978-3-642-37762-4

  • eBook Packages: Engineering, Engineering (R0)
