Further Challenges and the Road Ahead

The Technology of Binaural Listening

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

Models of binaural hearing are well-established, versatile tools for many technological applications. Traditionally, most of these models are restricted to processing the acoustical input signals to the two ears. Yet signal processing alone cannot model cognitive processes such as the identification of salient perceptual cues, focused attention, the formation of aural objects, the composition and interpretation of aural scenes, the assignment of meaning to them and, eventually, the performance of quality judgements. Further, for many technological purposes, human listeners have to be conceived as agents that actively explore their environment in a multi-modal fashion, thereby also considering information from senses other than hearing. To include these functions, binaural models will have to become more intelligent and, consequently, contain an increasing amount of inherent knowledge, coupled with the means to develop this knowledge further in situation- and task-specific ways. This chapter presents a general vision of how such future systems may be constructed and introduces some tools that may be useful in this context.

Notes

  1. Graphical models are convenient for implementing a working artificial-listening system, but whether and, if so, how they actually map to the processes that integrate and disambiguate sensory information in the human brain remains a matter for future research. It has been suggested that neural systems implement Bayesian inference, including even belief propagation [24, 25], but there is also evidence that competition between neural assemblies and attractor dynamics [23] may play an important role in sensory processing.

  2. Mel-frequency cepstral coefficients, MFCCs, are the DCT coefficients of the logarithm of a mel-scaled signal spectrum. They were introduced for the purpose of speech recognition [21], but have since proven versatile and found use in many other acoustic classification applications.

  3. The term active listening in the sense used here is not synonymous with a specific oral-communication technique that requires listeners to feed back to talkers what they hear.
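
The definition of MFCCs in note 2 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of [21] or of the chapter: the mel formula (2595 log10(1 + f/700)), the triangular filter shape, the Hamming window, the filter and coefficient counts, and all function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale convention (an assumption of this sketch).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Negative values lie outside the triangle and are clipped to zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(x):
    """Unnormalised DCT-II of a 1-D array."""
    n = len(x)
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * j + 1) / (2 * n)) @ x

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one signal frame: DCT of the log mel-scaled power spectrum."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    log_mel = np.log(mel_energies + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_mel)[:n_coeffs]

# Usage: one 512-sample frame of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
```

Keeping only the first few DCT coefficients retains the smooth spectral envelope and discards fine spectral detail, which is one reason the representation is compact enough for classification tasks.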

References

  1. K. Adiloğlu, R. Annies, H. Purwins, and K. Obermayer. Deliverable 5.2, visualisation and measurement assisted design. Technical report, Neural Information Processing Group, TU Berlin, 2009.

  2. K. Adiloğlu, R. Annies, E. Wahlen, H. Purwins, and K. Obermayer. A graphical representation and dissimilarity measure for basic everyday sound events. IEEE Transactions Audio, Speech and Language Processing, 20:1542–1552, 2012.

  3. J. Aloimonos. Active perception. Lawrence Erlbaum, 1993.

  4. M. Altinsoy. The quality of auditory-tactile virtual environments. J. Audio Engr. Soc., 60:38–46, 2012.

  5. S. Argentieri, A. Portello, M. Bernard, P. Danés, and B. Gas. Binaural systems in robotics. In J. Blauert, editor, The technology of binaural listening, chapter 9. Springer, Berlin-Heidelberg-New York NY, 2013.

  6. L. Avant and H. Helson. Theories of perception. In B. Wolman, editor, Hdb. of General Psychology, pages 419–448. Prentice Hall, Englewood Cliffs, 1973.

  7. M. Bernard, P. Pirim, A. de Cheveigné, and B. Gas. Sensomotoric learning of sound localization from auditory evoked behavior. In Proc. Intl. Conf. Robotics and Automation, ICRA 2012, pages 91–96, St. Paul MN, 2012.

  8. J. Bilmes and C. Bartels. Graphical model architectures for speech recognition. Signal Processing Magazine, IEEE, 22:89–100, 2005.

  9. J. Blauert. Analysis and synthesis of auditory scenes. In J. Blauert, editor, Communication Acoustics, chapter 1, pages 1–26. Springer, Berlin-Heidelberg-New York, 2005.

  10. J. Blauert. Conceptual aspects regarding the qualification of spaces for aural performances. Act. Acust./Acustica, 99:1–13, 2013.

  11. J. Blauert, ed. The technology of binaural listening. Springer, Berlin-Heidelberg-New York NY, 2013.

  12. J. Blauert, J. Braasch, J. Buchholz, H. Colburn, U. Jekosch, A. Kohlrausch, J. Mourjopoulos, V. Pulkki, and A. Raake. Aural assessment by means of binaural algorithms - the AABBA project. In J. Buchholz, T. Dau, J. Dalsgaard, and T. Paulsen, editors, Binaural Processing and Spatial Hearing, pages 303–343. The Danavox Jubilee Foundation, Ballerup, Denmark, 2009.

  13. J. Blauert and U. Jekosch. Concepts behind sound quality, some basic consideration. In Proc. InterNoise 2003, pages 72–76. Korean Acoust. Soc., 2003.

  14. J. Blauert and U. Jekosch. A layer model of sound quality. J. Audio Engr. Soc., 60:4–12, 2012.

  15. J. Blauert and K. Obermayer. Rückkopplungswege in Modellen der binauralen Signalverarbeitung (feedback paths in models of binaural signal processing). In Fortschr. Akustik, DAGA 2012, pages 2015–2016. Deutsche Ges.f. Akustik, DEGA, Berlin, 2012.

  16. J. Braasch, S. Clapp, A. P. T. Pastore, and N. Xiang. Binaural evaluation of auditory scenes using head movements. In J. Blauert, editor, The technology of binaural listening, chapter 8. Springer, Berlin-Heidelberg-New York NY, 2013.

  17. A. Bregman. Auditory scene analysis - the perceptual organization of sound. MIT press, Cambridge MA, 1990.

  18. N. Clark, G. Brown, T. Jürgens, and R. Meddis. A frequency-selective feedback model of auditory efferent suppression and its implication for the recognition of speech in noise. J. Acoust. Soc. Am., 132:1535–1541, 2012.

  19. R. Clifton, B. Morrongiello, J. Kulig, and J. Dowd. Newborn’s orientation towards sounds: Possible implication for cortical development. Child develop., 52:833–838, 1981.

  20. D. Corkill. Collaborating software: blackboard and multi-agent systems and the future. Proc. Intl. Lisp Conf., New York NY, 2003.

  21. S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust., Speech, Signal Processing, 28:357–366, 1980.

  22. M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, and A. Nakamura. Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation. In Intl. Worksh. Machine Listening in Multisource Environments, CHiME 2011, pages 12–17, 2011.

  23. L. Dempere-Marco, D. Melcher, and G. Deco. Effective visual working memory capacity: An emergent effect from the neural dynamics in an attractor network. PLoS ONE, 7:e42719, 2012.

  24. S. Deneve. Bayesian spiking neurons I: Inference. Neural Computation, 20:91–117, 2008.

  25. S. Deneve. Bayesian spiking neurons II: Learning. Neural Computation, 20:118–145, 2008.

  26. DIN EN ISO 9000. Qualitätsmanagementsystem, Grundlagen und Begriffe (quality management system, fundamentals and concepts). Dtsch. Inst. f. Normung, Berlin, 2005.

  27. R. Engelmore and A. Morgan (eds.). Blackboard systems. Addison-Wesley, Boston MA, 1988.

  28. L. Erman. The Hearsay II speech-understanding system - integrating knowledge to resolve uncertainty. Computing surveys, 12:213–253, 1980.

  29. S. Gold, A. Rangarajan, C.-P. Lu, and E. Mjolsness. New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition, 31:957–964, 1998.

  30. S. Haykin. Neural networks - a comprehensive foundation. Macmillan, New York NY, 1994.

  31. J. He and Y. Yu. Role of descending control in the auditory pathway. In A. Rees and A. Palmer, editors, Oxford Hdb. of Auditory Science, volume 2: The auditory brain. Oxford Univ. press, New York NY, 2009.

  32. F.-F. Henrich and K. Obermayer. Active learning by spherical subdivision. J. Machine Learning Res., 9:105–130, 2008.

  33. J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson. Super-human multi-talker speech recognition: A graphical modeling approach. Comput. Speech Lang., 24:45–66, 2010.

  34. S. Hochreiter, T. Knebel, and K. Obermayer. An SMO algorithm for the potential support vector machine. Neural Computation, 20:271–287, 2008.

  35. S. Hochreiter and K. Obermayer. Support vector machines for dyadic data. Neural Computation, 18:1472–1510, 2006.

  36. B. Julesz and I. Hirsh. Visual and auditory perception - an essay of comparison. In E. Davis jr and P. Denes, editors, Human communication - a unified view, pages 283–340. McGraw Hill, New York NY, 1972.

  37. A. Kohlrausch, J. Braasch, D. Kolossa, and J. Blauert. An introduction to binaural processing. In J. Blauert, editor, The technology of binaural listening, chapter 1. Springer, Berlin-Heidelberg-New York NY, 2013.

  38. A. Kohlrausch and S. van de Par. Audio-visual interaction in the context of multi-media applications. In J. Blauert, editor, Communication Acoustics, pages 109–134. Springer, Berlin-Heidelberg-New York NY, 2005.

  39. H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83–97, 1955.

  40. S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions Signal Processing, 41:3397–3415, 1993.

  41. R. Meddis, R. Ferry, and G. Brown. Speech in noise and the medial olivo-cochlear efferent system. J. Acoust. Soc. Am., 123:3051–3051, 2008.

  42. D. Messing, L. Delhorne, E. Bruckert, L. Braida, and O. Ghitza. A non-linear efferent-inspired model of the auditory system - matching human confusion in stationary noise. Speech Communication, 51:668–683, 2009.

  43. R. D. Patterson and J. Holdsworth. A functional model of neural activity patterns and auditory images. Advances in Speech, Hearing and Language Processing, 3:547–563, 1996.

  44. B. Scharf. Human hearing without efferent input to the cochlea. J. Acoust. Soc. Am., 95:2813, 1994.

  45. B. Schofield. Structural organization of the descending pathway. In A. Rees and A. Palmer, editors, Oxford Hdb. of Auditory Science, volume 2: The auditory brain. Oxford Univ. press, New York NY, 2009.

  46. B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, 2002.

  47. L. Schwabe and K. Obermayer. Learning top-down gain control of feature selectivity in a recurrent network of a visual cortical area. Vision Research, 45:3202–3209, 2005.

  48. E. Smith and M. S. Lewicki. Efficient coding of time-relative structure using spikes. Neural Computation, 17:19–45, 2006.

  49. R. Welch and D. Warren. Intersensory interaction. In K. R. Boff, L. Kaufmann, and J. Thomas, editors, Hdb. of Perception and Human Performance, chapter 25, pages 1–36. Kluwer Academic, Dordrecht, 1989.

  50. S. Wolf. Lokalisation von Schallquellen in geschlossenen Räumen (Localization of sound sources in enclosed spaces). doct. diss., Ruhr-Univ. Bochum, Germany, 1991.

Acknowledgments

The authors gratefully acknowledge the suggestions of their external reviewers, which helped to improve the clarity of presentation. Particular thanks are due to P. A. Cariani, who contributed substantially by commenting on the chapter from the viewpoint of biological cybernetics.

Author information

Corresponding author

Correspondence to J. Blauert.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Blauert, J., Kolossa, D., Obermayer, K., Adiloğlu, K. (2013). Further Challenges and the Road Ahead. In: Blauert, J. (ed.) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_18

  • DOI: https://doi.org/10.1007/978-3-642-37762-4_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37761-7

  • Online ISBN: 978-3-642-37762-4

  • eBook Packages: Engineering, Engineering (R0)
