Abstract
Models of binaural hearing are well-established, versatile tools for many technological applications. Traditionally, most of these models are restricted to processing the acoustical input signals to the two ears. Yet signal processing alone cannot model cognitive processes such as the identification of salient perceptual cues, focused attention, the formation of aural objects, the composition and interpretation of aural scenes, the assignment of meaning to them and, eventually, the performance of quality judgements. Further, for many technological purposes, human listeners have to be conceived as active agents that explore their environment in a multi-modal fashion, thereby also considering information from senses other than hearing. To include these functions, binaural models will have to become more intelligent and, consequently, contain an increasing amount of inherent knowledge, coupled with means to develop this knowledge further in situation- and task-specific ways. This chapter presents a general vision of how such future systems may be constructed and introduces some tools that may be useful in this context.
Notes
1. Graphical models are convenient for implementing a working artificial-listening system, but whether and, if so, how they actually map onto the processes that integrate and disambiguate sensory information in the human brain remains a matter for future research. It has been suggested that neural systems implement Bayesian inference, including even belief propagation [24, 25], but there is also evidence that competition between neural assemblies and attractor dynamics [23] may play an important role in sensory processing.
2. Mel-frequency cepstral coefficients, MFCCs, are the DCT coefficients of the logarithm of a mel-scaled signal spectrum. They were introduced for the purpose of speech recognition [21], but have since proven versatile and found use in many other acoustic classification applications.
3. The term active listening, in the sense used here, is not synonymous with the specific oral-communication technique of the same name, which requires listeners to feed back to talkers what they hear.
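The definition of MFCCs in note 2 (DCT of the logarithm of a mel-scaled spectrum) can be illustrated with a minimal sketch for a single signal frame. This is not the chapter's implementation: the mel formula, the triangular filter shape, the number of filters and coefficients, and all function names are assumptions chosen for illustration, and the naive DFT is only suitable for short frames.

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale formula (assumed here; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Naive O(N^2) DFT over the non-negative frequency bins.
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(i * mel_max / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_ii(values, num_coeffs):
    # Unnormalised DCT-II, keeping the first num_coeffs coefficients.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    # Small floor avoids log(0) for filters that capture no energy.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies, num_coeffs)

# Usage: one 256-sample frame of a 440-Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
coeffs = mfcc(frame, sample_rate)
print(len(coeffs))
```

In practice a frame would usually be windowed before the transform and an FFT routine from a numerical library would replace the naive DFT; both are omitted here to keep the sketch dependency-free.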
References
K. Adiloğlu, R. Annies, H. Purwins, and K. Obermayer. Deliverable 5.2, visualisation and measurement assisted design. Technical report, Neural Information Processing Group, TU Berlin, 2009.
K. Adiloğlu, R. Annies, E. Wahlen, H. Purwins, and K. Obermayer. A graphical representation and dissimilarity measure for basic everyday sound events. IEEE Transactions Audio, Speech and Language Processing, 20:1542–1552, 2012.
J. Aloimonos. Active perception. Lawrence Erlbaum, 1993.
M. Altinsoy. The quality of auditory-tactile virtual environments. J. Audio Engr. Soc., 60:38–46, 2012.
S. Argentieri, A. Portello, M. Bernard, P. Danés, and B. Gas. Binaural systems in robotics. In J. Blauert, editor, The technology of binaural listening, chapter 9. Springer, Berlin-Heidelberg-New York NY, 2013.
L. Avant and H. Helson. Theories of perception. In B. Wolman, editor, Hdb. of General Psychology, pages 419–448. Prentice Hall, Englewood Cliffs, 1973.
M. Bernard, P. Pirim, A. de Cheveigné, and B. Gas. Sensomotoric learning of sound localization from auditory evoked behavior. In Proc. Intl. Conf. Robotics and Automation, ICRA 2012, pages 91–96, St. Paul MN, 2012. IEEE/RSJ.
J. Bilmes and C. Bartels. Graphical model architectures for speech recognition. Signal Processing Magazine, IEEE, 22:89–100, 2005.
J. Blauert. Analysis and synthesis of auditory scenes. In J. Blauert, editor, Communication Acoustics, chapter 1, pages 1–26. Springer, Berlin-Heidelberg-New York, 2005.
J. Blauert. Conceptual aspects regarding the qualification of spaces for aural performances. Act. Acust./Acustica, 99:1–13, 2013.
J. Blauert, ed. The technology of binaural listening. Springer, Berlin-Heidelberg-New York NY, 2013.
J. Blauert, J. Braasch, J. Buchholz, H. Colburn, U. Jekosch, A. Kohlrausch, J. Mourjopoulos, V. Pulkki, and A. Raake. Aural assessment by means of binaural algorithms - the AABBA project. In J. Buchholz, T. Dau, J. Dalsgaard, and T. Paulsen, editors, Binaural Processing and Spatial Hearing, pages 303–343. The Danavox Jubilee Foundation, Ballerup, Denmark, 2009.
J. Blauert and U. Jekosch. Concepts behind sound quality, some basic consideration. In Proc. InterNoise 2003, pages 72–76. Korean Acoust. Soc., 2003.
J. Blauert and U. Jekosch. A layer model of sound quality. J. Audio-Engr. Soc., 60:4–12, 2012.
J. Blauert and K. Obermayer. Rückkopplungswege in Modellen der binauralen Signalverarbeitung (feedback paths in models of binaural signal processing). In Fortschr. Akustik, DAGA 2012, pages 2015–2016. Deutsche Ges.f. Akustik, DEGA, Berlin, 2012.
J. Braasch, S. Clapp, A. P. T. Pastore, and N. Xiang. Binaural evaluation of auditory scenes using head movements. In J. Blauert, editor, The technology of binaural listening, chapter 8. Springer, Berlin-Heidelberg-New York NY, 2013.
A. Bregman. Auditory scene analysis - the perceptual organization of sound. MIT press, Cambridge MA, 1990.
N. Clark, G. Brown, T. Jürgens, and R. Meddis. A frequency-selective feedback model of auditory efferent suppression and its implication for the recognition of speech in noise. J. Acoust. Soc. Am., 132:1535–1541, 2012.
R. Clifton, B. Morrongiello, J. Kulig, and J. Dowd. Newborn’s orientation towards sounds: Possible implication for cortical development. Child develop., 52:833–838, 1981.
D. Corkill. Collaborating software: blackboard and multi-agent systems and the future. Proc. Intl. Lisp Conf., New York NY, 2003.
S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust., Speech, Signal Processing, 28:357–366, 1980.
M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, and A. Nakamura. Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation. In Intl. Worksh. Machine Listening in Multisource Environments, CHiME 2011, pages 12–17, 2011.
L. Dempere-Marco, D. Melcher, and G. Deco. Effective visual working memory capacity: An emergent effect from the neural dynamics in an attractor network. PLoS ONE, 7:e42719, 2012.
S. Deneve. Bayesian spiking neurons I: Inference. Neural Computation, 20:91–117, 2008.
S. Deneve. Bayesian spiking neurons II: Learning. Neural Computation, 20:118–145, 2008.
DIN EN ISO 9000. Qualitätsmanagementsystem, Grundlagen und Begriffe (quality management system, fundamentals and concepts). Dtsch. Inst. f. Normung, Berlin, 2005.
R. Engelmore and A. Morgan (eds.). Blackboard systems. Addison-Wesley, Boston MA, 1988.
L. Erman. The Hearsay II speech-understanding system - integrating knowledge to resolve uncertainty. Computing surveys, 12:213–253, 1980.
S. Gold, A. Rangarajan, C.-P. Lu, and E. Mjolsness. New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition, 31:957–964, 1998.
S. Haykin. Neural networks - a comprehensive foundation. Macmillan, New York NY, 1994.
J. He and Y. Yu. Role of descending control in the auditory pathway. In A. Rees and A. Palmer, editors, Oxford Hdb. of Auditory Science, volume 2: The auditory brain. Oxford Univ. press, New York NY, 2009.
F.-F. Henrich and K. Obermayer. Active learning by spherical subdivision. J. Machine Learning Res., 9:105–130, 2008.
J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson. Super-human multi-talker speech recognition: A graphical modeling approach. Comput. Speech Lang., 24:45–66, 2010.
S. Hochreiter, T. Knebel, and K. Obermayer. An SMO algorithm for the potential support vector machine. Neural Computation, 20:271–287, 2008.
S. Hochreiter and K. Obermayer. Support vector machines for dyadic data. Neural Computation, 18:1472–1510, 2006.
B. Julesz and I. Hirsh. Visual and auditory perception - an essay of comparison. In E. Davis jr and P. Denes, editors, Human communication - a unified view, pages 283–340. McGraw Hill, New York NY, 1972.
A. Kohlrausch, J. Braasch, D. Kolossa, and J. Blauert. An introduction to binaural processing. In J. Blauert, editor, The technology of binaural listening, chapter 1. Springer, Berlin-Heidelberg-New York NY, 2013.
A. Kohlrausch and S. van de Par. Audio-visual interaction in the context of multi-media applications. In J. Blauert, editor, Communication Acoustics, pages 109–134. Springer, Berlin-Heidelberg-New York NY, 2005.
H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83–97, 1955.
S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions Signal Processing, 41:3397–3415, 1993.
R. Meddis, R. Ferry, and G. Brown. Speech in noise and the medial olivo-cochlear efferent system. J. Acoust. Soc. Am., 123:3051–3051, 2008.
D. Messing, L. Delhorne, E. Bruckert, L. Braida, and O. Ghitza. A non-linear efferent-inspired model of the auditory system - matching human confusion in stationary noise. Speech Communication, 51:668–683, 2009.
R. D. Patterson and J. Holdsworth. A functional model of neural activity patterns and auditory images. Advances in Speech, Hearing and Language Processing, 3:547–563, 1996.
B. Scharf. Human hearing without efferent input to the cochlea. J. Acoust. Soc. Am., 95:2813, 1994.
B. Schofield. Structural organization of the descending pathway. In A. Rees and A. Palmer, editors, Oxford Hdb. of Auditory Science, volume 2: The auditory brain. Oxford Univ. press, New York NY, 2009.
B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, 2002.
L. Schwabe and K. Obermayer. Learning top-down gain control of feature selectivity in a recurrent network of a visual cortical area. Vision Research, 45:3202–3209, 2005.
E. Smith and M. S. Lewicki. Efficient coding of time-relative structure using spikes. Neural Computation, 17:19–45, 2006.
R. Welch and D. Warren. Intersensory interaction. In K. R. Boff, L. Kaufmann, and J. Thomas, editors, Hdb. of Perception and Human Performance, chapter 25, pages 1–36. Kluwer Academic, Dordrecht, 1989.
S. Wolf. Lokalisation von Schallquellen in geschlossenen Räumen (Localization of sound sources in enclosed spaces). doct. diss., Ruhr-Univ. Bochum, Germany, 1991.
Acknowledgments
The authors gratefully acknowledge the suggestions of their external reviewers, which helped to improve the clarity of presentation. Particular thanks are due to P. A. Cariani, who made relevant contributions by commenting on the chapter from the viewpoint of biological cybernetics.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Blauert, J., Kolossa, D., Obermayer, K., Adiloğlu, K. (2013). Further Challenges and the Road Ahead. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_18
DOI: https://doi.org/10.1007/978-3-642-37762-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4
eBook Packages: Engineering, Engineering (R0)