Further Challenges and the Road Ahead

Blauert, J.; Kolossa, D.; Obermayer, K.; Adiloğlu, K.

doi:10.1007/978-3-642-37762-4_18

Part of the book series: Modern Acoustics and Signal Processing (MASP)

Abstract

Models of binaural hearing are well-established, versatile tools for many technological applications. Traditionally, most of these models are restricted to processing the acoustic input signals to the two ears. Yet signal processing alone cannot model cognitive processes such as the identification of salient perceptual cues, focused attention, the formation of aural objects, the composition and interpretation of aural scenes, the assignment of meaning to them and, eventually, the formation of quality judgements. Further, for many technological purposes, human listeners have to be conceived of as active agents that explore their environment in a multi-modal fashion, drawing on information from senses other than hearing as well. To include these functions, binaural models will have to become more intelligent and, consequently, contain more built-in knowledge, coupled with means to develop this knowledge further in situation- and task-specific ways. This chapter presents a general vision of how such future systems may be constructed and introduces some tools that may be useful in this context.

Notes

1. Graphical models are convenient for implementing a working artificial-listening system, but whether, and if so how, they actually map to the processes that integrate and disambiguate sensory information in the human brain remains a matter for future research. It has been suggested that neural systems implement Bayesian inference, including even belief propagation [24, 25], but there is also evidence that competition between neural assemblies and attractor dynamics [23] may play an important role in sensory processing.
2. Mel-frequency cepstral coefficients, MFCCs, are the DCT coefficients of the logarithm of a mel-scaled signal spectrum. They were introduced for the purpose of speech recognition [21], but have since proven versatile and are now used in many other acoustic classification applications.
3. The term active listening, in the sense used here, is not to be confused with the oral-communication technique of the same name, which requires listeners to feed back to talkers what they have heard.
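
The definition in note 2 (DCT of the logarithm of a mel-scaled spectrum) can be illustrated for a single signal frame. The following is a minimal NumPy sketch under assumptions that are not taken from the chapter: a Hann window, triangular mel filters, the common 2595 log10(1 + f/700) mel formula, 26 filters and 13 retained coefficients. All function names are hypothetical, and practical front ends differ in details such as pre-emphasis, liftering and DCT normalisation.

```python
import numpy as np


def hz_to_mel(f_hz):
    # One common mel-scale convention (assumed here; others exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are spaced uniformly on the mel scale."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        # Triangle: minimum of both slopes, clipped at zero outside [lo, hi].
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x):
    """Unnormalised DCT-II of a 1-D array, written out explicitly."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel filterbank -> log -> DCT."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    log_mel = np.log(mel_energies + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_mel)[:n_coeffs]


if __name__ == "__main__":
    sr = 16000
    t = np.arange(400) / sr  # one 25 ms frame
    print(mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr))
```

Keeping only the first few DCT coefficients retains the smooth spectral envelope and discards fine spectral detail, which is one reason the representation transfers well from speech recognition to other acoustic classification tasks.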

References

K. Adiloğlu, R. Annies, H. Purwins, and K. Obermayer. Deliverable 5.2, visualisation and measurement assisted design. Technical report, Neural Information Processing Group, TU Berlin, 2009.
K. Adiloğlu, R. Annies, E. Wahlen, H. Purwins, and K. Obermayer. A graphical representation and dissimilarity measure for basic everyday sound events. IEEE Transactions Audio, Speech and Language Processing, 20:1542–1552, 2012.
J. Aloimonos. Active perception. Lawrence Erlbaum, 1993.
M. Altinsoy. The quality of auditory-tactile virtual environments. J. Audio Engr. Soc., 60:38–46, 2012.
S. Argentieri, A. Portello, M. Bernard, P. Danés, and B. Gas. Binaural systems in robotics. In J. Blauert, editor, The technology of binaural listening, chapter 9. Springer, Berlin-Heidelberg-New York NY, 2013.
L. Avant and H. Helson. Theories of perception. In B. Wolman, editor, Hdb. of General Psychology, pages 419–448. Prentice Hall, Englewood Cliffs, 1973.
M. Bernard, P. Pirim, A. de Cheveigné, and B. Gas. Sensomotoric learning of sound localization from auditory evoked behavior. In Proc. Intl. Conf. Robotics and Automation, ICRA 2012, pages 91–96, St. Paul MN, 2012.
J. Bilmes and C. Bartels. Graphical model architectures for speech recognition. Signal Processing Magazine, IEEE, 22:89–100, 2005.
J. Blauert. Analysis and synthesis of auditory scenes. In J. Blauert, editor, Communication Acoustics, chapter 1, pages 1–26. Springer, Berlin-Heidelberg-New York, 2005.
J. Blauert. Conceptual aspects regarding the qualification of spaces for aural performances. Act. Acust./Acustica, 99:1–13, 2013.
J. Blauert, ed. The technology of binaural listening. Springer, Berlin-Heidelberg-New York NY, 2013.
J. Blauert, J. Braasch, J. Buchholz, H. Colburn, U. Jekosch, A. Kohlrausch, J. Mourjopoulos, V. Pulkki, and A. Raake. Aural assessment by means of binaural algorithms - the AABBA project. In J. Buchholz, T. Dau, J. Dalsgaard, and T. Paulsen, editors, Binaural Processing and Spatial Hearing, pages 303–343. The Danavox Jubilee Foundation, Ballerup, Denmark, 2009.
J. Blauert and U. Jekosch. Concepts behind sound quality, some basic considerations. In Proc. InterNoise 2003, pages 72–76. Korean Acoust. Soc., 2003.
J. Blauert and U. Jekosch. A layer model of sound quality. J. Audio Engr. Soc., 60:4–12, 2012.
J. Blauert and K. Obermayer. Rückkopplungswege in Modellen der binauralen Signalverarbeitung (feedback paths in models of binaural signal processing). In Fortschr. Akustik, DAGA 2012, pages 2015–2016. Deutsche Ges. f. Akustik, DEGA, Berlin, 2012.
J. Braasch, S. Clapp, A. P. T. Pastore, and N. Xiang. Binaural evaluation of auditory scenes using head movements. In J. Blauert, editor, The technology of binaural listening, chapter 8. Springer, Berlin-Heidelberg-New York NY, 2013.
A. Bregman. Auditory scene analysis - the perceptual organization of sound. MIT Press, Cambridge MA, 1990.
N. Clark, G. Brown, T. Jürgens, and R. Meddis. A frequency-selective feedback model of auditory efferent suppression and its implication for the recognition of speech in noise. J. Acoust. Soc. Am., 132:1535–1541, 2012.
R. Clifton, B. Morrongiello, J. Kulig, and J. Dowd. Newborn’s orientation towards sounds: Possible implication for cortical development. Child develop., 52:833–838, 1981.
D. Corkill. Collaborating software: blackboard and multi-agent systems and the future. Proc. Intl. Lisp Conf., New York NY, 2003.
S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust., Speech, Signal Processing, 28:357–366, 1980.
M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, and A. Nakamura. Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation. In Intl. Worksh. Machine Listening in Multisource Environments, CHiME 2011, pages 12–17, 2011.
L. Dempere-Marco, D. Melcher, and G. Deco. Effective visual working memory capacity: An emergent effect from the neural dynamics in an attractor network. PLoS ONE, 7:e42719, 2012.
S. Deneve. Bayesian spiking neurons I: Inference. Neural Computation, 20:91–117, 2008.
S. Deneve. Bayesian spiking neurons II: Learning. Neural Computation, 20:118–145, 2008.
DIN EN ISO 9000. Qualitätsmanagementsystem, Grundlagen und Begriffe (quality management system, fundamentals and concepts). Dtsch. Inst. f. Normung, Berlin, 2005.
R. Engelmore and A. Morgan (eds.). Blackboard systems. Addison-Wesley, Boston MA, 1988.
L. Erman. The Hearsay II speech-understanding system - integrating knowledge to resolve uncertainty. Computing surveys, 12:213–253, 1980.
S. Gold, A. Rangarajan, C.-P. Lu, and E. Mjolsness. New algorithms for 2d and 3d point matching: Pose estimation and correspondence. Pattern Recognition, 31:957–964, 1998.
S. Haykin. Neural networks - a comprehensive foundation. Macmillan, New York NY, 1994.
J. He and Y. Yu. Role of descending control in the auditory pathway. In A. Rees and A. Palmer, editors, Oxford Hdb. of Auditory Science, volume 2: The auditory brain. Oxford Univ. press, New York NY, 2009.
F.-F. Henrich and K. Obermayer. Active learning by spherical subdivision. J. Machine Learning Res., 9:105–130, 2008.
J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson. Super-human multi-talker speech recognition: A graphical modeling approach. Comput. Speech Lang., 24:45–66, 2010.
S. Hochreiter, T. Knebel, and K. Obermayer. An SMO algorithm for the potential support vector machine. Neural Computation, 20:271–287, 2008.
S. Hochreiter and K. Obermayer. Support vector machines for dyadic data. Neural Computation, 18:1472–1510, 2006.
B. Julesz and I. Hirsh. Visual and auditory perception - an essay of comparison. In E. Davis jr and P. Denes, editors, Human communication - a unified view, pages 283–340. McGraw Hill, New York NY, 1972.
A. Kohlrausch, J. Braasch, D. Kolossa, and J. Blauert. An introduction to binaural processing. In J. Blauert, editor, The technology of binaural listening, chapter 1. Springer, Berlin-Heidelberg-New York NY, 2013.
A. Kohlrausch and S. van de Par. Audio-visual interaction in the context of multi-media applications. In J. Blauert, editor, Communication Acoustics, pages 109–134. Springer, Berlin-Heidelberg-New York NY, 2005.
H. W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83–97, 1955.
S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions Signal Processing, 41:3397–3415, 1993.
R. Meddis, R. Ferry, and G. Brown. Speech in noise and the medial olivo-cochlear efferent system. J. Acoust. Soc. Am., 123:3051–3051, 2008.
D. Messing, L. Delhorne, E. Bruckert, L. Braida, and O. Ghitza. A non-linear efferent-inspired model of the auditory system - matching human confusion in stationary noise. Speech Communication, 51:668–683, 2009.
R. D. Patterson and J. Holdsworth. A functional model of neural activity patterns and auditory images. Advances in Speech, Hearing and Language Processing, 3:547–563, 1996.
B. Scharf. Human hearing without efferent input to the cochlea. J. Acoust. Soc. Am., 95:2813, 1994.
B. Schofield. Structural organization of the descending pathway. In A. Rees and A. Palmer, editors, Oxford Hdb. of Auditory Science, volume 2: The auditory brain. Oxford Univ. press, New York NY, 2009.
B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, 2002.
L. Schwabe and K. Obermayer. Learning top-down gain control of feature selectivity in a recurrent network of a visual cortical area. Vision Research, 45:3202–3209, 2005.
E. Smith and M. S. Lewicki. Efficient coding of time-relative structure using spikes. Neural Computation, 17:19–45, 2005.
R. Welch and D. Warren. Intersensory interaction. In K. R. Boff, L. Kaufmann, and J. Thomas, editors, Hdb. of Perception and Human Performance, chapter 25, pages 1–36. Kluwer Academic, Dordrecht, 1989.
S. Wolf. Lokalisation von Schallquellen in geschlossenen Räumen (Localization of sound sources in enclosed spaces). doct. diss., Ruhr-Univ. Bochum, Germany, 1991.

Acknowledgments

The authors gratefully acknowledge the suggestions of their external reviewers, which helped to improve the clarity of presentation. Particular thanks are due to P. A. Cariani, who contributed substantially by commenting on the chapter from the viewpoint of biological cybernetics.

Author information

Authors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Bochum, Germany
J. Blauert & D. Kolossa
Neural Information Systems, Technische Universität Berlin, Berlin, Germany
K. Obermayer & K. Adiloğlu

Corresponding author

Correspondence to J. Blauert.

Editor information

Editors and Affiliations

Fak. Elektrotechnik, LS Allgm.Elektrotechn.+Akustik, Univ. Bochum, Bochum, Germany
Jens Blauert

About this chapter

Cite this chapter

Blauert, J., Kolossa, D., Obermayer, K., Adiloğlu, K. (2013). Further Challenges and the Road Ahead. In: Blauert, J. (eds) The Technology of Binaural Listening. Modern Acoustics and Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37762-4_18

DOI: https://doi.org/10.1007/978-3-642-37762-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37761-7
Online ISBN: 978-3-642-37762-4
eBook Packages: Engineering, Engineering (R0)
