Deep Learning and Modelling of Audio-, Visual-, and Multimodal Audio-Visual Data in Brain-Inspired SNN

Time-Space, Spiking Neural Networks and Brain-Inspired Artificial Intelligence

Part of the book series: Springer Series on Bio- and Neurosystems ((SSBN,volume 7))

Abstract

This chapter presents methods for audio, visual, and integrated audio-visual information processing using brain-inspired SNN architectures such as NeuCube. Case studies cover recognition of short musical pieces, fast-moving object recognition, age-invariant face identification, recognition of moving digits, and others.

Acknowledgements

Parts of the material in this chapter have been published previously, as referenced in the text. I acknowledge the contribution of my co-authors L. Paulun, A. Wendt, F. Alvi, W. Cui, W. Yan, L. Benuskova, and G. Saraceno. The DVS (used here not as hardware, but as the source of benchmark data sets recorded with it) was developed at INI, ETH/UZH, by T. Delbruck and his team.

Author information

Corresponding author

Correspondence to Nikola K. Kasabov.

Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Kasabov, N.K. (2019). Deep Learning and Modelling of Audio-, Visual-, and Multimodal Audio-Visual Data in Brain-Inspired SNN. In: Time-Space, Spiking Neural Networks and Brain-Inspired Artificial Intelligence. Springer Series on Bio- and Neurosystems, vol 7. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-57715-8_13
