
DIRAC: Detection and Identification of Rare Audio-Visual Events

Chapter in: Detection and Identification of Rare Audiovisual Cues

Abstract

DIRAC was an integrated project carried out between January 1st 2006 and December 31st 2010. It was funded by the European Commission under the Sixth Framework Programme (FP6), contract number IST-027787. Ten partners joined forces to investigate the concept of rare events in machine and cognitive systems, and developed multimodal technology to identify such events and deal with them in audio-visual applications.

This document summarizes the project and its achievements. In Section 2 we present the research and engineering problems the project set out to tackle, and discuss why we believe that advances in solving them bring us closer to the general objective of building artificial systems with cognitive capabilities. We then describe our approach to these problems and detail the theoretical framework we developed. We also explain how the interdisciplinary nature of our research, together with evidence collected from biological and cognitive systems, provided the insights and support for this approach. Section 3 describes our work on system design following the principles identified in this theoretical investigation. Section 4 presents algorithms we developed for a range of applications to implement the theoretical framework of Section 2. Section 5 reports algorithmic progress on questions concerning the learning of rare events as defined in Section 2. Finally, Section 6 describes our application scenarios and the integrated test-bed we built to evaluate our algorithms together.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Anemüller, J. et al. (2012). DIRAC: Detection and Identification of Rare Audio-Visual Events. In: Weinshall, D., Anemüller, J., van Gool, L. (eds) Detection and Identification of Rare Audiovisual Cues. Studies in Computational Intelligence, vol 384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24034-8_1

  • DOI: https://doi.org/10.1007/978-3-642-24034-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24033-1

  • Online ISBN: 978-3-642-24034-8

  • eBook Packages: Engineering, Engineering (R0)
