DOI: 10.1145/2578726.2578765
tutorial

Interactive Surveillance Event Detection through Mid-level Discriminative Representation

Published: 01 April 2014

ABSTRACT

Event detection in real surveillance videos with complicated backgrounds is a difficult task. Unlike traditional retrospective and interactive systems for this task, which operate mainly on video fragments within the event-occurrence period, this paper proposes a new interactive system built on mid-level discriminative representations (patches/shots). These representations are closely related to the event, may occur outside the event-occurrence period, and are easier to detect than video fragments. Because these mid-level patterns are easy to distinguish, our framework achieves an effective division of labor between computers and human participants. The computer trains classifiers on a set of mid-level discriminative representations and ranks all candidate mid-level representations in the evaluation sets by classifier score. Human participants then search for events using the clues provided by these ranked representations. For computers, mid-level representations have more concise and consistent patterns and can therefore be detected more accurately than the video fragments used in the conventional framework. For human participants, searching for events of interest is much easier when guided by these location-anchored mid-level representations than by conventional video fragments containing entire scenes. Together, these two properties make our framework practical for real surveillance event detection applications.
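The abstract does not specify the classifier, features, or scoring details, so the following Python sketch illustrates only the general computer-side step it describes: train a classifier on labeled mid-level representations, then rank candidates by classifier score so a human reviewer sees the most event-like ones first. It assumes plain logistic regression on precomputed feature vectors as a stand-in classifier; the `Candidate` record, its fields, and the functions `train_logistic` and `rank_for_review` are hypothetical names, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical record for one mid-level representation (patch or shot)."""
    cand_id: str
    camera: str          # camera view the patch/shot was extracted from
    start_frame: int     # temporal anchor; may lie outside the event itself
    features: List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    # Linear decision value w.x + b; larger means "more event-like".
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a linear classifier with batch gradient descent on the logistic loss.

    Assumption: logistic regression stands in for the paper's unspecified
    classifier. Labels are 1 for event-related representations, 0 otherwise.
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi   # dLoss/dz for one sample
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_for_review(cands: Sequence[Candidate], w: Sequence[float], b: float,
                    top_k: int) -> List[Candidate]:
    """Sort candidates by classifier score (highest first) and keep the top_k
    for a human reviewer, who inspects each one's camera and frame anchor."""
    ranked = sorted(cands, key=lambda c: score(w, b, c.features), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy 2-D features: event-related samples near (1, 1), background near (-1, -1).
    X_train = [[1.0, 0.9], [0.8, 1.2], [-1.0, -0.8], [-1.1, -1.0]]
    y_train = [1, 1, 0, 0]
    w, b = train_logistic(X_train, y_train)

    pool = [
        Candidate("p1", "cam2", 1200, [0.9, 1.1]),
        Candidate("p2", "cam2", 3400, [-0.9, -1.2]),
        Candidate("p3", "cam5", 800, [0.2, 0.1]),
    ]
    for c in rank_for_review(pool, w, b, top_k=2):
        print(c.cand_id, c.camera, c.start_frame)
```

With this toy data the reviewer is shown `p1` and then `p3`. Because the bias `b` is shared by every candidate, it never changes the ranking order; only the learned weights decide which patches or shots reach the human first.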


    • Published in

      ACM Other conferences
      ICMR '14: Proceedings of International Conference on Multimedia Retrieval
      April 2014
      564 pages
      ISBN:9781450327824
      DOI:10.1145/2578726

      Copyright © 2014 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 April 2014


      Qualifiers

      • tutorial
      • Research
      • Refereed limited

      Acceptance Rates

      • ICMR '14 Paper Acceptance Rate: 21 of 111 submissions, 19%
      • Overall Acceptance Rate: 254 of 830 submissions, 31%
