ABSTRACT
Event detection in real surveillance videos with complicated background environments is a very hard task. Unlike traditional retrospective and interactive systems for this task, which operate mainly on video fragments located within the event-occurrence time, in this paper we propose a new interactive system built on mid-level discriminative representations (patches/shots) that are closely related to the event (and may occur outside the event-occurrence period) and are easier to detect than video fragments. By virtue of such easily distinguished mid-level patterns, our framework realizes an effective division of labor between computers and human participants. The computer's task is to train classifiers on a set of mid-level discriminative representations, and to sort all candidate mid-level representations in the evaluation set by classifier score. The human participant's task is then to search for events using the clues offered by these sorted mid-level representations. For the computer, such mid-level representations, with their more concise and consistent patterns, can be detected more accurately than the video fragments used in the conventional framework; for the human participant, the events of interest implied by these location-anchored mid-level representations are much easier to find than in conventional video fragments containing entire scenes. Both properties make our framework practical for real surveillance event detection applications.
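The computer-side pipeline described above — score every candidate mid-level representation with a trained classifier and present them to the reviewer in decreasing order of score — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear discriminant fit from class means is a hypothetical stand-in for whatever classifier is trained on the mid-level patch descriptors, and the toy 2-D descriptors replace real patch features.

```python
import numpy as np

def rank_candidates(train_pos, train_neg, candidates):
    """Rank candidate mid-level patches by a linear classifier score.

    A simplified stand-in for the paper's pipeline: fit a linear
    discriminant (here, the difference of class means, a placeholder
    for a trained classifier) on positive/negative patch descriptors,
    then sort evaluation candidates by decreasing score so a human
    reviewer can inspect the most event-like patches first.
    """
    mu_pos = train_pos.mean(axis=0)
    mu_neg = train_neg.mean(axis=0)
    w = mu_pos - mu_neg                      # direction separating the classes
    b = -0.5 * w @ (mu_pos + mu_neg)         # threshold at the midpoint
    scores = candidates @ w + b
    order = np.argsort(-scores)              # highest-scoring candidate first
    return order, scores[order]

# toy 2-D descriptors: positives cluster near (1, 1), negatives near (-1, -1)
pos = np.array([[1.0, 1.2], [0.9, 1.0], [1.1, 0.8]])
neg = np.array([[-1.0, -1.1], [-0.9, -1.0], [-1.2, -0.8]])
cands = np.array([[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]])

order, sorted_scores = rank_candidates(pos, neg, cands)
print(order)  # → [0 1 2]: the candidate nearest the positive cluster ranks first
```

The human participant would then inspect candidates in `order`, stopping once the events of interest have been located — the point of the division of labor is that high-ranked, location-anchored patches are far quicker to verify than full-scene video fragments.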
Index Terms
- Interactive Surveillance Event Detection through Mid-level Discriminative Representation