HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos

Amer, Mohamed Rabie; Lei, Peng; Todorovic, Sinisa

doi:10.1007/978-3-319-10599-4_37

Mohamed Rabie Amer¹⁹,
Peng Lei¹⁹ &
Sinisa Todorovic¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8694))

Included in the following conference series:

European Conference on Computer Vision

17k Accesses
60 Citations

Abstract

This paper addresses the problem of recognizing and localizing coherent activities of a group of people, called collective activities, in video. Related work has argued the benefits of capturing long-range and higher-order dependencies among video features for robust recognition. To this end, we formulate a new deep model, called Hierarchical Random Field (HiRF). HiRF models only hierarchical dependencies between model variables. This effectively amounts to modeling higher-order temporal dependencies of video features. We specify an efficient inference of HiRF that iterates in each step linear programming for estimating latent variables. Learning of HiRF parameters is specified within the max-margin framework. Our evaluation on the benchmark New Collective Activity and Collective Activity datasets, demonstrates that HiRF yields superior recognition and localization as compared to the state of the art.

Download to read the full chapter text

Chapter PDF

Empowering Relational Network by Self-attention Augmented Conditional Random Fields for Group Activity Recognition

A progressive hierarchical analysis model for collective activity recognition

Article 07 October 2021

Learning Latent Constituents for Recognition of Group Activities in Video

Keywords

References

Aggarwal, J., Ryoo, M.: Human activity analysis: A review. ACM Comput. Surv. 43, 16:1–16:43 (2011)
Google Scholar
Amer, M., Todorovic, S., Fern, A., Zhu, S.: Monte carlo tree search for scheduling activity recognition. In: ICCV (2013)
Google Scholar
Amer, M.R., Xie, D., Zhao, M., Todorovic, S., Zhu, S.-C.: Cost-sensitive top-down/Bottom-up inference for multiscale activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 187–200. Springer, Heidelberg (2012)
Chapter Google Scholar
Brendel, W., Fern, A., Todorovic, S.: Probabilistic event logic for interval-based event recognition. In: CVPR (2011)
Google Scholar
Chaquet, J.M., Carmona, E.J., Fernández-Caballero, A.: A survey of video datasets for human action and activity recognition. CVIU 117(6), 633–659 (2013)
Google Scholar
Choi, W., Savarese, S.: A unified framework for multi-target tracking and collective activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 215–230. Springer, Heidelberg (2012)
Chapter Google Scholar
Choi, W., Shahid, K., Savarese, S.: What are they doing?: Collective activity classification using spatio-temporal relationship among people. In: ICCV (2009)
Google Scholar
Choi, W., Shahid, K., Savarese, S.: Learning context for collective activity recognition. In: CVPR (2011)
Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Google Scholar
Eslami, S.M.A., Heess, N., Williams, C.K.I., Winn, J.: The shape boltzmann machine: a strong model of object shape. IJCV (2013)
Google Scholar
Felzenszwalb, P., McAllester, D., Ramanan, D.: A discriminatively trained, multiscale, deformable part model. In: CVPR (2008)
Google Scholar
Kae, A., Sohn, K., Lee, H., Learned-Miller, E.: Augmenting crfs with boltzmann machine shape priors for image labeling. In: CVPR (2013)
Google Scholar
Khamis, S., Morariu, V.I., Davis, L.S.: Combining per-frame and per-track cues for multi-person action recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 116–129. Springer, Heidelberg (2012)
Chapter Google Scholar
Khamis, S., Morariu, V., Davis, L.: A flow model for joint action recognition and identity maintenance. In: CVPR (2012)
Google Scholar
Lan, T., Sigal, L., Mori, G.: Social roles in hierarchical models for human activity recognition. In: CVPR (2012)
Google Scholar
Lan, T., Wang, Y., Mori, G.: Discriminative figure-centric models for joint action localization and recognition. In: ICCV (2011)
Google Scholar
Lan, T., Wang, Y., Yang, W., Robinovitch, S.N., Mori, G.: Discriminative latent models for recognizing contextual group activities. TPAMI (2012)
Google Scholar
Li, Y., Tarlow, D., Zemel, R.: Exploring complositional high order pattern potentials for structured output learning. In: CVPR (2013)
Google Scholar
Morariu, V.I., Davis, L.S.: Multi-agent event recognition in structured scenarios. In: Computer Vision and Pattern Recognition (CVPR) (2011)
Google Scholar
Odashima, S., Shimosaka, M., Kaneko, T., Fukui, R., Sato, T.: Collective activity localization with contextual spatial pyramid. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) ECCV 2012 Ws/Demos, Part III. LNCS, vol. 7585, pp. 243–252. Springer, Heidelberg (2012)
Chapter Google Scholar
Pei, M., Jia, Y., Zhu, S.C.: Parsing video events with goal inference and intent prediction. In: ICCV (2011)
Google Scholar
Ryoo, M.S., Aggarwal, J.K.: Stochastic Representation and Recognition of High-level Group Activities. IJCV (2011)
Google Scholar
Wang, S.B., Quattoni, A., Morency, L.P., Demirdjian, D., Darrell, T.: Hidden conditional random fields for gesture recognition. In: CVPR (2006)
Google Scholar
Wang, Y., Mori, G.: Hidden part models for human action recognition: Probabilistic versus max margin. TPAMI (2011)
Google Scholar
Weinland, D., Ronfard, R., Boyer, E.: A survey of vision-based methods for action representation, segmentation and recognition. CVIU 115, 224–241 (2011)
Google Scholar
Yuille, A.L., Rangarajan, A.: The concave-convex procedure. Neural Comput. 15(4), 915–936 (2003)
Article MATH Google Scholar
Zeng, Z., Ji, Q.: Knowledge based activity recognition with Dynamic Bayesian Network. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 532–546. Springer, Heidelberg (2010)
Chapter Google Scholar
Zhu, Y., Nayak, N.M., Roy-Chowdhury, A.K.: Context-aware modeling and recognition of activities in video. In: CVPR (2013)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
Mohamed Rabie Amer, Peng Lei & Sinisa Todorovic

Authors

Mohamed Rabie Amer
View author publications
You can also search for this author in PubMed Google Scholar
Peng Lei
View author publications
You can also search for this author in PubMed Google Scholar
Sinisa Todorovic
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Amer, M.R., Lei, P., Todorovic, S. (2014). HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_37

Download citation

DOI: https://doi.org/10.1007/978-3-319-10599-4_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos

Abstract

Chapter PDF

Similar content being viewed by others

Empowering Relational Network by Self-attention Augmented Conditional Random Fields for Group Activity Recognition

A progressive hierarchical analysis model for collective activity recognition

Learning Latent Constituents for Recognition of Group Activities in Video

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos

Abstract

Chapter PDF

Similar content being viewed by others

Empowering Relational Network by Self-attention Augmented Conditional Random Fields for Group Activity Recognition

A progressive hierarchical analysis model for collective activity recognition

Learning Latent Constituents for Recognition of Group Activities in Video

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation