
Action recognition by hidden temporal models

The Visual Computer

Abstract

We focus on the recognition of human actions in uncontrolled videos that may contain complex temporal structures. The problem is difficult because of large intra-class variations in viewpoint, video length, motion pattern, and so on. To address these difficulties, we propose a novel system that represents each action class by hidden temporal models. In this system, the crucial action event of each category is represented by a video segment that spans a fixed number of frames and can shift temporally within the sequence. To capture temporal structure, the segment is described by a temporal pyramid model. To capture large intra-class variations, multiple models are combined with an OR operation to represent alternative structures; both the model index and the segment's start frame are treated as hidden variables. We implement a learning procedure based on the latent SVM method. The proposed approach is evaluated on two difficult benchmarks, the Olympic Sports and HMDB51 data sets, and the experimental results show that our system is comparable to state-of-the-art methods in the literature.
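To make the scoring scheme concrete: each class scores a video as f_w(x) = max over (m, t) of w_m · Φ(x, t), where m indexes the alternative (OR-combined) models and t is the hidden start frame of the fixed-length segment. The following is a minimal sketch of that inference step, not the authors' implementation. It assumes per-frame descriptors have already been extracted; the names temporal_pyramid_feature and score_video, the mean-pooling choice, and the two-level pyramid depth are illustrative assumptions.

```python
import numpy as np

def temporal_pyramid_feature(segment, levels=2):
    """Describe a fixed-length video segment by a temporal pyramid:
    at level l the segment is split into 2**l cells, the per-frame
    descriptors in each cell are mean-pooled, and all pooled vectors
    are concatenated (pooling choice and depth are assumptions)."""
    parts = []
    for level in range(levels + 1):
        for cell in np.array_split(segment, 2 ** level):
            parts.append(cell.mean(axis=0))
    return np.concatenate(parts)

def score_video(frame_feats, models, seg_len):
    """Score one video against one action class: maximize jointly over
    the hidden model index m (OR over alternative structures) and the
    hidden start frame t of the fixed-length segment."""
    best_score, best_m, best_t = -np.inf, None, None
    for m, w in enumerate(models):                       # hidden: which model
        for t in range(len(frame_feats) - seg_len + 1):  # hidden: start frame
            phi = temporal_pyramid_feature(frame_feats[t:t + seg_len])
            s = float(w @ phi)
            if s > best_score:
                best_score, best_m, best_t = s, m, t
    return best_score, best_m, best_t

# Toy usage: 100 frames of 64-D descriptors, 3 alternative models,
# 30-frame segments; each w matches the 7*64-D pyramid feature (levels=2).
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 64))
models = [rng.standard_normal(7 * 64) for _ in range(3)]
print(score_video(frames, models, seg_len=30))
```

Training with the latent SVM would alternate between this inference step, which fixes the hidden variables, and a standard convex SVM weight update; that loop is omitted here.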



Notes

  1. http://vision.stanford.edu/Datasets/OlympicSports/.

  2. http://serre-lab.clps.brown.edu/resources/HMDB/index.htm.


Acknowledgments

This work was supported by the National Basic Research Program of China (2013CB329401), the Natural Science Foundation of China (61375034, 61203263), and the NUDT Open Project of the National Key Lab of High Performance Computing.

Author information


Corresponding author

Correspondence to Dewen Hu.


About this article

Cite this article

Wu, J., Hu, D. & Chen, F. Action recognition by hidden temporal models. Vis Comput 30, 1395–1404 (2014). https://doi.org/10.1007/s00371-013-0899-9
