Abstract
In this paper, we address the problem of recognizing human actions from videos. Most existing approaches employ low-level features (e.g., local features and global features) to represent an action video. However, algorithms based on low-level features are not robust to complex environments such as cluttered backgrounds, camera movement, and illumination changes. We therefore propose a novel random forest learning framework that constructs a discriminative and informative mid-level feature from the low-level features of densely sampled 3D cuboids. Each cuboid is classified by the corresponding random forests with a novel fusion scheme, and the cuboid's posterior probabilities over all categories are normalized to generate a histogram. We then obtain our mid-level feature by concatenating the histograms of all the cuboids. Since a single low-level feature is not enough to capture the variations of human actions, multiple complementary low-level features (i.e., optical flow and histogram of gradient 3D features) are employed to describe the 3D cuboids. Moreover, the temporal context between local cuboids is exploited as another type of low-level feature. These three low-level features (i.e., optical flow, histogram of gradient 3D features, and temporal context) are effectively fused in the proposed learning framework. Finally, the mid-level feature is fed to a random forest classifier for robust action recognition. Experiments on the Weizmann, UCF sports, Ballet, and multi-view IXMAS datasets demonstrate that our mid-level feature learned from multiple low-level features achieves superior performance over state-of-the-art methods.
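The pipeline sketched in the abstract (classify each cuboid, normalize its class posteriors into a histogram, and concatenate the histograms into a mid-level feature) can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes random toy descriptors for real optical-flow/HOG3D cuboid features, uses a single scikit-learn random forest in place of the paper's fused per-feature forests, and all sizes (`n_cuboids`, `d`, `n_classes`) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for low-level descriptors of densely sampled 3D cuboids:
# each video yields n_cuboids descriptors of dimension d.
n_videos, n_cuboids, d, n_classes = 20, 8, 16, 4
X = rng.normal(size=(n_videos * n_cuboids, d))
y = rng.integers(0, n_classes, size=n_videos * n_cuboids)  # cuboid-level labels

# Step 1: train a random forest on the cuboid descriptors.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Step 2: for one video, classify each cuboid and take its posterior
# probabilities over all action categories (each row sums to 1, i.e.,
# the normalized histogram described in the abstract).
video_cuboids = rng.normal(size=(n_cuboids, d))
posteriors = forest.predict_proba(video_cuboids)  # shape (n_cuboids, n_classes)

# Step 3: concatenate the per-cuboid histograms into one mid-level feature,
# which would then feed the final random forest classifier.
mid_level_feature = posteriors.reshape(-1)  # shape (n_cuboids * n_classes,)
print(mid_level_feature.shape)
```

In the paper's full framework, step 2 would combine posteriors from forests trained on each of the three low-level features (optical flow, HOG3D, temporal context) via the proposed fusion scheme before concatenation.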
Cite this article
Liu, C., Pei, M., Wu, X. et al. Learning a discriminative mid-level feature for action recognition. Sci. China Inf. Sci. 57, 1–13 (2014). https://doi.org/10.1007/s11432-013-4938-y