Abstract
We present an action recognition algorithm that works with binocular videos. The proposed method uses a standard bag-of-words approach, in which each action clip is represented as a histogram of visual words. However, instead of using classical monocular HoG/HoF features, we construct features from the scene flow computed by a matching algorithm on the sequence of stereo images. In a controlled setup with a single actor present in the scene, the resulting algorithm achieves recognition accuracy comparable to or slightly better than the standard monocular solution. However, we show that it performs significantly better in the presence of strong background clutter caused by other people moving freely behind the actor.
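The bag-of-words representation mentioned above can be illustrated with a minimal sketch. It assumes that local descriptors (here random vectors standing in for scene-flow features) have already been extracted, and it builds the vocabulary with plain k-means, which is a common choice but an assumption here, since the abstract does not state how the visual words are obtained. The function names `build_vocabulary`, `assign_words`, and `clip_histogram` are hypothetical.

```python
import numpy as np

def build_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors with plain k-means (Lloyd's algorithm).

    Assumption: the visual vocabulary is the set of k cluster centroids.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct descriptors picked at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def assign_words(descriptors, centroids):
    """Return the index of the nearest centroid (visual word) per descriptor."""
    diff = descriptors[:, None, :] - centroids[None, :, :]
    squared_dist = (diff ** 2).sum(axis=2)
    return squared_dist.argmin(axis=1)

def clip_histogram(descriptors, centroids):
    """Represent one clip as an L1-normalised histogram of visual words."""
    labels = assign_words(descriptors, centroids)
    hist = np.bincount(labels, minlength=len(centroids)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Usage with synthetic data: 200 training descriptors of dimension 6.
rng = np.random.default_rng(1)
vocabulary = build_vocabulary(rng.normal(size=(200, 6)), k=8)
histogram = clip_histogram(rng.normal(size=(50, 6)), vocabulary)
```

In such a pipeline, the resulting fixed-length histograms would typically be passed to a classifier; which classifier the authors use is not stated in the abstract.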
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sanchez-Riera, J., Čech, J., Horaud, R. (2012). Action Recognition Robust to Background Clutter by Using Stereo Vision. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33863-2_33
DOI: https://doi.org/10.1007/978-3-642-33863-2_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33862-5
Online ISBN: 978-3-642-33863-2
eBook Packages: Computer Science, Computer Science (R0)