View-Invariant Representation and Recognition of Actions

International Journal of Computer Vision

Abstract

Analysis of human perception of motion shows that the information for representing a motion is obtained from dramatic changes in the speed and direction of the trajectory. In this paper, we present a computational representation of human action that captures these dramatic changes using the spatio-temporal curvature of a 2-D trajectory. This representation is compact and view-invariant, and it can explain an action in terms of meaningful action units called dynamic instants and intervals. A dynamic instant is an instantaneous entity that occurs for only one frame and represents an important change in the motion characteristics. An interval is the time period between two dynamic instants, during which the motion characteristics do not change. Starting without a model, we use this representation for recognition and incremental learning of human actions. The proposed method can discover instances of the same action performed by different people from different viewpoints. Experiments on 47 actions performed by 7 individuals in an unconstrained environment show the robustness of the proposed method.
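The idea of dynamic instants and intervals can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard curvature formula for the space curve (x(t), y(t), t), central finite differences with unit frame spacing, and a hand-picked threshold. The function names `spatiotemporal_curvature`, `dynamic_instants`, and `intervals` are hypothetical, and the smoothing that a real tracker output would need is omitted.

```python
import math


def spatiotemporal_curvature(xs, ys):
    """Curvature of the space curve (x(t), y(t), t) at every frame.

    Assumption: unit time step between frames and central differences;
    the first and last frames are left at 0.0.
    """
    n = len(xs)
    kappa = [0.0] * n
    for i in range(1, n - 1):
        # First and second derivatives by central differences.
        dx = (xs[i + 1] - xs[i - 1]) / 2.0
        dy = (ys[i + 1] - ys[i - 1]) / 2.0
        ddx = xs[i + 1] - 2.0 * xs[i] + xs[i - 1]
        ddy = ys[i + 1] - 2.0 * ys[i] + ys[i - 1]
        # r' = (dx, dy, 1) and r'' = (ddx, ddy, 0); take their cross product.
        cx = -ddy
        cy = ddx
        cz = dx * ddy - dy * ddx
        num = math.sqrt(cx * cx + cy * cy + cz * cz)
        den = (dx * dx + dy * dy + 1.0) ** 1.5
        kappa[i] = num / den
    return kappa


def dynamic_instants(kappa, threshold):
    """Frames where the curvature is a local maximum above `threshold`.

    Assumption: `threshold` is chosen by hand for this illustration.
    """
    return [
        i
        for i in range(1, len(kappa) - 1)
        if kappa[i] > threshold
        and kappa[i] >= kappa[i - 1]
        and kappa[i] > kappa[i + 1]
    ]


def intervals(instants):
    """Pairs (start, end) of consecutive dynamic instants."""
    return list(zip(instants, instants[1:]))


# Toy trajectory: move right for five frames, then turn and move up.
xs = [0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
ys = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
kappa = spatiotemporal_curvature(xs, ys)
instants = dynamic_instants(kappa, threshold=0.1)
```

Because time is treated as a third coordinate, a change in speed along a straight path also produces nonzero curvature, so the sketch reacts to changes in both speed and direction, as the abstract describes.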




Cite this article

Rao, C., Yilmaz, A. & Shah, M. View-Invariant Representation and Recognition of Actions. International Journal of Computer Vision 50, 203–226 (2002). https://doi.org/10.1023/A:1020350100748


  • DOI: https://doi.org/10.1023/A:1020350100748
