Abstract
Analysis of human perception of motion shows that the information used to represent a motion is obtained from dramatic changes in the speed and direction of the trajectory. In this paper, we present a computational representation of human action that captures these dramatic changes using the spatio-temporal curvature of the 2-D trajectory. This representation is compact and view-invariant, and it can explain an action in terms of meaningful action units called dynamic instants and intervals. A dynamic instant is an instantaneous entity that occurs for only one frame and represents an important change in the motion characteristics. An interval is the time period between two dynamic instants, during which the motion characteristics do not change. Starting without a model, we use this representation for recognition and incremental learning of human actions. The proposed method can discover instances of the same action performed by different people from different viewpoints. Experiments on 47 actions performed by 7 individuals in an unconstrained environment show the robustness of the proposed method.
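To make the idea of dynamic instants concrete, the following minimal Python sketch treats a tracked 2-D trajectory as the space curve (x(t), y(t), t), computes its curvature with the standard formula for a 3-D curve, and marks local maxima of that curvature as candidate dynamic instants. The abstract does not give the paper's exact formula, smoothing, or peak-selection rule, so the finite-difference derivatives, the function names, and the `threshold` parameter are all assumptions made for illustration.

```python
import numpy as np


def spatiotemporal_curvature(x, y):
    """Curvature of the space curve (x(t), y(t), t), one value per frame.

    Assumption: unit frame spacing and plain finite differences; a real
    tracker output would normally be smoothed first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.gradient(x), np.gradient(y)      # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)  # second derivatives
    # |r' x r''| / |r'|^3 with r' = (dx, dy, 1) and r'' = (ddx, ddy, 0)
    num = np.sqrt(ddx**2 + ddy**2 + (dx * ddy - ddx * dy) ** 2)
    den = (dx**2 + dy**2 + 1.0) ** 1.5
    return num / den


def dynamic_instants(kappa, threshold=0.05):
    """Frame indices where curvature has a local maximum above threshold.

    The threshold value is a hypothetical tuning parameter.
    """
    k = np.asarray(kappa, dtype=float)
    mid = k[1:-1]
    is_peak = (mid > k[:-2]) & (mid >= k[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1


# Toy trajectory: move right for 10 frames, then turn and move up.
t = np.arange(21)
x = np.minimum(t, 10).astype(float)
y = np.maximum(t - 10, 0).astype(float)
print(dynamic_instants(spatiotemporal_curvature(x, y)))
```

On this toy trajectory the only sharp change of direction is at frame 10, and that is the single instant the sketch reports. The frames between consecutive instants would then play the role of intervals.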
Cite this article
Rao, C., Yilmaz, A. & Shah, M. View-Invariant Representation and Recognition of Actions. International Journal of Computer Vision 50, 203–226 (2002). https://doi.org/10.1023/A:1020350100748