
Image and Vision Computing

Volume 26, Issue 12, 1 December 2008, Pages 1621-1635

Extraction and temporal segmentation of multiple motion trajectories in human motion

https://doi.org/10.1016/j.imavis.2008.03.006

Abstract

A new method for extraction and temporal segmentation of multiple motion trajectories in human motion is presented. The proposed method extracts motion trajectories generated by body parts without any initialization or any assumption about color distribution. Motion trajectories are very compact and representative features for activity recognition. Tracking human body parts (hands and feet) is inherently difficult because the body parts that generate most of the motion trajectories are relatively small compared to the whole body. This problem is overcome by using a new motion segmentation method: at every frame, candidate motion locations are detected and set as significant motion points (SMPs). The motion trajectories are obtained by combining these SMPs with the results of a color–optical flow based tracker. These motion trajectories are in turn used as features for temporal segmentation of specific activities from continuous video sequences. The proposed approach is tested on actual ballet step sequences. Experimental results show that the proposed method can successfully extract and temporally segment multiple motion trajectories from human motion.

Introduction

As more and more video data become available, there is growing interest in video indexing and classification techniques. Human activities are very good cues for indexing and classification in most video sequences. Fig. 1 shows some examples of human movements. As these pictures show, in most cases human activities can be described by the motion trajectories generated by body parts, suggesting that motion trajectories could potentially be used as features for activity recognition.

This paper presents a new method for extracting motion trajectories from human motions and also shows how to extract temporal segmentation of specific activities from continuous video sequences.

Motion trajectories have several advantages over other features such as intensity [31], [32], silhouettes [27], [28], [29], and contours [33]. Motion trajectories are very compact, as each motion is represented by a pixel location with correspondences between subsequent frames. Since motion trajectories explicitly specify the movements of the body parts, they are very representative and smooth. Finally, motion trajectories are separable, for they are generated from different body parts separately (e.g., in the first sequence in Fig. 1, motion trajectories are generated from the left hand to the right hand and then to the right foot).
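
To illustrate the compactness claim, a trajectory needs as little as one (frame, x, y) triple per frame. The `Trajectory` class below is a hypothetical sketch of such a representation, not the paper's actual data structure; all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """Illustrative container: one (frame, x, y) sample per frame."""
    body_part: str                               # e.g. "left_hand" (label is an assumption)
    points: list = field(default_factory=list)   # [(frame_idx, x, y), ...]

    def add(self, frame_idx, x, y):
        self.points.append((frame_idx, x, y))

    def as_displacements(self):
        """Frame-to-frame displacement vectors, a smooth derived feature."""
        return [(x2 - x1, y2 - y1)
                for (_, x1, y1), (_, x2, y2) in zip(self.points, self.points[1:])]

t = Trajectory("left_hand")
for f, (x, y) in enumerate([(10, 20), (12, 23), (15, 27)]):
    t.add(f, x, y)
print(t.as_displacements())  # [(2, 3), (3, 4)]
```

A whole body-part movement thus costs only a few hundred numbers, far less than storing silhouettes or contours per frame.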

In this work, motion trajectories are used as features for achieving temporal segmentation of specific human activities from continuous video sequences. In most of the available video sequences, a large number of video segments with different contents are included in a continuous fashion. Therefore, the temporal segmentation of specific contents is a very critical task in video indexing systems.

To extract motion trajectories without any initialization, dominant motions should be detected and tracked. The dominant motion is in turn extracted from articulated motions; i.e., when the whole arm is moving in a hand gesture, the hand, which generates the dominant motion, will be detected. Previous motion detection algorithms, such as those described in [34], [35], [36], [37], and the motion segmentation algorithms described in [38], [39], [42] do not handle articulated motions and are therefore not suitable for our purposes.

In many human activities, such as dance and sports, the movement of the whole body is typically captured in video. In these sequences, the body parts (hands and feet) generating the motion trajectories appear as small regions, which makes it very difficult to obtain enough information about the parts to model their shape or color distribution. Fig. 2 shows the results of the kernel-based object tracker [15], which maximizes the likelihood of the color distribution. For both hand and foot tracking, the tracker failed after several frames. The similarity of the color distributions between hands and arms (and between feet and legs) also contributes to the failures.

To overcome the problems described above, we propose a new motion segmentation method that uses mode seeking on the optical flow magnitude to find dominant motion blobs in each frame. The primary significant motion point (SMP) in each motion blob is obtained as a by-product of the motion segmentation algorithm. After the SMPs (Fig. 3) are obtained in every frame, they are used as candidate locations of trajectories, making the tracking procedure possible without any initialization. In each frame, the SMPs are either connected to continuing tracks (trajectories) or new tracks are started from these points. To make the tracking procedure more robust and reliable, our color–optical flow based tracker is applied to each continuing track. This tracker calculates the displacement (tracking result) of each continuing track in the current frame. This displacement and the SMPs are then used as candidate locations in the current frame, after which the best matches between the continuing tracks and the candidate locations are found by optimizing a cost function.
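
As an illustration only, the mode-seeking step can be sketched as a weighted mean shift on the optical flow magnitude map: a window center is repeatedly moved to the magnitude-weighted centroid of nearby pixels until it converges on a local mode, which approximates the SMP of a motion blob. The function name, flat-kernel window, bandwidth, and tolerance below are assumptions, not the paper's exact formulation.

```python
import numpy as np

def find_smp(flow_mag, start, bandwidth=15, iters=30, tol=0.5):
    """Weighted mean shift on an optical-flow magnitude map (illustrative).

    flow_mag : 2-D array of per-pixel optical flow magnitudes.
    start    : (y, x) initial window center.
    Returns the converged (y, x) mode, a stand-in for the SMP.
    """
    h, w = flow_mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    c = np.array(start, dtype=float)
    for _ in range(iters):
        # Flat circular window of radius `bandwidth` around the current center.
        d2 = (ys - c[0]) ** 2 + (xs - c[1]) ** 2
        wts = flow_mag * (d2 <= bandwidth ** 2)
        total = wts.sum()
        if total == 0:
            break
        # Move to the magnitude-weighted centroid of the window.
        new_c = np.array([(ys * wts).sum(), (xs * wts).sum()]) / total
        if np.hypot(*(new_c - c)) < tol:
            c = new_c
            break
        c = new_c
    return c
```

Started anywhere inside a motion blob, the center drifts toward the blob's magnitude peak, so no per-part initialization is needed.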

The multiple motion trajectories obtained by the approach described above are used for temporal segmentation of activities. For each time instance, the optimal alignment between the trajectories of the test sequence and the model trajectories is found. A dissimilarity score is then calculated using the dynamic time warping (DTW) algorithm. The algorithm estimates the start and end point of each activity. Our approach provides temporal segmentation of individual body-part trajectories (movements) as well as of their combinations.

This paper is organized as follows. Section 2 presents a brief review of work on extraction of motion trajectories and on activity recognition based on motion trajectories. Sections 3 and 4 describe in detail the proposed algorithms for extraction of motion trajectories and for temporal segmentation, respectively. Section 5 discusses the data and results of our experiments. Finally, we present our conclusions in Section 6.

Section snippets

Previous work

A number of attempts have been made to obtain motion trajectories by solving the motion correspondence problem. One of the best known statistical approaches is the multiple hypothesis tracker (MHT) [1]. The MHT attempts to solve the data association problem by finding the best hypothesis, where each hypothesis represents an assignment of measurements to features. There have been efforts to make the MHT more practical by restricting the number of hypotheses [2], [3]. Another statistical approach is

Extraction of motion trajectories

An overview of our approach to extract motion trajectories is shown in Fig. 4. The input for this extraction procedure is a video sequence. The results from SMP detection and the color–optical flow based tracker are fused to generate more reliable motion trajectories.
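
The fusion step can be viewed as a matching problem: the predicted positions of continuing tracks must be paired with candidate locations (the SMPs plus the tracker outputs), and unmatched candidates start new tracks. The greedy nearest-neighbor strategy, the distance threshold, and all names below are illustrative assumptions; the paper instead optimizes a cost function, whose exact form is given in the full text.

```python
import math

def match_tracks(track_preds, candidates, max_dist=25.0):
    """Greedy one-to-one matching of track predictions to candidate locations
    (illustrative stand-in for a cost-function optimization).

    track_preds : list of (x, y) predicted positions of continuing tracks.
    candidates  : list of (x, y) candidate locations (SMPs + tracker results).
    Returns (matches, new_tracks): matches maps track index -> candidate index;
    new_tracks lists candidate indices left over, which would start new tracks.
    """
    # All (distance, track, candidate) pairs, cheapest first.
    pairs = sorted(
        ((math.dist(p, c), ti, ci)
         for ti, p in enumerate(track_preds)
         for ci, c in enumerate(candidates)),
        key=lambda x: x[0])
    used_t, used_c, matches = set(), set(), {}
    for d, ti, ci in pairs:
        if d <= max_dist and ti not in used_t and ci not in used_c:
            matches[ti] = ci
            used_t.add(ti)
            used_c.add(ci)
    new_tracks = [ci for ci in range(len(candidates)) if ci not in used_c]
    return matches, new_tracks

matches, new = match_tracks([(10, 10), (50, 50)], [(12, 9), (100, 100), (48, 52)])
print(matches, new)  # {0: 0, 1: 2} [1]
```

An optimal (rather than greedy) one-to-one assignment could be obtained with the Hungarian algorithm, but the greedy version suffices to show the structure of the fusion.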

Temporal segmentation of motion trajectories

The trajectories established by the method described in the previous section are used for temporal segmentation of specific activities from continuous video sequences.

Our algorithm uses dynamic time warping (DTW) to deal with the temporal variance between two different segments of motion trajectories. DTW finds the optimal alignment of two temporal signals using a similarity score at each time instance. More details on DTW can be found in [40], [41]. For two temporal signals, A and B, the
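
For concreteness, a minimal textbook DTW on 1-D sequences is sketched below, using absolute difference as the per-step cost. The paper applies DTW to multi-dimensional trajectory features; this sketch only illustrates the alignment recurrence and is not the paper's exact formulation.

```python
def dtw_distance(a, b):
    """Classic DTW between two 1-D sequences (textbook form, cf. [40], [41]).

    D[i][j] = cost(a[i-1], b[j-1]) + min(insert, delete, match),
    so the result is the total cost of the optimal temporal alignment.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] unmatched (insertion)
                                 D[i][j - 1],      # b[j-1] unmatched (deletion)
                                 D[i - 1][j - 1])  # match step
    return D[n][m]

# A time-stretched copy of a signal aligns perfectly, unlike a shifted one:
print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0
```

This tolerance to local time stretching is what makes DTW suitable for comparing performances of the same activity executed at different speeds.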

Description of data

Actual ballet sequences from a commercial video (American Ballet Theater) and ballet sequences captured with a digital camcorder were used to test the extraction of motion trajectories. For the temporal segmentation experiments, each ballet movement was captured, for both training and testing sequences, using a digital camcorder mounted on a stand. In the video sequences used for the experiment, three dancers performed each ballet step more than six times. The sequences were captured with several different

Conclusions

A new approach for extraction and temporal segmentation of motion trajectories is presented. Multiple motion trajectories are extracted from human motion without any initialization or any assumption about color distribution. Separate activity recognition for hands and feet is possible, which is not feasible with other features. Temporal segmentation of hand and leg movements provides a more accurate interpretation of whole-body movements. Experimental results show that the motion

Acknowledgments

The authors thank Padmanabhan Soundararajan, who provided many valuable comments that helped significantly improve the presentation of this paper. The authors also thank Michael J. Black for making his optical flow computation code available. A portion of this paper has appeared in [43], [44], [45].

References (45)

  • K. Rangarajan et al., Establishing motion correspondence, CVGIP: Image Understanding (1991)
  • M.J. Black et al., The robust estimation of multiple motions: parametric and piecewise-smooth flow fields, Computer Vision and Image Understanding (1996)
  • D.B. Reid, An algorithm for tracking multiple targets, IEEE Transactions on Automatic Control (1979)
  • I.J. Cox et al., An efficient implementation of Reid’s multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
  • I.J. Cox et al., A comparison of two algorithms for determining ranked assignments with application to multitarget tracking and motion correspondence, IEEE Transactions on Aerospace Electronic Systems (1997)
  • M. Shah et al., Motion trajectories, IEEE Transactions on Systems, Man, and Cybernetics (1993)
  • C. Rao et al., View-invariant representation and recognition of actions, International Journal of Computer Vision (2002)
  • C. Stauffer et al., Learning patterns of activity using real-time tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2000)
  • C. Rasmussen et al., Probabilistic data association methods for tracking complex visual objects, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • C.J. Veenman et al., Resolving motion correspondence for densely moving points, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • A.D. Wilson et al., Parametric Hidden Markov Models for gesture recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (1999)
  • M.-H. Yang et al., Extraction of 2D motion trajectories and its application to hand gesture recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  • H.-K. Lee et al., An HMM-based threshold model approach for gesture recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (1999)
  • Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence (1995)
  • P. Meer et al., Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  • D. Comaniciu et al., Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence (2003)
  • G.R. Bradski, Computer vision face tracking for use in a perceptual user interface, Intel Technology Journal, 2nd...
  • C. Rao, A. Gritai, M. Shah, View-invariant alignment and matching of video sequences, in: Proceedings of the IEEE...
  • M.J. Black, A.D. Jepson, A probabilistic framework for matching temporal trajectories: condensation-based recognition of...
  • A.F. Bobick et al., A state-based approach to the representation and recognition of gestures, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • J. Barron et al., Performance of optical flow techniques, International Journal of Computer Vision (1994)
  • P. Anandan, A computational framework and an algorithm for the measurement of visual motion, International Journal of Computer Vision (1989)
1 This work was done while Junghye Min was with the Department of Computer Science and Engineering at Pennsylvania State University, pursuing her Ph.D.
