3D Human model adaptation by frame selection and shape–texture optimization

https://doi.org/10.1016/j.cviu.2011.08.002

Abstract

We present a novel approach for adapting a 3D human body shape model to a sequence of multi-view images, given an initial shape model and an initial pose sequence. In a first step, the most informative frames are determined by optimizing an objective function that maximizes, over the selected frames, a shape–texture likelihood function and a pose diversity criterion (i.e. the model surface area that lies close to the occluding contours). Thereafter, the underlying shape and pose parameters are optimized in batch mode, by means of an objective function that includes both contour and texture cues over the selected multi-view frames.

Using the above approach, we implement automatic pose and shape estimation as a three-step procedure. First, we recover initial poses over a sequence using an initial (generic) body model. Both model and poses then serve as input to the above-mentioned adaptation process. Finally, a more accurate pose recovery is obtained by means of the adapted model.

We demonstrate the effectiveness of our frame selection, model adaptation and integrated pose and shape recovery procedure in experiments using both challenging outdoor data and the HumanEva data set.

Highlights

► Novel approach for 3D human shape and pose adaptation to multi-view images.
► Automatic frame selection method to select suitable frames for model adaptation.
► Efficient optimization using differentiable shape–texture objective function.
► We demonstrate effectiveness of integrated pose and shape recovery on complex data.

Introduction

Markerless 3D human pose recovery has many potential applications in areas such as animation, interactive games, motion analysis and surveillance. Over the last decade, considerable advances have been made in the area of pose initialization and tracking; the problem of 3D human shape estimation, however, has received less attention in comparison. Suitable models are often assumed to be given, or are acquired in a separate, controlled model acquisition step. This decoupling is not surprising, given the high dimensionality of the joint pose and model space; typical 3D human pose recovery systems have 25–50 DOFs, and at least that many parameters for modeling shape.

The premise in this paper is that, due to its high dimensionality, we cannot accurately estimate the human pose and model space jointly from scratch. We will show, however, that given an approximate initial human model (e.g. anthropometric mean) and a state-of-the-art multi-view pose recovery system [13], [26], we can obtain pose estimates which are, at least for a subset of frames, accurate enough to start an optimization process to improve the human model used. This subsequently facilitates more accurate pose recovery.
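The idea of refining a model from approximately correct poses can be illustrated on a toy problem. The sketch below is not the paper's objective (which uses contour and texture cues on a full body model); it assumes a single planar limb whose length plays the role of the shared shape parameter and whose per-frame angle plays the role of pose, and it uses plain gradient descent on a squared endpoint error. The function name `adapt_limb`, the step size and the iteration count are all hypothetical choices for illustration.

```python
import math

def adapt_limb(observed, init_length, init_angles, lr=0.05, iters=500):
    """Jointly refine one shared limb length and one angle per frame.

    observed    : list of (x, y) limb endpoints, one per frame (toy data)
    init_length : generic initial shape parameter
    init_angles : approximate initial pose per frame, in radians
    """
    s = init_length
    angles = list(init_angles)
    for _ in range(iters):
        grad_s = 0.0
        grad_angles = []
        for (x, y), a in zip(observed, angles):
            # Residual between predicted and observed endpoint.
            rx = s * math.cos(a) - x
            ry = s * math.sin(a) - y
            # Shape gradient accumulates over all frames (batch mode).
            grad_s += 2.0 * (rx * math.cos(a) + ry * math.sin(a))
            # Pose gradient is specific to each frame.
            grad_angles.append(2.0 * s * (-rx * math.sin(a) + ry * math.cos(a)))
        s -= lr * grad_s
        angles = [a - lr * g for a, g in zip(angles, grad_angles)]
    return s, angles

# Toy data: true limb length 1.3, four frames with different poses.
true_length = 1.3
true_angles = [0.0, 0.7, 1.5, 2.4]
observed = [(true_length * math.cos(a), true_length * math.sin(a))
            for a in true_angles]

# Start from a generic length and poses that are off by 0.2 rad.
length, angles = adapt_limb(observed, 1.0, [a + 0.2 for a in true_angles])
```

In this toy setting the initial poses are close enough that both the shared length and the per-frame angles converge to the values used to generate the data; with a much larger initial pose error, a local method of this kind would not be expected to recover.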

Given that a person naturally takes different poses and positions over time, one can selectively choose the frames of a sequence at which to adapt a model. Not all frames of a sequence are equally suited; some frames involve poses with depth ambiguity, others involve cases where body parts are self-occluded. Overall, one would like to select a (preferably small) set of frames that contains a certain diversity in poses being observed. This paper describes how to select a suitable subset of frames and how to perform the optimization of the shape and pose parameters robustly, based on shape and texture cues. As we will see in the experiments, our frame selection not only improves system efficiency (by processing only a subset of a sequence), it also leads to a better behaved optimization.
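One simple way to picture such a selection is a greedy procedure that trades off a per-frame quality score against the amount of new model surface a frame contributes. The sketch below is an assumption-laden illustration, not the selection method of the paper: `likelihood`, `contour_patches`, `diversity_weight` and the greedy strategy itself are hypothetical stand-ins for the shape–texture likelihood and the pose diversity criterion.

```python
from typing import List, Sequence, Set

def select_frames(likelihood: Sequence[float],
                  contour_patches: Sequence[Set[int]],
                  k: int,
                  diversity_weight: float = 1.0) -> List[int]:
    """Greedily pick up to k frame indices.

    likelihood[i]      : hypothetical shape-texture score of frame i
                         (higher is better)
    contour_patches[i] : hypothetical set of model surface patch ids that
                         lie close to an occluding contour in frame i
    """
    if len(likelihood) != len(contour_patches):
        raise ValueError("one likelihood and one patch set per frame expected")

    selected: List[int] = []
    covered: Set[int] = set()
    remaining = set(range(len(likelihood)))

    while remaining and len(selected) < k:
        def gain(i: int) -> float:
            # Reward only patches not yet covered by earlier selections.
            new_patches = len(contour_patches[i] - covered)
            return likelihood[i] + diversity_weight * new_patches

        # Iterating in sorted order breaks ties in favour of the lowest index.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        covered |= contour_patches[best]
        remaining.remove(best)

    return selected

# Frames 0 and 1 show the same patches; frame 2 shows different ones.
scores = [0.9, 0.8, 0.2]
patches = [{1, 2}, {1, 2}, {3, 4, 5}]
print(select_frames(scores, patches, k=2))                        # [2, 0]
print(select_frames(scores, patches, k=2, diversity_weight=0.0))  # [0, 1]
```

With the diversity term switched off, the two near-duplicate high-scoring frames are chosen; with it enabled, the lower-scoring frame that exposes previously unseen surface is preferred, which mirrors the motivation given above for wanting a diverse set of poses.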

Section snippets

Previous work and contributions

There is an extensive literature on 3D human pose recovery; see the recent surveys [13], [26], [35]. Due to space limitations, we focus on literature that recovers both 3D human pose and shape, which is the main scope of this paper.

Previous work on 3D human pose and shape recovery can be distinguished by the type of shape representation used and the way in which shape model adaptation takes place. 3D human shape models come in roughly three categories. The first category represents 3D human shape by

Overview

Fig. 1a shows the proposed procedure for automatic human pose and shape estimation. In a first stage, an existing state-of-the-art pose recovery method that relies on a fixed (“generic”) shape model is applied to estimate human pose on a “training” sequence (of interest are methods like [19], that are fully automatic and do not require manual intervention or particular initialization poses). Given that the shape model is imprecise, however, we expect pose recovery quality to be lacking. In a

Experimental results

Our experimental data consists of recordings from three synchronized and calibrated color CCD cameras looking over a train station platform. In 13 sequences (S01–S13) of about 11 seconds on average (captured at 20 Hz), four actors (subjects P1–P4) perform unscripted movements such as waving, gesticulation and walking in front of a cluttered background. Ground truth pose was

Conclusion

We presented an efficient method for 3D human shape and pose adaptation that addresses both the selection of a suitable set of input frames and an adaptation step. The latter is carried out as stochastic gradient-based optimization using an objective function based on shape and texture cues; we showed that the proposed optimization approach can handle input pose errors up to 15 cm well. In a series of experiments, we demonstrated that the main components of our approach outperform the

References (41)

  • M. Bray et al., Fast stochastic optimization for articulated structure tracking, Image and Vision Computing (2007)
  • R. Kehl et al., Markerless tracking of complex human motions from multiple views, Computer Vision and Image Understanding (2006)
  • T. Moeslund et al., A survey of advances in vision-based human motion capture and analysis, Computer Vision and Image Understanding (2006)
  • D. Anguelov et al., SCAPE: shape completion and animation of people, ACM Transactions on Graphics (2005)
  • A. Balan, M. Black, The naked truth: Estimating body shape under clothing, in: Proceedings of the European Conference...
  • A. Balan, L. Sigal, M. Black, J. Davis, H. Haussecker, Detailed human shape and pose from images, in: Proceedings of...
  • L. Ballan, G.M. Cortelazzo, Marker-less motion capture of skinned models in a four camera set-up using optical flow and...
  • M.A. Brubaker et al., Physics-based person tracking using the anthropomorphic walker, International Journal of Computer Vision (2010)
  • J. Canny, A computational approach to edge detection, in: RCV87, 1987, pp....
  • G. Cheung et al., Shape-from-silhouette across time – Parts I and II, International Journal of Computer Vision (2005)
  • S. Corazza et al., Markerless motion capture through visual hull, articulated ICP and subject specific model generation, International Journal of Computer Vision (2010)
  • E. de Aguiar, C. Theobalt, C. Stoll, H.P. Seidel, Marker-less deformable mesh tracking for human shape and motion...
  • M. de la Gorce, N. Paragios, D.J. Fleet, Model-based hand tracking with texture, shading and self-occlusions, in:...
  • J. Deutscher et al., Articulated body motion capture by stochastic search, International Journal of Computer Vision (2005)
  • D.A. Forsyth et al., Computational studies of human motion, Foundations and Trends in Computer Graphics and Vision (2005)
  • J. Gall et al., Optimization and filtering for human motion capture, International Journal of Computer Vision (2010)
  • J. Gall, C. Stoll, E. de Aguiar, C. Theobalt, B. Rosenhahn, H.-P. Seidel, Motion capture using joint skeleton tracking...
  • K. Grauman, G. Shakhnarovich, T.J. Darrell, A Bayesian approach to image-based visual hull reconstruction, in:...
  • P. Guan, A. Weiss, A.O. Balan, M.J. Black, Estimating human shape and pose from a single image, in: Proceedings of the...
  • N. Hasler, C. Stoll, B. Rosenhahn, T. Thormaehlen, H.-P. Seidel, Estimating body shape of dressed humans, in: Shape...