3D Human model adaptation by frame selection and shape–texture optimization
Highlights
► Novel approach for 3D human shape and pose adaptation to multi-view images.
► Automatic frame selection method to select suitable frames for model adaptation.
► Efficient optimization using differentiable shape–texture objective function.
► We demonstrate effectiveness of integrated pose and shape recovery on complex data.
Introduction
Markerless 3D human pose recovery has many potential applications in areas such as animation, interactive games, motion analysis and surveillance. Over the last decade, considerable advances have been made in pose initialization and tracking; the problem of 3D human shape estimation, however, has received less attention. Suitable models are often assumed to be given, or are acquired in a separate, controlled model acquisition step. This decoupling is not surprising, given the high dimensionality of the joint pose and model space; typical 3D human pose recovery systems have 25–50 DOFs, and at least that many parameters for modeling shape.
The premise of this paper is that, due to the high dimensionality of the joint pose and model space, we cannot accurately estimate human pose and model jointly from scratch. We will show, however, that given an approximate initial human model (e.g. the anthropometric mean) and a state-of-the-art multi-view pose recovery system [13], [26], we can obtain pose estimates which are, at least for a subset of frames, accurate enough to start an optimization process that improves the human model used. This subsequently facilitates more accurate pose recovery.
Given that a person naturally takes different poses and positions over time, one can selectively choose the frames of a sequence at which to adapt a model. Not all frames of a sequence are equally suited; some frames involve poses with depth ambiguity, others involve cases where body parts are self-occluded. Overall, one would like to select a (preferably small) set of frames that covers a certain diversity of observed poses. This paper describes how to select a suitable subset of frames and how to perform the optimization of the shape and pose parameters robustly, based on shape and texture cues. As we will see in the experiments, our frame selection not only improves system efficiency (by processing only a subset of a sequence) but also leads to a better-behaved optimization.
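To make the idea of picking a small but diverse set of frames concrete, the following is a minimal sketch and not the selection method of this paper. It assumes that each frame comes with a pose vector and a scalar reliability score, and it uses greedy farthest-point sampling under a Euclidean pose distance. The names `pose_distance`, `select_frames`, `scores` and `min_score` are hypothetical and introduced only for illustration.

```python
import math
from typing import List, Sequence


def pose_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two pose vectors (assumed pose metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_frames(
    poses: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
    min_score: float = 0.5,
) -> List[int]:
    """Greedily pick up to k reliable frames whose poses are far apart.

    `scores` is a hypothetical per-frame reliability value (higher is better);
    frames scoring below `min_score` are never selected.
    """
    # Discard frames whose pose estimate is considered unreliable.
    candidates = [i for i, s in enumerate(scores) if s >= min_score]
    if not candidates or k <= 0:
        return []

    # Seed the selection with the most reliable frame.
    selected = [max(candidates, key=lambda i: scores[i])]

    # Distance from every candidate to its nearest already-selected frame.
    nearest = {i: pose_distance(poses[i], poses[selected[0]]) for i in candidates}

    while len(selected) < min(k, len(candidates)):
        # Farthest-point step: take the candidate least similar to the selection.
        best = max(candidates, key=lambda i: nearest[i])
        if nearest[best] == 0.0:
            break  # every remaining candidate duplicates a selected pose
        selected.append(best)
        for i in candidates:
            nearest[i] = min(nearest[i], pose_distance(poses[i], poses[best]))
    return selected


if __name__ == "__main__":
    poses = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
    scores = [0.9, 0.95, 0.8, 0.2, 0.7]
    print(select_frames(poses, scores, k=3))
```

In this toy run, frame 3 is excluded by its low score, and the near-duplicate frames 0 and 1 are not both chosen before more distinct poses have been picked. A real system would replace the Euclidean distance with a metric suited to joint angles and would derive the reliability score from the pose recovery stage.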
Section snippets
Previous work and contributions
There is an extensive literature on 3D human pose recovery; see the recent surveys [13], [26], [35]. Due to space limitations, we focus on work that recovers both 3D human pose and shape, which is the main scope of this paper.
Previous work on 3D human pose and shape recovery can be distinguished by the type of shape representation used and the way in which shape model adaptation takes place. 3D human shape models come in roughly three categories. The first category represents 3D human shape by
Overview
Fig. 1a shows the proposed procedure for automatic human pose and shape estimation. In a first stage, an existing state-of-the-art pose recovery method that relies on a fixed (“generic”) shape model is applied to estimate human pose on a “training” sequence (of interest are methods like [19] that are fully automatic and do not require manual intervention or particular initialization poses). Given that the shape model is imprecise, however, we expect pose recovery quality to be lacking. In a
Experimental results
Our experimental data consists of recordings from three synchronized and calibrated color CCD cameras looking over a train station platform. In 13 sequences (S01–S13) of about 11 seconds on average (captured at 20 Hz), four actors (subjects P1–P4) perform unscripted movements such as waving, gesticulation and walking in front of a cluttered background. Ground truth pose was
Conclusion
We presented an efficient method for 3D human shape and pose adaptation that addresses both the selection of a suitable set of input frames and an adaptation step. The latter is carried out as stochastic gradient-based optimization using an objective function based on shape and texture cues; we showed that the proposed optimization approach copes well with input pose errors of up to 15 cm. In a series of experiments, we demonstrated that the main components of our approach outperform the
References (41)
- et al., Fast stochastic optimization for articulated structure tracking, Image and Vision Computing (2007)
- et al., Markerless tracking of complex human motions from multiple views, Computer Vision and Image Understanding (2006)
- et al., A survey of advances in vision-based human motion capture and analysis, Computer Vision and Image Understanding (2006)
- et al., SCAPE: shape completion and animation of people, ACM Transactions on Graphics (2005)
- A. Balan, M. Black, The naked truth: Estimating body shape under clothing, in: Proceedings of the European Conference...
- A. Balan, L. Sigal, M. Black, J. Davis, H. Haussecker, Detailed human shape and pose from images, in: Proceedings of...
- L. Ballan, G.M. Cortelazzo, Marker-less motion capture of skinned models in a four camera set-up using optical flow and...
- et al., Physics-based person tracking using the anthropomorphic walker, International Journal of Computer Vision (2010)
- J. Canny, A computational approach to edge detection, in: RCV87, 1987, pp....
- et al., Shape-from-silhouette across time – Parts I and II, International Journal of Computer Vision (2005)
- Markerless motion capture through visual hull, articulated ICP and subject specific model generation, International Journal of Computer Vision
- Articulated body motion capture by stochastic search, International Journal of Computer Vision
- Computational studies of human motion, Foundations and Trends in Computer Graphics and Vision
- Optimization and filtering for human motion capture, International Journal of Computer Vision