A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera

Baak, Andreas; Müller, Meinard; Bharaj, Gaurav; Seidel, Hans-Peter; Theobalt, Christian

doi:10.1007/978-1-4471-4640-7_5

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

The 3D reconstruction of complex human motions from 2D color images is a challenging and sometimes intractable problem. Pose estimation becomes more feasible when using streams of 2.5D monocular depth images as provided by a depth camera. However, because depth camera images have low resolution and challenging noise characteristics, and because the movements involve self-occlusions, the pose estimation task is still far from simple. Furthermore, in real-time scenarios, the reconstruction task becomes even more challenging since global optimization strategies are prohibitively expensive. To facilitate tracking of full-body human motions from a single depth-image stream, we introduce a data-driven hybrid strategy that combines local pose optimization with global retrieval techniques. The final pose estimate at each frame is determined from the tracked and retrieved pose hypotheses, which are fused using a fast selection scheme. Our algorithm reconstructs complex full-body poses in real time and effectively prevents temporal drifting, making it suitable for various real-time interaction scenarios.
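The per-frame choice between a tracked and a retrieved pose hypothesis can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the pose representation, so everything here is an assumption made for illustration: each hypothesis is reduced to a handful of 3D points, the score is the mean distance from each observed depth point to its nearest hypothesis point, and the names `mean_nearest_distance` and `select_pose` are hypothetical. It is not the chapter's actual selection scheme.

```python
import math


def mean_nearest_distance(model_points, observed_points):
    """Average distance from each observed 3D point to its nearest model point.

    Assumption: a simple one-sided point distance stands in for whatever
    measure the chapter actually uses.
    """
    total = 0.0
    for p in observed_points:
        total += min(math.dist(p, q) for q in model_points)
    return total / len(observed_points)


def select_pose(hypotheses, observed_points):
    """Pick the hypothesis whose points lie closest to the observation.

    hypotheses: dict mapping a label to a list of 3D points for that pose.
    Returns the winning label and its score (lower is better).
    """
    scores = {
        label: mean_nearest_distance(points, observed_points)
        for label, points in hypotheses.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy frame: two observed depth points and two candidate poses (made-up data).
observed = [(0.0, 1.0, 2.0), (0.5, 1.0, 2.0)]
hypotheses = {
    # Hypothetical result of local optimization started from the previous frame.
    "tracked": [(0.0, 1.0, 2.1), (0.5, 1.0, 2.1)],
    # Hypothetical result of a global lookup in a pose database.
    "retrieved": [(0.3, 0.5, 2.0), (0.9, 0.6, 2.0)],
}
best_label, best_score = select_pose(hypotheses, observed)
```

In this toy frame the tracked hypothesis wins because its points lie closer to the observation. When local tracking has drifted, a retrieved hypothesis would score better and be chosen instead, which is the intuition behind combining the two sources.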

Acknowledgements

This work was supported by the German Research Foundation (DFG CL 64/5-1) and by the Intel Visual Computing Institute. Meinard Müller has been funded by the Cluster of Excellence on Multimodal Computing and Interaction (MMCI) and is now with the University of Bonn.

Author information

Authors and Affiliations

MPI Informatik & Saarland University, Campus E1.4, 66123, Saarbrücken, Germany
Andreas Baak
Bonn University & MPI Informatik, Römerstraße 164, 53117, Bonn, Germany
Meinard Müller
MPI Informatik, Campus E1.4, 66123, Saarbrücken, Germany
Gaurav Bharaj, Hans-Peter Seidel & Christian Theobalt

Corresponding author

Correspondence to Andreas Baak.

Editor information

Editors and Affiliations

Computer Vision Laboratory, ETH Zürich, Sternwartstrasse 7, Zürich, 8092, Switzerland
Andrea Fossati
Perceiving Systems Department, Max Planck Inst. for Intelligent Systems, Spemannstrasse 41, Tübingen, 72076, Germany
Juergen Gall
Computer Vision Laboratory, ETH Zürich, Sternwartstrasse 7, Zürich, 8092, Switzerland
Helmut Grabner
Intel Science and Technology Center, Allen Center 462, Seattle, 98195, Washington, USA
Xiaofeng Ren
Industrial Perception, Industrial Ave 911, Palo Alto, 94303, California, USA
Kurt Konolige

About this chapter

Cite this chapter

Baak, A., Müller, M., Bharaj, G., Seidel, HP., Theobalt, C. (2013). A Data-Driven Approach for Real-Time Full Body Pose Reconstruction from a Depth Camera. In: Fossati, A., Gall, J., Grabner, H., Ren, X., Konolige, K. (eds) Consumer Depth Cameras for Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-4640-7_5

DOI: https://doi.org/10.1007/978-1-4471-4640-7_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4639-1
Online ISBN: 978-1-4471-4640-7
eBook Packages: Computer Science, Computer Science (R0)
