3D Hand Pose Estimation via Regularized Graph Representation Learning

  • Conference paper
Artificial Intelligence (CICAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13069)

Abstract

This paper addresses the problem of 3D hand pose estimation from a monocular RGB image. Although previous methods have achieved considerable success, they have not fully exploited the structure of the hand, which is critical for pose estimation. To this end, we propose regularized graph representation learning under a conditional adversarial learning framework for 3D hand pose estimation, aiming to capture the structural inter-dependencies of hand joints. In particular, we estimate an initial hand pose from a parametric hand model as a prior on hand structure; this prior regularizes the inference of the structural deformation in the prior pose, enabling accurate graph representation learning via residual graph convolution. To further optimize the hand structure, we propose two bone-constrained loss functions that explicitly characterize the morphable structure of hand poses. We also introduce an adversarial learning framework, conditioned on the input image, with a multi-source discriminator that imposes structural constraints on the distribution of generated 3D hand poses so that they are anthropomorphically valid. Extensive experiments on five standard benchmarks demonstrate that our model sets a new state of the art in 3D hand pose estimation from a monocular image.
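
The abstract mentions two bone-constrained loss functions but does not define them. As a rough illustration only, the sketch below assumes one plausible pairing: a term that compares bone lengths and a term that compares bone directions. The 21-joint parent table, the function names, and the exact form of each term are assumptions made for this example and are not taken from the paper.

```python
import math

# Hypothetical 21-joint layout (an assumption, not taken from the paper):
# joint 0 is the wrist, and each finger is a chain of four joints.
PARENTS = [-1,
           0, 1, 2, 3,      # thumb
           0, 5, 6, 7,      # index
           0, 9, 10, 11,    # middle
           0, 13, 14, 15,   # ring
           0, 17, 18, 19]   # little


def bone_vectors(joints):
    """Return one 3D vector per bone, pointing from parent joint to child joint."""
    bones = []
    for child, parent in enumerate(PARENTS):
        if parent < 0:
            continue  # the wrist has no parent bone
        bones.append(tuple(c - p for c, p in zip(joints[child], joints[parent])))
    return bones


def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def bone_length_loss(pred, target):
    """Mean absolute difference between predicted and target bone lengths."""
    pb, tb = bone_vectors(pred), bone_vectors(target)
    return sum(abs(norm(p) - norm(t)) for p, t in zip(pb, tb)) / len(pb)


def bone_direction_loss(pred, target, eps=1e-8):
    """Mean (1 - cosine similarity) between predicted and target bone directions."""
    pb, tb = bone_vectors(pred), bone_vectors(target)
    total = 0.0
    for p, t in zip(pb, tb):
        cos = sum(a * b for a, b in zip(p, t)) / (norm(p) * norm(t) + eps)
        total += 1.0 - cos
    return total / len(pb)
```

Under these assumptions, a prediction that is a uniformly scaled copy of the target is penalized by the length term but not by the direction term, which is why the two terms are kept separate in the sketch.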

This work was supported by National Natural Science Foundation of China under contract No. 61972009.

Author information

Corresponding author

Correspondence to Wei Hu.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1912 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

He, Y., Hu, W. (2021). 3D Hand Pose Estimation via Regularized Graph Representation Learning. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13069. Springer, Cham. https://doi.org/10.1007/978-3-030-93046-2_46

  • DOI: https://doi.org/10.1007/978-3-030-93046-2_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93045-5

  • Online ISBN: 978-3-030-93046-2

  • eBook Packages: Computer Science, Computer Science (R0)
