3D Hand Pose Estimation via Regularized Graph Representation Learning

He, Yiming; Hu, Wei

doi:10.1007/978-3-030-93046-2_46


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13069)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence


Abstract

This paper addresses the problem of 3D hand pose estimation from a monocular RGB image. Although previous methods have achieved great success, they have not fully exploited the structure of the hand, which is critical for pose estimation. To this end, we propose regularized graph representation learning under a conditional adversarial learning framework for 3D hand pose estimation, aiming to capture the structural inter-dependencies of hand joints. In particular, we estimate an initial hand pose from a parametric hand model as a prior on hand structure; this prior regularizes the inference of the structural deformation of the prior pose, enabling accurate graph representation learning via residual graph convolution. To further optimize the hand structure, we propose two bone-constrained loss functions that explicitly characterize the morphable structure of hand poses. We also introduce an adversarial learning framework conditioned on the input image with a multi-source discriminator, which imposes structural constraints on the distribution of generated 3D hand poses so that they are anthropomorphically valid. Extensive experiments demonstrate that our model sets a new state of the art in 3D hand pose estimation from a monocular image on five standard benchmarks.
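The abstract combines two structural ideas: refining a prior pose from a parametric hand model with residual graph convolution over the hand skeleton, and supervising the result with bone-constrained losses. The NumPy sketch below illustrates both under explicit assumptions and is not the authors' implementation. The 21-joint ordering, the per-joint input features, the layer form (a Kipf and Welling style normalized adjacency with a residual ReLU update), and the choice of a bone-length loss plus a bone-direction loss are all illustrative assumptions; the paper's actual architecture and loss definitions may differ. The conditional multi-source discriminator is not shown, and all weights here are random placeholders rather than trained parameters.

```python
import numpy as np

# ASSUMPTION: 21-joint hand skeleton. Joint 0 is the wrist, followed by four
# joints per finger (thumb, index, middle, ring, little) from base to tip.
# PARENTS[j] is the parent of joint j; -1 marks the root.
PARENTS = [-1, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15, 0, 17, 18, 19]
NUM_JOINTS = len(PARENTS)


def normalized_adjacency(parents):
    """Symmetrically normalized adjacency D^{-1/2} (A + I) D^{-1/2} of the skeleton."""
    n = len(parents)
    a = np.eye(n)  # self-loops
    for child, parent in enumerate(parents):
        if parent >= 0:
            a[child, parent] = a[parent, child] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def residual_gcn_refine(prior_pose, node_feats, weights, w_out, a_hat):
    """Refine a prior pose (N x 3) by predicting a per-joint deformation.

    node_feats: (N x F) per-joint features (hypothetical, e.g. image features).
    weights:    list of (C x C) hidden-layer matrices, with C = F + 3.
    w_out:      (C x 3) projection that outputs the deformation of the prior.
    """
    h = np.concatenate([prior_pose, node_feats], axis=1)
    for w in weights:
        h = h + np.maximum(a_hat @ h @ w, 0.0)  # residual graph convolution + ReLU
    delta = a_hat @ h @ w_out  # predicted structural deformation
    return prior_pose + delta  # the prior anchors the final estimate


def bones(pose, parents=PARENTS):
    """Bone vectors (child minus parent) for every non-root joint, shape (N-1) x 3."""
    children = [j for j, p in enumerate(parents) if p >= 0]
    parent_idx = [parents[j] for j in children]
    return pose[children] - pose[parent_idx]


def bone_length_loss(pred, gt):
    """ASSUMED loss: mean absolute difference between bone lengths."""
    len_pred = np.linalg.norm(bones(pred), axis=1)
    len_gt = np.linalg.norm(bones(gt), axis=1)
    return float(np.mean(np.abs(len_pred - len_gt)))


def bone_direction_loss(pred, gt, eps=1e-8):
    """ASSUMED loss: mean squared distance between unit bone directions."""
    bp, bg = bones(pred), bones(gt)
    up = bp / (np.linalg.norm(bp, axis=1, keepdims=True) + eps)
    ug = bg / (np.linalg.norm(bg, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum((up - ug) ** 2, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a_hat = normalized_adjacency(PARENTS)
    prior = rng.normal(size=(NUM_JOINTS, 3))  # stand-in for a parametric-model prior
    feats = rng.normal(size=(NUM_JOINTS, 8))  # stand-in per-joint features
    c = 3 + 8
    weights = [0.1 * rng.normal(size=(c, c)) for _ in range(2)]
    w_out = 0.1 * rng.normal(size=(c, 3))
    refined = residual_gcn_refine(prior, feats, weights, w_out, a_hat)
    print(refined.shape)  # (21, 3)
    print(bone_length_loss(refined, prior), bone_direction_loss(refined, prior))
```

Predicting an offset from the prior, rather than absolute coordinates, is what lets the prior act as a regularizer in this sketch: with a zero output projection the network returns the prior unchanged. The two bone losses are complementary here, since scaling a pose changes bone lengths but not bone directions, and both are invariant to a global translation.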

This work was supported by National Natural Science Foundation of China under contract No. 61972009.



Author information

Authors and Affiliations

Wangxuan Institute of Computer Technology, Peking University, Beijing, China
Yiming He & Wei Hu


Corresponding author

Correspondence to Wei Hu.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Duke University, Durham, NC, USA
Yiran Chen
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
University of British Columbia, Vancouver, BC, Canada
Jane Wang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Ruiping Wang
Xidian University, Xi’an, China
Weisheng Dong

Electronic supplementary material

Supplementary material 1 (PDF, 1912 KB)


About this paper

Cite this paper

He, Y., Hu, W. (2021). 3D Hand Pose Estimation via Regularized Graph Representation Learning. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13069. Springer, Cham. https://doi.org/10.1007/978-3-030-93046-2_46


DOI: https://doi.org/10.1007/978-3-030-93046-2_46
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93045-5
Online ISBN: 978-3-030-93046-2
eBook Packages: Computer Science, Computer Science (R0)
