Abstract
This paper addresses the problem of 3D hand pose estimation from a monocular RGB image. While previous methods have shown great success, they have not fully exploited the structure of the hand, which is critical in pose estimation. To this end, we propose regularized graph representation learning under a conditional adversarial learning framework for 3D hand pose estimation, aiming to capture the structural inter-dependencies of hand joints. In particular, we estimate an initial hand pose from a parametric hand model as a prior on hand structure; this prior regularizes the inference of structural deformation in the prior pose, enabling accurate graph representation learning via residual graph convolution. To optimize the hand structure further, we propose two bone-constrained loss functions, which explicitly characterize the morphable structure of hand poses. We also introduce an adversarial learning framework conditioned on the input image with a multi-source discriminator, which imposes structural constraints on the distribution of generated 3D hand poses so that they are anthropomorphically valid. Extensive experiments demonstrate that our model sets a new state of the art in 3D hand pose estimation from a monocular image on five standard benchmarks.
This work was supported by the National Natural Science Foundation of China under contract No. 61972009.
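The abstract mentions two bone-constrained loss functions but does not define them. As a rough illustration only, the sketch below assumes one loss penalizes bone-length error and the other penalizes bone-direction error; this pairing is a common choice in pose estimation and may differ from the paper's actual formulation. The 21-joint layout in `PARENTS` (wrist as root, four joints per finger) and all function names are hypothetical.

```python
import numpy as np

# Hypothetical 21-joint hand skeleton: joint 0 is the wrist and each finger
# is a chain of 4 joints. PARENTS[i] is the parent of joint i (-1 for the root).
PARENTS = np.array([-1,
                    0, 1, 2, 3,       # thumb
                    0, 5, 6, 7,       # index
                    0, 9, 10, 11,     # middle
                    0, 13, 14, 15,    # ring
                    0, 17, 18, 19])   # little finger


def bone_vectors(joints):
    """Return the (20, 3) child-minus-parent vectors of a (21, 3) pose."""
    child = np.arange(1, len(PARENTS))
    return joints[child] - joints[PARENTS[child]]


def bone_length_loss(pred, target):
    """Assumed loss 1: mean squared error between bone lengths."""
    pred_len = np.linalg.norm(bone_vectors(pred), axis=1)
    target_len = np.linalg.norm(bone_vectors(target), axis=1)
    return float(np.mean((pred_len - target_len) ** 2))


def bone_direction_loss(pred, target, eps=1e-8):
    """Assumed loss 2: mean squared error between unit bone directions."""
    pb, tb = bone_vectors(pred), bone_vectors(target)
    pu = pb / (np.linalg.norm(pb, axis=1, keepdims=True) + eps)
    tu = tb / (np.linalg.norm(tb, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum((pu - tu) ** 2, axis=1)))
```

Under these assumptions the two terms are complementary: uniformly scaling a pose changes the length term but leaves the direction term essentially unchanged, while translating the whole hand affects neither.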
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
He, Y., Hu, W. (2021). 3D Hand Pose Estimation via Regularized Graph Representation Learning. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol 13069. Springer, Cham. https://doi.org/10.1007/978-3-030-93046-2_46
DOI: https://doi.org/10.1007/978-3-030-93046-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93045-5
Online ISBN: 978-3-030-93046-2
eBook Packages: Computer Science (R0)