
Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

We study how well different types of approaches generalise in the task of 3D hand pose estimation under single-hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently infeasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands interact with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS’19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More precisely, HANDS’19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, in the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has improved dramatically over the baseline, especially on extrapolation tasks, from 27 mm to 13 mm mean joint error. Our analyses highlight the impacts of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
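The challenge ranks methods by mean joint error: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over joints and frames. A minimal sketch of that metric, assuming predictions and annotations are given as millimetre coordinates of shape (frames, joints, 3):

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean Euclidean distance (mm) over joints, averaged over frames.

    pred, gt: float arrays of shape (num_frames, num_joints, 3).
    """
    assert pred.shape == gt.shape
    per_joint = np.linalg.norm(pred - gt, axis=-1)  # (frames, joints)
    return per_joint.mean()

# Toy example: two frames of a 21-joint skeleton, each joint displaced
# by the vector (3, 0, 4) mm, i.e. 5 mm per joint.
gt = np.zeros((2, 21, 3))
pred = gt + np.array([3.0, 0.0, 4.0])
print(mean_joint_error(pred, gt))  # 5.0
```

The joint count (21) and units are illustrative; the formula itself is the standard per-joint L2 error used throughout the hand-pose literature.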


Notes

  1.

    \(f^{reg}\) geometrically regresses the skeleton from the mesh vertex coordinates. It is provided with the MANO model, and its weights are kept fixed during the process.
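    In MANO, this regressor is a fixed linear map from mesh vertices to joints. A minimal sketch, with a random placeholder standing in for the weight matrix that ships with the model (the real matrix is sparse, with rows summing to 1 so that each joint is a convex combination of nearby vertices):

    ```python
    import numpy as np

    # MANO meshes have 778 vertices; the original model regresses 16 joints.
    NUM_VERTICES, NUM_JOINTS = 778, 16

    # Placeholder for the fixed J_regressor shipped with MANO; here random
    # weights are normalised so each row is a convex combination of vertices.
    rng = np.random.default_rng(0)
    J_regressor = rng.random((NUM_JOINTS, NUM_VERTICES))
    J_regressor /= J_regressor.sum(axis=1, keepdims=True)

    def f_reg(vertices):
        """Regress joint positions from mesh vertices: (778, 3) -> (16, 3)."""
        return J_regressor @ vertices

    vertices = rng.random((NUM_VERTICES, 3))
    joints = f_reg(vertices)
    assert joints.shape == (NUM_JOINTS, 3)
    ```

    Because the map is linear and its weights are frozen, gradients flow through \(f^{reg}\) back to the mesh vertices, which is what lets the skeleton supervise a mesh-producing network.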


Acknowledgements

This work is partially supported by Huawei Technologies Co. Ltd. and Samsung Electronics. S. Baek was supported by IITP funds from MSIT of Korea (No. 2020-0-01336 AIGS of UNIST, No. 2020-0-00537 Development of 5G-based low-latency device-edge cloud interaction technology).

Author information


Corresponding author

Correspondence to Anil Armagan.


Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 8252 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Armagan, A. et al. (2020). Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation Under Hand-Object Interaction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_6


  • DOI: https://doi.org/10.1007/978-3-030-58592-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58591-4

  • Online ISBN: 978-3-030-58592-1

  • eBook Packages: Computer Science (R0)
