Abstract
Pixel-aligned implicit functions (IFs) enable the reconstruction of 3D humans with complete, detailed clothing from a single RGB image. To improve robustness to pose, existing works introduce a parametric body model as a prior, but this limits the recovery of geometric detail and makes loose clothing difficult to handle. Our goal is to reconstruct both clothing and pose so that they closely align with the input image, even for unusual poses and complex clothing. To this end, we propose a multi-scale-feature implicit method, called RICH, which combines the flexibility of implicit functions with the strong prior of a parametric body model. RICH introduces a 3D human body model as prior knowledge and adopts local features to constrain body generation. Furthermore, RICH employs a pretrained image encoder to extract a global pixel-aligned feature, which enables high-precision, complete reconstruction of clothing geometry and of external appearance such as hair and accessories. In addition, by establishing connections with the joints of the body model, RICH uses an attention mechanism to construct a relative spatial feature, thereby increasing robustness to pose. Finally, RICH feeds the local, relative, and global features to the IF to query occupancy, and the clothed human is represented as the 0.5 iso-surface of the resulting 3D occupancy field. Quantitative and qualitative evaluations on the THuman2.0 and CAPE datasets show that RICH outperforms state-of-the-art methods. In particular, RICH generalizes well to in-the-wild images, even under challenging poses and complex clothing. The code and supplementary material will be available at https://github.com/lyk412/RICH.
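The final stage described above can be sketched as follows: an implicit function maps a 3D query point, together with its fused local, relative, and global features, to an occupancy value, and the clothed surface is the 0.5 iso-surface of the resulting occupancy field (extracted in practice with Marching Cubes). The toy single-layer "network", feature dimensions, and all names below are illustrative assumptions, not RICH's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def implicit_function(point, local_feat, relative_feat, global_feat, W, b):
    """Toy IF: fuse the 3D query point with the three per-point feature
    vectors and map them to an occupancy probability in (0, 1).
    A single linear layer stands in for the real MLP."""
    z = np.concatenate([point, local_feat, relative_feat, global_feat])
    return sigmoid(W @ z + b)

rng = np.random.default_rng(0)
d = 4  # illustrative feature dimension for each of the three cues

# Random stand-ins for learned weights and extracted features.
W = rng.normal(size=3 + 3 * d)
b = 0.0
local_f, relative_f, global_f = (rng.normal(size=d) for _ in range(3))

p = np.array([0.1, -0.2, 0.3])  # one 3D query point
occ = implicit_function(p, local_f, relative_f, global_f, W, b)

# Points with occupancy above 0.5 lie inside the reconstructed surface;
# evaluating occ on a dense grid and thresholding at 0.5 defines the
# iso-surface that Marching Cubes would extract as a mesh.
inside = occ > 0.5
```

In a real pipeline this query runs over a dense voxel grid, and a Marching Cubes implementation (e.g. `skimage.measure.marching_cubes` with `level=0.5`) converts the occupancy volume into the final clothed-human mesh.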
Acknowledgments
This work was supported in part by the Shenzhen Key Laboratory of Next Generation Interactive Media Innovative Technology (No. ZDSYS20210623092001004), in part by the China Postdoctoral Science Foundation (No. 2023M731957), and in part by the National Natural Science Foundation of China under Grant 62306165.
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Lin, Y., Li, R., Lyu, K., Zhang, Y., Li, X. (2024). RICH: Robust Implicit Clothed Humans Reconstruction from Multi-scale Spatial Cues. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_16
DOI: https://doi.org/10.1007/978-981-99-8432-9_16
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9