
RICH: Robust Implicit Clothed Humans Reconstruction from Multi-scale Spatial Cues

  • Conference paper
  • Pattern Recognition and Computer Vision (PRCV 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14426)

Abstract

The pixel-aligned implicit function (IF) enables the reconstruction of a 3D human with complete and detailed clothing from a single RGB image. To enhance robustness to poses, existing works introduce a parametric body model as a prior, but this limits the recovery of geometric details and makes it challenging to handle loose clothing. Our goal is to reconstruct both clothing and pose in close alignment with the input image, even under peculiar poses and complex clothing. To achieve this, we propose a multi-scale feature-based implicit method, called RICH, which combines the flexibility of implicit functions with the powerful prior of a parametric body model. RICH introduces a 3D human body model as prior knowledge and adopts local features to constrain human body generation. Furthermore, RICH employs a pretrained image encoder to extract global pixel-aligned features, which contribute to high-precision and complete reconstruction of the clothing geometry and of external appearance such as hair and accessories. Moreover, by establishing connections with the joints of the body model, RICH uses an attention mechanism to construct relative spatial features, thereby increasing robustness to poses. Finally, RICH feeds the local, relative, and global features to the IF to query occupancy, and the clothed human is represented by the 0.5 iso-surface of the 3D occupancy field. Quantitative and qualitative evaluations on the THuman2.0 and CAPE datasets show that RICH outperforms state-of-the-art methods. In particular, RICH demonstrates strong generalization on in-the-wild images, even under challenging poses and complex clothing. The code and supplementary material will be available at https://github.com/lyk412/RICH.
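The occupancy-query step described in the abstract can be sketched as follows. This is a hypothetical minimal illustration (not the authors' released code): an MLP takes the concatenated local, relative, and global features for each query point and predicts an occupancy probability, from which the clothed surface is later extracted as the 0.5 iso-surface (e.g. via Marching Cubes). The feature dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OccupancyMLP(nn.Module):
    """Hypothetical sketch of a multi-scale implicit function:
    per query point, concatenate local, relative, and global
    features and predict occupancy in [0, 1]."""

    def __init__(self, local_dim=64, relative_dim=32, global_dim=256):
        super().__init__()
        in_dim = local_dim + relative_dim + global_dim  # fused feature size
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),  # occupancy probability
        )

    def forward(self, f_local, f_relative, f_global):
        # Each input: (B, N, D) features for N 3D query points.
        fused = torch.cat([f_local, f_relative, f_global], dim=-1)
        return self.net(fused)  # (B, N, 1) occupancy in [0, 1]

# Query occupancy for 1000 sampled points; the surface is the
# 0.5 level set of the resulting occupancy field.
mlp = OccupancyMLP()
occ = mlp(torch.randn(1, 1000, 64),
          torch.randn(1, 1000, 32),
          torch.randn(1, 1000, 256))
print(occ.shape)  # torch.Size([1, 1000, 1])
```

In practice the local features would come from the body-model prior, the relative features from the joint-attention module, and the global features from the pretrained image encoder, but those components are beyond this sketch.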



Acknowledgments

This work was supported in part by the Shenzhen Key Laboratory of Next Generation Interactive Media Innovative Technology (No. ZDSYS20210623092001004), in part by the China Postdoctoral Science Foundation (No. 2023M731957), and in part by the National Natural Science Foundation of China under Grant 62306165.

Author information

Correspondence to Yachao Zhang or Xiu Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 574 KB)


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Lin, Y., Li, R., Lyu, K., Zhang, Y., Li, X. (2024). RICH: Robust Implicit Clothed Humans Reconstruction from Multi-scale Spatial Cues. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_16

  • DOI: https://doi.org/10.1007/978-981-99-8432-9_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8431-2

  • Online ISBN: 978-981-99-8432-9

  • eBook Packages: Computer Science (R0)
