Abstract
While considerable progress has been made in 3D human reconstruction, existing methods still exhibit performance limitations, particularly when dealing with complex clothing variations and challenging poses. In this paper, we propose an approach that integrates fine-grained body part labels and geometric features into the reconstruction process, addressing these challenges and boosting overall performance. Our method, Fine Implicit Reconstruction Enhancement (FIRE), leverages blend weight-based soft human parsing labels and geometric features obtained through ray-based sampling to enrich the representation and predictive power of the reconstruction pipeline. We argue that incorporating these features provides valuable body part-specific information, which improves the accuracy of the reconstructed model. Our work presents a subtle yet significant enhancement to current techniques, pushing the boundaries of performance and opening avenues for future research. Extensive experiments on multiple benchmarks demonstrate the effectiveness of FIRE in terms of reconstruction accuracy, particularly in handling challenging poses and clothing variations. Our method's versatility and improved performance make it a promising solution for diverse applications in 3D human body reconstruction.
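The blend weight-based soft parsing labels mentioned above can be illustrated with a minimal sketch: a query point in space inherits the skinning weights of its nearest body-model vertex (e.g. from SMPL), yielding a soft distribution over body parts rather than a hard label. This is an assumption-laden illustration of the general idea, not the authors' actual implementation; the function name, array shapes, and nearest-vertex lookup are hypothetical.

```python
import numpy as np

def soft_parsing_labels(query_pts, verts, blend_weights):
    """Assign each query point a soft body-part label.

    query_pts:     (P, 3) 3D sample points.
    verts:         (V, 3) posed body-model vertices (e.g. SMPL).
    blend_weights: (V, J) per-vertex skinning weights over J joints.
    Returns a (P, J) array of soft part labels that sum to 1 per point.
    """
    # Pairwise squared distances between query points and vertices.
    d2 = ((query_pts[:, None, :] - verts[None, :, :]) ** 2).sum(-1)  # (P, V)
    nearest = d2.argmin(axis=1)                                      # (P,)
    # Each point inherits the skinning weights of its nearest vertex.
    return blend_weights[nearest]                                    # (P, J)

# Toy example with random data standing in for a real body model.
rng = np.random.default_rng(0)
verts = rng.standard_normal((100, 3))
w = rng.random((100, 24))
w /= w.sum(axis=1, keepdims=True)  # normalize weights to sum to 1 per vertex
labels = soft_parsing_labels(rng.standard_normal((5, 3)), verts, w)
print(labels.shape)  # (5, 24)
```

In practice a k-d tree or barycentric interpolation over the nearest face would replace the brute-force nearest-vertex lookup, but the output is the same kind of soft, body part-specific feature that the abstract describes feeding into the implicit reconstruction pipeline.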
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Zhang, J., Chen, X., Wang, K., Wei, P., Lin, L. (2024). FIRE: Fine Implicit Reconstruction Enhancement with Detailed Body Part Labels and Geometric Features. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9