Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views

Authors

  • Zixin Zou (BNRist, Tsinghua University)
  • Weihao Cheng (ARC Lab, Tencent PCG)
  • Yan-Pei Cao (ARC Lab, Tencent PCG)
  • Shi-Sheng Huang (Beijing Normal University)
  • Ying Shan (ARC Lab, Tencent PCG)
  • Song-Hai Zhang (BNRist, Tsinghua University)

DOI:

https://doi.org/10.1609/aaai.v38i7.28626

Keywords:

CV: 3D Computer Vision, CV: Learning & Optimization for CV

Abstract

Reconstructing 3D objects from extremely sparse views is a long-standing and challenging problem. Recent techniques use image diffusion models either to generate plausible images at novel viewpoints or to distill pre-trained diffusion priors into 3D representations using score distillation sampling (SDS), but these methods often struggle to achieve high-quality, consistent, and detailed results for both novel-view synthesis (NVS) and geometry at the same time. In this work, we present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. Specifically, we employ a controller that harnesses epipolar features from input views to guide a pre-trained diffusion model, such as Stable Diffusion, to produce novel-view images that maintain 3D consistency with the input. By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results, even when faced with open-world objects. To address the blurriness introduced by conventional SDS, we introduce category-score distillation sampling (C-SDS) to enhance detail. We conduct experiments on CO3DV2, a multi-view dataset of real-world objects. Both quantitative and qualitative evaluations demonstrate that our approach outperforms previous state-of-the-art methods on NVS and geometry reconstruction metrics.

Published

2024-03-24

How to Cite

Zou, Z., Cheng, W., Cao, Y.-P., Huang, S.-S., Shan, Y., & Zhang, S.-H. (2024). Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7900-7908. https://doi.org/10.1609/aaai.v38i7.28626

Section

AAAI Technical Track on Computer Vision VI