SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images

Authors

  • Weihao Cheng, Tencent
  • Yan-Pei Cao, Tencent
  • Ying Shan, Tencent

DOI:

https://doi.org/10.1609/aaai.v38i2.27894

Keywords:

CV: 3D Computer Vision, CV: Applications

Abstract

We study the problem of generating novel views of indoor scenes from sparse input views. The challenge is to achieve both photorealism and view consistency. We present SparseGNV, a learning framework that incorporates 3D structures and image generative models and synthesizes novel views through three modules. The first module builds a neural point cloud as the underlying geometry, providing scene context and guidance for the target novel view. The second module uses a transformer-based network to map the scene context and the guidance into a shared latent space and autoregressively decodes the target view in the form of discrete image tokens. The third module reconstructs the tokens into the image of the target view. SparseGNV is trained on a large-scale indoor scene dataset to learn generalizable priors. Once trained, it can efficiently generate novel views of an unseen indoor scene in a feed-forward manner. We evaluate SparseGNV on real-world indoor scenes and demonstrate that it outperforms state-of-the-art methods based on either neural radiance fields or conditional image generation.
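The abstract outlines a three-stage pipeline: a neural point cloud for scene context, a transformer that autoregressively emits discrete image tokens, and a decoder that maps tokens back to an image. The sketch below illustrates that structure only; it is not the authors' implementation, and every class name, tensor shape, and hyperparameter is an illustrative assumption.

```python
# Minimal sketch of the three-module pipeline described in the abstract.
# All interfaces, dimensions, and hyperparameters are assumptions, not
# the published SparseGNV code.
import torch
import torch.nn as nn

class NeuralPointCloudEncoder(nn.Module):
    """Module 1 (assumed interface): encode points back-projected from the
    sparse RGB-D inputs into per-point features that provide scene context."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 + 3, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, xyz: torch.Tensor, rgb: torch.Tensor) -> torch.Tensor:
        # xyz, rgb: (N, 3) point positions and colors -> (N, feat_dim) features
        return self.mlp(torch.cat([xyz, rgb], dim=-1))

class ViewTokenGenerator(nn.Module):
    """Module 2 (assumed interface): a transformer that attends to the fused
    scene context/guidance and autoregressively predicts the target view as
    discrete image tokens (a causal mask would be added for training)."""
    def __init__(self, feat_dim: int = 256, vocab_size: int = 1024, n_tokens: int = 256):
        super().__init__()
        self.n_tokens = n_tokens
        self.token_emb = nn.Embedding(vocab_size, feat_dim)
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(feat_dim, vocab_size)

    @torch.no_grad()
    def generate(self, context: torch.Tensor) -> torch.Tensor:
        # context: (1, M, feat_dim); greedy decoding from a start token (assumption)
        tokens = torch.zeros(1, 1, dtype=torch.long)
        for _ in range(self.n_tokens):
            h = self.decoder(self.token_emb(tokens), context)
            next_tok = self.head(h[:, -1]).argmax(dim=-1, keepdim=True)
            tokens = torch.cat([tokens, next_tok], dim=1)
        return tokens[:, 1:]  # (1, n_tokens) discrete image tokens

class TokenImageDecoder(nn.Module):
    """Module 3 (assumed interface): reconstruct the token grid into the
    target-view image, e.g. the decoder of a pretrained VQ model."""
    def __init__(self, feat_dim: int = 256, vocab_size: int = 1024):
        super().__init__()
        self.codebook = nn.Embedding(vocab_size, feat_dim)
        self.to_rgb = nn.Linear(feat_dim, 3)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        rgb = torch.sigmoid(self.to_rgb(self.codebook(tokens)))  # (1, n_tokens, 3)
        return rgb.view(1, 16, 16, 3)  # toy 16x16 image for a 256-token grid

# Toy usage: 1,000 points from the sparse RGB-D inputs -> one novel view.
enc, gen, dec = NeuralPointCloudEncoder(), ViewTokenGenerator(), TokenImageDecoder()
context = enc(torch.rand(1000, 3), torch.rand(1000, 3)).unsqueeze(0)  # (1, 1000, 256)
image = dec(gen.generate(context))
print(image.shape)  # torch.Size([1, 16, 16, 3])
```

Because generation is a single feed-forward pass through these modules rather than per-scene optimization, an unseen scene requires no test-time training, which is the efficiency the abstract claims over per-scene neural radiance fields.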

Published

2024-03-24

How to Cite

Cheng, W., Cao, Y.-P., & Shan, Y. (2024). SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1308-1316. https://doi.org/10.1609/aaai.v38i2.27894

Issue

Vol. 38 No. 2 (2024)

Section

AAAI Technical Track on Computer Vision I