Learning Deformable Hypothesis Sampling for Accurate PatchMatch Multi-View Stereo

Authors

  • Hongjie Li The State Key Lab. LIESMARS, Wuhan University
  • Yao Guo The State Key Lab. LIESMARS, Wuhan University
  • Xianwei Zheng The State Key Lab. LIESMARS, Wuhan University
  • Hanjiang Xiong The State Key Lab. LIESMARS, Wuhan University

DOI:

https://doi.org/10.1609/aaai.v38i4.28091

Keywords:

CV: 3D Computer Vision, CV: Computational Photography, Image & Video Synthesis

Abstract

This paper introduces a learnable Deformable Hypothesis Sampler (DeformSampler) to address the challenging issue of noisy depth estimation in faithful PatchMatch multi-view stereo (MVS). We observe that the heuristic depth hypothesis sampling modes employed by PatchMatch MVS solvers are insensitive to (i) the piece-wise smooth distribution of depths across the object surface and (ii) the implicit multi-modal distribution of depth prediction probabilities along the ray direction at surface points. Accordingly, we develop DeformSampler to learn distribution-sensitive sample spaces that (i) propagate depths consistent with the scene's geometry across the object surface and (ii) fit a Laplace Mixture model that approximates the point-wise probability distribution of the actual depths along the ray direction. We integrate DeformSampler into a learnable PatchMatch MVS system to enhance depth estimation in challenging areas, such as piece-wise discontinuous surface boundaries and weakly-textured regions. Experimental results on the DTU and Tanks & Temples datasets demonstrate its superior performance and generalization capability compared to state-of-the-art competitors. Code is available at https://github.com/Geo-Tell/DS-PMNet.
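The Laplace Mixture modeling mentioned in the abstract can be illustrated with a minimal sketch: evaluating a K-component Laplace mixture density over candidate depths along a ray, and drawing depth hypotheses from it. The function names, parameters (`weights`, `locs`, `scales`), and the two-mode example below are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def laplace_mixture_pdf(d, weights, locs, scales):
    """Evaluate a K-component Laplace mixture density at depths d.

    weights, locs, scales are length-K arrays (mixture weights summing
    to 1, component locations, component scales)."""
    d = np.asarray(d, dtype=float)[..., None]            # broadcast over K components
    comp = np.exp(-np.abs(d - locs) / scales) / (2.0 * scales)
    return (weights * comp).sum(axis=-1)

def sample_depth_hypotheses(n, weights, locs, scales, rng=None):
    """Draw n depth hypotheses: pick a component, then a Laplace draw from it."""
    rng = np.random.default_rng(rng)
    k = rng.choice(len(weights), size=n, p=weights)      # component indices
    return rng.laplace(loc=np.take(locs, k), scale=np.take(scales, k))

# Hypothetical two-mode distribution along a ray (e.g. near a depth discontinuity).
weights = np.array([0.7, 0.3])
locs = np.array([2.0, 5.0])      # candidate surface depths (arbitrary units)
scales = np.array([0.1, 0.3])

depths = np.linspace(0.0, 10.0, 2001)
pdf = laplace_mixture_pdf(depths, weights, locs, scales)
samples = sample_depth_hypotheses(64, weights, locs, scales, rng=0)
```

Sampling hypotheses from such a multi-modal density concentrates candidates near plausible surface depths instead of spreading them uniformly over the depth range, which is the intuition behind distribution-sensitive sampling.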

Published

2024-03-24

How to Cite

Li, H., Guo, Y., Zheng, X., & Xiong, H. (2024). Learning Deformable Hypothesis Sampling for Accurate PatchMatch Multi-View Stereo. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3082-3090. https://doi.org/10.1609/aaai.v38i4.28091

Section

AAAI Technical Track on Computer Vision III