Deep Multi-view Stereo for Dense 3D Reconstruction from Monocular Endoscopic Video

Bae, Gwangbin; Budvytis, Ignas; Yeung, Chung-Kwong; Cipolla, Roberto

doi:10.1007/978-3-030-59716-0_74

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12263))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

7487 Accesses
13 Citations

Abstract

3D reconstruction from monocular endoscopic images is a challenging task. State-of-the-art multi-view stereo (MVS) algorithms based on image patch similarity often fail to obtain a dense reconstruction from weakly-textured endoscopic images. In this paper, we present a novel deep-learning-based MVS algorithm that can produce a dense and accurate 3D reconstruction from a monocular endoscopic image sequence. Our method consists of three key steps. Firstly, a number of depth candidates are sampled around the depth prediction made by a pre-trained CNN. Secondly, each candidate is projected to the other images in the sequence, and the matching score is measured using a patch embedding network that maps each image patch into a compact embedding. Finally, the candidate with the highest score is selected for each pixel. Experiments on colonoscopy videos demonstrate that our patch embedding network outperforms zero-normalized cross-correlation and a state-of-the-art stereo matching network in terms of matching accuracy and that our MVS algorithm produces several degrees of magnitude denser reconstruction than the competing methods when same accuracy filtering is applied.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Bleyer, M., Rhemann, C., Rother, C.: Patchmatch stereo-stereo matching with slanted support windows. Bmvc 11, 1–11 (2011)
Google Scholar
Chen, G., Pham, M., Redarce, T.: Sensor-based guidance control of a continuum robot for a semi-autonomous colonoscopy. Robot. Auton. Syst. 57(6), 712–722 (2009)
Article Google Scholar
Hou, Y., Dupont, E., Redarce, T., Lamarque, F.: A compact active stereovision system with dynamic reconfiguration for endoscopy or colonoscopy applications. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8673, pp. 448–455. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10404-1_56
Chapter Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Liu, X., et al.: Self-supervised learning for dense depth estimation in monocular endoscopy. In: Stoyanov, D., et al. (eds.) CARE/CLIP/OR 2.0/ISIC -2018. LNCS, vol. 11041, pp. 128–138. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01201-4_15
Chapter Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Article Google Scholar
Luo, W., Schwing, A.G., Urtasun, R.: Efficient deep learning for stereo matching. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
Google Scholar
Ma, R., Wang, R., Pizer, S., Rosenman, J., McGill, S.K., Frahm, J.-M.: Real-time 3D reconstruction of colonoscopic surfaces for determining missing regions. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 573–582. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_64
Chapter Google Scholar
Mapillary: Opensfm (2017). https://github.com/mapillary/OpenSfM
Parot, V., et al.: Photometric stereo endoscopy. J. Biomed. Opt. 18(7), 076017 (2013)
Article Google Scholar
Paszke, A., et al.: Pytorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, pp. 8026–8037 (2019)
Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Schmalz, C., Forster, F., Schick, A., Angelopoulou, E.: An endoscopic 3D scanner based on structured light. Med. Image Anal. 16(5), 1063–1072 (2012)
Article Google Scholar
Ullman, S.: The interpretation of structure from motion. Proc. Roy. Soc. London 203(1153), 405–426 (1979). https://doi.org/10.1098/rspb.1979.0006
Zagoruyko, S., Komodakis, N.: Learning to compare image patches via convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4353–4361 (2015)
Google Scholar
Žbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(1), 2287–2318 (2016)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Engineering, University of Cambridge, Cambridge, UK
Gwangbin Bae, Ignas Budvytis & Roberto Cipolla
Bio-Medical Engineering (HK) Limited, Hong Kong, China
Chung-Kwong Yeung

Authors

Gwangbin Bae
View author publications
You can also search for this author in PubMed Google Scholar
Ignas Budvytis
View author publications
You can also search for this author in PubMed Google Scholar
Chung-Kwong Yeung
View author publications
You can also search for this author in PubMed Google Scholar
Roberto Cipolla
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gwangbin Bae .

Editor information

Editors and Affiliations

University of Toronto, Toronto, ON, Canada
Anne L. Martel
The University of British Columbia, Vancouver, BC, Canada
Purang Abolmaesumi
University College London, London, UK
Danail Stoyanov
École Centrale de Nantes, Nantes, France
Diana Mateus
EURECOM, Biot, France
Maria A. Zuluaga
Chinese Academy of Sciences, Beijing, China
S. Kevin Zhou
Sorbonne University, Paris, France
Daniel Racoceanu
The Hebrew University of Jerusalem, Jerusalem, Israel
Leo Joskowicz

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bae, G., Budvytis, I., Yeung, CK., Cipolla, R. (2020). Deep Multi-view Stereo for Dense 3D Reconstruction from Monocular Endoscopic Video. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_74

Download citation

DOI: https://doi.org/10.1007/978-3-030-59716-0_74
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59715-3
Online ISBN: 978-3-030-59716-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)