Abstract
3D reconstruction from monocular endoscopic images is a challenging task. State-of-the-art multi-view stereo (MVS) algorithms based on image patch similarity often fail to obtain a dense reconstruction from weakly textured endoscopic images. In this paper, we present a novel deep-learning-based MVS algorithm that can produce a dense and accurate 3D reconstruction from a monocular endoscopic image sequence. Our method consists of three key steps. First, a number of depth candidates are sampled around the depth prediction made by a pre-trained CNN. Second, each candidate is projected to the other images in the sequence, and the matching score is measured using a patch embedding network that maps each image patch into a compact embedding. Finally, the candidate with the highest score is selected for each pixel. Experiments on colonoscopy videos demonstrate that our patch embedding network outperforms zero-normalized cross-correlation and a state-of-the-art stereo matching network in terms of matching accuracy, and that our MVS algorithm produces a reconstruction several orders of magnitude denser than those of the competing methods when the same accuracy filtering is applied.
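The three steps described in the abstract can be illustrated with a minimal, pure-Python sketch for a single pixel and a single other view. This is not the authors' implementation: the function names, the evenly spaced sampling range, the pinhole camera model, the use of cosine similarity as the matching score, and the toy `embed_other` function (a stand-in for the patch embedding network) are all assumptions made for illustration.

```python
import math


def sample_depth_candidates(d_pred, num=5, rel_range=0.2):
    """Step 1 (assumed scheme): evenly spaced depths around the CNN prediction."""
    if num < 2:
        return [d_pred]
    lo, hi = d_pred * (1 - rel_range), d_pred * (1 + rel_range)
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def project_to_other_view(u, v, depth, K, R, t):
    """Step 2a: back-project pixel (u, v) at `depth`, then project into another view.

    K = (fx, fy, cx, cy) is a pinhole intrinsics tuple; R (3x3 nested list) and
    t (length 3) map reference-camera coordinates to the other camera.
    """
    fx, fy, cx, cy = K
    p = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy


def cosine_similarity(a, b):
    """Step 2b (assumed score): cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_depth(u, v, d_pred, ref_embedding, embed_other, K, R, t,
                 num=5, rel_range=0.2):
    """Step 3: keep the candidate whose projection gives the highest score."""
    best_depth, best_score = None, -math.inf
    for d in sample_depth_candidates(d_pred, num, rel_range):
        u2, v2 = project_to_other_view(u, v, d, K, R, t)
        score = cosine_similarity(ref_embedding, embed_other(u2, v2))
        if score > best_score:
            best_depth, best_score = d, score
    return best_depth, best_score


# Toy setup (hypothetical values): two views related by a sideways translation.
K = (100.0, 100.0, 50.0, 50.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [-0.1, 0.0, 0.0]


def embed_other(u2, v2):
    """Toy embedding: matches the reference best at the true correspondence u2 = 55."""
    theta = 0.05 * (u2 - 55.0)
    return (math.cos(theta), math.sin(theta))


# The initial prediction (1.5) is off; the true depth of pixel (60, 50) is 2.0.
best_depth, best_score = select_depth(
    60.0, 50.0, 1.5, (1.0, 0.0), embed_other, K, R, t, num=7, rel_range=0.5
)
```

In this toy scene the candidate search refines the initial depth toward the value whose projection agrees with the other view. A fuller version would aggregate scores over several other images in the sequence, as the abstract describes, and would use learned patch embeddings in place of `embed_other`.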
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bae, G., Budvytis, I., Yeung, CK., Cipolla, R. (2020). Deep Multi-view Stereo for Dense 3D Reconstruction from Monocular Endoscopic Video. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science, vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_74
DOI: https://doi.org/10.1007/978-3-030-59716-0_74
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59715-3
Online ISBN: 978-3-030-59716-0
eBook Packages: Computer Science, Computer Science (R0)