
Depth map prediction from a single image with generative adversarial nets

Published in Multimedia Tools and Applications.

Abstract

A depth map is a fundamental component of 3D reconstruction, and predicting a depth map from a single image is a challenging task in computer vision. In this paper, we treat depth prediction as an image-to-image translation task and propose an adversarial convolutional architecture, the Depth Generative Adversarial Network (DepthGAN). To strengthen the translation ability, we take advantage of a Fully Convolutional Residual Network (FCRN) and combine it with a generative adversarial network, a framework that has achieved remarkable results on image-to-image tasks. We also present a new loss function that combines the scale-invariant (SI) error with a structural similarity (SSIM) term to improve training and to produce high-quality depth maps. Experiments show that DepthGAN outperforms the current best monocular depth prediction method on the NYU Depth v2 dataset.
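The combined objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the SI term follows the standard log-space formulation of the scale-invariant error, the SSIM term is computed globally rather than over local windows as practical SSIM implementations do, and the weights `lam` and `alpha` are illustrative assumptions.

```python
import numpy as np

def scale_invariant_loss(pred, target, lam=0.5):
    """Scale-invariant (SI) error in log-depth space."""
    d = np.log(pred) - np.log(target)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2

def ssim_loss(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM, computed over the whole map for simplicity."""
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(), target.var()
    cov = ((pred - mu_x) * (target - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim

def combined_loss(pred, target, alpha=0.85):
    """Weighted sum of the SSIM and SI terms; alpha is an assumed weight."""
    return (alpha * ssim_loss(pred, target)
            + (1.0 - alpha) * scale_invariant_loss(pred, target))

# With lam = 1, the SI error is invariant to a global depth scaling,
# which is why it suits monocular prediction where absolute scale is ambiguous:
depth = np.linspace(1.0, 10.0, 64).reshape(8, 8)
print(scale_invariant_loss(2.0 * depth, depth, lam=1.0))  # ~0.0
```

The SSIM term rewards structural agreement (edges, local contrast) that a pure per-pixel error ignores, which is consistent with the paper's goal of sharper depth maps.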

[Figures 1–6 appear in the published article.]



Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Number 61701463, the National Postdoctoral Foundation of China under Grant Number 2017M622277, the Fundamental Research Funds for the Central Universities under Grant Number 201713019, the Natural Science Foundation of Shandong Province of China under Grant Number ZR2017BF011, and the Qingdao Postdoctoral Science Foundation of China. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

Author information


Corresponding author

Correspondence to Zhibin Yu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, S., Li, N., Qiu, C. et al. Depth map prediction from a single image with generative adversarial nets. Multimed Tools Appl 79, 14357–14374 (2020). https://doi.org/10.1007/s11042-018-6694-x

