
Triaxial Squeeze Attention Module and Mutual-Exclusion Loss Based Unsupervised Monocular Depth Estimation

Published in Neural Processing Letters

Abstract

Monocular depth estimation plays a crucial role in scene perception and 3D reconstruction. Supervised depth estimation requires vast amounts of ground-truth depth data for training, which severely restricts its generalization. In recent years, unsupervised learning methods that need no LiDAR point clouds have attracted increasing attention. In this paper, we design an unsupervised monocular depth estimation method that uses stereo pairs for training. We present a triaxial squeeze attention module and introduce it into our unsupervised framework to enrich the detail of the learned depth representations. We also propose a novel training loss that enforces mutual exclusion in image reconstruction, improving the performance and robustness of unsupervised learning. Experimental results on KITTI show that our method not only outperforms existing unsupervised methods but also achieves results comparable with several supervised approaches trained on ground-truth data. The proposed improvements preserve the details of the depth map better and maintain object shapes more smoothly.




Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (Grant Nos. 41774027, 41904022) and the Fundamental Research Funds for the Central Universities (2242020R40135).

Author information


Corresponding author

Correspondence to Shuguo Pan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wei, J., Pan, S., Gao, W. et al. Triaxial Squeeze Attention Module and Mutual-Exclusion Loss Based Unsupervised Monocular Depth Estimation. Neural Process Lett 54, 4375–4390 (2022). https://doi.org/10.1007/s11063-022-10812-x

