
Joint self-supervised learning and adversarial adaptation for monocular depth estimation from thermal image

  • Original Paper
  • Published: 2023
  • Machine Vision and Applications

Abstract

Depth estimation from thermal images is a potential route to reliable and robust perception across diverse weather, lighting, and environmental conditions. A self-supervised training method further improves scalability to scenarios in which ground-truth labels are usually impossible to collect, such as GPS-denied and LiDAR-denied conditions. However, self-supervision from thermal images alone is usually insufficient to train a network, owing to inherent properties of thermal images such as low contrast and lack of texture. Introducing an additional self-supervision source (e.g., RGB images) brings further hardware and software constraints, such as complicated multi-sensor calibration and synchronized data acquisition. Therefore, this manuscript proposes a novel training framework that combines self-supervised learning with adversarial feature adaptation to leverage additional modality information without such constraints. The framework trains a network that estimates a monocular depth map from a thermal image in a self-supervised manner. In the training stage, the framework exploits two self-supervision signals: image reconstruction of unpaired RGB and thermal images, and adversarial feature adaptation between unpaired RGB and thermal features. The trained network achieves state-of-the-art quantitative results and better edge-preserving depth estimates than previous methods. Our source code is available at www.github.com/ukcheolshin/SelfDepth4Thermal.
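To make the two training signals concrete, below is a minimal PyTorch sketch of a single training step: a photometric reconstruction loss on monocular thermal video, plus an adversarial loss that pushes the thermal encoder's features to be indistinguishable from RGB encoder features. This is an illustrative sketch under stated assumptions, not the authors' implementation: TinyEncoder, DepthHead, Discriminator, the identity "warp" placeholder, and the 0.01 loss weight are hypothetical stand-ins, and the repository linked above remains the reference implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyEncoder(nn.Module):
        """Stand-in feature encoder; the thermal and RGB branches each get one."""
        def __init__(self, in_ch):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))

        def forward(self, x):
            return self.net(x)

    class DepthHead(nn.Module):
        """Stand-in decoder mapping encoder features to a bounded inverse-depth map."""
        def __init__(self):
            super().__init__()
            self.out = nn.Conv2d(64, 1, 3, padding=1)

        def forward(self, feat, size):
            disp = torch.sigmoid(self.out(feat))
            return F.interpolate(disp, size=size, mode="bilinear", align_corners=False)

    class Discriminator(nn.Module):
        """Patch classifier: did a feature map come from an RGB or a thermal image?"""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, 3, padding=1))

        def forward(self, feat):
            return self.net(feat)  # raw logits

    enc_thermal, enc_rgb = TinyEncoder(1), TinyEncoder(3)
    depth_head, disc = DepthHead(), Discriminator()
    opt_g = torch.optim.Adam(
        list(enc_thermal.parameters()) + list(depth_head.parameters()), lr=1e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

    # One illustrative step on unpaired batches; random tensors stand in for data.
    thermal_t = torch.rand(2, 1, 128, 256)  # thermal target frame at time t
    thermal_s = torch.rand(2, 1, 128, 256)  # adjacent thermal source frame
    rgb = torch.rand(2, 3, 128, 256)        # RGB frame, unpaired with the thermal ones

    feat_t = enc_thermal(thermal_t)
    depth = depth_head(feat_t, thermal_t.shape[-2:])

    # (1) Reconstruction self-supervision: synthesize the target view from the
    # source frame using the predicted depth and relative pose, then penalize
    # the photometric error. Real view synthesis needs camera intrinsics and a
    # pose network; the identity placeholder below only keeps the sketch runnable.
    recon = thermal_s  # placeholder for warp(thermal_s, depth, pose, K)
    loss_recon = (recon - thermal_t).abs().mean()  # L1 photometric error

    # (2) Adversarial feature adaptation: update the thermal encoder so that the
    # discriminator labels its features as RGB-like ("real").
    logits_t = disc(feat_t)
    loss_adv = F.binary_cross_entropy_with_logits(logits_t, torch.ones_like(logits_t))

    opt_g.zero_grad()
    (loss_recon + 0.01 * loss_adv).backward()  # 0.01 is an arbitrary loss weight
    opt_g.step()

    # Discriminator step: RGB features are "real", detached thermal features "fake".
    with torch.no_grad():
        feat_rgb = enc_rgb(rgb)
    d_real, d_fake = disc(feat_rgb), disc(feat_t.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

Note that the RGB and thermal batches never need to be spatially paired, calibrated, or synchronized: the discriminator only compares feature distributions across the two modalities, which is what lets the framework sidestep the multi-sensor constraints mentioned above.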

Acknowledgements

This work was supported by the Police-Lab 2.0 Program funded by the Ministry of Science and ICT (MSIT, Korea) and the Korean National Police Agency (KNPA, Korea) [Project Name: AI System Development for an Image Processing Based on Multi-Band (visible, NIR, LWIR) Fusion Sensing/Project Number: 220122 M0500].

Author information


Corresponding author

Correspondence to Ukcheol Shin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shin, U., Park, K., Lee, K. et al. Joint self-supervised learning and adversarial adaptation for monocular depth estimation from thermal image. Machine Vision and Applications 34, 55 (2023). https://doi.org/10.1007/s00138-023-01404-3

