An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images

Chen, Yan; Ni, Jianjun; Tang, Guangyi; Cao, Weidong; Yang, Simon X.

doi:10.1007/s11042-023-15845-5

Published: 22 June 2023

Multimedia Tools and Applications, volume 83, pages 12159–12184 (2024)

Yan Chen^1,2,
Jianjun Ni^1,2 (ORCID: orcid.org/0000-0002-7130-8331),
Guangyi Tang^1,2,
Weidong Cao^1,2 &
Simon X. Yang^3

Abstract

3D object detection has received extensive attention from researchers. RGB-D sensors are often used to supply complementary information in 3D object detection tasks, because they readily provide aligned point cloud and RGB image data, are reasonably priced, and perform reliably. However, how to effectively fuse the point cloud data and RGB image data in RGB-D images, and how to use this cross-modal information to improve 3D object detection performance, remains an open challenge. To address these problems, this paper proposes an improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images. First, a dense-to-sparse cross-modal learning module (DCLM) is designed, which reduces information waste in the interaction between dense 2D information and sparse 3D information. Then, an inter-modal attention fusion module (IAFM) is designed, which adaptively retains more meaningful information when fusing the 2D and 3D features. In addition, an intra-modal attention context aggregation module (IACAM) is designed to aggregate context information within both the 2D and 3D modalities and to model the relationships between objects. Finally, detailed quantitative and qualitative experiments are carried out on the SUN RGB-D dataset, and the results show that the proposed model obtains state-of-the-art 3D object detection results.
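The abstract does not give the internals of the DCLM or IAFM, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a pinhole camera model, a nearest-neighbour lookup of dense image features at the pixel each sparse 3D point projects to, and a two-way softmax that weights the 2D and 3D feature vectors. All function names, the intrinsics `fx, fy, cx, cy`, and the scoring weights `w_2d, w_3d` are hypothetical.

```python
import math

def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy

def sample_dense_features(feature_map, u, v):
    """Nearest-neighbour lookup in a dense H x W x C feature map (nested lists)."""
    height, width = len(feature_map), len(feature_map[0])
    row = min(max(int(round(v)), 0), height - 1)  # clamp to image bounds
    col = min(max(int(round(u)), 0), width - 1)
    return feature_map[row][col]

def attention_fuse(feat_2d, feat_3d, w_2d, w_3d):
    """Fuse two feature vectors with softmax weights over the two modalities.

    Assumption: each modality is scored by a dot product with a weight
    vector (w_2d, w_3d), standing in for parameters a network would learn.
    """
    score_2d = sum(f * w for f, w in zip(feat_2d, w_2d))
    score_3d = sum(f * w for f, w in zip(feat_3d, w_3d))
    top = max(score_2d, score_3d)  # subtract the max for numerical stability
    exp_2d, exp_3d = math.exp(score_2d - top), math.exp(score_3d - top)
    alpha_2d = exp_2d / (exp_2d + exp_3d)
    alpha_3d = exp_3d / (exp_2d + exp_3d)
    fused = [alpha_2d * a + alpha_3d * b for a, b in zip(feat_2d, feat_3d)]
    return fused, (alpha_2d, alpha_3d)
```

In this sketch, each sparse point would first be projected with `project_point`, its image feature fetched with `sample_dense_features`, and the result combined with the point's own 3D feature by `attention_fuse`. A real network would learn the scoring weights and would typically use bilinear rather than nearest-neighbour sampling.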

Data availability statement

Publicly available datasets were analyzed in this study. The data can be found at https://rgbd.cs.princeton.edu/data/.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61873086) and the Science and Technology Support Program of Changzhou (CE20215022).

Author information

Authors and Affiliations

College of Information Science and Engineering, Hohai University, 213022, Changzhou, China
Yan Chen, Jianjun Ni, Guangyi Tang & Weidong Cao
School of Artificial Intelligence and Automation, Hohai University, 213022, Changzhou, China
Yan Chen, Jianjun Ni, Guangyi Tang & Weidong Cao
Advanced Robotics and Intelligent Systems (ARIS) Laboratory, School of Engineering, University of Guelph, N1G 2W1, Guelph, Canada
Simon X. Yang

Corresponding author

Correspondence to Jianjun Ni.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, Y., Ni, J., Tang, G. et al. An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images. Multimed Tools Appl 83, 12159–12184 (2024). https://doi.org/10.1007/s11042-023-15845-5

Received: 22 December 2022
Revised: 20 April 2023
Accepted: 15 May 2023
Published: 22 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15845-5
