
DMFF: dual-way multimodal feature fusion for 3D object detection

  • Original Paper
Signal, Image and Video Processing

Abstract

Recently, multimodal 3D object detection, which fuses the complementary information in LiDAR data and RGB images, has been an active research topic. However, fusing images and point clouds is not trivial because of their different representations, and inadequate feature fusion degrades detection performance. To address these problems, we convert images into pseudo point clouds via depth completion and employ a more efficient feature fusion method. In this paper, we propose a dual-way multimodal feature fusion network (DMFF) for 3D object detection. Specifically, we first use a dual-stream feature extraction module (DSFE) to generate homogeneous LiDAR and pseudo region-of-interest (RoI) features. Then, we propose a dual-way feature interaction method (DWFI) that enables intermodal and intramodal interaction between the two features. Next, we design a local attention feature fusion module (LAFF) that selects which input features are more likely to contribute to the desired output. The proposed DMFF achieves state-of-the-art performance on the KITTI dataset.
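The gating idea behind an attention-based fusion of two homogeneous RoI feature sets can be sketched as below. This is a minimal illustrative sketch only: the shapes, the linear projection `w`/`b`, and the sigmoid gate are assumptions for exposition, not the paper's LAFF implementation, whose learned architecture is described in the full article.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_attention_fusion(lidar_feat, pseudo_feat, w, b):
    """Blend LiDAR and pseudo-point RoI features with a per-channel gate.

    lidar_feat, pseudo_feat: (N, C) homogeneous RoI features.
    w: (2*C, C) projection producing per-channel attention logits (toy weights).
    b: (C,) bias.
    """
    concat = np.concatenate([lidar_feat, pseudo_feat], axis=1)  # (N, 2C)
    gate = sigmoid(concat @ w + b)                              # (N, C), values in (0, 1)
    # Convex combination: the gate decides how much each modality contributes
    # to each output channel.
    return gate * lidar_feat + (1.0 - gate) * pseudo_feat

# Toy usage with random features and untrained weights
N, C = 4, 8
lidar = rng.standard_normal((N, C))
pseudo = rng.standard_normal((N, C))
w = rng.standard_normal((2 * C, C)) * 0.1
b = np.zeros(C)
fused = local_attention_fusion(lidar, pseudo, w, b)
print(fused.shape)  # (4, 8)
```

Because the gate lies in (0, 1), each fused value is a convex combination of the two modalities, so the output stays bounded by the per-element min and max of the inputs; a trained version would learn `w` and `b` end-to-end.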


Data and materials availability

Not applicable.


Funding

This work was supported in part by the Natural Science Foundation of Heilongjiang Province of China (No. LH2021F026), the Fundamental Research Funds for the Central Universities (No. HIT.NSRIF202243), and the Aeronautical Science Foundation of China (No. 2022Z071077002).

Author information

Authors and Affiliations

Authors

Contributions

XD and XD designed the research. XD drafted the manuscript. XD helped organize the manuscript.

Corresponding author

Correspondence to Xiaoguang Di.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dong, X., Di, X. & Wang, W. DMFF: dual-way multimodal feature fusion for 3D object detection. SIViP 18, 455–463 (2024). https://doi.org/10.1007/s11760-023-02772-z

