Abstract
Deep learning for object detection has become increasingly popular and is now widely adopted in many fields. This paper studies LiDAR and camera sensor fusion for vehicle detection, with the aim of achieving high detection accuracy. The proposed network architecture exploits the deep features of both the LiDAR point cloud and the RGB image for object detection. First, the LiDAR point cloud and the RGB image are fed into the network. Then, a high-resolution feature map is used to generate reliable 3D object proposals from both the LiDAR point cloud and the RGB image. Finally, 3D box regression is performed to predict the extent and orientation of vehicles in 3D space. Experiments on the challenging KITTI benchmark show that the proposed approach achieves good detection results, with a detection time of about 0.12 s per frame (roughly 8 frames per second). This approach could serve as a basis for further research on autonomous vehicles.
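The abstract does not state how the 3D box regression is parameterized. The sketch below assumes the residual encoding popularized by VoxelNet: each box is (x, y, z, l, w, h, θ), and the network regresses offsets of a vehicle box relative to an anchor or proposal box. The names `Box3D`, `encode` and `decode` are hypothetical and for illustration only; this is not the authors' code.

```python
import math
from typing import NamedTuple, Tuple


class Box3D(NamedTuple):
    """3D box: center (x, y, z), size (l, w, h), yaw angle theta in radians."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    theta: float


def encode(gt: Box3D, anchor: Box3D) -> Tuple[float, ...]:
    """Regression targets of a ground-truth box relative to an anchor (assumed VoxelNet-style)."""
    d = math.hypot(anchor.l, anchor.w)  # diagonal of the anchor's base rectangle
    return (
        (gt.x - anchor.x) / d,
        (gt.y - anchor.y) / d,
        (gt.z - anchor.z) / anchor.h,
        math.log(gt.l / anchor.l),
        math.log(gt.w / anchor.w),
        math.log(gt.h / anchor.h),
        gt.theta - anchor.theta,  # plain angle residual
    )


def decode(deltas: Tuple[float, ...], anchor: Box3D) -> Box3D:
    """Invert encode(): turn predicted residuals back into a 3D box."""
    dx, dy, dz, dl, dw, dh, dtheta = deltas
    d = math.hypot(anchor.l, anchor.w)
    return Box3D(
        anchor.x + dx * d,
        anchor.y + dy * d,
        anchor.z + dz * anchor.h,
        anchor.l * math.exp(dl),
        anchor.w * math.exp(dw),
        anchor.h * math.exp(dh),
        anchor.theta + dtheta,
    )


# Usage with illustrative (made-up) values: a car-sized anchor and a nearby vehicle box.
anchor = Box3D(10.0, 2.0, -1.0, 3.9, 1.6, 1.56, 0.0)
gt = Box3D(10.5, 2.3, -0.9, 4.2, 1.7, 1.5, 0.1)
deltas = encode(gt, anchor)
restored = decode(deltas, anchor)
```

Normalizing the center offsets by the anchor's base diagonal and taking logarithms of the size ratios keeps all targets on comparable scales. The orientation residual here is a plain angle difference; some detectors instead regress the sine and cosine of the angle to avoid wrap-around at ±π.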
Abbreviations
- BEV: Bird's-eye view of the LiDAR point cloud
- IOU: Intersection over union
- ROI: Region of interest
- AOS: Average orientation similarity
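IOU and AOS are the evaluation quantities behind the KITTI results mentioned in the abstract. As a hedged sketch, the code below computes IOU for axis-aligned 2D boxes and the 11-point AOS as defined in the original KITTI benchmark paper (Geiger et al., 2012): AOS = (1/11) Σ_{r ∈ {0, 0.1, ..., 1}} max_{r' ≥ r} s(r'), where s(r) averages (1 + cos Δθ_i)/2 over the detections at recall r and counts only detections matched to a ground-truth box. The function names are hypothetical, the (recall, similarity) pairs are assumed to be precomputed, and the official KITTI devkit may differ in details (for example, rotated boxes for BEV and 3D IOU, and the number of recall points).

```python
import math
from typing import Sequence, Tuple

Box2D = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def orientation_similarity(delta_thetas: Sequence[float], matched: Sequence[bool]) -> float:
    """s = (1/|D|) * sum_i (1 + cos(dtheta_i)) / 2 * delta_i over the detections D."""
    if not delta_thetas:
        return 0.0
    total = sum((1.0 + math.cos(dt)) / 2.0 * (1.0 if m else 0.0)
                for dt, m in zip(delta_thetas, matched))
    return total / len(delta_thetas)


def aos_11_point(recalls: Sequence[float], similarities: Sequence[float]) -> float:
    """Mean over r in {0, 0.1, ..., 1} of the best s(r') with recall r' >= r."""
    total = 0.0
    for k in range(11):
        r = k / 10
        candidates = [s for rec, s in zip(recalls, similarities) if rec >= r]
        total += max(candidates, default=0.0)  # 0 if recall r is never reached
    return total / 11
```

A detection counts as matched (δ_i = 1) only if its IOU with a ground-truth box passes the benchmark's threshold, so `iou_2d` feeds the `matched` flags used by `orientation_similarity`.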
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2017YFB0102603, 2018YFB0105003), the National Natural Science Foundation of China (51875255, 61601203, 61773184, U1564201, U1664258, U1764257, U1762264), the Natural Science Foundation of Jiangsu Province (BK20180100), the Six Talent Peaks Project of Jiangsu Province (2018-TD-GDZB-022), the Key Project for the Development of Strategic Emerging Industries of Jiangsu Province (2016-1094), and the Key Research and Development Program of Zhenjiang City (GY2017006).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Cai, Y., Zhang, T., Wang, H. et al. 3D Vehicle Detection Based on LiDAR and Camera Fusion. Automot. Innov. 2, 276–283 (2019). https://doi.org/10.1007/s42154-019-00083-z
DOI: https://doi.org/10.1007/s42154-019-00083-z