
Stereo Frustums: a Siamese Pipeline for 3D Object Detection

Journal of Intelligent & Robotic Systems

Abstract

This paper proposes a lightweight stereo frustums matching module for 3D object detection. The proposed framework takes advantage of a high-performance 2D detector and a point cloud segmentation network to regress 3D bounding boxes for autonomous driving vehicles. Instead of performing traditional stereo matching to compute disparities, the module directly takes the 2D proposals from both the left and the right views as input. Based on the epipolar constraints recovered from the well-calibrated stereo cameras, we propose four matching algorithms to search for the best match for each proposal between the stereo image pairs. Each matched pair proposes a segmentation of the scene, which is then fed into a 3D bounding box regression network. Results of extensive experiments on the KITTI dataset demonstrate that the proposed Siamese pipeline outperforms state-of-the-art stereo-based 3D bounding box regression methods.
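The matching step described in the abstract can be made concrete with a short sketch. For a rectified stereo pair such as KITTI's, epipolar lines are horizontal, so corresponding 2D boxes must share nearly the same vertical extent while the right-view box sits at a non-negative horizontal disparity. The Python sketch below scores candidate pairs by vertical overlap under these constraints; the function names and the greedy scoring are illustrative assumptions, not the paper's four matching algorithms.

```python
def vertical_iou(box_l, box_r):
    """Overlap ratio of the y-extents of two boxes given as (x1, y1, x2, y2).

    With rectified stereo, epipolar lines are horizontal, so matching
    detections must overlap almost completely in the vertical direction.
    """
    top = max(box_l[1], box_r[1])
    bottom = min(box_l[3], box_r[3])
    inter = max(0.0, bottom - top)
    union = (box_l[3] - box_l[1]) + (box_r[3] - box_r[1]) - inter
    return inter / union if union > 0 else 0.0


def match_proposals(left_boxes, right_boxes, min_y_iou=0.7):
    """Greedy left-to-right proposal matching under epipolar constraints.

    A hypothetical stand-in for the paper's matching module: each left-view
    proposal is paired with the unused right-view proposal that maximizes
    vertical IoU, subject to a non-negative disparity (the right-view box
    must be shifted toward the left). Returns (left_index, right_index) pairs.
    """
    pairs = []
    used = set()
    for i, bl in enumerate(left_boxes):
        best_j, best_score = -1, min_y_iou
        for j, br in enumerate(right_boxes):
            if j in used:
                continue
            # Disparity of the box centers; negative values violate geometry.
            disparity = 0.5 * ((bl[0] + bl[2]) - (br[0] + br[2]))
            if disparity < 0:
                continue
            score = vertical_iou(bl, br)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0:
            pairs.append((i, best_j))
            used.add(best_j)
    return pairs
```

In the rectified setting the epipolar constraint reduces the 2D correspondence search to a 1D search along each scanline, which is what makes proposal-level matching cheap compared with dense stereo matching.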


Data Availability

The authors designed and tested the proposed methods on the publicly available KITTI dataset: http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d.
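Since the statement points to the KITTI object benchmark, a minimal sketch of recovering the stereo geometry from a KITTI calibration file may help with reproduction. The P2/P3 projection matrices (left/right color cameras) and the encoding P[0, 3] = -fx * tx are standard for that benchmark; the helper names below are our own.

```python
import numpy as np


def load_kitti_calib(path):
    """Parse a KITTI object-benchmark calibration file into 3x4 matrices."""
    mats = {}
    with open(path) as f:
        for line in f:
            if ':' not in line:
                continue
            key, vals = line.split(':', 1)
            mats[key.strip()] = np.array(vals.split(), dtype=np.float64)
    # P2 / P3 project to the left / right color cameras, respectively.
    return mats['P2'].reshape(3, 4), mats['P3'].reshape(3, 4)


def stereo_baseline(P2, P3):
    """Baseline between the color cameras in meters (about 0.54 for KITTI).

    KITTI stores P[0, 3] = -fx * tx, where tx is the camera's horizontal
    offset from the reference camera, so the baseline follows directly.
    """
    fx = P2[0, 0]
    return (P2[0, 3] - P3[0, 3]) / fx


# Example: the depth of a matched proposal pair with disparity d (pixels)
# follows from z = fx * baseline / d.
```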


Funding

The work was supported in part by the United States Department of Agriculture (USDA) under grant no. 2019-67021-28996 and by an Nvidia GPU grant.

Author information

Corresponding author

Correspondence to Guanghui Wang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest or competing interests.

Additional information

Code Availability

The authors declare that the source code of this paper and the trained models will be made available to the public.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mo, X., Sajid, U. & Wang, G. Stereo Frustums: a Siamese Pipeline for 3D Object Detection. J Intell Robot Syst 101, 6 (2021). https://doi.org/10.1007/s10846-020-01287-w

