Multi-scale Feature Fusion Single Shot Object Detector Based on DenseNet

Zhai, Minghao; Liu, Junchen; Zhang, Wei; Liu, Chen; Li, Wei; Cao, Yi

doi:10.1007/978-3-030-27541-9_37


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11744)

Included in the following conference series:

International Conference on Intelligent Robotics and Applications


Abstract

SSD (Single Shot MultiBox Detector) is one of the most advanced object detection methods and can detect objects with high accuracy at high speed. However, accurately detecting small objects remains challenging for SSD. To address this problem, this paper introduces a multi-scale feature fusion single shot object detector based on DenseNet (MFSOD), which combines the densely connected convolutional network (DenseNet) with the SSD framework. First, additional convolutional layers are appended after the backbone network to enable multi-scale feature detection. Second, a feature fusion module is designed to fuse multi-scale features from different layers, introducing contextual information into object detection. Finally, we evaluate the proposed method on the PASCAL VOC2007 and MS COCO benchmark datasets. The results show that the proposed method achieves 78.9% mAP on PASCAL VOC2007 test and 27.1% mAP on MS COCO test-dev2015 while running at 23 FPS. MFSOD outperforms the conventional SSD in accuracy, especially for small objects, and meets the speed requirements of real-time applications.
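The abstract does not specify how MFSOD's feature fusion module combines layers. As a rough illustration only, the NumPy sketch below follows an FSSD-style scheme, which is a swapped-in technique and not confirmed as the authors' design: feature maps from several layers are resized to a common resolution with nearest-neighbour upsampling, concatenated along the channel axis, normalized per channel, and then repeatedly downsampled to form a new multi-scale pyramid on which SSD-style detection heads could run. The function names, channel counts, the parameter-free normalization (standing in for a learned batch normalization layer), and the use of 2×2 max pooling for downsampling are all hypothetical choices.

```python
import numpy as np


def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    return x[:, rows][:, :, cols]


def normalize_channels(x, eps=1e-5):
    """Per-channel standardization; a parameter-free stand-in for batch norm (assumption)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)


def fuse_features(feature_maps, out_h, out_w):
    """Resize every map to (out_h, out_w), concatenate on channels, then normalize."""
    resized = [upsample_nearest(f, out_h, out_w) for f in feature_maps]
    return normalize_channels(np.concatenate(resized, axis=0))


def max_pool_2x2(x):
    """2x2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:, : h2 * 2, : w2 * 2]
    return x.reshape(c, h2, 2, w2, 2).max(axis=(2, 4))


def build_pyramid(fused, levels):
    """Derive a multi-scale pyramid from the fused map by repeated downsampling."""
    pyramid = [fused]
    for _ in range(levels - 1):
        pyramid.append(max_pool_2x2(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical backbone outputs at three scales (channels, height, width).
    maps = [rng.standard_normal(s) for s in [(8, 38, 38), (16, 19, 19), (8, 10, 10)]]
    fused = fuse_features(maps, 38, 38)
    for level in build_pyramid(fused, levels=6):
        print(level.shape)
```

Concatenating before building the pyramid means every level of the new pyramid carries information from both shallow, high-resolution layers and deep, semantically richer layers, which is the kind of contextual information the abstract credits for the gain on small objects. Note that floor-based pooling yields spatial sizes 38, 19, 9, 4, 2, 1, which differ slightly from the sizes produced by SSD's own extra layers.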



Acknowledgement

The work reported here was supported by the National Natural Science Foundation of China (Grant No. 51375209), the 111 Project (Grant No. B18027), the Six Talent Peaks Project in Jiangsu Province (Grant No. ZBZZ-012), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant Nos. SJCX18-0630 and KYCX18-1846). Finally, the authors would like to thank the providers of the PASCAL VOC datasets for their support.

Author information

Authors and Affiliations

School of Mechanical Engineering, Jiangnan University, Wuxi, 214122, Jiangsu, China
Minghao Zhai, Junchen Liu, Wei Zhang, Chen Liu & Yi Cao
Suzhou Vocational Institute of Industrial Technology, Suzhou, 215104, Jiangsu, China
Wei Li
Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi, 214122, Jiangsu, China
Minghao Zhai, Junchen Liu, Wei Zhang, Chen Liu & Yi Cao


Corresponding author

Correspondence to Yi Cao.

Editor information

Editors and Affiliations

Shenyang Institute of Automation, Shenyang, China
Haibin Yu
Shenyang Institute of Automation, Shenyang, China
Jinguo Liu
Shenyang Institute of Automation, Shenyang, China
Lianqing Liu
University of Portsmouth, Portsmouth, UK
Zhaojie Ju
Shenyang Institute of Automation, Shenyang, China
Yuwang Liu
University of Portsmouth, Portsmouth, UK
Dalin Zhou

About this paper

Cite this paper

Zhai, M., Liu, J., Zhang, W., Liu, C., Li, W., Cao, Y. (2019). Multi-scale Feature Fusion Single Shot Object Detector Based on DenseNet. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol. 11744. Springer, Cham. https://doi.org/10.1007/978-3-030-27541-9_37


DOI: https://doi.org/10.1007/978-3-030-27541-9_37
Published: 06 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27540-2
Online ISBN: 978-3-030-27541-9
eBook Packages: Computer Science, Computer Science (R0)
