Multi-scale Feature Fusion Single Shot Object Detector Based on DenseNet

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11744)

Abstract

SSD (Single Shot MultiBox Detector) is an advanced object detection method that detects objects with high accuracy at high speed. However, accurately detecting small objects remains a challenge for SSD. To address this problem, this paper introduces a multi-scale feature fusion single shot object detector based on DenseNet (MFSOD), which combines the dense convolutional network (DenseNet) with the SSD framework. First, we add convolutional layers after the backbone network to realize multi-scale feature detection. In addition, a feature fusion module is designed to fuse the multi-scale features from different layers, introducing contextual information into object detection. Finally, we evaluate the proposed method on the PASCAL VOC2007 and MS COCO benchmark datasets. The results indicate that the proposed method achieves 78.9% mAP on the PASCAL VOC2007 test set and 27.1% mAP on MS COCO test-dev2015 at a speed of 23 FPS. MFSOD outperforms the conventional SSD in accuracy, especially for small objects, and satisfies the demands of real-time applications.
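The two ideas named in the abstract, extra convolutional layers that produce progressively smaller feature maps and a module that fuses maps of different scales, can be illustrated with a minimal pure-Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 38x38 starting size, the 3x3 stride-2 convolutions, the number of extra layers, and the choice of nearest-neighbour upsampling followed by channel concatenation as the fusion rule. The function names are hypothetical.

```python
# Illustrative sketch only: sizes, layer counts, and the fusion rule are
# assumptions, not the MFSOD configuration described in the paper.


def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(base_size, num_extra_layers):
    """Sizes of the backbone output followed by each extra stride-2 layer."""
    sizes = [base_size]
    for _ in range(num_extra_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of one channel given as a list of rows."""
    return [
        [value for value in row for _ in range(factor)]
        for row in fmap
        for _ in range(factor)
    ]


def fuse(fine_channels, coarse_channels, factor):
    """Upsample the coarse channels and concatenate them with the fine ones."""
    upsampled = [upsample_nearest(ch, factor) for ch in coarse_channels]
    return fine_channels + upsampled


if __name__ == "__main__":
    # Hypothetical pyramid: a 38x38 map followed by four extra layers.
    print(pyramid_sizes(38, 4))

    fine = [[[0] * 4 for _ in range(4)]]  # one 4x4 channel
    coarse = [[[1, 2], [3, 4]]]           # one 2x2 channel
    fused = fuse(fine, coarse, factor=2)
    print(len(fused), "channels of size", len(fused[0]), "x", len(fused[0][0]))
```

In this sketch the fused tensor keeps the resolution of the finer map while carrying the coarser map's values as extra channels, which is one common way to bring wider context to the layers that detect small objects.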



Acknowledgement

The work reported here was supported by the National Natural Science Foundation of China (Grant No. 51375209), the 111 Project (Grant No. B18027), the Six Talent Peaks Project in Jiangsu Province (Grant No. ZBZZ-012), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant Nos. SJCX18-0630 and KYCX18-1846). Finally, the authors would like to acknowledge the support of the PASCAL VOC datasets.

Author information

Corresponding author

Correspondence to Yi Cao.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhai, M., Liu, J., Zhang, W., Liu, C., Li, W., Cao, Y. (2019). Multi-scale Feature Fusion Single Shot Object Detector Based on DenseNet. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol 11744. Springer, Cham. https://doi.org/10.1007/978-3-030-27541-9_37

  • DOI: https://doi.org/10.1007/978-3-030-27541-9_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27540-2

  • Online ISBN: 978-3-030-27541-9

  • eBook Packages: Computer Science (R0)
