Abstract
SSD (Single Shot MultiBox Detector) is one of the leading object detection methods, combining high accuracy with fast inference. However, accurately detecting small objects remains a challenge for SSD. To address this problem, this paper introduces a multi-scale feature fusion single shot object detector based on DenseNet (MFSOD), which combines the dense convolutional network (DenseNet) with the SSD framework. First, we add convolutional layers after the backbone network to enable multi-scale feature detection. Second, we design a feature fusion module that fuses multi-scale features from different layers, introducing contextual information into object detection. Finally, we evaluate the proposed method on the PASCAL VOC2007 and MS COCO benchmark datasets. The results show that the proposed method achieves 78.9% mAP on the PASCAL VOC2007 test set and 27.1% mAP on MS COCO test-dev2015 at a speed of 23 FPS. MFSOD outperforms the conventional SSD in accuracy, especially for small objects, and meets the requirements of real-time applications.
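The idea of fusing feature maps from layers of different resolution can be illustrated with a minimal sketch. The abstract does not state which fusion operator MFSOD uses, so the nearest-neighbour upsampling and channel-wise concatenation below are assumptions chosen for illustration, and the names `upsample_nearest` and `fuse` are hypothetical rather than taken from the paper.

```python
from typing import List

# A feature map is stored as nested lists: [channels][height][width].
FeatureMap = List[List[List[float]]]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along both height and width."""
    out: FeatureMap = []
    for channel in fmap:
        rows: List[List[float]] = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def fuse(maps: List[FeatureMap]) -> FeatureMap:
    """Bring every map to the resolution of the first one, then stack channels.

    Assumption (not from the paper): fusion = upsample + concatenate.
    """
    target_h = len(maps[0][0])
    fused: FeatureMap = []
    for fmap in maps:
        h = len(fmap[0])
        if target_h % h != 0:
            raise ValueError("map height must divide the target height")
        fused.extend(upsample_nearest(fmap, target_h // h))
    return fused


# A shallow 1-channel 4x4 map and a deeper 2-channel 2x2 map.
shallow = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
deep = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

fused = fuse([shallow, deep])
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In this sketch the fused map keeps the fine spatial grid of the shallow layer while carrying the channels of the deeper layer, which is one common way to give small-object predictions access to wider context.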
Acknowledgement
The work reported here was supported by the National Natural Science Foundation of China (Grant No. 51375209), the 111 Project (Grant No. B18027), the Six Talent Peaks Project in Jiangsu Province (Grant No. ZBZZ-012), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant Nos. SJCX18-0630 and KYCX18-1846). Finally, the authors gratefully acknowledge the PASCAL VOC datasets.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhai, M., Liu, J., Zhang, W., Liu, C., Li, W., Cao, Y. (2019). Multi-scale Feature Fusion Single Shot Object Detector Based on DenseNet. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol 11744. Springer, Cham. https://doi.org/10.1007/978-3-030-27541-9_37
DOI: https://doi.org/10.1007/978-3-030-27541-9_37
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27540-2
Online ISBN: 978-3-030-27541-9
eBook Packages: Computer Science, Computer Science (R0)