A novel hardware-oriented ultra-high-speed object detection algorithm based on convolutional neural network

  • Original Research Paper
  • Published in: Journal of Real-Time Image Processing (2020)

Abstract

This paper describes a hardware-oriented two-stage algorithm that can be deployed on a resource-limited field-programmable gate array (FPGA) for fast object detection and recognition without external memory. The first stage proposes bounding boxes with a conventional object detection method, and the second performs convolutional neural network (CNN)-based classification to improve accuracy. Frequent access to external memory significantly degrades the execution efficiency of object classification. Unfortunately, existing CNN models with large numbers of parameters are difficult to deploy on FPGAs with limited on-chip memory resources. In this study, we designed a compact CNN model and performed hardware-oriented quantization of its parameters and intermediate results. As a result, CNN-based ultra-fast object classification was realized with all parameters and intermediate results stored on chip. Several evaluations were performed to demonstrate the performance of the proposed algorithm. The object classification module consumes only 163.67 Kbits of on-chip memory for ten regions of interest (ROIs), which makes it suitable for low-end FPGA devices. In terms of accuracy, our method achieves a correctness rate of 98.01% on the open-source MNIST data set and over 96.5% on three self-built data sets, which is distinctly better than conventional ultra-high-speed object detection algorithms.
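To make the compact-model idea concrete, the following is a minimal PyTorch sketch of a small CNN classifier for single-channel ROIs, together with post-training fixed-point quantization of its parameters. The layer sizes, the 8-bit symmetric per-tensor scheme, and the names CompactCNN and quantize_fixed_point are illustrative assumptions for this sketch, not the exact architecture or quantization rule defined in the paper.

import torch
import torch.nn as nn

class CompactCNN(nn.Module):
    """A small CNN whose parameters could plausibly fit on chip (assumed sizes)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 small filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def quantize_fixed_point(t: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric uniform quantization of a tensor to signed `bits`-bit fixed point."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max().clamp_min(1e-8) / qmax  # per-tensor scale (assumption)
    q = torch.clamp(torch.round(t / scale), -qmax, qmax)
    return q * scale  # dequantized values for software evaluation

model = CompactCNN()
with torch.no_grad():
    for p in model.parameters():  # quantize all weights and biases in place
        p.copy_(quantize_fixed_point(p, bits=8))

logits = model(torch.randn(1, 1, 28, 28))  # one MNIST-sized grayscale ROI
print(logits.shape)  # torch.Size([1, 10])

On hardware, the integer values q and the scale would be stored on chip and the multiply-accumulates performed in fixed point; the dequantized form above is only a convenient way to check classification accuracy in software.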

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Grant No. 61673376).

Author information

Correspondence to Qingyi Gu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, J., Long, X., Hu, S. et al. A novel hardware-oriented ultra-high-speed object detection algorithm based on convolutional neural network. J Real-Time Image Proc 17, 1703–1714 (2020). https://doi.org/10.1007/s11554-019-00931-5
