Abstract
Convolutional neural networks (CNNs) are among the best-known algorithms and have been widely used in image recognition and classification. Various field-programmable gate array (FPGA)-based CNN architectures have been proposed to exploit the fast reconfigurability of FPGAs. However, high-performance designs are needed to reduce computational time. The contributions of this paper are: 1) heterogeneous and two-dimensional dispatcher technologies are used to implement FPGA-based CNN accelerators at different computational levels of the CNN, reducing its computational time; and 2) a flexible, integrated pipeline software and hardware (SW/HW) architecture is proposed to reduce the integration overhead of using a CNN framework. Experimental results show that the proposed architectures achieve the best performance with the minimum FPGA resource requirements.
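For readers unfamiliar with the workload being accelerated, the sketch below shows the standard multi-channel 2D convolution loop that FPGA CNN accelerators parallelize. This is not the paper's implementation; the function name, loop order, and the valid-padding/stride-1 assumptions are illustrative only. The nested loops mark the computational levels (output channel, input channel, pixel, kernel tap) at which a dispatcher can distribute work across processing elements.

```python
def conv2d(inp, kernels):
    """Naive multi-channel 2D convolution (valid padding, stride 1).

    inp:     [C_in][H][W] input feature maps
    kernels: [C_out][C_in][K][K] filter weights
    returns: [C_out][H-K+1][W-K+1] output feature maps
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, k = len(kernels), len(kernels[0][0])
    oh, ow = h - k + 1, w - k + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):              # output-channel level
        for ci in range(c_in):           # input-channel level
            for y in range(oh):          # output-pixel level
                for x in range(ow):
                    for ky in range(k):  # kernel-tap level (MAC loop)
                        for kx in range(k):
                            out[co][y][x] += (
                                inp[ci][y + ky][x + kx] * kernels[co][ci][ky][kx]
                            )
    return out
```

An accelerator unrolls or pipelines some of these loops in hardware; a two-dimensional dispatcher, for example, can assign tiles of the `(y, x)` output plane to an array of multiply-accumulate units.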
Funding
This work was supported by Ministry of Science and Technology, Taiwan, MOST 110–2221-E-024-001.
Ethics declarations
Conflict of interest
To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kao, CC. Performance-oriented FPGA-based convolution neural network designs. Multimed Tools Appl 82, 21019–21030 (2023). https://doi.org/10.1007/s11042-023-14537-4