Abstract
Convolutional neural networks (CNNs) are among the best-known algorithms and have been widely used in image recognition and classification. Various field-programmable gate array (FPGA)-based CNN architectures have been proposed to exploit the fast reconfigurability of FPGAs. However, high-performance designs are needed to reduce computational time. The contributions of this paper are: 1) heterogeneous and two-dimensional dispatcher technologies are used to implement FPGA-based CNN accelerators at different computational levels of the CNN, reducing its computational time; and 2) a flexible, integrated pipeline software and hardware (SW/HW) architecture is proposed to reduce the integration overhead of using a CNN framework. Experimental results show that the proposed architectures achieve the best performance with the minimum FPGA resource requirements.
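For readers unfamiliar with the workload being accelerated, the sketch below shows the standard multi-channel 2D convolution loop that FPGA CNN accelerators parallelize. This is not the paper's implementation; the function name, loop order, and the valid-padding/stride-1 assumptions are illustrative only. The nested loops mark the computational levels (output channel, input channel, pixel, kernel tap) at which a dispatcher can distribute work across processing elements.

```python
def conv2d(inp, kernels):
    """Naive multi-channel 2D convolution (valid padding, stride 1).

    inp:     [C_in][H][W] input feature maps
    kernels: [C_out][C_in][K][K] filter weights
    returns: [C_out][H-K+1][W-K+1] output feature maps
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, k = len(kernels), len(kernels[0][0])
    oh, ow = h - k + 1, w - k + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):              # output-channel level
        for ci in range(c_in):           # input-channel level
            for y in range(oh):          # output-pixel level
                for x in range(ow):
                    for ky in range(k):  # kernel-tap level (MAC loop)
                        for kx in range(k):
                            out[co][y][x] += (
                                inp[ci][y + ky][x + kx] * kernels[co][ci][ky][kx]
                            )
    return out
```

An accelerator unrolls or pipelines some of these loops in hardware; a two-dimensional dispatcher, for example, can assign tiles of the `(y, x)` output plane to an array of multiply-accumulate units.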
Funding
This work was supported by Ministry of Science and Technology, Taiwan, MOST 110–2221-E-024-001.
Ethics declarations
Conflict of interest
To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kao, CC. Performance-oriented FPGA-based convolution neural network designs. Multimed Tools Appl 82, 21019–21030 (2023). https://doi.org/10.1007/s11042-023-14537-4