
A Survey of Algorithmic and Hardware Optimization Techniques for Vision Convolutional Neural Networks on FPGAs


Abstract

Convolutional neural networks (CNNs) are employed in numerous fields, and their range of applications continues to grow. To achieve near-human accuracy, CNNs have become wider and deeper, which makes implementing them on resource-constrained hardware a demanding task. Such networks must be optimized at both the algorithmic and the hardware level before they can be compressed to fit into resource-limited devices. This survey investigates optimization techniques for vision CNNs at both levels, focusing on those that enable efficient hardware implementation, especially on FPGAs.
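To make the algorithmic-level compression mentioned above concrete, the short sketch below (an illustration added to this summary, not code from the survey itself) quantizes floating-point convolution weights to signed 8-bit fixed-point values, one family of techniques the survey covers; the symmetric per-tensor scaling scheme and the helper names are assumptions.

import numpy as np

def quantize_symmetric(weights, bits=8):
    # Map the largest weight magnitude onto the largest signed integer,
    # e.g. qmax = 127 for 8-bit quantization. (int8 storage assumes bits <= 8.)
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(weights))) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights, e.g. to evaluate the accuracy loss.
    return q.astype(np.float32) * scale

# Usage: quantize a random bank of 3x3 convolution kernels and check the error.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, scale = quantize_symmetric(w, bits=8)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))

Storing weights as 8-bit integers instead of 32-bit floats cuts the model's memory footprint by roughly 4x, which is what makes it feasible to keep more of a network in the limited on-chip memory of an FPGA.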



Author information


Correspondence to Arish Sateesan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the MOE Singapore Tier 1 grant No. 2017-T1-001-192 (RG28/17). Most of the work was carried out while the first author was at NTU, Singapore.

About this article

Cite this article

Sateesan, A., Sinha, S., K. G., S. et al. A Survey of Algorithmic and Hardware Optimization Techniques for Vision Convolutional Neural Networks on FPGAs. Neural Process Lett 53, 2331–2377 (2021). https://doi.org/10.1007/s11063-021-10458-1

