Abstract
In today’s world, the applications of convolutional neural networks (CNNs) are nearly limitless, and they are employed in numerous fields. To reach near-human accuracy, CNNs have grown ever wider and deeper, which makes implementing them on resource-constrained hardware a cumbersome task. Such networks must be optimized at both the algorithmic and the hardware level to compress them enough to fit on resource-limited devices. This survey investigates optimization techniques for vision CNNs, at both the algorithmic and the hardware level, that enable efficient hardware implementation, especially on FPGAs.
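One of the algorithmic compression techniques the survey covers is quantization: replacing 32-bit floating-point weights with narrow fixed-point integers so they fit in on-chip memory and map onto cheap integer multipliers. The sketch below is purely illustrative and not taken from any of the surveyed works; the function names and the symmetric per-tensor scaling scheme are my own choices for the example.

```python
import numpy as np

def quantize_symmetric(weights, num_bits=8):
    """Map float weights to signed fixed-point integers with one
    symmetric scale per tensor (an illustrative scheme)."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax    # largest weight maps to qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy checks."""
    return q.astype(np.float32) * scale

# A toy weight tensor: 8-bit storage is 4x smaller than float32,
# and the round-trip error is bounded by half the scale step.
w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, s = quantize_symmetric(w)
w_hat = dequantize(q, s)
```

On an FPGA, only `q` and `s` would be stored; the multiply-accumulate datapath then operates on 8-bit integers, with a single rescale by `s` at the output.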
This work is supported by the MOE Singapore Tier-1 grant under Grant No.: 2017-T1-001-192 (RG28/17). The first author did most of the work while at NTU, Singapore.
Cite this article
Sateesan, A., Sinha, S., K. G., S. et al. A Survey of Algorithmic and Hardware Optimization Techniques for Vision Convolutional Neural Networks on FPGAs. Neural Process Lett 53, 2331–2377 (2021). https://doi.org/10.1007/s11063-021-10458-1