Abstract
This paper presents a mixed-signal coarse-grained reconfigurable architecture (CGRA) for accelerating deep neural network (DNN) inference. Dot-product computations are performed in the analog domain to achieve a considerable speedup, while the remaining computations are performed digitally. The proposed structure, called MX-CGRA, employs analog tiles built from memristor crossbars. To reduce the overhead of converting data between the analog and digital domains, a suitable interface is placed between the analog and digital tiles. The structure also benefits from an efficient memory hierarchy in which data is moved as close as possible to the computing fabric. Moreover, to fully utilize the tiles, a set of micro-instructions is defined to configure the analog and digital domains; the corresponding context words used in the CGRA are generated from these instructions by a companion compiler tool. The efficacy of MX-CGRA is assessed by modeling the execution of state-of-the-art DNN architectures, used to classify images from the ImageNet dataset, on this structure. Simulation results show that, compared to previous mixed-signal DNN accelerators, MX-CGRA achieves, on average, a 2.35× higher throughput.
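The analog dot product at the heart of such crossbar tiles follows from Ohm's and Kirchhoff's laws: row voltages encode the input vector, cell conductances encode the weights, and each column current sums the products in a single analog step. The sketch below is a minimal, idealized illustration of this principle; the conductance range and the differential weight mapping are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Idealized crossbar model: column current I_j = sum_i V_i * G[i, j].
# Signed weights are assumed to be mapped onto a differential pair of
# crossbars (G_pos, G_neg), a common scheme in memristive accelerators.

G_MIN, G_MAX = 1e-6, 1e-4  # assumed programmable conductance range (siemens)

def weights_to_conductances(W):
    """Map signed weights onto two non-negative conductance arrays."""
    scale = (G_MAX - G_MIN) / np.abs(W).max()
    G_pos = G_MIN + scale * np.clip(W, 0, None)   # positive weight parts
    G_neg = G_MIN + scale * np.clip(-W, 0, None)  # negative weight parts
    return G_pos, G_neg, scale

def crossbar_dot(v, W):
    """Ideal analog vector-matrix product via differential column currents."""
    G_pos, G_neg, scale = weights_to_conductances(W)
    I = v @ G_pos - v @ G_neg  # Kirchhoff summation; G_MIN offsets cancel
    return I / scale           # digital rescale after (idealized) ADC readout

W = np.array([[0.5, -1.0], [2.0, 0.25]])
v = np.array([1.0, -0.5])
print(crossbar_dot(v, W))  # matches v @ W up to floating-point error
```

A real tile would add DAC/ADC quantization, wire resistance, and device variation on top of this ideal model; the point here is only the single-step current summation that gives the analog domain its speed advantage.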
Index Terms
- Memristive-based Mixed-signal CGRA for Accelerating Deep Neural Network Inference