Abstract
This paper presents a mixed-signal coarse-grained reconfigurable architecture (CGRA) for accelerating deep neural network (DNN) inference. Dot-product computations are performed in the analog domain to achieve a considerable speedup, while the remaining computations are performed digitally. The proposed structure, called MX-CGRA, employs analog tiles built from memristor crossbars. To reduce the overhead of converting data between the analog and digital domains, a suitable interface is placed between the analog and digital tiles. The structure also benefits from an efficient memory hierarchy in which data is moved as close as possible to the computing fabric. Moreover, to fully utilize the tiles, a set of micro-instructions is defined to configure the analog and digital domains; the corresponding context words used in the CGRA are generated from these instructions by a companion compiler tool. The efficacy of MX-CGRA is assessed by modeling the execution of state-of-the-art DNN architectures, used to classify images from the ImageNet dataset, on this structure. Simulation results show that, compared to previous mixed-signal DNN accelerators, MX-CGRA achieves, on average, a 2.35× higher throughput.
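The analog dot product at the heart of such crossbar tiles follows from Ohm's and Kirchhoff's laws: row voltages encode the input vector, cell conductances encode the weights, and each column current sums the products in a single analog step. The sketch below is a minimal, idealized illustration of this principle; the conductance range and the differential weight mapping are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Idealized crossbar model: column current I_j = sum_i V_i * G[i, j].
# Signed weights are assumed to be mapped onto a differential pair of
# crossbars (G_pos, G_neg), a common scheme in memristive accelerators.

G_MIN, G_MAX = 1e-6, 1e-4  # assumed programmable conductance range (siemens)

def weights_to_conductances(W):
    """Map signed weights onto two non-negative conductance arrays."""
    scale = (G_MAX - G_MIN) / np.abs(W).max()
    G_pos = G_MIN + scale * np.clip(W, 0, None)   # positive weight parts
    G_neg = G_MIN + scale * np.clip(-W, 0, None)  # negative weight parts
    return G_pos, G_neg, scale

def crossbar_dot(v, W):
    """Ideal analog vector-matrix product via differential column currents."""
    G_pos, G_neg, scale = weights_to_conductances(W)
    I = v @ G_pos - v @ G_neg  # Kirchhoff summation; G_MIN offsets cancel
    return I / scale           # digital rescale after (idealized) ADC readout

W = np.array([[0.5, -1.0], [2.0, 0.25]])
v = np.array([1.0, -0.5])
print(crossbar_dot(v, W))  # matches v @ W up to floating-point error
```

A real tile would add DAC/ADC quantization, wire resistance, and device variation on top of this ideal model; the point here is only the single-step current summation that gives the analog domain its speed advantage.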
Index Terms
- Memristive-based Mixed-signal CGRA for Accelerating Deep Neural Network Inference