
Memristive-based Mixed-signal CGRA for Accelerating Deep Neural Network Inference

Published: 18 July 2023

Abstract

This paper presents a mixed-signal coarse-grained reconfigurable architecture (CGRA) for accelerating inference in deep neural networks (DNNs). It performs dot-product computations in the analog domain to achieve a considerable speedup, while all other computations are carried out digitally. In the proposed structure, called MX-CGRA, the analog tiles consist of memristor crossbars. To reduce the overhead of converting data between the analog and digital domains, a suitable interface is employed between the analog and digital tiles. In addition, the structure benefits from an efficient memory hierarchy in which data is moved as close as possible to the computing fabric. Moreover, to fully utilize the tiles, a set of micro instructions is defined to configure the analog and digital domains; the corresponding context words used in the CGRA are determined by these instructions, which are generated by a companion compiler tool. The efficacy of MX-CGRA is assessed by modeling the execution of state-of-the-art DNN architectures on this structure, classifying images from the ImageNet dataset. Simulation results show that, compared to previous mixed-signal DNN accelerators, MX-CGRA achieves, on average, a 2.35× higher throughput.
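As a rough illustration of the crossbar dot product described above, the sketch below shows how a memristor crossbar evaluates I_j = Σ_i V_i · G_ij in a single analog step via Kirchhoff's current law, with a uniform quantization step standing in for the analog-to-digital interface. This is a minimal sketch, not MX-CGRA's actual circuit or interface model; the function name, array sizes, and the 8-bit ADC are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of crossbar-based analog dot products (illustrative only;
# names, sizes, and the 8-bit ADC below are assumptions, not MX-CGRA's design).
# Weights are stored as memristor conductances G; input voltages V drive the
# rows; each column current is I_j = sum_i V_i * G_ij (Kirchhoff's current
# law), so an entire vector-matrix product is computed in one analog step.

def crossbar_dot_product(voltages, conductances, adc_bits=8):
    """Ideal crossbar: column currents = V @ G, then digitized by an ADC."""
    currents = voltages @ conductances            # all MACs happen "at once"
    # Analog/digital interface: uniform quantization to 2^adc_bits levels.
    full_scale = float(np.max(np.abs(currents))) or 1.0
    levels = 2 ** adc_bits - 1
    codes = np.round(currents / full_scale * levels)
    return codes / levels * full_scale            # digitized approximation

rng = np.random.default_rng(0)
v = rng.uniform(0.0, 1.0, size=16)                # row input voltages (V)
g = rng.uniform(1e-6, 1e-4, size=(16, 4))         # conductance matrix (S)
print(crossbar_dot_product(v, g))                 # ~= v @ g, up to ADC error
```

The quantization step in this toy model is precisely the conversion cost that the analog/digital interface in MX-CGRA is designed to reduce.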


Published in

ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 4 (July 2023), 432 pages
ISSN: 1084-4309 | EISSN: 1557-7309 | DOI: 10.1145/3597460


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 18 July 2023
• Online AM: 3 May 2023
• Accepted: 26 April 2023
• Revised: 25 March 2023
• Received: 28 December 2022
