ABSTRACT
ReRAM-based deep neural network (DNN) accelerators show enormous potential because of ReRAM's high computational density and power efficiency. A typical feature of DNNs is that weight matrix size varies across networks and layers. However, current ReRAM-based DNN accelerators adopt a fixed-size compute unit (CU) design, forcing a trade-off between throughput and energy efficiency: when computing large vector-matrix multiplications with small CUs, the overhead of the peripheral circuits is relatively high; when computing small vector-matrix multiplications with large CUs, the low utilization of the ReRAM crossbars hurts throughput. In this work, we propose Re2PIM, a reconfigurable ReRAM-based DNN accelerator. Each tile of Re2PIM is composed of reconfigurable units (RUs), each of which can be configured as a vector-matrix multiplier (VMM), a digital-to-analog converter (DAC), or an analog shift-and-add unit (AS+A). By reconfiguring RUs, we obtain CUs of various sizes matched to each DNN's weight matrices, ensuring high energy efficiency without sacrificing throughput across diverse DNN workloads. Evaluations on different DNN benchmarks show that Re2PIM achieves 27×/34×/1.5× improvement in energy efficiency and 5.7×/17×/8.2× improvement in computational throughput compared to the state-of-the-art accelerators PRIME, ISAAC, and TIMELY, respectively.
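The trade-off described in the abstract can be illustrated with a toy model (not from the paper; the cost constants `fixed_cost` and `cell_cost` are illustrative assumptions): tiling an R×C weight matrix onto k×k crossbars gives low cell utilization when the matrix is much smaller than the crossbar, while the roughly fixed per-crossbar peripheral cost (ADCs, DACs, drivers) dominates when k is small.

```python
import math

def crossbar_utilization(rows, cols, k):
    """Fraction of ReRAM cells doing useful work when a
    (rows x cols) weight matrix is tiled onto k x k crossbars."""
    tiles = math.ceil(rows / k) * math.ceil(cols / k)
    return (rows * cols) / (tiles * k * k)

def peripheral_energy_fraction(k, fixed_cost=1.0, cell_cost=1e-4):
    """Toy energy model: peripheral circuits cost a roughly fixed
    amount per crossbar, amortized over the k*k cells it serves."""
    return fixed_cost / (fixed_cost + cell_cost * k * k)

# A small 64x64 layer wastes most of a 256x256 crossbar ...
print(crossbar_utilization(64, 64, 256))   # 0.0625
# ... while a 64x64 crossbar spends a much larger energy share
# on its peripherals than a 256x256 one does.
print(peripheral_energy_fraction(64) > peripheral_energy_fraction(256))  # True
```

Under this sketch, neither a single small nor a single large fixed CU size is optimal across layers, which is the motivation for sizing CUs to each layer's weight matrix.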
References
- Rajeev Balasubramonian et al. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. ACM Trans. Archit. Code Optim. 14, 2, Article 14 (June 2017), 25 pages. https://doi.org/10.1145/3085572
- W. Cao et al. 2019. Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices. In ICCAD. 1--7. https://doi.org/10.1109/ICCAD45719.2019.8942099
- L. Chen et al. 2017. Accelerator-friendly neural-network training: Learning variations and defects in RRAM crossbar. In DATE. 19--24. https://doi.org/10.23919/DATE.2017.7926952
- P. Chen et al. 2015. Compact Modeling of RRAM Devices and Its Applications in 1T1R and 1S1R Array Design. IEEE Transactions on Electron Devices 62, 12 (2015), 4022--4028. https://doi.org/10.1109/TED.2015.2492421
- P. Chi et al. 2016. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. In ISCA. 27--39. https://doi.org/10.1109/ISCA.2016.13
- Teyuh Chou et al. 2019. CASCADE: Connecting RRAMs to Extend Analog Dataflow in an End-to-End In-Memory Processing Paradigm. In MICRO '52 (Columbus, OH, USA). 114--125. https://doi.org/10.1145/3352460.3358328
- Jacob Devlin et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]
- Gerald Gamrath et al. 2020. The SCIP Optimization Suite 7.0. Technical Report. Optimization Online. http://www.optimization-online.org/DB_HTML/2020/03/7705.html
- Kaiming He et al. 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV]
- K. He et al. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. 1026--1034. https://doi.org/10.1109/ICCV.2015.123
- Zhezhi He et al. 2019. Noise Injection Adaption: End-to-End ReRAM Crossbar Non-Ideal Effect Adaption for Neural Network Mapping. In DAC '19. ACM, New York, NY, USA, Article 57, 6 pages. https://doi.org/10.1145/3316781.3317870
- Andrew G. Howard et al. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861 [cs.CV]
- A. Karpathy et al. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128--3137. https://doi.org/10.1109/CVPR.2015.7298932
- Alex Krizhevsky et al. 2017. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 60, 6 (May 2017), 84--90. https://doi.org/10.1145/3065386
- W. Li et al. 2020. TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators towards Local and in Time Domain. In ISCA. 832--845. https://doi.org/10.1109/ISCA45697.2020.00073
- M. O'Halloran et al. 2004. A 10-nW 12-bit accurate analog storage cell with 10-aA leakage. IEEE Journal of Solid-State Circuits 39, 11 (2004), 1985--1996. https://doi.org/10.1109/JSSC.2004.835817
- Adam Paszke et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. 8024--8035.
- Fabrice Salvaire et al. [n.d.]. PySpice. https://pyspice.fabrice-salvaire.fr
- A. Shafiee et al. 2016. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. In ISCA. 14--26. https://doi.org/10.1109/ISCA.2016.12
- N. Silberman et al. 2016. TensorFlow-Slim image classification model library. https://github.com/tensorflow/models/tree/master/research/slim
- Karen Simonyan et al. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs.CV]
- M. Zhao et al. 2018. Characterizing Endurance Degradation of Incremental Switching in Analog RRAM for Neuromorphic Systems. In IEDM. 20.2.1--20.2.4. https://doi.org/10.1109/IEDM.2018.8614664
Re2PIM: A Reconfigurable ReRAM-Based PIM Design for Variable-Sized Vector-Matrix Multiplication