ABSTRACT
ReRAM-based deep neural network (DNN) accelerators show enormous potential because of ReRAM's high computational density and power efficiency. A typical feature of DNNs is that weight matrix size varies across networks and layers. However, current ReRAM-based DNN accelerators adopt a fixed-size compute unit (CU) design, forcing a trade-off between throughput and energy efficiency: when computing large vector-matrix multiplications with small CUs, the overhead of the peripheral circuits is relatively high; when computing small vector-matrix multiplications with large CUs, the low utilization of the ReRAM crossbars hurts throughput. In this work, we propose Re2PIM, a reconfigurable ReRAM-based DNN accelerator. Each tile of Re2PIM is composed of reconfigurable units (RUs), each of which can be configured as a vector-matrix multiplier (VMM), a digital-to-analog converter (DAC), or an analog shift-and-add unit (AS+A). By reconfiguring RUs, we obtain CUs of various sizes matched to each DNN's weight matrices, ensuring high energy efficiency without sacrificing throughput across diverse DNN workloads. Evaluations on different DNN benchmarks show that Re2PIM achieves 27×/34×/1.5× improvement in energy efficiency and 5.7×/17×/8.2× improvement in computational throughput compared to the state-of-the-art accelerators PRIME, ISAAC, and TIMELY, respectively.
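The trade-off described in the abstract can be illustrated with a toy model (not from the paper; the cost constants `fixed_cost` and `cell_cost` are illustrative assumptions): tiling an R×C weight matrix onto k×k crossbars gives low cell utilization when the matrix is much smaller than the crossbar, while the roughly fixed per-crossbar peripheral cost (ADCs, DACs, drivers) dominates when k is small.

```python
import math

def crossbar_utilization(rows, cols, k):
    """Fraction of ReRAM cells doing useful work when a
    (rows x cols) weight matrix is tiled onto k x k crossbars."""
    tiles = math.ceil(rows / k) * math.ceil(cols / k)
    return (rows * cols) / (tiles * k * k)

def peripheral_energy_fraction(k, fixed_cost=1.0, cell_cost=1e-4):
    """Toy energy model: peripheral circuits cost a roughly fixed
    amount per crossbar, amortized over the k*k cells it serves."""
    return fixed_cost / (fixed_cost + cell_cost * k * k)

# A small 64x64 layer wastes most of a 256x256 crossbar ...
print(crossbar_utilization(64, 64, 256))   # 0.0625
# ... while a 64x64 crossbar spends a much larger energy share
# on its peripherals than a 256x256 one does.
print(peripheral_energy_fraction(64) > peripheral_energy_fraction(256))  # True
```

Under this sketch, neither a single small nor a single large fixed CU size is optimal across layers, which is the motivation for sizing CUs to each layer's weight matrix.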
References
- Rajeev Balasubramonian et al. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. ACM Trans. Archit. Code Optim. 14, 2, Article 14 (June 2017), 25 pages. https://doi.org/10.1145/3085572
- W. Cao et al. 2019. Neural Network-Inspired Analog-to-Digital Conversion to Achieve Super-Resolution with Low-Precision RRAM Devices. In ICCAD. 1--7. https://doi.org/10.1109/ICCAD45719.2019.8942099
- L. Chen et al. 2017. Accelerator-friendly neural-network training: Learning variations and defects in RRAM crossbar. In DATE. 19--24. https://doi.org/10.23919/DATE.2017.7926952
- P. Chen et al. 2015. Compact Modeling of RRAM Devices and Its Applications in 1T1R and 1S1R Array Design. IEEE Transactions on Electron Devices 62, 12 (2015), 4022--4028. https://doi.org/10.1109/TED.2015.2492421
- P. Chi et al. 2016. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. In ISCA. 27--39. https://doi.org/10.1109/ISCA.2016.13
- Teyuh Chou et al. 2019. CASCADE: Connecting RRAMs to Extend Analog Dataflow in an End-to-End In-Memory Processing Paradigm. In MICRO '52 (Columbus, OH, USA). 114--125. https://doi.org/10.1145/3352460.3358328
- Jacob Devlin et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs.CL]
- Gerald Gamrath et al. 2020. The SCIP Optimization Suite 7.0. Technical Report. Optimization Online. http://www.optimization-online.org/DB_HTML/2020/03/7705.html
- Kaiming He et al. 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs.CV]
- K. He et al. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. 1026--1034. https://doi.org/10.1109/ICCV.2015.123
- Zhezhi He et al. 2019. Noise Injection Adaption: End-to-End ReRAM Crossbar Non-Ideal Effect Adaption for Neural Network Mapping. In DAC '19. ACM, New York, NY, USA, Article 57, 6 pages. https://doi.org/10.1145/3316781.3317870
- Andrew G. Howard et al. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861 [cs.CV]
- A. Karpathy et al. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128--3137. https://doi.org/10.1109/CVPR.2015.7298932
- Alex Krizhevsky et al. 2017. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 60, 6 (May 2017), 84--90. https://doi.org/10.1145/3065386
- W. Li et al. 2020. TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators towards Local and in Time Domain. In ISCA. 832--845. https://doi.org/10.1109/ISCA45697.2020.00073
- M. O'Halloran et al. 2004. A 10-nW 12-bit accurate analog storage cell with 10-aA leakage. IEEE Journal of Solid-State Circuits 39, 11 (2004), 1985--1996. https://doi.org/10.1109/JSSC.2004.835817
- Adam Paszke et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. 8024--8035.
- Fabrice Salvaire et al. [n.d.]. PySpice. https://pyspice.fabrice-salvaire.fr
- A. Shafiee et al. 2016. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. In ISCA. 14--26. https://doi.org/10.1109/ISCA.2016.12
- N. Silberman et al. 2016. TensorFlow-Slim image classification model library. https://github.com/tensorflow/models/tree/master/research/slim
- Karen Simonyan et al. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs.CV]
- M. Zhao et al. 2018. Characterizing Endurance Degradation of Incremental Switching in Analog RRAM for Neuromorphic Systems. In IEDM. 20.2.1--20.2.4. https://doi.org/10.1109/IEDM.2018.8614664
Re2PIM: A Reconfigurable ReRAM-Based PIM Design for Variable-Sized Vector-Matrix Multiplication