ABSTRACT
Deploying compute-intensive convolutional neural networks (CNNs) with many parameters on mobile devices is extremely challenging because of the devices' limited computing resources and low power budgets. Although prior work builds fast and energy-efficient CNN accelerators by greatly sacrificing test accuracy, mobile devices must guarantee high CNN test accuracy for critical applications, e.g., unlocking phones by face recognition. In this paper, we propose 3DICT, a 3D XPoint ReRAM-based process-in-memory architecture that provides different test accuracies to applications with different priorities through lookup-based CNN tests, which dynamically exploit the trade-off between test accuracy and latency. Compared to state-of-the-art accelerators, 3DICT improves CNN test performance per Watt by 13% to 61× on average and guarantees 9-year endurance under various CNN test accuracy requirements.
Index Terms
- 3DICT: A Reliable and QoS Capable Mobile Process-In-Memory Architecture for Lookup-based CNNs in 3D XPoint ReRAMs