Abstract
Modern mobile processors integrate an increasing number of cores into a single chip and therefore demand large-capacity, on-chip, last-level caches (LLCs) to achieve scalable performance improvements. However, adopting traditional memory technologies such as SRAM and embedded DRAM (eDRAM) introduces leakage and scalability problems. Spin-transfer torque magnetic RAM (STT-MRAM) is a novel nonvolatile memory technology that has emerged as a promising alternative for constructing on-chip caches in high-end mobile processors. STT-MRAM has many advantages, such as short read latency, zero leakage from the memory cell, and better scalability than eDRAM and SRAM. Multilevel cell (MLC) STT-MRAM further enlarges capacity and reduces per-bit cost by storing more bits in one cell.
However, MLC STT-MRAM has a long write latency, which limits the effectiveness of MLC STT-MRAM-based LLCs. In this article, we address this limitation with three novel designs: line pairing (LP), line swapping (LS), and a dynamic LP/LS enabler (DLE). LP forms fast cache lines by reorganizing MLC soft bits, which are faster to write. LS dynamically stores frequently written data in these fast cache lines. DLE enables LP and LS only when they improve overall cache performance. Our experimental results show that the proposed designs improve system performance by 9% to 15% and reduce energy consumption by 14% to 21% for various types of mobile processors.
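As a rough illustration of the LP and LS ideas summarized above, the following Python sketch models a pair of physical lines whose soft bits together hold one "fast" logical line and whose hard bits hold one "slow" logical line. Everything here is an assumption made for illustration: the names `MLCCell`, `PairedLines`, and `swap_threshold`, the write-count trigger for swapping, and the simplification that soft and hard bits can be written independently (device-level write sequencing is ignored). It is not the article's implementation, and DLE is not modeled.

```python
from dataclasses import dataclass


@dataclass
class MLCCell:
    """Hypothetical 2-bit cell: a soft bit (faster to write) and a hard bit."""
    soft: int = 0
    hard: int = 0


class PairedLines:
    """Two physical lines paired into one fast and one slow logical line."""

    def __init__(self, bits_per_line: int, swap_threshold: int = 4):
        # Two physical lines of bits_per_line / 2 cells each supply
        # bits_per_line cells in total, so each logical line gets one bit
        # per cell: the fast line uses soft bits, the slow line hard bits.
        self.cells = [MLCCell() for _ in range(bits_per_line)]
        self.swap_threshold = swap_threshold  # assumed trigger for swapping
        self.slow_writes = 0                  # writes seen by the slow line

    def write(self, fast: bool, bits: list[int]) -> None:
        if len(bits) != len(self.cells):
            raise ValueError("line length mismatch")
        for cell, bit in zip(self.cells, bits):
            if fast:
                cell.soft = bit
            else:
                cell.hard = bit
        if not fast:
            self.slow_writes += 1

    def read(self, fast: bool) -> list[int]:
        return [cell.soft if fast else cell.hard for cell in self.cells]

    def maybe_swap(self) -> bool:
        """Exchange the two logical lines once the slow one is write-hot."""
        if self.slow_writes < self.swap_threshold:
            return False
        for cell in self.cells:
            cell.soft, cell.hard = cell.hard, cell.soft
        self.slow_writes = 0
        return True
```

In this sketch, a caller would track which logical line currently sits in the soft bits; after `maybe_swap()` returns `True`, the data previously written with `fast=False` is read back with `fast=True`.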
Index Terms
- Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology