Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology

Authors:
Lei Jiang, University of Pittsburgh
Bo Zhao, University of Pittsburgh
Jun Yang, University of Pittsburgh
Youtao Zhang, University of Pittsburgh

ACM Transactions on Design Automation of Electronic Systems, Volume 20, Issue 4, Article No. 54, pp. 1–24. https://doi.org/10.1145/2764903

Published: 28 September 2015

Abstract

Modern mobile processors integrate an increasing number of cores into a single chip and demand large-capacity, on-chip, last-level caches (LLCs) to achieve scalable performance improvements. However, adopting traditional memory technologies such as SRAM and embedded DRAM (eDRAM) brings leakage and scalability problems. Spin-transfer torque magnetic RAM (STT-MRAM) is a novel nonvolatile memory technology that has emerged as a promising alternative for constructing on-chip caches in high-end mobile processors. STT-MRAM has many advantages, such as short read latency, zero leakage from the memory cell, and better scalability than eDRAM and SRAM. Multilevel cell (MLC) STT-MRAM further enlarges capacity and reduces per-bit cost by storing more bits in one cell.

However, MLC STT-MRAM has a long write latency, which limits the effectiveness of MLC STT-MRAM-based LLCs. In this article, we address this limitation with three novel designs: line pairing (LP), line swapping (LS), and dynamic LP/LS enabler (DLE). LP forms fast cache lines by reorganizing MLC soft bits, which are faster to write. LS dynamically stores frequently written data into these fast cache lines. DLE then enables LP and LS only when they help to improve overall cache performance. Our experimental results show that the proposed designs improve system performance by 9–15% and reduce energy consumption by 14–21% for various types of mobile processors.
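To make the interplay of LP, LS, and DLE more concrete, the following toy latency model may help. It is only a sketch under stated assumptions and not the article's implementation: the cycle counts, the swap threshold, the swap cost, and the names `LinePair`, `total_latency`, and `dle_enable` are all hypothetical, and the abstract does not say how DLE actually makes its decision. The model pairs two logical lines (one held in fast-to-write soft bits, one in the slower bits), swaps a line into the fast slot once it has been written often enough, and keeps swapping enabled only if it lowers total write latency on a sample trace.

```python
from dataclasses import dataclass

# Every number here is an illustrative assumption, not a value from the article.
FAST_WRITE = 10                       # cycles to write the line held in soft bits
SLOW_WRITE = 30                       # cycles to write the line held in the slower bits
SWAP_THRESHOLD = 4                    # slow-line writes that trigger a swap
SWAP_COST = FAST_WRITE + SLOW_WRITE   # assumed cost of rewriting both lines


@dataclass
class LinePair:
    """Two logical lines sharing one group of MLC cells (line pairing)."""
    fast_tag: str          # line currently stored in the fast slot
    slow_tag: str          # line currently stored in the slow slot
    slow_writes: int = 0   # writes to the slow line since the last swap

    def write(self, tag: str, swapping: bool = True) -> int:
        """Return the modeled latency of one write to `tag`."""
        if tag == self.fast_tag:
            return FAST_WRITE
        if tag != self.slow_tag:
            raise KeyError(tag)
        self.slow_writes += 1
        if swapping and self.slow_writes >= SWAP_THRESHOLD:
            # Line swapping: move the frequently written line into the fast slot.
            self.fast_tag, self.slow_tag = self.slow_tag, self.fast_tag
            self.slow_writes = 0
            return SLOW_WRITE + SWAP_COST
        return SLOW_WRITE


def total_latency(trace, swapping: bool) -> int:
    """Sum write latencies over a trace of tags, starting with A fast and B slow."""
    pair = LinePair(fast_tag="A", slow_tag="B")
    return sum(pair.write(tag, swapping) for tag in trace)


def dle_enable(sample_trace) -> bool:
    """Hypothetical enabler: keep swapping only if it helps on the sample."""
    return total_latency(sample_trace, True) < total_latency(sample_trace, False)


if __name__ == "__main__":
    hot_slow_line = ["B"] * 10            # one line written repeatedly
    shifting = ["B"] * 4 + ["A"] * 4      # hot line changes right after a swap
    print(dle_enable(hot_slow_line), dle_enable(shifting))
```

In this model a steadily hot line benefits from being swapped into the fast slot, while a trace whose hot line changes right after each swap pays the swap cost without recouping it, which is the kind of case where a dynamic enabler would turn the mechanism off.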

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published in
ACM Transactions on Design Automation of Electronic Systems Volume 20, Issue 4
Special Issue on Reliable, Resilient, and Robust Design of Circuits and Systems
September 2015
475 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2830627
Editor: Naehyuck Chang, Korea Advanced Institute of Science and Technology, Korea
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 28 September 2015
- Revised: 1 April 2015
- Received: 1 January 2015
Author Tags
Spin-transfer torque
magnetic random access memory
multilevel cell
Qualifiers
- research-article
- Research
- Refereed