research-article

Constructing Large and Fast On-Chip Cache for Mobile Processors with Multilevel Cell STT-MRAM Technology

Published: 28 September 2015

Abstract

Modern mobile processors that integrate an increasing number of cores into a single chip demand large-capacity, on-chip, last-level caches (LLCs) in order to achieve scalable performance improvements. However, adopting traditional memory technologies such as SRAM and embedded DRAM (eDRAM) introduces leakage and scalability problems. Spin-transfer torque magnetic RAM (STT-MRAM) is a novel nonvolatile memory technology that has emerged as a promising alternative for constructing on-chip caches in high-end mobile processors. STT-MRAM has many advantages, such as short read latency, zero leakage from the memory cell, and better scalability than eDRAM and SRAM. Multilevel cell (MLC) STT-MRAM further enlarges capacity and reduces per-bit cost by storing more bits in one cell.

However, MLC STT-MRAM has a long write latency, which limits the effectiveness of MLC STT-MRAM-based LLCs. In this article, we address this limitation with three novel designs: line pairing (LP), line swapping (LS), and a dynamic LP/LS enabler (DLE). LP forms fast cache lines by reorganizing MLC soft bits, which are faster to write. LS dynamically stores frequently written data in these fast cache lines. DLE enables LP and LS only when they improve overall cache performance. Our experimental results show that the proposed designs improve system performance by 9--15% and reduce energy consumption by 14--21% for various types of mobile processors.
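To make the interplay of the three designs concrete, the following is a minimal Python sketch of a toy latency model. It is not the authors' implementation. It assumes that LP yields one line mapped onto soft bits (fast to write) and one mapped onto the remaining bits (slow to write), that LS promotes a line after a fixed number of slow writes, and that the DLE-style decision reduces to comparing modeled write latency with and without swapping. The cycle counts, the swap threshold, and all names (`PairedLines`, `total_latency`, `dle_should_enable`) are hypothetical, and the cost of the swap itself is not modeled.

```python
from dataclasses import dataclass

# Hypothetical write latencies in cycles; illustrative only, not from the article.
SOFT_WRITE_CYCLES = 10   # line mapped onto MLC soft bits (fast)
HARD_WRITE_CYCLES = 30   # line mapped onto the remaining bits (slow)
SWAP_THRESHOLD = 2       # hypothetical: slow writes before a swap is triggered


@dataclass
class PairedLines:
    """Two logical cache lines sharing one group of MLC cells (LP sketch)."""
    fast_tag: str          # line currently held in the fast slot
    slow_tag: str          # line currently held in the slow slot
    slow_writes: int = 0   # writes seen by the slow line since the last swap

    def write(self, tag: str, swapping: bool = True) -> int:
        """Return the modeled write latency for the line named `tag`."""
        if tag == self.fast_tag:
            return SOFT_WRITE_CYCLES
        if tag != self.slow_tag:
            raise KeyError(tag)
        self.slow_writes += 1
        if swapping and self.slow_writes >= SWAP_THRESHOLD:
            # LS sketch: move the frequently written line into the fast slot.
            self.fast_tag, self.slow_tag = self.slow_tag, self.fast_tag
            self.slow_writes = 0
        return HARD_WRITE_CYCLES


def total_latency(trace, swapping: bool) -> int:
    """Sum modeled write latency over a trace of line tags ("A" or "B")."""
    pair = PairedLines(fast_tag="A", slow_tag="B")
    return sum(pair.write(tag, swapping) for tag in trace)


def dle_should_enable(trace) -> bool:
    """DLE sketch: enable swapping only if it lowers the modeled latency."""
    return total_latency(trace, True) < total_latency(trace, False)


if __name__ == "__main__":
    hot_slow_line = ["B"] * 6            # one line written repeatedly
    ping_pong = ["B", "B", "A", "A"]     # writes alternate between the lines
    print(dle_should_enable(hot_slow_line))
    print(dle_should_enable(ping_pong))
```

In this toy model, swapping pays off when one line dominates the writes and backfires when the write pattern alternates between the paired lines, which is the kind of case a dynamic enabler is meant to detect.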


    • Published in

      ACM Transactions on Design Automation of Electronic Systems  Volume 20, Issue 4
      Special Issue on Reliable, Resilient, and Robust Design of Circuits and Systems
      September 2015
      475 pages
      ISSN:1084-4309
      EISSN:1557-7309
      DOI:10.1145/2830627
      • Editor: Naehyuck Chang

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 September 2015
      • Revised: 1 April 2015
      • Received: 1 January 2015

      Qualifiers

      • research-article
      • Research
      • Refereed
