Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy

Authors:
Gabriel Rodríguez, Universidade da Coruña, Spain
Juan Touriño, Universidade da Coruña, Spain
Mahmut T. Kandemir, Pennsylvania State University, University Park, PA

ACM Transactions on Architecture and Code Optimization, Volume 11, Issue 4, Article No. 38, pp. 1–26. https://doi.org/10.1145/2669556

Published: 08 December 2014

Abstract

On-chip power consumption is one of the fundamental challenges of current technology scaling. Cache memories consume a sizable part of this power, particularly due to leakage energy. STT-RAM is one of several new memory technologies that have been proposed to reduce power consumption while preserving performance. It features high density and low leakage, but at the expense of higher write energy and lower write performance. This article explores the use of STT-RAM-based scratchpad memories that trade nonvolatility for faster and less energy-intensive accesses, making them feasible for on-chip implementation in embedded systems. A novel multiretention scratchpad partitioning is proposed, featuring multiple storage spaces with different retention, energy, and performance characteristics. A customized compiler-based allocation algorithm suitable for use with such a scratchpad organization is described. Our experiments indicate that a multiretention STT-RAM scratchpad can provide energy savings of 53% with respect to an iso-area, hardware-managed SRAM cache.

Index Terms

Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
ACM Transactions on Architecture and Code Optimization Volume 11, Issue 4
January 2015
797 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2695583
Editor: Koen De Bosschere, Ghent University
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 December 2014
- Accepted: 1 September 2014
- Revised: 1 July 2014
- Received: 1 April 2014
Author Tags
Relaxed-retention
STT-RAM
Scratchpad
Qualifiers
- research-article
- Research
- Refereed