Abstract
On-chip power consumption is one of the fundamental challenges of current technology scaling. Cache memories consume a sizable part of this power, particularly due to leakage energy. STT-RAM is one of several new memory technologies that have been proposed to reduce power consumption while preserving performance. It features high density and low leakage, but at the cost of higher write energy and lower write performance. This article explores the use of STT-RAM-based scratchpad memories that trade nonvolatility for faster and less energy-intensive accesses, making them feasible for on-chip implementation in embedded systems. A novel multiretention scratchpad partitioning is proposed, featuring multiple storage spaces with different retention, energy, and performance characteristics. A customized compiler-based allocation algorithm suitable for use with such a scratchpad organization is described. Our experiments indicate that a multiretention STT-RAM scratchpad can provide energy savings of 53% with respect to an iso-area, hardware-managed SRAM cache.
Index Terms
- Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy