Volatile STT-RAM Scratchpad Design and Data Allocation for Low Energy

Published: 08 December 2014

Abstract

On-chip power consumption is one of the fundamental challenges of current technology scaling. Cache memories consume a sizable part of this power, particularly due to leakage energy. STT-RAM is one of several new memory technologies that have been proposed to reduce power consumption while preserving performance. It features high density and low leakage, but at the cost of higher write energy and lower write performance. This article explores the use of STT-RAM-based scratchpad memories that trade nonvolatility for faster and less energy-intensive accesses, making them feasible for on-chip implementation in embedded systems. A novel multiretention scratchpad partitioning is proposed, featuring multiple storage spaces with different retention, energy, and performance characteristics. A customized compiler-based allocation algorithm suitable for use with such a scratchpad organization is described. Our experiments indicate that a multiretention STT-RAM scratchpad can provide energy savings of 53% with respect to an iso-area, hardware-managed SRAM cache.

Published in

ACM Transactions on Architecture and Code Optimization, Volume 11, Issue 4
January 2015
797 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/2695583

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 8 December 2014
• Accepted: 1 September 2014
• Revised: 1 July 2014
• Received: 1 April 2014
