DOI: 10.1145/1815961.1815983
research-article

Rethinking DRAM design and organization for energy-constrained multi-cores

Published: 19 June 2010

ABSTRACT

DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bit-lines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric.

This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bit-line Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bit-lines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (54% on average) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
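The checksum-plus-RAID idea in the third innovation can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the checksum function, the stripe width, or the parity layout, so CRC-32, a 64-byte cache line, eight data chips, a single XOR parity line, and the helper names `write_stripe` and `read_line` are all assumptions made for illustration. The sketch only shows the general pattern: a per-line checksum detects that one chip returned bad data, and parity across the other chips rebuilds it.

```python
import zlib
from functools import reduce

LINE_BYTES = 64   # assumed cache-line size (not stated in the abstract)
DATA_CHIPS = 8    # assumed number of data chips in one parity stripe


def checksum(line: bytes) -> int:
    """Per-line error-detection code; CRC-32 is a stand-in, not the paper's choice."""
    return zlib.crc32(line)


def xor_lines(lines):
    """Byte-wise XOR of equally sized lines (RAID-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*lines))


def write_stripe(lines):
    """Store each line with its checksum and compute one parity line for the stripe."""
    stored = [(line, checksum(line)) for line in lines]
    parity = xor_lines(lines)
    return stored, parity


def read_line(stored, parity, idx):
    """Return line idx, rebuilding it from the other lines and parity on a checksum mismatch."""
    line, expected = stored[idx]
    if checksum(line) == expected:
        return line  # common case: only the one chip holding the line is read
    # Detected an error: XOR the surviving lines with the parity line to recover the data.
    others = [l for i, (l, _) in enumerate(stored) if i != idx]
    return xor_lines(others + [parity])


# Usage: corrupt one line and recover it.
lines = [bytes([i] * LINE_BYTES) for i in range(DATA_CHIPS)]
stored, parity = write_stripe(lines)
bad = bytes([lines[3][0] ^ 0xFF]) + lines[3][1:]   # flip the bits of the first byte
stored[3] = (bad, stored[3][1])
recovered = read_line(stored, parity, 3)
```

In this sketch the error-free read touches a single line, and the other lines and the parity are read only after a checksum mismatch. That is consistent with the abstract's point that detection happens per cache line and correction is layered on top, but the exact overheads depend on design details the abstract does not give.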

Published in

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010, 520 pages
ISBN: 9781450300537
DOI: 10.1145/1815961

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3 (ISCA '10)
June 2010, 508 pages
ISSN: 0163-5964
DOI: 10.1145/1816038

Copyright © 2010 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 543 of 3,203 submissions, 17%