ABSTRACT
DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bit-lines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric.
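The scale of overfetch can be estimated with simple arithmetic: the number of bit-lines opened by one row activation across a rank, divided by the bits in the cache line actually returned. The sketch below assumes a hypothetical rank of eight x8 chips, each opening an 8 Kbit (1 KB) row, serving a 64-byte cache line; these parameters are illustrative and are not taken from the paper.

```python
def overfetch_factor(chips_per_rank: int, row_bits_per_chip: int, cache_line_bytes: int) -> float:
    """Bit-lines activated by one row activation, divided by the bits returned."""
    activated_bits = chips_per_rank * row_bits_per_chip  # every chip in the rank opens a row
    useful_bits = cache_line_bytes * 8                   # only one cache line goes back to the CPU
    return activated_bits / useful_bits


# Hypothetical rank geometry (assumption, for illustration only).
CHIPS, ROW_BITS, LINE_BYTES = 8, 8192, 64
factor = overfetch_factor(CHIPS, ROW_BITS, LINE_BYTES)
print(f"{CHIPS * ROW_BITS} bit-lines activated for {LINE_BYTES * 8} useful bits")
print(f"overfetch factor: {factor:.0f}x")
```

Under these assumed numbers, only one activated bit in 128 ends up in the returned cache line; the rest of the activation energy is spent on data the CPU never requested.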
This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bit-line Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bit-lines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally reorganizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (54% on average) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture, where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
Rethinking DRAM design and organization for energy-constrained multi-cores. ISCA '10.