ABSTRACT
A primary use of chip-multiprocessor (CMP) systems is to speed up a single application by exploiting thread-level parallelism. In such systems, threads may slow each other down by issuing memory requests that interfere in the shared memory subsystem. This inter-thread memory system interference can significantly degrade parallel application performance. Better memory request scheduling may mitigate such performance degradation. However, previously proposed memory scheduling algorithms for CMPs are designed for multi-programmed workloads where each core runs an independent application, and thus do not take into account the inter-dependent nature of threads in a parallel application.
In this paper, we propose a memory scheduling algorithm designed specifically for parallel applications. Our approach has two main components, targeting the two common synchronization primitives that cause inter-dependence of threads: locks and barriers. First, the runtime system estimates the set of limiter threads, the threads holding the locks that cause the most serialization, and the memory scheduler prioritizes their requests. Second, the memory scheduler shuffles thread priorities to reduce the time threads take to reach the barrier. We show that our memory scheduler speeds up a set of memory-intensive parallel applications by 12.6% compared to the best previous memory scheduling technique.
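The two mechanisms described above, limiter-thread estimation from lock serialization and barrier-oriented priority shuffling, can be sketched as follows. This is a minimal software illustration, not the paper's actual runtime/hardware interface: the class names, the wait-time ranking, the `top_k` cutoff, and the per-interval round-robin rotation are all illustrative assumptions.

```python
from collections import defaultdict

class LimiterEstimator:
    """Estimates the set of limiter threads: the holders of the locks that
    cause the most serialization, measured here as accumulated wait time.
    The counters and the top_k cutoff are illustrative assumptions."""

    def __init__(self, top_k=1):
        self.wait_time = defaultdict(float)  # lock id -> total cycles spent waiting
        self.holder = {}                     # lock id -> thread currently holding it
        self.top_k = top_k

    def record_wait(self, lock_id, cycles):
        self.wait_time[lock_id] += cycles

    def record_acquire(self, lock_id, thread_id):
        self.holder[lock_id] = thread_id

    def limiter_threads(self):
        # Rank locks by the serialization they cause; the holders of the
        # top_k most contended locks form the limiter set.
        ranked = sorted(self.wait_time, key=self.wait_time.get, reverse=True)
        return {self.holder[l] for l in ranked[:self.top_k] if l in self.holder}


class PrioritySchedule:
    """Memory-scheduler priority ordering: limiter threads come first, and
    the remaining threads follow in a rotated order that changes every
    scheduling interval, so no single non-limiter thread consistently lags
    behind on its way to the next barrier."""

    def __init__(self, threads, estimator):
        self.threads = list(threads)
        self.estimator = estimator
        self.shift = 0  # advances once per scheduling interval

    def priority_order(self):
        limiters = self.estimator.limiter_threads()
        others = [t for t in self.threads if t not in limiters]
        if others:
            k = self.shift % len(others)
            others = others[k:] + others[:k]  # round-robin shuffle
        self.shift += 1
        return [t for t in self.threads if t in limiters] + others
```

For example, if thread 2 holds the lock with the most accumulated wait time, successive scheduling intervals yield orderings like `[2, 0, 1, 3]` then `[2, 1, 3, 0]`: thread 2 stays prioritized while the others take turns at the head of the remaining order.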