Scheduling optimization in multicore multithreaded microprocessors through dynamic modeling

Authors:
Lichen Weng, Florida International University, Miami, FL
Chen Liu, Clarkson University, Potsdam, NY
Jean-Luc Gaudiot, University of California, Irvine, Irvine, CA
CF '13: Proceedings of the ACM International Conference on Computing Frontiers, May 2013, Article No. 5, Pages 1–10. https://doi.org/10.1145/2482767.2482774

Published: 14 May 2013

ABSTRACT

The complexity of resource allocation grows dramatically as more cores and hardware threads are implemented on Multicore Multithreaded Microprocessors (MMMP), and variations in workload behavior escalate it further. To support a dynamic, adaptive, and scalable operating system (OS) scheduling policy for MMMP, we propose architectural strategies that construct linear models to capture workload behavior and then schedule threads according to their resource demands. This paper describes the design in three steps. In the first step, we convert a static scheduling policy into a dynamic one that evaluates the thread mapping pattern at runtime. In the second step, we employ regression models so that the scheduling policy can respond to the changing behavior of threads during execution. In the final step, we limit the overhead of the proposed policy by adopting a heuristic approach, thus ensuring that it scales with the exponential growth of core and thread counts. The experimental results validate the proposed model in terms of throughput, adaptability, and scalability. Compared with the baseline static approach, our phase-triggered scheduling policy achieves up to 29% speedup. We also provide a detailed study of the tradeoff between performance and overhead, which system architects can consult once the target system and its specific overheads are known.
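
The abstract does not give the form of the linear models or the heuristic itself, so the following is only a minimal sketch of the general idea under stated assumptions: each thread reports one hypothetical per-interval demand sample (for example, a cache-miss count), a one-variable least-squares line extrapolates the next interval's demand, and a simple greedy rule pairs the most demanding thread with the least demanding one on each two-way multithreaded core. The function names, the demand metric, and the pairing rule are all illustrative assumptions and are not taken from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # a single sample: fall back to a flat line
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict_demand(history):
    """Extrapolate the next interval's demand from past samples.

    `history` is a hypothetical list of per-interval demand samples.
    """
    xs = list(range(len(history)))
    a, b = fit_line(xs, history)
    return a + b * len(history)


def pair_threads(demands):
    """Greedy mapping onto two-way multithreaded cores (assumed rule).

    Sorts thread ids by predicted demand and co-schedules the lowest
    with the highest, so no core hosts two heavy threads at once.
    """
    order = sorted(demands, key=demands.get)  # lowest demand first
    cores = []
    while len(order) >= 2:
        cores.append((order.pop(0), order.pop()))
    if order:  # odd thread count: last thread runs alone
        cores.append((order.pop(),))
    return cores


if __name__ == "__main__":
    samples = {
        "t0": [1, 2, 3, 4],   # rising demand
        "t1": [8, 8, 8, 8],   # steady, heavy
        "t2": [5, 4, 3, 2],   # falling demand
        "t3": [2, 2, 2, 2],   # steady, light
    }
    predicted = {tid: predict_demand(h) for tid, h in samples.items()}
    print(pair_threads(predicted))
```

Sorting costs O(n log n) in the number of threads, which is the kind of saving a heuristic offers over enumerating every possible thread-to-core mapping; the paper's actual heuristic may differ.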

Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published in
CF '13: Proceedings of the ACM International Conference on Computing Frontiers
May 2013
302 pages
ISBN: 9781450320535
DOI: 10.1145/2482767
General Chairs: Hubertus Franke (IBM, US), Alexander Heinecke (TU München, DE)
Program Chairs: Krishna Palem (Rice University, US and Nanyang Technological University, SG), Eli Upfal (Brown University)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
adaptive thread scheduling
execution phases
operating system scheduler
Qualifiers
- research-article
Acceptance Rates
CF '13 paper acceptance rate: 26 of 49 submissions, 53%. Overall acceptance rate: 240 of 680 submissions, 35%.