ABSTRACT
Complexity in resource allocation grows dramatically as multiple cores and threads are implemented on Multicore Multi-threaded Microprocessors (MMMP). Variations in workload behavior escalate this complexity further. To support a dynamic, adaptive, and scalable operating system (OS) scheduling policy for MMMP, architectural strategies are proposed that construct linear models to capture workload behaviors and then schedule threads according to their resource demands. This paper describes the design in three steps. In the first step, we convert a static scheduling policy into a dynamic one that evaluates the thread mapping pattern at runtime. In the second step, we employ regression models to ensure that the scheduling policy can respond to the changing behaviors of threads during execution. In the final step, we limit the overhead of the proposed policy by adopting a heuristic approach, thus ensuring scalability with the exponential growth of core and thread counts. The experimental results validate our proposed model in terms of throughput, adaptability, and scalability. Compared with the baseline static approach, our phase-triggered scheduling policy achieves up to 29% speedup. We also provide a detailed study of the tradeoff between performance and overhead, which system architects can refer to when target systems and specific overheads are given.
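The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each thread's resource demand is sampled once per scheduling interval, fits an ordinary least-squares line to the recent samples to extrapolate the next interval's demand, and then places threads with a greedy load-balancing heuristic rather than enumerating every possible mapping. The function names (`fit_line`, `predict_demand`, `greedy_map`) and the single scalar "demand" metric are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # A single sample (or identical x values): fall back to the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_demand(history: Sequence[float]) -> float:
    """Extrapolate the next interval's demand from a thread's recent samples."""
    xs = list(range(len(history)))
    slope, intercept = fit_line(xs, history)
    # Demand cannot be negative, so clamp the extrapolated value at zero.
    return max(0.0, slope * len(history) + intercept)


def greedy_map(
    demands: Dict[str, float], n_cores: int, contexts_per_core: int
) -> List[List[str]]:
    """Assumed heuristic: place the most demanding threads first, each on
    the least-loaded core that still has a free hardware context."""
    if len(demands) > n_cores * contexts_per_core:
        raise ValueError("more threads than hardware contexts")
    cores: List[List[str]] = [[] for _ in range(n_cores)]
    load = [0.0] * n_cores
    for tid in sorted(demands, key=lambda t: (-demands[t], t)):
        free = [c for c in range(n_cores) if len(cores[c]) < contexts_per_core]
        target = min(free, key=lambda c: (load[c], c))
        cores[target].append(tid)
        load[target] += demands[tid]
    return cores


if __name__ == "__main__":
    # Hypothetical per-interval demand samples for four threads.
    histories = {
        "t0": [1.0, 3.0, 5.0],
        "t1": [4.0, 4.0, 4.0],
        "t2": [6.0, 4.0, 2.0],
        "t3": [2.0, 2.5, 3.0],
    }
    predicted = {tid: predict_demand(h) for tid, h in histories.items()}
    print(greedy_map(predicted, n_cores=2, contexts_per_core=2))
```

The greedy pass sorts the threads once and then scans the cores for each thread, so its cost grows polynomially with the thread and core counts, whereas an exhaustive search over mappings grows combinatorially. A heuristic of this kind trades some mapping quality for that lower overhead.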
Index Terms
- Scheduling optimization in multicore multithreaded microprocessors through dynamic modeling