Abstract
Simultaneous Multithreading (SMT) machines fetch and execute instructions from multiple instruction streams to increase system utilization and speed up the execution of jobs. When there are more jobs in the system than there is hardware to support simultaneous execution, the operating system scheduler must choose the set of jobs to coschedule.

This paper demonstrates that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler. Thus, the full benefits of SMT hardware can only be achieved if the scheduler is aware of thread interactions. Here, a mechanism is presented that allows the scheduler to significantly raise the performance of SMT architectures. This is done without any advance knowledge of a workload's characteristics, using sampling to identify jobs that run well together.

We demonstrate an SMT jobscheduler called SOS. SOS combines an overhead-free sample phase, which collects information about various possible schedules, and a symbiosis phase, which uses that information to predict which schedule will provide the best performance. We show that a small sample of the possible schedules is sufficient to identify a good schedule quickly. On a system with random job arrivals and departures, response time is improved by as much as 17% over a schedule that does not incorporate symbiosis.
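The two phases described above can be illustrated with a minimal Python sketch. It is not the SOS implementation: the abstract does not say what information the sample phase collects or how the symbiosis phase makes its prediction, so the scoring function `toy_measure`, the job table `JOB_KIND`, and all other names here are hypothetical. The sketch also simplifies a "schedule" to a single set of jobs run together, and assumes a higher score means the jobs run better together.

```python
import itertools
import random

# Hypothetical workload: each job is tagged with the resource it stresses most.
JOB_KIND = {"A": "fp", "B": "fp", "C": "int", "D": "mem"}


def toy_measure(coschedule):
    """Stand-in for a real measurement (an assumption, not from the paper).

    Scores a coschedule by how many distinct resource kinds its jobs stress,
    on the assumption that jobs competing for the same resource run worse.
    """
    return len({JOB_KIND[job] for job in coschedule})


def sample_phase(jobs, contexts, num_samples, measure, rng):
    """Score a small random subset of the possible coschedules."""
    candidates = list(itertools.combinations(sorted(jobs), contexts))
    sampled = rng.sample(candidates, min(num_samples, len(candidates)))
    return {coschedule: measure(coschedule) for coschedule in sampled}


def symbiosis_phase(scores):
    """Pick the sampled coschedule with the best recorded score."""
    return max(scores, key=scores.get)


# Sample 3 of the 6 possible two-job coschedules, then run the best one found.
rng = random.Random(0)
scores = sample_phase(JOB_KIND, contexts=2, num_samples=3,
                      measure=toy_measure, rng=rng)
best = symbiosis_phase(scores)
```

Only one of the six possible pairs in this toy workload (the two `fp` jobs) scores poorly, so any sample of three pairs contains a good one. A real scheduler would replace `toy_measure` with measurements taken while the jobs actually run.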
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX: Proceedings of the ninth international conference on Architectural support for programming languages and operating systems