Symbiotic jobscheduling for a simultaneous multithreading processor

Published: 01 November 2000

Abstract

Simultaneous multithreading (SMT) machines fetch and execute instructions from multiple instruction streams to increase system utilization and speed up the execution of jobs. When there are more jobs in the system than there is hardware to support simultaneous execution, the operating system scheduler must choose the set of jobs to coschedule.

This paper demonstrates that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler. Thus, the full benefits of SMT hardware can only be achieved if the scheduler is aware of thread interactions. Here, a mechanism is presented that allows the scheduler to significantly raise the performance of SMT architectures. This is done without any advance knowledge of a workload's characteristics, using sampling to identify jobs that run well together.

We demonstrate an SMT jobscheduler called SOS. SOS combines an overhead-free sample phase, which collects information about various possible schedules, and a symbiosis phase, which uses that information to predict which schedule will provide the best performance. We show that a small sample of the possible schedules is sufficient to identify a good schedule quickly. On a system with random job arrivals and departures, response time is improved by as much as 17% over a schedule that does not incorporate symbiosis.
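
The two-phase idea in the abstract (sample a few candidate schedules, then keep running the one predicted to perform best) can be illustrated with a minimal Python sketch. This is not the SOS implementation: the names `random_schedule`, `pick_schedule`, `toy_measure`, and `KIND` are hypothetical, and the scoring function is an invented stand-in for whatever measurements a real scheduler would collect. The sketch also does not model the overhead-free property of the sample phase.

```python
import random
from typing import Callable, Sequence, Tuple

# A schedule is a sequence of coschedules; each coschedule is the set of
# jobs that run together on the hardware contexts during one timeslice.
Schedule = Tuple[Tuple[str, ...], ...]


def random_schedule(jobs: Sequence[str], contexts: int,
                    rng: random.Random) -> Schedule:
    """Shuffle the jobs and cut them into groups of at most `contexts` jobs."""
    order = list(jobs)
    rng.shuffle(order)
    return tuple(tuple(order[i:i + contexts])
                 for i in range(0, len(order), contexts))


def pick_schedule(jobs: Sequence[str], contexts: int,
                  measure: Callable[[Schedule], float],
                  samples: int, seed: int = 0) -> Tuple[Schedule, float]:
    """Sample a few random schedules and return the best-scoring one."""
    rng = random.Random(seed)
    # Sample phase: try a small number of candidate schedules and score each.
    candidates = [random_schedule(jobs, contexts, rng) for _ in range(samples)]
    scored = [(measure(s), s) for s in candidates]
    # Symbiosis phase: commit to the candidate with the highest score.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# ASSUMPTION: an invented workload in which each job stresses one kind of
# functional unit, and a coschedule scores higher when its jobs differ.
KIND = {"a": "fp", "b": "fp", "c": "int", "d": "int"}


def toy_measure(schedule: Schedule) -> float:
    """Made-up score: count the distinct resource kinds in each coschedule."""
    return float(sum(len({KIND[job] for job in group}) for group in schedule))


if __name__ == "__main__":
    best, score = pick_schedule(list(KIND), contexts=2,
                                measure=toy_measure, samples=50)
    print(best, score)
```

In this toy setting only a few distinct schedules exist, so sampling is not really needed. The point of sampling is that the number of possible schedules grows quickly with the number of jobs, which is why the abstract stresses that a small sample is sufficient.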

Published in

ACM SIGPLAN Notices, Volume 35, Issue 11
Nov. 2000
269 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/356989

Copyright © 2000 Authors

Publisher

Association for Computing Machinery
New York, NY, United States
