A Job Self-scheduling Policy for HPC Infrastructures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4942)

Abstract

The number of distributed high-performance computing architectures has increased exponentially in recent years. As a result, systems composed of several computational resources provided by different research centers and universities have become very popular. Job scheduling policies have been adapted to these new scenarios, in which several independent resources have to be managed. New policies have been designed to take into account issues such as multi-cluster environments, heterogeneous systems and the geographical distribution of the resources.

Several centralized scheduling solutions have been proposed in the literature for these environments, such as centralized schedulers, centralized queues and global controllers. These approaches rely on a single scheduling entity that is responsible for scheduling all the jobs submitted by the users.

In this paper we propose the use of self-scheduling techniques for dispatching the jobs that are submitted to a set of distributed computational hosts managed by independent schedulers (such as MOAB or LoadLeveler). It is a non-centralized, job-guided scheduling policy whose main goal is to optimize job wait time. Thus, scheduling decisions are made independently for each job, rather than by a global policy in which all the jobs are considered. In addition, as part of the proposed solution, we demonstrate how job wait time prediction techniques can substantially improve the performance obtained in the described architecture.
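The job-guided idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Job` and `Host` classes, the `dispatch` function and the backlog-based wait time predictor are all hypothetical names and simplifications introduced here only to show how a decision can be taken per job, using a predicted wait time from each independently managed host, without any global scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    processors: int
    runtime: float  # user-requested runtime, in seconds


@dataclass
class Host:
    """Hypothetical stand-in for a host managed by its own local scheduler."""
    name: str
    processors: int
    backlog: float = 0.0  # queued work, in processor-seconds

    def predicted_wait(self, job: Job) -> float:
        # Toy predictor (an assumption, not the paper's technique):
        # time needed to drain the backlog at full capacity.
        return self.backlog / self.processors

    def submit(self, job: Job) -> None:
        # Record the work the job adds to this host's queue.
        self.backlog += job.processors * job.runtime


def dispatch(job: Job, hosts: List[Host]) -> Host:
    """Pick, for this job alone, the eligible host with the lowest predicted wait."""
    eligible = [h for h in hosts if h.processors >= job.processors]
    if not eligible:
        raise ValueError(f"no host can run job {job.name!r}")
    best = min(eligible, key=lambda h: h.predicted_wait(job))
    best.submit(job)
    return best


if __name__ == "__main__":
    hosts = [Host("a", 64, backlog=6400.0), Host("b", 128, backlog=25600.0)]
    for job in (Job("j1", 32, 100.0), Job("j2", 100, 10.0)):
        print(job.name, "->", dispatch(job, hosts).name)
```

In this sketch each job only asks the hosts for an estimate and then submits itself to the best candidate, so the quality of the outcome depends directly on how accurate the wait time prediction is.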

Editor information

Eitan Frachtenberg Uwe Schwiegelshohn

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guim, F., Corbalan, J. (2008). A Job Self-scheduling Policy for HPC Infrastructures. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2007. Lecture Notes in Computer Science, vol 4942. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78699-3_4

  • DOI: https://doi.org/10.1007/978-3-540-78699-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78698-6

  • Online ISBN: 978-3-540-78699-3

  • eBook Packages: Computer Science, Computer Science (R0)
