A Job Self-scheduling Policy for HPC Infrastructures

Guim, Francesc; Corbalan, Julita

doi:10.1007/978-3-540-78699-3_4

A Job Self-scheduling Policy for HPC Infrastructures

Francesc Guim¹ &
Julita Corbalan¹

Conference paper

499 Accesses
7 Citations

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4942))

Abstract

The number of distributed high performance computing architectures has increased exponentially these last years. Thus, systems composed by several computational resources provided by different Research centers and Universities have become very popular. Job scheduling policies have been adapted to these new scenarios in which several independent resources have to be managed. New policies have been designed to take into account issues like multi-cluster environments, heterogeneous systems and the geographical distribution of the resources.

Several centralized scheduling solutions have been proposed in the literature for these environments, such as centralized schedulers, centralized queues and global controllers. These approaches use a unique scheduling entity responsible for scheduling all the jobs that are submitted by the users.

In this paper we propose the usage of self-scheduling techniques for dispatching the jobs that are submitted to a set of distributed computational hosts that are managed by independent schedulers (such as MOAB or LoadLeveler). It is a non-centralized and job-guided scheduling policy whose main goal is to optimize the job wait time. Thus, the scheduling decisions are done independently for each job instead of using a global policy where all the jobs are considered. On top of this, as a part of the proposed solution, we also demonstrate how the usage of job wait time prediction techniques can substantially improve the performance obtained in the described architecture.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Bansal, N., Harchol-Balter, M.: Analysis of SRPT scheduling: investigating unfairness (2001)
Google Scholar
Berman, F., Wolski, R.: The apples project: A status report (1997)
Google Scholar
Berman, F., Wolski, R.: Scheduling from the perspective of the application. pp. 100–111 (1996)
Google Scholar
Calzarossa, M., Haring, G., Kotsis, G., Merlo, A., Tessera, D.: A hierarchical approach to workload characterization for parallel systems. In: Hertzberger, B., Serazzi, G. (eds.) HPCN-Europe 1995. LNCS, vol. 919, pp. 102–109. Springer, Heidelberg (1995)
Chapter Google Scholar
Calzarossa, M., Massari, L., Tessera, D.: Workload characterization issues and methodologies. In: Reiser, M., Haring, G., Lindemann, C. (eds.) Performance Evaluation: Origins and Directions. LNCS, vol. 1769, pp. 459–482. Springer, Heidelberg (2000)
Chapter Google Scholar
Chiang, S.-H., Arpaci-Dusseau, A.C., Vernon, M.K.: The impact of more accurate requested runtimes on production job scheduling performance. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 103–127. Springer, Heidelberg (2002)
Chapter Google Scholar
Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: 4th Ann. Workshop Workload Characterization (2001)
Google Scholar
Cirne, W., Berman, F.: A model for moldable supercomputer jobs. In: 15th Intl. Parallel and Distributed Processing Symp. (2001)
Google Scholar
Downey, A.B.: A parallel workload model and its implications for processor allocation. In: 6th Intl. Symp. High Performance Distributed Comput (August 1997)
Google Scholar
Downey, A.B.: Using queue time predictions for processor allocation. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1997 and JSSPP 1997. LNCS, vol. 1291, pp. 35–57. Springer, Heidelberg (1997)
Google Scholar
Ernemann, C., Hamscher, V., Yahyapour, R.: Benefits of global grid computing for job scheduling. In: 5th IEEE/ACM International Workshop on Grid Computing (2004)
Google Scholar
Feitelson, D.G.: Packing schemes for gang scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1996 and JSSPP 1996. LNCS, vol. 1162, pp. 89–110. Springer, Heidelberg (1996)
Chapter Google Scholar
Feitelson, D.D.G.: Parallel workload archive (2006)
Google Scholar
Feitelson, D.G.: Workload modeling for performance evaluation. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 114–141. Springer, Heidelberg (2002)
Chapter Google Scholar
Feitelson, D.G., Nitzberg, B.: Job characteristics of a production parallel scientific workload on the nasa ames ipsc/860. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 337–360. Springer, Heidelberg (1995)
Google Scholar
Feitelson, D.G., Rudolph, L.: Workload evolution on the cornell theory center ibm sp2. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1996 and JSSPP 1996. LNCS, vol. 1162, pp. 27–40. Springer, Heidelberg (1996)
Google Scholar
Feitelson, D.G., Rudolph, L.: Metrics and benchmarking for parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1998, SPDP-WS 1998, and JSSPP 1998. LNCS, vol. 1459, pp. 1–24. Springer, Heidelberg (1998)
Chapter Google Scholar
Feitelson, D.G., Rudolph, L., Schwiegelshohn, U.: Parallel job scheduling - a status report. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, p. 9. Springer, Heidelberg (2005)
Google Scholar
Feitelson, D.G., Weil, A.: Utilization and predictability in scheduling the ibm sp2 with backfilling. In: Proceedings of the 12th International Parallel Processing Symposium, pp. 542–546 (1998)
Google Scholar
Foster, I., Kesselman, C.: Globus: A metacomputing infrastructure toolkit. J Intl - International Journal of Supercomputer Applications (1997)
Google Scholar
Foster, I., Kesselman, C., Tuecke, S.: The anatomy of the Grid: Enabling scalable virtual organizations. In: Sakellariou, R., Keane, J.A., Gurd, J.R., Freeman, L. (eds.) Euro-Par 2001. LNCS, vol. 2150, Springer, Heidelberg (2001)
Chapter Google Scholar
Gerald, S., Rajkumar, K., Arun, R., Ponnuswamy, S.: Scheduling of parallel jobs in a heterogeneous multi-site environment. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, Springer, Heidelberg (2003)
Google Scholar
Grimshaw, A.S., Wulf, W.A., French, J.C., Weaver, A.C., Reynolds Jr, P.F.: Legion: The next logical step toward a nationwide virtual computer (CS-94-21), 8 (1994)
Google Scholar
Guim, F., Corbalan, J., Labarta, J.: The internals of the alvio-simulator: Simulator of hpc infraestructures (upc-dac-rr-cap-2007-2). Technical report, Architecture Computer Deparment - Technical University of Catalunya (2005)
Google Scholar
Guim, F., Corbalan, J., Labarta, J.: Modeling the impact of resource sharing in backfilling policies using the alvio simulator. In: 15th Annual Meeting of the IEEE / ACM International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (submitted, 2007)
Google Scholar
Harchol-Balter, M., Crovella, M.E., Murta, C.D.: On choosing a task assignment policy for a distributed server system. Journal of Parallel and Distributed Computing 59(2), 204–228 (1999)
Article Google Scholar
Windisch, V.L.K., Moore, R., Feitelson, D., Nitzberg, B.: A comparison of workload traces from two production parallel machines. In: 6th Symp. Frontiers Massively Parallel Comput, pp. 319–326 (1996)
Google Scholar
Lawson, B.G., Smirni, E.: Multiple-Queue Backfilling Scheduling with Priorities and Reservations for Parallel Systems. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 72–87. Springer, Heidelberg (2002)
Chapter Google Scholar
Li, H., Chen, J., Tao, Y., Groep, D., Wolters, L.: Improving a local learning technique for queue wait time predictions. Cluster and Grid computing (2006)
Google Scholar
Pinchak, C., Lu, P., Goldenberg, M.: Practical heterogeneous placeholder scheduling in overlay metacomputers: Early experiences. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 205–228. Springer, Heidelberg (2002)
Chapter Google Scholar
Schroeder, B., Harchol-Balter, M.: Evaluation of task assignment policies for supercomputing servers: The case for load unbalancing and fairness. Cluster Computing 2004 (2004)
Google Scholar
Sevcik, K.C.: Application scheduling and processor allocation in multiprogrammed parallel processing systems. Performance Evaluation, 107–140 (1994)
Google Scholar
Shmueli, E., Feitelson, D.G.: Backfilling with Lookahead to Optimize the Performance of Parallel Job Scheduling. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 228–251. Springer, Heidelberg (2003)
Chapter Google Scholar
Skovira, J., Chan, W., Zhou, H., Lifka, D.A.: The EASY - LoadLeveler API Project. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1996 and JSSPP 1996. LNCS, vol. 1162, pp. 41–47. Springer, Heidelberg (1996)
Chapter Google Scholar
Smith, W., Taylor, V.E., Foster, I.T.: Using run-time predictions to estimate queue wait times and improve scheduler performance. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999, IPPS-WS 1999, and SPDP-WS 1999. LNCS, vol. 1659, pp. 202–219. Springer, Heidelberg (1999)
Chapter Google Scholar
Smith, W., Wong, P.: Resource selection using execution and queue wait time. predictions, p. 7
Google Scholar
Talby, D., Feitelson, D.: Supporting priorities and improving utilization of the ibm sp scheduler using slack-based backfilling. In: Parallel Processing Symposium, pp. 513–517 (1999)
Google Scholar
Tsafrir, D., Feitelson, D.G.: Instability in parallel job scheduling simulation: the role of workload flurries. In: 20th Intl. Parallel and Distributed Processing Symp. (2006)
Google Scholar
Yue, J.: Global Backfilling Scheduling in Multiclusters. In: Manandhar, S., Austin, J., Desai, U., Oyanagi, Y., Talukder, A.K. (eds.) AACC 2004. LNCS, vol. 3285, pp. 232–239. Springer, Heidelberg (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

Barcelona Supercomputing Center,
Francesc Guim & Julita Corbalan

Authors

Francesc Guim
View author publications
You can also search for this author in PubMed Google Scholar
Julita Corbalan
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Eitan Frachtenberg Uwe Schwiegelshohn

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Guim, F., Corbalan, J. (2008). A Job Self-scheduling Policy for HPC Infrastructures. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2007. Lecture Notes in Computer Science, vol 4942. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78699-3_4

Download citation

DOI: https://doi.org/10.1007/978-3-540-78699-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78698-6
Online ISBN: 978-3-540-78699-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics