Distributed gang scheduling in networks of heterogenous workstations

doi:10.1016/S0140-3664(97)00020-0

Computer Communications

Volume 20, Issue 5, 1 July 1997, Pages 338-348

https://doi.org/10.1016/S0140-3664(97)00020-0

Abstract

The wide availability of workstation networks and the rapid evolution of workstation technology motivate the investigation of methods for harnessing the full power of such systems. Individual workstations are not usually effectively utilized by their owners, and owners may be willing to lend the processing power of their workstations if it is used in an unobtrusive way. The topic of this paper is the ability to effectively borrow the idle cycles of the workstations in a network and to efficiently schedule parallel application programs concurrently onto those idle workstations. We present a distributed scheduling algorithm that tracks the available workstations in a network, i.e. workstations not being used by their owners, and acts upon them by scheduling processes of parallel applications onto them. Our scheduling objectives are to minimize the average Turn Around Time (TAT) of the scheduled applications and to maintain fairness among scheduled applications by granting each application all the resources it requires. Moreover, scheduling solutions are narrowed to those that produce a responsive and scalable scheduling algorithm.

Section snippets

Introduction and background

Major attention is currently being given to concurrent processing via distributed systems [1, 2, 3, 4]. The feasibility of concurrent processing via distributed computing is supported by a variety of software systems [5] and the availability of a great number of problems that can be solved by distributed computing [6]. The Parallel Virtual Machine (PVM), P4, Linda and others [5] are examples of software systems that can be used to execute parallel application programs onto a distributed

Assumptions and terminology

In this section, we present and discuss the assumptions and the related terminology used in the scheduling algorithm.

The number of processes of an application or a job, or Virtual Processors (VPs) as we refer to them in this paper, defines the VP-count X. Scheduling in a heterogeneous environment should assign a number of VPs proportional to the processing power of each workstation. The term 'Siblings' is used to indicate all the concurrent VPs of a job. In the discussion of workstations scheduling,

The scheduling model

Interaction properties among the VPs of a job are factors that influence the scheduling strategies. Fine-grain applications that exhibit close cooperation among their VPs are shown to perform well if executed by gang scheduling [14]. Gang scheduling is a technique where all the VPs of an application are scheduled on the available processors. Multiple jobs are scheduled by time sharing the workstations' time. We consider scheduling fine-grain applications where the VPs exhibit close interactions
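As a rough illustration of the idea described above, the following Python sketch time-shares a set of workstations among several jobs so that, in each time slot, all the VPs of exactly one job run together. It is a generic round-robin sketch, not the scheduling model of this paper; the function name `gang_schedule`, the job names, and the workstation names are all hypothetical.

```python
from itertools import cycle, islice


def gang_schedule(jobs, num_slots):
    """Build a round-robin gang-scheduling timeline.

    jobs maps a (hypothetical) job name to the list of workstations that
    host its VPs. In each time slot one job owns all of its workstations,
    so all of its sibling VPs run at the same time.
    """
    timeline = []
    # cycle() repeats the job names in insertion order; islice() stops
    # after num_slots time slots.
    for slot, job in enumerate(islice(cycle(jobs), num_slots)):
        timeline.append((slot, job, tuple(jobs[job])))
    return timeline


if __name__ == "__main__":
    jobs = {"A": ["w1", "w2"], "B": ["w1", "w2", "w3"]}
    for slot, job, stations in gang_schedule(jobs, 4):
        print(f"slot {slot}: job {job} runs all VPs on {', '.join(stations)}")
```

The sketch gives every job an equal number of slots; a real scheduler would also have to handle workstation reclamation by owners and unequal slot lengths, which are not modeled here.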

Allocation algorithms

The allocation of an application to a set of workstations with unequal processing powers is performed in two steps. First, each workstation in the initial set is assigned a number of VPs proportional to its processing capacity such that the TAT of the application is minimized. Two algorithms are introduced to produce the optimal number of VPs per workstation for the minimal TAT: the Assignment Equation (AE) and the Rounding Algorithm (RA). Second, further optimization is
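The proportional first step can be illustrated with a short Python sketch. This is not the paper's Assignment Equation or Rounding Algorithm, whose details are not given in this excerpt; it swaps in the standard largest-remainder method to turn fractional shares into whole VP counts. The function names and the assumption that processing power is given as positive integer ratings are ours.

```python
from fractions import Fraction


def allocate_vps(vp_count, powers):
    """Split vp_count VPs across workstations in proportion to power.

    powers holds positive integer power ratings (an assumption of this
    sketch). Uses largest-remainder rounding, not the paper's AE/RA.
    """
    total = sum(powers)
    # Ideal fractional share of VPs for each workstation.
    shares = [Fraction(vp_count * p, total) for p in powers]
    counts = [int(s) for s in shares]  # round every share down first
    leftover = vp_count - sum(counts)
    # Give the remaining VPs to the largest fractional remainders.
    order = sorted(range(len(powers)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def relative_finish_time(counts, powers):
    """Finish time of the slowest workstation, in VP-units per power unit."""
    return max(Fraction(c, p) for c, p in zip(counts, powers))


if __name__ == "__main__":
    counts = allocate_vps(7, [1, 2, 3])
    print(counts, relative_finish_time(counts, [1, 2, 3]))
```

Because all sibling VPs of a job must finish before the job does, the slowest workstation bounds the job's completion, which is why `relative_finish_time` takes the maximum rather than the average. Largest-remainder rounding keeps the counts close to proportional but is not guaranteed to minimize that maximum in every case.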

Static allocation

In this section, we present algorithms corresponding to operations that are not restricted by workstation architecture. An example of such an operation is the spawn of a new job, which can be scheduled without architecture restrictions, i.e. statically. The spawn activation calls the Regular Pattern Search (RPS) algorithm, which is presented next.

Dynamic allocation

In this section, we present some allocation algorithms and Bin operations which consider architecture heterogeneity. First, we address the heterogeneity issue.

Comparison with related work

First we present some of the known distributed scheduling algorithms for distributed environments, then we compare them with the approach proposed in this paper.

The scheduling algorithm of Kremien et al. [12] subdivides the system into domains. Load information is exchanged only among processors in the same domain. Every node independently determines the nodes it includes in its domain. A domain includes all the nodes having opposite load status, i.e. over-loaded and under-loaded nodes. When an

Summary and conclusion

We presented a distributed scheduling algorithm that enables the concurrent execution of applications on networks of non-dedicated heterogeneous workstations. Such networked environments impose several requirements on a scheduling algorithm. The scheduling algorithm must be dynamic, since it executes applications in a non-dedicated environment. The algorithm must also take migrations into consideration, invoking them only at points where they are deemed to be cost-effective.

Acknowledgements

The first author acknowledges the comments and suggestions made by Jon Walpole and Steve Otto during the sabbatical year spent at the Oregon Graduate Institute of Science and Technology.

Khaled Al-Saqabi received his B.S.E. degree from the University of South Florida, his M.S. degree from the Ohio State University, and his Ph.D. degree from North Carolina State University in 1982, 1985, and 1989, respectively. He is an assistant professor in the department of Electrical and Computer Engineering at Kuwait University. He was a member of the BLITZEN group at the Micro Electronics Center of North Carolina (MCNC), Research Triangle Park between 1987 and 1989. He was a visiting research

References (19)

T.L. Casavant, J.G. Kuhl, A Taxonomy of scheduling in general-purpose distributed computing systems, IEEE Trans....
M.W. Mutka, Estimating capacity for sharing in a privately owned workstation environment, IEEE Trans. Software...
V.S. Sunderam, PVM: A framework for parallel distributed computing, Concurrency: Practice and Experience 2 (4) (1990)...
G.A. Geist, V.S. Sunderam, Network based concurrent computing on the PVM system, Concurrency: Practice and Experience 4...
C.C. Douglas, T.G. Mattson, M.H. Schultz, Parallel programming systems for workstation clusters, Technical Report 975,...
J. Boyle, R. Butler, T. Disz, B. Glickfeld, E. Lusk, R. Overbeek, J. Patterson, R. Stevens, Portable Programs for...
A.S. Tanenbaum, Operating Systems Design and Implementation, Prentice Hall, New Jersey,...
M.K. Litzkow, M. Livny, M.W. Mutka, Condor – A hunter of idle workstations, in: Proceedings of the Eighth International...
F. Douglis, J. Ousterhout, Process migration in the Sprite operating system, in: Proceedings of the Seventh International...


Cited by (5)

Scheduling and resource management using PSO in P-grid
2010, Proceedings of 2010 International Conference on Communication and Computational Intelligence, INCOCCI-2010
A heuristic on job scheduling in grid computing environment
2008, Proceedings - 7th International Conference on Grid and Cooperative Computing, GCC 2008
Task scheduling by Mean Field Annealing algorithm in grid computing
2008, 2008 IEEE Congress on Evolutionary Computation, CEC 2008
Task scheduling by neural network with mean field annealing improvement in grid computing
2006, Canadian Conference on Electrical and Computer Engineering
QoS guided Min-Min heuristic for grid task scheduling
2003, Journal of Computer Science and Technology

Mansoor Sarwar is currently an Associate Professor of Electrical Engineering at the University of Portland, Oregon. He earned his undergraduate degree in Electrical Engineering from the University of Engineering and Technology, Lahore, Pakistan, and M.S. and Ph.D. degrees in Computer Engineering from Iowa State University. His current teaching and research interests are in experimental performance evaluation, parallel and distributed computing, operating systems, and engineering education.

Kassem Saleh, born in Beirut, Lebanon in 1963, received the B.Sc. and M.Sc. degrees in Computer Science and the Ph.D. degree in Electrical Engineering from the University of Ottawa, Canada in 1985, 1986 and 1991, respectively. He was a computer systems specialist at Bell Canada from 1985 to 1991, then joined Concordia University as an assistant professor for one year. He is currently an assistant professor in the department of electrical and computer engineering at Kuwait University. He was awarded the IBM telecommunications Software Scholarship in 1988, the George Franklin Prize for the best paper in 1990 from the Canadian Interest Group on Open Systems (CIGOS), and the Distinguished Young Researcher Award from Kuwait University in 1995. His research and teaching interests include software engineering, distributed system design and communications protocol engineering.
