
Parallel Computing

Volume 37, Issue 8, August 2011, Pages 428-438

Dynamic scheduling of a batch of parallel task jobs on heterogeneous clusters

https://doi.org/10.1016/j.parco.2010.12.004

Abstract

This paper addresses the problem of minimizing the schedule length (makespan) of a batch of jobs with different arrival times. A job is described by a directed acyclic graph (DAG) of parallel tasks. The paper proposes a dynamic scheduling method that adapts the schedule when new jobs are submitted and that may change the processors assigned to a job during its execution. The scheduling method is divided into a scheduling strategy and a scheduling algorithm. We also propose an adaptation of the Heterogeneous Earliest-Finish-Time (HEFT) algorithm, called here P-HEFT, that handles parallel tasks on heterogeneous clusters with good efficiency without compromising the makespan. A comparison of this algorithm with another DAG scheduler, over a simulation of several machine configurations and job types, shows that P-HEFT gives a shorter makespan for a single DAG but scores worse for multiple DAGs. Finally, the dynamic scheduling of a batch of jobs with the proposed method shows significant improvements for more heavily loaded machines when compared to the alternative resource reservation approach.

Research highlights

► HEFT version for parallel task scheduling.
► Comparison in terms of efficiency and makespan of two single-DAG schedulers.
► A less efficient algorithm achieves shorter makespans for single DAG scheduling.
► A more efficient algorithm achieves shorter makespans for several concurrent DAGs.
► Non-reservation policy improves performance significantly for heavily loaded machines.

Introduction

In this paper, a heterogeneous computer cluster is defined as a set of processors of different processing capacities connected by a high-speed network. The cluster is a dedicated machine, as in [1], [9], [22], as opposed to a geographically dispersed, loosely connected grid. Such clusters may be an organization's private machines, but they may also be the nodes of a grid infrastructure, which can only provide good performance if the end clusters themselves perform well. This paper concentrates on optimizing the makespan of a batch of jobs that may arrive at the heterogeneous cluster scheduler at different times.

The task scheduling problem is to allocate resources (processors) to the tasks and to establish an order for the tasks to be executed by those resources [9], [22]. In this context, there are two types of task scheduling: static and dynamic. Static strategies define a schedule at compile time or at launch time, based on knowledge of the processors’ availability and of the tasks to execute. The aim is usually to minimize the schedule length (or makespan) of the tasks [1], [10], [11], [14], [22].

Dynamic strategies, on the other hand, are applied when the arrival times of the tasks are not known a priori, so the system needs to schedule tasks as they arrive [9], [13], [21]. We consider non-preemptive tasks, so only the tasks waiting to be executed and the newly arrived tasks are considered in the re-scheduling. In this paper, the aim of dynamic scheduling is to minimize the completion time of a batch of tasks, as in [13], [21]. Real-world applications where such a batch of jobs occurs include the processing of image sequences, where objects contained in the frames are identified and tracked across frames. The objects’ shapes may vary from frame to frame, so the problem is known as deformable object tracking [18]. It has applications in video surveillance, microscopic image processing, and biomechanical analysis. In this context, a multiple-job (or multiple-DAG) environment may arise, for example, from the input of several cameras forming a batch of jobs whose processing time should be minimized. The jobs are therefore related, and only the collective result is meaningful.

Dynamic strategies can also aim to minimize global performance measurements related to the system’s quality of service (QoS) [9]. Optimizing such service-quality measurements in a dynamic multi-user environment, where jobs are unrelated, is outside the scope of this paper. However, this is another important problem that will be considered in future work.

The parallel task model, or mixed-parallelism, combines task parallelism and data parallelism. Data parallelism is common in many scientific applications, for example those based on linear algebra kernels, and it can exploit heterogeneous clusters as reported in [2]. Also, many applications are described by DAGs, where the edges represent task precedence as well as the communication load between tasks. DAGs with many branches implicitly contain tasks that can be executed concurrently, so mixed-parallelism results if a single task is additionally assigned to more than one processor. Mixed-parallelism has been widely reported in the literature [1], [5], [6], [7], [8], [12], [23].

In this paper, we first present a new adaptation of HEFT [22] for parallel task DAGs on a single heterogeneous cluster, called P-HEFT (P stands for parallel tasks). We then compare the results of this algorithm with those of another mixed-parallel scheduler, the Heterogeneous Parallel Task Scheduler (HPTS) [1], using a simulation of several machine configurations and job types.
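For readers unfamiliar with HEFT, the sketch below illustrates its task-prioritization step, which P-HEFT inherits: each task gets an upward rank computed from average execution and communication costs, and tasks are scheduled in decreasing rank order. The DAG, the cost numbers, and the function names are illustrative only; the processor-allocation step that P-HEFT adds for parallel tasks is not shown here.

```python
# Hypothetical DAG: task -> (average execution time, {successor: average comm. time})
dag = {
    "t1": (10.0, {"t2": 2.0, "t3": 2.0}),
    "t2": (6.0,  {"t4": 1.0}),
    "t3": (8.0,  {"t4": 1.0}),
    "t4": (4.0,  {}),
}

def upward_rank(dag):
    """HEFT prioritization: rank_u(t) = w(t) + max_s (c(t, s) + rank_u(s)),
    where w and c are costs averaged over the processors; tasks are then
    scheduled in decreasing rank_u order."""
    memo = {}
    def rank(t):
        if t not in memo:
            w, succs = dag[t]
            memo[t] = w + max((c + rank(s) for s, c in succs.items()), default=0.0)
        return memo[t]
    return {t: rank(t) for t in dag}

ranks = upward_rank(dag)
# Scheduling order (highest rank first): t1, t3, t2, t4.
print(sorted(ranks, key=ranks.get, reverse=True))
```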

We also present results comparing the resource reservation policy [15], [25] to the policy proposed here, which manages the machine globally and allows a floating assignment of processors to jobs, leading to better makespans when the goal is to minimize the completion time of a batch of jobs.

The contributions of the present work are (a) a mixed-parallel version of HEFT [22] for heterogeneous clusters (P-HEFT), (b) a characterization of P-HEFT and HPTS through extended simulations, (c) a first approach to scheduling multiple mixed-parallel jobs dynamically and (d) an evaluation of the best approach, in terms of the makespan metric, to scheduling multiple DAGs that arrive at different times.

This paper is organized as follows: Section 2 defines the scheduling problem and reviews related work. Section 3 presents the computational model used in this paper. Section 4 presents the proposed dynamic scheduling method. Section 5 presents the results, and finally, Section 6 presents the conclusions and future work.

Section snippets

Problem definition and related work

The problem addressed in this paper is the dynamic scheduling of a batch of jobs, or applications, represented by directed acyclic graphs (DAGs) on a distributed memory computer (i.e., a cluster). Prior works on dynamic scheduling for a single heterogeneous cluster consider independent tasks and assign one task to one processor (sequential tasks) [9], [13], [21]. In [21] a genetic algorithm is implemented that can schedule a batch of independent tasks with a short completion time. Here in this

Computational model

The computational model allows us to estimate the execution time of each task on a heterogeneous distributed memory machine. The processors used in a given task can be arranged in a grid (p, q) that optimizes its execution time. The processors are connected by a switched network that supports simultaneous communications between different pairs of machines.

The communications required to complete a task are included in the computation time as a function of the processors P = p × q used for that task
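The snippet is cut off before the paper's actual expression, so the display below is only a hedged illustration of a generic mixed-parallel cost model: the estimated time of task i on a processor grid P of size p × q is its computational load divided by the aggregate speed of the assigned processors, plus a communication term that grows with the grid. The symbols W_i, s_j and C_i are illustrative and are not taken from the paper.

```latex
% Illustration only -- not the paper's formula (the snippet truncates before it).
% W_i: computational load of task i, s_j: speed of processor j,
% C_i(p, q): communication overhead for a (p, q) processor grid.
\[
  T_i(P) \;=\; \frac{W_i}{\sum_{j \in P} s_j} \;+\; C_i(p, q),
  \qquad
  (p^\ast, q^\ast) \;=\; \operatorname*{arg\,min}_{(p,q)}\; T_i(P).
\]
```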

Dynamic scheduling method

The scheduling method is divided into two parts: a scheduling strategy and a scheduling algorithm. The scheduling strategy defines the instants when the scheduling algorithm is called to produce a schedule based on the machine information and the tasks waiting in the queue at the time it is called.
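As a rough illustration of this split, the sketch below drives a scheduling algorithm from an event loop: the strategy here is the simple one of rescheduling all waiting jobs at every job arrival and completion, and `schedule_algorithm` stands in for a single-DAG parallel-task scheduler such as P-HEFT. The function names and the event model are assumptions made for the example, not the paper's implementation.

```python
import heapq
from itertools import count

def dynamic_scheduler(arrivals, schedule_algorithm):
    """Illustrative event loop for the strategy/algorithm split.
    The *strategy* fixes the rescheduling instants (here: every arrival and
    every completion); the *algorithm* maps the jobs still waiting in the
    queue onto the machine at that instant. Jobs are non-preemptive, so
    anything already started is never moved.

    arrivals:           list of (arrival_time, job)
    schedule_algorithm: callable (waiting_jobs, now) -> [(finish_time, job)]
                        for the jobs it decides to start at `now`
    """
    tie = count()  # tiebreaker so the heap never compares job objects
    events = [(t, next(tie), "arrival", job) for t, job in arrivals]
    heapq.heapify(events)
    waiting, makespan = [], 0.0
    while events:
        now, _, kind, job = heapq.heappop(events)
        if kind == "arrival":
            waiting.append(job)
        makespan = max(makespan, now)
        # Rescheduling instant: re-map every job still waiting in the queue.
        for finish, started in schedule_algorithm(list(waiting), now):
            waiting.remove(started)
            heapq.heappush(events, (finish, next(tie), "completion", started))
    return makespan
```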

Results and discussion

In this section, we evaluate and compare the performance of the scheduling algorithms P-HEFT and HPTS when scheduling a single parallel task DAG using an extensive simulation setup. Then, the simulation results of the dynamic scheduling method are presented and compared to the alternative of resource reservation per job [15]. Finally, we present the results of a real problem in a real system.

The metrics used for comparison are the schedule length (makespan) and the efficiency. The makespan is
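Since the snippet cuts off before the paper's own definitions, the display below only records the usual form of the two metrics for a batch of jobs on an m-processor machine, as a hedged reference point; the symbols F_j (finish time of job j), r_j (its arrival time), W_i (load of task i) and s_k (speed of processor k) are illustrative.

```latex
% Usual definitions, given only as a reference point -- the snippet truncates
% before the paper's own expressions.
\[
  \text{makespan} \;=\; \max_{j} F_j \;-\; \min_{j} r_j,
  \qquad
  \text{efficiency} \;=\;
    \frac{\sum_{i} W_i}{\text{makespan} \cdot \sum_{k=1}^{m} s_k}.
\]
```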

Conclusions

In this paper, we have proposed a dynamic method for scheduling a batch of mixed-parallel DAGs that arrive at different times on heterogeneous systems. It improves the response time by allowing the computing power (number of processors) assigned to a job to vary. It showed better performance than the resource reservation alternative for a wide set of machines, both homogeneous and heterogeneous, and for both randomly generated DAGs and a real-world application.

To the best of our knowledge,

References (25)

  • S. Chakrabarti et al., Modeling the benefits of mixed data and task parallelism.

  • P.-F. Dutot et al., Scheduling parallel task graphs on (almost) homogeneous multicluster platforms, IEEE Transactions on Parallel and Distributed Systems (2009).