A Synthesized Heuristic Task Scheduling Algorithm

Aiming at the static task scheduling problem in heterogeneous environments, a heuristic task scheduling algorithm named HCPPEFT is proposed. In the task prioritizing phase, the algorithm uses three levels of priority to choose tasks: critical tasks have the highest priority, tasks with a longer path to the exit task are selected next, and then tasks with fewer predecessors are chosen. In the resource selection phase, the algorithm uses selected task duplication to reduce the interresource communication cost. In addition, forecasting the impact of an assignment on all children of the current task allows better resource selection decisions. The proposed algorithm is compared with the STDH, PEFT, and HEFT algorithms on randomly generated graphs and sets of task graphs. The experimental results show that the new algorithm achieves better scheduling performance.


Introduction
A heterogeneous computing system (HCS) is a computing platform with diverse sets of resources interconnected by a high-speed network to execute parallel and distributed applications. Because the computational resources are diverse, how efficiently an application uses the available resources is one of the key factors in achieving high-performance computing. The general task scheduling problem in an HCS can be represented as a directed acyclic graph (DAG). The common objective of scheduling is to assign tasks onto suitable resources and order their execution so that task precedence requirements are satisfied with a minimum schedule length [1]. The scheduling problem has been shown to be NP-complete [2][3][4], so heuristic algorithms are generally used to solve it.
A typical task scheduling model is based on a directed acyclic graph, and scheduling can be static or dynamic. Static heuristic scheduling algorithms fall into three groups: duplication-based algorithms, clustering algorithms, and list scheduling algorithms [3]. List scheduling algorithms are widely used for their high scheduling efficiency and simple design. The basic idea of list scheduling is to construct a schedule list by assigning a priority to each task and then to assign each selected task to the processor that minimizes its finish time. Classical examples of list scheduling algorithms were proposed in [5][6][7][8][9][10]. The heterogeneous earliest finish time (HEFT) algorithm [5] uses a recursive procedure to compute the rank of a task by traversing the graph upward from the exit task, while the critical path on a processor (CPOP) algorithm [5] additionally computes a downward rank by traversing the graph from the entry task. The improved heterogeneous earliest finish time (IHEFT) algorithm [6] obtains a better task list by changing how a task's upward weight is calculated. The standard deviation-based algorithm for task scheduling (SDBATS) [7] uses the standard deviation of the expected execution time of a given task on the available resources as a key attribute for assigning task priority. The algorithm proposed in [8] determines the critical path of the task graph and selects the next task to be scheduled dynamically. The predict earliest finish time (PEFT) algorithm [9] is based solely on the computation of an optimistic cost table (OCT), which is used both to rank tasks and to select resources. The longest dynamic critical path (LDCP) algorithm [10] assigns priorities by considering the critical path attribute of the given DAG.
The idea of task duplication is to duplicate the parents of the currently selected task onto the selected resource or onto other resources in order to reduce the task's finish time. Many duplication-based algorithms have been proposed in recent years; for example, a path-priority-based heuristic task scheduling algorithm was studied in [11]. The heterogeneous critical task (HCT) scheduling algorithm [12] defines the critical task and the idle time slot. Selected task duplication for heterogeneous systems (STDH) [13] duplicates the parent tasks to advance the earliest start time of the current candidate task and thereby reduce the interprocessor communication cost. Heterogeneous earliest finish time with duplication (HEFD) [14] uses task variance as a computation capacity heterogeneity factor for setting the weights of tasks and edges. The algorithm in [15] is a three-phase algorithm with a dynamic phase that assigns a priority to each task. To search for and delete redundant task duplications dynamically during scheduling, a resource-aware scheduling algorithm with duplications (RADS) was proposed in [16]. Some studies, such as [17,18], combine the DVS technique with a duplication strategy to reduce energy consumption. The algorithms proposed in [19,20] are designed to address resource waste. Other duplication-based algorithms, such as selective duplication (SD) [21] and heterogeneous critical parents with fast duplicator (HCPFD) [22], are also worth studying.
Clustering algorithms merge the tasks of a DAG into an unlimited number of clusters, and the tasks in a cluster are allocated to the same resource. Classical examples include the cluster mapping algorithm (CMA) [23], the clustering and scheduling system (CASS) [24], and the objective flexible clustering algorithm (OFCA) [25]. In [26,27], two novel algorithms were proposed, but they have limitations in systems with higher heterogeneity. To reduce energy consumption without increasing the schedule length, [28] reclaims both static and dynamic slack time and applies different frequency-adjusting techniques to each kind of slack, and [29] exploits the slack time of noncritical jobs, extending their execution time to reduce energy consumption without increasing the overall execution time. Other DAG-based scheduling methods include heterogeneous critical path first synthesized (HCPFS) [30], heterogeneous select value (HSV) [31], and heterogeneous chip multiprocessor global comparatively optimum task scheduling (HGCOTS) [32].
The HEFT algorithm [5] has two phases: the task prioritizing phase and the processor selection phase. In the first phase, a task list is generated by sorting the tasks in decreasing order of upward rank. In the processor selection phase, each task is scheduled on the processor that minimizes its finish time. However, the algorithm does not take into account the impact of critical tasks and of the interprocessor communication cost on the schedule length of the whole task graph, so its scheduling efficiency can still be improved.
The PEFT algorithm [9] is based on the computation of an optimistic cost table (OCT), on which both task priority and processor selection are based. The OCT is a matrix in which the rows indicate the tasks and the columns indicate the resources, where each element $\text{OCT}(t_i, p_k)$ indicates the maximum of the shortest paths from the children of task $t_i$ to the exit node, given that resource $p_k$ is selected for task $t_i$. The algorithm has the same time complexity as the HEFT and SDBATS algorithms, that is, $O(v^2 \cdot p)$ for $v$ tasks and $p$ resources. However, the algorithm does not take into account the critical tasks, which determine the schedule length of the whole DAG, so it can lengthen the schedule to some extent.
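To make the OCT concrete, here is a minimal Python sketch of the standard PEFT recurrence: for each child, take the cheapest resource, counting the child's own computation cost plus the edge's communication cost whenever the child runs on a different resource than the parent; then take the maximum over all children, with OCT = 0 for the exit task. The dictionary-based graph representation and the name `compute_oct` are illustrative assumptions, not part of the published algorithms.

```python
def compute_oct(succ, w, c):
    """Optimistic cost table following the PEFT recurrence.

    succ: dict task -> list of immediate successors (exit task: none)
    w:    dict task -> list of computation costs, w[t][k] on resource k
    c:    dict (parent, child) -> (average) communication cost of the edge
    Returns a dict task -> list of OCT values, one per resource.
    """
    n_res = len(next(iter(w.values())))
    table = {}

    def row(t):
        if t in table:
            return table[t]
        values = []
        for k in range(n_res):  # resource assumed for task t
            worst = 0  # stays 0 for the exit task
            for s in succ.get(t, []):
                s_row = row(s)
                # Cheapest way to run child s: no communication cost
                # if s stays on the same resource k as its parent t
                best = min(
                    s_row[m] + w[s][m] + (0 if m == k else c[(t, s)])
                    for m in range(n_res)
                )
                worst = max(worst, best)  # max over all children
            values.append(worst)
        table[t] = values
        return values

    for t in w:
        row(t)
    return table
```

The recursion memoizes each row, so every edge is examined once per pair of resources; for very deep graphs an explicit reverse topological order would avoid Python's recursion limit.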
The STDH algorithm [13] duplicates parent tasks to advance the earliest start time of the current candidate task and reduce the interprocessor communication cost. In this way, the overall run time of the application is shortened, but the algorithm prioritizes tasks by only one attribute, so it does not work well when different tasks have the same priority. To evaluate task priorities reasonably, it is more suitable to sort tasks by multiple attributes.
In this paper, we propose a synthesized heuristic task scheduling algorithm for heterogeneous computing systems that combines duplication-based techniques with a list-based approach. The HCPPEFT algorithm has two phases, the task prioritizing phase and the resource selection phase (see Algorithm 1). In the first phase, the scheduling queue is constructed from three attributes: critical tasks, the length of the path to the exit task, and the number of predecessors. In the second phase, duplication is optimized over all immediate parent tasks, and a look-ahead policy is used so that subsequent tasks finish earlier. The rest of the paper is organized as follows. In Section 2, we define the task scheduling problem and discuss some basic attributes of DAG scheduling. Section 3 introduces the HCPPEFT algorithm, and Section 4 discusses the experimental results. Section 5 concludes the paper.

Task Model.
A scheduling system model consists of an application and a heterogeneous computing environment. An application is represented by a DAG $G = (V, E)$, where $V$ is the set of $v$ tasks and $E$ is the set of $e$ edges between the tasks. Each edge $e_{i,j} \in E$ represents the precedence constraint that task $t_i$ must complete its execution before task $t_j$ starts. The communication cost of sending data from task $t_i$ to task $t_j$ is represented by a $v \times v$ matrix $C$, in which each $c_{i,j}$ corresponds to the amount of data on edge $e_{i,j}$. The heterogeneous system consists of a set $P = \{p_1, p_2, \ldots, p_p\}$ of $p$ independent resources fully connected by a high-speed network. The computation costs are represented by a $v \times p$ matrix $W$, in which each $w_{i,j}$ gives the estimated time to execute task $t_i$ on resource $p_j$. We further assume that two connected tasks scheduled on the same resource have zero communication cost.
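The task model can be captured in a small data structure. The sketch below is a hypothetical Python representation (the class name `TaskGraph` and its methods are illustrative, not from the paper): `w` holds the $v \times p$ computation cost matrix $W$, and `c` maps each edge $(i, j)$ to its communication cost, which the model sets to zero when both tasks share a resource.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Hypothetical container for G = (V, E) with cost matrices W and C."""

    # w[i][j]: estimated time to execute task i on resource j (a v x p matrix)
    w: list
    # c[(i, j)]: communication cost of edge i -> j; the keys also define E
    c: dict = field(default_factory=dict)

    def succ(self, i):
        """Immediate successors of task i."""
        return [b for (a, b) in self.c if a == i]

    def pred(self, j):
        """Immediate predecessors of task j."""
        return [a for (a, b) in self.c if b == j]

    def avg_w(self, i):
        """Average computation cost of task i over all resources."""
        return sum(self.w[i]) / len(self.w[i])

    def comm(self, i, j, res_i, res_j):
        """Cost of edge i -> j given the resources of both endpoints."""
        return 0 if res_i == res_j else self.c[(i, j)]
```

Storing edges only as keys of `c` keeps the sketch short; a real implementation would keep adjacency lists so that `succ` and `pred` do not scan every edge.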

DAG Scheduling Attributes.
Before proceeding to the next section, it is necessary to discuss some scheduling attributes such as rank upward, rank downward, earliest start time, earliest finish time, and optimistic cost table, which will be used in the proposed scheduling algorithm.

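Definition 1.
A task with no predecessors is called an entry task ($t_{entry}$), and $\text{pred}(t_i)$ denotes the set of immediate predecessors of task $t_i$ in a given task graph. Similarly, a task with no successors is called an exit task ($t_{exit}$), and $\text{succ}(t_i)$ denotes the set of immediate successors of task $t_i$. If a task graph has multiple entry or exit nodes, a dummy entry or exit node with zero weight and zero-cost communication edges can be added to the graph.
Definition 2.
The upward rank of a task $t_i$ is calculated using the following equation:
$$\text{rank}_u(t_i) = \overline{w}_i + \max_{t_j \in \text{succ}(t_i)} \left( \overline{c}_{i,j} + \text{rank}_u(t_j) \right), \tag{1}$$
where $\overline{w}_i$ is the average computation cost of task $t_i$ and $\overline{c}_{i,j}$ is the average communication cost of the edge from task $t_i$ to task $t_j$. For the exit task $t_{exit}$, the upward rank value is
$$\text{rank}_u(t_{exit}) = \overline{w}_{exit}. \tag{2}$$
Similarly, the downward rank of task $t_i$ is defined by the following equation:
$$\text{rank}_d(t_i) = \max_{t_j \in \text{pred}(t_i)} \left( \text{rank}_d(t_j) + \overline{w}_j + \overline{c}_{j,i} \right). \tag{3}$$
For the entry task $t_{entry}$, the downward rank value is equal to zero.
Definition 3.
The earliest start time (EST) and earliest finish time (EFT) of task $t_i$ on resource $p_j$ are defined as
$$\text{EST}(t_i, p_j) = \max\left\{ \text{avail}(t_i, p_j),\ \max_{t_m \in \text{pred}(t_i)} \left( \text{AFT}(t_m) + c_{m,i} \right) \right\}, \tag{4}$$
$$\text{EFT}(t_i, p_j) = w_{i,j} + \text{EST}(t_i, p_j), \tag{5}$$
where $\text{avail}(t_i, p_j)$ is the earliest time at which resource $p_j$ is ready for task execution, $\text{AFT}(t_m)$ is the actual finish time of task $t_m$, and $c_{m,i}$ is the communication cost of the edge from task $t_m$ to task $t_i$ (zero if both tasks run on the same resource). Before the earliest finish time of a task $t_i$ is computed, all immediate predecessors of $t_i$ must have been scheduled. For the entry task, $\text{EST}(t_{entry}, p_j) = 0$.
Definition 4.
If the start time of a task depends on the arrival time of its parents' data, the resource is left with a free time slot, that is, the time during which the task waits for the data of its parent task to arrive. It can be defined as
$$\text{FTS}(t_i, p_j) = \left[ \text{avail}(t_i, p_j),\ \text{EST}(t_i, p_j) \right], \quad \text{if } \text{EST}(t_i, p_j) > \text{avail}(t_i, p_j). \tag{6}$$
Definition 5.
The OCT is a matrix in which the rows indicate the tasks and the columns indicate the resources, where each element $\text{OCT}(t_i, p_k)$ represents the maximum optimistic processing time of the children of task $t_i$ up to the exit task, given that $t_i$ is assigned to $p_k$. The OCT value of task $t_i$ on resource $p_k$ is recursively defined by
$$\text{OCT}(t_i, p_k) = \max_{t_j \in \text{succ}(t_i)} \left[ \min_{p_w \in P} \left\{ \text{OCT}(t_j, p_w) + w_{j,w} + \overline{c}_{i,j} \right\} \right], \quad \overline{c}_{i,j} = 0 \text{ if } p_w = p_k, \tag{7}$$
with $\text{OCT}(t_{exit}, p_k) = 0$ for every resource $p_k$.
Definition 6.
The schedule length (makespan) of the task graph is the finish time of the last task and is defined as follows:
$$\text{makespan} = \max\left\{ \text{AFT}(t_{exit}) \right\}. \tag{8}$$
The objective of the task scheduling problem is to determine an assignment of the tasks of a given task graph to resources such that the schedule length is minimized while all precedence constraints are satisfied.

Algorithm 1: The HCPPEFT algorithm.
(1) Compute $\text{rank}_u$, $\text{rank}_d$, the number of predecessors, and the OCT table for all tasks
(2) Compute $\text{CT}(t_i) = \text{rank}_u(t_i) + \text{rank}_d(t_i)$ for each task $t_i$ in the graph. Starting from $\{t_{entry}\}$, every task with $\text{CT}(t_i) = \text{CT}(t_{entry})$ is added to the set of tasks on the critical path
(3) Sort the tasks in the scheduling queue by critical task, then by decreasing $\text{rank}_u$, then by increasing number of predecessors
(4) while there are unscheduled tasks in the queue do
(5)     Select the first task $t_i$ from the queue for scheduling
(6)     for each resource $p_k \in P$ do
(7)         Sort the immediate parents of $t_i$ in decreasing order of data arrival time
(8)         while the duplication condition is satisfied do
(9)             if duplicating the current parent $t_m$ reduces the start time of $t_i$ then
(10)                Duplicate $t_m$ and update $\text{EST}(t_i, p_k)$ and the free time slots
(11)            end if
(12)        end while
(13)        Compute $\text{EFT}(t_i, p_k)$ by (5)
(14)        $\text{EFT}_O(t_i, p_k) = \text{EFT}(t_i, p_k) + \text{OCT}(t_i, p_k)$
(15)    end for
(16)    Assign task $t_i$ to the resource $p_k$ that minimizes $\text{EFT}_O(t_i, p_k)$
(17) end while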
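Definitions 1 and 2, together with step (2) of Algorithm 1, translate directly into code. The following Python sketch assumes that the average costs $\overline{w}_i$ and $\overline{c}_{i,j}$ are precomputed and passed in as dictionaries; the function names are hypothetical. Because the ranks are sums of floating-point costs, it compares CT values with a small tolerance rather than exact equality, a detail the paper does not specify.

```python
def upward_rank(succ, w_avg, c_avg):
    """rank_u: average cost plus the costliest path to the exit task."""
    memo = {}

    def ru(t):
        if t not in memo:
            # max(..., default=0) yields rank_u(t_exit) = w_avg[t_exit]
            memo[t] = w_avg[t] + max(
                (c_avg[(t, s)] + ru(s) for s in succ.get(t, [])), default=0
            )
        return memo[t]

    return {t: ru(t) for t in w_avg}


def downward_rank(pred, w_avg, c_avg):
    """rank_d: longest path from the entry task up to (excluding) t."""
    memo = {}

    def rd(t):
        if t not in memo:
            # max(..., default=0) yields rank_d(t_entry) = 0
            memo[t] = max(
                (rd(q) + w_avg[q] + c_avg[(q, t)] for q in pred.get(t, [])),
                default=0,
            )
        return memo[t]

    return {t: rd(t) for t in w_avg}


def critical_tasks(succ, pred, w_avg, c_avg, entry, tol=1e-9):
    """Tasks whose CT = rank_u + rank_d equals CT of the entry task."""
    ru = upward_rank(succ, w_avg, c_avg)
    rd = downward_rank(pred, w_avg, c_avg)
    cp_len = ru[entry] + rd[entry]
    crit = {t for t in w_avg if abs(ru[t] + rd[t] - cp_len) <= tol}
    return crit, ru, rd
```

For example, in a four-task diamond (entry 1, exit 4), the tasks on the longest average-cost path are the ones returned as critical; the entry and exit tasks always qualify because $\text{CT}(t_{entry}) = \text{rank}_u(t_{entry})$ and $\text{CT}(t_{exit}) = \text{rank}_d(t_{exit}) + \overline{w}_{exit}$ both equal the critical path length.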

The Proposed Algorithm HCPPEFT
3.1. Task Prioritizing Phase. The critical path is the path with the longest execution time in a DAG, and its tasks play a decisive role in the overall finish time. During scheduling, critical tasks are assigned the highest priority so that they are scheduled first. However, a critical task usually has one or more parent tasks, and if those parents are not scheduled reasonably, the start time of the critical task will be delayed. Therefore, in addition to assigning the highest priority to critical tasks, the algorithm must also give the parent tasks of critical tasks a higher priority. Immediate parent tasks are sorted in decreasing order of upward rank; if their upward ranks are equal, the task with fewer predecessors is selected first. To construct the task scheduling queue, two empty queues, CTQ and RTQ, are created first: CTQ stores the critical tasks, and RTQ stores the task scheduling (ready) queue. The process of constructing the task scheduling queue is shown in Figure 1 and explained in detail as follows.
(1) The upward rank ($\text{rank}_u$), the downward rank ($\text{rank}_d$), and their sum ($\text{rank}_u + \text{rank}_d$) are computed for all tasks.
(2) For each task $t_i$ in the DAG, let $\text{CT}(t_i) = \text{rank}_u(t_i) + \text{rank}_d(t_i)$. The critical path length equals the priority of the entry task, $\text{CT}(t_{entry})$, and the entry task is marked as a critical-path task. Every task with $\text{CT}(t_i) = \text{CT}(t_{entry})$ is a critical-path task, and all critical-path tasks are added to the CTQ in decreasing order of $\text{rank}_u$.
(3) Initially, the CTQ contains all critical tasks. A critical task is removed from the CTQ and added to the RTQ only after all of its parent tasks have been added to the RTQ. The parent tasks are prioritized by $\text{rank}_u$: the parent with the highest $\text{rank}_u$ gets the highest priority.
(4) If two or more parent tasks have equal $\text{rank}_u$, they are prioritized in increasing order of their number of predecessors.
(5) The above steps are repeated until the CTQ is empty. The RTQ is then the task scheduling queue.
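The queue-construction steps above can be sketched in Python as follows. The function name and inputs are hypothetical. One assumption goes beyond the text: when a parent task is itself not yet in the RTQ, its own unqueued parents are placed before it (recursively), which keeps the RTQ in precedence order and is consistent with the worked example for Figure 2.

```python
def build_queue(pred, rank_u, critical):
    """Build the scheduling queue (RTQ) from the critical tasks (CTQ).

    pred:     dict task -> list of immediate predecessors
    rank_u:   dict task -> upward rank
    critical: iterable of critical-path tasks
    """
    # CTQ: critical tasks in decreasing order of rank_u
    ctq = sorted(critical, key=lambda t: -rank_u[t])
    rtq, queued = [], set()

    def priority(t):
        # Higher rank_u first; ties broken by fewer predecessors
        return (-rank_u[t], len(pred.get(t, [])))

    def enqueue(t):
        if t in queued:
            return
        # Assumption: unqueued parents (and, recursively, their parents)
        # are placed before t, in priority order
        for parent in sorted(pred.get(t, []), key=priority):
            enqueue(parent)
        queued.add(t)
        rtq.append(t)

    while ctq:  # repeat until the CTQ is empty
        enqueue(ctq.pop(0))
    return rtq
```

With a single exit task, which always lies on the critical path, every task is an ancestor of that exit task, so the returned RTQ contains all tasks of the graph.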
To illustrate how the task prioritizing queue is constructed, Figure 2 shows a random task graph with 10 tasks and 16 edges. The computation cost of each task on each resource (i.e., $p_1$, $p_2$, and $p_3$) is presented in Table 1. Based on the values in Table 2, the critical path in Figure 2 is $\{t_1, t_3, t_7, t_{10}\}$. All critical tasks are added to the CTQ in decreasing order of upward rank; that is, CTQ $= \{t_1, t_3, t_7, t_{10}\}$. Because the entry task $t_1$ has no parent task, it is removed from the CTQ and added to the RTQ immediately. After that, task $t_3$ is added to the RTQ and deleted from the CTQ. The critical task $t_7$ can be selected only after its parent tasks $t_2$ and $t_5$ have been selected. At this point, CTQ $= \{t_{10}\}$ and RTQ $= \{t_1, t_3, t_5, t_2, t_7\}$. Before the critical task $t_{10}$ is selected, its parent tasks $t_8$ and $t_9$ must have been scheduled. Because the upward ranks of $t_8$ and $t_9$ are equal, the task with fewer predecessors is selected first: $t_8$ is selected, followed by $t_9$. The remaining tasks are therefore ordered $\{t_4, t_8, t_6, t_9, t_{10}\}$. Finally, the complete task queue is $\{t_1, t_3, t_5, t_2, t_7, t_4, t_8, t_6, t_9, t_{10}\}$.

Resource Selection Phase.
In this phase, tasks are assigned to resources, and duplication is employed to minimize their finish times. To start execution on a resource, a task has to wait for the data from all of its immediate parents to arrive, so that the precedence constraints are met. Among the parents of task $t_i$ that are scheduled on other resources, the one whose data arrives at $t_i$ last is called the latest parent task of $t_i$. The duplication condition follows [13]; if it is satisfied, the latest parent task is duplicated on the resource to which $t_i$ is assigned. At the same time, the earliest finish time is computed by (5) and the free time slots are updated by (6). Note that selecting the resource by (5) alone may not yield the best-suited resource, because a good schedule considers not only the current task's finish time but also the finish times of the tasks that follow it. Computing the OCT by (7) does not increase the time complexity and favors assignments under which subsequent tasks finish earlier. We therefore compute the optimistic EFT ($\text{EFT}_O$), which adds to the EFT the computation time of the longest path to the exit task, and select the resource that minimizes it. $\text{EFT}_O$ is defined as
$$\text{EFT}_O(t_i, p_k) = \text{EFT}(t_i, p_k) + \text{OCT}(t_i, p_k).$$
For a given DAG, the time complexity of a scheduling algorithm is usually expressed in terms of the number of tasks $v$ and the number of resources $p$. For STDH, constructing the scheduling queue takes $O(v^2)$ time and resource selection for all tasks takes $O(v^2 \cdot p)$, so the total time is $O(v^2 \cdot p)$. For PEFT, computing the OCT table takes $O(p(v^2 + v))$ time and assigning the tasks to resources takes $O(v^2 \cdot p)$, so the total time is $O(v^2 \cdot p)$. The HEFT algorithm has $O(v^2 \cdot p)$ complexity for $v$ tasks and $p$ resources. For HCPPEFT, constructing the task scheduling queue takes $O(v^2 \cdot p)$ time and the resource selection phase takes $O(p(v^2 + v))$ time. Hence, HCPPEFT has the same $O(v^2 \cdot p)$ time complexity as the STDH, PEFT, and HEFT algorithms.
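The look-ahead resource selection of steps (13) to (16) in Algorithm 1 can be sketched as below. This is a simplified, hypothetical Python version: it omits the duplication step, because the duplication condition from [13] is not reproduced here, and it uses a non-insertion policy in which a resource becomes available only after its last assigned task finishes. It expects a precomputed OCT table and a queue in which every parent precedes its children.

```python
def select_resource(task, pred, w, c, oct_table, placement, aft, avail):
    """Return (resource, EST, EFT) minimizing EFT_O = EFT + OCT.

    placement: dict scheduled task -> resource index
    aft:       dict scheduled task -> actual finish time
    avail:     avail[k] = time resource k finishes its last task (no insertion)
    """
    best = None
    for k in range(len(avail)):
        # Data-ready time: latest parent finish plus communication cost,
        # which is zero when the parent ran on the same resource
        ready = max(
            (aft[q] + (0 if placement[q] == k else c[(q, task)])
             for q in pred.get(task, [])),
            default=0,
        )
        est = max(avail[k], ready)        # EST, as in (4)
        eft = est + w[task][k]            # EFT, as in (5)
        eft_o = eft + oct_table[task][k]  # look-ahead: EFT_O
        if best is None or eft_o < best[0]:
            best = (eft_o, k, est, eft)
    return best[1], best[2], best[3]


def schedule(queue, pred, w, c, oct_table, n_res):
    """Assign tasks in queue order; return the placement and the makespan."""
    placement, aft, avail = {}, {}, [0] * n_res
    for task in queue:
        k, _est, eft = select_resource(task, pred, w, c, oct_table,
                                       placement, aft, avail)
        placement[task], aft[task], avail[k] = k, eft, eft
    return placement, max(aft.values())
```

Ties in $\text{EFT}_O$ are broken in favor of the lower-numbered resource because the comparison is strict; the paper does not state a tie-breaking rule.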

HCPPEFT Algorithm Implementation.
To evaluate the performance of the HCPPEFT algorithm, we implemented it in MATLAB on a PC.
Figure 2 shows a randomly generated task graph, and Table 1 gives the corresponding computation costs. The OCT values calculated by (7) are shown in Tables 3 and 4, which also give an example demonstrating how HCPPEFT schedules the DAG of Figure 2.
As an illustration, Figure 3 presents the schedule obtained by the HCPPEFT algorithm, where the gray numbered blocks are duplicated tasks. Its schedule length is 69, which is shorter than those of the related algorithms: the schedule lengths of the STDH, PEFT, and HEFT algorithms are 73, 78, and 77, respectively.

Experimental Results and Discussion
This section compares the performance of the proposed HCPPEFT algorithm with three well-known task scheduling algorithms, STDH, PEFT, and HEFT, on randomly generated task graphs and sets of task graphs. Three metrics are used for performance evaluation.

Schedule Length Ratio.
Since a large set of task graphs with different properties is used, it is necessary to normalize the schedule length to a lower bound; the result is called the schedule length ratio (SLR). SLR is defined as follows:
$$\text{SLR} = \frac{\text{makespan}}{\sum_{t_i \in CP_{MIN}} \min_{p_j \in P} w_{i,j}},$$
where the denominator is the sum of the minimum computation costs of the tasks on the critical path $CP_{MIN}$. No makespan can be less than the denominator of the SLR, since the denominator is a lower bound. Therefore, the algorithm with the lowest SLR performs best.
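Once the tasks of $CP_{MIN}$ are known, the SLR is a single ratio. The hypothetical Python helper below takes those tasks as an input rather than computing the critical path itself.

```python
def slr(makespan, w, cp_min):
    """Schedule length ratio: makespan over the CP_MIN lower bound.

    w:      dict task -> list of computation costs, one per resource
    cp_min: tasks on the critical path CP_MIN
    """
    # Lower bound: each critical-path task on its fastest resource
    lower_bound = sum(min(w[t]) for t in cp_min)
    return makespan / lower_bound
```

For instance, with critical-path minimum costs of 2, 2, and 1, a makespan of 10 gives an SLR of 2.0.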

Speedup.
The speedup for a given task graph is defined as the ratio of the sequential execution time to the parallel execution time. The sequential execution time is computed by assigning all tasks to the single resource that minimizes the total computation cost. Speedup is defined as follows:
$$\text{Speedup} = \frac{\min_{p_j \in P} \left\{ \sum_{t_i \in V} w_{i,j} \right\}}{\text{makespan}}.$$

Efficiency.
In general, efficiency is defined as the ratio of the speedup to the number of resources used to schedule the task graph. Efficiency is defined as follows:
$$\text{Efficiency} = \frac{\text{Speedup}}{p}.$$
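Speedup and efficiency follow directly from these definitions. In the hypothetical Python sketch below, `w` maps each task to its list of per-resource costs, and `n_res` is the number of resources used.

```python
def speedup(makespan, w):
    """Sequential time on the single best resource divided by the makespan."""
    n_res = len(next(iter(w.values())))
    # Total cost of running every task on resource k; keep the cheapest k
    sequential = min(sum(costs[k] for costs in w.values()) for k in range(n_res))
    return sequential / makespan


def efficiency(makespan, w, n_res):
    """Speedup divided by the number of resources used."""
    return speedup(makespan, w) / n_res
```

Because the numerator uses the best single resource, a speedup below 1 means the parallel schedule is slower than simply running everything on that resource.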

Randomly Generated Task Graph and Performance Comparison.
To evaluate the performance of the HCPPEFT algorithm, we first considered randomly generated task graphs. For this purpose, the random graph generator available at [13] was implemented to generate a variety of weighted graphs with various characteristics. The input parameters of the generator are the communication to computation ratio (CCR), the number of tasks, the out degree of a node (out degree), the edge weight, the node weight, and the number of resources. Our simulation framework first generates a large set of random task graphs with different characteristics, then executes the task scheduling algorithms, and finally computes the performance metrics. The HCPPEFT, STDH, PEFT, and HEFT algorithms are compared across various graph characteristics in terms of SLR, speedup, and efficiency.
First, we evaluated the performance of the algorithms for various numbers of tasks, with the number of resources fixed at 10. Each experimental value is the average of the results produced from 200 different random task graphs. In these experiments, CCR and out degree were fixed at 2 and 5, respectively, and the number of tasks was restricted to $v \in \{20, 40, 60, 80, 100, 120, 150, 200\}$. Edge weights were generated randomly from 1 to 300, and node weights from 1 to 30. As Figures 4 and 5 show, the new algorithm outperforms the other algorithms in terms of SLR and speedup. The average SLR of HCPPEFT is better than that of HEFT by 19.99%, PEFT by 14.43%, and STDH by 5.72%, and its average speedup is better than that of HEFT by 16.33%, PEFT by 10.79%, and STDH by 8.69%.
The next experiment examines the effect of increasing CCR. Each experimental value is the average of the results produced from 200 different random task graphs. In each graph, the CCR is randomly selected from 0.1, 0.25, 0.5, 1, 2, and 5, node weights are generated randomly from 1 to 30, and edge weights from 1 to 300. The number of resources $p$, the number of tasks $v$, and the out degree are fixed at 10, 100, and 5, respectively. The results are shown in Figures 6 and 7. Compared with the HEFT, PEFT, and STDH algorithms over all generated graphs, the average SLR obtained by HCPPEFT is better by as much as 43.35%, 32.83%, and 7.08%, respectively, and the average speedup is better by as much as 31.62%, 24.56%, and 7.72%, respectively. This improvement is due to the duplication phase and to predicting the earliest finish time. As CCR increases, duplication removes a growing share of the communication cost, so the advantage of the new algorithm grows. In addition, by forecasting the impact of an assignment on all children of the current task, the algorithm can select the best-suited resource, one that achieves a shorter finish time for the tasks in subsequent steps.
The last set of experiments compares the performance of the algorithms as the number of resources increases. Each experimental value is the average of the results produced from 200 different random task graphs. In these experiments, CCR, $v$, and out degree were fixed at 0.5, 150, and 5, respectively; edge weights were generated randomly from 1 to 300 and node weights from 1 to 30. The number of resources was restricted to $p \in \{5, 8, 10, 12, 15\}$. As Figure 8 shows, the HCPPEFT algorithm outperforms the HEFT, PEFT, and STDH algorithms by 30.98%, 22.95%, and 7.45%, respectively.
As expected, the average schedule length of the HCPPEFT, HEFT, PEFT, and STDH algorithms decreases as the number of resources increases, because more resources allow more parallelism.

Sets of Task Graphs and Performance Results.
In addition to randomly generated task graphs, we also use sets of task graphs to evaluate the performance of the new algorithm. The references and related parameters of the sets of task graphs are shown in Tables 3 and 5. The schedule lengths of the HCPPEFT, STDH, PEFT, and HEFT algorithms are presented in Figure 9. The results show that HCPPEFT outperforms the other algorithms.

Conclusion
In this paper, we have proposed HCPPEFT, a synthesized task scheduling algorithm for heterogeneous computing systems. The new algorithm is a two-phase algorithm that combines the mechanisms of list-scheduling-based, duplication-based, and look-ahead-based algorithms, and it therefore provides a more efficient way to schedule general task graphs. In the task prioritizing phase, three levels of priority are used to choose tasks. The method of constructing the task scheduling queue takes into account not only critical tasks but also the importance of their parent tasks. In the resource selection phase, parent tasks are duplicated to reduce communication costs, and forecasting the impact of an assignment on all children of the current task helps select the best resource. The performance of the new algorithm was compared with that of three of the best existing scheduling algorithms: HEFT, PEFT, and STDH. The comparative study, based on randomly generated task graphs and sets of task graphs, shows that HCPPEFT outperforms the other algorithms in terms of average SLR, speedup, and efficiency.