Profiling the scheduling decisions for handling critical paths in deadline-constrained cloud workflows

In this paper, we study the scheduling decisions for handling deadline-constrained workflows in the context of planning customized virtual infrastructures in the cloud. We specifically focus on the effects of using different types of greediness in selecting cost-effective virtual machines for the tasks in an application's workflow graph. The profiling procedure followed demonstrates that, for the widely used partial critical path algorithm, a greedy version is preferred to a more stringent version under different stress conditions, from tight to loose deadlines. Representative topologies of workflow applications are used to generate sets of task graph scheduling problems. Monitoring the performance of the partial critical path algorithm with different types of greediness reveals which of the topologies tested are difficult to solve under various stress conditions. It turns out that an invalid outcome of a greedy version of the partial critical path algorithm is more likely to become valid via a final refinement cycle than that of a less greedy version. The procedure outlined in this paper allows for a systematic study of a specific heuristic in a workflow scheduling method to increase its success in infrastructure planning under different deadline conditions, and is proposed to be part of a general profiling framework.


Introduction
In many scientific and industrial applications, e.g., climate modeling [1], disaster early warning [2], or IoT systems [3], workflows composed of many interdependent processing components are present. These workflows are usually complex due to the strict data dependencies among the different processing components and the required time constraints or deadlines for finishing execution. Such applications often require a well-customized environment for optimizing the workflow performance to meet all the required time and data constraints. As cloud computing has evolved into an obvious choice for more and more companies and research institutes, interest in accommodating more complex workflows on cloud infrastructure has emerged [4].
Cloud environments provide virtualized infrastructures (via Infrastructure-as-a-Service) and allow application developers to establish a customized networked virtual environment for specific application requirements. Via the customized virtual environment, application components can then be quickly and elastically deployed. Compared to traditional infrastructures, the virtual infrastructures offered allow users not only to pay per use but also to customize the computing and storage capacity via the selected virtual machines (VMs) [5].
Planning a customized virtual infrastructure for a complex workflow in the cloud faces several challenges. Cloud environments offer customizable infrastructure services (e.g., VMs, storage, and network) at different prices. As the execution time of an application's task heavily depends on the selected infrastructure service, it is crucial to optimize the selection of infrastructure services for the tasks of the workflow such that the data and time constraints are met for the lowest cost possible.
In case all the characteristics of the different infrastructure services of a cloud provider are known a priori, e.g. all the task execution times on the different VMs and the task communication times, the planning of the workflow can be treated as an off-line workflow scheduling problem. Scheduling approaches which apply the initial a priori state of the cloud are called static scheduling approaches.
In the static scheduling of a workflow, the application is represented by a Directed Acyclic Graph (DAG), where the different vertices of the graph represent the tasks of the application mapped onto the VMs offered by the cloud provider. There exists a rich history in the study of static scheduling problems on different systems such as multi-processor systems, the Grid, and the cloud. As already stated by Kwok and Ahmad [6] for the multiprocessor environment, and by Jiang et al. [7] for the Grid, it still holds for the cloud that it is difficult to evaluate the performance of the various scheduling approaches quantitatively. Therefore, a framework suitable to profile a (static) scheduling approach is of importance to arrive at a quantitative appreciation of a scheduling approach. The framework should be equipped with different sets of workflow examples to discover the kinds of workflows the approach has difficulty with: knowledge does not increase from a series of confirmatory observations; disconfirming instances do [8]. These difficult-to-solve workflow examples may be found by studying the performance under high-stress conditions, e.g. tight deadline conditions. Another important feature of a profiling framework should be a set of graph metrics or topological indices to discriminate between the workflow graphs the scheduling approach under study cannot solve and those it can solve successfully.
As is common practice in machine learning, the different workflow examples should be divided into a so-called training set and test set. The workflow examples in the training set are used to find those workflow examples difficult to solve by the scheduling approach. After finding a metric that can discriminate between the successful and unsuccessful workflows in the training set, the effectiveness of the metric found can be tested on a test set of workflows constructed according to the metric.
To demonstrate the profiling procedure we selected from the many different workflow scheduling approaches [9] the IC-PCP (IaaS Cloud Partial Critical Paths) [10] algorithm, which we slightly modified. The IC-PCP algorithm is a typical scheduling solution for deadline-constrained cloud workflows, which has been a basis for several other extended works. The basic idea of this approach is to map the tasks of the DAG onto the fastest VMs offered by the cloud provider such that the single deadline of the workflow is met and then recursively assign partial critical paths of the DAG to the cheapest VM possible. During the assignment, the communication latencies between tasks are treated as zero if they are on the same VM. This approach provides a straightforward way to assign VMs to an application with a single deadline.
The profiling procedure proposed in this paper will provide a better understanding of selecting algorithms for planning virtual infrastructure under different conditions of application characteristics. The profiling procedure might reveal the limits of a certain kind of heuristics, for instance, a greedy heuristic versus a less greedy, more stringent heuristic. More steps might be added to the profiling procedure presented below, and contribute to the increase of knowledge about different workflow scheduling approaches.
We organized the paper as follows. In Section 2, we review existing works on static scheduling. In Section 3, we propose our profiling approach of the widely used IC-PCP algorithm and present the data sets for the profiling. In Section 4, we summarize and analyze the profiling results. In Section 5, we conclude the paper and plan our future work.

Related work
There exists a large variety of workflow scheduling methods for the cloud. Different authors have constructed a taxonomy to capture this large variety of techniques in classification schema [11][12][13]. From this large diversity of methods, we restrict to those works that can be designated as static workflow scheduling problems for the cloud. In static scheduling all the characteristics of the individual tasks of a workflow, such as a task's processing and communication time and its synchronization requirements, are known before the workflow's execution. A common characteristic of static scheduling is that the set of tasks in the workflow forms a directed acyclic graph (DAG).
Also, the static scheduling approaches exhibit a vast diversity in optimizing the execution cost and execution time under a strict deadline within a given budget. Zheng and Sakellariou [14] used a Budget-Deadline Constrained heuristic called BHEFT, derived from the widely applied HEFT approach, the Heterogeneous Earliest-Finish-Time algorithm [15]. Also inspired by the HEFT algorithm, Arabnejad and Barbosa [16] presented the Predict-Earliest-Finish-Time (PEFT) algorithm. Abrishami et al. [10] presented a scheduling algorithm based on the Partial Critical Path method (IC-PCP). Based on the IC-PCP algorithm, Shi et al. [17] proposed a data-aware virtual machine type assignment algorithm in their elastic resource provisioning and task scheduling framework.
Meta-heuristic methods encompass approaches based on genetic algorithms: Zhu et al. [18] proposed an Evolutionary Multi-objective Optimization method (EMO), and Wang et al. [19] presented a genetic algorithm-based workflow Planning Algorithm (MEPA), an extended workflow scheduling method with advanced resource planning. Other meta-heuristics applied in a global search for the best solution of a workflow are a Particle Swarm Optimization-based heuristic (PSO) applied by Pandey et al. [20], and an Ant Colony Optimization-based heuristic applied by Xue et al. [21].
To benchmark these diverse scheduling methods for the cloud, a set of different workflow problems is needed. As all methods try to find the best solution under a strict deadline, it is of importance how this deadline compares to the longest execution path in the workflow graph, i.e. the critical path of the workflow. The closer the deadline is to the time of the critical path, the more difficulty these scheduling approaches may have in their search for a solution of the workflow problem. Furthermore, if specific workflow problems turn out to be tough problems to tackle by some scheduling approaches under tight deadline conditions, it might be appropriate to look for a metric or topological index to characterize those problematic scheduling problems.
The runtime comparison of different Evolutionary Multi-Objective Workflow Scheduling algorithms in the cloud was studied by Zhu et al. [18] using the Real-World Scientific Workflows from the Pegasus project. Workflows from the Pegasus project [22] are the Montage, Cybershake, Epigenomics, LIGO Inspiral, and Sipht workflows, which are an integral part of the profiling procedure presented below. Juve et al. [23] provided a characterization of the workflows from the Pegasus project. Lee et al. [24] studied resource-efficient workflow scheduling in clouds utilizing the Real-World Scientific Workflows. Most recently, Ilyushkin and Epema [25] used the Real-World workflows to profile dynamic and plan-based policies during runtime conditions. However, the effectiveness of scheduling decisions under tight deadline conditions for different workflow topologies is a subject not yet addressed in the scientific literature. A systematic profiling approach may reveal whether a chosen heuristic, greedy or less greedy, can be considered a 'general scheduling solver'.
In the following sections, we present a profiling approach wherein these Real-World workflows are used to pinpoint the effectiveness of different greediness of the IC-PCP scheduling algorithm. This is a new approach to enhance insight into which task graph topologies might be complicated for a scheduling algorithm under tight deadline conditions.

Profiling procedure
The basic scenario of virtual infrastructure planning is shown in Fig. 1. It consists of three entities: the cloud provider, the cloud consumer, and the software-defined (virtual) infrastructure. The planning procedure takes the application requirements of a workflow and the available infrastructure resources as input and creates a customized virtual infrastructure for the application. In Fig. 1, the QoS refers to a set of parameters set by the user, reflecting different requirements. The objectives refer to the parameters the user wants to optimize. For instance, some users may want to finish the execution of a workflow as soon as possible. So in such cases, the objective becomes minimizing the workflow's total execution time or the makespan of the workflow. In the scenario of this paper, we try to optimize the overall monetary cost of executing a workflow. The cloud offers different levels of service at different prices which can be defined and provisioned on the user's demand. Such virtual infrastructure services are called ''software-defined infrastructures".
To judge how well a scheduling approach performs, we define the performance as the total cost of the solution produced by the scheduling approach under the condition that the total execution time of the solution does not exceed the predefined deadline of the workflow. Through our observations, we distinguish the following aspects that may affect the performance of a virtual infrastructure planning approach: the scale of a workflow, the workflow type, the deadline, and the heuristics used (greedy or more stringent). The scale of a workflow specifies how many tasks there are inside a workflow, and the workflow type refers to the internal aspects of the DAG, like the degree distribution and the dependencies among the different tasks in the DAG. In this paper, we try to characterize the workflow type by a graph metric or topological index.
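The performance measure just defined can be expressed as a small check. The following is a minimal Python sketch with our own encoding, not the paper's implementation: a solution is summarized as a list of (finish time, cost) pairs, one per leased VM instance.

```python
# Sketch: performance = total cost of the solution, valid only if the
# makespan (latest instance finish time) does not exceed the deadline.
# The (finish_time, cost) encoding of a solution is our own assumption.

def performance(instances, deadline):
    makespan = max(finish for finish, _ in instances)
    if makespan > deadline:
        return None  # invalid solution: the deadline is violated
    return sum(cost for _, cost in instances)
```

For example, two instances finishing at times 10 and 12 with costs 3 and 5 give a performance of 8 under a deadline of 15, and no valid performance if the second instance finishes at 16.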
Ultimately, one has to decide which workflow scheduling method(s) to recommend. This choice is not straightforward, as each workflow scheduling approach exhibits weak and strong features. These weak and strong features may become apparent through a systematic profiling procedure that allows studying the effect of the different aspects that influence the virtual infrastructure planning. Fig. 2 depicts a schematic view of a profiling procedure. By selecting different workflow problems from a data set, one can discover which DAGs are difficult to solve by a chosen scheduling method. In case the DAGs of the workflow problems from the data set can be characterized by different metrics or topological indices, one might be able to predict on the basis of these metrics which scheduling approach is the best choice for solving the workflow problem. This is of particular importance in software workbenches that automatically produce a scheduling plan for time-critical applications in the cloud [26,27,4,28,29].
To illustrate the different items of the profiling procedure we select from the diverse set of static scheduling approaches the IC-PCP algorithm [10] by Abrishami et al. to demonstrate the profiling procedure. The IC-PCP algorithm applies a particular kind of heuristics. It tries to assign a path of unassigned tasks in a workflow DAG to a single Virtual Machine (VM). Their approach can be characterized as an adaptation of the Critical Path Method (CPM) for the cloud. The Critical Path Method, developed in the late 1950s by Kelley and Walker ([30,31]), found its way in many different fields of science and technology [32]. In the IC-PCP algorithm, a partial critical path is constructed starting at an unassigned task from which a path of critical parent tasks is constructed. To find the cheapest possible VM for the partial critical path the IC-PCP algorithm applies a greedy assignment criterion. Besides the original greedy assignment criterion, we created a version with a more stringent assignment criterion to study the effect of greediness.

Problem description
A workflow scheduling problem for an application of tasks {t_1, t_2, ..., t_n} for a cloud environment is defined by the tuple WF_C = <S, G, T_ex, T_com>, where S = {s_1, s_2, ..., s_m} is the set of services or virtual machine instances offered by the cloud provider. G is a Directed Acyclic Graph (DAG) representing the task graph of the application associated with the workflow, where the set of vertices V = {t_1, t_2, ..., t_n} represents the set of tasks.
The set E ⊆ V × V represents the set of directed edges of the DAG, where e_ij = (t_i, t_j) ∈ E represents the communication between a task t_i and its successor t_j, both in V. The function T_ex(t_i, s_k) gives the execution time of task t_i on service s_k, and the function T_com(t_i, t_j) gives the communication time between tasks t_i and t_j. Furthermore, every DAG has a vertex for an entry task t_0 without any predecessors, and a vertex for an exit task t_{n+1} that does not have any successors. Both tasks have zero execution cost and zero communication time with their successors and predecessors, respectively.
A valid solution of the workflow scheduling problem is an assignment of services to tasks such that the deadline D of the workflow is not violated. To each task four times are attributed: the earliest start time (EST), the earliest finish time (EFT), the latest start time (LST) and the latest finish time (LFT). After assigning a service s_k to any task t_i, these times have to satisfy the following equations for the tasks of the DAG:

EST(t_0) = 0,
EST(t_i) = max over parents t_p of t_i of { EFT(t_p) + T_com(t_p, t_i) },
EFT(t_i) = EST(t_i) + T_ex(t_i, s_k).   (1)

By definition the communication time T_com(t_i, t_j) is taken to be zero if both consecutive tasks t_i and t_j are assigned to the same service s_k, or VM. The times defined in Eq. (1) express the requirement that a task t_i may not start earlier than any of its parent (predecessor) tasks has finished execution and communicated its data to the service of t_i.
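The forward recurrence for EST and EFT described above can be sketched as follows. This is a minimal Python illustration with our own graph encoding (tasks are integers in topological order; `parents[i]` lists (parent, communication time) pairs; `exec_time[i]` is the execution time of task i on its assigned service), not the paper's implementation:

```python
# Sketch: forward pass computing EST/EFT for a DAG after a service assignment.
# Communication time is assumed already set to zero for task pairs that share
# a VM, per the definition in the text.

def earliest_times(parents, exec_time):
    n = len(exec_time)
    est, eft = [0] * n, [0] * n
    for i in range(n):  # tasks assumed to be in topological order
        # A task may not start before every parent has finished execution
        # and communicated its data.
        est[i] = max((eft[p] + c for p, c in parents[i]), default=0)
        eft[i] = est[i] + exec_time[i]
    return est, eft
```

For a small diamond-shaped graph (entry task 0, two middle tasks, join task 3) the pass propagates the larger of the two parent finish-plus-communication times into EST of the join task.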

Greediness of the IC-PCP algorithm
Applying a critical path-based algorithm is a widely used approach for tackling the multi-processor scheduling problem [33]. These algorithms are designed to solve task scheduling on a fixed infrastructure (e.g. a fixed number of homogeneous processors). However, the cloud provides infrastructure-level programmability, allowing the user to define and provision the resources needed. To this end, Abrishami et al. [10] proposed the Partial Critical Path approach (IC-PCP). According to the classification of greedy algorithms [34], the IC-PCP approach resembles a Best-Local approach. The algorithm starts by assigning the best service, i.e. the fastest, most expensive service, to each individual task, the start configuration, with a deadline large enough for the start configuration to be valid or implementable. Following the Best-Local approach, the IC-PCP algorithm successively assigns tasks of a partial critical path to the cheapest service possible by applying a greedy step with respect to a local optimality criterion, such that each intermediate configuration of services remains the best so far after the greedy step. Besides the local optimality criterion, the algorithm must obey a global optimality criterion: keep the cost of the final configuration as low as possible, and realize a latest finish time for the exit task that does not exceed the deadline of the workflow. In Algorithms 1 and 2 the pseudo code of the IC-PCP algorithm is presented; the variables used in the pseudo code are summarized in Table 1.

Algorithm 1 Parents Assigning Algorithm
1: procedure AssignParents(t)
2:   while t has an unassigned parent do
3:     PCP ← empty, t_i ← t
4:     while there exists an unassigned parent of t_i do
5:       add CriticalParent(t_i) to the beginning of PCP
6:       t_i ← CriticalParent(t_i)
7:     call AssignPath(PCP)
8:     for all t_i ∈ PCP do
9:       update EST and EFT for all successors of t_i
10:      update LST and LFT for all predecessors of t_i
11:      call AssignParents(t_i)

Algorithm 2 Path Assigning Algorithm
1: procedure AssignPath(PCP)
2:   s_{i,j} ← the cheapest applicable existing instance for PCP
3:   if s_{i,j} is null then
4:     launch a new instance s_{i,j} of the cheapest service s_i which can finish each task of PCP before its LFT
5:   schedule PCP on s_{i,j} and set SS(t_i)
6:   set all tasks of PCP as assigned

Before we discuss the performance of the IC-PCP algorithm on different task graph topologies, some clarification is needed about the implementation of the pseudo code. In the Path Assigning Algorithm, finding the cheapest possible service (VM) to be assigned to a partial critical path PCP is phrased as ''launch a new instance of the cheapest service which can finish each task in a partial critical path PCP before its LFT" [10]. To discover the exact meaning of this phrase, it helps to look at the necessary requirement for any possible service s_i ∈ S for the partial critical path PCP. Suppose PCP = {t_m, t_{m+1}, ..., t_n}, where, according to the AssignParents procedure in Algorithm 1, critical parents are inserted at the beginning of PCP, so t_{m+i−1} is the critical parent of t_{m+i}. Then an interpretation of the above phrase is that for the cheapest possible service s_k ∈ S to be assigned to each of the tasks t_{m+i} ∈ PCP, i = 0, ..., n − m, the following requirement must hold:

EST(t_m) + Σ_{j=0..i} T_ex(t_{m+j}, s_k) ≤ LFT(t_{m+i}),   for i = 0, ..., n − m.   (2)

It follows that an assignment of the cheapest possible service s_k according to this requirement does not allow for idle time on a VM assigned to a partial critical path, and there is no communication time involved between any two consecutive tasks in such an assignment.
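This interpretation of Requirement (2) can be written as a small feasibility test: running the whole PCP back-to-back on one candidate service, each task must finish no later than its LFT. The following Python sketch uses our own encoding (execution times and LFTs listed in PCP order), not the paper's implementation:

```python
def pcp_feasible(est_first, exec_times, lfts):
    """Requirement (2) sketch: schedule the PCP back-to-back on one service,
    starting at the EST of its first task. There is no idle time and no
    communication time between consecutive tasks on the same VM, so each
    task's finish time is a running sum of execution times."""
    finish = est_first
    for exec_t, lft in zip(exec_times, lfts):
        finish += exec_t          # next task starts the moment this one ends
        if finish > lft:
            return False          # this service cannot meet the task's LFT
    return True
```

A service that lets a two-task PCP finish at times 5 and 10 against LFTs 6 and 10 is feasible; tightening the second LFT to 9 makes it infeasible.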
For a workflow with a task graph of n tasks, the time complexity of the IC-PCP algorithm is O(n^2) [10].
Consider the example task graph shown in Table 2. In this example t_0 and t_5 are the entry and exit task, respectively, both tasks having zero execution time and communication time.
The start configuration, with all unassigned tasks corresponding to VMs of service s_1, results in EST(t_4) = 23. After assigning the partial critical path PCP = {t_1, t_2, t_4} to a single VM of service s_2, EST(t_4) decreases to 21, as the increase in execution time of t_1 and t_2 on service s_2, which equals 7, is overcompensated by zeroing the communication time in the critical path PCP, which equals 9. The decreased value of EST(t_4) turns out to have severe repercussions in case only Requirement (2) is applied: the requirement is not violated for this assignment, and consequently the algorithm stops with an invalid assignment of VMs. According to the Parents Assigning Algorithm, the unassigned successor t_3 of t_1 gets updated after the assignment of PCP, resulting in EST(t_3) = 16 and EFT(t_3) = 23; LST(t_3) and LFT(t_3) are not updated as the algorithm proceeds with AssignParents(t_1), because there are no predecessors of t_1 (except entry node t_0). As there are no unassigned parents of t_1, and also none of t_2, the next step in the for-loop of the Parents Assigning Algorithm addresses task t_4. Its single unassigned predecessor t_3 gets updated: LST(t_3) = 12 and LFT(t_3) = 19. Next the algorithm proceeds with AssignParents(t_4) without noticing that LFT(t_3) = 19 < EFT(t_3) = 23, and fails to assign a VM to t_3, leaving the configuration in a non-implementable state. One adjustment to make IC-PCP with only Requirement (2) successful for this example is to implement an additional requirement for assigning a service to the tasks of a partial critical path. Observe that t_3 is not a dominant or critical parent of t_4, but should be taken into consideration for a successful ending of the algorithm.

Table 1. Summary of the variables used in the pseudo code of Algorithms 1 and 2.
EST(t_i): Earliest Start Time, the earliest time task t_i can begin.
EFT(t_i): Earliest Finish Time, the earliest time task t_i may finish; depends on EST(t_i) and the selected service for t_i.
LST(t_i): Latest Start Time, the latest time task t_i can begin.
LFT(t_i): Latest Finish Time, the latest time task t_i may finish; depends on LST(t_i) and the selected service for t_i.
s_i: a service or virtual machine, one of the services offered by the cloud provider.
s_{i,j}: the j-th instance of service s_i, to be assigned to the tasks in PCP.
SS(t_i): Selected Service, the service selected for processing task t_i.
PCP: Partial critical path, a subgraph of sequential tasks to be scheduled on the same service or virtual machine.

Table 2. An example task graph to be scheduled on a cloud offering three different services. In the start configuration each task t_i is hosted on a VM of service s_1, the fastest service offered.
For every member t_{m+i} of PCP we determine its dominant parent outside PCP, designated t_{p̂;m+i}, i.e. that parent t_p among all the parents outside PCP having the largest sum EFT(t_p) + T_com(t_p, t_{m+i}), where T_com(t_p, t_{m+i}) is the communication time between t_p and t_{m+i}. An additional requirement for assigning any service s_k ∈ S to the member tasks of PCP is then that the dominant parent can deliver its data before t_{m+i} starts on the candidate service:

EFT(t_{p̂;m+i}) + T_com(t_{p̂;m+i}, t_{m+i}) ≤ EST(t_m) + Σ_{j=0..i−1} T_ex(t_{m+j}, s_k).   (3)

Likewise, for each task t_{m+i} in PCP a dominant child t_{ĉ;m+i} outside PCP is determined, which gives the additional requirement that the dominant child can still start in time:

EST(t_m) + Σ_{j=0..i} T_ex(t_{m+j}, s_k) + T_com(t_{m+i}, t_{ĉ;m+i}) ≤ LST(t_{ĉ;m+i}).   (4)

Implementing the IC-PCP algorithm according to Requirements (2)-(4) also produces all the steps of the example workflow presented in [10], but confronted with the example graph in Table 2 the algorithm detects that no assignment for PCP = {t_1, t_2, t_4} is possible. Consequently, the final state of the algorithm equals the start configuration.
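The stricter criterion adds, per PCP task, two checks against its dominant parent and dominant child outside the PCP. The Python sketch below uses our own encoding (the dominant-relative times are assumed precomputed, and `start`/`finish` are the task's scheduled times on the candidate VM); it is an illustration of the idea, not the paper's code:

```python
def stringent_ok(start, finish, dom_parent, dom_child):
    """Extra checks for one task of a PCP under a candidate service.
    dom_parent: (EFT of dominant parent outside PCP, communication time),
                or None if every parent is inside the PCP.
    dom_child:  (LST of dominant child outside PCP, communication time),
                or None if every child is inside the PCP."""
    if dom_parent is not None:
        parent_eft, t_com = dom_parent
        if parent_eft + t_com > start:    # dominant parent delivers too late
            return False
    if dom_child is not None:
        child_lst, t_com = dom_child
        if finish + t_com > child_lst:    # dominant child would start too late
            return False
    return True
```

Applied to every task of a candidate PCP assignment, a single failing check rejects the service, which is what makes this version more stringent than the purely greedy one.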
To find out whether the greedy version with Requirement (2) or the more stringent version with Requirements (2)-(4) is to be preferred, tests on series of task graphs are performed.
The implementation of the IC-PCP algorithm used in section ''Profiling results" below differs slightly from the original version of Abrishami et al. [10] (Algorithms 1 and 2). We replaced the pseudo code of Algorithm 2 by the pseudo code of Algorithm 3. After assigning a Partial Critical Path PCP to a service (procedure AssignPath(PCP)), all tasks are updated, whether assigned or unassigned. After that, a recursive call is made for each task in PCP.

Algorithm 3 A different implementation of the IC-PCP algorithm of Abrishami et al. [10] as listed in Algorithm 2.
1: procedure AssignParents(t)
2:   while there exists an unassigned parent of t_i do
3:     ...
4:   call AssignPath(PCP)
5:   update EST, EFT for all t_k ∈ G
6:   update LST, LFT for all t_k ∈ G
7:   for all t_i ∈ PCP do
8:     call AssignParents(t_i)

It turned out that this approach produced better results than our implementation of the original pseudo code. The time complexity has not changed: this version exhibits the same time complexity as the original IC-PCP algorithm, namely O(n^2). As the focus of this paper is not to benchmark the original IC-PCP algorithm, but to study the success rate of an algorithmic approach under different tight deadline conditions, an implementation with a higher performance is justified. A side effect of this adjusted IC-PCP algorithm is that in some cases idle time might be introduced in assigned instances of tasks, as not only unassigned but also already assigned tasks are adjusted. When the example from Table 2 is solved with a larger deadline value of 39, this adjusted version of the IC-PCP algorithm (applying Requirement (2) only) adds idle time to the instance of service s_2 with tasks t_2, t_4, with the result EFT(t_2) = 21 and EST(t_4) = 26, resulting in a valid solution. We will address this issue further in section ''The greedy version of IC-PCP revisited".

Data sets
Different topology sets are created to test both versions of the IC-PCP algorithm. Fixed topology examples are taken from the Real-World Scientific workflows of the Pegasus project [22]. From each fixed topology example a set of 20 task graphs was generated by assigning execution times to the vertices and communication times to the edges. Execution times are chosen from the integer range (5, ..., 20) and represent the time a task needs when running on the fastest service s_1. Execution times for the slower services s_2 and s_3 are respectively 2 and 3 times the execution time on service s_1, which is reasonable compared with numbers from benchmarking VMs [39]. The communication time between any task and a successor is chosen from the integer range (5, ..., 20) under the condition that it may not exceed the execution time of the task on the fastest service. For each task graph, a large enough deadline is generated.
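The time-generation step described above can be sketched as follows. This is our own Python illustration (function and parameter names are ours); it follows the stated rules: s_1 execution times drawn from 5..20, s_2 and s_3 taking 2x and 3x as long, and edge communication times drawn from 5..20 but capped by the source task's s_1 execution time:

```python
import random

def generate_times(vertices, edges, seed=42):
    """Assign execution times (per service s1, s2, s3) to vertices and
    communication times to edges, per the generation rules in the text."""
    rng = random.Random(seed)
    # execution time on the fastest service s1, drawn from 5..20
    exec_s1 = {v: rng.randint(5, 20) for v in vertices}
    # the slower services s2 and s3 take 2x and 3x as long
    exec_times = {v: (t, 2 * t, 3 * t) for v, t in exec_s1.items()}
    comm = {}
    for (u, v) in edges:
        # communication time in 5..20, capped by the source task's
        # execution time on the fastest service (range stays valid
        # because exec_s1[u] >= 5)
        comm[(u, v)] = rng.randint(5, min(20, exec_s1[u]))
    return exec_times, comm
```

Running this once per topology example with 20 different seeds would yield a set of 20 task graphs as described above.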
To test a graph metric that discriminates between easy and difficult-to-solve Real-world Scientific workflow example topologies (see section ''Characterization by Graph metrics or Topological indices") with respect to the performance of both versions of the IC-PCP algorithm, a second set of task graphs is generated by means of the open source random graph generator GGen [40]. This tool allows for generating task graphs while controlling properties like the in-degree/out-degree of the vertices. Two sets are generated for task graphs having a size of 32 and 64 vertices, where the in-degree parameter has values between 2 and 5 and the out-degree parameter values between 1 and 4. This choice prevents the GGen algorithm from generating simple task graphs, for instance task graphs with a single path. For each size of a task graph, the graph generated by the GGen algorithm was checked for the metric under study. For a value of the metric indicating the task graph as easy to solve, the graph was added to a set called GG_32_0/GG_64_0. Otherwise, it was added to a set called GG_32_1/GG_64_1. Once each set contained 20 examples, for each generated example execution times and communication times were generated for the different VMs following the same procedure as explained above for the Real-world Scientific workflow examples.

Profiling results
In this section we evaluate two implementations of the IC-PCP algorithm for the data sets described above. A greedy version is compared with a less greedy, more stringent version of the IC-PCP algorithm. Experimental results show that under different deadline settings, different degrees of greediness lead to different results.

Success rate of a greedy versus a more stringent version of the IC-PCP algorithm
The version of the IC-PCP algorithm applied in this section is the adjusted one as explained above, Algorithm 3. Two implementations with respect to assigning a service to a partial critical path are compared with each other, a greedy one and a less greedy one. The greedy one applies only the necessary Requirement (2) in the Path Assigning Algorithm; this implementation is designated PCP_v0. The more stringent implementation additionally applies Requirements (3) and (4), i.e. Requirements (2)-(4), in the Path Assigning Algorithm, and is designated PCP_v1.
In Fig. 3 the success rate is displayed of the greedy implementation PCP_v0 versus the more stringent implementation PCP_v1 for the different topology sets generated from the Real-world Scientific example workflows. All 20 task graphs from a topology set are presented to PCP_v0 and PCP_v1 under different stress conditions, from tight to loose deadlines, and the success rate for the 20 task graphs is recorded for each new deadline. A new deadline is related to the time of the critical path of the start configuration, with each task assigned to the fastest service s_1, and is determined by the stress parameter p: the closer p is to 100, the closer the deadline is to the critical-path time of the start configuration, i.e. the tighter the deadline.

From the success rate for different values of the stress parameter p we can see that there is a difference between the greedy version PCP_v0 and the more stringent version PCP_v1. For almost all topologies, except Montage 25 and Montage 100, the greedy version has higher success rates under tight deadline conditions, 70 < p < 100, than the more stringent version. As the partial critical path algorithm starts by finding the cheapest assignment for the tasks of the critical path before it deals with the other tasks in the task graph, it is interesting to investigate for which values of the stress parameter p both algorithms are successful in finding the cheapest assignment for the tasks of the critical path. Fig. 4 displays the success ratio in assigning the critical path for the different topology sets. For the workflow problems in all topology sets, the greedy version PCP_v0 is always able to assign the critical path to a virtual machine in agreement with Requirement (2), but for the Montage topologies it fails to finish successfully under tight deadline conditions, 70 < p < 100, see Fig. 3. The more stringent version PCP_v1 is able to find an assignment for the critical path in agreement with the additional Requirements (3) and (4) for values p < 80.
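The mapping from the stress parameter p to a deadline can be illustrated as follows. The exact formula is not reproduced in this copy of the text, so the linear interpolation below is our own assumption, consistent only with the stated behavior that p near 100 means a deadline close to the critical-path time T_CP of the start configuration:

```python
def deadline(t_cp, p):
    """Hypothetical sketch (our assumption, not the paper's definition):
    p = 100 gives the tightest deadline D = T_CP of the start configuration;
    smaller p loosens the deadline linearly, up to 2*T_CP at p = 0."""
    return t_cp * (1 + (100 - p) / 100.0)
```

Under this reading, sweeping p from 100 down to 0 reproduces the tight-to-loose stress conditions used in Figs. 3 and 4.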
Under tight deadline conditions, p > 80, the more stringent version PCP_v1 stops at an early stage, where the final solution equals the start configuration with each task assigned to the fastest and most expensive service. If we have to choose between the greedy version and the more stringent version according to the success rate, the choice would be the greedy version PCP_v0, which only applies Requirement (2), as only for the Montage topology set is there a preference for PCP_v1 with the extra Requirements (3) and (4).
There is another reason to prefer PCP_v0 over PCP_v1, which is the subject of the next section, "The greedy version of IC-PCP revisited". In that section, we present an extension of the PCP_v0 algorithmic approach that improves the success rate under tight deadline conditions, 70 < p < 100.

The greedy version of IC-PCP revisited
The greedy implementation PCP_v0 has an advantage over its more stringent counterpart PCP_v1. If it fails, i.e., it ends in a final configuration that is not valid because some tasks start too early with respect to one of their parents, the greedy version can benefit from a repair cycle at the end. By a repair cycle we mean a sweep over the DAG after the last assignment taken by the algorithm, in order to resolve a disagreement between the earliest finish time (EFT) of a task and the earliest start time (EST) of one of its children, and likewise between the latest finish time (LFT) of a parent task and the latest start time (LST) of one of its children. If the final configuration is valid, nothing changes. But an invalid configuration might become valid by inserting idle time, i.e., enlarging the EST of a task that starts too early. As explained above, Algorithm 3 may also add idle time to an instance by updating the EST and EFT of all tasks after an assignment of a partial critical path, so a valid solution produced by PCP_v0 may also contain instances with idle time.
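A minimal sketch of such a repair cycle, assuming each task carries an EST/EFT and each edge a communication time (data structures and names are illustrative, not the authors' implementation):

```python
def repair_cycle(tasks, parents, comm_time, est, eft, deadline):
    """One forward sweep over the DAG in topological order. If a task starts
    earlier than allowed by a parent's finish time plus the communication
    time on that edge, idle time is inserted by enlarging its EST (and
    shifting its EFT accordingly). Because the sweep is topological, shifted
    finish times propagate correctly to all descendants.
    Returns True if the repaired schedule still meets the deadline."""
    for t in tasks:  # tasks assumed in topological order
        earliest = max((eft[p] + comm_time.get((p, t), 0)
                        for p in parents.get(t, [])), default=0)
        if est[t] < earliest:          # task starts too early w.r.t. a parent
            shift = earliest - est[t]  # idle time inserted
            est[t] += shift
            eft[t] += shift
    return max(eft.values()) <= deadline
```

If the configuration was already valid, no EST is enlarged and the schedule is returned unchanged, matching the behavior described above.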
In addition to the repair cycle, it is checked whether an instance with idle time can be split into separate instances when the inserted idle time is larger than the original communication time between the adjusted task and its parent task in the instance.
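The splitting check can be sketched as follows, for an instance represented as the ordered list of tasks assigned to it (illustrative names and representation, not the authors' code):

```python
def split_instance(instance, est, eft, comm_time):
    """Split a VM instance between two consecutive tasks whenever the idle
    gap inserted between them exceeds the original communication time of
    the corresponding edge; keeping them on one instance then no longer
    pays off. Returns the list of resulting (sub)instances."""
    parts, current = [], [instance[0]]
    for prev, task in zip(instance, instance[1:]):
        gap = est[task] - eft[prev]  # idle time between consecutive tasks
        if gap > comm_time.get((prev, task), 0):
            parts.append(current)    # cut the instance here
            current = [task]
        else:
            current.append(task)
    parts.append(current)
    return parts
```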
The implementation of PCP_v0 enhanced with a final repair cycle and a splitting procedure for instances with idle time is designated PCP_v01. In Fig. 5 the performance of this enhanced implementation PCP_v01 is compared with that of the implementation PCP_v0, as applied above in Fig. 3.
As each service or virtual machine has a cost, we also looked at the total execution cost of a solution of the scheduling problem. In the same way as in [10], three different services s1, s2 and s3 are distinguished, with arbitrary cost values of 5, 2 and 1, respectively, where we define the cost as the price per unit time. The higher the cost of a service, the faster the execution of a task. It turns out that a valid solution produced by PCP_v0 also benefits from the splitting procedure, resulting in a lower total execution cost. Another feature of the enhanced implementation PCP_v01 is its better mean total execution cost under tight deadline conditions, 70 < p < 100, due to its improved success rate. Analogous to the enhanced implementation of the greedy version PCP_v0, we also constructed an enhanced implementation of the more stringent version PCP_v1, designated PCP_v11 (i.e., PCP_v1 enhanced with a final repair cycle and splitting procedure). In Fig. 6 the mean total execution cost of PCP_v01 is compared with that of PCP_v11.
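Under this price-per-unit-time model the total execution cost of a solution can be sketched as follows; the instance representation is an assumption for illustration:

```python
# price per unit time for services s1 (fastest) .. s3 (slowest), as in the text
PRICE = {"s1": 5, "s2": 2, "s3": 1}

def total_cost(instances):
    """instances: list of (service, start, finish) tuples, one per VM instance.
    An instance is billed for the whole interval it is kept alive, so idle
    time inside an instance is paid for as well; this is why splitting an
    instance at a large idle gap lowers the total execution cost."""
    return sum(PRICE[service] * (finish - start)
               for service, start, finish in instances)
```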
The conclusion is that the greedy version with a finishing repair cycle performs well for all topologies except the Montage topology. As all DAGs for a topology set are generated with the same algorithm, applying the same range of execution times and communication times, the question arises to what extent certain characteristics of a topology might play a role in predicting the performance under tight deadline conditions. What makes the Montage workflow topologies difficult for the partial critical path algorithm under tight deadline conditions? If one compares the Montage topology with the other topologies [22], one notices that the Montage topology contains fewer parallel paths. In the next section, we search for a metric based on the number of parallel paths that allows us to discriminate between the Montage topology and the other topologies.

Characterization by graph metrics or topological indices
When we look at the performance of the greedy version with a finishing repair cycle, PCP_v01, as displayed in Fig. 5, the question arises what the difference is between the Montage topology and the other topologies. As the DAGs for each topology set are generated with the same algorithm, it is tempting to look for a purely topological difference, i.e., a metric or topological index depending on the vertex and edge sets only. Here we need a graph metric or topological index that discriminates between the Montage topologies and the other topologies. The number of such metrics and topological indices is huge. From a theoretical perspective, the computational complexity of the metric or topological index is not the most important feature; first, we have to find one that discriminates between the Montage 25/100 topologies and the other topologies in Fig. 5. The problem with an easy-to-compute metric like the compactness, expressed by |E|/|V| or |E|/|V|^2, is that it does not discriminate well, as two graphs with the same quotient may have a totally different structure. Among the more complex metrics, the name of the metric is not necessarily a good indication of its applicability to a task graph. For instance, the hyperbolicity of a graph, much used in the field of network theory, is a measure of the 'treeness' of a graph, and as such might be useful to characterize a task graph: the more a task graph resembles a tree structure, with a combining entry node, the higher the expected success rate of running IC-PCP. However, the hyperbolicity of a graph is not defined for directed acyclic graphs.
As already noticed above, the Montage topologies show fewer parallel paths, which might be a hint at the cause of the lower success rate for these topologies, as displayed in Fig. 5. In Table 3 the values of some easy-to-calculate metrics are listed for the different topologies applied.
From the values in Table 3, the number of paths per task in the DAG, #paths/|V|, discriminates between the Montage topologies and the other topologies. As a first choice, we take a threshold value of 2.5 for this metric, below 3.6 (Montage 25) but above 2.18 (Inspiral 100). A first conjecture might then be that if a workflow topology has a value below 2.5, we may expect a higher success ratio under tight deadline conditions for PCP_v01, the greedy version with a finishing repair cycle, than for values above 2.5.
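The #paths/|V| metric can be computed with a simple dynamic program over a topological order of the DAG; a minimal sketch (illustrative names, not the authors' code):

```python
from collections import defaultdict

def paths_per_vertex(n_vertices, edges):
    """Number of distinct source-to-sink paths in a DAG divided by |V|,
    computed by dynamic programming over a Kahn topological order."""
    children = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in edges:
        children[u].append(v)
        indeg[v] += 1
    stack = [v for v in range(n_vertices) if indeg[v] == 0]  # sources
    npaths = {v: 1 for v in stack}  # one path starts at each source
    while stack:
        u = stack.pop()
        for v in children[u]:
            npaths[v] = npaths.get(v, 0) + npaths[u]  # accumulate paths into v
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    total = sum(npaths[v] for v in range(n_vertices) if not children[v])  # sinks
    return total / n_vertices
```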
A first trial to test this conjecture is to generate random task graphs by means of the open-source random graph generator GGen [40]. Two sets were generated of task graphs with 32 and 64 vertices, where the in-degree parameter had values between 2 and 5 and the out-degree parameter values between 1 and 4. This choice prevents the GGen algorithm from generating simple task graphs, for instance task graphs with a single path. Each generated graph was checked for its number of paths per vertex; for a value less than 2.5 the graph was added to a set called GG 32_0/GG 64_0, otherwise it was added to a set called GG 32_1/GG 64_1. Once each set contained 20 examples, execution times and communication times were generated for the different VMs for each example, following the same procedure as explained above. Fig. 7 displays the success ratio of the PCP_v01 algorithm for the different sets; for both sizes of task graphs, the success rate is larger if the number of paths per task is less than 2.5. This is a first, tentative result supporting a conjecture based purely on the topology of a task graph.
The metric applied is a topological metric of the task graph, which should be distinguished from a more detailed metric quantifying the granularity of a DAG as defined by Gerasoulis and Yang [41] and applied by McCreary et al. [33]. They take both the communication cost and the execution cost into account, whereas a workflow DAG for the cloud only defines the dependencies of the tasks and the communication cost among tasks, not the execution times of the tasks; the latter result from the outcome of the scheduling algorithm. Their metric might, however, be applied to the start configuration with each task assigned to the fastest VM.

Discussion and future work
In this paper, we profiled two implementations of the IC-PCP algorithm with different rules of greediness for assigning the tasks of a partial critical path to a virtual machine. Through the systematic profiling proposed in this paper, the difference in performance of the two implementations of the IC-PCP algorithm could be made explicit as a function of different types of workflows and QoS requirements, here for the Real-World Scientific Workflows from the Pegasus project [22]. Besides the greediness of the heuristics, the deadlines and the workflow type are important features: as the deadline of the workflow problem approaches the critical-path time for the fastest VMs, different success rates in solving a specific workflow problem become apparent for the IC-PCP algorithm. A striking feature is the fact that the greedy implementation of the IC-PCP algorithm could be enhanced: in case the greedy implementation failed to find a solution under tight deadline conditions and ended in an invalid, non-implementable state, a finishing repair cycle (Fig. 5) was able to turn the invalid final state into a valid solution.
However, if the input workflow is 'similar' to Montage (Figs. 3 and 5), we observed that a repair cycle did not improve the success rate in solving the workflow problem. Another feature that becomes apparent from Fig. 6 is that the less greedy, more stringent version produces solutions with the same mean cost as the greedy version under less tight deadline conditions, except for the Montage topologies. This called for a way to characterize the Montage workflow problem. We tried this by applying a graph metric for the DAG of the workflow problem that could discriminate between the Montage workflows and the other Real-World workflow problems. The graph metric was tested by generating two random sets of workflow problems (DAGs), one set with workflow problems 'similar' to the Montage workflow and the other set with workflow problems more 'similar' to the other Real-World workflow problems.
As we think that every scheduling approach based on heuristics, or without a proof of convergence for an NP-hard problem, will fail for a certain kind of scheduling problem under tight deadline conditions, we propose to integrate the different steps of the IC-PCP profiling presented above into a profiling framework (Fig. 2). By applying such a profiling framework, one can recommend heuristics and greediness for a given scheduling algorithm and workflows with different characteristics. If an impairment of performance of a scheduling approach is discovered for a certain kind of workflow problem, it is a challenge to find a graph metric or topological index that can characterize these difficult workflow problems. A difficulty in finding a graph metric is the fact that not all metrics have a meaning for directed acyclic graphs. More research in this field might be facilitated by looking at the use of topological indices in other fields of science, e.g., the molecular sciences [42]. During the search for a 'successful' graph metric, we noticed that metrics that are purely degree based were in general not able to discriminate between the topology of the Montage DAG and the DAGs of the other Real-World workflow problems. We therefore think that more success can be expected from metrics that take the number of different paths in the DAG into account.
From a practical point of view, one might run different scheduling approaches in parallel on a single workflow problem and take the result of the winner. From a theoretical perspective, however, it is still of interest to characterize the success ratio of an algorithmic approach with a certain heuristic, or greediness, and to study its behavior under different tight deadline conditions and/or topology features of the task graph.
A closer look at Fig. 3 raises the question of what makes topologies like CyberShake and SIPHT rather insensitive to tight deadline conditions for the greedy approach: compared to the other topologies, the success rate reaches almost 100% over the whole range of the stress parameter. Another interesting question is under which circumstances a greedy implementation is amenable to a finishing repair cycle improving its success rate under tight deadline conditions, as found for the IC-PCP algorithm.
The proposed profiling procedure might also be suitable for studying scheduling algorithms that apply meta-heuristics, including genetic algorithms (GA), particle swarm optimization (PSO) and ant colony optimization (ACO). From our experience, we know that these approaches become very time-consuming compared to, e.g., the IC-PCP algorithm when the size of the task graph (DAG) becomes large.
To the best of our knowledge, this is the first attempt to build a systematic profiling framework suited to study different static scheduling algorithms.

Code and data availability
All four implementations of the IC-PCP algorithm used in this study, as well as the data to produce the performance figures, are available at https://bitbucket.org/uva-sne/ic-pcp-profiling/src/master.