Cost-time trade-off efficient workflow scheduling in cloud
Introduction
Almost all scientific areas have become more complex and now rely on the analysis of large-scale data sets, which calls for automated management processes that scale. Scientific workflows have emerged as a suitable way of describing and structuring parallel computations and the analysis of large-scale data sets [1]. Their successful use has driven scientific advances in various fields such as biology, physics, medicine, and astronomy [1], [2]. Complex experiments based mainly on the analysis of large-scale data sets, which sometimes require high computing power [1], are increasingly exploiting the assets of commercial clouds [3], [4], [5], [6], [7].
Cloud providers offer several types of services, the main ones being Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), the last of which is the principal service that benefits Workflow Management Systems (WMSs). SaaS offers cloud end users dedicated software accessible over the Internet via a browser; PaaS provides the platform and the Information Technology (IT) environment that developers need to deploy their services and applications on the Internet; and IaaS leases virtualized resources, called Virtual Machines (VMs), to support users' operations.
Among the advantages of the cloud are: (i) its flexibility and elasticity; (ii) its pay-per-use billing model, which avoids the upfront expense of purchasing personal or dedicated resources [5], [7]; (iii) its reliability and fault tolerance; and (iv) its variety of resource types (instance types), ranging from general-purpose to specialized resources such as GPUs [8].
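To make the pay-per-use model concrete, the sketch below computes the cost of a single VM lease under interval-based billing. It assumes, as a hypothetical setting rather than any specific provider's current pricing, that every started billing period is charged in full; the function name `lease_cost` and the one-hour default period are illustrative choices.

```python
import math


def lease_cost(start: float, end: float, price_per_period: float,
               billing_period: float = 3600.0) -> float:
    """Cost of one VM lease under interval-based pay-per-use billing.

    Assumption (illustrative): every started billing period is charged in
    full, e.g. hourly billing with billing_period = 3600 seconds.
    """
    if billing_period <= 0:
        raise ValueError("billing period must be positive")
    if end < start:
        raise ValueError("lease must end after it starts")
    # Round the lease duration up to whole billing periods.
    periods = math.ceil((end - start) / billing_period)
    return periods * price_per_period


# A 90-minute lease at 0.10 per hour is billed as two full hours.
print(lease_cost(0, 5400, 0.10))  # 0.2
```

Under per-second billing the same lease would cost 0.15; the rounding up of partial periods is one reason schedulers try to reuse VMs that are already leased.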
Workflow scheduling is a well-known NP-complete problem [9]. Its difficulty also grows with the number of factors to consider, such as user-defined Quality of Service (QoS) requirements like deadline and service cost. Several studies on workflow scheduling for Heterogeneous Computing Systems (HCS) address deadline- and/or budget-related objectives [10], [11], [12], [13], [14], [15], [16].
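For reference, a budget- and deadline-constrained workflow scheduling problem is commonly stated along the following lines. The notation is ours and not necessarily that of the paper's Table 2: G = (T, E) is the workflow DAG, S a schedule mapping tasks to leased VM instances, ST(t) and FT(t) the start and finish times of task t (the workflow starting at time 0), c_ij the data-transfer time on edge (t_i, t_j), taken as zero when both tasks share a VM, and D and B the user-defined deadline and budget.

```latex
\begin{aligned}
\min_{S}\quad & \bigl(\mathit{makespan}(S),\ \mathit{cost}(S)\bigr),
  \qquad \mathit{makespan}(S) = \max_{t \in T} FT(t)\\
\text{s.t.}\quad & \mathit{makespan}(S) \le D,\\
  & \mathit{cost}(S) \le B,\\
  & ST(t_j) \ge FT(t_i) + c_{ij} \quad \forall\, (t_i, t_j) \in E.
\end{aligned}
```

Even before the two QoS constraints are considered, mapping n tasks onto m instance types admits m^n assignments, which is why heuristics rather than exhaustive search are used in practice.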
However, given the complexity of both the cloud environment and the workflow structure, it is essential to design scheduling algorithms tailored to scientific workflows in order to take full advantage of cloud assets. Despite the cloud's advantages, such as its flexibility and elasticity, inadequate scheduling and provisioning decisions may lead to inefficient resource usage and high computing costs [17]. For example, it is very important to determine the appropriate types and number of resources [18] and to ensure good workload management [19] in order to avoid energy wastage and Service Level Agreement (SLA) violations when running workflow tasks.
In an unfamiliar and diverse market, a good sales consultant helps match the quality of purchases to the customer's needs and budget. In the cloud, this problem of optimizing the quality-to-budget ratio becomes one of optimizing execution time and computing cost under the user-defined budget and deadline, and the role of the sales consultant is played by a good scheduling algorithm.
In this paper, we propose a new workflow scheduling algorithm that aims to optimize execution time and processing cost: the Cost-Time Trade-off efficient Workflow Scheduling (CTTWS). CTTWS uses a novel concept, the Implicit Requested Instance Types Range (IRITR) evaluation, to determine the range of VM instance types that best suits the workflow execution, in order to avoid both overbidding and underbidding, which may lead to budget and deadline violations respectively. As a result, root tasks run on relatively fast, reusable instances that speed up execution, and no task uses an instance slower than those in the IRITR. Our algorithm also uses new time-cost trade-off factors to determine the most viable schedule and, from it, the most appropriate VM instance type to provision. Our trade-off function and its related issues, namely task selection and spare budget evaluation, follow a fine-grained approach, in contrast to their counterparts in the Budget Deadline Aware Scheduling (BDAS) algorithm [16], one of the most recently published works related to our goal and conditions, which relies on a coarse-grained approach.
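The excerpt does not give the IRITR formulas, so the following is only a hypothetical illustration of the underlying idea of bounding the usable instance types from both sides. It assumes (our assumptions, not the paper's) that an instance type is too slow if the workflow's critical-path work alone would miss the deadline on it, and too expensive if running all of the workflow's work on it at a linear per-second price would exceed the budget. The names `InstanceType` and `feasible_type_range` and all figures are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    speed: float          # processing capacity, e.g. MIPS (assumed unit)
    price_per_sec: float  # hypothetical linear per-second price


def feasible_type_range(types, total_work, critical_path_work, deadline, budget):
    """Hypothetical instance-type range; NOT the paper's IRITR evaluation.

    A type is excluded as too slow (underbidding risk) if the critical-path
    work alone exceeds the deadline on it, and as too expensive (overbidding
    risk) if running all the work on it exceeds the budget.
    """
    in_range = []
    for t in sorted(types, key=lambda t: t.speed):
        est_makespan = critical_path_work / t.speed
        est_cost = total_work / t.speed * t.price_per_sec
        if est_makespan <= deadline and est_cost <= budget:
            in_range.append(t.name)
    return in_range


types = [
    InstanceType("small", 1000, 0.0001),
    InstanceType("medium", 2000, 0.00025),
    InstanceType("large", 4000, 0.0006),
    InstanceType("xlarge", 8000, 0.002),
]
print(feasible_type_range(types, total_work=4_000_000,
                          critical_path_work=1_000_000,
                          deadline=600, budget=0.8))  # ['medium', 'large']
```

With these made-up figures, "small" is excluded because it would miss the deadline and "xlarge" because it would exceed the budget.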
We conducted experiments using WorkflowSim [20], an extension of CloudSim [21] for simulating workflows. We used five well-known scientific workflows generated by the Pegasus workflow generator [22], namely Montage, Epigenomics, CyberShake, SIPHT, and LIGO, each in sizes of 50, 100, 200, 500, and 1000 tasks. We compare CTTWS and BDAS [16] in terms of cost ratio, time ratio, and success rate. The results show that CTTWS is more effective than BDAS, outperforming it by up to 38.4% in terms of successful scheduling.
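The excerpt names the evaluation metrics without defining them. A common reading, which we assume here, is that the time ratio is makespan divided by deadline, the cost ratio is cost divided by budget, and a run is successful when both ratios are at most 1; the success rate is then the fraction of successful runs. `RunResult` and the helper names are hypothetical.

```python
from typing import Iterable, NamedTuple


class RunResult(NamedTuple):
    makespan: float
    cost: float
    deadline: float
    budget: float


def time_ratio(r: RunResult) -> float:
    return r.makespan / r.deadline


def cost_ratio(r: RunResult) -> float:
    return r.cost / r.budget


def is_success(r: RunResult) -> bool:
    # Assumed definition: the run must meet BOTH the deadline and the budget.
    return time_ratio(r) <= 1.0 and cost_ratio(r) <= 1.0


def success_rate(results: Iterable[RunResult]) -> float:
    results = list(results)
    if not results:
        raise ValueError("no runs to evaluate")
    return sum(is_success(r) for r in results) / len(results)


runs = [
    RunResult(90, 4.0, 100, 5.0),   # meets both constraints
    RunResult(120, 3.0, 100, 5.0),  # misses the deadline
    RunResult(80, 6.0, 100, 5.0),   # exceeds the budget
    RunResult(100, 5.0, 100, 5.0),  # exactly on both limits
]
print(success_rate(runs))  # 0.5
```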
The remaining sections of the paper are organized as follows. Section 2 presents related work. Section 3 defines the workflow scheduling problem. Section 4 describes the CTTWS algorithm. Section 5 presents the experimental results for CTTWS and BDAS, and compares them. Section 6 concludes the paper.
Related work
Workflow scheduling in cloud environments is a relatively new, open issue that poses many challenges, and it has recently been addressed in many studies.
One of the most widespread techniques is list-based scheduling, and the best-known and most widely used list-based scheduling algorithm is Heterogeneous Earliest Finish Time (HEFT), proposed by Topcuoglu et al. [10]. HEFT aims to minimize the makespan of workflow execution in heterogeneous environments. It first sorts the tasks of the
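HEFT first ranks tasks by their upward rank, rank_u(t_i) = w_i + max over successors t_j of (c_ij + rank_u(t_j)), where w_i is the task's average execution time over the processors, and then assigns each task, in decreasing rank order, to the processor that gives it the earliest finish time. The Python sketch below follows that scheme with two simplifications of our own: it appends each task after the last one on the chosen processor instead of using HEFT's insertion-based policy, and it takes edge transfer times as given (zero on the same processor) rather than averaging them over link bandwidths. It assumes strictly positive execution times, so that decreasing rank is a valid topological order.

```python
from functools import lru_cache


def heft(comp, comm, succ):
    """Simplified HEFT (no insertion policy).

    comp[t][p]: execution time of task t on processor p (assumed > 0).
    comm[(t, s)]: transfer time of edge t -> s across processors
                  (treated as 0 when t and s share a processor).
    succ[t]: successors of t; exit tasks may be omitted.
    Returns {task: (processor, finish_time)}.
    """
    n_procs = len(next(iter(comp.values())))
    pred = {t: [] for t in comp}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)

    @lru_cache(maxsize=None)
    def rank_u(t):
        # Average execution time plus the costliest path to an exit task.
        avg = sum(comp[t]) / len(comp[t])
        tail = max((comm[(t, s)] + rank_u(s) for s in succ.get(t, [])), default=0.0)
        return avg + tail

    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    placed = {}                  # task -> (processor, finish time)
    for t in sorted(comp, key=rank_u, reverse=True):
        best = None
        for p in range(n_procs):
            # Data-ready time: all predecessors finished and their outputs arrived.
            ready = max((placed[q][1] + (0.0 if placed[q][0] == p else comm[(q, t)])
                         for q in pred[t]), default=0.0)
            finish = max(ready, proc_free[p]) + comp[t][p]
            if best is None or finish < best[1]:
                best = (p, finish)
        placed[t] = best
        proc_free[best[0]] = best[1]
    return placed


comp = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [2, 2]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
schedule = heft(comp, comm, succ)
print(schedule)  # {'A': (0, 2.0), 'C': (0, 6.0), 'B': (1, 5.0), 'D': (0, 8.0)}
```

For this four-task diamond the rank order is A, C, B, D and the resulting makespan is 8.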
Workflow scheduling formulation
This paper studies the problem of workflow scheduling in the cloud, with an emphasis on the optimization of execution time and processing cost. In this section, we introduce the workflow model, the cloud resource model, and the problem formulation. The notation used throughout this paper is summarized in Table 2.
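Workflows in this setting are modeled as directed acyclic graphs (DAGs) of tasks. As a small illustration of that model, using our own hypothetical names rather than the notation of Table 2, the sketch below derives a topological order with Kahn's algorithm and computes the critical-path length when every task runs on instances of a given speed. Assuming a task's execution time is its work divided by the instance speed, and ignoring data transfers (which vanish when dependent tasks share a VM), this length on the fastest instance type is a lower bound on any schedule's makespan, so a shorter deadline cannot be met.

```python
from collections import deque


def topological_order(tasks, succ):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    indeg = {t: 0 for t in tasks}
    for t in tasks:
        for s in succ.get(t, []):
            indeg[s] += 1
    queue = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for s in succ.get(t, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if len(order) != len(indeg):
        raise ValueError("workflow graph contains a cycle")
    return order


def critical_path_length(work, succ, speed):
    """Longest chain of execution times work[t] / speed, ignoring transfers.

    work must contain every task; succ maps a task to its successors.
    """
    pred = {t: [] for t in work}
    for t in work:
        for s in succ.get(t, []):
            pred[s].append(t)
    finish = {}
    for t in topological_order(list(work), succ):
        start = max((finish[p] for p in pred[t]), default=0.0)
        finish[t] = start + work[t] / speed
    return max(finish.values())


work = {"A": 1000, "B": 3000, "C": 2000, "D": 1000}  # task sizes, e.g. in MI
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(critical_path_length(work, succ, 1000))  # 5.0 (path A -> B -> D)
print(critical_path_length(work, succ, 2000))  # 2.5
```

So if 2000 MIPS were the fastest type available, no schedule could meet a deadline shorter than 2.5 seconds for this toy workflow.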
The proposed scheduling algorithm
In this section, we present our proposed solution to the workflow scheduling problem, the Cost-Time Trade-off efficient Workflow Scheduling (CTTWS) algorithm, which aims to optimize both processing cost and processing time. Our scheduling algorithm has four main steps, summarized in Table 3; the steps in the table are not necessarily listed in logical order.
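The excerpt does not define CTTWS's trade-off factors, so the sketch below swaps in a generic technique for illustration: a min-max-normalized weighted sum of finish time and cost over the candidate VM options for a single task, after discarding options that exceed the remaining budget. The weight `alpha`, the candidate tuple layout, the fallback to the cheapest option, and the name `pick_instance` are all our assumptions.

```python
def pick_instance(candidates, alpha, remaining_budget):
    """Pick a VM option for one task by a weighted time-cost score.

    Generic illustration, NOT the CTTWS trade-off function.
    candidates: list of (name, finish_time, cost) tuples.
    alpha in [0, 1]: weight on time (1.0 favours speed, 0.0 favours cost).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if not candidates:
        raise ValueError("no candidate instances")
    affordable = [c for c in candidates if c[2] <= remaining_budget]
    if not affordable:
        # Nothing fits the remaining budget: fall back to the cheapest option.
        return min(candidates, key=lambda c: c[2])[0]

    t_lo, t_hi = min(c[1] for c in affordable), max(c[1] for c in affordable)
    c_lo, c_hi = min(c[2] for c in affordable), max(c[2] for c in affordable)

    def norm(x, lo, hi):
        # Min-max normalisation; a constant column contributes nothing.
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    def score(c):
        return alpha * norm(c[1], t_lo, t_hi) + (1 - alpha) * norm(c[2], c_lo, c_hi)

    return min(affordable, key=score)[0]


options = [("small", 100.0, 0.10), ("medium", 60.0, 0.20), ("large", 40.0, 0.50)]
print(pick_instance(options, alpha=0.5, remaining_budget=1.0))  # medium
print(pick_instance(options, alpha=1.0, remaining_budget=0.3))  # medium
```

Raising `alpha` pushes the choice toward faster instances and lowering it toward cheaper ones; the budget filter keeps the choice within the remaining budget whenever any option fits.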
Performance evaluation and discussion
To evaluate the performance of our proposal, we use five well-known workflows from different scientific areas. CyberShake is used to characterize earthquake hazards by generating synthetic seismograms and can be classified as a data-intensive workflow with large memory and CPU requirements. The Montage application from the astronomy field is used to generate custom mosaics of the sky from a set of input images. Most of its tasks are characterized by being I/O intensive while not requiring
Conclusion and future work
In this paper, we propose a new workflow scheduling algorithm for the cloud named Cost-Time Trade-off efficient Workflow Scheduling (CTTWS). The CTTWS algorithm strives to minimize both execution time and cost under user-defined budget and deadline constraints in commercial cloud environments.
Due to the complexity of both the cloud environment and the workflow structure, it is essential to design scheduling algorithms tailored to scientific workflows in order to take advantage of cloud assets [18].
The proposed CTTWS algorithm consists of four main steps:
Declaration of Competing Interest
The authors declare that they have no conflict of interest.
References (44)
NP-complete scheduling problems. J. Comput. Syst. Sci. (1975).
Deadline-constrained workflow scheduling in software as a service cloud. Scientia Iranica (2012).
An extended intelligent water drops algorithm for workflow scheduling in cloud computing environment. Egypt. Inform. J. (2018).
Scheduling dynamic workloads in multi-tenant scientific workflow as a service platforms. Future Gen. Comput. Syst. (2018).
The impact of workload variability on the energy efficiency of large-scale heterogeneous distributed systems. Simul. Modell. Pract. Theory (2018).
Gravitational search algorithm based novel workflow scheduling for heterogeneous computing systems. Simul. Modell. Pract. Theory (2019).
MOWS: multi-objective workflow scheduling in cloud computing based on heuristic algorithm. Simul. Modell. Pract. Theory (2019).
Scheduling deadline constrained scientific workflows on dynamically provisioned cloud resources. Future Gen. Comput. Syst. (2017).
Scheduling for heterogeneous systems using constrained critical paths. Parallel Comput. (2012).
A novel cost-efficient approach for deadline-constrained workflow scheduling by dynamic provisioning of resources. Future Gen. Comput. Syst. (2018).
Deadline-constrained workflow scheduling in IaaS clouds with multi-resource packing. Future Gen. Comput. Syst.
Neural network based multi-objective evolutionary algorithm for dynamic workflow scheduling in cloud computing. Future Gen. Comput. Syst.
An energy-efficient, QoS-aware and cost-effective scheduling approach for real-time workflow applications in cloud computing systems utilizing DVFS and approximate computations. Future Gen. Comput. Syst.
Dynamic multi-workflow scheduling: a deadline and cost-aware approach for commercial clouds. Future Gen. Comput. Syst.
Characterizing and profiling scientific workflows. Future Gen. Comput. Syst.
Workflows for e-Science: scientific workflows for grids.
Examining the challenges of scientific workflows. Computer (Long Beach Calif).
The cost of doing science on the cloud: the Montage example. SC’08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing.
Scientific workflow applications on Amazon EC2. 2009 5th IEEE International Conference on E-Science Workshops.
Performance analysis of high performance computing applications on the Amazon Web Services cloud. 2nd IEEE International Conference on Cloud Computing Technology and Science.
Experiences using cloud computing for a scientific workflow application. Proceedings of the 2nd International Workshop on Scientific Cloud Computing.
The Globus Galaxies platform: delivering science gateways as a service. Concurr. Comput.
Cited by (25)
HEPGA: A new effective hybrid algorithm for scientific workflow scheduling in cloud computing environment
2024, Simulation Modelling Practice and Theory
Energy-efficient virtual-machine mapping algorithm (EViMA) for workflow tasks with deadlines in a cloud environment
2022, Journal of Network and Computer Applications
Citation excerpt: WorkflowSim is an extension of cloudSim that allows workflow scheduling algorithm developers to simulate scheduling algorithms. The inputs to the simulation environment are as follows: (1) the average bandwidth between resources is 20 MBps as in Arabnejad et al. (2018), Mboula et al. (2020), which is the average bandwidth setting offered by Amazon Web Services (Palankar et al., 2008; Sahni and Vidyarthi, 2015), (2) the processing matrix for each VM is measured in Million Instruction Per Second (MIPS) as in Rodriguez and Buyya (2018), Singh et al. (2019), Adhikari and Amgoth (2019), (3) the task lengths are set in Million Instruction (MI) as in Singh et al. (2019). In this experiment, the job that is close to its deadline is selected first and submitted to a VM with high processing speed for execution.
Energy-efficient VM opening algorithms for real-time workflows in heterogeneous clouds
2022, Neurocomputing
Citation excerpt: From users’ perspective, the execution of application must be finished within a given time range (i.e., deadline constraint, real-time constraint). Otherwise, the resource providers will violate the server-level agreement (SLA) and further negatively affect the quality of service (QoS) [8,9]. Therefore, how to design a workflow scheduling algorithm to meet the different requirements of resource providers and users has become a key issue that needs to be solved urgently.
Cost and makespan aware workflow scheduling in IaaS clouds using hybrid spider monkey optimization
2021, Simulation Modelling Practice and Theory
Citation excerpt: The IRITR is responsible for determining the VM ranges suitable for the scheduling to avoid deadline and budget violation. However, the algorithms of [26] and [27] are designed for static VMs provisioning. Mittal et al. [28] proposed a new algorithm for optimizing the reliability of applications and energy consumption under the defined budget-deadline constraint.