Cost-time trade-off efficient workflow scheduling in cloud

https://doi.org/10.1016/j.simpat.2020.102107

Highlights

  • The proposed scheduling algorithm aims at meeting the user-defined deadline and budget.

  • A range of instance types that best suit the workflow execution is determined.

  • Fine-grained cost-time trade-off factors produce the most viable schedule.

  • The types and number of VMs, together with the trade-off factors, are very important.

Abstract

Cloud computing has become a promising solution for scientific workflow applications due to its various assets. Workflow scheduling is a well-known NP-complete problem, and thus difficult to solve since no polynomial-time optimal solution is known. Several workflow scheduling works aim at optimizing the makespan and the budget. However, more investigation is needed into choosing appropriate resources from the large set of instance types offered in cloud environments. This paper proposes a new scheduling algorithm called Cost-Time Trade-off efficient Workflow Scheduling (CTTWS), which consists of four main steps: task selection, Implicit Requested Instance Types Range (IRITR) evaluation, spare budget evaluation, and VM selection. The IRITR evaluation is a novel scheduling concept that determines a range of VM instance types best suited to the workflow execution, in order to avoid overbidding and underbidding, which may lead to budget and deadline violations respectively. Compared to previous work, simulation results show the effectiveness of our approach, especially when there is a large variety of instance types. This confirms that paying attention to the type and number of resources is vital.

Introduction

Almost all scientific areas have nowadays become more complex and rely on the analysis of large-scale data sets; they therefore require automated management processes that scale. Scientific workflows have emerged as a suitable way of describing and structuring parallel computations and the analysis of large-scale data sets [1]. Their successful use has enhanced scientific advancement in various fields such as biology, physics, medicine, and astronomy [1], [2]. Complex experiments based mainly on the analysis of large-scale data sets, which sometimes require high computing power [1], are gradually exploiting the assets of commercial clouds [3], [4], [5], [6], [7].

Cloud providers offer several types of services, the main ones being Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), the latter being the principal service that benefits Workflow Management Systems (WMSs). SaaS offers dedicated software to cloud end users, accessible on the Internet via a browser; PaaS provides the platform and the necessary Information Technology (IT) environment for developers to implement their various services and applications on the Internet; and IaaS delivers virtualized resources called Virtual Machines (VMs) for lease to support users' operations.

We can enumerate among the advantages of the cloud: (i) its flexibility and elasticity; (ii) its pay-per-use billing model, which helps avoid upfront expenses for purchasing personal or dedicated resources [5], [7]; (iii) its reliability and fault tolerance; and (iv) its variety of resource types (instance types), ranging from general-purpose to specialized resources such as GPUs [8].

Workflow scheduling is a well-known NP-complete problem [9]. Its difficulty is also related to the number of factors to consider, such as user-defined Quality of Service (QoS) requirements like deadline, service cost, and so forth. Several studies on workflow scheduling for Heterogeneous Computing Systems (HCS) aim to satisfy deadline and/or budget constraints [10], [11], [12], [13], [14], [15], [16].

However, given the above-mentioned complexity of both the cloud environment and the workflow structure, it is essential to design scheduling algorithms tailored for scientific workflows in order to take greater advantage of cloud assets. Although the cloud has several advantages, such as its flexibility and elasticity, inefficient resource usage and high computing costs may result if inadequate scheduling and provisioning decisions are made [17]. For example, it is very important to determine the types and number of appropriate resources [18] and to ensure good workload management [19] in order to avoid energy wastage and Service Level Agreement (SLA) violations when running workflow tasks.

In an unknown and diversified market, the assistance of a good sales consultant is important to tailor the quality of purchases to the customer's needs and budget. In a cloud, this problem of optimizing the budget-quality ratio becomes that of optimizing execution time and computing cost according to the user-defined budget and deadline, and the role of the good sales consultant is played by a good scheduling algorithm.

In this paper, we propose a new workflow scheduling algorithm that aims to optimize execution time and processing cost, the Cost-Time Trade-off efficient Workflow Scheduling (CTTWS). The CTTWS scheduling algorithm uses a novel concept, the Implicit Requested Instance Types Range (IRITR) evaluation, to determine a range of VM instance types that best suits the workflow execution, in order to avoid overbidding as well as underbidding, which may lead to budget and deadline violations respectively. Thereby, root tasks are executed on relatively fast instances that speed up execution, with the ability to be reused, and no task uses an instance slower than those in the IRITR. Our algorithm also uses new trade-off factors between time and cost to determine the most viable schedule, and uses these to select the most appropriate type of VM instance to provision. In this work, our trade-off function and its related components, namely task selection and spare budget evaluation, follow a fine-grained approach, whereas their counterparts in the Budget Deadline Aware Scheduling (BDAS) algorithm [16], one of the most recently published works related to our goal and conditions, rely on a coarse-grained approach.
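
To illustrate the general idea of a cost-time trade-off factor, the following minimal Python sketch scores candidate instance types for a single task by combining normalized execution time and execution cost. The weighting scheme, the function name, and the (name, price, MIPS) representation of instance types are illustrative assumptions, not the exact CTTWS formulas.

    # Illustrative sketch only (not the exact CTTWS trade-off factors): score each
    # candidate VM instance type for one task by a weighted, normalized cost-time sum.
    def trade_off_pick(task_length_mi, candidates, alpha=0.5):
        """candidates: list of (name, price_per_hour, mips); alpha weights time vs. cost."""
        times = {n: task_length_mi / mips for n, price, mips in candidates}
        costs = {n: times[n] / 3600.0 * price for n, price, mips in candidates}
        t_lo, t_hi = min(times.values()), max(times.values())
        c_lo, c_hi = min(costs.values()), max(costs.values())
        norm = lambda v, lo, hi: 0.0 if hi == lo else (v - lo) / (hi - lo)
        score = {n: alpha * norm(times[n], t_lo, t_hi) + (1 - alpha) * norm(costs[n], c_lo, c_hi)
                 for n, price, mips in candidates}
        return min(score, key=score.get)  # lowest combined score is the preferred type

    # Example with three hypothetical instance types for a 36,000 MI task.
    best_type = trade_off_pick(36000, [("small", 0.05, 1000), ("medium", 0.10, 2000), ("large", 0.40, 4000)])

A lower combined score indicates a better compromise between execution time and cost for the chosen weight alpha.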

We conducted experiments using WorkflowSim [20], an extension of CloudSim [21] for investigating workflows, on five well-known scientific workflows generated by the Pegasus workflow generator [22], namely Montage, Epigenomics, CyberShake, SIPHT, and LIGO, each in sizes of 50, 100, 200, 500, and 1000 tasks. We compare CTTWS and BDAS [16] in terms of their cost ratio, time ratio, and success rate. This study shows that CTTWS is more effective than BDAS, and outperforms BDAS by up to 38.4% in terms of successful scheduling.
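
For reference, the comparison metrics can be computed along the following lines in Python; the exact normalizations used in the paper are not reproduced in this snippet view, so the definitions below (cost over budget, makespan over deadline, fraction of runs meeting both constraints) are assumptions.

    # Assumed metric definitions for illustration; values <= 1 indicate the
    # corresponding constraint was met by the produced schedule.
    def cost_ratio(total_cost, budget):
        return total_cost / budget

    def time_ratio(makespan, deadline):
        return makespan / deadline

    def success_rate(runs):
        """runs: iterable of (total_cost, budget, makespan, deadline) tuples."""
        runs = list(runs)
        met = sum(1 for c, b, m, d in runs if c <= b and m <= d)
        return met / len(runs)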

The remaining sections of the paper are organized as follows. Section 2 presents related work. Section 3 defines the workflow scheduling problem. Section 4 describes the CTTWS algorithm. Section 5 presents the experimental results for CTTWS and BDAS, and compares them. Section 6 concludes the paper.

Section snippets

Related work

Workflow scheduling in cloud environments is a relatively new and open issue that poses many challenges. It has recently been addressed in many studies.

One of the most widespread techniques is list-based scheduling, and the most famous and widely used list-based scheduling algorithm is the Heterogeneous Earliest Finish Time (HEFT) algorithm proposed by Topcuoglu et al. [10]. HEFT aims to minimize the makespan of workflow execution in heterogeneous environments. It firstly sorts the tasks of the
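
HEFT's ranking phase sorts tasks by their upward rank; a minimal sketch of that standard computation is given below, where the dictionary-based graph representation is an illustrative choice rather than anything prescribed by the original paper.

    # Standard HEFT upward rank: rank_u(t) = avg_cost(t) + max over successors s of
    # (avg_comm(t, s) + rank_u(s)); tasks are then scheduled in decreasing rank order.
    from functools import lru_cache

    def upward_ranks(successors, avg_cost, avg_comm):
        """successors: dict task -> list of successor tasks;
        avg_cost: dict task -> average computation cost over all processors;
        avg_comm: dict (task, successor) -> average communication cost."""
        @lru_cache(maxsize=None)
        def rank(t):
            succ = successors.get(t, [])
            if not succ:
                return avg_cost[t]
            return avg_cost[t] + max(avg_comm[(t, s)] + rank(s) for s in succ)
        return {t: rank(t) for t in avg_cost}

    # Example: a three-task chain t1 -> t2 -> t3.
    ranks = upward_ranks({"t1": ["t2"], "t2": ["t3"]},
                         {"t1": 10.0, "t2": 20.0, "t3": 5.0},
                         {("t1", "t2"): 2.0, ("t2", "t3"): 3.0})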

Workflow scheduling formulation

This paper studies the problem of workflow scheduling in the cloud, with an emphasis on optimizing execution time and processing cost. In this section, we introduce the workflow model, the cloud resource model, and the problem formulation. The meanings of the parameters found throughout this paper are summarized in Table 2.
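
As a minimal illustration of the kind of workflow and resource model such a formulation relies on (the formal notation itself is in Table 2 and not reproduced here), the sketch below represents a workflow as a DAG of tasks with data dependencies, plus a catalogue of VM instance types priced per hour. All field names are assumptions for illustration, not the paper's notation.

    # Illustrative data model (field names are assumptions, not the paper's notation):
    # a workflow is a DAG of tasks; the cloud offers VM instance types priced per hour.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Task:
        task_id: str
        length_mi: float                      # workload in million instructions
        parents: List[str] = field(default_factory=list)
        children: List[str] = field(default_factory=list)
        output_mb: Dict[str, float] = field(default_factory=dict)  # data sent to each child

    @dataclass
    class VMType:
        name: str
        mips: float                           # processing capacity
        price_per_hour: float

    @dataclass
    class Workflow:
        tasks: Dict[str, Task]
        deadline: float                       # user-defined constraints
        budget: float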

The proposed scheduling algorithm

In this section, we present our proposed solution for the workflow scheduling problem, the Cost-Time Trade-off efficient Workflow Scheduling (CTTWS), which aims to optimize both processing cost and time. Our scheduling algorithm has four main steps, summarized in Table 3. The steps are not necessarily listed in logical order in the table.
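
Table 3 is not reproduced in this snippet, but to show how four such steps can fit together, the sketch below wires them into a single scheduling loop with deliberately naive stand-ins. It is not the CTTWS logic (in particular, the real IRITR evaluation, spare budget evaluation, and deadline handling are far more involved); it only illustrates the skeleton.

    # Skeleton only: each step is a naive stand-in, NOT the CTTWS procedure.
    def four_step_skeleton(tasks, vm_types, deadline, budget):
        """tasks: dict name -> (length_mi, parent_names); vm_types: list of (name, mips, price_per_hour).
        The deadline is accepted for completeness but ignored by these naive stand-ins."""
        scheduled, schedule, spent = set(), {}, 0.0

        def ready():                                            # step 1: task selection (naive: first ready task)
            return [t for t, (_, parents) in tasks.items()
                    if t not in scheduled and all(p in scheduled for p in parents)]

        # Step 2: instance-type range stand-in -- keep types running at least half
        # as fast as the fastest one, to avoid very slow (underbid) instances.
        fastest = max(mips for _, mips, _ in vm_types)
        type_range = [v for v in vm_types if v[1] >= 0.5 * fastest]

        while len(scheduled) < len(tasks):
            task = ready()[0]
            length, _ = tasks[task]
            spare = budget - spent                              # step 3: spare budget stand-in
            # Step 4: VM selection -- cheapest in-range type that fits the spare budget.
            fits = [v for v in type_range if (length / v[1]) / 3600 * v[2] <= spare] or type_range
            name, mips, price = min(fits, key=lambda v: v[2])
            spent += (length / mips) / 3600 * price
            schedule[task] = name
            scheduled.add(task)
        return schedule, spent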

Performance evaluation and discussion

To evaluate the performance of our proposal, we use five well-known workflows from different scientific areas. CyberShake is used to characterize earthquake hazards by generating synthetic seismograms and can be classified as a data-intensive workflow with large memory and CPU requirements. The Montage application from the astronomy field is used to generate custom mosaics of the sky based on a set of input images. Most of its tasks are characterized by being I/O intensive while not requiring

Conclusion and future work

In this paper, we propose a new workflow scheduling algorithm for the cloud, named Cost-Time Trade-off efficient Workflow Scheduling (CTTWS). The CTTWS algorithm strives to meet both the user-defined budget and deadline in commercial cloud environments.

Due to the complexity of both the cloud environment and the workflow structure, it is essential to design scheduling algorithms tailored for scientific workflows in order to take advantage of cloud assets [18].

The proposed CTTWS algorithm consists of four main steps:

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

References (44)

  • Z. Zhu et al., Deadline-constrained workflow scheduling in IaaS clouds with multi-resource packing, Future Gen. Comput. Syst. (2019)

  • G. Ismayilov et al., Neural network based multi-objective evolutionary algorithm for dynamic workflow scheduling in cloud computing, Future Gen. Comput. Syst. (2020)

  • G.L. Stavrinides et al., An energy-efficient, QoS-aware and cost-effective scheduling approach for real-time workflow applications in cloud computing systems utilizing DVFS and approximate computations, Future Gen. Comput. Syst. (2019)

  • V. Arabnejad et al., Dynamic multi-workflow scheduling: a deadline and cost-aware approach for commercial clouds, Future Gen. Comput. Syst. (2019)

  • G. Juve et al., Characterizing and profiling scientific workflows, Future Gen. Comput. Syst. (2013)

  • I.J. Taylor et al., Workflows for e-Science: Scientific Workflows for Grids (2007)

  • Y. Gil et al., Examining the challenges of scientific workflows, Computer (2007)

  • E. Deelman et al., The cost of doing science on the cloud: the Montage example, SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (2008)

  • G. Juve et al., Scientific workflow applications on Amazon EC2, 2009 5th IEEE International Conference on E-Science Workshops (2009)

  • K.R. Jackson et al., Performance analysis of high performance computing applications on the Amazon Web Services cloud, 2nd IEEE International Conference on Cloud Computing Technology and Science (2010)

  • J.-S. Vöckler et al., Experiences using cloud computing for a scientific workflow application, Proceedings of the 2nd International Workshop on Scientific Cloud Computing (2011)

  • R. Madduri et al., The Globus Galaxies platform: delivering science gateways as a service, Concurr. Comput. (2015)