Cost-time trade-off efficient workflow scheduling in cloud

https://doi.org/10.1016/j.simpat.2020.102107

Highlights

  • The proposed scheduling algorithm aims at meeting the user-defined deadline and budget.

  • A range of instance types that best suit the workflow execution is determined.

  • Fine-grained cost-time trade-off factors produce the most viable schedule.

  • The types and number of VMs, together with the trade-off factors, are very important.

Abstract

Cloud computing has become a promising solution for scientific workflow applications due to its various assets. Workflow scheduling is a well-known NP-complete problem, and thus difficult to solve since no polynomial-time optimal solution is known. Several workflow scheduling works aim at optimizing the makespan and the budget. However, more investigation is needed into choosing appropriate resources from the large set of instance types offered in cloud environments. This paper proposes a new scheduling algorithm called Cost-Time Trade-off efficient Workflow Scheduling (CTTWS), which consists of four main steps: task selection, Implicit Requested Instance Types Range (IRITR) evaluation, spare budget evaluation, and VM selection. The IRITR evaluation is a novel scheduling concept that determines a range of VM instance types best suited to the workflow execution, in order to avoid overbidding and underbidding, which may lead to budget and deadline violations respectively. Compared to previous work, simulation results show the effectiveness of our approach, especially when there is a large variety of instance types. This confirms that paying attention to the type and number of resources is vital.

Introduction

Almost all scientific areas have nowadays become more complex and rely on the analysis of large-scale data sets; they therefore require automated management processes that scale. Scientific workflows have emerged as a suitable way of describing and structuring parallel computations and the analysis of large-scale data sets [1]. Their successful use has enhanced scientific advancement in various fields such as biology, physics, medicine, and astronomy [1], [2]. Complex experiments based mainly on the analysis of large-scale data sets, which sometimes require high computing power [1], are gradually exploiting the assets of commercial clouds [3], [4], [5], [6], [7].

Cloud providers offer several types of services, the main ones being Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), the latter being the principal service that benefits Workflow Management Systems (WMSs). SaaS offers dedicated software to cloud end users, accessible on the Internet via a browser; PaaS provides the platform and the necessary Information Technology (IT) environment for developers to implement their various services and applications on the Internet; and IaaS delivers virtualized resources called Virtual Machines (VMs) for lease to support users' operations.

We can enumerate among the advantages of the cloud: (i) its flexibility and elasticity; (ii) its pay-per-use billing model, which helps avoid upfront expenses for purchasing personal or dedicated resources [5], [7]; (iii) its reliability and fault tolerance; and (iv) its variety of resource types (instance types), ranging from general-purpose to specialized resources such as GPUs [8].

Workflow scheduling is a well-known NP-complete problem [9]. Its difficulty is also related to the number of factors to consider, such as user-defined Quality of Service (QoS) requirements like deadline, service cost, and so forth. Several studies on workflow scheduling for Heterogeneous Computing Systems (HCS) aim to satisfy deadline and/or budget constraints [10], [11], [12], [13], [14], [15], [16].

However, given the above-mentioned complexity of both the cloud environment and the workflow structure, it is essential to design scheduling algorithms tailored for scientific workflows in order to take greater advantage of cloud assets. Although the cloud has several advantages, such as its flexibility and elasticity, inefficient resource usage and high computing costs may result if inadequate scheduling and provisioning decisions are made [17]. For example, it is very important to determine the types and number of appropriate resources [18] and to ensure good workload management [19] in order to avoid energy wastage and Service Level Agreement (SLA) violations when running workflow tasks.

In an unknown and diversified market, the assistance of a good sales consultant is important to tailor the quality of purchases to the customer's needs and budget. In a cloud, this problem of optimizing the budget-quality ratio becomes that of optimizing execution time and computing cost according to the user-defined budget and deadline, and the role of the good sales consultant is played by a good scheduling algorithm.

In this paper, we propose a new workflow scheduling algorithm that aims to optimize execution time and processing cost, the Cost-Time Trade-off efficient Workflow Scheduling (CTTWS). The CTTWS scheduling algorithm uses a novel concept, the Implicit Requested Instance Types Range (IRITR) evaluation, to determine a range of VM instance types that best suits the workflow execution, in order to avoid overbidding as well as underbidding, which may lead to budget and deadline violations respectively. Thereby, root tasks are executed on relatively fast instances that speed up execution, with the ability to be reused, and no task uses an instance slower than those in the IRITR. Our algorithm also uses new trade-off factors between time and cost to determine the most viable schedule, and uses these to select the most appropriate type of VM instance to provision. In this work, our trade-off function and its related components, namely task selection and spare budget evaluation, follow a fine-grained approach, whereas their counterparts in the Budget Deadline Aware Scheduling (BDAS) algorithm [16], one of the most recently published works related to our goal and conditions, rely on a coarse-grained approach.
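
To illustrate the general idea of a cost-time trade-off factor, the following minimal Python sketch scores candidate instance types for a single task by combining normalized execution time and execution cost. The weighting scheme, the function name, and the (name, price, MIPS) representation of instance types are illustrative assumptions, not the exact CTTWS formulas.

    # Illustrative sketch only (not the exact CTTWS trade-off factors): score each
    # candidate VM instance type for one task by a weighted, normalized cost-time sum.
    def trade_off_pick(task_length_mi, candidates, alpha=0.5):
        """candidates: list of (name, price_per_hour, mips); alpha weights time vs. cost."""
        times = {n: task_length_mi / mips for n, price, mips in candidates}
        costs = {n: times[n] / 3600.0 * price for n, price, mips in candidates}
        t_lo, t_hi = min(times.values()), max(times.values())
        c_lo, c_hi = min(costs.values()), max(costs.values())
        norm = lambda v, lo, hi: 0.0 if hi == lo else (v - lo) / (hi - lo)
        score = {n: alpha * norm(times[n], t_lo, t_hi) + (1 - alpha) * norm(costs[n], c_lo, c_hi)
                 for n, price, mips in candidates}
        return min(score, key=score.get)  # lowest combined score is the preferred type

    # Example with three hypothetical instance types for a 36,000 MI task.
    best_type = trade_off_pick(36000, [("small", 0.05, 1000), ("medium", 0.10, 2000), ("large", 0.40, 4000)])

A lower combined score indicates a better compromise between execution time and cost for the chosen weight alpha.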

We conducted experiments using WorkflowSim [20], an extension of CloudSim [21] for investigating workflows, on five well-known scientific workflows generated by the Pegasus workflow generator [22], namely Montage, Epigenomics, CyberShake, SIPHT, and LIGO, each in sizes of 50, 100, 200, 500, and 1000 tasks. We compare CTTWS and BDAS [16] in terms of their cost ratio, time ratio, and success rate. This study shows that CTTWS is more effective than BDAS, and outperforms BDAS by up to 38.4% in terms of successful scheduling.
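
For reference, the comparison metrics can be computed along the following lines in Python; the exact normalizations used in the paper are not reproduced in this snippet view, so the definitions below (cost over budget, makespan over deadline, fraction of runs meeting both constraints) are assumptions.

    # Assumed metric definitions for illustration; values <= 1 indicate the
    # corresponding constraint was met by the produced schedule.
    def cost_ratio(total_cost, budget):
        return total_cost / budget

    def time_ratio(makespan, deadline):
        return makespan / deadline

    def success_rate(runs):
        """runs: iterable of (total_cost, budget, makespan, deadline) tuples."""
        runs = list(runs)
        met = sum(1 for c, b, m, d in runs if c <= b and m <= d)
        return met / len(runs)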

The remaining sections of the paper are organized as follows. Section 2 presents related work. Section 3 defines the workflow scheduling problem. Section 4 describes the CTTWS algorithm. Section 5 presents the experimental results for CTTWS and BDAS, and compares them. Section 6 concludes the paper.

Section snippets

Related work

Workflow scheduling in cloud environments is a relatively new and open issue that poses many challenges. It has recently been addressed in many studies.

One of the most widespread techniques is list-based scheduling, and the most famous and widely used list-based scheduling algorithm is the Heterogeneous Earliest Finish Time (HEFT) algorithm proposed by Topcuoglu et al. [10]. HEFT aims to minimize the makespan of workflow execution in heterogeneous environments. It firstly sorts the tasks of the
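
HEFT's ranking phase sorts tasks by their upward rank; a minimal sketch of that standard computation is given below, where the dictionary-based graph representation is an illustrative choice rather than anything prescribed by the original paper.

    # Standard HEFT upward rank: rank_u(t) = avg_cost(t) + max over successors s of
    # (avg_comm(t, s) + rank_u(s)); tasks are then scheduled in decreasing rank order.
    from functools import lru_cache

    def upward_ranks(successors, avg_cost, avg_comm):
        """successors: dict task -> list of successor tasks;
        avg_cost: dict task -> average computation cost over all processors;
        avg_comm: dict (task, successor) -> average communication cost."""
        @lru_cache(maxsize=None)
        def rank(t):
            succ = successors.get(t, [])
            if not succ:
                return avg_cost[t]
            return avg_cost[t] + max(avg_comm[(t, s)] + rank(s) for s in succ)
        return {t: rank(t) for t in avg_cost}

    # Example: a three-task chain t1 -> t2 -> t3.
    ranks = upward_ranks({"t1": ["t2"], "t2": ["t3"]},
                         {"t1": 10.0, "t2": 20.0, "t3": 5.0},
                         {("t1", "t2"): 2.0, ("t2", "t3"): 3.0})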

Workflow scheduling formulation

This paper studies the problem of workflow scheduling in the cloud, with an emphasis on optimizing execution time and processing cost. In this section, we introduce the workflow model, the cloud resource model, and the problem formulation. The meanings of the parameters found throughout this paper are summarized in Table 2.
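
As a minimal illustration of the kind of workflow and resource model such a formulation relies on (the formal notation itself is in Table 2 and not reproduced here), the sketch below represents a workflow as a DAG of tasks with data dependencies, plus a catalogue of VM instance types priced per hour. All field names are assumptions for illustration, not the paper's notation.

    # Illustrative data model (field names are assumptions, not the paper's notation):
    # a workflow is a DAG of tasks; the cloud offers VM instance types priced per hour.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class Task:
        task_id: str
        length_mi: float                      # workload in million instructions
        parents: List[str] = field(default_factory=list)
        children: List[str] = field(default_factory=list)
        output_mb: Dict[str, float] = field(default_factory=dict)  # data sent to each child

    @dataclass
    class VMType:
        name: str
        mips: float                           # processing capacity
        price_per_hour: float

    @dataclass
    class Workflow:
        tasks: Dict[str, Task]
        deadline: float                       # user-defined constraints
        budget: float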

The proposed scheduling algorithm

In this section, we present our proposed solution for the workflow scheduling problem, the Cost-Time Trade-off efficient Workflow Scheduling (CTTWS), which aims to optimize both processing cost and time. Our scheduling algorithm has four main steps, summarized in Table 3. The steps are not necessarily listed in logical order in the table.
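
Table 3 is not reproduced in this snippet, but to show how four such steps can fit together, the sketch below wires them into a single scheduling loop with deliberately naive stand-ins. It is not the CTTWS logic (in particular, the real IRITR evaluation, spare budget evaluation, and deadline handling are far more involved); it only illustrates the skeleton.

    # Skeleton only: each step is a naive stand-in, NOT the CTTWS procedure.
    def four_step_skeleton(tasks, vm_types, deadline, budget):
        """tasks: dict name -> (length_mi, parent_names); vm_types: list of (name, mips, price_per_hour).
        The deadline is accepted for completeness but ignored by these naive stand-ins."""
        scheduled, schedule, spent = set(), {}, 0.0

        def ready():                                            # step 1: task selection (naive: first ready task)
            return [t for t, (_, parents) in tasks.items()
                    if t not in scheduled and all(p in scheduled for p in parents)]

        # Step 2: instance-type range stand-in -- keep types running at least half
        # as fast as the fastest one, to avoid very slow (underbid) instances.
        fastest = max(mips for _, mips, _ in vm_types)
        type_range = [v for v in vm_types if v[1] >= 0.5 * fastest]

        while len(scheduled) < len(tasks):
            task = ready()[0]
            length, _ = tasks[task]
            spare = budget - spent                              # step 3: spare budget stand-in
            # Step 4: VM selection -- cheapest in-range type that fits the spare budget.
            fits = [v for v in type_range if (length / v[1]) / 3600 * v[2] <= spare] or type_range
            name, mips, price = min(fits, key=lambda v: v[2])
            spent += (length / mips) / 3600 * price
            schedule[task] = name
            scheduled.add(task)
        return schedule, spent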

Performance evaluation and discussion

To evaluate the performance of our proposal, we use five well-known workflows from different scientific areas. CyberShake is used to characterize earthquake hazards by generating synthetic seismograms and can be classified as a data-intensive workflow with large memory and CPU requirements. The Montage application from the astronomy field is used to generate custom mosaics of the sky based on a set of input images. Most of its tasks are characterized by being I/O intensive while not requiring

Conclusion and future work

In this paper, we propose a new workflow scheduling algorithm for the cloud, named Cost-Time Trade-off efficient Workflow Scheduling (CTTWS). The CTTWS algorithm strives to meet both the user-defined budget and deadline in commercial cloud environments.

Due to the complexity of both the cloud environment and the workflow structure, it is essential to design scheduling algorithms tailored for scientific workflows in order to take advantage of cloud assets [18].

The proposed CTTWS algorithm consists of four main steps:

Declaration of Competing Interest

The authors declare that they have no conflict of interest.

References (44)

  • Z. Zhu et al., Deadline-constrained workflow scheduling in IaaS clouds with multi-resource packing, Future Gen. Comput. Syst. (2019)

  • G. Ismayilov et al., Neural network based multi-objective evolutionary algorithm for dynamic workflow scheduling in cloud computing, Future Gen. Comput. Syst. (2020)

  • G.L. Stavrinides et al., An energy-efficient, QoS-aware and cost-effective scheduling approach for real-time workflow applications in cloud computing systems utilizing DVFS and approximate computations, Future Gen. Comput. Syst. (2019)

  • V. Arabnejad et al., Dynamic multi-workflow scheduling: a deadline and cost-aware approach for commercial clouds, Future Gen. Comput. Syst. (2019)

  • G. Juve et al., Characterizing and profiling scientific workflows, Future Gen. Comput. Syst. (2013)

  • I.J. Taylor et al., Workflows for e-Science: Scientific Workflows for Grids (2007)

  • Y. Gil et al., Examining the challenges of scientific workflows, Computer (2007)

  • E. Deelman et al., The cost of doing science on the cloud: the Montage example, SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (2008)

  • G. Juve et al., Scientific workflow applications on Amazon EC2, 2009 5th IEEE International Conference on E-Science Workshops (2009)

  • K.R. Jackson et al., Performance analysis of high performance computing applications on the Amazon Web Services cloud, 2nd IEEE International Conference on Cloud Computing Technology and Science (2010)

  • J.-S. Vöckler et al., Experiences using cloud computing for a scientific workflow application, Proceedings of the 2nd International Workshop on Scientific Cloud Computing (2011)

  • R. Madduri et al., The Globus Galaxies platform: delivering science gateways as a service, Concurr. Comput. (2015)