Neural network based multi-objective evolutionary algorithm for dynamic workflow scheduling in cloud computing

doi:10.1016/j.future.2019.08.012

Future Generation Computer Systems

Volume 102, January 2020, Pages 307-322

Highlights

• We introduce the dynamic workflow scheduling problem in cloud computing.
• The sources of dynamism are resource failures and changes in the number of objectives.
• We propose a new algorithm that combines the NSGA-II algorithm with neural networks.
• It aims to exploit the history of the Pareto-optimal set to make predictions after a change.
• A comparative study on real-world workflows validates the efficiency of our algorithm.

Abstract

Workflow scheduling is a widely studied research topic in cloud computing, which aims to utilize cloud resources for workflow tasks while considering the objectives specified in the QoS. In this paper, we model the dynamic workflow scheduling problem as a dynamic multi-objective optimization problem (DMOP) in which the sources of dynamism are resource failures and the number of objectives, both of which may change over time. Software and/or hardware faults may cause the first type of dynamism. The second type arises because real-life scenarios in cloud computing may change the number of objectives at runtime during the execution of a workflow. In this study, we propose a prediction-based dynamic multi-objective evolutionary algorithm, called the NN-DNSGA-II algorithm, which incorporates an artificial neural network into the NSGA-II algorithm. Additionally, five leading non-prediction-based dynamic algorithms from the literature are adapted for the dynamic workflow scheduling problem. Scheduling solutions are found by considering six objectives: minimization of makespan, cost, energy and degree of imbalance; and maximization of reliability and utilization. The empirical study, based on real-world applications from the Pegasus workflow management system, reveals that our NN-DNSGA-II algorithm significantly outperforms the other alternatives in most cases with respect to metrics used for DMOPs with an unknown true Pareto-optimal front, including the number of non-dominated solutions, Schott’s spacing and the Hypervolume indicator.

Introduction

Cloud computing is a large-scale, heterogeneous and distributed computing infrastructure for the scientific and commercial communities, which provides high-quality, low-cost services with minimal hardware investment. Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) are among the most popular service layers that cloud computing delivers over the internet. In this paper, we mostly refer to IaaS, in which customers access hardware resources on which their applications can be deployed.

Workflows are a common technique for constructing large-scale compute- and data-intensive applications from different research domains. An application workflow is modelled as a directed acyclic graph in which the nodes are tasks that are interconnected via compute or data resources. The workflow scheduling problem in cloud computing aims to map the tasks of a given application onto the available resources [1], [2], [3], [4]. It is an NP-complete problem [1], in which the main concern is orchestrating task executions so as to optimize the objectives specified in the QoS.

Periodic workflow scheduling mostly concerns applications that are submitted to the system periodically; such applications are increasingly encountered in different domains. They emerge yearly in physics for gravitational waves [5], monthly in business for fiscal analysis, and daily or hourly in meteorology for weather forecasting and storm surge prediction [6]. The workloads of the tasks in these applications may vary in every period according to the amount of data they collect. These unpredictable fluctuations lead to period-based scheduling, in which the tasks should be rescheduled in every period with respect to their latest workloads.

The concept of dynamism in workflow scheduling that we consider in this study is twofold. The first type of dynamism is transient resource failures over time, where resources may dynamically join and leave the cloud. Such failures may arise from events such as software faults (bugs, overflows, etc.) or hardware faults (irregular electric power, hard disk failures, etc.). The other source of dynamism in the workflow scheduling problem is a change in the number of objectives during the execution of workflows. Cloud computing is confronted with real-life scenarios in which the number of objectives may change over time [7]. For instance, the makespan of a workflow may not be taken into consideration until a workflow with a tighter deadline is submitted for execution. It has also been emphasized that cloud computing is a domain open to a wider variety of changing objectives. Objectives may be ignored or considered depending on several external factors, including the level of power consumption, the degree of workload imbalance among resources, or changing QoS requirements. This dynamism affects the selection of optimal workflow scheduling solutions in different periods.

To the best of our knowledge, our paper is among the first attempts to model the workflow scheduling problem in cloud computing as a dynamic multi-objective optimization problem (DMOP) by considering resource failures and changes in the number of objectives as the main sources of dynamism. In this paper, we present a novel prediction-based dynamic multi-objective evolutionary algorithm, called the NN-DNSGA-II algorithm, that incorporates an artificial neural network into the non-dominated sorting genetic algorithm (NSGA-II algorithm). The NN-DNSGA-II algorithm combines the strength of the neural network with dynamic NSGA-II to reveal the change patterns in the optimization environments. It exploits the correlation between task–resource pairs and the correlation between two successive optimization environments in order to estimate the future positions of optimal solutions.

Additionally, we adapt five dynamic multi-objective algorithms from the literature that do not require prediction, namely the DNSGA-II-A, DNSGA-II-B, DNSGA-II-HM and DNSGA-II-RI algorithms and a dynamic variant of the particle swarm optimization algorithm called the DMOPSO algorithm. They are included for scenarios in which the changes are not predictable, and they use different response mechanisms to cope with the changes. While DNSGA-II-A, DNSGA-II-RI and DMOPSO essentially insert random solutions to re-diversify the population, DNSGA-II-B inserts mutated versions of the existing solutions in the population. Finally, DNSGA-II-HM adapts the mutation rates in the environment.
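The two re-diversification responses described above can be illustrated with a minimal sketch. It is not the implementation of any of the adapted algorithms: the schedule encoding (a list mapping each task index to a resource index), the replaced fraction, the mutation rate and all function names are assumptions made for illustration only.

```python
import random

def random_schedule(n_tasks, n_resources, rng):
    """Hypothetical encoding: position i holds the resource index assigned to task i."""
    return [rng.randrange(n_resources) for _ in range(n_tasks)]

def mutate(schedule, n_resources, rate, rng):
    """Reassign each task to a random resource with probability `rate`."""
    return [rng.randrange(n_resources) if rng.random() < rate else r
            for r in schedule]

def respond_to_change(population, n_resources, fraction, rng,
                      mode="random", rate=0.2):
    """Replace a fraction of the population after a change is detected.

    mode="random":  insert fresh random schedules (random-insertion response).
    mode="mutated": insert mutated copies of existing schedules.
    `fraction` and `rate` are illustrative parameters, not values from the paper.
    """
    n_replace = int(len(population) * fraction)
    new_pop = [list(s) for s in population]
    for i in rng.sample(range(len(population)), n_replace):
        if mode == "random":
            new_pop[i] = random_schedule(len(population[i]), n_resources, rng)
        elif mode == "mutated":
            donor = population[rng.randrange(len(population))]
            new_pop[i] = mutate(donor, n_resources, rate, rng)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return new_pop

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [random_schedule(6, 3, rng) for _ in range(10)]
    print(respond_to_change(pop, 3, 0.2, rng, mode="mutated"))
```

Random insertion restores diversity regardless of the current population, whereas mutated insertion keeps the new solutions close to existing ones, which may suit milder changes.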

An extensive empirical study is carried out to demonstrate the performance of our algorithm, where the optimization objectives include the minimization of makespan, cost, energy and imbalance; and the maximization of reliability and utilization. The resource specifications are based on those of Amazon EC2, and the workflows from the Pegasus Workflow Management System are the test beds. The algorithms are compared using three metrics from multi-objective optimization: the number of non-dominated solutions, Schott’s spacing and Hypervolume. The experimental evaluation validates that our NN-DNSGA-II algorithm outperforms the adapted ones in up to 24 out of 30 cases for varying frequency of changes and in up to 35 out of 45 cases for varying severity of changes.
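Two of the metrics named above can be sketched in a few lines. The sketch assumes that every objective is minimized (a maximized objective such as reliability or utilization would be negated first) and uses the common textbook definition of Schott’s spacing with Manhattan nearest-neighbour distances; the function names are hypothetical and the exact variant used in the experiments may differ.

```python
import math

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def schott_spacing(front):
    """Schott's spacing: sample standard deviation of nearest-neighbour
    Manhattan distances; 0 means perfectly even spacing."""
    n = len(front)
    if n < 2:
        return 0.0
    d = [min(sum(abs(x - y) for x, y in zip(front[i], front[j]))
             for j in range(n) if j != i)
         for i in range(n)]
    mean = sum(d) / n
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (n - 1))

if __name__ == "__main__":
    # Illustrative (makespan, cost) pairs, not data from the paper.
    solutions = [(1, 4), (2, 3), (3, 2), (4, 1), (3, 3), (4, 4)]
    front = non_dominated(solutions)
    print(len(front), schott_spacing(front))
```

A larger number of non-dominated solutions and a smaller spacing value are generally preferred, since they indicate a richer and more evenly distributed approximation of the front.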

The rest of the paper is organized as follows. Section 2 reviews the related work. In Section 3, we present the workflow application model, the architecture for workflow execution and the objectives considered in this study. The dynamic workflow scheduling problem is presented in Section 4, followed by the proposed NN-DNSGA-II algorithm and the adaptations of five dynamic multi-objective algorithms from the literature. Section 5 presents the experimental setup and the definitions of the metrics considered in this study. The results and discussion of our empirical study are presented in Section 6, and Section 7 concludes the paper.

Section snippets

Related work

Workflow scheduling on distributed resources is an extensively studied NP-complete problem [1]. Although there are a few single-objective workflow scheduling problems [8], [9], most of the existing research has multi-objective characteristics [1], [10], [11]. Although some studies linearly combine multiple objectives [12], [13], this may not suit the nature of the real-world problem under certain conditions. One limitation is the difficulty of setting the weight parameters to correct values.

Application model

A workflow application, $G = (T, D)$, is modelled through a directed acyclic graph, where $T = \{t_{1}, t_{2}, \dots, t_{n}\}$ is the set of tasks and $D = \{d_{ij} \mid t_{i}, t_{j} \in T\}$ is the set of edges among precedence-constrained tasks. The weight of a task is defined by its reference execution time in seconds and the weight of an edge is defined by its output data size in bytes. For a task $t_{i}$, $pred(t_{i})$ refers to the set of its immediate predecessors and $succ(t_{i})$ refers to the set of its immediate successors. A task with no predecessor is $t_{entry}$
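The application model above can be made concrete with a small sketch of a workflow graph that exposes $pred$, $succ$ and the entry tasks. The class name, the dictionary-based representation and the example weights are assumptions for illustration, not part of the paper.

```python
from collections import defaultdict

class Workflow:
    """Hypothetical container for G = (T, D).

    task_weights: {task: reference execution time in seconds}
    edges:        {(t_i, t_j): output data size in bytes}
    """

    def __init__(self, task_weights, edges):
        self.task_weights = dict(task_weights)
        self.edges = dict(edges)
        self._pred = defaultdict(set)
        self._succ = defaultdict(set)
        for ti, tj in self.edges:
            if ti not in self.task_weights or tj not in self.task_weights:
                raise ValueError(f"edge ({ti}, {tj}) refers to an unknown task")
            self._succ[ti].add(tj)
            self._pred[tj].add(ti)

    def pred(self, t):
        """Immediate predecessors of task t."""
        return set(self._pred.get(t, ()))

    def succ(self, t):
        """Immediate successors of task t."""
        return set(self._succ.get(t, ()))

    def entry_tasks(self):
        """Tasks with no predecessor."""
        return [t for t in self.task_weights if not self._pred.get(t)]

    def topological_order(self):
        """Kahn's algorithm; raises if the graph is not acyclic."""
        indeg = {t: len(self._pred.get(t, ())) for t in self.task_weights}
        ready = [t for t, d in indeg.items() if d == 0]
        order = []
        while ready:
            t = ready.pop()
            order.append(t)
            for s in self._succ.get(t, ()):
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
        if len(order) != len(self.task_weights):
            raise ValueError("graph has a cycle, so it is not a valid workflow")
        return order

if __name__ == "__main__":
    wf = Workflow(
        {"t1": 10.0, "t2": 5.0, "t3": 7.0, "t4": 3.0},
        {("t1", "t2"): 1000, ("t1", "t3"): 2000,
         ("t2", "t4"): 500, ("t3", "t4"): 800},
    )
    print(wf.entry_tasks(), wf.pred("t4"), wf.topological_order())
```

Any order returned by `topological_order` respects the precedence constraints, so a scheduler can only start a task once all tasks in its `pred` set have finished.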

Dynamic workflow scheduling problem

A multi-objective optimization problem (MOP) is an optimization problem that has two or more objectives, where at least two objectives may be in conflict with one another. For a MOP, the set of solutions that are not dominated by any other solution is called the Pareto-optimal set (POS) in decision space and the Pareto-optimal front (POF) in objective space. The true POF is not known a priori for this problem. On the other hand, a dynamic multi-objective optimization problem (DMOP) is a MOP, in which

Experimental setup

The algorithms are evaluated using ten real-world workflows with approximately 100 and 1000 tasks from the Pegasus Workflow Management System [39]. They include Montage from astronomy, CyberShake from geology, Epigenomics from biology, Inspiral from physics and Sipht from bioinformatics. Fig. 2 symbolically represents their topological structures, such as the task pipelines, data distributions and aggregations. The specifications of the resources in Table 2 are compatible with Amazon EC2, US

Results and discussion

The results of our experimental evaluation are given in two parts. The first subsection presents the performance evaluation of the algorithms for resource failures. The second subsection validates the performance of our algorithms for a changing number of objectives. In order to indicate the significance of the differences between the results of the algorithms, the Wilcoxon rank-sum test [47] is carried out at the 0.05 significance level for each table that considers workflows with 100 tasks (specifically, Table 4

Conclusions

This paper is one of the first systematic attempts to model the dynamic workflow scheduling problem as a dynamic multi-objective optimization problem (DMOP), where the sources of dynamism are driven by changing resources due to failures and changing number of objectives due to a set of real-world scenarios encountered during workflow executions. The minimization of makespan, cost, energy and imbalance, and the maximization of reliability and utilization are the optimization goals considered in

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

A preliminary version of this paper was presented at the CloudAM workshop of the International Conference on Utility and Cloud Computing (UCC 2018) [46]. The workshop paper includes only a rudimentary model of the problem and the results of a few preliminary tests obtained by adapting a set of dynamic multi-objective evolutionary algorithms from the literature.

Goshgar Ismayilov is currently a Master of Science student in the Computer Engineering Department at Marmara University, Istanbul, Turkey. He received the B.S. degree in Computer Engineering from Marmara University in 2016. His research interests include dynamic optimization problems, evolutionary computation, cloud computing and machine learning.

References (46)

Alkhanak E.N. et al., A hyper-heuristic cost optimisation approach for scientific workflow scheduling in cloud computing, Future Gener. Comput. Syst. (2018)
Zhou X. et al., Minimizing cost and makespan for workflow scheduling in cloud using fuzzy dominance sort based HEFT, Future Gener. Comput. Syst. (2019)
Masdari M. et al., Towards workflow scheduling in cloud computing: a comprehensive analysis, J. Netw. Comput. Appl. (2016)
Calzarossa M.C. et al., A methodological framework for cloud resource provisioning and scheduling of data parallel applications under uncertainty, Future Gener. Comput. Syst. (2019)
Zhu Z. et al., Evolutionary multi-objective workflow scheduling in cloud, IEEE Trans. Parallel Distrib. Syst. (2016)
D. Klusácek, B. Parák, G. Podolníková, A. Ürge, Scheduling scientific workloads in private cloud: problems and...
Tan W. et al., Business and Scientific Workflows: A Web Service-Oriented Approach (2013)
Brown D.A. et al.
Allen G. et al., Towards an integrated gis-based coastal forecast workflow, Concurr. Comput.: Pract. Exper. (2008)
Chen R. et al., Dynamic multiobjectives optimization with a changing number of objectives, IEEE Trans. Evol. Comput. (2018)
Rodriguez M.A. et al., Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds, IEEE Trans. Cloud Comput. (2014)
Liu L. et al., Deadline-constrained coevolutionary genetic algorithm for scientific workflow scheduling in cloud computing, Concurr. Comput.: Pract. Exper. (2017)
Zhang M. et al., An adaptive multi-objective evolutionary algorithm for constrained workflow scheduling in clouds, Distrib. Parallel Databases (2018)
Shubham, R. Gupta, V. Gajera, P.K. Jana, An effective multi-objective workflow scheduling in cloud computing: A PSO...
S. Pandey, L. Wu, S.M. Guru, R. Buyya, A particle swarm optimization-based heuristic for scheduling workflow...
Fard H.M. et al., A truthful dynamic workflow scheduling mechanism for commercial multicloud environments, IEEE Trans. Parallel Distrib. Syst. (2013)
Rodriguez M.A. et al., A taxonomy and survey on scheduling algorithms for scientific workflows in IaaS cloud computing environments, Concurr. Comput.: Pract. Exper. (2017)
A. Iosup, M. Jan, O.O. Sonmez, D.H.J. Epema, On the dynamic resource availability in grids, in: 8th IEEE/ACM...
Zhu X. et al., Fault-tolerant scheduling for real-time scientific workflows with elastic resource provisioning in virtualized clouds, IEEE Trans. Parallel Distrib. Syst. (2016)
Plankensteiner K. et al., Meeting soft deadlines in scientific workflows using resubmission impact, IEEE Trans. Parallel Distrib. Syst. (2012)
Z. Chen, K. Du, Z. Zhan, J. Zhang, Deadline constrained cloud computing resources scheduling for cost optimization...
H. Zille, A. Kottenhahn, S. Mostaghim, Dynamic distance minimization problems for dynamic multi-objective optimization,...
H. Chen, X. Zhu, D. Qiu, L. Liu, Uncertainty-aware real-time workflow scheduling in the cloud, in: 9th IEEE...


Haluk Rahmi Topcuoglu is currently a Professor in the Computer Engineering Department at Marmara University, Turkey. He received the B.S. degree and the M.S. degree in Computer Engineering from Bogazici University, Istanbul, Turkey in 1991 and 1993, respectively. He received the Ph.D. degree in Computer Science from Syracuse University, Syracuse, NY in 1999. His research interests mainly include dynamic optimization problems, workflow scheduling in cloud computing, task scheduling and mapping for multicore architectures, software-based hardware reliability and parallel programming. He is a member of the IEEE, the IEEE Computer Society and the ACM.
