A secure solution based on load-balancing algorithms between regions in the cloud environment

The problem treated in this article is the storage of sensitive data in the cloud environment and how to choose regions and zones so as to minimize the number of file-transfer events. Handling sensitive data across the global Internet often increases risk and lowers security levels. Our work consists of scheduling several files on the different regions based on security and load-balancing parameters in the cloud. Each file is characterized by its size. If data is misplaced from the start, it will require a transfer from one region to another, and sometimes from one zone to another. The objective is to find a schedule that assigns these files to the appropriate regions while ensuring load balancing in each region, so as to guarantee the minimum number of migrations. This problem is NP-hard. A novel model addressing regional security and the load balancing of files in the cloud environment is proposed in this article. The model is based on a component called the "Scheduler", which applies the proposed algorithms to solve the problem. The model is a secure solution that guarantees an efficient dispersion of the stored files, avoiding the concentration of most of the storage in one region; consequently, damage to one region does not cause the loss of a large amount of data. In addition, a novel method called the "grouping method" is proposed. Several variants of this method are used to derive novel algorithms for the studied problem. Initially, seven algorithms are proposed in this article. The experimental results show that there is no dominance between these algorithms; therefore, three combinations of the seven algorithms generate three further algorithms with better results. Based on the dominance rule, only six algorithms are selected to discuss the performance of the proposed algorithms. Four classes of instances are generated to measure and test the performance of the algorithms; in total, 1,360 instances are tested.
Three metrics are used to assess the algorithms and compare them. The experimental results show that the best algorithm is the "Best-value of four algorithms", reaching the best solution in 86.5% of cases with an average gap of 0.021 and an average running time of 0.0018 s.


INTRODUCTION
Choosing regions or zones in the cloud environment aims to provide better performance and latency. Cost is also an important factor in choosing regions and zones in the cloud. Sometimes a business chooses a region based on the location of its customers. Load balancing is an essential factor as well, since it improves data security. In fact, the sensitivity of applications hosted in the cloud varies according to their importance. Indeed, sensitive applications require more than one region to avoid the risk of interruption. An efficient load-balancing system for storing files in the cloud environment is a secure solution: it guarantees an efficient dispersion of the stored files and avoids concentrating most of the storage in one region. Consequently, damage to that region does not cause the loss of a large amount of data. For example, if many files with a high security level are stored in the same region, a hack of the system compromises all of these high-level files. Load balancing ensures the dispersion of these files across different regions.
In the cloud environment, providers connect regions and availability zones through a point-to-point network. For economic, technical, and security reasons, cloud regions are geographically distributed at multiple points around the world. This means that data passes from one region to another via the Internet, travelling through global routers all the time. The performance and quality of the links, in terms of speed and level of security, are not the same in all sections of the Internet. Data availability is an important factor for the cloud customer, and the cloud infrastructure is designed to ensure it. A global network of interconnected servers and systems provides nearly limitless fail-over scenarios. Cloud technology makes it possible to permanently replicate and synchronize any type of data (Alzakholi et al., 2020). In the event of a server outage or network disruption, the cloud setup will simply switch to a replica and continue to offer access to systems and data. For the end user, the transition is seamless in most scenarios, without the user realizing that a failure has occurred. Cloud security is a concern for any organization (Singh & Chatterjee, 2017). Security issues can arise when moving critical systems and sensitive data to a cloud computing solution (Lee, 2013). In addition, the load-balancing problem has been studied in several domains in the literature. In the healthcare domain, work has focused on scheduling quality reports to be treated by physicians in a hospital; the number of pages in each report must be considered as a decision variable, which imposes load balancing of the total number of pages (Jemmali et al., 2022c). In the industrial domain, the maximization of the minimum completion time for the parallel-machine problem is studied in Jemmali & Alourani (2021), where a mathematical model is proposed to solve the NP-hard problem. In the same domain, other work solves this problem approximately by proposing several heuristics and reporting experimental results that compare them (Jemmali, Otoom & al Fayez, 2020).
Several research projects have studied load balancing in cloud computing. In Al Nuaimi et al. (2012) and Ghomi, Rahmani & Qader (2017), the authors surveyed applications of load balancing in cloud computing. An analysis of load balancing in cloud computing is detailed and discussed in Sidhu & Kinger (2013).
The rest of the article is organized as follows. 'Literature review' is reserved for the literature review of the studied problem. 'Architecture and model' presents the novel architecture and model for the cloud system incorporating load balancing. The problem description is detailed in 'Problem description'. In 'Proposed algorithms', the proposed algorithms are presented and discussed. The experimental results and the discussion of the obtained results are detailed in 'Experimental and discussion'. A conclusion and future directions are given in 'Conclusion'.

LITERATURE REVIEW
According to the latest studies, cloud provider-side or client-side security measures are not sufficient. Researchers have to put a lot of effort into countering security threats and defending the cloud system in general. In Arunarani, Manjula & Sugumaran (2017), the authors included a security service in task planning. For this, they developed an algorithm based on a hybrid optimization approach to minimize the risk rate. They claim that the developed processes can both minimize execution costs and meet time constraints. In Chen et al. (2017), the authors developed new approaches to reduce monetary costs and use the cheapest resources. Their approach, called SOLID, consists of selective duplication of previous tasks and encryption of intermediate data. In Fard, Prodan & Fahringer (2012), the authors introduced a new model for cloud pricing and a truthful scheduling mechanism. The goal of this work was to minimize the cost and the global execution time. Their results were compared with classical algorithms and Pareto solutions. In Francis et al. (2018), the authors presented a summary of secure data-flow planning models in the cloud environment. The article presents a solid mathematical study of the maintenance of dynamic nodes in graphs that require updating of the base number of each vertex. Other researchers focused on edge-computing techniques in the cloud computing environment to enhance resource allocation and increase management quality (Hua et al., 2019). In addition, they affirmed that scheduling mechanisms could offer better performance, especially in real-time applications. In this context, the authors in Sang et al. (2022) presented heuristics applied in device-edge-cloud cooperative computing. The authors considered that planning tasks enhances the use of the limited resources in the edge servers. They studied scheduling problems to find a satisfactory trade-off in the number of tasks whose deadlines are met in device-edge cooperative computing. Likewise, in Wang et al. (2020), the authors developed a binary nonlinear programming (BNP) model to solve the problem of minimizing deadline violations in heterogeneous computational environments such as cloud and edge. The goal is to maximize the number of completed tasks and enhance resource utilization. The authors in Han et al. (2019) proposed a general model to solve task distribution and scheduling problems on edge networks in order to minimize the response time of these tasks. Indeed, jobs are generated in an arbitrary order and at arbitrary times on mobile devices; they are then offloaded to servers with upload and download delays. In the same context, the authors in Aburukba, Landolsi & Omer (2021) discussed the delay requirements of IoT applications. According to them, cloud computing generates unacceptable delays between IoT devices and cloud data centers. They preferred fog computing, which brings IT services closer to IoT devices, and developed heuristics based on a genetic algorithm to satisfy as many requests as possible within acceptable deadlines. In Bezdan et al. (2021), the authors improved the search operator of the traditional FPA by replacing the worst individuals with randomly generated new individuals in the search space, to avoid getting stuck in local minima at the start of the optimization process. This improved FPA, called EEFPA, was used to find optimal task schedules in cloud computing environments, with makespan minimization as the primary goal. EEFPA was the best planner compared with similar approaches in that study. In networking, load balancing is applied to schedule packets to different routers while balancing the total size of the packets transmitted through the routers (Jemmali & Alquhayz, 2020a). The gas turbine engine is another domain in which load balancing is applied (Jemmali et al., 2019; Jemmali, Melhim & Alharbi, 2019). The Gray Wolf Optimizer (GWO) has been proposed for planning tasks in cloud computing to use resources more efficiently and minimize overall execution time (Bacanin et al., 2019). This algorithm was compared with several scheduling methods such as FCFS, ACO, Performance Budget ACO (PBACO), and Min-Max algorithms. Experimental results showed that GWO was the best-performing scheduler and PBACO the second best. However, since the performance of GWO at large scale was not evaluated, it is not preferable when the number of tasks is large. In Tawfeek et al. (2013), the authors developed an ant colony optimization algorithm to handle task scheduling in cloud computing, with the aim of reducing makespan. This algorithm was compared with two conventional algorithms, FCFS and RR, and showed better performance than both. The problem with this algorithm is that it converges slowly, requiring many iterations to reach a usable solution.
In Hamad & Omara (2016), the authors proposed a genetic algorithm (GA)-based task-scheduling algorithm to find the optimal assignment of tasks in cloud computing, optimizing makespan, cost, and resource utilization.
In Jia, Li & Shi (2021), the authors presented a task scheduler based on an improved Whale Optimization Algorithm (IWOA). Standard WOA was improved in IWOA using two mechanisms: a nonlinear convergence coefficient and an adaptive population size. IWOA outperformed the compared algorithms in terms of accuracy and convergence speed when planning small or large tasks in cloud computing environments.
Various partial computation offloading algorithms have been designed for IoT systems in heterogeneous 5G networks. A review of this work shows that the algorithms were implemented with the aim of minimizing energy consumption and reducing delay (Singh et al., 2020; Yang et al., 2018). However, researchers have paid little attention to the use of MEC offloading for IoT security. In Alladi et al. (2021), the authors implemented partial computation offloading for many uploading users. The authors described a deep learning engine (DLE), an artificial intelligence (AI)-based intrusion detection architecture, to identify and classify traffic in the Internet of Vehicles (IoV) into possible cyberattacks. These DLEs were also deployed on MEC servers instead of the remote cloud, taking into account vehicle mobility and the real-time needs of the IoV network.
Rapid adoption and ease of use across all industries, the pervasiveness of the Internet of Things, and the continued development of infrastructure and technology have increased user demand for cloud computing, doubling data volumes and user demands. Task scheduling therefore becomes a more difficult topic. Provisioning resources according to user requirements while maintaining end-user quality of service (QoS) is a daunting task (Nayar, Ahuja & Jain, 2019). The work closest to the objective studied in this article addresses load balancing of the sizes of files that must be stored on different storage supports (Alquhayz, Jemmali & Otoom, 2020). In that work, the authors proposed different algorithms using techniques such as the iterative method and the probabilistic method. Recently, the authors in Jemmali et al. (2022a) applied the load-balancing method by proposing several novel algorithms to solve the drone-battery problem for monitoring a solar power plant; the proposed algorithms were assessed and compared. Security parameters in scheduling data in clouds were considered in several works, such as Meng et al. (2020) and Houssein et al. (2021).
Table 1 summarizes the scope for improvement of the related works discussed previously.
In the domain of smart parking, the number of persons in each vehicle must be taken into consideration when scheduling vehicles into the available parking spaces. Several algorithms based on load balancing have been proposed to solve this problem, which is proven to be NP-hard by Jemmali et al. (2022b) and Jemmali (2022). Project budgeting and management have also been exploited to model a load-balancing problem. In Jemmali (2021b), the author proposed heuristics to schedule several projects characterized by their expected revenue; experimental results identify the best-proposed heuristic compared with all others. In the same context, the author in Jemmali (2021a) proposed an optimal solution for project assignment, where each project is characterized by its budget. The problem is to find a schedule that assigns all projects to the given municipalities while balancing the total budget in each municipality. Other works treated similar problems in Alharbi & Jemmali (2020) and Jemmali (2019).
These existing works have several limitations, which can be summarized as follows:
• Scalability: some algorithms cannot produce a solution in an acceptable time for large-scale instances;
• Overhead: the heuristics developed for load balancing can generate overhead;
• Limited applicability: some heuristics are only suitable for particular kinds of files and for virtual machines with specific characteristics.
In this article, a novel method based on a grouping procedure is proposed. This method is applied with different scheduling routines and generates a set of algorithms that solve the studied problem. In Alquhayz, Jemmali & Otoom (2020), the developed algorithms are based on the dispatching-rules method. The proposed algorithms classify the files into different groups. Selecting files from different groups makes the schedule more dispersed and gives differentiated results. Changing the way we select files within and between groups is the core of the difference between the proposed algorithms.

ARCHITECTURE AND MODEL
In the cloud environment, most of the proposed models aim to minimize delay and increase performance. In our work, we propose a new model to assign planning data to appropriate regions by considering the file stability in each region.
The components of the model are as follows:
• Users: the workflow generators. Data can be files, databases, videos, etc.
• Scheduler: represents the developed heuristics. The heuristics should provide suitable scheduling solutions that guarantee a minimum makespan and the appropriate destination region. They consider the incoming workflow, the queued data, and the resource-allocation state.
• Cloud service provider: allocates adequate resources to the appropriate services, calculates costs, and guarantees the availability of the resources.
• Regions 1 and 2: the cloud resources. Each region contains all available VMs capable of receiving storage data, and each has its own characteristics, such as geographical position, cost, and availability parameters.
The main idea of this proposal is to assign data to a suitable cloud region. In the cloud, scheduling is an essential process for guiding files to storage. After receiving user requests, the cloud service provider regularly sweeps all available resources and collects information about the regions. This component passes the information to the scheduler, which contains the developed heuristics. The scheduler gathers the information, checks the heuristic results, and assigns each file to the suitable region. The scheduler component, the heart of our work, operates in collaboration with the cloud service provider: it collects the necessary information and dynamically computes the best solution to assign and dispatch each file to its corresponding region. Figure 1 shows the proposed model.

PROBLEM DESCRIPTION
In general, cloud providers do not charge for viewing or modifying data within the same level of infrastructure, but they charge for migrating data from one region to another. In addition to the cost of data transfer, migrating data from one region to another increases the risk of interception and hacking. The criteria for choosing regions and zones in the cloud environment include not only the cost but also the stability of the files in each region. Our goal in this work is to transfer files to the right places from the start, to avoid moving them several times. Moving and transferring files over the global network increases not only the cost of the cloud but also the likelihood of the data being intercepted, lost, hacked, etc.

The main notation is as follows (see Table 2):
• Tf_i: the used space when the file F_i is executed;
• Ts_j: the total used space in the virtual machine Vr_j;
• Ts_min: the minimum used space over all Vr_j, j = {1,...,V_n}.

In our work, we will develop algorithms that provide load balancing between storage servers in different regions to minimize migration actions. The method that we propose ensures the fair distribution of several files over different virtual machines located in different regions and zones.
In the literature, several heuristics and algorithms have been used to find approximate solutions (Jemmali, 2021b). Alquhayz, Jemmali & Otoom (2020) and Jemmali (2021a) show that heuristics produce acceptable solutions in a cloud environment. These solutions may not be perfect, but they are still valuable.
All notations and their definitions are given in Table 2.
Several objective functions can be adopted for the studied problem. In this article, we adopt the objective function detailed in Eq. (1). Hereafter, Gfv denotes the gap in used space between the different virtual machines. This gap is the objective to be minimized.

Gfv = Σ_{j=1}^{V_n} (Ts_j − Ts_min).   (1)

The objective is to find a schedule that minimizes Gfv. In these circumstances, finding an approximate solution is very challenging. In this article, we propose several algorithms that solve the studied problem approximately. To give a clear idea of the studied problem, we give Example 1.

Example 1. In this example, we give a scenario that can occur in real circumstances. Suppose that there are three virtual machines and nine files to be executed by these virtual machines; thus V_n = 3 and F_n = 9. The objective is to find a schedule that gives an acceptable assignment of all these files to the different virtual machines. Table 3 gives the sizes of the different files. Assume first that we choose the shortest-size-based algorithm: we sort all files in increasing order of size and schedule them one by one on the virtual machine that has the minimum value of Ts_j. The result obtained by this algorithm is presented in Fig. 2: virtual machine Vr_1 executes files {2,8,9}, Vr_2 executes files {5,6,7}, and Vr_3 executes files {1,3,4}. Computing the Ts_j values gives Ts_1 = 28, Ts_2 = 36, and Ts_3 = 42. Consequently, Gfv = Σ_{j=1}^{3} (Ts_j − Ts_min) = (28 − 28) + (36 − 28) + (42 − 28) = 22. For this schedule, the gap between the virtual machines is 22. The objective is to find another schedule with a better result, i.e., a gap of less than 22. Applying the longest-size algorithm, which sorts the files in decreasing order of size and schedules them one by one on the virtual machine that has the minimum value of Ts_j, gives the result presented in Fig. 3: Vr_1 executes files {1,2,9}, Vr_2 executes files {5,6,7}, and Vr_3 executes files {3,4,8}. Here, Ts_1 = 36, Ts_2 = 36, and Ts_3 = 34. Consequently, Gfv = Σ_{j=1}^{3} (Ts_j − Ts_min) = (36 − 34) + (36 − 34) + (34 − 34) = 4. Clearly, the first schedule (Fig. 2) gives a greater gap than the second one. The difference between the two schedules is 22 − 4 = 18. So, just by changing the sorting method between the first algorithm and the second, we gain 18 units of gap value.
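To make the mechanics of Example 1 concrete, the following sketch implements the two list-scheduling rules just described (shortest-size first and longest-size first) and the objective of Eq. (1). The file sizes below are illustrative, not the values of Table 3.

```python
def schedule(sizes, order, vn):
    """List-schedule files onto vn virtual machines: each file in the
    given order goes to the machine with the minimum current load Ts_j."""
    loads = [0] * vn
    for i in order:
        j = min(range(vn), key=loads.__getitem__)  # machine with min Ts_j
        loads[j] += sizes[i]
    return loads

def gfv(loads):
    """Objective of Eq. (1): sum over j of (Ts_j - Ts_min)."""
    return sum(t - min(loads) for t in loads)

sizes = [10, 8, 14, 16, 12, 11, 13, 6, 4]  # illustrative sizes, not Table 3
vn = 3
spt = schedule(sizes, sorted(range(len(sizes)), key=lambda i: sizes[i]), vn)
lpt = schedule(sizes, sorted(range(len(sizes)), key=lambda i: -sizes[i]), vn)
print(gfv(spt), gfv(lpt))
```

As in the example, merely changing the sort order changes the gap: on these sizes the longest-size rule yields the smaller Gfv.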

PROPOSED ALGORITHMS
In this section, we present and detail all the proposed algorithms. These algorithms are based on a classification method: the files are grouped into three groups. Selecting files from different groups makes the schedule more dispersed and gives differentiated results. Ten algorithms are proposed in this work, all based on the grouping method.

New grouping method
This method subdivides the files into three groups G_1, G_2, and G_3; at the start, these groups are empty. The numbers of files in G_1, G_2, and G_3 are denoted by f_1, f_2, and f_3, respectively. In practice, f_1 = F_n/3, f_2 = F_n/3, and f_3 = F_n − f_1 − f_2 (using integer division when F_n is not a multiple of 3). Clearly, the groups G_1, G_2, and G_3 depend on the manner in which the files are initially sorted; the initial order of the set of files therefore determines the groups. Ten algorithms are presented in this article based on the grouping method. Changing the way we select files within and between groups is the core of the difference between the proposed algorithms. Figure 4 gives an example of the subdivision of a given set of files.
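A minimal sketch of the grouping step, assuming the truncated formula completes as f_3 = F_n − f_1 − f_2 with integer division:

```python
def make_groups(files):
    """Split files into three groups G1, G2, G3.
    f1 = f2 = Fn // 3; G3 receives the remaining files
    (assumed completion of the formula truncated in the text)."""
    fn = len(files)
    f1 = f2 = fn // 3
    return files[:f1], files[f1:f1 + f2], files[f1 + f2:]

g1, g2, g3 = make_groups(list(range(10)))
```

With F_n = 10, this gives groups of sizes 3, 3, and 4.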

Longest file size algorithm (LFS)
Firstly, the files are sorted in decreasing order of their size. The sorted files are then scheduled one by one on the virtual machine that has the minimum value of Ts_j, until all files are scheduled. The complexity of this algorithm depends on the sorting algorithm; heap sort is adopted here, so the complexity is O(n log n). All these steps are described in Algorithm 2. Hereafter, we denote by DER(F) the procedure that receives as input a list of numbers F and sorts these numbers in decreasing order; these numbers are the sizes of the files to be sorted. The procedure DER(F) is based on the heap-sort method. We denote by SCL(F) the procedure that schedules each element i (∀i, 1 ≤ i ≤ F_n) on the virtual machine with the minimum used space.

Third-grouped set algorithm (TGS)
The content of the groups depends on the manner in which the files are initially sorted. We adopt three manners to sort the files.
• First manner: Take the files as given initially without applying any sorting.
• Second manner: Sort the files according to the increasing order of their size.
• Third manner: Sort the files according to the decreasing order of their size.
For each manner, we first create the groups G_1, G_2, and G_3. After that, we constitute a permutation of these groups; there are six possible sequences. The first sequence is G_1, G_2, G_3, denoted {G_1, G_2, G_3}: we schedule all files in G_1, next all files in G_2, and finally all files in G_3. The other five sequences are the remaining permutations of the three groups. So, for each manner, six sequences are executed and the best solution is picked and returned.
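The six-permutation search for one manner can be sketched as follows; the F_n/3 group split is assumed, and this is an illustrative reconstruction, not the article's C++ implementation:

```python
from itertools import permutations

def tgs_one_manner(files, vn):
    """TGS for one manner: build G1..G3, try the six group sequences,
    schedule group by group on the least loaded machine, and keep the
    best (minimum) gap Gfv."""
    fn = len(files)
    f1 = f2 = fn // 3
    groups = (files[:f1], files[f1:f1 + f2], files[f1 + f2:])
    best = None
    for seq in permutations(groups):        # the six sequences
        loads = [0] * vn
        for group in seq:
            for s in group:                 # schedule all files of the group
                j = min(range(vn), key=loads.__getitem__)
                loads[j] += s
        gap = sum(t - min(loads) for t in loads)
        best = gap if best is None else min(best, gap)
    return best
```

The full TGS would run this routine for the three manners (unsorted, increasing, decreasing) and return the overall best value.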

Third-grouped with minimum-load algorithm (TGM)
The division of the files into groups is adopted in this algorithm. Three groups are created by the method described in 'New grouping method', with the three manners detailed in the same subsection. For each manner, a solution is calculated, and the best solution is selected. The algorithm proceeds in five steps. The first step applies the first manner and creates the three groups G_1, G_2, and G_3. The second step calculates the load of each group, where the load is the sum of the sizes of the files in the group; the load of G_1 is denoted by Lo_1, with Lo_1 = Σ_{F_i ∈ G_1} Sz_i, and the loads of G_2 and G_3, denoted by Lo_2 and Lo_3, are defined analogously. The third step chooses the group with the minimum load, denoted Gc. The fourth step schedules the first file in Gc, updates the loads of the groups, determines a new Gc, and so on until all files are scheduled; the total gap is calculated and denoted by Gfv_1. We then restart from step 1 with the second manner described above, obtaining a gap denoted Gfv_2. Finally, in the fifth step, we restart from step 1 with the third manner and calculate a new gap denoted Gfv_3. The best solution Gfv is calculated as given in Eq. (2).
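For one manner, steps two to four can be sketched as below; the F_n/3 split is assumed, and the min-load group choice and load updates follow the description above:

```python
from collections import deque

def tgm_one_manner(files, vn):
    """TGM for one manner: repeatedly pick the non-empty group with the
    minimum remaining load Lo, schedule its first file on the least
    loaded machine, update loads, and return the resulting gap Gfv."""
    fn = len(files)
    f1 = f2 = fn // 3
    groups = [deque(files[:f1]), deque(files[f1:f1 + f2]), deque(files[f1 + f2:])]
    loads = [0] * vn
    while any(groups):
        gc = min((g for g in groups if g), key=sum)    # group with min load
        s = gc.popleft()                               # its first file
        j = min(range(vn), key=loads.__getitem__)      # least loaded machine
        loads[j] += s
    return sum(t - min(loads) for t in loads)          # Gfv for this manner
```

Gfv_1, Gfv_2, and Gfv_3 would be obtained by calling this on the unsorted, increasing, and decreasing file orders, keeping the minimum as in Eq. (2).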

Third-grouped excluding-files with minimum-load algorithm (TEM)
The first step of this algorithm is to select the V_n longest files; each of these files is scheduled on a distinct virtual machine. The F_n − V_n remaining files are then scheduled on the virtual machines according to TGM. We denote by EXL(F) the function that returns the list of the V_n longest files of F, and by Rem(F) the function that returns the list of the F_n − V_n files remaining after excluding the V_n longest files of F. Hereafter, we denote by IER(F) the procedure that receives as input a list of numbers F and sorts these numbers in increasing order, and by GrP(F) the procedure that subdivides the list of files F into the three groups G_1, G_2, and G_3 described in 'New grouping method'. The procedure MG() returns the group that has the maximum load. The procedure SCLF(L) schedules the first element of L on the available virtual machine. All steps of the algorithm are described in Algorithm 3.
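A simplified sketch of the exclusion step (EXL/Rem); for brevity, the remaining files are placed greedily on the least loaded machine here, standing in for the full TGM routine:

```python
def tem_sketch(sizes, vn):
    """TEM sketch: the vn longest files (EXL) each seed a distinct
    virtual machine; the remaining files (Rem) are then placed on the
    least loaded machine (a stand-in for the TGM step)."""
    order = sorted(sizes, reverse=True)
    loads = order[:vn]                    # one longest file per machine
    for s in order[vn:]:                  # the Fn - Vn remaining files
        j = min(range(vn), key=loads.__getitem__)
        loads[j] += s
    return loads

loads = tem_sketch([10, 8, 14, 16, 12, 11, 13, 6, 4], 3)
```

Seeding each machine with one long file prevents the largest files from piling onto a single machine before balancing begins.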

Third-grouped one-by-one algorithm (TGO)
The determination of the three groups described above is adopted for this algorithm; the three groups are created as described in 'New grouping method', and the three manners are also applied. For each manner, we first create the groups G_1, G_2, and G_3 and then constitute a permutation of these groups; there are six possible orders. The first order, G_1, G_2, G_3, is denoted Order_1: we schedule the first file from G_1, then the first file from G_2, then the first file from G_3, and continue in this round-robin fashion until all files are scheduled. The remaining orders apply the same method to the other permutations of the groups.

The "Best-value of four algorithms" (BVF) dominates the four algorithms it combines. Consequently, we discuss only six algorithms in the experimental results: LFS, TGS, OST, BVT, BFI, and BFO.
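Order_1 of TGO can be sketched as a round-robin over the three groups; the F_n/3 split is assumed, and the other five orders simply permute the groups:

```python
def tgo_order1(files, vn):
    """TGO, first order: schedule the first file of G1, then of G2,
    then of G3, cycling over the groups until every file is placed on
    the least loaded virtual machine."""
    fn = len(files)
    f1 = f2 = fn // 3
    groups = [list(files[:f1]), list(files[f1:f1 + f2]), list(files[f1 + f2:])]
    loads = [0] * vn
    while any(groups):
        for g in groups:
            if g:
                s = g.pop(0)                               # first file of the group
                j = min(range(vn), key=loads.__getitem__)  # least loaded machine
                loads[j] += s
    return loads

balanced = tgo_order1([5] * 9, 3)
```

Alternating between groups interleaves files from different parts of the sorted order, which is what disperses the schedule.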

EXPERIMENTAL AND DISCUSSION
The performance of the proposed algorithms is measured and discussed in this section.
Several classes of instances are generated and tested. These instances and the proposed algorithms are coded in C++ and run on a computer with an i5 processor and 8 GB of memory.
The proposed procedures are tested on a set of instances detailed in the following subsection.

Instances
The tested instances are designed to measure the performance of the proposed algorithms in terms of gap and time. These instances depend on the manner in which the Sz_i values are generated. Indeed, the generation of Sz_i is based on two distributions: the uniform distribution, denoted UN[.], and the normal distribution, denoted NO[.].
The generated classes are illustrated as follows:
• Class 1: Sz_i in UN[25, 130].
The numbers of virtual machines and the numbers of files tested are presented in Table 4:
• F_n ∈ {12, 32, 52} with V_n ∈ {4, 5, 6};
• F_n ∈ {60, 160, 260, 360} with V_n ∈ {4, 6, 8, 11};
• F_n ∈ {450, 550, 650} with V_n ∈ {6, 8, 11}.
For each number of virtual machines and each number of files, 10 different instances were generated. In total, the number of generated instances is (3 × 3 + 4 × 4 + 3 × 3) × 4 × 10 = 1,360, corresponding to 34 (F_n, V_n) pairs, four classes, and 10 instances each. The generation of instances is inspired by the analysis presented in Alquhayz, Jemmali & Otoom (2020), Alquhayz & Jemmali (2021a), and Jemmali & Alquhayz (2020b).
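Class 1 generation can be sketched as below; the bounds 25 and 130 come from the class definition, while the seed handling is an illustrative choice:

```python
import random

def gen_class1(fn, seed=None):
    """Generate one Class 1 instance: fn file sizes Sz_i drawn from the
    uniform distribution UN[25, 130]. Parameters for Classes 2-4 are
    not reproduced in this excerpt."""
    rng = random.Random(seed)
    return [rng.randint(25, 130) for _ in range(fn)]

instance = gen_class1(60, seed=42)
```

Fixing the seed makes each of the 10 instances per (F_n, V_n) pair reproducible across algorithm runs.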

Metrics
All algorithms presented in 'Proposed algorithms' will be discussed based on several metrics.These metrics are defined as follows.
• →Z: the minimum value obtained after executing all algorithms.
• Z: the value obtained by the given algorithm.
• Mp: the percentage of instances for which Z = →Z.
• Ag: the average Gp over a fixed set of instances.
• Time: the execution time of an algorithm for a fixed set of instances, in seconds; it is recorded as "." if it is less than 0.0001 s.

In the result tables, bold indicates the best results and underline indicates the second-best results.
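The metrics can be computed from per-instance results as sketched below. The exact formula for Gp is not given in this excerpt; it is assumed here to be the relative gap (Z − →Z)/Z.

```python
def compute_metrics(results):
    """results maps each algorithm name to its list of per-instance
    objective values Z. Returns (Mp, Ag) per algorithm, where ->Z is
    the per-instance minimum over all algorithms, Mp the fraction of
    instances with Z = ->Z, and Ag the average assumed gap (Z - ->Z)/Z."""
    n = len(next(iter(results.values())))
    zbar = [min(results[a][k] for a in results) for k in range(n)]
    metrics = {}
    for name, zs in results.items():
        mp = sum(z == zb for z, zb in zip(zs, zbar)) / n
        ag = sum((z - zb) / z if z else 0.0 for z, zb in zip(zs, zbar)) / n
        metrics[name] = (mp, ag)
    return metrics

m = compute_metrics({'A': [10, 20], 'B': [10, 10]})
```

Dividing by Z rather than →Z avoids a division by zero when an algorithm attains a gap of 0, which is possible for this objective.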

Discussion results
In this subsection, we discuss the performance of the proposed algorithms. The discussion is based on five kinds of analysis: an overall analysis of the obtained results; an analysis by number of files; an analysis by number of virtual machines; an analysis by class; and, finally, a pair-based analysis. These analyses are presented separately in the following subsections.

Overall results
Table 5 presents the overall results for all algorithms. This table shows that the best algorithm is BFO, in 86% of cases, with an average gap of 0.021 and an average running time of 0.0018 s. The second-best algorithm is BVT, in 79.3% of cases, with an average gap of 0.044 and an average running time of 0.0013 s. Table 5 also shows that the maximum average gap of 0.332 is obtained by the LFS algorithm. Among the measurable times, the minimum running time of 0.0006 s is reached by the TGS and OST algorithms, while the average running time of the LFS algorithm is below the 0.0001 s recording threshold.

Number of files discussion
In this subsection, we discuss the variation of the average gap and time when the number of files changes. Table 6 presents the average gap Gfv of all algorithms according to the number of files F n. This table shows that, for the best algorithm BFO, the minimum average gap of 0.001 is reached when F n = 60. The second minimum average gap of 0.002 is obtained when F n = 360. On the other hand, for BFO, the maximum average gap of 0.054 is obtained when F n = 550. Table 7 presents the average running time Time, in seconds, of all algorithms according to the number of files F n. This table shows that the maximum average running time of 0.0048 s is reached when F n = 550 for algorithm BFO, while a minimum average running time of less than 0.0001 s is reached four times by the LFS algorithm, when F n ∈ {12, 32, 52, 60}.

Number of virtual machines discussion
In this subsection, we discuss the variation of the average gap and time when the number of virtual machines changes. Table 8 presents the average gap Gfv of all algorithms according to the number of virtual machines V n.

Classes discussion
In this subsection, we discuss the variation of the average gap and the time when the class changes.
Table 10 presents the average gap Gfv of all algorithms according to the classes. This table shows that the minimum average gap of 0.002 is obtained by BFO for Class 1, while the class with the maximum average gap for the BFO algorithm is Class 4. Over all algorithms, the maximum average gap of 0.472 is obtained by the OST algorithm for Class 2. For the BFO algorithm, Class 1 is therefore easier than the others, since its average gap is the minimum value of 0.002.

Pair discussion
In this subsection, we discuss the variation of the average gap when the pair (F n, V n) changes.
Figure 5 presents a comparison between the best algorithm BFO and BFI according to the pair (F n, V n). This figure shows that the curve of BFO is always below the curve of BFI for all values of the pairs (F n, V n), which confirms that BFO is the better algorithm. In addition, it is easy to see that the total number of different pairs is 34.

Comparison to existing algorithms
In this subsection, we compare the proposed algorithms to the existing ones. In the literature, in Alquhayz, Jemmali & Otoom (2020), the authors developed algorithms to solve the storage problem. The best three algorithms in the latter work are NISA, SIDA r , and SDIA r , with percentages of 45.2%, 75.2%, and 41%, respectively, as detailed in Table 7 of Alquhayz, Jemmali & Otoom (2020). On the other hand, the three best-proposed algorithms are TGS, BVT, and BFO, with percentages of 61.3%, 79.3%, and 86.5%, respectively. Now, we compare the three best algorithms of Alquhayz, Jemmali & Otoom (2020) to TGS, BVT, and BFO. Hereafter, the percentage is calculated based on the minimum value obtained over the six algorithms (the three best existing algorithms and the three best-proposed ones).
Table 12 presents an overall comparison between the three best existing algorithms and the three best-proposed algorithms. This table shows that the best algorithm is BFO, in 63.2% of cases, with an average gap of 0.145 and an average running time of 0.0018 s. The second-best algorithm is BVT, in 58.2% of cases, with an average gap of 0.167 and an average running time of 0.0013 s. Table 12 also shows that the maximum average gap of 0.429 is obtained by the NISA algorithm. These results show that the proposed algorithms outperform those developed in the literature: the best existing algorithm is SIDA r , with a percentage of 50.8%, while the best-proposed algorithm is BFO, with a percentage of 63.2%. Table 13 presents the comparison of the average gap Gfv values between the three best existing algorithms and the three best-proposed algorithms according to the number of files F n. This table shows that, for the best algorithm BFO, the minimum average gap of 0.006 is reached when F n = 12. The second minimum average gap of 0.009 is obtained when F n = 650. On the other hand, for BFO, the maximum average gap of 0.349 is obtained when F n = 160. BFO reaches the minimum average gap seven times, when F n ∈ {12, 32, 60, 260, 360, 450, 650}.
The proposed algorithms show their efficiency in terms of the average gap. Indeed, a minimum average gap of 0.021 is reached when comparing all the proposed algorithms, and a minimum average gap of 0.145 when comparing the existing algorithms to the proposed ones. The proposed algorithms are mutually non-dominant, which means that combining some tuples of these algorithms can give better results. The three best-proposed algorithms are TGS, BVT, and BFO. The results detailed in the tables and figures show the performance of the algorithms. The application of the grouping method has a remarkable impact on this performance: the three best algorithms are all based on the grouping method, which therefore shows its efficiency on the studied problem.
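The idea behind the combined algorithms, such as BFO ("Best-value of four algorithms"), can be sketched as keeping, per instance, the best value returned by any component heuristic. The two toy heuristics below are hypothetical placeholders; the real components are the heuristics defined in 'Proposed algorithms'.

```python
def best_of(instance, algorithms):
    """Return the best (minimum) objective value returned by any of the
    component heuristics on the given instance."""
    return min(alg(instance) for alg in algorithms)

# Toy stand-ins for two component heuristics (hypothetical; the real
# components are the heuristics defined in 'Proposed algorithms'):
def heuristic_a(files):
    return sum(files)       # pretend objective value

def heuristic_b(files):
    return 2 * max(files)   # pretend objective value
```

Since the components are mutually non-dominant, their per-instance minimum is never worse than any single component, which is consistent with BFO dominating its four constituents.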

CONCLUSION
In this article, optimized scheduling algorithms based on load balancing, minimizing data migration from one region to another in a cloud environment, were presented. The concept and the model are presented and explained. A novel grouping method is introduced; this method is utilized to obtain efficient algorithms for the studied problem. Ten algorithms are proposed. Due to the dominance rule between the algorithms, only six algorithms are discussed in the experimental results. Four classes of instances are generated and tested, resulting in 1,360 instances in total. The experimental results show that the best algorithm is the ''Best-value of four algorithms (BFO)'', in 86.5% of cases, with an average gap of 0.021 and an average running time of 0.0018 s. Cloud security is a primary challenge for developers and researchers. For a company, choosing the best region to keep sensitive data is an important task, because it avoids unnecessary migration from one region to another, which can decrease security levels and increase risks. By serving as initial solutions, our proposed algorithms can be enhanced to give better solutions. In the future, the performance of our algorithms can be evaluated in the case of big data flows, and the proposed algorithms can be enhanced by applying some meta-heuristics.
• Sadok Turki conceived and designed the experiments, prepared figures and/or tables, and approved the final draft.
• Wael M. Khedr performed the experiments, prepared figures and/or tables, and approved the final draft.
• Abdullah M. Algashami analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the article, and approved the final draft.
• Mutasim ALsadig analyzed the data, prepared figures and/or tables, and approved the final draft.
• Chen et al. (2017): minimizing the completion time and monetary cost.
• Fard, Prodan & Fahringer (2012): survey of different previous works, defining the factors required in securing workflows through the execution.
• Francis et al. (2018): algorithms to increase parallelism and minimize processing time.
• Hua et al. (2019): task scheduling problem to optimize the service level agreement; a new formulation into binary nonlinear programming and a developed heuristic with three stages.
• Sang et al. (2022): optimizing deadline violations for executing tasks; a formulation as a binary nonlinear programming model maximizing the number of completed tasks and optimizing the resource utilization of servers.
• Wang et al. (2020): general model to solve task distribution and scheduling problems on edge networks in order to minimize the response time of these tasks.
• Han et al. (2019): delay problem required by IoT applications.
• Aburukba, Landolsi & Omer (2021): improved the search operator of traditional FPA by replacing the worst individuals with randomly generated new individuals in the search space, to avoid getting stuck in local minima at the start of the optimization process.
• Bezdan et al. (2021): load balancing of the total size of transmitted packets through two routers.
• Jemmali & Alquhayz (2020a): load balancing in the gas turbine engine; algorithms and approximate solutions.
• Jemmali et al. (…), Singh et al. (2020) and Yang et al. (2018): implementation of a partial computation for many users uploading.
• Alladi et al. (2021): …
Eljack et al. (2023), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.1513 6/28

Table 4 Choice of (F n , V n ).

Table 6 The average gap Gfv of all algorithms according to the number of files F n .
Bold indicates the best results and the underline indicates the second-best results.

Table 7 The running time Time of all algorithms according to the number of files F n .
Bold indicates the best results and the underline indicates the second-best results.

Table 9 The average time Time of all algorithms according to the number of virtual machines V n .
Bold indicates the best results and the underline indicates the second-best results.

Table 10 The average gap Gfv of all algorithms according to classes.
Bold indicates the best results and the underline indicates the second-best results.

Table 11 The average running time Time in seconds of all algorithms according to classes.
Notes. Bold indicates the best results and the underline indicates the second-best results.

Table 9 presents the average time Time of all algorithms according to the number of virtual machines V n .

Table 13 Comparison of the average gap Gfv values between the three existing best algorithms and the three best-proposed algorithms according to the number of files F n .
Bold indicates the best results and the underline indicates the second-best results.