Research on Cloud Computing Resource Scheduling Strategy Based on Firefly Optimized Genetic Algorithm

For the cloud computing environment, this paper takes the first-level scheduling model, which maps tasks to virtual machine resource nodes, and redesigns the individual coding, fitness function, selection and replication, and crossover and mutation processes to establish a cloud computing resource scheduling model based on the genetic algorithm. By establishing a correspondence between fireflies and virtual machine resource nodes, the paper then redesigns the firefly decision domain update method, the selection (attraction) probability formula, and the location movement strategy, and combines them with the genetic algorithm to establish a cloud computing resource scheduling model based on the firefly-genetic algorithm. Experiments were run on the CloudSim cloud computing simulation platform. The results show that the firefly-genetic model achieves a shorter task completion time than the genetic algorithm alone and a more balanced virtual machine load, so the overall optimization effect of the resource scheduling scheme is evident.


Research background
The large-scale use of virtual machine technology in cloud computing systems greatly enhances the scalability of the network, so that cloud computing resources can be accessed anytime and anywhere according to the dynamic needs of users. This improves the service quality, service capability, and service efficiency of cloud computing service providers. Owing to its low cost, virtualization support, and diversified services, cloud computing has developed rapidly on a large scale and is widely used.
In the cloud computing environment, common load scheduling algorithms include round-robin (polling), random, hash, fastest-response, and minimum-connection methods. These algorithms are relatively simple, and each optimizes resource scheduling in only a single direction. Their respective advantages and disadvantages are pronounced, so the virtual machine load is poorly balanced and the resulting resource scheduling strategy is imperfect. Beyond these, several load balancing algorithms have been proposed in the research literature. Wenzhong X [3] proposed an improved genetic algorithm based on simulated annealing. During iterative evolution, the improved algorithm uses a dual fitness function, built on average task completion time and load balancing, to identify low-fitness individuals and retain excellent ones. The adaptive crossover and mutation probability function was also redesigned for the mutation step.
Wenqing C [5] proposed an intelligent optimization strategy for virtual resource scheduling. Taking virtual machine resource characteristics into account, the strategy optimizes chromosome selection and replication as well as crossover and mutation in the population, and uses a virtual machine load function and a cloud task optimal span function as a dual fitness function.
Lijiao G [6] proposed an improved genetic algorithm (IGA) based on a chromosome coding method and fitness function, and carried out simulation experiments on the cloud simulator CloudSim. The results show that the proposed algorithm outperforms traditional genetic algorithms in performance and quality of service (QoS), and is better suited to resource scheduling for large-scale tasks in cloud computing environments.

Resource scheduling architecture
The cloud computing data center holds large amounts of computing and storage resources, which are allocated to users on demand to provide a wide range of services. Resources in cloud computing are abstracted into virtual resources, whose management is the core technology of cloud computing; an efficient resource scheduling method is the key to the efficient operation of cloud systems. At present, cloud computing resource scheduling is divided into two levels. First-level scheduling is a matching problem between cloud tasks and virtual machine resources: cloud tasks are mapped to virtual machine resources so that the virtual machines complete the tasks in the shortest time and virtual resource utilization is maximized. Second-level scheduling is a mapping between virtual machines and physical machine resources, so that virtual machines are created on or migrated to appropriate hosts. This paper studies first-level scheduling, namely the scheduling of cloud tasks onto virtual machines.

Resource scheduling model
Suppose the cloud computing system has n cloud tasks, T = {t1, t2, t3, ..., tn}, and m computing resources, VM = {Vm1, Vm2, Vm3, ..., Vmm}. Each task is described in the following form:
ti = {id, length, pesNumber, fileSize, outputSize, …}
where ti is the i-th computing task (i = 1, ..., n) in the task set T, id is the task number, length is the length of the task at execution time, pesNumber is the number of processors used during execution, fileSize is the file size before the task is submitted, and outputSize is the output size after the task is executed. Each resource node is described as:
Vmj = {vmid, mips, ram, bw, …}
where Vmj is the j-th resource node (j = 1, ..., m) in the computing resource set VM, vmid is the virtual machine number, which uniquely identifies the virtual machine, mips is an indicator of processing performance and represents the computing power of the virtual machine, ram is the virtual machine memory, and bw is the network bandwidth.
Computing tasks are assigned to computing resource nodes, and each task can be executed on only one resource node. The mapping between the task set T and the virtual resource set VM is represented by the matrix X in formula (1), where $x_{ij}$ is the assignment relationship between task i and virtual resource j and also satisfies the constraint in formula (2). From this, the expected time to complete (ETC) of the task set T on the virtual resource set VM can be derived, as expressed in formulas (3) and (4). The completion time of all tasks on virtual resource j is given by formula (5). For the resource scheduling problem, virtual machine resource utilization should be maximized and resource waste reduced, which requires the average completion time to be as small as possible; the objective function is given in formula (6).
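As a minimal sketch of the model in this section, the following Python snippet builds an expected-time-to-complete matrix and the completion time of each virtual machine for a given assignment. It assumes that the time of task i on virtual machine j is the task length divided by the machine's mips, that a virtual machine runs its tasks one after another, and that the average is taken over virtual machines. The formulas of the paper are not reproduced here, so these definitions and the names `etc_matrix`, `vm_completion_times`, and `average_completion_time` are illustrative assumptions.

```python
from typing import List


def etc_matrix(lengths: List[float], mips: List[float]) -> List[List[float]]:
    """ETC[i][j]: assumed time of task i on VM j (task length / VM mips)."""
    return [[length / speed for speed in mips] for length in lengths]


def vm_completion_times(assignment: List[int], etc: List[List[float]], m: int) -> List[float]:
    """Sum the ETC of every task mapped to each VM (sequential execution assumed)."""
    totals = [0.0] * m
    for task, vm in enumerate(assignment):
        totals[vm] += etc[task][vm]
    return totals


def average_completion_time(assignment: List[int], etc: List[List[float]], m: int) -> float:
    """Mean of the per-VM completion times (assumed form of the objective)."""
    return sum(vm_completion_times(assignment, etc, m)) / m


if __name__ == "__main__":
    lengths = [4000.0, 2000.0, 6000.0]  # hypothetical task lengths
    mips = [1000.0, 2000.0]             # hypothetical VM speeds
    etc = etc_matrix(lengths, mips)
    assignment = [0, 1, 1]              # task i runs on VM assignment[i]
    print(vm_completion_times(assignment, etc, len(mips)))
    print(average_completion_time(assignment, etc, len(mips)))
```

Here `assignment[i]` plays the role of the single non-zero entry in row i of the matrix X, which encodes the rule that each task runs on exactly one resource node.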

Individual coding and population initialization
This paper uses a resource-task real-number coding method, and the initial population is generated at random. Suppose that n computing tasks, denoted t, are deployed on m virtual machine resources, denoted Vm. Let the cloud computing task ids be {0, 1, 2, ..., n-1} and the virtual machine resource node ids be {0, 1, 2, ..., m-1}. The chromosome then has n genes, one per task, and each gene value is the id of the virtual machine resource that runs that task, ranging from 0 to m-1. An example chromosome is: {3, 1, 3, 2, ..., m-2, m-1}
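The coding scheme can be illustrated with a short Python sketch. It assumes one gene per task, with each gene holding a virtual machine id between 0 and m-1; the function names and the `seed` parameter are hypothetical and not taken from the paper.

```python
import random
from typing import List


def random_chromosome(n_tasks: int, n_vms: int, rng: random.Random) -> List[int]:
    """One gene per task; each gene is a VM id in 0..n_vms-1."""
    return [rng.randrange(n_vms) for _ in range(n_tasks)]


def init_population(size: int, n_tasks: int, n_vms: int, seed: int = 0) -> List[List[int]]:
    """Randomly generate `size` candidate schedules."""
    rng = random.Random(seed)
    return [random_chromosome(n_tasks, n_vms, rng) for _ in range(size)]


def decode(chromosome: List[int], n_vms: int) -> List[List[int]]:
    """For each VM, list the ids of the tasks the chromosome assigns to it."""
    tasks_per_vm: List[List[int]] = [[] for _ in range(n_vms)]
    for task_id, vm_id in enumerate(chromosome):
        tasks_per_vm[vm_id].append(task_id)
    return tasks_per_vm
```

For example, `decode([2, 0, 2, 1], 3)` returns `[[1], [3], [0, 2]]`: task 1 runs on virtual machine 0, task 3 on virtual machine 1, and tasks 0 and 2 on virtual machine 2.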

Fitness function
The resource scheduling problem requires maximizing virtual machine resource node utilization while keeping task completion time as small as possible. The average task completion time is therefore used as the objective function, and its reciprocal is taken as the fitness function of an individual, as expressed in formula (7).
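A minimal Python sketch of such a fitness function follows. It assumes that a task's execution time is its length divided by the virtual machine's mips, that the average is taken over virtual machines, and that all task lengths are positive; the name `fitness` and its signature are illustrative, not the paper's.

```python
from typing import List


def fitness(chromosome: List[int], lengths: List[float], mips: List[float]) -> float:
    """Reciprocal of the average per-VM completion time (higher is better)."""
    totals = [0.0] * len(mips)
    for task_id, vm_id in enumerate(chromosome):
        # Assumed execution time: task length divided by VM speed.
        totals[vm_id] += lengths[task_id] / mips[vm_id]
    average = sum(totals) / len(mips)
    return 1.0 / average
```

Taking the reciprocal turns the minimization of completion time into a maximization problem, which is the form that selection operators in a genetic algorithm usually expect.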

Crossover and mutation
Two individuals are selected from the population with the probability given by the crossover probability function (9), and their genes are exchanged at randomly chosen crossover positions. Mutation is another way of generating new individuals. Compared with crossover, mutation acts locally: it strengthens local search and brings individuals closer to the optimal solution. At the same time, mutation maintains the diversity of the population. The mutation probability function is given in formula (10), where $k_3$ and $k_4$ are constants between 0 and 1, $f_{\max}$ is the largest fitness value in the population, $f_{avg}$ is the average fitness value at each iteration, and $f'$ is the fitness value of the individual to be mutated.
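The operators described above can be sketched in Python as follows. Because formulas (9) and (10) are not reproduced in the text, the sketch assumes a commonly used adaptive form in which individuals fitter than average receive a probability scaled by their distance from the best fitness, while the rest receive a constant. Two-point crossover and uniform gene reassignment are likewise assumptions, and all names are hypothetical.

```python
import random
from typing import List, Tuple


def adaptive_probability(k_scaled: float, k_const: float,
                         f: float, f_max: float, f_avg: float) -> float:
    """Assumed adaptive rule: fitter-than-average individuals change less often."""
    if f < f_avg or f_max == f_avg:
        return k_const
    return k_scaled * (f_max - f) / (f_max - f_avg)


def two_point_crossover(a: List[int], b: List[int],
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the gene segment between two random cut positions."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(chromosome: List[int], n_vms: int, p_m: float,
           rng: random.Random) -> List[int]:
    """Reassign each gene to a random VM id with probability p_m."""
    return [rng.randrange(n_vms) if rng.random() < p_m else g for g in chromosome]
```

Under this assumed rule the best individual in the population gets a probability of zero, so it is not disturbed, while below-average individuals are changed at the constant rate.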

Resource scheduling based on firefly algorithm
In the population, each firefly represents a resource scheduling scheme, and fluorescein brightness represents the fitness value of the scheduling scheme.

Update the firefly fluorescein value in the population
The fireflies are initialized at random according to formula (1). In the t-th iteration of the firefly population, the fluorescein value is updated as in formula (11).
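As an illustration, the fluorescein update can be sketched in Python using the rule that is standard in glowworm-style firefly algorithms: the previous value decays by a factor and is reinforced by the current fitness. Whether formula (11) has exactly this form is an assumption, and the parameter names `rho` and `gamma` and their default values are illustrative.

```python
from typing import List


def update_fluorescein(previous: List[float], fitness_values: List[float],
                       rho: float = 0.4, gamma: float = 0.6) -> List[float]:
    """l_i(t) = (1 - rho) * l_i(t-1) + gamma * fitness_i(t), for every firefly i."""
    return [(1.0 - rho) * l + gamma * f for l, f in zip(previous, fitness_values)]
```

With `rho=0.5` and `gamma=2.0`, a firefly with previous value 5.0 and fitness 1.0 moves to 4.5, while one with fitness 0.0 decays to 2.5, so better scheduling schemes end up brighter.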

Update dynamic decision domain
The decision domain is the field of view of a firefly: only fireflies within this range can attract it, so it is the search space of the current scheduling scheme. The size of the decision domain directly affects the convergence of the algorithm. If the decision domain is too large, iteration is slow and computing resources are wasted; if it is too small, the algorithm converges prematurely and falls into a local optimum. The search space is therefore updated according to the fitness value of the resource scheduling scheme after each iteration. The dynamic decision domain update method is given in formula (13).
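The trade-off described above can be illustrated with the standard glowworm-style radius update, sketched here in Python. This is not the paper's redesigned formula (13), which drives the update from the fitness value; the sketch instead adjusts the radius by the neighbour count, and the names `beta`, `desired_neighbours`, and `sensor_range` are assumptions.

```python
def update_decision_radius(radius: float, n_neighbours: int, sensor_range: float,
                           beta: float = 0.08, desired_neighbours: int = 5) -> float:
    """Grow the radius when neighbours are scarce, shrink it when crowded."""
    proposed = radius + beta * (desired_neighbours - n_neighbours)
    # Keep the radius inside [0, sensor_range].
    return min(sensor_range, max(0.0, proposed))
```

The clamp keeps the decision domain from growing without bound, which would slow iteration, or from collapsing below zero, which would isolate the firefly.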
The probability that firefly i chooses to move toward firefly j is given in formula (16).
Assuming firefly i chooses to move toward firefly j, its position is updated as in formula (17).
Here, x is the moving step size.
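The selection and movement steps can be sketched in Python with the standard glowworm-style rules: a neighbour is chosen with probability proportional to how much brighter it is, and the firefly then moves a fixed step toward it. The sketch works on continuous position vectors and assumes every neighbour is strictly brighter than firefly i; the paper's redesigned strategy for virtual machine ids is not reproduced, and all names are hypothetical.

```python
import math
import random
from typing import Dict, List


def move_probabilities(i: int, neighbours: List[int],
                       fluorescein: List[float]) -> Dict[int, float]:
    """p_ij is proportional to the brightness gap between neighbour j and firefly i."""
    gaps = {j: fluorescein[j] - fluorescein[i] for j in neighbours}
    total = sum(gaps.values())  # assumed > 0: non-empty, strictly brighter neighbours
    return {j: gap / total for j, gap in gaps.items()}


def choose_neighbour(probabilities: Dict[int, float], rng: random.Random) -> int:
    """Roulette-wheel choice over the neighbour probabilities."""
    ids = list(probabilities)
    return rng.choices(ids, weights=[probabilities[j] for j in ids], k=1)[0]


def step_towards(x_i: List[float], x_j: List[float], step: float) -> List[float]:
    """Move x_i a fixed distance `step` along the direction of x_j."""
    distance = math.dist(x_i, x_j)
    if distance == 0.0:
        return list(x_i)
    return [a + step * (b - a) / distance for a, b in zip(x_i, x_j)]
```

For a scheduling chromosome, the continuous position would still have to be mapped back to integer virtual machine ids, for example by rounding and clipping; that mapping is left out of this sketch.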

Resource scheduling based on firefly-genetic algorithm
The genetic algorithm obtains the optimal solution through adaptive adjustment and probabilistic search and has good global search ability. However, it tends to converge prematurely and is easily trapped in a local optimum, which leads to an unreasonable resource scheduling scheme. The firefly algorithm has fewer parameters, relatively simple operation, and better reliability. The relatively optimal solution obtained by the genetic algorithm is used as the initial solution of the firefly algorithm. This ensures a high-quality initial solution for the firefly algorithm, greatly improves its search efficiency, and helps it jump out of local optima and reach the global optimal solution. The algorithm ends with the following steps:
update the dynamic decision domain corresponding to the firefly to determine the direction of firefly movement
update location
END IF
decoding to get the optimal resource scheduling scheme
END
AEMCME 2019, IOP Conf. Series: Materials Science and Engineering 563 (2019) 052104, IOP Publishing, doi:10.1088/1757-899X/563/5/052104
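The hand-off between the two stages described in this section can be sketched in Python as follows. The sketch assumes that the genetic stage returns a population of chromosomes and that the fittest ones become the initial firefly positions; the function name `seed_fireflies` and the `swarm_size` parameter are hypothetical.

```python
from typing import Callable, List, Sequence


def seed_fireflies(ga_population: Sequence[List[int]],
                   fitness_fn: Callable[[List[int]], float],
                   swarm_size: int) -> List[List[int]]:
    """Keep the fittest GA individuals as the starting firefly positions."""
    ranked = sorted(ga_population, key=fitness_fn, reverse=True)
    # Copy each individual so the firefly stage cannot modify the GA population.
    return [list(individual) for individual in ranked[:swarm_size]]
```

Seeding from the best genetic individuals, rather than at random, is what gives the firefly stage its high-quality starting point.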

Analysis of results
To test the effect of the proposed algorithm on resource scheduling optimization in the cloud computing environment and to verify whether the expected load balancing can be achieved, this paper builds the CloudSim simulation platform on local hardware and uses randomly generated test data in the range 100 to 10000.
Comparative experiments were performed using a single genetic algorithm (GA) and the firefly-genetic (GA-FA) algorithm proposed in this paper. The experimental hardware platform has an Intel® Core™ i7-4710MQ CPU @ 2.5 GHz and 8 GB of memory. The parameter settings of the proposed algorithm are shown in Table 2 and Table 3. In the experiment, the cloud computing task length is set from 1000 to 40000, and the virtual machine computing performance (mips) is set from 1000 to 5000. The control variable method is used to analyze the impact of the number of cloud tasks and the number of virtual machine resources on the average task completion time and the virtual machine load. Because the algorithm is highly random, the average of multiple simulations is taken as the reference result.
As Table 4 and Figure 1 show, when the number of tasks is small there is almost no difference in completion time between the two algorithms for the same number of tasks. As the number of cloud computing tasks gradually increases, the task completion time of the improved algorithm becomes clearly shorter than that of the single genetic algorithm, showing that the improved algorithm optimizes the original allocation scheme.
(b) When the total number of computing tasks is 300 and the number of virtual machine resources changes, the average task completion time is recorded, as shown in Table 5. As the number of virtual machines increases, the task completion time of the two algorithms tends to be consistent. This is because, as the virtual machine computing cluster expands, its task processing capability fully meets the task requirements.
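The experimental procedure in this section, random workloads within the stated ranges and results averaged over repeated runs, can be sketched in Python as follows. The sketch is independent of CloudSim and assumes uniform integer draws; `random_workload`, `averaged_result`, and the `run_once` callback are hypothetical names, and no result of the paper is reproduced.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple


def random_workload(n_tasks: int, n_vms: int,
                    rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw task lengths in [1000, 40000] and VM mips in [1000, 5000]."""
    lengths = [rng.randint(1000, 40000) for _ in range(n_tasks)]
    mips = [rng.randint(1000, 5000) for _ in range(n_vms)]
    return lengths, mips


def averaged_result(run_once: Callable[[random.Random], float],
                    repeats: int, seed: int = 0) -> float:
    """Average a stochastic metric over several independently seeded runs."""
    return mean(run_once(random.Random(seed + r)) for r in range(repeats))
```

In use, `run_once` would wrap one complete scheduling run that returns the average task completion time, and `averaged_result` would be called once per combination of task count and virtual machine count.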

Summary
This paper proposes a cloud computing resource scheduling strategy based on the genetic algorithm and the firefly algorithm. The experimental results show that the optimized algorithm shortens task completion time and improves scheduling efficiency. However, the work still has shortcomings: the experimental environment is idealized, and the virtual machine performance parameters and computing task parameters are fixed.