A Genetic Based Leader Election Algorithm for IoT Cloud Data Processing

Abstract: In IoT networks, nodes communicate with each other for computational services, data processing, and resource sharing. Extensive communication between IoT devices generates huge amounts of data at the network edge, and this massive data is transferred to the cloud data center (CDC) for efficient processing and effective storage. In a CDC, leader nodes are responsible for high performance, reliability, deadlock handling, reduced latency, and cost-effective computational services for users. However, optimal leader selection is a computationally hard problem, because several factors, such as memory, CPU MIPS, and bandwidth, must be considered when selecting a leader among the available nodes. Existing leader selection approaches are monolithic: they identify leader nodes without optimizing over the leaders' resources. Therefore, this paper presents a genetic algorithm (GA) based leader election approach (GLEA) for optimal leader node selection. GLEA uses the available resources to evaluate candidate nodes during the leader election process. In the first phase, the cost of individual nodes and the overall cluster cost are computed on the basis of the available resources. In the second phase, the best computational nodes are selected as leaders by applying genetic operations against a cost function that considers the available resources. GLEA is then compared against the Bees Life Algorithm (BLA). The experimental results show that the proposed scheme outperforms BLA in terms of execution time and SLA violation, while achieving comparable resource utilization.


Introduction
The Internet of Things (IoT) refers to the immense number of interconnected devices. Research [1] predicts that by 2020 more than 50 billion devices will communicate online, far more than the number of people on the globe. These devices, including traditional computing devices such as PCs and cellphones as well as smart sensing devices, will generate a huge amount of data; according to other studies [2,3], IoT devices will produce 1.6 zettabytes of data by 2020. In this digital era, data has therefore been called the "new oil" [4,5], and it must be processed to extract meaningful information. Efficient data processing requires substantial resources; however, with advances in internet technologies and cloud computing (CC), information processing resources have become less expensive. Cloud computing provides resources such as memory, storage, and processing cores to hosted applications according to their data processing requirements. It offers benefits such as lower data storage [6] and processing costs, pay-per-use pricing, fast application deployment, and a flexible, scalable hardware architecture. In cloud computing, a data center combines multiple network servers and nodes that collaborate in clusters to share resources. Cluster leaders manage the communication and synchronization of the nodes for resource sharing. Therefore, selecting an optimal node as leader is a fundamental requirement for preventing the network from entering an unpredictable state.
In a data center, electing the best node as leader is a challenging problem, because leaders are responsible for managing segregated data, sharing resources among nodes, and reducing latency. Leaders coordinate parallel and distributed processing, the primary activity performed in a data center; communication among nodes cannot occur effectively without leader (master) nodes, so leader selection cannot be compromised. Because nodes are directly associated with the leader, the leader ensures that deadlock never occurs among the network nodes and that tasks are processed efficiently. Several protocols [7][8][9][10] and algorithms [11][12][13][14][15][16] have been proposed in distributed IoT networks and cloud computing research for effective leader election. The bully algorithm was proposed for leader election in [17][18][19][20][21]; in it, leader nodes are generated dynamically based on node IDs. The ring algorithm was proposed in [22,23]: every node shares its ID with all other nodes and maintains a list of IDs, from which the algorithm picks one node as leader on the basis of priority. In [24][25][26][27][28], node IDs were generated randomly and a priority number was assigned to each node; the node with the highest priority number was then selected as leader. In [29][30][31][32], a message-passing approach was considered for leader election; this approach introduces latency because of the high message-passing rate and slow node responses. The main drawback of all these approaches is that the resource profile of the nodes is always overlooked, so weaker nodes are just as likely to be selected. Weak leaders crash under high load, so elections must be rerun, which slows down overall processing and delays task execution across the network nodes.
Therefore, node resources must be considered during the leader election phase to ensure that tasks are processed efficiently and to bring stability to the network.
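To make this drawback concrete, the following minimal Python sketch contrasts an ID-based choice (the criterion behind the bully-style and priority-based schemes) with a resource-aware choice. The `Node` fields, the sample values, and the plain sum of normalized resources are illustrative assumptions, not the scoring used by GLEA.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    mips: float      # available CPU capacity
    ram_mb: float    # available memory
    bw_mbps: float   # available bandwidth

def id_based_leader(nodes):
    """Bully/priority-style criterion: the highest ID wins; resources are ignored."""
    return max(nodes, key=lambda n: n.node_id)

def resource_aware_leader(nodes):
    """Hypothetical score: each resource divided by the cluster maximum, then summed."""
    peak_mips = max(n.mips for n in nodes)
    peak_ram = max(n.ram_mb for n in nodes)
    peak_bw = max(n.bw_mbps for n in nodes)

    def score(n):
        return n.mips / peak_mips + n.ram_mb / peak_ram + n.bw_mbps / peak_bw

    return max(nodes, key=score)

cluster = [
    Node(1, mips=4000, ram_mb=16384, bw_mbps=1000),
    Node(2, mips=2500, ram_mb=8192, bw_mbps=1000),
    Node(3, mips=1000, ram_mb=2048, bw_mbps=100),  # weakest node, but highest ID
]
print(id_based_leader(cluster).node_id)        # 3
print(resource_aware_leader(cluster).node_id)  # 1
```

The ID-based rule hands leadership to the weakest node, which is exactly the kind of leader expected to crash under high load.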
In this paper, a genetic algorithm (GA) based leader election approach (GLEA) is presented for IoT data processing in cloud computing. Our algorithm [16] selects the node with the most available resources as the leader and ensures that communication among the nodes occurs effectively. Moreover, we ensure that resources are shared efficiently among the nodes and that minimum delay occurs during task execution. Our algorithm uses meta-information, i.e., the tasks, VMs, and servers available in the data centers, and computes a resource score for each server from its available resources, e.g., MIPS, memory, bandwidth, and throughput. This information is used to generate the initial chromosome population for the GA. Each chromosome comprises multiple tasks on different servers as genes.
The rest of the paper is organized as follows: Section 2 presents the state of the art in leader election. Section 3 describes the proposed GA-based leader election approach. Section 4 presents the experiments and results, and Section 5 concludes our work.

Literature Review
Many leader election protocols and algorithms have been discussed in the literature. Ktari et al. [33] proposed an agent-based election algorithm for dynamic trees. Their main focus is to maintain a forest of trees in which the root node of each tree is taken as the leader. The leader is selected by the highest ID value, which is generated randomly, without considering any resources during the election process. The authors of [14] present a leader election algorithm for the probabilistic investigation of traffic lights. The authors of [16] proposed a new leader election algorithm for IoT networks based on a tree routing protocol: during the election, each node forwards its value, and the node with the best value is elected leader.
In [34], the classic ring algorithm assumes a unidirectional ring that links all the nodes, with a process running on every node. Each node is randomly assigned a unique priority number, and the node with the maximum priority number is considered the leader. If the leader fails, the whole election process is initiated to select a new leader. Electing a leader requires a total of 2(n − 1) messages transmitted through the network: first, (n − 1) messages for the election process, and then another (n − 1) messages to announce the new leader. EffatParvar et al. [35] revised this classic ring algorithm; to specify and verify several distributed leader election protocols, they used the Temporal Ordering Specification Language (TOSL) and the Analysis of Distributed Processes (ADP) toolbox.
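The 2(n − 1) message count can be reproduced with a small simulation. This Python sketch assumes one particular accounting: an election token starts at the initiator and is forwarded n − 1 times, so it reaches the initiator's predecessor after visiting every node, and that node then circulates the result with another n − 1 messages. The function name and the sample priorities are hypothetical.

```python
def ring_election(priorities, initiator):
    """Simulate one election on a unidirectional ring of len(priorities) nodes.

    Node i forwards messages to node (i + 1) % n; priorities must be unique.
    Returns (leader_index, messages_sent).
    """
    n = len(priorities)
    messages = 0
    best = initiator
    node = initiator
    # Election phase: the token travels n - 1 hops, ending at the initiator's
    # predecessor, and tracks the highest priority seen so far.
    for _ in range(n - 1):
        node = (node + 1) % n
        messages += 1
        if priorities[node] > priorities[best]:
            best = node
    # Announcement phase: the predecessor, which has now seen every priority,
    # forwards the result to the other n - 1 nodes.
    for _ in range(n - 1):
        node = (node + 1) % n
        messages += 1
    return best, messages

leader, sent = ring_election([7, 3, 9, 1, 5], initiator=1)
print(leader, sent)  # 2 8, i.e. 2(n - 1) messages for n = 5
```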
A game-theoretic approach was proposed in [36] for leader election in a completely connected network. In this approach, each node has an equal chance of being elected leader, which is achieved through a Nash equilibrium; however, the approach does not provide a fair solution to the leader election problem. The authors of [30] proposed the Failure Rate and Load-based Leader Election (FRLLE) algorithm to address leader election in bidirectional ring networks. FRLLE elects the leader with the minimum load and failure rate, and it reduces time complexity because fewer messages are exchanged to elect the leader. Existing leader election algorithms, such as the bully [27] and ring [34,35] algorithms, require extensive modification when they are built on a tree structure, and they are compared in terms of message complexity.
Bounceur et al. [37] proposed the Wait-Before-Starting (WBS) leader election algorithm for IoT networks. The node that waits the least before starting its execution becomes the leader: it is the first node to start, and it does so by sending a message to the other nodes in the network. After the leader is elected, the other nodes start processing their programs. The authors of [29,38] modified the bully election algorithm for cloud computing systems, basing the election on a Super Node (SN). This algorithm increases election speed and reduces the message complexity from $O(n^2)$ to $O(n^2/k)$, where $n$ is the number of nodes and $k$ is the number of districts. Another leader election algorithm was proposed for bidirectional ring topologies with $O(n \log n)$ message complexity [28]. A leader election algorithm for active networks was proposed in [39]; its main objective is to select a single leader, and if the selected leader fails, re-election takes $O(\log n)$ rounds to elect a new leader. Another process-terminating leader election algorithm was proposed for unidirectional ring topologies with homonym processes [28]. Its message complexity is $O(k^2 n^2)$, where $n$ is the number of nodes and $k$ is an upper bound on the multiplicity of the labels. Leader election is very important for improving the efficiency of node communication and optimizing node load during task processing. The "3-Phase Leader Election Algorithm" [40] elects a leader in three phases: it first filters the nodes, then validates the prime node, and finally elects the prime node as the leader; however, the scheme incurs substantial message-passing overhead.
Shindo et al. [41] and Biswas et al. [11] proposed multi-leader algorithms to decrease communication delay and reduce latency among the leaders. Three heuristic strategies are used in a unified way to determine the appropriate status of the leaders in polynomial time. A probability-based leader election model was proposed for MANETs [26]; the authors demonstrated improvements in energy consumption and in the consistency of channel communication. Several other algorithms exist, such as leader election in peer-to-peer networks [42], leader election in distributed networks via software mediators to speed up the election process and save energy [42,43], and protocols for fault-tolerant election in asynchronous distributed systems [44]. Two mobility-aware leader election algorithms for ad hoc networks were introduced in [23,45]. These algorithms ensure that every connected component of the topology has exactly one leader, and both are based on the Temporally Ordered Routing Algorithm (TORA). Most current election algorithms rely on unique IDs or priority numbers generated randomly by the system; they neither carefully consider the network topology nor properly account for real-time resources during leader election [28][46][47][48].
In the Internet of Mobile Things (IoMT), devices are considered smart mobile devices. Mobile-Hub (M-Hub) is an IoT middleware that collects information from devices at the network edge. M-Hub runs on the devices and monitors them independently, without considering neighboring M-Hubs. To address this, Silva et al. [49] introduced the Neighborhood-aware M-Hub (NAM-Hub), which elects a leader to integrate with neighboring M-Hubs for efficient computation without any delay or interaction. Lei et al. [50] proposed Groupchain, a blockchain structure suited to computing services in IoT. Groupchain uses a leader group that cooperatively executes blocks to improve transaction efficiency, and it introduces an efficient method to supervise the performance of participants in the leader group.
From this review, we observe that existing leader election algorithms are based on unique IDs generated randomly by the system. Most of these algorithms consider only random IDs for leader election and do not consider topological changes. Leader election becomes more challenging as complexity increases in cloud computing. Many leader election algorithms have been designed to overcome this problem, but they still do not consider the maximum available resources because of the increased complexity. We propose a novel leader election technique based on genetic algorithms, which are widely used to tackle NP-hard problems; it considers the maximum available resources and covers further aspects such as synchronization, communication, and resource sharing.

Proposed Model
Consider a data center $D_c$ with servers $S = \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$ grouped into clusters $C = \{C_1, C_2, \ldots, C_r, \ldots, C_m\}$, where each server $S_i$ has resources $R_i = \{R_{i,1}, R_{i,2}, R_{i,3}, \ldots, R_{i,p}\}$, e.g., CPU, RAM, bandwidth, and throughput. Each cluster $C_r$ receives tasks $T = \{t_0, t_1, t_2, \ldots, t_{k-1}, t_k\}$. The tasks are processed by different servers $S_i$, and an assignment of tasks to servers is represented as a chromosome $Q_{jk} = \{S_1 t_1, S_2 t_3, S_3 t_2 t_4, \ldots, S_j t_k\}$, where each $S_i t_k$ is a gene of chromosome $Q$. Based on the chromosomes in each cluster $C_r$, we apply the genetic operations, i.e., crossover and mutation, to generate the chromosome population $P$. Each chromosome is then evaluated against a fitness function, and the leader node $L_i$ is selected accordingly. The selected leader nodes then distribute the tasks within their clusters to balance the load effectively and improve network efficiency. The architecture of the proposed method is shown in Fig. 1.
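One way to encode a chromosome $Q$ is a list with one gene per task, where gene $k$ holds the index of the server assigned to task $t_k$; grouping the genes by server recovers notation such as $S_3 t_2 t_4$. This Python sketch is an illustrative encoding assumed here, not necessarily the authors' data structure.

```python
from collections import defaultdict

def decode(chromosome):
    """Map each server to its tasks; gene k (1-based) is the server running task t_k."""
    placement = defaultdict(list)
    for task, server in enumerate(chromosome, start=1):
        placement[server].append(task)
    return dict(placement)

def to_notation(chromosome):
    """Render a chromosome in the S_i t_k style used in the text."""
    groups = sorted(decode(chromosome).items())
    return ", ".join(
        f"S{server} " + " ".join(f"t{task}" for task in tasks)
        for server, tasks in groups
    )

q = [1, 3, 2, 3]       # t1 -> S1, t2 -> S3, t3 -> S2, t4 -> S3
print(to_notation(q))  # S1 t1, S2 t3, S3 t2 t4
```

A flat list keeps every chromosome the same length (one gene per task), which is what position-based crossover operators expect.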

Operation of GA
A genetic algorithm (GA) is a bio-inspired algorithm based on the theory of evolution and genetics, and it belongs to the family of evolutionary algorithms. The theory of evolution describes how individuals with a higher survival probability are those best adapted to their environment. In a GA, genes make up a chromosome, and new individuals are produced by applying crossover to chromosomes and mutation to genes. The primary GA operators are selection, crossover, and mutation. In our methodology, the GA is used to select the best node (individual) as the leader with a higher probability. The GA operators play a very important role in achieving the best, optimal solution.

Population
In GLEA, we take a server $S_i$ (here $S_1$) that hosts three tasks ($S_1 t_1$, $S_1 t_2$, $S_1 t_3$) when forming the population. There are many ways to schedule tasks among intermediate nodes; in GLEA, the population is formed by scheduling, for example, $t_5$ on $S_5$, $t_7$ on $S_7$, and $t_2$ on $S_1$, as shown in Fig. 2. Each population $P_z = \{Q_1, Q_2, Q_3, \ldots, Q_z\}$ contains multiple chromosomes $Q$, and each chromosome consists of genes that represent different tasks on different servers. For example, chromosome $Q_1$ is represented as $Q_1 = \{S_1 t_1, S_1 t_3, S_2 t_4, S_3 t_2, S_3 t_5, \ldots, S_i t_k\}$.
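A random initial population $P_z$ of $z$ chromosomes can be sketched in Python as follows, assuming each chromosome is a list whose gene $k$ holds the server index of task $t_k$. Uniform random assignment and the parameter names are assumptions; the text does not specify how the initial assignments are drawn.

```python
import random

def random_chromosome(num_tasks, num_servers, rng):
    """One candidate schedule: gene k is the server (1..num_servers) for task t_(k+1)."""
    return [rng.randint(1, num_servers) for _ in range(num_tasks)]

def initial_population(z, num_tasks, num_servers, seed=None):
    """P_z = {Q_1, ..., Q_z}: z independent random chromosomes."""
    rng = random.Random(seed)
    return [random_chromosome(num_tasks, num_servers, rng) for _ in range(z)]

population = initial_population(z=4, num_tasks=5, num_servers=3, seed=42)
for i, q in enumerate(population, start=1):
    print(f"Q{i} = {q}")
```

Passing a seed makes a run reproducible, which helps when comparing GA variants on the same starting population.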

Crossover
Crossover is applied to two individuals of the population, called chromosomes, whose genes pair a server $S_i$ with a task $t_k$. Among the available crossover strategies, we use two-point crossover, as shown in Fig. 3: $Q_1$ and $Q_2$ are the two selected chromosomes with the highest fitness values, and two cut-point positions $\theta_1$ and $\theta_2$ are selected randomly using Eq. (1).
Here, $\mathrm{rand}(2) \cdot (k-1) + 1$ generates random numbers between 1 and $k$, which serve as the cut-point positions in the first and second parent chromosomes. The parents are combined, by exchanging the genes between the cut points, to form two new individuals of the population, called offspring. Servers $S$ and tasks $t$ are drawn from the best individuals in the population, with preference given by the cost function. In this way, better solutions and offspring are generated in the next generation, and the process is repeated until a satisfactory solution is achieved.
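The two-point crossover described above can be sketched in Python as follows. The sketch assumes two distinct cut points $1 \le \theta_1 < \theta_2 \le k$ drawn uniformly (the exact form of Eq. (1) is not reproduced here) and exchanges the genes in positions $\theta_1, \ldots, \theta_2 - 1$; other conventions for the exchanged segment are equally valid.

```python
import random

def two_point_crossover(q1, q2, cuts=None, rng=random):
    """Two-point crossover of equal-length parent chromosomes.

    cuts = (theta1, theta2) are 1-based cut points with theta1 < theta2; genes in
    positions theta1 .. theta2 - 1 are exchanged. If cuts is None they are drawn
    uniformly from 1..k.
    """
    k = len(q1)
    if k != len(q2) or k < 2:
        raise ValueError("parents must have the same length (at least 2)")
    if cuts is None:
        cuts = sorted(rng.sample(range(1, k + 1), 2))
    theta1, theta2 = cuts
    a, b = theta1 - 1, theta2 - 1  # convert to 0-based slice bounds
    child1 = q1[:a] + q2[a:b] + q1[b:]
    child2 = q2[:a] + q1[a:b] + q2[b:]
    return child1, child2

p1 = [1, 1, 1, 1, 1, 1]
p2 = [2, 2, 2, 2, 2, 2]
print(two_point_crossover(p1, p2, cuts=(2, 5)))  # ([1, 2, 2, 2, 1, 1], [2, 1, 1, 1, 2, 2])
```

Every position of the two children together still holds exactly the two parental genes, so no server-task pairing is lost, only redistributed.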

Mutation
Mutation is a unary operation that introduces changes into the features of the offspring produced by crossover. These changes are minor and occur according to the mutation probability, which is generally set to a very small value, so that the mutated offspring does not depart substantially from the original. Our scheme uses substitution mutation: a position in the chromosome is selected and its value is substituted, as shown in Fig. 4.
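Substitution mutation with a small probability might look like this in Python. Applying it to a single randomly chosen gene, the default rate `p_mut = 0.05`, and always substituting a different server index are assumptions; the text only states that a position is selected and its value substituted.

```python
import random

def substitution_mutation(chromosome, num_servers, p_mut=0.05, rng=random):
    """With probability p_mut, pick one gene and substitute a different server index.

    Servers are numbered 1..num_servers; the input chromosome is not modified.
    """
    mutated = list(chromosome)
    if mutated and num_servers > 1 and rng.random() < p_mut:
        k = rng.randrange(len(mutated))
        alternatives = [s for s in range(1, num_servers + 1) if s != mutated[k]]
        mutated[k] = rng.choice(alternatives)
    return mutated

offspring = [1, 2, 2, 2, 1, 1]
print(substitution_mutation(offspring, num_servers=3, p_mut=0.05))
```

With a small `p_mut`, most offspring pass through unchanged, matching the requirement that mutation should not drift far from the original solution.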

Fitness Function
The quality of each candidate solution is evaluated with a fitness function, and the solution with the greatest fitness value is considered optimal. The following system parameters are used in the fitness function, and their weights are adjusted according to the user's Service Level Agreement (SLA) using the knapsack algorithm. For example, if the user assigns 10% weight to CPU MIPS, 30% to RAM, 20% to bandwidth, and 40% to throughput, the resulting parameter weights are those shown in Tab. 1 [51]. Parameters with greater weight have a greater impact on the fitness value. The fitness value $F_v$ is computed using Eq. (2),
where $F_s$ is the fitness of the server, calculated using Eq. (3).
Since we aim to find a solution with the maximum available resources, we select as leader the server with the maximum fitness value.
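Because Eqs. (2) and (3) are not reproduced in this text, the Python sketch below assumes a common form: each resource of a server is normalized by the largest value in its cluster, and $F_s$ is the weighted sum of these normalized values using the example SLA weights (10% MIPS, 30% RAM, 20% bandwidth, 40% throughput). The knapsack-based weight adjustment is not modeled; the weights are simply given, and the server values are made up.

```python
WEIGHTS = {"mips": 0.10, "ram": 0.30, "bandwidth": 0.20, "throughput": 0.40}

def server_fitness(server, cluster, weights=WEIGHTS):
    """Hypothetical F_s: weighted sum of resources normalized by the cluster maximum."""
    total = 0.0
    for resource, w in weights.items():
        peak = max(s[resource] for s in cluster)
        total += w * (server[resource] / peak if peak else 0.0)
    return total

def pick_leader(cluster, weights=WEIGHTS):
    """Index of the server with the maximum fitness value."""
    return max(range(len(cluster)), key=lambda i: server_fitness(cluster[i], cluster, weights))

cluster = [
    {"mips": 4000, "ram": 4096, "bandwidth": 500, "throughput": 60},
    {"mips": 1000, "ram": 16384, "bandwidth": 1000, "throughput": 90},
    {"mips": 2500, "ram": 8192, "bandwidth": 1000, "throughput": 100},
]
print(pick_leader(cluster))  # 1
```

With these weights the fastest-CPU server (index 0, fitness 0.515) loses to server 1 (fitness 0.885), showing how the SLA weights, not a single resource, decide the leader.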

Selection
Common selection approaches include tournament, sorting, and roulette selection. In tournament selection, individuals are drawn at random from the population and the one with the highest fitness value is selected. In sorting selection, the fitness value of every individual is first calculated; the individuals are then sorted by fitness value, and each solution is assigned a selection probability according to its rank. The third method, roulette selection, is the one used in this technique. Roulette selection, also called fitness-proportionate selection, selects useful individuals for recombination: the fitness function assigns a fitness value to each candidate chromosome, and the selection probability of each chromosome is then calculated. If $f_a$ is the fitness of individual $a$ in the population, its probability is calculated using Eq. (4), where $b$ is the number of individuals in the population.
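Fitness-proportionate selection gives individual $a$ the probability $p_a = f_a / \sum_{j=1}^{b} f_j$; this standard roulette-wheel form is assumed here to match Eq. (4). A minimal Python sketch, assuming non-negative fitness values:

```python
import random
from bisect import bisect_right
from itertools import accumulate

def selection_probabilities(fitness):
    """p_a = f_a / (f_1 + ... + f_b) for non-negative fitness values."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    return [f / total for f in fitness]

def roulette_select(population, fitness, rng=random):
    """Spin the wheel once: individual a is returned with probability p_a."""
    cumulative = list(accumulate(fitness))
    if cumulative[-1] <= 0:
        raise ValueError("at least one fitness value must be positive")
    spin = rng.random() * cumulative[-1]
    # Clamp the index to guard against floating-point rounding at the wheel's end.
    return population[min(bisect_right(cumulative, spin), len(population) - 1)]

print(selection_probabilities([2, 3, 5]))  # [0.2, 0.3, 0.5]
rng = random.Random(7)
picks = [roulette_select(["Q1", "Q2", "Q3"], [2, 3, 5], rng) for _ in range(1000)]
print({q: picks.count(q) for q in ("Q1", "Q2", "Q3")})
```

Python's standard library offers the same draw as `random.choices(population, weights=fitness)`; the explicit cumulative-sum version is shown to make the wheel visible.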

Phases of GLEA
GLEA has two phases: the task allocation phase and the leader selection phase.

Task Allocation Phase
This phase allocates tasks to different servers. When a new task is submitted, the first step chooses the best leader $L_B$ from the set of nodes using Eq. (5),
where $R_L$ denotes the resources of leader $L$, which are computed using Eq. (6).
In the second step, the selected leader assigns the task to the server $S_{selec}$ in its cluster with the maximum available MIPS, which is found using Eq. (7).
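The two steps of the task allocation phase can be sketched in Python as follows. Since Eqs. (5)-(7) are not reproduced here, the sketch assumes that $R_L$ is a weighted sum of the leader's resources, each normalized by the maximum across leaders; the `Leader`/`Server` classes, the weights, and the choice to deduct a task's MIPS from the chosen server are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    available_mips: float

@dataclass
class Leader:
    name: str
    resources: dict                              # e.g. {"mips": 3000, "ram": 8192}
    cluster: list = field(default_factory=list)  # servers managed by this leader

def best_leader(leaders, weights):
    """Step 1 (cf. Eqs. 5-6): L_B maximizes an assumed R_L, the weighted sum of the
    leader's resources, each normalized by the maximum across leaders."""
    peaks = {r: max(l.resources[r] for l in leaders) or 1.0 for r in weights}

    def r_l(leader):
        return sum(w * leader.resources[r] / peaks[r] for r, w in weights.items())

    return max(leaders, key=r_l)

def select_server(leader):
    """Step 2 (cf. Eq. 7): S_selec is the cluster server with maximum available MIPS."""
    return max(leader.cluster, key=lambda s: s.available_mips)

def allocate(task_mips, leaders, weights):
    leader = best_leader(leaders, weights)
    server = select_server(leader)
    server.available_mips -= task_mips  # reserve capacity for the task (assumption)
    return leader.name, server.name

leaders = [
    Leader("L1", {"mips": 3000, "ram": 8192}, [Server("S1", 1200), Server("S2", 2500)]),
    Leader("L2", {"mips": 4000, "ram": 4096}, [Server("S3", 3000), Server("S4", 900)]),
]
weights = {"mips": 0.5, "ram": 0.5}
for _ in range(3):
    print(allocate(1000, leaders, weights))
# ('L1', 'S2'), ('L1', 'S2'), ('L1', 'S1')
```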

Leader Selection Phase
In this phase, a leader $L$ is chosen for each cluster in the set of clusters $C$ of the data center $D_c$. We use the genetic algorithm to choose the leader of each cluster: the GA returns an optimal server from each cluster, which we choose as that cluster's leader. The system reruns this phase after each time interval to keep the leaders optimal.
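The leader selection phase can be sketched as a small GA that returns one candidate leader per cluster, combining roulette selection, two-point crossover, and substitution mutation with elitism. The encoding (gene $r$ = index of the chosen server in cluster $C_r$), the precomputed scores, and all parameter values are assumptions made for illustration. Because this toy fitness is a sum of independent per-cluster terms, a direct per-cluster maximum would also give the optimum; the GA form only shows how the operators fit together.

```python
import random

def ga_select_leaders(scores, pop_size=20, generations=50, p_mut=0.1, seed=None):
    """Choose one leader per cluster with a small genetic algorithm.

    scores[r][i] is an assumed, precomputed fitness F_s of server i in cluster C_r.
    An individual is a list whose gene r is the index of cluster r's candidate leader.
    Returns (best_individual, best_fitness).
    """
    rng = random.Random(seed)
    m = len(scores)

    def fitness(ind):
        return sum(scores[r][i] for r, i in enumerate(ind))

    def roulette(pop, fits):
        total = sum(fits)
        if total <= 0:
            return rng.choice(pop)
        spin, acc = rng.random() * total, 0.0
        for ind, f in zip(pop, fits):
            acc += f
            if spin < acc:
                return ind
        return pop[-1]  # guard against floating-point rounding

    def crossover(a, b):
        if m < 2:
            return a[:], b[:]
        t1, t2 = sorted(rng.sample(range(m + 1), 2))  # two-point crossover
        return a[:t1] + b[t1:t2] + a[t2:], b[:t1] + a[t1:t2] + b[t2:]

    def mutate(ind):
        ind = ind[:]
        if rng.random() < p_mut:  # substitution mutation on one gene
            r = rng.randrange(m)
            ind[r] = rng.randrange(len(scores[r]))
        return ind

    pop = [[rng.randrange(len(c)) for c in scores] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        nxt = [best[:]]  # elitism: the best individual so far always survives
        while len(nxt) < pop_size:
            c1, c2 = crossover(roulette(pop, fits), roulette(pop, fits))
            nxt.extend([mutate(c1), mutate(c2)])
        pop = nxt[:pop_size]
        best = max(pop, key=fitness)
    return best, fitness(best)

# Re-running this per time interval with updated scores mirrors periodic re-election.
cluster_scores = [
    [0.52, 0.88, 0.81],  # cluster C_1
    [0.40, 0.35],        # cluster C_2
    [0.10, 0.75, 0.60],  # cluster C_3
]
leaders, total = ga_select_leaders(cluster_scores, seed=1)
print(leaders, round(total, 2))
```

Elitism guarantees that the best fitness found never decreases from one generation to the next.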

Experiments and Results
Several experiments are conducted to evaluate the performance of the proposed method by comparing it against the state-of-the-art BLA algorithm. Detailed analyses are described in the following subsections.

Experimental Setup
To evaluate the performance of the proposed method, an extensive set of experiments was performed using the CloudSim Plus [20] simulator, which is based on CloudSim [21]. CloudSim is a framework for modeling and simulating cloud data centers, and CloudSim Plus extends it to simulate more realistic scenarios. The simulation used multiple sets of heterogeneous servers (MIPS range 1,000-4,000); the experiments were run on an AMD Ryzen 5 2500 processor, and a Python library was used to generate the graphs. The performance of the proposed method is evaluated in terms of execution time, SLA violation, and memory utilization. Execution time is the time the servers take to finish the tasks. Utilization is defined as the ratio of allocated MIPS to the total MIPS of a server during a time interval. An SLA violation occurs when a task or job is not completely executed within the given time frame. To evaluate the performance, the experiment graphs were generated from random runs using the input parameters shown in Tab. 2.
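The three evaluation metrics can be computed from per-task records as in this Python sketch. The `TaskRecord` fields, measuring the allowed time frame from submission, and taking execution time as the makespan from the first submission are assumptions; CloudSim Plus reports its own statistics, and no CloudSim Plus API is used here.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    submit_time: float
    finish_time: float
    time_frame: float  # allowed duration, measured from submission (assumption)

def utilization(allocated_mips, total_mips):
    """Ratio of allocated MIPS to the server's total MIPS in one interval."""
    if total_mips <= 0:
        raise ValueError("total_mips must be positive")
    return allocated_mips / total_mips

def sla_violation_rate(tasks):
    """Fraction of tasks that did not finish within their time frame."""
    if not tasks:
        return 0.0
    late = sum(1 for t in tasks if t.finish_time - t.submit_time > t.time_frame)
    return late / len(tasks)

def execution_time(tasks):
    """Time until all tasks finished, measured from the first submission (makespan)."""
    return max(t.finish_time for t in tasks) - min(t.submit_time for t in tasks)

tasks = [TaskRecord(0, 10, 12), TaskRecord(0, 15, 12), TaskRecord(5, 14, 10)]
print(utilization(1500, 4000))              # 0.375
print(round(sla_violation_rate(tasks), 3))  # 0.333
print(execution_time(tasks))                # 15
```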

Experimental Results
This section details how the experiments were performed. The experiments compare the efficiency and performance of the proposed GLEA algorithm with those of the traditional BLA algorithm, using the system parameters CPU MIPS, RAM, bandwidth, and throughput. These parameters are weighted according to user requirements, as shown in Tab. 3.

Experiment 1: Evaluation Through Increasing the Number of Hosts
In the first experiment, the efficiency of the proposed GLEA is compared with that of BLA based on execution time, SLA violation, and memory utilization. In this scenario, performance is evaluated by increasing the number of hosts/servers while keeping the number of tasks fixed: the number of hosts ranges from 20 to 100 in increments of 20, and the number of tasks is fixed at 1,000. Fig. 5 shows that GLEA outperforms BLA in terms of execution time, taking less time to complete the tasks. Fig. 6 shows SLA violation: GLEA continues to perform better than BLA with the fixed number of tasks. Fig. 7 shows the resource utilization of GLEA and BLA. Utilization is the same for both because the two algorithms schedule VMs to hosts in the same way; it is shown in the range 0 to 1. The results confirm that GLEA improves efficiency as the number of servers increases, in terms of execution time and SLA violation.

Experiment 2: Evaluation Through Increasing the Number of Tasks
The second experiment evaluates performance and efficiency by increasing the number of tasks while keeping the number of hosts fixed at 50. The number of tasks ranges from 100 to 10,000 in increments of 100. Fig. 8 compares the time required to completely execute the assigned tasks or jobs as the number of tasks grows. BLA takes longer than GLEA to complete the tasks, reflecting GLEA's greater efficiency. Fig. 9 shows that GLEA reduces SLA violations compared with BLA during workflow execution.

Experiment 3: Evaluation Through Different Weights
In this simulation, the parameter weights were changed to compare the performance of GLEA with that of BLA. Two scenarios were used, each assigning different weights to the parameters, as shown in Tab. 4, and the execution time and SLA violation of both algorithms were analyzed under these weights. In scenario 1, the number of tasks increases in steps of 1,000, while the number of hosts is fixed at 50; only the resource weights differ. The weights in scenario 1 are CPU 10%, RAM 60%, bandwidth 20%, and throughput 10% (Tab. 4). This scenario tests the validity of GLEA in terms of execution time and SLA violation. Fig. 10 shows that GLEA again completes the tasks or jobs in less time. SLA violation decreases as the number of hosts increases, because more resources are available to execute the tasks. These results indicate that GLEA achieves substantially better performance and efficiency in terms of the performance metrics. In scenario 2, the numbers of hosts and tasks are the same as in scenario 1, but the weights are changed to CPU 10%, RAM 10%, bandwidth 70%, and throughput 10%. Fig. 11 shows that GLEA achieves better execution time and SLA violation than BLA. Taken together, these experiments show that GLEA performs better than BLA on the measured performance parameters.

Task Scheduling Comparison
We now compare the performance of GLEA with that of the conventional BLA algorithm. In this simulation, ten tasks are randomly generated, each with a length in the range 51,000-60,000. The execution time of each task on each server is then calculated, as shown in Tab. 5. Fig. 12 shows the task assignments made by BLA, and Fig. 13 shows the execution times of the tasks on different servers after applying GLEA. GLEA schedules tasks on servers that meet the tasks' requirements and execute them in minimum time. For example, GLEA executes T9 on S2 in less time, whereas BLA executes T9 on S3; thus GLEA executes the tasks more efficiently than BLA.
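Under the usual CloudSim convention that a task's length is given in million instructions (MI) and a server's capacity in MIPS, the execution time of a task on a server is length / MIPS seconds, assuming a single processing element and no contention. This Python sketch builds such a time table and, for comparison, applies a greedy minimum-completion-time heuristic, a swapped-in baseline rather than GLEA itself. The task lengths and server capacities are made-up sample values, not those of Tab. 5.

```python
def exec_time_table(task_lengths_mi, server_mips):
    """times[k][i] = seconds to run task k alone on server i (length / MIPS)."""
    return [[length / mips for mips in server_mips] for length in task_lengths_mi]

def min_completion_assignment(task_lengths_mi, server_mips):
    """Greedy minimum-completion-time schedule (a swapped-in baseline, not GLEA).

    Each task goes to the server where it would finish earliest, given the work
    already queued there. Returns (server index per task, makespan).
    """
    ready = [0.0] * len(server_mips)  # time at which each server becomes free
    assignment = []
    for length in task_lengths_mi:
        finish = [ready[i] + length / server_mips[i] for i in range(len(server_mips))]
        best = min(range(len(finish)), key=finish.__getitem__)
        ready[best] = finish[best]
        assignment.append(best)
    return assignment, max(ready)

lengths = [51000, 55000, 60000]  # sample lengths within the 51,000-60,000 range
mips = [1000, 4000, 2500]        # sample server capacities
print(exec_time_table(lengths, mips)[0])         # [51.0, 12.75, 20.4]
print(min_completion_assignment(lengths, mips))  # ([1, 2, 1], 27.75)
```

Accounting for queued work matters: sending every task to the single fastest server would finish at 41.5 s here, while spreading them finishes at 27.75 s.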

Conclusion
In this work, we focused on leader election in IoT environments through cloud data centers, to ensure that tasks are executed efficiently with well-organized services that satisfy user requirements. Leader election is a challenging task in cloud computing environments because of deadlocks and node failures during resource sharing. To overcome these challenges, we proposed GLEA, an optimized algorithm inspired by the genetic processes that occur in nature. GLEA uses the available resources to evaluate candidate nodes during the leader election (LE) process. First, the fitness values of individual nodes and the overall cluster cost are calculated from the available resources. Second, the optimal nodes are selected as leaders by applying the GA against a cost function that considers the usable resources. To analyze the efficiency and reliability of our algorithm, we simulated GLEA and compared the results against the BLA method. Our findings suggest that GLEA is more efficient than BLA in terms of execution time and SLA violation, with comparable resource utilization. In the future, we plan to extend our work to fog and mobile computing.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.