A genetic algorithm-based energy-aware multi-hop clustering scheme for heterogeneous wireless sensor networks

Background Wireless sensor networks (WSNs) with energy-constrained heterogeneous nodes are among the most challenging settings for developing energy-aware clustering schemes. Although various clustering approaches have been shown to reduce energy consumption and delay and to extend the network lifetime by selecting optimum cluster heads (CHs), the problem remains a crucial challenge. Methods This article proposes a genetic algorithm-based energy-aware multi-hop clustering (GA-EMC) scheme for heterogeneous WSNs (HWSNs). In HWSNs, the nodes have varying initial energy and typically operate under an energy consumption restriction. A genetic algorithm determines the optimal CHs and their positions in the network. The fitness of chromosomes is calculated in terms of distance, the number of CHs, and the nodes' residual energy. Multi-hop communication improves energy efficiency in HWSNs. To solve the hot spot problem near the sink node, more supernodes are deployed in the areas near the sink than in the areas far away from it. Results Simulation results show that the GA-EMC scheme achieves a longer network lifetime, better network stability, and lower delay than existing approaches in heterogeneous networks.


INTRODUCTION
Recent technological developments in wireless communication, sensing devices, and microelectronics have opened new frontiers in wireless sensor networks (WSNs). Critical WSNs applications include environmental monitoring (Lanzolla & Spadavecchia, 2021; [...] Jin et al. (2013) analysed the impact of heterogeneity in WSNs, energy level, and hierarchical cluster structures. Smaragdakis, Matta & Bestavros (2004) proposed a protocol that prolongs the stability period of sensor nodes in heterogeneous WSNs (HWSNs). In Qing, Zhu & Wang (2006) and Saini & Sharma (2010), the optimal CHs are selected using the average network energy and the nodes' residual energy. In Capone et al. (2019), the proposed algorithm minimises delay based on the signal-to-interference-and-noise ratio (SINR) in WSNs. Wei et al. (2011) found optimal cluster sizes based on the hop count to the sink node; their approach is also used to extend the network lifetime and minimise energy consumption. Tanwar, Kumar & Rodrigues (2015) reviewed several heterogeneous routing protocols in WSNs and analysed them with performance metrics. The algorithm in Bandyopadhyay & Coyle (2003) organises the nodes into several clusters in WSNs and generates a hierarchy of CHs. A genetic algorithm (GA) is a meta-heuristic algorithm used to solve optimisation problems (Pantazis, Nikolidakis & Vergados, 2013; Pal & Saraswat, 2017). GA is an appropriate scheme for solving clustering problems in WSNs. It is also used to resolve persistent optimisation problems (Mehta & Pal, 2017). In this article, HWSNs use GA to solve the multi-hop clustering problem based on a newly defined fitness function (Kachitvichyanukul, 2012).
Existing solutions form clusters based on residual energy and thereby prolong the lifetime of WSNs. However, re-clustering consumes more energy, and the end-to-end delay is not minimised. This motivates us to devise an energy-aware multi-hop clustering approach for HWSNs. WSNs with heterogeneous nodes offer better network stability and a longer network lifetime. Energy consumption is minimised by using GA to select the optimal CHs during re-clustering. The main contributions of this article are as follows:
i. A GA-based energy-aware multi-hop clustering algorithm (GA-EMC) is proposed for selecting the optimal number of CHs dynamically during re-clustering.
ii. A framework for optimised transmission scheduling and routing is formulated to reduce the delay under the SINR model for HWSNs.
iii. A combination of weak and robust sensor nodes, chosen using their residual energy, mitigates the re-clustering issues.
iv. For optimising cluster construction, the GA maintains the stability of the nodes in a network.
v. A dynamic power allocation scheme for sensor nodes is proposed to guarantee QoS for the nodes.
This article begins with an introduction to wireless sensor networks, genetic algorithms, and multi-hop clustering paradigms. The following section describes the existing multi-hop clustering algorithms and their issues. The next section presents the GA-EMC algorithm, followed by a section that reports the experimental results and analyses the performance of GA-EMC. A discussion follows, and the last section concludes the article.

RELATED WORKS
This section presents various recent multi-hop clustering schemes in WSNs. Many researchers have worked on GA-based multi-hop clustering algorithms, and an overview of that work is given here. Saini & Sharma (2010) developed a probability-based CH selection that decreases the average energy consumption of the CHs. Capone et al. (2019) and Wei et al. (2011) address the spatial distribution of CHs in WSNs by constructing a multi-hop table, which is also used to decrease the number of CHs that transmit directly to the sink or base station (BS). The algorithm of Tanwar, Kumar & Rodrigues (2015) selects the CHs with higher residual energy and achieves better load balancing among CHs. In Bandyopadhyay & Coyle (2003), the energy consumption of CHs is minimised during the data routing process, with better time complexity. The protocol of Pantazis, Nikolidakis & Vergados (2013) satisfies the QoS requirements in WSNs, and Amgoth & Jana (2015) address cluster formation and CH selection using weight metrics in HWSNs.
In general, sensor networks can be heterogeneous regarding the initial energy, the computational ability of the WSN nodes, and the bandwidth of the links (Jin et al., 2013). Designing WSNs with heterogeneous nodes increases reliability and network lifetime. Computational and link heterogeneity reduces the latency in data transmission (Smaragdakis, Matta & Bestavros, 2004; Qing, Zhu & Wang, 2006). Various parameters are used to classify the nodes in HWSNs (Saini & Sharma, 2010). Capone et al. (2019) studied transmission scheduling and multi-hop routing to minimise delay using SINR. The initial energy varies according to the node's distance from the sink to overcome the energy hole problem in multi-hop networks (Wei et al., 2011). Tanwar, Kumar & Rodrigues (2015) categorised several heterogeneous routing protocols with predefined parameters for enhancing network lifetime and node heterogeneity in WSNs. GA has been used for the optimal selection of CHs in recent research. The main focus of GA-based clustering algorithms is the fitness function, which determines the goodness of an individual to be selected for the next generation (Bandyopadhyay & Coyle, 2003). Pantazis, Nikolidakis & Vergados (2013) critically analysed the energy-efficient routing protocols for WSNs. The method in Pal & Saraswat (2017) is based on biogeography-based optimisation in HWSNs. The fitness value is further modified by incorporating the residual energy of the remaining nodes, which enhances performance and prolongs the network lifetime (Mehta & Pal, 2017). Meta-heuristic techniques are widely applied to solve several clustering problems in WSNs (Kachitvichyanukul, 2012; Bhushan, Pal & Antoshchuk, 2018). Fanian & Rafsanjani (2019) reviewed various protocols and their properties in WSNs. Afsar, Mohammad & Tayarani (2014) investigated and presented further clustering approaches.
Bari, Jaekel & Bandyopadhyay's (2008) approach formulates clusters and considers the relay nodes as CHs in two-tiered sensor networks, which prolongs the relay node lifetime. The method in Younis, Youssef & Arisha (2003) extends the network lifetime through dynamic route selection and reduces energy consumption. Liu & Lin (2005) critically investigated and addressed the power-conserving issues in WSNs, and the algorithm in Zhang et al. (2017) solves the energy balance problem in WSNs. Gupta & Pandey (2016) considered the location of the BS and residual energy as clustering parameters to solve an energy hole problem in HWSNs. Darabkh, Zomot & Al-qudah's (2019) scheme minimises the average energy consumption and prolongs the lifetime of WSNs. Javaid et al.'s (2013a) technique for HWSNs dynamically elects the CH and extends the network lifetime. Pal et al. (2015a) analyse the heterogeneous node locations and select the optimal CH based on the distance between the clusters. The algorithm in Sarkar & Senthil Murugan (2019) improves the energy and lifetime of both nodes and networks by choosing the optimal CHs. Kumar & Kumar (2016) and Mann & Singh (2017) maximise the network energy and extend the network lifetime by selecting the optimal CH in WSNs. Fan's (2013) method investigates several issues such as energy consumption, coverage, and data routing in WSNs; it improves the coverage ratio and prolongs network lifetime. Javaid et al.'s (2013b) scheme increases the node stability period and sends more packets to the BS.
Ali, Shahzad & Khan's (2012) algorithm optimises the clusters in a network and minimises the data traffic and energy dissipation among nodes. Rakhee & Srinivas (2016) continuously monitor patients' data by selecting an optimal path in the body area network; their approach also enhances network lifetime, load balancing, and energy across the overall network. Pal et al.'s (2015b) method achieves a load-balanced network and prolongs the lifetime of WSNs by optimising CH selection. The approach of Hoang et al. (2014) reduces the distance between the CH and CMs in WSNs to improve energy conservation. Lin et al.'s (2012) approach maximises the lifetime of heterogeneous nodes based on sensing coverage and network connectivity. The approach in Singh & Lobiyal (2012b) selects energy-aware clusters and the optimal CH based on hop count and locations. Pal et al. (2020) consider the GA parameters for enhancing CH performance in WSNs. Mhemed et al.'s (2012) approach investigates cluster formation that reduces energy consumption. Many proposals in the related works address energy-efficient hierarchical clustering, but the heterogeneity of WSN nodes has not been exploited to its full potential.
Energy efficiency is essential for extending the life of WSN systems that are resource-constrained, particularly in terms of energy. Energy-aware clustering algorithms are a significant factor in WSNs since multi-hop clustering methods relate to the network's communication operations. Energy, computation, and link heterogeneity are the three basic types of heterogeneity in WSNs. Another vital factor is the heterogeneity of data creation rates, which concerns nodes with varying data transmission requirements. As a result, distinct performance evaluation metrics must be used to categorise sensor nodes. Motivated by the above facts, in this article, we provide a genetic algorithm-based energy-aware multi-hop clustering scheme for heterogeneous WSNs. Table 1 [...] Pal et al., 2020), our study is distinguished by the type of algorithm. In this approach, two methods are investigated in HWSNs. The first method uses GA to enhance performance by selecting the optimal CHs during the clustering and re-clustering phases. The second method extends the first by featuring optimal transmission scheduling; in it, we carefully analyse the transmission scheduling and communication among CHs. As a result, we address various properties and analyses of node strategies to minimise the end-to-end delay, extend the network lifetime, and improve energy efficiency. This is the first article presenting a GA-based energy-aware multi-hop clustering scheme to minimise end-to-end delay, extend the network lifetime, and enhance energy efficiency in HWSNs.

MATERIALS AND METHODS
In HWSNs, clusters are formed using GA. GA finds the optimal CHs by considering the network coverage and the nodes' energy levels. The CHs perform data aggregation and transmit the combined data packets to the sink. A multi-hop network is used to send packets from the CHs to the sink. The nodes neighbouring the sink include regular, advanced, and supernodes, which have different initial energy. Regions near the sink have a larger number of supernodes than other regions. In routing, the next-hop CH is selected using the distance between the CHs, the residual energy, the number of CMs, and the neighbouring CHs associated with the given CH. The symbols and notations used in the proposed work are listed in Table 2.

[Table fragments comparing existing schemes; row and column alignment not recoverable, entries kept in source order]
It is improved in the cluster data transmission phase after the CHs are selected. It reduces the network energy, network overhead, and cost.
Hot-spot problems are created It uses competition range to construct clusters of even sizes.
Achieves load balance among CHs Uneven clustering strategy The clever strategy of CH selection, residual energy of the CHs and the intracluster distance for cluster formation.
Achieves constant message and linear time complexity.
High message complexity for building backbone network of CHs.
Minimising the end-to-end delay The delay is significantly reduced by combining cooperative forwarding (CF) and forward interference cancellation (FIC).

E2HRC (Shang, 2013)
Messaging structure for clustering and routing Balancing average energy consumption, network load and improving network performance

2015)
A tight closed-form expression for the optimal number of CHs in the network Balancing energy consumption amongst all sensor nodes and prolonging the network lifetime.
It is more sensitive to any changes in the network size.
EDDEEC (Jin et al., 2013) Probabilities for CH selection based on initial, the remaining energy level of the nodes and average energy of network

Multi-hop network model
A WSN is modelled as a bidirected graph $G = (V, E, C)$, where $V$ denotes the set of nodes (its cardinality is the network size), $E \subseteq V \times V$ is the set of two-way communication links, and $C = \{(i,j), (j,i) : \{i,j\} \in E\}$ is the set of direct links. We consider that $\{i,j\} \in E$ iff the SINR condition is satisfied in both directions, i.e., $\sigma(i,j)/\gamma \ge \eta$ and $\sigma(j,i)/\gamma \ge \eta$, where $\sigma(i,j)$ is the energy received at node $j$ when node $i$ is sending, and $\sigma(j,i)$ is the energy received at node $i$ when node $j$ is sending. They are given by $\sigma(i,j) := P(i)\,\varphi(i,j)$ and $\sigma(j,i) := P(j)\,\varphi(j,i)$, where $P(i)$ and $P(j)$ are the transmitted energy of nodes $i$ and $j$. Here, $\varphi(i,j) = \varphi(j,i)$ is the gain of the communication link $\{i,j\}$, $\gamma$ is the noise power, and $\eta$ is the SINR value. The set of nodes adjacent to $i \in V$ is denoted by $\Gamma(i)$, i.e., $\Gamma(i) = \{j \in V : \{i,j\} \in E\}$. Let the ordered set of time slots be $T := \{1, 2, \ldots, \tau\}$, and let $\omega$ be the group of nodes sending in a time slot. The direct link $(i,j) \in C$ is active only if $i \in \omega$, $j \notin \omega$, and the resulting SINR condition is satisfied. A node can either send, receive, or be inactive at a particular time. A group $c \subseteq C$ of communication links can be active simultaneously only if $c$ is a compatible set. The group of active sensor nodes of $c$ is represented by $\omega(c) := \{i : \exists j, (i,j) \in c\}$. The SINR condition is applied to every communication link $(i,j) \in c$. Consider the data packet set $M$; each data packet $m \in M$ needs a time slot for a particular transmission and is sent from its source $S(m)$ to its sink $D(m)$. The data packets are available at their respective sources at time $\tau = 1$. The time $\tau = T$ taken in transmitting the packets gives the total network delay.
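The link-existence rule above can be illustrated with a minimal Python sketch. It assumes that the received energy is the transmitted energy multiplied by a symmetric link gain, and that a two-way link exists only when the ratio of received energy to noise power meets the SINR value in both directions. The function names, argument names, and numeric values are illustrative assumptions and do not come from the original model.

```python
def received_energy(tx_energy: float, gain: float) -> float:
    """Energy received over a link: transmitted energy times link gain."""
    return tx_energy * gain


def link_exists(p_i: float, p_j: float, gain: float,
                noise: float, sinr_threshold: float) -> bool:
    """Two-way link {i, j} exists iff the SINR condition holds in both directions.

    The gain is symmetric, so the same value is used for (i, j) and (j, i).
    """
    forward = received_energy(p_i, gain) / noise   # i sends, j receives
    backward = received_energy(p_j, gain) / noise  # j sends, i receives
    return forward >= sinr_threshold and backward >= sinr_threshold


# Hypothetical values chosen only to exercise both outcomes.
strong = link_exists(p_i=1.0, p_j=1.0, gain=0.5, noise=0.01, sinr_threshold=10.0)
weak = link_exists(p_i=1.0, p_j=0.1, gain=0.5, noise=0.01, sinr_threshold=10.0)
```

In the second call node j transmits with too little energy, so the reverse direction fails the condition and the two-way link is rejected even though the forward direction passes.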

Optimised energy model
The proposed GA-EMC adopts an optimised energy model (Abo-Zahhad et al., 2015) that minimises energy consumption. The energy a node needs to transmit a data packet of $l$ bits is given by Eq. (3).
where $d$ represents the distance between the nodes involved in the communication and $\rho$ is the energy dissipated in the source and sink, which accounts for factors such as modulation and digital coding. The variables $\alpha$ and $\beta$ represent the free-space and multipath fading coefficients, and the threshold $d_0$ decides whether to use the multipath fading model. Equation (4) gives the energy spent by the sink in receiving $l$ bits of a packet.
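Equations (3) and (4) are not reproduced in this copy of the text. The sketch below therefore assumes the widely used first-order radio model, which fits the symbols defined above: a per-bit electronics energy, a free-space coefficient, a multipath coefficient, and a distance threshold. The function names and all numeric constants are hypothetical, and the threshold is assumed to be the distance at which the two loss terms coincide.

```python
def tx_energy(l_bits: int, d: float, e_elec: float,
              eps_fs: float, eps_mp: float, d0: float) -> float:
    """Energy to transmit l_bits over distance d (assumed first-order radio model)."""
    if d < d0:
        # Short range: free-space loss grows with d^2.
        return l_bits * e_elec + l_bits * eps_fs * d ** 2
    # Long range: multipath fading loss grows with d^4.
    return l_bits * e_elec + l_bits * eps_mp * d ** 4


def rx_energy(l_bits: int, e_elec: float) -> float:
    """Energy to receive l_bits: only the electronics cost is paid."""
    return l_bits * e_elec


# Hypothetical constants of the kind often used in WSN simulations.
E_ELEC = 50e-9       # J/bit
EPS_FS = 10e-12      # J/bit/m^2
EPS_MP = 0.0013e-12  # J/bit/m^4
D0 = (EPS_FS / EPS_MP) ** 0.5  # assumed crossover distance between the two models

near = tx_energy(4000, 20.0, E_ELEC, EPS_FS, EPS_MP, D0)
far = tx_energy(4000, 150.0, E_ELEC, EPS_FS, EPS_MP, D0)
recv = rx_energy(4000, E_ELEC)
```

Under these assumed constants, receiving is cheaper than a short transmission, which in turn is much cheaper than a transmission beyond the threshold. This is the motivation for keeping intra-cluster distances small and relaying over multiple hops.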
The CM node spends energy to send a packet to its CH. The energy spent by the CM to transmit $l$ bits of a packet to its CH is determined by Eq. (5).
where $d_{i,CH_i}$ represents the Euclidean distance between the $i$th CM and its CH. A CH spends its energy to receive packets from its CMs, aggregate all the packets, and send the result to other CHs. In addition to forwarding the local cluster data, CHs may also forward the traffic received from other CHs. Equation (6) shows the energy required by CHs.
where $N_{CM_j}\,\sigma_R(l)$ denotes the energy spent by the CH to receive data packets from its CMs. The second term gives the energy spent in aggregation, the third term gives the energy spent for data transmission to the next-hop CH, and the last term $\sigma^{F}_{CH_j}$ represents the energy spent in forwarding the relay traffic. $\sigma^{F}_{CH_j}$ is the sum of the energy required to receive $k$ bits of packets from all the lower-level CHs and to communicate the packets to the parent CH, as shown by Eq. (7).

Phases in the proposed GA-EMC protocol
The proposed GA-EMC contains four phases: Heterogeneous Nodes Deployment, Clustering Formulation, Selection of Next-hop Neighbour, and Packet Transmission. The main idea of the proposed GA-EMC scheme is to optimise the energy management of the WSN by minimising the intra-cluster distance between a CH and a CM. The distance between the CM and the CH is calculated as the Euclidean distance. Each CM is placed in the cluster to which its distance is least. A node interacts directly with the sink if the distance between the sink and the node is smaller than the distance between the CH and the CM. When a node joins a cluster, it sends a JOIN message to the other nodes and the CH to announce its presence. The CH assigns each node a time slot for data collection. After the data has been acquired, the CH aggregates it before sending it to the sink. The nodes may sleep during this process, but the CH must be awake at all times. This depletes the CH's energy, and a few nodes die over time as the network becomes sparse. In each cycle, the clusters are reconstructed, and CHs are chosen.
The fitness function is used in the proposed GA-EMC technique to reduce the intra-cluster distance between the sensor nodes and the cluster head (CH). The function optimises the CH's placement, which affects the estimated number of packet retransmissions along the path and hence the network's overall energy usage. Because GA-EMC works with this fitness function, the proposed technique is preferable in terms of energy consumption.
To minimise the distance between CMs and their CHs, the scheme examines the sink distance, the intra-cluster distance, and the residual energy of CMs to determine their ideal positions. The phases are described below.

Heterogeneous nodes deployment phase
In multi-hop communication, the CHs situated very close to the sink node have to forward more packets received from other nodes, and their energy is exhausted more quickly than that of the CHs far away from the sink. This creates a hot spot in the regions near the sink. To solve this issue, sensor nodes are classified into regular nodes with initial energy $\sigma_0$, advanced nodes with initial energy $\sigma_0(1+\alpha)$, and supernodes with initial energy $\sigma_0(1+\beta)$ joules. The energy heterogeneity constants $\alpha$ and $\beta$ are greater than 1. The WSN consists of $N$ nodes in total, with $m_a \times N$ advanced nodes, $m_s \times N$ supernodes, and $(1 - m_a - m_s) \times N$ regular nodes. The areas near the sink node have more supernodes than the areas away from the sink.
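As a small worked example of the three-tier deployment described above, the following Python sketch computes the node counts and initial energies for each node type. The network size of 400 matches the simulations reported later. The fractions of advanced and supernodes, the base energy, and the two heterogeneity constants are hypothetical values chosen only for illustration.

```python
def node_counts(n: int, m_a: float, m_s: float) -> dict:
    """Split n nodes into advanced, super and regular nodes by their fractions."""
    advanced = round(m_a * n)
    supers = round(m_s * n)
    return {"regular": n - advanced - supers, "advanced": advanced, "super": supers}


def initial_energy(kind: str, e0: float, alpha: float, beta: float) -> float:
    """Initial energy per node type: e0, e0*(1+alpha) or e0*(1+beta)."""
    factor = {"regular": 1.0, "advanced": 1.0 + alpha, "super": 1.0 + beta}
    return e0 * factor[kind]


# Hypothetical parameters: fractions and energy constants are illustrative only.
counts = node_counts(400, m_a=0.2, m_s=0.1)
energies = {k: initial_energy(k, e0=0.5, alpha=1.5, beta=3.0) for k in counts}
total_energy = sum(counts[k] * energies[k] for k in counts)
```

With these assumed values the network holds 280 regular, 80 advanced and 40 supernodes, and the supernodes start with four times the energy of a regular node.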

Clustering formulation phase
In this phase, clusters are formed in the HWSN. It comprises two sub-phases, namely the CH selection and CM association phases. The CH selection phase selects the optimal CHs. In the CM association phase, each CM is associated with the nearest energy-efficient CH.

CH selection phase
This phase uses the GA for selecting the optimal CHs and their locations. GA is based on the principles of natural genetics and natural selection and is used to optimise various parameters. GA is applied in multiple fields for solving constrained and unconstrained optimisation problems (Bandyopadhyay & Coyle, 2003).

CM association phase
In the CM association phase, each CH sends a CH advertisement ($CH_{ADV}$) message containing its identifier, location, and initial and residual energies, and starts a clustering timer with a predefined value. When a CM receives a $CH_{ADV}$ message, it stores the information in its cluster table (CT). A CM may receive $CH_{ADV}$ messages from one or more CHs. For each CH entry stored in the CT, the CM calculates the cost as given by Eq. (8). Each CM selects the CH with the lowest cost and sends a JOIN message to it.
Here $\gamma_1$ is a constant with $0 \le \gamma_1 \le 1$. By setting a proper value for $\gamma_1$, we can decide how much importance to give to distance and energy in the CH selection. The terms $f_1$ and $f_2$ are calculated as provided by Eqs. (9) and (10).
where $\lambda^{j}_{i}$ and $\varphi^{j}_{i}$ represent the initial and residual energies of $CH^{j}_{i}$, respectively. $CH^{j}_{i}$ is a CH present in the cluster table of $CM_i$, and $d_{max}$ is the distance between the CH and its CMs, calculated by Eq. (11). The CHs collect the JOIN messages from the CMs until the clustering timer expires. Upon expiry of the timer, the CHs create a dynamic time division multiple access (TDMA) schedule for the packet transmission and send it to the CMs. The GA-based clustering algorithm shows the various steps involved in forming clusters and selecting the optimal CHs.
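The cost-based association can be sketched in Python as follows. Equations (8) to (11) are not reproduced in this copy, so the forms used here are assumptions: the cost is taken as a convex combination of a distance term and an energy term, the distance term is the CM-to-CH distance normalised by the maximum distance, and the energy term is the fraction of the CH's initial energy that has already been consumed. All names and values are hypothetical.

```python
import math


def association_cost(cm_pos, ch_pos, d_max, e_initial, e_residual, gamma1):
    """Weighted cost of joining a CH (assumed form: lower is better).

    f1 is the CM-to-CH distance normalised by d_max; f2 is the fraction of the
    CH's initial energy already consumed. Both forms are assumptions.
    """
    f1 = math.dist(cm_pos, ch_pos) / d_max
    f2 = 1.0 - e_residual / e_initial
    return gamma1 * f1 + (1.0 - gamma1) * f2


def choose_ch(cm_pos, cluster_table, d_max, gamma1=0.5):
    """Return the identifier of the lowest-cost CH in the CM's cluster table."""
    return min(
        cluster_table,
        key=lambda ch_id: association_cost(
            cm_pos, cluster_table[ch_id]["pos"], d_max,
            cluster_table[ch_id]["e_initial"],
            cluster_table[ch_id]["e_residual"], gamma1),
    )


# Hypothetical cluster table built from two received advertisements.
table = {
    "CH1": {"pos": (10.0, 0.0), "e_initial": 2.0, "e_residual": 0.4},
    "CH2": {"pos": (30.0, 0.0), "e_initial": 2.0, "e_residual": 1.8},
}
by_distance = choose_ch((0.0, 0.0), table, d_max=50.0, gamma1=1.0)
by_energy = choose_ch((0.0, 0.0), table, d_max=50.0, gamma1=0.0)
```

Setting the weighting constant to 1 makes the CM pick the nearest CH, while setting it to 0 makes it pick the CH with the most remaining energy. Intermediate values trade the two off.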

Next-hop neighbour selection phase
Each CH broadcasts a neighbour advertisement message that contains information such as its identifier, location, initial and residual energies, distance to the sink, and the number of CMs associated with it. When a CH receives a neighbour advertisement message, it adds the information contained in the packet to its neighbour table. As shown in Fig. 1, CHs use multi-hop paths to communicate the data packets to the sink. The next-hop CH is chosen based on the distance, the residual energy, the number of CMs associated with the next-hop CH, and the number of CHs that can be reached via the next-hop CH. When more CHs can be reached via a CH, that CH helps forward the packet reliably. A CH with more residual energy, a shorter distance, fewer CMs, and more neighbouring CHs is preferred as the next-hop CH.
For each CH node in the neighbour table, a merit value (MV) is calculated based on the above factors. Equation (12) shows the calculation of MV.
Here $\theta$ represents the neighbouring CHs of $CH_i$.

Data transmission phase
This phase involves communication within the cluster and communication between the CHs and the sink. In intra-cluster communication, each CH receives packets from its CMs according to the dynamic TDMA schedule. Each CM senses data from its surroundings and sends it to its CH during its assigned time slot. The CMs turn off their radios for the remaining time to save the energy otherwise wasted in idle listening. Each CH has many next-hop CH neighbours, and the best neighbour node is selected in the next-hop neighbour selection phase.

Genetic algorithm
In GA, each candidate solution to a specific problem is represented by a chromosome using a binary coding scheme. A group of chromosomes constitutes the population. The initial population consists of randomly selected chromosomes, and each bit in a chromosome is called a gene. For each chromosome, a fitness value is calculated to evaluate its effectiveness. Chromosomes with high fitness values get more chances to create new chromosomes. The GA involves three basic operations, selection, crossover, and mutation, to select the best chromosome. The selection process duplicates good chromosomes and eliminates poor ones; selection methods include tournament selection, ranking selection, and roulette wheel selection. The crossover operation selects two parents, recombines them, and creates two children. Crossover can be either single-point or multi-point. Crossover does not introduce any new genetic properties, whereas mutation does. These operations are repeated for a given number of generations (Bandyopadhyay & Coyle, 2003; Pantazis, Nikolidakis & Vergados, 2013). The implementation of the various GA operations is explained below.
i. Binary Coding: The binary coding scheme represents each chromosome for the given sensor scenario as a string of 0s and 1s. A chromosome of length $N$ bits represents an HWSN with $N$ nodes, so the chromosome size is the same as the size of the network. In the chromosome, the values 1 and 0 represent a CH and a CM, respectively. Figure 2 shows the chromosome representation of a network with 20 sensor nodes. Nodes $S_2$, $S_3$, and $S_{11}$ are CHs, and the remaining nodes are CMs.
ii. Objective Function: The objective function ($\delta$) is used for selecting the optimal CHs. In designing $\delta$, the following facts are considered. A CH consumes more energy than a CM, so the number of CHs must be minimised. The power required for intra-cluster communication depends on the distance between CHs and CMs, and the power required for inter-cluster communication depends on the distance between two CHs. To save power, we have to reduce the number of CHs ($\varpi$), the distance between CHs and CMs ($\vartheta$), and the distance between CHs ($\tau$). By selecting CHs with higher residual energy, we can deliver packets reliably. $\delta$ selects the CHs by considering the above factors, and it is a minimisation function as given in Eq. (19).
where $\varphi$ represents the sum of the residual energy associated with the CHs.
Eq. (21) determines the sum of the distances between the CMs and their respective CHs, where $CH_i$ represents the $i$th CH, $C_i$ denotes the set of CMs associated with $CH_i$, and $CM_j$ represents the $j$th CM node associated with $CH_i$. In Eq. (22), $\tau$ represents the total distance between all the CHs in the $i$th level and their parent CH nodes in the $(i-1)$th level. The node level is used to find the parent CH nodes. All the CHs in level 1 send packets to the destination directly. CHs in the remaining levels send packets to their parent CHs in a multi-hop fashion towards the sink, and the CHs in the $(i-1)$th level are the parents of the CHs in the $i$th level.
where $j \in P_i$ and $P_i$ represents the set of parent CHs associated with $CH_i$.
iii. Fitness Function: GA is generally suited to maximisation problems. Since our aim is to minimise $\delta$, the problem is transformed into maximising the fitness value $f_v$. For each chromosome in the population, $\delta$ is used to calculate $f_v$ as given by Eq. (23).
iv. Selection: Selection chooses chromosomes with higher $f_v$ to join the mating pool and form a new population for the subsequent generations. The proposed method uses the roulette wheel selection method.
v. Crossover: The proposed GA-EMC scheme uses a single-point crossover. Two chromosomes and a random value between 0 and 1 are selected for this operation. The crossover operation is performed only if the selected random value is less than the crossover probability $p_c$; otherwise, no crossover is done. If crossover is to be performed, an arbitrary crossover point is selected, and the two parent chromosomes exchange their bits after the crossover point to generate two child chromosomes. Table 3 shows the crossover operation.
vi. Mutation: In bit-level mutation, a random value is chosen for every bit in a chromosome. If this random value is less than the mutation probability $P_m$, the bit is inverted. Otherwise, the bit is kept as it is.
As shown in Table 4, in the first chromosome, no mutation is performed, whereas in the second chromosome, six bits are mutated.
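The GA operations listed above can be sketched compactly in Python. This is an illustrative sketch only: Eq. (23) is not reproduced in this copy, so the mapping from objective value to fitness is assumed here to be 1/(1 + objective). The function names, the seed, and the example objective values are hypothetical. The example chromosome encodes the 20-node network in which nodes 2, 3 and 11 are CHs.

```python
import random


def decode(chromosome):
    """Return 1-based indices of CHs (gene 1) in a binary chromosome."""
    return [i for i, gene in enumerate(chromosome, start=1) if gene == 1]


def fitness(objective_value):
    """Turn a minimisation objective into a fitness to maximise (assumed form)."""
    return 1.0 / (1.0 + objective_value)


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def single_point_crossover(parent_a, parent_b, p_c, rng):
    """Swap the tails of two parents after a random point with probability p_c."""
    if rng.random() >= p_c:
        return parent_a[:], parent_b[:]  # no crossover: children copy parents
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, p_m, rng):
    """Invert each gene independently with probability p_m."""
    return [1 - gene if rng.random() < p_m else gene for gene in chromosome]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Chromosome for a 20-node network with nodes 2, 3 and 11 as CHs.
parent_a = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
parent_b = [1 - gene for gene in parent_a]

child_a, child_b = single_point_crossover(parent_a, parent_b, p_c=1.0, rng=rng)
unchanged = mutate(parent_a, p_m=0.0, rng=rng)
inverted = mutate(parent_a, p_m=1.0, rng=rng)
chosen = roulette_select([parent_a, parent_b], [fitness(2.0), fitness(0.5)], rng)
```

The extreme probabilities are used only to make the behaviour easy to check: a crossover probability of 1 always recombines, a mutation probability of 0 leaves the chromosome untouched, and a mutation probability of 1 inverts every bit. In practice both probabilities would be set to intermediate values.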
Selection, crossover, and mutation operations are repeated for the given number of generations. The best chromosome is selected at the end of the last generation. In the best chromosome, if the gene value is 1, the node becomes a CH; otherwise, it becomes a CM.

Minimising end-to-end delay with packet forwarding mechanism
Let $\tau$ represent an upper limit on the delay, with $T = \{1, 2, \ldots, \tau\}$. A mathematical model is designed to analyse the number of data packets sent and received between CHs and CMs at a particular time. We use the binary variables:

GA-based clustering algorithm
GA-EMC is specially formulated to minimise the delay required to send packets from source to destination. The constraint $w_{t+1} \le w_t$, $t \in T \setminus \{\tau\}$, is enforced at all times after the first round. The constraint $\sum_{s \in S} \left( Z^{t}_{i,s} + Y^{t}_{i,s} \right) \le w_t$, $i \in V$, $t \in T$, ensures that at a particular time a node either sends a data packet, receives one, or does nothing. The constraint on $Y^{t}_{i,s}$, $s \in S$, $i \in V \setminus \{O(s)\}$, $t \in T$, together with $\alpha^{1}_{O(s),s} = 1$ and $\alpha^{T}_{D(s),s} = 1$, $s \in S$, defines the variable $\alpha$ and sets the conditions for the start and end of the dynamic TDMA schedule. Finally, the constraint $Z^{t}_{i,s} \le \alpha^{t}_{i,s}$, $i \in V$, $s \in S$, $t \in T$, expresses the SINR state for sending data packet $s$ on link $(i,j)$ at time $t$. The SINR state is stable when $Z^{t}_{i,s} = Y^{t}_{j,s} = 1$, which corresponds to the case in which node $i$ is the only node sending packets in the network. When node $j$ receives a data packet $s$ from node $i$ at time $t$, $\sum_{t \in T} w_t$ becomes equivalent to [...] We observe that the packet forwarding mechanism is used to increase the transmissions in HWSNs. In GA-EMC, the packet forwarding mechanism increases the number of transmissions at a particular time, and more data packets are transmitted to the CMs through adjacent clusters. This is made possible by the increased use of packet forwarding and forward interference cancellation mechanisms among the CMs of all clusters in the ensuing time, which helps to minimise delay in HWSNs.
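Two of the scheduling constraints described above can be checked on a concrete schedule with a short Python sketch. It assumes that the slot indicator is 1 exactly when a slot carries at least one transmission, that a node may take at most one role (send or receive) per slot, and that used slots must be contiguous from the first slot. The schedule format, the function name, and the example relay are hypothetical. The SINR feasibility of simultaneous links is not modelled here.

```python
from collections import Counter


def check_schedule(schedule, horizon):
    """Validate a slot schedule and return the number of slots in use.

    schedule: list of (slot, sender, receiver, packet) tuples, slots 1..horizon.
    """
    used = [0] * (horizon + 1)          # used[t] plays the role of the slot indicator
    busy = Counter()                    # (slot, node) -> number of roles taken
    for slot, sender, receiver, _packet in schedule:
        used[slot] = 1
        busy[(slot, sender)] += 1
        busy[(slot, receiver)] += 1
    # A node may send or receive at most once per slot.
    if any(count > 1 for count in busy.values()):
        raise ValueError("a node has more than one role in the same slot")
    # Used slots must be contiguous from slot 1 (indicator is non-increasing).
    if any(used[t + 1] > used[t] for t in range(1, horizon)):
        raise ValueError("an idle slot precedes a used slot")
    return sum(used[1:])


# Hypothetical two-hop relay: CM 1 -> CH 2 in slot 1, CH 2 -> sink 0 in slot 2.
delay_slots = check_schedule([(1, 1, 2, "s1"), (2, 2, 0, "s1")], horizon=4)

# The same relay squeezed into one slot is rejected: CH 2 would receive and send at once.
try:
    check_schedule([(1, 1, 2, "s1"), (1, 2, 0, "s1")], horizon=4)
    rejected = False
except ValueError:
    rejected = True
```

Under these assumptions the sum of the slot indicators is simply the number of slots the schedule occupies, which is the quantity the delay minimisation reduces.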

RESULTS
In this section, the performance of GA-EMC is analysed and compared with E-MDSP (Capone et al., 2019) and EEWC (Pal et al., 2020). Simulations are performed using the Network Simulator (NS2) (Issariyakul & Hossain, 2012). The simulated HWSN consists of 400 nodes in the simulation area. To evaluate the performance of GA-EMC, we consider the following metrics: network lifetime, throughput, network stability, the number of data packets sent to the sink, and the average energy consumption in the whole network. The simulation parameters are presented in Table 5.

Network lifetime
To assess the lifetime of the HWSN, we consider the number of alive nodes in each round. Figure 3 illustrates that the proposed GA-EMC scheme keeps nodes alive longer in every round than EEWC and E-MDSP, and therefore provides a better network lifetime than the existing schemes. GA-EMC uses multi-hop communication for packet delivery to extend the network lifetime. In GA-EMC, the first node dies after 1,800 rounds, and the last node remains alive until 2,100 rounds. In the EEWC and E-MDSP schemes, nodes start to die after 1,000 and 1,600 rounds, respectively. Figure 3 shows that the proposed GA-EMC scheme prolongs the network lifetime and stability, and that the last alive node can still respond to the network in this approach.

Throughput
For throughput, we analyse the number of data packets sent in the HWSN: each CH sends data packets to the sink, and the CMs send data packets to their respective CHs. As shown in Fig. 4, EEWC performs poorly, with the fewest data packets communicated. E-MDSP performs better than EEWC but worse than GA-EMC. GA-EMC significantly increases the number of data packets sent from the CHs and achieves better throughput than the other schemes.

Stability period
Figure 5 illustrates the time interval from the beginning of the network operation until the death of the first node in HWSNs. As shown in Fig. 5, GA-EMC has a better stability period than the other schemes. The first node dies at 1,800 rounds in the GA-EMC scheme, whereas the first node dies at nearly 1,000 and 1,600 rounds under the EEWC and E-MDSP approaches, respectively. Compared with the EEWC scheme, the stability duration of GA-EMC increases from 1,000 to 2,500 rounds, and compared with E-MDSP, it increases from 1,600 to 2,500 rounds. So, GA-EMC provides a better stability duration and prolongs the network lifetime.
Figure 5 Number of rounds vs number of dead nodes. The X-axis represents the number of rounds, and the Y-axis represents the number of dead nodes. The green, red and blue lines represent the proposed GA-EMC, E-MDSP and EEWC, respectively.

Minimising the end-to-end delay
Figure 6 displays the analysis of the various approaches in terms of delay. EEWC incurs the highest delay, 0.04 s at 2,500 rounds. E-MDSP achieves a lower delay than EEWC but fails to outperform GA-EMC. The GA-EMC approach achieves a low delay of only 0.02 s at 2,500 rounds. Even though more packets are transmitted in the proposed protocol than in EEWC and E-MDSP, the average energy consumption up to a particular round is lower in the proposed protocol.

Impact of sink node location on the HWSNs lifetime and stability
Network stability is measured by the round in which the first node dies. To study the impact of sink location on network stability and lifetime, we consider three scenarios, in which the sink is situated in the middle of the field (scenario 1), at the top right corner (scenario 2), and outside the field (scenario 3). Table 6 compares the rounds in which a given percentage of nodes have died for the different sink positions. Averaged over the sink positions, the proposed protocol postpones the round in which the last node dies by 30.94%. GA-EMC extends the network lifetime and improves stability in all three cases, and the improvement is more significant when the sink is at the corner or outside the field, owing to multi-hop routing. Figure 8 shows the rounds in which the first node dies in the three scenarios. As shown in Fig. 8, the round in which the first node dies is postponed by 10.98%, 23.47% and 46.94% in scenarios 1, 2 and 3, respectively. This shows that the proposed protocol performs better for longer-distance transmission. Compared to EEWC and E-MDSP, GA-EMC provides a 27.13% improvement in the round in which the first node dies, averaged over the different sink positions.

Figure 8 Comparing GA-EMC with EEWC and E-MDSP based on network lifetime. The X-axis represents each of three scenarios: i.e., the first node died, the middle node died and the last node died. The Y-axis represents the rounds when the first node died in the three scenarios. The green, red and blue bars represent the proposed GA-EMC, E-MDSP and EEWC, respectively. DOI: 10.7717/peerj-cs.1029/fig-8
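The 27.13% average quoted above for the first node death is consistent with the arithmetic mean of the three per-scenario values reported in the text. The short check below reproduces that calculation. The variable names are illustrative, and the three percentages are the only inputs, taken directly from the text.

```python
# Postponement of the first node death per sink position, as reported in the text (%).
first_death_gain = {
    "scenario 1 (sink in the middle)": 10.98,
    "scenario 2 (sink at the top right corner)": 23.47,
    "scenario 3 (sink outside the field)": 46.94,
}

# Unweighted mean over the three sink positions.
average_gain = sum(first_death_gain.values()) / len(first_death_gain)
print(f"{average_gain:.2f}%")  # 27.13%
```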

DISCUSSION
The proposed GA-EMC scheme outperforms the existing EEWC and E-MDSP schemes in almost all aspects. It keeps more nodes alive in every round, which gives a longer stability period and a longer network lifetime. It also significantly increases the number of data packets sent from CHs and achieves better throughput. Furthermore, it achieves a lower delay and reduces the average energy consumption till a particular round. These gains in lifetime and stability hold for all three sink positions. Owing to multi-hop routing, the improvement is larger when the sink is at the corner or outside the field, where transmission distances are longer.

CONCLUSIONS
In this article, a GA-EMC scheme is presented for extending the lifetime and minimising the delay in HWSNs. To select the optimal CHs, the fitness value is calculated based on cluster distances, the number of CHs, and the nodes' initial and residual energies. In inter-cluster routing, each cluster selects as its next hop a CH with minimum distance, higher residual energy, minimum CMs, and maximum neighbours. The energy hole problem created by multi-hop routing is solved by deploying more high-energy supernodes in the areas closer to the sink. The mathematical model of energy consumption for clustering with multi-hop data transmission is explained. The simulation results show that GA-EMC prolongs the HWSN lifetime, minimises the delay, and maximises stability compared to EEWC and E-MDSP for various positions of the BS, particularly when the BS is situated at the network corner or outside the field. The deaths of the first and last nodes are postponed by 27.13% and 30.94%, respectively, compared with EEWC and E-MDSP. In future work, the simulation can be repeated to study the impact of the number of nodes in HWSNs. The performance of GA-EMC can also be analysed in actual (not simulated) HWSNs in practical scenarios.