IDUC: An Improved Distributed Unequal Clustering Protocol for Wireless Sensor Networks

Imbalanced energy consumption among nodes in wireless sensor networks causes some nodes to die prematurely, which shortens the network lifetime. To solve this problem, existing clustering protocols usually construct unequal clusters by using an uneven competition radius. Considering the shortcomings of these protocols in designing the uneven competition radius and the intercluster communication, this paper proposes an improved distributed unequal clustering protocol (IDUC) for wireless sensor networks in which nodes are energy heterogeneous and unevenly scattered. The core components of IDUC are the formation of an unequal cluster topology and the construction of an intercluster communication routing tree. Compared with previous protocols, IDUC is suitable for various network scenarios, balances energy consumption more efficiently, and extends the network lifetime significantly.


Introduction
A wireless sensor network (WSN) consists of a large number of low-power sensor nodes capable of sensing, processing, and communicating. These sensor nodes observe environmental phenomena at different points in the field, collaborate with each other, and send the monitored data to the base station (BS). Because sensor nodes have limited and nonrechargeable energy resources, energy efficiency is a very important issue in designing the network topology, and it greatly affects the lifetime of WSNs. Thus, minimizing energy consumption and maximizing network lifetime are the central concerns when designing protocols for WSNs.
In recent years, clustering has proved to be an important way to decrease energy consumption and extend the lifetime of WSNs. In a clustering scheme, sensor nodes are grouped into clusters. In each cluster, one node is selected as the leader, called the cluster head (CH), and the other nodes are called cluster members (CMs). Each CM measures physical variables related to its environment and sends them to its CH. When the data from all CMs have arrived, the CH aggregates the data and sends the result to the BS. Since CHs are responsible for receiving and aggregating data from their CMs and then transmitting the aggregated data to the specified destination, their energy consumption is much higher than that of CMs. To solve this problem, most clustering algorithms divide the operation into rounds and periodically rotate the CH role in the network to balance the energy consumption among nodes. However, another problem remains: energy consumption among CHs is also imbalanced because of their different distances to the BS. In single-hop networks, CHs farther from the BS must transmit data over a longer distance, so they consume more energy than CHs closer to the BS. In multihop networks, CHs closer to the BS undertake the task of forwarding data, so their energy consumption is larger. This imbalance leads to a certain number of nodes dying prematurely, causing network partitions. To solve this problem, researchers design unequal clustering algorithms to balance the energy consumption among CHs.
In this paper, aiming at energy heterogeneous networks where nodes are deployed unevenly, a more practical network case, we propose an improved distributed unequal clustering protocol (IDUC).

Related Works
Since the energy consumption of CHs is much larger than that of CMs, most clustering protocols adopt a CH rotation mechanism to balance the energy among nodes. The rotation methods used by existing clustering algorithms can be divided into time-driven rotation and energy-driven rotation. In time-driven clustering algorithms [1][2][3][4][5], the CH role is rotated periodically in the entire network according to a predetermined time threshold. As each rotation is carried out in the entire network, the large overhead of reclustering wastes a lot of energy. In energy-driven clustering algorithms [6][7][8][9][10][11], the CH role is rotated when the residual energy of the CH falls below a threshold. Reclustering happens only in a local area; thus the large cost of global topology reconstruction is avoided.
However, aside from the imbalanced energy consumption between CHs and CMs, there is another imbalance among the CHs themselves that can affect the network lifetime significantly. To solve this problem, many unequal clustering algorithms have been proposed. The unequal clustering algorithms proposed in [12][13][14] all divide the network field into cirques. In [12], clusters in the same cirque have the same size, whereas clusters in different cirques have different sizes. Some high-energy nodes are deployed to take on the CH role and control network operation, which ensures that the energy dissipation of nodes is balanced. In [13], a cirque-based static clustering algorithm for multihop WSNs is proposed, in which clusters closer to the BS have smaller sizes. In ERP-SCDS [14], static clusters with dynamic structures are formed by utilizing virtual points in a corona-based WSN.
In the distributed clustering protocol EECS [15], CHs communicate with the BS in a single hop, and the protocol adopts a weighted function to control the number of CMs and thereby construct unequal clusters. That is, a cluster is smaller if it is farther away from the BS, and vice versa.
EEUC [16] is also a distributed unequal clustering algorithm with intercluster multihop communication, which elects CHs based on the residual energy of nodes. Each node becomes a tentative CH with a predefined probability. However, the competition radius used by EEUC is not ideal for heterogeneous WSNs, and since the quality of the generated CHs is affected by this probability, "isolate points" also exist in EEUC in some cases. LUCA [17] is similar to EEUC but presents a more accurate theoretical analysis of the optimal cluster size based on the distance between the CH and the BS.
In [18], we proposed EADUC to overcome the defects of EEUC. When designing the competition radius, besides the distance between nodes and the BS, the residual energy of nodes is also taken into account. That is, CHs closer to the BS and possessing lower residual energy have smaller cluster sizes to preserve some energy for the intercluster data forwarding; thus the cluster size is more reasonable and more suitable for heterogeneous WSNs. Simultaneously, EADUC overcomes the "isolate points" problem.
In [19], we proposed ECDC. In this algorithm, different coverage importance metrics are designed for different practical applications, and cluster heads are selected based on the relative residual energy and the coverage importance metrics of nodes. The intercluster communication adopts a multihop forwarding mechanism. This algorithm can construct a better clustering topology with lower energy dissipation and better coverage performance while using less control information.
Some of the protocols described above, such as EEUC, consider only the distance between nodes and the BS, which is not suitable for heterogeneous networks; EADUC and ECDC therefore also take the residual energy of nodes into account. However, they all overlook the distribution of nodes in WSNs, and these algorithms are not always effective in networks where nodes are scattered unevenly.
To address this problem, we need a protocol that is suitable for various network scenarios. This paper therefore proposes an improved distributed unequal clustering protocol (IDUC). IDUC is effective in both heterogeneous and homogeneous network scenarios. In addition, it is suitable for WSNs where nodes are scattered evenly or unevenly. Our main contributions in this paper are as follows.
(1) A new cluster head competition radius is proposed; it considers the distance between nodes and the BS, the residual energy of nodes, and the number of neighbor nodes within the nodes' communication range.
(2) To bridge the gap between the number of nodes within the communication range and the number within the final cluster range, when constructing the intercluster routing tree, each CH chooses a CH with higher energy and fewer CMs as its next hop.

Network Model.
To simplify the network model, we adopt a few reasonable assumptions as follows.
(1) There are $N$ sensor nodes distributed in an $M \times M$ square field.
(2) The BS and all nodes are stationary after deployment.
(3) All nodes can be heterogeneous.
(4) All nodes are location-unaware.
(6) The BS is outside the sensor field. It has enough energy, and its location is known by each node.
(7) Each node has a unique identifier (ID).
To transmit an $l$-bit message over a distance $d$, the radio expends energy
$$E_{Tx}(l, d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^{2}, & d < d_{0} \\ l E_{elec} + l \varepsilon_{mp} d^{4}, & d \ge d_{0} \end{cases} \tag{1}$$
where $d$ is the transmission distance and $E_{elec}$, $\varepsilon_{fs}$, and $\varepsilon_{mp}$ are parameters of the transmission/reception circuit. Depending on the distance between the transmitter and the receiver, the free space ($\varepsilon_{fs}$) or the multipath fading ($\varepsilon_{mp}$) channel model is used, with $d_{0}$ denoting the crossover distance between the two models. To receive an $l$-bit message, the radio expends energy
$$E_{Rx}(l) = l E_{elec}. \tag{2}$$

Problem Description.
As described above, some clustering protocols construct an unequal clustering topology by using an uneven cluster head competition radius. However, some of these protocols, such as EEUC, consider only the distance between nodes and the BS, which is not suitable for heterogeneous networks; EADUC therefore also takes the residual energy of nodes into account. Nonetheless, if these algorithms are applied to networks where nodes are scattered unevenly, the case shown in Figure 1 is very likely to appear: two CHs lie at nearly the same distance from the BS and have similar residual energy, yet the number of CMs within the cluster range of one is much larger than that of the other. Their energy consumption is then still imbalanced.
Meanwhile, in most practical applications, the deployment of nodes is not always uniform, as shown in Figure 2. Clustering algorithms tend to be designed for networks such as that in Figure 2(a), an ideal network model, whereas networks such as that in Figure 2(b) are often neglected. Since nodes there are unevenly scattered, the node density differs across areas of the network. In such a scenario, the case in Figure 1 easily arises when existing clustering protocols are applied. Thus, we need to control the number of CMs in each cluster; that is, nodes with more communication neighbors should have smaller competition radii, and vice versa. For the case in Figure 1, the remedy is straightforward: reduce the competition radius of the CH in the denser area and increase that of the CH in the sparser area. With this adjustment, the numbers of CMs covered by the two CHs become more balanced. Thus, it is necessary to design a new CH competition radius for such networks: besides the distance from the nodes to the BS and the residual energy of nodes, we also take into account the number of neighbor nodes within the communication range of each node. However, the number of neighbor nodes within a node's initial communication range is likely to differ from the number of CMs within its final cluster range. Thus, to further balance the consumption among CHs, when we construct the intercluster multihop routing tree, each CH counts its CMs and then chooses the neighbor CH with fewer CMs and higher residual energy as its next hop.
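To make the imbalance concrete, the following minimal Python sketch applies the radio model of formulas (1) and (2) to two CHs that differ only in their number of CMs. The parameter values, the packet size, and the helper `ch_round_energy` are illustrative assumptions rather than values from this paper, and the cost of data aggregation is ignored.

```python
# Assumed radio parameters (commonly used illustrative values, not from this paper).
E_ELEC = 50e-9        # J/bit, electronics energy per bit
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = (EPS_FS / EPS_MP) ** 0.5  # crossover distance where both models agree


def tx_energy(bits: int, dist: float) -> float:
    """Energy to transmit `bits` over `dist` metres, as in formula (1)."""
    if dist < D0:
        return bits * E_ELEC + bits * EPS_FS * dist ** 2
    return bits * E_ELEC + bits * EPS_MP * dist ** 4


def rx_energy(bits: int) -> float:
    """Energy to receive `bits`, as in formula (2)."""
    return bits * E_ELEC


def ch_round_energy(num_members: int, bits: int, dist_to_next_hop: float) -> float:
    """Hypothetical per-round CH cost: receive one packet per CM, send one aggregate."""
    return num_members * rx_energy(bits) + tx_energy(bits, dist_to_next_hop)


# Two CHs with the same forwarding distance but different local node density.
dense = ch_round_energy(num_members=40, bits=4000, dist_to_next_hop=80.0)
sparse = ch_round_energy(num_members=10, bits=4000, dist_to_next_hop=80.0)
print(f"dense CH: {dense:.6f} J per round, sparse CH: {sparse:.6f} J per round")
```

Under these assumptions the CH in the dense area spends several times more energy per round, even though distance and residual energy are identical, which is the situation a density-aware competition radius is meant to correct.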

IDUC Details
The whole operation is divided into rounds, where each round consists of a cluster setup phase and a data transmission phase. In the cluster setup phase, a clustering topology is formed, and, in the data transmission phase, a new routing tree is constructed to forward data. To save energy, the data transmission phase should be longer than the cluster setup phase. The node states and the control messages are described in Table 1.

Cluster Setup Phase.
In the network deployment phase, the BS broadcasts a signal, and each node computes its approximate distance to the BS based on the received signal strength; this step is necessary when designing an unequal distributed clustering algorithm. The cluster setup phase then begins. Its first subphase is the information collection phase, whose duration is $T_1$. At the beginning of this subphase, each node broadcasts a node message within its communication range; the message contains the node's ID and its residual energy. Meanwhile, the node receives node messages from its neighbors, and each node $s_i$ calculates the average residual energy $E_{ia}$ of its neighbors by using the following formula:
$$E_{ia} = \frac{1}{nb_i} \sum_{j=1}^{nb_i} E_{jr} \tag{3}$$
where $nb_i$ denotes the number of neighbor nodes of $s_i$ and $E_{jr}$ denotes the residual energy of the $j$th neighbor of $s_i$. Each node $s_i$ then calculates its waiting time $t_i$ for broadcasting the head message according to the following formula:
$$t_i = \begin{cases} \dfrac{E_{ia}}{E_{ir}} T_2 V_r, & E_{ir} \ge E_{ia} \\ T_2 V_r, & E_{ir} < E_{ia} \end{cases} \tag{4}$$
where $E_{ir}$ is the residual energy of $s_i$ and $V_r$ is a real value randomly distributed in [0.9, 1], which is introduced to reduce the probability that two nodes send head messages at the same time. After $T_1$ expires, the next subphase, the cluster head competition phase, starts; its duration is $T_2$. In this phase, if node $s_i$ receives no head message before $t_i$ expires, it broadcasts a head message within its competition range $R_c$ to claim that it will be a CH. Otherwise, it gives up the competition. In order to generate unequal clusters, nodes need to calculate their own competition radius $R_c$. In EEUC [16], based on the distance between nodes and the BS, the formula of $R_c$ is as follows:
$$R_c = \left(1 - c\,\frac{d_{max} - d(s_i, BS)}{d_{max} - d_{min}}\right) R_{max} \tag{5}$$
where $d_{max}$ and $d_{min}$ are the maximum and minimum distances from nodes to the BS, $d(s_i, BS)$ is the distance from node $s_i$ to the BS, $c$ is a weighted factor in [0, 1], and $R_{max}$ is the maximum competition radius. From formula (5), a larger $d(s_i, BS)$ yields a larger $R_c$, which guarantees that CHs farther from the BS control larger cluster areas, whereas CHs closer to the BS control smaller ones.
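The information collection and waiting-time steps described above can be sketched in Python as follows. This is only an illustration: it assumes the two-case form of the waiting-time rule (nodes with at least average energy wait proportionally less, all others wait close to the full competition period), and the function names and the handling of a node without neighbors are hypothetical.

```python
import random


def average_neighbor_energy(neighbor_energies: list[float]) -> float:
    """Mean residual energy of the neighbors heard during information collection.

    Returning 0.0 for a node with no neighbors is an assumption of this sketch.
    """
    if not neighbor_energies:
        return 0.0
    return sum(neighbor_energies) / len(neighbor_energies)


def waiting_time(own_energy: float, avg_energy: float, t2: float,
                 rng: random.Random) -> float:
    """Assumed waiting-time rule: nodes richer than their neighborhood fire earlier.

    The random factor in [0.9, 1] lowers the chance of simultaneous broadcasts.
    """
    v_r = rng.uniform(0.9, 1.0)
    if own_energy >= avg_energy:
        return (avg_energy / own_energy) * t2 * v_r
    return t2 * v_r


rng = random.Random(1)
avg = average_neighbor_energy([1.0, 2.0, 3.0])
t_rich = waiting_time(own_energy=4.0, avg_energy=avg, t2=1.0, rng=rng)
t_poor = waiting_time(own_energy=1.0, avg_energy=avg, t2=1.0, rng=rng)
print(t_rich, t_poor)
```

In this sketch a node with twice the average neighbor energy waits at most half of the competition period, while a below-average node waits at least 90 percent of it, so high-energy nodes tend to announce themselves first.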
In heterogeneous networks, nodes have heterogeneous initial energy. If every node consumes the same amount of energy, nodes with low initial energy die prematurely, reducing the network lifetime. To take full advantage of high-energy nodes, these nodes should take on more tasks. Therefore, considering both the distance from nodes to the BS and the residual energy of nodes, we gave an improved formula of $R_c$ in EADUC [18] as follows:
$$R_c = \left[1 - \alpha\,\frac{d_{max} - d(s_i, BS)}{d_{max} - d_{min}} - \beta\left(1 - \frac{E_r}{E_{max}}\right)\right] R_{max} \tag{6}$$
where $\alpha$ and $\beta$ are weighted factors in [0, 1] and $E_r$ is the residual energy of node $s_i$. From the above formula we can see that the competition radius of a node is determined by $d(s_i, BS)$ and $E_r$. Formula (6) means that CHs with higher residual energy and farther from the BS control larger cluster areas.
However, the competition radii designed above are not suitable for networks where nodes are scattered unevenly, especially when the distances between the nodes and the BS are similar and their residual energies are also close. Thus, we need to design a new competition radius to avoid imbalanced energy consumption in such cases.
Meanwhile, another notable problem in EADUC is that there is no restriction on the relation between the weighted factors $\alpha$ and $\beta$. Thus, when both $(d_{max} - d(s_i, BS))/(d_{max} - d_{min})$ and $1 - (E_r/E_{max})$ are large and their weighted factors are also large, the resulting $R_c$ may be negative, which is meaningless in practical applications. It is therefore necessary to constrain the relation between $\alpha$ and $\beta$. To address these disadvantages of the existing $R_c$, we propose a new cluster head competition radius $R_c$, which is set as follows:
$$R_c = \left[1 - \alpha\,\frac{d_{max} - d(s_i, BS)}{d_{max} - d_{min}} - \beta\left(1 - \frac{E_r}{E_{max}}\right) - \gamma\,\frac{nb_i}{N}\right] R_{max} \tag{7}$$
where $N$ denotes the number of nodes in the network and $nb_i$ is the number of neighbor nodes within the communication range of $s_i$. $\alpha$, $\beta$, and $\gamma$ are weighted factors in [0, 1], and we set $\alpha + \beta + \gamma \le 1$. Formula (7) means that CHs that are closer to the BS, have lower residual energy, and have more communication neighbors will have smaller clusters. This has three benefits. Firstly, CHs closer to the BS can save energy for data forwarding. Secondly, CHs with lower residual energy dominate smaller clusters, which avoids their premature death and prolongs the network lifetime. Thirdly, CHs with more communication neighbors control smaller clusters, which makes the competition radius more suitable for nonuniform networks. The $R_c$ in formula (7) thus makes IDUC suitable for various network scenarios.
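(1) If the network is energy homogeneous, we can set $\beta = 0$ and $\alpha + \gamma \le 1$, where $\alpha$, $\beta$, and $\gamma$ weight the distance, residual energy, and neighbor terms, respectively.
(2) If the distribution of nodes in the network is uniform, we can set $\gamma = 0$ and $\alpha + \beta \le 1$.
(3) If the nodes in the network are energy heterogeneous and their distribution is nonuniform, we can set $\alpha + \beta + \gamma \le 1$.
According to the practical network application, we can tune the weighted factors to the values that best extend the network lifetime.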
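The effect of the three weights can be explored with a small Python sketch of the competition radius. It is an illustration under stated assumptions, not the authors' implementation: the neighbor term is assumed to be the neighbor count divided by the network size, the result is clamped at zero to absorb floating-point error, and all argument names and example values are hypothetical.

```python
def competition_radius(*, d_to_bs: float, d_min: float, d_max: float,
                       e_res: float, e_max: float,
                       n_neighbors: int, n_total: int, r_max: float,
                       alpha: float, beta: float, gamma: float) -> float:
    """Competition radius with distance, energy, and density penalties.

    The density term n_neighbors / n_total is an assumed normalisation.
    """
    if min(alpha, beta, gamma) < 0 or alpha + beta + gamma > 1 + 1e-9:
        raise ValueError("weights must be non-negative and sum to at most 1")
    dist_term = (d_max - d_to_bs) / (d_max - d_min)   # 1 at the node nearest the BS
    energy_term = 1.0 - e_res / e_max                 # 1 when the node is depleted
    density_term = n_neighbors / n_total              # 1 when every node is a neighbor
    scale = 1.0 - alpha * dist_term - beta * energy_term - gamma * density_term
    return max(0.0, scale) * r_max


# Two nodes with equal distance and energy but different neighborhood density.
common = dict(d_to_bs=150.0, d_min=100.0, d_max=250.0, e_res=0.8, e_max=1.0,
              n_total=200, r_max=160.0, alpha=0.3, beta=0.3, gamma=0.4)
r_dense = competition_radius(n_neighbors=40, **common)
r_sparse = competition_radius(n_neighbors=10, **common)
print(r_dense, r_sparse)
```

Setting `gamma=0` removes the density penalty, which corresponds to case (2) above, and setting `beta=0` corresponds to case (1). Because each penalty term lies in [0, 1], keeping the weights' sum at or below 1 keeps the radius non-negative.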
When $T_2$ expires, the next subphase, the cluster formation phase, begins; its duration is $T_3$. In this phase, each plain node chooses the nearest CH and sends it a join message, which contains the node's ID and its residual energy. According to the received join messages, each CH creates a node schedule list and sends it to its CMs in a schedule message. At this point, the entire cluster setup phase is completed. Algorithm 1 gives the details of the whole cluster setup phase.
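The competition and formation subphases can be imitated centrally with a short Python sketch. It assumes that waiting times and competition radii have already been computed, replaces the distributed timers by a sort on waiting time, and uses hypothetical function names and coordinates.

```python
import math


def elect_cluster_heads(positions, waiting_times, radii):
    """Return the indices of the nodes that become CHs.

    Nodes are visited in order of timer expiry. A node becomes a CH unless it
    has already heard a head announcement, that is, unless it lies within the
    competition radius of a CH elected earlier.
    """
    heads = []
    for i in sorted(range(len(positions)), key=lambda k: waiting_times[k]):
        heard = any(math.dist(positions[i], positions[h]) <= radii[h] for h in heads)
        if not heard:
            heads.append(i)
    return heads


def join_nearest_head(positions, heads):
    """Map every node to its nearest CH (a CH maps to itself)."""
    return {i: min(heads, key=lambda h: math.dist(positions[i], positions[h]))
            for i in range(len(positions))}


positions = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0), (55.0, 0.0)]
waiting_times = [0.2, 0.1, 0.4, 0.3]
radii = [20.0, 20.0, 20.0, 20.0]

heads = elect_cluster_heads(positions, waiting_times, radii)
membership = join_nearest_head(positions, heads)
print(heads, membership)
```

In this example node 1 fires first and silences node 0, and node 3 then fires before node 2 and silences it, so the two groups of nearby nodes end up in two clusters.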

Data Transmission Phase.
In the data transmission phase, each CM periodically collects local data from the environment and then sends the data to its CH within its time slot according to the TDMA scheduling list, which avoids collisions among the members of the same cluster. When data from all the member nodes have arrived, the CH aggregates the data and sends the result to the BS. Thus, this phase is divided into two subphases, intracluster communication and intercluster communication. CMs sense and collect local data from the environment and send the collected data to their CHs. This process is called intracluster communication. For simplicity, CMs communicate with CHs directly, as in LEACH.
In the intercluster communication subphase, a routing tree is constructed on the elected CH set, and each CH forwards the data it has collected and aggregated from its CMs to the BS via other CHs. This multihop communication from CHs to the BS further reduces and balances the energy consumption.
Several CHs need to be selected as child nodes of the BS and communicate with the BS directly. Each CH therefore decides whether to become a child node of the BS by comparing its distance to the BS with a threshold Euclidean distance DIST. If the distance from CH $s_i$ to the BS is less than DIST, $s_i$ communicates with the BS directly and sets the BS as its next hop. Otherwise, it communicates with the BS through a multihop routing tree.
The concrete process is as follows. We set the duration as $T_4$. At the beginning, each CH broadcasts a route message within the radio radius $R_r$, containing its ID, its residual energy, and its distance to the BS. To ensure the connectivity of all CHs, we set the radio radius $R_r = 3R_c$. If the distance from CH $s_i$ to the BS is less than DIST, it chooses the BS as its next hop. Otherwise, it chooses its next hop according to the received route messages: CH $s_i$ chooses a neighbor CH that has higher residual energy and fewer CMs and is no farther from the BS than itself. The "Cost" of choosing CH $s_j$ as the next hop of CH $s_i$ is given by formula (8), which combines the residual energy of CH $s_j$ and the number of CMs of $s_j$ through a weight in [0, 1]; the weight determines which factor is more important in choosing the next routing node. From (8), nodes with higher residual energy and fewer CMs have a larger cost value.
To illustrate the construction of the intercluster communication routing tree, we give an instance in Figure 3. Node $s_1$ considers only CHs that are closer to the BS than itself as next hops; here only $s_4$ qualifies. For $s_2$, when candidates are selected based on the distance to the BS, $s_1$, $s_4$, and $s_5$ are selected as candidate relay nodes; since $s_5$ has the maximum cost, $s_5$ is finally selected. For $s_4$, $s_7$ and $s_9$ are first selected; since cost($s_7$) > cost($s_9$), $s_7$ is finally selected. Since the distances from $s_9$, $s_{10}$, and $s_{11}$ to the BS are smaller than DIST, they communicate with the BS directly.
Algorithm 2 gives the details of this phase.
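The next-hop rule described above can be sketched in Python. Since only the qualitative behavior of the cost is fixed by the text (higher residual energy and fewer CMs give a larger cost), the sketch assumes one simple weighted form; the cost expression, the `ClusterHead` record, the `BS` marker, the fallback to direct transmission, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterHead:
    node_id: int
    energy: float      # residual energy
    members: int       # number of CMs in its cluster
    dist_to_bs: float  # distance to the base station


BS = -1  # hypothetical marker meaning "send directly to the base station"


def choose_next_hop(me: ClusterHead, neighbors: list[ClusterHead],
                    dist_threshold: float, weight: float) -> int:
    """Pick the next hop of `me` among neighbor CHs no farther from the BS."""
    if me.dist_to_bs < dist_threshold:
        return BS
    candidates = [c for c in neighbors
                  if c.node_id != me.node_id and c.dist_to_bs <= me.dist_to_bs]
    if not candidates:
        return BS  # assumed fallback when no relay is available
    e_max = max(c.energy for c in candidates) or 1.0
    m_max = max(c.members for c in candidates) or 1

    def cost(c: ClusterHead) -> float:
        # Assumed form: larger for higher energy and for fewer CMs.
        return weight * c.energy / e_max + (1 - weight) * (1 - c.members / m_max)

    return max(candidates, key=cost).node_id


me = ClusterHead(node_id=2, energy=1.0, members=10, dist_to_bs=200.0)
neighbors = [
    ClusterHead(node_id=1, energy=0.9, members=12, dist_to_bs=190.0),
    ClusterHead(node_id=4, energy=0.5, members=20, dist_to_bs=150.0),
    ClusterHead(node_id=5, energy=0.9, members=5, dist_to_bs=160.0),
    ClusterHead(node_id=6, energy=1.0, members=3, dist_to_bs=230.0),
]
next_hop = choose_next_hop(me, neighbors, dist_threshold=100.0, weight=0.5)
print(next_hop)
```

With a weight of 0.5 both factors count equally; a weight of 1 would rank candidates by energy alone and a weight of 0 by CM count alone.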

Protocol Analysis
Theorem 1. There is at most one CH within each cluster competition radius $R_c$.
Proof. As stated previously, formula (4) ensures that different nodes have different waiting times. Assume that node $s_i$ has a shorter waiting time than the others and broadcasts the head message within radius $R_c$. Then all nodes within this range give up the competition and become plain nodes. Therefore, there is no more than one CH within the radius of any CH.
From Theorem 1 and the proof, we can see that nodes with relatively higher energy are elected as CHs, and there is one and only one CH within the competition radius of any CH.

Theorem 2. The cluster head set generated by the IDUC algorithm is a dominating set, which can cover all the network nodes.
Proof. According to Theorem 1, there is no more than one CH within a cluster, so the cluster head set must be an independent set. After the execution of the IDUC algorithm, each node in the network is either a CH or a member node of one cluster, so adding any plain node to the cluster head set would destroy its independence. Hence, the cluster head set is a maximal independent set. Since a maximal independent set is also a dominating set, the cluster head set generated by the IDUC algorithm is a dominating set.
Therefore, we conclude that the waiting time of any node is smaller than $T_2$. That is, any expected CH will broadcast a head message and become a CH before $T_2$ expires, which avoids the generation of "isolate points."
Proof. At the beginning of each round, each node broadcasts a node message. Thus, there are $N$ node messages in the whole network. In each round, each CM broadcasts a join message, while each CH broadcasts a head message, a schedule message, and a route message. Suppose the number of generated CHs is $k$; then the total number of join messages is $N - k$, and the numbers of head, schedule, and route messages are all $k$. Thus, the total number of control messages in the entire network is $N + (N - k) + k + k + k = 2N + 2k$. Therefore, the message complexity of control messages in the network is $O(N)$. IDUC adopts a distributed clustering strategy. Thus, the time complexity of the entire network is equal to that of a single node, $O(1)$. In other words, the time complexity is a constant and has nothing to do with the network size.
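The message count in the proof above can be checked with a few lines of Python. The function name and the example network size are hypothetical; the per-message tallies follow the proof.

```python
def control_messages(num_nodes: int, num_heads: int) -> int:
    """Total control messages per round for a network with the given CH count."""
    node_msgs = num_nodes                # one broadcast per node during information collection
    join_msgs = num_nodes - num_heads    # one per cluster member
    head_msgs = num_heads                # one announcement per CH
    schedule_msgs = num_heads            # one schedule per CH
    route_msgs = num_heads               # one routing broadcast per CH
    return node_msgs + join_msgs + head_msgs + schedule_msgs + route_msgs


total = control_messages(num_nodes=200, num_heads=12)
print(total)
```

Since the number of CHs never exceeds the number of nodes, the total is bounded by four times the network size, which is consistent with linear message complexity.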
From a comprehensive analysis of IDUC, we can summarize its advantages as follows.
(1) Nodes with relatively higher energy are elected as CHs; thus reclustering is less frequent, which helps reduce the energy consumed in reclustering.
(2) There are no "isolate points" in the clustering topology generated by IDUC, which is proved in Theorem 3.
(3) The design of the competition radius takes the distance from nodes to the BS, the residual energy of nodes, and the number of neighbor nodes into account. Thus the setting of $R_c$ is more reasonable and suitable for both uniform and nonuniform networks.
(4) From Theorems 1 and 3, there is one and only one CH within the competition radius of any CH.
(5) The construction of the new routing tree takes the distance from CHs to the BS, the residual energy of CHs, and the number of CMs covered by CHs into account, which makes IDUC more suitable for heterogeneous and nonuniform networks.

Simulations
The simulation is performed in NS-2, and every simulation result shown in this paper is the average of 50 independent experiments unless otherwise specified. The experiments are run in different scenarios, two of which are chosen to be shown here; the simulation settings are given in Table 2. We run the cluster setup algorithm of IDUC in Scenario 2. The resulting clustering topology is shown in Figure 5. The cluster competition radius is clearly more reasonable when we set ($\alpha = 0.3$, $\beta = 0.3$, $\gamma = 0.4$), which yields an even clustering topology. That is, the design of $R_c$ in formula (7) avoids imbalanced energy consumption among CHs caused by the nonuniform distribution of nodes.

Cluster Head Distribution.
By analyzing formula (7), we can conclude that $\alpha$, $\beta$, $\gamma$, and $R_{max}$ all influence the competition radius of CHs. $R_{max}$ is the key parameter determining the number of CHs generated by IDUC, while $\alpha$, $\beta$, and $\gamma$ determine the weights of a node's distance to the BS, its residual energy, and its number of communication neighbors in $R_c$, respectively. We run IDUC in Scenario 2. The result is shown in Figure 6, which plots the number of CHs generated by IDUC against the maximal competition radius $R_{max}$. As Figure 6 shows, the curve for $\alpha = 0.3$, $\beta = 0.3$, and $\gamma = 0.4$ is higher than that for $\alpha = 0.15$, $\beta = 0.15$, and $\gamma = 0.2$, which in turn is higher than that for $\alpha = 0$, $\beta = 0$, and $\gamma = 0$. The reason is that, when $R_{max}$ is fixed, the competition radius $R_c$ decreases as $\alpha$, $\beta$, and $\gamma$ increase, so more CHs are generated. Furthermore, the number of CHs decreases as $R_{max}$ grows, which means that the number of generated CHs is determined by $R_{max}$, $\alpha$, $\beta$, and $\gamma$. When $\alpha$, $\beta$, and $\gamma$ are fixed, an increase of $R_{max}$ leads to an increase of $R_c$, and the number of CHs decreases correspondingly. As shown in Figure 6, all three curves decline as $R_{max}$ gradually increases.
To validate our intercluster routing tree, in Scenario 2 we compare three settings of the cost weight used to construct the tree: referring only to the energy of CHs, referring to both the energy of CHs and the number of CMs, and referring only to the number of CMs. For these cases, we compare the network lifetime, which we define by the percentage of nodes alive (PNA) [8]; that is, the network lifetime is the time during which at least 90 percent of the nodes are still alive. As shown in Figure 7, when the weight is set to 0.5, the network lifetime is the longest of the three groups. Thus, the simulation results coincide with the theoretical analysis. In practical applications, the weight can be tuned to an optimal value for the given network scenario.
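The PNA lifetime metric used above can be computed from a per-round count of alive nodes, as in the following Python sketch. The interpretation (the number of consecutive rounds, counted from the start, in which at least 90 percent of the nodes are alive), the function name, and the sample trace are assumptions made for illustration.

```python
def pna_lifetime(alive_per_round: list[int], total_nodes: int,
                 fraction: float = 0.9) -> int:
    """Number of initial rounds in which at least `fraction` of nodes are alive."""
    lifetime = 0
    for round_no, alive in enumerate(alive_per_round, start=1):
        if alive / total_nodes < fraction:
            break
        lifetime = round_no
    return lifetime


# Hypothetical trace of alive-node counts, one entry per round.
trace = [100, 100, 98, 95, 90, 89, 70]
lifetime = pna_lifetime(trace, total_nodes=100)
print(lifetime)
```

Comparing this value across runs with different cost weights reproduces the kind of comparison described for Figure 7, given traces from a simulator.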
We next analyze the stability of IDUC. We run LEACH, HEED, EADC, ECDC, and IDUC in the two scenarios. Figure 8 shows the distribution of the number of CHs in each scenario; IDUC and ECDC achieve more stable performance than the other algorithms. Comparing IDUC with ECDC, we find that ECDC is more stable than IDUC in Scenario 1, whereas IDUC is more stable than ECDC in Scenario 2.

Network Lifetime.
In EADUC, we proved that the network lifetime in heterogeneous scenarios is longer than that in homogeneous scenarios if the residual energy of nodes is taken into account when designing the competition radius of CHs. Since IDUC also considers the residual energy of nodes, in our simulation we only need to test the performance of IDUC in scenarios where nodes are distributed uniformly and nonuniformly, respectively. Thus, we set $\alpha = 0.3$, $\beta = 0.3$, $\gamma = 0.4$, the cost weight to 0.5, and $R_{max} = 160$ m, run IDUC in these scenarios, and compare its network lifetime with those of EADUC and ECDC. From Figure 9, we can see that the network lifetime of IDUC in the uniform scenario is only slightly longer than that of EADUC. The reason is that the main difference between IDUC and EADUC lies in the handling of nonuniform node distributions. Accordingly, in Scenario 2 of Figure 9, the network lifetime of IDUC is clearly longer than that of EADUC and ECDC, since IDUC considers the node density both in designing $R_c$ and in selecting the intercluster routing nodes, which balances and reduces the energy consumption of CHs and thus extends the network lifetime.
To further test the performance of IDUC, we run LEACH, EEUC, EADUC, ECDC, and IDUC in the heterogeneous network Scenario 2, where nodes are scattered unevenly. The results in Figure 10 show that ECDC and IDUC perform far better than LEACH, EEUC, and EADUC in prolonging the network lifetime.

Conclusion
In this paper, an improved distributed unequal clustering protocol, IDUC, is proposed. We design a new cluster competition radius that considers the distance between nodes and the BS, the residual energy of nodes, and the number of neighbor nodes within the node communication range. Furthermore, to bridge the gap between the number of nodes within the initial communication radius and the number within the final cluster radius, we design a new intercluster communication routing tree. Theoretical analysis and simulation show that the protocol is suitable for various network scenarios. In these scenarios, the energy of nodes can be efficiently balanced and the network lifetime can be extended significantly.