An overlapping clustering approach for routing in Wireless Sensor Networks

The design and analysis of routing algorithms is an important issue in Wireless Sensor Networks (WSNs). Most traditional geographic routing algorithms cannot achieve good performance in duty-cycled networks. In this paper, we propose a k-connected overlapping clustering approach with energy awareness, namely k-OCHE, for routing in WSNs. The basic idea of this approach is to select cluster heads according to their energy availability (EA). k-OCHE adopts the CKN sleep scheduling strategy, in which at least k neighbors of each node remain awake so that the network stays connected, which lets it balance energy distribution well. Compared with traditional routing algorithms, k-OCHE achieves a more balanced load distribution and, consequently, a longer network lifetime.


Introduction
Studying the behavior of dynamic sensor networks has become a hot topic. Node mobility makes wireless sensor networks (WSNs) [1] dynamic. Nodes communicate with each other within their wireless communication radius, without relying on any fixed network infrastructure. An important issue in dynamic geographic networks is the design and analysis of routing algorithms. Because of the limited communication range of wireless transceivers, mobile nodes cannot communicate with other nodes unless they are within each other's communication range [2,3]. A mobile node may therefore need help from other nodes, chosen by checking their geographic routing information, to forward data packets to their destination.
Routing becomes even more uncertain in a duty-cycled network, since duty-cycle scheduling prolongs network lifetime [4,5] by letting some nodes sleep and wake up when packet transmission occurs. The use of conventional routing protocols in dynamic wireless sensor networks has been widely discussed [3,6]. Our interest in routing in duty-cycled networks concerns two aspects: (1) existing routing algorithms can place a heavy load on a newly joined node because of routing table updates, and (2) wireless network connectivity.
Conventional routing algorithms concentrate on finding the shortest path, with little concern for critical issues such as energy efficiency and network lifetime. The problem we address here is how to route efficiently in a duty-cycled sensor network. The basic idea behind our algorithms is to divide the network into a number of overlapping clusters. A node's sleep schedule changes the network topology, and cluster membership changes with it. We propose a cluster formation scheme called k-OCHE, which first selects cluster heads according to energy availability and then lets the cluster heads recruit cluster members. This cluster creation scheme can be adopted in different routing circumstances.
The rest of this paper is organized as follows. In Section 2, we survey related work. Section 3 defines the network model, sleep scheduling model, energy consumption model, and notation. Sections 4 and 5 detail the k-OCHE scheme and the routing algorithm. Section 6 presents the simulation experiments that validate the correctness of the proposed algorithm. Finally, Section 7 concludes the paper.

Related Work
As in other networks, load balancing, network lifetime, and scalability are the major design concerns of wireless sensor networks. In conventional multihop communication in WSNs, sensors close to the sink are often overloaded, which increases latency and shortens the network's life span. In addition, this flat architecture does not scale to larger sets of sensors covering a wider area of interest. Clustering routing has been pursued so that the network can cope with additional load and cover a large area of interest. The main aim of clustering routing is to manage the energy consumption of sensor nodes efficiently by involving them in multihop communication within a particular cluster and by performing data aggregation and fusion to reduce the number of messages transmitted to the sink.

Routing Protocols.
Many routing protocols [7][8][9][10][11] have been studied in the field of wireless sensor networks. Karp and Kung present greedy perimeter stateless routing (GPSR) [7], a routing protocol for wireless datagram networks that uses the positions of routers and a packet's destination to make packet forwarding decisions. GPSR makes greedy forwarding decisions using only information about a router's immediate neighbors in the network topology. When a packet reaches a region where greedy forwarding is impossible, the algorithm recovers by routing around the perimeter of the region. By keeping state only about the local topology, GPSR scales better in per-router state than shortest-path and ad hoc routing protocols as the number of network destinations increases. Under the frequent topology changes caused by mobility, GPSR can use local topology information to find correct new routes quickly.
Shu et al. propose an efficient two-phase geographic greedy forwarding (TPGF) [8] routing algorithm for wireless multimedia sensor networks (WMSNs). TPGF takes into account both the requirements of real-time multimedia transmission and the realistic characteristics of WMSNs. It finds one shortest (or near-shortest) path per execution and can be executed repeatedly to find more on-demand shortest (or near-shortest) node-disjoint routing paths. TPGF simultaneously supports three features: (1) hole bypassing, (2) shortest-path transmission, and (3) multipath transmission. TPGF is a pure geographic greedy forwarding routing algorithm: it does not include face routing (e.g., the right/left-hand rules) and does not use planarization algorithms (e.g., GG or RNG). This leaves more links available for TPGF to explore additional routing paths and distinguishes it from many existing geographic routing algorithms.
However, these traditional routing algorithms overload relay nodes as sensor density increases. Moreover, their convergence is not fast enough to meet the needs of dynamic networks such as duty-cycled networks.
In [14], Heinzelman et al. proposed LEACH, a distributed algorithm for wireless sensor networks in which sensors randomly elect themselves as cluster heads with some probability and broadcast their decisions. The remaining sensors join the cluster of the cluster head that requires the minimum communication energy. LEACH is one of the most popular clustering routing algorithms for sensor networks and is completely distributed. However, LEACH uses single-hop routing, in which each node transmits directly to its cluster head and each cluster head transmits directly to the sink. There are also a number of clustering algorithms, such as DCA [15] and DMAC [17], that construct clusters extending no more than one hop from a cluster head. Similarly, Baker and Ephremides [13] propose overlapping clusters with a radius of one hop. In large networks, single-hop clustering, as shown in Figure 2, may generate a large number of cluster heads and eventually lead to the same problems as having no clustering at all.
In addition, TEEN [16] and APTEEN [18] are hierarchical protocols designed to respond to sudden changes in sensed attributes such as temperature. Younis et al. [19] have proposed a different hierarchical routing algorithm based on a three-tier architecture.
To the best of our knowledge, only one clustering algorithm specifically controls overlapping during cluster formation, namely KOCA [20]. The goal of KOCA is to ensure that the entire network is covered by connected overlapping clusters with a specified average degree of overlap. However, KOCA is a static clustering scheme in which cluster formation never changes. This causes an unbalanced load among nodes: a node acting as cluster head (CH) carries more load than a non-CH and therefore dies sooner. The death of a CH can break the whole network, because the links between its member nodes and the center are lost. KOCA is therefore poorly suited to practical deployments. Rotating the CH role distributes this higher burden among the nodes and prevents CHs from dying prematurely. To overcome this problem, we propose k-OCHE, which allows a node to go to sleep while keeping its neighbors connected, so that the CH role can be rotated and the CH load balanced. Most importantly, k-OCHE is the first approach to combine interior cluster routing with exterior cluster routing, achieving balanced load distribution, longer network lifetime, and quicker routing.

Network Model.
We consider a multihop wireless sensor network of identical nodes, each with a unique ID. Node locations can be obtained by GPS. All sensors transmit at the same power level and hence have the same transmission range R. Each sensor node knows its own geographic location and those of its 1-hop neighbors. We assume that sensor nodes learn the location of the base station from packets they receive from it. This assumption is the same as that used in [7,25,26].
All communications are over a single shared wireless channel. A wireless link can be established between a pair of nodes only if they are within wireless range of each other. The k-OCHE algorithm only considers bidirectional links; it is assumed that the MAC layer masks unidirectional links and passes only bidirectional links to k-OCHE. We refer to any two nodes that share a wireless link as 1-hop (immediate) neighbors. Nodes can identify their neighbors using beacons.

Sleep Scheduling Model.
To ensure network connectivity and prolong network lifetime, we assume that all nodes follow CKN-based [27] sleep/awake duty cycling. Time is divided into epochs of equal length. In each epoch, nodes run CKN, shown in Algorithm 2, to decide whether to stay awake. A node can go to sleep provided that at least k of its neighbors remain awake to keep it connected.
Under CKN, each node u broadcasts its rank and receives the ranks of its currently awake neighbors N_u. Node u goes to sleep only if both of the following conditions hold, and remains awake otherwise:
(i) any two nodes in C_u, the candidate subset of N_u computed in Algorithm 2, are connected either directly or indirectly through nodes within u's 2-hop neighborhood whose rank is larger than that of u;
(ii) every node in N_u has at least k neighbors in C_u.
In this way, nodes reach a consensus to take turns sleeping while the network as a whole remains connected. By changing the value of k, the network can therefore control its sleep rate, and clustering then proceeds on the resulting topology of awake nodes.
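The two CKN conditions can be sketched as a centralized Python function that evaluates one node's decision from a snapshot of the awake topology. This is a simplified illustration, not the CKN implementation of [27]: the function name `ckn_should_sleep`, the adjacency-dict representation, the definition of C_u as the awake neighbors ranked above u (chosen to match the "larger than" wording of condition (i)), and the preliminary check that u and its neighbors each have at least k awake neighbors are all assumptions.

```python
from collections import deque

def ckn_should_sleep(u, adj, awake, rank, k):
    """Simplified, centralized sketch of one CKN sleep decision for node u.

    adj: node -> set of neighbors; awake: nodes awake at the start of the epoch;
    rank: node -> rank drawn this epoch; k: required number of awake neighbors.
    """
    n_u = adj[u] & awake
    # Assumed preliminary check: u and each awake neighbor need >= k awake neighbors.
    if len(n_u) < k or any(len(adj[v] & awake) < k for v in n_u):
        return False
    # Assumed candidate set C_u: awake neighbors whose rank is larger than u's.
    c_u = {v for v in n_u if rank[v] > rank[u]}
    # Condition (ii): every awake neighbor has at least k neighbors in C_u.
    if any(len((adj[v] & c_u) - {v}) < k for v in n_u):
        return False
    # Condition (i): C_u stays connected using only awake nodes in u's
    # 2-hop neighborhood whose rank is larger than u's.
    relays = set(n_u)
    for v in n_u:
        relays |= adj[v] & awake
    relays = {w for w in relays if w != u and rank[w] > rank[u]}
    if len(c_u) <= 1:
        return True
    start = next(iter(c_u))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x] & relays:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return c_u <= seen
```

In a complete graph on five nodes with ranks 0 to 4 and k = 2, this sketch lets nodes 0 and 1 sleep, and each of the three nodes left awake still has two awake neighbors.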

Energy Consumption Model.
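We use the same radio model as defined in [28]. The energy required to transmit an l-bit message over a distance d is

$$E_{TX}(l,d) = \begin{cases} l\,E_{elec} + l\,\varepsilon_{fs}\,d^{2}, & d < d_0 \\ l\,E_{elec} + l\,\varepsilon_{mp}\,d^{4}, & d \ge d_0 \end{cases} \qquad (1)$$

where E_elec is the energy dissipated per bit to run the transmitter or receiver circuitry, and ε_fs and ε_mp are the energy dissipated per bit in the radio-frequency amplifier, depending on whether d is below or above the threshold distance d_0, which is given by

$$d_0 = \sqrt{\varepsilon_{fs} / \varepsilon_{mp}} \qquad (2)$$

The energy consumed in receiving this packet is

$$E_{RX}(l) = l\,E_{elec} \qquad (3)$$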
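The radio model translates directly into code. The parameter values below are assumptions (commonly used first-order radio model settings) and may differ from those listed in Table 11; the function names are hypothetical.

```python
import math

# Assumed parameter values, not necessarily those of Table 11.
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m

def e_tx(l_bits, d):
    """Energy (J) to transmit l_bits over distance d (metres)."""
    if d < D0:
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4

def e_rx(l_bits):
    """Energy (J) to receive l_bits."""
    return l_bits * E_ELEC
```

With these values, transmitting a 4000-bit packet over 50 m costs about 3 × 10⁻⁴ J, of which 2 × 10⁻⁴ J is electronics; the two amplifier terms coincide at d_0 by construction.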

Notations.
(1) Network size (N): the number of nodes in the network. Sensor nodes are deployed randomly in a square area with side length L.
(2) Energy consumption rate (EC): for each node i, let ER_i denote the residual energy in its battery; EC is then defined as $EC_i = (E_{initial} - ER_i)/E_{initial}$ (4), where E_initial is the initial energy of each node.
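(3) Energy availability (EA): the residual battery energy of a node after its consumption over a given period; it is the criterion used to select cluster heads.
(4) Minimum number of awake neighbors in an epoch for each node (k): by varying k, we can keep the network connected and optimize geographic routing performance.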
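(5) Average node degree (d): the degree of a node is the number of its neighbor nodes, and d is its average over all nodes. For N nodes deployed uniformly at random in a square of side L, each with radio range R, and ignoring boundary effects, the average node degree and the radio range are related by $d \approx N\pi R^2 / L^2$.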
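For uniform random deployment, the standard approximation d ≈ NπR²/L² (N nodes, square side L, radio range R, boundary effects ignored) can be checked against the degree measured on one random deployment. In the sketch below, the function names, the 40 m radio range used in the example, and the seed are hypothetical; only the 400 m field side comes from the experiments.

```python
import math
import random

def expected_degree(n_nodes, side, radio_range):
    """Approximate average degree, ignoring boundary effects: N*pi*R^2 / L^2."""
    return n_nodes * math.pi * radio_range ** 2 / side ** 2

def empirical_degree(n_nodes, side, radio_range, seed=0):
    """Average degree of one uniform random deployment in a side x side square."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]
    r2 = radio_range ** 2
    links = 0
    for i in range(n_nodes):
        xi, yi = pts[i]
        for j in range(i + 1, n_nodes):
            xj, yj = pts[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                links += 1
    # Each link contributes to the degree of both endpoints.
    return 2 * links / n_nodes
```

For 400 nodes in a 400 m × 400 m field with R = 40 m, expected_degree gives 4π ≈ 12.57. Because nodes near the border lose part of their neighborhood, the measured degree typically comes out somewhat below this value.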

The Proposed Algorithm.
In k-OCHE, shown in Algorithm 1, a node can go to sleep provided that at least k of its neighbors remain awake to keep it connected. Given k, each cycle yields a new network topology formed by the awake nodes. Each node is in one of three states: cluster head (CH), boundary node (BN), or normal node. A cluster head holds information not only about its own cluster (such as member node IDs) but also about adjacent clusters (such as boundary nodes and the members of adjacent clusters). A boundary node belongs to multiple overlapping clusters and connects them to transfer and forward data, which effectively improves network robustness. Normal nodes are internal nodes that belong to only one cluster.
In this section, we describe the cluster head selection process and cluster generation, and then give an example of clusters generated by k-OCHE.

Cluster Head Selection Procedure.
The key operation in k-OCHE is to select a set of cluster heads (CHs) among the nodes in the network and recruit the normal nodes as members of these CHs. k-OCHE uses EA to select CHs, where EA is defined as the residual battery energy after consumption over a certain period. At the beginning of each cycle, each node compares its EA with a threshold. The threshold is an empirical value that controls the number of CHs in the network; it is considered optimal when CHs account for 15% of all nodes [20], and we use this value in the experiments presented in this paper. If a node's EA exceeds the threshold, it becomes a CH and advertises itself as a CH to the sensors within its transmission range in order to recruit cluster members. This recruitment advertisement is forwarded through controlled flooding to all sensors that are no more than n hops away from the CH. The header of the recruitment message includes CHID, SID, EA, and HC, where CHID is the cluster head ID, SID is the sender node ID, and HC is the number of hops to the CH node. The HC field limits the flooding of the message to n hops. A sensor node that receives recruitment messages from CHs joins those clusters, regardless of whether it already belongs to a cluster.
However, if a CH receives a recruitment message from another CH with a higher EA, it gives up its CH role and joins that CH's cluster. Since forwarding is limited to n hops, a sensor that does not receive a CH advertisement within a reasonable time can infer that it is not within n hops of any cluster head, and it then becomes a CH itself. In k-OCHE, the maximum time a node waits for a CH advertisement is set to half a CKN cycle. Note that this algorithm is distributed and does not require clock synchronization between sensors.
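The selection rules can be sketched as a centralized Python simulation of what the distributed protocol converges to. Several details are assumptions not stated in the text: hop distances stand in for the flooding radius, a candidate gives up whenever any rival candidate within n hops has a higher EA (ties broken by node ID), and uncovered nodes become CHs one at a time in descending EA order. All function names are hypothetical.

```python
from collections import deque

def within_hops(adj, src, n):
    """Hop distance from src to every node at most n hops away (src included)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == n:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def select_cluster_heads(adj, ea, threshold, n):
    """Return the set of CHs under the simplified rules described above."""
    candidates = {u for u in adj if ea[u] > threshold}
    heads = set()
    for u in candidates:
        reach = within_hops(adj, u, n)
        # u yields if a rival candidate within n hops has higher (EA, ID).
        if not any((ea[v], v) > (ea[u], u) for v in reach if v in candidates):
            heads.add(u)
    covered = set()
    for h in heads:
        covered.update(within_hops(adj, h, n))
    # Nodes that heard no advertisement declare themselves CHs (highest EA first).
    for u in sorted(adj, key=lambda x: (-ea[x], x)):
        if u not in covered:
            heads.add(u)
            covered.update(within_hops(adj, u, n))
    return heads

def cluster_membership(adj, heads, n):
    """Each node joins every CH within n hops, so clusters may overlap."""
    member_of = {u: set() for u in adj}
    for h in heads:
        for v in within_hops(adj, h, n):
            member_of[v].add(h)
    return member_of
```

On a seven-node path 0-1-...-6 with EA values {0: 0.9, 1: 0.5, 2: 0.95, 3: 0.4, 4: 0.3, 5: 0.2, 6: 0.85}, a threshold of 0.8, and n = 1, nodes 0, 2, and 6 survive as CHs, node 4 hears no advertisement and becomes a CH itself, and nodes 1, 3, and 5 end up in two clusters each, that is, they are boundary nodes.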

Cluster Generation.
Each node maintains a CH table that stores information about the clusters it belongs to. Upon receiving a new recruitment message, a node adds an entry to its CH table and checks the HC field of the message. If the message arrived over a shorter path, the node updates the HC and parent fields in the corresponding entry of its CH table. A message traveling the shortest path in terms of hop count usually arrives first, although delays may occur at the MAC or link layer. For every entry in its CH table, a node sends a join advertisement message to the CH in order to become a member of the corresponding cluster. To limit flooding, this message is unicast along the parent field. The join advertisement has the form [SID, CHID, EA], where SID is the ID of the node that will join the cluster and CHID is the ID of the CH responsible for the cluster. Upon receiving the message, a parent node adds SID to its children field. When a CH receives a join advertisement from an ordinary node, it compares its current number of members with a size threshold: if the cluster is smaller than the threshold, the CH admits the new member and updates its member count; otherwise, it rejects the request. If the rejected node already has a cluster head, its clustering process ends; otherwise, it looks for another suitable cluster to join. An ordinary node has only a single CHID entry in its CH table, because it belongs to one cluster head, whereas a boundary node, which connects different clusters, has multiple CHID entries. Each cluster head maintains a list of all cluster members, a list of adjacent clusters, and a list of boundary nodes.
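A node-local sketch of the CH table update and of the CH's size-limited admission could look as follows. The class and function names, the reading of HC as the sender's hop count to the CH (0 when the CH itself sends), and the rebroadcast rule are assumptions; the text does not give the size threshold, so `max_size` is left as a free parameter.

```python
from dataclasses import dataclass, field

@dataclass
class CHEntry:
    hc: int       # hop count from this node to the CH
    parent: int   # neighbor on the shortest path heard so far

def on_recruitment(ch_table, chid, sid, hc, n):
    """Handle a recruitment message (CHID, SID, HC) heard from neighbor sid.

    Returns True if this node should rebroadcast the message.
    """
    my_hc = hc + 1
    entry = ch_table.get(chid)
    if entry is None or my_hc < entry.hc:
        # New cluster, or a shorter path to a known CH: record it.
        ch_table[chid] = CHEntry(hc=my_hc, parent=sid)
        return my_hc < n   # keep flooding only while receivers stay within n hops
    return False

@dataclass
class ClusterHeadState:
    max_size: int
    members: set = field(default_factory=set)

    def handle_join(self, sid):
        """Admit sid while the cluster is below max_size; otherwise reject."""
        if sid in self.members:
            return True
        if len(self.members) < self.max_size:
            self.members.add(sid)
            return True
        return False
```

A join advertisement would then be unicast hop by hop along `ch_table[chid].parent` until it reaches the CH, which calls `handle_join`.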
k-OCHE avoids fixed cluster heads: CHs are replaced periodically through the sleep scheduling mechanism, which balances node energy consumption. All cluster members send their current state information to the cluster head, and k-OCHE chooses the node with the highest EA as the new head. When the new cluster head receives the notification from the current cluster head, it broadcasts a recruitment message, and the formation of the new cluster begins. This process reduces the energy spent on broadcasts by temporary heads.
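The hand-over rule reduces to picking the member with the highest reported EA. A minimal sketch, with a hypothetical function name and an assumed tie-breaking rule:

```python
def next_cluster_head(members, ea):
    """Pick the member with the highest reported EA as the next CH.

    members: IDs in the current cluster (the outgoing CH included);
    ea: ID -> energy availability reported this cycle.
    Ties go to the smaller ID (an assumption, not stated in the text).
    """
    return max(members, key=lambda m: (ea[m], -m))
```

The chosen node would then broadcast its recruitment message, re-running the cluster generation procedure.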
Note that k-OCHE stops in O(n) steps. Since the hop limit n is a constant, the clustering process terminates in a constant number of iterations regardless of network size.

An Illustrative Example.
We demonstrate our algorithm with Figure 1, which is taken from one of our simulation runs, with some information added to make it easier to follow. In Figure 1, there are three clusters, each with its own cluster head, and the nodes with black frames in the overlapping areas are boundary nodes. Note that no two cluster heads are immediate neighbors. Since boundary nodes belong to multiple clusters, their CH tables contain the CHs of all of those clusters. The three cluster heads can communicate with each other through the common boundary nodes listed in their neighbor cluster head tables.

Routing Algorithm
We first discuss the data structures that each node must maintain for the routing algorithm, as shown in Table 7. We then explain the route construction and recovery procedures in the network. Route construction has two phases: interior cluster routing and exterior cluster routing.
During the interior and exterior routing phases, routes are constructed between all pairs of nodes. The routing recovery phase maintains the routing tables as nodes follow their sleep schedules and recovers from individual node failures.

Interior Cluster Routing.
After the cluster head selection and cluster generation procedures, each node has constructed two tables: one stores information about the cluster it belongs to, and the other stores information about its neighbor clusters. In the interior cluster routing construction phase, each node builds its interior cluster routing table. As an example, consider one cluster and three of its nodes: the CH, a normal node, and a boundary node. The cluster is structured as a spanning tree rooted at the CH, with the other members arranged as level-1 and level-2 children. When the destination is a member of the cluster, the CH checks its children entries to choose the next hop, since every member is one of its level-1 or level-2 children. When the destination lies in one of the neighbor clusters, the CH checks its neighbor cluster table to choose the next hop. For all other destinations, the CH uses exterior cluster routing, which is discussed in the following section.
In the routing table of the normal node, when the destination is one of its children, the node checks its children entries to choose the next hop. When the destination lies in a neighbor cluster, it checks the corresponding neighbor cluster entry; since that entry lists two nodes, it chooses the one with the higher EA as the next hop. For all other destinations, it checks the corresponding table entry to choose the next hop.
In the routing table of the boundary node, when the destination lies in a neighbor cluster, the node checks its neighbor cluster entries to choose the next hop. For all other destinations, it checks its cluster table; since that table has two entries, it chooses the node with the higher EA as the next hop. The resulting routing tables of the CH, the normal node, and the boundary node are shown in Tables 8, 9, and 10.
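The lookup order used by these example nodes can be summarized as one next-hop rule: own subtree first, then boundary nodes toward a neighbor cluster (the highest EA wins when there are several), then upward in the tree, with the CH handing the packet over to exterior cluster routing. This is a hedged sketch: the "upward to the parent" fallback is an assumption, and the parameter names and table layout are hypothetical, since the exact fields of Tables 7 to 10 are not reproduced here.

```python
def interior_next_hop(dest, dest_cluster, subtree_child, gateways, parent, ea):
    """Hypothetical interior-cluster next-hop rule.

    subtree_child: dest -> child of this node whose subtree contains dest.
    gateways: neighbor-cluster ID -> boundary nodes reachable from this node.
    parent: this node's parent in the cluster tree (None at the CH).
    ea: node -> energy availability.
    Returns the next hop, or None when the CH must use exterior routing.
    """
    if dest in subtree_child:
        return subtree_child[dest]
    if gateways.get(dest_cluster):
        # Several boundary nodes lead to that cluster: prefer the highest EA.
        return max(gateways[dest_cluster], key=lambda b: ea[b])
    # Otherwise send the packet up the tree (None at the CH).
    return parent
```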

Exterior Cluster Routing Construction.
International Journal of Distributed Sensor Networks 7
We treat each cluster as a single node in the exterior cluster routing phase, and each CH node takes responsibility for its cluster. An existing geographic routing algorithm (e.g., GPSR or TPGF) runs among the clusters. As shown in Figure 3, each hexagon represents a cluster and the black nodes are CH nodes; normal nodes other than the source and the sink are omitted. When the source node sends a packet to the sink node, the source first runs interior cluster routing, and the packet is relayed to its CH. That CH then runs exterior cluster routing to decide which neighbor cluster should serve as the relay cluster, after which interior cluster routing carries the packet to the CH of the relay cluster.
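To illustrate the exterior phase, the sketch below runs only the greedy mode of GPSR on a graph whose vertices are clusters, each placed at its CH's position. Perimeter (face) routing, which GPSR uses to escape local minima, is deliberately omitted, so the sketch simply stops at a local minimum; the function name and data layout are hypothetical.

```python
import math

def greedy_cluster_path(src_cluster, sink_pos, ch_pos, cluster_adj):
    """Greedy geographic forwarding over the cluster graph (greedy mode only).

    ch_pos: cluster ID -> (x, y) of its CH.
    cluster_adj: cluster ID -> list of neighbor cluster IDs.
    Returns the sequence of relay clusters visited.
    """
    def dist(c):
        return math.dist(ch_pos[c], sink_pos)

    path = [src_cluster]
    current = src_cluster
    while True:
        best = min(cluster_adj[current], key=dist, default=None)
        # Stop when no neighbor cluster is closer to the sink (local minimum).
        if best is None or dist(best) >= dist(current):
            return path
        path.append(best)
        current = best
```

On a 3 × 3 grid of clusters spaced 100 m apart, with the sink at the far corner CH, the packet crosses four relay clusters. In k-OCHE, each hop in this path would itself be carried by interior cluster routing to the next CH.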
In general, as shown in Figure 3, the routing path of GPSR is straight, whereas the path of k-OCHE is sinuous and adapts to nodes' energy availability, which yields a balanced load distribution and, consequently, a longer network lifetime.

Routing Maintenance.
This phase begins when nodes change status because of duty-cycle scheduling. Route maintenance in our approach essentially reduces to cluster maintenance. After a topology change, every node holds complete cluster information in its two cluster tables. If all CH nodes have a consistent view of the topology, routing loops will not form. However, because of long propagation delays, network partitions, and so forth, some nodes may hold inconsistent topology information, which can create routing loops. These loops are short-lived, because they disappear within a bounded time (the time required to traverse the diameter of the network).
The new cluster information is propagated throughout the network. Between neighboring clusters, only the boundary nodes are responsible for broadcasting and rebroadcasting new information, which speeds up its dissemination across the network. As a result, the cluster-based protocol converges quickly. When a node in a cluster stops working, all its neighbors detect the event after a certain time. Within the cluster, only the failed node's parent updates its tables and reports the event to the CH. If the failed node has children, each child selects the neighbor with the highest EA as its new parent.
We illustrate this with an example, shown in Figures 4 and 5. Suppose a node disappears. The event is detected by its neighbors and by the CH. Neighbors that are not the failed node's parent simply update their tables to reflect the change. Here the failed node's parent is the CH itself, so the event does not need to be forwarded; otherwise, the parent would relay the event to the CH. The two children of the failed node must look for new parents, and each adopts its neighbor with the highest EA as its new parent.
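The repair step can be sketched as follows: each orphaned child picks its live neighbor with the highest EA. The function name and data layout are hypothetical, and the check that skips nodes in the child's own subtree is an added safeguard against forming a loop, not something the text specifies.

```python
def reselect_parents(failed, parent, adj, ea):
    """Give each orphaned child of a failed node a new parent (in place).

    parent: node -> parent in the cluster tree (None at the CH).
    adj: node -> set of neighbors; ea: node -> energy availability.
    """
    def in_subtree(v, root):
        # Walk up from v; reaching root means v hangs below root (loop risk).
        x = v
        while x is not None and x != failed:
            if x == root:
                return True
            x = parent[x]
        return False

    orphans = sorted(c for c, p in parent.items() if p == failed)
    for child in orphans:
        options = [v for v in adj[child]
                   if v != failed and not in_subtree(v, child)]
        parent[child] = max(options, key=lambda v: ea[v]) if options else None
    del parent[failed]
    return parent
```

For a cluster rooted at node 0 in which node 1 fails, its children 3 and 4 reattach to their highest-EA eligible neighbors (4 and 0 in the test below), and the tree stays loop-free.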

Experiment Setup.
To verify the correctness and effectiveness of the proposed k-OCHE algorithm, we conduct a detailed simulation using NetTopo [29]. In our simulation, the sensor field is 400 m × 400 m. The number of deployed sensor nodes ranges from 200 to 1000 in steps of 100. The value of k changes from 1 to … A source node is deployed at (50, 50), and a sink node is deployed at (350, 350). We use the GPSR routing protocol implemented in the routing layer of the simulator to deliver messages. All simulation parameters [30] are listed in Table 11. Figures 9(a), 9(b), 9(c), 9(d), 9(e), and 9(f) demonstrate the execution of k-OCHE.
In this section, we compare traditional GPSR with GPSR running on k-OCHE in terms of energy consumption, network lifetime, and recovery time.

Energy Consumption.
First, we compare the energy consumption of each sensor under traditional GPSR and k-OCHE-based GPSR, which reflects the lifetime of the network. Figure 6 reports the energy consumption of sensors under the two protocols after 600 routing queries have been processed, where the x- and y-axes give the location of each sensor node and the z-axis gives its energy consumption. Figure 6(a) shows that some sensors under traditional GPSR consume a lot of energy, especially those located along the two edges and the diagonal of the sensor field to which the data sink belongs. These energy-hungry sensors consume all 5 J, while sensors outside this region consume as little as 0.1 J after 600 queries. Clearly, energy consumption under traditional GPSR is very unbalanced. In contrast, the load under k-OCHE-based GPSR is well balanced, as shown in Figure 6(b), with no energy-intensive nodes. Under k-OCHE-based GPSR, the maximum energy consumption is 3 J and the minimum is 0.04 J. In other words, since no node spends more than 3 J on 600 queries, a 5 J budget lets the k-OCHE-based network process at least 1000 routing queries (600 × 5/3 = 1000), assuming consumption grows linearly with the number of queries.

Network Lifetime.
Next, we compare network lifetime, which is of particular interest to application scientists and system designers. We set the threshold that determines the ratio of active nodes to 90%. The comparison between traditional GPSR and k-OCHE is reported in Figure 7, where the x-axis is the initial energy of each sensor and the y-axis is the network lifetime. The figure shows that the network lifetime under traditional GPSR is about 1/6 of that under k-OCHE. Moreover, decreasing this threshold makes the gap between traditional GPSR and k-OCHE-based GPSR much wider. We therefore conclude that k-OCHE extends network lifetime substantially compared with traditional GPSR.
Figure 8 shows the impact of network size (node density) on the time needed to recover from an individual node failure for GPSR and k-OCHE. The x-axis is the network size and the y-axis is the recovery time. The recovery time of k-OCHE is much shorter than that of GPSR, especially for large network sizes. For small networks (network size < 30), k-OCHE needs slightly more recovery time because of cluster maintenance. For large networks, the recovery time of GPSR increases linearly with network size, while that of k-OCHE increases very slowly.

Conclusion
In this paper, we propose k-OCHE, a k-connected overlapping clustering approach with energy availability for routing and topology information maintenance in WSNs. We compared k-OCHE with the classical GPSR, and simulation results show that k-OCHE balances the load and thereby extends the lifetime of the sensor network. Moreover, k-OCHE achieves a shorter recovery time than GPSR, especially for large network sizes.