Energy-Efficient Chain Formation Algorithm for Data Gathering in Wireless Sensor Networks

In wireless sensor networks, sensor nodes are often deployed in inaccessible regions for data gathering, so they must operate for an assigned period without battery recharging or relocation. For this reason, there has been abundant research on improving energy efficiency. PEGASIS, one of the well-known chain-based routing protocols for improving energy efficiency, builds a chain using a greedy algorithm. However, some sensor nodes in a greedily formed chain have long communication distances, which causes unbalanced energy consumption among sensor nodes and eventually shortens the network lifetime. We propose an energy-efficient chain formation (EECF) algorithm to resolve the unbalanced energy consumption caused by the long-distance data transmission of some nodes in a chain formed by the greedy algorithm. Simulation results are used to verify the balance of energy consumption among sensor nodes and the overall network lifetime. The simulations show that EECF produces better results than the greedy algorithm.


Introduction
In wireless sensor networks (WSNs), sensor nodes equipped with sensing, computing, and communication functions gather data (e.g., temperature, humidity, infrared light, sound, shock, and pressure). The data is then transmitted to the sink, which is connected to an external network [1]. However, depending on the type of application, sensor nodes are commonly distributed in inaccessible regions, and the sink is located far away from them. For this reason, sensor nodes with limited battery resources need to operate for the assigned time without battery recharging or relocation [2]. If each node transmits its data directly to the sink, nodes that are far away from the sink will die much earlier than the other sensor nodes, because long-distance data transmission rapidly depletes their energy. Consequently, data can no longer be gathered from certain regions, which limits the use of WSNs. Thus, more effective use of energy is the major challenge in WSNs [3]. To improve energy efficiency, many researchers have suggested various routing algorithms [4][5][6][7][8][9][10][11][12].
Among these routing algorithms, power-efficient gathering in sensor information systems (PEGASIS) [12] is one of the well-known chain-based routing protocols. In PEGASIS, a chain is formed using the greedy algorithm. One sensor node is randomly selected as the leader in each round, and sensor nodes transmit data to their neighbor along the chain toward the leader node. During data gathering, every sensor node except the end nodes of the chain fuses its own data with the data received from its neighbor. The leader node typically receives data from both of its neighbors and transmits the fused data to the sink. Sensor nodes communicate only with their neighbors in the chain and take turns serving as the leader node, which reduces their energy consumption and balances energy consumption per round. However, PEGASIS with the greedy algorithm still has a weakness: because sensor nodes already in the chain cannot be revisited (to prevent loops), the distance between neighbors gradually increases as the chain grows. As a result, some sensor nodes must transmit over long distances and consume much more energy to send fused data to their neighbor. Because sensor nodes in WSNs have limited battery resources, this unbalanced energy consumption shortens the network lifetime. Hence, energy consumption must be balanced among sensor nodes to extend the network lifetime.
To achieve this, instead of using the greedy algorithm in PEGASIS, we propose an energy-efficient chain formation (EECF) algorithm based on in-order traversal of a hierarchical tree built with Buttenfield's strip tree geometry algorithm [13,14]. EECF extends the network lifetime in two ways. The first is the computation of the hierarchical tree by the strip tree geometry algorithm. In this computation, binary subtrees are computed via recursive subdivision of the network field. Therefore, the three nodes in each subtree (i.e., the parent node and its left and right child nodes) lay the foundation for communicating within their small region. The second is chain formation using the in-order tree traversal algorithm. During chain formation, the nodes within each subtree are connected in the order left child → parent node → right child, and each subtree is connected to its nearest subtree. In summary, since each subtree corresponds to a small region of the network field, sensor nodes can gather data by communicating with nearby neighbors within small regions, without the need to communicate over long distances.
We used the OMNeT++ simulator to evaluate performance. The simulation results are used to verify the balanced energy consumption of sensor nodes and the overall network lifetime. The remaining sections of this paper are organized as follows. Section 2 presents the PEGASIS protocol and PEGASIS-related research, and Section 3 briefly describes the strip tree geometry algorithm. In Section 4, we describe the EECF algorithm, which extends the network lifetime of PEGASIS via balanced energy consumption among sensor nodes. The simulation results for performance verification are described in Section 5. Finally, Section 6 concludes the paper and discusses our future work.

PEGASIS Protocol and Related Research
PEGASIS [12] is a well-known chain-based routing protocol for data gathering. In PEGASIS, immobile sensor nodes are randomly distributed in the network field, as shown in Figure 1. The sink is fixed and located far from the sensor nodes. Using the global positioning system (GPS), each node knows its own location and the locations of its neighbor nodes. All sensors are homogeneous and energy-constrained, with the same initial energy. These sensor nodes can control their transmission power and communicate directly with the other nodes or the sink.

Chain Formation.
Before the first round, a chain is formed using a greedy algorithm, starting from the node farthest from the sink. During chain formation, each sensor node selects the nearest node not yet in the chain as the next node. Each node has a unique index in the chain and, as the randomly distributed sensor nodes in Figure 2 show, this chain index differs from the identification number of the node. In Figure 2, the sensor node farthest from the sink is node-1, so chain formation starts from node-1, and the chain index of node-1 becomes 0 (i.e., C0). Following the greedy algorithm, node-1 (C0) connects to its nearest node, node-3, whose chain index is the chain index of node-1 increased by 1. In PEGASIS, sensor nodes already contained in the chain cannot be selected as the next node, to prevent a loop. For this reason, node-1 and node-3 are excluded from the list of candidates for the next node of node-3. This process is repeated until all the sensor nodes are included in the chain.
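The greedy procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `greedy_chain`, the dictionary layout, and the example coordinates are hypothetical and are not taken from PEGASIS or from Figure 2.

```python
import math

def greedy_chain(nodes, sink):
    """Return node ids in chain order (position in the list = chain index).

    nodes: dict mapping node id -> (x, y); sink: (x, y).
    Illustrative sketch only; names and data layout are assumptions.
    """
    remaining = dict(nodes)
    # The chain starts from the node farthest from the sink (chain index C0).
    current = max(remaining, key=lambda n: math.dist(remaining[n], sink))
    chain = [current]
    pos = remaining.pop(current)
    while remaining:
        # Nodes already in the chain were popped, so they cannot be revisited.
        nxt = min(remaining, key=lambda n: math.dist(remaining[n], pos))
        chain.append(nxt)
        pos = remaining.pop(nxt)
    return chain

# Hypothetical coordinates, chosen only to exercise the function.
example = {1: (0, 0), 2: (10, 0), 3: (1, 0), 4: (11, 0)}
print(greedy_chain(example, sink=(50, 300)))
```

Because each step only looks at the nodes still outside the chain, the last links of the chain can become long, which is the weakness discussed in this paper.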

Data Gathering.
Once chain formation is completed, one node is randomly selected in each round as the leader that transmits the fused data to the sink. The leader node sends a small token to the end nodes of the chain to start data gathering. Through token passing, the data held by each node is transmitted along the chain toward the leader node. Every sensor node except the end nodes fuses its own data with the data received from one neighbor and then transmits the fused data to its other neighbor along the chain, as shown in Figure 3. The leader node typically receives data from both of its neighbors and transmits the fused data to the sink, which is located far away from the sensor nodes.
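To make the direction of data flow concrete, the following sketch lists the transmissions of one round for a given chain and leader position. The function name `round_transmissions` and the string label "sink" are hypothetical, and token passing itself is not modeled.

```python
def round_transmissions(chain, leader_idx):
    """Return (sender, receiver) pairs for one data-gathering round.

    chain: node ids in chain order; leader_idx: position of the leader.
    Illustrative sketch; token passing is not modeled.
    """
    hops = []
    # Nodes before the leader forward fused data toward higher chain indices.
    for j in range(leader_idx):
        hops.append((chain[j], chain[j + 1]))
    # Nodes after the leader forward fused data toward lower chain indices.
    for j in range(len(chain) - 1, leader_idx, -1):
        hops.append((chain[j], chain[j - 1]))
    # The leader sends the fused result to the sink.
    hops.append((chain[leader_idx], "sink"))
    return hops

print(round_transmissions([1, 3, 2, 4], leader_idx=2))
```

Every node transmits exactly once per round, so the energy a node spends is dominated by the length of its single outgoing link.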

PEGASIS-Related Research.
For PEGASIS, various algorithms [15][16][17][18][19] have been proposed in different contexts to increase the network lifetime in WSNs. The algorithms proposed in [15][16][17] basically focused on the efficient use of energy for gathering data. The focus of investigation has since gradually shifted to the importance of the leader node, which transmits data to the base station (BS, i.e., the sink) [18,19]. In [15], the authors formed multiple chains using the greedy algorithm to gather data efficiently in a chain-oriented sensor network. In each chain of the different levels, a leader node is selected based on the remaining energy. The leader in the higher-level chain receives data from the other leaders and then transmits it to the BS. The authors of [16] proposed the concentric clustering scheme, which considers the network density and the location of the BS to enhance performance and prolong the lifetime. In each circle of the different levels, sensor nodes form a chain using a greedy algorithm. One node is selected as the header of each chain, and one of these headers transmits the aggregated data to the BS. Diamond-shaped (DS) PEGASIS [17] has extended the concentric clustering scheme [16] to transmit data reliably. In the chain of each circle, one or two nodes are selected as header nodes according to the level. Aggregated data is transmitted to the BS along the selected header nodes, which form diamond-shaped structures. The authors of [18] proposed diverse strategies for leader selection. In a chain (formed using the greedy algorithm), a leader node is selected by a random, shuffle, or high-energy strategy, or in an energy-aware manner within a block randomly chosen among 2 or 4 blocks. Lim et al. [19] proposed a way to deal with the transmission failure of a leader node in order to raise energy efficiency. When the leader node of a round cannot transmit the aggregated data to the BS, data loss occurs. In that case, one of the leader node's neighbors is selected based on residual energy and transmits the aggregated data to the BS without loss of data.
International Journal of Distributed Sensor Networks
As described above, most PEGASIS-related algorithms, including PEGASIS itself, use the greedy algorithm. In this case, the distance between neighbors gradually increases, since sensor nodes already in the chain cannot be revisited, to prevent loops. Consequently, some sensor nodes must transmit data over long distances, thereby consuming much more energy to transmit fused data to their neighbors. This unbalanced energy consumption shortens the network lifetime. In this paper, we propose an energy-efficient chain formation (EECF) algorithm that extends the network lifetime via balanced energy consumption. To our knowledge, this paper is the first study based on both algorithms (the strip tree geometry algorithm and the in-order tree traversal algorithm). The simulation results are used to verify the balanced energy consumption of sensor nodes and the overall network lifetime.

Buttenfield's Strip Tree Geometry Algorithm
Buttenfield's strip tree geometry algorithm [13] is one of the vector-based algorithms used to generalize spatial data in geographic information system (GIS) environments. In [14], an algorithm for the transmission of vector geospatial data is developed. In the EECF algorithm, we use a hierarchical tree computed with the strip tree geometry algorithm. The network field is recursively subdivided by the strip tree geometry algorithm, and each recursive subdivision adds a subtree to the hierarchical tree. Each node communicates with its near neighbors in the subtree. The hierarchical subdivision by the strip tree geometry algorithm is explained below. In Figure 4, the initial vector points of the strip tree geometry algorithm are points 1 and 27. The length of the first anchor line is measured as the distance between the two endpoints (i.e., coordinates $(x_1, y_1)$ and $(x_{27}, y_{27})$) by the following formula:
$$d = \sqrt{(x_{27} - x_1)^2 + (y_{27} - y_1)^2}.$$
The anchor line length is used to standardize the minimum bounding rectangle (MBR), which is the rectangle bounding the line segment. The point $(x_{15}, y_{15})$ at the maximum perpendicular distance from the line connecting points 1 and 27 becomes the first vector point. This vector point 15 is stored in the hierarchical strip tree. When the perpendicular distance of a point is less than the threshold, that point is removed. The maximum perpendicular distance of point 15 defines two fields ("left-strip" and "right-strip"), as shown in Figure 4. This process continues until the preset recursion count is reached.
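The subdivision just described can be illustrated with a short recursive sketch. It is not Buttenfield's implementation: the function names, the nested-tuple tree representation, and the example polyline are assumptions made only to show how the anchor line, the maximum perpendicular distance, the threshold, and the recursion count interact.

```python
import math

def anchor_length(a, b):
    """Distance between the two endpoints of an anchor line."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def perpendicular_distance(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    length = anchor_length(a, b)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length

def strip_tree(points, lo, hi, depth, threshold):
    """Return (index, left_strip, right_strip) for the retained vector point,
    or None when the recursion count is used up, no interior point exists,
    or the farthest point lies closer to the anchor line than the threshold."""
    if depth == 0 or hi - lo < 2:
        return None
    best = max(range(lo + 1, hi),
               key=lambda i: perpendicular_distance(points[i], points[lo], points[hi]))
    if perpendicular_distance(points[best], points[lo], points[hi]) < threshold:
        return None
    return (best,
            strip_tree(points, lo, best, depth - 1, threshold),
            strip_tree(points, best, hi, depth - 1, threshold))

# Hypothetical polyline with five points (not the 27 points of Figure 4).
line = [(0, 0), (1, 1), (2, 3), (3, 1), (4, 0)]
print(strip_tree(line, 0, len(line) - 1, depth=3, threshold=0.1))
```

Raising the threshold prunes points that lie close to their anchor line, while the depth argument plays the role of the preset recursion count.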

Our EECF Algorithm
In order to balance energy consumption in PEGASIS, we propose the EECF algorithm for chain formation. In EECF, when a sensor node receives a chain formation message, it computes a hierarchical tree using the strip tree geometry algorithm and then transmits the message to the next node, which is selected by the in-order tree traversal algorithm. This is repeated until all the sensor nodes are included in the chain. After chain formation, as in PEGASIS with the greedy algorithm, one node is randomly selected in each round as the leader that transmits the fused data to the sink. Through token passing, each node transmits fused data along the chain toward the leader node. The leader node transmits the data received from its neighbors to the sink.

Basic Assumptions.
In our sensor network model for simulation, the network field is a two-dimensional area of 100 m × 100 m. Immobile sensor nodes with the same capability are randomly distributed in the network field. The sink is fixed at a location far from the sensor nodes, (50, 300). For a fair evaluation, we use the same parameters and the same environment as PEGASIS, and we make the following assumptions to form a chain using the EECF algorithm. Each node has global knowledge of the network field (i.e., the locations of the sensor nodes and of the sink) and can communicate directly with the other nodes or the sink. All sensor nodes have the same, limited initial energy.

Hierarchical Tree Computation.
First, we divide the network field using a straight line that connects a start node and an end node. The start node is the node farthest from the sink, and the node with the longest communication distance from this start node becomes the end node. Thus, in Figure 5, when the sink is located at (50, 300), nodes 7 and 8 are the start and end nodes, respectively. The network field is divided into two regions by the straight line $L_{7-8}$ that connects node 7 $(x_7, y_7)$ and node 8 $(x_8, y_8)$. The equation of the straight line $L_{7-8}$ is computed as follows:
$$y - y_7 = \frac{y_8 - y_7}{x_8 - x_7}\,(x - x_7).$$
If the two regions are denoted $R_7$ and $R_8$, both regions have to include the start and end nodes. Therefore, the member nodes of regions $R_7$ and $R_8$ are the following: $R_7$ = {node 1, node 3, node 5, node 7, node 8}, $R_8$ = {node 2, node 4, node 6, node 7, node 8, node 9}.
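A minimal sketch of this first step is given below. It selects the start and end nodes and assigns every other node to one side of the line joining them, using the sign of $ax + by + c$. The function name `split_field`, the handling of nodes lying exactly on the line, and the example coordinates are assumptions; the coordinates are not those of Figure 5.

```python
import math

def split_field(nodes, sink):
    """Split nodes into two regions by the line through the start and end nodes.

    nodes: dict id -> (x, y). Returns (start, end, region_a, region_b);
    both regions contain the start and end nodes. Illustrative sketch only.
    """
    # Start node: farthest from the sink. End node: farthest from the start node.
    start = max(nodes, key=lambda n: math.dist(nodes[n], sink))
    end = max(nodes, key=lambda n: math.dist(nodes[n], nodes[start]))
    (xs, ys), (xe, ye) = nodes[start], nodes[end]
    # Line through start and end written as a*x + b*y + c = 0.
    a, b, c = ye - ys, xs - xe, xe * ys - xs * ye
    region_a, region_b = {start, end}, {start, end}
    for n, (x, y) in nodes.items():
        if n in (start, end):
            continue
        # Assumption: nodes exactly on the line are placed in region_b.
        (region_a if a * x + b * y + c > 0 else region_b).add(n)
    return start, end, region_a, region_b

# Hypothetical coordinates inside a 100 m x 100 m field.
field = {5: (50, 60), 6: (50, 5), 7: (0, 0), 8: (100, 20)}
print(split_field(field, sink=(50, 300)))
```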
The second step decides the root node of the hierarchical tree of each region. In the first step, the network field was divided into regions $R_7$ and $R_8$. In $R_7$ and $R_8$, nodes 5 and 6, respectively, are located at the longest vertical (perpendicular) distance from the straight line $L_{7-8}$. Thus, these two nodes become the root nodes of the two hierarchical trees, as shown in Figure 6. If the equation of the straight line $L_{7-8}$ is $ax + by + c = 0$, the vertical distance $V_5$ of node 5 $(x_5, y_5)$ can be calculated as follows:
$$V_5 = \frac{|a x_5 + b y_5 + c|}{\sqrt{a^2 + b^2}}.$$
The third step subdivides the network field by the root nodes. In Figure 5, region $R_7$ is divided into two subregions by the vertical line $V_5$, and region $R_8$ is likewise divided by $V_6$. Ultimately, the network field is divided into four regions (i.e., $R_{7-5}$, $R_{7-6}$, $R_{8-5}$, and $R_{8-6}$), and nodes 5, 6, 7, and 8 are connected by four new straight lines ($L_{7-5}$, $L_{7-6}$, $L_{8-5}$, and $L_{8-6}$), as shown in Figure 5.
The fourth step adds child nodes to the hierarchical tree. In the four regions ($R_{7-5}$, $R_{7-6}$, $R_{8-5}$, and $R_{8-6}$), the sensor nodes within each region calculate their own vertical distance from the corresponding straight line ($L_{7-5}$, $L_{7-6}$, $L_{8-5}$, and $L_{8-6}$). As a result, nodes 1, 2, 3, and 4, which are located at the longest vertical distance, are added as child nodes of root nodes 5 and 6, as shown in Figure 6. Regions $R_{7-5}$, $R_{7-6}$, $R_{8-5}$, and $R_{8-6}$ are each divided once more by the vertical lines of nodes 1, 2, 3, and 4; that is, the network field is divided into eight regions. In each of the eight regions, every node calculates its own vertical distance, and the sensor node located at the longest vertical distance is added to the hierarchical tree. Once all the sensor nodes except the start node (node 7) and the end node (node 8) have been added to the hierarchical tree, these two nodes are added last. Through the above computation process, the network field in the second panel of Figure 5 yields the two hierarchical tree structures shown in Figure 6. In this hierarchical tree, subtrees are added via recursive subdivision of the network field. Sensor nodes communicate with each other within each subtree, which lays the foundation for communicating within a small region.
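One possible reading of the recursive subdivision for a single region is sketched below. It is an interpretation, not the authors' implementation: the node with the longest vertical distance from the current line becomes the root of the subtree, the remaining nodes are split by their projection along that line (standing in for the vertical line through the root), and each half is processed against the new line to the root. The names `perp_dist` and `build_tree`, the tuple representation, and the coordinates are hypothetical, and attaching the start and end nodes at the end is left out.

```python
import math

def perp_dist(p, a, b):
    """Vertical (perpendicular) distance of p from the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / length

def build_tree(region, a, b, pos):
    """Return (root, left_subtree, right_subtree) for the node ids in region.

    a, b: ids of the nodes defining the current straight line; pos: id -> (x, y).
    Assumed interpretation of the subdivision; start/end nodes are excluded.
    """
    if not region:
        return None
    root = max(region, key=lambda n: perp_dist(pos[n], pos[a], pos[b]))
    (ax, ay), (bx, by) = pos[a], pos[b]

    def along(n):
        # Projection of node n onto the direction a -> b (unnormalized).
        return (pos[n][0] - ax) * (bx - ax) + (pos[n][1] - ay) * (by - ay)

    left = [n for n in region if n != root and along(n) < along(root)]
    right = [n for n in region if n != root and along(n) >= along(root)]
    return (root, build_tree(left, a, root, pos), build_tree(right, root, b, pos))

# Hypothetical coordinates; ids mimic region R_7 but are not taken from Figure 5.
pos = {1: (20, 30), 3: (80, 25), 5: (50, 40), 7: (0, 0), 8: (100, 0)}
print(build_tree([1, 3, 5], 7, 8, pos))
```

With these example coordinates the sketch yields root 5 with children 1 and 3, the same shape the text describes for the tree of region $R_7$.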

Chain Formation.
We use the in-order tree traversal algorithm to form an energy-efficient chain on the hierarchical tree. In-order traversal first visits the left subtree, starting from the lowest-level node, then visits the root node (i.e., the parent node of the left subtree), and lastly visits the right subtree. In our algorithm, chain formation starts from the start node because the start node is located in the lowest-level left subtree, as shown in Figure 6. The start node selects as its next node the nearer of nodes 1 and 4, which is node 1. From node 1, sensor nodes 5, 3, and 8 join the chain in in-order traversal order, and in the hierarchical tree of region $R_8$, chain formation continues in the same manner from the lowest-level left subtree (node 8), as shown in Figure 7. Figure 8 shows a direct comparison between the two chains formed by the greedy algorithm and the EECF algorithm under identical conditions for data gathering. In both algorithms, chain formation starts from node 7, the node farthest from the sink (50, 300). With the greedy algorithm, the next node of node-7 is its nearest node, node-1, and the chain index of node-1 is the chain index of node-7 increased by 1. This process is repeated until all the sensor nodes are included in the chain. In this chain, there is a long communication distance between nodes 4 and 8. This is because node-2 is selected as the next node of node-3, since it is the nearest, and sensor nodes already in the chain cannot be revisited, to prevent a loop. The resulting greedy chain is shown in Figure 8. For comparison, the chain formed by the EECF algorithm is illustrated in Figure 7 on the same network field (100 m × 100 m).
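The traversal order can be checked with a small sketch. The nested-tuple tree below is an assumed encoding of the region $R_7$ tree described in the text (root 5, children 1 and 3, with start node 7 and end node 8 attached last as the outermost leaves); the exact shape of Figure 6 is not reproduced here.

```python
def in_order(tree):
    """In-order traversal: left subtree, then the node, then the right subtree.

    tree: None or a tuple (node, left_subtree, right_subtree).
    """
    if tree is None:
        return []
    node, left, right = tree
    return in_order(left) + [node] + in_order(right)

# Assumed encoding of the hierarchical tree of region R_7.
tree_r7 = (5,
           (1, (7, None, None), None),   # start node 7 under node 1
           (3, None, (8, None, None)))   # end node 8 under node 3
print(in_order(tree_r7))
```

Under this encoding the traversal visits the nodes in the order 7, 1, 5, 3, 8, which matches the chain segment described above for region $R_7$.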

Simulation Environment.
We used the OMNeT++ simulation tool to evaluate the performance of the EECF algorithm and two other algorithms [12,16]. In this section, our EECF algorithm is called "EECF-PEGASIS," and the algorithms in [12,16] are called "original-PEGASIS" and "enhanced-PEGASIS," respectively. In original-PEGASIS and enhanced-PEGASIS, the data held by each node is gathered along a chain formed by the greedy algorithm. In original-PEGASIS, after a single chain is formed, the leader node gathers data via token passing and transmits the fused data to the sink. One round is completed when the sink receives the data. By contrast, enhanced-PEGASIS forms a chain in each level. In our simulation, we divide the network field into five levels for chain formation, and the number of sensor nodes in each level is set to be the same, according to the total number of sensor nodes. One node is selected as the header of each chain and, among these headers, the one in the low level transmits the aggregated data to the BS. In EECF-PEGASIS, sensor nodes transmit a chain formation message to their next node along the in-order traversal path of the computed hierarchical tree. Once chain formation is completed, one leader node transmits the gathered data to the sink, as in original-PEGASIS. We measured the performance in three scenarios that differ in the location of the sink: (50, 200), (50, 300), and (50, 400). In all scenarios, the total number of sensor nodes increases from 50 to 250 (50, 100, 150, 200, and 250). For a fair evaluation, when the total number of sensor nodes is the same, the randomly distributed sensor nodes are placed at the same locations for all algorithms. Using these scenarios, we can compare the performance of the three algorithms as the number of sensor nodes increases, as well as according to the location of the sink. In our simulation, there is only one sink, which has unlimited energy. Sensor nodes are randomly distributed in a 100 m × 100 m field, and their locations are fixed after distribution. All the sensor nodes are homogeneous and have the same capability. Each of them can adjust its transmission range, and its initial energy is limited to 1.0 J. Owing to data fusion, the size of the data that moves from node to node in the chain is always the same. The energy consumption for data fusion is 5 nJ/bit/message. We do not consider token passing energy, computation energy, or delay time.

Table 1: Radio model.
Radio model     Formulas
Transmitting    $E_{TX}(k, d) = E_{elec} \cdot k + \varepsilon_{amp} \cdot k \cdot d^2$
Receiving       $E_{RX}(k) = E_{elec} \cdot k$

Simulation Measurements.
In order to analyze performance accurately, we define three measurements: the longest communication distance, the network lifetime, and the average remaining energy. Our purpose in this paper is to extend the network lifetime via balanced energy consumption among sensor nodes. We measured until the energy of the first sensor node was exhausted. Firstly, the longest communication distance is the longest distance between any two neighboring nodes in the chain. According to the formulas in Table 1, transmission energy grows with the square of the distance, so sensor nodes with long links consume much more energy to transmit data. Since communication distance has such a vital effect on energy consumption, measuring the long-distance data transmission of certain nodes is critical for balancing energy consumption among sensor nodes. Secondly, the network lifetime is the total number of rounds, which should be maximized. The reason is that, in WSNs, all the sensor nodes need to collect data and transmit it to the sink during the assigned time without battery recharging. Therefore, it is important to keep all the sensor nodes alive for as long as possible. Lastly, the average remaining energy is used to analyze the balance of energy consumption. We define this measurement as the sum of the remaining energy of all the sensor nodes divided by the total number of sensor nodes in each simulation. A large value indicates unbalanced energy consumption, because the remaining energy of one sensor node is almost zero while the other nodes still hold high energy. Therefore, in contrast to the network lifetime, the average remaining energy should be minimized to balance energy consumption.
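Two of these measurements are simple enough to state directly in code. The sketch below is illustrative only; the function names, the data layout, and the example values are hypothetical and are not taken from the simulation.

```python
import math

def longest_communication_distance(chain, pos):
    """Longest distance between two neighboring nodes in the chain."""
    return max(math.dist(pos[a], pos[b]) for a, b in zip(chain, chain[1:]))

def average_remaining_energy(energy):
    """Sum of the remaining energy of all nodes divided by the number of nodes.

    energy: dict id -> remaining energy in joules, sampled when the first
    node is exhausted.
    """
    return sum(energy.values()) / len(energy)

# Hypothetical chain and energy levels.
pos = {1: (0, 0), 2: (10, 0), 3: (1, 0), 4: (11, 0)}
print(longest_communication_distance([1, 3, 2, 4], pos))
print(average_remaining_energy({1: 0.0, 2: 0.5, 3: 1.0}))
```

A chain whose longest link is short and whose average remaining energy is low at the first node death has spent its energy evenly, which is the behavior EECF aims for.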

Radio Model.
As in [15][16][17][18][19], we apply the same radio model used in PEGASIS [12] for transmitting and receiving messages. In Table 1, the first formula, $E_{TX}(k, d)$, gives the energy consumed to transmit a k-bit message over a distance d, assuming a $d^2$ energy loss due to channel transmission. The second formula, $E_{RX}(k)$, gives the energy consumed to receive a k-bit message.
In order to compare performance under the same simulation conditions as PEGASIS, we define the parameters in Table 2. $E_{elec}$ is the energy that the transmitter or receiver circuitry needs to run. The energy required by the transmitter amplifier is 100 pJ/bit/m$^2$, k is the message size in bits, and d is the distance in meters between the transmitter and the receiver node.
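The radio model can be turned into a per-round energy calculation, sketched below. Several values are assumptions and are labeled as such: the text gives the amplifier energy (100 pJ/bit/m²) and the fusion energy (5 nJ/bit/message), whereas the electronics energy of 50 nJ/bit and the message size of 2000 bits are values commonly used with this radio model and are not stated in this section. Charging the fusion cost once per received message is also an assumption.

```python
import math

E_ELEC = 50e-9     # J/bit, ASSUMED (the Table 2 value is not given in the text)
EPS_AMP = 100e-12  # J/bit/m^2, transmitter amplifier energy from the text
E_FUSE = 5e-9      # J/bit/message, data fusion energy from the text

def e_tx(k, d):
    """Energy to transmit a k-bit message over distance d (d^2 loss)."""
    return E_ELEC * k + EPS_AMP * k * d ** 2

def e_rx(k):
    """Energy to receive a k-bit message."""
    return E_ELEC * k

def round_cost(chain, pos, sink, leader_idx, k=2000):
    """Energy spent by each node in one round (k = 2000 bits is ASSUMED).

    Each node receives and fuses the messages of its upstream neighbors, then
    transmits one k-bit message toward the leader (the leader sends to the sink).
    """
    cost = {}
    last = len(chain) - 1
    for i, node in enumerate(chain):
        if i == leader_idx:
            receives = (i > 0) + (i < last)  # one message from each existing neighbor
            target = sink
        else:
            receives = 0 if i in (0, last) else 1
            target = pos[chain[i + 1] if i < leader_idx else chain[i - 1]]
        cost[node] = receives * (e_rx(k) + E_FUSE * k) + e_tx(k, math.dist(pos[node], target))
    return cost

# Hypothetical three-node chain with the middle node as leader.
pos = {1: (0, 0), 2: (10, 0), 3: (20, 0)}
print(round_cost([1, 2, 3], pos, sink=(10, 100), leader_idx=1))
```

Because the amplifier term scales with the square of the distance, doubling the length of a link quadruples that part of the sender's cost, which is why the longest link in the chain matters so much for the first node death.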

Simulation Results.
We evaluated the network lifetime of the three algorithms (i.e., original-PEGASIS, enhanced-PEGASIS, and EECF-PEGASIS) for different sink locations until the energy of the first sensor node was exhausted. In each simulation, the total number of sensor nodes was increased from 50 to 250. Figure 9 shows the longest distance between any two neighboring nodes in the chain. In EECF-PEGASIS, this longest distance was about 40% shorter than with the greedy algorithm. According to Table 1, transmission incurs a $d^2$ energy loss with distance; in other words, communication distance is closely related to the energy consumption of each node. Since the higher energy consumption of some nodes caused by long-distance data transmission is a very important factor affecting the overall network lifetime, the longest communication distance needs to be minimized.
In Figures 10, 11, and 12, EECF-PEGASIS shows better performance than original-PEGASIS and enhanced-PEGASIS. In the EECF algorithm, since each subtree in the hierarchical tree corresponds to the smallest unit region of the network field, sensor nodes can communicate with their nearest nodes within each subtree. Furthermore, from the viewpoint of the whole hierarchical tree, each subtree is connected to its nearest subtree through the in-order tree traversal algorithm. As a result, each node gathers data by communicating with its neighbors within small regions, without long-distance data transmission. For these reasons, the network lifetime increased for all sink locations, as shown in Figures 10, 11, and 12. In addition, because all the simulations were run with the same network size (100 m × 100 m), each algorithm shows a similar trend across Figures 10, 11, and 12.
We also evaluated the average remaining energy for different numbers of sensor nodes and different sink locations until the energy of the first sensor node was exhausted. When we compare the three algorithms in Figures 13, 14, and 15, EECF-PEGASIS shows lower values than original-PEGASIS and enhanced-PEGASIS. The average remaining energy is the average of the remaining energy values of the nodes that are still alive, that is, all nodes except the exhausted one. Thus, a large value indicates unbalanced energy consumption. Therefore, the average remaining energy needs to be minimized to balance energy consumption.

Conclusion
In this paper, we have proposed an energy-efficient chain formation (EECF) algorithm to resolve the unbalanced energy consumption caused by the long communication distance of some sensor nodes in PEGASIS with the greedy algorithm. EECF uses two algorithms: Buttenfield's strip tree geometry algorithm and the in-order tree traversal algorithm. To evaluate performance, we measured three metrics until the first sensor node failed. The first is the longest communication distance between any two neighboring nodes in the chain, the second is the network lifetime, and the third is the average remaining energy. The simulation results show that the EECF algorithm performs well in terms of both the balance of energy consumption and the overall network lifetime. In future work, we plan to combine our algorithm with the greedy algorithm to reduce the total communication distance and the average distance in a chain.