A Collaborative Data Gathering Mechanism Based on Fuzzy Decision for Wireless Sensor Networks

Despite the increasing demand for sensing services and applications, there is still no clear understanding of how collaborative techniques should be used to design collaborative protocols for wireless sensor networks. This paper proposes a collaborative data gathering mechanism based on fuzzy decision for wireless sensor networks. The proposed algorithm integrates several key parameters, namely, nodes' residual energy level, the number of neighbors, centrality degree, and distance to the sink, into the fuzzy decision. Numerical and simulation results show that the proposed algorithm finds the optimum cluster heads and achieves better cluster distribution and improved energy efficiency.


Introduction
In recent years, wireless sensor networks (WSNs) have seen rapid development in many applications, such as environmental monitoring, target tracking, outer space exploration, and industrial automation [1]. Despite the increasing demand for sensing services and applications, there is still no clear understanding of how collaborative techniques should be used to design collaborative protocols for WSNs. Notably, through collaboration WSNs can organize efficiently, prolong system lifetime, and handle dynamics, all with the final goal of reliably executing multiple user applications. Since sensor nodes are expected to be remotely deployed and are usually equipped with limited power, it is inconvenient or even impossible to replenish their power. Energy conservation and collaborative processing have therefore become the most important challenges in the design of wireless sensor networks. The design should take the power limitation into consideration and incorporate techniques to maximize the network's lifetime [2,3].
The main task of a sensor node in the monitoring field is to detect events, perform quick local data processing, and transmit the data to the sink node. Among these processes, routing and data gathering are the major considerations in designing the pattern and operation modes of WSNs. One routing approach that achieves significant energy efficiency is cluster routing [4], and cluster organization is now widely used in wireless sensor networks. The advantages of using clustering techniques for data collection are as follows: (1) the cluster head (CH) node is responsible for long-distance transmission and forwarding of data, which preserves the coverage area of the original data communications; (2) the cluster members can switch off their communication units whenever necessary, so that the network's energy consumption can be effectively reduced; (3) the clustering topology is simple and easy to manage and is conducive to the application of distributed algorithms; (4) the protocol for communication between nodes is adaptive and relies only on neighbor node information to decide whether to join a cluster or become a cluster head, so that network routing control information can be reduced; (5) the data gathered from all members of the cluster is aggregated by the cluster head, which can greatly reduce the amount of network traffic. This paper considers the remaining energy level of sensor nodes, the centrality degree, and the number of neighbor nodes and determines the weight of each attribute for cluster head selection by fuzzy decision. This effectively increases the probability that nodes with a high level of residual energy become cluster heads and ensures that the distribution of cluster heads is more uniform and reasonable.
To reduce the communication overhead, data aggregation methods have been introduced in wireless sensor networks. In a sensor network with high node density, there may be a large amount of redundant data from nodes that detect objects in the same domain [5]. According to the relevance of the sensing data, an error function and a fuzzy correlation function can be designed to acquire a comprehensive support degree among the nodes; it is then possible to determine the reliability of each sensor and improve the precision of the data fusion result. Improving the accuracy of data gathering at the cluster head in this way can effectively reduce the amount of intracluster data and improve the energy efficiency [6].
This paper specifically focuses on exploring an energy-efficient data gathering mechanism for collaborative WSNs to make WSN-based applications more reliable and effective in industry-related scenarios. A collaborative data gathering mechanism based on fuzzy decision (DGM-FD) is proposed, aiming at distributing the load evenly among all sensor nodes. To reach this objective, several critical parameters, such as nodes' residual energy level, the number of neighbors, centrality degree, and distance to the sink, are taken into account during the cluster head competition. Numerical and simulation results show that the proposed algorithm can effectively find the optimum cluster heads and improves both the distribution of clusters and the energy efficiency of the network.
The specific contributions of this paper include the following: (i) a literature survey of existing data gathering algorithms and an analysis of their advantages and disadvantages, (ii) an effective collaborative data gathering mechanism based on fuzzy decision for wireless sensor networks, (iii) an algorithm (DGM-FD) for collaborative data gathering in wireless sensor networks, and (iv) a performance analysis of the proposed algorithm and an evaluation of it against other existing algorithms.
The rest of the paper is organized as follows. We present the related works in Section 2. The energy-efficient data gathering mechanism is formally discussed in Section 3. The proposed DGM-FD model is introduced in detail in Section 4. In Section 5 the proposed mechanism is evaluated and finally we conclude in Section 6.

Related Works
In this section, we present a review of the recent developments of collaborative wireless sensor networks. The focus of attention varies from application specific detection to enhancement of middleware. In order to gather information more efficiently in energy consumption, clustering algorithm is introduced into the applications of wireless sensor networks. In WSNs, nodes can be partitioned into a number of small groups called clusters. Each cluster has a coordinator, referred to as a cluster head, and a number of member nodes. The cluster heads are responsible for aggregating the collected data and forwarding it to the base station (BS) through other cluster heads in the network. By rotating cluster heads periodically, the energy consumption of the sensor nodes over the network can be balanced.
Several WSNs applications require only an aggregate value be reported to the observer. In this case, sensors in different regions of the field can collaborate to aggregate their data and provide more accurate reports about their local regions [7]. Data aggregation reduces the communication overhead in the network, leading to significant energy savings.
Meghanathan proposed two distributed algorithms to construct (i) stable predicted link expiration time-based data gathering (LET-DG) trees that also incur lower delay per round as well as larger throughput per tree and (ii) energy-efficient minimum-distance spanning tree based data gathering (MST-DG) trees that incur larger node and network lifetimes and inflict lower coverage loss on the underlying network at any time instant [8]. Bober and Bleakley proposed BailighPulse, an energy-efficient data gathering protocol for mostly-off WSN applications [9]. BailighPulse incorporates a novel multihop wake-up scheme that allows for energy-efficient recovery of network synchronization after long off periods. Ebrahimi and Assi proposed MSTP [10], a new method for data aggregation in large-scale WSNs using compressive sensing and random projection. The proposed method selects random projection nodes to generate routing trees, with each projection node gathering a weighted sum from all the nodes in the network. Gupta et al. addressed the problem of static itinerary based agent migration protocols in WSNs [11]. ETMAM (energy and trust aware mobile agent migration) presented an integrated solution for reliable agent migration within the network. In ETMAM, energy and trust are both considered when deciding the next node for the agent migration.
The EECS algorithm improved on the LEACH (low-energy adaptive clustering hierarchy) method by changing the election probability: its probability function takes an energy parameter into account when choosing cluster heads [12]. A reduction in the search space also increased clustering speed. Li et al. proposed an unequal clustering algorithm (EEUC), where the clusters close to the base station have smaller sizes than clusters far from the base station [13]. However, it may produce lone nodes since the cluster head election is probabilistic.
Fuzzy logic showed its ability to cope with information with a high degree of uncertainty in heterogeneous engineering fields [14]. For some problems, it is usually assumed that there is fuzziness in each objective due to the imprecise nature of judgment as a decision maker [15]. The fuzzy mechanism can be used to find a compromised solution, which looks at the way the solutions are contributing to each objective and assigns a fuzzy variable [16]. The fuzzy mechanism supplies a possible way of finding a compromised solution in case solutions are very close to each other. In this paper, the problem of choosing the optimal cluster heads in wireless sensor networks is solved by a fuzzy based mechanism.
Some clustering algorithms employ fuzzy logic to handle uncertainties in WSNs. FCAs use fuzzy logic for blending different clustering parameters in selecting cluster heads [17]. They assign chances to tentative cluster heads according to the defuzzified output of fuzzy if-then rules. A tentative cluster head becomes a cluster head if it has the greatest chance in its vicinity. There are both distributed and centralized fuzzy logic clustering approaches. A fuzzy energy-aware unequal clustering algorithm (EAUCF) was proposed to address the hot spots problem [18]. EAUCF aims to decrease the intracluster work of the cluster heads that either are close to the base station or have low remaining battery power. A fuzzy logic approach was adopted in order to handle uncertainties in cluster head radius estimation. The Gupta fuzzy protocol uses a fuzzy logic approach to select CHs based on three parameters: energy level, concentration, and centrality [19]. The protocol collects energy level and location information for each node in the setup stage.
The CHEF (cluster head election mechanism using fuzzy logic) protocol uses a fuzzy logic approach to maximize the lifetime of WSNs [20]. It is similar to the Gupta protocol, but it does not need the BS to collect information from all nodes. Instead, the CHEF protocol uses a localized CH selection mechanism with fuzzy logic. The LEACH-FL (LEACH protocol using fuzzy logic) protocol was proposed in [21]. This protocol uses fuzzy logic to improve the LEACH protocol by considering three different parameters: energy level, node density, and distance between the CH and the BS. Like the Gupta protocol, it has a setup stage and a steady-state stage. In [22], a two-level fuzzy method with a local level and a global level was used to choose cluster heads. At the local level, a node's capability of being cluster head is evaluated based on two parameters: energy and the number of neighbors. At the global level, three parameters are considered: centrality degree, closeness to the base station, and the distance between cluster heads.

Collaborative Data Gathering.
At present, in most of the existing clustering algorithms, nodes are chosen to become cluster heads with a certain probability, usually based on the residual energy of nodes, the average energy, the number of neighbors, and other factors [23]. When nodes are unevenly distributed in the network, energy consumption is also likely to be uneven, so that some nodes die rapidly and coverage holes appear. Therefore, while designing a clustering data collection protocol for WSNs, we should consider the following aspects. Firstly, the clustering algorithm should use fully distributed control mechanisms and thereby overcome the poor adaptability of the centralized approach. Secondly, compared to direct routing, the clustering algorithm can uniformly manage the distributed nodes in the network so as to achieve load balancing across the whole network. Thirdly, to avoid excessive energy consumption at heavily loaded nodes, the cluster heads should be distributed as evenly as possible.
The collaborative strategy ensures the efficiency and the robustness of the data gathering, while limiting the required communication bandwidth. Data aggregation has emerged as a useful paradigm in sensor networks. The key idea is to combine data from different sensors to eliminate redundant transmissions and provide a rich, multiperspective view of the environment being monitored. However, most research works focus on reducing the energy consumed by the sensors during the process of data gathering, for example by finding routes between pairs of end nodes. Other works focus on the intrinsic characteristics of the collected data. In some scenarios, sensors are deployed to monitor continuous environmental conditions such as temperature, humidity, or seismic activity and periodically produce relevant information by sensing an extended geographic area; this information is eventually transmitted to the sink for processing. Since different sensors partially monitor the same spatial region, their data are correlated, and data aggregation can reduce the energy consumed in transmission. In the clustering organization model, the members of a cluster are located in the same immediate area, so the data collected within the cluster are strongly correlated.
In the following sections, we will analyze and define the critical elements for cluster head selection, and a correlation function in fuzzy theory is adopted for the intracluster data aggregation.

Centrality Degree.
Due to the uneven distribution of nodes in a wireless sensor network, the centrality degree of the cluster head influences the overall intracluster energy dissipation. The farther the cluster head deviates from the geometric centre of the cluster, the more energy the member nodes consume for transmitting messages. Assume that node $i$ is the candidate cluster head with coordinates $(x_i, y_i)$. The positions of all nodes in the range of node $i$ can be termed $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, respectively. In an ideal condition, we can give the energy function $E_{\text{total}}$ for the intracluster communication of cluster head $i$: [equation missing in source], where the coefficient in the expression is a constant. It can be derived from (10), and it can be observed that $E_{\text{total}}$ is optimal on the condition that the cluster head is located at: [equation missing in source]. The centrality degree of cluster head $i$, $\text{Centri\_degree}(i)$, is defined as follows: [equation missing in source], where $R_0$ is the communication radius of the sensor and $n$ is the number of member nodes in the cluster. With the consideration of different cluster sizes, the centrality degree can be derived as follows: [equation missing in source]. A cluster head with a high centrality degree helps reduce intracluster communication energy. Therefore, the centrality degree of a candidate cluster head contributes to the rational distribution of cluster heads.
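The link between centrality and intracluster energy can be illustrated with a minimal Python sketch. It assumes that the energy function is proportional to the sum of squared member-to-head distances (a free-space style model; the exact expression is not preserved in the text), and the names `intracluster_energy`, `centroid`, and the coefficient `k` are hypothetical. Under that assumption, the geometric centre of the members gives the lowest energy, and among real nodes the best candidate is the one nearest to it.

```python
import random

def intracluster_energy(head, members, k=1.0):
    """Energy proxy (assumed model): k times the sum of squared head-member distances."""
    hx, hy = head
    return k * sum((hx - x) ** 2 + (hy - y) ** 2 for x, y in members)

def centroid(members):
    """Geometric centre of the member positions."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

random.seed(7)
# Ten hypothetical member nodes placed in a 20 x 20 region.
members = [(random.uniform(0, 20), random.uniform(0, 20)) for _ in range(10)]

center = centroid(members)
# The member whose position gives the lowest energy proxy, i.e. the most central node.
most_central = min(members, key=lambda m: intracluster_energy(m, members))

print("centroid:", center)
print("most central member:", most_central)
print("energy at centroid:", intracluster_energy(center, members))
print("energy at most central member:", intracluster_energy(most_central, members))
```

No member can do better than the centroid under this model, which is why a candidate's distance from the centre is a reasonable proxy for the intracluster cost it would impose.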

The Number of Neighbor Nodes.
In this paper, the network coverage ratio COV is defined as the mathematical expectation of the coverage ratio of the whole monitored area: [equation missing in source], where the expression depends on the probability that each sensor node covers the monitored area and on the number of nodes in the monitored area. For node $i$, its sensing range $\text{SRange}_i$ is covered by at least a given number of neighbor nodes. Then, we have [equation missing in source], where $\text{set}(i)$ is the set of neighbor nodes of node $i$ and the count in the expression is the number of all neighbor nodes in the sensing range. In practical applications, the coverage ratio of the network only needs to meet the specified coverage ratio. Assume a lower limit for the specified coverage; we can then obtain the number of neighbor nodes that meets this coverage ratio for node $i$ in terms of its sensing range $\text{SRange}_i$, that is, by setting $\text{COV}(i)$ equal to the lower limit: [equation missing in source]. The optimum number of clusters $k_{\text{opt}}$ for $N$ sensor nodes distributed randomly in an $M \times M$ field is defined as follows [24]: [equation missing in source]. Hence, the average number of neighbors is $N / k_{\text{opt}}$ for a cluster. The deviation between the number of neighbors of a candidate cluster head and this average can be calculated by the following equations: [equation missing in source].

Energy Level.
In the cluster head election, the remaining energy of nodes is a very important factor. Selecting nodes that have plentiful residual energy and are relatively close to the sink can extend the lifetime of the network. In order to optimize the energy consumption, this paper adopts the concept of residual energy level. Assume that the initial energy of a node is $E_0$ and that this initial energy is divided into grades $1 \sim L$. In the initial phase, most of the nodes have abundant energy, and the probability of premature node death is low. In contrast, in the final stage, the residual energy of a node is relatively small. Therefore, we define the energy of each level according to the following formula: [equation missing in source]. As the sensors dissipate energy, the level of the residual energy varies between 1 and $L$. When the remaining energy is reduced, the energy differential between levels is smaller; that is, the gap between energy levels narrows as the node's energy declines. During the selection of cluster heads, the nodes with a high energy level have an advantage. Thus, the energy level of a node should reasonably reflect its residual energy: [equation missing in source], where $\text{REL}_i$ is the energy level of node $i$ and the remaining coefficient is an adjustable parameter.

Distance.
According to the radio energy dissipation model, a node closer to the sink consumes less power when elected as a cluster head than one far away. Therefore, the distance function is defined as follows: [equation missing in source], where $\text{DS}_i$ is the distance from node $i$ to the base station and $\text{DS}_{\max}$ is the distance from the base station to the most distant node in the WSN. The four indicators described above have a very important impact on the reasonable selection of cluster heads. In the next section, we propose a method based on correlation functions and fuzzy nearness to analyze the proportion of these critical indicators in cluster head selection. It effectively avoids the situation where a node with low residual energy becomes a cluster head or the distribution of cluster heads is uneven.
International Journal of Distributed Sensor Networks

The Proposed DGM-FD Mechanism
The most important task is to select appropriate cluster heads according to the above indicators in order to balance energy expenditure and extend the network lifetime. In decision theory, fuzzy decision is an effective method for ranking complex objectives or selecting the optimal alternative under fuzzy constraints.
The most important factors for the nodes competing to be cluster head (i.e., the remaining energy level of sensor nodes, centrality degree, and the number of neighbor nodes) are fully considered. Simply using a weighted sum of the evaluation results would be simplistic and subjective. Therefore, we adopt a fuzzy mathematics method to obtain the comprehensive evaluation, and the objective weights of the attributes are determined through entropies according to the relationship between the sample values. From the sampled index values of all nodes in the cluster, the ideal solution and the negative ideal solution are calculated. Next, the distance between the attributes of each node and the ideal solutions is obtained as a weighted Euclidean distance. Finally, the $k_{\text{opt}}$ nodes with the largest membership degree are selected as cluster heads in the next round.
(1) After measuring the various attributes for all nodes, the observation matrix can be constructed. The structure of the matrix can be expressed as $X = (x_{ij})_{n \times m}$, where $x_{ij}$ denotes the value of the $j$th attribute of node $i$.
(2) In order to reflect the differentiation among the raw data of each attribute and reduce the computational overhead of nodes, a normalization scheme is adopted, and the degree of deviation from the mean value can be given as follows: [equation missing in source], where the mean value of the $j$th attribute is $\hat{x}_j = (1/n) \sum_{i=1}^{n} x_{ij}$ and the standard deviation of the attribute is given by [equation missing in source]. In addition, the matrix of fuzzy attributes $R = (r_{ij})_{n \times m}$ can be constructed.
(3) To solve the problem of cluster head selection, the objective function and the constraints of this model can be described as follows: [equation missing in source], where $w_j$ is the weighting factor of the $j$th attribute, $w_1 + w_2 + \cdots + w_m = 1$, and $w_j > 0$.
(4) Calculate the separation of each alternative from the ideal solution and the negative ideal solution in (14): $d_i^{+}$ is the distance (in the Euclidean sense) of alternative $i$ from the ideal solution and $d_i^{-}$ is its distance from the negative ideal solution, and both are defined as follows: [equation missing in source]. According to information theory [25], the relative entropy of two systems is defined as follows: [equation missing in source]. In terms of this definition, $S_j^{+}$ denotes the relative entropy associated with attribute $j$ and the ideal solution, and $S_j^{-}$ is the relative entropy associated with attribute $j$ and the negative ideal solution. There exist the following expressions: [equation missing in source]. The relative closeness to the ideal solution is calculated by the following equation: $\text{Corr}_j = S_j^{-} / (S_j^{-} + S_j^{+})$. According to the relative closeness of all attribute values to the ideal solution or the negative ideal solution, the weight vector in the decision model can be obtained as follows: [equation missing in source] (18). Finally, the $k_{\text{opt}}$ nodes with the largest values are selected as cluster heads in the next round.
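The overall flow of the steps above (observation matrix, normalization, objective weights, distances to the ideal and negative ideal solutions, relative closeness, selection) can be sketched in Python. This is a textbook entropy-weighted TOPSIS ranking, not the paper's exact formulation: it swaps in min-max normalization and Shannon entropy weights where the paper uses its own normalization and relative-entropy expressions, and it assumes every attribute is scaled so that larger is better. All function names and the sample values are hypothetical.

```python
import math

def min_max(X):
    """Scale each attribute (column) to [0, 1]; a constant column maps to 1.0."""
    cols = list(zip(*X))
    return [
        [(v - min(c)) / (max(c) - min(c)) if max(c) > min(c) else 1.0
         for v, c in zip(row, cols)]
        for row in X
    ]

def entropy_weights(R):
    """Shannon entropy weights: attributes that separate nodes more get more weight.
    Requires at least two nodes (rows)."""
    n = len(R)
    diversity = []
    for col in zip(*R):
        total = sum(col)
        probs = [v / total for v in col if v > 0]
        entropy = -sum(p * math.log(p) for p in probs) / math.log(n)
        diversity.append(max(0.0, 1.0 - entropy))
    s = sum(diversity)
    if s == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / len(diversity)] * len(diversity)
    return [d / s for d in diversity]

def closeness(R, w):
    """Relative closeness d- / (d- + d+) using weighted Euclidean distances."""
    cols = list(zip(*R))
    best = [max(c) for c in cols]    # ideal solution
    worst = [min(c) for c in cols]   # negative ideal solution
    scores = []
    for row in R:
        d_pos = math.sqrt(sum(wj * (v - b) ** 2 for wj, v, b in zip(w, row, best)))
        d_neg = math.sqrt(sum(wj * (v - z) ** 2 for wj, v, z in zip(w, row, worst)))
        scores.append(d_neg / (d_neg + d_pos) if d_neg + d_pos > 0 else 0.5)
    return scores

def select_cluster_heads(X, k):
    """Return the indices of the k highest-scoring nodes, plus weights and scores."""
    R = min_max(X)
    w = entropy_weights(R)
    scores = closeness(R, w)
    ranked = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], w, scores

# Hypothetical candidates; columns: energy level, centrality degree,
# neighbor score, closeness to sink (all scaled so that larger is better).
candidates = [
    [0.9, 0.8, 0.7, 0.6],
    [0.4, 0.9, 0.5, 0.8],
    [0.2, 0.3, 0.4, 0.1],
    [0.8, 0.6, 0.9, 0.7],
]
heads, weights, scores = select_cluster_heads(candidates, k=2)
print("selected cluster heads:", heads)
print("attribute weights:", [round(x, 3) for x in weights])
print("closeness scores:", [round(x, 3) for x in scores])
```

In this toy data the third node is the worst on every attribute, so it coincides with the negative ideal solution and receives a closeness of zero; deriving the weights from the data rather than fixing them by hand is what removes the subjectivity of a plain weighted sum.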
In a densely distributed network, there is an abundance of information that can be collected by sensors. Nodes in the same cluster are usually located in adjacent regions and monitor the same objects. In order to minimize the volume of transmitted data, we can design an aggregation scheme that exploits spatiotemporal correlations in the readings obtained by the nodes in the network. When a target object is detected, the readings from different sensing nodes often exhibit a certain correlation, so a large amount of strongly correlated, redundant data may be produced in the sensing process. Scheduling some nodes into the sleep state to avoid producing these redundant data reduces network energy consumption and prolongs the network lifetime.
Therefore, the correlation function in fuzzy theory is used to calculate the mutual support degree between the nodes during data aggregation in a single cluster. The redundant nodes, which should be switched to sleep mode, can then be chosen in order to save energy. After receiving the sensed data of all members, the cluster head starts analyzing the perceptual characteristics of these readings. Moreover, as far as the cluster density distribution is concerned, the more nodes in the network have equal or approximately equal coverage areas, the faster the scheme runs and the more residual energy is retained. Generally speaking, a node's sensitivity to the data has a positive correlation with its comprehensive support degree.
Assume that $x_i$ and $x_j$ are the perceptual data from nodes $i$ and $j$, respectively. With each node's sensing data subject to a Gaussian distribution, the probability distribution functions can be described as $p_i(x)$ and $p_j(x)$, and $x_i$ and $x_j$ denote the data collected in an interval. The confidence distance between the perception data $x_i$ and $x_j$ can then be measured as follows: [equation missing in source]. According to the confidence distance measurement, the confidence distance matrix of the sensor data can be constructed as follows: [equation missing in source]. Next, we can obtain the degree to which each datum is supported by the other nodes based on the method of fuzzy logic. In general, a threshold is given, and an integration factor taking the value 0 or 1 is calculated as follows: [equation missing in source]. If the factor is 0, the mutual support between $x_i$ and $x_j$ is poor, whereas if it is 1, the support is strong. The problem is that a fixed threshold is too absolute and subjective, which is not conducive to making an objective judgment in practice. To evaluate the effectiveness of the readings of all nodes, the correlation function of fuzzy theory is used to determine the supported degree for each node in the cluster. Suppose [definition missing in source]; the correlation function can be defined as follows: [equation missing in source]. The threshold for node $i$ can be calculated according to (15): [equation missing in source]. The correlation matrix can then be derived as follows: [equation missing in source], where each entry denotes the degree to which $x_i$ is supported by $x_j$ and lies between 0 and 1.
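As a rough illustration of the support computation, the following Python sketch uses the confidence distance measure common in the data fusion literature, $d_{ij} = \operatorname{erf}\left(|x_j - x_i| / (\sqrt{2}\,\sigma_i)\right)$, which follows from integrating a Gaussian centred on $x_i$ between $x_i$ and $x_j$. Whether the paper uses exactly this form is an assumption, since its equation is not preserved in the text. The averaging rule in `support_degrees`, the function names, and the sample readings are also hypothetical.

```python
import math

def confidence_distance(x_i, x_j, sigma_i):
    """Assumed measure: twice the Gaussian probability mass between x_i and x_j,
    for a Gaussian centred on x_i with standard deviation sigma_i."""
    return math.erf(abs(x_j - x_i) / (math.sqrt(2.0) * sigma_i))

def support_degrees(readings, sigmas):
    """Return the confidence distance matrix d and, for each node, the average of
    (1 - d[i][j]) over all other nodes i, read as how well reading j is supported.
    Requires at least two readings."""
    n = len(readings)
    d = [[confidence_distance(readings[i], readings[j], sigmas[i]) for j in range(n)]
         for i in range(n)]
    support = []
    for j in range(n):
        others = [1.0 - d[i][j] for i in range(n) if i != j]
        support.append(sum(others) / len(others))
    return d, support

# Hypothetical temperature readings from four members of one cluster.
readings = [20.1, 20.3, 19.9, 27.0]
sigmas = [0.5, 0.5, 0.5, 0.5]
d, support = support_degrees(readings, sigmas)
for node, s in enumerate(support):
    print(f"node {node}: support degree {s:.3f}")
```

Using a continuous support degree in place of a hard 0/1 threshold keeps the information about how strongly two readings agree, which is the motivation given above for moving to a fuzzy correlation function.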
In order to minimize the energy consumption, the nodes with a low supported degree can be filtered out as redundant nodes, which will be in sleep mode in the following rounds. The number of sleep rounds $R_{\text{sleep}}$ is defined by the following equation: [equation missing in source], where the expression depends on the percentage of redundant nodes, $R_{\text{active}}$ is the number of working rounds of the cluster head, and $E_0$ is the initial energy of the node.
The number of rounds for which the redundant nodes are scheduled to sleep is thus tied to the residual energy of the node.

Simulation Experiment
In this section, experiments are carried out to evaluate DGM-FD. DGM-FD is compared with two other data gathering algorithms, namely, EECS [12] and FSC [26], which are known as classical methods for collaborative data gathering in wireless sensor networks. The experiments are conducted on the NS2 simulator. In order to evaluate the proposed algorithm, three different scenarios are developed. In the first scenario, the base station is located at the center of the WSN. In the other two scenarios, the base station is located at the edge of the WSN. In a 100 × 100 square area, the distribution of clusters in the network is compared between DGM-FD and FSC. Assume that the transmission radius of a sensor node is 20 meters and that each cluster head forwards the aggregated data to the base station directly without using a relay node. In these experiments, the number of deployed sensors is 400, and the clusters formed in a randomly chosen round are used for analysis and comparison under different positions of the sink node.
In order to extend the lifetime of sensor nodes, FSC makes use of fuzzy logic for cluster head selection. In this section, we first use FSC as a baseline to examine the distribution of nodes in the WSN. Figures 1(a) and 1(b) show snapshots of the proposed algorithm and FSC, respectively. When the sink node is located at the center of the network, there are 65 clusters in DGM-FD and the number of isolated nodes is 6. The average number of nodes per cluster is 6.15, and the cluster size variance is 4.07. DGM-FD shows less fluctuation in the number of nodes across clusters. In other words, there is not much difference in the number of members for most clusters, and the cluster heads do not undertake an excessive communication burden. In FSC, the clusters are much larger, so each cluster head has to bear a heavier traffic load, which may sometimes lead to premature death.
In the next scenario, the base station is located at the edge of the wireless sensor network. The sink coordinates in this scenario are (100, 100), and the topology of cluster formation is shown in Figure 2. DGM-FD produced 66 cluster heads and 5 isolated nodes, compared with 39 cluster heads and 5 isolated nodes in FSC. Although the number of isolated nodes is the same, the cluster size variance of FSC is 8.37, which is still much larger than the 3.68 of DGM-FD.
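The cluster size figures quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes the reported average is the number of deployed nodes divided by the number of clusters, and it uses the population variance, although the text does not say which variance definition was used. The list of cluster sizes in the second part is made up purely to show the calculation.

```python
from statistics import mean, pvariance

def average_cluster_size(num_nodes, num_clusters):
    """Average members per cluster, assuming all deployed nodes are counted."""
    return num_nodes / num_clusters

def size_statistics(cluster_sizes):
    """Mean and population variance of a list of per-cluster member counts."""
    return mean(cluster_sizes), pvariance(cluster_sizes)

# 400 deployed nodes; cluster counts as reported for the first two scenarios.
print(round(average_cluster_size(400, 65), 2))  # DGM-FD, sink at the center
print(round(average_cluster_size(400, 66), 2))  # DGM-FD, sink at (100, 100)
print(round(average_cluster_size(400, 39), 2))  # comparison algorithm, sink at (100, 100)

# Hypothetical per-cluster sizes, only to demonstrate the variance calculation.
print(size_statistics([5, 6, 7, 6]))
```

With 400 nodes and 65 clusters the quotient is about 6.15, which matches the reported average for the first scenario; with 39 clusters the average rises above 10, consistent with the observation that larger clusters place a heavier load on each cluster head.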
In the third scenario, the base station is again located at the edge of the wireless sensor network, with the sink at (100, 50); the topology of cluster formation is shown in Figure 3. There are 66 clusters in DGM-FD, with 7 isolated nodes and a cluster size variance of 3.98. By comparison, 41 clusters are formed in FSC, and the cluster size variance is 5.37. Because the centrality of the cluster head is taken into account, most cluster heads are located near the center of their clusters, which can significantly reduce energy expenditure. Since the average cluster size in DGM-FD is smaller than that in FSC, the cluster heads in DGM-FD carry a relatively low burden.
Figure 4 shows the number of alive sensor nodes with respect to the number of rounds for each simulated algorithm. As shown in this figure, DGM-FD is the most energy-efficient algorithm. The sensor nodes of EECS and FSC start to die in earlier rounds, and the first node of DGM-FD dies much later than the first node of either of the other algorithms. DGM-FD provides at least 1970 stable rounds for these particular WSNs, whereas EECS provides 1080 rounds and FSC 1490 rounds. Furthermore, the energy of the network is exhausted at about 2250 rounds for EECS and 2440 rounds for FSC, whereas DGM-FD approaches 2700 rounds before all nodes die. This figure clearly shows that the proposed algorithm is more stable than the other algorithms: sensor node deaths begin later for DGM-FD and continue linearly until all sensor nodes die.
The average energy consumption of all nodes with respect to the number of rounds for each algorithm is depicted as a line chart in Figure 5. The simulation results show that the average energy consumption of all nodes remains comparatively stable during most of the rounds. During the formation of clusters in DGM-FD, the residual energy of nodes is fully considered, and the node with a high energy level is selected as cluster head. The number of adjacent nodes and the centrality degree are also introduced, which makes the cluster sizes even and places each cluster head as close to the center of its cluster as possible. In EECS, by contrast, due to the random selection of the cluster head, nodes with low residual energy may also be selected as cluster heads, in which case the death of low-energy nodes is accelerated.
The energy dissipation of all nodes in the three algorithms, as well as the variation tendency, is illustrated in Figure 6. As shown in the figure, DGM-FD consumes much less energy than FSC and EECS. As explained earlier, the optimal cluster heads selection is adopted at every round, and the reasonable intracluster data aggregation mechanism is added in DGM-FD, resulting in a significant energy-saving effect.

Conclusions
To improve energy efficiency and achieve network load balancing, a DGM-FD mechanism is proposed in this paper. DGM-FD aims to distribute the workload evenly among all sensor nodes. Moreover, a correlation function in fuzzy theory is adopted for intracluster data aggregation. The experimental results show that the proposed algorithm can effectively optimize the selection of cluster heads and obtains better performance both in the distribution of cluster heads and in the energy efficiency of the network. In the future, we will further explore energy-efficient data gathering mechanisms for collaborative WSNs to make WSN-based applications more reliable and effective in industry-related scenarios.