Spatial Correlation Based Low Energy Aware Clustering (LEACH) in a Wireless Sensor Networks

In this paper, an enhanced Low Energy Aware Cluster Head (LEACH) protocol is proposed. It applies aggregation strategies in the area monitored by sensor nodes to reduce the number of reports sent to the sink and to save energy. The basic idea is to weight the information sensed by the sensors according to the distortion of each area, in order to better estimate the event at the sink node. This approach exploits the spatial correlation among nodes and among clusters to assign different importance to the information aggregated and forwarded by the cluster head nodes. A multi-zone monitoring scheme related to clusters is proposed, and a dynamic weight management scheme is presented that accounts for the cluster-level distortion introduced in the event estimation. A mathematical formulation of the problem and of the proposed weighting of space and data information is laid out. Simulation campaigns in Matlab show the effectiveness of the approach in terms of event estimation distortion and network lifetime.


Introduction
Wireless Sensor Networks (WSNs) have attracted considerable attention in the research community during recent decades and, more recently, have become an increasingly popular technology on the market, widely used in practical applications [1] together with other architectures such as cellular networks [2] and satellite communications [3]. Typically, these networks consist of a number of small, low-cost elements called nodes that can communicate with each other. These nodes sense the environment and events and forward reports to a collector called the sink. The highly asymmetric traffic generated by this multitude of sensors can drain large amounts of energy, reducing both the network lifetime and the capability to estimate the event in time. Many papers have proposed energy-efficient routing protocols, reductions of the protocol overhead, reductions of the MAC complexity, and so on.

In this paper, we apply the concept of spatial correlation to improve aggregation capabilities at the cluster layer in order to reduce the number of nodes sending reports to the sink. The concept of spatially related distortion is applied to LEACH in order to improve its performance and to weight the cluster heads differently in data forwarding, leading to improvements in event estimation at the sink. The objective of this paper is to evaluate a trade-off between energy-efficient algorithms and specific reliability requirements, such as the delivery of a significant amount of measurements and the coverage of a monitored area, using the correlation between measurements detected simultaneously by sensor nodes distributed over the monitored area. Reliability, from a sensor network point of view, is the capability to receive a sufficient amount of information from the sensors to estimate the event correctly. The organization and cooperation of sensor nodes to estimate phenomena with a small number of nodes and a small amount of information exchanged in the network can lead to many advantages in terms of energy saving and congestion reduction. On the basis of these considerations, this paper proposes an extended model to compute the event and the distortion produced by environmental conditions such as the noise and the variance associated with the data sensed by the sensor nodes.

The paper is organized as follows:
• Section 2 gives a brief overview of the protocols and algorithms that seek to take advantage of the spatial-temporal correlation of sensors.
• Section 3 recalls the basis of the correlation model adopted in our proposal.
• Section 4 formalizes the contribution and proposes the concept of distortion-aware aggregation in clustering protocols such as LEACH.
• Section 5 contains simulation results and Section 6 concludes the paper.

Related Work
There are many studies in the literature that consider correlation information about sensor nodes in a monitoring area. All these contributions, such as [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], investigate theoretical aspects of the correlation to evaluate how the information can be coded, aggregated and distributed among sensor nodes. Some interesting papers consider both the spatial and the temporal correlation of sensor nodes to offer a better reference model to higher layer protocols such as the MAC layer and the transport layer [4], [5], [6]. However, all these studies consider the actual information sensed by the sensor modules and transmitted to the sink, and suggest using the average value of all the information collected from the sensor nodes. Differing from these approaches, we also suggest dividing the monitored space into sub-areas with different distances from the monitored event and with different spatial and temporal correlations. Through this approach, and by weighting at the sink the information sensed by each sensor node, it is possible to reduce distortion further, with a positive effect on energy consumption and on data collisions in the distributed network. Basing our approach on the model presented in [4], we extended that model to provide a more accurate estimation criterion for use in higher layer protocols, and applied it to LEACH [16].

Architecture and Correlation Model for WSN
In a sensor field, each sensor observes a noisy version of a physical phenomenon. The sink is interested in observing the physical phenomenon with the highest accuracy, using the observations from the sensor nodes. The physical phenomenon of interest can be modeled as a spatial-temporal process $s(t, x, y)$, a function of time $t$ and spatial coordinates $(x, y)$. The model for the information gathered by $N$ sensors in the event area is illustrated in Fig. 1. The sink is interested in estimating the event source $S$ according to the observations of the sensor nodes $n_i$ in the event area. Each sensor node $n_i$ observes $X_i[n]$, the noisy version of the event information $S_i[n]$, which is spatially correlated to the event source $S$. In order to communicate this observation to the sink through the WSN, each node has to encode its observation. The encoded information $Y_i[n]$ is then sent to the sink through the WSN. The sink decodes this information to obtain the estimate $\hat{S}$ of the event source $S$. Each observed sample $X_i[n]$ of sensor $n_i$ at time $n$ is represented as:

$X_i[n] = S_i[n] + N_i[n]$

where the subscript $i$ denotes the spatial location $(x_i, y_i)$ of the node $n_i$ and $N_i[n]$ is the observation noise.

Spatial Correlation in WSN
Encoded transmission of the sensor observations is adopted in this work. Each node $n_i$ sends to the sink a scaled version $Y_i$ of the observed sample $X_i$, where the scaling is set by the encoding power constraint $P_E$ and by $\sigma_S^2$ and $\sigma_N^2$, the variances of the event information $S_i$ and of the observation noise $N_i$, respectively. The estimation $Z_i$ of the event information $S_i$ is the Minimum Mean Square Error (MMSE) estimation obtained from $Y_i$. In order to investigate the distortion achieved when a smaller number of nodes sends information, it is assumed that only $M$ out of $N$ packets are received by the sink, where $N$ is the total number of sensor nodes in the event area. Since the sink decodes each $Y_i$ using the MMSE estimator, the event source can simply be computed by taking the average of all the event information received at the sink. Then $\hat{S}$, the estimate of $S$, is given as:

$\hat{S}(M) = \frac{1}{M} \sum_{i=1}^{M} Z_i$   (5)

The distortion achieved by using $M$ packets to estimate the event $S$ is given as:

$D(M) = E\left[ (S - \hat{S}(M))^2 \right]$   (6)

where we use the mean-squared error as the distortion metric. In accordance with [4], $D(M)$ gives the distortion achieved at the sink as a function of the number of nodes $M$ that send information to the sink and of the correlation coefficients $\rho(i, j)$ and $\rho(s, i)$ between nodes $n_i$ and $n_j$, and between the event source $S$ and node $n_i$, respectively.
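The dependence of $D(M)$ on the correlation coefficients can be made concrete with a minimal sketch. It is not the authors' code: it assumes zero-mean, jointly Gaussian $S$ and $S_i$ with variance $\sigma_S^2$, a per-sample MMSE gain $\sigma_S^2 / (\sigma_S^2 + \sigma_N^2)$ (which follows from the observation model once the power scaling cancels), and a power exponential correlation $\rho(d) = e^{-(d/\theta_1)^{\theta_2}}$. The function names and parameter names are hypothetical.

```python
import math

def power_exp_corr(d, theta1, theta2=1.0):
    """Assumed power exponential correlation: rho(d) = exp(-(d / theta1) ** theta2)."""
    return math.exp(-((d / theta1) ** theta2))

def distortion(event, nodes, var_s, var_n, theta1, theta2=1.0):
    """Mean-squared distortion D(M) when the M nodes in `nodes` report to the sink.

    Expands E[(S - (1/M) * sum(c * X_i))^2] with X_i = S_i + N_i and
    c = var_s / (var_s + var_n), the MMSE gain applied to each observation.
    """
    m = len(nodes)
    c = var_s / (var_s + var_n)
    # Correlation rho(s, i) between the event source and each reporting node.
    rho_s = [power_exp_corr(math.dist(event, p), theta1, theta2) for p in nodes]
    # Sum of rho(i, j) over all ordered pairs of distinct reporting nodes.
    rho_pairs = sum(
        power_exp_corr(math.dist(nodes[i], nodes[j]), theta1, theta2)
        for i in range(m) for j in range(m) if i != j
    )
    cross = 2.0 * c * var_s * sum(rho_s) / m
    own = c * c * (m * (var_s + var_n) + var_s * rho_pairs) / (m * m)
    return var_s - cross + own

if __name__ == "__main__":
    event = (0.0, 0.0)
    nodes = [(10.0, 0.0), (0.0, 20.0), (-30.0, 5.0), (15.0, -25.0)]
    for m in range(1, len(nodes) + 1):
        print(m, distortion(event, nodes[:m], 1.0, 0.5, 100.0))
```

The sketch evaluates the expectation in closed form rather than by simulation, so it is only as valid as the Gaussian and correlation-model assumptions stated above.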

Correlation Based Approach Related to Energy Issues in WSN
In a WSN, many individual nodes deployed over large areas sense events and send the corresponding information to the sink. When an event occurs in the sensor field, all the nodes in the event area collect information about the event and try to send it to the sink. Due to the physical properties of the event, this information can be highly correlated, according to the spatial correlation between sensor nodes. Intuitively, data from spatially separated sensors is more useful to the sink than highly correlated data from closely located sensors. Thus, it may not be necessary for every sensor node to transmit its data to the sink; instead, a smaller number of sensor measurements may be adequate to communicate the event features to the sink within a certain distortion constraint. Consequently, significant energy savings can be achieved by choosing representative nodes among the nodes in the event area without degrading the distortion achieved at the sink. A reduced number of transmitting nodes also decreases contention in the wireless medium, which in turn decreases energy consumption. The energy consumed by both packet transmissions and collision penalties can be reduced drastically if the spatial correlation is exploited. Therefore, it is important to find the minimum number of representative nodes that achieves the distortion constraint given by the sensor application. This minimum number can be given as:

$M^* = \min \{ M : D(M) \le D_{max} \}$

where $D_{max}$ is the maximum distortion allowed by the sensor application.
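The search for the minimum number of representative nodes can be sketched as a linear scan over $M$. This is only an illustration: `distortion_of` is a hypothetical callable returning $D(M)$, the constraint is taken as $D(M) \le D_{max}$, and the toy curve below is made up for the example rather than taken from the paper.

```python
def min_representative_nodes(distortion_of, n_total, d_max):
    """Return the smallest M in 1..n_total with distortion_of(M) <= d_max, or None."""
    for m in range(1, n_total + 1):
        if distortion_of(m) <= d_max:
            return m
    return None  # the constraint cannot be met even with all nodes active

def toy_distortion(m):
    """Made-up distortion curve that decays with M toward a floor of 1.0."""
    return 1.0 + 4.0 / m

if __name__ == "__main__":
    print(min_representative_nodes(toy_distortion, 50, 1.6))
```

A linear scan is used instead of a bisection because the distortion is not guaranteed to decrease monotonically with $M$ for every node ordering.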

Information Decoding Based on Multi-Zone and Spatial Correlation
The distortion at the sink is computed by

$D = E[d(S, \hat{S})]$

where $d(S, \hat{S})$ denotes its metric. In Eq. (1) an estimation $Z_i$ of the event information $S_i$, $i = 1, \ldots, N$, is defined as the minimum mean square error (MMSE) estimation from the encoded sample $Y_i$:

$Z_i = \frac{E[S_i Y_i]}{E[Y_i^2]} \cdot Y_i$

and the estimation $\hat{S}$ of the event source $S$ is calculated as in Eq. (5). Eq. (5) is the average value of the estimations $Z_i$, so this estimation of the event source gives equal importance to the information coming from all the sensors distributed throughout the monitored area. This raises the question of whether the estimation values $Z_i$ could be treated in some other way, for example by keeping track of the membership of each node in a particular region or cluster within the monitored area.

Splitting the Monitored Area in Multiple Zones or Cluster
It is possible to divide the event area into sub-areas. Assuming that the coordinates of a physical event are known, one can define circular sub-areas around it. Let $k$ be the number of these sub-areas (zones). For simplicity, these zones are represented by $k$ circular rings bounded by $(k - 1)$ concentric circles (their outer circles) of radii $r_j$, $j = 1, \ldots, k$. Each radius is the maximum distance from the event for the zone it defines. The value of the parameter $k$ is a design choice. Once $k$ is set and the positions of all the nodes in the entire area are known, the number of sensors within each zone is determined.

In order to obtain a better estimation of the event source than in [4] from the measurements of the sensors distributed throughout the event area, the estimation can be changed by assigning specific weights to the sub-areas within the original event area, thereby giving greater importance to the information originating from one sub-area rather than another. The estimation is better if, for a fixed number of representative nodes, splitting the event area into zones yields a lower level of distortion. So if $M^*_{multi}$ denotes the optimal number of representative nodes in the case of splitting, then the level of distortion achieved is lower than that achieved when the entire event area is considered as a single macro-zone (macro-area):

$D_{multi}(M^*_{multi}) < D(M^*)$

where $M^*_{multi} = M^*$. Equivalently, the estimation $\hat{S}_{multi}(M)$ in the multi-area case can reach the same level of distortion as in the single macro-zone case while using a smaller number of active nodes:

$D_{multi}(M^*_{multi}) = D(M^*)$

where $M^*_{multi} < M^*$.

Based on the considerations on the spatial correlation formulated in Eq. (2), the sink considers the information originating from a zone closer to the physical event as more significant. While decoding the measurements from the sensors situated in that zone, the sink assigns a greater weight to such information. Given the event coordinates, every zone $A_j$ can be assumed, for example, to be a circular area defined by the radius of its outer circumference, which is the greatest distance from the event within this zone, as shown in Fig. 2. The nodes located at a distance less than $r_j$ and greater than $r_{j-1}$ belong to the area $A_j$. Given a fixed node topology within a monitored area, the membership of each node $n_i$ is defined: $n_i \in A_j$. Figure 3 shows a clustered area where different clusters represent different regions with spatial correlations. As in the general case of Eq. (6), the distortion is given by:

$D_{multi}(M) = E\left[ (S - \hat{S}_{multi}(M))^2 \right]$   (10)

where the mean square error is used as the distortion measure. The difference in this case is that the estimation $\hat{S}_{multi}(M)$ is a weighted average of all the information received from the sensors, with specific weights assigned to the corresponding areas. In fact, a coefficient $P_j$ is associated with every zone $A_j$.
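The membership rule $n_i \in A_j$ for concentric zones can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, zone indices are 0-based, and a node exactly on a circle of radius $r_j$ is assigned to the inner ring (the text leaves the boundary case open).

```python
import bisect
import math

def zone_index(node, event, radii):
    """Return the 0-based ring index of `node`, or None if it lies beyond the last circle.

    `radii` must be sorted ascending; ring j covers distances in (radii[j-1], radii[j]],
    and ring 0 covers distances in [0, radii[0]].
    """
    d = math.dist(node, event)
    j = bisect.bisect_left(radii, d)
    return j if j < len(radii) else None

def group_by_zone(nodes, event, radii):
    """Partition `nodes` into one list per zone, dropping nodes outside the event area."""
    zones = [[] for _ in radii]
    for node in nodes:
        j = zone_index(node, event, radii)
        if j is not None:
            zones[j].append(node)
    return zones

if __name__ == "__main__":
    radii = [50.0, 100.0, 200.0]  # illustrative radii, not the values used in the paper
    nodes = [(10.0, 0.0), (0.0, 75.0), (150.0, 0.0), (300.0, 0.0)]
    print(group_by_zone(nodes, (0.0, 0.0), radii))
```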

Weights Computation in a Multi-Zone or Clustered Area
After defining $k$ zones or clusters within the event area, the coefficients $P_1, P_2, \ldots, P_k$ represent the weights associated, respectively, with the areas $A_1, A_2, \ldots, A_k$. The estimation $\hat{S}_{multi}(M)$ is calculated as the following sum:

$\hat{S}_{multi}(M) = \sum_{j=1}^{k} P_j \hat{S}_j$   (11)

where $P_j$ is the specific weight associated with the cluster $A_j$ and $\hat{S}_j$ is a partial estimation of the information reported by the nodes situated in the cluster $A_j$. Each partial estimation $\hat{S}_j$, in turn, can be calculated simply as an average of the measurements from the nodes allocated in $A_j$, as in the general case of a single macro-zone:

$\hat{S}_j = \frac{1}{m_j} \sum_{t=1}^{m_j} Z_{jt}$   (12)

where $m_j$ is the number of representative nodes in the area $A_j$ and $Z_{jt}$ is the value decoded by the sink from an observation of the node $t$ situated in the area $A_j$. The estimation of Eq. (11) is used to calculate the distortion of Eq. (10). To improve the estimation of the event source in accordance with Eq. (10), the coefficients associated with the areas must further reduce the distortion compared to the single macro-zone case, providing a more accurate estimation of the event source.

In addition to the total distortion defined by Eq. (6), which is based on all the observations from all the sensors belonging to the overall monitored area, one can calculate a partial distortion $D_j$, $j = 1, \ldots, k$, in each of the $k$ clusters, varying, as in the general case, the number of representative nodes within each zone. Analogously, the minimum distortion in each individual zone (cluster) $A_j$ is achieved when all the nodes belonging to the zone send their information to the sink. Furthermore, after reaching a certain number $m^*_j$ of representative nodes, chosen from their total number $n_j$ in the area $A_j$, the distortion does not decrease further but remains almost constant. A smaller number of sensor nodes can therefore be activated in each zone $A_j$, while still achieving the distortion required by the specific sensor application, and energy is saved. Let $n_j$ be the total number of nodes deployed in the area $A_j$. By establishing a discarding threshold or a reliability threshold, expressed as the maximum distortion permitted for this area by the sensor application, a certain level of distortion $D_j$ is achieved in each zone $A_j$, so that $D_j \le D_{max,j}$. The distortion $D_j$ is achieved by activating an optimal number of nodes $m^*_j$, selected from their total number $n_j$. The lower the distortion produced in an area, the more reliable the information originating from this area, and the higher the weight given to this information. Since there is a spatial correlation between the sensor observations and the actual event information, the area (cluster) closest to the event is characterized by the lowest distortion, and so on.
Following the intuition that a greater weight should be given to a more reliable cluster, characterized by a lower distortion or a lower error, an inverse relationship is defined between the partial distortions $D_j$ obtained in each individual area and the coefficients $P_j$, $j = 1, \ldots, k$, associated with these areas. The formula of Eq. (5) for the event source estimation can be written out term by term and, if there are $k$ clusters within the monitored area, considering Eq. (13), the terms in Eq. (19) can be grouped by zones according to Eq. (11). Substituting Eq. (12) in Eq. (11) we get:

$\hat{S}_{multi}(M) = \sum_{j=1}^{k} P_j \frac{1}{m_j} \sum_{t=1}^{m_j} Z_{jt}$

where $M = m_1 + m_2 + \ldots + m_k$ is the total number of representative nodes located throughout the event area. The $P_j$ coefficients can be computed according to the work in [10], where the variance of the data is considered as a measure of the distance from the event. In this case, each weight $P_j$ is assigned to a cluster so as to give more importance to the samples sent by nodes that reduce the variance around the event estimation value. This approach leads to an event estimation at the sink computed by Eq. (20).
As a second strategy to assign weights to each cluster, the local distortion computed by each cluster head is used. In this case, all the information sent by a cluster head is weighted with a weight $w_i$ computed from the local distortions of the $k$ clusters in the monitored area. In this approach, the distortion is used to give lower importance to the clusters whose information is less reliable, according to [10]: higher distortion values mean lower $w_i$ values. The distortion evaluation accounts not only for the variance around the local average but also for the correlation among the sensors belonging to the same area, as expressed in [10].
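A minimal sketch of distortion-aware weighting at the sink follows. The exact weight formulas are not recoverable from the text, so the sketch assumes weights inversely proportional to each cluster's distortion and normalized to sum to one; this choice, and all function names, are illustrative assumptions rather than the paper's definition. The weighted estimate itself follows the structure of Eq. (11) and Eq. (12): a per-cluster average combined with per-cluster weights.

```python
def inverse_distortion_weights(distortions):
    """Assumed rule: weight_j proportional to 1 / D_j, normalized so the weights sum to 1."""
    inverse = [1.0 / d for d in distortions]
    total = sum(inverse)
    return [x / total for x in inverse]

def weighted_event_estimate(cluster_samples, weights):
    """Combine per-cluster averages of decoded samples Z_jt using one weight per cluster."""
    partial = [sum(samples) / len(samples) for samples in cluster_samples]
    return sum(w * s for w, s in zip(weights, partial))

if __name__ == "__main__":
    # Illustrative numbers only: two clusters, the first with lower distortion.
    weights = inverse_distortion_weights([1.0, 3.0])
    print(weights)
    print(weighted_event_estimate([[20.0, 22.0], [30.0]], weights))
```

With this rule a cluster with three times the distortion receives one third of the relative weight, which matches the stated intent that higher distortion means lower weight, but other monotone decreasing rules would satisfy that intent equally well.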

Performance Evaluation
In this section, two case studies are considered. The first one considers the distortion evaluation at the sink in an area of 500 m × 500 m, with an event area of 200 m for the single zone (macro-zone) and different radii for the multi-zone case. The second case considers the multi-zone approach with weighted information processed at the sink under different network conditions, to see the impact of the zone size on the distortion degradation. Due to space limitations, not all the graphs considered in our analysis are shown. For both cases, a power exponential correlation model such as that adopted in [4] is applied. The design goal of the adaptive data aggregation based on the spatial correlation is to suppress redundancy in order to increase energy efficiency and prolong the network lifetime. We used a network simulator (ns-2) and a wireless sensor network test-bed to evaluate the performance of the proposed data weighting strategies in terms of event estimation distortion and network lifetime. The performance of direct communication from each sensor to the sink, of classical LEACH and of weighted LEACH is considered. In the following, we first outline the simulation environment and then present the performance evaluation results of the proposed scheme under the data weighting strategies applied to LEACH, considering variance-aware weights and weights aware of the distance from the event source.

Simulation Scenario
In our simulation, 100 sensor nodes are randomly distributed in a 500 m × 500 m sensing field with the sink located at (x = 50 m, y = 185 m). The initial energy of each node is assumed to be 0.5 J. In addition, the bandwidth of the channel was set to 1 Mb/s, and the maximum allowed distortion is ±3 °C. For LEACH, each data message was 4 bytes long (the 32-bit IEEE 754 data format is used), and the packet header for each type of packet was 34 bytes long. The proposed data weighting strategies are implemented on top of the LEACH module [18]. We use the same radio propagation model as in LEACH, which is described below.

In Eq. (21), the total energy consumption $E_{Tx}(L, d)$ for transmitting an $L$-bit message over a distance $d$ is expressed as the sum of two terms: (1) the electronics energy consumption $E_{Tx\text{-}elec}(L)$, and (2) the amplifier energy consumption $E_{Tx\text{-}amp}(L, d)$:

$E_{Tx}(L, d) = E_{Tx\text{-}elec}(L) + E_{Tx\text{-}amp}(L, d)$   (21)

$E_{Tx\text{-}elec}$ can be further expressed in terms of the energy consumption for a single bit, $E_{elec}$ (50 nJ·bit$^{-1}$), while $E_{Tx\text{-}amp}$ can be further expressed in terms of $\epsilon_{fs}$ (10 pJ·bit$^{-1}$·m$^{-2}$) or $\epsilon_{mp}$ (0.0013 pJ·bit$^{-1}$·m$^{-4}$), depending on the mode of operation of the transmitter amplifier. $\epsilon_{fs}$ and $\epsilon_{mp}$ are the power loss factors for free space ($d^2$ loss) and multipath fading ($d^4$ loss), respectively. An empirical value of the threshold $d_0$ is set to $\sqrt{\epsilon_{fs} / \epsilon_{mp}}$. On the receiving side, the total energy consumption $E_{Rx}(L)$ for receiving an $L$-bit message is equal to $E_{Tx\text{-}elec}(L)$.
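The radio energy model described above can be sketched in a few lines. The constants are the values quoted in the text; the rule that the free-space term applies below $d_0$ and the multipath term at or above it is the usual first-order LEACH radio model and is assumed here, as are the function names.

```python
import math

E_ELEC = 50e-9       # electronics energy per bit, J/bit (50 nJ/bit)
EPS_FS = 10e-12      # free-space amplifier factor, J/bit/m^2 (10 pJ/bit/m^2)
EPS_MP = 0.0013e-12  # multipath amplifier factor, J/bit/m^4 (0.0013 pJ/bit/m^4)
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance between the two loss models, m

def tx_energy(bits, distance):
    """Energy in joules to transmit `bits` over `distance` meters."""
    electronics = bits * E_ELEC
    if distance < D0:
        amplifier = bits * EPS_FS * distance ** 2  # free-space (d^2) loss
    else:
        amplifier = bits * EPS_MP * distance ** 4  # multipath (d^4) loss
    return electronics + amplifier

def rx_energy(bits):
    """Energy in joules to receive `bits`; only the electronics term applies."""
    return bits * E_ELEC

if __name__ == "__main__":
    packet_bits = (4 + 34) * 8  # 4-byte payload plus 34-byte header, as in the scenario
    print(D0)
    print(tx_energy(packet_bits, 50.0), rx_energy(packet_bits))
```

With an initial budget of 0.5 J per node, such a function makes it easy to see how quickly long-range (above $d_0$) transmissions dominate the energy spent per packet.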

Simulation Results
In the following sub-sections, simulation results are presented for classical LEACH, with few nodes (cluster heads) sending data to the sink, and for Adaptive LEACH, with different weights ($P_j$ and $W_j$) applied by the cluster heads to the data before sending them to the sink.

Figure 4 depicts the two distortions obtained with two different ways of estimating the event source $S$. The red line is obtained using Eq. (5) and Eq. (6) and allowing only the cluster heads to send the information collected by the sensor nodes towards the sink. The blue line is obtained with the clustering approach and by applying Eq. (20), where a weighted average is considered. The modified weighted average applied by the cluster head provides a more precise event estimation, reducing data distortion.

2) LEACH with Weighted Data vs. Spatial Distortion Aware LEACH
The condition expressed in Eq. (14) ensures that the weighted average, in the case in which the monitored area is divided into clusters, cannot provide a worse event estimation than the case in which Eq. (18) is applied. This means that if the condition in Eq. (19) is respected, the data coming from the cluster head are not under-weighted and receive the right consideration in the total event estimation at the sink.
In Fig. 5, the event estimation distortions obtained with two different ways of estimating the event are depicted. The red line represents the distortion computed with Eq. (18), whereas the green line is obtained by applying Eq. (19) for each cluster. Even though the performance with weights $P_j$ is worse when the number of nodes is low, after a slight increase in the number of sensor nodes sending reports in the area, the benefits of the dynamic weights $W_j$ exceed the performance of the criterion applied for the weights $P_j$.

3) Event Distortion and Network Lifetime
In order to evaluate the robustness of the weighting strategies in LEACH, some simulation campaigns are shown in which the number of sensors is changed (Fig. 6), the cluster head percentage is fixed to 10 %, and the event estimation distortion and the network residual energy (Fig. 7) are considered. Direct communication between the sensor nodes and the sink without clustering is also considered, to emphasize the benefits of both clustering and spatial correlation-aware clustering. The proposed weighting also improves the network lifetime because it reduces the number of nodes sending information to the sink. In this case, only the cluster heads, with aggregation and data weighting strategies, can send data to the sink, which reduces data forwarding and filters the data delivered to the sink. This also ensures that more nodes can become cluster heads, better balancing energy among nodes and supporting a higher number of rounds in the cluster election.

Conclusions
In this paper, dynamic data aggregation techniques at the cluster level are proposed. These aggregation strategies consider the variance and the distortion of the data to estimate the event in the monitored area. Simulation results show that considering the event distortion at the cluster level in LEACH is a good way to reduce distortion while prolonging the network lifetime. Spatial correlation can also be a useful concept for improving data aggregation and fusion in other clustering protocols applied to Wireless Sensor Networks (WSNs). In future work, the temporal correlation will also be exploited in order to evaluate its positive effects in LEACH and in other clustering protocols.

Fig. 5: Distortion evaluation under weighted LEACH with two different weighting strategies (P j and W j ).

Figure 7 shows the number of alive sensors. It is possible to see how Eq. (19) also improves the network lifetime.

In the observation model, $(x_i, y_i)$ is the location of node $n_i$, $S_i[n]$ is the realization of the space-time process $s(t, x, y)$ at time $t = t_n$ and $(x, y) = (x_i, y_i)$, and $N_i[n]$ is the observation noise. $\{N_i[n]\}$ is a sequence of i.i.d. Gaussian random variables of zero mean and variance $\sigma_N^2$. It is further assumed that the noise each sensor encounters is independent of the noise at the other sensors, i.e., $N_i[n]$ and $N_j[n]$ are independent for $i \neq j$ and $\forall n$. The sink is interested in reconstructing the source $S$ subject to a distortion constraint.