A Signature-Based Data Security Technique for Energy-Efficient Data Aggregation in Wireless Sensor Networks

Data aggregation techniques have been widely used in wireless sensor networks (WSNs) to address the energy constraints of sensor nodes. They can conserve a significant amount of energy by reducing data packet transmission costs. However, many data aggregation applications require privacy and integrity protection of the real data while it is transmitted from the sensing nodes to a sink node. The existing schemes that support both privacy and integrity, that is, iCPDA and iPDA, suffer from high communication cost, high computation cost, and data propagation delay. To resolve these problems, we propose a signature-based data security technique for protecting sensitive data aggregation in WSNs. To support privacy-preserving data aggregation and integrity checking, our technique makes use of the additive property of complex numbers. Of the two parts of a complex number, the real part is used to hide the sampled data of a sensor node from its neighboring nodes and adversaries, whereas the imaginary part is used for data integrity checking at both data aggregators and the sink node. Through a performance analysis, we show that our privacy-preserving data aggregation scheme outperforms the existing schemes by up to 50% in terms of communication and computation overheads and by up to 3 times in terms of integrity checking and data propagation delay.


Introduction
Wireless sensor networks (WSNs) have been widely studied in ubiquitous computing environments. WSNs can be applied to various types of applications, such as environment management and military monitoring [1][2][3][4]. However, the sensor nodes that form WSNs have resource constraints such as limited power, slow processors, and little memory. For these reasons, it is essential to improve the energy efficiency of sensor nodes (or of the WSN) in order to enhance the quality of application service [5][6][7][8][9][10]. The first issue of WSNs is to reduce energy consumption. Because communication consumes the most energy, it is important to reduce communication overhead. To reduce communication cost, transmitting only the required, partially processed data is more effective than sending a large amount of raw data. In general, sending raw data wastes the energy of sensor nodes in two ways: duplicated messages are sent to the same node, called implosion, and neighboring nodes receive duplicated messages if two nodes share the same observing region, called overlapping.
In recent years, data aggregation has been actively used to combine data coming from many sensor nodes. An extension of this approach is in-network aggregation which aggregates data progressively as data are passed through the network [11][12][13][14]. In-network data aggregation can reduce the number of data transmissions and the number of nodes involved in gathering data from a WSN.
The second issue of WSNs is how to protect sensitive measurements from an adversary, which makes data privacy an important aspect [15]. In many scenarios, the confidentiality of transported data can be considered critical. For instance, sensors might measure patients' health information such as heartbeat and blood pressure details. In addition, a future application might measure household details such as power and water usage, thus computing average trends and making local recommendations. Since sensitive data is transported wirelessly among sensor nodes, it is prone to interception and eavesdropping. It is mandatory to maintain the data privacy of sensor nodes even from other trusted participating sensor nodes of the WSN. As a result, even if private data are overheard and decrypted by adversaries, it is necessary to prevent the recovery of the sensitive information of a sensor node [16][17][18].
Figure 1: Classification of the privacy-preserving data aggregation techniques for WSNs.
The last issue of WSNs is data integrity [19][20][21]. In communication, data integrity is defined as maintaining the consistency and correctness of messages (messages without modification by adversaries). In other words, it ensures that the data received at the data collecting node, that is, the sink node, has not been altered in transit either by an adversary or by noise. Data pollution due to noise is unintentional and can be handled by existing mechanisms such as cyclic redundancy checking (CRC). Hence, integrity checking against unintentional data pollution is out of the scope of this research. On the other hand, mechanisms like CRC cannot cope with intentional data pollution because an adversary can recompute a valid CRC after modifying the data. As the data aggregation result is used for making critical decisions, it must be verified before it is accepted. For this reason, it is necessary to design a data protocol for WSNs which can ensure that the aggregated result has not been polluted (manipulated by an adversary) on the way to the sink node.
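The difference between unintentional and intentional pollution can be illustrated with a minimal sketch. It assumes CRC-32 from Python's standard `zlib` module and a hypothetical packet layout (payload followed by a 4-byte checksum); the payload strings are made up for illustration and are not taken from any of the schemes discussed here.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(packet: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare it with the trailer."""
    payload, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer


original = frame(b"reading=30")

# Unintentional pollution: noise flips one bit, the trailer stays unchanged.
noisy = bytes([original[0] ^ 0x01]) + original[1:]

# Intentional pollution: the adversary rewrites the payload and recomputes the CRC.
forged = frame(b"reading=300")
```

Here `crc_ok(noisy)` fails while `crc_ok(forged)` succeeds, which is why a keyless checksum alone does not protect the aggregation result against an adversary.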
Since data privacy and integrity protection consume a significant amount of the precious resources (i.e., limited power) of sensor nodes, they shorten the lifetime of WSNs. Therefore, it is necessary to devise a lightweight technique that achieves data privacy and integrity protection efficiently. However, the existing work consumes considerable sensor node resources because it generates unnecessary messages in the network. For this reason, in this paper, we propose a resource-efficient data security technique that can aggregate sensitive data while protecting data integrity in WSNs. Our technique prevents leakage of the sensed data by using the algebraic properties of complex numbers. It not only ensures that no trend in the sensitive data of a sensor node is released to any other node or adversary, but also aggregates and hides data for privacy during transmission to the data sink. Of the two parts of a complex number, the real part is used to hide the sampled data of a sensor node from its neighboring nodes and adversaries, whereas the imaginary part is used for data integrity. Before transmitting data to a parent node, every sensor node transforms its sampled data into complex number form. The real part is generated by combining the sampled data with a unique private seed, and the imaginary part is generated by appending an imaginary unit to the modified sampled data. Thus, our technique prevents the recovery of sensitive information even if private data are overheard and decrypted by adversaries or other trusted participants. For strong data security, our technique can be built on top of existing secure communication protocols like [22]. Moreover, because it is a general approach, our technique can be applied to any type of WSN regardless of network topology.
The rest of the paper is organized as follows. In Section 2, we present some related work. Section 3 describes our integrity-protecting sensitive data aggregation technique. Simulation results are shown in Section 4. Along with some future research directions, we finally conclude our work in Section 5.

Related Work
In this section, we present related work on privacy-preserving data aggregation schemes. Figure 1 illustrates the classification of the privacy-preserving data aggregation techniques for WSNs. These techniques are broadly divided into two categories: homogeneous techniques and heterogeneous ones. They are categorized based on the type of nodes in the WSN, particularly the type of data aggregating nodes (aggregators). The aggregators can be either special (more powerful) nodes or regular sensor nodes. Moreover, the techniques are further divided into five groups: perturbation in homogeneous techniques, shuffling, privacy homomorphism, perturbation in heterogeneous techniques, and hybrid. First, the perturbation technique is also known as data customization. In this technique, every sensor node uses an encryption key and/or seeds (private or public) generated by randomization techniques [23,24] in order to hide the sampled data before transmitting them to a parent node. The perturbation techniques in the homogeneous category include iCPDA [21], Conti et al.'s scheme [25], DADPP [26], PHA [27], and HP2S [28,29], while the perturbation techniques in the heterogeneous category include Sheng and Li's scheme [30]. Second, in the shuffling technique, every sensor node slices its data into a fixed number of pieces and sends all but one piece to selected neighboring sensor nodes, one piece per neighbor. The remaining piece is kept at the node itself. After that, every sensor node assembles the received data pieces together with its own piece and sends the assembled data to a parent node. SMART [16] and iPDA [19] belong to the shuffling techniques.
International Journal of Distributed Sensor Networks
Third, the privacy homomorphism technique has a special feature that allows arithmetic operations to be performed on ciphertext without decryption. This technique is fast and resource-efficient for privacy-preserving data aggregation, but it is limited to addition and multiplication operations. Before the sensed data are sent to the aggregators, they are encrypted by using the respective keys of the sensor nodes, and the aggregators add or multiply them without decryption. CDA [31], the AH scheme [32], and our scheme belong to the privacy homomorphism techniques. Finally, the hybrid technique achieves privacy-preserving data aggregation for WSNs by combining the previous techniques. PIA [33] is the only hybrid technique in this literature.
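The idea of adding ciphertexts without decrypting them can be sketched with a generic additive masking scheme. This is only an illustration of the homomorphic property, not the construction used by CDA or the AH scheme; the modulus `M`, the keys, and the function names are assumptions made for the example.

```python
M = 2 ** 16  # hypothetical modulus, assumed larger than any possible sum


def encrypt(reading: int, key: int) -> int:
    """Mask a reading by adding the node's key modulo M."""
    return (reading + key) % M


def aggregate(ciphertexts) -> int:
    """Add ciphertexts without decrypting any of them."""
    return sum(ciphertexts) % M


def decrypt_sum(cipher_sum: int, keys) -> int:
    """Remove the sum of all keys to recover the sum of the readings."""
    return (cipher_sum - sum(keys)) % M


readings = [17, 8, 30]
keys = [1234, 40000, 65000]  # made-up per-node keys known to the sink
cipher_sum = aggregate(encrypt(r, k) for r, k in zip(readings, keys))
```

With these values, `decrypt_sum(cipher_sum, keys)` yields 55, the sum of the three readings, although the aggregator only ever handled masked values.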
In the previous section, we addressed three important considerations for WSNs, which are energy consumption, data privacy, and data integrity. However, iPDA and iCPDA are the only works to support both privacy preservation and data integrity for WSNs; we provide the detailed explanation of iPDA and iCPDA in Section 2.1.

Privacy Preserving Data Aggregation Scheme with Data Integrity.
He et al. proposed the iPDA [19] and iCPDA [21] schemes for WSNs to support privacy-preserving data aggregation as well as data integrity. In the iPDA scheme, they protect data integrity by designing two node-disjoint aggregation trees rooted at the query server, where each node belongs to a single aggregation tree. In this technique, first, every sensor node slices its private data randomly into a fixed number of pieces; all but one piece are encrypted and sent to randomly selected sensor nodes of the aggregation tree, and the remaining piece is kept at the node itself. The same process is done independently for each sensor node using the other aggregation tree. Then, all the sensor nodes that received data slices from multiple sensor nodes decrypt the slices using their shared keys and sum the received data slices together with their own. After that, each sensor node sends the sum value to its parent in the respective aggregation tree. In the same way, the sum data from the other set of sensor nodes are transmitted to the query server through the other aggregation tree. In the end, the aggregated data from the two node-disjoint aggregation trees reach the base station, where they are compared. If the difference between the aggregated data from the two aggregation trees does not exceed a predefined threshold value, the query server accepts the aggregation result; otherwise, it rejects the aggregated result by considering it as polluted data. However, iPDA has some shortcomings. First, while protecting data privacy it generates heavy traffic in the WSN. As a result, the communication cost of iPDA is significantly increased. Second, all sensor nodes use secret keys to encrypt all of their data slices before sending them to the recipient nodes on both trees. Consequently, every sensor node incurs the computation overhead of decrypting all the slices it receives before aggregating them.
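The slicing and the two-tree comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the iPDA implementation: encryption and routing are omitted, the number of pieces and the random range are arbitrary, and the names `slice_value` and `sink_accepts` are hypothetical.

```python
import random
from typing import List


def slice_value(value: int, pieces: int, rng: random.Random) -> List[int]:
    """Cut value into `pieces` integers that add up to value again."""
    parts = [rng.randint(-1000, 1000) for _ in range(pieces - 1)]
    return parts + [value - sum(parts)]


def sink_accepts(sum_tree_a: int, sum_tree_b: int, threshold: int) -> bool:
    """Accept only if the two tree results differ by at most the threshold."""
    return abs(sum_tree_a - sum_tree_b) <= threshold


rng = random.Random(7)  # fixed seed, only to make the sketch repeatable
readings = [17, 30, 8]

# Each node slices its reading once per tree; the slices are summed along each tree.
tree_a = sum(sum(slice_value(r, 3, rng)) for r in readings)
tree_b = sum(sum(slice_value(r, 3, rng)) for r in readings)
```

Because every set of slices adds up to the original reading, both trees deliver the same total when nothing is altered in transit, and the sink's comparison only reveals tampering that affects the two trees differently.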
In iCPDA, three rounds of interactions are required. First, each node sends a seed to the other cluster members. Next, each node hides its sensory data via the received seeds and sends the hidden sensory data to each cluster member. Then, each node adds its own hidden data to the received hidden data and sends the calculated result to its cluster head, which calculates the aggregation result via matrix inversion and multiplication. To enforce data integrity, cluster members check the aggregated data transmitted by the cluster head. iCPDA has some disadvantages. First, the communication overhead of iCPDA increases quadratically with the cluster size. Second, the computational overhead of iCPDA increases quickly with the cluster size because larger clusters introduce larger matrices, whereas a smaller cluster size lowers the privacy-preserving efficacy.
Both iPDA and iCPDA support only very weak data integrity checking: if a node modifies its sampled value, for example from 30 to 300, and uses the value 300 in the aggregation process, neither method can detect such misbehavior in the network. Hence, in this paper, we propose a new, efficient (in terms of communication overhead and data propagation delay), and general (in terms of supported network topologies) scheme that supports data privacy and achieves integrity assurance in data aggregation for WSNs. Our scheme is based on the algebraic properties of complex numbers. It not only ensures that no trend in the sensitive data of a sensor node is released to any other node or adversary but also provides integrity checking of the aggregated value of the sensor data.

Integrity-Protecting Sensitive Data Aggregation Technique
To overcome the previously mentioned shortcomings of iPDA and iCPDA, in this section, we propose a new energy-efficient data aggregation scheme for preserving data privacy in WSNs. Our scheme exploits the additive property of complex numbers to aggregate the sensed data in WSNs.
We focus only on the additive aggregation function (SUM), like iCPDA and iPDA. This is because other aggregation functions, such as average, count, variance, and standard deviation, can be obtained by using the additive aggregation function [34]. In our scheme, of the two parts of a complex number (a + bi), the real part (a) is used to hide the sampled data of a sensor node from its neighboring nodes and adversaries, whereas the imaginary part (b) is used for data integrity checking at both data aggregators and the sink node. Before transmitting data to a parent node, every sensor node transforms its sampled data into complex number form. The real part is generated by combining the sampled data with a unique private seed, and the imaginary part is generated by appending an imaginary unit to the modified sampled data. For this, the sampled value is first mingled with a private seed, and then the result (a) is combined with another real number carrying the imaginary unit (i) to generate the complex number form (a + bi). The real number carrying i is the absolute difference between the previous sampled data and the current sampled data of the node. Note that during network deployment, a Master Device (MD) [35] securely provides a unique real number as a seed to every sensor node of the WSN after establishing a pairwise secret key with it. Since the MD is an offline server, it shares this information only with the query server for future reference. Thus, the seed of each sensor node is private in the network. Data can be aggregated at upper levels during transmission to the query server by using the algebraic properties of complex numbers. Our scheme can check the integrity of the aggregated data at both data aggregators and the sink node at the same time.
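The claim that other aggregates reduce to SUM can be made concrete with a short sketch. It assumes the standard reduction in which count is a sum of ones and variance is obtained from the sum and the sum of squares; the function name and the sample readings are hypothetical, and the exact construction in [34] may differ.

```python
import math


def derived_aggregates(readings):
    """Derive count, mean, population variance, and standard deviation from sums only."""
    count = sum(1 for _ in readings)          # COUNT as a sum of ones
    total = sum(readings)                     # SUM of the readings
    total_sq = sum(x * x for x in readings)   # SUM of the squared readings
    mean = total / count
    variance = total_sq / count - mean ** 2
    return count, mean, variance, math.sqrt(variance)


result = derived_aggregates([2, 4, 4, 4, 5, 5, 7, 9])
```

For these readings the three sums are 8, 40, and 232, which give a mean of 5, a variance of 4, and a standard deviation of 2. Each of the three sums could be collected with an additive aggregation scheme.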
The proposed privacy and integrity preserving technique is performed in five steps. In the first step, we assign a special type of positive integer, a power of two 2^k, to every sensor node as its node ID, where the exponent k ranges from 0 to one less than the number of free bits in the payload (eight times the number of free payload bytes). This is because the binary value of every integer of this type has only one high bit (1). In addition, the position of the high bit is unique for each integer of this type. The sink node identifies a data contributing sensor node through the signature of its Node-ID, as shown in Table 1. The Node-ID of a sensor node is used to generate a signature of a fixed length. A signature is a fixed-size bit stream of binary numbers for a given integer. The signature of a sensor node ID can be generated by using the technique presented in [36]. We can determine the length of the signature based on the size of a given WSN. When the size of the WSN increases, we can increase the length of the signature up to the number of free payload bytes. In other words, WSNs of different sizes can have signatures of different lengths. The details of using signatures have been presented in our previous work [37].
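The one-high-bit property of power-of-two node IDs can be sketched as below. This is an illustration only: it assumes that a signature is simply the fixed-length binary form of the node ID and that signatures are superimposed with bitwise OR, whereas the actual generation technique is the one described in [36]. All function names are hypothetical.

```python
from typing import Iterable


def node_id(k: int) -> int:
    """Return the k-th power-of-two node ID, which has exactly one high bit."""
    return 1 << k


def signature(node: int, length_bytes: int = 1) -> str:
    """Render a node ID as a fixed-length bit string."""
    return format(node, "0{}b".format(length_bytes * 8))


def superimpose(ids: Iterable[int]) -> int:
    """Combine several signatures with bitwise OR (assumed superimposition rule)."""
    combined = 0
    for value in ids:
        combined |= value
    return combined


combined = superimpose([node_id(0), node_id(2), node_id(7)])
```

Since every ID occupies its own bit position, the combined value keeps one high bit per contributing node, so a one-byte signature can distinguish up to eight nodes in this sketch.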
In the second step, when the network receives an SQL-like query for the SUM aggregation function, the sampled sensitive data ds of each sensor node is first concealed in the real part a by combining it with a unique seed (sr), which is a private real number. The seeds can be selected from an integer range (i.e., the space between a lower bound and an upper bound). By increasing the size of the range, we can further increase the level of data privacy. Hence, our approach strongly supports data privacy. To support data integrity, an integer value b (the difference between the previous sensed value and the current sensed value of the sensor node), together with the imaginary unit i, is appended to a by using the genCpxNum() function to form a complex number a + bi, where a and b are real numbers called the real part and the imaginary part of the complex number, respectively, as shown in Table 2. Complex numbers can be added, subtracted, multiplied, and divided by formally applying the associative, commutative, and distributive laws of algebra. For the first round, the imaginary part (the value of b) is zero. In Table 2, for instance, the reading 17 of node 5 is encrypted into 46 + 3i. The reading 17 is added to 29, which is the private seed of node 5, and the masked value 46 is obtained. Then, assuming that 3 is the difference between the previous reading and the current reading of node 5, 3i is appended to the result 46 to get 46 + 3i, which is the complex number form of 17 after the data customization process. Node 5 includes its signature, that is, 00000101, when it transmits the data as ⟨00000101, 46 + 3i⟩. We assume that no sensor node can be compromised before sending its first-round data to the sink node. Every source sensor node keeps the original sensed value of the current round in order to deduce b in the next round; the stored value is updated in each round of data transmission. Next, the source node encrypts the customized data, that is, a + bi, and the signature of the node by using a secret key K_{x,y} [22] and transmits the ciphertext to its parent.
The term K_{x,y} denotes a pairwise symmetric key shared by nodes x and y, where node x encrypts data by using the key K_{x,y} and node y decrypts the data by using the same key. In this way, our algorithm converts the sampled data into an encrypted complex number form. Hence, it not only hides the trend of the transmitted private data but also prevents neighboring sensor nodes and adversaries from recovering sensitive data even if they overhear and decrypt it.
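The data customization of the second step can be sketched with Python's built-in complex type. The function below is a hypothetical stand-in for genCpxNum(), whose exact signature is not given in the text; the previous reading of 14 is an assumed value chosen so that the difference is 3, and the pairwise-key encryption is omitted.

```python
from typing import Optional


def gen_cpx_num(reading: int, seed: int, previous_reading: Optional[int] = None) -> complex:
    """Hide the reading in the real part and put the round-to-round difference in the imaginary part."""
    real = reading + seed
    # In the first round there is no previous reading, so the imaginary part is zero.
    imag = 0 if previous_reading is None else abs(reading - previous_reading)
    return complex(real, imag)


first_round = gen_cpx_num(17, 29)                          # node 5, first round
later_round = gen_cpx_num(17, 29, previous_reading=14)     # assumed previous reading
```

With the seed 29 and the reading 17, the later round produces 46 + 3i, matching the node 5 example; a neighbor that sees this value learns neither 17 nor 29 on its own.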
In the third step, the parent sensor node (i.e., data aggregator) decrypts the received data by using the respective pairwise symmetric keys of its child sensor nodes. For each child node, the parent node computes the difference value of the two real units by using the stored previous data and the received current data of the child node. For the first round, this difference value is also zero. For this, the parent node always keeps a record of the data previously received from each of its child nodes and replaces the previous data with the current data in every round. To support local integrity checking, the parent node first compares the newly computed difference value with the difference value currently received from the child node (the imaginary unit) and then compares the difference value with a local threshold. If the imaginary unit of the child's current data is equal to the computed difference value and the imaginary unit is not greater than the local threshold, then the parent node accepts the data of the child node. Otherwise, the parent node rejects the data of the child sensor node, considering it polluted. For example, we assume that the local threshold is set to 2 for local integrity checking. Because a parent node checks the integrity of its child nodes, node 4 checks the local integrity of node 8. In Figure 2, since the imaginary part of node 8 is 2, which is less than or equal to the local threshold, node 4 accepts the data of node 8. On the other hand, node 5 will be rejected by its parent node 2 because the imaginary part of node 5 is greater than the local threshold. In the same way, each parent node assures the data integrity of its child nodes. After that, the parent node adds the data of its child nodes to its own data by using the additive property of complex numbers. Thus, our algorithm supports local integrity checking, which forces child nodes to provide consistent data.
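The local integrity check and the additive aggregation at a parent node can be sketched as follows. This is a minimal illustration under assumptions: decryption is omitted, the stored previous real parts (43 and 28) and the aggregator's own value are made up, and the names `local_check` and `aggregate` are hypothetical.

```python
from typing import Iterable


def local_check(current: complex, previous_real: float, threshold: float) -> bool:
    """Accept a child's value only if its imaginary unit matches the recomputed difference and stays within the threshold."""
    computed = abs(current.real - previous_real)
    return current.imag == computed and current.imag <= threshold


def aggregate(own: complex, accepted_children: Iterable[complex]) -> complex:
    """Add the parent's own customized value to the accepted child values."""
    return own + sum(accepted_children, 0j)


threshold = 2
node8_ok = local_check(complex(30, 2), previous_real=28, threshold=threshold)
node5_ok = local_check(complex(46, 3), previous_real=43, threshold=threshold)
partial = aggregate(complex(10, 1), [complex(30, 2)])
```

In this sketch the value with imaginary part 2 is accepted and the value 46 + 3i is rejected because 3 exceeds the threshold of 2, mirroring the node 8 and node 5 example; only accepted values enter the partial sum.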
The above process continues at all nodes in the upper levels of the network until all partially aggregated data of the network reach the sink node. In the fourth step, when the sink node receives all intermediate result sets (partially aggregated, encrypted, customized data with superimposed signatures) from its 1-hop child nodes, it decrypts them by using the respective pairwise symmetric keys and computes the final aggregation SUM_2 from them. Since SUM_2 is in complex number form and the sensed data has been concealed in the real unit by using private seeds, the contributing sensor nodes must be identified in order to deduce the actual SUM value. In the last step, the sink node first identifies the data contributing nodes by checking the high bits (1s) of the received superimposed signature, performing a bitwise AND operation with the prestored signature files or the superimposed signature of the Node-IDs of all nodes in the network. It then separates SUM_2 into the real unit SUM_2R and the imaginary unit SUM_2IM. Because the sampled data of the sensor nodes has been concealed within the real unit, the sink node computes the actual aggregated result SUM by subtracting SUM_1R (a freshly computed sum of the private seeds of the contributing source nodes) from SUM_2R, which is the inverse of the masking operation in step 2. The final result SUM is always accurate and reliable for the following two reasons. First, a complex number is an algebraic expression, and hence the underlying algebra gives the exact result for the aggregated sensor data. Second, since the private seeds are fixed integer values (i.e., they are not random numbers), the sink node subtracts exactly the same values that were added to the sensor data by every source node during the data hiding process.
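The recovery of the actual SUM at the sink can be sketched as below. The sketch assumes one-hot signatures that are superimposed with bitwise OR and a seed table indexed by signature; the seeds other than 29, the readings, and the name `recover_sum` are hypothetical, and decryption is omitted.

```python
from typing import Dict


def recover_sum(sum2: complex, superimposed: int, seeds: Dict[int, int]) -> float:
    """Subtract the seeds of the contributing nodes from the real unit of sum2."""
    # A node contributed if its one-hot signature bit is set in the superimposed signature.
    sum1r = sum(seed for sig, seed in seeds.items() if superimposed & sig)
    return sum2.real - sum1r


# Hypothetical seed table held by the sink: one-hot signature -> private seed.
seeds = {0b001: 29, 0b010: 11, 0b100: 5}

# Nodes 0b001 (reading 17) and 0b100 (reading 8) reported; node 0b010 stayed silent.
sum2 = complex(17 + 29, 3) + complex(8 + 5, 1)
superimposed = 0b001 | 0b100
actual_sum = recover_sum(sum2, superimposed, seeds)
```

Here the real unit of the aggregate is 59, the seeds of the two contributors add up to 34, and the recovered SUM is 25, which equals 17 + 8. The seed of the silent node is not subtracted because its bit is absent from the superimposed signature.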
At the same time, before accepting the SUM, the sink node performs global integrity checking to determine whether SUM_2 has been polluted by an adversary in transit. For this, like the parent nodes, the sink node computes the difference value of the two real units by using the stored previous data and the current data received from the network. The sink node first compares the newly computed difference value with the currently received difference value, that is, SUM_2IM, and then compares SUM_2IM with a global threshold Δ (for every application, the maximum value of Δ is the local threshold multiplied by the total number of nodes in the network). If the imaginary unit SUM_2IM of the current data from the network is equal to the newly computed difference value and SUM_2IM is not larger than Δ, then the sink node accepts the data of the network and returns the actual SUM to the query issuer. Otherwise, the sink node rejects the SUM, considering it forged or polluted by an adversary or other nodes. For example, as shown in Figure 2, we assume that the local integrity threshold per node equals 2, so the maximum value of the global threshold is Δ = 2 × 8 = 16. Since sensor node 5 does not participate in data collection, the global integrity checking value can be computed as Δ = 2 × 7 = 14. In this scenario, the received data is considered consistent and is accepted by the sink node because (1) the value computed at the sink node, that is, 9, is the same as the one received from the network and (2) the value is less than the global integrity checking value, that is, 9 < 14. The overall algorithm that performs sensitive data aggregation and integrity checking is illustrated in Algorithm 1.
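The global integrity check can be sketched in the same style as the local one. The sketch assumes that the sink stores the real unit of the previous round's aggregate; the real-unit values 200 and 209 are made up so that the difference is 9, and the function and parameter names are hypothetical.

```python
def global_check(sum2: complex, previous_real: float,
                 local_threshold: float, contributors: int) -> bool:
    """Accept the aggregate only if its imaginary unit matches the recomputed difference and stays within the global threshold."""
    global_threshold = local_threshold * contributors
    computed = abs(sum2.real - previous_real)
    return sum2.imag == computed and sum2.imag <= global_threshold


# Seven contributing nodes and a local threshold of 2 give a global threshold of 14.
accepted = global_check(complex(209, 9), previous_real=200,
                        local_threshold=2, contributors=7)
too_large = global_check(complex(215, 15), previous_real=200,
                         local_threshold=2, contributors=7)
mismatch = global_check(complex(209, 4), previous_real=200,
                        local_threshold=2, contributors=7)
```

The first aggregate passes both conditions (9 matches the recomputed difference and 9 < 14), while the other two fail on the threshold and on the mismatch, respectively.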

Performance Evaluation
In this section, we present simulation results for our scheme by comparing it with the iPDA and iCPDA schemes in terms of communication overhead and integrity checking. For this, we use the TOSSIM [38] simulator running over the TinyOS [39] operating system with the GCC compiler. We consider 100 sensor nodes distributed randomly in a 100 m × 100 m area. As presented in directed diffusion [40], we use a receiving power dissipation of 395 mW and a transmitting power dissipation of 660 mW. Moreover, MATLAB 7.6.0.324 (R2008a) is used to measure the execution time required for data customization and data aggregation.
Figure 3 shows the communication overhead in terms of the number of messages generated in a WSN with respect to a varying number of sensor nodes. As expected, the number of messages in iPDA, iCPDA, and our scheme increases when the number of sensor nodes increases. This is because every sensor node in the WSN is capable of sensing data, and when the number of source nodes increases, the number of messages naturally increases in all three schemes. However, our scheme outperforms the iPDA and iCPDA schemes because the existing schemes generate unnecessary messages in the network. In our scheme, each sensor node can customize its data by itself and does not need to generate extra messages in the network for data privacy and integrity checking. On the other hand, the iPDA and iCPDA schemes generate six messages and four messages, respectively, for privacy preservation and integrity checking. Because many messages are exchanged among the nodes, the existing schemes cause many data collisions. That is to say, the number of messages generated in the network increases drastically as the number of sensor nodes becomes larger. Consequently, the iPDA and iCPDA schemes consume much more energy for successful data transmission than our scheme.
The messages generated in the WSN are finally consumed by the sink node. This involves message transmission and message reception, both of which require a significant amount of energy. Figure 4 shows the communication overhead in terms of the energy dissipated by iPDA, iCPDA, and our scheme with respect to a varying number of sensor nodes in the WSN. As expected, the energy dissipated by all three schemes increases when the number of sensor nodes increases. This is because every message generated in the network requires some amount of energy to reach the sink node. However, the power consumption of our scheme is always lower than that of the iPDA and iCPDA schemes. The reason is that the iPDA and iCPDA schemes generate many unnecessary messages in the WSN while achieving integrity protection and privacy preservation in data aggregation. In addition, every sensor node stays active for a longer time in order to communicate all the messages. In our scheme, however, every sensor node can achieve both integrity protection and privacy preservation by comparing the current complex number with the previous one. Hence, the energy consumption of our scheme is reduced by 80% and 60% compared with iPDA and iCPDA, respectively.
Table 3 shows the computation overhead of data aggregation. The result shows that iCPDA has the highest computation overhead for privacy-preserving data aggregation. The reason is that iCPDA uses a time-consuming encryption method with two seeds to achieve data privacy. In contrast, our scheme is about 2 times and 83 times faster than iPDA and iCPDA, respectively. This shows that our scheme significantly reduces the resource (CPU time) usage needed for private data aggregation. This is because our scheme reduces the number of communication messages by using the additive property of complex numbers.
Figure 5 shows the data propagation delay in terms of the average time required for the sampled data of sensor nodes to reach the sink node, taking data privacy and integrity checking into account. During this process, a sensor node in iPDA and iCPDA has to communicate (i.e., transmit and receive) at least six and four messages, respectively. Hence, sensor nodes in both iPDA and iCPDA need more active time than in our scheme to perform all communications, which results in a very high data propagation delay in the existing work. As a result, the duty cycle, which is the fraction of the total time that an entity spends in an active state [41], is also increased in the existing schemes. iCPDA generates fewer messages than iPDA but requires more complex computation for privacy preservation and uses longer messages than iPDA. Moreover, in iCPDA, the sampled data of sensor nodes is sent in the direction opposite to the sink node (data is transmitted from the cluster head to the cluster members) during the privacy preservation process. Therefore, iCPDA has the worst performance among the three schemes. On the other hand, every sensor node in our scheme sends only one message (the aggregated data) to its parent node because it checks the integrity of the sensed data without communicating with other sensor nodes.

Data Integrity.
Figure 6 shows the performance of the three schemes in terms of the detection ratio of polluted messages for integrity checking. Our scheme can detect all polluted messages, whereas iPDA and iCPDA can detect less than 30% of polluted messages. The reason is that every node in our scheme checks the integrity of the incoming data received from the lower-level nodes. On the other hand, only the sink node can check the integrity of the aggregated data in iPDA, whereas only the sink node and the cluster heads can perform the integrity checking in iCPDA.

Conclusion
In this paper, we proposed an efficient and general scheme that aggregates sensitive data while protecting data integrity in environments that generate private data, such as patients' health monitoring applications. To maintain data privacy, our scheme applies the additive property of complex numbers: sampled data are customized and given the form of a complex number before being transmitted towards the sink node. As a result, the scheme prevents the trend of the private data of a sensor node from being known by its neighboring nodes, including data aggregators, in WSNs. Moreover, it remains difficult for an adversary to recover sensitive information even if data are overheard and decrypted. Meanwhile, data integrity is protected by using the imaginary unit of the complex-number-form customized data at the cost of just two extra bytes. Through simulation results, we have shown that our scheme is much more efficient than the iPDA and iCPDA schemes in terms of communication and computation overheads, data propagation delay, and integrity checking.
As future work, we will design a scheme that preserves data integrity and sensitive data under collusive attacks and provide more simulation results for it. Moreover, we will improve our privacy-preserving data aggregation scheme to support MAX and MIN aggregations.