An Efficient Confidentiality and Integrity Preserving Aggregation Protocol in Wireless Sensor Networks

Wireless sensor networks (WSNs) are composed of sensor nodes with limited energy that is difficult to replenish. In-network data aggregation is the main way to minimize energy consumption and maximize network lifetime by reducing communication overhead. However, performing data aggregation while preserving data confidentiality and integrity is challenging, because adversaries can easily eavesdrop on and modify aggregation results through compromised aggregation nodes. In this paper, we propose an efficient confidentiality and integrity preserving aggregation protocol (ECIPAP) based on homomorphic encryption and a result-checking mechanism. We also implement ECIPAP on SimpleWSN nodes running TinyOS. Security and performance analysis shows that our protocol is efficient while preserving both aggregation confidentiality and integrity.


Introduction
Wireless sensor networks consist of a large number of sensor nodes that sense, process, and transmit data collected from the environment in applications such as battlefield surveillance, health care monitoring, and traffic regulation. The environment data is transmitted to the base station (BS) hop by hop. Sensor nodes have limited energy storage, and their computational ability is far weaker than that of the base station. The nodes cannot be resupplied with energy at will, because they are usually deployed in areas that humans can hardly reach. If some key nodes stop working as expected, the whole network can break down. A node spends most of its energy on data transmission. On TelosB nodes, sending and receiving one bit of data cost 0.72 μJ and 0.81 μJ, respectively, which is at least 600 times the 1.2 nJ that the processor spends on one instruction [1]. Therefore, reducing communication overhead is the key to prolonging the life of wireless sensor networks [2,3].
Rather than transmitting data directly to the base station, wireless sensor nodes can perform aggregation operations, such as SUM and AVERAGE, on aggregation nodes. This approach greatly reduces the communication overhead. An aggregation tree should be built before data aggregation. The topology can be tree-based or cluster-based, depending on the application. The sensors then collect environment data from the monitored area and transmit the data to the base station hop by hop. When intermediate nodes receive data from their child nodes, they aggregate it using aggregation functions.
In some applications, such as battlefield surveillance and target tracking, sensor nodes must be deployed in hostile areas. If the sensed data were leaked to the enemy, heavy losses could follow. Adversaries can easily gain physical access to the nodes and tamper with the original sensing data, so that the base station receives false aggregation results and, as a consequence, makes wrong decisions. Other attacks, such as denial of service (DoS) and selective forwarding, also affect the aggregation process. General data aggregation protocols do not consider these attacks, so they are unsuitable for military applications. A data aggregation protocol should therefore be designed with security in mind. In summary, secure data aggregation mandates three security factors. Researchers have proposed many data aggregation protocols that provide some security protection. Most protocols provide only confidentiality protection [4,5] or only integrity protection [6,7]. Some protocols claim to protect both data confidentiality and data integrity, but problems remain. To protect data integrity, data must be transmitted in certain special forms, yet the encryption algorithms used to protect data confidentiality may change the form of the data. The two goals therefore conflict to some extent, and designing a data aggregation protocol that provides both data confidentiality and data integrity is a challenge.
In this paper, we propose an efficient confidentiality and integrity preserving aggregation protocol (ECIPAP) for wireless sensor networks, motivated by the protocols SIES [8] and SHIA [9]. Our scheme is based on a lightweight, energy-efficient homomorphic encryption. Through the result-checking phase, every node in the wireless sensor network can verify whether its data was added to the final aggregation result. We use a random number dissemination mechanism to update the keys stored in sensor nodes, which guarantees data freshness against replay attacks. The protocol also uses TESLA [10] to broadcast authenticated queries along with the random numbers. We implement our protocol on TelosB physical nodes. Through experimental results and theoretical analysis, we show the practicability and high efficiency of our protocol.
The paper is organized as follows. In Section 2, we introduce the related work in this research area. Section 3 explains the system model in our protocol. Section 4 describes our protocol in detail. Section 5 analyses the security, experiment, and performance of ECIPAP. We conclude this paper in Section 6.

Related Work
Hu and Evans proposed the first integrity-preserving hierarchical data aggregation protocol in 2003 [11]. The main idea of this protocol is delayed aggregation and delayed authentication. Girao et al. proposed an aggregation scheme that guarantees end-to-end data confidentiality using symmetric-key homomorphic encryption [12]. Since then, a series of secure data aggregation protocols has been proposed. Existing protocols that claim to protect both data confidentiality and data integrity still carry potential risks. In general, a protocol that preserves both confidentiality and integrity should satisfy the following conditions.
(i) The encrypted sensing data should be decrypted only at the base station and never on intermediate nodes, so as to achieve end-to-end confidentiality.
(ii) If adversaries tamper with the sensing data, the base station can verify data integrity by checking the final aggregation result.
(iii) The energy consumption should be reasonable, because each node has limited energy.
The protocols proposed in [13-16] build on existing integrity-preserving data aggregation protocols and use pairwise keys to encrypt the data transmitted between two nodes. These protocols achieve hop-by-hop confidentiality but not end-to-end confidentiality. Building on a confidentiality protection mechanism, the scheme in [17] provides data integrity protection by using a global key and a key shared with the base station to compute an authentication value for each message. This protocol is not secure if an adversary compromises a node and obtains the global key, with which the adversary can forge messages at will. The protocol in [18] uses EC-ElGamal to achieve confidentiality protection. It also uses a global key and the node's private key to compute the signature of a message. However, by compromising a sensor node the adversary can obtain this key material, which can be used to tamper with the ciphertext and compute the corresponding signature. The protocol proposed in [19] protects data confidentiality and also protects the aggregation data sent by the cluster header from tampering. However, an adversary can compromise a node and obtain the public key of the cluster; the aggregation data can then be tampered with freely.
Chan et al. presented a provably secure tree-based in-network data aggregation protocol named SHIA [9]. Without assuming a particular data structure, SHIA can detect any manipulation of the in-network aggregation. SHIA has three phases: the query dissemination phase, the aggregation-commit phase, and the result-checking phase. After the query is broadcast to the whole network, sensor nodes send their environment data and commitments upward. In the result-checking phase, the associated off-path values are passed down the aggregation tree. In this way, sensor nodes can verify whether their data was indeed added to the final aggregation result. Once node $i$ is convinced, it sends an authentication message $\mathrm{MAC}(K_i, N \,\|\, \mathrm{OK})$ to the base station hop by hop, where OK is a unique identifier, $N$ is the query nonce, and $K_i$ is the key that node $i$ shares with the base station. The intermediate nodes XOR these authentication messages together, so the final result received by the base station is $\mathrm{MAC}(K_1, N \,\|\, \mathrm{OK}) \oplus \cdots \oplus \mathrm{MAC}(K_n, N \,\|\, \mathrm{OK})$. The base station computes this value itself and compares it with the authentication message received. If the two match, the base station accepts the aggregation result. Otherwise, the result is ignored.
Although this protocol can protect data integrity, the data confidentiality is not guaranteed.
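The XOR-based acknowledgement step described above can be sketched as follows. This is a minimal illustration, not SHIA's reference implementation: the use of HMAC-SHA1 as the MAC, the key strings, and the nonce value are all assumptions made for the example.

```python
import hashlib
import hmac
from functools import reduce


def ack_mac(key: bytes, nonce: bytes, status: bytes = b"OK") -> bytes:
    """MAC over nonce || status under the key a node shares with the base station."""
    return hmac.new(key, nonce + status, hashlib.sha1).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def aggregate_acks(acks):
    """XOR all acknowledgements together, as intermediate nodes do hop by hop."""
    return reduce(xor_bytes, acks)


# Hypothetical keys and nonce, for illustration only.
node_keys = [b"key-node-1", b"key-node-2", b"key-node-3"]
nonce = b"query-nonce-42"

# What the network delivers versus what the base station recomputes locally.
received = aggregate_acks([ack_mac(k, nonce) for k in node_keys])
expected = aggregate_acks([ack_mac(k, nonce) for k in node_keys])
all_ok = hmac.compare_digest(received, expected)

# If one node withholds its acknowledgement, the XOR no longer matches.
partial = aggregate_acks([ack_mac(k, nonce) for k in node_keys[:2]])
one_missing_detected = not hmac.compare_digest(partial, expected)
```

The aggregated acknowledgement stays one MAC long (20 bytes with SHA-1) regardless of the number of nodes, which is why the base station can check all nodes with a single comparison.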
Combining homomorphic encryption and secret sharing, Papadopoulos et al. proposed SIES, a secure data aggregation protocol designed to protect both data confidentiality and data integrity [8]. In this protocol, each sensor node $i$ has a secret $ss_i$, a global key $K$, and a private key $k_i$ shared with the base station. After sensing the environment data $v_i$ from the monitored area, each sensor node generates a message $m_i = v_i \,\|\, ss_i$. Sensor nodes encrypt $m_i$ with homomorphic encryption and send the ciphertexts to their parent nodes, which can apply aggregation functions to the encrypted data directly. When the base station receives the final aggregation result, it decrypts it and splits it into two parts: the environment data and the secret SS. The base station computes SS itself and compares it with the secret received. The weakness of this protocol is that if the data part $v_i$ of a message is tampered with by adversaries, the base station cannot detect the change.
Our protocol improves SIES [8] by replacing secret sharing with a result-checking mechanism, motivated by SHIA [9], to protect data integrity. We also use homomorphic encryption so that intermediate nodes can perform aggregation directly on ciphertexts, which achieves end-to-end data confidentiality. After the final aggregation result is sent to the base station, the result-checking process detects whether all the data was added to the final aggregation result.

System Model
Different secure data aggregation protocols support different aggregation functions. Their security guarantees also vary, and adversaries can launch a variety of attacks. This section gives the network assumption, problem definition, and attack model in detail.

Network Assumption.
We assume that an aggregation tree is already set up in the deployment phase. If it is not, TAG [20] can be used to build such a tree-based network. In our protocol, the base station BS has unlimited energy and computing ability. BS can broadcast query messages to the whole network using the authenticated broadcast method TESLA [10]. The aggregation tree contains two types of nodes: leaf nodes and intermediate nodes. Leaf nodes only sense environment data and encrypt it, while intermediate nodes both sense environment data and aggregate their child nodes' data. The distance between two nodes is about 10 m, and a node can communicate only with its child nodes or its parent node.
Each sensor node has a unique ID and an initial key $k$. We further assume that sensor nodes can perform symmetric additively homomorphic encryption and compute a collision-resistant hash function.

Problem Definition.
In our protocol, we use an efficient symmetric additively homomorphic encryption algorithm. A key distribution mechanism is also used to generate the keys in each sensor node.

Homomorphic Encryption.
As shown in Algorithm 1, $M$ is a large integer. Let $m$ be the message to be encrypted and $k$ the initial key stored in every node before deployment; both are nonnegative integers bounded by $M$. Parameter $r$ is the random number sent by the base station, and $H$ is the hash function.
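Algorithm 1 is only referenced in this section, so the following is a hedged sketch of one common symmetric additively homomorphic construction that fits the description: ciphertext = (message + key) mod M. The modulus value, the key derivation SHA-1(initial key ‖ random number) mod M, and all function names are assumptions for illustration, not the paper's exact algorithm.

```python
import hashlib

M = 2**32  # assumed modulus (the "large integer"); the actual value is not given


def temp_key(initial_key: bytes, r: bytes) -> int:
    """Derive a per-round key as SHA-1(initial_key || r) mod M (assumed construction)."""
    digest = hashlib.sha1(initial_key + r).digest()
    return int.from_bytes(digest, "big") % M


def encrypt(m: int, k: int) -> int:
    """Additive encryption: add the key modulo M."""
    return (m + k) % M


def decrypt(c: int, k: int) -> int:
    """Additive decryption: subtract the key modulo M."""
    return (c - k) % M


r = b"round-7"  # hypothetical random number broadcast by the base station
k1 = temp_key(b"node-1-secret", r)
k2 = temp_key(b"node-2-secret", r)

c1 = encrypt(21, k1)
c2 = encrypt(25, k2)

# Homomorphic property: the sum of ciphertexts decrypts under the sum of keys.
c_sum = (c1 + c2) % M
plain_sum = decrypt(c_sum, (k1 + k2) % M)
```

Because a fresh random number changes every derived key, a ciphertext recorded in one round does not decrypt correctly in another, which is the freshness property the random number is meant to provide.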

Message Format.
The message sent by a sensor node has a fixed format, the data tuple $\langle \mathit{count}, \mathit{value}, \overline{\mathit{value}}, \mathrm{MAC} \rangle$, where count is the number of sensor nodes in the subtree. If a sensor node is a leaf node, then count = 1. The parameter value is the environment data, with lower bound $v_{\min}$ and upper bound $v_{\max}$. We also add the complement $\overline{\mathit{value}} = v_{\max} - \mathit{value}$ to this data tuple. The MAC is computed with the node's unique ID, denoted ID, as one of its inputs.

Aggregation Function.
An intermediate node has $n$ child nodes $s_1, s_2, \ldots, s_n$. Let $v_1, v_2, \ldots, v_n$ be the data sensed by these child nodes from the monitored area. We define the aggregation function agg performed on intermediate nodes as shown in Algorithm 2.
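Since Algorithm 2 is only referenced here, the following is a sketch of what agg plausibly computes, assuming it adds the count, value, and complement fields component-wise modulo a large integer $M$. The modular form, the index $j$, and the convention that $j = 0$ denotes the intermediate node's own tuple are assumptions.

```latex
\begin{aligned}
\mathit{count}_{agg} &= \Big(\sum_{j=0}^{n} \mathit{count}_j\Big) \bmod M,\\
\mathit{value}_{agg} &= \Big(\sum_{j=0}^{n} \mathit{value}_j\Big) \bmod M,\\
\overline{\mathit{value}}_{agg} &= \Big(\sum_{j=0}^{n} \overline{\mathit{value}}_j\Big) \bmod M.
\end{aligned}
```

With additively homomorphic ciphertexts the same sums apply to the encrypted fields, so an intermediate node never needs to decrypt. How the MAC field is combined is deliberately left out of this sketch.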

Attack Model.
The most threatening attack against data aggregation in wireless sensor networks is node compromise. Through this attack, adversaries obtain the keys of the compromised nodes and can then inject fake messages to distort the final aggregation result. Such stealthy attacks [12] make the base station accept a false aggregation result without detection. We assume that adversaries can compromise a fraction of the sensor nodes in the network.
Denial-of-service attacks are out of scope, because the base station can easily detect this type of attack when the network stops working properly.

ECIPAP
We improve SIES [8] by using a result-checking mechanism instead of secret sharing, and on this basis we propose an efficient confidentiality and integrity preserving aggregation protocol (ECIPAP). The homomorphic encryption used in our protocol guarantees end-to-end data confidentiality. Our protocol has three phases: the query dissemination phase, the data aggregation phase, and the result-checking phase.

Network Deployment.
Before the sensor nodes are deployed in the monitored area, each sensor node is preloaded with a private key $k$, a large integer $M$, and a unique ID, all shared with the base station. The symmetric additively homomorphic encryption algorithm and the hash function SHA-1 are also preset. We assume that the aggregation tree is already set up when the aggregation process begins; TAG [20] can be used to form it.

Query Dissemination Phase.
In each aggregation round, the base station BS chooses a random number $r$ and a query type such as COUNT, SUM, or AVERAGE. Thanks to its powerful communication ability, it can broadcast the query message to the whole network along with the random number $r$. When the nodes receive the query message, they store it in RAM and start the data aggregation phase.

Data Aggregation Phase.
A sensor node collects environment data $\mathit{value} \in [v_{\min}, v_{\max}]$, such as temperature, and sets $\overline{\mathit{value}} = v_{\max} - \mathit{value}$ and count = 1. Next it generates three temporary keys, one each for count, value, and $\overline{\mathit{value}}$, and encrypts each field under its own temporary key. The reason for using different keys to encrypt count, value, and $\overline{\mathit{value}}$ is given later in the security analysis. The sensor node then computes the message authentication code and creates the data tuple, which consists of the encrypted count, the encrypted value, the encrypted $\overline{\mathit{value}}$, and the MAC.
Once the data tuples are ready, the nodes send them to their parent nodes. Figure 1 shows the aggregation process. There are 10 nodes in the network, and one intermediate node has two child nodes. When this node receives the data tuples sent by its two children, it aggregates them with the data tuple it created itself, using the aggregation function agg. The base station BS can also obtain the aggregated MAC, $\mathrm{MAC}_{agg}$.
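The data aggregation phase can be sketched end to end for one parent with two children. This is an illustration under stated assumptions rather than the paper's exact procedure: the modulus, the upper bound `VALUE_MAX`, the key derivation (SHA-1 over the initial key, the random number, and a field label), and every name in the code are hypothetical, and the MAC field is omitted.

```python
import hashlib
from dataclasses import dataclass

M = 2**32        # assumed modulus
VALUE_MAX = 100  # assumed upper bound of a reading


@dataclass
class EncTuple:
    count: int
    value: int
    complement: int

    def __add__(self, other):
        # Component-wise modular addition: what an intermediate node computes.
        return EncTuple(
            (self.count + other.count) % M,
            (self.value + other.value) % M,
            (self.complement + other.complement) % M,
        )


def temp_keys(initial_key: bytes, r: bytes):
    """Three distinct per-round keys, one per field (assumed derivation)."""
    return tuple(
        int.from_bytes(hashlib.sha1(initial_key + r + label).digest(), "big") % M
        for label in (b"count", b"value", b"complement")
    )


def make_tuple(reading: int, initial_key: bytes, r: bytes) -> EncTuple:
    """A node encrypts count = 1, its reading, and the complement of its reading."""
    kc, kv, kb = temp_keys(initial_key, r)
    return EncTuple((1 + kc) % M, (reading + kv) % M, (VALUE_MAX - reading + kb) % M)


def decrypt_aggregate(agg: EncTuple, node_keys, r: bytes):
    """Base station: subtract the summed per-field keys of all contributing nodes."""
    sums = [0, 0, 0]
    for key in node_keys:
        for i, k in enumerate(temp_keys(key, r)):
            sums[i] = (sums[i] + k) % M
    return (
        (agg.count - sums[0]) % M,
        (agg.value - sums[1]) % M,
        (agg.complement - sums[2]) % M,
    )


r = b"round-7"
keys = {"parent": b"k-parent", "child_a": b"k-a", "child_b": b"k-b"}
readings = {"parent": 22, "child_a": 24, "child_b": 23}

tuples = {name: make_tuple(readings[name], keys[name], r) for name in keys}
aggregated = tuples["parent"] + tuples["child_a"] + tuples["child_b"]
count, total, total_complement = decrypt_aggregate(aggregated, keys.values(), r)
```

The parent only ever adds ciphertexts; the plaintext count, sum, and complement sum become visible only at the base station, which knows every node's initial key.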

Result-Checking Phase.
When the final aggregation result is received, the base station broadcasts the aggregated data tuple down to the whole network using an authenticated method. To enable result checking, each sensor node sends a checking message to its child nodes. Result checking can be regarded as the reverse of data aggregation, as shown in Figure 2. The base station sends the final data tuple to a node, which removes from it the data tuples sent by three other nodes. We denote this reverse operation by ⊖. For example, the data tuple attributed to the one remaining node is obtained as the final data tuple ⊖ the first removed tuple ⊖ the second removed tuple ⊖ the third removed tuple. By comparing this result with the data tuple that it created, the remaining node can verify whether its data was added to the final aggregation result.
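The reverse operation ⊖ can be illustrated with a small sketch. It assumes that aggregation is component-wise addition modulo a large integer, so that ⊖ is component-wise modular subtraction; the modulus, the tuple values (stand-ins for ciphertexts), and the node roles are hypothetical.

```python
M = 2**32  # assumed modulus


def combine(a, b):
    """Component-wise modular addition of two (count, value, complement) tuples."""
    return tuple((x + y) % M for x, y in zip(a, b))


def remove(a, b):
    """The reverse operation: component-wise modular subtraction of b from a."""
    return tuple((x - y) % M for x, y in zip(a, b))


# Hypothetical encrypted tuples; the numbers stand in for ciphertexts.
own = (1_000_001, 5_000_022, 7_000_078)
left_child = (2_000_001, 6_000_024, 8_000_076)
right_child = (3_000_001, 9_000_023, 4_000_077)

subtree_total = combine(combine(own, left_child), right_child)

# Checking message for the left child: the total minus everything it did not send.
recovered_left = remove(remove(subtree_total, own), right_child)
left_is_included = recovered_left == left_child

# If the parent had silently dropped the left child's tuple, the check fails.
forged_total = combine(own, right_child)
drop_detected = remove(remove(forged_total, own), right_child) != left_child
```

Each node needs only the tuples of the other contributors on its path to run this check, and it never needs their keys, because the subtraction is done on ciphertexts.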
Every sensor node can thus verify whether its own data was added to the aggregate by comparing its own data with the data sent by its parent node. If verification succeeds, the sensor node prepares an authentication message MAC(· ‖ · ‖ OK) and sends it to the base station hop by hop. If verification fails, it sends MAC(· ‖ · ‖ NO) instead. When intermediate sensor nodes receive these authentication messages, they aggregate them using the MAC aggregation function. The base station can also calculate the expected authentication message from the data it stored before network deployment. Comparing the two authentication messages therefore verifies whether all the sensing data was added to the final aggregation result. If the authentication message passes verification, the base station accepts the aggregation result. Otherwise, the base station ignores it.

Security Analysis.
Adversaries can compromise a fraction of the sensor nodes in the wireless sensor network. When a sensor node is compromised, its private information, such as its encryption keys, is leaked. Adversaries can sniff confidential data sent by the sensor nodes, and they can launch stealthy attacks to make the base station accept false data without detection. We assume that adversaries can eavesdrop on messages sent between sensor nodes and that they already know the range of the sensing data. If the same key were used to encrypt both value and $\overline{\mathit{value}}$, adversaries could recover the original sensing data through simple algebra, namely $\mathit{value} = (\mathrm{Enc}(\mathit{value}) - \mathrm{Enc}(\overline{\mathit{value}}) + v_{\max})/2$, where $\mathrm{Enc}(\cdot)$ denotes the ciphertext of a field. Because ECIPAP uses three different temporary keys to encrypt count, value, and $\overline{\mathit{value}}$, data confidentiality is protected.
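The key-reuse leak described above is easy to reproduce. The sketch below assumes additive encryption modulo a large integer; the modulus, the bound `VALUE_MAX`, and the key values are hypothetical.

```python
M = 2**32        # assumed modulus
VALUE_MAX = 100  # assumed upper bound, known to the eavesdropper


def encrypt(m: int, k: int) -> int:
    """Additive encryption modulo M."""
    return (m + k) % M


reading = 23

# Insecure variant: the same key protects both the value and its complement.
shared_key = 3_141_592_653
c_value = encrypt(reading, shared_key)
c_complement = encrypt(VALUE_MAX - reading, shared_key)

# The key cancels in the difference, so the eavesdropper recovers the reading.
leaked = ((c_value - c_complement + VALUE_MAX) % M) // 2

# With distinct keys the same computation yields an unrelated number.
k_value, k_complement = 123_456_789, 987_654_321
d_value = encrypt(reading, k_value)
d_complement = encrypt(VALUE_MAX - reading, k_complement)
wrong_guess = ((d_value - d_complement + VALUE_MAX) % M) // 2
```

The attack works because the difference of the two ciphertexts equals 2 · value − VALUE_MAX whenever both fields share a key; with independent keys the difference also contains the unknown key difference.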
The authors of [9] define direct data injection and optimal security as follows.

Direct Data Injection.
A direct data injection attack occurs when an adversary modifies the data readings reported by the nodes under its direct control, under the constraint that only legal readings in $[v_{\min}, v_{\max}]$ are reported.

Optimally Secure.
An aggregation algorithm is optimally secure if, by tampering with the aggregation process, an adversary is unable to induce the base station to accept any aggregation result which is not already achievable by direct data injection.
The message format in our protocol is the same as the message format ⟨count, value, complement, commitment⟩ used in SHIA [9]. Adversaries can compromise some nodes and set all their readings to $v_{\min}$ or $v_{\max}$, so that the aggregate contributed by these nodes lies in the range $[t \cdot v_{\min}, t \cdot v_{\max}]$, where $t$ is the number of malicious nodes. Any aggregation result between these two bounds can also be achieved by direct data injection. Hence, ECIPAP is optimally secure.

Experiment
Table 2 describes the parameters used in our experiment. We use sensor nodes to form an aggregation tree whose root is a powerful PC. Table 3 shows the data received by the base station before decryption. The data in the count, value, and $\overline{\mathit{value}}$ columns are encrypted. The encrypted values are larger than the upper bound $v_{\max}$, so no useful information can be obtained from such messages.
We then use the keys shared between the sensor nodes and the base station to decrypt the aggregation results, which are shown in Table 4. The total number of nodes used is 5, and the average temperature is about 23.2 degrees Celsius. The base station broadcasts query messages every 1000 milliseconds. Our experiment shows the practicability and high efficiency of ECIPAP.

Performance Analysis.
This section gives a theoretical analysis of the performance. By comparing the communication overhead of ECIPAP, SHIA, and SIES, we show that ECIPAP not only protects both data integrity and data confidentiality but also reduces the communication overhead in the result-checking phase.
We assume that every intermediate node in the network has $d$ child nodes and that the height of the aggregation tree is $h$. The lengths of the data tuples sent by sensor nodes in ECIPAP, SHIA, and SIES are $|T_{\mathrm{ECIPAP}}|$, $|T_{\mathrm{SHIA}}|$, and $|T_{\mathrm{SIES}}|$. The lengths of the authentication messages sent by sensor nodes in the result-checking phase are $|\mathrm{MAC}|$ in ECIPAP and $|\mathrm{MAC}_{\mathrm{SHIA}}|$ in SHIA. The number of nodes in the whole network is $\sum_{i=1}^{h} d^{i}$. The resulting communication overhead of ECIPAP, SHIA, and SIES is given in Table 5.
In ECIPAP, we use 4-byte integers to represent the encrypted count, value, and $\overline{\mathit{value}}$. The hash function SHA-1 outputs a 160-bit value. So $|T_{\mathrm{ECIPAP}}| = 256$ bits and $|\mathrm{MAC}| = 160$ bits. The message length and MAC length in SHIA are the same as in ECIPAP, so we set $|T_{\mathrm{SHIA}}| = 256$ bits and $|\mathrm{MAC}_{\mathrm{SHIA}}| = 160$ bits. The length of a message in SIES is also 256 bits. The communication overhead in SHIA is mostly spent in the authentication process. Our protocol greatly reduces this overhead, as shown in Figure 3.
The total communication overhead is reduced by 79% compared with SHIA when $h = 6$ and $d = 3$, as shown in Figure 4. As the height of the aggregation tree increases, the advantage of ECIPAP becomes more pronounced. ECIPAP improves the security of SIES with only a small increase in communication overhead. Our protocol therefore works more efficiently in applications where data must be transmitted over multiple hops.
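The sizes quoted in this analysis can be checked with a short calculation. The sketch below only reproduces the tuple length and the node count for a tree of height 6 in which every intermediate node has 3 children; it does not reproduce Table 5 or the 79% figure. The names `d` and `h`, and the assumption that each node sends exactly one tuple upward in the aggregation phase, are illustrative.

```python
def node_count(d: int, h: int) -> int:
    """Number of sensor nodes in a full tree: sum of d**i for i = 1..h."""
    return sum(d**i for i in range(1, h + 1))


FIELD_BITS = 4 * 8  # each encrypted field is a 4-byte integer
MAC_BITS = 160      # SHA-1 output length

# Three encrypted fields (count, value, complement) plus the MAC.
tuple_bits = 3 * FIELD_BITS + MAC_BITS

nodes = node_count(d=3, h=6)

# Assumes every node sends exactly one data tuple to its parent.
aggregation_phase_bits = nodes * tuple_bits
```

The tuple length works out to the 256 bits stated in the text, and the node count grows geometrically with the tree height, which is why per-node savings in the result-checking phase matter more for taller trees.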

Conclusion
It is difficult to design an aggregation protocol that protects both data confidentiality and data integrity in wireless sensor networks. Existing protocols that claim to achieve this goal still have problems. In this paper, we improve SIES by using a result-checking mechanism instead of secret sharing. We also use a homomorphic encryption algorithm to provide end-to-end data confidentiality. We concentrate on the communication overhead of the result-checking phase and propose an efficient confidentiality and integrity preserving aggregation protocol (ECIPAP). We implement our protocol on physical nodes running TinyOS. Through this experiment and theoretical analysis, we show the practicability and high efficiency of ECIPAP.