EMQP: An Energy-Efficient Privacy-Preserving MAX/MIN Query Processing in Tiered Wireless Sensor Networks

We consider a hybrid two-tiered sensor network consisting of regular resource-limited sensor nodes and powerful master nodes with abundant resources. In this architecture, master nodes store the data collected by sensor nodes and process queries from the base station. Because of this central role, master nodes are attractive targets for adversaries in untrusted or hostile environments. A compromised master node may leak the sensitive data in its storage to the adversary, which breaches data privacy. This paper proposes EMQP, a novel and energy-efficient privacy-preserving MAX/MIN query protocol that prevents adversaries from obtaining sensitive data collected by sensor nodes. To preserve privacy, 0-1 encoding verification, keyed-hash message authentication codes, and symmetric encryption are applied to achieve the secret comparison of data items without knowing their real values. On the basis of this secret comparison mechanism, data submission and query processing protocols are proposed to describe the details of EMQP, and analyses of privacy protection and energy consumption are given. Moreover, a hash-based optimization method is presented to save more energy of the resource-limited sensor nodes. The simulation results show that EMQP is more efficient than the current work in energy consumption.


Introduction
Wireless sensor networks (WSNs) have been widely used in a variety of important areas, such as environment sensing, battlefield monitoring, and volcanic eruption prediction. In this paper, we consider a two-tiered wireless sensor network (two-tiered WSNs) [1,2] as shown in Figure 1, which consists of a large number of sensor nodes at the lower tier and relatively fewer master nodes at the upper tier. Sensor nodes are resource limited (computation, storage, energy, etc.); they collect data and periodically submit it to a nearby master node for storage. Master nodes have rich resources and answer the ad hoc data queries from the base station, which are issued via an on-demand wireless (e.g., satellite) link. Such in-network data storage and query processing is necessary in remote and harsh environments, where it is infeasible or difficult to maintain a high-speed, always-on connection between the sensor network and the base station. The two-tiered architecture is also known to be indispensable for increasing network capacity and scalability, reducing system complexity, and prolonging network lifetime.
As master nodes are responsible for data storage and query answering in the network, they are much more attractive and vulnerable to adversaries in a hostile environment. Once a master node is compromised, serious threats arise. For example, adversaries could use compromised master nodes to steal information about patients in a human health monitoring sensor network, breaching patient privacy. Processing queries privately in such an environment is challenging for master nodes, since they need information about the collected data items to compute query results, which conflicts with the privacy-preserving objective. Data query is an important operation for event monitoring and data analysis in sensor networks. Recently, privacy-preserving range query [3][4][5][6][7][8] and data aggregation [9][10][11][12] have been well addressed; however, research efforts on the MAX/MIN query, which asks for the maximum or minimum value in an area of interest, are limited. In this paper, we focus on MAX/MIN queries, which are important in many applications. For example, a MAX query can be used to monitor forest fires by acquiring the maximum temperature.
To the best of our knowledge, only [13] proposed a preliminary solution to privacy-preserving MAX/MIN query in two-tiered WSNs, but it still suffers from inefficient energy consumption. This paper proposes an energy-efficient privacy-preserving MAX/MIN query processing protocol (EMQP) for two-tiered WSNs. The basic idea is that sensor nodes first encode their collected data and send them to their nearby master nodes for storage, so that the master nodes can correctly process MAX/MIN queries over the encoded data without knowing the real values. An adversary cannot steal any data items or query results from the master nodes, even when they are compromised. The main contribution of our work is that we introduce the 0-1 encoding verification scheme to achieve the secret comparison of collected data items without knowing their real values. Based on that method, we propose a novel privacy-preserving MAX/MIN query protocol. To reduce the energy consumption of sensor nodes, we also give a hash-based optimization method, which demonstrates a significant energy-saving benefit. We evaluate EMQP by comprehensive simulation, and the results indicate that EMQP consumes less sensor-node energy than the existing method [13].
The rest of this paper is organized as follows. Section 2 gives a brief review of the related work. Section 3 describes the models and the problem statement. In Section 4, we present the details of our energy-efficient privacy-preserving MAX/MIN query protocol. Section 5 gives an optimization for saving energy of sensor nodes. We evaluate the performance of our approach in Section 6 and conclude this paper in Section 7.

Related Work
Data storage models for sensor networks have drawn much attention in existing research work. In [14,15], a novel data storage system is proposed by introducing an intermediate tier between the base station and sensor nodes, which can provide abundant storage for data caching and an efficient access to the data collected by sensor networks for query processing. We consider the same system model in this paper, in which some master nodes are deployed as the intermediate tier for data storage and query answering. In practice, several products of master nodes have been manufactured and are commercially available, such as StarGate [16] and RISE [17].
Recently, verifiable privacy-preserving range query in two-tiered WSNs has been widely studied [3][4][5][6][7][8], aiming to protect the privacy and integrity of range queries. Hacigümüş et al. first proposed a bucket partitioning [18] based scheme [3,4], whose basic idea is to divide the domain of collected data values into multiple continuous but nonoverlapping buckets. In each epoch, sensor nodes collect data items, put them into the corresponding buckets, encrypt the items of each bucket together, and then send the ciphertext along with the corresponding bucket ID to a nearby master node. For each bucket without data items, an encoding number is generated and transmitted to a nearby master node, which the base station can use to verify that the bucket is empty. When the base station executes a range query, it first generates the smallest set of bucket IDs covering the queried range and then sends the ID set as the query to master nodes. Upon receiving the bucket IDs, the master nodes return the corresponding ciphertext in all those buckets. The base station can then decrypt the ciphertext to get the query result and verify its integrity by the encoding numbers. Shi et al. proposed an optimized version [5,6] of the integrity verification scheme of [3,4], with the objective of reducing the communication cost of sensor nodes. Since all the works in [3][4][5][6] are based on the bucket partitioning scheme, they inherit the drawback that bucket partitioning allows compromised master nodes to obtain a reasonable estimation of the real values of both data items and queried ranges [19]. To solve this data estimation problem, Chen and Liu proposed a secure and efficient range query processing protocol, SafeQ [7,8], which is based on Prefix Membership Verification (PMV) [20,21] and neighborhood chains.
The PMV scheme can be used to check whether a data item $x$ is in a range $[a, b]$ without knowing the real values of $x$, $a$, and $b$, while the neighborhood chains mechanism can be used to detect falsified or forged query results. Using PMV and neighborhood chains, SafeQ can correctly process range queries while preserving privacy and integrity.
International Journal of Distributed Sensor Networks
For privacy-preserving MAX/MIN query in two-tiered WSNs, only [13] has presented a preliminary solution. In [13], the same PMV scheme as in [7,8] is used to privately compute the maximum or minimum data item. However, it still suffers from inefficient energy consumption in sensor nodes. This paper proposes an energy-efficient privacy-preserving MAX/MIN query protocol, and the evaluation results indicate that it outperforms [13] in the energy consumption of sensor nodes.

Network Model.
We consider a two-tiered sensor network model similar to that in [3][4][5][6][7][8]. As shown in Figure 1, the network is partitioned into multiple cells, each containing several sensor nodes and a master node. The two types of nodes differ in the resources they own. In particular, the master nodes are powerful devices with abundant energy, storage, and computation resources, while the sensor nodes are cheap sensing devices with limited resources. Each sensor node periodically transmits its collected data to the master node in the same cell. The base station is in charge of converting users' questions into queries and then disseminating the queries to the corresponding master nodes, which process the queries based on their stored data items and return the query results to the base station via an upper-tier multihop network formed by the resource-rich master nodes and an on-demand wireless (e.g., satellite) link between some master nodes and the base station.
As in [3][4][5][6][7][8], we assume that master nodes and sensor nodes know their respective locations and affiliated cells. The time is assumed to be divided into epochs. At the end of each epoch, each sensor node submits all its collected data items to the affiliated master node in its cell.

MAX/MIN Query Model.
A MAX/MIN query is an operation to obtain the maximum or minimum value from an area of interest. For simplicity, we consider the atomic MAX/MIN query, which is denoted as a four-element tuple. A more complicated MAX query can be decomposed into atomic queries, for example $Q_1$, $Q_2$, and $Q_3$, and its result is the maximum of the results of $Q_1$, $Q_2$, and $Q_3$. In this paper, we write "MAX/MIN query" as an abbreviation of atomic MAX/MIN query. Since each master node takes charge of a unique cell, the adversary will not gain more from the collaboration of multiple honest-but-curious master nodes. The subsequent discussion in this paper focuses on a cell $C$ consisting of a master node $M$ and $n$ sensor nodes $\{s_1, s_2, \ldots, s_n\}$ whose IDs constitute the set $\Gamma = \{1, 2, \ldots, n\}$. Each sensor node $s_i$ probes several data items during each epoch $t$. We concentrate on MAX query processing; the MIN query is similar and easy to implement.

Threat Model and Problem Statement.
In two-tiered WSNs, the master nodes are attractive targets for adversaries, since they not only store all the data items collected by sensor nodes but also take charge of processing queries received from the base station. We assume that the sensor nodes and the base station are trusted but the master nodes are not. We adopt the same honest-but-curious threat model as [13], where master nodes may try to breach privacy to obtain sensitive data items but faithfully follow the protocols during query processing.
In this paper, we focus on how to provide data privacy preservation and efficient query processing for MAX/MIN queries in the presence of honest-but-curious master nodes. In addition, we use the energy consumption of sensor nodes, which directly affects the lifetime of the whole network, as the metric to evaluate the performance of our proposed schemes.

0-1 Encoding-Verification-Based MAX/MIN Query Processing
To preserve privacy, it seems natural to have sensor nodes encrypt their collected data items; however, the key challenge is how the master nodes can process MAX/MIN queries over encrypted data without knowing the real values.

The basic idea for privacy-preserving MAX/MIN query processing is as follows. We assume that each sensor node $s_i$ in the network and the base station share a secret key $k_i$. For the data items that $s_i$ collects in epoch $t$, $s_i$ first encrypts the maximum or minimum data item $d_i$ using key $k_i$, the result of which is denoted as $(d_i)_{k_i}$. For computation efficiency, we use symmetric encryption such as DES or IDEA. Then, $s_i$ applies an encoding function R to $d_i$, obtains $R(d_i)$, and submits the encrypted and encoded data to its closest master node $M$. When $M$ performs a MAX/MIN query, a secret comparing function I is used for query processing over the encrypted and encoded data. The functions R and I satisfy the following conditions: (1) given $R(d_i)$ and $(d_i)_{k_i}$, it is computationally infeasible for the master node to compute $d_i$; (2) given two data items $d_i$ and $d_j$, $d_i \le d_j$ if and only if $I(d_i, d_j)$ is null. The former condition guarantees data privacy, while the latter allows the master node to identify the encrypted data item containing the maximum or minimum without knowing the real values of the collected data.

0-1 Encoding Verification.
0-1 encoding verification was first introduced by Lin and Tzeng in [22] for solving the millionaires' problem [23], in which two millionaires determine who is richer without leaking sensitive information about their wealth.

Theorem 2.
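For two numbers $x$ and $y$, if they are encoded into the 1-encoding set $E^1(x)$ and the 0-encoding set $E^0(y)$, one can see that $x > y$ if and only if $E^1(x) \cap E^0(y) \neq \emptyset$. For the proof of Theorem 2, we refer to [22]. To verify whether a number $x$ is not greater than another number $y$ using Theorem 2, we convert $x$ and $y$ to $E^1(x)$ and $E^0(y)$; then $x \le y$ if and only if $E^1(x) \cap E^0(y) = \emptyset$, and otherwise $x > y$.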
To verify whether the intersection of the 1-encoding set $E^1(x)$ and the 0-encoding set $E^0(y)$ is empty, we need to test the equality of binary strings. For simplicity, we convert each 0-encoding or 1-encoding binary string to a corresponding unique number using a numeralization function N, which should satisfy the following properties: (1) for any 0-encoding or 1-encoding binary string $b$, $N(b)$ is also a binary string; (2) for any two binary strings $b$ and $b'$, $b = b'$ if and only if $N(b) = N(b')$. There are many ways to construct N. We use a numeralization function similar to that in [24]. Given a binary string $b_1 b_2 \cdots b_k$ of $k$ bits, we insert 1 before $b_1$. For example, 0101 is converted to 10101. Given a set $S$ of 0-encoding or 1-encoding binary strings, we denote by $N(S)$ the resulting set of numeralized binary strings. Therefore, $x \le y$ if and only if $N(E^1(x)) \cap N(E^0(y)) = \emptyset$, and $x > y$ if and only if $N(E^1(x)) \cap N(E^0(y)) \neq \emptyset$. Figure 3 shows the process of verifying 9 > 4.
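A minimal Python sketch of the comparison above, given for illustration. It assumes the 0-encoding and 1-encoding definitions of Lin and Tzeng [22] for a fixed-width binary string (the 1-encoding collects every prefix that ends in a 1 bit; the 0-encoding collects, for every 0 bit, the prefix before it followed by a 1) and numeralizes a string by prepending a 1 bit, as described above. The function names and the explicit `width` parameter are illustrative choices, not part of the protocol.

```python
def _bits(value, width):
    """Return value as a fixed-width binary string, most significant bit first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    return format(value, f"0{width}b")


def one_encoding(value, width):
    """1-encoding: every prefix of the bit string that ends in a 1 bit."""
    bits = _bits(value, width)
    return {bits[: i + 1] for i in range(width) if bits[i] == "1"}


def zero_encoding(value, width):
    """0-encoding: for every 0 bit, the prefix before it followed by a 1."""
    bits = _bits(value, width)
    return {bits[:i] + "1" for i in range(width) if bits[i] == "0"}


def numeralize(strings):
    """Prepend a 1 bit so strings of different lengths map to distinct numbers."""
    return {int("1" + s, 2) for s in strings}


def is_greater(x, y, width):
    """x > y exactly when the numeralized 1-encoding of x meets the 0-encoding of y."""
    return bool(numeralize(one_encoding(x, width)) & numeralize(zero_encoding(y, width)))


# Usage: the comparison 9 > 4 with 4-bit inputs.
print(sorted(one_encoding(9, 4)), sorted(zero_encoding(4, 4)), is_greater(9, 4, 4))
```

With 4-bit inputs, `is_greater(9, 4, 4)` holds because both encodings contain the string `1`, while `is_greater(4, 9, 4)` does not hold because the two sets share no element.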

Data Submission Protocol.
The data submission protocol (DSP) is concerned with how a sensor node $s_i$ transmits its collected data items to the master node $M$. For each sensor node $s_i$ in cell $C$, after collecting its data items in epoch $t$, $s_i$ performs the following steps.
(3) Compute the keyed-hash message authentication code (HMAC) [25] of each data item in $N(E^0(d_i))$ and $N(E^1(d_i))$, the numeralized 0-encoding and 1-encoding sets of $d_i$, using key $g$, which is shared by all sensor nodes in $C$, while $M$ knows nothing about it. An HMAC function using key $g$, denoted as $\mathrm{HMAC}_g$, satisfies the one-wayness and collision resistance properties. (The one-wayness property of $\mathrm{HMAC}_g$ means that, given $\mathrm{HMAC}_g(x)$, it is computationally infeasible to compute $x$ and $g$, while the collision resistance property means that it is also computationally infeasible to find two different data items $x$ and $y$ such that $\mathrm{HMAC}_g(x) = \mathrm{HMAC}_g(y)$.) Given a set of numbers $S$, we use $\mathrm{HMAC}_g(S)$ to represent the resulting set after applying $\mathrm{HMAC}_g$ to every number in $S$. In summary, this step computes $\mathrm{HMAC}_g(N(E^0(d_i)))$ and $\mathrm{HMAC}_g(N(E^1(d_i)))$.
(4) Encrypt $d_i$ to $(d_i)_{k_i}$ using key $k_i$, which is shared with the base station.
(5) Submit the following message to $M$:
$s_i \rightarrow M: \langle i, t, (d_i)_{k_i}, \mathrm{HMAC}_g(N(E^0(d_i))), \mathrm{HMAC}_g(N(E^1(d_i))) \rangle$.
The above steps indicate that the aforementioned encoding function R is defined as follows:
$R(d_i) = \{ \mathrm{HMAC}_g(N(E^0(d_i))), \mathrm{HMAC}_g(N(E^1(d_i))) \}$.
We name $R(d_i)$ the comparison factors (c-factors) of $d_i$, which will be used for the secret comparison in the next section.
Since the HMAC function has the one-wayness and collision resistance properties, and each sensor node $s_i$ shares its secret key $k_i$ only with the base station, given $R(d_i)$ and $(d_i)_{k_i}$, it is computationally infeasible for the master node to obtain the value of $d_i$. Therefore, the DSP preserves data privacy against the master node.
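The submission steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the steps that precede (3) are assumed to select the local maximum and compute its numeralized 0-1 encodings, HMAC-SHA1 from the standard library stands in for the keyed HMAC, and the symmetric cipher is passed in as a caller-supplied `encrypt` function because the choice of cipher is outside this sketch. All function and parameter names are hypothetical.

```python
import hashlib
import hmac


def encodings(value, width):
    """Return the (0-encoding, 1-encoding) string sets of a width-bit value."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    bits = format(value, f"0{width}b")
    zero = {bits[:i] + "1" for i in range(width) if bits[i] == "0"}
    one = {bits[: i + 1] for i in range(width) if bits[i] == "1"}
    return zero, one


def hmac_set(group_key, strings):
    """Numeralize each string (prepend a 1 bit) and HMAC it under the group key."""
    tags = set()
    for s in strings:
        number = int("1" + s, 2)
        data = number.to_bytes((number.bit_length() + 7) // 8, "big")
        tags.add(hmac.new(group_key, data, hashlib.sha1).digest())
    return frozenset(tags)


def build_submission(sensor_id, epoch, items, width, group_key, encrypt):
    """Build (id, epoch, ciphertext, HMACs of 0-encoding, HMACs of 1-encoding)."""
    d = max(items)  # assumed earlier step: pick the local maximum for a MAX query
    zero, one = encodings(d, width)
    return (
        sensor_id,
        epoch,
        encrypt(d),  # caller-supplied symmetric encryption under the sensor's key
        hmac_set(group_key, zero),
        hmac_set(group_key, one),
    )
```

A message built this way carries exactly `width` HMAC values in total, one per bit of the submitted data item.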

Query Processing Protocol.
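The query processing protocol (QPP) is concerned with how the master node $M$ executes a query and returns the response to the base station. When $M$ receives a MAX query $Q$ from the base station for epoch $t$ and the queried sensor ID set $\Gamma_Q$, $M$ processes $Q$ on its stored data $\{ (d_i)_{k_i}, \mathrm{HMAC}_g(N(E^0(d_i))), \mathrm{HMAC}_g(N(E^1(d_i))) \mid i \in \Gamma_Q \}$, which is received from the sensor nodes in epoch $t$. Here $E^0$ and $E^1$ denote the 0-encoding and 1-encoding sets. The secret comparing function I is defined on two data items $x$ and $y$ as
$I(x, y) = \mathrm{HMAC}_g(N(E^1(x))) \cap \mathrm{HMAC}_g(N(E^0(y)))$. (5)
Lemma 3. For two data items $x$ and $y$, $x \le y$ if and only if $I(x, y) = \emptyset$.
We omit its proof here since it can be easily derived from the collision resistance property of HMAC and Theorem 2.
Theorem 4. Let $D = \{ d_i \mid i \in \Gamma_Q \}$. A data item $d_i \in D$ is the maximum of $D$ if $I(d_j, d_i) = \emptyset$ for each $d_j \in D$ with $j \neq i$.
Proof. Suppose that the above condition is satisfied; then for each $d_j \in D$ with $j \neq i$, we have $\mathrm{HMAC}_g(N(E^1(d_j))) \cap \mathrm{HMAC}_g(N(E^0(d_i))) = \emptyset$, and $d_j \le d_i$ follows from Lemma 3. Therefore, $d_i$ is not smaller than any other data item in $D$, which means that $d_i$ is the maximum of $D$.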
On the basis of Theorem 4, the master node $M$ performs the following steps to implement query processing.
(2) Find the encrypted data $(d_i)_{k_i}$ whose corresponding c-factors satisfy the condition of Theorem 4 and transmit the following response message to the base station: $M \rightarrow$ base station: $\langle i, (d_i)_{k_i} \rangle$.
Upon receiving the above message, the base station loads the secret key $k_i$ shared with $s_i$ and then decrypts $(d_i)_{k_i}$ to obtain the query result $d_i$, which is the maximum of the data items collected by the queried sensor nodes in epoch $t$.
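The selection step on the master node can be sketched in Python as below. The sketch assumes that each stored record carries the sensor ID, the ciphertext, and the two c-factor sets, and it treats the set elements as opaque tokens, so the master node only tests set intersections. The record layout and names are hypothetical; in the usage example the raw encoding strings of the 4-bit values 9, 6, and 4 stand in for HMAC values purely for readability.

```python
from typing import NamedTuple


class Record(NamedTuple):
    sensor_id: int
    ciphertext: object    # encrypted data item, opaque to the master node
    zero_tags: frozenset  # stands for the HMAC set of the 0-encoding
    one_tags: frozenset   # stands for the HMAC set of the 1-encoding


def find_max(records):
    """Return the record whose item is not exceeded by any other queried record.

    A candidate is the maximum when, for every other record, the other
    record's 1-encoding tags share nothing with the candidate's 0-encoding tags.
    """
    for candidate in records:
        if all(
            not (other.one_tags & candidate.zero_tags)
            for other in records
            if other is not candidate
        ):
            return candidate
    return None  # no candidate found, for example on empty input


# Usage: 4-bit items 4, 9, and 6 with plain encoding strings as stand-in tokens.
records = [
    Record(1, "enc(4)", frozenset({"1", "011", "0101"}), frozenset({"01"})),
    Record(2, "enc(9)", frozenset({"11", "101"}), frozenset({"1", "1001"})),
    Record(3, "enc(6)", frozenset({"1", "0111"}), frozenset({"01", "011"})),
]
response = find_max(records)
print(response.sensor_id, response.ciphertext)
```

The master node returns only the sensor ID and the ciphertext of the selected record; it never needs the key shared between the sensor node and the base station.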

Privacy Protection Analysis.
As privacy protection is the focus of this paper, we analyze the privacy of EMQP from the following two aspects.
(1) Privacy of Collected Data. According to the data submission protocol, the information submitted by each sensor node to the affiliated master node is not plaintext but encrypted data and HMAC data. Since the HMAC function has the one-wayness and collision resistance properties, and each sensor node $s_i$ shares its secret key only with the base station, given $R(d_i)$ and $(d_i)_{k_i}$, it is computationally infeasible for master nodes to obtain the value of $d_i$. Thus, breaching privacy is as difficult for a master node as breaking the encryption and the HMAC. Therefore, EMQP protects the collected data items from master nodes.
(2) Privacy of Query Result. The query processing protocol shows that the master node can use the secret comparing function to obtain the query result, which is the maximum or minimum of the data items collected by the queried sensor nodes. Because the secret comparison is built upon the HMAC data items and the collected data items are all encrypted before being stored on the master node, it is also computationally infeasible for master nodes to obtain the value of the query result without the keys. As a consequence, EMQP is capable of protecting the query result from master nodes.
Since [13] also uses similar HMAC and encryption to protect privacy, the capability of privacy preservation is the same between our work and [13].

Energy Consumption Analysis.
In two-tiered WSNs, sensor nodes have limited energy while master nodes have abundant energy. Therefore, the lifetime of the network is mainly determined by the energy consumption of sensor nodes. In this section, we discuss the energy consumption of sensor nodes in our proposed scheme.
(3) the average number of hops between a sensor node and $M$ is $L$;
(4) each collected data item is of $w$ bits;
(5) each encrypted data item and each HMAC data item are of $l_e$ and $l_h$ bits, respectively;
(6) the energy consumed by encrypting and by HMAC computing a data item are $e_e$ and $e_h$, respectively;
(7) the energy consumed by transmitting and receiving a data item are $e_t$ and $e_r$, respectively.
The total energy consumption of sensor nodes is composed of two parts: the communication cost, which includes sending and receiving messages, and the computation cost, such as encryption and HMAC computing. We use $E_{total}$, $E_{comm}$, and $E_{comp}$ to represent the total, communication, and computation energy consumption of the sensor nodes; then we have $E_{total} = E_{comm} + E_{comp}$.
As shown in the DSP, each sensor node encrypts the maximum or minimum of its collected data in an epoch and generates its 0-encoding and 1-encoding c-factors, which have $w$ HMAC data items in total. The encrypted data and the c-factors are both transmitted to $M$. Then, we have According to (8), (9), we have total = ⋅ ( + + ⋅ + ) ⋅ ( ⋅ + ( − 1) ⋅ )
In [13], for each sensor node $s_i$ and the local maximum or minimum data item $d_i$ collected by $s_i$ in epoch $t$, $s_i$ first generates the HMAC-computed and numeralized prefix families of $d_i$ and $[d_i, top]$. Here, top is a very large number that is greater than any collected data item. Then $s_i$ encrypts $d_i$, and the encrypted data is transmitted to its closest master node along with the two HMAC data sets. According to [13], if $d_i$ is of $w$ bits, then the set for $d_i$ has $w + 1$ HMAC data items and the set for $[d_i, top]$ has $k$ HMAC data items, where $1 \le k \le 2w - 2$. So the lower bound of the number of transmitted HMAC data items is $w + 2$, while the upper bound is $3w - 1$. Since each sensor node computes and transmits the same encrypted data but more HMAC data items, [13] consumes more energy in sensor nodes than our scheme. We evaluate their energy consumption in Section 6.
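To make the comparison concrete, the following Python sketch counts the HMAC data items that a sensor node transmits per epoch under the bounds stated above. The function name is hypothetical, and the 16-bit item width is only an example value.

```python
def hmac_items_per_submission(w):
    """Number of HMAC data items sent per epoch for a w-bit data item."""
    if w < 2:
        raise ValueError("w must be at least 2 so that 1 <= 2w - 2")
    return {
        "EMQP": w,                # one item per bit across both encodings
        "PMQP_lower": w + 2,      # (w + 1) prefixes plus at least 1 range prefix
        "PMQP_upper": 3 * w - 1,  # (w + 1) prefixes plus at most 2w - 2 range prefixes
    }


counts = hmac_items_per_submission(16)
print(counts)
```

For 16-bit items this gives 16 HMAC data items for EMQP against 18 to 47 for the scheme of [13].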

Energy Optimization
The above query scheme consumes much energy in sensor nodes because each sensor node needs to submit c-factors which consist of multiple HMAC data items, and each HMAC data item may have many bits, such as 128 bits with HMAC-MD5 [26] or 160 bits with HMAC-SHA1 [27]. In this section, we focus on a c-factor compressing method whose basic idea is to compress the HMAC data of c-factors, which can significantly reduce the communication cost of sensor nodes. As a result, the energy consumption of sensor nodes is decreased and the network lifetime is prolonged.
Assume that each HMAC data item has $l_h$ bits and is randomly distributed in $U = \{0, 1, \ldots, 2^{l_h} - 1\}$. We use a simple hash function H to compress the HMAC data, which is defined as follows, where $u \in U$ and $l_c < l_h$: $H(u) = u \bmod (2^{l_c} - 1)$. After applying H, each HMAC data item of a c-factor is converted to a number with fewer bits, which is called cHMAC data and is randomly distributed in $V = \{0, 1, \ldots, 2^{l_c} - 1\}$ because of the random distribution of HMAC data in $U$, such that the c-factor is compressed. Given a set $X$ of HMAC data, we use $H(X) = \{ H(u) \mid u \in X \}$ to represent the resulting set of cHMAC data after applying H to every item in $X$.
If the HMAC data is replaced with the corresponding cHMAC data in (5), we get a similar secret comparing function $I_c$ as follows, where $x$ and $y$ are two collected data items and $E^1$ and $E^0$ denote the 1-encoding and 0-encoding sets: $I_c(x, y) = H(\mathrm{HMAC}_g(N(E^1(x)))) \cap H(\mathrm{HMAC}_g(N(E^0(y))))$.

Lemma 5. For two data items $x$ and $y$ and the secret comparing function $I_c$ over cHMAC data, one has
(1) If $I_c(x, y) = \emptyset$, then $x \le y$ must be true.
(2) If $I_c(x, y) \neq \emptyset$, then $x > y$ is not necessarily true.
A false decision emerges when $I_c(x, y) \neq \emptyset$ while $I(x, y) = \emptyset$ holds. We denote the maximum false positive rate as Pr. In practice, if Pr is low enough, the c-factor compressing method is still acceptable. In the rest of this section, we analyze Pr for the comparison of two data items by Lemma 5. We assume that each collected data item is of $w$ bits, and that $X$ and $Y$ are the c-factors of data items $x$ and $y$, where $X = \mathrm{HMAC}_g(N(E^1(x)))$, $Y = \mathrm{HMAC}_g(N(E^0(y)))$, and $X \cap Y = \emptyset$. Apparently, the more HMAC data items $X$ and $Y$ have, the higher the probability that $H(X) \cap H(Y) \neq \emptyset$.
Therefore, we assume that $X$ and $Y$ both have $w$ HMAC data items, which is the upper bound of the number of HMAC data items that each c-factor has, $X = \{x_1, x_2, \ldots, x_w\}$ and $Y = \{y_1, y_2, \ldots, y_w\}$. For each cHMAC data item $v \in V$, there are at least $\lfloor 2^{l_h} / (2^{l_c} - 1) \rfloor$ HMAC data items in $U$ whose results equal $v$ when taken modulo $2^{l_c} - 1$, such as $v$, $v + (2^{l_c} - 1)$, and $v + 2(2^{l_c} - 1)$. Supposing $H(Y)$ has $m$ cHMAC data items where $0 < m \le w$, there is a set $Z$ having $\lfloor 2^{l_h} / (2^{l_c} - 1) \rfloor \cdot m$ HMAC data items, which satisfies $H(z) \in H(Y)$ for each $z \in Z$. Therefore, for each $x_i \in X$, the probability of $H(x_i) \in H(Y)$ grows with $m$, and it is apparent that the probability reaches its maximum when $m = w$, which yields the maximum false positive rate Pr. If the 128-bit HMAC-MD5 is used ($l_h = 128$), the impact of $w$ and $l_c$ on Pr is shown in Figure 4, which indicates that Pr can be very low if an appropriate $l_c$ is chosen. For instance, assuming that $w = 16$ and $l_c = 24$, we have $Pr = 1.53 \times 10^{-5}$. Such a low false positive rate rarely affects the result of the secret comparison.
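As a rough numerical check of the example above, the following Python sketch estimates the false positive rate under simplifying assumptions that are ours and not necessarily the exact expression of the analysis: compressed values are treated as uniform over 2 to the power of the compressed bit length, the compressed 0-encoding set contains 16 distinct values, and the 16 items of the other set collide with it independently. The function name is hypothetical.

```python
import math


def approx_false_positive_rate(w, compressed_bits):
    """Estimate the chance that two disjoint c-factor sets collide after compression.

    Each of the w items of one set lands on one of the w compressed values
    of the other set with probability about w / 2**compressed_bits; the rate
    is the probability that at least one item does.
    """
    p_single = w / 2 ** compressed_bits
    # 1 - (1 - p)^w, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(w * math.log1p(-p_single))


print(f"{approx_false_positive_rate(16, 24):.2e}")
```

With 16 items per set and 24 compressed bits, this estimate is about 1.53 × 10^-5, in line with the value quoted above.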

Performance Evaluation
To evaluate the performance of our proposed EMQP against the current work [13], which is denoted as PMQP, we implement both schemes and compare their energy consumption on the simulator of [28] with the same data set as [13], which is from Intel Lab [29]. We use PMQP(bot) and PMQP(top) to represent the lower and upper bounds of the energy consumption of PMQP, and EMQP(bas) and EMQP(opt) to represent the energy consumption of EMQP before and after the hash-based optimization. We carry out evaluations on a MAX query in a cell with $n$ sensor nodes and a master node, and we consider the following two aspects: first, the total energy consumption $E_{total}$ of sensor nodes in EMQP and PMQP; second, the communication and computation energy costs $E_{comm}$ and $E_{comp}$ in EMQP(opt), as EMQP(opt) is the most energy-saving scheme.
The evaluations are conducted on a PC with a P4 3.0 GHz CPU and 512 MB memory running the Ubuntu operating system. The sensor nodes of a cell are placed following a uniform distribution over a two-dimensional region covering a 100 × 100 m² area, and the radius of sensor communication is assumed to be 10 m. According to [30], the energy consumed by transmitting and receiving 1-bit data in wireless communication is computed as $e_t = \alpha + \gamma \times d^{\eta}$ and $e_r = \beta$, where $d$ is the distance over which a bit is transmitted, $\eta$ is the path loss index, $\alpha$ and $\beta$ capture the energy dissipated by the communication electronics, and $\gamma$ represents the energy radiated by the power-amp. In our simulation, the values for these parameters are adopted as in [30]: $\gamma = 10$ pJ/bit/m², $\alpha = 45$ nJ/bit, $\beta = 135$ nJ/bit, and $\eta = 2$. In addition, the energy of encrypting a data item is set to $e_e = 8.92$ J, which is taken from [31], where RC4 is used for encryption on TelosB, and the energy of HMAC computing a data item is assumed to be equal to that of encryption for simplicity. Other default parameters are summarized in Table 1.
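The radio model can be sketched in Python with the parameter values quoted above. It assumes a first-order model in which the transmit cost per bit is an electronics term plus an amplifier term that grows with distance raised to the path loss index, and the receive cost per bit is a constant electronics term. Assigning 45 nJ/bit to the transmitter electronics and 135 nJ/bit to the receiver electronics is our reading of the parameter list. The multihop helper is an added assumption: it charges sensor nodes for every transmission along a path and for every reception except the last one, which is performed by the master node. Function and constant names are hypothetical.

```python
TX_ELECTRONICS = 45e-9   # J/bit, electronics cost of transmitting (assumed mapping)
RX_ELECTRONICS = 135e-9  # J/bit, electronics cost of receiving (assumed mapping)
AMPLIFIER = 10e-12       # J/bit/m^2, energy radiated by the power-amp
PATH_LOSS_INDEX = 2


def tx_energy_per_bit(distance_m):
    """Energy in joules to transmit one bit over distance_m metres."""
    return TX_ELECTRONICS + AMPLIFIER * distance_m ** PATH_LOSS_INDEX


def rx_energy_per_bit():
    """Energy in joules to receive one bit."""
    return RX_ELECTRONICS


def path_energy(bits, hops, hop_distance_m):
    """Energy spent by sensor nodes to move `bits` over `hops` hops to the master node."""
    if hops < 1:
        raise ValueError("at least one hop is needed")
    per_bit = hops * tx_energy_per_bit(hop_distance_m) + (hops - 1) * rx_energy_per_bit()
    return bits * per_bit


print(tx_energy_per_bit(10), path_energy(1000, 3, 10))
```

At the 10 m communication radius, transmitting costs about 46 nJ per bit in this model, which is less than the 135 nJ per bit charged for receiving.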
In each measurement, we randomly distribute the sensor nodes and generate 20 networks with different topologies which are represented by different network IDs. The total energy including communication and computation costs of each measurement is the average of 20 networks.
(1) Energy Consumption versus Network ID. Figure 5(a) shows that $E_{total}$ of EMQP and of PMQP are both roughly uniform across different networks, but $E_{total}$ of EMQP is clearly lower than that of PMQP. In detail, compared with the lower bound of PMQP, EMQP before optimization saves about 15% energy on average, and about 75% is saved after optimization. The main reason is that PMQP needs to compute and submit more HMAC data than EMQP, and the optimization of EMQP converts every HMAC data item to a cHMAC data item with fewer bits, which significantly reduces the communication cost. Figure 5(b) indicates that $E_{comm}$ and $E_{comp}$ in the optimized EMQP are also both roughly uniform across networks, but $E_{comm}$ is much higher than $E_{comp}$. Communication accounts for more than 99% of the total energy consumption.
(2) Energy Consumption versus $n$ and $w$. Figures 6(a) and 7(a) show that $E_{total}$ of both EMQP and PMQP increases as $n$ and $w$ increase, but $E_{total}$ of EMQP is always lower than that of PMQP. In detail, EMQP before and after optimization saves about 12% and 76% energy on average, respectively. The reason is similar to that in (1). Although $E_{comm}$ and $E_{comp}$ in the optimized EMQP both increase as $n$ and $w$ increase, such increments are inconspicuous as shown in Figures 6(b) and 7(b), since computation accounts for only a small part (no more than 1%) of the total energy consumption.
According to the above evaluations, we conclude that our proposed EMQP is more energy efficient than the current PMQP. In particular, the optimized EMQP achieves a significant saving (about 75%) in energy consumption even compared with the lower bound of PMQP. In addition, communication consumes far more energy than computation (more than 99% of the total).

Conclusion
As wireless sensor networks are deployed and used in many important areas, preserving the privacy of sensitive collected data items during query processing is a critical problem in sensor network applications. In this paper, we propose EMQP, a novel and energy-efficient protocol for handling privacy-preserving MAX/MIN queries in two-tiered sensor networks. To implement privacy-preserving MAX/MIN query processing without exposing the real values of collected data to master nodes, the techniques of 0-1 encoding verification and encryption are applied. Furthermore, we give a hash-based optimization for saving more energy of the resource-limited sensor nodes. The results of our evaluations show that the proposed EMQP performs better than the current work in energy consumption. With the optimization, EMQP achieves a significant improvement in energy saving even in comparison with the lower bound of the current work. Last but not least,