A Multimodel Based Range Query Processing Algorithm for Information Collection in CPS

A multimodel based range query processing algorithm is proposed for the information collection task in CPSs. It uses multiple probability models to describe the data distribution of each sensor node. The algorithm runs in two phases: a preprocessing phase and a query processing phase. During the preprocessing phase, multiple models are constructed for each node from its historical data. During the query processing phase, a sampling based algorithm selects a suitable model from the multiple models, and the selected model is used to process the query. Because the multimodel based algorithm must sample data from the network, in some cases it consumes more energy than the single model based algorithm, which does not sample. The costs of the multimodel based and single model based algorithms are analyzed, and a cost model based algorithm is proposed to choose the better of the two for each query. Experimental results show that the cost model based algorithm consumes 13.3% less energy than the single model based algorithm.


Introduction
Cyber-physical systems (CPSs), which consist of computing devices and embedded systems such as distributed sensors and actuators, integrate computation, communication, and control with the physical world [1]. The tasks running on CPSs involve close interactions between the cyber world and the physical world. Extracting knowledge from the physical world is an important task for CPSs. Useful information is first collected from the physical world and then analyzed to extract knowledge. Wireless sensor networks (WSNs) [2][3][4][5] are usually used to fulfill the information collection task, which is transformed into some kinds of queries for the WSNs [6][7][8][9][10].
The range query is one of the most important types of queries for collecting information from WSNs. For instance, a range query is sent to a sensor network deployed in a forest, asking for the places where the temperature lies in a given range. The sensors whose temperature lies in this range return their locations or IDs to the sink. If the sensors return their IDs, the sink translates the IDs into locations and returns the locations. Existing methods for range queries in WSNs can be classified into two classes. The first class is the data centric storage based algorithms, such as GHT [11], DIM [12], comb-needle [13], double ruling [14], and the energy-aware algorithm [15]. These algorithms define different types of events for the data collected by sensors. Each type of event is stored in a particular node of the WSN, called the event storage node. When a node detects an event, it transmits the data of the event to the corresponding event storage node. A range query, transformed into a query for an event, is sent directly to the event storage node and answered by that node. The events defined in the data centric storage algorithms are rigid: users can only ask for the result of a range defined by an event. These algorithms are therefore unsuitable for queries over arbitrary ranges.
International Journal of Distributed Sensor Networks

The second class is the local storage based algorithms, such as [16][17][18][19]. In the traditional local storage based algorithms [16,17], the data collected by a sensor is stored in its local storage. The queries are sent to each node, and the nodes satisfying the query return their results to the user. The problem with the traditional algorithms is that all nodes need to return their results to the sink whether they satisfy the query or not, which consumes a lot of energy. In [19], a probability model is used to process the range query. The probability model estimates the probability that each node satisfies the query. A node is considered a result only if this probability is above a threshold. With the help of the probability model, nodes do not return any result to the sink for a range query. This algorithm has two problems. First, it can only give an approximate answer to a query. Second, it is hard to determine a threshold that balances energy efficiency against the accuracy of the query result.
In this paper, we propose a multimodel based algorithm to solve the range query in the WSN, which is a local storage based algorithm. Compared with the other local storage based algorithms, our algorithm has the following advantages. First, our algorithm constructs multiple probability models. With the help of these probability models, only the most relevant nodes need to transmit their results to the sink, which saves more energy than the traditional local storage algorithms. Second, our algorithm can give the precise answer to the range query with minimum energy consumption. The multimodel based range query processing algorithm proposed in this paper is composed of 3 steps: (1) probability model construction; (2) sampling based model selection; (3) model based query processing.
The probability model construction algorithm first constructs multiple probability models for each node in the WSN. The historical data collected by the nodes is clustered into a number of subclasses, and each node builds one probability model from its own data in each subclass. With the help of the multiple probability models, the query processing algorithm can select a model that describes the data distribution under the current conditions more accurately than the single model used by the single model based algorithm.
Once multiple probability models have been constructed for each node in the WSN, there must be a method to select a suitable probability model for each node when a particular range query is processed. In this paper, a sampling based algorithm is proposed to fulfill this task. Some typical nodes are selected by a preprocessing algorithm. Then the sampling based algorithm collects the data from these typical nodes to determine a suitable model for a query.
Combining the model selected by the sampling based model selection algorithm with the real data sensed by a node, the model based query processing algorithm can minimize the energy consumption of processing a range query. Because the multimodel based algorithm does not perform best in all cases, we analyze its cost and propose a range query processing algorithm augmented with the cost model, which selects a suitable query processing algorithm for each range query according to the cost model. Experimental results show that the cost model based algorithm provides the accurate answer while consuming 13.3% less energy than the single model based algorithm.
The contributions of this paper are as follows. First, a multimodel based algorithm is proposed to solve the range query in the WSN, which uses multiple probability models to improve the accuracy of the data distribution function and reduce the energy consumption of the algorithm. Second, the energy consumption of the multimodel based algorithm is analyzed, and a query processing algorithm augmented with the cost model is proposed to save energy in most cases. Third, extensive experiments were conducted to verify the efficiency of the proposed algorithms.
The rest of the paper is organized as follows. Section 2 introduces the multimodel based range query processing algorithm. Section 3 analyzes the cost model of the multimodel based algorithm and proposes the query processing algorithm augmented with the cost model. Section 4 evaluates the performance of the proposed algorithms on a real data set. Section 5 briefly discusses the related work, and Section 6 draws the conclusion.

The Model Based Range Query Processing Algorithm
The data collected by a node is a random variable, which can be described by a probability distribution function (PDF). The PDF of a random variable is usually hard to calculate, but it can be estimated by a histogram. A histogram is a representation of tabulated frequencies, erected over discrete intervals (bins), with an area equal to the frequency of the observations in the interval. The total area of the histogram is equal to the amount of data.
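The histogram estimate described above can be illustrated with a minimal sketch. It assumes unit-width bins (the interval width of 1 is mentioned later in the text) and a uniform spread of readings inside each bin; the function names `build_histogram` and `range_probability` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def build_histogram(readings):
    """Count readings per unit-width bin [b, b + 1); the key is the bin start b."""
    return Counter(math.floor(v) for v in readings)


def range_probability(hist, low, high):
    """Estimate P(low <= v < high) as covered histogram area / total area.

    Assumption: readings are spread uniformly inside a bin, so a partly
    covered bin contributes in proportion to the covered width.
    """
    total = sum(hist.values())
    if total == 0 or high <= low:
        return 0.0
    covered = 0.0
    for bin_start, count in hist.items():
        overlap = min(high, bin_start + 1) - max(low, bin_start)
        if overlap > 0:
            covered += count * overlap
    return covered / total


hist = build_histogram([20.2, 20.7, 21.1, 21.5, 22.9])
print(range_probability(hist, 20, 22))
```

A node holding one such histogram per subclass, plus one built from all of its data, would have the multiple models and the general model that the later sections refer to.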
The vectors belonging to a subclass can be used to construct a histogram for each node in the WSN. Within a subclass, all data belonging to a node forms a data set, and the histogram of this data set is the probability model of the node for that subclass. The probability that the node's data falls in a given range is estimated from the histogram by formula (1).

2.2. The Typical Node Selection Algorithm.
Multiple probability models have been constructed for each node in the WSN. For a particular range query, there must be a method to select a probability model for each node to process the query. In this paper, a sampling based method is proposed to fulfill this task.
First, some typical nodes are selected from the WSN. Before sending a range query to the WSN, the sink samples data from the typical nodes. Based on the sampled data, a suitable probability model is selected for each node so that the query processing algorithm can be carried out efficiently. Before giving the typical node selection algorithm, we present some definitions.
For example, in Figure 1, there are two nodes n_1 and n_2 in the WSN, and the data they collect is clustered into two subclasses. According to the definition of the candidate node, n_1 is the candidate node, and it can distinguish subclass 1 from subclass 2. If the data sampled from n_1 falls in the data range of n_1 for subclass 1, then the models that the nodes constructed from subclass 1 are better suited to process the query than those constructed from subclass 2.
In Figure 1, the probability models constructed from subclass 1 or subclass 2 are selected. But when the data from n_1 falls in the intersection of its two data ranges, the models cannot be determined directly. We propose a Euclidean distance based method to solve this problem in the next section. Compared with the length of

For example, Figure 2 shows that there are two nodes n_1 and n_2 in the WSN. The data collected by the two nodes are

Input: MCN
Output: the typical node set T
(1) while the set of pairs is not empty do
(2) sort the nodes in the MCN according to their counters in descending order
(3) select the node with maximum counter from MCN
(4) add the selected node to the typical node set T
(5) set the counter of the selected node in T to its current counter in MCN
(6) remove the selected node from MCN
(7) remove all the pairs in the list of the selected node from the set of pairs
(8) remove all the pairs in the list of the selected node from the list of the other candidate nodes
(9) subtract the counter of a candidate node in MCN by 1 when a pair is removed from the node's list
(10) end while
(11) return T
Algorithm 2: The typical node selection algorithm.
Based on the definitions and the MCN calculated by the preprocessing algorithm, we propose a greedy algorithm to select the typical nodes from the WSN. The greedy algorithm sorts all the merged candidate nodes in the MCN according to their counters in descending order. It then selects the candidate node with the largest counter as a typical node and removes it from the MCN. Next, the entries in the list of the selected node are removed from the set of pairs and from the lists of the merged candidate nodes left in the MCN. Whenever an entry is removed from the list of a remaining merged candidate node, the counter of that node is decreased by 1. The greedy algorithm repeats this process until the set of pairs is empty, which means that any two models can be distinguished by one of the selected candidate nodes. All the selected candidate nodes form the typical node set T. The ith typical node in T has an attribute counter, which is the number of intersections that the node can distinguish. The greedy typical node selection algorithm is given in Algorithm 2.
For example, after sorting, n_1 is selected into the typical node set, because its counter is 3, which is larger than the counter of n_2. The pairs in the list of n_1 are removed from the set of pairs {(1, 2), (1, 3), (2, 3)} and from the list of n_2. After the removal, the list of n_2 is empty, the counter of n_2 is 0, and the set of pairs is empty. As the set of pairs is empty, Algorithm 2 finishes. Otherwise, the candidate nodes left in the MCN would be sorted and the process repeated. In our example, the typical node set T is {n_1}. It is obvious in Figure 2 that the data collected from n_1 can distinguish the models constructed from the three subclasses.
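The greedy selection of Algorithm 2 can be sketched as follows. This is a minimal illustration, assuming the MCN is given as a dictionary that maps a node ID to the list of model pairs it can distinguish (so a node's counter is simply the length of its list); the name `select_typical_nodes` and the data layout are hypothetical.

```python
def select_typical_nodes(mcn, pairs):
    """Greedy cover of all model pairs by candidate nodes.

    mcn:   dict node_id -> iterable of (a, b) model pairs the node distinguishes
    pairs: iterable of all (a, b) model pairs that must be distinguished
    Returns a list of (node_id, counter) in selection order.
    """
    remaining = set(pairs)
    lists = {node: set(node_pairs) for node, node_pairs in mcn.items()}
    typical = []
    while remaining and lists:
        # Pick the candidate whose current counter (list length) is largest.
        node = max(lists, key=lambda n: len(lists[n]))
        covered = lists.pop(node)
        if not covered:
            break  # no remaining candidate distinguishes any further pair
        typical.append((node, len(covered)))
        remaining -= covered
        # Removing covered pairs from the other lists lowers their counters.
        for other in lists:
            lists[other] -= covered
    return typical


mcn = {"n1": [(1, 2), (1, 3), (2, 3)], "n2": [(1, 3)]}
print(select_typical_nodes(mcn, [(1, 2), (1, 3), (2, 3)]))  # [('n1', 3)]
```

With the input above, which mirrors the example in the text, the node with counter 3 covers every pair, so it is the only typical node selected.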

Sampling Based Model Selection and Model Based Query Processing Algorithms.
The model based query processing algorithm for a range query works as follows. Each node in the WSN stores its multiple models and a general model in its local storage. After receiving a range query from a user, the sink samples data from the typical nodes in T selected by Algorithm 2. According to the sampled data, a probability model is selected. The sink sends the range query together with the index of the selected model to all nodes in the WSN. Each node processes the query with the help of the selected probability model. In this section, we introduce the algorithm that selects a suitable probability model based on the sampled data and the model based query processing algorithm.
To save energy, the model selection algorithm does not sample data from all typical nodes at the same time. We sort the nodes in T according to their counters in descending order. The model selection algorithm samples data from one node in the typical node set at a time, starting from the beginning. Only when the sampled data cannot determine a model,

If there is only one number left in CM, the corresponding model is selected as the suitable model. Otherwise, there will be no number or multiple numbers in CM. In this case, the model selection algorithm uses a distance based method to select the suitable model. The data sampled from the typical nodes forms a vector. As each subclass is composed of many vectors, the center of each subclass also forms a vector. The vector formed by the data of the typical nodes covers only some of the components of the center vectors of the subclasses. The components corresponding to the typical nodes are therefore extracted from the center vector of each subclass, which forms a partial center vector for each subclass. The model selection algorithm calculates the Euclidean distance between the typical node vector and the partial center vector of each subclass and selects the subclass with the minimum distance as the suitable model. The model selection algorithm is given in Algorithm 3.
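A minimal sketch of the model selection step is given below. Because part of the description is missing in the text, it rests on stated assumptions: a model stays in the candidate set CM only while each sampled value lies inside that model's data range for the sampled node, sampling stops as soon as CM holds at most one model, and the Euclidean distance fallback compares against all subclasses. The function name `select_model` and the dictionary layouts are hypothetical.

```python
import math


def select_model(typical_nodes, sample, ranges, centers):
    """Pick a model index from sampled typical-node data.

    typical_nodes: node IDs sorted by counter in descending order
    sample:        callable node_id -> current reading (one message exchange)
    ranges:        ranges[node][model] = (low, high) data range
    centers:       centers[model][node] = mean reading of the node in the subclass
    """
    candidates = set(centers)  # the candidate model set CM
    sampled = {}
    for node in typical_nodes:
        value = sample(node)
        sampled[node] = value
        # Assumed filtering rule: keep models whose data range contains the value.
        candidates = {
            m for m in candidates
            if ranges[node][m][0] <= value <= ranges[node][m][1]
        }
        if len(candidates) <= 1:
            break  # decided, or no model fits: stop sampling further nodes
    if len(candidates) == 1:
        return next(iter(candidates))

    # Fallback: nearest partial center vector over the sampled nodes only.
    def distance(model):
        return math.sqrt(sum((sampled[n] - centers[model][n]) ** 2 for n in sampled))

    return min(centers, key=distance)


ranges = {"n1": {1: (10, 14), 2: (16, 20), 3: (22, 26)}}
centers = {1: {"n1": 12.0}, 2: {"n1": 18.0}, 3: {"n1": 24.0}}
print(select_model(["n1"], lambda node: 17.0, ranges, centers))
```

Sampling one node at a time keeps the number of messages low: a later typical node is contacted only if the earlier ones leave more than one candidate model.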
After the sink selects a suitable probability model for a range query, it sends the index of the model together with the range query to all the nodes in the WSN. When a node receives the query, it calculates two probabilities. First, it takes the index of the probability model and calculates the probability that its data satisfies the range query according to the selected probability model. Second, it calculates the same probability according to its general probability model. We represent the first probability as pr_index. The larger of the two probabilities is chosen as the final probability, represented as pr_final. If pr_final is larger than a threshold, called the probability threshold, but the data actually collected by the node does not satisfy the range query, the node returns a negative answer to the sink. If pr_final is not larger than the probability threshold, while the data actually collected by the node satisfies the range query, the node returns a positive answer to the sink. In all other cases, the node does not return any answer to the sink.
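The node-side rule can be written down compactly. The sketch below assumes the two probabilities have already been computed from the node's stored models and treats the query range as a closed interval; the name `node_answer` and the string return values are hypothetical.

```python
def node_answer(pr_selected, pr_general, threshold, value, low, high):
    """Decide what a node reports to the sink for the range query [low, high].

    Returns "negative", "positive", or None (the node stays silent).
    """
    pr_final = max(pr_selected, pr_general)
    satisfies = low <= value <= high
    if pr_final > threshold and not satisfies:
        return "negative"  # the sink expected a hit, but the reading is outside
    if pr_final <= threshold and satisfies:
        return "positive"  # the sink expected a miss, but the reading is inside
    return None            # prediction and reading agree: no message needed


print(node_answer(0.9, 0.7, 0.6, 25.0, 18.0, 22.0))  # negative
print(node_answer(0.9, 0.7, 0.6, 20.0, 18.0, 22.0))  # None
```

Only nodes whose actual reading contradicts the model-based prediction transmit, which is where the message savings come from.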
We calculate two probabilities that the data of the node satisfies the range query and use the larger one as the final probability, because the model constructed from one subclass of a node's data is not always more accurate than the general model. For example, suppose the multimodel based range query processing algorithm uses two models and a general model to describe the temperature of a room. One model describes the distribution of the temperature from 7:00 A.M. to 9:00 A.M., and another model describes the distribution of the temperature from 9:00 A.M. to 11:00 A.M. The general model describes the distribution of the temperature from 7:00 A.M. to 11:00 A.M. If the query concerns the period from 8:00 A.M. to 10:00 A.M., neither of the two models can describe the distribution of the temperature more accurately than the general model. By comparing the probabilities calculated by the selected model and the general model, we guarantee that the final probability is not worse than that of the general model.

Algorithm 4, steps (5) to (16):
(5) if max{pr_index, probability under the general model} > the probability threshold then
(6) put the node into the result set
(7) end if
(8) end for
(9) broadcast the query and the index throughout the sensor network
(10) collect answers from the nodes in the network
(11) if receive a positive answer from a node then
(12) put ID of the node into the result set
(13) else if receive a negative answer from a node then
(14) remove the ID of the node from the result set
(15) end if
(16) return the result set

When pr_final is larger than the probability threshold, the data sampled by the node has a large probability of satisfying the query. The event that the real data collected by the node does not satisfy the query is then a small probability event. The node needs to send an answer to the sink only if this small probability event happens, which minimizes the number of messages transmitted between the node and the sink. The same reasoning applies to the other two cases. The model based range query processing algorithm is a distributed algorithm. The algorithm executed by the sink is given in Algorithm 4. The algorithm executed by an ordinary node is given in Algorithm 5.
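The sink side of the distributed algorithm can be sketched in the same spirit. The example assumes the sink has already computed the final probability of every node from its copy of the models and has collected the corrections sent back by the nodes; `sink_result` and its argument layout are hypothetical names used only for illustration.

```python
def sink_result(final_probs, threshold, answers):
    """Combine model-based predictions with the corrections sent by nodes.

    final_probs: dict node_id -> final probability computed at the sink
    answers:     dict node_id -> "positive" or "negative" (only nodes that replied)
    """
    # Predicted result set: nodes expected to satisfy the query.
    result = {node for node, pr in final_probs.items() if pr > threshold}
    for node, answer in answers.items():
        if answer == "positive":
            result.add(node)       # predicted miss, actual hit
        elif answer == "negative":
            result.discard(node)   # predicted hit, actual miss
    return result


probs = {"n1": 0.9, "n2": 0.4, "n3": 0.8}
print(sorted(sink_result(probs, 0.6, {"n2": "positive", "n3": "negative"})))
# ['n1', 'n2']
```

Because every wrong prediction is corrected by exactly one message, the final result set is exact even though most nodes never transmit.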

Cost Analysis of the Multimodel Based Range Query Processing Algorithm
The multimodel based query processing algorithm is not efficient for all range queries, because it must first sample data from the typical nodes and then collect the results from the nodes. Since the single model based algorithm collects the results directly, without sampling, the multimodel based algorithm may consume more energy than the single model based algorithm. We construct a cost model to estimate the energy consumed by the multimodel based algorithm and by the single model based algorithm. By comparing the costs of the two algorithms, we can select the better one to process a query. In this paper, we use the number of messages transmitted by an algorithm to represent its energy consumption.

Cost Analysis of the Single Model Based Algorithm.
Before analyzing the cost of the multimodel based algorithm, we present a single model based algorithm. In the single model based algorithm, a node in the WSN stores only its general model in its local storage. After receiving a query from a user, the sink directly broadcasts the query throughout the WSN. A node calculates the probability that its data satisfies the query with its general model. If this probability is larger than the probability threshold, but the data actually collected by the node does not satisfy the range query, the node returns a negative answer to the sink. If this probability is not larger than the probability threshold, but the data actually collected by the node satisfies the range query, the node returns a positive answer to the sink. In all other cases, the node does not return any answer to the sink.
The energy cost of the single model based query processing algorithm can be estimated by formula (3), in which pr_j is the probability that a node n_j satisfies a query. The first part of formula (3) is the energy cost of broadcasting the query throughout the network. The second part of formula (3) is the expectation of the energy cost of the sink receiving answers from the nodes in the WSN: pr_j * pl_j. (3)

Cost Analysis of the Multimodel Based Algorithm.
We now analyze the cost of the multimodel based algorithm, which is composed of two parts. The first part is the energy cost of the sampling phase. The second part is the energy cost of the query processing. The sampling cost can be estimated as follows. Define an event as follows: "after sampling data from the first i typical nodes, the model selection algorithm can choose the probability model used by the query processing algorithm." The probability of the event is (1 − pr_1)(1 − pr_2) ⋯ (1 − pr_(i−1)) pr_i, where pr_i is the probability that the ith typical node can determine the probability model. pr_i can be estimated by formula (4). If the multimodel based range query processing algorithm uses a given number of models, the typical nodes must distinguish every pair of models, and the number of such pairs is the number of 2-combinations of the model indices. In the typical node selection algorithm (Algorithm 2), the counter of a typical node records the number of pairs of models that the node can distinguish. Formula (4), which divides the counter of the ith typical node by the number of pairs, is the probability that the ith typical node can distinguish a model from the other models, and it is used to estimate the probability that the ith typical node can determine a probability model.
The probability that a model is determined by the ith typical node is (1 − pr_1)(1 − pr_2) ⋯ (1 − pr_(i−1)) pr_i, and the corresponding energy cost is the sum of pl_1, ..., pl_i, where pl_i represents the path length between the sink and the ith typical node. As there are |T| typical nodes altogether and the sampling process is composed of sending and collecting data, the expectation of the energy consumption of the model selection algorithm is given by formula (5).
Let the probability that the ith model is chosen by the model selection algorithm and the energy consumption of the query processing algorithm under the chosen model be given. The expectation of the energy cost of the query processing algorithm is given by formula (6). The first part of formula (6) is the number of messages by which the sink broadcasts the range query throughout the network. The second part of formula (6) is the expectation of the energy cost by which the sink receives answers from the nodes in the WSN. As the sink has already received the data from the typical nodes in the sampling phase, the typical nodes do not transmit their data to the sink in the query processing phase, and half of the sampling cost is subtracted from the expected cost summed over the models. We analyze the estimations of the model probability and the per-model cost next.
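Since the displayed formula for the expected sampling cost did not survive in the text, the following sketch reconstructs it from the surrounding description only. It assumes that stopping at the ith typical node costs twice the sum of the first i path lengths (one message out, one back), and it sums only over the events defined above, so the case in which no typical node determines the model is not counted. The name `expected_sampling_cost` and its parameters are hypothetical.

```python
from math import comb


def expected_sampling_cost(counters, path_lengths, num_models):
    """Expected number of messages spent on sampling typical nodes.

    counters:     counter of each typical node, in sampling order
    path_lengths: hop count between the sink and each typical node
    num_models:   number of models, giving comb(num_models, 2) model pairs
    """
    pairs = comb(num_models, 2)
    expected = 0.0
    undecided = 1.0  # probability that no earlier node determined the model
    path_sum = 0
    for counter, path_length in zip(counters, path_lengths):
        pr = counter / pairs          # formula (4): counter over number of pairs
        path_sum += path_length
        expected += undecided * pr * 2 * path_sum  # request plus reply messages
        undecided *= 1 - pr
    return expected


print(expected_sampling_cost([3], [4], 3))  # 8.0
```

In the single-node case shown, the node distinguishes all three pairs, so sampling always stops after one round trip over a path of length 4.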
As each model is constructed from the vectors contained in a subclass, the number of vectors belonging to a subclass can be used to estimate the probability that the corresponding model is chosen. The more vectors a subclass contains, the higher the probability that the data sampled from the typical nodes belongs to the data range of that subclass. The probability that a model is chosen can therefore be estimated by the ratio of the number of vectors contained in its subclass to the total number of vectors contained in all subclasses. Let X_j be the random variable indicating whether a node n_j returns an answer to the sink. If the node returns an answer, X_j = 1. Otherwise, X_j = 0. The expectation of X_j is (1 − max{pr

Performance Evaluation
In this section, four experiments are conducted to verify the performance of the algorithms proposed in this paper. Three factors influence the performance of the multimodel based range query processing algorithm: the probability threshold, the number of models, and the coverage threshold. In the first three experiments, we test the influence of these factors on the multimodel based algorithm. In the last experiment, we compare the performance of the single model based algorithm, the multimodel based algorithm, and the query processing algorithm augmented with the cost model. The performance of the algorithms is measured by their energy consumption, which is the number of messages transmitted. We adopt the data set collected from the sensors deployed in the Intel Berkeley Research lab [20] in our experiments.
There are 54 sensors in the data set. As a lot of data is missing for some sensors and our algorithm needs plenty of historical data to construct the probability models, we select only 34 of the sensors. We randomly assign an integer to each node as its number of hops to the sink, which is used to calculate the energy consumption of each query processing algorithm.

Evaluation of the Probability Threshold.
In this subsection we evaluate the influence of the probability threshold on the performance of the single model based algorithm (SMA) and the multimodel based range query processing algorithm (MMA). In this experiment, we vary the probability threshold of the algorithms from 0.5 to 0.8. The number of models and the coverage threshold of the MMA are fixed to 4 and 0.8. The sink sends queries with different ranges to the WSN. 10 queries are generated for each range.
Figure 3 shows the energy consumption of the SMA and the MMA for different probability thresholds. The horizontal axis is the query range sent by a user, and the vertical axis is the number of messages transmitted by the query processing algorithm, averaged over the ten queries for each range.
The experimental results in Figure 3(a) show that the cost of the SMA increases as the probability threshold increases. The SMA with a probability threshold of 0.5 consumes the least energy. The energy consumption of the MMA with different probability thresholds is shown in Figure 3(b). The results show that the MMA with a probability threshold of 0.6 consumes the least energy among all cases. Compared with the SMA, the MMA consumes more energy when the query range is small (for example, when the query range is 2), while the MMA saves more energy when the query range is large. When the query range is small, the number of nodes satisfying the query is small. The MMA needs to sample data from the typical nodes, so it consumes more energy than the SMA. When the query range is large, the number of nodes satisfying the range query is large and the MMA can select a suitable model to process the query. The energy that the MMA saves in query processing is then much larger than the energy consumed by the data sampling, so the MMA is more efficient than the single model based algorithm.

Evaluation of the Number of Models.
In this subsection we evaluate the influence of the number of models on the performance of the MMA. In this experiment, we set the number of models of the MMA to 4, 6, and 8. The probability threshold and the coverage threshold of the MMA are fixed to 0.6 and 0.8. The sink sends queries with different ranges to the WSN. 10 queries are generated for each range. Figure 4 shows the energy consumption of the MMA for different numbers of models. The horizontal axis is the query range sent by a user, and the vertical axis is the number of messages transmitted by the MMA, averaged over the ten queries for each range.
The experimental results show that the algorithm with 6 models is the most energy efficient among all cases. When the number of models is small (4 models), the granularity of the models is coarse. The models constructed with 4 models are therefore less accurate than those constructed with 6 models, which wastes energy. When the number of

Conclusions
In this paper, a multimodel based query processing algorithm is proposed to solve the range query problem. The cost model of the multimodel based query processing algorithm is analyzed, and a range query processing algorithm augmented with the cost model is proposed to save further energy. The experimental results show that the cost model based algorithm consumes 13.3% less energy than the single model based algorithm.

𝑁: The set of all nodes in the WSN
𝑉_𝑡: The vector of data collected at timestamp 𝑡
The set of
𝑇: The set of all typical nodes
MCN: The merged candidate node set
The set of node IDs for a query.
{[, ]}, which can be estimated by formula (1) based on the histogram.The ratio of the area of the histogram of the range [⌊⌋, ⌈⌉) to the total area of the histogram of the range [⌊V

represent the mean of the data set of the node n_j in a subclass. The mean can be calculated by formula (2), which uses the number of vectors in the subclass.

Definition 1 (data range). The data range of a node for a subclass is the interval obtained by extending the bin [⌊mean⌋, ⌈mean⌉] by some number of unit bins to the left and some number of unit bins to the right, such that the probability of the interval under the node's model for that subclass is not less than a threshold strictly between 0 and 1, called the coverage threshold. The data range is a subrange of the total range of the model. The ratio of the area of the histogram over the data range to the total area of the histogram is not less than the coverage threshold.

Theorem 2. The two extensions of a data range are integers.

Proof. The data range of a node is constructed as follows. The mean is calculated according to formula (2) and falls into the bin [⌊mean⌋, ⌈mean⌉] of the node's model. Initially, we set the data range to [⌊mean⌋, ⌈mean⌉] and check whether the ratio of the area of the model over the current data range to the total area of the model is not less than the coverage threshold. If not, we add all bins adjacent to the current data range to it, which means that one or two bins are added. The data range is expanded until the coverage threshold is reached. As the interval of the histogram is 1, the final data range extends the initial bin by an integer amount on each side.

As there are multiple subclasses, each node has one data range per subclass. The intersection between two data ranges of a node is the set of values common to both. The number of intersections between any two data ranges of a node equals the number of 2-combinations of the subclass indices.

Definition 3 (candidate node). If the intersection of a node for a pair of subclasses has the minimum length among the intersections of all |N| nodes for that pair, the node is called a candidate node. The pair is called distinguishable by the candidate node. length(⋅) is a function representing the length of the intersection.
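The construction used in the proof can be sketched as a short routine. It is an illustration under stated assumptions: the histogram is a dictionary from integer bin start to count with unit-width bins, an integer mean is placed in the bin that starts at the mean, and growth stops at the outermost bins of the histogram. The name `data_range` is hypothetical.

```python
import math


def data_range(hist, mean, coverage):
    """Grow an integer-bounded range around the mean until it holds
    at least `coverage` of the histogram area.

    hist: dict bin_start (int) -> count, unit-width bins [b, b + 1)
    Returns (low, high) with integer bounds.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    low, high = math.floor(mean), math.ceil(mean)
    if low == high:          # assumption: an integer mean uses the bin [mean, mean + 1)
        high = low + 1
    first_bin, end_bin = min(hist), max(hist) + 1

    def mass(lo, hi):
        return sum(count for b, count in hist.items() if lo <= b < hi)

    # Add the adjacent bin on each side (one or two bins per round).
    while mass(low, high) / total < coverage and (low > first_bin or high < end_bin):
        if low > first_bin:
            low -= 1
        if high < end_bin:
            high += 1
    return low, high


hist = {18: 1, 19: 2, 20: 4, 21: 2, 22: 1}
print(data_range(hist, 20.4, 0.8))  # (19, 22)
```

In the example, the bin containing the mean holds 40% of the data, and one round of expansion reaches exactly the 80% coverage threshold, so both bounds move by one unit.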

Figure 2: Illustration of the preprocessing algorithm.

Input: the query range, the index of the selected model
Output: the result node set
(1) index = value returned by Algorithm 3
(2) for each node in the WSN do
(3) pr_index = the probability of the node satisfying the received range query calculated by the selected model
(4) the general-model probability = the probability of the node satisfying the received range query calculated by the general model
(5) if max{pr_index, the general-model probability
Algorithm 4: The model based query processing algorithm for the sink.

Input: the query range, the index of the selected model, the data currently collected by the node
Output: the answer returned to the sink, if any
(1) extract the index and the query range from the packet received
(2) pr_index = the probability of the node satisfying the received range query calculated by the selected model
(3) the general-model probability = the probability of the node satisfying the received range query calculated by the general model
(4) if (max{pr_index, the general-model probability} > the probability threshold) and (the collected data is not in the query range) then
(5) send a negative answer containing the ID of the current node to the sink
(6) else if (max{pr_index, the general-model probability} ≤ the probability threshold) and (the collected data is in the query range) then
(7) send a positive answer containing the ID of the current node to the sink
(8) end if
Algorithm 5: The model based query processing algorithm for an ordinary node.

𝑗th node 𝑛_𝑗 at the timestamp 𝑡. The historical data set 𝐻 is composed of a set of vectors 𝑉_𝑡 at different timestamps. Given the historical data set 𝐻, the vectors in 𝐻 can be clustered into many subclasses. If the vectors in 𝐻 are clustered into a number of subclasses, a probability model can be constructed for each node based on the vectors contained in a subclass. For convenience, we list the notations used throughout this paper in the Notations section at the end of the paper.

2.1. Probability Model Construction.
First, multiple probability models are constructed for each node in the WSN based on the historical data collected from each node. Let 𝑁 be the set of all nodes in the WSN and let |𝑁| be the number of nodes in the WSN. Each node collects data, such as the temperature and humidity, from the environment, which means V

the candidate node set CN
(10) end for
(11) construct the MCN by selecting the unique nodes from all CN
(12) for each node in the MCN do
(13) counter of the node = the number of CN containing the node
(14) list of the node = all the pairs of CN containing the node

As there can be multiple nodes satisfying the definition of candidate node for an intersection, the preprocessing algorithm calculates a candidate node set CN for each intersection, which is composed of all the candidate nodes of the intersection. Finally, the preprocessing algorithm merges the candidate nodes in the candidate node sets of all intersections into a merged candidate node (MCN) set. Each element in MCN has two attributes, a counter and a list. If multiple intersections have the same candidate node in their candidate node sets, these candidate nodes are merged into a unique one, called a merged candidate node. The counter of the merged candidate node is the number of CN containing it. The list of the merged candidate node is the set of all pairs whose CN contains it. The set of pairs contains all the 2-combinations of the subclasses. The preprocessing algorithm is given in Algorithm 1.
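The preprocessing step can be illustrated with a short sketch that builds the MCN from the data ranges. It assumes every node has one (low, high) data range per model and that the length of an empty intersection is 0; the name `build_mcn` and the dictionary layout are hypothetical, and a node's counter is taken to be the length of its list.

```python
from itertools import combinations


def build_mcn(ranges):
    """Build the merged candidate node set from per-node data ranges.

    ranges: ranges[node][model] = (low, high)
    Returns dict node -> list of model pairs (a, b) for which the node's
    intersection of data ranges is the shortest among all nodes.
    """
    nodes = list(ranges)
    models = sorted(next(iter(ranges.values())))
    mcn = {}
    for a, b in combinations(models, 2):
        lengths = {}
        for node in nodes:
            low_a, high_a = ranges[node][a]
            low_b, high_b = ranges[node][b]
            # Length of the intersection of the two data ranges (0 if disjoint).
            lengths[node] = max(0, min(high_a, high_b) - max(low_a, low_b))
        shortest = min(lengths.values())
        # Every node reaching the minimum is a candidate node for this pair.
        for node in nodes:
            if lengths[node] == shortest:
                mcn.setdefault(node, []).append((a, b))
    return mcn


ranges = {
    "n1": {1: (10, 14), 2: (16, 20), 3: (22, 26)},
    "n2": {1: (10, 18), 2: (15, 22), 3: (20, 28)},
}
print(build_mcn(ranges))
# {'n1': [(1, 2), (1, 3), (2, 3)], 'n2': [(1, 3)]}
```

The example input is made up for illustration; it yields a counter of 3 for the first node, the same situation as in the worked example of the typical node selection.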
(b) Case 2
Figure 1: Illustration of data range and candidate node.

Input: the set of all nodes, the subclasses, the coverage threshold
Output: MCN
(1) for each node in 𝑁 do
(2) calculate the data ranges for the node
(3) end for
(4) for each node in 𝑁 do
(5) calculate all the intersections

Output: index of the selected model
(1) set the candidate model set CM to the set of all model indices
(2) sort the nodes in 𝑇 according to their counters in descending order
(3) for each node in 𝑇 do
(4) sample data V

Input: the query range
Output: the result set returned by the called algorithm
(1) estimate the cost for the multi-model based query processing algorithm
(2) estimate the cost for the single model query processing algorithm
(3) if the multi-model cost < the single model cost then

The query processing algorithm augmented with the cost model, given in Algorithm 6, works as follows. The sink estimates the costs of the multimodel based algorithm and the single model based algorithm for a particular range query according to formulas (9) and (3). If the estimated cost of the multimodel based algorithm is smaller, the multimodel based algorithm is adopted. Otherwise, the single model based algorithm is used.
, pr   } * pl  .Input: for all historical data    : Theth model of   of node   : ) for all 2 combinations from 1 to  CN  : The candidate node set distinguishing model  from  CM: The candidate model set   : Theth node in the WSN V   : Thedatacollectedfrom  at timestamp    : Theth subclass of     : The general model of node | ∀  ∈   , V