Node Selection Algorithms with Data Accuracy Guarantee in Service-Oriented Wireless Sensor Networks

The service-oriented architecture is an emerging trend for future wireless sensor networks, in which different types of sensors can be deployed in the same area to support various service requirements. The accuracy of the sensed data is one of the key criteria because the data are generally a noisy version of the physical phenomenon. In this paper, we study the node selection problem with data accuracy guarantee in service-oriented wireless sensor networks. We exploit the spatial correlation between the service data and aim at selecting the minimum number of nodes to provide services with guaranteed data accuracy. First, we formulate this problem as an integer nonlinear programming problem to illustrate its NP-hard property. Second, we propose two heuristic algorithms, namely, the Separate Selection Algorithm (SSA) and the Combined Selection Algorithm (CSA). The SSA selects nodes for each service separately, and the CSA selects nodes according to their contribution increment. Finally, we compare the performance of the proposed algorithms with extensive simulations. The results show that CSA performs better than SSA.


Introduction
Sensing is considered one of the most important technologies, especially in the emerging big data era. Nodes with sensing ability can be deployed almost anywhere, including airspace, ground, and underwater environments, owing to their low cost, simplicity, and small size. Moreover, wireless radio allows these sensor nodes to be organized into a network, generally called a Wireless Sensor Network (WSN) [1], in which local information about the environment is sensed and reported to the base station periodically. Wireless sensor networks will clearly generate a huge amount of data as time passes and as the number of networks increases. Accordingly, one challenging issue is how to utilize these various wireless sensor networks in the future big data era.
Current wireless sensor networks are generally data-centric or application-centric, which means that each sensor node serves one specific application. However, as the number of different applications increases rapidly, heterogeneous wireless sensor networks appear, and they might be located in the same physical area while providing different data-collection functions. How to connect these heterogeneous wireless sensor networks efficiently is still an open problem in the future ubiquitous computing environment, owing to several observations. First, two different applications may be interested in the same collected data, and it is unnecessary to place two separate nodes with identical sensing devices but different tasks. Second, when one application is concerned with several different types of sensed data simultaneously, such a requirement cannot be guaranteed since current solutions handle each type separately. Finally, the emergence of powerful sensors, which can sense different types of data, has introduced new issues in the research of wireless sensor networks since they can support different applications simultaneously.
Accordingly, the service-oriented architecture appears as an emerging trend for future wireless sensor networks, in which sensor networks are considered as the service provider and sensors as the data sources for these services [2]. Users and programmers can access service-oriented wireless sensor networks through a simple service-oriented interface that encapsulates the low-level implementation details. Services in wireless sensor networks may be sensing capabilities, for example, temperature and humidity, or software components provided by nodes, for example, in-network data aggregation, time synchronization, and data processing [2-5]. Similarly, the data sensing of nodes can be defined as sensing services and the sensed data as service data. Furthermore, sensor nodes can be equipped with multiple types of sensing units to collect environmental information. For example, the MICA2 mote [6] can provide services such as light, sound, and vibration.
In heterogeneous application scenarios, different types of sensors can be deployed in the same area to support various service requirements. Figure 1 shows an example of a service-oriented wireless sensor network with eight sensor nodes. The sensed sources are assumed to be located at positions $v_1$ and $v_2$, and each sensed target is assumed to correspond to one distinct service. Nodes $p_1$, $p_2$, and $p_3$ can only support service $s_1$, and nodes $p_6$, $p_7$, and $p_8$ can only support service $s_2$. However, nodes $p_4$ and $p_5$ can support services $s_1$ and $s_2$ simultaneously. In this example, assuming that at least two nodes are required for each service, there are several ways to select nodes to provide services $s_1$ and $s_2$: for example, nodes $p_1$, $p_2$ for service $s_1$ and nodes $p_6$, $p_7$ for $s_2$, or nodes $p_4$, $p_5$ for both of them simultaneously. Note that sensor networks are generally deployed densely, so a huge number of nodes are available to collect data from the environment. This leads to the problem of how to select nodes efficiently to provide the required services.
The accuracy of the sensed data is one of the key criteria when choosing nodes to provide a required service. Sensor nodes are generally designed to be cheap and simple, and they are mostly deployed in dynamic and rough terrain for continuous environmental monitoring. The sensing equipment on sensor nodes is therefore expected to be unreliable and the collected data to be distorted. Furthermore, some physical attributes exhibit a gradual and continuous variation over the two-dimensional Euclidean space due to the diffusion property, and, thus, each observer obtains differently distorted data. Collecting all related data around the same sensing source can help to eliminate or minimize the distortion. However, it might lead to heavy energy consumption in the network. Some applications are concerned only with approximate observations rather than exact results [7, 8], and in this case it is unnecessary to gather data from all nodes in the network. It should also be mentioned that the diffusion property of physical attributes results in spatial correlation among the sensed data observed by nodes close to the same sensing source. Exploiting the spatial correlation can help to improve the network performance by selecting a subset of nodes to provide the required service with guaranteed data accuracy [9, 10].
Although the service-oriented architecture for wireless sensor networks has been introduced in recent related works [4, 5, 11-14], most of them are concerned with the practical architecture framework, such as middleware and platforms. Some works [13, 14] considered node scheduling to support service queries while prolonging the network lifetime, and some others are concerned with the spatial correlation among the sensed data without considering services [15-17]. Providing accurate services to users is an important issue, and accuracy is generally one criterion for applications. In this paper, we focus on the heterogeneous service supporting problem in future wireless sensor networks and aim at providing efficient node selection algorithms with data accuracy guarantee for service-oriented sensor networks. Different from previous works, we consider the data accuracy for services by following the observation that sensed data is a noisy version of the physical phenomenon, and we explore the data accuracy according to the spatial correlation of data for the same sensing source.
Due to the inaccuracy and spatial correlation of the sensed data, it is a new and challenging issue to provide the required services with data accuracy guarantee in an energy-efficient way in service-oriented wireless sensor networks. To the best of our knowledge, this is the first paper concerned with both data inaccuracy and spatial correlation in service-oriented sensor networks, and we aim at providing node selection algorithms that improve the network performance. The main contributions of this paper are summarized as follows.
(1) We propose the node selection problem with data accuracy guarantee for service-oriented wireless sensor networks via a bipartite graph and formulate it as an integer nonlinear programming problem to illustrate its NP-hard property.
(2) We present two efficient heuristic algorithms for this problem, namely, the Separate Selection Algorithm (SSA) and the Combined Selection Algorithm (CSA), both with low time complexity.
The rest of this paper is organized as follows. In Section 2, we summarize the related works. Section 3 describes the system model and the problem formulation. Sections 4 and 5 introduce the integer nonlinear programming formulation and the two heuristic selection algorithms. Section 6 describes and analyzes the simulation results. We conclude this paper in Section 7.

Related Works
This paper focuses on the node selection problem in service-oriented wireless sensor networks. Several works have developed service-oriented architectures specific to sensor networks, and the architecture has shown many advantages in heterogeneous application scenarios. Gračanin et al. [2] proposed a service-centric model that focuses on the services provided by a wireless sensor network and views the wireless sensor network as a service provider. This model consists of mission, network, region, sensor, and capability layers. Within each layer, there are four planes or functionality sets: communication, management, application, and generational learning. Rezgui and Eltoweissy [4] introduced the service-oriented architecture as an approach for building a new generation of open, efficient, interoperable, scalable, application-aware sensor-actuator networks. In this vision, sensor-actuator networks would not be deployed to provide sensing and actuation capabilities to a specific application but, rather, to provide sensing and actuation services to any application. King et al. [5] developed a service-oriented sensor and actuator platform called Atlas, which enables self-integrative, programmable pervasive spaces. This platform improves communication and interoperability between heterogeneous devices in pervasive computing environments. The authors of [11] proposed TinySOA, a service-oriented architecture that allows programmers to access wireless sensor networks from their applications by using a simple service-oriented API via the language of their choice. The main advantage of TinySOA is that it relieves application developers from dealing with the low-level technical details of the wireless sensor network to obtain sensor data. Corchado et al. [12] proposed a service-oriented telemonitoring system for healthcare using heterogeneous wireless sensor networks, which aimed at improving healthcare and assistance for dependent people.
It is important to provide services efficiently with resource-constrained sensor nodes. Node scheduling is considered an efficient technique to implement service-supporting schemes, in which sensor nodes are selected to provide the requested services. Recently, Wang et al. [13] investigated service-availability-aware sleep scheduling design in service-oriented wireless sensor networks. The purpose of that study is to minimize the energy consumption while guaranteeing that enough sensors are active to ensure service availability at all times. The authors proved this problem to be NP-hard and presented heuristic linear-programming-based solutions. However, they assumed that each service has a known requirement on the number of active sensors based on the historical service composition requests in the system, which may not be the case in practice. Furthermore, they only consider the sleep scheduling design for the sensors in the service provider overlay network and neglect the routing cost of service data. The authors of [14] try to identify the service composition that is less likely to become invalid in the near future due to nodes going into sleep mode. The goal is to minimize the recomposition cost. They use dynamic programming to reduce the total service composition cost when the minimum number of required service composition solutions is derived. However, dynamic programming is unsuitable for large-scale problems.
The distributed nature of wireless sensor networks results in spatial correlation among the sensed data, and data accuracy is accordingly influenced by this spatial correlation. Under different assumptions, researchers have proposed several mathematical models for spatial correlation in wireless sensor networks. Some [18] assume that the sensed data follow the diffusion property, and some [17] use an empirically obtained approximation function for the joint entropy of sensed data. The most commonly used model is the jointly Gaussian model [9, 19, 20], which assumes the data to be jointly Gaussian with the correlation being a function of the distance. The jointly Gaussian model is easy to use and analyze. However, its chief limitation is that it forces the joint probability density function of the data values to be jointly Gaussian. Some researchers [21] use variograms to analyze spatial correlation in wireless sensor networks. The proposed model is Markovian in nature and can capture correlation in data irrespective of the node density, the number of source nodes, or the topology. Furthermore, this model derives the data value at a node from other correlated nodes whose data values have already been derived. However, it is not always the case that a given spatial process will be Markovian. Others proposed correlation models for specific applications, such as soil moisture measurement in wireless underground sensor networks [10]. The presence of spatial correlation among sensor network data has been exploited for solving different problems. The authors in [15] proposed a traffic model for wireless sensor networks, which takes into account the statistical patterns of node mobility and spatial correlation. In [16, 17], spatial correlation was used to design energy-efficient data aggregation algorithms. Ma et al. [16] proposed a distributed clustering algorithm based on dominating set theory to choose the cluster head nodes and construct clusters by measuring the spatial correlation between sensors. Pattem et al. [17] studied the correlated data gathering problem and followed the idea of using an empirically obtained approximation function for the joint entropy of sources.
It is important and challenging to provide different services with guaranteed data accuracy through unreliable sensor nodes. Fault tolerance is one of the most important techniques and has been taken into consideration in many works [22-27]. Han et al. [22] addressed the problem of deploying the minimum number of relay nodes to achieve diverse levels of fault tolerance with higher network connectivity in the context of heterogeneous wireless sensor networks. However, they adopted a network model in which nodes possess different transmission radii, while all of the relay nodes use an identical transmission radius. Banerjee et al. [23] investigated an event detection scheme with fault tolerance for multiple events occurring simultaneously. They proposed a polynomial-based scheme that addresses the problem of event region detection by using an aggregation tree of sensor nodes. However, their work is limited to static sensors, and the network topology cannot adapt to the dynamic nature of simultaneous events with varied priorities.
Our work in this paper is concerned with node selection algorithms and is thus similar to works that deal with node selection and assignment problems. Cai et al. [28] addressed the multiple directional cover sets problem of organizing the directions of sensors into a group of nondisjoint cover sets to extend the network lifetime. Directional sensors differ from common sensors in that they have a limited angle of sensing range. The authors proved that this problem is NP-complete and presented three heuristic approaches. Lin et al. [29] proposed an adaptive energy-efficient multisensor scheduling scheme for collaborative target tracking in wireless sensor networks. The challenge in this problem is how to achieve energy efficiency and tracking reliability while satisfying the tracking accuracy requirement. In their algorithm, a number of sensors are selected to form a temporary tasking cluster, and the optimal sampling interval is determined to satisfy the given tracking accuracy. Johnson et al. [30] considered the sensor-mission assignment problem in wireless sensor networks. In this problem, multiple missions compete for sensor resources. They showed that this problem is NP-hard even to approximate and presented several heuristic algorithms. Liu et al. [31] studied the topology control problem using a probabilistic network model. They attempted to find a minimal transmission range for each node such that the global network reachability satisfies a certain threshold. Different from these previous works, we aim at providing efficient node selection algorithms with guaranteed data accuracy for service-oriented wireless sensor networks by exploiting the spatial correlation among the sensed data and the advantages of the diverse services provided by different sensor nodes.

System Model and Problem Formulation
In this section, we first introduce the network model for service-oriented wireless sensor networks. Second, we describe the spatial correlation model for single as well as multiple services in the network. Finally, we define the node selection problem with guaranteed data accuracy in service-oriented wireless sensor networks.

Network Model.
We consider a wireless sensor network in the plane with stationary nodes $P = \{p_1, p_2, \ldots, p_n\}$, which are built to provide a series of services $S = \{s_1, s_2, \ldots, s_m\}$. Each node $p_i$ can provide one or more services, denoted by $S_i$, which is a subset of $S$; that is, $S_i \subseteq S$. One service, for example, $s_j$, can be provided by a group of nodes $P_j$, and $P_j = \{p_i \mid s_j \in S_i\}$. The set size $|P_j|$ gives the number of nodes in the network that can provide service $s_j$. The relationship between nodes and services can be further described as a bipartite graph $G = (P, S, E)$, where $P$ denotes the set of nodes, $S$ denotes the set of services, and $E$ denotes the set of edges. There is an edge $(p_i, s_j)$ between $p_i$ and $s_j$ if $p_i$ can provide service $s_j$; that is, $(p_i, s_j) \in E$. Figure 2 shows an example of the proposed model for a service-oriented sensor network with five nodes, in which each service is supported by three distinct nodes.
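The node-service relationship can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_bipartite` and the string identifiers are hypothetical, and the capability map follows the eight-node example of Figure 1 as described in the Introduction.

```python
from collections import defaultdict


def build_bipartite(node_services):
    """Build G = (P, S, E) and the provider sets P_j from a map node -> services."""
    P = sorted(node_services)
    S = sorted({s for services in node_services.values() for s in services})
    # One edge (p_i, s_j) for every service s_j that node p_i can provide.
    E = {(p, s) for p, services in node_services.items() for s in services}
    providers = defaultdict(set)  # providers[s_j] plays the role of P_j
    for p, s in E:
        providers[s].add(p)
    return P, S, E, dict(providers)


# Capability map of the Figure 1 example (identifiers are illustrative).
node_services = {
    "p1": {"s1"}, "p2": {"s1"}, "p3": {"s1"},
    "p4": {"s1", "s2"}, "p5": {"s1", "s2"},
    "p6": {"s2"}, "p7": {"s2"}, "p8": {"s2"},
}
P, S, E, providers = build_bipartite(node_services)
```

Here `len(providers["s1"])` corresponds to $|P_1|$, the number of nodes that can provide service $s_1$.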
For convenience, the symbols used in this work are summarized in Table 1.

Spatial Correlation Model.
Researchers have proposed several mathematical models for spatial correlation in wireless sensor networks under different assumptions. Pattem et al. [17] proposed to use an empirically obtained approximation function for the joint entropy of sensed data. In [18], the sensed data is assumed to follow the diffusion property, and the diffusion is formulated as a function of the distance. The jointly Gaussian model is adopted in many related works [9, 19, 20]; it assumes the data to be jointly Gaussian with the correlation as a function of the distance. The jointly Gaussian model is easy to use and analyze because it forces the joint probability density function of the data values to be jointly Gaussian. Jindal and Psounis [21] analyzed the spatial correlation among sensed data by using variograms in wireless sensor networks. The proposed model is a special case of a Markov random field. In this model, the data value at a node is derived from other correlated nodes whose data values have already been derived. However, it is not always the case that a given spatial process will be Markovian. In this paper, we are concerned with data accuracy under the spatial correlation model. The jointly Gaussian model proposed in [9] considers the measurement noise of nodes and gives the distortion function, which makes it suitable for our problem.
In sensor networks, the observation of each node is in fact a noisy version of the physical phenomenon located at the sensing source. The physical phenomenon can be modeled as a Gaussian random variable with zero mean and variance $\sigma_S^2$; that is, $S \sim N(0, \sigma_S^2)$. Similarly, the sensed data for the physical phenomenon at node $p_i$ can also be modeled as a Gaussian variable $S_i$, $S_i \sim N(0, \sigma_S^2)$. Assume that the sensed data for nodes $p_i$ and $p_j$ are denoted as $S_i$ and $S_j$, respectively. The correlations between $S$ and $S_i$, and between $S_i$ and $S_j$, are described as

$$\rho_{v,i} = K_\theta(d_{v,i}) = \frac{E[S S_i]}{\sigma_S^2}, \qquad \rho_{i,j} = K_\theta(d_{i,j}) = \frac{E[S_i S_j]}{\sigma_S^2}, \tag{1}$$

where $d_{v,i}$ denotes the Euclidean distance between $p_i$ and the sensing source $v$, $d_{i,j}$ denotes the Euclidean distance between $p_i$ and $p_j$, and $K_\theta(\cdot)$ is the covariance function of the Euclidean distance $d$, formulated as

$$K_\theta(d) = e^{-d/\theta}, \quad \theta > 0, \tag{2}$$

where $\theta$ controls how quickly the correlation decays with the distance between sensor nodes. In addition, $K_\theta(d) = 1$ when $d = 0$, and $K_\theta(d) \to 0$ as $d \to +\infty$.
The data collected by sensor node $p_i$ is often subject to noise originating from the environment, and it can be represented as

$$X_i = S_i + N_i, \tag{3}$$

where $N_i$ is additive white Gaussian noise, $N_i \sim N(0, \sigma_N^2)$. We assume that the noise terms encountered by different sensor nodes are independent of each other.
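According to [9], the distortion of the estimation for $S$ is formulated as

$$\delta(M) = \sigma_S^2 - \frac{\sigma_S^4}{M(\sigma_S^2 + \sigma_N^2)} \left( 2 \sum_{i=1}^{M} \rho_{v,i} - 1 \right) + \frac{\sigma_S^6}{M^2 (\sigma_S^2 + \sigma_N^2)^2} \sum_{i=1}^{M} \sum_{j \ne i} \rho_{i,j}, \tag{4}$$

where $M$ ($M > 0$) is the number of sensor nodes. We use $\sigma_S^2$ to normalize $\delta(M)$, and the estimated data accuracy is calculated as

$$\hat{a}(M) = 1 - \frac{\delta(M)}{\sigma_S^2} = \frac{\gamma}{M(\gamma + 1)} \left( 2 \sum_{i=1}^{M} \rho_{v,i} - 1 \right) - \frac{\gamma^2}{M^2 (\gamma + 1)^2} \sum_{i=1}^{M} \sum_{j \ne i} \rho_{i,j}, \tag{5}$$

where $\gamma = \sigma_S^2 / \sigma_N^2$ denotes the Signal-to-Noise Ratio (SNR).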
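As a rough numerical illustration, the estimated accuracy can be evaluated from node positions. The sketch below is not the authors' code; it assumes an exponential covariance $K(d) = e^{-d/\theta}$ and an accuracy equal to one minus the distortion of [9] normalized by the source variance. The function names, the coordinate representation, and the value returned for an empty node set are all hypothetical choices.

```python
import math


def correlation(d, theta):
    """Assumed covariance model: K(d) = exp(-d / theta)."""
    return math.exp(-d / theta)


def estimated_accuracy(source, nodes, theta, snr):
    """Normalized accuracy for one source observed by nodes at the given (x, y) positions."""
    M = len(nodes)
    if M == 0:
        return 0.0  # assumption: no reporting node means no accuracy
    alpha = snr / (snr + 1.0)  # equals sigma_S^2 / (sigma_S^2 + sigma_N^2)
    # Sum of correlations between the source and each node.
    to_source = sum(correlation(math.dist(source, p), theta) for p in nodes)
    # Sum of correlations over all ordered pairs of distinct nodes.
    pairwise = sum(
        correlation(math.dist(p, q), theta)
        for i, p in enumerate(nodes)
        for k, q in enumerate(nodes)
        if i != k
    )
    return alpha / M * (2.0 * to_source - 1.0) - (alpha / M) ** 2 * pairwise


near = estimated_accuracy((0.0, 0.0), [(1.0, 0.0)], theta=10.0, snr=10.0)
far = estimated_accuracy((0.0, 0.0), [(5.0, 0.0)], theta=10.0, snr=10.0)
```

Under these assumptions a node closer to the source yields a higher accuracy (`near > far`), and adding a node does not always increase the accuracy because the pairwise term is subtracted.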

Problem Definition.
In this paper, we study the problem of node selection in service-oriented wireless sensor networks with the data accuracy requirement guaranteed for the services. The number of nodes is taken as the optimization objective for the following reasons. First, fewer packets are transmitted in the network if we select fewer nodes to provide services, which also helps to reduce the energy consumption. Second, keeping too many nodes awake increases the collision probability in a contention-based wireless network, which causes significant retransmission cost and additional delay. Finally, allowing one node to provide multiple services simultaneously helps to reduce the overhead of data transmission. If the data of different services is correlated, it can be compressed into a smaller packet; even in the uncorrelated case, it can still be transmitted in a single packet, which reduces the overhead in the network [32].
In this paper, we aim at providing node selection strategies that minimize the number of selected nodes. Let $a_j$ be the data accuracy requirement of each service $s_j$, $P'_j$ a subset of nodes selected from $P_j$ to provide service $s_j$, $P'_j \subseteq P_j$, and $\hat{a}(P'_j)$ the estimated data accuracy of service $s_j$ when service $s_j$ is provided by the nodes in set $P'_j$. The node selection problem for service-oriented wireless sensor networks can be defined as follows: given a bipartite graph $G = (P, S, E)$, in which $P$ denotes the set of nodes, $S$ denotes the set of services, and $E$ denotes the set of edges between $P$ and $S$, and given the data accuracy requirement $a_j$ of each service $s_j \in S$ and the spatial correlation among these nodes and the corresponding sensing sources, the problem is to find a subgraph $G' = (P', S, E')$, $P' \subseteq P$, $E' \subseteq E$, and the objective is to minimize the number of selected nodes in $P'$ under the constraint that $\hat{a}(P'_j) \ge a_j$ is satisfied for each service $s_j \in S$.

Integer Nonlinear Programming Formulation
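(1) Variables $b_{i,j}$. The variable $b_{i,j}$ is 1 if and only if node $p_i$ can provide service $s_j$. The variables $b_{i,j}$ can be obtained when the topology graph and the set of services provided by nodes are given. Obviously, the nodes for service $s_j$ shall be selected from the nodes with $b_{i,j} = 1$.
(2) Variables $x_{i,j}$. The variable $x_{i,j}$ is 1 if and only if node $p_i$ is selected to provide service $s_j$. Let $\hat{a}_j$ be the estimated data accuracy of service $s_j$; $\hat{a}_j$ can be obtained via formula (5), which can be further described as follows:

$$\hat{a}_j = \frac{\gamma}{M_j(\gamma + 1)} \left( 2 \sum_{i=1}^{n} x_{i,j}\, \rho_{v_j,i} - 1 \right) - \frac{\gamma^2}{M_j^2 (\gamma + 1)^2} \sum_{i=1}^{n} \sum_{k \ne i} x_{i,j}\, x_{k,j}\, \rho_{i,k},$$

where $v_j$ is the sensing source of service $s_j$, $M_j = \sum_{i=1}^{n} x_{i,j}$, and $M_j$ denotes the number of selected nodes that are assigned to provide service $s_j$.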
In order to satisfy the required data accuracy for all services in the network, we have the following constraint: $\hat{a}_j \ge a_j$, $\forall j = 1, 2, \ldots, m$.
(3) Variables $y_i$. The variable $y_i$ is 1 if and only if node $p_i$ is selected to provide services required in the network:

$$y_i = \begin{cases} 1, & \sum_{j=1}^{m} x_{i,j} \ge 1, \\ 0, & \text{otherwise}. \end{cases}$$

By the above definition, $y_i$ equals 0 if node $p_i$ is not selected to provide any service, and otherwise it equals 1. As mentioned above, the variable $b_{i,j}$ indicates whether node $p_i \in P$ can provide service $s_j \in S$, and we have the following constraint: $x_{i,j} \le b_{i,j}$, $\forall i = 1, 2, \ldots, n$, $\forall j = 1, 2, \ldots, m$. The objective of the node selection problem is to minimize the total number of nodes that are selected to provide the required services, and the number of selected nodes can be calculated as $\sum_{i=1}^{n} y_i$.
Then the node selection problem discussed in this paper can be formulated as

$$\begin{aligned} \min \quad & \sum_{i=1}^{n} y_i \\ \text{s.t.} \quad & \hat{a}_j \ge a_j, && \forall j = 1, 2, \ldots, m, \\ & x_{i,j} \le b_{i,j}, && \forall i = 1, 2, \ldots, n,\ \forall j = 1, 2, \ldots, m, \\ & x_{i,j} \le y_i, && \forall i = 1, 2, \ldots, n,\ \forall j = 1, 2, \ldots, m, \\ & x_{i,j},\, y_i \in \{0, 1\}. \end{aligned}$$

In this section, we have introduced an Integer Nonlinear Programming (INLP) formulation for the node selection problem. The INLP gives a precise description of the problem. This formulation is useful for finding the optimal solution when the solution space is small enough, with the help of well-known tools such as LINGO and MATLAB. However, integer linear programming, which is known to be NP-hard, is a special case of INLP, and accordingly the general INLP problem is NP-hard. Many related works have been proposed to find suboptimal solutions for a given INLP problem [33-35]. Although INLP has shown good performance in practical applications, it has well-known drawbacks in computation and space complexity, especially when the solution space is very large. Unfortunately, a wireless sensor network generally includes hundreds to thousands of nodes, and the solution space of the INLP mentioned above grows exponentially with the number of nodes. Another problem is that, for a random network, it is hard to gather all the constraints mentioned above since the nodes are heterogeneous and the constraints differ from node to node. Moreover, in practical applications, it is almost impossible for the sink to obtain accurate information in a harsh environment, since the sensed data is generally a noisy version of the physical phenomenon. It is more practical to adopt a suboptimal solution with guaranteed data accuracy instead of an optimal one that is hard to find for an NP-hard problem. It is therefore reasonable and necessary to develop heuristic algorithms for the node selection problem in service-oriented wireless sensor networks.
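To make the size of the solution space concrete, the following sketch enumerates node subsets in order of increasing size and returns the first feasible one, which requires up to $2^n$ feasibility checks. It is an illustration only: it replaces formula (5) with the additive accuracy simplification used later in the Figure 3 example, and the function name and all numeric values except 0.5, 0.4, and the requirement 0.8 for $s_1$ are assumptions.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for floating-point comparisons


def brute_force_min_nodes(increment, requirement):
    """Smallest node set meeting every requirement under an additive accuracy model."""
    nodes = sorted(increment)
    for k in range(len(nodes) + 1):  # try smaller subsets first
        for subset in combinations(nodes, k):
            feasible = all(
                sum(increment[p].get(s, 0.0) for p in subset) >= required - EPS
                for s, required in requirement.items()
            )
            if feasible:
                return set(subset)
    return None  # no subset satisfies all requirements


# Toy instance with four nodes and two services (values partly assumed).
increment = {
    "p1": {"s1": 0.5, "s2": 0.0},
    "p2": {"s1": 0.4, "s2": 0.4},
    "p3": {"s1": 0.4, "s2": 0.4},
    "p4": {"s1": 0.0, "s2": 0.5},
}
requirement = {"s1": 0.8, "s2": 0.8}
optimum = brute_force_min_nodes(increment, requirement)
```

On this toy instance the search is instantaneous, but the number of subsets doubles with every additional node, which is why exhaustive search does not scale to networks with hundreds of nodes.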

Heuristic Algorithms
Heuristic algorithms are generally considered an important way to solve NP-hard problems. In this section, we propose two heuristic algorithms for the node selection problem in service-oriented wireless sensor networks, namely, the Separate Selection Algorithm (SSA) and the Combined Selection Algorithm (CSA).

Separate Selection Algorithm (SSA).
The basic idea of the Separate Selection Algorithm (SSA) is to select, separately for each required service, a minimal number of nodes that guarantees the data accuracy, and to take the union of the selected nodes for all services as the solution. The key step in SSA is how to select nodes for each service. Here nodes are selected sequentially, and in each step we choose the node that most improves the data accuracy.
Assume that the current node selection solution is $G' = (P', S, E')$, in which $P' = \bigcup_{j=1}^{m} P'_j$, $S = \{s_j \mid j = 1, 2, \ldots, m\}$ is the set of services, and $E' = \{(p_i, s_j) \mid p_i \in P'_j, s_j \in S\}$. Consider the general case in which one node, say $p_i$, is considered to provide one particular service, say $s_j$. We use $\Delta_{i,j}$ to denote the data accuracy increment for service $s_j$ when node $p_i$ is selected to provide service $s_j$, where $(p_i, s_j) \in E$. $\Delta_{i,j}$ can be calculated as follows.
(1) In case that $\hat{a}(P'_j) \ge a_j$, we have $\Delta_{i,j} = 0$, which means that the data accuracy requirement for service $s_j$ has already been satisfied, so there is no further improvement when node $p_i$ and edge $(p_i, s_j)$ are added into the final solution.
(2) In case that $\hat{a}(P'_j \cup \{p_i\}) \le \hat{a}(P'_j)$, we have $\Delta_{i,j} = 0$, which means that the data accuracy of service $s_j$ cannot be increased when node $p_i$ and edge $(p_i, s_j)$ are added into the final solution.
(3) In case that $\hat{a}(P'_j) < \hat{a}(P'_j \cup \{p_i\}) \le a_j$, we have $\Delta_{i,j} = \hat{a}(P'_j \cup \{p_i\}) - \hat{a}(P'_j)$, which means that the whole increment counts since the requirement is not yet exceeded.
(4) In case that $\hat{a}(P'_j) < a_j < \hat{a}(P'_j \cup \{p_i\})$, we have $\Delta_{i,j} = a_j - \hat{a}(P'_j)$, which means that we neglect the part of the increment that exceeds the requirement since the solution is only required to provide the requested data accuracy.
The pseudocode for SSA is listed in Algorithm 1. In Line 1, the set of selected nodes $P'_j$ for each service $s_j$ is initialized as the empty set. In Lines 2-9, the algorithm selects nodes for each service $s_j \in S$. If the current data accuracy does not satisfy the data accuracy requirement, that is, $\hat{a}(P'_j) < a_j$, we first check all the candidate nodes that can improve the data accuracy. Second, we select the node with the maximum data accuracy increment as the candidate (Line 4). Finally, the selected node is added into $P'_j$ to provide service $s_j$ (Line 6). This process continues until enough nodes are selected for all services.
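The selection loop described above can be sketched in Python as follows. This is a simplified illustration rather than the authors' Algorithm 1: accuracy is assumed to be additive, as in the Figure 3 example, ties are broken in favor of nodes already selected for another service and then by node name, and every name and every numeric value other than 0.5, 0.4, and the requirement 0.8 for $s_1$ is hypothetical.

```python
EPS = 1e-9  # tolerance for floating-point comparisons


def capped_increment(current, gain, required):
    """Accuracy increment, ignoring any part that exceeds the requirement."""
    if current >= required - EPS or gain <= 0.0:
        return 0.0
    return min(required, current + gain) - current


def ssa(increment, requirement):
    """Select nodes service by service; returns {service: [nodes in pick order]}."""
    selected = {s: [] for s in requirement}
    for s, required in requirement.items():
        accuracy = 0.0
        candidates = sorted(p for p in increment if increment[p].get(s, 0.0) > 0.0)
        while accuracy < required - EPS and candidates:
            used_elsewhere = {p for picked in selected.values() for p in picked}
            # Largest capped increment first; on ties prefer already-selected nodes.
            best = max(
                candidates,
                key=lambda p: (
                    capped_increment(accuracy, increment[p][s], required),
                    p in used_elsewhere,
                ),
            )
            selected[s].append(best)
            candidates.remove(best)
            accuracy += increment[best][s]  # additive model (assumption)
    return selected


increment = {
    "p1": {"s1": 0.5, "s2": 0.0},
    "p2": {"s1": 0.4, "s2": 0.4},
    "p3": {"s1": 0.4, "s2": 0.4},
    "p4": {"s1": 0.0, "s2": 0.5},
}
requirement = {"s1": 0.8, "s2": 0.8}
assignment = ssa(increment, requirement)
ssa_nodes = set().union(*assignment.values())
```

With these values the sketch picks $p_1$ and $p_2$ for $s_1$, then $p_4$ and the already-selected $p_2$ for $s_2$, so three distinct nodes are used.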
We illustrate the SSA algorithm by an example given in Figure 3. As we can see from Figure 3(a), the network has four nodes and each service is supported by three distinct nodes. The required data accuracy for each service is listed on the right side; for example, the required data accuracy of $s_1$ is 0.8, and the data accuracy increment of each node when a service is provided only by this node is listed on the left side; for example, the data accuracy increment of $p_1$ is 0.5 for $s_1$ and 0.0 for $s_2$. To simplify the description, we assume that the data accuracy increments of the nodes can be added directly; that is, the data accuracy of $s_1$ provided by $p_1$ and $p_2$ is 0.5 + 0.4, which equals 0.9. The algorithm selects nodes for $s_1$ and then for $s_2$. Node $p_1$, which has the maximum data accuracy increment for $s_1$, is selected first. Once the algorithm has selected $p_1$, either $p_2$ or $p_3$ can guarantee the required data accuracy of $s_1$, and we assume that the randomly selected one is $p_2$. Similarly, service $s_2$ first selects $p_4$. Once the algorithm has selected $p_4$, either $p_2$ or $p_3$ can guarantee the required data accuracy of $s_2$. However, $p_2$ is preferred for $s_2$ because it has already been selected for $s_1$. After that, the selected nodes guarantee the required data accuracy of $s_1$ and $s_2$, and the algorithm stops. As we can see from the final solution shown in Figure 3(b), the SSA algorithm selects three nodes to provide $s_1$ and $s_2$, saving one node.

Table 1: Description of the symbols.

Symbol    Description
n         The number of nodes
m         The number of services
G         The node-service bipartite graph
P         The set of nodes
S         The set of services
E         The set of edges between P and S
G'        The node-service assignment bipartite graph
P'        The set of nodes that are selected to provide services
E'        The set of edges between P' and S

Combined Selection Algorithm (CSA).
The previously proposed SSA selects nodes for each service separately based on the criterion of data accuracy increment. However, some nodes can provide several services simultaneously in wireless sensor networks, and this multiservice property can improve node selection if we select one multiservice node to improve the data accuracy required by different services. As we can see from Line 4 in Algorithm 1, SSA prefers a candidate node that has already been chosen to provide some other service, which means that nodes with the multiservice property are favored during the node selection process. However, this separate selection process is not always efficient. For example, the sample network given in Figure 3 admits a better solution than the one found by SSA. If we do not select $p_1$ and $p_4$, which have the maximum data accuracy increment for one particular service, but instead select $p_2$ and $p_3$, which can provide services $s_1$ and $s_2$ simultaneously, this solution still guarantees the required data accuracy of $s_1$ and $s_2$. Moreover, this solution has fewer nodes than the SSA solution because it selects only two nodes, saving two nodes. In this section, we introduce a new Combined Selection Algorithm (CSA) that exploits the multiservice property for the node selection problem in service-oriented wireless sensor networks.
In this paper, we aim at minimizing the number of selected nodes while guaranteeing the data accuracy of all services in the network. Two important factors influence the number of selected nodes: the number of services and the service quality of each node. Intuitively, the number of selected nodes can be reduced when nodes provide more kinds of services, since more nodes are then potential candidates during the selection process. However, nodes might have poor data accuracy when they are far away from the sensing source, even though they can provide the required services. This means that we must consider the data accuracy together with the number of services during the node selection process.
The basic idea of the CSA is described as follows. Initially, no nodes are selected and the final solution is an empty set. Then, nodes are chosen and added to the final solution one at a time. In each step, we select the node with the maximal contribution increment to all services in the network (discussed in detail later in this section). This process continues until the data accuracy requirement of every service is satisfied, and finally we obtain the selected nodes as well as the services provided by each node.
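The basic loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CSA as specified in the paper: increments are taken to be additive, and the contribution score used here (the sum, over unsatisfied services, of the increment capped at the remaining shortfall) is a hypothetical stand-in for the paper's contribution increment formula. The name `csa_select` and the data values are also illustrative.

```python
def csa_select(increments, required, eps=1e-9):
    """Greedy combined selection (sketch).

    increments[service][node] is an additive accuracy increment
    (simplifying assumption). Returns (selected, assignment) or None
    when the requirements cannot be met.
    """
    services = list(required)
    accuracy = {s: 0.0 for s in services}
    assignment = {s: [] for s in services}
    selected = []
    nodes = sorted({n for per_node in increments.values() for n in per_node})

    def contribution(node):
        # Stand-in score: useful increment summed over unsatisfied services.
        total = 0.0
        for s in services:
            shortfall = required[s] - accuracy[s]
            if shortfall > eps:
                total += min(increments[s].get(node, 0.0), shortfall)
        return total

    while any(accuracy[s] < required[s] - eps for s in services):
        candidates = [n for n in nodes if n not in selected]
        best = max(candidates, key=contribution, default=None)
        if best is None or contribution(best) <= eps:
            return None  # no candidate improves any unsatisfied service
        selected.append(best)
        # The chosen node provides every service it helps to improve.
        for s in services:
            inc = increments[s].get(best, 0.0)
            if inc > 0:
                assignment[s].append(best)
                accuracy[s] += inc
    return selected, assignment


# Hypothetical increments in the spirit of the Figure 3 example.
INCREMENTS = {
    "s1": {"p1": 0.5, "p2": 0.4, "p3": 0.4},
    "s2": {"p2": 0.4, "p3": 0.4, "p4": 0.5},
}
REQUIRED = {"s1": 0.8, "s2": 0.8}

selected, assignment = csa_select(INCREMENTS, REQUIRED)
print(selected)  # ['p2', 'p3']
```

With these assumed values the two multiservice nodes are chosen first and already satisfy both services, so only two nodes are selected.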
Assume that node $p_i$ is considered to provide service $s_j$ in the current selection bipartite graph. In case that the data accuracy increment of $p_i$ for $s_j$ is 0, there is no benefit for node $p_i$ to provide $s_j$, and the edge $(p_i, s_j)$ is not included in the selection graph. In case that the data accuracy increment is larger than 0, node $p_i$ helps to improve the data accuracy of service $s_j$, and the edge $(p_i, s_j)$ is included in the selection graph. In this way, for a given $p_i$, we can calculate the data accuracy increment for each $s_j$. Generally, we prefer the node that contributes more to the data accuracy.
Here we introduce a coefficient, with a value in (0, 1] whenever the data accuracy increment is larger than 0, to capture the impact of the current contribution increment on the final data accuracy. The coefficient is 0 in case that the data accuracy increment is 0, which shows that node $p_i$ has no contribution to the data accuracy of service $s_j$. Here we adopt the power exponential function to indicate how close the data accuracy is to the required value. The contribution increment of node $p_i$, given the current selection bipartite graph and assuming that $p_i$ is selected to provide services, is formulated in terms of this coefficient and the data accuracy increment for service $s_j$ in case that node $p_i$ is selected to provide service $s_j$, which is the same as that in Section 5.1.

Algorithm 2
Input: Node-service bipartite graph $G = (P, S, E)$ and the data accuracy requirement of each of the $m$ services.
Output: Node-service selection bipartite graph.
1: Initialize the set of selected nodes, the set of selected edges, and the set of nodes selected for each service $s_j$, $j = 1, \ldots, m$, as empty;
2: while true
3: If every service $s_j \in S$ satisfies its data accuracy requirement, the algorithm stops with a solution found;
4: Calculate the data accuracy increment and the contribution increment for each node $p_i \in P$ that is not yet selected, and choose the node with the maximum contribution increment, which must be larger than 0, as the candidate node;
5: If no such candidate is found, the algorithm stops with no solution found;
6: for each service $s_j$ with $(p_i, s_j) \in E$ and a data accuracy increment larger than 0
So far we have introduced the basic node selection process of the CSA. However, the algorithm can be further optimized. When the algorithm selects the node with the maximum contribution increment, this node provides every service whose data accuracy it helps to improve. However, a multiservice node with the maximum contribution increment might have a poor data accuracy increment for some services. Although such a node still improves the data accuracy, the increment is so small that more nodes must be selected to guarantee the required data accuracy. Therefore, we can further reduce the number of nodes by removing, for some services, already selected nodes that have a poor data accuracy increment.

Let us consider an example in which two services are supported by two nodes: $p_1$ can support $s_1$ and $s_2$, $p_2$ can support $s_2$, and the required data accuracy for both $s_1$ and $s_2$ is 0.8. Suppose that the data accuracy of $s_1$ provided by $p_1$ is 0.8, and that the data accuracy of $s_2$ is 0.3 when provided by $p_1$, 0.8 when provided by $p_2$, and 0.7 when provided by both $p_1$ and $p_2$. In the first selection, the contribution increment of $p_1$ is larger than that of $p_2$ according to formula (15), so $p_1$ is selected and provides services $s_1$ and $s_2$. In the next selection, $p_2$ is selected and provides service $s_2$. However, the data accuracy of $s_2$ provided by $p_1$ and $p_2$ together is less than that provided by $p_2$ alone. It is clear that we can improve the data accuracy of service $s_2$ if $p_1$ does not provide $s_2$.

Note that the "bad" assignments (i.e., assigning nodes to provide services for which they have a poor data accuracy increment) cannot be eliminated during the selection process, because at the time they are made they still help to improve the data accuracy. After selecting a new node, we can check all the selected nodes to find the "bad" assignments that were included in previous selections. The basic idea of the optimization process is as follows. In case that some service, for example, $s_j$, has been assigned a new node, we first calculate the data accuracy â of $s_j$ with each of its selected nodes $p_i$ removed in turn, and choose the node $p_i$ whose removal yields the maximum data accuracy, provided that this value is not less than the data accuracy obtained with all currently selected nodes; after that, we let $p_i$ stop providing $s_j$. This process continues until no more such nodes can be found among the nodes selected for $s_j$.

The pseudocode for CSA is listed in Algorithm 2. In Line 1, the set of selected nodes for each service $s_j$ is initialized as the empty set. In Lines 2-15, nodes are selected sequentially until the data accuracy is guaranteed for all services. In case that the current data accuracy of some service does not satisfy its requirement, we first calculate the contribution increment of every candidate node and select the node with the maximum contribution increment as the candidate (Line 4). Second, the candidate node is assigned to provide each service $s_j$ whose data accuracy it helps to improve (Line 7). Finally, we check all the nodes selected for $s_j$ and remove those whose removal does not decrease the data accuracy of $s_j$; this subprocess continues until no more such nodes can be found (Lines 8-12). The node selection process continues until enough nodes are selected for all services.
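The optimization step can be sketched in Python using the two-node example above. This is a minimal illustration: the function name `prune_providers` is hypothetical, the data accuracy is supplied as a lookup table holding the three values quoted in the example rather than computed from the spatial correlation model, and the guard that keeps at least one provider per service is an added assumption.

```python
def prune_providers(providers, accuracy):
    """Drop 'bad' assignments for one service (sketch).

    providers: nodes currently assigned to the service.
    accuracy:  callable mapping a frozenset of nodes to the data accuracy
               of the service when exactly those nodes provide it.
    """
    providers = set(providers)
    # Assumption: never leave the service without any provider.
    while len(providers) > 1:
        current = accuracy(frozenset(providers))
        # Accuracy obtained when each provider is removed in turn.
        trials = [(accuracy(frozenset(providers - {node})), node)
                  for node in sorted(providers)]
        best_value, best_node = max(trials, key=lambda trial: trial[0])
        if best_value < current:
            break  # every removal would decrease the accuracy
        providers.remove(best_node)
    return providers


# Data accuracy of s2 for each provider set, taken from the example.
ACCURACY_S2 = {
    frozenset({"p1"}): 0.3,
    frozenset({"p2"}): 0.8,
    frozenset({"p1", "p2"}): 0.7,
}

remaining = prune_providers({"p1", "p2"}, ACCURACY_S2.__getitem__)
print(remaining)  # {'p2'}
```

Removing $p_1$ raises the data accuracy of $s_2$ from 0.7 to 0.8, so the assignment of $p_1$ to $s_2$ is dropped while $p_1$ can continue to provide $s_1$.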
We illustrate the execution of the CSA during one round of the iteration process with the example given in Figure 4. In this example two services are supported by four nodes. The node-service bipartite graph $G = (P, S, E)$ is given in Figure 4(a), and the current node-service selection bipartite graph is given in Figure 4(b). As we can see from Figure 4(b), the algorithm has already selected $p_2$, so the available nodes are $p_1$, $p_3$, and $p_4$. Suppose that the contribution increment of each available node has been calculated and that the values for $p_1$, $p_3$, and $p_4$ are 0.1, 0.3, and 0.2, respectively. The relationship between available nodes and services is given in Figure 4(c), where an edge between $p_i$ and $s_j$ indicates that the data accuracy increment of $p_i$ for $s_j$ is larger than 0; that is, $p_i$ helps to improve the data accuracy of $s_j$. In the next step, the algorithm selects the node with the maximum contribution increment, that is, $p_3$ in this example. According to Figure 4(c), $p_3$ helps to improve the data accuracy of $s_1$ and $s_2$. Then $p_3$ provides $s_1$ and $s_2$, and the new solution is given in Figure 4(d). After a new node is added to the solution, the algorithm executes the optimization process. We assume that there is one "bad" assignment $(p_2, s_2)$; that is, the data accuracy of $s_2$ provided by $p_3$ alone is not less than that provided by $p_2$ and $p_3$ together. Then the algorithm removes the assignment $(p_2, s_2)$ from the solution. The final solution of this round of the iteration process is shown in Figure 4(e).

Complexity Analysis
Lemma 1. The time complexity of SSA is $O(mn^2)$.
Proof. In the outer for loop, the algorithm selects nodes for each service $s_j \in S$, and each execution of the outer for loop contains a while loop. In the while loop, the algorithm checks each node $p_i$ with $(p_i, s_j) \in E$ that has not yet been selected for $s_j$ and selects the node with the maximum data accuracy increment. The while loop continues until the data accuracy requirement of $s_j$ is satisfied. Because at most $n$ nodes can provide $s_j$ and each execution selects one node, the while loop takes $O(n^2)$ time. Since the outer for loop runs once for each of the $m$ services, the time complexity of SSA is $O(mn^2)$.
Lemma 2. The time complexity of CSA is $O(mn^3)$.
Proof. In each execution of the outer while loop, the algorithm first checks each node $p_i \in P$ that is not yet selected and calculates its data accuracy increment for each service and its contribution increment. Because each node can provide at most $m$ services and each execution selects the one node with the maximum contribution increment, this process takes at most $O(mn)$ time. In the next step, the for loop assigns the selected node to each service whose data accuracy it helps to improve, and there are at most $m$ such services. In each execution of the inner while loop, the algorithm checks each node selected for $s_j$ and tries to find one node that can stop providing $s_j$ without decreasing the data accuracy of $s_j$. The inner while loop continues until no more such nodes can be found. Because at most $n$ nodes are selected for each service and each execution of the inner while loop removes one node, the inner while loop takes at most $O(n^2)$ time and the for loop takes $O(mn^2)$ time. Therefore, each execution of the outer while loop takes at most $O(mn + mn^2) = O(mn^2)$ time. Because at most $n$ nodes can be selected, the time complexity of CSA is $O(mn^3)$.

Simulation Results and Analysis
In order to evaluate the actual behavior of the above algorithms, we rely on simulation experiments. In this section, we first describe how the simulation is built and then analyze the impact of the spatial correlation parameter $\theta$ and the SNR $\beta$ on the results. Finally, we compare the performance of SSA and CSA in different environments.
6.1. Simulation Setup. We use MATLAB, which is widely used for simulating wireless networks, as the simulation platform. The scenarios are built in a square area of 500 m × 500 m. The sensor nodes and the sensing sources are placed randomly. Here we assume that each sensing source is dedicated to one service. Given the sensor nodes and the sensing sources, the next step is to decide the set of services provided by each node. Here we adopt a random model that determines whether node $p_i$ can provide service $s_j$ with a given probability ratio (strictly between 0 and 1); that is, $p_i$ provides $s_j$ only in case that a random value drawn between 0 and 1 is smaller than the probability ratio, which is consistent with the later observation that a larger probability ratio lets nodes provide more services. Here we also assume that each service is provided by at least one node; otherwise, the scenario is rebuilt until this constraint is satisfied. The data accuracy requirement â is assumed to be identical for all services. In this work, we build 100 different scenarios and compare the average performance of the proposed algorithms.
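The scenario construction can be sketched as follows. The paper's simulations use MATLAB; this Python version is only an illustration of the setup, and the names `build_scenario`, `prob`, and `side` are hypothetical. It assumes that a node provides a service with probability `prob` (the uniform draw falls below `prob`), which matches the later observation that a larger probability ratio means more services per node.

```python
import random


def build_scenario(num_nodes, num_services, prob, side=500.0, seed=None):
    """Build one random scenario (sketch).

    Returns node positions, sensing-source positions (one source per
    service), and the set of (node, service) edges.
    """
    rng = random.Random(seed)
    while True:
        nodes = [(rng.uniform(0, side), rng.uniform(0, side))
                 for _ in range(num_nodes)]
        sources = [(rng.uniform(0, side), rng.uniform(0, side))
                   for _ in range(num_services)]
        # Node i can provide service j with probability `prob` (assumption).
        edges = {(i, j)
                 for i in range(num_nodes)
                 for j in range(num_services)
                 if rng.random() < prob}
        # Rebuild the scenario unless every service has at least one provider.
        if all(any((i, j) in edges for i in range(num_nodes))
               for j in range(num_services)):
            return nodes, sources, edges


nodes, sources, edges = build_scenario(300, 10, 0.5, seed=1)
print(len(nodes), len(sources))  # 300 10
```

Averaging over 100 scenarios, as in the paper, would amount to calling `build_scenario` with 100 different seeds and running the selection algorithms on each result.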

Impact of Spatial Correlation Parameters 𝜃 and SNR 𝛽.
In this part, we analyze the impact of the spatial correlation parameter $\theta$ and the SNR $\beta$ on the performance of SSA and CSA. Both are parameters of the spatial correlation model introduced in Section 3.2. The spatial correlation parameter $\theta$ describes how the correlation of the sensed data depends on the distances among sensor nodes. As we can see from formula (2), a larger $\theta$ indicates a higher degree of spatial correlation; that is, the nodes in the network provide strongly correlated services. The SNR $\beta$ reflects the noise strength, which affects the distortion of a service. A larger $\beta$ results in less distorted sensed data; that is, the services provided by the nodes are more accurate. Both $\theta$ and $\beta$ therefore affect the sensed data, which in turn influences the selection results.

The first set of experiments is concerned with the impact of the spatial correlation parameter $\theta$ on the number of selected nodes. The simulation is done with 300 nodes and 10 services; the SNR $\beta$ is set to 10 dB and the probability ratio to 0.5. The spatial correlation parameter $\theta$ takes the values 500, 1000, 2000, and 5000, and we study how the average number of selected nodes changes as the services' accuracy requirement â increases from 0.7 to 0.97. As we can see from Figure 5, the number of selected nodes is smallest when â = 0.7, and it increases with the accuracy requirement â. However, the increase is not significant until â reaches a certain point. For example, with $\theta$ = 5000 and the CSA, the average number of selected nodes lies between 1.79 and 4.34 when 0.7 ≤ â ≤ 0.9; however, it increases rapidly when â > 0.9. Moreover, there might not be enough nodes to support the required data accuracy; for example, the maximum data accuracy requirement that can be supported is about 0.87 in case that $\theta$ = 500.
The second set of experiments is concerned with the impact of the signal-to-noise ratio $\beta$ on the number of selected nodes, which is illustrated in Figure 6. The simulation is done with 300 nodes and 10 services; the spatial correlation parameter $\theta$ is set to 2000 and the probability ratio to 0.5. The SNR $\beta$ takes the values 5 dB, 10 dB, 15 dB, and 20 dB, and we study how the average number of selected nodes changes as the services' accuracy requirement â increases from 0.7 to 0.96. Again, the number of selected nodes remains stable or varies linearly when â is small; however, it increases rapidly once â exceeds a certain point. This conclusion is similar to that of Figure 5. The energy budget is an important criterion for wireless sensor networks, and the network performance worsens if too many nodes are involved in the data sensing process. Choosing an accuracy requirement that is a suitable compromise for a given application scenario therefore helps to reduce the energy consumption.

Performance Comparison between SSA and CSA.
To the best of our knowledge, this is the first work on node selection algorithms with data accuracy guarantee for service-oriented wireless sensor networks. Most of the related works [28][29][30][31] focus on different research issues, such as target tracking and topology control. Wang et al. [13] proposed a scheduling algorithm for service-oriented wireless sensor networks, but it did not consider the data accuracy. In this section, we compare the performance of SSA and CSA in different scenarios, varying the accuracy requirement, the number of nodes $n$, the number of services $m$, and the probability ratio, respectively. Figure 7 shows the number of nodes selected by SSA and CSA when the accuracy requirement â varies from 0.7 to 0.95. The simulation is done with 300 nodes and 10 services; the spatial correlation parameter $\theta$ is set to 2000, the SNR $\beta$ to 10 dB, and the probability ratio to 0.5. The experimental results show that CSA performs better than SSA in all situations.
The second set of simulations shows the impact of the network size on the number of selected nodes. The default setting is 300 nodes and 10 services, with the spatial correlation parameter $\theta$ set to 2000, the data accuracy requirement â to 0.92, the SNR $\beta$ to 10 dB, and the probability ratio to 0.5; in this set, the network size varies from 100 to 500 nodes. As we can see from Figure 8, the CSA performs better than SSA in all cases. Furthermore, we have two observations from Figure 8. (1) The average number of selected nodes is relatively smaller when the network size is larger. This is because there are more potential candidates for a given service as the network size increases, which helps to reduce the number of selected nodes. (2) The number of selected nodes decreases only slightly once the network size reaches a certain point, which implies that adding more nodes to the network does little to further reduce the number of selected nodes.

The third set of simulations focuses on the impact of the number of services in the network on the number of selected nodes. We use parameters similar to those of the second set of simulations. As we can see from Figure 9, CSA performs better than SSA for different values of $m$, although the difference is not significant when $m$ is close to 5.
The fourth set of simulations focuses on the impact of the probability ratio on the number of selected nodes, varying it from 0.1 to 0.9. We again use parameters similar to those of the second set of simulations. In fact, the probability ratio indirectly represents the number of services provided by the nodes in the network. As we can see from Figure 10, the numbers of nodes selected by SSA and CSA are rather close when the probability ratio is small enough. In particular, SSA is even slightly better than CSA when the probability ratio is 0.1. However, the CSA shows better performance as the probability ratio increases. Meanwhile, we can also draw two conclusions from this set of experiments. (1) The average number of selected nodes decreases as the probability ratio increases. A larger probability ratio results in more services that can be provided by the selected nodes. Thus, each node can contribute more to the required services, which in turn reduces the total number of selected nodes. (2) When the probability ratio is large enough, for example, larger than 0.5, the average number of selected nodes decreases only slowly as the probability ratio increases further.

Conclusion
Providing various services is an important trend for future wireless sensor networks, and the service-oriented architecture allows different services to be supported simultaneously in the same physical area, where one sensor can provide different kinds of services. Quality of service, such as data accuracy, is one of the key criteria for applications because the sensed data is generally a noisy version of the physical phenomenon. The spatial correlation among the sensed data makes it possible to select a subset of nodes to provide the required services while the data accuracy is guaranteed, which helps to improve the performance of wireless sensor networks. In this paper, we have addressed this issue and formulated the node selection problem as an Integer Nonlinear Programming (INLP) problem. We have also developed two heuristic algorithms for the problem, namely, the Separate Selection Algorithm (SSA) and the Combined Selection Algorithm (CSA). In future work, we will develop efficient scheduling schemes for the node selection process, aiming at a solution that maximizes the network lifetime of service-oriented wireless sensor networks. Temporal correlation is also important for optimizing network performance, and we plan to explore energy-efficient scheduling schemes that consider both spatial and temporal correlation.

Figure 1: An example of a service-oriented wireless sensor network in which service $s_1$ is supported by nodes $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$, and service $s_2$ by nodes $p_4$, $p_5$, $p_6$, $p_7$, and $p_8$.

Figure 2: An example of the bipartite graph in which each service is supported by three distinct nodes.

Figure 4: An example of the execution of the CSA during one round of the iteration process. (a) The node-service bipartite graph $G = (P, S, E)$. (b) The current node-service selection bipartite graph. (c) The relationship between available nodes and services, where an edge between $p_i$ and $s_j$ indicates that $p_i$ helps to improve the data accuracy of $s_j$. (d) Select $p_3$ to provide $s_1$ and $s_2$. (e) The final node-service selection bipartite graph of this round of the iteration process.

Figure 7: Comparison of SSA and CSA with different data accuracy requirements.

Figure 8: Comparison of SSA and CSA with different network sizes.

Figure 10: Comparison of SSA and CSA for different values of the probability ratio.
In this section, we present an Integer Nonlinear Programming (INLP) formulation for the node selection problem. Integer programming is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. INLP is a special case of integer programming in which some of the constraints or the objective function are nonlinear. INLP is considered an efficient technique for solving optimization problems with nonlinear constraints, so it is feasible to express the node selection problem as an INLP. This paper is concerned with the node selection problem, and the objective is to minimize the total number of selected nodes under a nonlinear data accuracy constraint. We use the following sets of binary integer (0 or 1) variables and constraints in the INLP formulation.
(1) A first set of variables for each node $p_i \in P$ and service $s_j \in S$. The variable is 1 if and only if $(p_i, s_j) \in E$; that is, node $p_i$ can provide service $s_j$.
(2) A second set of variables for each node $p_i \in P$ and service $s_j \in S$. The variable is 1 if and only if node $p_i$ is assigned to provide service $s_j$.
Each node $p_i$ is assigned to provide service $s_j$ if the corresponding variable of the second set is 1. Note that the variable of the second set equals 0 in case that the variable of the first set is 0, which means that node $p_i$ cannot be assigned to provide service $s_j$ since it is not supported. In this way, we have the following constraint: for all $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, the variable of the first set minus the variable of the second set is at least 0.
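The support constraint and the objective can be checked on a small instance as follows. This is a minimal sketch: the matrix names `can_provide` and `assigned` stand in for the two sets of binary variables, whose original symbols are not used here, and the example matrices are hypothetical.

```python
def unsupported_assignments(can_provide, assigned):
    """Return the (node, service) pairs that violate the support constraint,
    i.e. pairs where can_provide[i][j] - assigned[i][j] < 0."""
    return [(i, j)
            for i, row in enumerate(assigned)
            for j, value in enumerate(row)
            if can_provide[i][j] - value < 0]


def count_selected_nodes(assigned):
    """Objective value: the number of nodes assigned to at least one service."""
    return sum(1 for row in assigned if any(row))


# Hypothetical instance with three nodes (rows) and two services (columns).
CAN_PROVIDE = [[1, 0], [1, 1], [0, 1]]
FEASIBLE = [[1, 0], [0, 1], [0, 0]]
INFEASIBLE = [[1, 1], [0, 1], [0, 0]]

print(unsupported_assignments(CAN_PROVIDE, FEASIBLE))    # []
print(unsupported_assignments(CAN_PROVIDE, INFEASIBLE))  # [(0, 1)]
print(count_selected_nodes(FEASIBLE))                    # 2
```

The nonlinear data accuracy constraint is not modeled in this sketch; it only covers the linear support constraint and the node count being minimized.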