Semisupervised Location Awareness in Wireless Sensor Networks Using Laplacian Support Vector Regression

Supervised machine learning has been widely used in context-aware wireless sensor networks (WSNs) to discover context descriptions from sensor data. However, collecting a lot of labeled training data in order to guarantee good performance requires much cost and time. For this reason, the semisupervised learning has been recently developed due to its superior performance despite using only a small amount of the labeled data. In this paper, we extend the standard support vector regression (SVR) to the semisupervised SVR by employing manifold regularization, which we call Laplacian SVR (LapSVR). The LapSVR is compared with the standard SVR and the semisupervised least square algorithm that is another recently developed semisupervised regression algorithm. The algorithms are evaluated for location awareness of multiple mobile robots in a WSN. The experimental results show that the proposed algorithm yields more accurate location estimates than the other algorithms.


Introduction
Location awareness is one of the most important issues for context-aware wireless sensor networks (WSNs) because many applications need to know where the data has been obtained.For example, in applications such as target tracking [1][2][3][4][5], network routing [6,7], human-centric service [8][9][10], and monitoring system [11,12], the measurement data is useless without location information of sensing device.
Machine learning methods have been adopted to learn location context from law sensor data.The standard SVR learning [13,14], manifold learning [15], and neural network learning [16,17] can be employed to obtain precise estimate of the location.However, all these methods require a large amount of the labeled training data for high accuracy.Also, a significant effort such as cost, time, and human skill is needed in order to generate a large set of the labeled data.To make location-aware algorithm viable with a negligible effort, this paper adopts a semisupervised learning which uses a small number of the labeled data.
Based on the semisupervised learning framework suggested in [18], various algorithms such as [19][20][21] have been developed by defining different loss functions and regularization functions.In this work, we develop Laplacian support vector regression (LapSVR) which extends the standard SVR to the semisupervised learning framework.This approach uses the standard SVR to define the loss function and kernelbased regularization.However, for utilization of unlabeled data, the manifold regularization based on the concept of the graph Laplacian [22] is performed.
We evaluate the LapSVR to estimate locations of multiple mobile robots in the WSN.The deployed anchor nodes broadcast RF signal whose signal strength (RSSI) is measured by the mobile robots.The mobile robots independently localize themselves by using the trained regression model and current RSSI measurements.The LapSVR is compared with the standard SVR and the semisupervised Laplacian least square algorithm [23].The location awareness performance is compared according to the number of labeled data, unlabeled data, and anchor nodes.From the experimental results, we confirm that the LapSVR estimates the location most accurately and is least sensitive to the variation of the number of the anchor nodes.
This paper is organized as follows.In Section 2, we provide details on Laplacian support vector regression algorithm.Section 3 describes the estimation process of mobile robots using the proposed algorithm.Section 4 reports empirical results with the description of the experimental setting.Finally, concluding remarks are given in Section 5.

2
International Journal of Distributed Sensor Networks

Laplacian Support Vector Regression
This section describes a semisupervised regression approach named Laplacian support vector regression (LapSVR).We use the standard SVR algorithm to define the loss function and kernel-based regularization.Also, the graph Laplacian is employed for manifold regularization.We describe these ingredients of the formulation in the following.
2.1.Semisupervised Learning Framework.We define a set of  labeled samples {  ,   }  =1 and a set of  unlabeled samples {  } + =+1 , where   ∈  ⊆ R  and   ∈ R. Semisupervised learning aims to establish a mapping  :  → R, by the following regularized minimization functional: where  is a loss function, ‖‖ 2  is the norm of the function in the reproducing kernel Hilbert space (RKHS), and ‖‖ 2  is the norm of the function in the low-dimensional manifold.  and   are the regularization weight parameters.

Manifold Regularization.
According to the manifold regularization (or manifold assumption), the data points are samples from a low-dimensional manifold which is embedded in a high-dimensional space.This is represented by the graph Laplacian and the kernel function: where  is the graph Laplacian given by  =  − , f = [( 1 ), . . ., ( + )]  ,  is the adjacency matrix of the data graph, and  is the diagonal matrix given by   = ∑ + =1   .In this paper, the edge weights   are defined as Gaussian function of Euclidean distance, given by where  2  is the kernel width parameter.Minimizing ‖‖ 2  is equivalent to penalizing the rapid changes of the regression function evaluated between two samples.

Loss Function for Regression.
We employ the -insensitive loss function: Also, slack variables   ,  *  are used with the -insensitive loss function in order to cope with infeasible constraints of the optimization problem that uses only the  constraint, described in [24].Figure 1 depicts an example of onedimensional linear regression function with -tube.The samples outside the -tube are called support vectors.Slack variables ,  * , which establish trade-off between model complexity and the support vectors, are determined by solving the optimization problem in the following section.

Complete Formulation.
In this subsection, the complete formulation of the LapSVR is given based on the ingredients described in Sections 2.1-2.4.Substituting (2)-( 6) into (1) with the slack variables   ,  *  , the primal problem is defined as min After the introduction of four sets of  multipliers ,  * , , and  * , the Lagrangian  can be obtained in the following: International Journal of Distributed Sensor Networks 3 In order to convert the primal problem to the dual representation, we take derivatives Using the above equations, we can rewrite the Lagrangian as a function of , , and  * only, given by where B = [ 1 −  * 1 , . . .,   −  *  ]  and   = [ × 0 × ] where  × is the  ×  identity matrix and 0 × is the  ×  zero matrix.
Taking derivatives with respect to , we obtain From (11), a direct relationship between  and B is obtained: where  is the ( + ) × ( + ) identity matrix.Substituting (12) back into Lagrangian (10), we arrive at the convex optimization problem: max subject to: The optimal solution B * of the above quadratic problem is linked to the optimal  * in (12).Then, the optimal regression function  * () can be obtained by substituting 2.6.Computation of .Now, the remaining work to establish regression function ( 2) is computation of the bias term .As described in [24],  is calculated by using Karush-Kuhn-Tucker (KKT) conditions which state that the product between dual variables and constraints has to be zero at the point of the solution.From (8), we obtain for any labeled data; that is,  = {1, . . ., }.Equations ( 14) lead to three conclusions.First, only the samples (  ,   ) with corresponding  ( * )  = 1/ lie outside the -tube.Second,  *    = 0; that is, a set of dual variables   ,  *  is never simultaneously nonzero.Finally, for   ∈ (0, 1/), we have and for  *  ∈ (0, 1/) we have Hence, we can compute  using any data point satisfying   ∈ (0, 1/) or  *  ∈ (0, 1/), given by In reality, the value of  can be slightly different depending on which data point is used in (17).Averaging over all  obtained from support vectors may improve robustness.

RSSI-Based Location Awareness
In RSSI-based location awareness, mobile nodes of unknown locations collect the RSSI measurements emitted by  anchor nodes.We assume that the anchor nodes are stationary with known location in  ∈  2 where  is the area of interest.Each mobile node (i.e., mobile robot in our setting) estimates its own 2D location using the RSSI measurements.
Here, we introduce notations of the labeled and the unlabeled training data sets.Because each robot estimates its own location independent of the others, we will describe the algorithm of one mobile node.In the training phase, we put the mobile node at the specific location   = [  ,   ]  and obtain the RSSI values   = [ 1 , . . .,   , . . .,   ]  ∈   , where   and   are the and -coordinate of the mobile node, and   is the RSSI value from the th anchor node for  = 1, . . ., .The total  labeled training data sets to learn the   and   are defined as L   = {  ,   }  =1 and L   = {  ,   }  =1 , respectively.On the other hand,  unlabeled data consists of only the RSSI measurements, given by U = {  } + =1 .In the test phase, with the trained semisupervised SVR model, the mobile node determines its own location ŝ() = [ŝ  (), ŝ ()]  using the RSSI set () ∈   measured at time .
We summarize the detail of the semisupervised SVR for RSSI-based location awareness Algorithms 1 and 2.

Algorithm 1 (training phase). Consider the following.
Step 1. Collect the training data set L  , L  , and U.
Step 4. Choose the regularization weights   and   in (1).
Step 5. Solve the QP problem twice in order to obtain  *   and  *   in (13).
Step 1.At time , the mobile node collects the RSSI measurements () from the  anchor nodes. Step

Experimental Results
In the experimental setup, the total of thirteen anchor nodes using IEEE 802.15.4 and TinyOS is deployed.The nodes use commercial CC2420 radio chip which operates in the 2.4 GHz ISM band with a data rate of 256 kbps and provides a received signal strength indicator (RSSI) for each received packet.Two mobile robots, which are equipped with one receiver node, respectively, estimate their own position using the RSSI measurements obtained from the anchor nodes distributed in the 3 m × 4 m workspace, shown in Figure 2.For accuracy analysis, the true locations of the mobile robots are measured by the Vicon Motion System [26] composed of eight cameras that track reflective markers attached to the mobile robots.
For obtaining labeled training data, we collect the RSSI data sets of the mobile robot placed at the different locations, repeatedly.On the other hand, the unlabeled data sets are obtained as we let the mobile robots move automatically and collect the RSSI measurements.Figure 3 is an illustration of the obtained data set.Note that the real locations of the unlabeled data are not included in the unlabeled data set, although Figure 3 shows the locations of the unlabeled data.

Characteristics of RSSI.
Radio signal is easily influenced by environmental noise.Also, its measured value affects the performance of the location awareness.Here, we examine the characteristic of RSSI measurement.
We record RSSI measurements of one moving receiver, emitted by the fixed thirteen anchor nodes.The distance between the anchor nodes and receiver varies from 0.2 m to 3 m. Figure 4 shows the boxplot of the obtained measurements.The central red bar is the median and the edges of the box are the 25% and 75%.The upper and lower bars outside the box represent extreme data points.The results show highly nonlinear and noisy RSSI measurements, which supports the need for learning to obtain accurate estimates.Still, we can observe that similar RSSI values denote similar distance values, which suggests that RSSI values are meaningful data representing the actual phenomenon in a kernel space (i.e., RSSI measurements obtained from our experimental setup satisfy manifold assumption described in Section 2.3).Hence, the RSSI is suitable to be used for the LapSVR.We note that the LapSVR can be used with other sensors as long as they satisfy the manifold assumption.

Model and Parameter
Setting.We use as the kernel function in (2) the radial basis function (RBF) given by (  ,   ) = exp(−‖  −   ‖ 2 / 2 ), where  is the kernel width parameter.In (4), the graph Laplacian consisted of ( + ) samples using -nearest neighbors, and the edge weights   are defined as Gaussian function of Euclidean distance.In all experiments, we set  = 0.5 and  = 10.
On the other hand, the regularization parameters   ,   are obtained using cross-validation.We compute leaveone-out in the range [10 −5 , 10 −2 ]. Figure 5 shows the crossvalidation result using 80 labeled data and 100 unlabeled data.We set optimal   ,   minimizing the training error.

Performance.
The proposed algorithm is compared with two other algorithms.First, the standard SVR is conventionally employed for the location awareness problem as in [14,27].Second, Laplacian least square (LapLS) is a recently developed semisupervised regression algorithm and its application to the location estimation can be found in [23].LapLS solves optimization using graph Laplacian and loss function in the least square that does not have any constraint for the optimization.Therefore, an optimal solution can be obtained without a numerical process.
Figure 6 illustrates the true and estimated trajectories of the LapSVR, standard SVR, and LapLS algorithms using the obtained data set in Figure 3, where the LapSVR gives the best tracking performance.In the following, we show three kinds of performance criteria to compare the algorithms.
First, Figure 7 shows the average location estimation errors according to the number of the labeled data, which are calculated after two mobile robots move along their own trajectories.We confirm that the proposed algorithm gives the best performance over the other two algorithms.
Second, location estimation error of the LapSVR according to the number of the unlabeled data is described in Figure 8. From this result, we confirm that the unlabeled data gives a positive effect when the labeled data is insufficient (less than 50 labeled data in Figure 8).The excessive unlabeled data (more than 50 unlabeled data in Figure 8) does not improve the performance.
Third, we examine the sensitivity of the location estimation error with respect to the number of the anchor nodes.We prepare the set 2 consisting of all the deployed nodes.The set 1 is formed with the nodes whose numbers are from 1 to 9 depicted in Figure 2. The set 0 consists of the nodes whose numbers are 1, 3, 5, 7, and 9.In this experiment, we use 40 labeled data and 200 unlabeled data.As the result in Figure 9, the LapSVR algorithm causes the smallest growth of error as the number of the anchor nodes decreases in comparison with the other algorithms.

Conclusion
This paper proposes a semisupervised regression algorithm named Laplacian SVR by combining the core concepts of the standard SVR and a manifold regularization framework.The proposed algorithm learns the relation mapping function between the observation space and the physical space using a small number of the labeled data and a large number of the unlabeled data which can often be easily acquired more than the labeled data.The suggested algorithm is evaluated for location estimation of mobile robots in the wireless sensor network.In our multirobot system, the input of the learner is defined as the RSSI observation set and the output is the physical location.The experimental results show that the proposed algorithm gives the most precise performance and   is least sensitive to variation of the number of anchor nodes in comparison with the standard SVR and the Laplacian least square algorithm.

Figure 1 :
Figure 1: A behavior of the -intensive loss function with slack variable  [24].

Figure 2 :Figure 3 :
Figure 2: Experimental setting with two mobile robots and thirteen anchor nodes in the 3 m × 4 m area.

Figure 4 :Figure 5 :
Figure 4: Relationship between RSSI measurements and distance between fixed thirteen anchor nodes (transmitter) and one mobile node (receiver).

Figure 7 :
Figure 7: Performance of the three different algorithms according to the number of the labeled data.