Robust Indoor Sensor Localization Using Signatures of Received Signal Strength

Indoor localization based on the received signal strength (RSS) values of wireless sensors has recently received a lot of attention. However, due to interference from other wireless devices and human activities, the RSS value varies significantly over time. This hinders exact location prediction using RSS values. In this paper, we propose three methods to counter the adverse effect of RSS value variation on location prediction. First, we propose to use an index location to select the best radio map, among several preconstructed radio maps, for online location prediction. Second, for each observed signal strength value of a sensor, we record the distances from the sensor to the nearest and the farthest locations where that signal strength value has been observed. The minimal and maximal (min-max) distances for each signal strength value of a sensor are then used to reduce the search space in online location prediction. Third, a location-dependent received signal strength vector, called the RSS signature, is used to predict the location of a user. We have built a system, called the region-point system, based on the three proposed methods. The experimental results show that the region-point system achieves a lower mean position error than the existing methods RADAR, TREE, and CaDet. Furthermore, the index location method correctly selects the best radio map for online location prediction, and the min-max distance method improves the prediction accuracy of RADAR by restricting its search space in location prediction.


Introduction
Indoor localization is important for many real-life applications. For example, it gives the location context of a context-aware system that provides proper settings of the system based on the location, activity, and physiology of the user and the environmental context information [1]. Recently, indoor navigation applications, which require an exact indoor location, have become a very popular research area [2]. Due to the increasing need for indoor localization, many indoor localization techniques have been proposed. An indoor localization method can be categorized as a range-based or a range-free method [3]. While point-to-point distance information is required for a range-based method, it is not required for a range-free method. The techniques for estimating the distance between two communication nodes include the time of arrival (TOA) [4], the time difference of arrival (TDOA) [5], and the angle of arrival (AOA) [6]. The TOA technique uses the radio signal propagation time to estimate the distance. The TDOA technique utilizes two radio signals with different propagation speeds and estimates the distance between the two communication nodes by measuring the difference between the arrival times of the two signals. Unlike TOA and TDOA, the AOA technique measures the angle at which a signal arrives. It can be used to complement TDOA or TOA in location calculation [3]. Indoor localization methods that use range information usually achieve high accuracy in location estimation. For example, the Cricket [7] indoor localization system of MIT reported an error of 1 to 3 centimeters in position estimation. Despite being accurate in location prediction, the range-based localization techniques require large-scale deployment and costly devices. The range-free location prediction techniques have therefore received a lot of attention recently. The well-known range-free location prediction methods include RADAR [8] and the probability-based methods [9][10][11][12]. RADAR was developed by Microsoft.
In RADAR, for a predefined set of training locations, the received signal strength (RSS) values from several IEEE 802.11 access points are recorded in a database, called the radio map. To estimate the position of a user, the RSS values from the access points are collected at the location of the user. Afterwards, RADAR performs pattern matching of the collected RSS values against the RSS values in the radio map to find a fixed number of locations whose RSS values are most similar to those of the user. Finally, the positions with the most similar RSS values are averaged to give the estimated position of the user. The probability-based methods also use the RSS values for location prediction. However, instead of using a fixed number of locations for prediction, the probability-based methods use Bayes' theorem to predict the location of the user by finding the location where the collected RSS values of the user can be observed with the highest probability. In [13], the authors proposed to learn, at time $t_0$, a set of equations to fit the RSS values of a location using the RSS values of a set of reference points. With this method, the RSS value pattern of a specific location at a later time $t$ can be calculated by using the RSS value patterns of the reference locations at time $t$. Therefore, the effort to collect the RSS values at the offline training phase can be significantly reduced. However, in an environment where the RSS values observed at a location vary over time, the regression equations learned at time $t_0$ may not properly reflect the relationship between the RSS values of the location and those of the reference points. This may result in poor prediction accuracy. In [14], the authors proposed a method, called CaDet, which uses multiple decision trees for location prediction. They first divide the training dataset into several clusters and build a decision tree for each cluster.
To predict the user's location, the RSS values of the user are compared against the means of the RSS values of each cluster center to find the cluster with the least distance from the RSS values of the user for prediction. Finally, the decision tree of the selected cluster is used to predict the location of the user. Besides using the values of the received signal strength, in [15], the authors proposed to use the link quality indicator (LQI) values for location prediction. They modeled the location prediction problem as a classification problem and used a neural network model to solve the problem. However, their method is more suitable for finding a coarse position for a user, such as in the kitchen or in the living room.
The most difficult problem for the range-free methods in location prediction is that the offline constructed radio map may not be suitable for online location prediction. The variation of the received signal strength values may render the radio map outdated by the time an online location prediction is required. In this paper, we propose three methods to counter the adverse effect of the variation of the received signal strength values on location prediction. First, we propose to construct several radio maps over different nonoverlapping time intervals and use an index location to select the best radio map for online location prediction. Second, for an RSS value of a sensor observed in the location prediction area, we propose to record the minimal and the maximal (min-max) distances from the sensor to the locations where the same RSS value has been observed. The min-max distance information is used to reduce the number of locations that must be searched in online location prediction. Third, we propose to use a location-dependent received signal strength vector, called the RSS location signature, for pattern matching in online location prediction. A system, called the region-point system, has been implemented using the three proposed methods. The experimental results show that the region-point system offers less position prediction error than the existing methods, including RADAR, TREE, and CaDet. Furthermore, the experiment also shows that the index location method correctly selects the best radio map for location prediction, and the min-max distance method significantly reduces the position prediction error of RADAR. The rest of this paper is organized as follows. In Section 2, we describe the phenomenon of the variation of the received signal strength values. In Section 3, we present the details of the region-point localization system. In Section 4, we present the experimental results. In Section 5, we discuss the experimental results.
Finally, in Section 6, we give the conclusion of this paper.

Variation of the Received Signal Strength
The most challenging problem for location prediction using RSS values is that the RSS values of a sensor observed at a fixed location change over time [12][13][14][16]. In this paper, we use the MPR2400CA sensor, a ZigBee-based sensor called Mote, to show the phenomenon of RSS value variation over time. The Mote uses the RF frequency band of 2.4-2.4835 GHz for communication. The 2.4 GHz band is very noisy since the wireless local area network (802.11b and 802.11g), the Bluetooth personal area network (802.15.1), and the industrial, scientific, and medical (ISM) devices all use this unlicensed frequency band. The interference from other networks or devices causes the received signal strength value of a sensor at a fixed location to vary significantly over time. Furthermore, the unpredictable movement of people and the opening or closing of doors change the reflection, absorption, diffraction, and scattering of the radio signal, which amplifies the variation of the RSS values in an indoor environment [13].
To show the variation of RSS values over different times, we collected 500 RSS values from a fixed location which is 84.85 centimeters away from a ZigBee sensor for a time interval of 4 consecutive hours. Figure 1

The Region-Point Location Prediction System
In this section, we present the implementation of a robust sensor prediction system which considers the variation of the RSS values.

The Components and Layout of the System. The components and the layout of the system are shown in Figure 2.
The system is implemented in a classroom measuring 9.3 m × 13 m. There are three rows of tables with a desktop on each table. There are two doors and one electronic podium in the room. We placed ten Mote sensors, marked in Figure 2, as the reference sensors. One further sensor, the localization sensor, is mounted on a moving cart for testing the location prediction algorithm. To predict the location of a user, the localization sensor (which stands for the user) broadcasts a packet to the reference sensors. Upon receiving the packet from the localization sensor, a reference sensor records the RSS value of its received packet, stores the RSS value in a new packet, and then sends the new packet to the location prediction computer, also marked in Figure 2, which predicts the location of the localization sensor.

Architecture of the System. Figure 3 shows the architecture of the region-point location prediction system. It contains the offline training phase and the online location prediction phase. The offline training phase contains the following steps.
(1) For different time periods, collect the RSS values of the reference sensors for each training location and store the RSS values in the radio maps.
(2) Create a min-max distance table for each radio map.
(3) Find the index location for radio map selection.
The online location prediction phase contains the following steps.
(1) Collect a number of RSS values at the index location.
(2) Select the best radio map for online location prediction.
(3) At the location that needs to be localized, collect the RSS values from the reference sensors; find the region for location prediction using the RSS values and the min-max distance table.
(4) Find the position of the predicted location in the selected region using the RSS signature of the collected RSS values.
The details of each step are discussed in the following.  Table 1 shows an example of the radio map.

The Min-Max Distance. A reference sensor may observe different RSS values from the localization sensor even when the localization sensor is fixed at a specific location. Similarly, the same RSS value observed by a reference sensor may come from different packets transmitted by the localization sensor at different locations. For example, the RSS value −29 dBm of sensor 8 in Table 1 is observed when the localization sensor is located at location (1, 1) and at location (1, 7). During the offline training phase, for each observed received signal strength value of a reference sensor, we keep track of the minimum and the maximum distances from that reference sensor to the localization sensor. Table 2 shows an example of the min-max distance table.
The min-max distance table is used to reduce the search region of locations during the online location prediction phase.
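A minimal sketch of how such a table could be built during offline training is given below. The function name `build_min_max_table`, the record layout, and the sensor position used in the test data are illustrative assumptions and are not taken from the paper; distances are expressed in grid units.

```python
import math

def build_min_max_table(samples, sensor_positions):
    """Build a min-max distance table (hypothetical layout).

    samples: iterable of (location_xy, {sensor_id: rss}) training records.
    sensor_positions: {sensor_id: (x, y)} coordinates of the reference sensors.
    Returns {(sensor_id, rss): (min_distance, max_distance)}.
    """
    table = {}
    for location, rss_by_sensor in samples:
        for sensor_id, rss in rss_by_sensor.items():
            # Distance from the reference sensor to the training location.
            d = math.dist(location, sensor_positions[sensor_id])
            lo, hi = table.get((sensor_id, rss), (d, d))
            table[(sensor_id, rss)] = (min(lo, d), max(hi, d))
    return table
```

At prediction time, looking up `(sensor_id, rss)` in the returned dictionary yields the two radii that bound where the localization sensor may be relative to that reference sensor.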
3.5. The Index Location. As noted in [13], the radio map constructed in the training phase may not be suitable for online location prediction. We propose to use several radio maps for location prediction. Assume that the set of time intervals is $T = \{t_1, t_2, \ldots, t_m\}$. Let $M_t$ denote the radio map constructed at time interval $t$, $t \in T$.
Let $v_{t,l} = (v_{t,l,1}, v_{t,l,2}, \ldots, v_{t,l,n})$ denote the average RSS vector at location $l$ in $M_t$, where $v_{t,l,j}$, $1 \le j \le n$, is the average of the received signal strength values of sensor $j$ at location $l$. Then, for each location $l$, we calculate $D_l$, the summation of the Manhattan distances between every pair of average RSS vectors at location $l$, where each vector belongs to a different radio map. That is,
$$D_l = \sum_{a=1}^{m-1} \sum_{b=a+1}^{m} \sum_{j=1}^{n} \left| v_{t_a,l,j} - v_{t_b,l,j} \right|. \tag{1}$$
The index location is the location $l^*$ which maximizes $D_l$. That is, $D_{l^*} \ge D_l$, $l = 1, \ldots, N$, where $N$ is the number of locations. During the online localization phase, we collect five received signal strength vectors at the index location. We take the average of the signal strength vectors and then use the average RSS vector to select the best radio map for online location prediction. Assume that the average RSS vector is $u_{l^*} = (u_{l^*,1}, u_{l^*,2}, \ldots, u_{l^*,n})$. Then, the radio map $M^*$ is found by using the following equation:
$$M^* = \arg\min_{M_{t_i},\; i = 1, \ldots, m} \sum_{j=1}^{n} \left| u_{l^*,j} - v_{t_i,l^*,j} \right|. \tag{2}$$
That is, we choose the radio map which minimizes the Manhattan distance against the online average RSS vector at location $l^*$ for online location prediction.
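The two selection steps described above can be sketched as follows. This is only an illustration under assumed data structures: each radio map is represented as a dictionary from a location to its average RSS vector, and the names `find_index_location` and `select_radio_map` are hypothetical.

```python
from itertools import combinations

def manhattan(u, v):
    """Manhattan (L1) distance between two RSS vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))

def find_index_location(radio_maps):
    """radio_maps: list of {location: average RSS vector}, one per time interval.

    Returns the location whose average RSS vectors differ most across maps,
    measured as the sum of pairwise Manhattan distances.
    """
    def spread(loc):
        return sum(manhattan(a[loc], b[loc]) for a, b in combinations(radio_maps, 2))
    return max(radio_maps[0], key=spread)

def select_radio_map(radio_maps, index_location, online_vectors):
    """Return the index of the radio map closest to the online average vector."""
    n = len(online_vectors)
    average = [sum(column) / n for column in zip(*online_vectors)]
    return min(
        range(len(radio_maps)),
        key=lambda t: manhattan(average, radio_maps[t][index_location]),
    )
```

The sketch assumes that every radio map covers the same set of locations; ties are broken by dictionary order, which the text does not specify.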
3.6. The RSS Location Signature. While the probability-based methods use the original radio map, as shown in Table 1, for location prediction, we propose to use a refined variant of the RSS vectors, called RSS signatures, for location prediction. An RSS signature of a location is a distinctive RSS representative for the location. Let $P(l, s_j = r)$ denote the probability that the RSS value $r$ of sensor $j$ is observed at location $l$. Probability $P(l, s_j = r)$ is defined in the following equation:
$$P(l, s_j = r) = \frac{\mathrm{fr}(l, s_j = r)}{\sum_{l'} \mathrm{fr}(l', s_j = r)}, \tag{3}$$
where $\mathrm{fr}(l, s_j = r)$ denotes the number of observations (frequency) of RSS value $r$ of sensor $j$ at location $l$, and the sum runs over all locations $l'$. Note that, since the RSS value $r$ of sensor $j$ may be observed at different locations, $P(l, s_j = r)$ is the location distribution of the RSS value $r$ of sensor $j$ at location $l$. We then define the discernability factor $d(s_j = r)$ of an RSS value $r$ of sensor $j$ by the following equation:
$$d(s_j = r) = 1 + \frac{1}{\log N} \sum_{l'} P(l', s_j = r) \log P(l', s_j = r). \tag{4}$$
The third and fourth terms of (4) together represent the entropy of the location distribution of RSS value $r$ of sensor $j$ over different locations. The second term is used to normalize the entropy value to the interval (0, 1). The maximal value of the entropy function occurs when value $r$ of sensor $j$ is evenly distributed over the $N$ locations. In this case, value $r$ of sensor $j$ does not have any discernability to distinguish between different locations. The higher the skewness of the location distribution is, the smaller the normalized entropy is. The normalized entropy value equals zero if the RSS value $r$ of sensor $j$ can only be observed at a single location. Therefore, the discernability factor of an RSS value of a sensor is a measure of the ability to distinguish between different locations in the system. Note that $N$ in (4) is the number of locations in the system.
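As a worked illustration with hypothetical numbers, and assuming that the discernability factor is one minus the normalized entropy as described above, suppose the system has 16 locations and a particular RSS value of one sensor is observed equally often at exactly two of them. The location distribution is then 1/2 at each of those two locations and 0 elsewhere, which gives:

```latex
\[
  d = 1 + \frac{1}{\log 16}\left( 2 \cdot \tfrac{1}{2} \log \tfrac{1}{2} \right)
    = 1 - \frac{\log 2}{\log 16}
    = 1 - \tfrac{1}{4}
    = 0.75 .
\]
```

By the same formula, a value seen at only one location has a discernability factor of 1, and a value spread evenly over all 16 locations has a discernability factor of 0.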
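Having defined the discernability factor, we define the weight $w(l, s_j = r)$ of an RSS value $r$ of sensor $j$ at location $l$ by the following equation:
$$w(l, s_j = r) = d(s_j = r) \times \frac{\mathrm{fr}(l, s_j = r)}{n_l}, \tag{5}$$
where $d(s_j = r)$ is the discernability factor of RSS value $r$ of sensor $j$, $\mathrm{fr}(l, s_j = r)$ is the number of observations of $r$ from sensor $j$ at location $l$, and $n_l$ is the total number of RSS samples, that is, the number of RSS vectors, collected at location $l$. Equation (5) shows that the weight of RSS value $r$ of sensor $j$ at location $l$ is the product of the discernability factor of RSS value $r$ and the probability of observing $r$ at location $l$.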
For a location $l$, we define its location signature at sensor $j$ to be the RSS value of sensor $j$ whose weight is greater than that of any other RSS value of sensor $j$ observed at location $l$. To obtain the RSS location signature vector for location $l$, we find the RSS location signature value of each sensor $j$, $1 \le j \le n$, where $n$ is the number of reference sensors. Table 3 shows an example of the table of RSS location signatures for the radio map in Table 1. Table 4 shows the weights of the corresponding RSS location signatures in Table 3.
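The signature construction can be sketched in a few lines. The code below is an illustration only: the function name `rss_signatures` and the input layout are assumptions, the location distribution of a value is taken to be its frequency at a location divided by its total frequency over all locations, and the discernability factor is taken to be one minus the normalized entropy of that distribution.

```python
import math
from collections import Counter, defaultdict

def rss_signatures(samples):
    """samples: {location: [rss_vector, ...]}, one integer RSS value per sensor.

    Returns {location: [(signature_value, weight), ...]}, one pair per sensor.
    Requires at least two locations, otherwise log(N) is zero.
    """
    num_locations = len(samples)
    num_sensors = len(next(iter(samples.values()))[0])
    # freq[j][r][loc] = how often sensor j reported value r at location loc
    freq = [defaultdict(Counter) for _ in range(num_sensors)]
    for loc, vectors in samples.items():
        for vec in vectors:
            for j, r in enumerate(vec):
                freq[j][r][loc] += 1

    def discernability(j, r):
        counts = freq[j][r].values()
        total = sum(counts)
        entropy = -sum((c / total) * math.log(c / total) for c in counts)
        return 1.0 - entropy / math.log(num_locations)

    signatures = {}
    for loc, vectors in samples.items():
        per_sensor = []
        for j in range(num_sensors):
            # Weight = discernability * relative frequency of r at this location.
            weights = {
                r: discernability(j, r) * by_loc[loc] / len(vectors)
                for r, by_loc in freq[j].items()
                if by_loc[loc] > 0
            }
            best = max(weights, key=weights.get)
            per_sensor.append((best, weights[best]))
        signatures[loc] = per_sensor
    return signatures
```

Note that the signature is not necessarily the most frequent value at a location: a rarer value that is seen nowhere else can outweigh a frequent value that is shared with other locations.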

The Online Location Prediction Phase.
During the online localization phase, we first collect several RSS samples at the index location. Then, we compute the average RSS value vector of the collected samples and use it to select the best radio map for online location prediction.
To find the position of the user, we collect an RSS value vector, denoted by $r^* = (r_1, r_2, \ldots, r_n)$, at the designated location of the user. Then, for each component $r_j$ of vector $r^*$, we refer to the min-max distance table to find the minimum and the maximum distances from sensor $j$ for this signal strength value. Figure 4 shows the minimum and maximum distances from three sensors for an example.
From the circles whose radii are the minimum and maximum distances from their corresponding sensors, we can find the intersection points, that is, points 1 through 7, as shown in Figure 4. Then, we take the bounding box of the intersection points as the region within which the position (coordinates) of the user is to be found.
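A minimal geometric sketch of this step is shown below. The function names are hypothetical, and the sketch assumes that every pairwise intersection point between circles of different sensors contributes to the bounding box; the text does not say whether the points are filtered further.

```python
import math
from itertools import combinations

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles; empty list if they do not meet."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # distance from c1 to the chord
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))    # half of the chord length
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    ox, oy = h * (y2 - y1) / d, h * (x2 - x1) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]

def search_region(rings):
    """rings: list of (sensor_xy, min_distance, max_distance).

    Returns the bounding box (x_min, y_min, x_max, y_max) of all pairwise
    intersection points, or None if no circles intersect.
    """
    points = []
    for (c1, lo1, hi1), (c2, lo2, hi2) in combinations(rings, 2):
        for r1 in {lo1, hi1}:
            for r2 in {lo2, hi2}:
                points.extend(circle_intersections(c1, r1, c2, r2))
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))
```

Training locations whose coordinates fall inside the returned box are the only candidates considered in the subsequent matching step.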
Finally, we find the training locations within the bounding box and use these locations to predict the position of the user. Pattern matching on RSS location signatures is used to find the position of the user. For each location $l$ in the bounding box, we find the top-$p$ weighted RSS value components of its RSS location signature. Then, we compute the Euclidean distance between the vector of the top-$p$ RSS value components of location $l$ and the vector of the corresponding components of the online RSS vector $r^*$. Let us denote the top-$p$ weighted RSS value components of the RSS location signature of $l$ by $s = (s_1, s_2, \ldots, s_p)$ and the corresponding components of $r^*$ by $q = (q_1, q_2, \ldots, q_p)$. Then, the Euclidean distance between $s$ and $q$ is calculated according to the following equation:
$$\mathrm{dist}(s, q) = \sqrt{\sum_{i=1}^{p} (s_i - q_i)^2}. \tag{6}$$
After computing the distances between $r^*$ and the RSS location signatures of the training locations in the bounding box, the position of the user is predicted to be the position of the location with the smallest Euclidean distance of its top-$p$ weighted RSS value components against $q$.
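The matching step can be illustrated with the short sketch below. The function name `predict_location` and the input layout (a signature vector and a weight vector per candidate location) are assumptions made for the example.

```python
import math

def predict_location(candidates, online_vector, p):
    """candidates: {location: (signature, weights)} for locations in the box.

    For each location, keep the p signature components with the largest
    weights, compare them with the same components of the online vector,
    and return (best_location, distance).
    """
    best, best_dist = None, math.inf
    for loc, (signature, weights) in candidates.items():
        top = sorted(range(len(signature)), key=lambda j: weights[j], reverse=True)[:p]
        dist = math.sqrt(sum((signature[j] - online_vector[j]) ** 2 for j in top))
        if dist < best_dist:
            best, best_dist = loc, dist
    return best, best_dist
```

Because each location selects its own top-p components, the distances of different locations may be computed over different sensors.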

Experiments
To show the performance of the region-point system, we perform several experiments on location prediction in the classroom. In this section, we present the experiments and the results.
The Experimental Environment. As shown in Figure 2, we implement the localization system in a classroom. Figure 5 shows the layout of the reference sensors and the locations where the training samples are taken. The floor of the classroom is covered with square tiles measuring 60 centimeters on each side. We set the origin of the coordinate system at the top left corner of Figure 5. Ten reference sensors, denoted by large circles in Figure 5 [...] 1) and (1, 7). Note that, since each grid cell in Figure 5 represents one tile on the floor, the physical distance between any two locations in Figure 5 can be calculated by multiplying their Euclidean distance in grid coordinates by 0.6 meters. To build each radio map, we collect 500 RSS value samples from each of the 16 training locations over a consecutive 4-hour time interval of the day. Three radio maps, denoted by $M_1$, $M_2$, and $M_3$, are constructed for the experiment.
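As a small illustration of the grid-to-meter conversion used in this environment (the helper name `grid_distance_m` is hypothetical), distances between grid locations can be computed as follows.

```python
import math

TILE_SIDE_M = 0.6  # each floor tile measures 60 centimeters on a side

def grid_distance_m(a, b):
    """Physical distance in meters between two locations in grid coordinates."""
    return TILE_SIDE_M * math.dist(a, b)
```

For example, locations (1, 1) and (1, 7) are six tiles apart, that is, 3.6 meters, and a one-tile diagonal offset corresponds to about 0.8485 meters, which is numerically consistent with the 84.85-centimeter distance quoted in the section on RSS variation.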
For comparison purposes, we implement the RADAR method, a decision tree method called TREE, and the CaDet method. For RADAR, the RSS vectors of different training samples from the same location are averaged. As a result, each location is associated with only one average RSS vector. To predict the coordinates of a test sample, the three neighbors whose RSS vectors have the shortest distances from the test sample are retrieved from the radio map, and their corresponding coordinates are averaged to give the predicted coordinates of the test sample. To examine the effect of the search space reduction on RADAR, we revised the RADAR method by using the min-max distance table to confine the search region of RADAR. We call the revised RADAR method ReRADAR in the experiment.
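The RADAR baseline and its region-restricted variant can be sketched as follows. This is an illustration only: the function name `radar_predict` is hypothetical, the distance in signal space is assumed to be Euclidean, and the optional `region` argument stands in for the bounding box derived from the min-max distance table.

```python
import math

def radar_predict(radio_map, rss, k=3, region=None):
    """radio_map: {(x, y): average RSS vector}; rss: online RSS vector.

    Averages the coordinates of the k locations whose average RSS vectors
    are closest to rss. If region = (x_min, y_min, x_max, y_max) is given,
    only locations inside it are considered (the ReRADAR idea).
    """
    candidates = [
        loc for loc in radio_map
        if region is None
        or (region[0] <= loc[0] <= region[2] and region[1] <= loc[1] <= region[3])
    ]
    nearest = sorted(candidates, key=lambda loc: math.dist(radio_map[loc], rss))[:k]
    if not nearest:
        return None
    xs, ys = zip(*nearest)
    return (sum(xs) / len(nearest), sum(ys) / len(nearest))
```

When fewer than k locations fall inside the region, the sketch simply averages those that remain; how the original implementation handles that case is not stated.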
For the TREE method, a decision tree is constructed for every radio map. The decision tree is then used to predict the coordinates of a test sample. Note that we use the CART decision tree model in R [17] to construct the decision trees. For the CaDet method [14], we first use the K-means method in R to divide the training samples into three clusters based on their RSS vectors. A CART decision tree is then built for each cluster. To predict the coordinates of a test sample, we compare the RSS vector of the test sample against the mean of each cluster and select the decision tree whose cluster center has the shortest distance from the test sample to predict the location of the test sample. Figure 6 shows an execution of the radio map selection algorithm. It shows that the location (14, 13) is chosen as the index location since it has the largest variation in RSS values across the different radio maps. Furthermore, based on the index location, radio map $M_3$ is selected as the best radio map for the ongoing experiment.

The Experimental Results.
To conduct the experiment, we consecutively collect 20 RSS value samples at each of the 16 testing locations. In total, there are 320 test samples. Figure 7 shows four executions of region point with different lengths of the RSS location signatures. Figure 7 shows that the longer the RSS location signature, the higher the prediction accuracy. However, the effect diminishes as the signature becomes longer. This is evidenced in Figure 7, where the accumulated errors for region point with 9 and 10 components, respectively, are almost the same. Figure 8 shows the accumulated errors for RADAR, ReRADAR, TREE, CaDet, and region point. It shows that the region point method has the smallest accumulated error compared with RADAR, ReRADAR, TREE, and CaDet. It also shows that the accumulated error for ReRADAR is much less than that of RADAR. The mean errors for the 320 test samples are 0.681, 1.29, 2.11, 2.87, and 2.99 meters for region point, ReRADAR, RADAR, CaDet, and TREE, respectively. Figure 8 thus shows that the search region restriction using the min-max distance table effectively reduces the prediction error of RADAR. Figure 9 shows the accumulated errors for different methods based on radio map $M_2$. It again shows that the accumulated error for ReRADAR is much less than that of RADAR. The mean errors are 0.958, 0.991, 2.15, 2.29, and 3.18 meters for region point, ReRADAR, RADAR, CaDet, and TREE, respectively. Figure 10 shows the accumulated errors based on radio map $M_3$, which is chosen by the index location. The mean errors are 0.556, 0.822, 1.98, 1.18, and 1.30 meters for region point, ReRADAR, RADAR, CaDet, and TREE, respectively. Note that the errors for the different methods based on $M_3$ are all less than their corresponding errors based on radio maps $M_1$ and $M_2$. This shows that the index location method correctly selects the best radio map for online prediction.
It is also noted, from Figures 8, 9, and 10, that, although clustering before constructing decision trees helps to improve the prediction accuracy of CaDet over TREE, the improvement is not significant.

Discussion
The fact that the TREE and CaDet methods do not perform well in our experimental environment needs to be carefully studied. To do so, we show the decision tree built by CART based on radio map $M_3$ in Figure 11. Note that $M_3$ contains 8000 samples, with 500 samples for each of the 16 locations. The label at a terminal node denotes the class and the number of samples in the training dataset that are classified as this label. For example, terminal node 1 has label 1 13, which denotes location (1, 13), and there are 173 samples classified as location (1, 13). The classification accuracy of this decision tree on the training dataset is 93.3 percent. Table 5 shows the confusion table for predicting the 320 testing samples based on the decision tree of Figure 11. It shows that 234 out of 320 samples are correctly classified, that is, correctly predicted. For comparison, we show the histogram of prediction errors for both TREE and region point in Figure 12. It shows that region point classifies more samples correctly than TREE does, that is, 283 versus 234. Furthermore, for the misclassified samples, region point tends to assign them to nearby locations. These two observations account for the lower mean prediction error of region point compared with TREE and CaDet.
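Expressed as test-set accuracies, the counts quoted above correspond to the following simple arithmetic on the reported numbers:

```latex
\[
  \text{TREE: } \frac{234}{320} \approx 73.1\%,
  \qquad
  \text{region point: } \frac{283}{320} \approx 88.4\% .
\]
```

The gap between the 93.3 percent training accuracy of the decision tree and its roughly 73 percent accuracy on the test samples is in line with the observation that RSS values drift between the time the radio map is built and the time of prediction.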

Conclusions
In this paper, we present the implementation of a robust indoor localization system using a wireless sensor network.
In this system, we propose three methods to counter the adverse effect of the variation of the received signal strength values on location prediction. First, we propose to use an index location to select the best radio map for online location prediction. Second, we propose to use the min-max distance table to confine the search region for online location prediction. Finally, we propose to use the RSS location signature for pattern matching in online location prediction. The experimental results show that the index location method correctly selects the best radio map for online location prediction. They also show that the min-max distance table method effectively reduces the prediction error of RADAR, and that the region-point system offers a higher prediction accuracy than RADAR, TREE, and CaDet.

International Journal of Distributed Sensor Networks