Improving the Indoor Localization Accuracy for CPS by Reorganizing the Fingerprint Signatures

This paper first presents a survey of recent research on localization for CPS and analyzes the advantages and limitations of the existing schemes. To overcome these limitations, a novel localization scheme is then proposed, which includes site survey, received signal comparison, and location correction phases. The main improvement of the proposed scheme is to filter the noise and interference added to the received RSSI signal; the detailed procedure is provided with mathematical analysis. Finally, the experiment and simulation results are presented and analyzed.


Introduction
In recent years, wireless sensor networks (WSNs) [1,2] have developed rapidly and constitute the foundation of cyber-physical systems (CPS). In many CPS, sensed data are only useful together with location information, which is the basis of so-called location-based services (LBS). Therefore, the localization of sensor nodes is essential for the implementation of WSNs.
Many localization schemes have been introduced, which can be divided into range-based [3] and range-free [4][5][6] categories. Because indoor and outdoor environments differ greatly, schemes designed for indoor and outdoor localization differ as well. Only the indoor environment is considered in this paper. GPS [7] localization, which uses satellite signals, is a mature method. However, it does not work as well indoors because walls and other barriers block the signals. Furthermore, GPS components are relatively expensive and large if high precision is required, which makes them unsuitable for sensor nodes. Therefore, other radio signals available indoors are required. With the huge number of WiFi [8] access points (APs) deployed around the world, many shopping malls and other public places offer WiFi signals. People with smartphones and other equipment such as tablets or laptops can sense and measure WiFi signals and use their RSSI signatures for localization.
In theory, the RSSI value of a radio signal decreases as distance increases. Because of multipath fading [9] and other signal attenuations [10], the WiFi RSSI signature indoors does not decrease smoothly and monotonically but fluctuates in some pattern. Accurate ranging is therefore not practical without complicated equipment, and range-based schemes are not good choices in indoor environments. To overcome this problem, range-free schemes, such as fingerprint-based indoor localization [11][12][13], have been introduced. They change the kernel of the algorithm from a geometry problem into a classification problem. The original fingerprint-based localization aims to locate a user at the most likely sampling site, which serves as a reference location in the positioning model. This algorithm is widely used because it is easy to understand and implement. Because it selects only a single reference location (ref-location), the impact of random human presence can be offset to a large extent. However, the localization accuracy depends entirely on the interval and density of the sampling sites. The cost of preliminary manual work such as site survey [14,15] increases exponentially once the accuracy needs to be improved.
International Journal of Distributed Sensor Networks
Through our experiments, it is found that high-frequency radio signals, such as WiFi, produce RSSI signatures that are remarkably susceptible to human presence and movement. A human crossing the link between a WiFi AP and an RSSI receiver (the AP-Receiver link) remarkably distorts the RSSI signature values, which rebound after the human leaves; this causes localization inaccuracy. Furthermore, even without human interference, WiFi RSSI signatures present an inherent fluctuation that follows a Gaussian distribution [10], which induces localization inaccuracy as well. It gives rise to a false capture if the difference between an RSSI value and its prediction reaches the capture threshold even though there is no human interference. To recover such attenuated RSSI signature waveforms, filtering schemes such as the exponentially weighted moving average (EWMA) [16] have to be introduced to eliminate the impact of random human presence.
In this paper, to overcome these limitations, a novel scheme called Fingerprint Signature Reorganizing (FSR) is introduced, together with the Overlap-based Weighted Nearest Neighbors (OWKNN) algorithm, which is based on WKNN [17,18]. With the same interval and density of sampling sites as the original scheme, both the accuracy and the standard deviation of localization are improved significantly. The scheme is no longer susceptible to human presence [19] and movement because a Linearly Weighted Moving Average (LWMA) [20] filter is adopted. Cooperative localization based on the joint distribution of received APs is proposed to further improve localization accuracy.
The remainder of this paper is organized as follows. Section 2 outlines the previous work carried out on fingerprint-based localization algorithms, KNN, R-KNN, WKNN algorithms and valuable ideas with respect to localization. The scheme model and algorithms are described in Section 3. Section 4 presents a description of the experiment scenario and gives some practical results. Large scale, high density network simulation is carried out in Section 5 followed by performance evaluation and discussion. Finally, in Section 6, the conclusion and future work are listed.

Related Work
In recent research on indoor localization, WiFi RSSI-based schemes are widely used because little extra hardware is required [14,21]. Although time of signal arrival (ToA) and time difference of signal arrival (TDoA) [22] can perform well outdoors, they suffer from severe multipath interference and other signal attenuations indoors [9,10]. As Heurtefeux and Valois [21] mentioned, these solutions require dedicated hardware at both the emitting and receiving ends and perform poorly in terms of energy consumption. They also pointed out the limitations of RSSI-based schemes, such as signal instability and susceptibility to interference. However, if these problems can be solved, or at least partly solved, WiFi RSSI-based schemes are supposed to be applicable with high performance.
To determine the relationship between Transmitter-Receiver distance and the RSSI signature, Rappaport [10] proposed the log-distance path loss model for a signal propagating in open space. The model gives both the measured received power, which includes a Gaussian-distributed random term, and the theoretical average received power as functions of distance. These equations remain relatively accurate in free-space propagation.
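The log-distance model can be illustrated with a short sketch. This is the commonly used textbook form of the model, not necessarily the exact equations of the paper; the function names, the reference power `p0`, the reference distance `d0`, the path loss exponent `n`, and the deviation `sigma` are all assumed values chosen for the example.

```python
import math
import random


def mean_rssi(d, p0=-40.0, d0=1.0, n=3.0):
    """Average received power (dBm) at distance d (m) under the log-distance model.

    p0 is the assumed power at the reference distance d0; n is the assumed
    path loss exponent. All three defaults are illustrative only.
    """
    return p0 - 10.0 * n * math.log10(d / d0)


def sample_rssi(d, sigma=2.0, rng=random, p0=-40.0, d0=1.0, n=3.0):
    """One measured RSSI value: the average power plus zero-mean Gaussian noise."""
    return mean_rssi(d, p0, d0, n) + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (1.0, 5.0, 10.0, 30.0):
        print(d, round(mean_rssi(d), 2), round(sample_rssi(d, rng=rng), 2))
```

The average decreases monotonically with distance, while individual samples fluctuate around it, which is why a single RSSI reading is a poor range estimate.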
Other studies [12,13,17,23,24,25,26,27] also proposed path loss or radio propagation models, generally based on the log-distance path loss model, and their measured RSSI signatures confirmed the Gaussian-distributed random variable model. They adopted different parameters to suit specific scenarios or hardware; the authors of [24], for example, use an alternative formulation. Although a radio propagation model offers a formula relating RSSI to Transmitter-Receiver (T-R) distance, precise positioning based on it is not feasible, particularly indoors. Cluttered surroundings induce unexpected small-scale attenuation, so trilateration [27] from T-R distances deviates, because the localization accuracy of range-based schemes depends heavily on the accuracy of T-R distance estimation. Therefore, fingerprint-based localization [11][12][13] is widely adopted because it is range-free. Walls, furniture, showcases, and so forth do not change the RSSI fingerprint signatures as long as they remain static, and so naturally have little impact on this algorithm.
Fingerprint-based localization schemes require a fundamental step called site survey [14,15], which provides the sampled data that serve as the system of reference locations. However, the cost of the preliminary manual work in site survey is a conspicuous drawback of such schemes. Wu et al. [15] and Chintalapudi et al. [14] proposed schemes that achieve encouraging results while radically reducing the preliminary site survey work. An alternative is to improve localization accuracy markedly over previous schemes at the same site survey cost.
Many schemes have been proposed to classify WiFi APs by using correlation coefficients of the sensed RSSI values, such as centroid [24], WDF [25], KNN [28,29], R-KNN [17], and WKNN [18]. Among them, WKNN is the most reasonable in certain cases because it focuses on estimating reference locations and their weights, and its accuracy is significantly better than that of the original fingerprint algorithm. In this paper, the OWKNN algorithm is proposed, in which the overlap ratio between the APs sensed by a user and those sensed at the related reference locations is taken into account in weight estimation.
WiFi uses short-wavelength signals, which are susceptible to interference. All the algorithms above degrade in noisy cases, and the weight estimation of OWKNN deteriorates in particular. Outemzabet and Nerguizian [30] adopted Kalman filtering and Kim and Noble [16] used EWMA to filter noisy values, and both performed well. This paper opts for LWMA [20] instead, because it is simpler to compute and implement while recovering RSSI signature values with almost the same efficiency.

Scheme and Algorithm Description
In this section, the proposed Fingerprint Signature Reorganizing localization scheme and corresponding algorithms and the mathematical models are described. The main assumptions and key points are listed as follows.
Fingerprint-based localization schemes work by comparing the RSSI signatures received by a user against the reference parameters that were computed from the site survey samples and stored in a database. The level of similarity ultimately determines the estimated location. This changes the kernel of localization from a geometry problem into a classification problem. Without human interference, the received RSSI values present a Gaussian-like distribution, so the average of the RSSI signatures is taken as a stable parameter for the algorithm, one that is supposed to be close to the theoretical RSSI signature. In the real world, however, RSSI signatures cannot escape the impact of human presence, which shifts the average values and degrades localization accuracy. LWMA filtering helps restore the signatures to the no-interference situation.

Fingerprint Signature Reorganizing Localization Scheme.
The original fingerprint algorithm, also regarded as a single nearest neighbor (NN) algorithm, can offset the impact of random human presence to a large extent because its position estimate chooses only one reference location sampled from the site survey. However, its localization accuracy depends entirely on the interval and density of the sampling sites.
To overcome these limitations while keeping the advantages, a Fingerprint Signature Reorganizing (FSR) localization scheme is proposed, which alters existing work only slightly. Figure 1(a) gives an overview of the flowchart of the scheme. Compared with the original fingerprint localization scheme (Figure 1(b)), LWMA filtering, cooperative localization, and the OWKNN algorithm are introduced to enhance localization performance. Estimating the weights of the reference locations is the crucial step in a fingerprinting algorithm. Many fingerprinting algorithms take the difference between the average RSSI signatures received by a user and at a reference location as the basic parameter and compute the weights in inverse correlation with it.
In the proposed OWKNN algorithm, this simple, efficient, and widely used method is adopted, as demonstrated in Algorithm 1, where a small constant protects the computation from division by zero. By taking the average weight in fingerprinting, overweighted cases caused by the stochastic deviation of AP deployment can be avoided, while the same efficiency in weight evaluation is achieved.
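Algorithm 1 itself is not reproduced in the text, so the following is only a minimal sketch of an inverse-difference weight consistent with the description above. The function name, the dictionary layout (AP identifier mapped to average RSSI in dBm), and the choice to average the per-AP weights over the shared APs are assumptions; the constant 0.01 is the value assigned later in the simulation setup.

```python
def fingerprint_weight(user_rssi, ref_rssi, c=0.01):
    """Average inverse-difference weight of one reference location.

    user_rssi, ref_rssi: dicts mapping an AP identifier to an average RSSI (dBm).
    c: small constant that protects against division by zero, so the weight
       contributed by a single AP can never exceed 1 / c.
    """
    shared = sorted(set(user_rssi) & set(ref_rssi))
    if not shared:
        return 0.0  # nothing to compare: assumed to carry no weight
    per_ap = [1.0 / (abs(user_rssi[ap] - ref_rssi[ap]) + c) for ap in shared]
    return sum(per_ap) / len(per_ap)


if __name__ == "__main__":
    user = {"ap1": -50.0, "ap2": -60.0}
    near = {"ap1": -50.5, "ap2": -61.0}
    far = {"ap1": -58.0, "ap2": -70.0}
    print(fingerprint_weight(user, near), fingerprint_weight(user, far))
```

A reference location whose signatures are closer to the user's receives the larger weight, and the constant caps the weight when the difference is exactly zero.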

Capture and LWMA Filtering.
Not every received RSSI signature value should be replaced in filtering; otherwise, the stochastic deviation of the first few prior parameters selected would be passed on to all the following RSSI values. Human presence on the link between a WiFi AP and an RSSI receiver attenuates the RSSI signature values remarkably, and they rebound once the human leaves (Figure 2(a)). Such attenuated waveforms need to be captured and filtered by LWMA to eliminate the impact of random human presence. The LWMA prediction for the value at the current index of the list is computed from the prior parameters, that is, the reference values before the attenuated waveform, or, when there are not enough reference values before it, from the reference values after it. The quantities involved are the current index in the list, the number of prior parameters, the number of RSSI signatures attenuated by human interference, and the value predicted by LWMA. Filtering is applied when the difference between an RSSI value and its prediction reaches a threshold.
This trigger condition of LWMA filtering is defined as a capture (Figure 2(b)). The two captures bound the attenuated RSSI signature waveform that is the target to be filtered. The other RSSI signatures present a Gaussian distribution, so without human interference the unfiltered average value is supposed to be feasible for the fingerprint algorithm.
The length of the received RSSI signature list is set to 13 at a receiving frequency of 4 Hz, so the receiving period is about 3 seconds. Human presence on an AP-Receiver link lasts about 1 second when the human is moving, so 4 RSSI signatures are attenuated; setting the number of prior parameters to 3 is therefore reasonable. The attenuation caused by human presence is about 6 dB when it takes place on an AP-Receiver link at a distance of 3-5 meters from the receiver. When the link becomes longer, human presence on it becomes more probable, so the summed human presence produces a similar level of interference. On average, the threshold is 3.5 dB.
The difference between the RSSI average before and after filtering can reach 4 × 4 ÷ 13 ≈ 1.23, which is a multiple of the statistical deviation observed without human interference. Localization accuracy degrades further as the attenuation becomes more severe.
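The capture-and-replace idea can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: it predicts only from the values before the current index (the backward prediction from later values is omitted), it treats any threshold crossing as a capture rather than pairing a drop with a rebound, and the function names are hypothetical. The list length 13, the 3 prior parameters, and the 3.5 dB threshold are taken from the text.

```python
def lwma_predict(prior):
    """Linearly weighted moving average of `prior` (ordered oldest to newest).

    The newest value gets the largest weight: weights are 1, 2, ..., len(prior).
    """
    weights = range(1, len(prior) + 1)
    return sum(w * v for w, v in zip(weights, prior)) / sum(weights)


def filter_attenuation(rssi, n=3, threshold=3.5):
    """Replace captured values with their LWMA prediction.

    rssi: list of RSSI values (dBm); n: number of prior parameters;
    threshold: capture threshold (dB). Returns (filtered list, captured indices).
    Predictions use already-filtered values, so an attenuated run does not
    contaminate the predictions that follow it.
    """
    out = list(rssi)
    captured = []
    for i in range(n, len(out)):
        prediction = lwma_predict(out[i - n:i])
        if abs(out[i] - prediction) >= threshold:
            captured.append(i)
            out[i] = prediction
    return out, captured


if __name__ == "__main__":
    # 13 samples at 4 Hz with a 1-second (4-sample) 6 dB attenuation.
    rssi = [-60.0] * 5 + [-66.0] * 4 + [-60.0] * 4
    filtered, captured = filter_attenuation(rssi)
    print(captured)
    print(sum(rssi) / len(rssi), sum(filtered) / len(filtered))
```

Because captured samples are replaced before later predictions are made, a sustained level change would keep being treated as interference in this sketch; handling that case is outside its scope.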
In the real world, a human moving path can cross a list of AP-Receiver links (Figure 3(a)). It induces the user's interfered-AP list, in which the affected APs are ordered by the sequence in which human presence interferes with them. Localization accuracy deteriorates as the length of this list increases, because more data deviate.

False Capture and Cooperative Localization.
The fluctuation of the received RSSI presents a zero-mean Gaussian distribution within a range 6 dB wide. There is a certain probability that the difference between an RSSI value and its prediction reaches the threshold without any human presence on the AP-Receiver link, which is defined as a false capture. LWMA filtering then replaces RSSI signature values that trigger the capture condition by coincidence, without interference.
The probability of false capture over a received RSSI signature list is computed precisely by formula (7). After calculating the parameters of the fluctuation distributions of the RSSI signatures with formula (7), it is observed that the indexed probability terms of the formula, beginning with the first, are closely approximated by a single common probability, even in the worst case (Table 2). The difference between them is a very minor factor relative to the fluctuation distribution and has little impact on the calculated results. All the indexed terms in this algorithm can therefore be replaced by the common probability, which simplifies formula (7). In the simplified form, the common term is the probability of a false capture produced by any 4 consecutive values of the list. It is the sum of the probabilities of every false capture waveform that matches formula (6) with 3 prior parameters, where each value is a random deviation drawn from −3, −2, −1, 0, 1, 2, 3.
Given the crowd density in a shopping mall, there is a high probability that two people close to each other require localization at the same time. A single interference source is supposed to affect two nearby receivers in a similar pattern because of temporal and spatial correlation, which serves as the ground for cooperative localization. In the real world, a moving human on AP-Receiver links acts as an interference source that affects the RSSI signatures received from all the APs involved (as shown in Figure 3(b)). The interfered-AP lists (introduced in Part B) of user 1 and user 2 intersect, and the APs in the intersection are supposed to be interfered with by the identical source. Cooperative localization eliminates false captures mainly on the basis of the joint distribution of the interfered-AP lists of two adjacent users.
Consider the interfered-AP lists of user 1 and user 2, two users who are geographically close to each other and sense APs at the same time. They are supposed to share every element of their lists. Any AP that does not belong to the intersection of the two lists is regarded as a false capture. The probability that a false capture is eliminated, in other words the efficiency of cooperative localization, is computed from the joint distribution of the two lists. The parameter that represents the level of fluctuation of the received RSSI signatures has the dominant influence on the probability of false capture, which grows as the fluctuation increases.
The probability of false capture is 2.77% in the real case, which is not negligible given that it occurs in a relatively stable situation. Eliminating false captures therefore contributes to the final localization accuracy.
Cooperative localization achieves encouraging results. Theoretically, the efficiency of eliminating false captures is more than 99% in the real case, whose distribution data are measured in the experiment. It is more than 40% in the worst case, which is supposed to be a remarkable performance under the assumed worst fluctuation distribution of RSSI signatures.
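The elimination rule described above (an AP captured by only one of two adjacent users is treated as a false capture) can be sketched with a set intersection. The function name and the list-of-identifiers representation are assumptions made for the example; the probability analysis is not part of the sketch.

```python
def cooperative_filter(captured_user1, captured_user2):
    """Split each user's captured APs into confirmed and false captures.

    captured_user1, captured_user2: lists of AP identifiers captured by two
    adjacent users, in the order in which the APs were interfered with.
    Returns (confirmed, false_user1, false_user2), each preserving input order.
    """
    shared = set(captured_user1) & set(captured_user2)
    confirmed = [ap for ap in captured_user1 if ap in shared]
    false_user1 = [ap for ap in captured_user1 if ap not in shared]
    false_user2 = [ap for ap in captured_user2 if ap not in shared]
    return confirmed, false_user1, false_user2


if __name__ == "__main__":
    user1 = ["ap3", "ap7", "ap9"]   # ap9 captured by user 1 only
    user2 = ["ap3", "ap5", "ap7"]   # ap5 captured by user 2 only
    print(cooperative_filter(user1, user2))
```

Only the captures in the intersection would then be passed to LWMA filtering; the others keep their original, unfiltered values.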

Overlap-Based Weighted Nearest Neighbors Algorithm.
KNN-based algorithms estimate the user's location from the reference locations and their weights, as given by formula (11). Weight evaluation of the ref-locations is the crucial step of fingerprint-based localization algorithms. Unlike range-based schemes such as trilateration, the location obtained in range-free schemes does not follow strictly from geometry; weight evaluation determines the most likely location by classifying reference locations. The efficiency of such schemes is judged by two key standards.
(i) The nearest neighbor locations are picked out correctly.
(ii) The evaluated weights are reasonable with respect to the user's location.
The algorithm performs well if weight evaluation is highly efficient, and its localization accuracy is then supposed to be better than that of range-based schemes, whose range evaluation performs poorly indoors.
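The weighted nearest neighbor estimate can be sketched as a weighted centroid of the highest-weighted reference locations. This is the generic WKNN form, offered as an illustration under that assumption; the function name and the tuple representation of locations are hypothetical.

```python
def wknn_estimate(ref_locations, weights, k=4):
    """Weighted centroid of the k reference locations with the largest weights.

    ref_locations: list of (x, y) coordinates in meters.
    weights: list of non-negative weights, one per reference location.
    """
    ranked = sorted(zip(weights, ref_locations), key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(w for w, _ in ranked)
    if total <= 0.0:
        raise ValueError("at least one selected weight must be positive")
    x = sum(w * loc[0] for w, loc in ranked) / total
    y = sum(w * loc[1] for w, loc in ranked) / total
    return x, y


if __name__ == "__main__":
    corners = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
    print(wknn_estimate(corners, [1.0, 1.0, 1.0, 1.0]))
    print(wknn_estimate(corners, [3.0, 1.0, 1.0, 1.0]))
```

With equal weights the estimate is the center of the grid cell; a larger weight pulls the estimate toward the corresponding reference location, which is why both standards (choosing the right neighbors and weighting them reasonably) matter.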
Many fingerprint-based algorithms take the difference between the average RSSI signatures in a user's list and in a ref-location's list as the base number and compute weights in inverse correlation with it. Such schemes can run into a serious problem that gives rise to severely incorrect weight evaluation. In Figure 4(a), the 4 nearest neighbor ref-locations with respect to the user's location are supposed to be the 4 black locations. However, some APs can lie on the perpendicular bisector of the line segment between the user and a relatively far ref-location, which is referred to as a crasher reference location (the blue location in Figure 4(b)). By mirror symmetry, the crasher's Euclidean distances to the APs on the perpendicular bisector are closer to those of the user than the distances of the 4 nearest ref-locations (the 4 black locations in Figure 4(b)) are. Naturally, the user's RSSI difference for such an AP is almost zero, which contributes a high weight in fingerprinting because weights are estimated in inverse correlation. The weight of a crasher ref-location can thus be higher than the weights of some of the nearest neighbor ref-locations.
The OWKNN algorithm is introduced both to rule out crasher reference locations and to reevaluate the weights of the nearest reference locations more reasonably (shown in Figure 5). A threshold on the AP overlap ratio is computed from two neighboring ref-locations. An overlap ratio below this threshold implies that the user's location is not in the polygon formed by the expected nearest neighbor ref-locations, some of which can be crashers. This step helps the OWKNN algorithm satisfy Standard (i), that is, the nearest neighbor locations are correctly separated from the rest. The constant used to rule out crashers is estimated from the division-by-zero constant introduced in Algorithm 1, which caps the weight at its maximum, the reciprocal of that constant, when the user's RSSI difference is zero. A value greater than the magnitude of this maximum weight is sufficient to rule out crasher reference locations. The reevaluated weight is computed from the overlap ratio according to the following geometrical analysis, which verifies that the algorithm satisfies Standard (ii), that is, the weights of the ref-locations are reevaluated more reasonably.
In order to find out the method for weights reevaluation, the analysis of overlapped APs is based on the following assumptions.
(1) The sensing area of a receiver is modeled as a disk with a given sensing radius.
(2) The quantity of overlapped WiFi APs sensed at a user's location and at a ref-location is proportional to the overlap area of their sensing ranges.
Therefore, weight evaluation becomes a geometrical problem of studying the overlap area instead of the overlapped APs. Formula (12) is simple and is supposed to be highly efficient in making the OWKNN algorithm satisfy Standard (ii); its two constants ensure that the reevaluated weight of the nearest neighbor locations is greater than 1. Its precise evaluation is demonstrated in the following.
The overlap area (Figures 6(a) and 6(b)) of two sensing ranges is computed by formula (13) from the sensing radius and the Euclidean distance between the user's location and the ref-location.
The curve of formula (13) is almost linear, especially in its first half (Figure 7(a)), which is the part used in the algorithm. It can therefore be simplified to the linear form of formula (14) for easier realization in both the algorithm and the code. The relationship between weight and distance then becomes a weighted centroid problem (Figure 7(b)). According to formula (11), the equation for the user's location is formed as formula (15), from which the distance to each ref-location is computed precisely from the weights and the ref-location coordinates as formula (16). Here each ref-location contributes its location and its weight with respect to the user (Figure 7(b)), the latter being the reweight in Algorithm 2.
To obtain the weight as a function of distance, formula (16) would have to be inverted. However, this is not feasible because the function is not monotonic when all the weights vary over their entire range in formula (16). In the following analysis, a concise model (shown in Figure 7(b)) is introduced to find an efficient evaluation of the weight; it reduces the dimensionality of formula (16) under the following restrictions.
(1) The reference locations form a grid pattern.
(2) The number of nearest neighbor ref-locations is 4, and the interval between ref-locations is assumed to be equal.
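The near-linearity of the overlap area can be checked numerically. The sketch below uses the standard lens-area expression for two equal disks, which is assumed to correspond to formula (13); the linear stand-in is the tangent line at zero distance, chosen here for illustration and not necessarily the paper's formula (14). The 30 m radius is the sensing radius used in the simulation setup.

```python
import math


def overlap_area(d, r):
    """Exact overlap area of two disks of radius r whose centers are d apart."""
    if d >= 2.0 * r:
        return 0.0
    if d <= 0.0:
        return math.pi * r * r
    return 2.0 * r * r * math.acos(d / (2.0 * r)) - 0.5 * d * math.sqrt(4.0 * r * r - d * d)


def linear_overlap_area(d, r):
    """Assumed linear approximation: the tangent of overlap_area at d = 0."""
    return max(0.0, math.pi * r * r - 2.0 * r * d)


if __name__ == "__main__":
    r = 30.0
    full = math.pi * r * r
    for d in (0.0, 10.0, 20.0, 30.0):
        exact = overlap_area(d, r) / full
        approx = linear_overlap_area(d, r) / full
        print(f"d = {d:4.1f} m  exact ratio = {exact:.3f}  linear ratio = {approx:.3f}")
```

Over the first half of the range (distance up to one radius), this tangent line stays within roughly 3% of the full disk area of the exact curve, which supports replacing the exact expression with a linear one there.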
The plots of formula (16) under the above restrictions are shown in several groups, one for each value of the third weight, with the second and fourth weights as the variables and the first weight fixed at 1.
The corresponding plots of the relationship between weight and overlap, obtained from Figure 8 and formula (14), are shown in Figure 9.
It is observed that the distance to the first ref-location increases drastically as the second, third, and fourth weights increase within the range below 1, and remains relatively stable in the rest of the range. Generally speaking, a shorter distance means a bigger ratio of overlap area, which finally induces a higher weight.
This scheme evaluates the reweight in the exponential pattern of formula (12), computed from the overlap ratio, which gives a highly efficient evaluation. Finally, the localization formula, altered from formula (12) accordingly, is the one computed in the algorithm whenever the judging condition finds that a ref-location is not a crasher.

Scenario Study and Experiment Results
This section describes the characteristics of a general indoor scenario, such as a shopping mall, in which the scheme and algorithms work effectively. The RSSI signatures measured in experiments both with and without human interference are also demonstrated. The collected data and the calculated distributions provide the parameters of the algorithms and models. It is assumed that hundreds of smartphones with WiFi devices can help indoor localization.
The OWKNN algorithm computes its weight adjustments from the AP overlap ratio in a simple, direct way. The analysis is based on the assumption that the quantity of WiFi APs in a given region is directly proportional to its area. Several exceptional cases can affect OWKNN's effectiveness. A low density of APs causes a severe stochastic deviation in a given region: if there are only a handful of APs in the overlap region of two circles (Figure 6(b)), even one AP falling inside or outside it can change the overlap ratio, and hence the resulting weights, to a large extent. Inhomogeneous deployment of APs can have a similar consequence. Therefore, the density and deployment pattern of WiFi APs need to be considered. On this basis, scenarios can be divided into office buildings and shopping malls.

WiFi APs Deployment and Its
WiFi APs are deployed randomly in office buildings because the office owners do not coordinate with one another and do not consider how much the propagation ranges of their WiFi APs overlap. Nearly every office needs at least one WiFi AP of its own for privacy and information security. The high density of APs sharply decreases the proportion of stochastic deviation in a given region (Figure 10) because of the large denominator. In our experiments, a WiFi receiver can sense 40-60 APs in an office building in a university.
The density of WiFi APs in a shopping mall is lower because open access is the only consideration. A receiver can sense 15-30 APs, which can give rise to a higher proportion of stochastic deviation in a shopping mall than in an office building. However, the AP deployment in a shopping mall is not entirely random but is substantially based on a grid or cell pattern (Figure 11), which is supposed to be the most economical coverage scheme. The stochastic deviation therefore becomes stable and predictable, which can be used to refine the OWKNN algorithm: the weight adjustments can be computed more reasonably by taking both the overlap ratio and the stochastic deviation based on AP density into consideration.

RSSI Signature Attenuated Waveform and Distribution.
One of the RSSI signature attenuated waveforms caused by human presence on AP-Receiver links is demonstrated in Figure 2(a). The vast majority of human interference can be captured in waveforms clearly.
The fluctuation distributions of the RSSI signatures measured in the experiments are collected and calculated, and they serve as the real-case distribution in the false capture analysis. Several lists of RSSI signatures for AP-Receiver links of different lengths are demonstrated in Figure 12.

Simulation Setup and Performance Evaluation
To examine the efficiency of the advanced fingerprint-based localization scheme, experiments and simulations are carried out with the following configuration (the AP deployment pattern is similar to Figure 10, the site survey scheme is based on Figure 14, and the tested user locations are based on Figure 15).
(1) The APs follow the settings in Table 1 and have a radio propagation range of 30 meters, which also means that a receiver's sensing area can be modeled as a disk of radius 30 meters, that is, a sensing radius of 30 m in formula (13).
(2) 100 APs are deployed randomly in a 100 m × 100 m area. Given the AP density and the sensing range, a receiver in the middle of the room can sense 28 APs on average. Allowing for the stochastic deviation of random deployment, it senses 26-30 APs.
(3) The distance between samples in both the vertical and horizontal directions is 5 meters in the site survey. Therefore, a total of 441 sampling sites, forming a 21 × 21 grid that covers the room, serve as reference locations.
(4) 1681 user locations, forming a 41 × 41 grid that covers the room, are tested for algorithm performance evaluation.
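The counts in this configuration can be checked with a few lines of arithmetic. The helper names are hypothetical, and the 2.5 m spacing of the user grid is inferred from the 1681 locations rather than stated in the text.

```python
import math


def grid_sites(side_m, spacing_m):
    """Number of grid points covering a square of side side_m, edges included."""
    per_axis = round(side_m / spacing_m) + 1
    return per_axis * per_axis


def expected_sensed_aps(num_aps, side_m, radius_m):
    """Expected APs inside a sensing disk, assuming uniform AP density."""
    density = num_aps / (side_m * side_m)      # APs per square meter
    return density * math.pi * radius_m ** 2   # disk fully inside the room


if __name__ == "__main__":
    print(grid_sites(100.0, 5.0))     # reference locations at 5 m spacing
    print(grid_sites(100.0, 2.5))     # user locations at the inferred 2.5 m spacing
    print(expected_sensed_aps(100, 100.0, 30.0))
```

The expected value applies only where the whole sensing disk lies inside the room, which matches the text's restriction to the middle area.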
Several key parameters and crucial conditions of the algorithm are computed from the characteristics of the scenario. The division-by-zero constant is set to 0.01, so the constant used to rule out crashers is 100 in OWKNN, which is enough to rule out crasher reference locations without changing the proportion of weights among the nearest neighbor locations. This makes the algorithm satisfy Standard (i).
In addition, a value of 2 for the constant in formula (12) is enough to ensure that the reevaluated weight is greater than 1. These parameters make the algorithm satisfy Standard (ii). Localization accuracy and standard deviation are demonstrated in Figures 16 and 17.
Nine cases are compared for both localization accuracy and its standard deviation. Case 0 is the case with no interference. In the other cases, I means that RSSI signatures are interfered with but not filtered, F means that interfered RSSI signatures are filtered by LWMA, and C means that colocalization recovers accuracy; the suffix 1 denotes interference by a single human, 2 interference by 4 humans in a semisurrounding arrangement, and 3 interference by 8 surrounding humans. The scheme improves accuracy by 50.02%-58.69% and reduces the standard deviation by 39.75%-45.07%.
This verifies that the advanced fingerprint-based localization scheme achieves good accuracy and succeeds in recovering under interfered conditions. Theoretically, the efficiency of cooperative localization is supposed to be 91.02%. However, it does not perform as well as expected because of the relatively severe fluctuation of RSSI in the accepted case. Eliminating false captures only partly recovers the data, from false capture deviation to stochastic deviation, which exists within the average of a received RSSI list and affects localization indirectly. That is why the contribution of colocalization is masked. As the fluctuation of RSSI becomes less severe, the smaller stochastic deviation lets colocalization recover accuracy better.

Conclusion and Future Work
In this paper, an advanced fingerprint-based indoor localization scheme for CPS called Fingerprint Signature Reorganizing (FSR) has been introduced. The interference of human presence with localization accuracy has been studied, and the corresponding scheme has been implemented. The scheme also addresses the problem of false capture and introduces cooperative localization to further increase localization accuracy. The experiment and simulation results show that the scheme significantly improves localization accuracy.