A Fuzzy Similarity Elimination Algorithm for Indoor Fingerprint Positioning

Fingerprint positioning can use existing WLAN infrastructure to determine indoor locations and has been widely studied. We analyzed the position distribution of similar fingerprints and found that fuzzy similarity between fingerprints is the root cause of the larger positioning errors. Based on the clustered distribution of the positions corresponding to similar fingerprints, we propose a K-Means+ clustering algorithm to achieve fine-grained fingerprint positioning. Because the K-Means+ algorithm fails to locate the positions of outliers, we also design a linear sequence matching algorithm to improve outlier positioning and reduce the impact of fuzzy similarity. Experimental results show that our algorithm achieves a maximum positioning error of less than 5 m, which outperforms other algorithms. Meanwhile, positioning errors over 4 m account for less than 2% of the results of our algorithm. The positioning accuracy is improved significantly.


Introduction
Indoor positioning is the foundation of indoor location services. GPS does not work well indoors and cellular positioning is too coarse, so indoor positioning needs a fine-grained and efficient method. Fingerprint positioning requires no additional hardware and can use existing infrastructure (e.g., WLAN) to complete the positioning task. First, the received signal strength (RSS) is collected at each sampling point of the target area to construct the offline fingerprint database. Then, the signal strength measured online is matched against the fingerprints in the offline database, and the position corresponding to the best-matching fingerprint is chosen as the positioning result. Therefore, fingerprint positioning can be realized wherever an area is covered by WiFi networks, which makes it a versatile technology choice for indoor positioning.
Current research on fingerprint algorithms mostly focuses on two aspects. One is to reduce the cumbersome workload of offline acquisition, and the other is to improve the fingerprint positioning accuracy. Because fingerprints of different orientations need to be gathered at each sampling point of the target region, fingerprint acquisition in a large-scale indoor space is very time-consuming. Current studies of this issue are mainly based on signal models [1,2] and crowdsourcing methods [3,4], which have achieved good results. For positioning accuracy, researchers have designed various online matching algorithms to reduce the positioning error [5,6] and have achieved a median error of around 2 m. But there are always some larger positioning errors of more than 6 m, which is one of the bottlenecks in improving the positioning accuracy. Liu et al. [7] considered the problem of larger positioning errors in the practical application of fingerprint positioning, but they ignored the human blocking problem. In fact, the positioning equipment is carried by users, and the human body causes signal multipath and shadowing that severely attenuate the signal strength [8]. In this paper, we analyze the impacts of different orientations and holding positions and find that body blocking can cause positioning errors of over 8 m. This means that multipath and shadowing further aggravate the larger positioning errors. As far as we know, few studies aim to reduce these large positioning errors caused by human blocking.

Algorithm 1: K-Means plus.
(1) /* Compute the number of clusters K */
(2) compute K from the segment length L /* C_i is a subset of the fingerprint database */
(3) /* F is the whole fingerprint database */
(4) for all (f ∈ F) do
(5)   /* c_i is the fingerprint of the cluster center */
(6)   if ((f − c_i) ≤ ε_max) then
(7)     f ∈ C_i
(8)   end if
(9) end for
(10) if C_i is not convergent then
(11)   Center(C_i) /* Compute cluster center */
(12)   goto (4)
(13) else
(14)   return C_i
(15) end if
To address the problem of larger positioning errors, we examine the position distribution of similar offline fingerprints and find that there is fuzzy similarity between offline fingerprints. The positions corresponding to some similar fingerprints are far apart; we call this fuzzy similarity, and it is the root cause of the larger positioning errors. Meanwhile, multipath and shadowing further aggravate the fuzzy similarity, which leads to more large positioning errors. By further analyzing the position distribution of similar fingerprints, we also find that most of the corresponding positions are clustered. Therefore, we design a method based on K-Means clustering to achieve fine-grained positioning accuracy, which is called K-Means+ (Algorithm 1). Because the K-Means+ positioning algorithm cannot handle outliers, we further design a linear sequence matching algorithm to improve the accuracy of outlier positioning, thus eliminating the problem of fuzzy similarity in offline fingerprints. Experimental results demonstrate that our algorithm improves the positioning accuracy significantly.
The remainder of the paper is organized as follows. Related work is reviewed in Section 2. We analyze the fingerprint features in Section 3 and present the details of our algorithm design in Section 4. Then we show the experiment and evaluation results of our algorithm in Section 5. Finally, we conclude the paper in Section 6.

Related Work
Fingerprint positioning, which requires no extra deployment, has been widely studied. Various methods, such as deterministic KNN [9], Bayesian estimation [10], Sequential Monte Carlo [11], support vector machine [12], and neural network, are used to improve positioning accuracy. But most fingerprint algorithms do little to reduce the larger errors caused by body blocking. The Radar system is an early attempt at fingerprint positioning using wireless signal strength [8]; although it finds that body blocking has a serious influence on positioning accuracy, it does not provide an effective solution. Thereafter, some papers began to consider the body blocking problem in indoor positioning. Papapostolou and Chaouchi [13] build a signal attenuation model for different directions based on extensive experiments and provide an orientation-aware fingerprint positioning algorithm to reduce the influence of body blocking. The COMPASS [14] algorithm uses a device with a digital compass and takes the measured user orientation as a dimension of the signal strength fingerprint vector to improve the positioning accuracy under body blocking. The LoSF [15] algorithm provides a double node mechanism to avoid the non-line-of-sight effects of body blocking. These algorithms reduce the influence of body blocking to a certain extent, but they do not consider the impact of body blocking on fingerprint fuzzy similarity and are unable to avoid larger positioning errors.
Liu et al. [7] consider the fingerprint fuzzy similarity problem in the practical application of fingerprint positioning and provide a peer assisted algorithm that uses existing mobile phones. They adopt a sound ranging method that measures the distance between mobile users with the microphone and loudspeaker of the mobile phone, and they use the acquired distance relationships between mobile users to constrain the fingerprint positioning results, which can prevent the emergence of larger errors. This method can avoid larger positioning errors, but it requires an additional sound-based ranging method, which increases the energy consumption of the positioning service. More importantly, sound-based ranging is hard to use in noisy public environments.
We analyze the distribution features of similar offline fingerprints and find that, besides the fuzzy similarity, the positions of similar fingerprints are clustered. Therefore, we design an efficient clustering method on offline fingerprints to eliminate fuzzy similarity and avoid the restrictions of sound ranging.

Fingerprint Fuzzy Similarity and Positioning Performance

Body Blocking Influence.
To analyze the positioning performance of WiFi signal fingerprints, we first study the impact of various factors, such as orientation and holding position. With the development of smart phones, people increasingly use the mobile phone to obtain indoor positioning services, so we select a GALAXY Note 3 as the WiFi terminal device to acquire signal data. The test mainly studies the influence of multipath and shadowing on the WiFi signal fingerprint without considering device diversity. The testbed is an open lab area of 38 m × 26 m. Because desks and chairs cover some parts of the indoor office area, we choose only the 76 positions in the passable region (e.g., corridors) to sample signal fingerprints. There are 8 APs deployed for measurement, as shown in Figure 1. At each sampling point, the user faces the 0°, 90°, 180°, and 270° directions and holds the bottom and upper positions of the mobile phone, respectively, to measure signal strength. Each measurement acquires 15 groups of signal strengths to calculate an average value. Thus we generate a total of 608 records (76 points × 4 orientations × 2 holding positions) in the offline fingerprint database.
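The construction of the offline database described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_database`, the key layout, and the synthetic readings are hypothetical, and only the counts (76 points, 4 orientations, 2 holding positions, 15 scans, 8 APs) come from the text.

```python
from statistics import mean

ORIENTATIONS = (0, 90, 180, 270)   # degrees, as listed in the text
HOLDS = ("bottom", "upper")        # holding positions, as listed in the text

def build_database(raw):
    """raw maps (point, orientation, hold) -> list of RSS vectors, one per scan.
    Returns one averaged fingerprint vector per key."""
    db = {}
    for key, scans in raw.items():
        # Average each AP's RSS over all scans of this measurement.
        db[key] = [mean(column) for column in zip(*scans)]
    return db

# Synthetic placeholder readings (not measured data): 15 scans of 8 APs each.
raw = {
    (p, o, h): [[-40.0 - p % 30 - ap for ap in range(8)] for _ in range(15)]
    for p in range(76) for o in ORIENTATIONS for h in HOLDS
}
db = build_database(raw)
print(len(db))  # 608 records
```

Each key yields exactly one averaged vector, so the record count is the product of the three factors.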
Generally, wireless signal strength changes with time, which leads to some measurement errors [15]. Suppose that these measurement errors follow a zero-mean normal distribution with variance σ. Fingerprint matching often uses the Euclidean distance to measure the similarity of fingerprint vectors, and the maximum measurement error ε_max between fingerprints can then be calculated by (1), where σ is the variance of the signal strength distribution, r_i is the received signal strength from the ith AP, and n is the number of APs. Two fingerprints are significantly dissimilar only when the distance between their vectors is greater than ε_max; this distance is called the fingerprint granularity in this paper. Since the measurement error is inherent in signal measurement, we call ε_max the maximum intrinsic fingerprint granularity error.
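The displayed form of equation (1) is not legible in this copy. One form that is consistent with the symbols named in the text and with the 28 dB value quoted later for a 5 dB error and 8 APs is sketched below. It is an assumption, not the authors' confirmed formula: it supposes that two readings of the same fingerprint deviate by +σ and −σ at every AP, and it writes the maximum error as ε_max, the error parameter as σ, the signal strength from the ith AP as r_i, and the number of APs as n.

```latex
\varepsilon_{\max}
  = \sqrt{\sum_{i=1}^{n}\bigl[(r_i+\sigma)-(r_i-\sigma)\bigr]^{2}}
  = 2\sigma\sqrt{n},
\qquad
\sigma = 5\ \text{dB},\; n = 8
\;\Rightarrow\;
\varepsilon_{\max} = 10\sqrt{8} \approx 28.3\ \text{dB}.
```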
To analyze the influence of different orientations and holding positions of the mobile phone on fingerprint granularity and positioning performance, we design four groups of tests using the 608 fingerprint records. We select 30 of the 76 sampling points in Figure 1 and run the KNN algorithm 5 times to compute the average value of the positioning errors. Meanwhile, we compute the Euclidean distances between the 608 fingerprints to construct the fingerprint granularity distribution. In Figure 2, the orientation tests include the comparison between 0° orientation fingerprints and 90° orientation fingerprints and between 0° orientation fingerprints while holding the [...]. Figures 2(a) and 2(b) show the cumulative distribution of fingerprint granularity with different orientations and holding positions. Suppose the mean error of WiFi signal strength measurement is 5 dB [15]. We can compute by (1) that the maximum intrinsic fingerprint granularity error is 28 dB. In Figure 2(a), fingerprint granularity less than 28 dB (i.e., similar fingerprints) accounts for 13% and 19%, respectively, where the different orientations have a larger proportion of similar fingerprints. In Figure 2(b), fingerprint granularity less than 28 dB accounts for 18% and 23%, respectively, where the different holding positions have a larger proportion of similar fingerprints. Meanwhile, holding positions have a bigger influence than orientations, because the hand is closer to the mobile phone than the human body is. Figures 2(c) and 2(d) show the positioning performance of different orientations and holding positions. We randomly select 30 indoor sampling points and compute the average value of 5 groups of positioning results to compare positioning performance. We find that a higher proportion of similar fingerprints leads to larger positioning errors. Meanwhile, there are always some larger errors of over 6-8 m in the positioning results. This is the main cause of the decrease in positioning performance. To solve this problem, we further analyze the similar fingerprints among the offline fingerprint records and determine the positions corresponding to these similar fingerprints. The threshold of fingerprint similarity is the maximum intrinsic fingerprint granularity error. Figure 3 shows the position distribution of the similar fingerprints at the 73rd sampling point with 0° orientation, the 40th sampling point with 90° orientation, the 51st sampling point with 270° orientation, and the 59th sampling point with 180° orientation. The 42nd position distributions are the solid circles in Figure 3, where most of the similar fingerprint positions are close to the 42nd sampling point, and just a few positions are far away. These few outliers are exactly what produces fingerprint fuzzy similarity; that is, the positions corresponding to similar fingerprints are far apart, which leads to larger positioning errors. In addition, we find an obvious cluster feature of sampling points with similar fingerprints. That is, most sampling positions of similar fingerprints are close to each other. The other sampling points in Figure 3 have a similar distribution feature. Therefore, these outliers with fuzzy similarity in the fingerprint database are the root cause of the larger errors. According to the cluster distribution feature of the positions corresponding to similar fingerprints, we try to solve the larger errors problem by a clustering method.
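The analysis above, finding fingerprint pairs whose granularity is below the similarity threshold and then checking how far apart their positions are, can be illustrated with a small sketch. The function names, the toy vectors, and the 6 m gap used to flag a pair as fuzzy are assumptions made for illustration; only the idea of thresholding the Euclidean distance comes from the text.

```python
from itertools import combinations
from math import dist

def similar_pairs(fingerprints, threshold):
    """Index pairs whose fingerprint (RSS-space) distance is below the threshold."""
    return [(i, j) for i, j in combinations(range(len(fingerprints)), 2)
            if dist(fingerprints[i], fingerprints[j]) < threshold]

def fuzzy_pairs(pairs, positions, max_gap):
    """Among similar pairs, keep those whose physical positions are far apart."""
    return [(i, j) for i, j in pairs if dist(positions[i], positions[j]) > max_gap]

# Toy data: RSS vectors in dB and (x, y) positions in metres; all values are made up.
fingerprints = [[-50, -60], [-52, -61], [-51, -59], [-80, -40]]
positions = [(0, 0), (1, 0), (20, 5), (10, 10)]

pairs = similar_pairs(fingerprints, threshold=28.0)
print(pairs)                                         # [(0, 1), (0, 2), (1, 2)]
print(fuzzy_pairs(pairs, positions, max_gap=6.0))    # [(0, 2), (1, 2)]
```

In this toy set, sample 2 looks like samples 0 and 1 in signal space but lies about 20 m away, which is the situation the text calls fuzzy similarity.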

Removing Fingerprint Fuzzy Similarity
At present, much research focuses on the design of online matching algorithms and filtering optimization algorithms to improve the fingerprint positioning accuracy. A filter can smooth positioning results according to the previous results to avoid larger deviations. But if there are larger positioning errors, the filtering and fitting will lose efficacy.
Based on the cluster features of similar fingerprints, we provide a K-Means+ method to cluster the similar offline fingerprints, and we also design a linear sequence matching algorithm to locate outlier positions so as to increase the positioning accuracy.
Let C = (C_1, C_2, ..., C_K) denote the cluster set, where c_i is the center of the ith cluster. The traditional K-Means algorithm is a dynamic iterative clustering algorithm, but the parameter K must be determined in advance. In practical applications, the K value is difficult to determine, and this directly affects the result of the clustering algorithm. But the positions of similar fingerprints have significant regional features, so we can obtain the initial K value according to the indoor positioning area. Meanwhile, the traditional K-Means algorithm is sensitive to the outliers of clusters. As Figure 3 suggests, if two fingerprints have fuzzy similarity, computing the center points of clusters will incur a larger error. Therefore, we design the K-Means+ algorithm to use the center sampling point instead of the average fingerprint vector as the center of the cluster. Meanwhile, the clustering criterion function uses the maximum inherent fingerprint granularity error ε_max instead of the least sum of squared error as the clustering decision threshold, which reduces the number of iterations compared with traditional K-Means. The K-Means+ algorithm is described as follows.
Step 1. Divide the indoor corridors into segments of a fixed length L, and compute the number of segments K. We obtain K clusters, where the initial cluster center is the center position of each segment.
Step 2. Compute the fingerprint granularity between each sampling point and the cluster center, and add each sampling point whose granularity is less than ε_max to the corresponding cluster.
Step 3. Recompute the center of all sampling points in each cluster.
Step 4. Set the cluster radius to L/2, and repeat Steps 2-3 until the cluster center stays within a stable position range or the iteration threshold is reached.
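Steps 1 to 4 can be sketched in code as below. This is a minimal illustration under several assumptions, not the authors' implementation: the names `kmeans_plus`, `medoid`, and `nearest_sample` are hypothetical, the "center sampling point" is interpreted as the member with the smallest total physical distance to the other members, the first pass uses the fingerprint threshold while repeated passes use the physical radius L/2 (as the paper notes for repeated passes), and the corridor data are synthetic.

```python
from math import ceil, dist

def nearest_sample(samples, point):
    """Index of the sample whose position is closest to `point`."""
    return min(range(len(samples)), key=lambda i: dist(samples[i][0], point))

def medoid(samples, members):
    """Member with the smallest total physical distance to the other members."""
    return min(members,
               key=lambda i: sum(dist(samples[i][0], samples[j][0]) for j in members))

def kmeans_plus(samples, seeds, eps_max, radius, max_iter=10):
    """samples: list of (position, fingerprint); seeds: initial centre positions.
    Returns (clusters, outliers), both holding sample indices."""
    centers = [nearest_sample(samples, s) for s in seeds]        # Step 1
    clusters = []
    for it in range(max_iter):
        clusters = [[] for _ in centers]
        for i, (pos, fp) in enumerate(samples):
            for k, c in enumerate(centers):
                c_pos, c_fp = samples[c]
                if it == 0:
                    joins = dist(fp, c_fp) <= eps_max            # Step 2: fingerprint threshold
                else:
                    joins = dist(pos, c_pos) <= radius           # repeated passes: radius L/2
                if joins:
                    clusters[k].append(i)
        new_centers = [medoid(samples, members) if members else c   # Step 3
                       for members, c in zip(clusters, centers)]
        if it > 0 and new_centers == centers:                    # Step 4: centres are stable
            break
        centers = new_centers
    assigned = {i for members in clusters for i in members}
    outliers = [i for i in range(len(samples)) if i not in assigned]
    return clusters, outliers

# Toy corridor: 13 points one metre apart; all values are synthetic.
samples = [((float(x), 0.0), [-40.0 - 3 * x, -80.0 + 3 * x]) for x in range(13)]
samples.append(((30.0, 0.0), [-46.0, -74.0]))  # far away, but its fingerprint mimics x = 2
L = 6.0
K = ceil(12.0 / L)                             # number of corridor segments
seeds = [(L / 2 + k * L, 0.0) for k in range(K)]
clusters, outliers = kmeans_plus(samples, seeds, eps_max=10.0, radius=L / 2)
print(outliers)
```

In this toy run the distant sample joins a cluster on the first pass because of its fingerprint, and it is dropped once membership is decided by physical radius, so it ends up among the outliers.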
It is important to note that when repeating Step 2, we do not use ε_max to cluster sampling points; instead, we decide whether a sampling point joins the cluster by judging whether the upper bound of the distance between the sampling point and the cluster center exceeds the cluster radius L/2 given in Step 4. This helps to reduce the number of iterations. Therefore, the algorithm has a low time complexity of O(nKt), where n is the number of fingerprint samples, K is the number of clusters, and t is the number of iterations. The K-Means+ algorithm is highly suitable for processing large amounts of offline fingerprint data. The choice of L directly determines the number of clusters K and also determines whether the K-Means+ algorithm can guarantee that the clusters cover the indoor positioning area. Therefore, the K-Means+ algorithm includes a method for calculating L. The L value is the physical diameter of similar fingerprint clusters. We randomly select n_s sampling points as cluster centers and execute Steps 2 and 3 of the K-Means+ method once to construct similar fingerprint clusters. The average value of L can be computed from the clusters of the n_s sampling points, and then we can compute the number of clusters K from the average L. According to these precalculated values, the K-Means+ algorithm obtains the offline similar fingerprint clusters.

Algorithm 2: Linear sequence generating.
/* Initial position detection */
detect the initial position from R and initialize S /* R is the consecutive positioning results set, S is the sequence */
/* Sequence increase or decrease */
for all (p ∈ P_c) do
  /* P_c is the candidate position set */
  if (d_c ≤ l_avg + L/2) then
    IndirectAdd(p, S)
  end if
  if (d_c > l_avg + L/2) and (d_o < d_c) then
    DirectAdd(p, S)
  end if
end for
return S
After clustering the offline fingerprints, we adopt a layered KNN matching algorithm for positioning. First, we match the real-time fingerprint vector with the fingerprints of the cluster centers. Then we run exact matching within the clusters. The layered KNN matching algorithm can reduce the matching computational overhead and obtain a more accurate positioning result. However, this clustering method cannot position outliers. When the position of an outlier is computed, it is falsely matched to another cluster because outliers do not belong to any cluster. Although outliers are few, they still cause larger positioning errors. We therefore propose a linear sequence matching method to replace the traditional point matching.
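The two-layer matching described above can be sketched as follows. The function name `layered_knn`, the data layout, and the choice to return the mean position of the k best matches are assumptions for illustration; the text only states that the query is matched to cluster centers first and then matched exactly within the cluster.

```python
from math import dist

def layered_knn(query, centers, clusters, k=3):
    """centers: one centre fingerprint per cluster.
    clusters: one list of (position, fingerprint) pairs per cluster.
    Returns the mean position of the k best matches inside the chosen cluster."""
    # Layer 1: pick the cluster whose centre fingerprint is closest to the query.
    best = min(range(len(centers)), key=lambda c: dist(query, centers[c]))
    # Layer 2: rank only the fingerprints of that cluster.
    ranked = sorted(clusters[best], key=lambda s: dist(query, s[1]))[:k]
    xs = [pos[0] for pos, _ in ranked]
    ys = [pos[1] for pos, _ in ranked]
    return (sum(xs) / len(ranked), sum(ys) / len(ranked))

# Toy data; all values are made up.
centers = [[-50, -70], [-70, -50]]
clusters = [
    [((0, 0), [-48, -72]), ((1, 0), [-50, -70]), ((2, 0), [-52, -68])],
    [((10, 0), [-68, -52]), ((11, 0), [-70, -50]), ((12, 0), [-72, -48])],
]
print(layered_knn([-69, -51], centers, clusters, k=2))  # (10.5, 0.0)
```

Only the fingerprints of one cluster are ranked in the second layer, which is where the reduction in matching overhead comes from.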

Linear Sequence Matching.
Since the K-Means+ algorithm ignores the influence of outlier data, it is difficult to obtain accurate positioning results for outlier sampling points. We propose a linear sequence matching algorithm to replace the traditional point matching. In the process of positioning, we record the sequence of matching positions and set the length of the sequence to N. Since an offline fingerprint cluster covers a larger area, the interval between two adjacent positioning results usually does not exceed the scope of the cluster. The sequence generating process is described in Algorithm 2.
Step 1 (initial position detection). When there are m consecutive positioning results in the same cluster, we can determine that the initial position is in the current cluster. We use the KNN matching method to obtain the precise initial position and take the current position number as the linear sequence header. Otherwise, we continue to execute the positioning and detection. The value of m is chosen empirically from experiments.
Step 2 (sequence increase or decrease). The initial length of the sequence is 0. Position numbers are added to the sequence as fingerprint matching proceeds until the length of the sequence reaches N, where two consecutive positions can exist in the same cluster. When a new position number is added to the sequence, it must obey the following rules.
(i) Compute the average length l_avg of sequence segments by the sliding average method. Let d_c be the Euclidean distance between the end position of the sequence and the cluster center, and let d_o be the Euclidean distance between the end position of the sequence and the outlier position.
(ii) If d_c ≤ l_avg + L/2, we select the optimal fingerprint matching position in the cluster, insert it in the sequence, and delete the sequence header in order to keep the sequence length unchanged.
(iii) If d_c > l_avg + L/2 and d_o < d_c, we add the optimal matching outlier position to the sequence and delete the sequence header. In Figure 4, the dashed path represents the numbers that have been deleted, and the solid path represents the current sequence. Identical shapes stand for the corresponding positions of similar fingerprints. For example, at one point in the figure, the matching similar fingerprint cluster is represented by circles. The cluster center is farther than l_avg + L/2 from the sequence end point, and the outlier circle P fulfills d_o < d_c. So the outlier point P is added to the sequence.
(iv) Otherwise, keep the sequence unchanged until the next positioning is completed.
Step 3 (matching). The end position of the sequence is the current positioning result.
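The three steps above can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the sequence stores positions rather than position numbers, the sliding average is taken as the mean distance between consecutive positions currently in the sequence, and a bounded `deque` drops the header automatically once the sequence is full. The symbols l_avg, d_c, d_o, and L follow the rules in the text.

```python
from collections import deque
from math import dist

def initial_detected(cluster_ids, m):
    """Step 1: the last m positioning results fall in the same cluster."""
    return len(cluster_ids) >= m and len(set(cluster_ids[-m:])) == 1

def avg_segment(seq):
    """Rule (i): mean distance between consecutive positions in the sequence."""
    pts = list(seq)
    if len(pts) < 2:
        return 0.0
    return sum(dist(a, b) for a, b in zip(pts, pts[1:])) / (len(pts) - 1)

def update_sequence(seq, cluster_center, cluster_match, outlier_match, L):
    """Apply rules (ii)-(iv) to a bounded deque and return the current result."""
    end = seq[-1]
    l_avg = avg_segment(seq)
    d_c = dist(end, cluster_center)
    d_o = dist(end, outlier_match) if outlier_match is not None else float("inf")
    if d_c <= l_avg + L / 2:
        seq.append(cluster_match)     # rule (ii): best match inside the cluster
    elif d_o < d_c:
        seq.append(outlier_match)     # rule (iii): best matching outlier position
    # rule (iv): otherwise the sequence is left unchanged
    return seq[-1]                    # Step 3: end of the sequence is the result

# Toy walk along a corridor; all coordinates are made up.
seq = deque([(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)], maxlen=3)
r1 = update_sequence(seq, (5.0, 0.0), (4.5, 0.0), None, L=6.4)
r2 = update_sequence(seq, (20.0, 0.0), (19.0, 0.0), (6.0, 0.0), L=6.4)
r3 = update_sequence(seq, (30.0, 0.0), (29.0, 0.0), (40.0, 0.0), L=6.4)
print(r1, r2, r3)
```

In the second call the matched cluster is implausibly far from the end of the sequence, so the nearby outlier position is chosen instead; in the third call neither candidate is acceptable and the previous result is kept.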
The linear sequence matching method considers both physical distance and fingerprint distance to avoid the outlier failure problem of the K-Means+ matching algorithm. However, the accuracy of the initial position affects the correctness of the growing sequence. We verify the rationality of the initial position selection experimentally.

Experiment Design.
To verify the performance of the linear sequence K-Means+ clustering matching algorithm, we again use the 608 fingerprint records from the 38 m × 26 m open office area in Figure 1. We first analyze the impact of different cluster diameters L on the fingerprint granularity and positioning accuracy and find that the optimal L value is the mean diameter of the clusters generated by the maximum intrinsic fingerprint granularity error, which gives better clustering performance and reduces the outliers to improve the positioning accuracy. Secondly, we validate the rationality of the selection method of the initial position through experiments. Finally, we verify the influence of the linear sequence matching algorithm on positioning accuracy for different sequence lengths.

Performance Evaluation.
Before executing the K-Means+ method, we need to select n_s sampling points to compute the cluster diameter L. In our testbed, 5 sparse sampling points are selected and 8 APs are deployed, where the maximum intrinsic fingerprint granularity error ε_max is also 28 dB. Based on ε_max, we compute the average diameter of the 5 sampling clusters as 6.4 m and take 6.4 m as the optimal cluster diameter. To verify this selection, we set L to 3 m, 6.4 m, and 9 m, respectively, to compare the fingerprint granularity and positioning accuracy. Based on the L value, we first compute the K value as 42, 19, and 10. Then we execute the K-Means+ algorithm to obtain all similar fingerprint clusters, from which the outliers are removed. Based on the fingerprint data for holding the bottom position with 0° orientation and holding the upper position with 90° orientation, we recompute the fingerprint granularity distribution for the different K values. As shown in Figure 5, similar fingerprints decrease as the L value increases, because the outliers with fuzzy similarity are deleted. Meanwhile, the clustered fingerprint granularity for K = 42 and K = 19 is very close. This is because our K-Means+ algorithm performs no further iterations; we just use the maximum intrinsic fingerprint granularity error ε_max to construct similar fingerprint clusters. Although the K value is 42, the new cluster diameter computed by the K-Means+ algorithm is close to 6.4 m, which leads to many overlapping clusters. So the cluster numbers for K = 42 and K = 19 are almost the same, and the fingerprint granularity distributions are similar. When K = 10, the number of clusters obviously decreases, but the cluster diameter produced by the K-Means+ algorithm is still around 6.4 m. So the clusters cannot cover the whole positioning area, and many positions are deleted as outliers, which leads to fewer similar fingerprints.
We select 30 of the 76 sampling points in Figure 1 and run the K-Means+ algorithm 5 times to compare the positioning errors. In Figure 6, when K = 42 and K = 19, the outliers reduce significantly since the clusters almost cover the whole positioning area, so there are fewer larger errors than with K = 10. But there are still some positioning errors of nearly 6 m, because the deleted outlier positions cannot be positioned. When K = 10, the clustering results cannot cover the whole positioning area. Although similar fingerprints are reduced, the positioning performance does not improve.
When analyzing the influence of the number of clusters K on fingerprint granularity and positioning accuracy, we see similar performance for K = 42 and K = 19. The positioning error increases under K = 10 although the number of similar fingerprints has declined. To obtain a tradeoff between positioning performance and clustering overhead, we select K = 19 as the experimental value for the following tests. To address the larger errors remaining after K-Means+ clustering, we further verify the performance of the linear sequence matching algorithm. The initial position of the linear sequence is important to the matching performance. Generally, a user cannot move out of the clustering range immediately. According to the above test results, the cluster diameter is about 6.4 m, while the user stride is around 1.5 m. Therefore, successive positioning results hardly go beyond the range of the same cluster. In the following, we verify whether m adjacent positioning results belong to the same cluster, where m is set to 2, 3, and 4. For each trial, we measure 25 groups of positioning results and analyze the probability and accuracy of being located in the same cluster. As shown in Figure 7, the initial position of the linear sequence can be determined at once with a probability of 96 percent and 80 percent when m is 2 and 3, respectively. But the probability drops to 20 percent when m is 4, because more successive positions go beyond the range of the cluster. Although in that case the results located in the same cluster reach 100 percent, more positioning tests are needed, which is time-consuming. Considering the probability and accuracy together, we set m = 3 to determine the initial position of the linear sequence in our experiment.
Based on the above clustering results, we further verify the influence of the linear sequence length on outlier positioning. The sequence length affects the average sequence segment length l_avg. After setting N to 2, 4, and 6, respectively, we analyze the positioning errors of outliers. We select 5 outlier positions with 4 different orientations to execute linear sequence matching for the above 3 sequence lengths. The matching performance is illustrated in Figure 8.
Figure 8(a) describes the matching accuracy of outliers. The sequence length has little impact on matching accuracy; as the sequence length increases, the matching performance improves slightly. Figure 8(b) shows the average positioning error under wrong matching. The outlier positioning error decreases as the sequence length increases, but the decrease is not pronounced, and the error ranges from around 2 m to 4 m. To achieve higher positioning accuracy, a sequence length of 6 is the better choice. So we select N = 6 to further verify the linear sequence matching performance.
To verify the performance of our fuzzy similarity fingerprint eliminating algorithm, we compare the classical Radar method, the peer assisted method with 2 peers, K-Means+, and linear sequence matching in Figure 9, based on the fingerprint data for holding the bottom position with 0° and 90° orientations. We select 30 of the 76 sampling points and run each of the aforementioned algorithms 5 times. We find that the median error of our K-Means+ method is within 2 m and 90 percent of the positioning errors are within 3 m, which is better than 4 m for the Radar algorithm and 3.6 m for the peer assisted algorithm. In particular, larger errors are reduced significantly. The Radar algorithm has a maximum error of up to 8 m, while the K-Means+ algorithm lowers it to around 6 m. After linear sequence matching, the maximum error decreases to 5 m, which is also better than the maximum error of the peer assisted algorithm with two phones. This is because the relative ranging in the peer assisted algorithm hardly constrains the positioning outliers under human blocking. The proportion of larger errors in the positioning results also reduces significantly. Errors over 4 m account for 10 percent of the Radar results, while the K-Means+ algorithm reduces this share to 4 percent, and linear sequence matching further reduces it to 2 percent.
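The statistics quoted above (median error, the error bound covering 90 percent of results, maximum error, and the share of errors over 4 m) can be computed from a list of per-test errors as sketched below. The function name `summarize`, the nearest-rank percentile definition, and the sample values are assumptions for illustration and are not data from the paper.

```python
from math import ceil
from statistics import median

def summarize(errors, pct=90, large=4.0):
    """Median, nearest-rank percentile, maximum, and share of errors above `large`."""
    errs = sorted(errors)
    rank = ceil(len(errs) * pct / 100)   # nearest-rank percentile definition
    return {
        "median": median(errs),
        "percentile": errs[rank - 1],
        "max": errs[-1],
        "share_large": sum(e > large for e in errs) / len(errs),
    }

errors_m = [0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.5, 3.0, 3.5, 4.5]  # made-up values, metres
stats = summarize(errors_m)
print(stats)
```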

Conclusion
Fingerprint positioning is the foundation of indoor location services and has been widely studied. We analyzed the position distribution of similar fingerprints and identified the problem of larger errors. By analyzing the distribution of the positions corresponding to similar fingerprints, we also found that fuzzy similarity is the root cause of the larger errors. According to the cluster features of similar fingerprints, we proposed a K-Means+ clustering algorithm to achieve fine-grained fingerprint positioning. Because the K-Means+ algorithm fails to locate the positions of outliers, we also designed a linear sequence matching algorithm to

Figure 1: The distribution of sampling points in the experiment.

Figure 2: Influences of orientations and holding positions.

Figure 3: The distribution of similar fingerprints at four sampling points.

Figure 4: A diagram of generating linear sequence.

Figure 5: Fingerprint granularity distribution under different numbers of clusters.

Figure 6: Positioning errors under different numbers of clusters.
Clustering Offline Fingerprints.
The traditional K-Means algorithm is a classic machine learning method based on sample similarity measurement. The criterion function is usually the least sum of squared error. Suppose an n-dimensional real vector x = (x_1, x_2, ..., x_n) describes a fingerprint sample; the fingerprint similarity is given by the Euclidean distance between fingerprint vectors. The traditional K-Means algorithm divides the sample set into K clusters according to the preset parameter K.
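For reference, the least sum-of-squared-error criterion mentioned above is commonly written as follows. This is the standard textbook form of the K-Means objective, given here as an aid and not as the paper's own displayed formula; the notation is an assumption, with C_i denoting the ith cluster, c_i its center (the mean of its members in traditional K-Means), and x a fingerprint vector.

```latex
J = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - c_i \rVert^{2},
\qquad
c_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x .
```

Because c_i is a mean, a single distant member shifts it noticeably, which is the sensitivity to outliers that motivates replacing the mean with a center sampling point.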