Robust Sub-Meter Level Indoor Localization With a Single WiFi Access Point: Regression Versus Classification

Precise indoor localization is an increasingly important requirement for various emerging applications, such as virtual/augmented reality and personalized advertising. Current indoor environments are equipped with multiple WiFi access points (APs), and their deployment is expected to become massive in the future, enabling highly precise localization approaches. Although conventional model-based localization schemes have achieved sub-meter level accuracy by fusing multiple channel state information (CSI) observations, the corresponding computational overhead is usually significant, especially in current multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. To address this issue, model-free localization techniques using deep learning frameworks have recently been proposed, mainly based on classification methods. In this paper, instead of a classification-based mechanism, we propose a logistic regression based scheme within the deep learning framework, combined with Cramér-Rao lower bound (CRLB) assisted robust training. The proposed scheme achieves more robust sub-meter level accuracy (0.97 m median distance error) in a standard laboratory environment and maintains reasonable online prediction overhead with a single WiFi AP.


I. INTRODUCTION
Precise indoor localization, a rising demand in our daily lives, brings a brand-new navigation experience [2] to modern shopping malls and exhibition halls. Since traditional outdoor positioning systems, such as Global Navigation Satellite Systems (GNSS) [3], suffer from satellite signal blockage indoors, newly deployed infrastructure is often required to achieve high-resolution indoor localization. Typical examples include sound or ultrasonic collection systems [4], Bluetooth Low Energy (BLE) systems [5], radio frequency identification (RFID) receivers and tags [6], infrared equipment [7], or even hybrids of them.
Owing to their extremely low deployment cost, WiFi access points (APs) are massively deployed indoors for data transmission, and have recently been utilized to perform high-resolution indoor localization, as illustrated in [8]-[12]. Compared with the aforementioned techniques, the additional deployment cost is usually negligible, and the main challenge nowadays lies in highly accurate localization algorithms. Among the existing approaches, fingerprint-based schemes have proven to be an effective solution, where the intrinsic features of WiFi signals are extracted in the training stage and used in the operating stage to predict the location from real-time measured signals. "HORUS", a typical fingerprint-based localization system proposed in [8], relies on the received signal strength indication (RSSI) to generate signal features. More accurate localization schemes have been proposed in [9]-[12], where the real-time channel state information (CSI) is measured and processed instead to improve the localization accuracy. For example, SpotFi [9] and Chronos [10] extract propagation parameters from the CSI, including angle of departure (AOD), angle of arrival (AOA), and time of flight (TOF), to compute the relative locations with respect to the reference APs. Another common approach establishes a probabilistic model between the collected CSIs and the candidate locations through classifiers such as deterministic k-nearest neighbor (KNN) clustering and probabilistic Bayes rule algorithms [11], [12]. The above fingerprint-based solutions are able to achieve sub-meter level accuracy if CSIs from multiple APs [9], multiple frequency bands [10] or multiple antennas [12] can be fused together. However, the corresponding computational overhead of offline modeling and online feature extraction is usually significant, as shown in [13].
Apart from the above model-based approaches, model-free localization schemes have also been widely investigated in recent years, especially since the rise of deep learning. With controllable online prediction overhead, a model-free approach directly estimates the position in the operating (online) stage based on the relations between the collected CSIs and the labelled locations learned in the training (offline) stage. Typical classification algorithms, including the restricted Boltzmann machine (RBM) [13], [14], convolutional neural networks (CNN) [15], and deep residual networks (ResNet) [16], have been applied to exploit CSI features and classify them into different reference positions (RPs) with certain probabilities. After fusing the classification results, the resulting localization accuracy is significantly better than that of conventional model-based approaches, ranging from 1.78 m to 0.89 m in terms of median distance error (MDE) [13]-[16]. The model-free localization scheme partially solves the computational complexity issue, but the localization accuracy is still insufficient for many indoor applications, especially when the infrastructure is limited. In this paper, we consider a standard laboratory environment with a single WiFi AP and propose a logistic regression based solution [17] instead of the commonly adopted classification-based scheme. Since the regression-based scheme directly models the continuous localization function, it achieves sub-meter level accuracy (0.97 m MDE) in an 8 m × 6 m room. In addition, based on the proposed framework, we derive a lower bound on the localization error using the Cramér-Rao lower bound (CRLB) [18] and find that a small perturbation in the training stage can eventually help reduce the localization error.
We hope the proposed logistic regression based framework can shed some light on both model-free and model-based localization techniques and pave the way for deep learning based localization algorithms in practical WiFi MIMO-OFDM systems. The main contributions of this paper are listed below.
• Regression versus Classification. A straightforward idea for solving the localization problem with deep learning is to extract features in the operating stage and compare them with the features of RPs pre-collected in the training stage. A classification process is then applied to calculate the similarity with respect to the different predefined RPs. This approach greatly reduces the computational resources needed for online feature comparison, but the localization accuracy suffers from the limited training space offered by the finite number of RPs. To achieve a better trade-off between localization accuracy and online inference capability, a reasonable approach is to expand the finite location set of RPs to the continuous set of the entire room space, where a logistic regression method can be applied.
• Unified Optimization Framework. To provide a detailed understanding of the proposed scheme, we establish a general mapping relationship between the real-time measured CSI and the corresponding locations according to the parametric system model. On this basis, we introduce a unified optimization framework that formulates WiFi fingerprint localization in both classification and regression forms. Based on this framework, we explain why the logistic regression based approach achieves better localization accuracy than traditional classification-based approaches, and discuss the potential impacts of different system configurations.
• CRLB Assisted Robust Training. Based on the proposed framework, we conduct extensive CRLB analysis to obtain an in-depth understanding of the localization errors in the proposed system. In addition, we show through CRLB analysis that a small perturbation in the training stage can help accommodate the randomness induced by temporal and spatial variations, which eventually improves the robustness of the proposed scheme. Therefore, a more robust training strategy is to construct the training dataset for each RP using CSIs collected from this RP and its neighboring areas. As we show through extensive numerical experiments, the proposed CRLB assisted robust training method improves the localization accuracy by about 30% compared with conventional training strategies.
The rest of the paper is organized as follows. In Section II, we provide background on the channel model and the mapping functions. The regression-based localization formulation is discussed in Section III, and the corresponding CRLB derivation is provided in Section IV. We present the classification and logistic regression based solutions in Section V and our experimental results in Section VI. Finally, we conclude the paper in Section VII.

II. PRELIMINARIES
In this section, we introduce a multipath MIMO-OFDM channel model for the indoor localization environment and then establish the relationship between the channel fingerprints and location information.

A. CHANNEL MODEL
Consider a MIMO-OFDM system with an $N_T \times N_R$ antenna configuration¹ as shown in Fig. 1, where $N_T$ and $N_R$ represent the numbers of transmit and receive antennas, respectively. The received signal at the $i$-th subcarrier and the $n$-th OFDM symbol can be modeled as

$$\mathbf{y}_i(L, n) = \mathbf{H}_i(L, n)\,\mathbf{x}_i(L, n) + \mathbf{n}_i(L, n),$$

where $L$ is the target location, $\mathbf{y}_i(L, n) \in \mathbb{C}^{N_R}$ and $\mathbf{x}_i(L, n) \in \mathbb{C}^{N_T}$ denote the received and transmitted signals, and $\mathbf{n}_i(L, n) \in \mathbb{C}^{N_R}$ denotes the additive white Gaussian noise. $\mathbf{H}_i(L, n) \in \mathbb{C}^{N_R \times N_T}$ denotes the corresponding CSI, and the overall aggregated channel response over all $N_{sc}$ subcarriers is

$$\mathbf{H}(L, n) = \big[\mathbf{H}_1(L, n), \mathbf{H}_2(L, n), \ldots, \mathbf{H}_{N_{sc}}(L, n)\big] \in \mathbb{C}^{N_R \times N_T \times N_{sc}}.$$

For illustration purposes, we assume that uniform linear arrays (ULAs) with inter-antenna spacing $d$ are equipped at the transmitter and receiver sides, and that all antenna arrays lie in the same plane, as illustrated in Fig. 1. In addition, we consider $K + 1$ fading paths in this environment, and the channel response $\mathbf{H}_i(L, n)$ is assumed to remain constant during the transmission of the $n$-th OFDM symbol. It is given by the geometric multipath model of [19], expressed through the angular domain correlation matrices $\mathbf{A}_{R/T,i}(L) \in \mathbb{C}^{N_{R/T} \times (K+1)}$. As shown in Fig. 1, we define $\theta_{T,k}$, $\theta_{R,k}$, $d_k$ and $L^s_k$ as the AOD, AOA, path length and unknown scatterer location of the $k$-th path². The model further involves the channel coefficient $h_k$ of the $k$-th path and the sampling period $T_s$.

¹ The proposed approach is equally applicable to single-antenna users by extending the received signals to $N_R$ copies.
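To make the dependence of $\mathbf{H}_i(L, n)$ on the path geometry concrete, the sketch below synthesizes CSI from a generic per-subcarrier multipath model with half-wavelength ULAs. It is only an illustration under stated assumptions: the steering-vector convention, the unit-norm normalization, the subcarrier spacing `delta_f` (312.5 kHz for a 20 MHz 802.11n channel) and the helper names `ula_steering` and `synth_csi` are ours and are not taken from [19].

```python
import numpy as np


def ula_steering(n_ant, theta, d_over_lambda=0.5):
    """Unit-norm ULA response for angle theta (radians, measured from broadside)."""
    k = np.arange(n_ant)
    return np.exp(-2j * np.pi * d_over_lambda * k * np.sin(theta)) / np.sqrt(n_ant)


def synth_csi(h, tau, theta_t, theta_r, n_t, n_r, n_sc, delta_f):
    """Return hypothetical CSI of shape (n_sc, n_r, n_t).

    Subcarrier i sums over paths k:
        h_k * exp(-j 2 pi i delta_f tau_k) * a_R(theta_R,k) a_T(theta_T,k)^H
    """
    H = np.zeros((n_sc, n_r, n_t), dtype=complex)
    for hk, tk, tt, tr in zip(h, tau, theta_t, theta_r):
        outer = np.outer(ula_steering(n_r, tr), ula_steering(n_t, tt).conj())
        phase = np.exp(-2j * np.pi * np.arange(n_sc) * delta_f * tk)  # delay -> per-subcarrier phase
        H += phase[:, None, None] * hk * outer
    return H


# One direct path: 10 ns delay, AOD 0.2 rad, AOA -0.3 rad, 3 x 3 antennas, 30 subcarriers
H = synth_csi([1.0], [10e-9], [0.2], [-0.3], n_t=3, n_r=3, n_sc=30, delta_f=312.5e3)
```

With a single path, each per-subcarrier matrix is rank one and, because the steering vectors are unit-norm, has unit Frobenius norm when $|h_0| = 1$; the delay only rotates the phase from one subcarrier to the next.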

B. FUNCTION MAPPING RELATIONSHIP
Due to the multipath effect, the channel matrix $\mathbf{H}(L, n)$ is affected not only by the location $L$ but also by the indoor environment (object placement or human movement). That is, a given location $L$ corresponds to a set of channel matrices $\{\mathbf{H}(L, n)\}$ as its fingerprints, which is a one-to-many mapping. We denote the relationship between the fingerprints and the location $L$ by the function $f(\cdot)$, i.e.,

$$\{\mathbf{H}(L, n)\}_{n=1}^{N} = f(L),$$

where $N$ denotes the total number of OFDM symbols collected at each localization position. A natural question is whether $\mathbf{H}(L, n)$ can be mapped back to a specific location $L$, and under what condition the inverse function exists.
Hypothesis 1: When perfect channel knowledge $\mathbf{H}(L, n)$ is obtained, the observed CSI can be mapped to a specific position.
Proof of Hypothesis 1: Please refer to Appendix A for the proof.
In the conventional model-based approach, one seeks a closed-form relationship between $\mathbf{H}(L, n)$ and $L$ by exploiting AOA and TOF features, which is generally complicated. In the model-free scheme, by contrast, we directly characterize the localization function $g(\cdot)$ and learn a better approximation of it from the CSIs and locations collected in the training stage.
We make the following assumptions in the rest of the paper. Firstly, we assume the availability of perfect CSI measurements, leaving extensions to imperfect CSI (due to, e.g., imperfect hardware components and limited pilot power), as well as low noise cases, for future work. Secondly, to control the deployment complexity, we only collect CSIs at some discrete RPs instead of sampling the entire fading environment. Thirdly, we assume that re-reflected signals (secondary reflections) are too weak to be considered in the channel model. Last but not least, since the mathematical representation of the median distance error is in general complicated, we instead use the mean distance error (MDE) [20] as the performance metric and as the basis of the loss function design in the training stage.

III. PROBLEM FORMULATION
In this section, we apply a general optimization framework to describe the localization problem. Denote by $L_m$ and $\hat{L}_m$ the ground-truth and predicted locations of the $m$-th target, respectively. The corresponding MDE over $M$ sampling positions is given by

$$\mathrm{MDE} = \frac{1}{M}\sum_{m=1}^{M} \big\| L_m - \hat{L}_m \big\|_2,$$

where $\|\cdot\|_2$ represents the vector $\ell_2$ norm as defined in [17]. For convenience, we denote the inverse function of $f(\cdot)$ by $g(\cdot)$, i.e., $g(\cdot) = f^{-1}(\cdot)$, so that the location estimate is

$$\hat{L}_m = g\big(\mathbf{H}(L_m, n_m)\big).$$

With the above notation, the MDE minimization problem can be described by the following optimization framework.

Problem 1:
$$\min_{g(\cdot)} \ \frac{1}{M}\sum_{m=1}^{M} \big\| L_m - g\big(\mathbf{H}(L_m, n_m)\big) \big\|_2 \quad \text{s.t.} \quad L_m \in \mathcal{A},\ n_m \in [1, N],$$

where $\mathcal{A}$ represents the feasible indoor localization area and $n_m \in [1, N]$ denotes the duration of the $m$-th localization period with $N$ observed OFDM symbols.
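The MDE above, and the median distance error reported in the experiments, can be computed as in the following sketch (the function names are illustrative). The toy numbers show why the two metrics differ: a single large error inflates the mean but not the median.

```python
import numpy as np


def distance_errors(true_xy, pred_xy):
    """Per-sample Euclidean (l2) distance between true and predicted 2-D positions."""
    diff = np.asarray(true_xy, dtype=float) - np.asarray(pred_xy, dtype=float)
    return np.linalg.norm(diff, axis=1)


def mean_distance_error(true_xy, pred_xy):
    return float(np.mean(distance_errors(true_xy, pred_xy)))


def median_distance_error(true_xy, pred_xy):
    return float(np.median(distance_errors(true_xy, pred_xy)))


truth = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
preds = [[3.0, 4.0], [1.0, 1.0], [2.0, 3.0]]   # per-sample errors: 5 m, 0 m, 1 m
print(mean_distance_error(truth, preds))        # 2.0
print(median_distance_error(truth, preds))      # 1.0
```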
Since the above minimization needs to be evaluated over all possible choices of $g(\cdot)$, conventional classification-based approaches decompose the original problem into two stages: the first stage computes the likelihoods with respect to several RPs, and the second stage applies basic fusion techniques to obtain the final result. The corresponding formulation is given below.

Problem 2:
$$\min_{g_1(\cdot),\, g_2(\cdot)} \ \frac{1}{M}\sum_{m=1}^{M} \big\| L_m - g_1\big(\mathbf{p}_m(n_m), \mathcal{L}_{RP}\big) \big\|_2 \quad \text{s.t.} \quad \mathbf{p}_m(n_m) = g_2\big(\mathbf{H}(L_m, n_m)\big),$$

where $\mathcal{L}_{RP}$ and $\mathbf{p}_m(n_m)$ denote the locations of all the possible RPs and the likelihood distribution with respect to $\mathcal{L}_{RP}$ based on the $n_m$-th OFDM symbol, respectively. $N_{RP}$ represents the number of RPs during the localization process³.
In the formulation of Problem 2, $g(\cdot)$ is decomposed into two simpler functions, $g_1(\cdot)$ and $g_2(\cdot)$, and the existing literature focuses on modeling $g_2(\cdot)$ as a typical classification problem. $g_1(\cdot)$ usually adopts simple averaging or Kalman filtering [21] based techniques to fuse multiple classification results. Through this approach, the search space of candidate locations, as well as the corresponding computational complexity, can be greatly reduced, e.g., from the entire feasible area $\mathcal{A}$ in Problem 1 to the finite set $\mathcal{L}_{RP}$ of size $N_{RP}$. However, this decomposition sacrifices localization accuracy by restricting the candidate location set to the finite RPs and their trivial combinations. A more reasonable approach is to directly model the function $g(\cdot)$ using the logistic regression concept [17], which leads to Problem 3, where logistic regression is used to approximate the original non-convex function $g(\cdot)$ for MDE minimization.

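Problem 3 (Regression based Localization):
$$\min_{g_{LR}(\cdot)} \ \frac{1}{M}\sum_{m=1}^{M} \big\| L_m - g_{LR}\big(\mathbf{H}(L_m, n_m)\big) \big\|_2 \quad \text{s.t.} \quad L_m \in \mathcal{A},\ n_m \in [1, N],$$
where $g_{LR}(\cdot)$ denotes the associated regression function.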
Since it searches over a larger optimization space of functions $g(\cdot)$, the logistic regression based scheme should be able to achieve better localization accuracy than the classification-based scheme. To limit the processing complexity of evaluating different functions $g(\cdot)$, traditional schemes usually rely on Gaussian regression, which fits the approximating function by calculating means and variances. However, the Gaussian regression approach suffers from robustness issues, and a model-free deep learning based localization approach is preferable, as elaborated in [22]. To control the associated deployment complexity, we further tighten constraint (9) to

$$L_m \in \mathcal{L}_{RP}, \quad m = 1, \ldots, M.$$

Note that this approximation improves as the number of RPs, $N_{RP}$, increases, which provides a meaningful trade-off between implementation complexity and localization accuracy⁴.
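The accuracy cost of restricting outputs to a finite RP set can be quantified in the simplest hard-decision case. Assuming a square RP grid with spacing $g$, a target uniformly distributed over the area, and a classifier that always answers with the nearest RP, the error is the distance from a uniform point in a $g \times g$ cell to its center, whose mean is $g\,(\sqrt{2} + \ln(1+\sqrt{2}))/6 \approx 0.383\,g$. The sketch checks this by Monte Carlo. It ignores the soft fusion in $g_1(\cdot)$, which can interpolate between RPs, so it illustrates the argument above rather than bounding any particular classifier; the function names are illustrative.

```python
import numpy as np


def nearest_rp_error_mc(grid, n_samples=200_000, seed=0):
    """Monte Carlo mean error of always answering with the nearest RP on a square grid."""
    rng = np.random.default_rng(seed)
    # Offsets of uniformly drawn true positions from the centre of their cell (the nearest RP)
    offsets = rng.uniform(-grid / 2, grid / 2, size=(n_samples, 2))
    return float(np.mean(np.linalg.norm(offsets, axis=1)))


def nearest_rp_error_exact(grid):
    """Closed-form mean distance from a uniform point in a grid x grid square to its centre."""
    return grid * (np.sqrt(2.0) + np.log(1.0 + np.sqrt(2.0))) / 6.0


for g in (0.6, 1.2, 1.8):
    print(g, round(nearest_rp_error_mc(g), 3), round(nearest_rp_error_exact(g), 3))
```

For the 1.2 m grid used in Section VI this hard-decision floor is roughly 0.46 m, before any measurement noise is considered.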

IV. CRLB FOR LOCALIZATION ERROR
In this section, we present the CRLB of the problem formulated in Section III. The CRLB is widely used to lower-bound the variance of unbiased estimators [23]. The Fisher information matrix (FIM), as defined in [24], is utilized to evaluate the CRLB of the localization error, and the effect of perturbation is analyzed thereafter.

A. CRLB OF LOCALIZATION ERROR
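In the above formulation, the $m$-th location corresponds to a 2-D coordinate⁵, e.g., $L_m = [x_m, y_m]^T$. Let $\boldsymbol{\eta} = [\boldsymbol{\eta}_0^T, \boldsymbol{\eta}_1^T, \ldots, \boldsymbol{\eta}_K^T]^T$ be the collection of unknown parameters for the $K + 1$ channel fading paths. For the $k$-th fading path, $\boldsymbol{\eta}_k \in \mathbb{R}^{N_\eta}$ represents $N_\eta$ unknown fading parameters, including delay, angles and channel coefficients. Mathematically, it can be expressed as

$$\boldsymbol{\eta}_k = \big[\tau_k, \boldsymbol{\theta}_k^T, \mathbf{h}_k^T\big]^T,$$

where $\tau_k$ denotes the delay of the $k$-th path, $\boldsymbol{\theta}_k = [\theta_{T,k}, \theta_{R,k}]^T$ collects the AOD and AOA, and $\mathbf{h}_k = [h_{R,k}, h_{I,k}]^T$ contains the real and imaginary parts of the channel coefficient. If $\hat{\boldsymbol{\eta}}$ is an unbiased estimator of $\boldsymbol{\eta}$, its covariance matrix satisfies

$$\mathrm{Cov}(\hat{\boldsymbol{\eta}}) \succeq \mathbf{J}_{\boldsymbol{\eta}}^{-1},$$

where $\mathbf{J}_{\boldsymbol{\eta}}$ is the FIM for $\boldsymbol{\eta}$. According to the definition of the FIM [24], $\mathbf{J}_{\boldsymbol{\eta}}$ is composed of the blocks $\boldsymbol{\Psi}(\boldsymbol{\eta}_k, \boldsymbol{\eta}_{k'})$, $k, k' = 0, \ldots, K$, where

$$\boldsymbol{\Psi}(\boldsymbol{\eta}_k, \boldsymbol{\eta}_{k'}) = \mathbb{E}_{\mathbf{y}|\boldsymbol{\eta}}\left[-\frac{\partial^2 \ln p(\mathbf{y}|\boldsymbol{\eta})}{\partial \boldsymbol{\eta}_k\, \partial \boldsymbol{\eta}_{k'}^T}\right].$$

According to [25], the likelihood function of the received signal $\mathbf{y}$ conditioned on $\boldsymbol{\eta}$ can be written as

$$p(\mathbf{y}|\boldsymbol{\eta}) \propto \exp\left\{-\frac{1}{N_0}\sum_{n=1}^{N}\sum_{i=1}^{N_{sc}} \big\|\mathbf{y}_i(L, n) - \boldsymbol{\mu}_i(L, n)\big\|_2^2\right\},$$

where $\boldsymbol{\mu}_i(L, n) = \mathbf{H}_i(L, n)\mathbf{x}_i(L, n)$ is the noise-free received signal, $N_0$ denotes the noise power, and $\propto$ denotes equality up to irrelevant constants.

⁴ When the number of RPs, $N_{RP}$, tends to infinity, we are able to characterize the function $g(\cdot)$ in the target area $\mathcal{A}$ with probability 1 by learning the function $g_{LR}(\cdot)$. However, in practical systems, we observe that the localization accuracy saturates when $N_{RP}$ exceeds some threshold value.
⁵ We focus on the 2-D position case in this paper.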
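To obtain the CRLB of the proposed localization scheme, we transform the parameter vector $\boldsymbol{\eta}$ into $\tilde{\boldsymbol{\eta}} = [\tilde{\boldsymbol{\eta}}_0^T, \tilde{\boldsymbol{\eta}}_1^T, \ldots, \tilde{\boldsymbol{\eta}}_K^T]^T$, which includes the scatterer locations $\{\mathbf{s}_k\}$ of the $K$ non-direct fading paths, and where $\tilde{\boldsymbol{\eta}}_0$, whose first two entries are the target location $L_m$, is the modified parameter vector for the direct path. Meanwhile, the FIM for $\tilde{\boldsymbol{\eta}}$ can be obtained through the transformation matrix $\mathbf{T} = \partial \boldsymbol{\eta}^T / \partial \tilde{\boldsymbol{\eta}}$, which gives

$$\mathbf{J}_{\tilde{\boldsymbol{\eta}}} = \mathbf{T}\,\mathbf{J}_{\boldsymbol{\eta}}\,\mathbf{T}^T.$$

Based on this transformation, the error covariance matrix of the target location $L_m$ can be bounded as

$$\mathbb{E}\big[(\hat{L}_m - L_m)(\hat{L}_m - L_m)^T\big] \succeq \big[\mathbf{J}_{\tilde{\boldsymbol{\eta}}}^{-1}\big]_{2\times 2},$$

where $[\cdot]_{2\times 2}$ denotes the upper-left $2 \times 2$ submatrix. As a result, the CRLB of the localization error of the proposed scheme is given by the following theorem.

Theorem 1: For any unbiased estimator $\hat{L}_m$, the localization error satisfies

$$\sqrt{\mathbb{E}\big[\|\hat{L}_m - L_m\|_2^2\big]} \ \ge\ \epsilon_{\boldsymbol{\eta}} \triangleq \sqrt{\mathrm{tr}\Big(\big[(\mathbf{T}\mathbf{J}_{\boldsymbol{\eta}}\mathbf{T}^T)^{-1}\big]_{2\times 2}\Big)}.$$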
Proof of Theorem 1: Please refer to Appendix B for the proof.
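The FIM transformation followed by the upper-left $2\times 2$ extraction can be illustrated on a deliberately small toy model: a single direct path whose range $d$ and angle $\theta$ are observed with independent Gaussian errors of standard deviations $\sigma_d$ and $\sigma_\theta$. This is not the multipath FIM derived above; the toy FIM and the function name are assumptions. Here $\boldsymbol{\eta} = (d, \theta)$, the location is the new parameter, and $\mathbf{T} = \mathbf{G}^T$ with $\mathbf{G} = \partial\boldsymbol{\eta}/\partial L$. Since only the location remains, the whole inverse is the $2\times 2$ position block, and the bound has the closed form $\sqrt{\sigma_d^2 + d^2\sigma_\theta^2}$, which the code reproduces.

```python
import numpy as np


def toy_position_error_bound(ap_xy, target_xy, sigma_d, sigma_theta):
    """sqrt(tr(J_L^{-1})) for a direct path observed through range d and angle theta."""
    dx, dy = np.subtract(target_xy, ap_xy)
    d = np.hypot(dx, dy)
    # FIM of eta = (d, theta) for independent Gaussian observations
    J_eta = np.diag([1.0 / sigma_d**2, 1.0 / sigma_theta**2])
    # G = d eta / d L: rows (d, theta), columns (x, y); theta = atan2(dy, dx)
    G = np.array([[dx / d, dy / d],
                  [-dy / d**2, dx / d**2]])
    T = G.T                       # T = d eta^T / d L, so that J_L = T J_eta T^T
    J_L = T @ J_eta @ T.T
    return float(np.sqrt(np.trace(np.linalg.inv(J_L))))


peb = toy_position_error_bound((0.0, 0.0), (3.0, 4.0), sigma_d=0.1, sigma_theta=0.02)
print(peb)   # matches sqrt(0.1**2 + 5**2 * 0.02**2) = sqrt(0.02)
```

The angular term grows with distance, which is why the same angle accuracy yields a looser bound for targets far from the AP.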

B. DATA AUGMENTATION WITH PERTURBATION
As mentioned before, a straightforward way to improve the localization accuracy is to increase $N_{RP}$, which generally requires careful measurement and labeling in the offline training stage. To control the deployment complexity, we propose a perturbation-based data augmentation scheme, where the perturbation distance $\|\Delta L\|_2$ is much smaller than the localization distance $\|L_m\|_2$, i.e., $\|\Delta L\|_2 \ll \|L_m\|_2$. Based on this perturbation, we re-derive the CRLB of the localization error as summarized below.
Proposition 1 (CRLB with Perturbation): The CRLB of the localization error with perturbation is determined by $\mathbf{J}_{\boldsymbol{\eta}}$, which can be calculated through (22), and by $\mathbf{J}_{\Delta L}$, which is defined in (C.5).
Proof of Proposition 1: Please refer to Appendix C for the proof.
VOLUME 4, 2016
Since $\mathrm{tr}\big([\mathbf{J}^{-1}_{\Delta L, p}]_{2\times 2}\big) \le 0$ holds, the CRLB with perturbation is no larger than that without perturbation; that is, the perturbation operation lowers the CRLB. Therefore, the MDE minimization problem is rewritten as follows.
Problem 4 (Logistic Regression with Augmentation) where α ∈ [−1, 1] denotes a fine-tuning coefficient and will be determined in the training stage.
With the above formulation, we numerically evaluate how the position perturbation affects the CRLB; the corresponding numerical results are given in Section VI-D.
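The direction of the effect in Proposition 1 rests on a standard matrix fact: if extra information enters as a positive semidefinite term added to a positive definite FIM, then $(\mathbf{J} + \mathbf{J}_{+})^{-1} \preceq \mathbf{J}^{-1}$, so the trace of any principal block, in particular the upper-left $2\times 2$ position block, cannot increase. The sketch checks this numerically on random matrices; modelling the perturbed samples as such an additive term is a simplifying assumption of ours, not the exact form of (C.5).

```python
import numpy as np


def position_crlb_trace(J):
    """tr([J^{-1}]_{2x2}): the position part of the CRLB for an FIM J."""
    return float(np.trace(np.linalg.inv(J)[:2, :2]))


rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
J = A.T @ A + 0.1 * np.eye(5)     # random positive definite FIM
B = rng.standard_normal((2, 5))
J_extra = B.T @ B                 # extra positive semidefinite information (rank 2)

before = position_crlb_trace(J)
after = position_crlb_trace(J + J_extra)
print(before, after)              # after <= before
```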

V. DEEP LEARNING BASED SOLUTION
In this section, we adopt classification and logistic regression based approaches to minimize the MDE defined in Section III and design a neural network architecture for each of them. Several difficulties arise. Firstly, in fluctuating wireless environments, the sampled channel states contain unknown random noise, which may significantly degrade the estimation accuracy of neural networks. Secondly, the design methodology for logistic regression based localization is still unclear in the existing literature. Last but not least, deep learning based schemes usually require a huge amount of data to train the neural network parameters, which may incur significant overhead in practical deployment. To address these three challenges, we introduce the proposed localization scheme in detail in this section.

A. DATA COLLECTION AND CLEANING
Network interface cards (NICs) such as the Qualcomm Atheros AR series and the Intel 5300 make it possible to collect CSI data, and the Linux 802.11n CSI Tool [26] is the most widely used among the major CSI measurement tools. Consider the WiFi localization system illustrated in Fig. 2, where the localization entity is a laptop equipped with an Intel 5300 NIC and multiple receive antennas. The localization entity works on the WiFi signals received in real time from an off-the-shelf AP with multiple transmit antennas. Rather than generating data from numerical simulations, we collect the CSI data under

FIGURE 2:
The system architecture. The whole process is divided into an offline phase and an online phase, and the amplitude and phase of the collected CSI data are used as training and test data.
symbol basis with a duration of 3.2 µs according to the IEEE 802.11n standard. The training dataset contains 60000⁶ transmitted packets with a packet interval of 4 ms, i.e., various channel conditions over 4 minutes in the experimental environment are logged in the training dataset. Furthermore, the CSI data extracted by the Linux 802.11n CSI Tool are transformed into polar coordinates for convenient processing, i.e., $h_i(L, n) = |h_i(L, n)|\,e^{j\theta_i(L, n)}$, where $|h_i(L, n)|$ and $\theta_i(L, n)$ denote the amplitude and phase, respectively, and $j$ represents the imaginary unit.
In practical systems, the measured phase, e.g., $\hat{\theta}_i(L, n)$ for subcarrier $i$, cannot be directly used for highly accurate localization due to random jitter and noise caused by imperfect hardware components. To eliminate this effect, we adopt the common phase calibration algorithm proposed in [27], whose model involves the size $N_{FFT}$ of the fast Fourier transform (FFT)⁷, the time lag $\delta$ at the receiver side, and the unknown random measurement noise $Z$.

⁶ To accelerate model training, we installed Keras on our server with an Intel(R) Xeon(R) CPU E5-3680 and an NVIDIA Tesla P100 GPU.
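The sketch below shows a widely used linear phase sanitization in the spirit of the calibration step above; it is not necessarily identical to [27]. A receiver time lag $\delta$ adds a phase slope proportional to the subcarrier index divided by $N_{FFT}$, and a common offset adds a constant, so both are removed by unwrapping the phase across subcarriers and subtracting a least-squares line. We assume the 30 grouped subcarrier indices that the Intel 5300 reports for a 20 MHz channel; the function names are illustrative.

```python
import numpy as np

# Assumed subcarrier indices of the 30 CSI groups reported by the Intel 5300 (20 MHz)
SUBCARRIERS = np.array([-28, -26, -24, -22, -20, -18, -16, -14, -12, -10, -8, -6, -4, -2, -1,
                        1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 28])


def to_polar(csi):
    """Split complex CSI into amplitude |h_i| and wrapped phase theta_i."""
    return np.abs(csi), np.angle(csi)


def sanitize_phase(raw_phase, sc_idx=SUBCARRIERS):
    """Unwrap across subcarriers, then remove the best-fit line (lag slope plus offset)."""
    unwrapped = np.unwrap(raw_phase)
    slope, intercept = np.polyfit(sc_idx, unwrapped, 1)
    return unwrapped - (slope * sc_idx + intercept)


# Toy check: a smooth "true" phase corrupted by a lag-induced slope and a constant offset
n_fft, lag, offset = 64, 3.0, 1.0
true_phase = 0.3 * np.sin(SUBCARRIERS / 5.0)
csi = np.exp(1j * (true_phase - 2 * np.pi * SUBCARRIERS * lag / n_fft + offset))
amp, phase = to_polar(csi)
clean = sanitize_phase(phase)
```

Because least squares is linear, the result equals the true phase minus its own best-fit line: the slope and offset are removed, but so is any genuinely linear component of the true phase.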

B. NEURAL NETWORK FOR CLASSIFICATION
As mentioned in Problem 2, the localization accuracy of such a classification problem relies on the accuracy of $\mathbf{p}_m(n_m)$. Common classification-based deep neural network structures, such as MLP and CNN, are designed in this part to obtain the best approximation of $\mathbf{p}_m(n_m)$ and improve the final localization results.
The softmax function [28] is chosen as the activation function of the output layer, which maps the output tensor values $\mathbf{a} = [a_1, a_2, \ldots, a_{N_{RP}}]$ to the normalized prediction probabilities $\mathbf{p}_m(n_m)$ in the interval $(0, 1)$:

$$p_{m,i}(n_m) = \frac{e^{a_i}}{\sum_{j=1}^{N_{RP}} e^{a_j}}, \quad i = 1, \ldots, N_{RP},$$

where $p_{m,i}$ is the $i$-th element of the vector $\mathbf{p}_m(n_m)$. We also use cross-entropy as the loss function to measure the difference between the normalized prediction $p_{m,i}(n_m)$ and the true label vector $l_{m,i}(n_m)$, which has been proven to be a valid loss function for classification neural networks [29]:

$$\mathcal{L}_{CE} = -\sum_{i=1}^{N_{RP}} l_{m,i}(n_m)\,\log p_{m,i}(n_m),$$

where $l_{m,i}(n_m)$ is the true label for the $i$-th RP location. Additionally, we train the parameters of the deep neural networks with the stochastic gradient descent (SGD) method [30] to minimize the loss function. In the online test phase, a probabilistic method uses the estimated $\mathbf{p}_m(n_m)$ to obtain the final location estimate $\hat{L}_m$, as described in Problem 2.
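The output-layer computations above can be sketched in NumPy as follows. The fusion step `fuse_location` is a hypothetical choice of $g_1(\cdot)$: it averages $\mathbf{p}_m(n_m)$ over several OFDM symbols and returns the probability-weighted centroid of the RP coordinates, one simple instance of the averaging fusion mentioned in Section III rather than the paper's exact rule.

```python
import numpy as np


def softmax(a):
    """Row-wise softmax; subtracting the max leaves the result unchanged but avoids overflow."""
    z = a - np.max(a, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def cross_entropy(p, labels, eps=1e-12):
    """Mean over samples of -sum_i l_i log p_i, with one-hot label rows."""
    return float(-np.mean(np.sum(labels * np.log(p + eps), axis=-1)))


def fuse_location(p, rp_xy):
    """Hypothetical g_1: average probabilities over symbols, then weight the RP coordinates."""
    p_mean = np.mean(np.atleast_2d(p), axis=0)
    return p_mean @ np.asarray(rp_xy, dtype=float)


rp_xy = np.array([[0.0, 0.0], [1.2, 0.0], [0.0, 1.2]])   # toy RP grid (metres)
logits = np.array([[2.0, 1.0, 0.1], [1.5, 1.5, 0.0]])    # network outputs for two OFDM symbols
p = softmax(logits)
print(fuse_location(p, rp_xy))
```

Note that such a weighted centroid always lies in the convex hull of the RPs, which is one concrete form of the "trivial combinations" restriction discussed in Section III.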

C. NEURAL NETWORK FOR REGRESSION
Our target is to find a better approximation of the non-convex function $g_{LR}(\cdot)$ with the logistic regression based approach. Therefore, we choose the aggregated channel information $\mathbf{H}(L) = [\mathbf{H}(L, 1), \ldots, \mathbf{H}(L, N)] \in \mathbb{C}^{N \times N_R \times N_{sc}}$ and the localization result $\hat{L}_m \in \mathbb{R}^{1 \times 2}$ as the input and output of the neural networks, respectively, and select the loss function $\mathcal{L}$ to match the original MDE objective of $g_{LR}(\cdot)$⁸:

$$\mathcal{L} = \frac{1}{M}\sum_{m=1}^{M} \big\| L_m - \hat{L}_m \big\|_2.$$

Note that this type of loss function is quite different from the cross-entropy [29] used in the classification-based approach, which describes the difference between the classification results and the distribution of the ground truth. By minimizing the MDE with respect to the RPs, the logistic regression based approach can gradually converge to the non-convex function $g_{LR}(\cdot)$ with satisfactory performance.

⁷ The Linux 802.11n CSI Tool is designed according to the IEEE 802.11n protocol, and the FFT size is 64.
Additionally, to improve the identification and representation capability of neural networks, we exploit two classical neural networks with deeper structures, i.e., the multi-layer perceptron (MLP) and the convolutional neural network (CNN), as shown in Fig. 3. Compared with the fully connected structure of MLP networks, the convolution layers in a CNN can extract features from CSIs across the time and frequency domains, which may be more suitable for wireless fading environments, as demonstrated in Section VI. To avoid biasing effects caused by unusual samples, we apply max pooling to suppress unimportant features and adopt dropout [31] to further reduce unimportant connections in the neural networks. Meanwhile, to avoid the vanishing gradient problem, we apply the rectified linear unit (ReLU) [32] as the non-linear activation function in each hidden layer. The detailed configurations and parameters of both the classification and regression networks are listed in Table 1.
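To show how the distance loss $\mathcal{L}$ behaves in training, the sketch below minimizes it for a linear model by (sub)gradient descent. A linear map stands in for the MLP/CNN of Table 1 and random features stand in for CSI, so this is a toy under those assumptions, with illustrative function names. The gradient of $\|L_m - \hat{L}_m\|_2$ with respect to the prediction is the negative unit residual, so each sample pulls with the same strength regardless of how large its error is, unlike a squared loss.

```python
import numpy as np


def distance_loss_and_grad(W, X, Y, eps=1e-9):
    """Mean Euclidean distance loss and its gradient w.r.t. W.

    X: (M, F) features whose last column is 1 (bias); Y: (M, 2) true positions;
    predictions are X @ W.T with W of shape (2, F).
    """
    R = Y - X @ W.T                      # residuals L_m - L_hat_m, shape (M, 2)
    dist = np.linalg.norm(R, axis=1)     # per-sample distance errors
    U = R / (dist[:, None] + eps)        # unit residual directions
    grad = -(U.T @ X) / X.shape[0]       # d loss / d W, shape (2, F)
    return float(dist.mean()), grad


def train_linear(X, Y, lr=0.05, steps=1000):
    """Plain fixed-step (sub)gradient descent from W = 0."""
    W = np.zeros((Y.shape[1], X.shape[1]))
    history = []
    for _ in range(steps):
        loss, grad = distance_loss_and_grad(W, X, Y)
        history.append(loss)
        W -= lr * grad
    return W, history


rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((500, 5)), np.ones((500, 1))])  # toy "CSI features"
W_true = rng.standard_normal((2, 6))
Y = X @ W_true.T                                                     # toy 2-D positions
W, history = train_linear(X, Y)
print(history[0], history[-1])
```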

D. OUTLIERS REMOVAL
Due to the measurement uncertainty of online test signals, the matching process usually produces a geographically dispersed set of test results $\{\hat{L}_m\}$, resulting in unsatisfactory localization accuracy. Hence, the set $\{\hat{L}_m\}$ may contain outliers, i.e., observations that are statistically distant from the others. In this paper, we propose an outlier removal scheme that rules out points far away from the cluster center in the decision process. We denote by $(\bar{x}, \bar{y})$ the average point of the set $\{\hat{L}_m\}$, i.e.,

$$\bar{x} = \frac{1}{N_{gs}}\sum_{k=1}^{N_{gs}} x_k, \qquad \bar{y} = \frac{1}{N_{gs}}\sum_{k=1}^{N_{gs}} y_k,$$

and by $\mathrm{std}_x$, $\mathrm{std}_y$ the corresponding standard deviations of the $x$ and $y$ coordinates, where $N_{gs}$ is the sample group size. If the rejection conditions, which compare $|x_k - \bar{x}|$ and $|y_k - \bar{y}|$ against $\delta_{th}\,\mathrm{std}_x$ and $\delta_{th}\,\mathrm{std}_y$, hold, then $(x_k, y_k)$ is considered an outlier and removed from the set $\{\hat{L}_m\}$, where $\delta_{th} > 0$ is the designed rejection threshold.
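A minimal sketch of the outlier removal and of the group averaging used in Section VI-C follows. Two choices are our assumptions: a point is rejected when either coordinate deviates from the group mean by more than $\delta_{th}$ standard deviations, and the population standard deviation is used. If every point of a group would be rejected, the sketch falls back to the plain group mean. Function names are illustrative.

```python
import numpy as np


def remove_outliers(points, delta_th=2.0):
    """Drop points whose x or y lies more than delta_th standard deviations from the group mean."""
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)                  # (x_bar, y_bar)
    std = pts.std(axis=0)                    # (std_x, std_y), population convention
    outlier = np.any(np.abs(pts - mean) > delta_th * std, axis=1)  # assumed "either coordinate" rule
    kept = pts[~outlier]
    return kept if len(kept) > 0 else pts    # fallback: keep the whole group


def grouped_estimates(points, n_gs=20, delta_th=2.0):
    """Split consecutive results into groups of n_gs and return each cleaned group's mean."""
    pts = np.asarray(points, dtype=float)
    n_groups = len(pts) // n_gs              # an incomplete trailing group is ignored
    return np.array([remove_outliers(pts[g * n_gs:(g + 1) * n_gs], delta_th).mean(axis=0)
                     for g in range(n_groups)])


group = [[1.0, 1.0]] * 9 + [[10.0, 10.0]]
print(remove_outliers(group).mean(axis=0))   # [1. 1.]
```

Larger groups make the mean and standard deviation more stable, at the price of waiting for more consecutive results, which mirrors the time-versus-accuracy trade-off reported for $N_{gs}$.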

VI. EXPERIMENT RESULTS
In this section, we provide experimental results to show the effectiveness of the proposed logistic regression based approach for indoor localization. More specifically, we compare the proposed scheme with two baseline systems, i.e., Baseline 1: a KNN based localization scheme, and Baseline 2: a classification-based localization scheme with an MLP architecture. We verify the proposed logistic regression based localization scheme in both a laboratory and a corridor environment, whose layouts are shown in Fig. 4. With laboratory equipment, furniture, and people moving around in real situations, the tested wireless fading conditions cover most daily indoor scenarios with mixed LOS and NLOS paths.
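Baseline 1 is only named above, so the following is a generic weighted KNN fingerprint locator rather than the exact baseline implementation. It assumes each RP is represented by one fingerprint vector (for example, averaged CSI amplitudes), selects the `k` RPs nearest to a query fingerprint in feature space, and averages their coordinates with inverse-distance weights. The function name, the feature choice and the weighting are assumptions.

```python
import numpy as np


def knn_locate(query, fingerprints, rp_xy, k=3, eps=1e-9):
    """Weighted KNN: inverse-distance average of the k RPs closest in fingerprint space."""
    d = np.linalg.norm(np.asarray(fingerprints, dtype=float) - query, axis=1)
    idx = np.argsort(d)[:k]                  # indices of the k nearest fingerprints
    w = 1.0 / (d[idx] + eps)                 # closer fingerprints get larger weights
    return (w[:, None] * np.asarray(rp_xy, dtype=float)[idx]).sum(axis=0) / w.sum()


fingerprints = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])   # toy per-RP features
rp_xy = np.array([[1.2, 0.0], [0.0, 1.2], [6.0, 4.8]])           # RP coordinates (m)
print(knn_locate(np.array([0.5, 0.5]), fingerprints, rp_xy, k=2))  # midway between the first two RPs
```

Because every query is compared against all stored fingerprints at test time, the online cost grows with the database size, consistent with the remark that Baseline 1 has no training stage to amortize this work.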

A. LOGISTIC REGRESSION VS. CLASSIFICATION
In the first experiment, we compare the proposed regression scheme with the above baselines by measuring the cumulative distribution function (CDF) of the distance error in both the laboratory and the corridor scenarios. Fig. 5 shows the CDF of the localization distance error during the operating stage. The proposed regression-based algorithms show superior localization accuracy over the conventional algorithms, including KNN based localization (Baseline 1) and classification-based localization (Baseline 2), in both cases. Comparing the MLP-based approach (blue solid curves) with the CNN-based approach (black solid curves), the latter achieves median errors of 1.42 m and 1.43 m in the laboratory and corridor scenarios, respectively, which is better than the former (1.67 m in the laboratory case and 1.51 m in the corridor case). This is because the CNN-based approach is able to capture the time domain correlations of multiple OFDM symbols, while the MLP-based approach only focuses on extracting the common features among all the observations. Note that, due to their similar numbers of parameters, the time complexity of Baseline 2 and of the MLP and CNN based methods is similar as well⁹, namely 0.11 s, 0.11 s and 0.12 s per test, respectively.

B. EFFECT OF SYSTEM CONFIGURATION
In this experiment, we investigate the effect of system configurations, such as the number of APs and the grid size, which directly affect the deployment cost and localization accuracy. We aim to find a practical trade-off between deployment cost and localization accuracy, and to explore the most efficient deployment setting for our system. In the above experiments, the spacing between adjacent RPs is set to 1.2 m and only one WiFi AP is deployed. Hence, we add a new AP placed at a different corner of the laboratory and additionally evaluate grid sizes of 0.6 m and 1.8 m. The localization errors of the different system configurations are illustrated in Fig. 6. We find that the distance errors for the two individual APs are 1.42 m (green curves) and 1.51 m, respectively, and drop to 1.10 m when both of them are used¹⁰, an improvement of 23% compared with using a single AP. Furthermore, using two APs effectively reduces the probability of large localization errors exceeding 3 m, which greatly improves the system reliability. We leave the question of how to better exploit the relationship between the two APs, rather than simply averaging the results from multiple APs, for future work. The median localization errors for grid sizes of 1.8 m, 1.2 m and 0.6 m are 1.86 m, 1.42 m and 0.61 m, respectively. It is worth noting that when the grid size is reduced from 1.2 m to 0.6 m, the localization accuracy improves by 57% at the cost of four times the data collection and labeling effort, since the number of RPs per unit area scales with the inverse square of the grid size. We conclude that the median distance errors are close to the corresponding grid sizes. This means that operators can choose the deployment grid size according to the accuracy they want to achieve, which offers practical guidance for actual deployment.

⁹ The KNN based method (Baseline 1) takes much more time than the others due to the absence of a training process.
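The relative improvements quoted above follow directly from the reported median errors, and the four-fold labeling cost follows from the $1/g^2$ scaling of RP density on a square grid. A quick check (the helper name is illustrative):

```python
def relative_improvement(before_m, after_m):
    """Fractional reduction of a distance error."""
    return (before_m - after_m) / before_m


two_aps = relative_improvement(1.42, 1.10)      # single AP (1.42 m) vs. two APs (1.10 m)
finer_grid = relative_improvement(1.42, 0.61)   # 1.2 m grid vs. 0.6 m grid
rp_density_ratio = (1.2 / 0.6) ** 2             # RPs per unit area scale as 1 / g**2

print(f"{two_aps:.1%} {finer_grid:.1%} {rp_density_ratio:.1f}")
```

The first value is about 22.5%, consistent with the 23% quoted above after rounding, and the second is about 57%.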

C. EFFECT OF OUTLIERS REMOVAL
The above CDF plots are drawn from the results of each online test without any post-processing. In this part, the proposed outlier removal scheme is used to rule out abnormal test results and achieve much better localization accuracy. Two boxplots¹¹ of the localization error in the laboratory and corridor scenarios are shown in Fig. 7. We set the group size $N_{gs}$ of the test results to 10, 20 and 50 in our system and rule out the abnormal coordinate results in each group. Finally, the average of the remaining results in each group is calculated to represent that group.
As shown in Fig. 7, the maximum error of our localization system is kept within 4m when 10 consecutive results are taken as a group. Furthermore, if 20 or 50 consecutive results are taken as a group, the maximum error is only about 1m, which greatly increases the reliability of our system. Overall, the location errors are effectively reduced by our outlier removal algorithm. The time cost of each fused test is 1.27s, 2.54s, and 6.35s for group sizes of 10, 20, and 50, respectively, i.e., about 0.127s per test result in every case. Considering the tradeoff between time cost and location accuracy, a group size of 20 is the most reasonable of these three settings for our system. We therefore conclude that outlier removal brings better robustness and reliability to the localization system.

D. EFFECT OF DATA AUGMENTATION
To verify the effectiveness of data augmentation with perturbation, we extend the original training dataset with perturbed samples whose perturbation distance $\|\Delta L\|_2$ is less than 0.1m. We retrain the neural networks with the augmented dataset and repeat the same experiments in the operating stage. The CDFs of the distance errors for the different algorithms are illustrated in Fig. 8.
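Appendix C models this augmentation as collecting CSI at $L_m + \Delta L$ while keeping the original label $L_m$. A minimal sketch of how the perturbed measurement positions could be drawn is given below. The text only bounds the offset by 0.1m, so drawing offsets uniformly over a disk of that radius is an assumption, `perturbed_positions` is a hypothetical helper, and the CSI at each returned position would still have to be measured.

```python
import numpy as np

def perturbed_positions(rp_coords, n_per_rp, max_offset=0.1, seed=0):
    """Draw n_per_rp positions within max_offset metres of every RP.

    Returns (positions, labels): CSI would be measured at `positions`, while the
    training label of each sample stays the original RP coordinate.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.asarray(rp_coords, dtype=float), n_per_rp, axis=0)
    # sqrt of a uniform draw gives a uniform density over the disk area;
    # uniform() samples [0, 1), so every radius is strictly below max_offset.
    radius = max_offset * np.sqrt(rng.uniform(0.0, 1.0, size=len(labels)))
    angle = rng.uniform(0.0, 2.0 * np.pi, size=len(labels))
    offsets = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
    return labels + offsets, labels

if __name__ == "__main__":
    pos, lab = perturbed_positions([[0.0, 0.0], [1.2, 0.0]], n_per_rp=3)
    print(pos.shape, lab.shape)  # (6, 2) (6, 2)
```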
With data augmentation, the positioning accuracy of both the MLP-based approach (blue solid curves) and the CNN-based approach (black solid curves) is clearly improved: in the laboratory scenario, the median distance error drops from 1.67m to 1.20m and from 1.42m to 0.97m, corresponding to improvements of 28% and 32%, respectively. In the corridor scenario, the proposed data augmentation scheme yields a similar improvement in localization accuracy, which verifies the effectiveness of data augmentation with perturbation as discussed in Section III.

VII. CONCLUSION
In this paper, we propose a logistic regression based localization scheme for WiFi systems. Using a unified optimization framework, we compare it with conventional classification based approaches and derive the corresponding CRLB of the localization error. Based on the analytical results, we find that extending the training dataset with perturbed samples can improve the CRLB accordingly. Together with outlier removal techniques, numerical experiments show that the proposed logistic regression based approach effectively improves localization accuracy and achieves robust sub-meter level MDE using a single WiFi AP.

APPENDIX A PROOF OF HYPOTHESIS 1
To obtain a uniquely determinable position, we borrow the idea of the successive interference cancellation (SIC) technique [33] and estimate the signal component of each path step by step from the received time-domain signal $y(t)$. First, we estimate the parameters $\eta_0$ from the original received signal $y(t)$. After this first estimation, we reconstruct the signal $s_0(t)$ using $\eta_0$, where $U(t) \in \mathbb{C}^{N_T}$ is the step function. After cancelling $s_0(t)$ from $y(t)$, the residual signal is calculated as $y_1(t) = y(t) - s_0(t)$. Similarly, the second strongest signal $s_1(t)$ can be obtained from the residual signal $y_1(t)$. We iterate this process until the signal of each path has been estimated separately, stopping when the power of the residual signal $y_K(t)$ falls below the noise level. The parameter $\eta_0$ estimated from the LOS path signal $s_0(t)$, including the AOA $\theta_{R,0}$ and the straight-line distance $d_0 = c \cdot \tau_0$, together with the AP location $L_0 = (x_0, y_0)$, helps to estimate $L_m$, which is expressed as,

If the multi-path components are taken into consideration, $L_m$ can be rewritten as
$$L_m = \sum_{k=0} \left( x_0 + d_{k,1}\cos\theta_{T,k} - d_{k,2}\cos\theta_{R,k},\; y_0 + d_{k,1}\sin\theta_{T,k} + d_{k,2}\sin\theta_{R,k} \right)^T. \qquad \text{(A.4)}$$
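The SIC loop described above (estimate the strongest component, reconstruct it, subtract it, and repeat until the residual power drops below the noise level) can be sketched as follows. This is a deliberately simplified, hypothetical version: it works on frequency-domain CSI across OFDM subcarriers and estimates only a delay and a complex gain per path by grid search, whereas the appendix estimates the full parameter vector of each path (including angles) from the time-domain signal. The subcarrier spacing, delay grid, and threshold in the example are assumptions.

```python
import numpy as np

def steering(freqs, tau):
    """Per-subcarrier response of a unit-gain path with delay tau (seconds)."""
    return np.exp(-2j * np.pi * np.asarray(freqs) * tau)

def sic_delays(h, freqs, tau_grid, noise_power, max_paths=5):
    """Greedy SIC: pick the strongest delay, cancel it, repeat on the residual.

    Returns a list of (delay, complex_gain) in detection order and the final residual.
    """
    residual = np.asarray(h, dtype=complex).copy()
    atoms = np.exp(-2j * np.pi * np.outer(freqs, tau_grid))  # one column per candidate delay
    n = len(freqs)
    paths = []
    # Stop once the residual power is at or below the assumed noise power.
    while len(paths) < max_paths and np.mean(np.abs(residual) ** 2) > noise_power:
        # Least-squares gain of every candidate path (each column has squared norm n).
        gains = atoms.conj().T @ residual / n
        idx = int(np.argmax(np.abs(gains)))
        paths.append((tau_grid[idx], gains[idx]))
        # Cancel the reconstructed component, analogous to y_1(t) = y(t) - s_0(t).
        residual = residual - gains[idx] * atoms[:, idx]
    return paths, residual

if __name__ == "__main__":
    freqs = np.arange(64) * 312.5e3   # 64 subcarriers, 312.5 kHz spacing (assumed)
    tau_grid = np.arange(81) * 5e-9   # 0 to 400 ns search grid in 5 ns steps
    h = steering(freqs, tau_grid[4]) + 0.5 * steering(freqs, tau_grid[30])
    paths, _ = sic_delays(h, freqs, tau_grid, noise_power=1e-2)
    print([round(tau * 1e9) for tau, _ in paths])  # [20, 150]
```

The stronger path is detected and cancelled first, so the weaker path is found on a residual that is largely free of its interference.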

APPENDIX B PROOF OF THEOREM 1
We obtain the entries of $T$ from the geometric relations between the parameters $\eta$ and $\tilde{\eta}$, as illustrated in Fig. 1; they are given as follows. for $k > 0$, and the remaining entries of $T$ are zero.

APPENDIX C PROOF OF PROPOSITION 1
Under the assumption $\|\Delta L\|_2 \ll \|L_m\|_2$, we have
$$\hat{L}_m \approx L_m + \Delta L = g\left(\{H(L_m + \Delta L, n_m)\}\right), \qquad \text{(C.1)}$$
that is, we can collect more CSI samples without changing the location labels. Since the norm satisfies the triangle inequality, we have

To explain the role of data augmentation, we re-derive the FIM and CRLB under the effect of the perturbation. We define $L_p = L_m + \Delta L$ as the position under a perturbation of distance $\|\Delta L\|_2$; the unknown channel parameter vector under perturbation is then defined as
$$\tilde{\eta}_p = \left[\tilde{\eta}_{p,0}^T, \tilde{\eta}_{p,1}^T, \cdots, \tilde{\eta}_{p,k}^T, \cdots, \tilde{\eta}_{p,K}^T\right]^T, \qquad \text{(C.3)}$$
where $\tilde{\eta}_{p,0} = [L_m^T, h_0^T, \Delta L^T]^T$ and $\tilde{\eta}_{p,k} = [s_k^T, h_k^T, \Delta L^T]^T$ for $k > 0$. Based on the above assumption $\|\Delta L\|_2 \ll \|L_m\|_2$, the FIM for $\tilde{\eta}_p$ in the presence of random parameters is given by [34], where $J_{\Delta L}$ can be calculated as, and hence the CRLB of the localization error with perturbation is given by
$$\mathrm{tr}\left\{\left[J_{\tilde{\eta},p}^{-1}\right]_{2\times 2}\right\}. \qquad \text{(C.6)}$$
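Equation (C.6) keeps only the top-left 2×2 block of the inverse FIM, i.e., the block belonging to the two position coordinates, which come first in $\tilde{\eta}_{p,0} = [L_m^T, h_0^T, \Delta L^T]^T$. As an illustrative sketch (not the paper's FIM, whose entries are not reproduced here), the helper below evaluates this bound for any numeric FIM whose first two parameters are the position coordinates, checks it against the equivalent-FIM (Schur complement) form, and uses hypothetical toy matrices to show that adding a positive semidefinite information term does not increase the bound.

```python
import numpy as np

def position_crlb(fim):
    """tr([J^{-1}]_{2x2}): bound on the mean squared error of the (x, y) estimate."""
    cov_bound = np.linalg.inv(np.asarray(fim, dtype=float))
    return float(np.trace(cov_bound[:2, :2]))

def position_crlb_schur(fim):
    """Same bound via the equivalent FIM A - B D^{-1} B^T of the position block."""
    fim = np.asarray(fim, dtype=float)
    a, b, d = fim[:2, :2], fim[:2, 2:], fim[2:, 2:]
    efim = a - b @ np.linalg.solve(d, b.T)
    return float(np.trace(np.linalg.inv(efim)))

if __name__ == "__main__":
    # Hypothetical toy FIM over (x, y, nuisance_1, nuisance_2).
    fim = np.array([[4.0, 0.0, 1.0, 0.0],
                    [0.0, 4.0, 0.0, 1.0],
                    [1.0, 0.0, 2.0, 0.0],
                    [0.0, 1.0, 0.0, 2.0]])
    extra = np.diag([0.0, 0.0, 1.0, 1.0])  # hypothetical extra information on the nuisances
    print(position_crlb(fim), position_crlb_schur(fim))  # both ≈ 0.5714 (= 4/7)
    print(position_crlb(fim + extra))                    # ≈ 0.5455 (= 6/11)
```

If $J_2 \succeq J_1$, then $J_2^{-1} \preceq J_1^{-1}$, so the trace of any principal block of the inverse can only stay the same or shrink; this is the general mechanism by which extra information lowers a position CRLB.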