Channel state information based efficient database construction for indoor localisation

The popularisation of fingerprinting localisation technology has been hindered by two major hurdles: (i) the accuracy bottleneck caused by unreliable location fingerprints and (ii) the huge effort required to construct a fingerprint database (or radio map) for the targeted area. To tackle these two problems, the authors propose an effective solution in this work. First, they exploit channel state information, a parameter describing the frequency response of each subchannel, to design the location fingerprint, striving to suppress the interference of the complex indoor environment. Second, they propose an efficient construction scheme that leverages matrix completion theory to improve the calibration efficiency, and employ a Bayes rule-based fingerprint matching method for location estimation. Finally, they evaluate their localisation system in two typical scenarios, and the numerical results show that the proposal ensures superior performance while significantly reducing the workload.


Introduction
Indoor localisation has garnered much attention recently, following the increasing demand for location-based services (LBSs), such as logistical warehouse management. Unfortunately, owing to the non-line-of-sight (NLoS) conditions in many indoor environments, wireless signals are often subject to various types of interference (such as multipath effects and shadowing) [1], which makes estimating the target location in a room more challenging.
Currently, WiFi received signal strength indicator (RSSI)-based fingerprinting localisation has become a commonly used solution for two reasons: (i) compared with the range-based methods, the fingerprinting techniques are less affected by the NLoS condition; and (ii) capturing RSSI can be implemented on most terminals without extra devices [2]. However, the RSSI is a coarse-grained parameter at the packet level, which makes the mapping between the RSSI value and the transmission distance not very reliable [3]. Therefore, most RSSI-based systems struggle to achieve better results in terms of accuracy and robustness [4].
The location fingerprint is the cornerstone of the entire localisation system, and ideal fingerprints should fluctuate little at the same location but differ greatly between locations [5]. Hence, to improve the performance fundamentally, it is necessary to find a more stable parameter to replace the RSSI, and channel state information (CSI) [6,7] is a very suitable alternative. CSI consists of frequency response samples of the OFDM subcarriers, and it can be captured in an 802.11a/g/n WiFi environment using a network interface card (NIC) [7,8]. Compared with the packet-level RSSI, CSI describes the channel at the physical layer and contains fine-grained channel parameters. Moreover, CSI records the spatial stream information of the different antenna pairs in MIMO separately, which extends the channel description to a higher dimension and further refines the channel features [9]; thus, using CSI to design the fingerprint creates room for improved localisation performance. However, because the indoor maximum excess delay is about 500 ns, the low WiFi bandwidth weakens the receiver's ability to distinguish multipath components. For instance, with a 20 MHz channel, at most 500 ns × 20 MHz = 10 paths can be distinguished, and the accuracy is about 3 m [10]. Therefore, to pursue better performance, we need to exploit more of the information in the CSI and explore more targeted fingerprint design methods.
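The resolution argument above can be checked with a short calculation. The sketch below assumes that two paths are separable when their delays differ by at least 1/B, so the number of resolvable paths is the excess delay multiplied by the bandwidth; the function name is illustrative and not part of the original work.

```python
def resolvable_paths(max_excess_delay_s, bandwidth_hz):
    """Number of delay bins of width 1/bandwidth that fit in the excess delay."""
    return round(max_excess_delay_s * bandwidth_hz)


# Values quoted in the text: 500 ns excess delay, 20 MHz channel.
paths_20mhz = resolvable_paths(500e-9, 20e6)
# Doubling the bandwidth (40 MHz mode) doubles the number of delay bins.
paths_40mhz = resolvable_paths(500e-9, 40e6)
```

The 20 MHz case reproduces the figure of 10 paths given above; a wider channel would resolve more paths, which is why the bandwidth is treated as a bottleneck here.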
Another problem that plagues fingerprinting technology is the huge workload of building a fingerprint database (FD). The FD, also known as the radio map, is the basis for effective location estimation. In the calibration phase, fingerprinting techniques typically require collecting the wireless parameters at every sample position (SP) into which a room is divided, and if the indoor physical surroundings change greatly, the FD needs to be updated. Obviously, this work is laborious and grows with the indoor area, so the traditional approaches that collect the fingerprints for all the SPs take a lot of effort in the calibration phase. At present, less-calibration [11] is usually realised with interpolation methods, which use the fingerprints of a subset of SPs and an interpolation algorithm to estimate the unmeasured SPs [12]. The most frequently used interpolation algorithms are inverse distance weighted (IDW) and Kriging interpolation [13,14]. The IDW is simple to calculate but low in accuracy. The Kriging interpolation relies on an error pre-judging mechanism and has relatively high accuracy; however, its variogram is initially chosen from experience and is often not optimal, which directly affects the accuracy of the imputation. Overall, although the interpolation methods can reduce the calibration effort, they do not fully exploit the global information of the target FDs, which results in a large error in the recovered entries and an ensuing degradation of system performance. Therefore, a less-calibration technique with lower error needs to be introduced in the FD construction phase.
To address the problems above, in this work we take the pre-processing of the raw CSI as the first step of our solution, to obtain a location fingerprint with better robustness and discrimination. Secondly, we map the room to a two-dimensional array, called the fingerprint matrix (FM), in which each element represents the fingerprint of one SP, and this matrix is our FD. Then, after demonstrating the low-rank property of the FM, we propose an efficient FD construction scheme based on matrix completion (MC) theory [15], which can recover the entire FD from only a small number of collected SP fingerprints (or measurements) while ensuring a tolerable error. Finally, in the online phase, given the sparse nature of the localisation problem, i.e. the positioned object does not stand at different SPs at the same time, a fingerprint matching method based on the Gaussian kernel-based Bayes rule (GKBR) [16] is employed to estimate the object location.
Briefly, the main contributions of our work are as follows:
• We fully exploit the available information in the raw CSI and design a location fingerprint with better robustness and discrimination.
• Based on the MC, we propose an efficient FD construction scheme with less calibration; based on the GKBR, we propose an accurate location estimation method for the online phase.
• We evaluate our system from multiple perspectives with the CSI data from two typical scenarios. The numerical results meet our expectations and highlight the superiority of the proposed system.
The remainder of this paper is organised as follows: Section 2 discusses the existing related work. In Sections 3 and 4, we describe the design of the fingerprint and the FD, respectively. Section 5 introduces the online fingerprint matching method based on GKBR. The experimental evaluations and analyses of the proposal are in Section 6. Finally, the conclusions and prospects are presented.

Related work
Although most range-based indoor localisation techniques work without a calibration procedure, their performance depends heavily on how well the formulated path-loss propagation model fits. Unfortunately, because most indoor environments have strong multipath and NLoS conditions, the path-loss models can hardly characterise the real cases accurately, and such techniques have to resort to complex calculations and extra devices for a good result [17,18]. Thus, RSSI-based fingerprinting technology gradually became popular in the early days. RADAR [1] was the first RSSI-based fingerprinting system; it used the k-nearest neighbour algorithm for online matching and achieved a precision of 3 m. Horus [19] employed a stochastic strategy to build the FD and used the maximum likelihood for location estimation, reducing the error to about 2 m. By modelling the multipath effect through convolution operations, Fang et al. [20] effectively suppressed multipath effects on RSSI fingerprints. However, owing to the shortcomings of RSSI itself, the accuracy of such techniques cannot be greatly improved without extra devices. Since Halperin et al. [6,7] effectively captured the CSI in a WiFi environment, many well-known research institutions, including Hong Kong University of Science and Technology, Tsinghua University, as well as Microsoft and Intel Labs, have successively carried out CSI-related work, of which applying CSI to location awareness is a research focus. The CSI-based techniques can be broadly split into two groups: the range-based methods [3,8,21] and the fingerprinting (or scenario analysis) methods [16, 22-25]. We mainly focus on the latter in this paper.
Earlier, PinLoc [22] used the CSI frequency diversity to achieve metre-level accuracy with a probability of some 90%, but it did not involve spatial diversity. The FIFS system [23] considered both the frequency and spatial diversity and generated the fingerprint by aggregating CSI amplitude values. This system reduced the average error to close to 1 m. CSI-MIMO [24] used the differences of the amplitude and phase of CSI and effectively reduced the fluctuation of a fingerprint, but this move weakened the discrimination and resulted in unstable location estimation. Exploiting the non-linear fitting ability of deep networks, Wang et al. [16] designed the DeepFi system, which takes a deep neural network as the FD and trains it successfully using the restricted Boltzmann machine. With an AP and a receiver equipped with an Intel 5300 NIC, their system achieved an accuracy of nearly 1 m. Wang et al. [25] used a random forest-based deep classifier to refine the features of the CSI fingerprint, and in an NLoS scenario with multiple APs, some 85% of the test errors of the system were below 1 m. The deep learning-based models ensure accuracy and stability, but they often require a huge offline effort.
According to the results of Jin et al. [10], the smaller indoor excess delay and WiFi channel bandwidth can limit the ability of the models to distinguish multipath signals, which makes the localisation hit a performance bottleneck. Accordingly, on the premise of no extra devices, we need to fully consider this problem and tackle it by digging up more information from the raw CSI.
Moreover, how to cut the calibration workload is also directly related to the application of fingerprinting technology. Therefore, to improve the practicability of the proposals, many previous studies [5, 12-14, 26, 27] have begun to take less calibration into account when building the FD. In [12], the authors proposed a sparse-based recovery method to complete the RSSI FD. Zuo et al. [13] fully explored the spatial correlation of RSSI and employed the Kriging interpolation to recover the sampled FD, and achieved an average localisation accuracy of about 1.9 m with the reconstructed radio map. In [14], Wang et al. designed a localisation system for mine workers, in which support vector regression and interpolation were combined to reduce the fingerprint collection effort. Kuo et al. [26] used the spatial correlation of fingerprints to characterise the fingerprints with a small number of parameters. The authors of both [5] and [27] used MC-based approaches to reduce the calibration effort. However, owing to the coarse-grained nature of RSSI fingerprints, the positioning errors of both systems exceeded 2 m. Inspired by the achievements and shortcomings of the above work, our work strives for breakthroughs in both FD construction and positioning performance.

CSI-based fingerprint design
The location fingerprint directly affects the accuracy and stability of the entire system. In this section, we first introduce the properties of CSI; then, we analyse the advantages and insufficiency of CSI as the location fingerprint; finally, an optimised CSI-based fingerprint design method is proposed.

CSI introduction
In an 802.11a/g/n WLAN with 20 MHz bandwidth, the number of subcarriers exploited by OFDM reaches 56, of which 52 carry data (in the 40 MHz mode, the numbers are 112 and 108, respectively). Using a commercial NIC and the 802.11n-CSI Tool [7], we can obtain some or all of the parameters that describe these subcarriers, namely the CSI, as follows:
$$H(f_k^{ch}) = |H(f_k^{ch})| \, e^{j \angle H(f_k^{ch})}, \quad k = 1, \ldots, K \qquad (1)$$
where $H(f_k^{ch})$ denotes the CSI value of the kth subcarrier whose centre frequency is $f_k^{ch}$, K is the number of reported subcarriers, and, for all subcarriers, $|H(\cdot)|$ and $\angle H(\cdot)$ stand for the amplitude and phase, respectively. The CSI can also be regarded as the discrete Fourier transform of the subchannel impulse response.
With the Intel 5300 NIC, we can capture the CSI of 30 subcarriers, i.e. K = 30 in (1). Some commercial NICs, such as the Atheros 9k, can release all subcarriers (see [8]). Further, for each spatial stream from the transmitting end (TX) to the receiving end (RX) in MIMO, all the CSI can be represented as a three-dimensional complex matrix
$$\mathbf{H} = (H_{n,m,k}) \in \mathbb{C}^{N \times M \times K} \qquad (2)$$
where N and M are the numbers of TX antennas (TXAs) and RX antennas (RXAs), respectively. In the currently dominant MIMO configurations, N ∈ {1, 2, 3} and M = 3. $H_{n,m,k}$ is the kth subcarrier CSI of the antenna pair from the nth TXA to the mth RXA, and the different antenna pairs denote the different spatial streams. From (2), we can see that the CSI fully exploits the frequency and spatial diversity of the MIMO-OFDM technology, and it therefore has good robustness against the negative factors of indoor environments. As for spatial discrimination, [22] has shown that the CSI values collected at SPs that are 1 m apart have low correlation, so the CSI fits well with the critical properties of a fine fingerprint.
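To make the data layout concrete, the following minimal sketch builds a CSI array with the N × M × K shape described above and splits it into amplitude and phase. The values are synthetic random numbers standing in for one packet reported by a NIC (an assumption for illustration only), and the variable names are not taken from the original work.

```python
import numpy as np

N_TX, M_RX, K_SUB = 2, 3, 30  # two TXAs, three RXAs, 30 reported subcarriers

rng = np.random.default_rng(0)
# Synthetic complex CSI for one packet, shape (N, M, K).
csi = rng.normal(size=(N_TX, M_RX, K_SUB)) + 1j * rng.normal(size=(N_TX, M_RX, K_SUB))

amplitude = np.abs(csi)   # |H_{n,m,k}|
phase = np.angle(csi)     # angle of H_{n,m,k}, in radians

# Polar form of each entry: H = |H| * exp(j * angle(H)).
rebuilt = amplitude * np.exp(1j * phase)
```

Each slice `csi[n, m, :]` is the frequency response of one antenna pair, i.e. one spatial stream.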

Fingerprint design method
To weaken the impact of the low indoor excess delay and the limited WiFi bandwidth on the system's ability to distinguish multipath signals, we mined more information from the CSI to design a location fingerprint. Starting from the characterisation values of the subchannels initially obtained from the NIC, called the raw CSI, we combined the RSSI, AGC, and received noise in the CSI packet to convert the raw CSI into a more stable value, called the effective CSI, according to (3), where $H_{eff}$ denotes the effective CSI and $H_{raw}$ is the raw CSI. N and M are the numbers of TXAs and RXAs, respectively. η is the power attenuation coefficient, and we conclude experimentally that N = {1, 2, 3} corresponds to η ≃ {1, 1.4, 1.7} in our scenarios. $P_{rssi}$ is the RSSI power received by an RXA, and $P_{raw}$ is the original power of the subcarrier, equal to the squared amplitude of $H_{raw}$; ϵ is the quantisation noise. The effective CSI is the foundation of our system, and thus its superiority should be demonstrated first. In a static laboratory (scenario details are given in Section 6), we collected 1000 CSI packets at six SPs and then plotted the amplitude curves of the two kinds of CSI, as shown in Fig. 1. Here, $L_1 \sim L_3$ denote three neighbouring SPs, and $L_4 \sim L_6$ were selected randomly; the number of TXAs was set to 2, and AMP denotes the amplitude averaged over all antenna pairs for each of the two kinds of CSI. Fig. 1 shows that the effective values at the same SP fluctuated less than the raw ones, while they still differed between neighbouring locations. This indicates that the effective CSI suppresses the negative indoor factors better and thus benefits the localisation performance. Therefore, we designed the fingerprints based on $H_{eff}$.
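The conversion from raw to effective CSI can be illustrated with a small sketch. Note that this is not equation (3) of the original work, whose exact form is not reproduced here; it is a hypothetical normalisation built from the same quantities (raw CSI, RSSI power, noise power and η), in which the raw CSI is rescaled so that its mean subcarrier power matches the RSSI power and is then expressed relative to the noise. The function name and argument layout are assumptions.

```python
import numpy as np

ETA = {1: 1.0, 2: 1.4, 3: 1.7}  # attenuation coefficients quoted in the text


def effective_csi(h_raw, rssi_power, noise_power):
    """Hypothetical scaling of raw CSI; NOT the paper's equation (3).

    h_raw:       complex array of shape (N, M, K)
    rssi_power:  received RSSI power, linear scale
    noise_power: received plus quantisation noise power, linear scale
    """
    n_tx, _, k_sub = h_raw.shape
    # Mean raw power per subcarrier, summed over all antenna pairs.
    p_raw = np.sum(np.abs(h_raw) ** 2) / k_sub
    scale = rssi_power / p_raw
    return ETA[n_tx] * h_raw * np.sqrt(scale / noise_power)
```

Under this assumed form, multiplying the raw CSI by an arbitrary gain (as an AGC stage would) leaves the output unchanged, which is one way to see why a value normalised by RSSI can be more stable than the raw one.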
To further strengthen the advantages of the CSI fingerprints, we also took the time diversity into account, i.e. the fact that the RX collects multiple packets at one SP to produce a unique CSI-based fingerprint. Currently, there are two main methods for generating CSI fingerprints: the average strategy [23] and the difference strategy [24]. Given that our fingerprints were based on the effective CSI and our scenarios had densely sampled SPs, the absolute deviation between neighbouring fingerprints was small, and thus we chose the average strategy.
Moreover, although the indoor space is usually a multi-AP environment, considering the increasing coverage capability of APs and the overlapping interference of the fingerprints [5,10], this work deployed only one AP in the room. Previous studies [16,21] have also shown that the online matching operation under a single AP is more sensitive and flexible. The CSI-based fingerprint is expressed by (4), where f denotes a location fingerprint, Δ is the total number of valid packets collected, $H_{eff}$ denotes the effective CSI of a subcarrier, and M, N and K are the same as in (3). Based on the fingerprint designed, we constructed an efficient FD, as described in the following section.
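The average strategy can be sketched as follows. The code assumes that the fingerprint is a single real value obtained by averaging the effective CSI amplitude over all Δ packets, antenna pairs and subcarriers; this is one plausible reading of the description above rather than a transcription of (4), and the function name is illustrative.

```python
import numpy as np


def fingerprint(h_eff_packets):
    """Average CSI amplitude over packets, antenna pairs and subcarriers.

    h_eff_packets: complex array of shape (Delta, N, M, K).
    Returns one real scalar for the sample position.
    """
    return float(np.mean(np.abs(h_eff_packets)))
```

A scalar fingerprint per SP is what allows the FD to be stored as a real H × W matrix in the next section.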

MC-based FD construction
Building an FD is the main task of the localisation system in the calibration phase. According to the foregoing, we first mapped a room divided into several SPs to an FM, as shown in Fig. 2. In Fig. 2, s denotes the SP, and d denotes the centre distance of the adjacent SPs, which is usually set to a constant. F is a complete FD, and the element f h, w , which is derived from (4), stands for the fingerprint of SP s j . The relationship between these two coordinates is j = w + (h − 1)W.
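The relationship j = w + (h − 1)W between the matrix coordinates and the SP index can be sketched as a pair of helper functions. Indices are 1-based as in the text; the function names are illustrative.

```python
def sp_index(h, w, width):
    """1-based (row h, column w) -> 1-based SP index j = w + (h - 1) * W."""
    return w + (h - 1) * width


def sp_coords(j, width):
    """Inverse mapping: 1-based SP index j -> 1-based (h, w)."""
    h, w = divmod(j - 1, width)
    return h + 1, w + 1
```

For example, with W = 5 the element in row 2, column 3 corresponds to SP number 8.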

Low-rank property of the FM
According to established results [15,28,29], an incomplete matrix with only some known elements (measurements) can be accurately completed by a mapping that satisfies the restricted isometry property, provided that the matrix has a low-rank property and the measurements are selected uniformly at random; this process is known as matrix completion. Consequently, the low-rank property of the FM is the prerequisite for the feasibility of our scheme.
Because CSI obeys the path loss rules of radio signals well, there must be a strong correlation among the FM elements, which will cause the degrees of freedom of the matrix to be much lower than its size. Many previous works have also confirmed this crucial factor related to low-rank property [3,23,28]. Next, we experimentally verified this property of the FM with the data from real scenarios.
First, we performed singular-value decomposition on the five FMs from two different scenarios (details provided in Section 6), as follows:
$$F = \sum_{i=1}^{\min(H, W)} \sigma_i u_i v_i^T \qquad (5)$$
where $\sigma_i$ is the ith singular value of F, and $u_i$ and $v_i$ are the corresponding left and right singular vectors, respectively. Then, to show the relative weights of the singular values of each matrix visually, we plotted the proportion of each singular value, as shown in Fig. 3. Fig. 3 illustrates that most of the energy in each matrix came from the two largest singular values, of which the first accounted for over 80%, while the singular values caused by unstable factors, such as measurement noise, were close to 0. In Fig. 3, the energy of $F^{rsh}_{s1}$ from the static room was the most concentrated, with the proportion of $\sigma_1$ close to 95%; $F^{rsh}_{d1}$ from the dynamic scenario suffered greater interference, but the sum of the first two proportions was still around 90%. Fig. 3 demonstrates that the FD has a low-rank property, which makes the MC-based construction possible.
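The low-rank check described above can be sketched in a few lines. The matrix used here is synthetic (a smooth rank-one pattern plus small noise, chosen only for illustration), and the proportion is assumed to be each singular value divided by the sum of all singular values.

```python
import numpy as np


def singular_value_proportions(fm):
    """Proportion of each singular value in the sum of all singular values."""
    sigma = np.linalg.svd(fm, compute_uv=False)
    return sigma / sigma.sum()


rng = np.random.default_rng(1)
rows = 5.0 + rng.random((8, 1))
cols = 1.0 + 0.1 * rng.random((1, 10))
# Synthetic near-rank-one matrix standing in for a measured FM.
fm = rows @ cols + 0.01 * rng.normal(size=(8, 10))
props = singular_value_proportions(fm)
```

If the leading entries of `props` dominate, as they do for this synthetic matrix, the matrix is approximately low rank.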

MC algorithm
A matrix completion problem can be described as
$$\min_{\hat{F}} \ \mathrm{rank}(\hat{F}) \quad \text{s.t.} \quad \hat{F}_{h,w} = F_{h,w}, \ (h, w) \in \Omega \qquad (6)$$
where $\hat{F} \in \mathbb{R}^{H \times W}$ is the reconstructed matrix with complete entries, and we suppose H ≤ W. $\Omega \subset [H] \times [W]$ denotes the coordinate set of the known elements of the matrix, and $\#\Omega \ll H \times W$ denotes the number of known elements.
In our scenario, F is an incomplete FD with partial measurements, $\hat{F}$ denotes the complete FD reconstructed by the MC, and $F_{h,w} = f_{h,w}$ is the fingerprint at $s_j$.
Because the rank minimisation operator is non-convex, solving (6) is an NP-hard problem. According to [15], this minimisation problem can be approximated by its convex relaxation, as follows:
$$\min_{\hat{F}} \ \|\hat{F}\|_* \quad \text{s.t.} \quad P_\Omega(\hat{F}) = P_\Omega(F) \qquad (7)$$
where $\|\cdot\|_*$ denotes the nuclear norm and $P_\Omega$ is an orthogonal projection operator: for $(h, w) \in \Omega$, $[P_\Omega(F)]_{h,w}$ is equal to $f_{h,w}$, and otherwise it is zero.
Obviously, problem (7) is a convex optimisation problem, and Candès et al. [15,29] have shown that when the number of measurements meets the lower bound $\#\Omega \geq c_1 r W^{6/5} \log W$, the incomplete matrix can be completed accurately with a probability over $1 - c_2 W^{-3}$, where $c_1$ and $c_2$ are environmental constants in the range (0,1), and r is the rank of the matrix.
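As a worked example of the lower bound, the sketch below evaluates $c_1 r W^{6/5} \log W$ for given parameters. It assumes the natural logarithm; the values $c_1$ = 0.25 and r = 3 are those adopted later in the evaluation, while the width W = 10 is a hypothetical value chosen only for illustration.

```python
import math


def min_measurements(width, rank, c1):
    """Smallest integer number of measurements satisfying c1 * r * W^(6/5) * ln(W)."""
    return math.ceil(c1 * rank * width ** 1.2 * math.log(width))


needed = min_measurements(width=10, rank=3, c1=0.25)
```

Dividing the result by the total number of SPs, H × W, gives the corresponding minimum measurement ratio.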
However, in most practical cases, the analytical solution of (7) cannot be obtained, and thus many gradient descent-based numerical iterative methods, such as singular value thresholding, accelerated proximal gradient and the augmented Lagrange multiplier, have been proposed [30]. Given the high dependence of the fingerprinting technique on FD accuracy, we leverage a Lagrange multiplier framework with higher precision, called the auxiliary variable Lagrange multiplier (AVLM), to impute the incomplete FMs.
First, we introduce an auxiliary variable G that transforms problem (7) into problem (8), and formulate the AVLM function of problem (8) as (9), where $\|\cdot\|_F$ and $\langle \cdot, \cdot \rangle$ stand for the Frobenius norm and Frobenius inner product, respectively. Next, we use the ADMM to obtain the optimum solution to problem (9). To clarify this process, we perform two analogous transformations, after which the AVLM function (9) can be expressed as (12). With the ADMM, the update steps of the variables of the optimisation problem (12) are as follows:
(i) Fix $G = G^k$, $Z = Z^k$, and calculate the variable $\hat{F}$:
$$\hat{F}^{k+1} = D_{\mu^{-1}}(G^k + \mu^{-1} Z^k) \qquad (13)$$
where $D_{\mu^{-1}}(\cdot)$ and $S_{\mu^{-1}}(\Sigma)$ denote the soft-thresholding and shrinkage operators, respectively, expressed as follows [29]:
$$D_{\mu^{-1}}(X) = U S_{\mu^{-1}}(\Sigma) V^T, \quad S_{\mu^{-1}}(\Sigma) = \mathrm{diag}\big(\max(\sigma_i - \mu^{-1}, 0)\big) \qquad (14)$$
where $X = U \Sigma V^T$ is the singular-value decomposition of $X = G^k + \mu^{-1} Z^k$ and $\sigma_i$ is the ith singular value of $G^k + \mu^{-1} Z^k$.
(ii) Fix the remaining variables, including $Z^k$, and update G. Problem (15) is strictly convex, and thus its minimum can be obtained by setting its partial derivative to zero.
(iii) Fix the remaining variables, including $\hat{F}^{k+1}$, and calculate the gradient of (12) at the point Z, from which the update of Z follows.
(iv) Similarly, update Y.
(v) Update ρ and μ, where $\rho^{k+1} := \min(\alpha \rho^k, \rho_{\max})$.
With an auxiliary matrix, our AVLM transforms the objective function into a nuclear norm minimisation with a squared regulariser plus a strictly convex problem, which guarantees the uniqueness and stability of the optimum solution. More detailed steps are listed in Algorithm 1 (see Fig. 4).
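The five update steps can be sketched with a generic ADMM iteration. This is an illustrative reading, not the authors' Algorithm 1: it assumes the split "minimise the nuclear norm of X subject to X = G and P_Ω(G) = P_Ω(F)", with multiplier Z and penalty μ for the first constraint and multiplier Y and penalty ρ for the second. The initial penalties, the growth factor `alpha`, the cap `pen_max` and all function names are assumptions.

```python
import numpy as np


def svt(x, tau):
    """Singular value thresholding: shrink each singular value by tau."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def complete_matrix(f_obs, mask, n_iter=300, alpha=1.1, pen_max=1e6):
    """ADMM sketch for  min ||X||_*  s.t.  X = G,  P_Omega(G) = P_Omega(F).

    f_obs: matrix holding the measured entries (other entries are ignored)
    mask:  boolean array, True where an entry was measured (at least one
           measured entry must be non-zero)
    """
    f_obs = np.where(mask, f_obs, 0.0).astype(float)
    x = np.zeros_like(f_obs)
    g = np.zeros_like(f_obs)
    z = np.zeros_like(f_obs)  # multiplier for X = G
    y = np.zeros_like(f_obs)  # multiplier for the data constraint
    mu = rho = 1.0 / np.linalg.norm(f_obs, 2)  # assumed initial penalties
    for _ in range(n_iter):
        # (i) nuclear-norm step: soft-threshold the singular values.
        x = svt(g + z / mu, 1.0 / mu)
        # (ii) closed-form minimiser of the strictly convex G-subproblem.
        g_free = x - z / mu
        g_data = (mu * x - z + y + rho * f_obs) / (mu + rho)
        g = np.where(mask, g_data, g_free)
        # (iii), (iv) gradient ascent on the two multipliers.
        z = z + mu * (g - x)
        y = y + rho * np.where(mask, f_obs - g, 0.0)
        # (v) grow the penalties up to a cap.
        mu = min(alpha * mu, pen_max)
        rho = min(alpha * rho, pen_max)
    return x
```

A call such as `complete_matrix(fm_incomplete, mask)` returns a full matrix whose unmeasured entries have been imputed; the parameter values above are untuned defaults.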

Construction scheme
Based on the approximate low-rank property of the FM, we collected CSI fingerprints from only a few SPs in the localisation room in the offline phase, and these fingerprints were then mapped to an incomplete matrix F via Fig. 2, in which '0' denotes the unsampled SPs; F can be recovered via MC to a complete FD. Note that, to meet the other condition for completion, the SPs must be selected uniformly at random [15]. Obviously, by reducing the collection workload, the proposed scheme greatly improves the efficiency of building an FD, particularly for a large-scale scenario. Also, compared with some traditional data filling methods, such as interpolation, the proposal fully exploits the low-rank property and the correlations among the FM elements, by which it can achieve reconstruction with higher precision. The measurement ratio and reconstruction accuracy are discussed in Section 6.
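The uniform random selection of SPs can be sketched as follows. The function draws a fixed fraction of positions without replacement and writes 0 at the unsampled positions, as in the incomplete matrix described above; the function name and the seed argument are illustrative.

```python
import numpy as np


def sample_fm(fm, ratio, seed=None):
    """Keep a uniformly random fraction `ratio` of SPs; set the rest to 0."""
    rng = np.random.default_rng(seed)
    n_total = fm.size
    n_keep = int(round(ratio * n_total))
    chosen = rng.choice(n_total, size=n_keep, replace=False)
    mask = np.zeros(n_total, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(fm.shape)
    return np.where(mask, fm, 0.0), mask
```

Returning the boolean mask alongside the matrix avoids ambiguity if a measured fingerprint ever happened to equal 0.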

GKBR-based fingerprint matching
In the online localisation phase, the position requester side (PRS) of our localisation system first generated its own location fingerprint and then put this fingerprint into the FD for matching. Finally, the matched result was converted into a spatial location and passed to the PRS. Note that our system's FD needed to be loaded into the PRS in advance.
Compared with the deterministic matching algorithms, such as the k-nearest neighbour, the probabilistic methods represented by the Bayes rule provide more reliable results [16,23]. If we suppose that the fingerprint obtained by the PRS at the unknown SP $s_j$ is $f_i^{PRS}$, then the goal of the matching algorithm is to maximise the posterior probability $p(s_j \mid f_i^{PRS})$. According to the Bayes rule, the matching method can be expressed as follows:
$$p(s_j \mid f_i^{PRS}) = \frac{p(f_i^{PRS} \mid s_j) \, p(s_j)}{\sum_{j'=1}^{H \times W} p(f_i^{PRS} \mid s_{j'}) \, p(s_{j'})} \qquad (22)$$
where $p(f_i^{PRS} \mid s_j)$ denotes the probability that the PRS's fingerprint is exactly equal to $f_i^{PRS}$ at SP $s_j$, $p(s_j)$ indicates the prior probability that the PRS stands at $s_j$, and H × W is the total number of SPs.
The difficulty with (22) lies in modelling the probability $p(f_i^{PRS} \mid s_j)$. In the actual online phase, owing to the uncertainty of the indoor interference, a completely accurate model that estimates the location from timely measurements is difficult to build. Also, $p(s_j)$ is usually assumed to be uniformly distributed and is thus equivalent to a constant. Therefore, the operation based on the maximum a posteriori probability can be converted into a maximum likelihood estimation, in which $p(f_i^{PRS} \mid s_j)$ is further modelled by a Gaussian kernel function whose variance $\sigma^2$ reflects the fluctuation of the location fingerprints and can be obtained in the fingerprint collection phase. Also, we introduced the CSI-based indoor path-loss model of [3] as an auxiliary weight for fingerprint matching, where ξ is an environmental factor or noise gain with a range from 7 to 20 in the indoor scenarios [31]. [...] general office environment. Through trial and error in our scenarios, we set ξ and γ to 12 and 1.5, respectively. The location of the PRS is then determined by the matching rule above, which yields the spatial position $s_j$ of the PRS. The implementation of the matching method is shown in Algorithm 2 (see Fig. 5).
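The matching step can be sketched as a maximum likelihood search with a Gaussian kernel. The sketch assumes scalar fingerprints and a uniform prior, and it deliberately omits the path-loss auxiliary weight, whose exact form is not reproduced here; it is therefore a simplified illustration rather than the authors' Algorithm 2, and the function name is an assumption.

```python
import math


def match_fingerprint(f_prs, fd, sigma):
    """Return the 1-based SP index with the largest Gaussian-kernel likelihood.

    f_prs: online fingerprint (scalar)
    fd:    list of stored fingerprints, fd[j - 1] belonging to SP s_j
    sigma: fingerprint standard deviation from the collection phase
    """
    def likelihood(f_j):
        return math.exp(-((f_prs - f_j) ** 2) / (2.0 * sigma ** 2))

    scores = [likelihood(f_j) for f_j in fd]
    total = sum(scores)
    # With a uniform prior, the posterior is the normalised likelihood.
    posterior = [s / total for s in scores]
    best = max(range(len(fd)), key=lambda j: posterior[j])
    return best + 1, posterior
```

For very small σ the likelihoods may underflow to zero, so a practical implementation would typically compare log-likelihoods instead.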

Experimental results and analysis
In this section, we first introduce the scenarios and planning of our test. Then, the accuracy of the proposed construction scheme is evaluated and compared under multiple conditions. Finally, the localisation performance of the proposal is tested comprehensively and compared with the two existing techniques.

Scenarios and planning
The APs were the WR340G+ and 886N from the TP-LINK series, and the number of TXAs was N = 1 or 3, with the APs placed 1.5 m above the ground. The WiFi band and bandwidth were set to 2.4 GHz and 20 MHz. The fingerprint data were collected by an integrated 'WiFi Radar System' [32] (see Fig. 6) with 3 RXAs, by which the CSI packets can be recorded in real time. There were two scenarios, the research room and the laboratory, and their division is shown in Fig. 7. In addition, taking into account the available CSI data, as well as packet loss and response speed, the Ping rates in the offline phase were set to 100 packets/s for N = 1 and 50 packets/s for N = 3, and the rates were set to 200 packets/s in the online phase. Note that a higher Ping rate can result in a higher packet-loss rate, while a three-antenna AP has a larger data delay. To evaluate the FM reconstruction effect, we collected no fewer than 500 CSI packets at every SP in the offline phase; other details are shown in Table 1.

Construction scheme evaluation
Before evaluating our construction scheme, the generated matrices were sampled uniformly at random to produce incomplete matrices F with measurement ratios of 60, 70 and 80%, respectively. Moreover, the common standards of relative reconstruction error (RRE) and reconstruction SNR (RSNR) were used as metrics, in whose definitions $\hat{F}$ and F denote the reconstructed and actual FM, respectively. Referring to Fig. 3, the boundary parameters of the sampling were set as $c_1$ = 0.25 and r ≃ 3. After performing the proposed AVLM, the error distribution of the filled values of the FMs under different measurement ratios is shown in Fig. 8. Fig. 8 illustrates the cumulative distribution function (CDF) of the RRE for the proposed scheme. Firstly, the results reflect the basic rule of MC, i.e. the reconstruction accuracy increases with the measurement ratio, which is consistent with most practical applications based on MC [30]. For the FM $F^{rsh}_{s1}$ in Fig. 8a, the reconstructed entries with an RRE below 2% were around 90%, and this proportion rose to 95% when the measurement ratio changed to 80%, while the number of reconstructed entries with an RRE below 1% increased by over 35% in Fig. 8c. The other FMs followed the same rule. Secondly, with the same measurement ratio, the reconstruction result of the larger scenario's FM was slightly worse than that of the smaller one, and the reconstruction accuracy in the static environment was far superior to that in the dynamic one. As shown in Fig. 8b, the filled values with an RRE below 2% exceeded 90% in $F^{rsh}_{s1}$; for the FMs $F^{lab}_{s1}$ and $F^{rsh}_{d1}$, however, this proportion fell to around 80 and 48%, respectively. The main reasons were the following:
(i) The negative factors, such as selective fading, were amplified in a larger room, increasing the CSI fluctuation, which weakened the low-rank property of $F^{lab}_{s1}$ (see Fig. 3).
(ii) Because the random behaviours of the indoor objects caused more interference, the vulnerable linear relationship between the collected fingerprints and the transmission distance was destroyed, which greatly affected the correlation between the elements of the FM, and thus $F^{rsh}_{d1}$ suffered from a larger error.
Finally, by comparing Figs. 8b and c, it can be seen that the reconstruction performance improved only slightly under the 80% ratio. Therefore, to balance the workload and error, we used the 70% ratio as the optimal sampling in this work.
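The two metrics can be sketched as follows. Since their formal definitions are not reproduced here, the code uses commonly adopted forms as an assumption: an entry-wise relative error (the discussion above refers to the RRE of individual reconstructed entries) and an RSNR in dB based on Frobenius norms. The function names are illustrative.

```python
import numpy as np


def rre(f_hat, f_true):
    """Entry-wise relative reconstruction error |f_hat - f| / |f|."""
    return np.abs(f_hat - f_true) / np.abs(f_true)


def rsnr_db(f_hat, f_true):
    """Reconstruction SNR in dB: 20 log10(||F||_F / ||F - F_hat||_F)."""
    return 20.0 * np.log10(np.linalg.norm(f_true) / np.linalg.norm(f_true - f_hat))
```

The fraction of entries with an RRE below a threshold, e.g. `np.mean(rre(f_hat, f_true) < 0.02)`, corresponds to one point on a CDF curve of the kind shown in Fig. 8.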
Next, regarding the impact of the scenario expansion or the dynamic interference on reconstruction, reducing the fluctuation of the fingerprints is the most direct countermeasure. Hence, without adding more APs, we attempted to adopt more antenna pairs to reduce the reconstruction error. The result is shown in Fig. 9. In Fig. 9, with the aid of three TXAs, the reconstruction performance of the FDs was greatly improved, particularly for $F^{rsh}_{d3}$, the error of which was reduced substantially. These improvements were mainly attributed to the spatial multiplexing and spatial diversity gain provided by MIMO, by which the channel fading was effectively mitigated. Meanwhile, our amplitude-averaged strategy for generating fingerprints could exploit the increased spatial streams more fully.
Furthermore, to highlight the advantages of the MC-based method in this scenario, we performed a horizontal comparison. The compared algorithms were two commonly used spatial interpolation methods, Kriging [13,26] and IDW [11,14]; the selected target matrices were three single-antenna FMs; and the evaluation metric was RSNR. The results are shown in Fig. 10. Fig. 10a illustrates the reconstruction results of the three methods on the same FM under four different sampling conditions. Although the accuracy of all three methods improved as the number of measurements grew, our MC-based method had a smaller overall error, particularly at the lower sampling ratios. This was attributed to the fact that it scanned the global information to minimise the affine rank before implementing the next iteration, thereby handling outliers and sparse noise more efficiently. IDW recovers each missing value as an average of the nearby measurements weighted by inverse distance; it has low computational complexity but poor anti-noise ability. The Kriging method considers the distribution and correlation of the measured values and adds variable weights and an error prediction mechanism when filling in the missing values. However, because it does not make full use of global information, the selected variogram cannot guarantee the optimal parameters, which affects its accuracy. Note that although the MC-based algorithm is more time consuming than the other two methods, the offline phase does not require real-time support, so its accuracy made it the first choice in our scheme. Fig. 10b illustrates the recovery results of the three methods on different FMs with the same sampling ratio. For a smaller-scale FM, the reconstructions of the methods were almost identical. For a larger or dynamic-environment FM, the MC-based method had a more stable performance. This indicates that although traditional interpolation coped well with data pollution from smaller Gaussian noise, the MC-based method was more successful in preventing the deterioration of the correlation between the matrix elements caused by larger interference.
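The contrast between a local interpolator and a global low-rank method can be sketched on a synthetic matrix. The code below is an illustration under assumptions, not our implementation: `rank_r_impute` is a simple iterative truncated-SVD imputation used as a stand-in for the MC solver, `idw_fill` treats matrix indices as grid coordinates, the RSNR is assumed to be 20 log10(||X||_F / ||X - X_hat||_F), and the test matrix is a hypothetical smooth, noise-free, rank-2 field.

```python
import numpy as np

def rsnr_db(truth, estimate):
    """Reconstruction SNR in dB (assumed definition, Frobenius norms)."""
    return 20.0 * np.log10(np.linalg.norm(truth) / np.linalg.norm(truth - estimate))

def idw_fill(observed, mask, power=2.0):
    """Fill missing cells with an inverse-distance-weighted mean of known cells."""
    rows, cols = np.indices(observed.shape)
    known_r, known_c = rows[mask], cols[mask]
    known_v = observed[mask]
    filled = observed.copy()
    for i, j in zip(rows[~mask], cols[~mask]):
        dist = np.hypot(known_r - i, known_c - j)  # > 0: (i, j) is not a known cell
        weights = 1.0 / dist ** power
        filled[i, j] = np.sum(weights * known_v) / np.sum(weights)
    return filled

def rank_r_impute(observed, mask, rank=2, n_iter=300):
    """Alternate between a rank-r SVD truncation and re-imposing known cells."""
    estimate = np.where(mask, observed, 0.0)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(estimate, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        estimate = np.where(mask, observed, low_rank)
    return estimate

# Hypothetical smooth rank-2 "fingerprint matrix" with 70% of cells measured.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
truth = (np.outer(1.0 + x, 2.0 - x)
         + 2.0 * np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)))
mask = rng.random(truth.shape) < 0.7          # True where a cell was "measured"
observed = np.where(mask, truth, 0.0)

rsnr_zero = rsnr_db(truth, observed)          # baseline: leave missing cells at 0
rsnr_idw = rsnr_db(truth, idw_fill(observed, mask))
rsnr_mc = rsnr_db(truth, rank_r_impute(observed, mask))
print(f"zero-fill {rsnr_zero:.1f} dB, IDW {rsnr_idw:.1f} dB, rank-2 impute {rsnr_mc:.1f} dB")
```

The low-rank step uses every measured cell in each iteration, whereas IDW weights mainly the nearest cells; this is the global-versus-local distinction discussed above. The rank is fixed by hand here, which a practical MC solver would not require.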

Localisation performance test
In this subsection, we evaluated the localisation performance of the proposed system in the static environment. The receiver in Fig. 6 was set to the terminal mode as the PRS, and it stayed for 2 s at each test SP (i.e. nearly 400 CSI packets were collected), where the SPs were selected uniformly at random. The number of test SPs was 20 per round in the research room and 30 per round in the laboratory, and each test totalled four rounds. The online fingerprint over each SP was an average of the four rounds of results. Given that the fingerprints in our FD and the divided SPs are bijective, we took the offset of the estimated location from the actual location of the PRS as the evaluation criterion. The results are shown in Figs. 11 and 12. Fig. 11 visually demonstrates the localisation performance of our system using the reconstructed FD, along with the impact of the ratio of sampled SPs on the location estimation. In Fig. 11a, the average number of test SPs at which the PRS was exactly located was only 5 (rounded value, the same below) out of 20, and 3 location estimates had a 4-SP offset (around 3.2 m), while more than half deviated by 2 or 3 SPs. As the number of sampled SPs increased (i.e. as the accuracy of the reconstructed FD improved), the localisation error decreased. When the measurement ratio was 70%, the offset-free estimations reached three-fourths, which was almost equivalent to Fig. 11d. Fig. 11 illustrates the importance of reconstruction accuracy and supports our proposal to adopt 70% sampling. Fig. 12 shows the impacts of the environment and multi-TXA on the localisation performance. Under the sampling condition of 70%, all the actual FDs and their corresponding reconstructed FDs participated in the test. In Fig. 12a, the system was the most accurate in the static environment, but the estimation had a large offset range in the dynamic environment with a single antenna; the F rsh d1 caused the largest error, and some 55% of the estimates deviated from the actual location by more than 1 SP. This demonstrated that moving objects aggravate the mismatch probability between the FD and the online fingerprint, and that increasing the number of antenna pairs in MIMO was one of the solutions. With the actual and reconstructed F rsh d3 , the proportion of offset-free estimates increased by 30 and 35%, respectively; with the actual FD, the estimates with an offset below 2 SPs were around 90%, and the figure with the reconstructed FD was 85%, all of which verified the benefits of spatial diversity on the positioning performance. In Fig. 12b, the proposed system also demonstrated high accuracy in a larger scenario. In the worst case, >90% of the location estimates were within a 2-SP offset. Moreover, the accuracy was further improved by leveraging the multi-TXA: the offset-free estimations exceeded 80% under both FD conditions, and the proportion within a 2-SP offset was >90%.
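The SP-offset statistics used above can be computed as in the following sketch. Several details are assumptions made only for illustration: SPs are indexed by (row, column) grid coordinates, the offset is counted as the Chebyshev distance in SPs, the 0.8 m spacing is inferred from the 4-SP offset of around 3.2 m, and the four test points are hypothetical.

```python
SP_SPACING_M = 0.8  # assumed grid spacing, inferred from "4-SP offset (around 3.2 m)"

def sp_offset(true_sp, est_sp):
    """Offset in SPs between two grid cells (Chebyshev distance, an assumption)."""
    return max(abs(true_sp[0] - est_sp[0]), abs(true_sp[1] - est_sp[1]))

def share_within(offsets, k):
    """Fraction of estimates whose offset is at most k SPs."""
    return sum(o <= k for o in offsets) / len(offsets)

# Hypothetical actual and estimated SP indices for four test points.
true_sps = [(0, 0), (1, 2), (3, 3), (4, 1)]
est_sps = [(0, 0), (1, 3), (3, 3), (2, 1)]

offsets = [sp_offset(t, e) for t, e in zip(true_sps, est_sps)]
offset_free = share_within(offsets, 0)        # exactly located
within_one = share_within(offsets, 1)         # at most 1 SP away
mean_error_m = SP_SPACING_M * sum(offsets) / len(offsets)
print(offsets, offset_free, within_one, round(mean_error_m, 2))
```

With a different distance convention (e.g. Euclidean or Manhattan), only `sp_offset` would change; the proportion and mean-error calculations stay the same.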
With the support of frequency diversity and spatial diversity, both the actual and reconstructed FDs showed strong robustness, which enabled the proposed system to achieve accurate location estimation. To highlight the performance of the proposed system, we first selected two advanced CSI-based techniques, FIFS [23] and CSI-MIMO [24], for horizontal comparison. The fingerprint generation strategies of the two use amplitude averaging and amplitude difference, respectively, while the fingerprint matching methods they employed are the same as ours. The test was performed for 3 rounds, with 12 (research room) and 15 (laboratory) test SPs per round, and the metric was the averaged error. To make the comparison fair, the three methods used the same raw CSI and reconstruction method. The results are shown in Fig. 13.
Overall, for the three methods, the localisation accuracy in the static environment or with the multi-TXA was higher than in the dynamic environment or with the single TXA, and an increase in the indoor area or the use of the reconstructed FMs weakened the system's performance. In the smaller scenario of Fig. 13a, with the FD F rsh s1 , the methods performed well, and the gap was small. Our method achieved an accuracy of around 0.63 m, better than FIFS's accuracy of 0.72 m and CSI-MIMO's accuracy of 0.79 m, because the fluctuation in the effective CSI we leveraged was smaller. However, the localisation accuracy declined considerably in the dynamic environment, and the errors of the three methods increased by around 1.2, 1.1, and 0.9 m, respectively. With the aid of the three TXAs, the errors were reduced to some extent, with those of our method and FIFS falling noticeably. The reason for this result was that the amplitude-averaged strategy adopted by the two methods could leverage the spatial diversity more fully. Meanwhile, the single AP and the denser sampling SPs led to relatively close difference values for the CSI, so the CSI-MIMO failed to achieve a better result. Fig. 13b shows the localisation results of the three methods in a large static room. Compared with the others, ours exhibited higher accuracy and robustness and could make better use of the multi-antenna pairs against the impacts of the increased room size.
Further, our proposal was also compared with DeepFi [16] using the same online algorithm. In the DeepFi system, the multi-layer network is its FD and the fingerprints are represented by the network's weights, which is a major change compared with the traditional methods. The FD of DeepFi was trained with our effective CSI, our system used the reconstructed FDs with 70% measurements, and the 30 test points were randomly selected in the laboratory. The results are shown in Fig. 14, which illustrates the CDF of localisation errors under different TXA conditions for the two systems. With a single-antenna AP, the overall error of DeepFi was lower than that of ours. The reason was that DeepFi's FD could depict the complex indoor scenario more finely, which is supported by the powerful non-linear fitting capability of deep networks. Although the proposed system was at a disadvantage in localisation performance, it avoided the heavy workload of training and calibration. With the CSI dimension increased by the three antennas, our system obtained an accuracy close to that of DeepFi. However, this increased the time cost due to the data delay caused by the multi-TXA. Moreover, compared with Fig. 12 of [16], the performance of DeepFi itself was also improved, which benefits from the fingerprint generation method of our solution.

Conclusion
Indoor LBS requires an accurate, robust, and low-cost localisation system to support its application. Compared with the geometric measurement-based technologies, fingerprint localisation technology has better accuracy and anti-interference capability. Exploiting CSI as a medium, we aimed to combine the frequency and space diversity under the IEEE 802.11n standard of WLAN into the location awareness technology and proposed an indoor fingerprinting localisation system, which featured the optimised CSI-based fingerprint, MC-based FD construction, and GKBR-based fingerprint matching technologies. The proposed system was evaluated from multiple angles using data from real indoor scenarios, and the test results showed that our proposal produced reliable location estimation in addition to greatly reducing the calibration requirements of the system. Although our work explored the positive effects of applying CSI to the LBS, there are still some open problems in the proposed system that need to be considered. For instance, the pros and cons of dense AP deployment for the performance of FD construction or localisation, and the adaptability of the proposed system to multi-object situations, will be the focus of our further study.