Radio map construction based on BERT for fingerprint-based indoor positioning system

Constructing a radio map for a wireless local area network (WLAN) fingerprint-based indoor positioning system is time-consuming and laborious because of the heavy workload of received signal strength (RSS) collection, the instability of WLAN signal strength, and the signal loss caused by complex indoor environments. To deploy indoor WLAN positioning systems rapidly, we use the bidirectional encoder representations from transformers (BERT) model to fill the missing signals in the radio map and thus build the radio map quickly. The radio map is fed into the BERT model in the form of natural language text, and the BERT model fills in the missing signals. Because the input to the BERT model cannot exceed 512 tokens, the original BERT structure is not suitable for WLAN signals with a large data volume. Therefore, we redefine the model structure based on the original BERT model and fill in the missing signals in the radio map in parallel. We also redefine the loss function: in addition to a loss function for each segment, the weighted average of all segment loss functions is defined as the total loss function. The experimental results show that the BERT model outperforms the traditional linear interpolation method, the compressive sensing algorithm and the matrix completion algorithm in filling the missing signals in the fingerprint database, and the probability of a positioning error within 2 m reaches about 94%.


Introduction
In recent years, with the rapid development of smart mobile terminals and the mobile Internet, the diversified services provided by the Internet have rapidly expanded the application scenarios of location-based services, which have attracted extensive attention [1]. The combination of smart mobile terminals and positioning services is widely used in various indoor environments [2], such as the positioning and navigation of meal delivery robots in restaurants and navigation services for customers in shopping malls. Among the many indoor positioning technologies, the fingerprint-based WLAN indoor positioning system has become the most prominent choice for positioning and navigation because it requires no additional hardware, has a short development cycle and achieves high positioning accuracy [3,4].
Fingerprint-based WLAN indoor positioning involves two phases: an offline training phase and an online positioning phase. The main task of the offline phase is to establish a radio map. We set reference points (RPs) in the positioning area and collect received signal strength (RSS) data from all access points (APs) at each RP. The radio map is obtained by storing the RSS data together with the location of the corresponding RP. In the online phase, the user's terminal measures RSS values in real time and matches them against the RSS in the radio map to calculate the user's location [5]. The radio map is therefore the key to fingerprint-based indoor positioning, and its completeness determines the accuracy of the positioning results.
Because of the complex indoor electromagnetic environment, RSS fluctuates violently and may be missing. Therefore, when building a radio map, researchers need to collect RSS values manually at each RP many times to reduce the impact of RSS fluctuation on the accuracy of the radio map [6,7]. In addition, the movement of people and changes in the positions of goods indoors affect the positioning accuracy, so the radio map must be updated regularly as the environment changes [8,9]. As a result, building or maintaining a radio map takes days or even weeks, which is time-consuming and labor-intensive. This huge time and labor cost has become the bottleneck hindering the adoption of fingerprint-based indoor positioning. Accordingly, how to establish a radio map quickly and at low cost has become a hot issue in the field of fingerprint-based indoor positioning in recent years.
To reduce the time and labor cost of data acquisition, we use the built-in sensors and inertial measurement unit (IMU) of mobile devices to assist in establishing the radio map [10]. As researchers walk through the indoor positioning area, the mobile terminal collects IMU data to calculate the RP positions automatically and records the signal strength at each RP, so that a radio map can be established quickly. However, a radio map established in this way is incomplete: it covers only some of the RPs and contains missing RSS values, which can be filled in by interpolation. Researchers have proposed many effective methods for filling missing signals. In reference [11], X. Chai et al. used unlabeled user trajectories to construct the radio map and interpolated it by labeling the unlabeled data using Bayesian theory and a hidden Markov model. Reference [12] investigates the influence of signal changes on the radio map and obtains the intrinsic relationship between RSS data collected at different times through a linear model, which effectively reduces the workload of updating the radio map. Reference [13] proposes a semi-supervised learning algorithm to construct a radio map: a small amount of labeled data and a large amount of unlabeled data are collected in the offline phase, and the labels of the unlabeled data are calculated by the semi-supervised learning algorithm. In reference [14], L. Ma et al. proposed an efficient radio map construction method based on low-rank tensor completion, which remarkably reduced the effort of building the radio map. Z. Wang et al. improved the low-rank matrix completion algorithm and increased the accuracy of the data filled into the radio map [15]. In addition, Du et al. [16] proposed a local interpolation method based on geographically weighted regression to construct a radio frequency fingerprint database, which uses calibration points deployed at multiple locations to fit a signal attenuation model with a regression algorithm.
However, this method requires the additional deployment of a large number of hardware devices to calibrate the attenuation model, and the parameters of the attenuation model need to be updated regularly. Besides local interpolation methods, global interpolation methods are also commonly used to construct indoor positioning fingerprint databases, among which compressive sensing interpolation methods [17,18] are particularly important. Reference [19] adopts a compressive sensing method to dynamically fill the missing RSSI values of fingerprints and reduces the interference of multipath effects in the indoor environment through a sparse low-rank singular value decomposition algorithm. However, with this method it is difficult to find a suitable sparse representation matrix for the fingerprint database.
In 2018, Google proposed the bidirectional encoder representations from transformers (BERT) model [20], a deep neural network model based on the transformer architecture. The main contribution of BERT is the introduction of pre-training: BERT is pre-trained on a large-scale unlabeled corpus and then fine-tuned on downstream tasks. Previously, the common approach in natural language processing was to train neural networks on task-specific labeled data, which required a significant amount of labeled data and computing resources and could hardly cover all natural language problems. In contrast, pre-training allows the model to learn a universal language representation from a large amount of unlabeled data, so that fine-tuning on downstream tasks is faster. At the time of its release, BERT achieved state-of-the-art results on multiple natural language processing tasks, such as question answering and named entity recognition. Owing to its strong representation learning and generalization abilities, BERT has become an important direction and hotspot in natural language processing research. Moreover, BERT has been applied in various fields, such as generative models, dialogue systems and search engines, providing strong support for practical applications. Inspired by the BERT model, we feed the collected RSS data into the BERT model in the form of natural language. The missing signals in the radio map are predicted by BERT's masked language model to construct a complete radio map. Because the original model structure cannot accept more than 512 tokens in one input, we improve the structure of the BERT model: the RSS values collected at adjacent RPs are processed in segments, and the missing RSS values are filled in parallel according to signal propagation theory.
To address the weak spatial correlation between segments and to make the model more robust, we redefine the loss function: in addition to a loss function for each segment, the weighted average of all segment loss functions is defined as the total loss function.
The remainder of this paper is organized as follows. Section 2 describes the fingerprint-based WLAN indoor positioning system and presents the complete BERT-based radio map construction framework. Section 3 introduces the structure of the BERT model and our improvements to it. Section 4 presents the experimental results and analysis. Finally, Sect. 5 concludes the paper.

WLAN indoor localization system
The block diagram of the fingerprint-based WLAN indoor positioning system is shown in Fig. 1. In the offline phase, suppose that m APs and n RPs are deployed in the positioning area. To build the radio map, we collect RSS values from all APs at each RP, and after preprocessing, a 1 × m RSS vector $\mathbf{RSS}_i = (RSS_{i1}, RSS_{i2}, \ldots, RSS_{im})$ is obtained at each RP, where $RSS_{ik}$ is the received signal strength from the k-th AP measured at the i-th RP. Combining $\mathbf{RSS}_i$ with the coordinates $C_i = (c_{i1}, c_{i2})$ of the corresponding RP gives the i-th fingerprint $(\mathbf{RSS}_i, C_i)$. The set of all fingerprints is the radio map, as shown in Fig. 1. In the online phase, the user's location is calculated by matching the collected RSS with the RSS data in the radio map.
In an indoor environment, the wireless signal transmitted by an AP is attenuated before it reaches an RP, where the mobile terminal measures it as an RSS value. The wireless signal propagation model is given in Eq. 1 [21].
Here, d denotes the distance between the location where the signal is collected and the AP, $P_r(d)$ represents the received signal strength from the AP, $P_t$ is the transmit power of the AP, $P_{d_0}$ represents the received signal strength at a reference distance $d_0$, and n is a known parameter, the path loss exponent. According to signal propagation theory, adjacent RPs are close to each other, so their signal strengths are highly similar. This indicates that the RSS values of adjacent RPs are spatially correlated, which lays the foundation for constructing a complete radio map with a deep learning model. By pre-training on complete RSS data, the BERT model can learn the relationship between adjacent RPs and APs, and can then use a small amount of RSS data to predict the missing signals more accurately. We collect a small amount of RSS data in the indoor environment and feed it into the BERT model to build a complete radio map. The complete radio map construction process is shown in Fig. 2.
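Because the body of Eq. 1 is not reproduced here, the sketch below uses one common form of the log-distance path loss model, $P_r(d) = P_{d_0} - 10 n \log_{10}(d/d_0)$, which may differ from the paper's Eq. 1 (for example, in how $P_t$ enters). The function name `log_distance_rss` and its default values (−40 dBm at 1 m, n = 2) are illustrative assumptions. The sketch shows numerically why adjacent RPs have similar RSS.

```python
import math

def log_distance_rss(d, p_d0=-40.0, n=2.0, d0=1.0):
    """RSS (dBm) at distance d (m) under an assumed log-distance model:
    P_r(d) = P_d0 - 10 * n * log10(d / d0).
    p_d0, n and d0 are illustrative values, not taken from the paper."""
    if d <= 0 or d0 <= 0:
        raise ValueError("distances must be positive")
    return p_d0 - 10.0 * n * math.log10(d / d0)

# Two RPs 0.5 m apart (the grid spacing used in the experiments), ~10 m from an AP
near = log_distance_rss(10.0)
far = log_distance_rss(10.5)
print(round(near - far, 2))  # small difference -> strong spatial correlation
```

Under these assumed parameters, two RPs 0.5 m apart at about 10 m from the AP differ by only about 0.42 dB, which is the kind of spatial correlation the BERT model is expected to exploit.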

Data preprocessing and analysis
Because RSS collection is easily affected by the environment, noise is inevitably introduced during collection. In addition, RSS variation over time and the orientation of the mobile device's antenna relative to the access point must be considered during data acquisition. Therefore, the data should be preprocessed before positioning.

Radio map overview
In the actual collection process, the RSS data collected by mobile devices are instantaneous values. However, RSS values vary over time, so relying on instantaneous samples would greatly degrade positioning accuracy. Averaging reduces the Gaussian noise introduced by the channel and better represents the indoor electromagnetic environment. Therefore, the RSS data stored in the radio map are averages obtained over a period of time rather than instantaneous values. In addition, the orientation of the mobile device's antenna relative to the access point also has a significant impact on the RSS data. Suppose $C^{\theta}_{ij}$ is a sample of the RSS reading, where $\theta \in \{0^\circ, 90^\circ, 180^\circ, 270^\circ\}$, and $\bar{C}^{\theta}_{ij}$ is the average of these samples. Figure 3 illustrates an example of 100 samples of measured RSS data.
One way to eliminate the orientation problem is to store the vectors collected in different directions in the radio map, but this expands the radio map and increases the computational burden. To eliminate the orientation problem while keeping the computational complexity low, this paper uses a single orientation and averages the RSS collected in that orientation. Besides orientation, RSS data are also vulnerable to multipath fluctuations caused, for example, by the movement of people and obstruction by doors. These fluctuations produce outliers, as shown in Fig. 4. If the outliers were averaged together with the rest of the raw RSS data, additional noise would inevitably be introduced into the radio map, seriously affecting the positioning accuracy. Therefore, the raw RSS data must be preprocessed.
To eliminate outliers in the raw RSS data, we set a threshold of three times the standard deviation of the raw RSS data and delete the samples whose deviation from the mean exceeds this threshold. The effect of this treatment is shown in Fig. 5. Suppose the standard deviation of the RSS readings is $\varphi^{\theta}_{ij}$; any RSS sample $C^{\theta}_{ij}$ is eliminated if $|C^{\theta}_{ij} - \bar{C}^{\theta}_{ij}| > 3\varphi^{\theta}_{ij}$. The RSS value stored in the radio map for the i-th access point at the j-th reference point is then defined as the average of the remaining samples (Eq. 2).
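The three-standard-deviation rule can be sketched as follows for the samples of one AP at one RP in a single orientation. This is a minimal sketch: the function name `robust_mean` is hypothetical, and using the population standard deviation (NumPy's default `ddof=0`) is an assumption, since the text does not say which estimator is used.

```python
import numpy as np

def robust_mean(samples, k=3.0):
    """Average RSS samples (dBm) after dropping samples whose deviation
    from the mean exceeds k standard deviations (k = 3 in the text)."""
    x = np.asarray(samples, dtype=float)
    mean, std = x.mean(), x.std()  # population std (ddof=0), an assumption
    kept = x[np.abs(x - mean) <= k * std]  # keep samples within k*std
    return float(kept.mean())

# 19 stable readings and one multipath outlier
readings = [-60.0] * 19 + [-20.0]
print(robust_mean(readings))  # the -20 dBm outlier is discarded
```

Note that a single outlier can only exceed the 3σ threshold when there are enough samples (here at least 11 equal readings plus the outlier), which is one reason to collect many samples, such as the 100 per RP used later, at each reference point.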

Data analysis
Take one floor of a building as an example. Assuming that the floor area is 1000 m² and one reference point is set per square meter, there are 1000 reference points in the region. If collecting 100 samples at a reference point takes 1 minute, it takes 1000 minutes, or about 16.7 hours, to complete the radio map of the floor, even without considering the orientation problem. Constructing a radio map is therefore a very time-consuming task. Figure 6 shows the constructed radio map. The number of columns equals the number of reference points, and the number of rows equals the number of access points. Each entry is color-coded: red means that the RSS reading is very strong, and blue means that it is very weak. The time bar shows that collecting the RSS readings for the entire localization area took six periods of time.
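The 16.7-hour figure follows from simple arithmetic, sketched below. The function name `collection_hours` is hypothetical, the reading of the collection rate as 100 samples per RP at 100 samples per minute is an interpretation of the text, and the `orientations` parameter assumes, as a hypothetical, that each orientation would require a separate collection pass.

```python
def collection_hours(n_rps, samples_per_rp=100, samples_per_minute=100, orientations=1):
    """Hours needed to survey a radio map manually (hypothetical helper).

    orientations > 1 assumes each device orientation is surveyed separately."""
    minutes = n_rps * orientations * samples_per_rp / samples_per_minute
    return minutes / 60.0

print(round(collection_hours(1000), 1))                  # single orientation
print(round(collection_hours(1000, orientations=4), 1))  # four orientations
```

Under this assumption, surveying all four orientations listed earlier would roughly quadruple the effort to about 66.7 hours.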
Usually, when a reference point is far from an access point, the mobile device cannot detect the RSS from that access point, so we fill these entries with −100 dBm when building the radio map. In addition, when the signal is blocked by pedestrians or doors, the mobile device cannot detect the access point either. In Fig. 7, gray indicates RSS data that were not collected.
Missing RSS readings directly cause an asymmetric matching problem between the offline and online phases and lead to the failure of the fingerprint algorithm. The KNN algorithm searches the radio map and computes the Euclidean distance between the online RSS readings and those stored in the radio map; if any entry is missing, the location of the mobile device cannot be estimated. It is worth pointing out that we cannot simply ignore the access points with missing RSS readings and compute the Euclidean distance over the remaining available access points, because the resulting distances would then be computed over different sets of access points and would not be comparable. As shown in Fig. 8, filling the missing RSS data with −100 dBm would also seriously degrade the positioning performance. Because signal blockage by pedestrians and doors is an accidental event, the same blockage state cannot be guaranteed when RSS data are collected online. To ensure that the fingerprint algorithm can locate the device, these missing data must be correctly restored.
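The role of missing entries in KNN matching can be illustrated with a minimal sketch. The function `knn_locate` and the toy radio map are hypothetical; the radio map is stored here as an n × m array (one row per RP), and a missing reading is represented as `NaN`, which makes the Euclidean distance to that RP undefined.

```python
import numpy as np

def knn_locate(online_rss, radio_map, coords, k=3):
    """Estimate a position as the mean coordinate of the k RPs whose stored
    RSS vectors are closest (Euclidean distance over all m APs)."""
    distances = np.linalg.norm(radio_map - online_rss, axis=1)
    if np.isnan(distances).any():
        # A single missing RSS entry makes that RP's distance undefined
        raise ValueError("radio map contains missing RSS entries")
    nearest = np.argsort(distances)[:k]
    return coords[nearest].mean(axis=0)

radio_map = np.array([[-40.0, -70.0], [-50.0, -60.0], [-70.0, -40.0]])  # 3 RPs x 2 APs
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(knn_locate(np.array([-41.0, -69.0]), radio_map, coords, k=1))
```

Raising an error rather than skipping the affected RPs mirrors the argument above: dropping APs or RPs ad hoc would compare distances computed over different sets of access points.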

Filling the missing signal based on BERT
As shown in Fig. 9, the BERT model is a multi-layer bidirectional transformer that uses only the encoder part of the transformer; a detailed introduction to the transformer is given in reference [22]. BERT is built by stacking multiple transformer encoders and can be divided into three parts: input, encoder and output. A special separator token marks the end of each RP. In the token embedding, each token is the signal strength from one AP collected at the reference point. The segment embedding is the index of the reference point to which the current RSS data belong; it is usually used to determine which of two adjacent reference points the current input data come from. The location (position) embedding is the index of the position of the RSS data, which is used to determine which AP sent the signal.
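The input described above can be sketched for a pair of adjacent RPs. This is a hypothetical encoding, not the authors' code: the function name `build_input`, the use of rounded dBm values as word tokens, the `[CLS]`/`[SEP]` strings (standard BERT special tokens) and the convention of 1-based AP ids with 0 for special tokens are all assumptions made for illustration.

```python
def build_input(rss_a, rss_b):
    """Hypothetical encoding of two adjacent RPs as one BERT input.

    Returns token strings, segment ids (which RP a token belongs to) and
    AP ids (which AP a token's RSS came from; 0 for special tokens)."""
    tokens, segment_ids, ap_ids = ["[CLS]"], [0], [0]
    for seg, rss in enumerate((rss_a, rss_b)):
        for k, value in enumerate(rss):
            tokens.append(str(int(round(value))))  # RSS in dBm used as a "word"
            segment_ids.append(seg)                # segment embedding: which RP
            ap_ids.append(k + 1)                   # position embedding: which AP
        tokens.append("[SEP]")                     # marks the end of this RP
        segment_ids.append(seg)
        ap_ids.append(0)
    return tokens, segment_ids, ap_ids

tokens, segs, aps = build_input([-60, -70], [-61, -72])
print(tokens)  # ['[CLS]', '-60', '-70', '[SEP]', '-61', '-72', '[SEP]']
```

In standard BERT, position ids simply count token positions; indexing by AP instead follows the description in the text, so that the same AP gets the same position id in both segments. A zero-filled missing value would appear as the token "0".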
BERT uses a masked language model: given a sentence, some words are masked, and the model predicts the masked words. Before filling the data, we first train the BERT model on 70% of the complete data from adjacent RPs. During this training, the masked BERT model randomly replaces some words (15%) with [Mask] each time and predicts them. In the practical prediction task, 40% of the input data are filled with zeros to simulate missing signal values, and these zero-filled entries are then replaced directly with [Mask]. That is, in the practical task we do not replace data with [Mask] at random; instead, each missing signal is directly defined as a [Mask], and the restored [Mask] is the missing signal value. The masked language model is shown in Fig. 11, where [Mask] represents a missing signal. Without masking the input data, the BERT model exhibits "lazy" behavior: if all known information is used to train the model and the whole data set is then fed into the model again, the model does not learn the intrinsic relationships between the data but simply copies the input at the [Mask] positions. To avoid this problem, we modify the BERT model and explicitly require that the missing signals at the [Mask] positions be predicted from context information.
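The two masking modes described above can be sketched as follows: random masking of about 15% of the RSS tokens during training, and deterministic masking of every zero-filled (missing) entry during prediction. The function names `random_mask` and `mask_missing` are hypothetical, the `"[MASK]"` string follows the standard BERT vocabulary, and the sketch omits BERT's usual 80/10/10 replacement rule because the text only mentions replacement with [Mask].

```python
import random

MASK = "[MASK]"

def random_mask(tokens, ratio=0.15, seed=None):
    """Training mode: mask about `ratio` of the RSS tokens at random."""
    rng = random.Random(seed)
    # Only RSS tokens are candidates; special tokens such as [CLS]/[SEP] are kept
    candidates = [i for i, t in enumerate(tokens) if not t.startswith("[")]
    if not candidates:
        return list(tokens), []
    n_mask = max(1, round(ratio * len(candidates)))
    positions = sorted(rng.sample(candidates, n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked, positions

def mask_missing(tokens, missing_token="0"):
    """Prediction mode: every zero-filled (missing) RSS becomes [MASK]."""
    positions = [i for i, t in enumerate(tokens) if t == missing_token]
    masked = [MASK if t == missing_token else t for t in tokens]
    return masked, positions

print(mask_missing(["[CLS]", "-60", "0", "-70", "[SEP]"]))
```

The positions returned by `mask_missing` are exactly the entries whose predicted values would be written back into the radio map.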
In addition to the masked language model (MLM), BERT uses a second pre-training task: next sentence prediction, a classification task that determines, for sentences A and B, whether B is the sentence that follows A. During training, 50% of the sentence pairs are consecutive in context and 50% are randomly sampled. In the radio map, the RSS collected at adjacent RPs is highly similar, so we modify this prediction task: we abandon random sampling and, in each cycle, fill the RSS at adjacent RPs. For computational efficiency, the maximum input length of the BERT model is limited to 512 tokens. To fill the data more efficiently, we segment the RSS data: once the RSS data are stable, adjacent RPs are merged and the radio map is filled in parallel, as shown in Fig. 12.
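The 512-token limit determines how many adjacent RPs fit into one segment. The sketch below assumes a hypothetical layout of one `[CLS]` token followed, for each RP, by m RSS tokens and one `[SEP]`; the function names `max_rps_per_segment` and `segment_rps` are illustrative, not from the paper.

```python
def max_rps_per_segment(m_aps, max_len=512):
    """Largest number of RPs per input under the assumed layout
    [CLS] + (m_aps RSS tokens + [SEP]) per RP."""
    return (max_len - 1) // (m_aps + 1)

def segment_rps(n_rps, rps_per_segment):
    """Split RP indices into consecutive groups so neighbors stay together."""
    return [list(range(s, min(s + rps_per_segment, n_rps)))
            for s in range(0, n_rps, rps_per_segment)]

print(max_rps_per_segment(27))    # RPs that fit with 27 APs
print(len(segment_rps(810, 10)))  # segments (cycles) for 810 RPs, 10 per segment
```

With the 27 APs of the experimental setup, up to 18 RPs would fit in one input under this layout; grouping the 810 RPs into consecutive blocks of 10 RPs (281 tokens each) gives 81 segments, which matches the 81 cycles used in the experiments.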

Loss function
The loss function of BERT consists of two parts, corresponding to the two pre-training tasks of the BERT model. Through joint learning of the two tasks, the BERT model learns both token-level information and global information about the data. The loss function is given in Eq. 3.
$$L(\theta, \theta_1, \theta_2) = \sum_{p=1}^{u} \left[ L_1(\theta, \theta_1) + L_2(\theta, \theta_2) \right] \tag{3}$$
where $\theta$ denotes the parameters of the encoder in BERT, $\theta_1$ the parameters of the output layer attached to the encoder in the prediction task, $\theta_2$ the parameters of the classifier that judges whether two RPs are adjacent, and u the number of cycles. Suppose the set of RSS data is M and the dictionary size is |V|; the first part of the loss function can then be expressed as Eq. 4. Similarly, the global task of the RP segment also has a weighted loss function, built from the log-likelihood terms $\log q(n = n_i \mid \theta, \theta_2)$ of the adjacency classifier, as expressed in Eq. 5.
After u cycles, the joint loss function of the two tasks can be rewritten as Eq. 6.
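Since Eqs. 4 to 6 are not reproduced here, the following is only a sketch of the loss structure described in the text, assuming standard negative log-likelihood (cross-entropy) terms: a masked-token term over the vocabulary V, an adjacency-classification term, their sum per segment (mirroring the two terms of Eq. 3), and a weighted average over segments as the total loss. All function names are hypothetical, and the segment weights are placeholders because the paper's weighting scheme is not given in this section.

```python
import numpy as np

def mlm_loss(log_probs, masked_positions, target_ids):
    """NLL of the true RSS tokens at the masked positions.
    log_probs: (seq_len, |V|) log-probabilities from the output layer."""
    return float(-np.mean(log_probs[masked_positions, target_ids]))

def adjacency_loss(log_q, label):
    """NLL of the adjacency label (1 = the two RPs are adjacent)."""
    return float(-log_q[label])

def segment_loss(log_probs, masked_positions, target_ids, log_q, label):
    # Per-segment loss: masked-token term plus adjacency term
    return mlm_loss(log_probs, masked_positions, target_ids) + adjacency_loss(log_q, label)

def total_loss(segment_losses, weights):
    """Weighted average of per-segment losses (weights are hypothetical)."""
    return float(np.average(segment_losses, weights=weights))

print(total_loss([1.0, 3.0], [3, 1]))  # (3*1 + 1*3) / 4
```

The weighted average lets segments whose RPs are more reliable, or more strongly correlated, contribute more to training; how the weights are chosen in the paper is not specified here.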

Experiments and analysis
This section details the experimental results of constructing a radio map with the BERT model. The experimental area is located on the fourth floor of No. 9 Teaching Building, Shandong University of Technology. As shown in Fig. 13, we divide the experimental area into grids with a spacing of 0.5 m; 27 APs and 810 reference points were deployed. The parameters of the proposed model are listed in Table 1. The BERT pre-trained model used is the publicly released Chinese pre-trained model "BERT-Base-Chinese". We use a Redmi K30pro device to collect RSS data, with the same orientation for every acquisition. To verify the effectiveness of the BERT model, we collect 100 RSS samples at each reference point, remove the outliers, average the remaining samples, and use these averages to construct a complete radio map. Collecting data in this way effectively improves the stability of the RSS and thus the positioning accuracy. The complete radio map is shown in Fig. 14, where the abscissa represents the APs, the ordinate represents the RPs, and the colors represent RSS strength.
We randomly selected 15 RP columns to fill in order to verify the prediction ability of the BERT model; the filling result is shown in Fig. 15. We define the error as the difference between the original data and the data filled by BERT, $E = X - X'$, where $X$ represents the original data and $X'$ the data filled by the BERT model. Figure 15 shows that, with the original BERT model, most [Mask] areas are recovered well, but in some areas the recovery is very poor, and this recovery error will affect the positioning accuracy. The error is largely related to the choice of the second prediction task: because the RPs are selected randomly when the incomplete radio map is constructed, distant RPs may be paired for parallel filling. According to signal propagation theory, the closer two RPs are, the more similar their RSS data are. Random sampling is therefore inconsistent with signal propagation theory and leads to poor recovery in some areas.
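The error $E = X - X'$ can be computed entry by entry, as sketched below. Summarizing it as the mean absolute error over the filled entries is an assumption about how a single dBm figure, such as those in Table 2, might be obtained; the function name `filling_error` is hypothetical.

```python
import numpy as np

def filling_error(original, filled, mask):
    """Return E = X - X' and the mean absolute error over filled entries.
    mask marks the entries that were missing and filled by the model."""
    X = np.asarray(original, dtype=float)
    X_filled = np.asarray(filled, dtype=float)
    E = X - X_filled
    mae = float(np.mean(np.abs(E[np.asarray(mask, dtype=bool)])))
    return E, mae

E, mae = filling_error([[-60, -70], [-65, -75]],
                       [[-60, -72], [-62, -75]],
                       [[False, True], [True, False]])
print(mae)  # averaged only over the two filled entries
```

Restricting the average to the masked entries avoids diluting the error with the observed entries, which the model does not need to predict.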
We feed all the data, 810 RPs in total, into the improved BERT model. To reduce the amount of data collected, we retained only 40% of the RSS data and set the remaining RSS values to zero. In the BERT model, 81 cycles are set, and in each cycle 10 RP columns are extracted in order and filled. The filled radio map is shown in Fig. 16. As shown in Fig. 16, when extended to all data, the improved BERT model achieves an excellent filling effect. We quantify the filling error of the BERT model and compare it with traditional algorithms (improved low-rank completion [15], low-rank completion [23], and spline interpolation (x), spline interpolation (y) and spline interpolation (0,0) [24]), as shown in Table 2. Table 2 shows that the improved low-rank matrix filling algorithm has an error of 6.52 dBm. This algorithm exploits the low-rank property of the RSS matrix when filling the radio map; although this improves the filling effect, environmental factors introduce noise into the low-rank filling. Similarly, the filling effect of spline interpolation depends on how the data are arranged. The filling effect of the BERT model is clearly better than that of the traditional algorithms, with an error of only 3.24 dBm, which shows that BERT-based filling can greatly reduce the workload of building a radio map.
To further demonstrate the effectiveness of the proposed method, Fig. 17 compares the cumulative distribution functions (CDFs) of the positioning errors of the various algorithms. The figure shows that, compared with existing linear interpolation, compressive sensing and matrix filling algorithms, the radio map built by the BERT model yields higher positioning accuracy. The probability of a positioning error within 2 m is approximately 94%, which exceeds that of the traditional matrix filling algorithm (90%) and compressive sensing interpolation (90%). This means that high-precision positioning can be achieved with only 40% of the data collection workload. In summary, the proposed method significantly reduces the time and labor costs of building radio maps while yielding accurate positioning results.
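The CDF curves and the "within 2 m" probability can be computed from per-test-point positioning errors as sketched below. The function names are hypothetical, and 2-D Euclidean error between estimated and true coordinates is assumed as the error measure.

```python
import numpy as np

def positioning_errors(estimated, true):
    """Euclidean distance (m) between estimated and true 2-D positions."""
    return np.linalg.norm(np.asarray(estimated, float) - np.asarray(true, float), axis=1)

def empirical_cdf(errors):
    """Sorted errors and the fraction of test points at or below each one."""
    x = np.sort(np.asarray(errors, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

def prob_within(errors, threshold=2.0):
    """Fraction of test points whose error is at most `threshold` meters."""
    return float(np.mean(np.asarray(errors, dtype=float) <= threshold))

errs = positioning_errors([[0, 0], [3, 4]], [[0, 0], [0, 0]])
print(errs, prob_within(errs))
```

Plotting `p` against `x` for each method gives curves like those in Fig. 17, and reading the curve at 2 m gives the probability reported in the text.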
The traditional manual collection method not only consumes a large amount of resources but also takes a lot of time. The method proposed in this article significantly improves the positioning accuracy and also significantly reduces the time required. Table 3 compares the time cost of four fingerprint database construction methods: traditional manual collection, Kriging interpolation, low-rank matrix filling, and the BERT model proposed in this article.
Traditional manual fingerprint database construction requires dividing the indoor space into multiple grid points, with professionals holding equipment to collect fingerprint data at each reference point. To improve the efficiency of manual construction, in this paper the manual fingerprint database was built by five people working in turns. A total of 410 sampling points were surveyed; at each sampling location, 100 fingerprint samples were collected in different directions, and their average was taken as the fingerprint value. According to statistics, the average