3.1. Experimental Environment
We analyze the performance of the proposed techniques using a publicly available Wi-Fi fingerprint dataset from a 2019 paper [24].
The dataset contains RSSI, latitude, longitude, and floor information; the latitude and longitude give the location of each data point. The training dataset includes 3852 samples from floor 0 and 3323 samples from floor 1, collected at a total of 489 indoor positioning points, and contains 443 unique APs. Because we do not consider inter-floor movement, we used only the floor 0 data, which contains 173 unique APs. The following hyperparameters are used to calculate the movement paths according to the proposed method.
Table 1 lists the hyperparameters required for merging BBs. Here, BBi size is the size of the initial BB, m_factor is the multiplier used to expand a BB when it has no neighboring BBs, and max_factor is the maximum size to which a BB can grow through merging and expansion, which cannot exceed 5 m. The IoU threshold is 0, so a BB that does not overlap a neighboring BB is not merged with it and is only expanded.
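The merge-or-expand rule described by Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: boxes are assumed to be axis-aligned tuples (x0, y0, x1, y1) in metres, the function names `iou`, `expand`, and `merge_or_expand` are hypothetical, and leaving a BB unchanged when a merge would exceed the 5 m cap is our assumption. Only the IoU threshold of 0 and the 5 m limit come from the text.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def expand(box, m_factor, max_size=5.0):
    """Scale a box about its center by m_factor, capping each side at max_size."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w = min((box[2] - box[0]) * m_factor, max_size)
    h = min((box[3] - box[1]) * m_factor, max_size)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def merge_or_expand(box, neighbor, m_factor, iou_threshold=0.0, max_size=5.0):
    """Merge with an overlapping neighbor; otherwise expand the box."""
    if neighbor is not None and iou(box, neighbor) > iou_threshold:
        merged = (min(box[0], neighbor[0]), min(box[1], neighbor[1]),
                  max(box[2], neighbor[2]), max(box[3], neighbor[3]))
        fits = (merged[2] - merged[0] <= max_size
                and merged[3] - merged[1] <= max_size)
        # Assumption: a merge that would exceed the cap leaves the box as-is.
        return merged if fits else box
    return expand(box, m_factor, max_size)
```

With a threshold of 0, any positive overlap triggers a merge, while disjoint boxes are only expanded, which matches the behavior described above.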
Table 2 presents the hyperparameters required to divide cells, where the minimum size of the divided cell is 5 m and the maximum number of BBs contained in the cell is limited to 2. If this limit is exceeded, the cell is divided.
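The division rule in Table 2 can be illustrated with a short recursive sketch. The text states only the two limits (minimum cell size of 5 m, at most 2 BBs per cell); splitting a cell into four equal quadrants and assigning each BB to a cell by its center point are assumptions made for this example, and the names `divide`, `center`, and `contains` are hypothetical.

```python
def center(box):
    """Center point of an axis-aligned box (x0, y0, x1, y1)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def contains(cell, point):
    """Half-open test so a point on a shared edge belongs to exactly one cell."""
    return cell[0] <= point[0] < cell[2] and cell[1] <= point[1] < cell[3]


def divide(cell, boxes, min_size=5.0, max_bbs=2):
    """Return leaf cells as (cell, boxes_inside) pairs.

    A cell is split into four quadrants while it holds more than max_bbs
    boxes and its children would not be smaller than min_size.
    """
    inside = [b for b in boxes if contains(cell, center(b))]
    half_w = (cell[2] - cell[0]) / 2
    half_h = (cell[3] - cell[1]) / 2
    if len(inside) <= max_bbs or half_w < min_size or half_h < min_size:
        return [(cell, inside)]
    mx, my = cell[0] + half_w, cell[1] + half_h
    children = [
        (cell[0], cell[1], mx, my), (mx, cell[1], cell[2], my),
        (cell[0], my, mx, cell[3]), (mx, my, cell[2], cell[3]),
    ]
    leaves = []
    for child in children:
        leaves.extend(divide(child, inside, min_size, max_bbs))
    return leaves
```

Note that a leaf can still hold more than `max_bbs` boxes when further division would violate the minimum cell size; in this sketch the size limit takes precedence.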
Table 3 summarizes the dataset used for training; all values are counts. Algorithm 5 visits cells along a movement path starting from a random cell and collects the RSSI data of each visited cell, which is then split into training and test datasets. The last visited cell is labeled Y. Using this data, we trained a classification model on a single slot of an Nvidia DGX A100.
Table 4 lists the hyperparameter ranges used for training, which also serve as the search space for the Bayesian optimizer. Bayesian optimization predicts promising hyperparameter values based on a prior probability distribution, and we used it to find the optimal hyperparameters. The optimal hyperparameters obtained from the experiments are summarized in Table 5.
Figure 2 shows the initial distribution of the Wi-Fi fingerprint data used in the experiment, before training the LSTM. This distribution is based on the Wi-Fi signal alone, regardless of the indoor structure.
3.2. Experimental Results
Based on the RSSI signal data, the proposed technique represents the data as BBs, and the final BBs are obtained by merging and expanding them. The result is then divided into cells, as shown in (a) of Figure 3. The initial cell is sized to encompass all BBs placed within the floor. The distribution of BBs across the cells, shown in (b) of Figure 3, illustrates how the BBs and cells form the foundation for generating movement paths. The BBs positioned within the overall cell represent the gathered dataset, and they are organized and divided in a manner consistent with the actual indoor structure of the floor.
When the placement of these BBs is plotted against the original data points, we can observe that they are correctly distributed within the cell regions. This supports the appropriateness of the cell division, as the clustering of the BBs is correctly represented within the cells. Our cell division technique therefore serves as the basis for computing the movement path. Figure 4a shows that the cell division remains consistent when overlaid on the original data points.
Based on the locations of the BBs within the cells in (b) of Figure 3 and the neighboring cells that contain BBs, the cell adjacency calculation algorithm determines which cells are adjacent. The path generation algorithm then computes the movement paths over the cells whose adjacency was confirmed. The result is depicted in (b) of Figure 4, and we confirmed that the calculated paths match the actual paths in the real building.
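The two steps above, adjacency calculation and path generation, can be sketched as follows. The cell adjacency and path generation algorithms themselves are not reproduced in this section, so this is only an illustration under simplifying assumptions: cells are identified by hypothetical (row, column) indices on a uniform grid, two cells are adjacent when both contain at least one BB and share an edge, and a path is a random walk over adjacent cells. The names `adjacency` and `random_path` are hypothetical.

```python
import random


def adjacency(occupied):
    """Map each occupied cell to the occupied cells that share an edge with it.

    `occupied` is an iterable of (row, col) indices of cells containing a BB.
    """
    occupied = set(occupied)
    adj = {}
    for (r, c) in occupied:
        candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        adj[(r, c)] = [n for n in candidates if n in occupied]
    return adj


def random_path(adj, start, length, rng):
    """Random walk of up to `length` cells; stops early at an isolated cell."""
    path = [start]
    while len(path) < length:
        neighbors = adj[path[-1]]
        if not neighbors:
            break
        path.append(rng.choice(neighbors))
    return path


# Usage with hypothetical cell indices.
cells = [(0, 0), (0, 1), (1, 1), (3, 3)]
adj = adjacency(cells)
path = random_path(adj, (0, 0), 5, random.Random(0))
```

Restricting adjacency to cells that contain BBs keeps the walk from passing through regions with no collected data, such as walls or inaccessible areas.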
Training was carried out using the computed movement path dataset and the specified hyperparameters. Table 6 shows the experimental results of the proposed method on the 2019 dataset [32], expressed as the accuracy of the computed paths. The LSTM model was trained on the training data and evaluated on the test data, and the accuracy is reported as the mean, deviation, maximum, and minimum values. The difference between the maximum and minimum accuracy is 0.5% for the test data and the deviation is low, so the accuracy remains stable.
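The summary statistics reported in Table 6 can be computed as in the sketch below. We assume here that "deviation" means the sample standard deviation over repeated runs; the function name `summarize` is hypothetical, and the accuracy values in the usage line are placeholders, not results from the paper.

```python
import statistics


def summarize(accuracies):
    """Mean, sample standard deviation, max, min, and max-min spread."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # assumes at least two runs
        "max": max(accuracies),
        "min": min(accuracies),
        "spread": max(accuracies) - min(accuracies),
    }


# Placeholder accuracies from three hypothetical runs.
summary = summarize([0.90, 0.92, 0.94])
```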
We conducted an experiment to determine the optimal path length for the proposed technology. For each path length, we ran a hyperparameter search and evaluated the top five results.
Table 7 compares the performance obtained with different numbers of paths (accumulated past data). The numbers of paths tested were 1 (no path information), 3, 5, and 7. The accuracy of location estimation improved gradually from 1 to 5 paths and decreased with 7 paths, indicating that the optimal number of paths for location estimation is 5.
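How the number of paths shapes the model input can be illustrated with a sliding-window sketch. We assume that a sample consists of the RSSI vectors of the last `path_len` visited cells and that the label is the final cell of the window, in line with the classification task described in this section; the data layout and the name `make_sequences` are hypothetical.

```python
def make_sequences(visits, path_len):
    """Build (sequence, label) samples from an ordered list of visits.

    `visits` is a list of (cell_id, rssi_vector) pairs in visiting order.
    Each sample holds the RSSI vectors of `path_len` consecutive visits,
    and its label is the cell of the last visit in the window.
    """
    samples = []
    for end in range(path_len, len(visits) + 1):
        window = visits[end - path_len:end]
        sequence = [rssi for _, rssi in window]
        label = window[-1][0]
        samples.append((sequence, label))
    return samples


# Usage with placeholder RSSI values (one AP per vector for brevity).
visits = [(0, [-50]), (1, [-60]), (2, [-70]), (3, [-80])]
three_step = make_sequences(visits, 3)
single_step = make_sequences(visits, 1)
```

With `path_len=1` each sample contains a single RSSI vector, which corresponds to the case without path information.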
To validate the cell-based clustering method proposed in this study, we compared it with K-means clustering. The validation experiment clustered areas from the same dataset and trained a model to estimate the area to which each sample belongs, so that both methods were evaluated on the same task. The number of clusters was set to 150, a value determined experimentally so that the K-means cluster areas were similar in size to the minimum grid cell size of our approach. We used KNN and FNN models for this comparison. The results are presented in Table 8 and confirm that the proposed grid-cell-based clustering method achieved higher accuracy than K-means clustering across all models.
The performance of the proposed technology also needs to be validated against other models. However, exploiting continuous time-series information such as path data requires a sequence model, for which we use LSTM. The comparison was therefore divided into cases where path information is used and cases where it is not, with KNN and FNN serving as the models for the latter. Grid cells were applied to the proposed technology, and its performance was first verified with one path, that is, without path information, for a like-for-like comparison with the other models. It was then tested with five paths to confirm the maximum performance of the proposed technology. Accuracy was compared on a classification problem that predicts the cell to which the signal data belongs; with five paths, the model was trained and evaluated to predict the cell to which the final signal data belongs. The results are displayed in Table 9 and confirm that the LSTM model of the proposed technology outperformed the other models.
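For reference, a baseline without path information can be sketched as a plain k-nearest-neighbor classifier that predicts a cell from a single RSSI vector. This is a generic KNN illustration, not the configuration evaluated in Table 9: the Euclidean distance, the value of `k`, the cell labels, and the RSSI values below are all assumptions, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the cell label of `query` by majority vote of its k nearest samples.

    `train` is a list of (rssi_vector, cell_label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Placeholder fingerprints for two hypothetical cells "A" and "B".
train = [
    ([-40, -80], "A"), ([-42, -78], "A"), ([-41, -79], "A"),
    ([-80, -40], "B"), ([-79, -43], "B"),
]
predicted = knn_predict(train, [-43, -77])
```

Such a model sees only the current RSSI vector, which is why it is compared against the proposed technology in the one-path setting.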
In the experiments for this paper, we used 5 m as the minimum cell size in cell division, because clustering could not be performed with a smaller value. Accordingly, the bounding box expansion size was limited to a maximum of 5 m. A further limitation is that accumulated Wi-Fi RSSI data is required for accurate location estimation.