Ada-LT IP: Functional Discriminant Analysis of Feature Extraction for Adaptive Long-Term Wi-Fi Indoor Localization in Evolving Environments

Wi-Fi fingerprint-based indoor localization methods are effective in static environments but encounter challenges in dynamic, real-world scenarios due to evolving fingerprint patterns and feature spaces. This study investigates the temporal variations in signal strength over a 25-month period to enhance adaptive long-term Wi-Fi localization. Key aspects explored include the significance of signal features, the effects of sampling fluctuations, and overall accuracy measured by mean absolute error. Techniques such as mean-based feature selection, principal component analysis (PCA), and functional discriminant analysis (FDA) were employed to analyze signal features. The proposed algorithm, Ada-LT IP, which incorporates data reduction and transfer learning, shows improved accuracy compared to state-of-the-art methods evaluated in the study. Additionally, the study addresses multicollinearity through PCA and covariance analysis, revealing a reduction in computational complexity and enhanced accuracy for the proposed method, thereby providing valuable insights for improving adaptive long-term Wi-Fi indoor localization systems.


Sensors 2024, 24, 5665

Introduction
With the advent of the Internet of Things (IoT), along with the rollout of 5G and emerging 6G technologies, location-based services (LBS) have become markedly more important. Accurate indoor positioning information is essential for a range of applications, including business location services, data mining, security monitoring, and venue management [1][2][3][4]. While global positioning system (GPS) technology operates effectively outdoors, it is inadequate for indoor localization because of weak signal reception in complex environments. Key challenges include the lack of line of sight, insufficient satellite signal penetration, and impairments caused by internal obstacles, such as shadowing and multipath fading [5][6][7][8][9]. As urbanization intensifies and most activities shift indoors, the demand for reliable indoor positioning systems (IPSs) has surged. A variety of wireless technologies have emerged to address this need, including radio frequency identification (RFID) [10], Bluetooth [11], ultra-wideband (UWB) [12], Zigbee [13], inertial navigation [14], and visible light communication (VLC) [15]. However, deploying these technologies often incurs significant infrastructure costs. Effective IPSs leverage diverse signal characteristics, such as received signal strength (RSS), channel state information (CSI), angle of arrival (AOA), and time of arrival (TOA), to locate objects or individuals where GPS signals are compromised. To meet the demands of indoor settings, these systems must provide high accuracy, fast estimation, and low power consumption. Nevertheless, the dynamic nature of indoor environments introduces variability in signal patterns, which can degrade positioning performance [16][17][18]. To balance computational cost and accuracy, IPSs must optimize the available resources while accounting for environmental factors and maintaining an acceptable margin of error. The application's mission and the overall system cost are also critical determinants of positioning performance [19][20][21].
Among the various indoor positioning technologies, Wi-Fi fingerprint-based IPS (FPBIPS) stands out as a particularly promising solution owing to its cost-effectiveness and ease of implementation. However, FPBIPS is susceptible to multipath effects, shadowing, and scattering, which are driven by the dynamic nature of indoor environments [22][23][24]. Additionally, signal attenuation in wireless communication systems, primarily attributed to path loss, shadowing, and multipath effects, can significantly degrade location accuracy [25]. Figure 1 illustrates the impact of multipath on the received signal within an indoor setting.
The variability of fingerprint values in indoor environments, driven by factors such as device heterogeneity, measurement timing, user orientation, and channel conditions, significantly affects positioning performance. This variability often leads to mismatches between stored and real-time fingerprints, posing a critical challenge for accurate indoor positioning. To address these issues, various fingerprint-matching strategies have been developed [26][27][28], broadly categorized into deterministic [29][30][31] and stochastic [32][33][34] approaches. Several FPBIPS methods have also been proposed to mitigate complex indoor signal fluctuations. One approach models signal jitter using the path loss model; however, this method is constrained by its dependence on map information and its assumption of a fixed receiver position [35][36][37]. Machine learning (ML) algorithms have also been applied to RSS fingerprint-based indoor positioning, yet these techniques often overlook critical factors, such as leveraging related source domains, which could improve positioning accuracy and reduce the labor-intensive cost of offline fingerprint data collection [38][39][40].
Recent advances in addressing the inherent challenges of FPBIPS have been extensively documented in the literature. Various studies have proposed algorithms and methodologies that make these systems more resilient to signal fluctuations and to the deterioration of fingerprints over time caused by the dynamic nature of indoor environments [41][42][43][44][45]. For instance, advanced techniques and machine learning approaches have been shown to improve accuracy and robustness in environments with fluctuating signals and evolving conditions [44,45]. The multi-modal indoor localization method in [44] integrates visual information, Wi-Fi signals, and lidar data, achieving high precision with an average 3D localization accuracy of 0.62 m and a mean square error of 1.24 m in two-dimensional tracking. That study highlights the potential of hybrid techniques for enhancing location-based services in complex environments. Nevertheless, its performance depends on the accuracy and compatibility of the multimodal sensors used. In addition, jointly processing multiple data sources may introduce additional overhead, which could limit deployment on low-power devices.
(1) We propose applying functional discriminant analysis (FDA) in combination with transfer learning techniques to tackle the high overhead of offline fingerprint calibration. To this end, we generate new feature spaces that focus on the most significant predictors. These predictors improve the separability of the model, leading to more accurate indoor position estimates.
The rest of the paper is organized as follows: Related works are presented in Section 2. Section 3 describes the fingerprinting localization framework and its problem formulation. Experimental results, discussions, and evaluation metrics are presented in Section 4. Conclusions and recommendations are provided in Section 5.

Related Works
This section presents an overview of IPSs and explores the application of FDA for feature extraction in this domain. Indoor positioning (IP) has become an increasingly important research area, with applications in smart buildings, emergency response, and location-based services [58][59][60]. There are two main approaches for Wi-Fi RSS-based IPSs: path loss model-based and fingerprinting. The path loss model-based approach relies on the relationship between RSS and distance to determine the target object's location [61][62][63]. However, in complex indoor environments affected by non-line-of-sight (NLOS) propagation, multipath effects, and dynamic conditions, this distance-based approach cannot provide accurate geometric parameters [64,65]. In contrast, the fingerprint-based approach has gained significant attention in indoor localization because it does not rely on estimating geometric parameters and performs better than the distance-based approach in complex indoor environments [66,67]. Wi-Fi RSS fingerprinting has also gained popularity owing to its universal availability, privacy protection, and low implementation cost [22][23][24]. Wi-Fi is extensively used for communication and is of great importance for terminals and sensor networks in IoT applications [1][2][3][4]. The fingerprinting approach involves two main phases: first, RSS fingerprints are collected from each Wi-Fi access point (AP) at multiple locations to create a radio map, or fingerprint database; then, a predictive model is trained to establish the relationship between signal and location [35][36][37].
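The two-phase procedure can be made concrete with a minimal NumPy sketch of the online matching step. Weighted k-nearest neighbours (kNN) on Euclidean RSS distance is used here only as a familiar example of a predictive model; it is an assumption, not the model evaluated in this study, and `knn_locate`, the toy radio map, and k = 3 are hypothetical.

```python
import numpy as np


def knn_locate(radio_map_rss, radio_map_xy, online_rss, k=3, eps=1e-9):
    """Estimate (x, y) by weighted k-nearest-neighbour fingerprint matching.

    radio_map_rss: (N, p) offline fingerprints in dBm, one row per RP.
    radio_map_xy:  (N, 2) coordinates of the RPs.
    online_rss:    (p,) RSS vector measured in the online phase.
    """
    radio_map_rss = np.asarray(radio_map_rss, dtype=float)
    radio_map_xy = np.asarray(radio_map_xy, dtype=float)
    online_rss = np.asarray(online_rss, dtype=float)
    # Euclidean distance between the online RSS vector and every stored fingerprint.
    dist = np.linalg.norm(radio_map_rss - online_rss, axis=1)
    nearest = np.argsort(dist)[:k]
    # Inverse-distance weights; eps avoids division by zero on an exact match.
    weights = 1.0 / (dist[nearest] + eps)
    return weights @ radio_map_xy[nearest] / weights.sum()


# Offline phase: toy radio map of four RPs and two APs (hypothetical values).
rss = np.array([[-40.0, -80.0], [-80.0, -40.0], [-60.0, -60.0], [-90.0, -90.0]])
xy = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
# Online phase: a measurement close to the third RP.
estimate = knn_locate(rss, xy, [-61.0, -59.0], k=3)
```

With k = 1 the estimate is simply the coordinates of the closest stored fingerprint; larger k smooths the estimate between neighbouring RPs.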
However, this method has been criticized for the high cost of building radio maps [44][45][46]. Attempts have been made to reduce the effort and time required for radio map generation, such as crowdsourcing [45] and simultaneous Wi-Fi localization and mapping, but these approaches have their own limitations [68]. Moreover, existing Wi-Fi networks are designed primarily for communication rather than positioning, so robust and efficient algorithms are needed to enhance their positioning performance. In particular, the fingerprint-based approach still struggles to achieve robust and efficient positioning in dynamic indoor environments, where fingerprint patterns vary over time and localization accuracy drops. The methods proposed to deal with this problem can be classified into four groups: (i) probabilistic methods [69,70], (ii) machine learning methods [71,72], (iii) methods exploiting the quality of fingerprints of various signal features [73][74][75][76], and (iv) methods that fuse groups of fingerprints [77,78]. Although these methods have improved location accuracy, they remain sensitive to fluctuations in the signal distribution and are not robust in dynamic indoor environments. Hybrid positioning systems (HPSs) have been proposed to overcome the limitations of single-technology systems, and the results demonstrate better location performance than single systems [79,80]. However, hybrid base stations fall outside the scope of this work and are not economically feasible. Additionally, computational complexity is a serious problem for hybrid indoor positioning systems.
To address the computational complexity of IPSs, various studies have applied PCA as a preprocessing step to reduce the dimensionality and noise of the original dataset [81][82][83]. However, these methods require intensive calibration of the training dataset, and the distributions of their training and testing signal measurements did not account for the long-term effects of signal variations in complex indoor environments. Other studies have proposed LDA-based algorithms that eliminate environmental interference and noise, generating more stable and distinguishable fingerprints [84][85][86]. Several indoor localization algorithms based on functional discriminant analysis have also been proposed to improve indoor location estimation [87][88][89]. However, these methods do not consider the critical issue of offline fingerprint calibration overhead, and they use CSI fingerprints, which require additional hardware infrastructure compared with RSS fingerprints. Thus, the primary goal of this research is to enhance the performance of long-term adaptive indoor localization systems based on RSS fingerprinting by reducing computational complexity and resource requirements, in terms of both cost and time, through transfer learning combined with several data reduction methods.

Problem Formulation and Framework
We consider a general fingerprint-based indoor positioning (FPBIP) scenario consisting of L reference points (RPs), each indexed by a label k (k = 0, 1, ..., L − 1), and p detectable Wi-Fi access points indexed by m. The i-th Wi-Fi signal strength received at the k-th reference point from the m-th AP is denoted $r^{(k)}_{im}$, and the fingerprint database can be represented as the matrix
$$\mathbf R = \begin{bmatrix} r^{(k)}_{11} & \cdots & r^{(k)}_{1p} \\ \vdots & \ddots & \vdots \\ r^{(k)}_{n1} & \cdots & r^{(k)}_{np} \end{bmatrix},$$
whose i-th row $\mathbf r^{(k)}_i = [r^{(k)}_{i1}, \ldots, r^{(k)}_{im}, \ldots, r^{(k)}_{ip}]$ is a fingerprint, and $(x_l, y_l)$ is the coordinate of the location associated with that fingerprint. The target instances received during the testing phase are denoted $\{R^{(k)}_{im};\ i = 1, 2, \ldots, n_t,\ m = 1, 2, \ldots, p_t\}$, and the offline source domain, called the labeled source data, is represented as $\{(R^{(k)}_{im}, (x_l, y_l));\ i = 1, 2, \ldots, n_s,\ m = 1, 2, \ldots, p_s\}$, where $n_t$ and $n_s$ are the numbers of measurements in the target and source data, respectively. Thus, the offline fingerprint database can be written explicitly as the set of labeled pairs
$$\mathcal D_S = \left\{\left(\mathbf r^{(k)}_i, (x_l, y_l)\right);\ i = 1, 2, \ldots, n_s\right\}.$$
In FPBIP, a radio map is built offline and then used for online positioning. In the offline phase, reference points are generated by capturing Wi-Fi AP data, or other wireless indoor positioning metrics, together with their real-world locations. In the testing phase, the mobile user collects RSS values and sends them to a server, which uses pattern-matching algorithms to compare the measured RSS with the fingerprints in the database and determine the location. However, this fingerprint-based approach has several drawbacks. It is labor-intensive and expensive, and the radio map quickly becomes outdated in a dynamic indoor environment. Factors such as device heterogeneity, measurement times, user and antenna orientation, and channel conditions can significantly affect positioning accuracy, leading to discrepancies between online and stored
fingerprints. These limitations stem from the difficulty of keeping the fingerprint database up to date and the high cost of collecting a large number of labeled samples. Hybrid location systems have been proposed as an alternative and have shown better positioning performance than single-system approaches; however, their computational complexity poses challenges for indoor localization, and the economic feasibility of hybrid base stations is beyond the scope of this research. Figure 3 illustrates the proposed framework for adaptive long-term Wi-Fi RSS fingerprint-based indoor localization, which exploits knowledge from both the source domains, based on the mean received signal strength, and the target domains with heterogeneous feature spaces. The offline fingerprints of the radio map and the testing fingerprints obtained during the online phase are treated as the source and target domains, respectively. In a slightly dynamic environment, the RSS measurements collected at a reference point are assumed to be normally distributed with mean $\mu_k$ and variance $\sigma_k^2$, i.e., $X_k \sim \mathcal N(\mu_k, \sigma_k^2)$. Similarly, the overall measurements collected from the available Wi-Fi APs are assumed to follow a multivariate normal distribution with overall mean $\boldsymbol\mu$ and covariance matrix $\boldsymbol\Sigma$, such that $R^{(k)}_{im} = \mathbf X \sim \mathcal N(\boldsymbol\mu, \boldsymbol\Sigma)$, with density
$$f(\mathbf x) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf x - \boldsymbol\mu)^\top\boldsymbol\Sigma^{-1}(\mathbf x - \boldsymbol\mu)\right).$$
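Under this Gaussian assumption, a measured RSS vector can be scored against each reference point by its log-density, and the reference point with the highest score is the maximum-likelihood estimate. The sketch below is an illustrative probabilistic matcher for exposition only (an assumption; the study does not state that it localizes this way). `gaussian_logpdf`, `fit_rp_gaussians`, `most_likely_rp`, and the ridge added to each covariance are hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mu, cov):
    """Log of the multivariate normal density N(mu, cov) evaluated at x."""
    x, mu, cov = np.asarray(x, float), np.asarray(mu, float), np.asarray(cov, float)
    p = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)           # log|Sigma|, numerically stable
    maha = diff @ np.linalg.solve(cov, diff)     # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)


def fit_rp_gaussians(X, y, ridge=1e-3):
    """Estimate a mean vector and covariance matrix per reference point label.

    Each RP needs at least two measurements for np.cov to be defined.
    """
    X, y = np.asarray(X, float), np.asarray(y)
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # Ridge keeps the covariance invertible when an AP barely varies (assumption).
        cov = np.cov(Xk, rowvar=False) + ridge * np.eye(X.shape[1])
        params[k] = (Xk.mean(axis=0), cov)
    return params


def most_likely_rp(x, params):
    """Return the RP label whose Gaussian assigns x the highest log-density."""
    return max(params, key=lambda k: gaussian_logpdf(x, *params[k]))
```

With a diagonal covariance the log-density reduces to a sum of univariate normal log-densities, which is the independence case discussed next.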

It is further assumed that the features of the RSS measurements collected at a reference point $X_k$ are independent and identically distributed, so that the joint probability density function of the p features of the RSS values collected over the defined region of L reference points is
$$f(x_1, \ldots, x_p) = \prod_{m=1}^{p}\frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{(x_m - \mu_k)^2}{2\sigma_k^2}\right).$$
Nevertheless, the dynamic and ever-changing nature of the indoor environment makes it difficult to assume that the received signal strengths collected at grid points are normally distributed. The study therefore measured signal strength variations over time for adaptive long-term Wi-Fi indoor localization using RSS fingerprinting. To gain a comprehensive understanding of the datasets and form meaningful hypotheses, an initial data preprocessing step was carried out, which provided valuable insights and a solid foundation for the research. The analysis examined the importance of all signal characteristics gathered throughout the study period. Three techniques were employed:
(i) selection of significant signal features based on their mean values;
(ii) application of FDA and PCA to extract essential features;
(iii) positive knowledge transfer into the target domain to improve indoor localization performance.
These techniques were used to model the problem effectively, given the complex indoor environment. To address the challenges posed by the dynamic indoor environment, the proposed algorithm leverages a new feature space derived from the offline source fingerprints, as depicted in Figure 3. This derived fingerprint database was constructed during the training phase after a data preprocessing step that scrutinized the dataset to mitigate the effects of the indoor environment by identifying and handling outliers and noisy measurements. The derived source domain and the offline fingerprint database were then combined to build a new feature space, which is hypothesized to improve positioning performance in the target domain. Classifiers were trained on the newly derived feature spaces, and predictions were made for newly received fingerprints. By constructing this enhanced feature space, the proposed algorithm aims to overcome the limitations imposed by the original indoor environment and provide more robust and accurate indoor positioning. The symbol ⊕ used below denotes this data integration.
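The construction of a mean-based derived source domain and its integration (⊕) with the offline database can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the derived fingerprints are taken to be the per-RP mean RSS vectors, mean-based AP selection keeps APs whose average RSS is at least a hypothetical threshold of -85 dBm, and ⊕ is modelled as row-wise concatenation of labeled samples. The function names and toy values are hypothetical.

```python
import numpy as np


def rp_mean_fingerprints(X, y):
    """Derived source domain: one mean RSS vector per reference point (assumption)."""
    X, y = np.asarray(X, float), np.asarray(y)
    labels = np.unique(y)
    means = np.vstack([X[y == k].mean(axis=0) for k in labels])
    return means, labels


def select_aps_by_mean(X, min_mean_dbm=-85.0):
    """Boolean mask of APs whose average RSS is at least min_mean_dbm (hypothetical rule)."""
    return np.asarray(X, float).mean(axis=0) >= min_mean_dbm


def integrate(X, y, X_derived, y_derived):
    """Data integration (the ⊕ operation), modelled here as stacking labeled rows."""
    return np.vstack([X, X_derived]), np.concatenate([y, y_derived])


# Toy offline database: 4 measurements, 3 APs, 2 reference points.
X = np.array([[-50.0, -70.0, -100.0],
              [-52.0, -72.0, -100.0],
              [-75.0, -55.0, -99.0],
              [-77.0, -57.0, -100.0]])
y = np.array([0, 0, 1, 1])
mask = select_aps_by_mean(X)                 # the weak third AP is dropped
Xd, yd = rp_mean_fingerprints(X[:, mask], y)
X_new, y_new = integrate(X[:, mask], y, Xd, yd)
```

Row-wise stacking keeps the label space unchanged, so any classifier trained on the original fingerprints can be retrained on the integrated set without modification.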

Functional Discriminant Analysis
The goal of FDA, also called Fisher's linear discriminant analysis (FLDA), is to discriminate between classes in a low-dimensional space by retaining the components whose feature values best separate the classes. In essence, it identifies the projection subspace for a given training set in which the projections of samples from the same class cluster together while the projections of samples from different classes are spread apart. As a result, the projection maximizes the ratio of the interclass distance to the intraclass distance of the training set in the new subspace. LDA projects features from a higher-dimensional space to a lower-dimensional one through the following steps.

1. Computes mean vectors of each class of the dependent variable.
Recall that we have L different reference points, or classes, with a total of n signal strength measurements received from the available Wi-Fi access points (APs). Table 1 lists the notations used in this study and their descriptions. We define the $n \times p_s$ offline source data matrix as
$$\mathbf X_{n\times p_s} = \begin{bmatrix} r^{(k)}_{11} & \cdots & r^{(k)}_{1m} & \cdots & r^{(k)}_{1p_s} \\ \vdots & & \vdots & & \vdots \\ r^{(k)}_{n1} & \cdots & r^{(k)}_{nm} & \cdots & r^{(k)}_{np_s} \end{bmatrix}.$$

Table 1. List of notations.
$R^{(k)}_{im}$: The entire feature space received from the i-th signal measurements of the m-th feature space at the k-th reference point.
$L$: Number of reference points.
$n_k$: Number of signal measurements received at the k-th reference point.
$\bar{\mathbf R}$: Grand (combined) mean of the signal measurements of the grid points.
$r^{(k)}_{im}$: The i-th signal measurement of the m-th feature space at the k-th reference point.
$\bar{\mathbf r}_k$: Mean signal values of a reference point (grid point).
$p_S$/$p_T$: Number of source feature spaces/target feature spaces.
$n_S$/$n_T$: Number of instances in the source/target domain.
$\mathbf S_B$/$\mathbf S_W$: Between-class/within-class scatter matrix.
$\mathbf X_{n\times p}$: Feature space matrix of dimension n × p.
$\mathbf Y_{(L-1)\times p}$: The y labels of the L grid points of the p-th feature space.
$\mathbf W_{p\times(L-1)}$: Projection matrix of the p feature spaces for the corresponding (L − 1) grid points.
$J(\mathbf W)$: Ratio of the determinants of the projected between-class and within-class scatter matrices.
$\rho_{r_i r_j}$: Correlation between feature spaces.
$z^{(k)}_{im}$: The i-th standardized value of the m-th feature space at the k-th reference point.
The p-th feature spaces of the source domain data/target domain data.
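A reference point k has $n_k$ RSS measurement samples, and the total number of measurements is
$$n = \sum_{k=0}^{L-1} n_k.$$
The average RSS measurement vector of reference point k is
$$\bar{\mathbf r}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\mathbf r^{(k)}_i,$$
and, similarly, the grand mean of all measurements over the defined region of L reference points is
$$\bar{\mathbf R} = \frac{1}{n}\sum_{k=0}^{L-1}\sum_{i=1}^{n_k}\mathbf r^{(k)}_i = \frac{1}{n}\sum_{k=0}^{L-1} n_k\,\bar{\mathbf r}_k.$$
2. Computes within-class and between-class scatter matrices.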
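The within-class and between-class scatter matrices are given by
$$\mathbf S_W = \sum_{k=0}^{L-1}\sum_{i=1}^{n_k}\left(\mathbf r^{(k)}_i - \bar{\mathbf r}_k\right)\left(\mathbf r^{(k)}_i - \bar{\mathbf r}_k\right)^\top, \qquad \mathbf S_B = \sum_{k=0}^{L-1} n_k\left(\bar{\mathbf r}_k - \bar{\mathbf R}\right)\left(\bar{\mathbf r}_k - \bar{\mathbf R}\right)^\top.$$
3. Computes the eigenvalues and eigenvectors of $\mathbf S_W^{-1}\mathbf S_B$, formed from the within-class scatter matrix ($\mathbf S_W$) and the between-class scatter matrix ($\mathbf S_B$).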

4. Sorts the eigenvalues in descending order and selects the top λ.
5. Creates a new matrix containing the eigenvectors that map to the λ eigenvalues.
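6. Obtains the new features (i.e., linear discriminants) by taking the dot product of the data and the matrix.
The details of steps 4 to 6 are as follows. Let the training instances of reference point k form an $n_k \times p$ matrix whose rows are $\mathbf r^{(k)}_i = [r^{(k)}_{i1}, \ldots, r^{(k)}_{ip}]^\top$, $i = 1, \ldots, n_k$. Since there are L reference points, or categories, rather than only two classes, we seek (L − 1) projections $[y_1, y_2, \ldots, y_{L-1}]$ by means of (L − 1) projection vectors $\mathbf w_i$, arranged by columns into the projection matrix $\mathbf W = [\mathbf w_1 | \mathbf w_2 | \cdots | \mathbf w_{L-1}]$, such that
$$y_i = \mathbf w_i^\top\mathbf r \quad\Longrightarrow\quad \mathbf y = \mathbf W^\top\mathbf r.$$
Stacking the projections of all measurement vectors in the $n \times p$ feature-space matrix $\mathbf X$ gives
$$\mathbf Y = \mathbf X\mathbf W.$$
Recall that in the two-class case (k = 0, 1), the within-class and between-class scatter matrices are
$$\mathbf S_W = \sum_{k=0}^{1}\sum_{i=1}^{n_k}\left(\mathbf r^{(k)}_i - \bar{\mathbf r}_k\right)\left(\mathbf r^{(k)}_i - \bar{\mathbf r}_k\right)^\top, \qquad \mathbf S_B = \left(\bar{\mathbf r}_0 - \bar{\mathbf r}_1\right)\left(\bar{\mathbf r}_0 - \bar{\mathbf r}_1\right)^\top.$$
These generalize to the case of L reference points as follows. The within-class scatter becomes
$$\mathbf S_W = \sum_{k=0}^{L-1}\sum_{i=1}^{n_k}\left(\mathbf r^{(k)}_i - \bar{\mathbf r}_k\right)\left(\mathbf r^{(k)}_i - \bar{\mathbf r}_k\right)^\top,$$
and the between-class scatter is measured with respect to the mean of all classes:
$$\mathbf S_B = \sum_{k=0}^{L-1} n_k\left(\bar{\mathbf r}_k - \bar{\mathbf R}\right)\left(\bar{\mathbf r}_k - \bar{\mathbf R}\right)^\top,$$
where $\bar{\mathbf R} = \frac{1}{n}\sum_{k=0}^{L-1}\sum_{i=1}^{n_k}\mathbf r^{(k)}_i$, $\bar{\mathbf r}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\mathbf r^{(k)}_i$, and n and $n_k$ are the total number of measurements and the number of RSS measurements at reference point k, respectively. Similarly, the mean vectors of the projected samples $\mathbf y^{(k)}_i = \mathbf W^\top\mathbf r^{(k)}_i$ are
$$\tilde{\boldsymbol\mu}_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\mathbf y^{(k)}_i, \qquad \tilde{\boldsymbol\mu} = \frac{1}{n}\sum_{k=0}^{L-1}\sum_{i=1}^{n_k}\mathbf y^{(k)}_i,$$
so the within-class and between-class scatter matrices of the projected samples are
$$\tilde{\mathbf S}_W = \sum_{k=0}^{L-1}\sum_{i=1}^{n_k}\left(\mathbf y^{(k)}_i - \tilde{\boldsymbol\mu}_k\right)\left(\mathbf y^{(k)}_i - \tilde{\boldsymbol\mu}_k\right)^\top, \qquad \tilde{\mathbf S}_B = \sum_{k=0}^{L-1} n_k\left(\tilde{\boldsymbol\mu}_k - \tilde{\boldsymbol\mu}\right)\left(\tilde{\boldsymbol\mu}_k - \tilde{\boldsymbol\mu}\right)^\top.$$
As in the binary case, the scatter matrices of the projected samples can be expressed in terms of those of the original samples,
$$\tilde{\mathbf S}_W = \mathbf W^\top\mathbf S_W\mathbf W, \qquad \tilde{\mathbf S}_B = \mathbf W^\top\mathbf S_B\mathbf W,$$
which still holds for L reference points. We seek a projection that maximizes the ratio of between-class to within-class scatter. Since the projection is no longer a scalar (it has L − 1 dimensions), we use the determinants of the scatter matrices to obtain a scalar objective function:
$$J(\mathbf W) = \frac{|\tilde{\mathbf S}_B|}{|\tilde{\mathbf S}_W|} = \frac{|\mathbf W^\top\mathbf S_B\mathbf W|}{|\mathbf W^\top\mathbf S_W\mathbf W|},$$
and we seek the projection $\mathbf W^*$ that maximizes this ratio. To find the maximum of J(W), we differentiate with respect to W and set the result to zero. In the two-class case this leads to an eigenvalue problem; for L reference points, with L − 1 projection vectors, the eigenvalue problem generalizes to L labels. It can be shown that the optimal projection matrix $\mathbf W^*$ is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem
$$\mathbf S_B\mathbf w^*_i = \lambda_i\,\mathbf S_W\mathbf w^*_i, \qquad i = 1, \ldots, L-1,$$
where each $\lambda_i = J(\mathbf w^*_i)$ is a scalar and $\mathbf W^* = [\mathbf w^*_1 | \mathbf w^*_2 | \cdots | \mathbf w^*_{L-1}]$.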
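The six steps can be put together in a short NumPy sketch. It is a hedged illustration, not the study's code: `scatter_matrices`, `fda_fit`, `fda_transform`, and the optional ridge are hypothetical names, and the generalized eigenvalue problem is solved by whitening S_W with a Cholesky factor, a standard numerical technique chosen here that assumes S_W is positive definite (no constant or perfectly collinear APs unless a ridge is added).

```python
import numpy as np


def scatter_matrices(X, y):
    """Steps 1-2: within-class (S_W) and n_k-weighted between-class (S_B) scatter."""
    X, y = np.asarray(X, float), np.asarray(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)                       # grand mean over all n measurements
    S_W, S_B = np.zeros((p, p)), np.zeros((p, p))
    for k in np.unique(y):
        Xk = X[y == k]
        mean_k = Xk.mean(axis=0)                      # mean vector of reference point k
        S_W += (Xk - mean_k).T @ (Xk - mean_k)
        d = mean_k - grand_mean
        S_B += len(Xk) * np.outer(d, d)
    return S_W, S_B


def fda_fit(X, y, n_components=None, ridge=0.0):
    """Steps 3-5: solve S_B w = lambda S_W w; return W (p x q) and eigenvalues, largest first."""
    S_W, S_B = scatter_matrices(X, y)
    p = S_W.shape[0]
    q = min(len(np.unique(y)) - 1, p) if n_components is None else n_components
    # Whiten S_W with its Cholesky factor so the symmetric solver eigh applies.
    C = np.linalg.cholesky(S_W + ridge * np.eye(p))   # S_W = C C^T
    C_inv = np.linalg.inv(C)
    eigvals, V = np.linalg.eigh(C_inv @ S_B @ C_inv.T)
    order = np.argsort(eigvals)[::-1][:q]             # step 4: sort descending, keep top q
    W = C_inv.T @ V[:, order]                         # step 5: back-transform, w = C^{-T} v
    return W, eigvals[order]


def fda_transform(X, W):
    """Step 6: linear discriminants as the product of the data and W."""
    return np.asarray(X, float) @ W
```

With an RSS matrix X and RP labels y, `W, lam = fda_fit(X, y)` followed by `fda_transform(X, W)` yields at most L − 1 discriminant features per measurement, which can then be passed to any classifier.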

Principal Component Analysis
In the context of heterogeneous transfer learning for indoor positioning, two main challenges have been identified. First, the assumption that the Wi-Fi signals received at a grid point from multiple APs are independent can be problematic, because the RSS values may contain duplicates, which leads to interference and to irrelevant features or patterns in the database. Second, the dimensions of the source and target feature spaces differ, which complicates Wi-Fi RSS fingerprint-based techniques: the dynamic spread of RSS values and the inherent heterogeneity of hardware devices make it difficult to represent signal fluctuations with a single value for a specific position. To enhance indoor positioning performance using RSS fingerprinting, this research focuses on reducing computational complexity and cost by applying PCA [81][82][83] and FLDA with transfer learning methods [87,90]. The objective is to generate new feature spaces that retain only the most significant predictors, improving model separability and indoor position estimates while keeping the algorithm computationally efficient and cost-effective.
The proposed feature selection-based PCA algorithm (Algorithm 1) uses PCA to create a new fingerprint feature space of reduced dimension [87]. The algorithm preprocesses the offline fingerprint database to reduce the dimensionality of the RSS measurements according to each feature's contribution to positioning performance. Features with a higher variance explainability ratio, which contribute more to discriminating locations, are selected and retained for further analysis. During the testing phase, the same PCA preprocessing is applied before the learned model infers the location of the mobile user. The algorithm addresses the high dimensionality of a predefined radio map by linearly combining features into an uncorrelated space derived from the training covariance matrix of the radio map. The fingerprint database is then projected into this uncorrelated space, and the principal components with the highest variance explainability ratio, which measures their information content, are retained. In this paper, the noise caused by duplicated fingerprints and by the interdependence of APs is handled with both PCA (as in Algorithm 1) and the correlation coefficient, which together reduce the dependence on particular Wi-Fi APs and extract the most significant fingerprint features for building homogeneous feature spaces.

Let $r_{im}$ denote the $i$th Wi-Fi RSS fingerprint received at the $k$th reference point from the $m$th Wi-Fi AP. The fingerprints collected over the $L$ reference points of the region form a multi-dimensional matrix whose $m$th column $x_m$ holds the RSS measurements of the $m$th AP. The correlation coefficient between the RSS measurements of two APs is

$$\rho(x_m, x_{m+1}) = \frac{\operatorname{cov}(x_m, x_{m+1})}{\sqrt{\operatorname{var}(x_m)\,\operatorname{var}(x_{m+1})}},$$

where $\operatorname{cov}(x_m, x_{m+1})$ denotes the covariance of $x_m$ and $x_{m+1}$:

$$\operatorname{cov}(x_m, x_{m+1}) = \frac{1}{N-1}\sum_{i=1}^{N}\left(r_{im}-\bar{r}_m\right)\left(r_{i,m+1}-\bar{r}_{m+1}\right).$$

The variances of $x_m$ and $x_{m+1}$ are

$$\operatorname{var}(x_m) = \frac{1}{N-1}\sum_{i=1}^{N}\left(r_{im}-\bar{r}_m\right)^2, \qquad \operatorname{var}(x_{m+1}) = \frac{1}{N-1}\sum_{i=1}^{N}\left(r_{i,m+1}-\bar{r}_{m+1}\right)^2,$$

where $N$ is the number of fingerprints and the sample means of the RSS measurements are

$$\bar{r}_m = \frac{1}{N}\sum_{i=1}^{N} r_{im}, \qquad \bar{r}_{m+1} = \frac{1}{N}\sum_{i=1}^{N} r_{i,m+1}.$$

The PCA algorithm therefore has four major steps:
(1) Standardize each RSS value: $z_{im} = (r_{im}-\bar{r}_m)/s_m$, where $s_m = \sqrt{\operatorname{var}(x_m)}$.
(2) Compute the covariance matrix of the standardized RSS measurements, $\mathbf{C} = \frac{1}{N-1}\mathbf{Z}^{T}\mathbf{Z}$, where $\mathbf{Z}$ is the matrix of the values $z_{im}$.
(3) Perform the eigenvalue decomposition $\mathbf{C} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^{T}$.
(4) Obtain the projection matrix $\mathbf{W} = [\mathbf{v}_1, \dots, \mathbf{v}_q]$ from the eigenvectors of the $q$ largest eigenvalues; the projected fingerprints are $\mathbf{Z}\mathbf{W}$.
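The correlation coefficient and the four PCA steps described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `pearson_corr`, `pca_fit`, and `pca_transform` are hypothetical, the sample statistics assume N − 1 normalization (N being the number of fingerprints), and constant AP columns, such as an AP that always reads 100, are assumed to be left unscaled so that standardization never divides by zero.

```python
import numpy as np


def pearson_corr(x, y):
    """Correlation coefficient cov(x, y) / sqrt(var(x) * var(y)) of two RSS columns."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx, dy = x - x.mean(), y - y.mean()
    cov = np.sum(dx * dy) / (n - 1)      # assumption: N - 1 normalization
    var_x = np.sum(dx ** 2) / (n - 1)
    var_y = np.sum(dy ** 2) / (n - 1)
    return cov / np.sqrt(var_x * var_y)


def pca_fit(R, n_components):
    """Fit PCA on an (N fingerprints x p APs) RSS matrix R in four steps."""
    R = np.asarray(R, dtype=float)
    # (1) Standardize each AP column; constant columns keep a scale of 1 (assumption).
    mean = R.mean(axis=0)
    std = R.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Z = (R - mean) / std
    # (2) Covariance matrix of the standardized measurements.
    C = Z.T @ Z / (R.shape[0] - 1)
    # (3) Eigenvalue decomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Projection matrix from the leading eigenvectors.
    W = eigvecs[:, :n_components]
    explained_ratio = eigvals / eigvals.sum()
    return mean, std, W, explained_ratio


def pca_transform(R, mean, std, W):
    """Project training or test fingerprints into the principal subspace."""
    return ((np.asarray(R, dtype=float) - mean) / std) @ W
```

The test fingerprints must be standardized with the training `mean` and `std` before projection, which is why `pca_fit` returns them alongside `W`.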
Algorithm 1 below gives the pseudocode for constructing the refined source domain based on mean signal values and multi-criteria feature extraction (LDA-CA-PCA). The proposed algorithm comprises two primary phases. First, in the training phase, RSS fingerprints are gathered from the available Wi-Fi APs at defined RPs; this entails sampling the signals from the APs at the RPs and storing the RSS measurements. Second, in the testing phase, the algorithm uses the collected fingerprints to estimate the location of a mobile device by comparing the RSS measurements received from the device with the fingerprints stored in the database. Using pattern matching or machine learning, the algorithm determines the most probable location of the mobile device from the similarity between its RSS measurements and the stored fingerprints.

Algorithm 1
    …
    for m = 1, 2, ..., p do
        …
        - Check outliers and normality of the measurements using histograms and boxplots
        - Test heteroscedasticity of the measurements using histograms
        …
    end for
    Generate feature spaces based on mean signal values as in Equations (29) and (30)
    Build the refined source domain based on multi-criteria feature extraction
    Apply PCA and CA to extract features with significant predictors as in Equations (24)-(31)
    …

To tackle the challenges identified in heterogeneous indoor positioning, a data integration approach is proposed that transfers heterogeneous knowledge from the related source domain to the target domain. The objective is to improve indoor positioning performance by combining information from diverse domains. The pseudocode for the proposed knowledge transfer based on mean signal parameters and hybrid feature extraction (LDA-CA-PCA) for indoor positioning is given in Algorithm 2.

Algorithm 2
    …
    for m = 1, 2, ..., p do
        …
    end for
    Select the optimal refined features using hybrid matrices
    …
    end for
    end for
    end for
    Train a classifier on R_S and y_S and optimize the projection matrix of the source samples, W*
    Estimate y_T on R_T by applying the trained classifier f((R_S, y_S), W*)
    return R_S, y_S, R_T, y_T, W*
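The last two steps of Algorithm 2, training a classifier on R_S and y_S with a projection matrix W* and then estimating y_T on R_T, can be illustrated with the NumPy sketch below. It assumes, purely for illustration, that W* is obtained by Fisher (linear) discriminant analysis and that the classifier is a nearest-centroid rule in the projected space. The helper names and the small ridge term added to the within-class scatter are hypothetical choices; this is not the Ada-LT IP implementation, and the refined-feature and knowledge-transfer steps are omitted.

```python
import numpy as np


def fda_projection(X, y, n_components, ridge=1e-6):
    """Fisher discriminant directions maximizing between- over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Assumption: a small ridge keeps S_w invertible when APs are collinear.
    S_w += ridge * np.eye(d)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real


def train_nearest_centroid(R_S, y_S, W):
    """Class centroids of the projected source fingerprints (assumed classifier)."""
    Z = np.asarray(R_S, dtype=float) @ W
    y_S = np.asarray(y_S)
    labels = np.unique(y_S)
    centroids = np.array([Z[y_S == lab].mean(axis=0) for lab in labels])
    return labels, centroids


def estimate_target(R_T, W, labels, centroids):
    """Assign each target fingerprint to the nearest projected source centroid."""
    Z = np.asarray(R_T, dtype=float) @ W
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]
```

With c reference points, at most c − 1 discriminant directions carry information, so `n_components` should not exceed c − 1.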

Evaluation Metrics for Indoor Positioning Performance
The proposed algorithm for adaptive long-term Wi-Fi indoor localization was evaluated against various baseline machine learning algorithms and other state-of-the-art methods using extensive real-world datasets. The effectiveness of the localization system was measured using the mean absolute error (MAE), which quantifies the average deviation between the estimated and the actual positions; lower MAE values indicate higher accuracy. The indoor localization problem is often approached as a multiclass classification task in which the different locations are regarded as the labels to be classified [89]. The MAE is defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{m=1}^{n}\left\lVert [\hat{x}_m, \hat{y}_m]^{T} - [x_m, y_m]^{T} \right\rVert_2,$$

where $[\hat{x}_m, \hat{y}_m]^{T}$ and $[x_m, y_m]^{T}$ are the predicted location estimate and the true location, in 2-dimensional coordinates, of the $m$th positioning sample, respectively, and $n$ is the total number of samples to be located in the target domain.
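A minimal Python sketch of the MAE computation follows. It assumes that the per-sample error is the Euclidean distance between the predicted position [x̂_m, ŷ_m]^T and the true position [x_m, y_m]^T, expressed in metres; the function name is hypothetical.

```python
import math


def mean_absolute_error(predicted, true):
    """Average Euclidean distance between predicted and true 2-D positions (assumed norm)."""
    if not predicted or len(predicted) != len(true):
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (x_hat, y_hat), (x, y) in zip(predicted, true):
        total += math.hypot(x_hat - x, y_hat - y)  # localization error of one sample
    return total / len(predicted)


# Example: errors of 5 m and 0 m give an MAE of 2.5 m.
print(mean_absolute_error([(3.0, 4.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]))  # 2.5
```

When locations are treated as class labels, `predicted` would hold the coordinates of the predicted reference points rather than free-form estimates.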

Experiment Setup
The study was conducted at the Universitat Jaume I library in Spain, specifically on the 3rd and 5th floors, as shown in Figure 4a, covering an area of approximately 308.4 m². Over a period of 25 months, a total of 620 Wi-Fi APs were deployed, as shown in Figure 4b.

The measurements were collected by a single individual using a Samsung Galaxy S3 smartphone (https://zenodo.org/records/1066041, accessed on 15 July 2024). In the first month, fifteen offline databases and five online databases were collected; in each subsequent month, one offline database and five online databases were collected using the same smartphone. The 15 offline databases from the first month were chosen as the offline database, while the online databases from each subsequent month were used as testing samples. All of these databases can be accessed at https://zenodo.org/record/1066041#.W0wHdfknYYJ (accessed on 21 January 2024). The collection process used predefined positions, with six fingerprints collected at each position. The collected measurements were uploaded to a server to create labeled training and test datasets. The collection period was divided into 25 collection months, with varying numbers of training and test datasets. In our experimental setup, we designated the four offline databases from the first month as our reference database and used all the online databases from each subsequent month as testing samples to investigate fluctuations in signal features or measurements during sampling. Furthermore, we systematically selected a single training dataset from each month to examine the fluctuations in sample selection across multiple months (specifically, four months) and thereby assess long-term indoor localization. We then evaluated this dataset against five distinct testing samples to gauge the variability in signal characteristics within the samples gathered over randomly chosen months. For more detailed information, refer to [90]. The long-term Wi-Fi fingerprinting dataset and supporting material are available on Zenodo. Table 2 describes the dataset used for adaptive long-term Wi-Fi indoor localization: it uses received signal strength as the signal feature, includes 620 features, was collected over a period of 25 months, and covers a total of 106 grid points.

Exploring Wi-Fi RSS Distribution Characteristics
In this section, we preprocess the Wi-Fi RSS measurement data to understand various characteristics of the dataset: the presence of outliers, the central values of the distribution, the spread of the data, the skewness of the distribution, and the normality of the data. These factors can pose significant challenges for the algorithms used in indoor positioning systems. The bar plots in Figure 5 examine the distribution of the target variable across the grid points, showing the frequency of each grid point in both the training and testing datasets. A further observation concerns the number of signal measurements per reference point (RP) for the training and testing labels: each grid point has the same number of measurements, so the target labels are balanced. Data imbalance, which could otherwise degrade indoor location predictions, is therefore absent from this dataset.

For grid points three and seven, a cluster of dots appears instead of a visible box, which typically indicates outliers in the data for those grid points. Box plots represent the distribution of the data through the median, the quartiles, and potential outliers. A cluster of dots without a visible box suggests that many values are flagged as outliers at that grid point; outliers can strongly affect the appearance of a box plot, and in extreme cases the whiskers may be too short to show the spread of the majority of the data. We therefore examined the actual values recorded at grid points three and seven, assessed their validity, and checked for potential issues such as errors or anomalies. We also considered adjusting the plotting parameters or using alternative visualization techniques to better understand the data distribution at these grid points. Most recorded values at these grid points are consistently set to 100, indicating that no signal was detected there. When at least three quarters of the values equal 100, the first and third quartiles both equal 100, so the box collapses to a line and every remaining value is drawn as an individual outlier point.
Figure 8 explores alternative visualization techniques to better understand the data distribution at specific grid points in both the training and testing samples collected in month 1. The figure combines a boxplot and a dot plot of the signal value distribution at the selected grid points, and the two views agree. The majority of values recorded at these grid points are 100, signifying the absence of signal measurements at those locations.
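The observation that a value of 100 marks an undetected AP suggests a simple mean-based filter: an AP whose mean reading is still 100 was never detected and carries no location information. The plain-Python sketch below illustrates this idea. The sentinel constant, the function names, and the list-of-lists layout (one row per fingerprint, one column per AP) are assumptions for illustration, and the filter relies on real RSS readings being below 100, so that any detection pulls the mean below the sentinel.

```python
NO_SIGNAL = 100  # assumed sentinel for "AP not detected", per the dataset description


def select_informative_aps(fingerprints, no_signal=NO_SIGNAL):
    """Return indices of AP columns whose mean reading differs from the sentinel."""
    n_rows = len(fingerprints)
    n_aps = len(fingerprints[0])
    keep = []
    for m in range(n_aps):
        mean_value = sum(row[m] for row in fingerprints) / n_rows
        if mean_value != no_signal:  # at least one real RSS value was recorded
            keep.append(m)
    return keep


def reduce_fingerprints(fingerprints, keep):
    """Keep only the selected AP columns of every fingerprint."""
    return [[row[m] for m in keep] for row in fingerprints]


train = [
    [-45, 100, -70, 100],
    [-50, 100, 100, 100],
    [-48, 100, -72, 100],
]
keep = select_informative_aps(train)
print(keep)                              # [0, 2]
print(reduce_fingerprints(train, keep))  # [[-45, -70], [-50, 100], [-48, -72]]
```

The indices are computed on the training fingerprints only and then reused to reduce the test fingerprints, so both share the same feature space.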

Comparative Analysis of Methods
In this section, we conducted a comprehensive set of real-life experiments spanning 25 months. The experiments aimed to measure the temporal fluctuations in signal strength for adaptive long-term Wi-Fi indoor localization using received signal strength fingerprint measurements. To gain a deep understanding of the datasets and generate meaningful hypotheses regarding adaptive Wi-Fi indoor localization, we first performed data preprocessing, which provided insights and a foundation for the subsequent analysis. Our analysis focused on three main issues:
1. Exploring the significance of all signal features collected over the study period: we examined whether the signal features play a crucial role in modeling the problem, using the three techniques described in Section 3.
2. Investigating the effects of sampling signal fluctuations over time:
• We examined the impact of sampling signal fluctuations on different algorithms in indoor localization scenarios. Multiple training samples were used to assess the influence of sampling fluctuations, while all collected testing samples for each month were used to evaluate algorithm robustness.
• We investigated how vulnerable the algorithms are to sampling fluctuations, which remains an open research question.
• We demonstrated the challenges posed by the indoor environment through an analysis of multiple training samples and verified the dynamic nature of indoor positioning by testing on multiple samples.
• By employing multiple training and testing samples, we validated the challenges faced in indoor localization scenarios.
3. Comparative analysis of the proposed algorithm for adaptive long-term Wi-Fi RSS-based indoor localization: we compared our proposed algorithm, Ada-LT IP, against various machine learning algorithms commonly used in prediction tasks, and against several recent robust algorithms, to assess its reliability and robustness in dynamic indoor environments. Specifically, we compared our algorithm with robust indoor positioning [90], LSTP [91], and TransLoc [92].

Exploring the Significance of Signal Features Collected over the Time Period of Study
We examined the importance of the Wi-Fi feature spaces in the experimental area, where localization is accomplished by studying fingerprints that contain measurements recorded at predefined positions. Over a span of 25 months, a total of 620 Wi-Fi APs were deployed, as illustrated in Figure 4b. However, not all measurements collected from these APs are equally significant, and some may even be irrelevant for modeling the indoor localization problem. We therefore investigated the mean signal values of the received signal measurements for each deployed feature space in the experimental area. Figure 13a,b show the mean received signal values of the Wi-Fi feature spaces for the training and testing samples of Month_1. The analysis revealed that only 77 Wi-Fi feature spaces contained actual signal measurements, while the remaining Wi-Fi APs were assigned a global constant value of 100. This constant denotes the absence of received signal measurements, which makes those APs unsuitable for device localization. In each graph, the height of a bar corresponds to the mean signal value of a specific feature, and the x-axis shows the feature names; comparing the bar heights between the two graphs reveals the differences in mean signal values between the training and testing datasets. Accordingly, the 77 significant signal features were selected, based on their mean signal values, for modeling adaptive long-term indoor localization.

(a) Wi-Fi Fingerprint-Based Indoor Location Estimation of Targets Utilizing Original Feature Spaces
The impact of Wi-Fi received signal strength fluctuations on the adaptability of indoor localization was evaluated by analyzing multiple training samples collected over a four-month period. Four training samples were randomly selected, and the performance of various classifiers was assessed using five different testing samples from each month, in order to assess the robustness of the proposed algorithm against dynamic signal variations. The results are given in Tables 3 and 4.

Tables 5 and 6 present the Wi-Fi fingerprint-based indoor location estimation results using derived feature spaces, specifically the mean signal strength values of the Wi-Fi access points. Our analysis indicates that, of all the Wi-Fi feature spaces, only 77 are based on actual signal measurements; the remaining access points are assigned a constant value of 100 (see Figure 13). We also analyzed the algorithm's performance with respect to localization accuracy, the impact of signal strength fluctuations, and adaptability in dynamic indoor environments. These tables show how signal strength variations affect the performance of the proposed algorithm over time. To evaluate the algorithm's adaptability to signal strength fluctuations, we collected multiple training samples over the 25-month period; four training samples were randomly selected, and the performance of the algorithms was assessed using five distinct testing samples from each month. The results showed noticeable variations in localization performance across both training and testing datasets, underscoring the dynamic nature of indoor environments.

We next investigated the impact of feature dimensions on the variance explainability ratio (VER) for indoor location prediction. The analysis used multiple training samples from the first month and five randomly selected testing samples to examine the contribution of significant features. Specifically, we examined principal components at several variance levels: PC-80, PC-85, PC-90, PC-95, P-99, and Ref. (a baseline retaining 50% of the variance, denoted P-50). Table 7 shows how the feature spaces affect the variance accounted for in Wi-Fi fingerprint-based indoor location estimation for both the training and testing sample datasets. The contributions of the feature spaces varied across testing samples from different months. For example, for the first month's training sample (#1), P-50 indicated that eight principal components were sufficient for location estimation, whereas the corresponding testing sample #1 required 19. Similar comparisons were made for each proportion of variance: P-95, for instance, required 50 principal components for the training sample, while the testing samples required 72, 67, 61, 65, and 99, respectively. These discrepancies in the number of principal components can be attributed to signal fluctuations arising from the indoor environment and the sampling process.

Table 8 extends this analysis to several training samples collected over four consecutive months, together with the corresponding testing samples from each month, again examining the principal components at PC-80, PC-85, PC-90, PC-95, P-99, and Ref. to assess the contribution of the features with higher variance explainability ratios. For instance, P-99 required 67 principal components for the training sample of month #1 (M-1), while the testing samples required 110, 100, 113, and 97 principal components for months M-1, M-2, M-3, and M-4, respectively. These variations can again be attributed to signal fluctuations caused by the dynamic indoor environment and to the inherent variability of the sampling process.

Tables 9 and 10 report the Wi-Fi fingerprint-based indoor location estimation of targets using feature spaces extracted with principal components. For this analysis, we used multiple training samples from different months and five randomly selected testing samples from each month, and we evaluated the proposed algorithm at each of the principal component levels PC-80, PC-85, PC-90, PC-95, P-99, and Ref. In all tested scenarios, the proposed algorithm outperformed the other algorithms and gave the most accurate indoor location estimates, regardless of the number of principal components considered. The variations in the required number of principal components stem from signal fluctuations in the dynamic indoor environment and from the inherent variability of the sampling process, and these factors influence the optimal selection of principal components for indoor location estimation.

This paper presents a comparative analysis of two approaches to Wi-Fi fingerprint-based indoor location estimation: original feature spaces, and derived feature spaces obtained with the data reduction techniques of mean signal strength values, PCA, and FDA.
1. Original Feature Spaces: The evaluation of localization performance across multiple training samples collected over various months revealed significant variations due to temporal signal strength fluctuations.The proposed algorithm, Ada-LT IP, demonstrated robustness in adapting to these dynamic variations, as shown in Tables 3 and 4.
2. Derived Feature Spaces: Derived feature spaces were obtained from mean signal strength values (Tables 5-8) and from data reduction techniques such as PCA (Tables 11 and 12) and FDA (Tables 13 and 14). The analysis identified that only 77 of the Wi-Fi feature spaces are significant, containing actual signal measurements. The proposed algorithm, Ada-LT IP, achieved better localization accuracy than the other algorithms evaluated in the study and proved robust in dynamic indoor environments. The study also highlighted discrepancies between the numbers of principal components required for training and testing samples, reflecting the influence of signal fluctuations on model performance. Moreover, we compared Ada-LT IP with several recent robust algorithms to assess its reliability and robustness in dynamic indoor environments. Despite the inflated localization errors observed in training and testing samples 3 and 4, caused by inherent signal sampling fluctuations, the proposed algorithm outperforms the state-of-the-art results of robust indoor positioning [90] (whose methods all reported accuracies in the range of 2-5 m), LSTP [91] (reported MLE = 2.18 m), and TransLoc [92] (MLE = 2.23 m). The average mean absolute errors of Ada-LT IP are lower than those of the state-of-the-art methods described in the study (see Tables 11 and 12). In addition, the authors of LSTP have reported that it outperforms several other algorithms, such as ISMA [93], UFL-ECLS [94], KNN, TCA [95], JGSA [96], CDLS [97], and LPJT [98]. This comparison underscores the potential of Ada-LT IP to enhance indoor positioning accuracy.
3. Adaptability and Feature Extraction: The proposed framework emphasizes the importance of feature extraction and adaptability. The data reduction techniques illustrated how different feature dimensions affect variance explainability ratios. Extensive experimental results show that the proposed algorithm maintained superior performance across various training and testing conditions, addressing the challenges posed by a dynamic indoor environment. The Ada-LT IP algorithm thus consistently exhibits better adaptability and accuracy in response to signal variability, underscoring the significance of both feature selection and environmental dynamics in indoor positioning.
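The number of principal components needed to reach each variance level (Ref./P-50 and PC-80 through P-99) follows from the cumulative variance explainability ratio. The NumPy sketch below is an illustration under stated assumptions: the function name is hypothetical, columns are standardized by default to match the PCA steps, and the component variances are taken from the squared singular values of the centred data, which are proportional to the eigenvalues of the covariance matrix.

```python
import numpy as np


def components_for_levels(R, levels=(0.50, 0.80, 0.85, 0.90, 0.95, 0.99), standardize=True):
    """Smallest number of principal components whose cumulative variance ratio reaches each level."""
    R = np.asarray(R, dtype=float)
    Z = R - R.mean(axis=0)
    if standardize:
        std = Z.std(axis=0)
        std[std == 0] = 1.0  # assumption: constant columns (e.g. all 100) stay unscaled
        Z = Z / std
    # Squared singular values of the centred data are proportional to the PC variances.
    variances = np.linalg.svd(Z, compute_uv=False) ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    # searchsorted returns the first index whose cumulative ratio is >= the level.
    return {
        level: int(min(np.searchsorted(cumulative, level) + 1, variances.size))
        for level in levels
    }
```

Applied separately to a training sample and to each testing sample, this kind of count is what exposes discrepancies such as 50 components for a training sample versus 61 to 99 for its testing samples at the 95% level.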

Comparative Analysis of Computational Complexity in Algorithmic Performance
We also investigated the computational complexity of the proposed algorithm for adaptive long-term Wi-Fi fingerprint-based indoor location estimation, focusing on the effectiveness of the data reduction techniques used to extract feature spaces: mean signal values, functional discriminant analysis, correlation analysis, and principal component analysis. The algorithms were executed on a laptop equipped with an AMD Ryzen 3 3200U CPU and 16 GB of RAM, and the assessment covers both time and space complexity. As shown in Figure 15a, training times are consistently longer than prediction times for all algorithms, indicating that training requires more computational resources. Despite its higher time complexity, the proposed algorithm achieves better localization accuracy than the other algorithms, including GMM and MLP, which have even greater computational demands. Figure 15b compares the algorithms on derived feature spaces reduced to 77 dimensions based on mean signal values; this reduction substantially lowers the time complexity of all algorithms. Although the proposed algorithm has a slightly longer prediction time than some alternatives, its accuracy justifies this trade-off, making it a strong choice for indoor localization tasks. Overall, the findings show that reduced-dimensional feature spaces can improve both the computational efficiency and the accuracy of indoor localization systems.

Conclusions
This paper analyzes a comprehensive set of real-life experiments conducted over 25 months to investigate temporal fluctuations in signal strength for adaptive long-term Wi-Fi indoor positioning. The study focuses on the significance of signal features, the effects of sampling fluctuations, and the assessment of dynamic indoor localization using the mean absolute error. Feature selection based on mean signal values and data reduction methods such as PCA and FDA were employed to identify the essential features with higher variance explainability ratios at various thresholds (PC-80, PC-85, PC-90, PC-95, P-99, and Ref.). The research also leverages positive knowledge transfer to enhance indoor localization performance and evaluates the impact of sampling signal fluctuations on different localization algorithms, using multiple training samples to assess these fluctuations and monthly testing samples to evaluate algorithm robustness. Extensive testing demonstrated the challenges posed by the indoor environment and confirmed the dynamic nature of indoor positioning. The proposed Ada-LT IP algorithm integrates data reduction techniques and transfer learning to address offline fingerprint calibration challenges and shows better accuracy than the state-of-the-art fingerprint-based positioning methods evaluated. The study also addresses multicollinearity through covariance analysis and compares various feature extraction methods. Furthermore, the computational complexity analysis shows that derived feature spaces substantially reduce time complexity across algorithms, enhancing overall efficiency. These findings highlight the potential of Ada-LT IP to improve indoor positioning accuracy and efficiency while tackling the challenges of offline fingerprint calibration and computational complexity.

Figure 1. Multipath effect on the received signal in an indoor environment scenario.

(2) We examined the impact of sampling signal fluctuations on different algorithms in indoor localization scenarios. Multiple training samples were used to assess the influence of sampling fluctuations, while all collected testing samples for each month were used to evaluate algorithm robustness.
(3) We applied covariance analysis (CA) to reduce the multicollinearity problem of the various RSS values collected at a reference point (RP), aiming to minimize computational complexity.
(4) We compare the performance of different feature extraction methods, namely mean signal values, principal component analysis (PCA), and linear discriminant analysis (LDA/FDA), for adaptive LT Wi-Fi IP. We evaluate the effectiveness of these methods based on the achieved metrics and also investigate the hybrid effect of combining features extracted from multiple methods.

Figure 2. Signal distribution comparison between training and testing samples.

Figure 3. Proposed framework for Ada-LT Wi-Fi RSS fingerprint indoor localization.

Figure 4. Library environment: (a) a picture of the third-floor collection area showing the bookshelves and the stairs that connect the two floors; and (b) the network devices close to the collection area. The red asterisks represent the third floor's devices, and the blue asterisks represent the fifth floor's devices [90].

Figure 5. Comprehensive examination of the target variable distribution of measurements in both datasets.

The box plots of Figures 6 and 7 show the distribution of the signal values at the selected grid points in both the training (X_train) and testing (X_test) datasets. The x-axis represents the grid points, and the y-axis shows the range and spread of the signal values; the position on the y-axis reflects the magnitude of the signal values. Box plots help identify outliers, assess the spread of the data, and compare signal value distributions, giving insight into the signal characteristics and variations at the selected grid points in the training and testing datasets. Each box plot represents one grid point: the box spans the interquartile range (IQR), which contains the middle 50% of the values; the line inside the box marks the median; and the whiskers extend to the most extreme values within a threshold distance of the box (conventionally 1.5 times the IQR). Data points beyond this threshold are treated as outliers and plotted individually. Comparing the two datasets (month 1 and testing sample 1), grid points 2, 4, 5, and 6 have boxes of similar shape and position, suggesting that the distribution of signal values at these grid points is consistent across the datasets. In contrast, the box plots differ markedly for grid points 1, 8, 9, and 10, indicating differences in the signal value distribution between the datasets.
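The box-plot quantities described above can be computed directly. The plain-Python sketch below flags outliers with the conventional 1.5 × IQR whisker rule, which is an assumption because the text only speaks of a threshold, and uses the `inclusive` quartile method of Python's `statistics.quantiles`; plotting libraries may compute quartiles slightly differently. The function name is hypothetical. The second example reproduces the situation described for grid points three and seven: when at least three quarters of the readings are the no-signal value 100, both quartiles equal 100, the box collapses, and every real RSS value is flagged as an outlier.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Quartiles, median, and outliers of one grid point's signal values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # assumed 1.5 x IQR whisker limits
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# A grid point with ordinary RSS spread: only the unusually strong reading is an outlier.
print(box_plot_summary([-60, -58, -62, -59, -61, -30])["outliers"])  # [-30]

# A grid point dominated by the no-signal value 100: the box collapses to a line.
print(box_plot_summary([100] * 8 + [-70, -72]))
# q1, median, and q3 all equal 100.0; outliers are [-70, -72]
```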

(a) Training sample 1. (b) Training sample 2.

Figure 6. Box-plot distribution of signals for selected GPs of training samples from month 1.

Figure 7. Box-plot distribution of signals for selected GPs of testing samples from month 1.

Figure 8 explores alternative visualization techniques to better understand the signal distribution for specific grid points in both the training and testing samples collected in month 1. The figure combines a box plot and a dot plot of the signal value distribution for the selected grid points, and the two views are consistent with each other. In addition, most of the recorded values at these grid points are 100, signifying the absence of signal measurements at those specific locations.
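The observation above can be quantified by counting, per column of the fingerprint matrix, how often the "not detected" value appears, and by masking those readings before computing statistics. This is a minimal Python/NumPy sketch; the sentinel value 100 is treated as an assumed, dataset-dependent parameter, and the helper names and readings are hypothetical.

```python
import numpy as np

NOT_DETECTED = 100  # assumed sentinel for "no signal measured"; verify for your dataset


def missing_fraction(X, sentinel=NOT_DETECTED):
    """Fraction of readings in each column that equal the sentinel."""
    X = np.asarray(X, dtype=float)
    return (X == sentinel).mean(axis=0)


def mask_missing(X, sentinel=NOT_DETECTED):
    """Copy of X with sentinel readings replaced by NaN."""
    X = np.asarray(X, dtype=float).copy()
    X[X == sentinel] = np.nan
    return X


# Hypothetical 4 scans x 3 columns; 100 marks an undetected signal.
X = [[100, -65, 100],
     [100, -67, -80],
     [-90, 100, 100],
     [100, -66, 100]]
frac = missing_fraction(X)
medians = np.nanmedian(mask_missing(X), axis=0)  # statistics over real readings only
print(frac, medians)
```

Without masking, a column dominated by the sentinel would produce a box plot pinned at that value, hiding the few genuine signal readings.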

Figure 8. Signal value distribution of selected grid points of both samples from month 1.


Figure 9 shows the distribution of measurements for each unique value of 'y_train' and 'y_test', that is, for each grid point label in the training and testing samples of month 1. Comparing the box plots of 'y_train' and 'y_test' reveals how the signal values are distributed within each category in the two datasets. The signal measurements collected at a given grid point are diverse and span a wide range of received signal strengths, and outliers are present for almost all target labels.

Figure 9. Distribution of received signal values by unique GPs: training vs. testing data.
Figure 10 analyzes the distribution of signal values for different grid points in the X_train and X_test datasets and uses the Shapiro-Wilk test to assess whether these values follow a normal distribution. Each distribution is shown as a histogram overlaid with a fitted normal probability density function (PDF). The Shapiro-Wilk test is applied to the signal values of each grid point in both datasets, and the resulting p-value is compared with a significance level of 0.05; the normality status ("Normally Distributed" or "Not Normally Distributed") is reported alongside each plot. Taken together, the histograms, fitted PDFs, and test results characterize the shape of each signal value distribution. The results indicate that the signal values at the selected grid points in both the training and testing datasets are not normally distributed. This has implications for adaptive long-term indoor localization: because the signal values follow diverse, non-normal distributions, it is difficult to establish a universal model, and traditional localization algorithms that assume normality may not be effective. Robust and adaptive techniques, such as machine learning algorithms, are therefore needed to handle these diverse distributions and improve localization accuracy.
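The per-grid-point normality check described above can be sketched with SciPy's `scipy.stats.shapiro`. This is an illustrative reconstruction, not the authors' code; the helper name, the one-column-per-grid-point layout, and the synthetic data are assumptions.

```python
import numpy as np
from scipy import stats


def normality_report(X, alpha=0.05):
    """Run the Shapiro-Wilk test on each column (grid point) of X."""
    X = np.asarray(X, dtype=float)
    report = []
    for column in X.T:
        _, p_value = stats.shapiro(column)
        # p <= alpha: reject normality. p > alpha only means normality is not rejected.
        label = "Not Normally Distributed" if p_value <= alpha else "Normally Distributed"
        report.append((p_value, label))
    return report


# Synthetic columns: one follows normal quantiles, one is mostly a single repeated value.
n = 200
gaussian_like = -70 + 3 * stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
two_valued = np.r_[np.full(150, 100.0), np.full(50, -80.0)]
X = np.column_stack([gaussian_like, two_valued])
for p_value, label in normality_report(X):
    print(f"p = {p_value:.3g}: {label}")
```

Note that a "Normally Distributed" label is a failure to reject normality at the chosen level, not proof that the data are Gaussian.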

Figure 10. Signal value distributions and normality test results for GPs in X_train and X_test.


Figure 11 compares the signal distributions of selected features in training and testing sample 3 from month 1. The Mann-Whitney U test confirms a significant difference between the training and testing distributions, reflecting the dynamic nature of the indoor environment. Models trained on the training dataset may therefore perform worse when applied to the test dataset, so improving performance requires models that can adapt to variations in the indoor environment.
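A per-feature version of this comparison can be sketched with `scipy.stats.mannwhitneyu`, a rank-based test that does not assume normality, which suits the non-normal signal distributions reported for Figure 10. The helper name and the synthetic shifted samples are illustrative assumptions, not the study's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def distributions_differ(train_col, test_col, alpha=0.05):
    """Two-sided Mann-Whitney U test for one feature; returns (U, p, differ?)."""
    result = mannwhitneyu(train_col, test_col, alternative="two-sided")
    return result.statistic, result.pvalue, result.pvalue < alpha


# Hypothetical RSS values (dBm): the test sample is shifted 10 dB stronger.
train_col = np.arange(-80, -60)   # -80 ... -61
test_col = train_col + 10         # -70 ... -51

u, p, differ = distributions_differ(train_col, test_col)
print(f"U = {u}, p = {p:.2g}, differ: {differ}")
```

Running the helper over every selected feature column, rather than one, gives the feature-by-feature view that Figure 11 summarizes.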

Figure 12 shows the distribution of received signal strength for a specific grid point in both the training set ('X_train') and the test set ('X_test'). The histograms show how many received signal strength values fall within each bin, so comparing them reveals differences in the shape, concentration, and position of the two distributions. A two-sample Kolmogorov-Smirnov test was also performed to determine whether the training and test distributions are likely to be the same. The test yields a p-value below 0.05, so the hypothesis of identical distributions is rejected: the received signal strength distribution at this grid point likely changed between the training and test sets. The figure and the accompanying test thus highlight the variability of the received signal strength distribution for the selected grid point across the two sets.
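The two-sample Kolmogorov-Smirnov comparison can be sketched with `scipy.stats.ks_2samp`. Its statistic D is the largest vertical gap between the two empirical cumulative distribution functions. The helper name and the evenly spaced synthetic samples are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_compare(train_values, test_values, alpha=0.05):
    """Two-sample KS test; a p-value below alpha suggests the distributions differ."""
    result = ks_2samp(train_values, test_values)
    return result.statistic, result.pvalue, result.pvalue < alpha


# Hypothetical RSS values (dBm) for one grid point; the test set is shifted by 10 dB.
x_train_gp = np.linspace(-80, -60, 50)
x_test_gp = np.linspace(-70, -50, 50)

d, p, differ = ks_compare(x_train_gp, x_test_gp)
print(f"D = {d:.2f}, p = {p:.2g}, differ: {differ}")
```

Unlike the Mann-Whitney U test, which is most sensitive to a shift in location, the KS test reacts to any difference in shape, spread, or location between the two distributions.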

Figure 12. Signal distribution comparison of grid point 0 for both training and testing samples.

Figure 11. Signal distribution comparison between training and testing sample 3, month 1.


Figure 13. Mean received signal values of Wi-Fi feature spaces for both samples from month 1.

present the performance of the algorithms across different training and testing samples, providing insights into the effects of signal strength fluctuations and the adaptability of the proposed algorithm for long-term Wi-Fi fingerprint-based indoor localization. The results indicate significant variations in localization performance due to the dynamic nature of both training and testing datasets.
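These performance comparisons rely on two quantities: mean absolute error (MAE) for localization accuracy and, for the PCA-based feature spaces, the variance-explained ratio (VER) of each principal component. A minimal NumPy sketch of both follows; the helper names, the SVD-based PCA, and the 95% threshold for choosing the number of components are illustrative assumptions rather than the authors' settings.

```python
import numpy as np


def variance_explained_ratio(X):
    """Share of total variance captured by each principal component of X."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Singular values of the centered data give the component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values**2 / (X.shape[0] - 1)
    return variances / variances.sum()


def n_components_for(ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative VER reaches threshold."""
    idx = np.searchsorted(np.cumsum(ratio), threshold)  # first cumulative value >= threshold
    return int(min(idx + 1, len(ratio)))  # guard against float round-off below 1.0


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical, perfectly collinear data: column 2 is twice column 1, so one
# component carries (numerically) all of the variance.
X = np.outer(np.arange(5.0), [1.0, 2.0])
ratio = variance_explained_ratio(X)
print(ratio, n_components_for(ratio))
print(mean_absolute_error([1, 2, 3], [1, 3, 5]))
```

The collinear example also shows why PCA helps with multicollinearity: redundant features collapse onto a single component, shrinking the feature space without losing variance.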

Figure 14 compares the mean absolute error (MAE) and the variance-explained ratio (VER) of the principal components for different algorithms on the testing samples. The x-axis lists the algorithms being compared, and the y-axis shows the MAE and VER values, with one series per metric. MAE measures the average absolute discrepancy between predicted and actual values, so lower values indicate better performance. VER is the proportion of the variance in the data captured by the principal components, so higher values indicate that the components capture more of the underlying patterns and variability. Comparing the two metrics for each algorithm reveals the trade-off between prediction accuracy and the share of data variability retained by the principal components. The proposed algorithm has the smallest MAE in both Training Sample # Month 1 and Training Sample # Month 2. In Training Sample # Month 1, it achieves the lowest MAE across all principal components compared with the other algorithms, including DT, KNN, SVC, LR, RF, GMM, and MLP. Similarly, in Training Sample # Month 2, the proposed algorithm exhibits

Figure 15. Comparative analysis of algorithm performance based on computational time.

5. Conclusions

This research paper analyzes a comprehensive set of real-life experiments conducted over 25 months to investigate temporal fluctuations in signal strength for adaptive long-term Wi-Fi indoor positioning. The study focuses on the significance of signal features, the effects of sampling fluctuations, and the assessment of dynamic indoor localization


Table 2. Description of the dataset used for adaptive long-term Wi-Fi indoor localization.

Table 3. Wi-Fi fingerprint-based indoor location estimation of targets utilizing original feature spaces of size 620 with dynamic temporal signal variations (months 1 and 2).

Table 4. Wi-Fi fingerprint-based indoor location estimation of targets utilizing original feature spaces of size 620 with dynamic temporal signal variations (months 3 and 4).

Table 5. Wi-Fi fingerprint-based indoor location estimation of targets utilizing derived feature spaces based on mean received signal strength values (months 1 and 2).

Table 6. Wi-Fi fingerprint-based indoor location estimation of targets utilizing derived feature spaces based on mean received signal strength values (months 3 and 4).

Table 7. Effect of feature spaces on the variance accounted for in Wi-Fi fingerprint-based indoor location estimation for both training and testing sample datasets (month 1 only).

Table 8. Effect of feature spaces on the variance accounted for in Wi-Fi fingerprint-based indoor location estimation for both training and testing sample datasets (different months).

Table 9. Wi-Fi fingerprint-based indoor location estimation of targets utilizing extracted feature spaces based on principal components (month 2).

Table 10. Wi-Fi fingerprint-based indoor location estimation of targets utilizing extracted feature spaces based on principal components (month 4).

Table 11. Wi-Fi fingerprint-based indoor location estimation of targets utilizing extracted feature spaces based on functional discriminants (month 2).

Table 12. Wi-Fi fingerprint-based indoor location estimation of targets utilizing extracted feature spaces based on functional discriminants (month 4).