Channel Phase Processing in Wireless Networks for Human Activity Recognition

The phase of the channel state information (CSI) is underutilized as a source of information in wireless sensing due to its sensitivity to synchronization errors of the signal reception. A linear transformation of the phase is commonly applied to correct linear offsets and, in a few cases, some filtering in time or frequency is carried out to smooth the data. This paper presents a novel processing method of the CSI phase to improve the accuracy of human activity recognition (HAR) in indoor environments. This new method, coined Time Smoothing and Frequency Rebuild (TSFR), consists of three stages: a CSI phase sanitization step that removes phase impairments through a linear-regression-based transformation, a time domain filtering stage with a Savitzky-Golay (SG) filter for denoising purposes and, finally, a phase rebuild that eliminates the distortions in frequency caused by SG filtering. The TSFR method has been tested on five datasets obtained from experimental measurements, using three different deep learning algorithms, and compared against five other types of CSI phase processing. The results show an accuracy improvement using TSFR in all the cases. Concretely, accuracy higher than 90% has been achieved with the proposed solution in most of the studied scenarios. In few-shot learning strategies, TSFR outperforms the state-of-the-art performance from 35% to 85%.


INTRODUCTION
Wireless sensing has been a rapidly growing field of study within the Internet of Things in recent years. It involves measuring wireless channel characteristics using existing wireless networks, such as WiFi networks, to sense environmental changes in the surrounding area of the network. Human activity recognition (HAR) in indoor environments is one of the main fields of application of wireless sensing.
The pervasive deployment of wireless networks worldwide and the fact that wireless sensing can be considered a privacy-preserving solution make this technology a promising alternative to other sensing solutions such as video surveillance with depth cameras or wearables [1], [2], [3], [4], [5]. Those other sensing methods present some drawbacks; for instance, cameras can compromise user privacy and, in the case of wearables, users must carry the devices on their person to be monitored. In wireless systems based on orthogonal frequency-division multiplexing (OFDM), such as WiFi, the Received Signal Strength Indicator (RSSI) and the Channel State Information (CSI) are used for wireless sensing. RSSI suffers from significant uncertainties due to signal fluctuations under actual conditions, such as scattering, degradation, and sensitivity to noise [6]. Therefore, in recent years, CSI data have been widely used due to their greater robustness against noise and other impairments of the signal reception. A time series of CSI measurements shows how wireless signals propagate through objects and humans in the time, frequency, and spatial domains and can be used for different monitoring applications. For this reason, human activity recognition is an important field of wireless sensing, spanning several areas such as crowd counting [7], [8], people localization [9], [10], [11], [12], vital sign detection [13], [14], [15], and gesture recognition [16], [17], [18], [19], [20], [21], [22], [23], [24]. Likewise, CSI-based sensing can also be employed in other applications, such as electrical device classification based on the effect of the impulsive noise in the received signals [25]. In addition, it is worth noting that IEEE 802.11 has recently approved a new task group named IEEE 802.11bf to accommodate sensing operations [26] into the WiFi standards.
CSI measures the channel frequency response (CFR) of a wireless communication link based on OFDM. This information is given in the form of amplitude and phase of the propagation channel for each subcarrier in an OFDM symbol. While the CSI amplitude provides a reasonably accurate estimation of the CFR amplitude, the phase contains uncertainties that make it challenging to use in many applications and theoretical developments of HAR. Correcting these phase uncertainties in frequency and time domains is a complex task, so many proposals in wireless sensing choose to work exclusively with amplitude [3]-[19].
However, some authors have developed methods to perform channel phase estimation of WiFi CSI for indoor monitoring. A wide variety of these techniques perform a linear transformation (LT) of the phase to correct the linear impairments caused by synchronization issues. In this regard, LT is found in [27], where CSI is used to detect the position of people within different rooms, obtaining reasonable results. The authors in [28] derive meaningful phase information by employing LT on the raw CSI to eliminate the significant random noise in the frequency domain. Outlier filtering is applied to sift out biased observations. Extracting various statistical features, such as variance, mean, and distribution distance, they obtain an accuracy of 90% for human motion when using three antennas. In [29], the authors use the LT method for correcting the phase and then employ a deep network with three hidden layers to train the calibrated phases. Their results for detecting human positions in two rooms have an error of about 20% in distance. Also, the work in [30] uses LT to calibrate CSI phases, then applies an algorithm to extract different features and, last, a Deep Neural Network (DNN) classifies among three different human activities inside a car. In [31], phase LT is performed before using the difference between adjacent subcarriers to train a backpropagation neural network with fingerprint data. Recently, [32] constructed the phase difference matrix expanded by the mean and standard deviation of the phase difference as a feature matrix after the LT method. Then, a Savitzky-Golay filter is applied to the raw CSI phase information. More recently, [33] introduces TransferSense, a one-time, environment-independent WiFi sensing method based on deep learning that converts RF sensing tasks into image classification and uses amplitude and phase data corrected with the LT method.
In addition to the LT method, other variants try to correct the errors of the estimated CSI phase, including non-linear errors. In [34], a similar linear phase calibration method is developed as an extension of the LT processing to multiple antennas, using the frequency difference between subcarriers to estimate the phase. In [35], the authors correct linear errors in phase, assuming that the CFR for one specific frequency should be the same even when measured in different bands. Then, they determine the time and frequency offsets in each band by matching the terms which define the CFR in each band. In [36], the authors estimate time and frequency phase offsets separately. For time offsets, they assume that the channel impulse response is a linear combination of periodic functions whose period varies smoothly from sample to sample and try to correct the jumps observed at the power delay profile. For frequency offsets, the authors prove that the phase of the signal at the receiver follows a normal distribution to obtain an average value of the phase for all subcarriers in each packet or symbol. In [37], the authors consider a multipath propagation model for each CSI sample, using the most robust path as a reference to correct for time offsets in the remaining paths. They define each CSI sample as a product between the contributions that depend on the subcarriers index and a vector representing the independent terms from multipath. Then, for each subcarrier, they calculate the terms of that product and multiply it by the conjugate of the one with the strongest path to eliminate constant offsets in frequency.
After correcting the phase of the CSI, some works employ filtering to remove noise from the signal. One of the most recently used is the Savitzky-Golay (SG) filter, since it allows data smoothing with a reduced distortion of the signal tendency. This filter has been applied for wireless sensing in the frequency domain [32] and in time domain [38], [39].
In our previous work [40], a channel phase calibration method was presented based on a linear regression of the CSI phase. In addition, time smoothing of the phase was carried out through an SG filter, and finally, an algorithm was proposed to correct phase gaps in frequency. The calibration method was tested using the power profile of simulated wireless channels. Based on the previous work in [40], this manuscript presents an improved and extended method of phase processing for HAR in wireless sensing. Additionally, to validate the proposed method, a comprehensive analysis of the proposal is performed over five different datasets of experimental measurements of HAR, using three different neural networks and comparing it with five other types of phase processing. Considering the above, this paper focuses on channel phase processing to improve the accuracy of HAR classification in indoor environments with OFDM-based wireless signals. The contributions of this manuscript to the current state of the art are the following:
1) We propose a novel phase processing method of CSI, coined Time Smoothing and Frequency Rebuild (TSFR), to be used for HAR. It consists, first, of an improved version of the channel phase sanitization model presented in [40], adjusting and removing some parameters from the previous work. In addition, a new algorithm has been developed to smooth the phase in time domain and correct discontinuities generated in frequency after filtering.
2) Two new CSI-based datasets with real measurements have been generated for counting people and position localization in indoor environments.
3) Two regular deep neural networks have been designed for CSI-based HAR: a fully connected network with four hidden layers and one dropout layer, and a convolutional network with three convolutional layers, three max pool layers and two flatten layers. In addition, the few-shot metalearning technique named ProtoNet [41] is also implemented to check the transferability of the results.
4) We present a comprehensive performance analysis of the TSFR proposal for HAR purposes over five datasets (2 new, 3 from the bibliography) and three deep learning models (2 new, 1 from the bibliography). In this analysis, the use of the SG filter in the time domain, frequency domain, and in both domains simultaneously has been assessed. In addition, performance comparisons have been carried out with the other methods from the state of the art. Furthermore, performance results in terms of accuracy and confusion matrices have been obtained when working with the processed CSI phase, the CSI amplitude, and both variables combined.
The rest of the paper is organized as follows: Section 2 summarizes the main concepts on which our proposal is based: CSI and the Savitzky-Golay filter. Section 3 presents the proposed method. The datasets and the deep learning algorithms which are utilized in this work are described in Section 4. The results and discussion are presented in Section 5. Finally, the paper will be concluded with some ideas and future directions in Section 6.
Notation: Matrices are represented in capital letters and boldface. The matrix $\mathbf{O}_{S \times K}$ represents a zero matrix with $S$ rows and $K$ columns. $\mathbf{O}_{*,k}$ represents the column vector $k$, and $\mathbf{O}_{s,*}$ the row vector $s$. The application of the Savitzky-Golay filter is represented as $\mathrm{SG}\{\cdot, n, 2l+1\}$, with $n$ being the order of the polynomial used to fit the samples and $2l+1$ the length of the filter window.

Channel State Information
CSI describes the properties of the channel through which the signal propagates, in this case, OFDM wireless signals. These channel properties depend on the environment and the propagation medium and can therefore be used to extract characteristics of the environment. In the field of HAR, CSI is widely used because the channel properties are affected by environmental changes, so these variations can be associated with the different activities to be classified. For an OFDM system, the received signal in the frequency domain can be modeled as

$$\mathbf{y} = \mathbf{H} \cdot \mathbf{x} + \mathbf{z}, \tag{1}$$

where $\mathbf{y}$ and $\mathbf{x}$ denote the received and transmitted signal vectors, respectively, $\mathbf{z}$ is the additive complex white Gaussian noise, and $\mathbf{H}$ represents a diagonal matrix of the CFR, also referred to as CSI. The CSI of the $k$-th subcarrier during the $s$-th symbol, $h_{s,k}$, is a complex value as follows:

$$h_{s,k} = |h_{s,k}| \, e^{j\theta_{s,k}}, \tag{2}$$

where $|h_{s,k}|$ and $\theta_{s,k}$ are the amplitude and the phase, respectively. The CSI is therefore composed of two independent sources of information, amplitude on the one hand and phase on the other. At the receiver side, CSI is usually estimated to decode the received signal. In this process, synchronization issues can lead to several errors in the estimated CSI, making the treatment of the phase complex for HAR purposes due to its uncertainties and offsets. In particular, there are three main types of errors [42] affecting the phase that do not reduce communication quality but are of great importance when working with CSI for HAR classification in closed environments.
- Sample Frequency Offset (SFO) is due to a mismatch of the oscillators between transmitter (TX) and receiver (RX). This lack of synchronization generates a time shift of the received signal with respect to the transmitted signal. As the local oscillator remains stable over a short time, the SFO is usually treated as a constant.
- Sample Time Offset (STO) occurs because the receiver detects the packet by correlation operation and signal power calculation. Due to hardware imperfection, this process introduces a random time shift.
- Carrier Frequency Offset (CFO) occurs because the receiver center frequency is not synchronized. The system completes the estimation and compensation at the receiver by analyzing the cyclic prefix and pilot signals. However, due to hardware instability, the frequency offset cannot be entirely determined, and this residual offset causes a non-negligible error in the phase.

Therefore, let $\widehat{\mathbf{H}}_{S \times K}$ be the estimated CSI matrix of $S$ symbols with $K$ subcarriers:

$$\widehat{\mathbf{H}}_{S \times K} = \begin{bmatrix} \hat{h}_{1,1} & \cdots & \hat{h}_{1,K} \\ \vdots & \ddots & \vdots \\ \hat{h}_{S,1} & \cdots & \hat{h}_{S,K} \end{bmatrix}, \tag{3}$$

where the $(s,k)$-th element of $\widehat{\mathbf{H}}$ can be given by $\hat{h}_{s,k} = |\hat{h}_{s,k}| \, e^{j\hat{\theta}_{s,k}}$. Likewise, a matrix of the measured CSI phases can be defined as $\widehat{\mathbf{\Theta}}_{S \times K}$, where the measured phase at the $k$-th subcarrier of the $s$-th CSI frame can be expressed as:

$$\hat{\theta}_{s,k} = \theta_{s,k} + 2\pi \frac{m_k}{N} \Delta t + \gamma + Z, \tag{4}$$

where $\theta_{s,k}$ is the actual phase, $\Delta t$ is the time lag due to SFO and STO, $m_k$ is the subcarrier index of the $k$-th subcarrier, $N$ is the discrete Fourier transform size for the OFDM generation, $\gamma$ is the unknown phase offset due to CFO, and $Z$ is the measurement noise.
It is worth mentioning that these offsets occur in frequency and time domains and that SFO and STO linearly depend on each subcarrier. In the following, several phase processing methods are presented to provide useful information to the models used in the field of HAR.

Linear transformation
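A usual approach to mitigate offset mismatches is to apply a linear transformation. Note that the phase error $2\pi \frac{m_k}{N} \Delta t + \gamma$ in (4) is a linear function of the subcarrier index $m_k$. We can estimate for each symbol $s$ the phase slope $\varepsilon_s$ and the offset $\tau_s$ with the following expressions:

$$\varepsilon_s = \frac{\hat{\theta}_{s,K} - \hat{\theta}_{s,1}}{m_K - m_1}, \tag{5}$$

$$\tau_s = \frac{1}{K} \sum_{k=1}^{K} \hat{\theta}_{s,k}. \tag{6}$$

Finally, subtracting $\varepsilon_s m_k + \tau_s$ from the raw phase $\hat{\theta}_{s,k}$, we can obtain the calibrated phase, $\theta'_{s,k}$, which is given by

$$\theta'_{s,k} = \hat{\theta}_{s,k} - \varepsilon_s m_k - \tau_s. \tag{7}$$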
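The linear transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the slope is taken from the first and last subcarriers and the offset from the mean phase, which is a common choice for the LT method and is assumed here. The function name `linear_transform`, the index set `m`, and the synthetic data are hypothetical.

```python
import numpy as np

def linear_transform(phase, m):
    """Calibrate CSI phases of shape (S, K) by removing a per-symbol line.

    phase: measured (possibly wrapped) phases, one row per symbol.
    m:     the K subcarrier indices m_k.
    """
    phase = np.unwrap(phase, axis=1)                        # undo 2*pi wraps along frequency
    slope = (phase[:, -1] - phase[:, 0]) / (m[-1] - m[0])   # assumed slope estimate per symbol
    offset = phase.mean(axis=1)                             # assumed offset estimate per symbol
    return phase - slope[:, None] * m[None, :] - offset[:, None]

# Synthetic check: zero true phase plus a purely linear error, wrapped to (-pi, pi].
m = np.arange(-28, 29, dtype=float)                      # hypothetical symmetric index set
raw = np.angle(np.exp(1j * (0.2 * m + 1.2)))[None, :]    # shape (1, 57)
calibrated = linear_transform(raw, m)
```

With a symmetric index set the mean of the linear term is the constant offset, so the calibrated phase of this synthetic symbol collapses to zero up to floating-point error.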

Savitzky-Golay filter
The Savitzky-Golay filter is a filtering method based on local polynomial least-squares fitting for time-series signals [43]. It is used to smooth the CSI data and reduce environmental noise interference to facilitate the subsequent feature extraction [44]. The method requires defining a moving window of size $2l+1$ and a fitting order $n$ to perform left-to-right curve filtering. First, the filtering center is selected, and the $2l+1$ points formed by the center and the $l$ points on each side of it are chosen as the primary filtering object. For the sake of simplicity, a vector $\mathbf{v}$ smoothed with a Savitzky-Golay filter is defined as

$$\mathbf{v}_{sg} = \mathrm{SG}(\mathbf{v}, n, 2l+1), \tag{8}$$

where $\mathbf{v}_{sg}$ is the output of the filter. The choice of the optimal parameters for the Savitzky-Golay filter depends on the nature of each problem. Therefore, they are obtained empirically through an analysis of data.
Given a CSI phase matrix $\widehat{\mathbf{\Theta}}$, SG filtering can be applied along either dimension of the matrix or along both at the same time. It is usually considered that the phase is continuous along a given CSI symbol. However, when CSI estimates of different symbols are obtained with a periodicity small enough compared with the coherence time of the wireless channel, continuity of the phase is also preserved in the inter-symbol time domain.
Therefore, this work evaluates the SG filtering in order to maintain the continuous form of the phase in frequency domain, in the inter-symbol time domain and in both domains as follows:
1) Frequency domain: It is applied to the CSI estimate of each symbol. One of the filter characteristics is that it retains the width and height of waveform peaks in noisy signals [45].
2) Time domain: The filter is applied to each subcarrier along consecutive CSI symbols to smooth and ensure phase continuity over time. Its application has an impact on the frequency domain and can generate distortions.
3) Time-Frequency domains: Applying the filter in both domains at the same time ensures continuity and phase smoothing in both dimensions. This data processing is done according to [46].
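The three filtering variants listed above can be illustrated with `scipy.signal.savgol_filter`, which takes the window length $2l+1$ and the polynomial order $n$ directly. This is a sketch under stated assumptions: the phase matrix is stored with symbols along axis 0 and subcarriers along axis 1, and the time-frequency variant is approximated by filtering one axis after the other, which may differ from the procedure in [46]. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_frequency(theta, n, l):
    """Smooth each symbol (row) along the subcarrier axis."""
    return savgol_filter(theta, window_length=2 * l + 1, polyorder=n, axis=1)

def sg_time(theta, n, l):
    """Smooth each subcarrier (column) along consecutive symbols."""
    return savgol_filter(theta, window_length=2 * l + 1, polyorder=n, axis=0)

def sg_time_frequency(theta, n, l):
    """Assumed separable 2D variant: time filtering followed by frequency filtering."""
    return sg_frequency(sg_time(theta, n, l), n, l)

# A quadratic trend in time is reproduced exactly by a filter of order n = 2.
t = np.arange(50, dtype=float)
theta_demo = np.outer(t ** 2, np.ones(16))     # shape (S=50, K=16)
smoothed_time = sg_time(theta_demo, n=2, l=5)
smoothed_2d = sg_time_frequency(theta_demo, n=2, l=5)
```

The window length must be odd and no longer than the axis being filtered, so short captures limit the usable value of $l$.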

Phase sanitization
The idea of the proposed method is to take advantage of the good results offered by the linear transformation while maintaining continuity, i.e., avoiding gaps, in at least one of the two domains of the phase. In this sense, we improve the traditional linear transformation shown in Subsection 2.2 using a linear regression over all the points of a symbol to remove the slope generated by STO and SFO impairments. The slope removal is carried out through a rotation of each pair $(k, \hat{\theta}_{s,k})$. The amplitude, $|\hat{h}_{s,k}|$, remains constant and unchanged throughout the entire phase sanitization process.
Given the phase matrix $\widehat{\mathbf{\Theta}}$, a linear regression of each symbol $s$ (i.e., of $\widehat{\mathbf{\Theta}}_{s,*}$) in $\widehat{\mathbf{\Theta}}$ is computed. The linear regression model function, $r_s(k)$, is a straight line over the subcarrier index $k$, where $a_s$ is the linear regression slope and $\bar{k}$ is the average value of $k$. Therefore, the angle of the slope $a_s$ with respect to the horizontal axis is

$$\alpha_s = \arctan a_s. \tag{12}$$

Using (12), the pair $(k, \hat{\theta}_{s,k})$ is rotated by $\alpha_s$ degrees, and the offset given by $r_s(1)$ is also removed for each phase value. As a result, the calibrated phase $\check{\theta}_{s,k}$ is given by:

$$\check{\theta}_{s,k} = k \cdot \sin\alpha_s + \hat{\theta}_{s,k} \cdot \cos\alpha_s - r_s(1). \tag{13}$$

Figure 1 shows a graphical example of the LT method and the proposed Linear Regression and Rotation (LRR) solution for a specific CSI frame in the OPERAnet dataset [47].

Fig. 1: Graphical representation of a CFO, SFO, and STO correction in a symbol phase by LRR, $\check{\theta}_{s,k}$, vs. the bibliographic method, $\theta'_{s,k}$. These data belong to the OPERAnet dataset: tx1rx1, s = 500K.
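A rough sketch of the regression-and-rotation idea follows. It is not the authors' code: the line is fitted with an ordinary least-squares `np.polyfit`, and the sign of the rotation angle is chosen here so that the fitted line becomes horizontal after the rotation, which is an assumption about the sign convention of the original equations. The function name `lrr` and the synthetic data are hypothetical.

```python
import numpy as np

def lrr(theta_hat):
    """Rotate each symbol's (k, phase) points so its regression line is flat.

    theta_hat: unwrapped measured phases, shape (S, K).
    """
    S, K = theta_hat.shape
    k = np.arange(1, K + 1, dtype=float)
    # Least-squares line a_s * k + b_s for every symbol at once (one column per symbol).
    a, b = np.polyfit(k, theta_hat.T, deg=1)
    alpha = -np.arctan(a)          # ASSUMPTION: sign picked so the rotation removes the slope
    r1 = a * 1.0 + b               # value of the fitted line at k = 1
    rotated = k[None, :] * np.sin(alpha)[:, None] + theta_hat * np.cos(alpha)[:, None]
    return rotated - r1[:, None]

# Synthetic symbols: a common slope, random offsets, and small noise.
rng = np.random.default_rng(0)
k = np.arange(1, 31, dtype=float)
theta_hat = (0.15 * k[None, :]
             + rng.uniform(-1.0, 1.0, size=(4, 1))
             + rng.normal(0.0, 0.05, size=(4, 30)))
calibrated = lrr(theta_hat)
residual_slope = np.polyfit(k, calibrated.T, deg=1)[0]   # slope left after the rotation
```

Because the rotation mixes the subcarrier index with the phase value, the result depends on how $k$ is scaled; the sketch uses the raw index $1, \dots, K$.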

Time Smoothing and Frequency Rebuild
At this point, calibrated phases maintain distortions and gaps between adjacent CSI symbols in $\check{\mathbf{\Theta}}$. Moreover, applying LT or LRR methods cannot ensure phase continuity in frequency, since other non-linear errors in hardware, software, or a weak implementation of the measurements can also generate gaps and deform the received signal and, in consequence, the estimated CSI. For this reason, a low-pass filter is used to smooth the calibrated CSI phases and ensure phase continuity. Correcting time domain gaps makes sense if the activity to be measured changes the channel on a time scale longer than the time interval between OFDM symbols, which is the case in this work and generally in the HAR field.
Time Smoothing and Frequency Rebuild, TSFR, is the method proposed in this section. Assuming that the calibrated phase is approximately continuous in frequency and the main discontinuities appear in the time domain between adjacent symbols, SG filtering is proposed to be applied in time domain, combined with a threshold-based method to correct the irregularities that SG filtering generates in the frequency domain and, thus, to maintain continuity.
Once $\check{\mathbf{\Theta}}_{S \times K}$ has been calculated, SG filtering is carried out in the time domain for the $k$-th subcarrier as:

$$\check{\mathbf{\Phi}}_{*,k} = \mathrm{SG}\{\check{\mathbf{\Theta}}_{*,k}, n, 2l+1\}. \tag{14}$$

Due to the previous time filtering, discontinuities in the frequency domain are generated in the form of a step between subcarrier blocks. A threshold-based method is proposed to remove those quantitatively large gaps that can appear between two adjacent subcarriers. In most scenarios, it can be assumed that the phase of the channel frequency response changes slowly between adjacent subcarriers. As a result, the phase difference between adjacent subcarriers (i.e., $\check{\theta}_{s,k} - \check{\theta}_{s,k-1}$) should be small, and large gaps can be considered outliers. Considering that those noisy differences can be approximated by a Gaussian distribution [36], we have defined a threshold, $d_s$ (15), in terms of $\mu_s$, the average of the phase differences before SG filtering (16), and $\sigma_s$, their standard deviation (17). In Fig. 2, several histograms of the phase differences of adjacent subcarriers are drawn for different datasets, which are described in Section 4. One can observe that the phase difference distributions are bell-shaped and can be approximated by a Gaussian distribution.

Algorithm 2: TSFR. Steps: apply (14); $\mu_s \leftarrow$ apply (16); $\sigma_s \leftarrow$ apply (17); $d_s \leftarrow$ apply (15).
Consequently, after SG filtering in the time domain, a TSFR phase matrix $\widetilde{\mathbf{\Phi}}_{S \times K}$ is calculated, where the $(s,k)$-th element follows (18), which depends on the phase difference between adjacent subcarriers, $\check{\phi}_{s,k} - \check{\phi}_{s,k-1}$. According to this methodology, the phase of any subcarrier in which the difference with the previous one exceeds $d_s$ will be modified. Based on the Gaussian assumption, approximately 30% of the subcarriers of each symbol are modified, including outliers generated by the SG filtering and actual smoothed values. Therefore, this methodology is not only intended to correct the outliers due to temporal filtering. It also tries to take advantage of this correction to modify the statistical distribution of the symbol, making it more characteristic for each activity by means of the $d_s$ value. The main ideas behind this method are:
- The time evolution of the phase for each subcarrier can reveal information related to the channel variations of each activity. Therefore, those subcarriers that suffer phase gaps after time filtering are also characteristic of the time evolution of the whole CSI phase matrix, as shown in Fig. 4. One can observe that some phase differences can be sensitive to the channel changes related to the activity in the room, while others remain steady.
- The corrected phases after the LRR method in each CSI symbol also contain relevant information related to the channel variations of each activity. Part of this information is present in its statistical variables, such as those referred to in (15), (16) and (17). The gaps generated as a result of the temporal filtering can corrupt this valuable information for HAR and, therefore, these gaps are reduced through the proposed adjustment in (18).
With this in mind, the $d_s$ value incorporates information related to each activity into the CSI phase matrix, as is depicted in Fig. 3, generating a characteristic modal number ($d_s$) for each symbol and applying it in time domain via the time characteristic subcarriers $k$ for which the condition $|\check{\phi}_{s,k} - \check{\phi}_{s,k-1}| > d_s$ is satisfied. With this phase correction method, the jumps are not completely eliminated, but their value is reduced and made uniform at the $d_s$ value, so the information is preserved while distortion is reduced. In short, new information is added to each OFDM symbol using the new variable $d_s$: how many times it is repeated, between which subcarriers, and what its magnitude is. All this is intended to help the prediction algorithms to classify correctly. The benefits of this processing are confirmed by the good results obtained, as seen in Section 5. In Fig. 6, the effect of the time SG filtering and the gap removal process is shown for a certain CSI symbol in the OPERAnet dataset (tx1rx1, s = 50) [47]. We can observe that several large steps are generated after SG filtering in the blue areas and, afterward, removed with the proposed threshold-based method.
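The time smoothing and frequency rebuild steps can be sketched as follows. Two points in this sketch are assumptions rather than the paper's definitions: the threshold is taken as $d_s = |\mu_s| + \sigma_s$, a form that would modify roughly 30% of the differences under the Gaussian assumption, and the rebuild clips every adjacent-subcarrier jump larger than $d_s$ to $\pm d_s$ before re-accumulating the symbol. The function name `tsfr` and the default filter parameters are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def tsfr(theta_cal, n=3, l=5):
    """Sketch of time smoothing followed by a threshold-based frequency rebuild.

    theta_cal: calibrated phases, shape (S, K), symbols along axis 0.
    Returns the rebuilt phases and the per-symbol thresholds.
    """
    # Statistics of adjacent-subcarrier differences BEFORE filtering.
    diff_before = np.diff(theta_cal, axis=1)           # shape (S, K-1)
    mu = diff_before.mean(axis=1)
    sigma = diff_before.std(axis=1)
    d = np.abs(mu) + sigma                             # ASSUMED form of the threshold d_s

    # Time-domain SG filtering: each subcarrier along consecutive symbols.
    phi = savgol_filter(theta_cal, window_length=2 * l + 1, polyorder=n, axis=0)

    # ASSUMED rebuild rule: jumps above d_s are reduced to +/- d_s, others kept.
    steps = np.diff(phi, axis=1)
    clipped = np.clip(steps, -d[:, None], d[:, None])
    rebuilt = np.concatenate(
        [phi[:, :1], phi[:, :1] + np.cumsum(clipped, axis=1)], axis=1
    )
    return rebuilt, d

rng = np.random.default_rng(1)
theta_cal = rng.normal(0.0, 0.3, size=(200, 30))       # synthetic calibrated phases
rebuilt, d = tsfr(theta_cal)
```

Re-accumulating from the first subcarrier keeps that subcarrier's smoothed phase as the reference; a different reference would shift each rebuilt symbol by a constant.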
The complete TSFR method is described in Algorithm 2. In Fig. 5, representations of the processed CSI phase matrices in different steps of Algorithm 2 are shown for a certain estimated CSI phase matrix $\widehat{\mathbf{\Theta}}_{S \times K}$ corresponding to real measurements. One can initially observe the synchronization errors in the measured phases. Corrections of the linear phase impairments are carried out with the proposed LRR solution, and the $\check{\mathbf{\Theta}}_{S \times K}$ matrix is depicted in Fig. 5c. In Fig. 5b, we can also observe the corrections performed with the traditional LT method. Finally, the output of the TSFR solution, $\widetilde{\mathbf{\Phi}}_{S \times K}$, is given in Fig. 5e. Additionally, we can see in Fig. 5d the processed phase when the LRR method is applied along with two-dimensional SG filtering.
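Finally, after the TSFR-based phase processing, the processed CSI matrix $\widetilde{\mathbf{H}}_{S \times K}$ can be reconstructed:

$$\widetilde{\mathbf{H}}_{S \times K} = \begin{bmatrix} \tilde{h}_{1,1} & \cdots & \tilde{h}_{1,K} \\ \vdots & \ddots & \vdots \\ \tilde{h}_{S,1} & \cdots & \tilde{h}_{S,K} \end{bmatrix}, \tag{19}$$

where $\tilde{h}_{s,k} = |\hat{h}_{s,k}| \, e^{j\tilde{\phi}_{s,k}}$.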
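As a small illustration of this reconstruction, the measured amplitude can be recombined with the processed phase element by element. The function name `rebuild_csi` and the sample values are hypothetical.

```python
import numpy as np

def rebuild_csi(h_hat, phi_tilde):
    """Combine the measured amplitude |h_hat| with the processed phase phi_tilde."""
    return np.abs(h_hat) * np.exp(1j * phi_tilde)

h_hat = np.array([[1 + 1j, 2j], [-3 + 0j, 0.5 - 0.5j]])   # measured CSI samples
phi_tilde = np.array([[0.1, 0.2], [0.3, -0.4]])            # processed phases (radians)
h_tilde = rebuild_csi(h_hat, phi_tilde)
```

The amplitude is left untouched, so any model fed with amplitude and phase together sees the original magnitudes next to the sanitized phases.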

Datasets
This section explains the datasets used to test the proposed phase processing method for different human activity recognition tasks (i.e., people counting and position detection) in indoor environments. There are five datasets. Two of them (named A and B) are unpublished and not publicly accessible. The other three are public, and their characteristics are described in detail in [7], [47], and [18]. The main characteristics of the datasets are shown in Table 1 and are explained below, especially for the ones that are not publicly accessible:

Dataset A
Dataset A is a dataset created by our research group at the University of the Basque Country. The dataset's purpose was to count people and detect their fixed positions (sitting) in an indoor environment. The measurements were taken in a meeting room (2.8 m x 4.8 m) with one TX and two RXs. There were four chairs around a table in the room. The TX [...] Fig. 7A. Three USRPs (Universal Software Radio Peripheral) were used for these measurements, one as a TX and two as RXs. A DVB-T2, 10 MHz, 32K Digital Terrestrial Television (DTT) based signal was employed for the measurements. The channel frequency was 5.4 GHz, and the sampling frequency of the TX and the RX USRPs was doubled to obtain a 20 MHz bandwidth (BW). A software (SW) DVB-T2 receiver was used to decode the T2 signal and obtain the CSI, which was then decimated to work with K = 273 subcarriers at a rate of 606 Hz.

Dataset B
This dataset was created by researchers at the National Autonomous University of Mexico (UNAM). The measurements were taken in the living room of a researcher's apartment (approximately 3 x 4 meters). Six different locations were selected in the room, and a chair was placed in each location. The measurements were made with one or two people in the room, sitting on the chairs or standing in front of them, covering all the possible combinations of locations, number of people, and sitting or standing positions. Measurements of the room without people were also taken, but only 1% of the measurements correspond to this situation, in contrast with 50% of measurements with one person and 49% with two people, so the dataset is strongly unbalanced. The room and the locations of the chairs are shown in Fig. 7B.
The measurement system consisted of two laptops with Qualcomm Atheros QCWB335 network interface cards (NIC). One of the laptops injected WiFi packets, and the [...]

EHUCount
Measurements were carried out in five indoor scenarios where up to five people walked casually. The number of CSI traces per number of people and scenario ranged between 12K and 15K, depending on synchronization issues in the signal decoding process.
This dataset is explained in more detail in [7].

OPERAnet
OPERAnet is a comprehensive dataset intended to evaluate passive HAR and localization techniques with measurements obtained from synchronized radio-frequency devices and vision-based sensors. For our purposes, the dataset consists of CSI data extracted from a WiFi NIC. Of the vast number of measurements and experiments in this dataset, we only used one, named "exp028: Crowd counting". The "exp028" dataset contains the CSI from the three TX antennas to each of the three RX antennas. For example, the CSI matrix generated between TX antenna two and RX antenna two is called tx2rx2. For convenience, only tx1rx1, tx2rx2, and tx3rx3 data have been used in this work. For the experiment, a maximum of six people walked continuously and randomly through a room. It started with six people; then, every 5 minutes, one person left the monitoring area. The WiFi CSI system consisted of three PCs fitted with an Intel 5300 NIC, which extracts CSI from K = 30 subcarriers, spread evenly among the 56 subcarriers of the 20 MHz channel 149 in the 5 GHz band, at a rate of 1.6 kHz.
This dataset is explained in more detail at [47].

ReWiS
These measurements were carried out in three different settings. The experiments involved two subjects who were given instructions on the type, duration, and location of activities such as jumping, walking, and standing. Each measurement campaign involved 180 seconds of data collection for each activity performed by the two people. Measurements were repeated ten times with a time interval of at least 2 hours between measurements. For the generation of the ReWiS dataset, the authors used three Asus RT-AC86U WiFi routers, each equipped with four antennas. The routers extracted the CSI packets using the Nexmon firmware [49]. The CSIs were calculated at a rate of 100 Hz, in the 5 GHz band, for 20 and 80 MHz BW, with K = 52 and K = 242 subcarriers, respectively. This dataset is explained in more detail at [18].

Deep learning models
To test the proposed method and quantify the improvement over the bibliographic LT method described in Section 2, the datasets are handled in two different ways. Dataset A, Dataset B, EHUCount, and OPERAnet are evaluated by applying, on the one hand, a fully connected neural network (FNN) and, on the other hand, a convolutional neural network (CNN). Stratified shuffle split cross-validation [50] with 5 iterations is used in the training of both networks. In turn, the ReWiS dataset is evaluated using ProtoNet [41], a few-shot learning (FSL) strategy [51], as described in [18].
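A stratified shuffle split with 5 iterations can be set up, for instance, with scikit-learn's `StratifiedShuffleSplit`. The sketch below only shows how the splits are generated; the test fraction of 0.2, the random seed, and the synthetic features and labels are assumptions, since the text does not state them.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))        # hypothetical: 500 CSI symbols, 30 subcarriers
y = rng.integers(0, 4, size=500)      # hypothetical class labels (4 classes)

# Five random train/test partitions that preserve the class proportions of y.
splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
folds = list(splitter.split(X, y))

for train_idx, test_idx in folds:
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # A model would be trained on (X_train, y_train) and scored on (X_test, y_test) here.
```

Stratification matters for unbalanced data such as Dataset B, where a plain random split could leave the rarest class out of a test partition.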

Fully-connected Neural Network
In this case, CSI phase data is classified individually per OFDM symbol, assigning each symbol its corresponding label. This way, if the dataset has $K \times S$ dimensions, $1 \times S$ labels are assigned. The datasets are evaluated using a fully connected neural network with four hidden layers. In addition, Mish activation layers [52], one of the more recently developed activation functions, are introduced between the hidden layers to improve the information transmitted by the network. Finally, a dropout layer of coefficient 0.2 is placed after the first hidden layer to avoid overfitting. The numbers of neurons of the first, second, third, and fourth hidden layers are 128, 64, 32, and 16, respectively, for WiFi CSI. For DVB-T2, the numbers are 256, 128, 64, and 32. The last layer has as many neurons as classes to be classified. An example of this FNN is depicted in Fig. 8.
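To make the layer layout concrete, the following NumPy sketch runs a forward pass through a network with the WiFi layer sizes given above. It is only an illustration of the architecture, not a training implementation: the weight initialization, the placement of dropout after the first activation, and the softmax output are assumptions, and the names `mish`, `init_fnn`, and `forward` are hypothetical.

```python
import numpy as np

def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * np.tanh(np.logaddexp(0.0, x))

def init_fnn(n_in, n_classes, hidden=(128, 64, 32, 16), seed=0):
    """Random weights for input -> four hidden layers -> output (assumed init)."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, n_classes)
    return [(rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x, drop=0.2, train=False, rng=None):
    """Forward pass; dropout is applied after the first hidden layer when training."""
    rng = np.random.default_rng() if rng is None else rng
    h = x
    for i, (w, b) in enumerate(params[:-1]):
        h = mish(h @ w + b)
        if i == 0 and train:
            keep = rng.random(h.shape) >= drop
            h = h * keep / (1.0 - drop)        # inverted dropout keeps the expected scale
    w, b = params[-1]
    logits = h @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)    # assumed softmax output

params = init_fnn(n_in=30, n_classes=5)        # hypothetical: K = 30 inputs, 5 classes
x = np.random.default_rng(1).normal(size=(8, 30))
probs = forward(params, x)
```

Each row of `probs` is the class distribution predicted for one OFDM symbol, matching the per-symbol labeling described above.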

Convolutional Neural Network
To consider a sufficient time interval in which environmental changes may occur, CSI data are grouped into clusters and evaluated using a CNN. In each dataset, these groups are formed by a different number of symbols. The resizing of the input data into square inputs that can be used as images must consider the number of subcarriers, which changes for each dataset. Therefore, the dimensions of the inputs of this network are (r, r, 2), where r is equal to 128 in the WiFi datasets and 256 in Dataset A, and the 2 accounts for the use of both phase and amplitude. This way, the input obtained is comparable to a two-color square image.
This network consists of a two-channel input layer of size (r, r, 2) and three two-dimensional convolutional layers with 64, 32, and 32 neurons with three max-pooling layers between them. Behind the convolutional layers is a flattened layer to vectorize the output. Then, there are two fullconnected layers, one with 32 neurons and the last one with the number of classification classes. An example of this CNN is depicted in Fig. 9.
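To make the (r, r, 2) input format concrete, the sketch below stacks the amplitude and phase of a CSI window into a two-channel square array. The text does not specify how the resizing is done, so the crop-or-zero-pad strategy, the function name, and the toy CSI dimensions are assumptions; the raw phase is used here, whereas in the paper the phase channel would be the processed phase.

```python
import numpy as np

def make_cnn_input(csi, r):
    """Stack amplitude and phase of a CSI window into an (r, r, 2) array.

    csi: complex array of shape (symbols, subcarriers).
    Assumption: the window is cropped or zero-padded to r x r.
    """
    window = np.zeros((r, r), dtype=complex)
    s = min(r, csi.shape[0])
    k = min(r, csi.shape[1])
    window[:s, :k] = csi[:s, :k]
    # Channel 0 holds the amplitude and channel 1 the phase in radians.
    return np.stack([np.abs(window), np.angle(window)], axis=-1)

# Hypothetical WiFi capture: 200 symbols by 114 subcarriers.
rng = np.random.default_rng(0)
csi = rng.normal(size=(200, 114)) + 1j * rng.normal(size=(200, 114))
x = make_cnn_input(csi, r=128)
```

With r = 128, the first 128 symbols are kept and the 114 subcarriers are zero-padded to 128 columns.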

Few Shot Learning
In the case of the ReWiS dataset, the objective is to replicate the processing performed by its authors in [18], so FSL ProtoNet processing is applied to the raw data. The goal of FSL is to generalize quickly to new tasks containing only a few samples with supervised information. ProtoNet is based on the idea that there exists an embedding space in which points cluster around a single prototypical representation for each class.
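The prototype idea can be illustrated with a minimal NumPy sketch: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The 2-D embeddings below are hypothetical toy values; in practice they would be the outputs of the embedding network.

```python
import numpy as np

def prototypes(support_emb, support_labels):
    """Return the class ids and the mean embedding (prototype) of each class."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in classes])
    return classes, protos

def classify(query_emb, classes, protos):
    # Squared Euclidean distance from every query to every prototype.
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# Toy 2-D embeddings (hypothetical): two classes, two support samples each.
support = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
pred = classify(np.array([[1.0, 1.0], [9.0, 9.0]]), classes, protos)
```

Because only class means are needed, a new class can be added from a handful of labeled samples without retraining a classification layer.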
In this case, four activities are classified: empty, walk, stand, and jump. First, each activity set is divided into intervals of 300 symbols. Singular value decomposition (SVD) is applied to each interval to reduce its dimension from S×K to K×K. Finally, the linear correlation coefficient, or Pearson's coefficient, is applied to this matrix, obtaining another K×K matrix of linear coefficients, which is the input of the training network. The training is performed by means of a CNN with four convolutional blocks. Each block comprises a 64-filter 3×3 convolution, a batch normalization layer, a ReLU nonlinearity, and a 2×2 max-pooling layer.
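The preprocessing chain above (300-symbol intervals, SVD reduction to K×K, Pearson correlation) can be sketched as follows. The text does not state which K×K factor of the SVD is kept, so using the singular values times the right singular vectors is an assumption, as are the function names and the toy data with K = 30 subcarriers.

```python
import numpy as np

def split_intervals(x, length=300):
    """Cut an (S_total, K) series into non-overlapping (length, K) blocks."""
    n = x.shape[0] // length
    return x[: n * length].reshape(n, length, x.shape[1])

def interval_features(block):
    """Map an S x K block to a K x K Pearson correlation matrix."""
    # Thin SVD: block = U @ diag(sv) @ Vt, with Vt of shape K x K when S >= K.
    _, sv, vt = np.linalg.svd(block, full_matrices=False)
    reduced = sv[:, None] * vt      # assumed K x K reduction
    return np.corrcoef(reduced)     # Pearson coefficients between rows

# Hypothetical phase series: 950 symbols, 30 subcarriers.
rng = np.random.default_rng(0)
series = rng.normal(size=(950, 30))
blocks = split_intervals(series)
feats = np.stack([interval_features(b) for b in blocks])
```

Here 950 symbols yield three complete intervals, and the trailing 50 symbols are discarded; each interval becomes one 30×30 input for the embedding network.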

RESULTS AND DISCUSSION
In this section, several comparative analyses are carried out using the aforementioned datasets and DL models. To that end, different HAR classification problems (people counting, position detection, and gesture recognition) are solved using the CSI, i.e., CSI amplitude, CSI phase, or a combination of amplitude and phase. Furthermore, six different phase processing methods (LT, LRR, LRR + SG filtering in the frequency domain, LRR + SG filtering in the time domain, LRR + two-dimensional SG filtering, and the TSFR proposal) are compared when CSI phases feed the described DL models. Accuracies obtained with the raw CSI values are taken as the baseline for the models. Performance results are given as average accuracy percentages along with the standard deviation. The average and the standard deviation are computed when several scenarios or receivers are provided in the same dataset.
Tables 2 and 3 show the accuracy results for people counting and position detection, respectively, using the FNN model. In this network, the amplitude, the phase, and the combination of both are each used as separate inputs. Firstly, Table 2 shows that amplitude gives better accuracy than raw phase for the Dataset A and OPERAnet datasets, whereas the results for Dataset B and EHUCount are similar. The LRR calibration of the phase gives better results than the LT method for two of the four datasets, while the accuracy is the same for the other two. Regarding smoothing, SG filtering in time gives better results than SG filtering in frequency in three of the four datasets analyzed; in OPERAnet, however, time smoothing yields a very low accuracy. On the other hand, 2D smoothing gives better results than frequency or time smoothing in all cases. However, the TSFR method using only the phase offers the best results of all, maintaining accuracies above 94% in all cases.
Table 3 shows the results for classifying fixed positions in Datasets A and B. While the analysis is similar to that in Table 2, the accuracy of the TSFR method on Dataset B is striking, as it achieves 99% accuracy compared to 33% for time smoothing, in both cases working only with the phase.
In Tables 4 and 5, the performance of counting people and detecting position, respectively, is given for the proposed CNN model. In this case, the components of the CSI have been windowed to create images as inputs of the model. This network combines amplitude and phase values in matrices of the form (r, r, 2), as explained in Section 4.2.
In these tables, the performance trends are similar to those in Tables 2 and 3. First, the LRR method still offers accuracies better than or equal to those of the LT method. Comparing time smoothing with frequency smoothing shows that one offers better results in two datasets and the other in the other two, with substantial differences. Also, in this case, 2D smoothing improves the frequency or time smoothing results in all cases, but it is outperformed by the TSFR method, which achieves excellent accuracy of more than 90% in all cases. The accuracy of the TSFR method in Dataset B is 96%, compared to the second highest, 33%, and in Dataset A is 88%, while the second highest is 68%.
In addition, Table 6 is included to show the Dataset B metrics in more detail, as the high accuracy in Tables 2 and 4 can be misleading. Dataset B is an unbalanced dataset in which class 0 occupies 1% of the total size, while classes 1 and 2 account for 50% and 49%, respectively. Table 6 is an extension of Tables 2 and 4. It shows that, although the overall accuracy values are 98%, the only method capable of correctly classifying the under-represented class is the TSFR method.
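A short worked example shows why overall accuracy can hide a failure on the rare class. The class shares below follow the 1%, 50%, and 49% split quoted for Dataset B; the sample counts and the classifier, which is perfect on classes 1 and 2 but never recognizes class 0, are hypothetical and do not reproduce Table 6.

```python
import numpy as np

def per_class_recall(y_true, y_pred, n_classes):
    """Fraction of samples of each class that are predicted correctly."""
    return np.array([np.mean(y_pred[y_true == c] == c)
                     for c in range(n_classes)])

# 1000 toy samples with class shares of 1 %, 50 % and 49 %.
y_true = np.repeat([0, 1, 2], [10, 500, 490])

# Hypothetical classifier: every class-0 sample is labeled as class 1.
y_pred = y_true.copy()
y_pred[y_true == 0] = 1

accuracy = np.mean(y_pred == y_true)              # 0.99
recall = per_class_recall(y_true, y_pred, 3)      # [0., 1., 1.]
balanced_accuracy = recall.mean()                 # 2/3
```

The overall accuracy is 99% even though class 0 is never detected, which is why per-class metrics are needed to judge performance on the minority class.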
The ReWiS dataset is analyzed using FSL under the ProtoNet model. In this case, the amplitude and phase values obtained are compared using, on the one hand, the raw CSI values and, on the other hand, the CSI values processed with the TSFR method. Fig. 10 shows the confusion matrix for each comparison for 20 and 80 MHz, including reference values from [18]. TSFR phase processing improves the phase accuracy over the raw CSI from 32% to 82% at 20 MHz and from 35% to 85% at 80 MHz of bandwidth. These results outperform the accuracy obtained using the amplitude. Testing the TSFR method on this dataset using FSL suggests that the method supports the extraction of certain features from the processed phase and improves the transferability of its results between different scenarios.
To summarize, the results indicate that the TSFR method can improve the classification accuracy when counting people, determining their fixed position, and detecting activities using regular neural networks, as shown in all datasets. The benefit of (18) is observed by comparing the results of the SG filter in the time domain (LRR+SG time) against the TSFR method. In all cases, the method proposed in (18) to rebuild the frequency-domain distortions generated by the SG filtering substantially improves the classification results. The accuracy of the TSFR method is always higher than that given by the exclusive application of the SG filter in any domain, including both simultaneously. Moreover, TSFR obtains good results when DL algorithms use CSI data directly, and it also improves results when feature engineering is carried out, as observed in the comparative analysis based on the FSL model. Furthermore, it is worth mentioning that the results seem to indicate that the use of TSFR in unbalanced datasets may help to improve the accuracy in detecting under-represented classes.

Fig. 10. [...] [18] using CSI amplitude. c) and d) are achieved with the raw CSI phase. e) and f) are based on the TSFR method. The data correspond to the configuration of single antennas in transmission and reception.

CONCLUSION
[...] a rotation process. The rotation angle is obtained from the slope of a linear regression adjustment in the frequency domain. After that, phase smoothing in the time domain is carried out using the SG filter, and finally, filtering distortions are rebuilt in the frequency domain using information from the sanitized phase.
The comprehensive analysis of the proposed method shows that the TSFR solution outperforms other phase processing solutions in five CSI datasets provided by different OFDM-based wireless systems, with variable configurations and different HAR tasks (people counting while sitting or walking, position detection, and activity recognition). In addition, three different DL networks have been employed. Therefore, the TSFR method allows the use of the CSI phase as a robust source of information for HAR in different wireless sensing conditions.
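As a rough illustration of the first two stages of the pipeline (linear-regression sanitization across subcarriers followed by SG smoothing in time), a NumPy sketch is given below. It is not the authors' implementation: removing both the slope and the intercept of the fitted line, the window length of 11, the polynomial order of 3, and the edge padding are assumptions, and the frequency rebuild step of (18) is deliberately omitted because its expression is not reproduced in this section.

```python
import numpy as np

def sanitize_phase(phase):
    """Remove the per-symbol linear trend of the phase across subcarriers.

    phase: array (symbols, subcarriers) of wrapped phases in radians.
    """
    unwrapped = np.unwrap(phase, axis=1)
    k = np.arange(phase.shape[1])
    # Least-squares line per symbol; polyfit fits every column of y at once.
    slope, intercept = np.polyfit(k, unwrapped.T, deg=1)
    return unwrapped - (slope[:, None] * k[None, :] + intercept[:, None])

def savgol_time(x, window=11, order=3):
    """Savitzky-Golay smoothing along the time axis (axis 0)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted value at the window centre.
    weights = np.linalg.pinv(design)[0]
    padded = np.pad(x, ((half, half), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, j], weights[::-1], mode="valid")
                     for j in range(x.shape[1])], axis=1)

# Toy check data (hypothetical): a purely linear phase and polynomial series.
k = np.arange(64)
linear_phase = np.angle(np.exp(1j * (0.3 * k + 1.0)))[None, :].repeat(5, axis=0)
residual = sanitize_phase(linear_phase)
t = np.arange(50, dtype=float)
poly = np.stack([t ** 2, 3.0 * t - 1.0], axis=1)
smoothed = savgol_time(poly)
```

A purely linear phase is removed entirely by the sanitization, and polynomials up to the chosen order pass through the smoother unchanged away from the edges. Where SciPy is available, `scipy.signal.savgol_filter` provides the same smoothing with more edge-handling options.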
Guillermo Diaz received his B.S. degree in Physics and M.S. degree in Data Science from the University of Cantabria (Spain) in 2018 and 2019, respectively. For two years, he worked as a researcher at the Photonics Engineering Group, University of Cantabria, on image classification, object recognition, and gait analysis for older people. Since 2021, he has been part of the TSR (Signal Processing and Radiocommunications) research group (https://www.ehu.eus/tsr) at the University of the Basque Country (UPV/EHU), where he is currently a Ph.D. student. His current research interests include data analysis of wireless networks for human activity recognition and WiFi sensing.
Iker Sobron (M'10) obtained a degree in Electronics Engineering from the University of the Basque Country (UPV/EHU), Spain, in 2006; a degree in Physics from the National University of Distance Education, Spain, in 2010; and a Ph.D. degree in Electronics Engineering from the University of Mondragon (MU), Spain, in 2011. He is currently an Assistant Professor in the Department of Computer Languages and Systems and a member of the TSR (Signal Processing and Radiocommunications) research group (https://www.ehu.eus/tsr) at UPV/EHU. He has completed a two-year postdoctoral stay at the SMT Lab of the Federal University of Rio de Janeiro (Brazil) and a six-month research stay during his Ph.D. at the University of Turku (Finland). His current research areas are wireless sensing for human activity recognition, RF-based geolocation, distributed sensor networks, and machine learning applied to wireless networks.
Iñaki Eizmendi received the M.S. and Ph.D. degrees in Telecommunications Engineering from the University of the Basque Country, Spain, in 1994 and 2012, respectively. He has worked as an R&D engineer in several companies. Since 2003 he has been with the TSR (Radiocommunications and Signal Processing) research group at the Department of Communications Engineering of the University of the Basque Country, where he is an associate professor. He has co-authored 11 articles published in JCR-indexed international journals and 35 articles presented at international conferences. His current research interests focus on new digital broadcasting technologies, 5G networks, IoT networks, wireless sensing for human activity recognition, and machine learning applied to wireless networks.

Iratxe Landa holds a Ph.D. in Telecommunications Engineering. In 2001 she joined the University of the Basque Country, where she is currently a Full Professor with the Department of Communications Engineering. She is a member of the TSR (Signal Processing and Radiocommunications) research group (https://www.ehu.eus/tsr) at UPV/EHU. She has worked as a researcher in three postdoctoral stays at the University of Corsica (France), Griffith University (Brisbane, Australia), and Dublin City University (Dublin). Her current research interests include radio noise measurements, impulsive noise measurement and effects in digital wireless systems, signal propagation, measurements and simulations of digital broadcasting systems, wireless sensing for human activity recognition, RF-based geolocation, and machine learning applied to wireless networks.

Johana Coyote received the B.S. degree in Electronic Engineering and Telecommunications Systems from the Autonomous University of Mexico City (UACM) in 2018. She is currently pursuing the M.S. degree in Electrical Engineering at the National Autonomous University of Mexico (UNAM). Her current research interests include wireless communications, signal propagation, automation systems, machine learning, and new information technologies.