Toward Seamless Localization: Situational Awareness Using UWB Wearable Systems and Convolutional Neural Networks

Depending on the environment, an increasing number of localization methods are available ranging from satellite-based localization to visual navigation, each with its own advantages and disadvantages. Fast and reliable identification of the environment characteristics is crucial for selecting the best available localization method. This research introduces a deep-learning-based method utilizing data collected with wearable ultra-wideband devices. A novel approach mimicking radar behavior is presented to collect the relevant data. Channel state information is proposed for training of the neural network and enabling the environment detection to obtain the desired situational awareness. The proposed detection approach is evaluated in three types of environments: 1) indoor, 2) open outdoor, and 3) crowded urban. The results show that fast and accurate environment detection for seamless localization purposes can be achieved with a precision of 91% for general scenarios and a precision of 96% for specific use cases.


I. INTRODUCTION
WITH the expansion of semiautonomous and autonomous positioning in different fields, including vehicle transportation, vessel and ship tracking, and unmanned aerial vehicle (UAV) navigation, the need for seamless localization grows more tangible every day [1]. Changes in the environment are unavoidable in most positioning scenarios, and reaching the best available performance requires selecting the best sensors and devices for each environment. Consider three examples: 1) a vessel navigating a canal in an urban area and under bridges, then moving out to sea, far from any tall buildings; 2) a vehicle driving in an indoor parking area, then moving to a crowded city center and on to an open highway; and 3) a pedestrian walking in a wide park with no tall buildings, then along a narrow street, and finally entering a shopping center. In each case, high-performance seamless positioning requires understanding the environment and choosing the best available sensors and devices accordingly. Global navigation satellite system (GNSS) receivers can work well in open environments, while their performance degrades significantly in urban areas with tall buildings due to multipath effects [2]. In addition, 5G networks can provide accurate position estimation in urban areas where base stations and line-of-sight (LOS) signals are available [3], [4]. For indoor environments where anchors of radio frequency (RF) signals such as WiFi or ultrawideband (UWB) are available, multilateration based on range measurements can provide a good positioning solution [5]. Furthermore, in scenarios where cameras or LiDAR are available, computer vision methods can offer good positioning performance based on map information [6]. Thus, a moving user navigating through different types of environments might take advantage of different devices, switching between them or fusing a set of them to obtain accurate position estimates.
This procedure is illustrated in Fig. 1. Consequently, the system requires a fast and accurate method to detect a change in environment and deploy the most reliable positioning method and devices for the detected environment type [7], [8]. One of the main benefits of environment detection, as discussed in the literature [9], is reduced memory allocation, since unnecessary data collection in a newly detected environment is avoided [10]. Environment detection also reduces power consumption by keeping ON, or turning ON, only the most relevant sensors in each environment [11]. Furthermore, awareness of the environment type requires sensing solutions, which are also considered for 6G networks under the simultaneous communication and sensing functionality [12]. Considering that aerial base stations (ABS) carried by drones are integrated in the 6G cellular architecture [13], environment detection can improve the situational awareness of the drones and enhance the performance of the communications system. In addition, in scenarios where GNSS measurements are uncertain due to signal reflections from nearby buildings, environment detection can help recognize the potentially challenging urban environment. As a result, false detections caused by strong reflections of GNSS signals in urban environments can be recognized [14], [15] and handled by appropriate actions.
Situational awareness, e.g., knowing in which type of environment a mobile user is currently located, can be realized using different technologies. One promising candidate that we propose is the use of UWB radio chips. UWB chips are being integrated into recently manufactured smartphones, and this trend is expected to continue [16]. Thus, in the near future, the majority of pedestrians are likely to carry a UWB chip embedded in their phones. This level of availability makes UWB signals a potential candidate for environment detection. Furthermore, UWB is a low-power technology, and UWB signals have a bandwidth of around 500 MHz, which corresponds to a narrow pulse in the time domain (on the order of nanoseconds) [17], [18]. This wide bandwidth, which yields a narrower pulse and higher time resolution than other RF signals, enables the separation of different multipath components. The same property makes UWB a suitable candidate for accurate range measurement and positioning [5]. Consequently, UWB is already available in many positioning scenarios where part of the positioning is done indoors, so, in addition to range measurement, it can be utilized for environment detection as well. Moreover, one recently introduced 6G technology is Joint Radar and Communication (JRC) [19]. With the rise of JRC, the demand for radar-compatible methods will further increase in the near future. In this work, we present a method that mimics radar-like behavior to scan the environment and detect its type based on UWB signals. For this purpose, we need device-to-device (D2D) communication between two UWB devices: 1) a transmitter and 2) a receiver. The transmitted signal travels through the environment, is shaped by its characteristics, and is then captured by the receiver.
Finally, the characteristics of the environment can be extracted by analyzing the channel information of the signal. These two devices are carried by the end user, meaning that no infrastructure of UWB is required in the area of interest.
Neural networks have often been exploited in recent research to analyze UWB signals due to their high performance and efficiency. In comparison with traditional machine learning methods, such as random forests, neural networks have the ability to extract features directly from the given dataset. Convolutional neural networks (CNNs) are usually a suitable candidate for analyzing the channel impulse response (CIR) of signals, since they are able to extract patterns from image-like data. In this research, we have a set of signal data in which we want to find patterns and determine the environment to which each measurement belongs. Neural-network-based UWB signal analysis has been investigated in the literature for non-LOS (NLOS) and LOS detection [20], ranging error correction [21], and device-free localization [22]. Although these references have different research goals, all of them utilize UWB CIR and CNNs for the analysis of the data. They show that neural networks can learn the patterns in UWB signal data for research goals in the domain of positioning. In this research, the goal is environment detection. The channel information of the signal is computed from the CIR. Since this information describes how a signal propagates from the transmitter to the receiver, it can characterize the environment.
In this work, we introduce a novel method inspired by radar technology to detect the type of environment for a pedestrian moving in different areas. The main contributions of this work are listed as follows.
1) We propose a method for environment detection that utilizes a wearable system for pedestrians, needs no infrastructure-related devices, and is independent of any anchors or base stations in the environment. The proposed method can also be considered to mimic 6G JRC technology, which will be available in future devices.
2) For the first time, we propose utilizing UWB channel state information (CSI) data for detecting the type of environment. CSI is the representation of the signal in the frequency domain, estimated by calculating the fast Fourier transform (FFT) of the CIR. This enables fast and accurate infrastructure-free environment detection with relatively low power consumption. While the methods presented in the literature consider only detection between indoor and outdoor, we propose detection over an extended set of environments, including indoor, open outdoor, crowded urban, and shopping mall.
3) Considering that extracting raw channel data at the application programming interface (API) level from smartphones is not currently possible, we have used specific UWB devices to extract the channel information. Although access to the raw data at the API level would be desirable to enable various third-party applications, there are also other possible use cases for the proposed method. For example, the raw data could be processed in the device chipset, and only compressed information, or the environment detection results directly, would be passed to the API level. For the presented results, we have collected experimental data from nine different environments in Ghent, Belgium, using Wi-PoS devices developed by IMEC and Ghent University [23]. Moreover, the dataset is open-source and available on IEEE DataPort for future studies [24].
4) To achieve accurate results, we apply CNN-based machine learning methods to train and detect the type of environment. Furthermore, we describe the structure of the network, and we optimize and report the related hyperparameters. The proposed network is generalized using regularization algorithms, and the results show that the method works for data collected from various places.
The rest of this article is organized as follows. A more detailed comparison between the state-of-the-art methods and our method is presented in Section II. Our proposed system for environment detection utilizing UWB CSI is presented in Section III, the experimental setup is provided in Section IV, and the performance evaluation is presented in Section V. Finally, Section VI concludes this article.

II. RELATED WORK
Most of the environment detection methods presented in the literature rely on separate infrastructure. For example, to detect a change in the environment, Zhu et al. [25] utilized the carrier-to-noise ratio (CNR) of GNSS signals and the number of available satellites, both of which are accessible by the GNSS receiver at the location of the user. However, GNSS-based methods are generally considered to be power-hungry [9] and suffer from uncertainties due to signal reflections [14]. Furthermore, the efficiency of GNSS-based methods is highly dependent on the number of available satellites. The method proposed in this article consumes significantly less energy, yet provides high detection accuracy without relying on satellites or on preinstalled base stations or anchors in the environment of interest.
In multisensor-based methods, several sensors are utilized to enable environment detection. The sensors utilized in [26] include a magnetometer, barometer, GNSS receiver, and light and pressure sensors to distinguish between indoor and outdoor environments. Different types of outdoor areas, such as open areas and urban areas, are merged into a single "outdoor" class, and the average time required to detect the type of environment is 5 s. In contrast, the approach proposed in this article is developed to detect multiple types of environments, not just outdoor and indoor, while being significantly faster, providing the detection result in less than 1 s.
In some scenarios, light intensity can support other methodologies for environment detection. Li et al. [27] analyzed the received signal strength (RSS) of WiFi signals collected from different access points and fused the results with light intensity information. Adaptive boosting is used as the machine learning algorithm to distinguish between indoor, outdoor, and semiopen environments with an average accuracy of 85%. This method depends on the availability of WiFi access points in the environment.
Inertial measurement units (IMUs) can also be utilized to detect the type of environment. Kelishomi et al. [9] used the IMU inside a mobile phone to detect the physical activities of the user and then make a decision about the environment type based on the user activity. In [9], only indoor and outdoor environments are investigated, and the investigation of different types of outdoor environments is excluded because of the unavailability of data. Besides, the detection based on physical activity is highly dependent on the age of the pedestrian moving in different environments. In this work, we introduce a novel method to classify different types of environments, including crowded urban and open outdoor areas. Moreover, the proposed method in this article is not dependent on the user activity, or the age of the user, but entirely relies on the characteristics of observed UWB signals after propagating through the channel.
Ali et al. [11] have presented SenseIO for indoor/outdoor detection. SenseIO is a multimodel method, which takes advantage of the global positioning system (GPS), WiFi APs, light intensity, and human activity recognition. In spite of several technologies and sensors utilized in SenseIO, the environment detection accuracy for outdoor areas stays below 90%. Furthermore, there is no information regarding the time required to detect the environment type. However, in critical scenarios of seamless localization such as those for autonomous vehicles and drones, a precise and fast detection of the environment type is necessary. In our method, we introduce an infrastructure-free approach, which is independent of GNSS signals, WiFi APs as well as other sensors.
Jeon et al. [28] utilized computer vision to detect a change in the environment by determining whether a robot is passing through a door into a new environment. They used the YOLOv5 model for real-time object detection in the images captured by a camera. Detecting the change in environment through door passing takes 3600 ms on average, and the method is highly dependent on the shape of the doors in the infrastructure.
The CSI of 5G signals is used in [29] to distinguish between indoor and outdoor environments. The authors utilized an unsupervised funnel on top of a supervised feature extraction method called the Fukunaga-Koontz transform (FKT) to detect the type of environment. The average accuracy achieved is 75%. Although the method is a low-power solution for Internet-of-Things (IoT) scenarios, it depends on the availability of an access point in the infrastructure.
As discussed above, fast and accurate recognition of various types of environments has not been achieved in previous works. In this work, we utilize UWB CIR to introduce an infrastructure-free method. Beyond conventional indoor and outdoor detection, the proposed method is able to distinguish between different types of indoor and outdoor environments, including open outdoor and crowded urban areas, in less than 1 s. To the best of our knowledge, environment detection has not previously been investigated using CNN-based analysis of UWB signals. Table I summarizes the related works, including their methodologies, measurements, limitations, dependence on infrastructure, average time required for predicting the environment type, and types of environment investigated.

III. SYSTEM DESIGN
This section presents the overall framework of the method, data collection, data preparation, and the neural network training.

A. Overall Framework
The overall framework of our proposed method is illustrated in Fig. 2. After the data collection in the offline phase, data are first prepared. Different labels are investigated and considered in various scenarios. Then, the prepared data are fed to the CNN to train the network. In the online phase, the test set is fed to the trained network and the type of environment is detected for the unseen test dataset. The signal we investigate is the UWB signal and the data analysis method is CNN. The methodology of data collection and the description of the environments are provided in the next section.

B. Data Collection
The data are collected using a novel methodology that mimics monostatic radar behavior with UWB chips. Wi-PoS devices are worn by a pedestrian as a wearable system on the arms. These devices embed a Decawave DW1000 UWB transceiver [23], which enables the collection of UWB CIR data. As illustrated in Fig. 3, the UWB signal is transmitted from Wi-PoS 1 and, after reflection from walls, trees, or other elements in the environment, is received by Wi-PoS 2 on the other arm. For the experiment, the CIR of this signal is collected by a laptop that the pedestrian carries.
The transceiver uses UWB channel 5, with a center frequency of 6.489 GHz and a bandwidth of 499.2 MHz. The bit rate is 110 kb/s, with a pulse repetition frequency (PRF) of 64 MHz and a preamble length of 4096. Furthermore, the time resolution of the CIR is 1.016 ns.
During transmission, the signal experiences multipath effects due to reflection, diffraction, and scattering, which are environment dependent. For instance, a crowded urban environment with narrow streets or sidewalks, tall walls, groups of people, or moving vehicles affects the signal differently than an open area, which is free from such strong multipath effects [30]. Consequently, the CIR of the signal changes with the environment [31]. These environment-specific patterns in the signal can be learned by a neural network [32]. In this work, the raw CIR data are collected from nine different sites in Ghent, Belgium.
The data collection is performed by a pedestrian wearing the UWB devices and walking in different environments. The pedestrian wears the UWB devices on the arms and carries the sensors, power banks, and the laptop for data collection. As the pedestrian walks in an environment, the signal is transmitted by a transceiver on one arm and then received by the transceiver on the other arm, as previously illustrated in Fig. 3. The pedestrian equipped with the devices is shown in Fig. 4, and the considered data collection locations are shown on the map in Fig. 5.
One of the challenges in collecting the experimental data was applying for and obtaining the permissions required for a pedestrian to move through different areas of the city of Ghent with the wearable equipment. The corresponding permissions were granted by the relevant organizations and authorities. For the data collection, the pedestrian walked for 10 min in each environment and collected more than 4000 CIR vectors per environment. Each CIR vector consists of 300 time-domain samples, which represent the CIR as a function of time. The CIR datasets are collected using a Python script, and the data are stored on the laptop. The places where the pedestrian walked are described and shown in more detail in Section IV.
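As a rough consistency check on these numbers, the arithmetic below combines the 300 samples per CIR vector with the stated CIR time resolution of 1.016 ns. The symbols N, Δt, T, d_max, Δf, and R are introduced here for illustration only. The rate R assumes exactly 4000 vectors in 10 min, so it is a lower bound.

```latex
% Illustrative arithmetic; symbols are not taken from the original text.
N = 300, \qquad \Delta t = 1.016\ \text{ns}
% Time span covered by one CIR vector
T = N \,\Delta t = 300 \times 1.016\ \text{ns} = 304.8\ \text{ns}
% Longest total propagation path (transmitter -> reflector -> receiver) inside that span
d_{\max} = c\,T \approx (3 \times 10^{8}\ \text{m/s}) \times 304.8\ \text{ns} \approx 91.4\ \text{m}
% Frequency spacing between adjacent samples of the 300-point DFT of the CIR
\Delta f = \frac{1}{T} \approx 3.28\ \text{MHz}
% Average measurement rate implied by the data collection
R \geq \frac{4000}{600\ \text{s}} \approx 6.7\ \text{vectors/s}
```

Under these assumptions, a new CIR vector is available on average at least every 150 ms.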

C. Data Preparation
To feed the CIR data to the neural network, we first compute the CSI by calculating the FFT of the CIR [33]. CSI is a frequency-domain signal representation, or feature, which describes how a signal propagates from the transmitter to the receiver as a function of frequency. In this way, CSI is able to characterize the environment [3] and is a good candidate for environment recognition utilizing the power of artificial intelligence (AI). Before explaining the data preparation method, we take a closer look at the definitions of CIR and CSI.
1) CIR and CSI Definition: For a deeper understanding of the CIR and CSI, and especially of how they are affected by the environment, it is beneficial to consider a related multipath radio propagation channel model. Assuming the use of omnidirectional antennas, the received signal r(t) can be represented as [34]

$$r(t) = \sum_{k=1}^{K} b_k \, s(t - \tau_k) \, e^{j 2 \pi f_{D,k} t} + w(t) \tag{1}$$

where s(t) is the transmitted signal as a function of time t, and K is the number of multipath components. Furthermore, $b_k$ is a complex path coefficient for the kth multipath component, and $\tau_k$ and $f_{D,k}$ denote the path delay and Doppler shift, respectively. Finally, w(t) is additive white Gaussian noise, which can also be modeled to include other additive error sources, such as interference. The environment affects the parameters of the above signal model in various ways. For example, the following hold.
1) The path delays $\tau_k$ are related to the path propagation times, and consequently to the distances, and reveal some information on the proximity and density of surrounding objects.
2) The path coefficients $b_k$ are affected by attenuation along the path as well as by different channel interactions, such as reflection, scattering, and diffraction, depending, for example, on the materials of surrounding objects.
3) Moving objects in the environment induce Doppler shifts $f_{D,k}$ on each multipath component, which cause a time-dependent phase rotation of the received signal.
Fundamentally, the observed CIR includes all paths, and there is no need to distinguish separate paths for the proposed environment detection. On the contrary, the CIR, including joint path information and interpath dependencies, is processed as a whole in the proposed CNN architecture in the next section to extract the essential features for the environment detection. The model presented in (1) is very generic and can be applied to all considered environments by appropriately tuning the channel parameters. The reference transmitted signal s(t) is the one emitted by the transmitting device.
The recorded data consist of CIR measurements, where each CIR measurement includes 300 complex-valued samples. Each sample represents the channel response at a specific channel propagation delay. By applying the discrete Fourier transform, we obtain the channel frequency response (also 300 samples), referred to as the CSI in this article. Each sample in the obtained CSI represents the channel response at a specific frequency. Denoting the kth sample of the CSI as h(k) ∈ ℂ, the polar form calculated in step 4 of Algorithm 1 is h(k) = |h(k)| exp(j arg{h(k)}), where |h(k)| is the amplitude (or modulus) and arg{h(k)} is the phase (argument) of the kth CSI sample.
It is worth noticing that the noise-free CIR of the channel can be obtained by substituting a unit impulse function for s(t) in (1), and removing receiver noise. The example of one measured CIR in an indoor area (Krook Library) and one in an outdoor area (Citadel Park) is illustrated in Fig. 6.
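To make this substitution explicit, the sketch below assumes that (1) has the usual tapped-delay-line form, in which each path contributes a copy of s(t) delayed by τ_k, scaled by b_k, and rotated by its Doppler shift f_{D,k}. The symbols c(t), c[n], N, and the frequency index m are introduced here for illustration only. The index m is used instead of k to avoid a clash with the path index.

```latex
% Noise-free CIR: substitute s(t) = \delta(t) in (1) and drop w(t)
c(t) = \sum_{k=1}^{K} b_k \, e^{j 2 \pi f_{D,k} t} \, \delta(t - \tau_k)

% Sampled CIR c[n], n = 0, ..., N-1 with N = 300; the CSI is its discrete Fourier transform
h(m) = \sum_{n=0}^{N-1} c[n] \, e^{-j 2 \pi m n / N}, \qquad m = 0, \dots, N-1

% Polar form of each CSI sample (amplitude and phase)
h(m) = |h(m)| \, e^{j \arg\{h(m)\}}
```

In this form, each path delay τ_k appears in the CSI as a phase slope across frequency, which may explain why the phase carries environment information in addition to the amplitude.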
The change in CIR amplitude with respect to the propagation time of the signal from the transmitter to the environment can be observed in Fig. 6. The first few peaks in this figure correspond to reflections from objects in the environment. From (1), the CIR, and consequently the CSI, can be estimated by assuming that the signal s(t) is known at the receiver. The procedure for preparing the CSI to be fed to the neural network is presented in Algorithm 1.

D. Neural Network Training
When it comes to pattern recognition by image and signal analysis, CNNs can be considered as proper candidates [35]. CNNs are capable of finding the essential patterns and extracting the features from the data. Feature extraction is done by the elementwise product of the given input and a kernel, represented by an array of numbers. The kernel slides over all the elements in the input and, since the convolutional layer conducts a linear operation, it is usually followed by a nonlinear layer so that the network can learn nonlinear patterns.

Algorithm 1: Data Preparation Algorithm.
Input: The raw CIR data of all the environments
Output: CSI data
1: for each environment do
2:   for each collected CIR vector do
3:     Compute the discrete Fourier transform using the FFT function to estimate the CSI from the raw CIR vector;
4:     Calculate the polar form of the complex CSI elements: amplitude and phase;
5:     Unwrap the phase and calibrate it by removing the offset in the phase values of the samples in one CSI vector;
6:     Put the amplitude and the calibrated unwrapped phase of each sample of the vectors separately in two consecutive columns.
7:   end for
8: end for
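The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration and not the authors' script. The function names are hypothetical, the naive DFT stands in for an FFT routine (which returns the same values faster), and taking the phase of the first CSI sample as the offset is an assumption, since the exact calibration is not specified here.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform; an FFT gives the same values."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def unwrap(phases):
    """Remove 2*pi jumps between consecutive phase samples."""
    if not phases:
        return []
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        # Fold the increment into [-pi, pi) before accumulating it.
        delta = (delta + math.pi) % (2 * math.pi) - math.pi
        out.append(out[-1] + delta)
    return out


def prepare_csi(cir):
    """Steps 3-6 of Algorithm 1 for one CIR vector of complex samples."""
    csi = dft(cir)                                   # step 3: CIR -> CSI
    amplitude = [abs(h) for h in csi]                # step 4: modulus
    phase = unwrap([cmath.phase(h) for h in csi])    # steps 4-5: argument, unwrapped
    offset = phase[0]  # ASSUMPTION: calibrate against the first sample's phase
    # Step 6: one (amplitude, calibrated phase) pair per CSI sample.
    return [(a, p - offset) for a, p in zip(amplitude, phase)]


def prepare_dataset(raw):
    """Steps 1-2: loop over environments and over their CIR vectors.

    `raw` maps an environment name to a list of CIR vectors (hypothetical layout).
    """
    return {env: [prepare_csi(cir) for cir in vectors] for env, vectors in raw.items()}
```

For a CIR made of a single unit impulse, this yields a flat amplitude and a linear phase across frequency, which is a convenient sanity check before processing measured data.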
1) Proposed Network: The structure of the neural network used in this work is illustrated in Fig. 7.
The calculated CSI data vector has 300 samples, which describe the channel response over different frequencies, and each sample has an amplitude and a phase. The data therefore take the form of matrices with 2 rows and 300 columns. We tried different matrix shapes as input to the neural network and tuned the shape based on the size of the filters in our convolutional layer. The most suitable shape for the chosen convolutional layer settings is 30 rows and 20 columns. The number of layers has been selected to improve the training accuracy while preventing overfitting, using dropout in each layer as a regularization method. We use the Adam optimizer to optimize the weights and biases of the network with adaptive learning rates [36]. To prevent exploding gradients, gradient clipping is applied in each epoch of training. The number of classes for the last layer, which is a Softmax classifier, varies for different scenarios as explained in the next section.
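The reshaping and the sliding-kernel operation can be illustrated with a small sketch. The element ordering (all amplitudes first, then all phases, filled row by row), the 3 × 3 kernel, and the ReLU nonlinearity are assumptions made for this example. The text only fixes the 2 × 300 input and the 30 × 20 target shape.

```python
def reshape_csi(amplitude, phase, rows=30, cols=20):
    """Rearrange a 2 x 300 CSI matrix into rows x cols (row-major, ASSUMED order)."""
    values = list(amplitude) + list(phase)
    if len(values) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(values)))
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output is a sum of elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Nonlinear layer applied after the (linear) convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Usage with placeholder data: 300 amplitudes and 300 phases -> 30 x 20 input.
matrix = reshape_csi([float(i) for i in range(300)], [0.0] * 300)
features = relu(conv2d_valid(matrix, [[1.0] * 3 for _ in range(3)]))
```

With a 3 × 3 kernel and no padding, the 30 × 20 input produces a 28 × 18 feature map. A deep learning framework would perform the same operation with many learned kernels in parallel.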

IV. EXPERIMENTAL SETUP
In this section, the environments of data collection are described. The four different scenarios for evaluating the proposed method are explained and the hyperparameters of the neural network are provided.

A. Environments Description
Nine different environments are considered in this research to test the robustness of the proposed methodology. These environments are described in this section. One of the environments is a railway station. Different parts of this environment are shown in Fig. 8 and all the other environments are illustrated in Fig. 9. More photos of the environments are available in dataset description [24].

1) Fourth Floor at iGent Tower in the Premises of Ghent University: This environment has narrow corridors with more than 15 offices and a small kitchen area. Many researchers were present on the day of data collection, and there were at least one or two researchers near the pedestrian while she was walking on this floor. The pedestrian walked inside the corridors, about 1 m from the walls, and also walked inside three offices. This floor is shown in Fig. 9(a).
2) Zwijnaarde Open Area: At the Ghent University campus, there are some open areas hundreds of meters away from the tall towers and a few university buildings. The pedestrian walked in the open area for 10 min. Every few minutes, a car, a bike, or a student passed at a distance of at least 10 m from the pedestrian. The campus area where data collection took place is shown in Fig. 9(b).
3) Stadhuis Street and Nearby: The Ghent city hall is located on Stadhuis street, which is in the heart of the city center. In this street and nearby, there are a lot of historical buildings on narrow streets and alleys. The pedestrian walked on this street, and during the data collection, there were a lot of tourists walking around and many cyclists passing by. In some parts, trams and cars were also present. This street and its surroundings are shown in Fig. 9(c).
Fig. 9(d).
5) Portus Ganda: Portus Ganda is a port area, a yacht mooring provided by the city of Ghent. It is located at a crossing in the old waterways of the river Leie. The pedestrian walked around the river and on the bridge, where the buildings are quite far away, and a few people in the area were walking at least 5-10 m away from the pedestrian. During the test, a few cars also passed by. Some parts of this port area are shown in Fig. 9(e).
Fig. 9(h).

B. Considered Scenarios for Training the Neural Network
Before feeding the dataset to the neural network, we first define the classes based on the type of environment. A closer look at the data collected at the railway station shows that this environment consists of some parts that appear to be indoor, some that appear to be open outdoor, and some crowded parts that resemble a crowded urban area. Since the purpose of this research is the improvement of seamless localization, the classification of the environment types is highly dependent on the available sensors and positioning algorithms. To elaborate, in one possible scenario, all nine environments can be classified into just two classes, "indoor" and "outdoor," with one positioning method for indoor areas and another for outdoor areas. In another possible scenario, one can classify the nine environments into four different classes: "shopping mall," "indoor," "outdoor," and "crowded urban." In the latter scenario, for instance, the positioning system could have four different sets of sensors and relevant positioning algorithms, which would be switched according to the detected environment. To investigate several label granularities and to see how the network behaves for each, we investigate various numbers of labels, as defined in the following scenarios. Note that for each of these scenarios, a new network (with the same structure as explained in Section III-D) is trained.
1) Scenario 1. Nine Labels Using All the Nine Environments Datasets: In this scenario, we consider each environment as one class to observe the capability of the network to classify nine different labels. Several of the nine environments investigated in this scenario have similar characteristics, and by defining this scenario, we are interested in how the network behaves when confronted with these similarities. For instance, a park is similar to a campus area in being open and lacking buildings, whereas an office is similar to a library in having narrow corridors, walls, and desks. To investigate how the network classifies these similar environments, we prepare the dataset and feed it to the neural network following the steps illustrated in Fig. 10. As shown in Fig. 10, nine datasets with raw CIR vectors are first prepared for being fed to the neural network. Then, the dataset is randomly split, according to a uniform distribution, into three subsets: training, validation, and test. In the offline phase, the network is trained and validated using the training and validation sets; after this, in the online phase, the test dataset is used to test the performance of the trained network and determine how accurately it estimates the environment category.
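The uniform random split into training, validation, and test subsets can be sketched as below. The 70/15/15 percentages and the fixed seed are placeholders chosen for this example, because the actual split ratios are not stated in this section. The function name is hypothetical.

```python
import random


def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle uniformly at random, then cut into train / validation / test.

    train_pct and val_pct are ASSUMED example values; the rest goes to the test set.
    """
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    indices = list(range(len(samples)))
    rng.shuffle(indices)               # every ordering is equally likely
    n_train = len(indices) * train_pct // 100
    n_val = len(indices) * val_pct // 100
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test


# Usage: each sample would be a (CSI matrix, label) pair; integers stand in here.
train_set, val_set, test_set = split_dataset(list(range(100)))
```

Every vector lands in exactly one subset, so the test set stays unseen during training and validation.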

2) Scenario 2. Four Labels and Training by Using Seven Environments Datasets: Another scenario is the movement of a pedestrian through four types of environments. In this scenario, we use the data collected in seven environments and consider four labels for these environments, as shown in Fig. 11. The choice of classes in this scenario is inspired by the scenarios defined in the 3rd Generation Partnership Project (3GPP) technical report on channel models [37]. The environments considered under the same class have similar characteristics. The library and offices are both indoors, Stadhuis and Graffiti street are both crowded urban areas, and the port and park are both open outdoor areas. However, the shopping mall, as illustrated in Fig. 9(d), has characteristics that make it different from the previous labels. For this reason, we consider this environment as a separate class.
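The grouping described for this scenario can be written as a simple lookup table. The dictionary keys, class strings, and the integer encoding are illustrative choices rather than identifiers from the dataset. They assume that the library is Library de Krook, the port is Portus Ganda, and the park is Citadel Park.

```python
# Environment -> class label for the four-label scenario (names are illustrative).
SCENARIO_2_LABELS = {
    "iGent Tower offices": "indoor",
    "Library de Krook": "indoor",
    "Stadhuis street": "crowded urban",
    "Graffiti Street": "crowded urban",
    "Portus Ganda": "open outdoor",
    "Citadel Park": "open outdoor",
    "shopping mall": "shopping mall",
}

# Fixed class order so that each label maps to a stable integer index.
CLASS_NAMES = sorted(set(SCENARIO_2_LABELS.values()))


def encode(environment):
    """Return the integer class index used as the training target."""
    return CLASS_NAMES.index(SCENARIO_2_LABELS[environment])
```

Coarser groupings, such as a two-class indoor/outdoor split, only require a different dictionary; the rest of the pipeline stays unchanged.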

3) Scenario 3. Three Labels and Training by Using Six Environments Datasets: The third scenario that we have considered is the movement of a pedestrian in three different types of environments. This scenario is similar to the previous scenario but excludes the shopping mall. In this scenario, the data collected in six environments are utilized and split as illustrated in Fig. 12.

4) Scenario 4: The aim of this scenario is to investigate how the neural network behaves if it is trained on one set of environments and tested on other environments that, to the human eye, have features similar to those of the training set. In other words, the network is tested on environments that it has not seen in the training data. In this scenario, we train on data collected in iGent Tower Offices, Stadhuis street, and Portus Ganda port area. After the network is trained, we test it by feeding data from entirely different places, unseen by the trained network, including data from Library de Krook, Graffiti Street, and Citadel Park. We refer to this scenario as a general test scenario, as we believe it illustrates how well the proposed method and the accompanying results generalize. The data splitting for this scenario is illustrated in Fig. 13. The training, validation, and test vectors in all four scenarios are selected randomly from the whole dataset. All the above scenarios are summarized in Table II.

1) Hyperparameters: Hyperparameters are the parameters that define the network structure and how a network should be trained. These parameters are tuned to obtain the maximum validation accuracy and the minimum validation loss. The tuned parameters, used to train the networks for all scenarios 1 to 4, and ground truth labels. In this work, the loss value is calculated using a negative log likelihood loss function. The loss value itself depends highly on the model, network architecture, regularization method, and optimization algorithm.
However, the important rule in training a neural network is that the loss value should decrease during training. It is therefore important to monitor the loss value during training and to evaluate the performance of the trained model on a separate validation set [38]. The most suitable value of each hyperparameter has been selected to train the network and prepare it for the online phase, that is, testing the network with the test set. To find the best hyperparameters, we first considered a small number of layers to see how the network learns from the data. For instance, we started with three convolutional layers. After training this network, we observed underfitting, so more layers were added. After increasing the number of layers one by one until underfitting no longer occurred, we observed that the loss was increasing. We solved this problem with regularization methods, such as dropout layers. Another problem we observed was fluctuation in the accuracy values, which we solved with batch normalization. Different batch sizes and numbers of epochs have been tried to train the network with the best accuracy while preventing overfitting. We compare the validation accuracy and loss with the training accuracy and loss to decide on each hyperparameter.
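For reference, the negative log likelihood loss mentioned above can be written out in a few lines. The sketch below is a plain-Python illustration, assuming that the network outputs raw class scores (logits) that are converted to log-probabilities with a log-softmax; the function names and the example values are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll_loss(batch_logits, targets):
    """Mean negative log likelihood of the true class over a batch."""
    total = 0.0
    for logits, target in zip(batch_logits, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


# An uninformative prediction over two classes gives a loss of ln(2).
uniform_loss = nll_loss([[0.0, 0.0]], [0])
# A confident, correct prediction gives a loss close to zero.
confident_loss = nll_loss([[5.0, 0.0]], [0])
```

A loss that keeps falling on the training set while rising on the validation set is the usual sign of overfitting, which is why both curves are compared when choosing each hyperparameter.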

V. PERFORMANCE EVALUATION
In this section, we evaluate the performance of the proposed environment detection method. We consider different scenarios, as presented in Table II, to investigate different labels and the efficiency of the presented system.

A. Performance Evaluation for Different Scenarios
In the following, we present and analyze the performance of the proposed method for the different scenarios introduced in Table II.
1) Scenario 1: The training and validation accuracy and loss achieved in scenario 1 are illustrated in Figs. 14 and 15, respectively. Moreover, the confusion matrix for the test set is illustrated in Fig. 16.
In scenario 1, we have trained the network over nine different labels. As illustrated in Fig. 14, the network is well trained. The higher validation accuracy in comparison with training accuracy (see Fig. 14) and the lower validation loss in comparison with training loss (see Fig. 15) are a result of the dropout regularization used to prevent overfitting. In this case, the network is trained with an average accuracy of 75% and an average loss of 0.58. By observing the behavior of the network on the different datasets, as illustrated in Fig. 16, we can see that the most confusing dataset is that of Sint-Pieters railway station, where many test vectors are detected as the shopping mall and the port area. The pictures of the railway station in Fig. 8 show that different parts of the station resemble other types of environments, so the network is highly confused when detecting these parts. Another observation from Fig. 16 is that some vectors collected at the offices are detected as library, some vectors collected at Zwijnaarde open area are detected as port area, and some vectors collected at Stadhuis crowded city center are detected as graffiti alley, and vice versa. This confusion, which results in 70% precision, is reasonable as there are obvious similarities between the confused environments. It suggests that the network learns the environment from CSI data and that CSI is a good representative of environment characteristics. For example, some parts of the library are very similar to the offices, which appears as similar channel effects on the collected signals. This results in the network detecting an office as a library. The confusion matrix is predominantly diagonal, showing that most of the test data are detected correctly.
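Reading a precision value off a confusion matrix can be sketched as below. This is an illustration under stated assumptions: rows are taken as true classes and columns as predicted classes, the overall figure is taken as the unweighted (macro) average of per-class precision, and the small matrix is made-up example data, not a result from Fig. 16. How the reported precision is aggregated is not specified in this section.

```python
def per_class_precision(cm):
    """Precision per class: correct predictions / all predictions of that class.

    Assumes cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    precisions = []
    for j in range(n):
        predicted_j = sum(cm[i][j] for i in range(n))  # column sum
        precisions.append(cm[j][j] / predicted_j if predicted_j else 0.0)
    return precisions


def macro_precision(cm):
    """Unweighted mean of the per-class precision values."""
    p = per_class_precision(cm)
    return sum(p) / len(p)


# Made-up 3-class example in which two similar classes are partly confused.
example_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
example_precisions = per_class_precision(example_cm)
example_macro = macro_precision(example_cm)
```

Off-diagonal counts in a column lower the precision of that class, which is how confusion between similar environments shows up in the reported figure.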
2) Scenario 2: For scenario 2, the network is retrained with new definitions of environment labels, as described in Section IV-B. The training and validation accuracy achieved in scenario 2 are illustrated in Fig. 17. Furthermore, the confusion matrix for the test set is illustrated in Fig. 18.
In this scenario, four labels are considered for the classification: indoor, open outdoor, crowded urban, and shopping mall, as shown in Fig. 11. The network is very well trained, with an average accuracy of 97% (see Fig. 17). The confusion matrix presented in Fig. 18 shows an environment detection precision of 94% for the considered test vectors. The main confusion is between the crowded urban area and the shopping mall. This confusion can be explained by the fact that some parts of crowded urban areas resemble the shopping mall, so the network cannot distinguish between these two environments. As discussed earlier, some signals collected in the crowded urban area experience the same channel effects as those collected in the shopping mall. These effects can be, for example, a result of people walking in the area.
3) Scenario 3: Similar to scenarios 1 and 2, the training and validation accuracy achieved in scenario 3 are illustrated in Fig. 19. Furthermore, the confusion matrix for the test sets is illustrated in Fig. 20.
In scenario 3, the network is trained over three labels, as illustrated in Fig. 12. The training and validation accuracy indicates that the network is very well trained on the training datasets (see Fig. 19), with a 95% average training accuracy. The confusion matrix in Fig. 20 shows an environment detection precision of 96% in scenario 3.
4) Scenario 4: Finally, the training and validation accuracy achieved in scenario 4 are illustrated in Fig. 21. In addition, the confusion matrix for the test sets is illustrated in Fig. 22.
In scenario 4, we analyze the robustness of the method when generalized toward new, unseen environments. We train the network on data from three environments and test it on data collected in completely different environments, as illustrated in Fig. 13. The network achieves up to 93% training accuracy, as shown in Fig. 21. The confusion matrix in Fig. 22 shows that, despite testing with unseen data, an environment detection precision of 91% is achieved. It is worth mentioning that we have also tried scenario 4 with the training and test environments swapped. The results are slightly different. In this case, the mean training accuracy is 93%, the mean validation accuracy is 94%, and the precision is 91%. A summary of the training and validation loss and accuracy values for each scenario is presented in Table IV.
As observed in the results, the network learns the type of environment from CSI data. In scenario 1, we observed that similar environments, such as library and office, port and park, and the two crowded urban streets, can be confusing for the network since they have similar characteristics. In scenarios 2 and 3, we have grouped similar environments under the same label and have seen that the network is trained with high accuracy, resulting in high precision in detecting the environment for the test set. Furthermore, scenario 4 shows that the methodology is robust, since its test set is collected from environments entirely different from those used for training. It takes 3 ms for the trained neural network to detect the environment type for each CSI measurement using high-end GPUs and the PyTorch software framework. The time required for collecting one CIR sample is 167 ms, and the time required for the FFT and CSI preparation from the CIR is 99 ms. In total, collecting one CIR vector and detecting the environment based on that vector takes 269 ms. This shows that the method is faster than the previous methods in the literature, as compared in Table I.
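The latency budget above is a straightforward sum of the three stage times, as the short sketch below reproduces. The constant and function names are hypothetical, and the derived update rate assumes that the stages run strictly one after another with no pipelining, which is an assumption of this example.

```python
# Stage durations in milliseconds, taken from the text above.
CIR_COLLECTION_MS = 167   # collecting one CIR sample
CSI_PREPARATION_MS = 99   # FFT and CSI preparation from the CIR
INFERENCE_MS = 3          # CNN inference per CSI measurement


def total_latency_ms(*stages_ms):
    """End-to-end latency when the stages run sequentially."""
    return sum(stages_ms)


total_ms = total_latency_ms(CIR_COLLECTION_MS, CSI_PREPARATION_MS, INFERENCE_MS)

# Environment decisions per second if stages are not overlapped (assumption).
updates_per_second = 1000 / total_ms
```

Under this assumption, data collection and preparation dominate the budget, while inference contributes only 3 ms of the 269 ms total.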

VI. CONCLUSION
In this article, we present a novel method for fast environment detection utilizing a CNN and the CSI of UWB signals. Wi-PoS devices in the form of wearable systems are utilized for data collection. The proposed method mimics monostatic radar behavior to scan the environment and is completely infrastructure-free. We have shown that CSI data can represent the environment characteristics and that, by using machine learning algorithms for CSI data analysis, we are able to detect the type of environment. The results show that the proposed method operates with a precision of up to 96% for specific use cases and a precision of 91% for general scenarios, where the considered test data are entirely unseen by the trained network. In addition, the proposed approach is significantly faster than prior methods presented in the literature. As potential future steps of this research, we are interested in utilizing the proposed method in seamless positioning scenarios for vessels, vehicles, and UAVs. Another possible future research topic is object and material detection by applying methods similar to those proposed here. In addition, we are interested in utilizing denoising techniques to elaborate on the effects of noise and interference on environment detection.