DeepREM: Deep-Learning-Based Radio Environment Map Estimation From Sparse Measurements

Radio environment maps depict the coverage area of cellular networks. They are usually estimated by interpolating sparse measurements gathered in test drives. Typical estimation techniques rely on physical or statistical propagation models, known base station locations, topographic data, and/or building data. In this paper, we present DeepREM: a set of two deep-learning models (U-Net and CGAN) that estimate REMs from sparse measurements without requiring any additional information. A physical ray-tracing simulator with geographic and building data is required during model training, but not for its operation afterwards. DeepREM models are capable of estimating two radio parameters: i) reference signal received power (RSRP) and ii) BS coverage (cell indices). Extensive testing shows that DeepREM models outperform state-of-the-art methods in terms of root mean squared error (RMSE) and mean absolute error (MAE), and that CGAN has better generalization capabilities in the analyzed scenarios (in particular when the input distribution does not fit the training dataset). The achieved RMSE and MAE are 6.32 and 4.54 dBm for RSRP estimation, while the error rate was around 11% for BS coverage estimation. Moreover, our training dataset and models are publicly available and can be used to speed up and improve the accuracy of current REM estimation techniques.


I. INTRODUCTION
Radio Environment Maps (REMs) are charts of radio coverage parameters (typically the received power and cell association) over a given geographical area. They have been traditionally used in the design and deployment of wireless networks in both outdoor and indoor scenarios [1], providing information on estimated and/or actual coverage areas. A common problem in this context is to estimate REMs from sparse measurements taken in drive tests [2], [3]. Due to their complexity and cost for network operators, conventional drive tests are being replaced by crowdsensing schemes, where cellular user terminals gather radio signal parameters and share them with the operator. These geographically sparse data are used to interpolate the coverage and estimate a full radio map of the area of interest [4].

The associate editor coordinating the review of this manuscript and approving it for publication was Vittorio Degli-Esposti.
REM estimation methods from sparse measurements have received increased attention in recent years, since they are not only used by network operators, but also in active research topics such as cognitive radio networks, self-organizing networks, and physical layer statistics, to name just a few. According to [5], REMs allow tracking of terminal locations and waveforms, traffic volume measurements, motion predictions, interference levels, and spectrum usage. In cognitive radio networks, REMs provide information on spectrum usage, propagation characteristics, and enable the application of spectrum usage policies [6]. In self-organizing networks, REMs help to automate the network configuration and optimize control parameters such as antenna tilt, cell transmit power, and other radio parameters [7]. Moreover, according to [2], REMs present a more accurate alternative for signal level calculations than predictions based on physical propagation models. Thus, it is evident that REM estimation provides significant benefits in both industry and research contexts.
In this work, we refer exclusively to REMs of two radio parameters: i) received power (equivalent to the Reference Signal Received Power (RSRP) in 4G cellular networks), and ii) the Base Station (BS) coverage, defined as an integer that indicates which BS is serving each location in the map. These REMs are constructed in two phases. First, measurements at sparse locations within an area of interest are obtained from drive tests or crowdsensing, as mentioned above. Secondly, radio parameter estimates are obtained for locations that have not been measured. Thus, the main challenge in constructing REMs is to obtain the most accurate and the highest resolution maps using sparse measurements. A number of works propose various techniques to tackle these challenges and have varying levels of performance. However, they are often computationally costly [8], require statistical path-loss models [9], [10], [11], can only interpolate along streets in the area of interest [9], or require geographical information in addition to radio measurements (e.g., elevation maps, geolocated BS positions, etc.) [11]. Considering these limitations of current methods to estimate REMs, we tackle the following research question: can we improve the accuracy of current REM estimation methods without requiring additional geographic information or statistical models? In this paper, we provide a positive answer to this question and present DeepREM as a tool to achieve it.

A. RELATED WORK
Numerous approaches for REM construction have been proposed in the literature. According to [1] and [8], we can group these approaches into three categories. The first category includes direct interpolation techniques, which are the most commonly used due to their low complexity. These methods estimate the path loss at non-measured locations using statistical signal processing techniques such as Kriging [12]. Kriging uses variogram analysis to estimate signal strength at unmeasured locations. The method seeks to minimize error variations and produces a residual error with zero mean. However, these techniques require knowledge of the path loss statistics, which are assumed to be stationary but often are not due to the varying propagation conditions within the area of interest. For example, [9] uses a feed forward neural network to improve Kriging performance in the construction of REMs through a measurement-based experiment. In [10], a Kriging interpolation algorithm is used to infer complete radio-environment information from sample information data collected based on mobile crowdsensing; the accuracy of this technique is compared with artificial neural networks and inverse distance weighting, demonstrating that Kriging offers the lowest interpolation error. In [13], inverse distance weighting interpolation is used assuming that the signal strength at short distances from the transmitter is greater than at far distances. The signal strength is estimated using a weighted average of its neighbors, where the weight is proportional to the inverse of the distance. However, this method does not perform well in scenarios with large shadowing variance, since the received power varies significantly over short distances due to obstacles.
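As a concrete illustration of the direct-interpolation category, inverse distance weighting can be sketched in a few lines of Python. The function name, the `power` parameter, and the toy values below are our own illustrative choices, not code or settings from [13]:

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2, eps=1e-12):
    """Estimate signal strength at query points as an inverse-distance-weighted
    average of the known sparse samples (illustrative sketch)."""
    # pairwise distances from every query point to every known sample
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=-1)
    w = 1.0 / (d ** power + eps)          # closer samples get larger weights
    return (w * values).sum(axis=1) / w.sum(axis=1)
```

A query point equidistant from two samples gets their average, while a query placed exactly at a sample location reproduces that sample, which matches the intuition behind the method described above.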
The second category of REM estimation methods is the indirect approach, where the transmitter parameters (location and transmitted power) are estimated first and then the received power at unknown points is calculated based on propagation models. This approach is more computationally complex than the direct method and requires geographical information such as building and elevation maps. In [14], a construction of REMs based on location estimation is proposed. This method outperforms interpolation methods in the Rayleigh fading environment in terms of Root Mean Squared Error (RMSE). Indirect approaches are strongly dependent on the chosen propagation model (e.g., ray-tracing, log-normal, etc.) which may result in inaccuracies in coverage prediction if the model is not well calibrated [8].
The third category is hybrid methods that combine both direct and indirect approaches. In [15], a high-accuracy hybrid approach that combines estimation of propagation model parameters with Kriging interpolation is presented. Reference [16] proposes a hybrid method that uses an interpolation technique to construct a received signal strength image representing only one transmitter in the area of interest. Subsequently, images of all transmitters are combined using a labeling technique that forms an overall image with features such as antenna orientations, radiation patterns, and received power. The complexity of this method depends on the selected interpolation technique combined with the complexity of the image processing which depends on the image resolutions.
Within all of the three categories, several Machine Learning (ML) and Deep Learning (DL) techniques have been proposed in the past few years due to their successful application to similar problems in different research areas. For example, the K-Nearest Neighbor (KNN) algorithm has been applied for constructing REMs due to its simplicity [17]. Also, reference [11] uses a two-step algorithm that includes KNN to estimate the received power at locations where no measurements are available. In the first step, several clusters of radio propagation parameters are defined within the area of interest (i.e., path loss exponents and intercepts). In the second stage, each pixel in the map is classified as belonging to one of the parameter clusters, creating areas that share the same propagation characteristics. Even though this method creates accurate REMs, it requires the knowledge of received power measurements with their locations, in addition to the cell identification, and the location of all serving base stations. It also assumes a log-normal path-loss model.
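A KNN-based interpolation of the kind used in [17] can be sketched as follows. The toy path-loss surface, grid size, sampling rate, and neighbor count are illustrative assumptions, not values from the cited works:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
# toy ground truth: a smooth log-distance RSRP surface on a 64 x 64 grid
grid = np.stack(np.meshgrid(np.arange(64), np.arange(64)), axis=-1).reshape(-1, 2)
truth = -60.0 - 20.0 * np.log10(1.0 + np.linalg.norm(grid - 32, axis=1))

# keep 5% of the pixels as "measurements" and regress the rest
idx = rng.choice(len(grid), size=int(0.05 * len(grid)), replace=False)
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(grid[idx], truth[idx])
rem = knn.predict(grid).reshape(64, 64)     # dense REM estimate
```

Distance weighting makes nearby measurements dominate each prediction, which is why such estimators degrade when shadowing varies strongly over short distances.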
In [18], radio and network parameters are estimated using several ML algorithms. These algorithms include KNN, Support Vector Machines (SVM), generalized linear models, Kriging, and distance-based prediction. Among the methods, support vector machines showed the best performance. However, this work does not estimate a complete REM, but only provides estimations for active users. In [19], a comparative analysis of KNN, SVM, and two decision tree-based models (XGBoost and LightGBM) is provided; the results showed a better performance of the tree-based models. However, the approach is only applied to wireless local area networks and cannot be generalized to wide area cellular networks with the provided data. Studies have also been carried out for the estimation of REMs using DL algorithms. In [20] and [21] neural networks that estimate the path loss for each input transmitter/receiver pair are proposed. The network is trained on a fixed map with simulated path loss values at a set of transmitter/receiver location pairs. Different city maps require pre-training the network and each trained network describes a specific map. In [22], a highly efficient and accurate DL method is proposed to estimate path loss from a transmitter location to any point in a flat domain. The study shows that properly designed and trained DL networks can learn to estimate the path loss function in a given urban environment with high accuracy and low computational complexity. In [23], a graph-convolutional neural network is trained to approximate an auto-regressive moving average model that reconstructs REMs from sparse measurements.
DeepREM models learn from a physical ray-tracing simulation dataset and generate path-loss estimates that are very close to the simulations, but are much faster to compute for real-time applications. Extensive numerical simulation results show that the method significantly outperforms others in the literature.

B. CONTRIBUTIONS
DeepREM is a methodology to estimate REMs in urban scenarios based on two DL architectures: U-Net and Conditional Generative Adversarial Network (CGAN). Our goal is to overcome the limitations found in current methods, in particular the need for side information on the area of interest and the reliance on statistical models, whose assumptions might not hold. Using a sparse set of measurements in outdoor environments, DeepREM reconstructs complete REMs of two parameters: the RSRP and the BS coverage. To achieve this, both models are trained using ray-tracing-generated REMs that are undersampled to simulate sparse measurements. Additionally, we test the DeepREM generalization capabilities for different environments and propagation conditions by validating it in nine cities with diverse geographic and building characteristics. The models capture the physical radio propagation phenomena of the scenarios during training, so that the methodology can be easily extended to other environments as well. We evaluate the capabilities of these networks using appropriate error metrics and compare them with [11], which is applicable to identical scenarios and requires similar inputs. More specifically, DeepREM has the following contributions with respect to the state of the art.
1) Regarding the required inputs, our models require only the received power and cell-association measurements, together with their locations on the map. To the best of our knowledge, this work is the only one that does not require additional information. This is an advantage with respect to other state-of-the-art methods, such as [11] and [22], that require knowledge of the transmitter positions or the distances between transmitters and measurement locations.
2) Unlike the other methods described above, our techniques do not assume any path loss model and are purely data-driven.
3) We present a new dataset of simulated REMs for received power and BS coverage in outdoor scenarios for different cities and diverse transmitter locations. This dataset, together with the model files and a testing app, is publicly available for replication or future works [24].
4) DeepREM models capture the radio propagation physics of ray-tracing simulations during the training phase, including the influence of elevation and buildings. However, after the models are trained, such side information is not required anymore and users need only supply the measurement values and positions. This is a critical advantage with respect to state-of-the-art methods, which also allows a considerable reduction in the computation time of REMs (once trained, our models provide a REM estimation in less than a second).
5) DeepREM models outperform the benchmark in [11] in terms of RMSE and Mean Absolute Error (MAE). The benchmark is the only method in the literature that operates under similar conditions, requiring only the transmitter positions as additional input with respect to our models. Other methods require considerably more side information and are thus omitted.
6) We evaluate DeepREM models to determine their performance in REM reconstruction using a procedure similar to that of image denoising [25].
Thus, the present study not only provides results in the field of communications systems, but also provides some insights into image processing based on DL approaches.

II. METHODOLOGY

Fig. 1 shows the DeepREM construction methodology.
First, we generate a dataset of ray-tracing simulations that provides baseline REMs in several urban scenarios for the two measurements RSRP and BS coverage. Second, an undersampling-based data preprocessing is performed to simulate incomplete input maps. Third, we train U-Net and CGAN models to estimate REMs from those maps that simulate sparse measurements. As a last step, we test the models and compare them with current methods used in similar scenarios. In this section, we thoroughly describe this methodology.

A. DATASET CONSTRUCTION
We constructed the DeepREM dataset [24] from coverage images generated using Altair WinProp [26]. This software uses ray-tracing physical models to calculate the received power in an area of interest given the BS parameters. The simulator uses digital elevation and building maps to calculate ray interactions and provide more realistic approximations of the actual received power than statistical propagation models. Reflections, transmissions, and diffractions are accounted for in the ray-tracing procedures. The DeepREM dataset is publicly available and is distributed in two data subsets [24]. The first one consists of 1800 RSRP coverage maps for one transmitter per scenario. The second includes 3600 maps: 1800 with BS coverage data and 1800 with RSRP information for four transmitters in each scenario. The transmitters were placed in random locations on each map with different combinations of the simulation parameters shown in Table 1, which were taken from [27]. Digital elevation maps were taken from the Colombian cities of Armenia, Bogotá, Cali, Ibagué, Manizales, Medellín, and Pasto, and the U.S. cities of Columbus, OH, and Washington, DC. This diverse selection of cities allows the models to learn different propagation conditions (i.e., hilly and flat terrain, suburban and dense urban scenarios) and improve their generalization capabilities. Fig. 2 shows the process to generate REMs in outdoor environments. First, building and topographic vector databases are generated after preprocessing building and topographic maps in WinProp. Building information was taken from the Humanitarian OpenStreetMap Team (HOT) website [28], an open service that creates customized extracts of up-to-date OpenStreetMap data [29]. This tool was used to extract urban maps with an approximate area of 2500 m × 2500 m that include buildings represented as polygonal cylinders. The topography of the urban scenarios is also taken into account, as it influences the radio propagation.
Topographic maps were downloaded from the global multi-resolution terrain elevation data 2010 (GMTED2010) made available by the U.S. Geological Survey's (USGS) Earth Explorer [30].
WinProp requires a pre-processing of the topography and building vector databases to perform ray-tracing simulations. During pre-processing, the databases are divided into tiles and segments and the visibility between them is determined, which requires considerable processing time. After pre-processing, we obtain coverage maps with four BSs in each scenario. BS coordinates are selected from a uniform distribution within the map, and the other radio parameters are also selected randomly from the parameters in Table 1. For each geographical area, fifty BS positions are simulated, with four different radio parameter combinations in each position. A total of two hundred combinations are simulated for each area. REMs of both RSRP and BS coverage are saved as 256 × 256 matrices, where the matrix components are the pixels of the equivalent image. The REM resolution is 10 meters. RSRP is defined as the received power measured in dBm, whereas BS coverage is coded as an integer that indicates which of the four BSs has the highest RSRP in each pixel. Fig. 3 shows five REM samples for different cities. In the first row, RSRP maps are shown, with yellow representing a strong signal and blue representing a weak one. The second row shows their corresponding BS coverage maps, with four different colors representing the coverage area of each BS on the map.
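The REM encoding described above (256 × 256 matrices, 10 m resolution, RSRP in dBm, BS coverage as the index of the strongest BS) can be sketched with NumPy. The random per-BS RSRP stack is only a stand-in for the maps a WinProp run would produce:

```python
import numpy as np

RESOLUTION_M = 10          # each pixel covers 10 m x 10 m
MAP_SIZE = 256             # REMs are stored as 256 x 256 matrices

# stand-in for the per-BS RSRP maps a ray-tracing run would produce (4 BSs)
rng = np.random.default_rng(1)
rsrp_per_bs = rng.uniform(-120.0, -60.0, size=(4, MAP_SIZE, MAP_SIZE))

bs_coverage = rsrp_per_bs.argmax(axis=0) + 1   # integer cell index, 1..4
rsrp = rsrp_per_bs.max(axis=0)                 # serving-cell RSRP in dBm

def pixel_to_meters(i, j):
    """Map matrix indices to metric offsets within the 2.56 km x 2.56 km area."""
    return i * RESOLUTION_M, j * RESOLUTION_M
```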

B. NETWORK INPUT DATA
Since real measurements obtained either with drive tests or other sources are typically sparsely distributed over the area of interest [31], we simulate these incomplete maps according to the following process. Let (i, j) be the spatial coordinates of the complete map m, and let m̃ denote the incomplete map. As each i-th row is traversed, we select a set of 95% of the positions j at random, without repetition. In these positions, m(i, j) is replaced by N, the lowest level of map intensity (i.e., N = −200 dBm for RSRP and N = 0 for BS coverage maps). Using this procedure, the known pixel locations in the map are uniformly distributed. This is frequently assumed in wireless networks (e.g., see [32], [33]) and can be justified as follows: 1) For medium and large cities, building features are usually constant in the estimated map area (2.56 km × 2.56 km), which leads to a uniform distribution of user (and measurement) positions. Even if there are areas within a city that have different user densities, they can be separated into different maps to preserve the uniform-distribution assumption. 2) If the sample locations are not uniformly distributed, crowdsourcing allows the collection of a large number of samples over a long period of time. The samples can then be filtered to approximate a uniform distribution, given that REMs are approximately constant over time. Although the training set is generated with a uniform distribution, we also test the trained models under different input distributions in Section III in order to analyze their generalization capabilities.
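The undersampling step can be sketched as follows. This is a minimal NumPy version: the paper draws 95% of positions per row, whereas this sketch masks pixels independently at the same average rate:

```python
import numpy as np

def undersample(rem, keep_frac=0.05, fill=-200.0, rng=None):
    """Keep roughly `keep_frac` of the pixels uniformly at random and set the
    rest to the lowest intensity `fill` (-200 dBm for RSRP, 0 for BS coverage)."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(rem.shape) < keep_frac   # True where a sample is "measured"
    out = np.full_like(rem, fill, dtype=float)
    out[mask] = rem[mask]
    return out, mask
```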
The challenge we set is to train a model capable of predicting the values of the 95% of unknown REM pixels from only the 5% of known samples. This goal is set to allow comparison with state-of-the-art techniques that use such inputs.
After simulating incomplete REMs, we evaluate the distributions of the resulting input and output datasets used for training. The RSRP histogram is shown in Fig. 4(a), where most of the input values are set to −200 dBm, representing the unknown samples. The target-data histogram, shown in blue, approximates a normal distribution. Fig. 4(b) shows the cell-ID histogram. The target data are approximately uniformly distributed over the four BSs. The highest frequency of the input data represents the unknown data values. Since the pixels are balanced across target classes, we do not carry out any data augmentation process.

C. TRAINING AND TESTING OF DL-BASED MODELS
1) U-NET-BASED REM ESTIMATION
Autoencoders are neural networks that perform unsupervised learning, dimensionality reduction, and data compression tasks [34]. They learn to produce outputs based on the original input by compressing it into a lower-dimensional space using a smaller number of neurons in the hidden layers and, from there, reconstructing the input with minimal information loss. Consequently, hidden layers may learn the input features with fewer parameters [35]. In our application, we use a denoising autoencoder, which is a network that recreates the input while attempting to remove noise.
The autoencoder architecture is constructed by connecting two networks together: encoder and decoder [36]. The encoder serves as a data compression network that extracts the most relevant features, whereas the decoder reconstructs the original image starting from the lower-dimensional representation.
A U-Net is an autoencoder that includes skip connections [37]. Experimental evidence shows that skip connections help the model converge faster [38]. The model we use is a U-Net where the layers in the encoder part are skip connected and concatenated with layers in the decoder part. This architecture helps to incorporate learned characteristics from earlier levels into deeper layers [39]. Fig. 5 shows the employed U-Net architecture. In the encoding stage, the number of kernels increases in powers of two, as the size of the maps decreases. The decoder phase uses the inverse pattern by taking the concatenation with the skip connections. The hyper-parameters of this architecture are based on a Pix2Pix generator model [40], since it was proposed for image translation applications. The activation layers were tuned according to trial-and-error tests. When using this architecture for RSRP interpolation (regression), we use LeakyReLU as activation function in the hidden layers and a linear output activation. For BS coverage REMs, we use ReLU and softmax activations in the hidden and output layers, respectively.
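A minimal Keras sketch of such a U-Net for RSRP regression is shown below. The depth, filter counts, and kernel sizes are illustrative and smaller than the Pix2Pix-based generator actually used; only the overall pattern (downsampling encoder, skip-connected upsampling decoder, LeakyReLU hidden activations, linear output) follows the description above:

```python
from tensorflow.keras import layers, Model

def build_unet(size=64, base_filters=16, depth=3):
    """Minimal U-Net sketch for RSRP regression (illustrative hyper-parameters).
    `size` must be divisible by 2**depth."""
    inp = layers.Input((size, size, 1))
    x, skips, f = inp, [], base_filters
    for _ in range(depth):                  # encoder: maps halve, filters double
        x = layers.Conv2D(f, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        skips.append(x)
        f *= 2
    skips.pop()                             # last encoder output is the bottleneck
    for skip in reversed(skips):            # decoder: mirror, concatenating skips
        f //= 2
        x = layers.Conv2DTranspose(f, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
        x = layers.Concatenate()([x, skip])
    # linear output activation for the regression of RSRP values in dBm
    out = layers.Conv2DTranspose(1, 4, strides=2, padding="same", activation="linear")(x)
    return Model(inp, out)
```

For BS coverage, the output layer would instead use a softmax over the cell indices, as described above.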

2) CGAN-BASED REM ESTIMATION
GANs generate new data with the same distribution as their training dataset. In their original form, they are classified as unsupervised learning techniques because no labeled data is required [41]. However, there have been recent extensions where GANs can be identified as either semi-supervised or supervised [42]. The conventional GAN architecture is composed of two networks: generator and discriminator. The generator receives a random vector and generates an output image. Then, the discriminator assesses the quality of generated images to help the generator improve the image quality. Thus, the discriminator is a classifier that learns to separate generated images from real images. The two networks play an adversarial game, where the generator develops better outputs to fool the discriminator. Likewise, the discriminator gets stronger at detecting the synthesized images [42].
Conditional GANs (CGANs) are adversarial networks where the generator and discriminator are conditioned on some side information that is fed into the generator and discriminator as an additional input layer. This model converts a given image from a specific domain to another (image-to-image translation). The hyper-parameters of our CGAN architecture were based on the Pix2Pix model [40]. Fig. 6 shows the architecture we employed. The generator is the same as in the U-Net case; thus, only the discriminator information is shown. The discriminator's input is the concatenation of the incomplete REM and the generated REM. These data are alternated with the incomplete REM concatenated with the ground truth REM, so that the discriminator is trained with both the generated and true data and is conditioned on a deterministic input. The additional computational complexity introduced by the discriminator requires more effort during training, but better performance with respect to the U-Net model is also expected. In Section III, we confirm that CGAN indeed outperforms U-Net when the input distribution does not fit that of the training dataset.
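The conditioning described above (concatenating the incomplete REM with a candidate complete REM at the discriminator input) can be sketched in Keras as follows. The PatchGAN-style layer sizes are illustrative assumptions in the spirit of the Pix2Pix recipe, not the exact architecture used:

```python
from tensorflow.keras import layers, Model

def build_discriminator(size=64):
    """CGAN discriminator sketch conditioned on the sparse input REM
    (illustrative layer sizes)."""
    sparse_rem = layers.Input((size, size, 1))   # incomplete (conditioning) REM
    candidate = layers.Input((size, size, 1))    # generated or ground-truth REM
    x = layers.Concatenate()([sparse_rem, candidate])
    for f in (32, 64, 128):
        x = layers.Conv2D(f, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    # per-patch real/fake scores rather than a single scalar
    out = layers.Conv2D(1, 4, padding="same", activation="sigmoid")(x)
    return Model([sparse_rem, candidate], out)
```

Training alternates real pairs (sparse REM, ground truth) with fake pairs (sparse REM, generator output), as described above.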
The U-Net and CGAN were developed using Keras. We trained the U-Net models using the Google Colab cloud service which provides access to an NVIDIA Tesla K80 GPU with 12 GB RAM. The CGAN was trained with a workstation (Intel Core i7-10700 2.90GHz CPU, 32 GB RAM, and AMD Radeon Pro WX 3200 GPU).

3) k-FOLD-BASED TRAINING
Cross validation (CV) is a commonly used technique to test the performance of DL models on unseen data in terms of prediction error. In CV, the error is tested in several runs where data are randomly split into training and test sets. Thus, every data point has a chance of being used for training or testing in different runs. CV uses statistics from several training runs to summarize the model performance, rather than evaluating it from a single run. k-fold CV is a variation that divides data into k equal partitions called folds; then, at each iteration, the k-th fold is used to evaluate the performance of the model that was trained with the remaining k−1 folds [43]. We trained our models using k-fold CV with k = 5 folds. Hence, the dataset was randomly split into 1440 training maps and 360 test maps in each iteration, so that the performance of the DL models in generating all the maps of the dataset is evaluated. This procedure was carried out for both RSRP and BS-coverage indexes.
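The 5-fold split of the 1800 maps can be reproduced with scikit-learn; the seed below is arbitrary:

```python
import numpy as np
from sklearn.model_selection import KFold

map_ids = np.arange(1800)                  # indices of the 1800 REMs in the dataset
kf = KFold(n_splits=5, shuffle=True, random_state=0)
splits = list(kf.split(map_ids))           # 5 (train, test) index pairs
# each fold trains on 1440 maps and tests on the held-out 360
```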
We use MSE and sparse categorical cross-entropy loss functions to train the RSRP and BS coverage U-Net models, respectively [42]. For CGAN, the generator uses the same loss functions as U-Net and the discriminator uses binary cross-entropy. The optimizer is Adam with a learning rate of 10⁻³, given that it has demonstrated good experimental results compared with other stochastic optimization methods [44]. We take 100 epochs and a batch size of 12 to train each model. To avoid over-fitting, dropout regularization was used in some layers. Thus, the network can learn a better-generalized mapping from input to output [45]. In addition, to speed up training and reduce the likelihood of convergence to a local optimum, input and output data were scaled using a standard-scaler transformation. This preprocessing rescales the distribution of values to have zero mean and unit variance [46].
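The standard-scaler step can be sketched with scikit-learn; the map sizes and value ranges below are toy assumptions, and predictions would be mapped back to dBm with `scaler.inverse_transform`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
train_maps = rng.uniform(-120.0, -60.0, size=(100, 64 * 64))  # flattened toy RSRP maps
scaler = StandardScaler().fit(train_maps)                     # fit on training data only
scaled = scaler.transform(train_maps)                         # zero mean, unit variance
```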

4) EVALUATION METRICS
To evaluate the performance of models, we select appropriate metrics for the two types of prediction: regression (for RSRP REMs) and classification (for BS coverage REMs).
On one hand, BS coverage prediction is categorized as a classification problem, where the output is limited to a discrete set of values. Let X be the ground truth BS coverage REM and X′ its predicted REM. Then, for BS coverage, $X_{i,j}, X'_{i,j} \in \{1, 2, 3, 4\}$, where $X_{i,j}$ is the pixel in position (i, j) of X. To evaluate the models' performance in BS classification, we selected common metrics, categorized as follows.
• Classification metrics: the error rate is commonly used to assess how many predictions differ from their ground truth in the BS coverage prediction [19]. In addition, the F1 score and confusion matrix were calculated to allow comparison with state-of-the-art techniques [47].
• Segmentation metrics: to evaluate the segmentation quality (how well the predicted coverage areas match the ground truth), we assess the intersection-over-union (IoU) and Dice similarity coefficient (DSC) overlap metrics [48]. IoU is the area of the intersection divided by the area of the union between regions belonging to the same class, averaged over all the classes. The DSC is twice the area of intersection between regions of the same class divided by the sum of the areas of both images, averaged over all the classes.

On the other hand, RSRP is a continuous value. Thus, we can evaluate the prediction using regression metrics such as the root mean squared error (RMSE), given by

$$\mathrm{RMSE} = \frac{1}{\sqrt{N}}\,\| X - X' \|_F,$$

where $\|\cdot\|_F$ is the Frobenius norm and $N = 256^2$ is the number of pixels in the map, and the mean absolute error (MAE),

$$\mathrm{MAE} = \frac{1}{N} \sum_{i,j} | X_{i,j} - X'_{i,j} |.$$

Both of these metrics are easily interpretable since their units are the same as those of the REM (i.e., dBm).
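The metrics described in this section are straightforward to compute; below is a NumPy sketch of the error rate, RMSE, MAE, IoU, and DSC (averaging over the classes present in either map; function names are ours):

```python
import numpy as np

def error_rate(x, x_hat):
    """Fraction of pixels whose predicted cell index differs from the truth."""
    return float(np.mean(x != x_hat))

def rmse(x, x_hat):
    """Root mean squared error; equals ||X - X'||_F / sqrt(N) for an N-pixel map."""
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)))

def mae(x, x_hat):
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(x_hat))))

def iou(x, x_hat, classes=(1, 2, 3, 4)):
    """Intersection-over-union averaged over the classes present in either map."""
    scores = []
    for c in classes:
        a, b = (x == c), (x_hat == c)
        union = np.logical_or(a, b).sum()
        if union:
            scores.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(scores))

def dsc(x, x_hat, classes=(1, 2, 3, 4)):
    """Dice similarity coefficient averaged over the classes present in either map."""
    scores = []
    for c in classes:
        a, b = (x == c), (x_hat == c)
        denom = a.sum() + b.sum()
        if denom:
            scores.append(2.0 * np.logical_and(a, b).sum() / denom)
    return float(np.mean(scores))
```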

III. RESULTS AND DISCUSSION
A. CONVERGENCE ANALYSIS
Fig. 7 shows error metrics for the first fold in the training and validation phases for both the U-Net and CGAN models.

Table 2 shows the model evaluation metrics for BS coverage prediction in each fold. Error rates of 11.5% and 12.3% are achieved by CGAN and U-Net, respectively. The IoU index indicates that CGAN has an overlap of 80% between the ground truth and the estimated REMs, while its DSC is 89%, both outperforming U-Net. However, CGAN presents a higher variance in the error-rate and segmentation-metric estimates. CGAN also performs better than U-Net on the F1 metric. Fig. 8 shows the confusion matrices for each model. The matrices indicate that errors are balanced across cell IDs. Even though CGAN outperforms U-Net in all of the prediction metrics, the differences between the two architectures are below 1%. Fig. 9 shows samples of BS coverage maps generated by the proposed networks from 5% of known samples, where each color represents a different BS index. Classification errors occur mainly at the boundaries of the transmission coverage areas but, overall, the prediction is appropriate.

The RMSE and MAE achieved by both models are presented in Table 3 and Fig. 10. The lowest average errors in RSRP prediction were achieved by U-Net, with an RMSE of 6.32 dBm and an MAE of 4.54 dBm. U-Net not only achieves the lowest RMSE, but also the lowest variances. CGAN obtained an RMSE of 8.1 dBm and an MAE of 5.7 dBm. Thus, U-Net estimations are better than those of CGAN for this particular problem.
Both models achieve good performance in segmentation (BS coverage) and interpolation (RSRP) of REMs from only 5% of dispersed samples. Nevertheless, we highlight that the difference in RSRP prediction between CGAN and U-Net is noteworthy (i.e., an RMSE approximately 2 dBm lower for U-Net). For BS coverage prediction, the difference in error rate between the models was approximately 1%. In addition, we emphasize that the computational cost of training a U-Net model is much lower than that of the CGAN.

C. DATA DISTRIBUTION ANALYSIS
In this section, we perform a comparative analysis of the data distributions of the input, ground truth, and predicted values for both models. These statistics are taken from the k = 0 fold for the RSRP and BS coverage indices. Fig. 11(a) shows the data distribution of BS coverage over the four possible discrete values corresponding to cell IDs. The input distribution shows a high frequency at 0, representing the unknown samples of the maps. The ground-truth data approximate a uniform distribution, with slightly higher values for cell IDs 1 and 2 in this particular fold. Ideally, the distributions generated by the DL models from the input data are expected to approximate the ground truth.
Regarding RSRP REMs, Fig. 11(b) shows a high frequency at −200 dBm, which represents the unknown samples in the input map. The frequency distributions of the predicted data are close to the ground truth for both CGAN and U-Net, with the RSRP values concentrated approximately in the range from −120 to 0 dBm. These approximations can be verified with the mean and standard deviation shown in Table 4. U-Net approximates the ground truth better than CGAN, with a difference of less than 0.5 dB in the mean and less than 1.5 dB in the standard deviation with respect to the ground truth. Thus, we validate the interpolation of RSRP by the DeepREM models.
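The sentinel encoding of unknown samples described above (−200 dBm for RSRP) can be sketched as follows; the helper names and map sizes are illustrative assumptions, not the paper's preprocessing code.

```python
import numpy as np

RSRP_UNKNOWN = -200.0  # sentinel for unmeasured pixels (as in the text)

def make_sparse_input(rem, known_frac, rng):
    """Turn a full RSRP REM into a sparse input map: keep a random
    `known_frac` of the pixels and set the rest to the sentinel."""
    mask = rng.random(rem.shape) < known_frac
    sparse = np.full_like(rem, RSRP_UNKNOWN)
    sparse[mask] = rem[mask]
    return sparse, mask

def known_stats(rem_map):
    """Mean and standard deviation over measured pixels only,
    ignoring the sentinel (the kind of statistic compared in Table 4)."""
    vals = rem_map[rem_map > RSRP_UNKNOWN]
    return float(vals.mean()), float(vals.std())
```

Comparing `known_stats` of a predicted map against the same statistics of the ground truth reproduces the mean/standard-deviation comparison discussed above.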

D. COMPARISON TO THE STATE OF THE ART
1) COMPARISON TO A CLUSTERED REGRESSION ALGORITHM
We compared the processing times required to generate REMs with our models, with the method in [11], and with the ray-tracing simulator WinProp. We used a workstation with an Intel Core i5 CPU and an NVIDIA Tesla K18 GPU. For outdoor scenarios, WinProp runs a simulation in $10$ s, plus a pre-processing time of around $10^4$ s. In MATLAB, the clustered regression algorithm in [11] generates a REM estimation in $10^2$ s. Finally, U-Net and CGAN estimate REMs in $10^{-1}$ and $10^{-2}$ s, respectively. Hence, the DeepREM models are 3 to 4 orders of magnitude faster than the method in [11], and 5 to 6 orders of magnitude faster than WinProp.

2) COMPARISON TO PROPAGATION-BASED MODEL
Concerning the RSRP prediction, we compare U-Net and CGAN on the k = 4 fold to the algorithm in [11], which we adopt as the benchmark. This method uses regression clustering to model the RSRP at coordinate $x$ as
$$\mathrm{RSRP}(x) = \alpha(x) + \beta(x)\log_{10} d(x),$$
where $d(x)$ is the distance from $x$ to the BS, and $\alpha(x)$ and $\beta(x)$ are the parameters found through regression clustering from the known samples in the input. This method requires knowledge of both the measurement positions and the distances to the BS. It is worth noting that DeepREM models do not require this additional information.
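The benchmark's idea can be sketched in two steps: cluster the known sample positions, then fit α and β by least squares within each cluster. The following is a minimal illustration with a toy k-means; the helper names and the clustering details are assumptions, and [11] should be consulted for the actual algorithm.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over 2-D sample positions (illustration only)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def fit_log_distance(rsrp, pos, bs_pos, labels, k):
    """Per-cluster least-squares fit of RSRP(x) = alpha + beta*log10(d(x))."""
    d = np.linalg.norm(pos - bs_pos, axis=1)
    params = np.zeros((k, 2))
    for c in range(k):
        sel = labels == c
        A = np.column_stack([np.ones(sel.sum()), np.log10(d[sel])])
        params[c], *_ = np.linalg.lstsq(A, rsrp[sel], rcond=None)
    return params
```

The per-cluster fit makes the dependence on BS positions explicit: without `bs_pos` the distances $d(x)$, and hence the regression, cannot be computed, which is exactly the extra information DeepREM avoids.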
We compare our method with this regression model using an input of 5% of measurements on a map. As stated in [11], the MAE decreases as the number of regression clusters increases. Thus, to carry out the comparison, we use regressions with 9 clusters and 25 clusters as benchmarks. Since this methodology requires knowledge of the BS locations, a total of 107 maps from the test set were selected in which all transmitter locations could be easily approximated. Table 5 shows the RMSE and MAE results for our models and the benchmark. Benchmark performance improves as the number of clusters increases. DeepREM models outperform the benchmark, with RMSEs of 8 and 6 dBm for CGAN and U-Net, respectively. In addition, the error values obtained by the benchmark model (9 clusters and 25 clusters) have a greater variance than those of the proposed networks, as shown in Fig. 12. U-Net achieves the lowest prediction errors and the lowest variance, while the 9-cluster benchmark has the highest variance. According to the MAE and RMSE results, the interpolation carried out by CGAN has a lower variance than the parametric model with 25 clusters. Hence, we validate the effectiveness of DeepREM and its capability to improve upon the performance of state-of-the-art methods. As noted above, the U-Net model has the best performance in this test.

A. Chaves-Villota, C. A. Viteri-Mera: DeepREM: Deep-Learning-Based REM Estimation From Sparse Measurements

E. ESTIMATED REMs
REMs are estimated from only 5% of known sparse samples over the coverage area. Fig. 17 shows some REMs generated by each model for RSRP measurements in regions with varying topographic elevation. From top to bottom, the figure shows predicted maps for areas of Washington, DC, Cali, Manizales, and Armenia. The first column shows the input data required to estimate the complete REMs. In all maps, the yellow points indicate the transmitter positions. At close distances from the transmitters, the power values indicate strong signals (above −70 dBm), while weak power levels are found at shadowed places. Rows 3 and 4 correspond to mountainous areas, where the signal varies considerably over short distances, unlike the REMs in rows 1 and 2, where the terrain is flat. The REMs generated by the U-Net and CGAN models (second and third columns) estimate the signal better at the edges of the polygonal shapes that represent buildings in outdoor environments, unlike the benchmark model (9 or 25 clusters), where the prediction depends on the clustering parameters [11]. These visual tests of interpolated REMs indicate that the DeepREM models generate estimations closer to the ground truth in the evaluated scenarios.

Fig. 9 presents constructed REMs of BS coverage, selected from the same prediction areas as Fig. 17. Each color indicates the identifier of a BS coverage area. This segmentation test is only performed with the proposed networks, since the benchmark method assumes that the BS coverage is known and does not calculate it. Using the trained networks, REMs are constructed from 5% of known samples, and the DeepREM models classify the BS to which every pixel is connected. As mentioned above, the main prediction errors occur at the coverage boundaries of the base stations. The reconstructed maps also show the building interactions in outdoor scenarios.
Both U-Net and CGAN achieved a good prediction in most indices (almost 90% of the unknown samples are correctly classified, according to Table 2).

F. U-NET AND CGAN TESTING WITH HIGH UNCERTAINTY
To test the DeepREM generalization capability, we evaluated the models in four scenarios that were not seen during training. The first one (Scenario A) consists of generating REMs from only 2.5% of known samples, even though the models were trained with 5% of known samples. In Scenario B, the models are challenged to generate REMs for a rectangular region of unknown samples within the map. In Scenario C, the models are tested in four new cities that were not seen during training. In Scenario D, we test the models under non-uniform sample location distributions. For Scenarios B to D, 5% of known samples are used. We compare the models that achieved the best evaluation metrics (U-Net and CGAN trained on the k = 4 fold) and the 25-cluster benchmark. In Scenarios A and B, REMs from the test set of the k = 4 fold were used. In C and D, 100 new REMs for each city and spatial distribution were simulated. The results are described below.

1) SCENARIO A: REDUCED NUMBER OF KNOWN SAMPLES
Table 6 shows the REM estimation errors for Scenario A. The lowest prediction error was achieved by U-Net, followed by the 25-cluster benchmark, while the lowest performance was obtained by CGAN. On the one hand, we verify the best performance for U-Net, which achieves not only the lowest RMSE and MAE, but also the lowest variance. On the other hand, CGAN presents the largest estimation errors and variance. The visual test of generated REMs is shown in Fig. 13, where the CGAN-generated REMs are the least accurate in contrast to those provided by the other two models. In general, U-Net and the benchmark present a higher estimation error than that obtained for 5% of known samples, with increases in RMSE of 2.5 and 2.9 dBm, respectively. Thus, we conclude that the smaller the number of known samples, the greater the REM estimation error. Lastly, we emphasize that U-Net generates more accurate maps, with an RMSE 3.5 and 5.4 dBm lower than the benchmark and CGAN models, respectively.
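Scenario A inputs can be emulated by randomly discarding known samples from a 5% map until 2.5% remain. A minimal sketch, assuming the −200 dBm sentinel encoding and a hypothetical helper name:

```python
import numpy as np

def subsample_known(sparse_map, keep_frac, sentinel=-200.0, seed=0):
    """Randomly keep only `keep_frac` of the known samples in a sparse
    input map (e.g. keep_frac=0.5 turns a 5% map into a 2.5% map)."""
    rng = np.random.default_rng(seed)
    out = sparse_map.copy()
    known = np.argwhere(out != sentinel)
    drop = known[rng.random(len(known)) >= keep_frac]
    out[drop[:, 0], drop[:, 1]] = sentinel
    return out
```

Since the models were trained with 5% inputs, running them on the output of `subsample_known` probes exactly the mismatch studied in this scenario.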

2) SCENARIO B: UNKNOWN REGION
We generated random regions of unknown samples with areas of 640 × 640 m² within the input maps with sparse measurements (5% known samples outside the unknown areas). The goal is to assess REM generation in a more realistic setting, where it is not possible to obtain measurements in restricted or inaccessible places. Table 7 and Fig. 16 show that U-Net achieves a better performance than the CGAN and benchmark models in this scenario, with an RMSE 1.71 and 2.33 dBm lower, respectively. Furthermore, the U-Net variance is also much lower. Fig. 16 shows the visual test of generated REMs; odd rows show the complete maps with the region of unknown samples marked by a red square, and this region is zoomed in on the even rows of each map. The higher errors in the construction of some REMs can be seen in the last row, where, according to the ground-truth map, power measurements show high levels, while our models assume low signal strength values for this region. Thus, we verify that the models do not overfit: regardless of the region of unknown samples, the interpolation of the remaining RSRP measurements is not affected, and the map outside the region of unknown samples is still constructed with values close to the ground truth.
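Scenario B inputs can be sketched by blanking a square window of a sparse input map. The helper below is an illustrative assumption; converting the 640 × 640 m region into pixels depends on the map resolution, which we leave as a parameter:

```python
import numpy as np

def blank_region(sparse_map, top, left, size_px, sentinel=-200.0):
    """Remove all known samples inside a square region, emulating an
    inaccessible area where no measurements can be taken."""
    out = sparse_map.copy()
    out[top:top + size_px, left:left + size_px] = sentinel
    return out
```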

3) SCENARIO C: UNSEEN CITIES
To validate the models' generalization capabilities, we test them with 5% of known samples on urban maps that were not seen during training. This testing set includes the Colombian cities of Bucaramanga, Barranquilla, Popayan, and Pasto (for the latter, a northern sector of the city not seen during training), which have varying elevation and building features. For each city, 100 different random combinations of transmitter positions and operating parameters were generated, for a total of 400 REMs. Table 8 shows the RMSE and MAE results of the DeepREM models compared to the benchmark. On average, the benchmark obtained the best results, closely followed by the CGAN model, with an error only 0.2 dBm above the benchmark in RMSE and 1.25 dBm above in MAE. However, CGAN obtained approximately half the benchmark's standard deviation. U-Net errors are approximately 1.25 dBm above those of CGAN in this test. CGAN performance in this zero-shot scenario improves by approximately 0.3 dBm in RMSE and degrades by 1.59 dBm in MAE with respect to the baseline (Table 3, where the cities used for testing were seen during training). We consider this degradation small compared to the high variability of received power over short distances (e.g., shadow fading has a standard deviation of approximately 7 dB in many urban areas [27]) and conclude that CGAN has better generalization capabilities in unseen scenarios. We also remark that the REMs reconstructed in Pasto were more accurate than those of the other cities; this is because the training set included REMs of other sectors of the city and, therefore, the models learned its geographical features. Fig. 14 shows the visual test of the generated REMs.

4) SCENARIO D: NON-UNIFORM DISTRIBUTIONS
In the last scenario, the DeepREM models are challenged to reconstruct REMs using 5% of known samples under three non-uniform distributions of the sample positions, in the same unseen cities tested in Section III-F3. This scenario tests the models' ability to deal with input distributions that differ from that of the training dataset. The analyzed input distributions are [49], [50], [51]:
1) Normal distribution. Each spatial coordinate (i and j) is taken from a normal distribution with the mean selected from a uniform distribution within the map's limits and a standard deviation of 500 meters.
2) Clustered normal distribution. Fifty clusters of sample points are placed on the map. Each cluster has a normal distribution with the mean taken from a uniform distribution and a standard deviation of 80 meters.
3) Clustered Poisson point process. Similar to the clustered normal distribution, but each of the fifty clusters has sample points uniformly distributed within a circle of 80 meters.
The number of sample points is 5% of the number of pixels in all cases. Table 9 shows the RMSE and MAE achieved by each technique. The best performance in this scenario is achieved by the benchmark model in both RMSE (11.66 dBm) and MAE (7.91 dBm). Among the DeepREM architectures, CGAN presents the lowest estimation errors, with an RMSE of 14.87 dBm and an MAE of 11.06 dBm. Thus, CGAN performance is 3 to 4 dBm worse than the benchmark and approximately 4 dBm worse than in the baseline scenario with a uniform distribution (Table 3). The U-Net model performed poorly in this scenario. Hence, we note that CGAN has better generalization capabilities when reconstructing REMs whose inputs do not match the training distribution; this should be further investigated in future works.
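The three sampling schemes can be sketched with numpy as follows. Treating coordinates as pixels on a 1 m grid and interpreting the 80 m circle as a radius are simplifying assumptions for illustration:

```python
import numpy as np

def sample_positions(dist, n, map_size, rng, clusters=50):
    """Draw n (row, col) sample positions under the Scenario-D schemes.
    Coordinates are in pixels, assuming a 1 m/pixel grid for simplicity."""
    if dist == "normal":
        mean = rng.uniform(0, map_size, 2)
        pts = rng.normal(mean, 500.0, (n, 2))
    elif dist == "clustered_normal":
        centers = rng.uniform(0, map_size, (clusters, 2))
        pts = rng.normal(centers[rng.integers(0, clusters, n)], 80.0)
    elif dist == "clustered_ppp":
        centers = rng.uniform(0, map_size, (clusters, 2))
        r = 80.0 * np.sqrt(rng.random(n))      # uniform over a disc
        theta = rng.uniform(0.0, 2.0 * np.pi, n)
        offs = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
        pts = centers[rng.integers(0, clusters, n)] + offs
    else:
        raise ValueError(f"unknown distribution: {dist}")
    return np.clip(pts, 0, map_size - 1).astype(int)
```

The resulting index pairs select which REM pixels are revealed as known samples; the remaining pixels are set to the unknown sentinel exactly as in the uniform case.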
The observed performance degradation is acceptable given the large uncertainty of RF propagation measurements and could be overcome by extending the training dataset with other input distributions in future works. Fig. 15 shows a visual test of the reconstructed REMs.

G. SUMMARY
Table 10 summarizes the models that achieved the best performance in all the evaluated scenarios. For RSRP REMs, we observe that U-Net works better when the scenario fits the training data distribution (uniform), while CGAN works better when it does not (e.g., unseen cities or non-uniform distributions). For BS coverage, CGAN outperforms U-Net.

H. IMPLEMENTATION ISSUES AND LIMITATIONS
Even though DeepREM achieved good performance metrics in scenarios not seen during training, its performance can be improved by including more scenarios in the training dataset, such as rural areas or other cities; we believe this could improve DeepREM's generalization capabilities. In addition, this research was limited to knowledge of only 5% of the measurements in each map. However, using additional input information (e.g., BS positions, geographic elevation, and building data) could also improve the accuracy of the DeepREM estimates, probably reducing the number of measurements required for the prediction of complete REMs.
Finally, in order to emulate data acquisition in a more realistic environment (e.g., crowdsensing), the training dataset could be extended with non-uniform sample location distributions such as those analyzed in Section III-F4.

DeepREM models are trained using a dataset of REMs obtained with the ray-tracing simulator WinProp. We focused on map reconstruction for two parameters: i) RSRP and ii) BS coverage (cell indices). Unlike current methods, our models only require elevation and building information during training, but not during normal operation; BS positions are not required at all. Moreover, DeepREM does not rely on parametric radio propagation models, since the physics behind ray tracing is captured during training. Extensive testing showed that DeepREM outperforms state-of-the-art methods in terms of RMSE and MAE. Our U-Net model performs better than CGAN for RSRP estimation, while CGAN outperforms U-Net for BS coverage estimation. CGAN performance is also better when the sample location distributions do not fit the training dataset. The RMSE and MAE achieved during validation are 6.32 and 4.54 dBm for our best model in RSRP reconstruction, while the error rate was around 11% for BS coverage reconstruction.