In-City Rain Mapping from Commercial Microwave Links—Challenges and Opportunities

Obtaining accurate rainfall measurements is highly important in urban areas, having a significant impact on different aspects in city life. Opportunistic rainfall sensing utilizing measurements collected by existing microwave and mmWave-based wireless networks has been researched in the last two decades and can be considered as an opportunistic integrated sensing and communication (ISAC) approach. In this paper, we compare two methods that utilize received signal level (RSL) measurements obtained by an existing smart-city wireless network deployed in the city of Rehovot, Israel, for rain estimation. The first method is a model-based approach using the RSL measurements from short links, in which two design parameters are calibrated empirically. This method is combined with a known wet/dry classification method, which is based on the rolling standard deviation of the RSL. The second method is a data-driven approach, based on a recurrent neural network (RNN), which is trained to estimate rainfall and classify wet/dry periods. We compare the results of rainfall classification and estimation from both methods and show that the data-driven approach slightly outperforms the empirical model and that the improvement is most significant for light rainfall events. Furthermore, we apply both methods to construct high-resolution 2D maps of accumulated rainfall in the city of Rehovot. The ground-level rainfall maps constructed over the city area are compared for the first time with weather radar rainfall maps obtained from the Israeli Meteorological Service (IMS). The rain maps generated by the smart-city network are found to be in agreement with the average rainfall depth obtained from the radar, demonstrating the potential of using existing smart-city networks as a source for constructing 2D high-resolution rainfall maps.


Introduction
Accurate and robust rainfall measurements are highly important for urban water management, flood risk mitigation, urban planning, transportation management, agriculture and more.
Flash floods can be caused by excessive rainfall and are characterized by rapid onset. Large impervious areas, buildings and the blocking of storm-water flow impose a high risk of flash floods in urban areas. Flash floods are frequently responsible for loss of lives and damage to infrastructure. Climate change and increasing urbanization imply more frequent and severe flash floods in the future. Over the last decade, flash-flood forecast lead-time has expanded up to six hours due to improved rainfall forecasts; yet, unknown future precipitation remains the largest source of uncertainty in flash-flood forecasts [1]. Javier et al. [2] showed that high-resolution rainfall rate fields can provide important elements of site-specific flash-flood forecasting systems in small urban watersheds. They also commented that errors in rainfall fields produce the largest sources of uncertainty in quantitative flash-flood forecasting.
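The absorption and scattering of electromagnetic waves by hydrometeors along the path of propagation attenuate the transmitted signal. This fact has been utilized for rainfall estimation from the signal levels of wireless links. Given the transmitted signal level (TSL) and the received signal level (RSL) of a link, the attenuation along the link is given by (1):

$$A(t) = \mathrm{TSL}(t) - \mathrm{RSL}(t) \qquad (1)$$

where $A(t)$, in dB, is referred to as the total attenuation and $t$ indicates time.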
As mentioned above, the transmitted signal is scattered by hydrometeors along the path of propagation, which gives rise to the rain-induced attenuation of the signal. The total attenuation of a signal is modeled as the sum of free-space attenuation ("path loss"), gaseous attenuation, rain-induced attenuation and other contributions.
To summarize, the total attenuation $A(t)$ can be described by the simplified model of (2):

$$A(t) = A_r(t) + A_{WA}(t) + A_{BL}(t) + N(t) \qquad (2)$$

where $A_r(t)$ is the rainfall-induced attenuation, $A_{WA}(t)$ is the attenuation due to the wet antenna effect [6] and $A_{BL}(t)$ is the attenuation caused by sources other than rain, such as free-space attenuation and gas attenuation, and is referred to as the baseline attenuation. Usually, $A_{BL}(t)$ changes slowly with time compared to the rain-induced factors. $N(t)$ is an additive measurement noise, assumed independent among all links. Attenuation due to rain is usually modeled using the power law [7]:

$$A_r(t) = a R(t)^{b} L \qquad (3)$$

where $a$, $b$ are coefficients depending on the link-specific frequency, antenna polarization and rain drop size distribution [8]; $L$, in km, is the link path length; and $R(t)$, in mm/h, is the rain rate (i.e., the rain intensity).
A comprehensive study on rainfall and water vapor sensing with microwave links operating at E-band frequencies was conducted in [9]. The authors showed that E-band links are more sensitive to rain than links operating in the K-band (and, specifically, the 15-40 GHz range) and can observe light rainfall. However, those links were shown to be affected more by errors related to the rainfall drop size distribution.
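To make the use of the power law (3) concrete, the following minimal sketch inverts it to obtain a path-averaged rain rate from a rain-induced attenuation. The coefficient values `A_COEF` and `B_COEF` are hypothetical placeholders chosen for illustration only; in practice they depend on frequency, polarization and drop size distribution, as noted above.

```python
import numpy as np

def rain_rate_from_attenuation(a_rain_db, length_km, a, b):
    """Invert the power law A_r = a * R**b * L for the rain rate R (mm/h)."""
    # Negative attenuation values are treated as "no rain".
    a_rain_db = np.clip(np.asarray(a_rain_db, dtype=float), 0.0, None)
    return (a_rain_db / (a * length_km)) ** (1.0 / b)

# Hypothetical coefficients, for illustration only (not values from the paper).
A_COEF, B_COEF = 1.0, 0.75

rates = np.array([0.0, 2.0, 10.0])                 # rain rates, mm/h
length = 0.5                                       # link length, km
attenuation = A_COEF * rates ** B_COEF * length    # forward power law, dB
recovered = rain_rate_from_attenuation(attenuation, length, A_COEF, B_COEF)
```

The round trip from rain rate to attenuation and back recovers the original rates, which is a quick sanity check on the exponent and the length normalization.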
It was shown in [10] that the power-law equation from (3) is less accurate for short links. The authors also showed how recurrent neural networks (RNNs) can be used to estimate the rainfall rate from measurements recorded by a cellular network management system. The RNN-based method outperformed the traditional power-law-based method in terms of the root mean square error (RMSE).
In our previous work [11], we presented an empirical model for short links, which can be seen as a modification to the power law. We showed that rainfall estimation from short links suffers from overestimation using the power law, and the proposed model is able to eliminate the overestimation.
In this paper, we compare two methods of rainfall estimation from the attenuation measurements of short E-band links: the first is a two-parameter empirical model from [11], which corrects the attenuation measurements of short links before applying the power law. The model parameters are calibrated using attenuation measurements from a long link in the vicinity of the network. The second is a variation of a data-driven method based on a gated recurrent unit (GRU) from [12], which showed improved performance in terms of RMSE compared to the traditional algorithm based on the power law.
The main contributions of this paper are:
1. We present a comparison between an empirical short links model and an RNN-based data-driven approach for rainfall estimation from RSL measurements. We show that although the RNN-based approach performs better in terms of RMSE, the simple short links model yields similar results for moderate and strong rain intensities (higher than 5 mm/h), despite being much simpler.
2. We create high-resolution 2D maps of 24 h-accumulated rainfall, which are constructed from the estimates of either method. The constructed maps are compared against rainfall maps provided by the IMS weather radar, and both show good agreement with the ground truth.
The rest of this paper is organized as follows: Section 2 describes the data used from the city of Rehovot, Israel. In Section 3, the details of the two rainfall estimation methods are provided. In Section 4, the experimental results from the two methods are shown, and a comparison of the constructed rainfall maps to the weather radar maps is provided. Section 5 concludes this paper.

Data
RSL measurements from the smart-city network of Rehovot are recorded regularly by the network operator company SMBIT Ltd. (Mazkeret Batya, Israel; https://www.smbit.co.il). The network consists of 66 links, where each link contains 2 sub-links for the two opposite directions. All links operate in the E-band frequency range, namely, in the range of 70 GHz to 84 GHz, while the majority of links operate at 74.375 GHz. The RSL values are sampled every 30 s with a quantization level of 1 dB. The transmitted signal level (TSL) is not recorded and is assumed to be constant. Figure 1 depicts the link map of Rehovot. Colored lines represent links that were used to construct rainfall maps and are marked with an ID number. Dashed black lines represent links that were excluded from this work, either due to large fluctuations of the signal level, unrelated to rain, that were detected during dry periods, or because the links were so short that their attenuation during rain did not exceed a minimal level. The municipality building is located in the center of the map. The majority of the antennas are located at street level, connecting traffic-light cameras. Others are placed on building rooftops, connecting schools and other sites to the main municipality building. A rain gauge in the northern part of the city is used for validation and is marked as 'RG' in Figure 1. The rain-gauge measurements are provided by The Robert H Smith Faculty of Agriculture, Food and Environment (Rehovot), The Hebrew University of Jerusalem (http://www.meteo-tech.co.il/faculty/faculty_periodical.asp?client=1, accessed on 11 April 2023). The gauge measures the accumulated rainfall over 10 min intervals, with a resolution of 0.1 mm.

Data Pre-Processing
Examples of raw RSL time series from links 29, 2, 4, 5 and 16 are shown, in this order, in the top five panels of Figure 2. The bottom panel presents the rainfall intensity measured by the rain gauge as a function of time. The RSL signals from links 29, 2 and 4 agree with the rain-gauge measurements: the RSL is attenuated during rainfall and is approximately constant when it is not raining. Note that the RSL of link 4 rises slowly after the rainfall has stopped. Links 5 and 16 show a higher noise level. These large fluctuations can stem from multi-path propagation, where the signal is reflected from buildings, cars, vegetation or other reflective surfaces along the path of propagation [13]. Moreover, the aging of the electronics of the transmitter and the receiver can increase noise levels. Estimating rainfall from such noisy links is a much harder task and results in large errors.
We noticed changes in the RSL properties of some links between years, such as different baseline levels and even changes in the range of attenuation values due to rain. Some links showed more than a 10 dB difference between the median of the winter RSL of 2021 and the winter RSL of 2020. Furthermore, links that were too short showed only small attenuation of the RSL during rainfall, and others were not attenuated at all. We excluded those links, as well as links showing large fluctuations during dry periods. The excluded links are shown as dashed lines in Figure 1.

Dataset Split
The full dataset, consisting of RSL measurements from a fixed number of links is split into three datasets. The first one is used for training the network (TRAIN). The second one is used for validation and hyper-parameter tuning (VALIDATION). The last one is used for testing only (TEST).
The TRAIN set consists of data from different rain events that occurred during 1 October 2019 to 31 March 2020.
The VALIDATION set consists of data from different rain events that occurred during 1 November 2020 to 31 December 2020.
The TEST set consists of data from different rain events that occurred during 1 January 2021 to 28 February 2021.
The datasets' durations are summarized in Table 1.

Data Imbalance
Since the weather in Israel is dry most of the time, the datasets are imbalanced toward dry periods. The ratios of wet to dry samples in the different datasets are summarized in Table 2. The wet samples are also imbalanced toward light rainfall, as can be seen in Figure 3a-c.

Rainfall Estimation
In this section, we compare the two methods of estimating the path-averaged rainfall rate. The first method we use is the empirical short links model from [11], which uses the attenuation measurements of a long link in the network as a reference for estimating the model parameters. The second method we use is a data-driven method, based on the two-step network from [12]. The main block of the network is a GRU, which is used to learn long-term dependencies of the RSL measurements.

Short Links Empirical Model
The empirical short links model from [11] is used to correct the observed attenuation of short links. Since the TSL is unknown, Equation (1) cannot be used to calculate the observed attenuation. Instead, we calculate the "resting" level of the RSL, denoted by $z(t)$, by applying a moving median with a centered window of one week to the RSL averaged over 15 min intervals, as was conducted in [9]. The observed attenuation is, therefore, calculated as the difference between the resting level and the RSL:

$$A(t) = z(t) - \mathrm{RSL}(t) \qquad (4)$$

The model has two parameters (unique per link), which are used to correct the short-links-related inaccuracies. In the model, Equation (5), $A^i(t)$ is the observed attenuation, in dB, at time index $t$, of link $i$; $b_i$ is a parameter that compensates for a constant wet antenna attenuation; $W_i$ is a correction factor for short links; and $A^i_r(t)$ is referred to as the rain-induced attenuation of link $i$. The model parameters are estimated by minimizing a cost function, Equation (6), in which $A^i_r$ is a vector of rain attenuation measurements from link $i$ and $A^{long}$ is a vector of the observed attenuation measurements of the long link (link 29 in this case), which is used as a reference. $L_i$ and $L_{long}$ are the lengths of link $i$ and the long link, respectively.
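The calibration described above can be sketched as follows. The exact forms of the model and of the cost function are not reproduced here, so this sketch assumes a plausible form that matches the parameter definitions: $A^i_r(t) = W_i \max(A^i(t) - b_i, 0)$, with a cost equal to the squared difference between the specific attenuations $A^i_r / L_i$ and $A^{long} / L_{long}$. Both the assumed form and the function names `correct_attenuation` and `calibrate_link` are illustrative and should not be read as the formulation of [11].

```python
import numpy as np

def correct_attenuation(a_obs, w, b):
    """Assumed model form: A_r = W * max(A - b, 0)."""
    return w * np.maximum(np.asarray(a_obs, dtype=float) - b, 0.0)

def calibrate_link(a_obs, len_km, a_long, len_long_km, b_grid):
    """Grid-search b; for each b, the least-squares W has a closed form."""
    a_obs = np.asarray(a_obs, dtype=float)
    target = np.asarray(a_long, dtype=float) / len_long_km  # reference, dB/km
    best_cost, best_w, best_b = np.inf, None, None
    for b in b_grid:
        m = np.maximum(a_obs - b, 0.0) / len_km  # uncorrected specific attenuation
        denom = float(m @ m)
        if denom == 0.0:
            continue  # this b removes all attenuation, so W is undefined
        w = float(m @ target) / denom
        cost = float(np.sum((w * m - target) ** 2))
        if cost < best_cost:
            best_cost, best_w, best_b = cost, w, float(b)
    return best_w, best_b

# Synthetic check: a 0.3 km link generated with W = 0.6 and b = 1.0 dB.
spec = np.array([0.0, 1.0, 2.0, 4.0, 0.5])   # specific attenuation, dB/km
a_long = spec * 3.0                           # 3 km reference link
a_short = spec * 0.3 / 0.6 + 1.0              # short-link observations, dB
w_hat, b_hat = calibrate_link(a_short, 0.3, a_long, 3.0,
                              np.arange(0.0, 3.01, 0.25))
```

Under the assumed form, only $b_i$ needs a search, because $W_i$ enters linearly once the offset is fixed. A different model form would call for a general-purpose optimizer instead.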
The wet/dry classification method from [14] is used to detect rainy periods. The method applies a rolling standard deviation (RSD) with a fixed-size window to the RSL measurements. A small window captures more detail about the variability in the signal caused by rain, at the expense of a weaker ability to detect steady rainfall, during which the RSD is low. The input, $\mathrm{RSL}_n$, is classified as wet if the standard deviation at the given time step exceeds a threshold $\sigma$. The threshold is set for each link individually as the 80th percentile of the 30 min rolling standard deviation in the corresponding dataset, multiplied by a scaling factor. The use of a scaling factor was adopted from [15] and was also used in [16]. A scaling factor of 1.2 and a 30 min window size were selected to maximize the F1 score on the VALIDATION dataset.
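The RSD classification rule above can be illustrated with a short sketch. It assumes a trailing window of 60 samples (30 min at the 30 s sampling interval); whether the window in [14] is trailing or centered is not stated here, so that choice, like the function names, is an assumption.

```python
import numpy as np

def rolling_std(x, window):
    """Trailing rolling standard deviation; the first window-1 outputs are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        views = np.lib.stride_tricks.sliding_window_view(x, window)
        out[window - 1:] = views.std(axis=1)
    return out

def rsd_wet_dry(rsl, window=60, percentile=80.0, scale=1.2):
    """Flag samples whose RSD exceeds scale * (percentile of the link's RSD)."""
    rsd = rolling_std(rsl, window)
    threshold = scale * np.nanpercentile(rsd, percentile)
    wet = np.zeros(rsd.shape, dtype=bool)
    valid = ~np.isnan(rsd)
    wet[valid] = rsd[valid] > threshold
    return wet, threshold

# Synthetic link: a 0.1 dB ripple when dry, plus 3 dB swings in samples 400-499.
n = np.arange(1000)
rsl = -50.0 + 0.1 * (-1.0) ** n
rsl[400:500] -= 3.0 * (n[400:500] % 2)
wet, threshold = rsd_wet_dry(rsl)
```

Because the threshold is derived from a percentile of each link's own RSD, it adapts to the noise floor of that link. The drawback noted above remains: a steady attenuation with little variability produces a low RSD and may be missed.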
In the next sections, we use the short links model and the RSD method together for wet/dry classification and rainfall estimation and refer to it as the "SLM-RSD" model, which stands for short links model-rolling standard deviation.

Data-Driven Approach
The input of the network includes both the RSL measurements and the corresponding length and frequency of each link. The RSL measurements are sampled every 30 s, whereas the rain-gauge measurements are sampled every 10 min. A single input to the network at time-step $n$, denoted by $x_n$ and defined in (7), therefore stacks the $w$ normalized RSL measurements that fall within the $n$-th rain-gauge interval, followed by the link's path length $L$ and the link's frequency $F$. Here, $\mathrm{RSL}_n$ denotes a normalized RSL measurement, and $w$ is a fixed window size equal to the ratio between the sampling interval of the rain gauge and the sampling interval of the RSL. In our case, $w$ is set to 20 since the rain gauge is measured every 10 min, whereas the RSL is sampled every 30 s. In this way, $x_n \in \mathbb{R}^{22}$. The RSL measurements of each link are normalized by subtracting the median value of the entire RSL time series of the given link.
The network is trained to map an input sequence of non-overlapping vectors of the form (7) to a sequence of wet/dry estimates of the same length and to the corresponding sequence of path-averaged rainfall rate estimates. The interval between consecutive time-steps matches the rain-gauge sampling interval, i.e., 10 min. An input sequence of $N_s$ time-steps (representing $10 \cdot N_s$ minutes of data) consists of the vectors $x_1, \ldots, x_{N_s}$, and the corresponding rain-gauge measurement sequence is $y_1, \ldots, y_{N_s}$.
The architecture of the combined rain estimation and classification network is depicted in Figure 4. It is based on the two-step architecture from [12]. The network consists of three main blocks: a GRU, which is used to learn long-term dependencies between the RSL samples of the input; a rain head (RH), which converts the GRU output to a rainfall estimate; and a wet/dry (WD) block, which converts the GRU output to a scalar between 0 and 1, denoted by $p_n$, representing the probability that the input sample, $x_n$, is wet.
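The construction of the network inputs can be sketched as follows. The ordering of the features inside each vector and the absence of any scaling of the length and frequency entries are assumptions made for illustration; `build_inputs` is a hypothetical helper name.

```python
import numpy as np

def build_inputs(rsl, length_km, freq_ghz, w=20):
    """Stack w RSL samples per rain-gauge interval with the link metadata."""
    rsl = np.asarray(rsl, dtype=float)
    rsl = rsl - np.median(rsl)             # per-link median normalization
    n_steps = len(rsl) // w                # drop an incomplete trailing window
    windows = rsl[: n_steps * w].reshape(n_steps, w)
    meta = np.tile([length_km, freq_ghz], (n_steps, 1))
    return np.hstack([windows, meta])      # shape (n_steps, w + 2)

# 100 RSL samples at 30 s give five 10 min time-steps of dimension 22.
x = build_inputs(np.arange(100.0), length_km=0.5, freq_ghz=74.375)
```

With $w = 20$, each row holds the 20 RSL samples of one 10 min interval plus two metadata entries, which gives the 22-dimensional input mentioned above.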
The probabilities are compared against a fixed threshold $\tau$: the output is 1 (wet) if $p_n$ exceeds $\tau$ and 0 (dry) otherwise. The value of $\tau$ is selected using a grid search on the VALIDATION dataset. Finally, the wet/dry indicator is multiplied by the RH output to produce the rain estimate, $\hat{y}_n$. The RH consists of fully connected (FC) layers followed by ReLU activation, and the WD block consists of one FC layer followed by sigmoid activation. A skip connection, which concatenates the input, $x_n$, with the GRU output to form the RH input, is added.
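The threshold selection and the gating of the rain estimate can be sketched as below. The grid values, the use of the F1 score as the selection criterion at this step, and the helper names are assumptions for illustration; the probabilities and rain-head outputs are synthetic.

```python
import numpy as np

def f1_score(pred, truth):
    """F1 score of boolean predictions against boolean labels."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def select_threshold(probs, truth, grid):
    """Return the first grid value that maximizes F1 on validation data."""
    scores = [f1_score(probs >= tau, truth) for tau in grid]
    return float(grid[int(np.argmax(scores))])

def gated_estimate(probs, rain_head_out, tau):
    """Multiply the rain-head output by the binary wet/dry indicator."""
    return (probs >= tau).astype(float) * rain_head_out

probs = np.array([0.12, 0.31, 0.64, 0.93])          # synthetic p_n values
truth = np.array([False, False, True, True])        # synthetic wet labels
tau = select_threshold(probs, truth, np.linspace(0.05, 0.95, 19))
y_hat = gated_estimate(probs, np.array([1.0, 2.0, 3.0, 4.0]), tau)
```

Gating by the indicator forces the rain estimate to exactly zero in samples classified as dry, so false positives of the classifier translate directly into spurious rain.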
Four configurations were tested, in which we varied the RH size and enabled or disabled the skip connection. The different configurations are described in Table 3.

Loss Function
The loss function includes a regression term, which is a mean squared error (MSE) loss, and a classification term, which applies focal loss [17] to the wet/dry probabilities. Focal loss is an extension of the binary cross entropy (BCE) loss that reduces the weights assigned to well-classified examples. The loss is defined for an output sequence $\{\hat{y}_{n,i}\}_{n=1}^{N_s}$ of link $i$ and the ground truth measurements $\{y_n\}_{n=1}^{N_s}$, where $N_s$ is the number of samples in the sequence and $N_l$ is the number of links. $\gamma$ and $\alpha$ are the focal loss hyper-parameters, and $\lambda_{FL}$ is a hyper-parameter controlling the balance between the two loss terms.
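A minimal NumPy sketch of such a combined loss is given below. It uses the standard focal loss of [17]; the averaging over samples, the placement of $\lambda_{FL}$ on the focal term, and the rule that a positive gauge reading marks a wet sample are assumptions, not details confirmed by the text.

```python
import numpy as np

def focal_loss(p, wet, gamma=2.0, alpha=0.95, eps=1e-7):
    """Standard binary focal loss, averaged over samples."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    wet = np.asarray(wet, dtype=bool)
    p_t = np.where(wet, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(wet, alpha, 1.0 - alpha)  # class-dependent weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def combined_loss(y_hat, y, p, gamma=2.0, alpha=0.95, lambda_fl=100.0):
    """MSE regression term plus a weighted focal classification term."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    mse = float(np.mean((y_hat - y) ** 2))
    wet = y > 0.0   # assumption: a positive gauge reading marks a wet sample
    return mse + lambda_fl * focal_loss(p, wet, gamma, alpha)

# With gamma = 0 and alpha = 0.5, focal loss reduces to half the BCE loss.
p_demo = np.array([0.9, 0.2])
wet_demo = np.array([True, False])
bce = -(np.log(0.9) + np.log(0.8)) / 2.0
half_bce = focal_loss(p_demo, wet_demo, gamma=0.0, alpha=0.5)
```

The factor $(1 - p_t)^{\gamma}$ shrinks the contribution of confidently correct samples, which matters here because dry samples dominate the data.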

Augmentation
The data were augmented by letting the input sequences start at a random point, drawn anew at the beginning of each epoch. The random starting point is selected uniformly from $[0, N_s - 1]$. This augmentation is applied to the TRAIN dataset only.
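One way to realize this augmentation is sketched below: a random offset is drawn once per epoch, and the time series is then cut into non-overlapping sequences from that offset. The handling of the leftover samples at the end (they are dropped) and the helper name `epoch_sequences` are assumptions.

```python
import numpy as np

def epoch_sequences(x, y, seq_len, rng):
    """Cut (x, y) into non-overlapping sequences from a random start offset."""
    offset = int(rng.integers(0, seq_len))     # uniform on {0, ..., seq_len - 1}
    n_seq = (len(x) - offset) // seq_len       # incomplete tail is dropped
    stop = offset + n_seq * seq_len
    xs = x[offset:stop].reshape(n_seq, seq_len, *x.shape[1:])
    ys = y[offset:stop].reshape(n_seq, seq_len)
    return xs, ys, offset

rng = np.random.default_rng(0)
x_all = np.zeros((500, 22))                    # 500 time-steps of 22 features
y_all = np.arange(500.0)                       # stand-in gauge series
xs, ys, offset = epoch_sequences(x_all, y_all, seq_len=48, rng=rng)
```

Calling the function again at the start of the next epoch yields a different alignment of the sequence boundaries with the rain events.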

Validation Metrics
To evaluate the classification results, we use true positives (TPs), false positives (FPs), true negatives (TNs) and false negatives (FNs) as the basis for other useful classification metrics.
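The first metric we used is the overall accuracy, which is defined as:

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{total number of samples}} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (17)$$

Since the data are imbalanced toward dry samples, and since the more challenging task is identifying wet periods, precision (18) and recall (19) were used to evaluate the model on the positive class. The F1 score (20) is the harmonic mean of the precision and recall and is useful since it summarizes both metrics in one number:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (18)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (19)$$

$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (20)$$

The balanced accuracy is also used for the same reasons; it is defined as the average of the true positive rate and the true negative rate:

$$\text{Balanced Accuracy} = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right)$$

To validate the rainfall estimation results, the normalized bias (NBIAS) and normalized RMSE (NRMSE) are used:

$$\text{NBIAS} = \frac{\frac{1}{N} \sum_{n=1}^{N} \left( \hat{R}_n - R^{rg}_n \right)}{\bar{R}_{rg}}, \qquad \text{NRMSE} = \frac{\sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( \hat{R}_n - R^{rg}_n \right)^2}}{\bar{R}_{rg}}$$

where $\hat{R}_n$ and $R^{rg}_n$ are the estimated rain rate and the rain rate measured by the rain gauge at sample $n$, $\bar{R}_{rg}$ is the average rain rate measured by the rain gauge and $N$ is the number of samples.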
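The metrics of this section can be computed with a few lines of NumPy, as in the sketch below. The convention of returning 0 when a denominator is zero, and the sign of the bias (estimate minus gauge), are assumptions made for illustration.

```python
import numpy as np

def _ratio(num, den):
    """Safe division that returns 0.0 for an empty denominator."""
    return num / den if den > 0 else 0.0

def classification_metrics(pred_wet, true_wet):
    pred = np.asarray(pred_wet, dtype=bool)
    truth = np.asarray(true_wet, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)              # true positive rate
    tnr = _ratio(tn, tn + fp)                 # true negative rate
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": 0.5 * (recall + tnr),
    }

def regression_metrics(r_est, r_gauge):
    r_est = np.asarray(r_est, dtype=float)
    r_gauge = np.asarray(r_gauge, dtype=float)
    mean_gauge = float(np.mean(r_gauge))
    err = r_est - r_gauge                     # positive means overestimation
    return {
        "nbias": float(np.mean(err)) / mean_gauge,
        "nrmse": float(np.sqrt(np.mean(err ** 2))) / mean_gauge,
    }

cls = classification_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
reg = regression_metrics([1.0, 3.0], [2.0, 2.0])
```

In the small example, the accuracy (0.6) is higher than the F1 score (0.5) because the two correctly classified dry samples inflate it, which illustrates why accuracy alone is not informative for imbalanced data.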

Rain Estimation and Classification
A GRU with two layers and a hidden size of 256 neurons was used. The model was trained with sequences of 8 h, which is equivalent to 48 samples at the 10 min rain-gauge interval. The ADAM optimizer [18] was used for training the model with a learning rate of 0.0001. The optimizer and the learning rate were selected empirically, as they provided better results on the VALIDATION dataset. To avoid overfitting the training data, dropout with a probability of 0.5 was applied to the RNN layers. In addition, an L2 regularization term was added to the loss function with a factor of 0.0001. The focal loss parameters were set to $\gamma = 2$, $\alpha = 0.95$ and $\lambda_{FL} = 100$. The model was trained on links 1, 2, 3, 4, 8, 10, 11 and 29.
We compared the performances of the different architectures on the TEST dataset. The results are summarized in Table 4. The threshold that maximized the F1 score on the VALIDATION dataset is presented. The BIAS and RMSE were calculated for samples from the wet class only. The four configurations produce similar results in terms of the classification metrics. All configurations perform better than the SLM-RSD model in terms of RMSE and F1 score. From now on, we continue the results section with the "Large RH + Skip" configuration and refer to it as, simply, "RNN".
We compared the receiver operating characteristic (ROC) curves and their respective areas under the curve (AUC) of the RNN models to those of the SLM-RSD model on the TEST dataset in Figure 5. The blue line represents the ROC curve obtained by the RNN model, while the orange line represents the ROC curve obtained by the RSD method. The dashed line represents the ROC curve of a random classifier. The RNN models show an improvement of around 10% in AUC compared to the RSD method. The different configurations of the RNN models produce very similar ROC curves and yield approximately the same AUC.
The performance of both rainfall estimation methods is presented in Table 5, where each row describes the results for one link.
Inspecting the performance of individual links, the short links model performed slightly better on links 1, 2, 22 and 40, while the RNN model performed better on the rest.
Links 26 and 28 exhibit overestimation using the short links model, which can be explained by a non-optimal $W$ parameter for these specific links.
The performances of the models in terms of the NBIAS and NRMSE for different values of rainfall intensity are presented in Figure 6a,b on the TEST dataset, where the performance was averaged over links 1, 2, 4, 10, 15, 22, 26, 28, 29 and 40. In both figures, the blue bars represent the results for the RNN model, and the orange bars represent the results for the SLM-RSD model.
The RNN model achieves a lower RMSE than the SLM-RSD model for all ranges of rainfall intensity. The largest improvement is achieved at low rainfall intensities, whereas for rain rates larger than 5 mm/h, the RNN method performs only slightly better. Estimating light rainfall is a harder task than estimating strong rainfall since the range of the attenuation values is smaller and closer to the quantization noise of the RSL signal. In this regime, the RNN model outperforms the short links model, as expected.
An example of the combined wet/dry classification and rainfall estimation results on the TEST dataset for link 2 is shown in Figure 7, with the corresponding rain-gauge measurements. The first panel from the top shows the normalized RSL signal obtained from link 2. In the second panel, the blue and red lines represent the rainfall estimates obtained by the RNN model and the SLM-RSD model, respectively. The black line represents the rain-gauge measurements. In the third panel, the wet/dry probability signal, $p_n$, of the RNN model is shown, and in the fourth panel, the RSD signal obtained from the RSL is shown. In both the third and fourth panels, a dashed line is plotted at the threshold level used to determine wet or dry. Moreover, samples that were classified correctly as wet (TP) are marked in green, samples that were classified incorrectly as wet (FP) are marked in yellow, and samples that were classified incorrectly as dry (FN) are marked in red. It can be seen that the RSD method results in a higher number of FP samples compared to the RNN, which produces non-zero rain-rate estimates during dry periods. As can be seen, for events stronger than 5 mm/h, the difference in the RMSE between the two methods is small.

Rainfall Maps Comparison
In this subsection, we construct high-resolution 2D maps of 24 h-accumulated rainfall and compare the results to the rain-gauge-adjusted weather radar maps of the Israeli Meteorological Service (IMS).
We apply inverse distance weighting (IDW) interpolation, as described in [19], to construct the rainfall maps, using the accumulated rainfall estimates from each link in the network. In the IDW formulation, $R(x)$, in mm, is the interpolated rain at target grid point $x$; $\hat{R}_i$, in mm, is the estimated rain at the center of link $i$, $x_i$, after the calibration process; $d_i(x) = \|x - x_i\|_2$, in km, is the distance between the target grid point $x$ and the center of link $i$; $D$, in km, is the radius of influence, i.e., the distance beyond which a link ceases to affect target point $x$; and $N_L$ is the number of links. The choice of the radius of influence should depend on the spatial autocorrelation function of the rain field and on the spatial density of the links. A large radius reduces the dependency on noisy links and results in maps that cover wider areas. However, too large a radius can average out the local variability of the rain field. Here, we empirically set $D = 2$ km and the spacing between the grid points to 300 m to achieve a map that is large enough to cover most of the area of Rehovot, while maintaining the variability in the rain field measured by the dense and short links.
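The interpolation step can be sketched as follows. The weight function used here (inverse squared distance, set to zero beyond the radius of influence $D$) is a common IDW choice and is an assumption; the weight function of [19] may differ. Grid points with no link within $D$ are left undefined (NaN).

```python
import numpy as np

def idw_map(link_centers_km, link_rain_mm, grid_points_km,
            radius_km=2.0, power=2.0, eps=1e-6):
    """Inverse distance weighting with a hard radius of influence."""
    centers = np.asarray(link_centers_km, dtype=float)   # shape (N_L, 2)
    rain = np.asarray(link_rain_mm, dtype=float)         # shape (N_L,)
    grid = np.asarray(grid_points_km, dtype=float)       # shape (G, 2)
    # Euclidean distance between every grid point and every link center.
    dist = np.linalg.norm(grid[:, None, :] - centers[None, :, :], axis=2)
    weights = np.where(dist < radius_km,
                       1.0 / np.maximum(dist, eps) ** power, 0.0)
    total = weights.sum(axis=1)
    out = np.full(len(grid), np.nan)
    covered = total > 0                                  # at least one link within D
    out[covered] = (weights[covered] @ rain) / total[covered]
    return out

centers = [[0.0, 0.0], [1.0, 0.0]]           # link centers, km
rain = [10.0, 20.0]                          # accumulated rain per link, mm
grid = [[0.5, 0.0], [0.0, 0.0], [10.0, 10.0]]
rain_map = idw_map(centers, rain, grid)
```

A regular grid with the 300 m spacing mentioned above could be produced with `np.meshgrid` over coordinates expressed in km and then flattened into the `(G, 2)` array expected by the function.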
The 24 h-accumulated rainfall maps obtained from both methods are compared to the IMS-adjusted weather radar in Figure 8, where a single color bar is used across all events. Figure 8a-c are examples of different rain events from 2021, where the first is from a medium rainfall event, starting at 08:00 on 14 January; the second is from a light rainfall event, starting at 08:00 on 18 January; and the third is from a strong rainfall event, starting at 08:00 on 19 January. The first image from the left contains the IMS-adjusted radar map in a zoomed-out view of Rehovot and its surroundings. The second image is a zoomed-in view of the radar map of Rehovot's area. The third image is the estimated rainfall map obtained by the SLM-RSD model, and the fourth image is the estimated rainfall map obtained by the RNN model.
In most cases, the estimated rain maps show an agreement with the average rainfall depth of the adjusted radar, indicating that the method is able to distinguish between light and strong rainfalls.
The estimated maps were compared with the adjusted radar maps using the BIAS and RMSE. The metrics are calculated over the locations where the estimated values are available, i.e., up to a 2 km distance from the center of the closest link. Table 6 summarizes the comparison results for the 24 h-accumulated rain maps. Table 6 indicates that the bias is negative for strong rainfall events accumulated over 24 h, in accordance with Figure 6a, which shows that the rainfall estimates from the links underestimate the rain rate measured by the rain gauge at high rain intensities. In both cases, the RMSE is higher for stronger rainfall events, as expected. A low spatial correlation between the estimated and the adjusted radar maps is obtained in both cases. The high variability in the estimated rain map can be caused by the fact that the links measure the rain at ground level, whereas the radar measures the rain at a higher altitude. Errors due to the quantization of the RSL, baseline determination and wet/dry classification also contribute to the differences between the maps.

Conclusions
In this paper, we conducted a comparison between two methods of rain estimation from RSL measurements of an existing smart-city network operating at E-band frequencies. The first method is based on a calibration process, where every link is assigned two parameters that are estimated from attenuation measurements of a reference link in the network. The second method is based on an RNN model, which we trained on data from a set of links to detect wet and dry periods and to estimate the rainfall intensity, with rain-gauge measurements serving as the ground truth.
When looking at the rain estimation results for a given link, the RNN-based network provided better results in terms of RMSE compared to the short links model, but the improvement itself was less than 20% in average performance, and the largest difference was achieved at lower rainfall intensities. This indicates that a simple (and almost) linear model can be used for medium to strong rainfalls.
A comparison of the estimated rain maps with the IMS radar maps when accumulated over 24 h resulted in low spatial correlation for both methods. However, the estimated maps show an agreement with the average intensity of the radar, and the two methods are able to distinguish between light and strong rainfalls, as well as produce maps with higher spatial resolution (with respect to the radar). Differences between the estimated maps and the radar maps can arise from changes in the rainfall intensity at different altitudes. The links measure the rainfall at street level, whereas the radar measures the rain at much higher altitude. In addition, the quantization of the RSL and errors in the baseline determination and the wet/dry classification also contribute to the differences between the maps.
Filtering out problematic links is a crucial step before constructing the rainfall maps. Noisy links with large fluctuations in the RSL can yield overestimation. The overestimation is prominent when the rainfall is accumulated for long periods, and this is directly related to the performance of the wet/dry classifier. The change in the RSL properties of some links between different years resulted in poor performance of both the short links model and the RNN. Therefore, a more frequent estimation of the model parameters should be performed.