Advection-Free Convolutional Neural Network for Convective Rainfall Nowcasting

Nowcasts (i.e., short-term forecasts from 5 min to 6 h) of heavy rainfall are important for applications such as flash flood predictions. However, current precipitation nowcasting methods based on the extrapolation of radar echoes have a limited ability to predict the growth and decay of rainfall. While deep learning applications have recently shown improvement compared to extrapolation-based methods, they still struggle to correctly nowcast small-scale high-intensity rainfall. To address this issue, we present a novel model called the Lagrangian convolutional neural network (L-CNN) that separates the growth and decay of rainfall from motion using the advection equation. In the model, differences between consecutive rain rate fields in Lagrangian coordinates are fed into a U-Net-based CNN, known as RainNet, that was trained with the root-mean-squared-error loss function. This results in a better representation of rainfall temporal evolution compared to the RainNet and the extrapolation-based LINDA model that were used as reference models. On Finnish weather radar data, the L-CNN underestimates rainfall less than RainNet, demonstrated by a greater POD (29% at 30 min at the 1 mm·h⁻¹ threshold) and a smaller bias (98% at 15 min). The increased ETS values over LINDA for leadtimes under 15 min, with maximum increases of 7% (5 mm·h⁻¹ threshold) and 10% (10 mm·h⁻¹), show that the L-CNN represents the growth and decay of heavy rainfall more accurately than LINDA. This implies that nowcasting of heavy rainfall is improved when growth and decay are predicted using a deep learning model.

While the early work on nowcasting was focused on extrapolation of radar echoes [1], the scope of nowcasting has expanded to include numerical weather prediction (NWP) models [4]. Despite recent advances in convection-permitting and rapid update cycle NWP models [5], [6], their ability to produce nowcasts at small spatial and temporal scales is still outperformed by extrapolation nowcasts in the first few hours [7], [8]. This is due to the NWP models' limited ability to resolve the evolution of the atmosphere at such scales, as well as difficulties in assimilating observations to represent the initial conditions [5].
While outperforming NWP models for very short lead times, the skill of extrapolation nowcasts declines rapidly during the first hour [9], [10], [11], [12], [13]. Nowcasting rapidly evolving convective rainfall is a particularly challenging task for such methods. The Lagrangian persistence assumption [14] used in extrapolation nowcasting methods makes them unable to predict the growth or decay of rainfall, resulting in a prediction only for rainfall motion. Several approaches have been proposed to overcome this limitation. These include separation and filtering of small scales with low predictability [15], [16], [17], as well as the use of autoregressive [18] or integro-difference equation (IDE) models, such as in the Lagrangian integro-difference equation model with autoregression (LINDA; [19]), to predict the growth and decay of rainfall. These models are applied in the Lagrangian coordinates. Another active research area is adding stochastic perturbations to reproduce the small-scale variability lost in the scale filtering and to estimate the forecast uncertainties to produce probabilistic nowcasts [4], [20].
Despite the recent advances in extrapolation and NWP-based nowcasting, these approaches are still limited in their ability to produce reliable high-resolution nowcasts for longer leadtimes. Deep learning techniques, such as convolutional neural networks (CNNs), have recently emerged as a viable option for filling this gap. In addition to predicting the motion, deep learning techniques also allow predicting the growth and decay of rainfall. A commonly used technique is to add long short-term memory (LSTM) or gated recurrent unit (GRU) layers [21], [22], [23], [24], [25], [26], [27] to the CNN to account for the temporal structure of rainfall. More recently, encoding time as an additional channel dimension in the CNN rather than LSTM or GRU layers has been shown to be a promising approach [28], [29], [30], [31]. Another important area of research is the development of U-Net-based architectures [28], [32], [33], capable of capturing the scale dependence of predictability [14]. In analogy with adding stochastic perturbations to extrapolation nowcasts, generative models and adversarial training (GANs) have recently been proposed to produce nowcasts with small-scale variability [23], [34], [35], [36].
One of the main limitations of the CNN-based approaches is that they are entirely data driven. That is, they do not take into account any prior physical knowledge of the prediction problem. To address this gap, we propose the Lagrangian CNN (L-CNN) model that attempts to separate horizontal advection from the growth and decay of rainfall. This is a standard assumption in models based on Lagrangian persistence [14], but it has not been previously applied in the machine learning context. The key idea of the L-CNN is to transform the input data to Lagrangian coordinates and use the CNN to model the time derivative of the rainfall fields. This transformation is implemented with standard optical flow and semi-Lagrangian advection techniques [9], [14]. We use the U-Net-based RainNet architecture [28] as the CNN and train it with a root-mean-squared-error (RMSE) loss function. We show that using a separate model for the advection improves the representation of the rainfall growth and decay in the CNN, resulting in a significantly improved forecast skill compared to the RainNet, as well as improvement at short leadtimes and small spatial scales compared to the LINDA model, which represents the state of the art in extrapolation-based nowcasting.
This article is organized as follows. Section II describes the methodology. The data sources, experiments, and analyses of the results are described in Section III. Finally, Section IV concludes this article.

II. METHODOLOGY
The L-CNN, presented in Fig. 1, consists of transforming the input data to Lagrangian coordinates (steps 1 and 2) and differencing consecutive fields to estimate the time derivative (step 3), applying the RainNet CNN iteratively to produce predictions of the differenced rain rates (step 4), and transforming the predictions to rain rates in Eulerian coordinates (steps 5 and 6). Section II-A presents the theoretical background for the model, including steps 3 and 5. Section II-B describes in detail steps 1, 2, and 6, while step 4 is described in Section II-C. The verification metrics used to evaluate the model's performance are described in Section II-D.

A. Advection Equation and Lagrangian Framework
The temporal evolution of a 2-D rainfall field ψ can be described using the advection equation

∂ψ/∂t + ∇ · (vψ) = S (1)

where v denotes the horizontal velocity field, and S is a source-sink term representing the growth and decay of rainfall. Assuming that the velocity field is divergence free (i.e., ∇ · v = 0), the equation can further be written as

∂ψ/∂t + v · ∇ψ = S. (2)

The traditional approach to rainfall nowcasting is to assume S = 0. That is, the rainfall remains invariant under advection, which is called Lagrangian persistence [14]. Commonly, this paradigm is applied by extrapolating the observed rainfall field with a semi-Lagrangian advection scheme [14], [37]. The advection equation (2) applies to a rainfall field in an Eulerian coordinate system. We can transform the rainfall field to a Lagrangian coordinate system, denoted here as ψ̃, in which case v · ∇ψ̃ = 0, and the evolution of the field is solely due to the growth and decay of rainfall

∂ψ̃/∂t = S. (3)

Thus, we can estimate the source-sink term S with the time derivative of the rainfall field in Lagrangian coordinates.
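The separation above can be illustrated numerically. The following sketch (not part of the L-CNN implementation) assumes a constant velocity with an integer-pixel displacement per time step, so the advection operator reduces to `np.roll`; under Lagrangian persistence (S = 0) the Eulerian field changes between time steps while the field in Lagrangian coordinates does not:

```python
import numpy as np

# Toy illustration of Lagrangian persistence (S = 0), assuming a constant
# integer-pixel velocity so advection reduces to np.roll. With no growth or
# decay, the field in Lagrangian coordinates is constant between time steps.
psi0 = np.zeros((8, 8))
psi0[2:4, 2:4] = 5.0          # a small "rain cell" (mm/h)
v = (0, 1)                    # one pixel per step to the right

# Eulerian view: the field at t = 1 is the advected field from t = 0.
psi1 = np.roll(psi0, shift=v, axis=(0, 1))

# Lagrangian view: advect psi1 back to the reference time t0 = 0.
psi1_lagr = np.roll(psi1, shift=(-v[0], -v[1]), axis=(0, 1))

# The source-sink term estimated by differencing is zero everywhere.
S_est = psi1_lagr - psi0
print(np.abs(S_est).max())    # 0.0
```

With nonzero growth or decay, the same differencing would recover S instead of zero, which is exactly the quantity the CNN is trained to predict.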
Among the state-of-the-art extrapolation-based methods, S-PROG [15], STEPS [4], [9], ANVIL [18], and LINDA [19] follow a procedure similar to the one introduced above and shown in Fig. 1. However, instead of a CNN (step 4 in Fig. 1), they apply an autoregressive model to the rainfall field or its time derivative in the Lagrangian coordinates. Similarly, S can be estimated in two ways with a CNN. The first method is to pass a time series of rainfall fields ψ̃_1, ψ̃_2, . . . , ψ̃_n to the network and use it directly to predict the next field ψ̃_{n+1}.
The second method is to first estimate the time derivative of the rainfall field in (3) with the difference of consecutive fields

Δψ̃_i = ψ̃_i − ψ̃_{i−1}. (4)

This is step 3 in Fig. 1. In this approach, only the fields Δψ̃_1, Δψ̃_2, . . . , Δψ̃_n are passed to the neural network, which then predicts the change Δψ̃_{n+1} from the latest field. Note that this requires one more observed rainfall field as input than the first approach mentioned previously. The next predicted field is then obtained as ψ̃_{n+1} = ψ̃_n + Δψ̃_{n+1}. The field ψ̃_{n+1} needs to be transformed to Eulerian coordinates to obtain the actual nowcast ψ_{n+1}. A preliminary analysis indicated that the second method performs significantly better than the first one, so only the second approach is studied here. We denote this model as the L-CNN.
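The differencing and summation steps can be sketched as follows. This is an illustrative outline, with `predict_diff` standing in for the trained CNN component; the names are hypothetical:

```python
import numpy as np

# Sketch of step 3 (temporal differencing) and the inverse summation,
# assuming the fields are already in Lagrangian coordinates. `predict_diff`
# is a hypothetical stand-in for the CNN component.
def to_differences(fields):
    """n+1 Lagrangian fields -> n differenced fields."""
    return [b - a for a, b in zip(fields[:-1], fields[1:])]

def next_field(fields, predict_diff):
    """Predict the next Lagrangian rainfall field from its differences."""
    diffs = to_differences(fields)
    delta = predict_diff(diffs)        # CNN predicts the next difference
    return fields[-1] + delta          # psi_{n+1} = psi_n + delta_{n+1}

fields = [np.full((4, 4), float(t)) for t in range(5)]  # 5 inputs -> 4 diffs
# A trivial "model" that persists the latest difference.
pred = next_field(fields, lambda diffs: diffs[-1])
print(pred[0, 0])  # 5.0
```

Note how five input fields yield the four differenced fields mentioned in Section II-B.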

B. Lagrangian Transform
To transform a time series of rainfall fields ψ_1, . . . , ψ_n to the Lagrangian coordinates (step 2 in Fig. 1), we first define an advection operator A_t that extrapolates a 2-D field along a velocity field t time steps forward if t > 0 or backward if t < 0. Second, we select a reference time t_0 to which all other rainfall fields are extrapolated. For this study, it corresponds to the last observed time after which the nowcasts are created, i.e., t_0 = n. The transformation is then written as

ψ̃_i = A_{n−i}(ψ_i), i = 1, . . . , n. (5)

In practice, the advection operator was implemented with the pySTEPS Python library [9] using a backward interpolate-once semi-Lagrangian extrapolation scheme [14]. The length of the time series n should be chosen according to the number of input fields required by the CNN component. In this study, we select n = 5 in order to obtain four input fields to be fed to the CNN (see Sections II-C and III-C) after discretizing the time derivative. Using the advection operator A_t requires an estimate of the horizontal motion field v. The motion field is estimated by applying an optical flow method to the input time series of rainfall fields (step 1 in Fig. 1). We calculate the motion field with the dense Lucas-Kanade algorithm [38], [39] implemented in the pySTEPS library [9] using the four latest rainfall fields as input.
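A minimal sketch of this transform is given below. It assumes a constant velocity field with an integer pixel displacement per time step, so that the advection operator A_t reduces to `np.roll`; the actual implementation uses Lucas-Kanade optical flow and pySTEPS' semi-Lagrangian extrapolation instead:

```python
import numpy as np

# Minimal sketch of the Lagrangian transform (steps 1-2), assuming a
# constant velocity with integer pixel displacement per time step so that
# the advection operator A_t reduces to np.roll.
def advect(field, v, t):
    """Apply A_t: extrapolate `field` t steps along velocity v = (dy, dx)."""
    return np.roll(field, shift=(t * v[0], t * v[1]), axis=(0, 1))

def to_lagrangian(fields, v):
    """Extrapolate each field to the reference time t0 = n (the last one)."""
    n = len(fields)
    return [advect(f, v, n - 1 - i) for i, f in enumerate(fields)]

# A cell moving one pixel right per step: in Lagrangian coordinates all
# transformed fields line up with the last observation.
v = (0, 1)
base = np.zeros((6, 6)); base[2, 1] = 8.0
fields = [advect(base, v, t) for t in range(5)]   # observations at t = 0..4
lagr = to_lagrangian(fields, v)
print(all(np.array_equal(f, lagr[-1]) for f in lagr))  # True
```

After this transform, the remaining temporal evolution between the aligned fields is due only to growth and decay, as in (3).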
After applying the CNN and summing the predicted differenced fields as described in Section II-A, the time series ψ̃_{n+1}, . . . , ψ̃_{n+m} of predicted rainfall fields in Lagrangian coordinates needs to be transformed back to Eulerian coordinates (step 6 in Fig. 1). This is done by applying the inverse transformation of (5). If the nowcast time step is denoted by t, the transformation is written as

ψ_{n+t} = A_t(ψ̃_{n+t}), t = 1, . . . , m (6)

where ψ̃_{n+t} denotes the forecast field in the Lagrangian coordinates, and m is the number of time steps to forecast.

C. Convolutional Neural Network (CNN)
The Lagrangian-transformed differenced rainfall fields are fed to a CNN. In this work, we use the RainNet [40], which is a deep CNN based on the U-Net architecture [41]. The network is designed to represent the evolution of rainfall at different spatial scales. Our RainNet implementation follows the original presentation [40], except that our input images are 512 × 512 pixels in size, resulting in 32 × 32 pixel feature maps at the deepest level of the network (compared to 928 × 928 and 58 × 58, respectively, as in the original work).
The RainNet is applied iteratively to produce the predictions, similarly to the original work [40]. The procedure is described in Fig. 2. For the first nowcast time step, the prediction Δψ̃_{t+1} is produced from the time series Δψ̃_{t−n+1}, . . . , Δψ̃_t consisting of n fields. For the next network iteration to produce the nowcast at leadtime t + 2, the oldest field in the input time series is removed, and the prediction Δψ̃_{t+1} is appended to the time series, i.e., the input time series is now Δψ̃_{t−n+2}, . . . , Δψ̃_{t+1}. This procedure is then repeated for all desired nowcast leadtimes.
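The sliding-window iteration in Fig. 2 can be sketched as below, with `model` a hypothetical stand-in for RainNet mapping a window of n differenced fields to the next differenced field:

```python
import numpy as np

# Sketch of the iterative nowcasting loop in Fig. 2, assuming `model` maps a
# window of n differenced fields to the next differenced field (a stand-in
# for RainNet). The oldest input is dropped and the newest prediction
# appended, sliding the window forward one leadtime per iteration.
def iterative_nowcast(diff_fields, model, m):
    window = list(diff_fields)          # n most recent differenced fields
    preds = []
    for _ in range(m):
        delta = model(window)
        preds.append(delta)
        window = window[1:] + [delta]   # slide the input window
    return preds

# Trivial stand-in model: average of the current window.
model = lambda w: sum(w) / len(w)
diffs = [np.full((2, 2), float(k)) for k in (1, 2, 3, 4)]
preds = iterative_nowcast(diffs, model, m=2)
print(len(preds), preds[0][0, 0])  # 2 2.5
```

Because each prediction is fed back as input, errors can accumulate over leadtime, which is one reason the verification in Section III focuses on short leadtimes.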
To train the model, the CNN is first applied to the (Lagrangian-transformed, differenced) input time series to iteratively produce a time series of forecast (differenced) rainfall fields. Then, for a single pair of an observation field ψ_j and a prediction field ψ̂_j, we define the RMSE loss function

L_RMSE(ψ_j, ψ̂_j) = √( (1/N) Σ_{i=1}^{N} (ψ_j[i] − ψ̂_j[i])² ) (7)

where ψ_j[i] and ψ̂_j[i] are the i:th pixel of the j:th observation and prediction fields, respectively, and N is the number of pixels per field. The total loss function value is calculated by accumulating over the time series of m (differenced) observation and prediction field pairs and is defined as

L = Σ_{j=1}^{m} L_RMSE(ψ_j, ψ̂_j) (8)

where ψ_j and ψ̂_j are the jth fields of the observation and prediction time series, respectively. The accumulated loss function gradients are then backpropagated through the CNN to optimize the CNN parameters. The updated parameters are used in the next iteration step, and so on. This iteration is repeated until the loss function value, calculated from a separate validation dataset, converges, or until the full training dataset has been iterated through sufficiently many times. The termination criteria and training configuration are described in detail in Section III-D.
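The accumulated loss (8) can be written compactly as follows. This NumPy sketch omits gradient tracking; the actual training code uses PyTorch tensors so the loss can be backpropagated:

```python
import numpy as np

# NumPy sketch of the accumulated RMSE loss (7)-(8), assuming observation
# and prediction time series of m fields each. The real training code uses
# PyTorch tensors so gradients can be backpropagated.
def rmse(obs, pred):
    """Per-pair RMSE over the N pixels of one field, as in (7)."""
    return np.sqrt(np.mean((obs - pred) ** 2))

def total_loss(obs_series, pred_series):
    """Accumulate the per-field RMSE over the m field pairs, as in (8)."""
    return sum(rmse(o, p) for o, p in zip(obs_series, pred_series))

obs = [np.zeros((3, 3)) for _ in range(2)]
pred = [np.full((3, 3), 1.0), np.full((3, 3), 2.0)]
print(total_loss(obs, pred))  # 3.0
```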

D. Verification Metrics
The performance of the models is evaluated using the probability of detection (POD), false alarm ratio (FAR), equitable threat score (ETS), fractions skill score (FSS), mean absolute error (MAE), and the mean error (ME), also commonly known as bias, as verification metrics.
To calculate the POD, FAR, ETS, and FSS scores, the observations R and nowcasts R̂ are first transformed to binary fields I and Î using different threshold values R_thr, with pixels set to 1 where the rain rate equals or exceeds R_thr and to 0 elsewhere. From these fields, the metrics are calculated based on the number of hits H, misses M, false alarms F, and correct negatives O, as defined in Table I.

The probability of detection (POD) describes the fraction of observed pixels that were correctly predicted and is defined as

POD = H / (H + M).

In contrast to the POD, the false alarm ratio (FAR) describes the fraction of predicted pixels that did not occur and is defined as

FAR = F / (F + H).

The equitable threat score (ETS) [42] measures how well the rainfall is predicted while accounting for hits due to random chance. The ETS is defined as

ETS = (H − H_r) / (H + M + F − H_r)

where

H_r = (H + M)(H + F) / (H + M + F + O)

is a term that describes the contribution of hits due to random chance. The POD has values from 0 to 1, with 1 representing a perfect forecast, while the FAR has values from 0 to 1, with 0 representing a perfect forecast. The ETS obtains values from −1/3 to 1, with negative values indicating worse forecast skill than random chance, 0 indicating forecast skill similar to random chance, and 1 a perfect forecast.

The aforementioned metrics are calculated pixel-wise for a single spatial scale, and thus penalize even small displacement errors. Such penalization is not desired in some applications, as a forecast with only a small displacement error can still be useful. Therefore, the performance of the models at different spatial scales is evaluated with the FSS [43], defined as

FSS = 1 − [ Σ_i (Î_s[i] − I_s[i])² ] / [ Σ_i Î_s[i]² + Σ_i I_s[i]² ]

where I_s and Î_s are the binary observation and forecast fields of size n_y × n_x, spatially averaged with uniform convolution kernels of size s × s. The FSS obtains values from 0 to 1, with 1 representing a perfect forecast.

Finally, the mean absolute error (MAE) is defined as

MAE = (1/n) Σ_i |R̂[i] − R[i]|

and the mean error (ME) as

ME = (1/n) Σ_i (R̂[i] − R[i])

where n is the number of pixels in the image.
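The categorical scores above can be sketched directly from the contingency counts of Table I. This is a minimal illustration, not the verification code used in the study:

```python
import numpy as np

# Sketch of the categorical scores, assuming rain-rate fields thresholded at
# R_thr into binary hit/miss/false-alarm/correct-negative counts (Table I).
def contingency(obs, fcst, thr):
    o, f = obs >= thr, fcst >= thr
    H = np.sum(o & f)          # hits
    M = np.sum(o & ~f)         # misses
    F = np.sum(~o & f)         # false alarms
    O = np.sum(~o & ~f)        # correct negatives
    return H, M, F, O

def pod(H, M, F, O):
    return H / (H + M)

def far(H, M, F, O):
    return F / (F + H)

def ets(H, M, F, O):
    Hr = (H + M) * (H + F) / (H + M + F + O)   # hits by random chance
    return (H - Hr) / (H + M + F - Hr)

obs = np.array([[2.0, 0.0], [5.0, 0.0]])
fcst = np.array([[2.0, 3.0], [0.0, 0.0]])
c = contingency(obs, fcst, thr=1.0)   # H=1, M=1, F=1, O=1
print(pod(*c), far(*c), ets(*c))      # 0.5 0.5 0.0
```

The FSS additionally requires the binary fields to be box-averaged at scale s before computing the score, which penalizes displacement errors smaller than s much less than the pixel-wise metrics.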

A. Input Data
The input data were retrieved from the Finnish Meteorological Institute (FMI) radar network [44]. The network consists of 11 polarimetric Doppler radars operating at C-band [see Fig. 3(a)]. The range and azimuthal resolutions of the radars are 500 m and 1°, respectively, and the radar reflectivity composites [see Fig. 3(b)] are constructed from the two lowest elevation angles.
The radar reflectivity composites are constructed from the measured data by transforming them into a Cartesian grid of 1 km by 1 km resolution. The radar data are measured every 5 min, and the composites are produced with the same frequency. After data quality control, the composite is constructed by applying a weighted-average rule over the data from the radars involved, based on the altitude, sector, range from radar, and quality of the measurement itself [44]. This helps to reduce the presence of noise and nonmeteorological echoes in the final reflectivity composite. The radar reflectivity values in the composites are stored with 0.5-dBZ resolution.
The radar reflectivity (Z) values were then transformed to rain rate R using the relation [45]

Z = 223 R^{1.53} (20)

where the radar reflectivity Z is in linear units of millimeters to the sixth power per cubic meter (mm⁶·m⁻³), and the rain rate R is in units of millimeters per hour. To reduce nonprecipitating echoes, e.g., from biological scatterers, we further removed any pixels with R < 0.1 mm·h⁻¹. The original composite products from FMI have a size of 760 by 1226 pixels (with 1 km × 1 km resolution) in the horizontal and vertical dimensions, respectively [see Fig. 3(b)]. The Lagrangian transform described in Section II-B was applied to the entire composite. However, since RainNet requires the input image size to be a multiple of 2^{n+1} [40], where n is the number of down-sampling layers in the U-Net architecture (i.e., n = 4), we further clipped the size of the composites to 512 by 512 pixels, as shown in Fig. 3(c). The clipped composite consists of an area fully covered by the radars and reduces the impact of edge effects in the optical flow. Applying the model to the full composite is left as a topic for future study.
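Inverting (20) gives the rain rate from reflectivity in dBZ. The following sketch shows the conversion together with the 0.1 mm·h⁻¹ cutoff; the function name is our own:

```python
import numpy as np

# Sketch of the reflectivity-to-rain-rate conversion, inverting
# Z = 223 * R^1.53 with Z in mm^6 m^-3 and R in mm/h. Reflectivity in dBZ is
# first converted to linear units, and echoes below 0.1 mm/h are removed.
def dbz_to_rain_rate(dbz, a=223.0, b=1.53, min_rate=0.1):
    z_linear = 10.0 ** (dbz / 10.0)            # dBZ -> mm^6 m^-3
    rate = (z_linear / a) ** (1.0 / b)         # invert Z = a * R^b
    rate[rate < min_rate] = 0.0                # drop nonprecipitating echoes
    return rate

dbz = np.array([35.0, 10.0])
print(np.round(dbz_to_rain_rate(dbz), 2))     # [5.66 0.13]
```

Note that the 35-dBZ convective threshold used in Section III-B corresponds to roughly 5.7 mm·h⁻¹ under this relation.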

B. Datasets
The input data used for training and verifying the model performance are selected as follows. First, we sort all days from May through September in 2019 to 2021 in decreasing order according to the number of pixels inside the study area during the day with radar reflectivity above 35 dBZ, which has previously been used as a threshold for convective rainfall [46], [47]. The first 100 days are selected. Then, the chosen 100 days are divided into training, validation, and test datasets. For this, each day is divided into 6-h blocks. Any blocks with missing data or images with less than 1% of pixels over 20 dBZ inside the study area are removed. Then, the blocks are randomly assigned to the training, validation, and test datasets in a ratio of 6:1:1.
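The block-wise split can be sketched as follows. The block identifiers and the exact assignment mechanics here are hypothetical placeholders; only the 6-h blocking and the 6:1:1 ratio come from the text:

```python
import random

# Sketch of the block-wise dataset split, assuming each selected day is cut
# into 6-h blocks and the surviving blocks are shuffled into training,
# validation, and test sets in a 6:1:1 ratio. Block identifiers are
# hypothetical placeholders.
def split_blocks(blocks, ratio=(6, 1, 1), seed=0):
    rng = random.Random(seed)
    blocks = list(blocks)
    rng.shuffle(blocks)
    total = sum(ratio)
    n_train = len(blocks) * ratio[0] // total
    n_val = len(blocks) * ratio[1] // total
    train = blocks[:n_train]
    val = blocks[n_train:n_train + n_val]
    test = blocks[n_train + n_val:]
    return train, val, test

blocks = [f"day{d:03d}-block{b}" for d in range(100) for b in range(4)]
train, val, test = split_blocks(blocks)
print(len(train), len(val), len(test))  # 300 50 50
```

Splitting at the block level rather than the sample level keeps temporally adjacent (and thus highly correlated) time series out of different datasets.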
The input time series are then selected from the blocks using a moving window with the desired window size. Note that this results in a different number of input time series for the L-CNN model, as it requires five input fields, compared to the four input fields required by the RainNet model used as a reference (see Section III-C). The number of input time series in each dataset is listed in Table II. The distributions of rain rate values for each dataset are shown in Fig. 4. The figure indicates that the training, validation, and test datasets have very similar value distributions. Even though the distributions are centered around low rain rates, there is also a significant amount of high rain rate values in the dataset, which is important for nowcasting high-intensity convective rainfall.

C. Reference Models
For verification purposes, three reference models were used: RainNet [40], LINDA [19], and the extrapolation nowcast [9]. Comparing the L-CNN model to RainNet allows studying the benefit of using the CNN only to resolve the growth and decay of rainfall, without the advection. LINDA has an approach similar to the L-CNN, but instead of a CNN model, it employs an integro-difference equation to model the growth and decay of rainfall. Thus, a comparison between the L-CNN and LINDA allows assessing the impact of this change in the model. Finally, the extrapolation nowcast was selected as it is commonly used as a reference model when studying the performance of new nowcasting methods. Additionally, if the CNN component in the L-CNN were to produce predictions consisting only of zero values, the nowcast by the L-CNN would be identical to the extrapolation nowcast. Therefore, to be useful, the L-CNN should outperform the extrapolation nowcast.
The architecture of the RainNet model was as in the original work [40], except for the input field size (see Section II-C for details), and the same log-transformation was applied to the rain rates. Compared to the RainNet component in the L-CNN, the RainNet was trained similarly with the accumulated loss (8), but instead of the L_RMSE loss, the hyperbolic cosine loss L_Logcosh was used, as in the original work:

L_Logcosh(ψ, ψ̂) = (1/N) Σ_{i=1}^{N} log(cosh(ψ̂[i] − ψ[i])) (22)

where cosh denotes the hyperbolic cosine function, ψ[i] and ψ̂[i] are the i:th pixels of the observation and prediction, respectively, and N is the number of pixels per image. The RainNet uses four input fields.

LINDA is a nowcasting method capable of good predictive skill for heavy localized rainfall. The nowcasts were produced by applying the algorithm over the clipped composite without feature detection [19], since the difference in forecast skill between the localized (i.e., with feature detection) and nonlocalized versions is very small, and the nonlocalized version is significantly faster to compute for large domains.
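The hyperbolic cosine loss used for the reference RainNet can be sketched in NumPy as follows (the training code itself operates on PyTorch tensors):

```python
import numpy as np

# NumPy sketch of the hyperbolic cosine (log-cosh) loss used to train the
# reference RainNet, averaged over the N pixels of one field. For small
# errors it behaves like a squared error; for large errors, like MAE.
def logcosh_loss(obs, pred):
    return np.mean(np.log(np.cosh(pred - obs)))

obs = np.zeros((2, 2))
pred = np.zeros((2, 2))
print(logcosh_loss(obs, pred))  # 0.0
```

The near-MAE behavior for large errors is why RainNet attains a low MAE in Fig. 7(a) despite its blurring.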
The extrapolation nowcast was implemented with the pySTEPS Python package [9] using the semi-Lagrangian extrapolation scheme [14] with cubic interpolation. For both the LINDA and extrapolation nowcast models, the motion fields were the same as those used for the Lagrangian transform in the L-CNN model, i.e., the motion fields were computed with the Lucas-Kanade optical flow [38], [39] from four input fields using the entire composite.

D. Training
The training of the L-CNN and RainNet was performed by minimizing the total loss function (8) in the training dataset. Note that for the RainNet, instead of the RMSE loss function L RMSE , the hyperbolic cosine loss function L Logcosh (22) was used. The training was performed using nowcasts for m = 6 time steps, i.e., over 30 min. The minimization was performed with the Adam optimizer [49], using the default parameters from the PyTorch library [50]. The optimization was performed using a batch size of 1 (for L-CNN) or 2 (for RainNet).
In the aforementioned training procedure, learning rate was initially set to 10 −4 . Additionally, if the validation loss (i.e., the average loss function value calculated from the validation dataset) had not improved for three epochs, a learning rate scheduler adjusted the rate by multiplying it with 0.1. The training was stopped, if the validation loss had not improved in five epochs. The model checkpoint saved during the epoch with the lowest validation loss value was selected as the final model.
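The learning-rate decay and early-stopping logic above can be sketched in pure Python. This is an illustrative simplification (in PyTorch it maps to `ReduceLROnPlateau` plus an early-stopping callback, whose internal bookkeeping differs slightly):

```python
# Pure-Python sketch of the training-control logic: the learning rate is
# multiplied by 0.1 after 3 epochs without validation-loss improvement, and
# training stops after 5 epochs without improvement.
def run_schedule(val_losses, lr=1e-4, patience_lr=3, patience_stop=5):
    best, since_best = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best % patience_lr == 0:
                lr *= 0.1                      # learning-rate decay
            if since_best >= patience_stop:
                return lr, epoch               # early stopping
    return lr, len(val_losses) - 1

# Loss improves once, then stalls for five epochs -> stop at epoch 6.
lr, stop_epoch = run_schedule([1.0, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9])
print(lr, stop_epoch)
```

The checkpoint from the epoch with the lowest validation loss (epoch 1 in this toy run) would then be selected as the final model.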
The L-CNN and RainNet were trained using Nvidia V100 GPUs available through the Finnish CSC-IT Center For Science. The models were implemented with the Python programming language (https://www.python.org/; last access 25/5/2022) using the PyTorch [50] and PyTorch-Lightning [51] libraries.

E. Case Examples
Two examples of the nowcasts produced by the L-CNN model and the reference models are given in Figs. 5 and 6. Both figures show the input time series, the target observations, and the nowcasts at 10, 20, 30, 40, 50, and 60-min leadtimes. Fig. 5 presents a case of convective rainfall from July 30th, 2021, at 14:30 UTC (17:30 local time). This case shows localized, isolated convection cells, with sizes of a few tens of kilometers, which were preceded by the passing of a decaying occluded front over Finland. This front originally formed near the Faroe Islands during the mature stages of a low pressure system in the North Atlantic, thus bringing moisture and intense rainfall over the Baltics and Finland. In this case, the L-CNN maintains the high-intensity rainfall longer than RainNet. For example, in the rain cell in the top-left of the study area, rain rate values over 10 mm·h⁻¹ are present in the 60-min L-CNN nowcast, while RainNet loses these high-intensity values in the first 30 min. Additionally, the small-scale low-intensity features in the middle of the study area are retained longer in the L-CNN nowcasts than in the RainNet nowcasts, indicating that the L-CNN blurs less and thus underestimates rainfall less than RainNet. Fig. 6 shows a case of widespread stratiform rain from June 30th, 2020, at 14:00 UTC (17:00 local time). Moist and cold air from the Atlantic was advected eastwards on top of a drier and warmer continental air mass, which rapidly developed into a northward-moving low pressure system at around Poland's longitude. Similar to the previous case but at an earlier stage, at Finland's latitude the front became occluded, yielding intense rainfall over the Baltics and Finland. During the given time, the rain was moving toward the north-east on the eastern side of the rain area, and toward the south-east on the western side, resulting in a complex motion field. Also in this case, the L-CNN maintains the higher rain rates longer than RainNet. The L-CNN's better retention of the lower rain rates is also evident in the lower half of the study area.
In addition to Figs. 5 and 6, we have included figures that show the difference in each pixel between the target observation and the nowcast for these cases in the Supplementary material.

F. Verification Results
Next, we present verification results calculated from the test dataset [see Table II]. Fig. 7 shows the mean absolute error (MAE) and mean error (ME) for the models for leadtimes from 5 to 60 min. The ETS, POD, and FAR [see Fig. 8] and FSS scores [see Fig. 9] are shown for thresholds of 1, 5, and 10 mm·h⁻¹ for leadtimes from 5 to 30 min.
The results show that the L-CNN model underestimates rainfall less than the reference models. This is demonstrated by the lower MAE [see Fig. 7(a)] compared to LINDA (an improvement of 10% at 60 min) and the extrapolation nowcast (17% at 60 min). Note that the low MAE for RainNet is explained by the minimization of the Logcosh loss function in the training procedure, which is similar to minimizing the MAE. The L-CNN also has the lowest bias [see Fig. 7(b)] for the first 30 min, with improvements of 86% over LINDA and 98% over RainNet at the 15-min leadtime. Additionally, the L-CNN has a higher POD at the 1 mm·h⁻¹ threshold [see Fig. 8(d)] at all leadtimes compared to the reference models, with increases of 8% over LINDA, 29% over RainNet, and 29% over the extrapolation nowcast at the 30-min leadtime. At the higher thresholds of 5 and 10 mm·h⁻¹ [see Fig. 8(e) and (f)], the L-CNN has a higher POD than the reference models up to the leadtime of 15 min, indicating that the L-CNN underestimates rainfall less than the reference models at all three thresholds.
The L-CNN model also overestimates rainfall less than LINDA and the extrapolation nowcast. In addition to the lower MAE [see Fig. 7(a)], the smaller overestimation is also shown by the lower FAR obtained by the L-CNN at all leadtimes and thresholds [see Fig. 8(g)-(i)]. The decrease in FAR at the 10 mm·h⁻¹ threshold and 30-min leadtime is 14% compared to LINDA and 20% compared to the extrapolation nowcast. Note that since RainNet suffers from blurring and loses high-intensity rain rates faster than the L-CNN, it also has the lowest FAR values at all thresholds.
In low-intensity (stratiform) rainfall, the L-CNN performs slightly better than the reference models. At the low-intensity threshold of 1 mm·h⁻¹, the L-CNN has a higher ETS [see Fig. 8(a)] than LINDA (8% at 30 min) and the extrapolation nowcast (27% at 30 min). Compared to RainNet, the L-CNN has similar skill up to the leadtime of 15 min and slightly better skill after that, with a 7% increase at 30 min. In comparison to the reference models, the L-CNN also has a greater POD [see Fig. 8(d)], with an increase of 28% over RainNet and the extrapolation nowcast at the 30-min leadtime.
The L-CNN captures the growth and decay of high-intensity rainfall better than the reference models, as indicated by the improvements in forecast skill at the higher rain rate thresholds. At the 5 and 10 mm·h⁻¹ thresholds, the L-CNN has a higher ETS than LINDA up to the leadtimes of 20 min (5 mm·h⁻¹) and 15 min (10 mm·h⁻¹), with increases of 7% and 10% at the 10-min leadtime, respectively [see Fig. 8(b) and (c)]. A similar pattern can be seen in the POD scores [see Fig. 8(e) and (f)]. Furthermore, when compared to LINDA, the largest absolute improvement in the FSS [see Fig. 9] of the L-CNN occurs at the 2-km scale (0.07 at 10 mm·h⁻¹ and 15 min), and the improvement decreases as the scale increases. Because the scale and lifetime of rainfall decrease at higher rain rates [45], [46], this increase in forecast skill indicates better skill at forecasting the growth and decay of rainfall. This implies that it is beneficial to use a CNN to model the growth and decay of rainfall as compared to the IDE model in LINDA, especially at small scales and high rain rates.

IV. CONCLUSION
The objective of this study was to demonstrate that the skill of a CNN model for rainfall nowcasting can be increased by separating the growth and decay of rainfall from the advection. This was achieved by applying a Lagrangian transform to the rainfall fields, and using the CNN to predict the rain rate difference (i.e., time derivative) of successive fields.
Overall, the proposed L-CNN model showed higher skill than the reference models, especially in predicting higher rain rates (>5 mm·h⁻¹) and small-scale rainfall. Compared to the RainNet, the L-CNN showed less underestimation of rainfall, and it retained high rain rates longer. Compared to the IDE model used in LINDA, the CNN model in the L-CNN was better at capturing the growth and decay of high-intensity rainfall, which was seen, e.g., in the improved ETS scores at short leadtimes.
A major factor in the improved skill of the L-CNN is the temporal differencing applied in Lagrangian coordinates. This allows the CNN to model only the change in rainfall intensity due to growth and decay. This approach is also independent of the CNN used in the model, so applying the Lagrangian transform and temporal differencing to other CNN models with input data derived from radar reflectivity should provide similar improvements in forecast skill. The RainNet model used in the L-CNN was selected mostly in view of the simplicity of its architecture and the availability of an open source implementation. In fact, other recent convolutional or recurrent neural networks might outperform RainNet, but since the absolute performance of RainNet is outside the scope of this study, such comparison was not done here.
A drawback of the L-CNN model is that, like many other CNN-based models, it still suffers from blurring and loss of rainfall, even though the effect is delayed compared to RainNet. This reduces the usability of the model in applications where long leadtimes are required. Additionally, whether the L-CNN actually forecasts the growth and decay of rainfall depends on the quality of the motion field used in the Lagrangian transform. The motion field can be impacted by, e.g., edge effects or gaps in radar coverage in the radar composite. However, these quality issues in weather radar data also impact other methods.
Another drawback of the L-CNN model is the high computational cost at inference time compared to pure machine learning models, for which inference is often relatively cheap compared to training. In the L-CNN, each nowcast must be extrapolated back to Eulerian coordinates, which becomes computationally costly as the leadtime increases. However, the computational cost is similar to that of other methods that also require extrapolation, such as S-PROG and LINDA.
The L-CNN model could be expanded in several ways. The differenced rain rate fields predicted by the CNN are inherently noisy, so instead of directly summing them to obtain the predictions, a separate CNN could be applied to produce and postprocess the nowcasts. This approach would be similar to the one used in LINDA where an integro-difference equation model is applied to estimate the rain rate fields in Lagrangian coordinates. Additionally, the semi-Lagrangian advection used in the model could be improved by applying a temporally varying motion field instead of the constant field used in this study.
Furthermore, the L-CNN is a deterministic model. However, estimating the forecast uncertainty is essential for many applications. A straightforward way to extend the L-CNN to produce forecast uncertainty estimates would be to replace the CNN component with a deep learning model capable of producing ensemble predictions, such as a GAN model (e.g., [34]). However, each ensemble member produced by the ML model needs to be extrapolated to Eulerian coordinates, which is computationally expensive.
Our results show that machine learning nowcasting methods can be improved significantly by integrating knowledge of rainfall physics into them. To the best of our knowledge, this study presents one of the first attempts to do that. Similarly, extrapolation-based nowcasting models can be improved by using machine learning to model the parts of the rainfall processes where the physical description is unknown or too complex to be modeled with limited input data, as, for example, the growth and decay of rainfall.