Statistical‐Physical Adversarial Learning From Data and Models for Downscaling Rainfall Extremes

Quantifying the risk from extreme weather events in a changing climate is essential for developing effective adaptation and mitigation strategies. Climate models capturing different scenarios are often the starting point for assessing physical risk. However, accurate risk assessment for mitigation and adaptation often demands a level of detail that these models typically cannot resolve. Here, we develop a dynamic data-driven downscaling (super-resolution) method that incorporates physics and statistics in a generative framework to learn the fine-scale spatial details of rainfall. Our approach transforms coarse-resolution (0.25°) climate model outputs into high-resolution (0.01°) rainfall fields while efficaciously quantifying the hazard and its uncertainty. The downscaled rainfall fields closely match observed spatial fields and their distributions. Contrary to conventional thinking, our results suggest that coupling simple statistics and physics to learning improves the efficacy of downscaling midlatitude rainfall extremes from climate models.


Introduction
The susceptibility to extreme weather events will likely worsen in a changing climate under continued warming from greenhouse gas emissions (Seneviratne et al., 2021). The adverse effects of weather extremes are broad, including but not limited to food security, urban infrastructure, public health, and ecological sustainability. For example, the rising severity and frequency of cyclones and extreme rainfall events lead to more frequent and severe floods, causing economic damage and loss of human lives and livelihood (Neumann et al., 2015). Quantifying the hazard of weather extremes in a changing climate is vital for estimating risk and optimizing climate change adaptation and mitigation strategies.
Conceptually, weather extremes are rare events within a non-stationary, nonlinear stochastic process. Unfortunately, being few and far between, observational records of extreme events are often too short, rendering the historical record inadequate for modeling future risk. One could use physically based numerical climate models to infer the frequency and severity of rare extremes in nature (John et al., 2022). However, coarse-resolution models often do not capture essential fine-scale detail while becoming computationally expensive at the needed high resolutions. Thus, there is strong interest in downscaling (super-resolution) methods (Wilby & Wigley, 1997) with improved efficacy.
Machine Learning (ML) can provide the needed efficacy. The suggested methodology is to learn a downscaling function using long-range simulations of a relatively small ensemble of high- and low-resolution climate model pairs under different climate scenarios. After that, downscaling many low-resolution climate model outputs to high-resolution fields is rapid. When trained with uncertainty-aware methods (Trautner et al., 2020), the downscaling function would also help rapidly quantify risk. However, high-resolution reference models are not ground truth. Resolving the compounding effect of model bias and downscaling imperfections is essential to improve trust and support the ML pathway. This paper takes an initial step in addressing the issue. Here, we adopt a hindcasting approach using extreme historical events in coarse-resolution reanalysis model outputs and fine-scale observations to train a downscaling function. We compare downscaled results from more recent years with measurements to test performance. Although observational data may have some biases, they are typically small, so high-resolution climate model bias becomes a non-issue. Since we expect the learning process to correct low-resolution model bias, using observational data effectively yields a verifiable approach that is also immediately applicable. Training to the present time using a coarse model and projecting the risk into the immediate future (inter-annual to decadal timescale) is vital to many applications, such as underwriting insurance and disaster response planning.
However, model- and data-trained downscaling has additional issues that need to be resolved. Downscaling is already an ill-posed inverse problem, so the inability of ML solutions to satisfy basic physical principles without additional support complicates the situation. By definition, there is a severe lack of data at the rare extremes, but developing synthetic data augmentation strategies for "detail-preserving regularization" is challenging. An added training/learning challenge is that a one-to-one correspondence between climate model fields (inputs) and observational data (outputs) does not always exist. Additionally, missing or spurious events, timing errors, and intensity differences are possible.
Here, we propose a novel dynamic data-driven downscaling approach (Blasch et al., 2018) that overcomes these issues. While the methodology is extensible to many meteorological fields (e.g., near-surface wind and heat stress), this study focuses on rainfall extremes in the mid-latitudes, a fundamental driver of inland flooding.
Our approach consists of the following steps (see Figure 1):

1. Physics of Orographic Precipitation: a simple upslope lift or spectral model captures the basic structure of rainfall fields and provides high-resolution estimates of orographic precipitation. A coarse climate model field input yields a relatively spatially detailed orographic rainfall component.
2. Dynamic Data-Driven Conditional Gaussian Process (CGP): indexing the training data (coarse model outputs and high-resolution observations) on a manifold and employing a conditional ensemble-based Gaussian process (Ravela, 2016; Trautner et al., 2020) produces "first-guess" rainfall estimates.
3. Generative Adversarial Learning: in the third step, the "first-guess" downscaled rainfall and orographic rainfall estimates are input to an adversarial learning framework. We produce high-resolution deterministic rainfall from coarse model inputs with little training data by priming a two-stage Generative Adversarial Network (GAN) with physics and statistics.
4. Optimal Estimation for Bias Correction: in the fourth step, we inject stochastic perturbations of residual excess rainfall into the deterministic output to produce an ensemble-based optimal estimate for bias correction (Ravela & McLaughlin, 2007; Ravela et al., 2010; Trautner et al., 2020). We show that the distribution of extreme annual rainfall captured by downscaled predictions closely matches the observations.
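As an illustration of the data flow through these four stages, the following Python skeleton composes simple stand-in operations (every function body here is a hypothetical placeholder, not the actual CGP, GAN, or bias-correction implementation):

```python
import numpy as np

def orographic_prior(coarse, factor):
    # Stage 1 stand-in: a high-resolution orographic component
    # (here just a block-replicated copy of the coarse field).
    return np.kron(coarse, np.ones((factor, factor)))

def cgp_first_guess(coarse, factor):
    # Stage 2 stand-in: the "first-guess" downscaled rainfall.
    return np.kron(coarse, np.ones((factor, factor)))

def gan_refine(first_guess, orographic):
    # Stage 3 stand-in: the GAN would add fine-scale structure;
    # here the two priors are simply blended.
    return 0.5 * (first_guess + orographic)

def bias_correct(field):
    # Stage 4 stand-in: stochastic injection of residual excess rainfall.
    rng = np.random.default_rng(0)
    return field + rng.exponential(0.1, size=field.shape)

coarse = np.random.default_rng(1).gamma(2.0, 1.0, size=(8, 8))  # 0.25-degree proxy
fine = bias_correct(gan_refine(cgp_first_guess(coarse, 25),
                               orographic_prior(coarse, 25)))
assert fine.shape == (200, 200)   # 25x super-resolution, 0.25 deg -> 0.01 deg
```

The point of the sketch is only the staging: a physics prior and a statistical first guess feed a generative refinement, and a stochastic correction is applied last.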
In the steps mentioned above, using physics and statistics reduces the need for extensive training data for the GAN, compensating for the spatial details and nonlinearity neither mechanism captures alone. Test results suggest that, compared to dynamical downscaling (computationally expensive), statistical downscaling (scale limiting), or detailed physics followed by simple statistics, simple physics and statistics with learning capture high-resolution spatial patterns with fidelity and match the observed distributions. Using a reanalysis model (ERA5) as input and observations (Daymet) as training output immediately enables interannual risk assessment for insurance and disaster risk response, with potential for future applicability to long-term climate risk.

Figure 1. ERA5 rainfall is initially downscaled in the first step using conditional Gaussian processes and combined with orographic rainfall estimates from the upslope model. The outputs pass to a Generative Adversarial Network (GAN-1), which produces fine-resolution rainfall fields. In the next step, another adversarial network (GAN-2) is trained on upscaled Daymet rainfall fields and then applied to the output of GAN-1 to produce even finer-resolution rainfall. Over a validation period, rainfall return-period distributions computed from downscaled and observed rainfall fields train bias correction functions. Finally, bias-corrected rainfall risk curves are back-projected onto rainfall fields. The white ellipses denote methods, and the colored double-boxes denote input or output to the models. The dotted arrows represent actions performed only during training and not in operation.

Related Work
Wilby and Wigley (1997) wrote one of the earliest comprehensive reviews of downscaling methodology for precipitation simulated by climate models, broadly categorizing the methods into four groups: regression methods, weather pattern-based approaches, stochastic weather generators, and limited area modeling. In practice, a downscaling process can be hybrid, combining more than one of those techniques. Limited area modeling, also known as dynamical downscaling, involves embedding a relatively high-resolution numerical model inside the coarse-resolution model (e.g., Giorgi et al., 2009). The computationally expensive nature of dynamical downscaling methods has led to the popularity of inexpensive statistical downscaling approaches, especially regression methods. These entail establishing linear or nonlinear relationships between coarse-resolution predictor variables and a fine-resolution predictand (precipitation) from historical records. In more recent literature (e.g., Anandhi et al., 2008), regression methods are considered part of a broader category, named transfer functions, which includes modern ML approaches for downscaling. This section reviews a non-exhaustive list of relevant statistical and machine-learning methods for precipitation downscaling.

Statistical Downscaling
Bias Correction and Spatial Disaggregation (BCSD) was one of the earliest successful statistical downscaling techniques. It is a simple yet effective parametric downscaling method, which begins with bias correction of the distribution of coarse-resolution rainfall to match the high-resolution rainfall, followed by spatial interpolation (Wood et al., 2002). Despite its simplicity, BCSD has been shown to outperform some relatively complex statistical methods (Bürger et al., 2012). Multiple linear regression (MLR) has been used (Hessami et al., 2008; Najafi et al., 2011) to take the predictive abilities of climate variables other than rainfall into account. MLR approaches are often accompanied by a bias correction technique, such as quantile mapping, and a dimensionality reduction technique, such as Principal Component Analysis or Independent Component Analysis. Najafi et al. (2011) show that, if proper predictors are chosen, MLR techniques can be an efficient method for downscaling. Non-parametric methods such as k-nearest neighbors (Gangopadhyay et al., 2005), kernel density estimators (Lall et al., 1996), kernel regression (Kannan & Ghosh, 2013), non-homogeneous Markov models (Mehrotra & Sharma, 2005), and Bayesian model averaging (Zhang & Yan, 2015) have also been widely used for rainfall downscaling. Mannshardt-Shamseldin et al. (2010) have used Generalized Extreme Value Theory with regression methods to downscale extreme precipitation.
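To illustrate the quantile-mapping idea that often accompanies these regression methods, a minimal empirical implementation (our own sketch, not taken from any of the cited studies) maps each value through the coarse ECDF and then through the inverse fine ECDF:

```python
import numpy as np

def quantile_map(x, coarse_ref, fine_ref):
    """Empirical quantile mapping: send x to the fine-resolution value occupying
    the same quantile that x occupies in the coarse reference distribution."""
    coarse_sorted = np.sort(coarse_ref)
    fine_sorted = np.sort(fine_ref)
    # Quantile of each x under the coarse ECDF, then the inverse fine ECDF.
    q = np.searchsorted(coarse_sorted, x, side="right") / coarse_sorted.size
    return np.quantile(fine_sorted, np.clip(q, 0.0, 1.0))

rng = np.random.default_rng(0)
coarse = rng.gamma(2.0, 5.0, 10_000)   # light-tailed "coarse model" rain
fine = rng.gamma(2.0, 8.0, 10_000)     # heavier-tailed "observed" rain
mapped = quantile_map(coarse, coarse, fine)
# After mapping, the corrected sample's mean sits near the observed mean.
assert abs(mapped.mean() - fine.mean()) < 1.0
```

In BCSD-style pipelines, a mapping like this corrects the marginal distribution before spatial disaggregation supplies the fine-scale pattern.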

ML-Based Downscaling
Besides the standard statistical methods, neural methods such as the multilayer perceptron (Xu et al., 2020), artificial neural networks (Schoof & Pryor, 2001), and quantile regression neural networks (Cannon, 2011) have been adapted for rainfall downscaling. Alternative ML approaches such as random forests (X. He et al., 2016), support vector machines (Anandhi et al., 2008; Tripathi et al., 2006), and genetic programming (Sachindra & Kanae, 2019) have also been applied to the downscaling problem with varying degrees of success. None of the models mentioned above has consistently outperformed the others in terms of performance and interpretability (Baño-Medina et al., 2020).
With the advent of deep learning techniques, a new suite of ML-based approaches, such as Recurrent Neural Networks (Q. Wang et al., 2020), Long Short-Term Memory (Miao et al., 2019), autoencoders (Vandal et al., 2019), and U-nets (Sha et al., 2020), has become available for rainfall downscaling. Because of their deep layered structure, deep learning methods are well-suited for extracting high-level feature representations from high-dimensional climate data sets. Several Convolutional Neural Network (CNN)-based approaches, developed for single-image super-resolution, have also been brought into the climate science domain, as they can explicitly capture the spatial structure of climate variables. Although these methods primarily target computer vision, their insights also apply to the precipitation downscaling problem. Super-resolution CNN (Dong et al., 2015) was one of the first successful approaches developed in this field. Many subsequent models, such as Very Deep Super-resolution (Kim et al., 2016), Enhanced Deep Super-resolution (Lim et al., 2017), and the Deep Back-projection Network (Haris et al., 2018), were built upon it. The adversarial Super-resolution Generative Adversarial Network (Ledig et al., 2017) showed that a GAN could better model the high-frequency distribution of the image and improve sharpness and perceptual quality. Enhanced Super-resolution GAN (ESRGAN) (X. Wang et al., 2018) improves upon this approach by providing a better GAN-loss formulation. These CNN- and GAN-based methods have since emerged in various rainfall downscaling studies (Singh et al., 2019; Vandal et al., 2017; Watson et al., 2020). Video super-resolution methods have been applied to rainfall downscaling to establish temporal relationships (Teufel et al., 2023).

Physics-Based Downscaling
Physics-based approaches for rainfall downscaling analytically estimate high-resolution orographic precipitation components to augment climate model-simulated large-scale rainfall fields. The literature discusses the impact of orography on regional precipitation patterns and orographic precipitation modeling (Roe, 2005). A simple model for orographic rainfall estimation is the upslope model (Collier, 1975), where we assume the condensation rate is proportional to the vertical wind velocity and the condensed rainwater falls immediately to the ground. Considerations of downstream hydrometeor drift (Sinclair, 1994) may reduce the resulting biases. The linear theory of orographic precipitation (spectral method) (Smith & Barstad, 2004), which improves on the upslope model, introduces a time delay between condensation and rainfall as well as vertical moisture dynamics. However, the spectral model is sensitive to its parameters (Paeth et al., 2017). One of its most significant disadvantages is its inability to account for the spatial variability of the climate variables and parameters. On the other hand, the upslope model incorporates spatial variability of the input variables, which makes it worthwhile despite its biases.

Methods
Our approach downscales low-resolution rainfall data from the European Centre for Medium-Range Weather Forecasts Reanalysis (ERA5; Hersbach et al., 2020). The high-resolution rainfall fields are comparable to the Daymet gridded daily rainfall data set (Thornton et al., 2022), which serves as the ground truth. Because of model and data collection biases, ERA5 and observed precipitation do not have one-to-one correspondence at a daily scale, which makes learning a downscaling function between them challenging. A two-step downscaling process overcomes this problem (Figure 1). In the first step (GAN-1), rainfall is downscaled from 0.25° to 0.1° resolution, using ERA5 data as the predictor and the corresponding ERA5-Land data as the ground truth. ERA5-Land provides a replay of the land surface component of ERA5 at a finer resolution and has high spatial and temporal correspondence with ERA5 (Muñoz-Sabater et al., 2021). The downscaled rainfall is initially estimated by actively searching data on a manifold to learn the downscaling function incrementally using an iterative CGP.
Upon convergence, the "first-guess" downscaled rainfall field and a physics-based estimate of orographic rainfall are processed by an adversarial learning framework (GAN-1) to refine finer-scale details. In the second step (GAN-2), upscaled Daymet rainfall fields become predictors of the corresponding high-resolution Daymet fields for training downscaling from 0.1° to 0.01° resolution. These two trained models provide the pathway from ERA5 predictors to Daymet-resolution downscaled rainfall. Even though this transformation may contain bias, our approach corrects it in the final bias correction step.

Conditional Gaussian Process
We use an iterative CGP regressor to build a dynamic data-driven downscaling method. Figure S1 in Supporting Information S1 shows a schematic representation. First, we index the training pairs of low- and high-resolution rainfall fields on a manifold (Ma & Fu, 2012; Ravela, 2016) to query and retrieve nearest neighbors and actively estimate the downscaling function. At each iteration, the downscaled rainfall field is upscaled again, which targets new data on the manifold for the next learning iteration. Upon convergence, it produces a "first-guess" downscaled rainfall field.
Let a low-resolution rainfall field be L_query, and let its high-resolution counterpart H_query be the field to be generated. We search the low-resolution fields (L_train) of the manifold for the closest match to L_query, which we denote L_k, and obtain the high-resolution counterpart associated with L_k, denoted H_k. We then iteratively improve L_k and H_k until L_query and L_k converge (Equation 1), after which we assign H_k to the desired "first-guess" downscaled field H_query. In Equation 1, D and U are the downscaling and upscaling functions, respectively, and the rate α is a scaling constant. In this study, we use averaging and pooling as the upscaling function and the following Gaussian process regressor as the downscaling function.
H_query = H_k + C_HL C_LL^{-1} (L_query − L_k)   (2)

where C_LL is the sample conditional covariance of L_train, and C_HL is the cross-covariance between H_train and L_train. To overcome dimensionality issues (Yadav et al., 2020), ensemble-based reduced-rank square-root methods (Ravela & McLaughlin, 2007; Ravela et al., 2010) are employed.
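An illustrative ensemble version of this conditional update (a sketch on synthetic data; all variable names and sizes are ours) forms C_HL and C_LL from training pairs and applies the regression to a query:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training ensemble: rows are grid cells, columns are members.
n_lo, n_hi, n_ens = 16, 64, 200
H_train = rng.normal(size=(n_hi, n_ens))                 # high-resolution fields
A = np.repeat(np.eye(n_lo), n_hi // n_lo, axis=1) / (n_hi // n_lo)
L_train = A @ H_train                                    # averaged (upscaled) copies

# Ensemble anomalies give the sample covariances C_HL and C_LL.
Hp = H_train - H_train.mean(axis=1, keepdims=True)
Lp = L_train - L_train.mean(axis=1, keepdims=True)
C_HL = Hp @ Lp.T / (n_ens - 1)
C_LL = Lp @ Lp.T / (n_ens - 1) + 1e-6 * np.eye(n_lo)     # small ridge for stability

def downscale(L_query, L_k, H_k):
    """Conditional Gaussian update: H_k + C_HL C_LL^{-1} (L_query - L_k)."""
    return H_k + C_HL @ np.linalg.solve(C_LL, L_query - L_k)

# For simplicity, use the ensemble means as the retrieved neighbor (L_k, H_k).
L_query = L_train[:, 0]
H_query = downscale(L_query, L_train.mean(axis=1), H_train.mean(axis=1))
# Re-upscaling the estimate should roughly reproduce the query.
assert np.mean((A @ H_query - L_query) ** 2) < np.mean(L_query ** 2)
```

In the full method the neighbor pair would come from the manifold query, and reduced-rank square-root factorizations would replace the explicit covariance inverse at realistic grid sizes.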

Upslope Orographic Precipitation Estimation
We use the upslope model (Roe, 2005) for estimating orography-induced precipitation, assuming that precipitation is proportional to the total condensation rate in a vertical column of a saturated atmosphere induced by the vertical wind velocity.
S = ∫_{p_toa}^{p_s} w (∂(ρ q_s)/∂p) dp   (3)

Here, S is the orography-induced condensation rate and w is the orographic vertical wind velocity. Further, ρ is the air density, q_s is the saturation specific humidity, p is the atmospheric pressure level, and p_s and p_toa are the atmospheric pressure at the surface and the top of the atmosphere, respectively. In this study, we assume the top of the atmosphere to be at the 200 hPa level. The orography-induced vertical wind velocity at the surface is estimated by

w_s = u Z_e + v Z_n   (4)

where u and v are the zonal and meridional components of the horizontal wind at the surface, and Z_e and Z_n are the slopes of the surface in the eastward and northward directions. We interpolate the surface elevation to the resolution of ERA5-Land before estimating the slope. w is presumed to decrease linearly from its surface value to zero at the top of the atmosphere. The saturation moisture content (i.e., ρ q_s) is estimated by

ρ q_s = (R_d / R_v) e_s / (R_d T) = e_s / (R_v T)   (5)

where R_d is the gas constant of dry air (287.04 J/kg/K), R_v is the gas constant of saturated air (461.5 J/kg/K), T is the air temperature, and e_s is the saturation vapor pressure, estimated by

e_s = e_s0 exp[(L_v / R_v)(1/T_0 − 1/T)]   (6)

where L_v is the latent heat of vaporization (2.26 × 10^6 J/kg) and e_s0 is the reference saturation vapor pressure at the reference temperature T_0. When T_0 is 273.16 K, e_s0 is 611 Pa.
Equations 4-6 enable orography-induced vertical wind velocity and saturation moisture content calculations at discrete pressure levels up to 200 hPa, where ERA5 model outcomes are available. We compute the gradient at each pressure level using second-order finite differences to estimate the integral in Equation 3.
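A minimal discretization of this column calculation might look as follows (our own simplified sketch, with an idealized temperature profile and made-up wind and slope values):

```python
import numpy as np

# Physical constants quoted in the text.
R_V = 461.5               # J/kg/K, gas constant of saturated (water vapor) air
L_V = 2.26e6              # J/kg, latent heat of vaporization
ES0, T0 = 611.0, 273.16   # reference saturation vapor pressure (Pa) / temperature (K)

def saturation_moisture(T):
    """rho*q_s = e_s / (R_v T), with a Clausius-Clapeyron form for e_s."""
    e_s = ES0 * np.exp((L_V / R_V) * (1.0 / T0 - 1.0 / T))
    return e_s / (R_V * T)

def upslope_condensation(u, v, Ze, Zn, T_levels, p_levels, p_s=1000e2, p_toa=200e2):
    """Column condensation from surface orographic lift; w decays linearly
    from its surface value to zero at the top of the atmosphere."""
    w_s = u * Ze + v * Zn                          # surface vertical velocity
    w = w_s * (p_levels - p_toa) / (p_s - p_toa)   # 1 at surface, 0 at TOA
    d_rho_qs_dp = np.gradient(saturation_moisture(T_levels), p_levels)
    integrand = w * d_rho_qs_dp
    # Trapezoidal integration over the pressure levels.
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(p_levels))

p = np.linspace(200e2, 1000e2, 17)              # pressure levels, Pa
T = 288.0 - 60.0 * (1000e2 - p) / 800e2         # crude linear temperature profile, K
S = upslope_condensation(u=10.0, v=0.0, Ze=0.01, Zn=0.0, T_levels=T, p_levels=p)
assert S > 0.0   # flow up a rising slope condenses moisture
```

`np.gradient` provides the second-order finite differences mentioned in the text; in practice, the wind, temperature, and slope inputs would come from ERA5 fields on each grid cell.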

Spectral Method for Orographic Precipitation Estimation
The linear method for orographic rainfall estimation (the spectral method) incorporates the time delays between the conversion of cloud water to hydrometeors and of hydrometeors to precipitation, as well as vertical moisture dynamics (Smith, 2003).
P̂(k,l) = C_w i σ ĥ(k,l) / [(1 − i m H_w)(1 + i σ τ_c)(1 + i σ τ_f)]   (7)

where P̂(k,l) is the Fourier transform of the orographic precipitation, ĥ(k,l) is the Fourier transform of the terrain elevation, k and l are horizontal wavenumbers, σ = uk + vl is the corresponding intrinsic frequency, u and v are the zonal and meridional components of the wind, C_w = ρ q_s is the thermodynamic uplift sensitivity factor, a coefficient relating the condensation rate to vertical motion (see Equation 5), τ_c and τ_f are the time delays for cloud-water conversion and hydrometeor fallout, m is the vertical wavenumber, and H_w is the depth of the moist layer penetrated by the vertical wind.
Journal of Advances in Modeling Earth Systems

m(k,l) = [((N_m^2 / σ^2) − 1)(k^2 + l^2)]^{1/2}   (8)

where N_m^2 is the moist static stability, given by (g/T)(γ − Γ_m), g is the gravitational acceleration, T is the temperature, γ is the environmental lapse rate, and Γ_m is the moist adiabatic lapse rate.
Here, R_v is the gas constant of saturated air (461.5 J/kg/K) and L_v is the latent heat of vaporization (2.26 × 10^6 J/kg).
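The transfer-function form lends itself to a compact FFT implementation; the sketch below uses illustrative parameter values (the time delays, moist-layer depth, stability, and uplift sensitivity are assumptions for demonstration, not values from this study):

```python
import numpy as np

def spectral_orographic_precip(h, dx, u, v, Cw, Nm, Hw, tau_c=1000.0, tau_f=1000.0):
    """Linear (Smith-and-Barstad-type) orographic precipitation on a periodic grid."""
    ny, nx = h.shape
    k = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    l = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    K, L = np.meshgrid(k, l)
    sigma = u * K + v * L                               # intrinsic frequency
    with np.errstate(divide="ignore", invalid="ignore"):
        m2 = (Nm**2 / sigma**2 - 1.0) * (K**2 + L**2)   # vertical wavenumber squared
    # Propagating where m2 >= 0, evanescent (imaginary m) otherwise.
    m = np.where(m2 >= 0, np.sqrt(np.abs(m2)), 1j * np.sqrt(np.abs(m2)))
    m = np.where(sigma == 0, 0.0, m)
    h_hat = np.fft.fft2(h)
    P_hat = (Cw * 1j * sigma * h_hat) / (
        (1 - 1j * m * Hw) * (1 + 1j * sigma * tau_c) * (1 + 1j * sigma * tau_f)
    )
    return np.maximum(np.fft.ifft2(P_hat).real, 0.0)    # truncate negative rain

# Idealized Gaussian hill on a 64 x 64 grid with 1 km spacing, westerly flow.
x = np.arange(-32, 32) * 1000.0
X, Y = np.meshgrid(x, x)
hill = 500.0 * np.exp(-(X**2 + Y**2) / (2 * 8000.0**2))
P = spectral_orographic_precip(hill, dx=1000.0, u=10.0, v=0.0,
                               Cw=0.004, Nm=0.005, Hw=2500.0)
assert P.shape == hill.shape and P.max() > 0.0
```

Because the whole model is a multiplication in wavenumber space, the computation is a pair of FFTs regardless of domain size, which is what makes the spectral method fast but also spatially uniform in its parameters.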

Adversarial Learning
In a GAN, a Generator (G) and a Discriminator (D) network play a game whose outcome is to produce high-resolution rainfall from low-resolution input. The Generator (G(L; α_G)) maps the input low-resolution (L) rainfall to a super-resolution reconstruction using a deep convolutional and upsampling network with parameters α_G. The Discriminator (D(·; α_D)), repeatedly fed fine-resolution rainfall ground truth (H) and the super-resolution generator output, learns to tell them apart. The output of the Discriminator is a scalar value representing the probability of a rainfall field being "real" (i.e., ground truth). As the two networks train iteratively as adversaries, the Generator's rainfall fields become more realistic, and the Discriminator better distinguishes between them. This is a minimax optimization problem (Goodfellow et al., 2014), in which the Generator tries to increase log D(G(L)) while the Discriminator tries to reduce it by optimizing their respective parameters α_G and α_D:

min_G max_D E_H[log D(H)] + E_L[log(1 − D(G(L)))]

In this study, however, we use the Relativistic Average GAN (RaGAN) approach, which compares the discriminator outcome of a real image (H) with the average over the fake images (G(L)), and vice versa. Compared to its non-relativistic counterpart, the Relativistic Discriminator increases stability and generates higher-quality samples (Jolicoeur-Martineau, 2018). The discriminator loss function (L_D) in our study is

L_D = −E_H[log(σ(D(H) − E_L[D(G(L))]))] − E_L[log(1 − σ(D(G(L)) − E_H[D(H)]))]

where σ(·) denotes the logistic sigmoid applied to the raw discriminator outputs. The loss function for the Generator (L_G) considers the adversarial loss, with the roles of real and fake reversed, as well as a pixel-wise ℓ1 loss:

L_G = −E_L[log(σ(D(G(L)) − E_H[D(H)]))] − E_H[log(1 − σ(D(H) − E_L[D(G(L))]))] + λ ‖H − G(L)‖_1
where λ ∈ R is a regularization factor tunable as a hyperparameter.
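The relativistic-average losses can be illustrated numerically on raw discriminator scores (a sketch of the loss computation only, not a training loop; the score arrays below are synthetic):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ragan_d_loss(real_scores, fake_scores):
    """Relativistic average discriminator loss: real samples should score above
    the average fake score, and fake samples below the average real score."""
    eps = 1e-12
    real_rel = sigmoid(real_scores - fake_scores.mean())
    fake_rel = sigmoid(fake_scores - real_scores.mean())
    return -np.mean(np.log(real_rel + eps)) - np.mean(np.log(1.0 - fake_rel + eps))

def generator_loss(real_scores, fake_scores, real_img, fake_img, lam=10.0):
    """Adversarial term with the roles reversed, plus a pixel-wise L1 term."""
    adv = ragan_d_loss(fake_scores, real_scores)
    return adv + lam * np.mean(np.abs(real_img - fake_img))

rng = np.random.default_rng(0)
real = rng.normal(2.0, 1.0, 64)    # scores assigned to ground-truth fields
fake = rng.normal(-2.0, 1.0, 64)   # scores assigned to generated fields
good = ragan_d_loss(real, fake)    # well-separated scores: small loss
bad = ragan_d_loss(fake, real)     # swapped scores: large loss
assert good < bad
```

The relative comparison against the batch average is what distinguishes RaGAN from the standard loss, where each sample is judged in isolation.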
The network architectures of the Generator and Discriminator follow ESRGAN (X. Wang et al., 2018); Figure S2 in Supporting Information S1 shows a schematic representation. The basic building block of the ResNet-style (K. He et al., 2016) generator is a dense residual block: multiple convolutional blocks connected with dense connections. The ESRGAN (X. Wang et al., 2018) approach replaced them with the Residual-in-Residual Dense Block (RRDB), which consists of stacked dense residual blocks connected with skip connections. The convolutional component of the Generator stacks RRDBs with a global skip connection. It operates on the low-resolution space, followed by an upsampling part that increases the resolution of the rainfall field. In our model, for GAN-1, we have delegated the upsampling job to the CGP, and the GAN performs only the convolutional operations on the high-resolution space. GAN-2 does not use the CGP but instead uses a sub-pixel convolution (Shi et al., 2016) (also known as pixel-shuffle) based upsampling network. For the Discriminator, we have used a VGG-style (Simonyan & Zisserman, 2014) deep convolutional network that converts a given real/fake rainfall field into a single value, which is interpretable as the probability that the rainfall field is "real." Unlike ESRGAN, we do not use a pre-trained VGG network to compute a perceptual loss, as such networks are unsuitable for capturing climate data features.

Hazard Quantification
Due to the lack of one-to-one correspondence between ERA5 and Daymet, the final downscaled rainfall and the ground truth are not event-wise comparable. However, one can compare the rainfall risk (hazard) estimated from both. In climate studies, risk typically includes hazard, exposure, and vulnerability; mathematically, the risk is a distributional representation (e.g., exceedance probability) of a variable of interest. We use the term "risk" in the latter sense. To assess the risk of an extreme rainfall event, a two-parameter Generalized Pareto distribution (Hosking & Wallis, 1987) is fit to the annual return periods (R) calculated from the empirical cumulative distribution function (E) of rainfall r at each grid point and time window of interest.
The probability density function of the two-parameter Generalized Pareto distribution, with scale σ > 0 and shape ξ, is

f(r) = (1/σ)(1 + ξ r/σ)^{−(1/ξ + 1)}

We compare the average annual return period curves simulated by the fitted Pareto distributions for both the high-resolution truth and the super-resolution predictions and estimate the difference between their means (bias) and the standard error.
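As an illustration, one can estimate the two parameters by the method of moments and convert a return period into a return level (a sketch on synthetic exceedances; the study's actual fitting procedure may differ):

```python
import numpy as np

def fit_gpd_moments(x):
    """Method-of-moments estimates for a two-parameter GPD (shape xi, scale sigma)."""
    m, v = x.mean(), x.var()
    xi = 0.5 * (1.0 - m * m / v)
    sigma = m * (1.0 - xi)
    return xi, sigma

def return_level(R, xi, sigma):
    """Rainfall magnitude whose return period is R (in numbers of events)."""
    if abs(xi) < 1e-9:
        return sigma * np.log(R)          # exponential (xi -> 0) limit
    return (sigma / xi) * (R**xi - 1.0)

rng = np.random.default_rng(7)
# Exponential excesses are a GPD with shape xi = 0 and scale 12.
excess = rng.exponential(scale=12.0, size=50_000)
xi, sigma = fit_gpd_moments(excess)
assert abs(xi) < 0.05 and abs(sigma - 12.0) < 1.0
# Return levels grow monotonically with return period.
assert return_level(100.0, xi, sigma) > return_level(10.0, xi, sigma)
```

Evaluating the fitted return-level function at each grid point yields the return-period curves that the bias correction step then compares against observations.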

Bias Correction
ERA5 underestimates the rainfall hazard because of its tendency to underestimate the intensity of severe storms, leading to bias. To reduce this bias, we model the observed distribution of rainfall excess beyond the 99.9th percentile, conditioned on spatial location and season (month of the rainfall event). Injecting perturbations from the excess rainfall distribution into the deterministically downscaled rainfall fields reduces bias. Additionally, it yields higher-order moments useful for an optimal estimation-based bias correction method (Equation 15) for additional improvement. We estimate the excess rainfall distribution on the training data set and develop the optimal estimation-based bias correction equations on the validation data set. The optimal estimator is

y*_p = y_p + K (y_t − y_p)   (15)

where y_t, y_p, and y*_p represent the annual return period curves for the ground truth, the mean of the stochastic injections (the prior), and the bias-corrected estimate (the posterior), respectively, and K is a gain computed from the ensemble covariances. Again, reduced-rank square-root methods are helpful for highly resolved (and, therefore, high-dimensional) risk curve (e.g., return periods, exceedance probabilities) ensembles (Ravela & McLaughlin, 2007; Ravela et al., 2010). We back-project the bias-corrected return-period curve onto the rainfall fields using a quantile mapping method. The process of learning the correction and back-projection is applied once; even though it may be possible to learn and use it iteratively, doing so risks overfitting.
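The ensemble update can be sketched as a Kalman-type correction of a prior return-period curve toward the observed one (illustrative only; the gain structure and the observation-error term below are our assumptions, not the study's exact formulation):

```python
import numpy as np

def optimal_update(Y_ens, y_t, obs_var=1.0):
    """Correct an ensemble of return-period curves toward an observed curve y_t.
    Y_ens has shape (n_points, n_members)."""
    y_p = Y_ens.mean(axis=1)
    A = (Y_ens - y_p[:, None]) / np.sqrt(Y_ens.shape[1] - 1)     # ensemble anomalies
    C_pp = A @ A.T                                               # sample covariance
    K = C_pp @ np.linalg.inv(C_pp + obs_var * np.eye(y_p.size))  # Kalman-type gain
    return y_p + K @ (y_t - y_p)

rng = np.random.default_rng(3)
truth = np.linspace(10.0, 120.0, 20)        # "observed" return-period curve
prior_mean = 0.8 * truth                    # biased prior: risk underestimated
ens = prior_mean[:, None] + rng.normal(0.0, 5.0, (20, 100))
y_star = optimal_update(ens, truth)
# The posterior curve is closer to the truth than the prior ensemble mean.
assert np.linalg.norm(y_star - truth) < np.linalg.norm(ens.mean(axis=1) - truth)
```

At realistic curve resolutions, the explicit covariance inverse would be replaced by the reduced-rank square-root factorizations cited in the text.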

Evaluation Methods
Prior research on ML-based super-resolution techniques is mainly focused on computer vision problems and evaluates models based on metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) (X. Wang et al., 2018). However, we believe those metrics are unsuitable for assessing extreme rainfall patterns and their distributions. We propose the following evaluation methods, which are more suitable for our use cases. The supplementary material includes additional metrics for comparison.

Empirical CDF-Based Metrics
For this study, we define an extreme event as one with the 90th percentile or higher mean rainfall in the study region. For each extreme event, we generate the ECDF of rainfall values on all spatial grids for the downscaling model's high-resolution reference and super-resolution prediction. In a non-ideal scenario, these two CDFs do not overlap. The mean horizontal difference between the two CDFs is known as the bias, and the supremum of the vertical difference is the Kolmogorov-Smirnov (KS) statistic (Massey, 1951). Each event can thus be represented as a point in the bias-KS plane. A better-performing method produces a point cloud closer to the origin, that is, one with relatively smaller bias and KS statistic. In practice, the point clouds generated by various downscaling methods may overlap significantly, making their differences hard to visualize. We therefore summarize each point cloud by its mean, from which we can assess the performance of each downscaling method at a glance.
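Both quantities can be computed directly from sorted samples; the following is a minimal sketch of the metric as described (sign conventions and the synthetic test data are ours):

```python
import numpy as np

def ecdf_metrics(pred, truth):
    """Bias (mean horizontal gap between the two ECDFs) and the
    Kolmogorov-Smirnov statistic (supremum of the vertical gap)."""
    bias = pred.mean() - truth.mean()   # the horizontal gap integrates to the mean difference
    grid = np.sort(np.concatenate([pred, truth]))
    F_pred = np.searchsorted(np.sort(pred), grid, side="right") / pred.size
    F_truth = np.searchsorted(np.sort(truth), grid, side="right") / truth.size
    ks = np.abs(F_pred - F_truth).max()
    return bias, ks

rng = np.random.default_rng(5)
truth = rng.gamma(2.0, 10.0, 5_000)
perfect = rng.gamma(2.0, 10.0, 5_000)   # drawn from the same distribution
shifted = truth + 5.0                   # a biased prediction
b0, ks0 = ecdf_metrics(perfect, truth)
b1, ks1 = ecdf_metrics(shifted, truth)
assert abs(b0) < abs(b1) and ks0 < ks1  # the shifted field scores worse on both
```

Each extreme event yields one (bias, KS) pair, and the per-method point clouds are summarized by their means as described above.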

Mutual Information-Based Metric
Entropy measures the expected amount of information or uncertainty inherent to a random variable's outcome. The entropy H(X) (Cover & Thomas, 2006) of a discrete random variable X is

H(X) = −Σ_x p(x) log p(x)

where p(·) represents the probability function. The mutual information I(X, Y) between two random variables X and Y represents how much the uncertainty of X is reduced when Y is known:

I(X, Y) = Σ_x Σ_y p(x, y) log [p(x, y) / (p(x) p(y))] = H(X) + H(Y) − H(X, Y)
We can normalize the mutual information based on the entropy of the individual random variables.
The super-resolution produced by a better-performing downscaling model will have higher mutual information with the high-resolution truth. The measure as used here is sensitive to spatial shifts.
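A simple histogram-based estimate of entropy and mutual information (our own sketch, not the study's implementation; bin count and test data are illustrative):

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return entropy(px) + entropy(py) - entropy(pxy.ravel())

rng = np.random.default_rng(11)
x = rng.normal(size=20_000)
mi_good = mutual_information(x, x + 0.1 * rng.normal(size=x.size))  # near-copy field
mi_bad = mutual_information(x, rng.normal(size=x.size))             # independent field
assert mi_good > mi_bad
```

Applied per grid cell over the test events, estimates like these produce the box plots summarized in the Results.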

Results
We employ a two-step downscaling method to tackle the lack of correspondence between ERA5 and Daymet extreme rainfall fields. In the first step, downscaled rainfall is initially estimated by iteratively learning the downscaling function while actively searching data on a manifold using a CGP. After convergence, the downscaled rainfall field and a physics-based estimate of orographic rainfall are refined using an adversarial learning framework (GAN-1). In the second step (GAN-2), upscaled Daymet rainfall fields are utilized as predictors for the corresponding high-resolution Daymet fields to train downscaling. These two trained models are applied in succession to obtain a super-resolution reconstruction of rainfall that is further bias-corrected to capture the observed distribution of extremes.

4.1. Step 1: Downscaling From 0.25° to 0.1°

We evaluate the following methodological choices for the first step of downscaling:

1. Bicubic Interpolation
2. Physics-based downscaling (Upslope Method)
3. Physics-based downscaling (Spectral Method)
4. Gaussian Process Regression (GP)
5. GAN
6. GP + GAN
7. GP + GAN + Upslope
8. GP + GAN + Spectral

To evaluate the methods, the geographic region surrounding Cook County (Chicago), Illinois, USA, a flood-prone area, is selected as the primary domain of this study. Additionally, we chose a second study region around Denver, Colorado, USA, a mountainous area, to examine the efficacy of the orographic precipitation models.
Figure S3 in Supporting Information S1 presents a qualitative assessment of the performance of the above methods by comparing the low-resolution input, super-resolution output, and high-resolution reference for a particular extreme event. We notice that the downscaled rainfall fields capture the spatial pattern of observed rainfall well when a learning method is involved. However, the naive bilinear interpolation method and the pure physics-based model without statistical post-processing cannot capture the spatial pattern of the rainfall at a finer scale. Models incorporating adversarial networks seem to outperform their alternatives.
In Figures 2 and 3, we present a quantitative comparison among the models based on the Empirical Cumulative Distribution Function (ECDF)-based and mutual information-based metrics described in Section 3.7. The normalized mutual information is computed individually for each spatial grid, and box plots represent the spread. The top and bottom lines of each box represent the 25th and 75th percentiles, and the red middle line represents the 50th percentile of the mutual information. The bias and the KS statistic represent the deviation of the ECDF of the downscaled rainfall events from that of the observed ones. We consider a model better-performing when the corresponding mutual information is high and the bias and KS statistic are low. As expected, the bilinear interpolation method and the pure physics-based models perform worst on all metrics. The performance of the remaining models is comparable. There is a slight improvement in performance for the combined GP + GAN method compared to its GP-only and GAN-only counterparts. GP + GAN + Physics models outperform all other models on both the mutual information and ECDF-based metrics. The improvement from the more sophisticated physics model (spectral method) over the simpler physics model (upslope method) is minimal. The above findings are consistent for both the Chicago and Denver regions. We performed a t-test on each pair of methodological choices to examine whether the changes were statistically significant. Figures S5 and S6 in Supporting Information S1 show that the improvements of the CGP + GAN + Physics methods are statistically significant for the mutual information-based metrics but not for the ECDF-based metrics in both regions.

4.2. Step 2: Downscaling From 0.1° to 0.01°
For the second step of downscaling, the following three methods are evaluated: (a) Bicubic Interpolation, (b) CGP, and (c) GAN. Other methods from the previous step are skipped because of their computationally demanding nature and the lack of climate data available at this scale.
Figure S4 in Supporting Information S1 compares the above methods with a specific event's high-resolution truth (Daymet rainfall). While the interpolation method can capture the large-scale spatial structure of the rainfall pattern, it fails to reproduce the high-frequency details. The CGP approach captures the high-frequency structure relatively better, but it suffers from high noise in the output, which deteriorates the overall performance. The GAN approach reasonably captures high-frequency rainfall, despite room for improvement. Figure 4 presents the ECDF- and mutual information-based metrics for the Chicago and Denver regions. CGP outperforms interpolation by a large margin in bias and KS-statistic but performs poorly on mutual information due to high noise. GAN outperforms both methods on all metrics.
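The bicubic interpolation baseline for this step is straightforward to sketch. The snippet below is a minimal illustration using a cubic spline; the toy field size is an assumption, while the 10x factor corresponds to downscaling from 0.1° to 0.01°.

```python
import numpy as np
from scipy.ndimage import zoom

# Toy 0.1-degree rainfall field standing in for the output of step 1.
coarse = np.random.default_rng(1).gamma(2.0, 5.0, size=(12, 12))

# Bicubic baseline: an order-3 spline upsampled by a factor of 10.
# It recovers the large-scale structure but adds no high-frequency detail.
fine = zoom(coarse, 10, order=3)

# Cubic splines can overshoot below zero; negative rainfall is clipped.
fine = np.clip(fine, 0.0, None)
```

This baseline illustrates why interpolation alone scores well on large-scale structure yet fails on the fine-scale metrics discussed above.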

Combined Downscaling
Once both downscaling models are trained, we apply them in succession to obtain the final downscaling outcomes. A qualitative assessment of the method's performance can be seen in Figure 5, where we compare a rainfall event simulated by ERA5 and its downscaled outcomes. A direct comparison with observation is impossible due to the lack of correspondence between ERA5 and Daymet rainfall. Instead, we evaluate the ability of the combined downscaling method to capture the extreme rainfall hazard, following Section 3.5. Figure 6 compares the mean annual extreme rainfall return-period curves, simulated from the fitted Pareto distribution, for observations, deterministic downscaling, stochastic downscaling, and the optimal estimate. Deterministic predictions underestimate the risk curve because ERA5 underestimates the magnitude of some very extreme (>99.9th percentile) rainfall events; however, the stochastic injection of residual extreme rainfall and optimal bias correction close the gap. The mean corrected return-period curves are derived along with upper and lower bounds representing one standard error. This error bound quantifies the uncertainty surrounding the magnitude of rainfall across all events in the testing period. The final annual projections show around 6.8% bias and 10% standard error at Chicago and 6.4% bias and 12% standard error at Denver, even at an estimated extreme 1000-year return period. Note that the data may not contain a 1000-year rainfall event; this value is a prediction from the fitted distribution. Finally, we back-project the bias-corrected return-period curve onto the rainfall field to obtain the final bias-corrected downscaled rainfall predictions. An example rainfall field from this final prediction can be seen in Figure 5.
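Return-period estimation from a fitted Pareto distribution can be sketched as a peaks-over-threshold analysis. The snippet below is an illustrative version only: the threshold quantile, record length, and synthetic daily rainfall are assumptions, and the paper's optimal bias correction step is not reproduced here.

```python
import numpy as np
from scipy.stats import genpareto

# Synthetic stand-in for 40 years of daily rainfall at one location.
rng = np.random.default_rng(2)
daily = rng.gamma(0.8, 8.0, size=365 * 40)

# Peaks over threshold: fit a generalized Pareto distribution (GPD)
# to exceedances above a high quantile (assumed 99th percentile here).
u = np.quantile(daily, 0.99)
exceed = daily[daily > u] - u
shape, _, scale = genpareto.fit(exceed, floc=0)

# Mean number of threshold exceedances per year.
lam = exceed.size / 40.0

def return_level(T):
    """Rainfall magnitude of the T-year event from the fitted GPD."""
    p = 1.0 - 1.0 / (lam * T)  # quantile within the exceedance distribution
    return u + genpareto.ppf(p, shape, loc=0, scale=scale)

levels = {T: return_level(T) for T in (10, 100, 1000)}
```

As noted above, the 1000-year level is an extrapolation from the fitted distribution rather than an event present in the data, so its uncertainty grows with the return period.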

Discussion
This study leverages data, physics, and ML to develop an approach for rainfall downscaling and estimating extreme rainfall hazards. At the outset, we sought to overcome the difficulty raised by insufficient training data at the extremes, the nonlinearity that straddling a substantial separation of scales entails, and the lack of correspondence between the source and target fields in the training data. In contrast to conventional physics-only, statistics-only, and physics-followed-by-statistics methods, we couple simplified physics and statistics with generative learning, which is novel and promising. Physics (upslope/spectral), statistics (CGP), and learning (GAN) fulfill distinct roles. In our downscaling method, a generative adversarial learning model captures the relatively fine-scale structures of the rainfall field. Priming the learning model with a CGP alleviates the need for extensive data augmentation strategies, given sparse historical extreme rainfall data. Priming with orography accrues similar advantages; additionally, it improves the physical consistency of the result. While each step is somewhat limited in skill, the steps compensate for each other's limitations, and together they perform statistically significantly better than any one alone (see Figures 2-4 and Figures S5 and S6 in Supporting Information S1). For simplicity, one could take alternative approaches, such as directly upscaling Daymet rainfall to ERA5 resolution for training. However, the proper upscaling function is nonlinear, and ignoring this nonlinearity can introduce significant bias in the learned downscaling mapping.
Our work suggests that if physics and statistics can handle processes at the "larger" scales, learning appears to do a fine job at the "finer" ones. When tested with data containing correspondence (ERA5-to-ERA5Land or upscaled observations to observations), each GAN is skillful in detailed reconstruction. However, we must be mindful that the final result, in the absence of correspondence, will have uncertainties beyond those induced by observational noise. Input bias (from ERA5) and the downscaling operator couple to inflate the uncertainties beyond intrinsic noise levels. In applications such as risk assessment, aggregated feedback in the form of bias correction ameliorates the problem. In other applications, such as precipitation nowcasting or short-term forecasting, an online framework, such as data assimilation, may be needed to reduce uncertainties.
Our method applies to the current climate. In principle, it also applies to future climate hazard studies, either by directly using the downscaling learned in the present climate or by adapting it through training with a few additional high- and low-resolution climate model runs. Our approach can incorporate multiple sources to quantify uncertainty. These include Monte Carlo samples of coarse-resolution model input fields or their perturbations, which our system rapidly downscales. Back-projecting resampled return-period curves quantifies additional uncertainties in the rainfall fields. In this way, rapid, robust assessments of regional rainfall-driven risk become tractable. Estimating parameter uncertainty in the learning model (Trautner et al., 2020) would further improve uncertainty quantification. We anticipate that incorporating physics for other processes, such as convective rainfall, will increase the model's efficacy. Additionally, adapting the loss function to capture finer-scale details remains an area for investigation.

Conclusions
In contrast to statistical, physical, or detailed simulation followed by statistics, we developed a downscaling approach that uses simplified physics and statistics to prime a two-stage GAN, post-correcting the downscaled output with an optimal estimation-based bias correction scheme. We apply this to downscale ERA5 model precipitation to Daymet resolution without explicit correspondence between the fields. While KS- and bias-based comparisons of distributions are not sensitive to geometric shifts, the mutual information measure is sensitive to spatial patterns. Ablation experiments using correspondence on the model or data downscaling sides indicate, in terms of mutual information, that coupling simplified physics (e.g., Upslope or Spectral) with CGP and GAN yields statistically significant improvements. However, as expected, the proposed and alternative distribution comparisons are weakly informative. We tested the approach in two mid-latitude regions, Denver, where orography is significant, and Chicago, where it is not, with similar benefits. However, we found no significant difference in performance from the choice of physical scheme, suggesting that the large-scale patterns the schemes provide matter more than their details. Similarly, including CGP in the combination is significant, even though it performs less effectively on its own. This suggests that CGP and physics compensate for each other while reducing the data burden on the GAN, effectively serving as a priming mechanism. Because Daymet's fields are smooth, we include an additional example using CHIRPS (Text S2 and Figure S7 in Supporting Information S1), which shows that the downscaling approach can reproduce the patterns in the training data. Our work can be applied to downscale from climate models to observed data or from low-resolution to high-resolution climate models, which we believe can enable short-term and long-term climate risk assessment for sustainability-related decision-making.

Figure 1. Overview of the proposed downscaling methodology. Coarse-resolution (0.25° × 0.25°) climate model outputs are downscaled to fine-resolution (0.01° × 0.01°) rainfall in two steps (GAN-1 and GAN-2). ERA5 rainfall is initially downscaled in the first step using conditional Gaussian Processes and combined with orographic rainfall estimates from the upslope model. The outputs pass to a Generative Adversarial Network (GAN-1), which produces fine-resolution rainfall fields. In the next step, another adversarial network (GAN-2) is trained on upscaled Daymet rainfall fields and then applied to the output of GAN-1 to produce even finer-resolution rainfall. Over a validation period, rainfall return-period distributions computed from downscaled and observed rainfall fields train bias correction functions. Finally, bias-corrected rainfall risk curves are back-projected onto rainfall fields. The white ellipses denote methods, and the colored double-boxes denote inputs or outputs of the models. The dotted arrows represent actions performed only during training and not in operation.

Figure 4. Performance comparison of three downscaling methods (Bicubic Interpolation, Conditional Gaussian Process, and Generative Adversarial Network) for downscaling 0.1° rainfall to 0.01°, at the Chicago (top row) and Denver (bottom row) regions, based on (a, c) ECDF-based metrics and (b, d) the mutual information-based metric.

Figure 5. Qualitative comparison of a rainfall event simulated by ERA5 and downscaled predictions. The top row showcases an event in the Chicago region on 12 July 2017, and the bottom row shows an event in the Denver region on 12 September 2013. (a, e) Low-resolution rainfall simulated by ERA5. (b, f) Intermediate-resolution rainfall produced by the first step of downscaling. (c, g) High-resolution rainfall produced by the second step of downscaling. (d, h) Final downscaled rainfall produced by bias correction.

Figure 6. Comparison of mean annual return-period curves between high-resolution ground truth (Daymet) and super-resolution prediction at (a) Chicago and (b) Denver. The solid lines represent the mean and the shaded areas represent the standard error.