Modeling the Scaling of Short‐Duration Precipitation Extremes With Temperature

The Clausius-Clapeyron (CC) relation expresses the exponential increase in the moisture-holding capacity of air of approximately 7%/°C. Earlier studies show that extreme hourly precipitation increases with daily mean temperature, consistent with the CC relation. Recent studies at specific locations found that for temperatures higher than around 12°C, hourly precipitation extremes scale at rates higher than the CC scaling, a phenomenon that is often referred to as "super-CC scaling." These scalings are typically estimated by collecting rainfall data in temperature bins, followed by a linear fit or a visual inspection of the precipitation quantiles in each bin. In this study, a piecewise linear quantile regression model is presented for a more flexible and robust estimation of the scaling parameters and their associated uncertainties. Moreover, we use associated information criteria to test statistically whether or not a pronounced super-CC scaling exists. The techniques were tested on stochastically simulated data and, when applied to hourly station data across Western Europe and Scandinavia, revealed large uncertainties in the scaling rates. Finally, goodness-of-fit measures indicated that the dew point temperature is a better scaling predictor than temperature.


Introduction
Short-duration extreme precipitation events pose a serious threat to society and economy as they may cause flash floods (Fadhel et al., 2018;Hettiarachchi et al., 2018;Pall et al., 2011). This is especially true in an urban environment due to the strong surface impermeability and dense population. For this reason, there is a large demand from city authorities to know the impact of climate change on the frequency and intensity of extreme rainfall that causes such events (Bader et al., 2018). Such knowledge could allow adequate decision making in the context of adaptation measures for urban planning or the design of hydraulic infrastructure. An intensification of extreme daily rainfall at the global scale is already observed and is expected to continue as a consequence of anthropogenic climate change (Barbero et al., 2017;Flato et al., 2013;Min et al., 2011;Seneviratne et al., 2012;Westra et al., 2013).
Intense precipitation during storm events is known to increase exponentially with temperature, as is traditionally explained on the basis of the Clausius-Clapeyron (CC) relationship, which states that the saturation water vapor pressure increases with temperature at a rate of 6%/°C at the surface, or 7-8%/°C in the column integral (Bao et al., 2017). The CC relation constrains the change of extreme precipitation from daily to seasonal scale (Pall et al., 2011) and is consistent with observations at midlatitudes (Utsumi et al., 2011).
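As a rough numerical illustration of the CC rate, the sketch below evaluates the relative increase in saturation vapor pressure per degree of warming. It assumes a Magnus-type approximation for the saturation vapor pressure over water (coefficients 6.1094 hPa, 17.625, and 243.04°C); this formula and the function names are illustrative choices and are not taken from the study. Under that assumption, the rate comes out at roughly 6 to 7.5%/°C between 0 and 25°C.

```python
import math


def saturation_vapor_pressure_hpa(t_celsius):
    """Saturation vapor pressure over water (hPa), Magnus-type approximation.

    The coefficients are an assumption made for this illustration.
    """
    return 6.1094 * math.exp(17.625 * t_celsius / (t_celsius + 243.04))


def fractional_increase_per_degree(t_celsius):
    """Relative increase in saturation vapor pressure for +1 degree C of warming."""
    e0 = saturation_vapor_pressure_hpa(t_celsius)
    e1 = saturation_vapor_pressure_hpa(t_celsius + 1.0)
    return e1 / e0 - 1.0


for t in (0.0, 12.0, 25.0):
    rate = 100.0 * fractional_increase_per_degree(t)
    print(f"{t:5.1f} degC: {rate:.1f} %/degC")
```

The rate decreases slowly with temperature, which is why a single value such as 7%/°C is only an approximation.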
However, Lenderink and van Meijgaard (2008) revealed for a single station location in the Netherlands that for temperatures above a threshold of around 12°C, the rate of exponential growth for short-duration events exceeds the CC scaling rate. Such increased scaling is often referred to as "super-CC scaling," and its existence has been confirmed for different locations worldwide (Lenderink et al., 2011;Mishra et al., 2012;Panthou et al., 2014;Schroeer & Kirchengast, 2018).
Where evidenced, the existence of super-CC scaling has particular societal implications, as it implies that in a warming world, the increase of short-duration precipitation extremes would exceed the currently expected CC-like growth and, consequently, would have an increased impact on human society. Westra et al. (2014) outlined the state-of-the-art research on the expected intensification of subdaily extreme rainfall due to climate change.
Note that, apart from super-CC scaling, subdaily rainfall extremes also exhibit sub-CC scaling, mostly present at the highest temperature range, but sometimes also extending over a wide temperature range (Drobinski et al., 2016). Negative scaling relationships between extreme daily precipitation and (dry-bulb) temperature were found for a great number of stations in the tropics (Ali et al., 2018). These authors also argue that CC scaling is mostly consistent with respect to dew point temperature. Moreover, it was argued in Ali et al. (2018), Barbero et al. (2018), Lenderink and van Meijgaard (2010), Roderick et al. (2019), and Wasko et al. (2018) that dew point temperature is a better predictor for precipitation extremes than temperature.
To date, the physical cause of super-CC scaling behavior is still under discussion (Barbero et al., 2018; Berg et al., 2013; Hardwick-Jones et al., 2010; Loriaux et al., 2013). Potential physical drivers underlying the super-CC scaling include changes to moisture availability, atmospheric circulation, and rainfall types (convective vs. stratiform). Rather than focusing on the physical interpretation, a methodology is proposed here for inference of scaling models, which also makes it possible to identify the best proxies for extreme rainfall.
10.1029/2019EA000665
Earth and Space Science

To investigate the scaling of extreme precipitation with (dew point) temperature, a binning approach is commonly used, where the data are divided into bins of 1 or 2°C. For each bin, the high quantiles of the distribution of observed precipitation are then computed. Figure 1 shows the binned 0.90- and 0.99-quantiles of hourly precipitation against daily mean dew point temperature (dotted lines) for Uccle (Belgium). The scaling rate clearly increases for dew point temperatures above the change point $T_c \approx 12$°C, as was reported in Lenderink and van Meijgaard (2010). Bins with dew point temperatures above 19-20°C include very few data points but show a downturn of the percentiles. This downturn is stronger upon use of the subdaily dew point temperature and might be attributed to moisture limitations under warmer anticyclonic conditions. Hence, one must be cautious not to extrapolate the super-CC scaling to the highest temperature regimes, for instance, in the context of a warming climate, as this may severely overestimate precipitation extremes (Prein et al., 2017; Zhang et al., 2017). Wasko and Sharma (2014) have shown that, in case of a constant scaling across a wide temperature range, the use of linear quantile regression (Koenker, 2005) is superior to the binning approach for extracting the scaling properties. In particular, the quantile regression estimator is asymptotically unbiased (Koenker & Bassett, 1978), in contrast to the binning approach. Moreover, a proper statistical framework is necessary given the lack of long and reliable subdaily time series (Li et al., 2019; Westra et al., 2014).
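The binning approach described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the procedure used for Figure 1: the bin width, the minimum bin count, the interpolation rule for the empirical quantile, and all function names are assumptions made for the example.

```python
import math
import random


def empirical_quantile(values, tau):
    """Empirical tau-quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = tau * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def binned_quantiles(temps, precips, tau, bin_width=2.0, min_count=30):
    """Return {bin center: tau-quantile of precipitation} for well-populated bins."""
    bins = {}
    for t, p in zip(temps, precips):
        bins.setdefault(math.floor(t / bin_width), []).append(p)
    return {
        (idx + 0.5) * bin_width: empirical_quantile(ps, tau)
        for idx, ps in sorted(bins.items())
        if len(ps) >= min_count  # sparse bins give unstable high quantiles
    }


# Synthetic data for illustration only: intensities grow by about 7% per degree.
rng = random.Random(0)
temps = [rng.uniform(0.0, 20.0) for _ in range(5000)]
precips = [math.exp(0.07 * t) * rng.expovariate(1.0) for t in temps]

q90 = binned_quantiles(temps, precips, tau=0.90)
for center, value in q90.items():
    print(f"bin center {center:4.1f} degC: 0.90-quantile = {value:.2f}")
```

The result depends on the chosen bin width and on how sparse bins are handled, which is one reason why a regression-based estimate can be preferable.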
For locations exhibiting super-CC scaling, applying separate linear quantile regressions to the two ranges ($T \le T_c$ and $T_c < T \le 19$°C) turns out to be problematic. First, if the change point is not known in advance, the regression lines may show a discontinuity at the change point. Second, linear quantile regression provides uncertainty estimates of the scaling rates, but the uncertainty in the change point cannot be obtained.
In this paper, we apply the piecewise linear quantile regression framework of Li et al. (2011), which simultaneously estimates the scaling rates and the change point. In Figure 1, the quantiles (dashed lines) provided by the change point model of Li et al. (2011) have been added; each is made up of two lines with slopes $\beta_1$ and $\beta_2$ instead of a single slope $\beta$. The work of Wasko and Sharma (2014) is extended here in the sense that we model two scaling regimes and, in addition, propose a more complete inference, including uncertainty estimation, model selection with information criteria, and predictor selection with goodness-of-fit measures.

Quantile Regression With a Change Point Model
In the present study, nonzero precipitation is denoted by $P$ and, unless mentioned otherwise, $T$ corresponds to the daily mean dew point temperature. Below, different models are introduced for $Q_\tau(T)$, which is the $\tau$-quantile of the conditional distribution of $\log(P)$, given a certain $T$.
The CC model. The scaling of high quantiles of $\log(P)$ with the predictor variable, $T$, is calculated using a linear model (Wasko & Sharma, 2014):
$$Q_\tau(T) = \alpha + \beta T, \tag{1}$$
where $\beta$ is the slope, which is used to estimate the CC scaling rate.
The CC+ model. The "change point model" uses piecewise linear quantile regression (Li et al., 2011):
$$Q_\tau(T) = \begin{cases} \alpha_1 + \beta_1 T, & T \le T_c, \\ \alpha_2 + \beta_2 T, & T > T_c, \end{cases} \tag{2}$$
where $\beta_1$ is supposed to correspond with the CC scaling rate and $\beta_2$ may correspond to the scaling rate of super-CC (>7%/°C), sub-CC (<7%/°C and >0%/°C), or even negative scaling (<0%/°C), but the latter case is rather exceptional (Ali et al., 2018). Note that given the continuity of the quantile regression lines at the change point $T_c$, the CC+ model has effectively four unknown parameters. We choose the free parameters $\alpha_1$, $\beta_1$, $\beta_2$, and $T_c$, and use the following relation:
$$\alpha_2 = \alpha_1 + (\beta_1 - \beta_2)\, T_c.$$
Although the change point model, equation (2), might be a valuable tool for quantifying changes in the extremes, other quantile regression functions may fit the data as well. Obvious other choices, including a quadratic model, a piecewise linear-quadratic model, and a piecewise linear-(powered) exponential model, were studied, but none of them were found to be systematically superior to the linear or piecewise linear function (see Text S1 in the supporting information).
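The continuity constraint can be made concrete with a short sketch. Requiring both line segments to meet at the change point fixes the second intercept, so the model can equivalently be written with a hinge term, $Q_\tau(T) = \alpha_1 + \beta_1 T + (\beta_2 - \beta_1)\max(T - T_c, 0)$. The function names and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
def cc_plus_quantile(t, alpha1, beta1, beta2, t_c):
    """Continuous piecewise linear quantile of log(P) at predictor value t."""
    if t <= t_c:
        return alpha1 + beta1 * t
    # Continuity at t_c: alpha1 + beta1 * t_c == alpha2 + beta2 * t_c
    alpha2 = alpha1 + (beta1 - beta2) * t_c
    return alpha2 + beta2 * t


def cc_plus_hinge(t, alpha1, beta1, beta2, t_c):
    """The same model written with a hinge term max(t - t_c, 0)."""
    return alpha1 + beta1 * t + (beta2 - beta1) * max(t - t_c, 0.0)


# Hypothetical parameter values: CC-like slope below 12 degC, doubled slope above.
params = dict(alpha1=0.0, beta1=0.07, beta2=0.14, t_c=12.0)
for t in (5.0, 12.0, 15.0):
    print(t, cc_plus_quantile(t, **params), cc_plus_hinge(t, **params))
```

The hinge form is convenient for estimation because, for a fixed change point, the model is linear in the remaining three parameters.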
The extrapolation to higher τ-quantiles (τ>0.99) is possible by applying the peak-over-nonstationary threshold model (Beirlant et al., 2004;Coles, 2001). However, this requires a number of additional assumptions, especially concerning the dependence of the change point on τ.

Inference for Quantile Regression Models
For a given set of $n$ data pairs $(T_i, P_i)$, the parameter estimates of the quantile regression models are obtained as the solution of the minimization problem (Koenker, 2005):
$$\hat{S} = \min \sum_{i=1}^{n} \rho_\tau\big(\log(P_i) - Q_\tau(T_i)\big), \tag{3}$$
with $\rho_\tau(u) = u\,(\tau - 1_{\{u<0\}})$, the loss function, where the minimum is taken over the model parameters. Estimation of the CC model is straightforward and can be performed using the R-package "quantreg" (Koenker, 2018; Wasko & Sharma, 2014). Efficient estimation of the CC+ model can be achieved by minimizing over a range of possible $T_c$ values and then selecting the value of $T_c$ at which the minimum is reached.
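The profiling strategy over candidate change points can be sketched as follows. For each candidate $T_c$, the model is linear in the remaining parameters when written with a hinge term $\max(T - T_c, 0)$, so an ordinary quantile regression can be fitted and the candidate with the smallest loss retained. The inner solver below is a deliberately naive stand-in: it relies on the fact that an optimal quantile regression fit interpolates as many observations as there are parameters, and it enumerates all such subsets, which is only feasible for very small samples. In practice a linear programming solver such as the one in "quantreg" would be used. All function names and the toy data are assumptions for illustration.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Sum of rho_tau(u) = u * (tau - 1{u < 0}) over all residuals."""
    return sum(u * (tau - (1.0 if u < 0 else 0.0)) for u in residuals)


def solve(a, b):
    """Solve a small square linear system by Gauss-Jordan elimination (None if singular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quantile(design, y, tau):
    """Quantile regression for tiny samples by enumerating interpolating subsets."""
    p = len(design[0])
    best_loss, best_coef = float("inf"), None
    for idx in combinations(range(len(y)), p):
        coef = solve([design[i] for i in idx], [y[i] for i in idx])
        if coef is None:
            continue
        res = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(design, y)]
        loss = pinball_loss(res, tau)
        if loss < best_loss:
            best_loss, best_coef = loss, coef
    return best_loss, best_coef


def fit_cc_plus(temps, logp, tau, candidates):
    """Profile the loss over candidate change points and keep the minimizer."""
    best = (float("inf"), None, None)
    for t_c in candidates:
        design = [[1.0, t, max(t - t_c, 0.0)] for t in temps]
        loss, coef = fit_quantile(design, logp, tau)
        if loss < best[0]:
            best = (loss, coef, t_c)
    loss, (alpha1, beta1, delta), t_c = best
    return {"alpha1": alpha1, "beta1": beta1, "beta2": beta1 + delta, "t_c": t_c, "loss": loss}


# Noise-free toy data with a kink at 12 degC (slopes 0.07 and 0.14).
temps = [float(t) for t in range(20)]
logp = [0.07 * t + 0.07 * max(t - 12.0, 0.0) for t in temps]
fit = fit_cc_plus(temps, logp, tau=0.9, candidates=[8.0, 10.0, 12.0, 14.0, 16.0])
print(fit)
```

Because the hinge form enforces continuity by construction, the fitted lines cannot jump at the change point, which avoids the discontinuity problem of fitting the two temperature ranges separately.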
Among the different models (linear, piecewise linear, or others), the best model is found by means of the Bayesian information criterion (BIC), which assesses goodness-of-fit but avoids overfitting by penalizing additional degrees of freedom. Let $\hat{L}$ be the maximum value of the likelihood function $L$ for a statistical model, then
$$\mathrm{BIC} = p \log(n) - 2 \log(\hat{L}), \tag{4}$$
with $p$ the number of estimated parameters and $n$ the number of data points (for the CC model, $p = 2$, while $p = 4$ for the CC+ model). The model with the lowest BIC value is chosen. The approach of Yu and Moyeed (2001) is followed by considering a likelihood function based on an asymmetric Laplace distribution for the error term. Maximization of this likelihood function is equivalent to the minimization of equation (3).
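One way to turn a minimized loss into a BIC value is sketched below. It assumes the asymmetric Laplace density $f(u) = \tau(1-\tau)\sigma^{-1}\exp(-\rho_\tau(u)/\sigma)$ and, as a further assumption not stated in the text, that the scale $\sigma$ is profiled out at its maximum likelihood value (minimized loss divided by $n$). The parameter counts follow the text ($p = 2$ and $p = 4$); the loss values in the usage lines are hypothetical.

```python
import math


def ald_max_log_likelihood(min_loss, n, tau):
    """Maximized asymmetric Laplace log-likelihood for a given minimized check loss.

    Assumption: the scale is profiled out at sigma_hat = min_loss / n.
    """
    sigma_hat = min_loss / n
    return n * math.log(tau * (1.0 - tau) / sigma_hat) - n


def bic(min_loss, n, tau, p):
    """BIC = p * log(n) - 2 * log(L_hat)."""
    return p * math.log(n) - 2.0 * ald_max_log_likelihood(min_loss, n, tau)


# Hypothetical minimized losses for the two models on the same n = 10,000 events.
n, tau = 10_000, 0.9
bic_cc = bic(1500.0, n, tau, p=2)
bic_cc_plus = bic(1490.0, n, tau, p=4)
print("CC :", round(bic_cc, 1))
print("CC+:", round(bic_cc_plus, 1))
print("selected:", "CC+" if bic_cc_plus < bic_cc else "CC")
```

Under these assumptions, the larger model is selected only if its loss reduction outweighs the penalty of $2\log(n)$ for the two extra parameters.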
In order to examine the influence of the predictor $T$ on the precipitation data, one quantifies the relative success of a quantile regression model against the unconditional quantiles, $Q^{(0)}_\tau$, of the climatological distribution. These are obtained by minimizing
$$\hat{S}^{(0)} = \min_{Q^{(0)}_\tau} \sum_{i=1}^{n} \rho_\tau\big(\log(P_i) - Q^{(0)}_\tau\big). \tag{5}$$
Koenker and Machado (1999) defined the goodness-of-fit criterion for a particular quantile as follows:
$$R = 1 - \frac{\hat{S}}{\hat{S}^{(0)}}, \tag{6}$$
where $\hat{S}$ stands for either $\hat{S}_{\mathrm{CC}}$ or $\hat{S}_{\mathrm{CC}+}$. $R$ takes values between 0 and 1, and the closer $R$ is to 1, the better the quantile regression model (CC or CC+) for a certain $\tau$-quantile.
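The goodness-of-fit measure can be computed directly from two check losses, as in the sketch below. The unconditional quantile is found by a simple search over the observed values, using the fact that the check loss of a constant is minimized at one of the data points. The function names and the small data vector are hypothetical.

```python
def pinball_loss(residuals, tau):
    """Sum of rho_tau(u) = u * (tau - 1{u < 0}) over all residuals."""
    return sum(u * (tau - (1.0 if u < 0 else 0.0)) for u in residuals)


def unconditional_fit(logp, tau):
    """Return (quantile, loss) of the best constant model; searches the data points."""
    return min(
        ((q, pinball_loss([y - q for y in logp], tau)) for q in set(logp)),
        key=lambda pair: pair[1],
    )


def goodness_of_fit(logp, fitted, tau):
    """R = 1 - S_model / S_unconditional for one quantile level tau."""
    s_model = pinball_loss([y - f for y, f in zip(logp, fitted)], tau)
    _, s_null = unconditional_fit(logp, tau)
    return 1.0 - s_model / s_null


logp = [0.2, 0.5, 0.1, 0.9, 1.4, 0.7, 1.1, 0.3]
q0, _ = unconditional_fit(logp, 0.9)
print("perfect fit :", goodness_of_fit(logp, logp, 0.9))
print("constant fit:", goodness_of_fit(logp, [q0] * len(logp), 0.9))
```

A perfect fit gives $R = 1$ and the best constant gives $R = 0$; fitted quantile lines fall in between as long as the model includes an intercept.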

Simulation Study
Prior to introducing observational data, stochastic simulations were used to gain insight into the performance of the proposed estimator. On the one hand, we assessed the ability of BIC to uncover the underlying model, and on the other hand, we examined two factors that may impact the inference: (i) the ratio of the scaling parameters, $\beta_2/\beta_1$, and (ii) the position of the change point $T_c$ in the support of the distribution of $T$.

Setup
We have drawn bivariate pairs $(T_i, \log(P_i))$ from the following random process:
$$\log(P_i) = g(T_i) + \varepsilon_i, \tag{7}$$
with independent and identically distributed errors $\varepsilon_i$. We assumed that $\varepsilon_i$ follows the Laplace distribution, where we put $\sigma = 0.014$. To mimic the different types of scaling behavior, we assumed that the function $g(T)$ is either linear (CC) or continuous piecewise linear (CC+) with change point $T_c$:
$$g(T_i) = \begin{cases} \alpha + \beta T_i, & \text{(CC simulation)}, \\ 1_{\{T_i \le T_c\}}(\alpha_1 + \beta_1 T_i) + 1_{\{T_i > T_c\}}(\alpha_2 + \beta_2 T_i), & \text{(CC+ simulation)}, \end{cases} \tag{8}$$
with $\beta = \beta_1 = 0.07$, the well-known CC scaling rate, and for simplicity, we took $\alpha = \alpha_1 = 0$. In the CC+ simulation, two free parameters, $\beta_2$ and $T_c$, remain and they should be chosen in accordance with the observations. As we have found by fitting on observed precipitation extremes (see section 4 for data description), the normalized change point ranges between 0.5 and 0.9 for different station locations. Consequently, we chose $\beta_2/\beta_1 = 1.1, \ldots, 2$, and for $T_c$ we chose the 0.50-, 0.60-, 0.75-, and 0.90-quantiles of the distribution of $T$ (i.e., standard normal). For each model, bivariate time series with length $n = 10^4$ were generated, which corresponds more or less to a realistic number of observational events spanning a period of around 50-60 years.
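A data generator along the lines of this setup might look as follows. It is a sketch under stated assumptions: $\sigma$ is interpreted as the scale parameter of the Laplace distribution, the Laplace errors are drawn as the difference of two exponential variates, and the function name and default arguments are hypothetical. Setting the slope ratio to 1 reproduces the linear (CC) case.

```python
import random
from statistics import NormalDist


def simulate(n, beta1=0.07, ratio=1.5, tc_quantile=0.75, scale=0.014, seed=0):
    """Draw n pairs (T_i, log P_i) from a continuous piecewise linear process.

    T is standard normal, alpha_1 = 0, beta_2 = ratio * beta_1, and the change
    point is placed at the tc_quantile-quantile of the distribution of T.
    """
    rng = random.Random(seed)
    t_c = NormalDist().inv_cdf(tc_quantile)
    beta2 = ratio * beta1
    pairs = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        g = beta1 * t + (beta2 - beta1) * max(t - t_c, 0.0)
        # Difference of two exponentials with mean `scale` is Laplace(0, scale).
        eps = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        pairs.append((t, g + eps))
    return t_c, pairs


t_c, pairs = simulate(10_000, ratio=1.5, tc_quantile=0.75, seed=1)
share_above = sum(t > t_c for t, _ in pairs) / len(pairs)
print(f"change point T_c = {t_c:.3f}, share of events above T_c = {share_above:.3f}")
```

Placing the change point at a high quantile of $T$ leaves only a small share of the sample for estimating the second slope, which is the effect examined in the simulation results.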

Simulation Results
For each of $10^3$ simulation replicates, both the CC and CC+ models were fitted and the corresponding BIC values were compared. The model with the lowest BIC value should be the underlying model. Figure 2 shows the percentage of cases that the correct model was selected. For $\tau = 0.9$, BIC correctly identified the CC model in 90% of the cases (see Figure 2a). However, the success rate strongly decreases for higher values of $\tau$. This result may agree with Marshall et al. (2005), where it was emphasized that BIC model selection may not be consistent for smaller data sets (higher $\tau$ values imply less data above the regression lines). From Figures 2b and 2c, it can be seen that the success rate of correctly identifying the underlying CC+ model is the lowest (around 80%) for $\beta_2/\beta_1 = 1.1$, and effectively 100% for $\beta_2/\beta_1 = 1.5$. Another commonly used information criterion, the Akaike Information Criterion, was outperformed by BIC (not shown).
Finally, other random processes, including different $\sigma$ values, or assuming a normal distribution for $\varepsilon_i$, were found to yield similar results.
In sum, BIC-based model selection was seen to have reasonable success rates for $\tau \le 0.9$. For the observations, therefore, we further base our model selection on the use of BIC at the 0.9-quantile, but one may rightfully question whether this is representative for more extreme precipitation. We will return to this in section 5.
Figure 2. Success rate of Bayesian information criterion-based model selection, based on 1,000 simulations. For each simulation, data were drawn from the random process, equations (7) and (8).
Figure S1 shows the mean and the 2.5th and 97.5th percentiles of the estimates of $\beta_2/\beta_1$ and $T_c$ from the $10^3$ replicates, for the $\tau = 0.99$ quantile (which is more representative for extremes than $\tau = 0.9$ or $\tau = 0.95$). For comparison, the true values of the parameters were added to the plots as blue dots. The following conclusions can be drawn:
• For the case $\beta_2/\beta_1 = 1.1$ (Figures S1a and S1d), the uncertainty in the estimation of the change point $T_c$ is very large. A large value range for change point estimates, in turn, gives rise to a wide range of estimates of the scaling factors. In addition, it can be seen that the estimator is biased in such a case. Yet, it was pointed out in Li et al. (2011) that the estimator is asymptotically unbiased. Clearly, convergence is not achieved for the present sample size ($n = 10^4$). Increasing the ratio $\beta_2/\beta_1$ strongly reduces the uncertainty and removes the bias to a certain extent.
• The estimation uncertainty increases when the change point moves to the upper range of the distribution of $T$, where less data are available (Figures S1b, S1c, S1e, and S1f).

Data
Long-time quality-controlled series of hourly observed precipitation, temperature, and dew point temperature were collected for different locations in Western Europe and Scandinavia (see Table S1). These include Belgium (Uccle), The Netherlands (De Bilt), France (Paris, Lille, Toulouse, Lyon, and Marseille), Germany (Nordrhein-Westfalen and Berlin), Sweden (Stockholm surrounding area, and Northern Sweden), and Finland (Helsinki; see Kilpeläinen et al., 2008). In the statistical analysis, neighboring station data were treated as one single data set (see Table S1), as is commonly done to improve the estimation of extremes (Buishand, 1991;Davison et al., 2012;Hosking & Wallis, 1997).
As in the original approach of Lenderink and van Meijgaard (2010), we computed the daily mean dew point temperature. Data points were excluded (i) when precipitation observations are equal to, or less than, 0.1 mm; (ii) when hourly instantaneous temperatures are below 0°C (to avoid snow); (iii) for events associated with the downturn in precipitation extremes (mostly dew point temperatures above 18-20°C), as this is likely due to a lack of moisture content; and (iv) when the daily mean temperature exceeds 24°C because Hardwick-Jones et al. (2010) found a reduction in relative humidity in such a case, which may affect the scaling relationship. As in Wasko and Sharma (2014), rainfall events were separated by 5 hr of no precipitation, and we retained the maximum precipitation depth within each event.
Note that the maximum hourly precipitation depth underestimates the peak 1-hr precipitation because the latter likely straddles two consecutive hourly measurement intervals.
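The event separation step can be sketched as a single pass over an hourly series. The sketch assumes a gap-free hourly record and treats hours with at most 0.1 mm as dry when counting the 5-hr separation; whether the study counts such hours as dry for this purpose is an assumption, as are the function name and the example values.

```python
def event_maxima(hourly_precip, dry_gap=5, wet_threshold=0.1):
    """Return the maximum hourly depth of each rainfall event.

    Events are separated by at least `dry_gap` consecutive dry hours, where an
    hour counts as dry if its depth is at most `wet_threshold` (mm).
    """
    maxima = []
    current = []  # wet-hour depths of the event in progress
    dry_run = 0   # number of consecutive dry hours seen so far
    for depth in hourly_precip:
        if depth > wet_threshold:
            current.append(depth)
            dry_run = 0
        else:
            dry_run += 1
            if current and dry_run >= dry_gap:
                maxima.append(max(current))
                current = []
    if current:  # close an event still open at the end of the record
        maxima.append(max(current))
    return maxima


series = [0.0, 1.2, 0.5, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 2.1, 0.0]
print(event_maxima(series))
```

In this example the two dry hours between 0.5 mm and 3.0 mm are too short to split the first event, whereas the five dry hours that follow start a new one.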


Results
The application of the quantile regression models to the observational time series is summarized in Table S2. The corresponding quantile lines, for different $\tau$ values, are plotted in Figure 3. Based on the BIC values for the 0.9-quantile (Table S2), the CC+ model is better than the CC model for Uccle, De Bilt, Nordrhein-Westfalen, Berlin, Lille, Paris, Toulouse, and Marseille. For the remaining locations (i.e., Lyon, Stockholm surrounding area, Northern Sweden, and Helsinki), the lowest BIC values at the 0.9-quantile were obtained by the CC model.
In Helsinki, it appears that the optimization of the CC+ model is problematic, as the estimated change points $T_c$ are located at the right end of the range of possible $T_c$ values. For Stockholm surrounding area, the same is true to a certain extent at the 0.95-quantile. Based on the BIC analysis, a change point is clearly apparent for the 0.95- and 0.99-quantiles, as opposed to the case $\tau = 0.90$. This is in line with section 3.2, where it was concluded that BIC-based model selection can become unreliable for higher quantiles, especially when the underlying model is linear (see Figure 2a). For Northern Sweden, the change in the scaling rates is too small to be considered significant, as the BIC values show. The results for Lyon are anomalous in the sense that it is the only location with sub-CC scaling for $\tau = 0.99$, but the BIC analysis might, again, be unreliable for such high $\tau$ values.
Figure S2 shows the parameter estimates for Uccle, together with the 95% confidence intervals, obtained by the bootstrap method. As expected, the uncertainty in $\beta_2$ is much higher than that of $\beta_1$. The confidence intervals of the change in the slope, $\beta_2/\beta_1$, range approximately between 2 and 4 (for $\tau = 0.90$ and 0.95), and between 2 and 5 (for $\tau = 0.99$). The confidence intervals of $T_c$ are large and extend up to about 7°C. The lower bounds of the confidence intervals of $\beta_2/\beta_1$ are larger than 1, indicating that the CC+ model is significant, even for the 0.99-quantile. Although we had less confidence in the BIC analysis for higher $\tau$ values, the foregoing lends additional support to the CC+ model selection.
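Bootstrap confidence intervals of this kind can be sketched generically. The text does not specify the bootstrap variant, so the sketch assumes resampling of data pairs with replacement and percentile intervals. To keep the example short, an ordinary least squares slope is used as a stand-in estimator; in the setting of this study, the estimator would instead refit the quantile regression model on each resample. All names and the synthetic data are hypothetical.

```python
import random


def percentile_bootstrap_ci(pairs, estimator, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(pairs), resampling pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        estimator([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((1.0 - level) / 2.0 * (n_boot - 1))]
    upper = stats[int((1.0 + level) / 2.0 * (n_boot - 1))]
    return lower, upper


def ols_slope(pairs):
    """Least squares slope; a simple stand-in for a quantile regression slope."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


# Synthetic data for illustration: true slope 0.07 with Gaussian noise.
rng = random.Random(1)
data = []
for _ in range(200):
    t = rng.uniform(0.0, 20.0)
    data.append((t, 0.07 * t + rng.gauss(0.0, 0.1)))

lo, hi = percentile_bootstrap_ci(data, ols_slope)
print(f"slope = {ols_slope(data):.4f}, 95% interval = [{lo:.4f}, {hi:.4f}]")
```

For a ratio such as $\beta_2/\beta_1$, the same function applies with an estimator that returns the ratio; a lower bound above 1 then indicates a significant change in slope.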
In the same way, Figure 4 shows the inference results for all the locations where the CC+ model is significant, for the particular choice $\tau = 0.99$. The confidence intervals of $\beta_1$ (Figure 4a) cover the range 5-10%/°C and are thus compatible with the well-known CC rate of 7%/°C. The estimation of the super-CC scaling rate in De Bilt (around 14%/°C) agrees well with the results shown in Figure 2c of Lenderink and van Meijgaard (2010), giving extra confidence to our results. Most likely, the scaling rates seem to change by a factor of more than 2 (Figure 4c), although, due to the large estimation uncertainties, assessing potential regional differences in the scaling rates is difficult. Confidence intervals for Lille and Marseille are particularly large, but the estimation was based on less than 1,0000 data pairs. As in Figure S2c, the lower bounds of the confidence intervals of $\beta_2/\beta_1$ are larger than 1.
Inference results for locations without change point (Scandinavian stations and Lyon) indicate that the confidence intervals are also large and the bootstrap distributions regularly appear to be very skewed (not shown). Furthermore, the lower bounds of the confidence intervals of $\beta_2/\beta_1$ are smaller than 1. In addition to the BIC analysis, this provides extra support for the conservative CC model selection. Overall, it is therefore concluded that the existence of a change point could not be demonstrated for the Scandinavian stations and for Lyon.
A BIC-based model selection was done between the CC and CC+ models and a few alternative models, as described in Text S1. For the 0.9-quantile, the CC+ model is the most suitable model for Uccle, De Bilt, Nordrhein-Westfalen, Toulouse, and Marseille. The piecewise linear-quadratic model was selected in Paris, Lille, and Berlin, although the differences in the BIC values of both quantile models are very small. As expected, the CC model is the most suitable for the Scandinavian stations and Lyon. The quantile regression lines of the alternative models are displayed in Figure S3 (Uccle). As the differences with the CC+ model are rather small, it can be concluded that the CC+ model is sufficiently good for practical purposes.
Finally, the scaling was tested for different predictors by means of the goodness-of-fit criterion, equation (6). As potential candidate predictors, the temperature and the dew point temperature were compared (Figure 5). The predictive skill of the dew point temperature is slightly but systematically higher than that of the temperature, which is physically plausible. Note also that, irrespective of the predictor, the predictive skill at locations with a change point is significantly higher than at locations with no change point (Scandinavian stations and Lyon).

Conclusions and Outlook
A statistical methodology was outlined to study the scaling properties of subdaily extreme precipitation as a function of (dew point) temperature. More specifically, piecewise linear quantile regression (i.e., the change point model) was used to model the transition of CC-like scaling to a regime with different scaling. Compared to the binning approach, which involves a visual check only, the biggest advantages of quantile regression are (i) asymptotic unbiasedness (Li et al., 2011); (ii) the ability to estimate the scaling rates and the change point, together with the associated uncertainties; (iii) the use of information criteria for selecting the most suitable statistical model from a set of candidate models; and (iv) the ability to select the best predictor to characterize the scaling of extreme precipitation, using goodness-of-fit measures. More specifically, the following were found:
1. Simulations with simple stochastic models showed that, for a realistic sample size of $n = 10^4$, the estimator is fairly unbiased and has a reasonable uncertainty, unless (i) the scaling rates differ only slightly and, to a lesser extent, (ii) the change point temperature is at the upper percentiles of its distribution.
2. Simulations with simple stochastic models showed that BIC-based inference is useful in detecting the existence of a change point. However, when there is no change point, the success rate at the 0.9-quantile is acceptable but decreases at increasing quantiles.
3. The results show strong evidence for the change point model in Western Europe. Results at Marseille also suggest a change point, but the change in the scaling rates is smaller than for Western Europe. In contrast, evidence for the change point model is lacking for the Scandinavian stations and Lyon.
4. Although deviations from linear scaling are evidenced at multiple locations, the associated estimates of change points and scaling rates are highly uncertain. More specifically, the factorial change in the scaling coefficients ranges between 2 and 5, while the change point estimates range between 5 and 15°C.
5. In view of the recent controversy regarding using air temperature/dew point temperature as proxies for extreme precipitation, an approach is presented to discriminate the best predictor. More specifically, at all observational locations, dew point temperature is slightly superior to temperature as a predictor for extreme precipitation. Moreover, locations with a change point show larger overall explanatory skill than locations without a change point.
Our analysis highlighted that statistically proving deviations from linear scaling is not trivial. Even for time series of more than 60 years, the values for the scaling rates and change points remain highly uncertain.
The study on subdaily rainfall extremes could be extended by using alternative rainfall data sets, that is, other than rain gauge measurements, but they may possibly entail other difficulties. For instance, radar data mostly feature high temporal frequencies and cover a wide area, but the available time period rarely exceeds one decade (Berg et al., 2013;Goudenhoofdt & Delobbe, 2016;Overeem et al., 2009). In addition, this requires a proper declustering method for spatial precipitation extremes.
The methodology at hand could serve to study the impact of climate change on the scaling behavior of extreme short-duration precipitation and to estimate the reliability of these results. The most reliable modeled rainfall extremes are the ones from regional climate models at convection-permitting scales; however, these are scarce (Kendon et al., 2014;Tabari et al., 2016;Termonia et al., 2018). The results derived from this work may also serve for in-depth regional studies (Schroeer & Kirchengast, 2018), or the simulation of extreme precipitation events in a warmer climate (Hazeleger et al., 2015;Manola et al., 2018).
Different extensions of this work could be envisioned. The quantile regression model could be extended to multiple predictors (Li et al., 2011;Wasko & Sharma, 2014) including, for example, time-delayed (dew point) temperatures (Bao et al., 2017), and the predictive skill could be compared as a function of the rainfall duration (Berg et al., 2013;Hardwick-Jones et al., 2010;Lenderink & van Meijgaard, 2010;Utsumi et al., 2011).