Testing a simple formula for calculating approximate intensity-duration-frequency curves

A simple formula for estimating approximate values of return levels for sub-daily rainfall is presented and tested. It was derived from a combination of simple mathematical principles and approximations, and was fitted to 10 year return levels taken from intensity-duration-frequency (IDF) curves representing 14 sites in Oslo. The formula was subsequently evaluated against IDF curves from independent sites elsewhere in Norway. Since it only needs 24 h rain gauge data as input, it can provide approximate estimates for the IDF curves used to describe sub-daily rainfall return levels. In this respect, it can be considered as a means of downscaling with respect to timescale, given an approximate power-law dependency between temporal scales. One clear benefit of this framework is that observational data are far more abundant for 24 h rain gauge records than for sub-daily measurements. Furthermore, it does not assume stationarity, and is well-suited for projecting IDF curves for a future climate.


Introduction: background
Sloping terrain saturated by heavy rainfall may trigger landslides, whereas buildings and infrastructure need to be designed to withstand heavy rainfall that results in flash floods [1]. For this reason, it is important to take into account the expected return level for rainfall over a range of timescales, from minutes to days. One way to quantify extreme rainfall amounts is through the probable maximum precipitation (PMP), which is defined as the theoretically greatest depth of precipitation for a given duration that is physically possible over a given size storm area at a particular geographical location at a certain time of year [2]. The PMP estimates have typically been used to help determine the right dimensions for the construction of water reservoirs, and traditional efforts to estimate PMP have involved statistical and meteorological approaches, both being associated with substantial uncertainties. Another common way to incorporate this type of information into the design is to make use of a set of so-called intensity-duration-frequency (IDF) curves, which provide the intensity as a function of both sub-daily duration and frequency [3][4][5]. Each IDF curve represents the intensity as a function of duration for one return period, and the generation of such IDF curves may involve a number of different methods. Typically, the IDF curves are fitted for each site and need long records with high-quality sub-daily data (e.g. annual maximum values for durations of minutes to days). It is problematic to obtain IDF curves for sites where there is inadequate sub-daily data coverage, and even for sites with reasonable data coverage the estimated return values for the longest return periods are sensitive to the choice of method [1].
The paucity of data, methodological uncertainties, and the restriction of fitting to specific rain gauge records have to some extent been overcome for Norway with the development of spatially coherent maps of return levels for 10 min to daily precipitation. The estimation of rainfall statistics presented by the said maps utilised dependencies of the parameters of generalised extreme value (GEV) distributions on location-specific geographic and meteorological information [6]. This approach enabled the capture of additional unexplained spatial heterogeneity and tackled the sparse grid on which observations were collected.
Most of the work on extreme precipitation has relied upon extreme value theory or other methods that implicitly assume that the rainfall statistics are stationary [4,7]. However, climate change implies that the statistics are non-stationary, and there are historical trends in the probability of heavy rainfall [8][9][10][11]. The non-stationarity can be accommodated in the GEV analysis, for instance by allowing its parameters to change with time [12][13][14]. Such efforts are nevertheless associated with substantial uncertainties, partly because they use block maxima (annual maxima) rather than the bulk of the data for parameter estimation. There are also studies that have followed an entirely different approach to extreme daily precipitation than the examples based on PMP or GEV, such as using the exponential distribution for describing the 95-percentile of wet-day precipitation amounts and exploring its dependency on geographical factors [15]. Despite shortcomings when it comes to using the exponential distribution to represent 24 h rainfall statistics, recent work has suggested that it nevertheless provides a useful frame of reference for more moderate rainfall amounts [16][17][18]. For example, an expression that provides approximate estimates for return levels for 24 h precipitation $x_\tau$ is
$$x_\tau = \alpha_\tau \, \mu \ln(f_w \tau) \qquad (1)$$
where $\mu$ is the mean value for the wet days (henceforth referred to as 'the wet-day mean precipitation'), $f_w$ is the wet-day frequency, $\tau$ is the return period, and $\alpha_\tau$ is a correction factor accounting for the fact that the data are not exponentially distributed [8]. This approach makes use of a larger part of the data sample than smaller subsets representing block maxima (e.g. annual maximum rainfall).
Equation (1) was derived from the expression for the probability of heavy 24 h rainfall, $\Pr(X > x) = f_w e^{-x/\mu}$, which gives an approximate representation of moderate extreme precipitation amounts, and the parameters $f_w$ and $\mu$ in this expression can capture aspects of a changing climate. Hence, such a framework can incorporate the non-stationary nature of climate change.
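The step from the exceedance probability to equation (1) can be written out explicitly. The sketch below assumes that the return period $\tau$ is counted in the same time unit as the daily observations, so that a return level $x_\tau$ is exceeded with probability $1/\tau$ on any given day; the correction factor $\alpha_\tau$ is empirical and is introduced only in the last step.

```latex
\begin{align}
\Pr(X > x_\tau) = f_w \, e^{-x_\tau/\mu} &= \frac{1}{\tau} \\
e^{x_\tau/\mu} &= f_w \tau \\
x_\tau &= \mu \ln(f_w \tau)
  \quad\longrightarrow\quad
  x_\tau = \alpha_\tau \, \mu \ln(f_w \tau)
\end{align}
```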
A comparison between the exponential distribution and observations has indicated that an empirical correction factor $\alpha_\tau$ is needed to provide a good match for the return levels and to correct for the mismatch between the upper tail of the exponential distribution and the (unknown) empirical distribution. The value of $\alpha_\tau$ varies linearly with the logarithm of the return period $\tau$ according to $\alpha_\tau = 1.256 + 0.064 \ln(\tau)$ [8], compensating for the fact that the empirical distribution has a 'thicker upper tail' for the 24 h rainfall distribution than the exponential distribution [19]. The log-relationship with respect to return period was derived from the study of a large number of locations, and by applying a principal component analysis (PCA) to a bi-variate representation of the quantiles of the empirical and exponential distributions, it has been shown that the deviation of the data from the exponential distribution is similar for 13 000 locations across the USA and Europe [16]. A framework based on these PCAs makes it possible to predict the 95-percentile for the wet-day 24 h rainfall on the basis of the wet-day mean precipitation $\mu$, the wet-day frequency $f_w$, the elevation $z$, and the distance to the coast $d$ [17]. Hence, the exponential distribution provides a useful reference for the analysis, even when it gives an imperfect and inadequate representation of the 24 h wet-day precipitation amount.
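A minimal sketch of how equation (1) and the correction factor might be evaluated in practice is given below. The function names are hypothetical, and two unit conventions are assumptions that the text above does not spell out: the return period in the logarithmic term is converted to days (so that it matches a wet-day frequency expressed per day), whereas the correction factor is evaluated with the return period in years.

```python
import math

DAYS_PER_YEAR = 365.25  # assumption: converts a return period in years to days


def alpha_correction(tau_years):
    """Empirical correction factor, alpha = 1.256 + 0.064 ln(tau).

    Assumption: tau is given in years here.
    """
    return 1.256 + 0.064 * math.log(tau_years)


def return_level_24h(mu, fw, tau_years):
    """Approximate 24 h return level from equation (1).

    mu        : wet-day mean precipitation (mm/day)
    fw        : wet-day frequency (fraction of days that are wet)
    tau_years : return period in years
    """
    tau_days = tau_years * DAYS_PER_YEAR  # assumption, see the note above
    return alpha_correction(tau_years) * mu * math.log(fw * tau_days)


# Illustrative (made-up) values, not taken from any of the sites in the study
x10 = return_level_24h(mu=8.0, fw=0.3, tau_years=10)
print(f"10 year return level: {x10:.1f} mm")
```

Because both $\mu$ and $f_w$ enter as plain arguments, the same function can be evaluated with values representing a different period or climate.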
We should also expect that $\alpha_\tau$ for rainfall measured over sub-daily intervals scales with the timescale $L$. Henceforth we refer to the duration of the measurement as the timescale. The 24 h precipitation can be considered as being the sum over 24 h of hourly measurements, which also implies that the different timescales have different probability distribution functions (pdfs). This dependency on $L$ can be demonstrated through the central limit theorem, which states that the mean of independent samples drawn from a pdf with finite variance converges towards a normal distribution as the sample size grows towards infinity [20]. We also expect that the mean wet-spell intensity and wet-spell frequency are functions of timescale $L$ when the duration of the rain events varies between minutes and hours. Henceforth we use the notations $\alpha(L)$, $\mu(L)$ and $f_w(L)$ when referring to the dependency on sub-daily timescales as opposed to 24 h precipitation presented in equation (1), and distinguish between sub-daily and daily rain measurements by using the term 'wet-spell' as opposed to 'wet-day' for rainfall accumulated over 24 h. We also refer to the correction factor $\alpha_\tau$ merely as $\alpha$ for simplicity since we mainly look at the 10 year return levels in this study.

Methods and data
In this study, a wet-day was defined as 24 h with recorded rain exceeding a threshold $x_0$. We repeated the analysis for different thresholds with an equivalent of 0.1, 0.5 and 1.0 mm for a 24 h interval, and the thresholds were scaled linearly for sub-daily timescales. The choice of threshold did not affect our results substantially (see the supporting material). Hourly rainfall amounts over the period 1968-2019 from Oslo-Blindern were summed over 1, 2, 3, 6, 12 and 24 h non-overlapping intervals to estimate $\mu(L)$ and $f_w(L)$. Many of the winter months had no data in the period between 1968 and 2012 because the plumatic sensor did not capture precipitation falling as snow (after 2012 the record also included the winter months; however, the winter months are not included in the estimation of the IDF curves). The IDF curves from Lutz et al [1] used the maximum rain intensity estimated with a sliding time window from years with more than 80% valid data in the period April-September.
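The aggregation step described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name is hypothetical, missing hours are assumed to be marked as NaN, any block containing a missing hour is discarded, and the statistics are computed over the whole series rather than year by year.

```python
import numpy as np


def wet_spell_stats(hourly_mm, L, x0_24h=1.0):
    """Estimate the wet-spell mean mu(L) and frequency f_w(L).

    hourly_mm : sequence of hourly rainfall amounts (mm), NaN for missing hours
    L         : timescale in hours (e.g. 1, 2, 3, 6, 12 or 24)
    x0_24h    : wet threshold for a 24 h interval (mm), scaled linearly with L
    """
    hourly = np.asarray(hourly_mm, dtype=float)
    n_blocks = hourly.size // L
    # Split the record into sequential non-overlapping L-hour blocks,
    # dropping an incomplete block at the end of the record
    blocks = hourly[: n_blocks * L].reshape(n_blocks, L)
    # Assumption: discard blocks that contain any missing hour
    valid = ~np.isnan(blocks).any(axis=1)
    totals = blocks[valid].sum(axis=1)
    threshold = x0_24h * L / 24.0  # linear scaling of the 24 h threshold
    wet = totals > threshold
    fw = wet.mean() if totals.size else float("nan")
    mu = totals[wet].mean() if wet.any() else float("nan")
    return mu, fw


# Tiny synthetic record: two days with a single 2 mm hour on the second day
record = [0.0] * 24 + [2.0] + [0.0] * 23
for L in (1, 6, 24):
    mu_L, fw_L = wet_spell_stats(record, L)
    print(L, mu_L, fw_L)
```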
We used a generalised version of equation (1) to represent sub-daily scale 10 year return levels by rewriting it as follows:
$$x_L = \alpha(L) \, \mu(L) \ln[f_w(L) \tau]$$
This expression was a starting point for a set of intermediate analyses where we examined the dependencies across timescales for $\mu(L)$, $f_w(L)$, $\alpha(L)$ and $x_L$, as described in more detail in the appendix. The results from these intermediate steps were used as guidance for seeking an approximate representation of the scaling dependencies for $x_L$. In other words, we explored whether the characteristics of these curves could be predicted through a simple and approximate method to provide a 'rule of thumb' estimate for return levels of sub-daily rainfall. Based on the data and a simplification outlined in the appendix, where we treated the non-dominant terms as roughly constant, we found the following expression:
$$x_L = \alpha \, \mu \left(\frac{L}{24}\right)^{\zeta} \ln(f_w \tau) \qquad (2)$$
The choice of $L/24$ in this case was deliberate because $1^\zeta = 1$ and equation (2) is identical to equation (1) for $L = 24$. Equation (2) enabled us to express how the return level depends on timescale as well as the wet-day mean precipitation $\mu$, the wet-day frequency $f_w$, and the correction factor derived for 24 h precipitation [8]. The formula was calibrated against IDF curves from Oslo and compared with IDF curves for independent sites. The idea was that if it successfully represented the 10 year return levels from the IDF curves, we should see a linear dependency between $\ln(x_L)$ taken from the IDF results [1] and $\ln(L/24)$ if the factors $\alpha$, $\mu$, and the log-term were approximately constant. In this case we used an ordinary linear regression (OLR) to estimate $\zeta$ from the slope of the best fit. The slope was estimated for the same 14 sites in the Oslo area as in Lutz et al [1], and we took the mean of these to represent $\zeta$ in equation (2).
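The calibration described above amounts to a straight-line fit in log-log space. The sketch below illustrates this with synthetic return levels; the function names and all numbers are hypothetical, the fit uses NumPy's `polyfit` rather than the R tools used in the study, and the return period is assumed to be given in days so that it matches a daily wet-day frequency.

```python
import numpy as np


def estimate_zeta(durations_h, return_levels_mm):
    """Slope and intercept of ln(x_L) regressed on ln(L/24)."""
    x = np.log(np.asarray(durations_h, dtype=float) / 24.0)
    y = np.log(np.asarray(return_levels_mm, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)  # first-degree least-squares fit
    return slope, intercept


def return_level_subdaily(mu, fw, L, zeta, alpha, tau_days):
    """Approximate return level for timescale L (hours) from equation (2).

    Assumption: tau_days is the return period in days.
    """
    return alpha * mu * (L / 24.0) ** zeta * np.log(fw * tau_days)


# Synthetic 'IDF' return levels generated with a known exponent of 0.42
durations = np.array([1, 2, 3, 6, 12, 24], dtype=float)
synthetic = 50.0 * (durations / 24.0) ** 0.42
zeta, intercept = estimate_zeta(durations, synthetic)
print(f"fitted zeta = {zeta:.3f}")

# Evaluate equation (2) for a hypothetical site and a 10 year return period
for L in durations:
    x_L = return_level_subdaily(mu=8.0, fw=0.3, L=L, zeta=zeta,
                                alpha=1.4, tau_days=3652.5)
    print(f"L = {L:4.0f} h: {x_L:5.1f} mm")
```

With real data the fit would be repeated for each calibration site and the slopes averaged, as described in the text.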
The mean value of $\zeta$ was then used together with the wet-day mean precipitation $\mu$ and the wet-day frequency $f_w$ estimated from daily rain gauge data to estimate the IDF curve for the 10 year return period for 9 arbitrary sites. The analysis was carried out in the R environment [21] and made use of the open source R-package 'esd' [22], freely available from https://github.com/metno/esd. The information provided in this paper and its appendix is sufficient for following this analysis; however, more details about the analysis are also available in the shape of an R Markdown document and a PDF file with its output in the supporting material (available online at stacks.iop.org/ERL/16/044009/mmedia). This supporting material is provided in the spirit of transparency and to enable replication of our results. It also provides a 'lab notebook' record for this analysis.

Results
To better understand the dependency between the different temporal scales, we analysed hourly rainfall amounts from Oslo-Blindern in more detail. We found that 85% of all events between 2012 and 2019 lasted less than 6 h, implying that most of the 24 h accumulated amounts consisted of short-term events padded out with dry intervals (supporting material). The statistics of the duration of the sub-daily wet spells are therefore expected to have an effect on how the wet-spell frequency $f_w(L)$ varies with timescale $L$. An assessment of the sensitivity of the different factors in equation (2) to timescale $L$ suggested that the log-term with the wet-spell frequency, $\ln[f_w \tau]$, varied by a factor of 1.3 as opposed to a factor of 3.8 for $(L/24)^\zeta$ for Oslo-Blindern. The fact that $(L/24)^\zeta$ dominated over the log-term was consistent with a near-linear relationship between $\ln[x_L]$ from the IDF curves representing the 10 year return period for 14 sites in Oslo and $\ln[L/24]$ (figure 1). These results indicated that equation (2) may be used as a crude and approximate estimate of the IDF curve representing 10 year return levels in the Oslo region.
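As a rough consistency check on the factor of 3.8, the ratio of the power-law term between $L = 24$ h and $L = 1$ h can be worked out by hand. The value $\zeta \approx 0.42$ used here is an illustrative assumption, chosen to lie within the range of $\zeta$ reported for different return periods, and is not necessarily the exact value fitted for Oslo-Blindern.

```latex
\begin{equation}
\frac{(24/24)^{\zeta}}{(1/24)^{\zeta}} = 24^{\zeta}
  \approx 24^{0.42} = e^{0.42 \ln 24} \approx e^{1.335} \approx 3.8
\end{equation}
```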
The results from testing equation (2) against independent data are shown in figure 2. In these tests, we used the wet-day mean $\mu$ and the wet-day frequency $f_w$ from the local observations together with the estimate for $\zeta$ found for the Oslo sites to estimate the IDF curve for the 10 year return levels for eight independent sites that were considered to have data of good quality. The 'true' return levels were taken from IDF curves that were computed using the same method as in Lutz et al [1]. The results suggest that the simple expression approximately reproduced the 10 year return levels for all of these sites.
A more stringent test is to use equation (2) to estimate return levels for longer return periods than 10 years. Our intermediate analysis indicated that the value of $\zeta$ varied slightly, but nevertheless systematically, with the return period $\tau$ within the range 0.4252-0.4135 (supporting material). By taking this dependency into account, we estimated the return levels for return periods from 2 to 200 years for one independent site. In figure 3 the obtained return levels are compared to IDF curves that were estimated using the method in Lutz et al [1], which is a traditional way to compute IDF values. Not surprisingly, this second test indicated the greatest discrepancies for the longer return periods, which were also associated with larger values of $\alpha$, as the more moderate events were closer to being exponentially distributed. In this case, equation (2) overestimated the return levels for the timescales 2-12 h. The shape and the curvature of the curves are given by equation (2), which means that it would be unable to reproduce kinks or sharp bends, as seen for the longer return periods in figure 3. The shape of these curves is ultimately defined by the way the mean wet-spell intensity $\mu(L)$ and frequency $f_w(L)$ vary with timescale.

Discussion
As long as the return level estimates for the 24 h precipitation are representative starting points for equation (2), it will predict sub-daily return levels that diminish according to a power-law. One interesting question then is whether there are predictable relationships between the different timescales and the amount of precipitation. It would probably not be a universal dependency, such as Kolmogorov's power-law for turbulence [23]; however, we can expect the presence of different meteorological phenomena and physical conditions to have an effect on the dependency of the rain statistics on timescale. We also note that there are some studies suggesting that the tail of hourly and daily precipitation distributions may have a power-law character [19,24]. The slope from the OLR, $\zeta$, is expected to vary geographically with differences in the local climates and the presence of convective, frontal, and orographic processes. In this case, it may be a coincidence that the values fitted for the Oslo region gave good results for Trondheim. It would be interesting to explore predictable properties that define $\zeta$, the corresponding power-law scaling of the rainfall intensity $\beta$ in $\mu(L) = \mu (L/24)^{\beta+1}$, and $\gamma$ in $f_w(L) = f_w + \gamma \ln(L/24)$ (see the appendix). If such properties exist, they can be of great value because it may then be possible to estimate an IDF curve for different locations from the more abundant 24 h rain gauge records. It is beyond the scope of this study to search for such connections, and we leave it to future work to study how the timescale dependency varies with physical conditions. Often IDF curves are estimated using the annual maximum rainfall intensity of a sliding time window to provide input to extreme value analysis. In this case, we used the annual wet-spell mean precipitation $\mu(L)$ and frequency $f_w(L)$.
These parameters are not expected to be sensitive to a sliding window because they were estimated from the whole time series divided into sequential non-overlapping segments. A shift in the starting time of one would introduce a similar shift in the rest, and a rain event may start minutes after the starting point in one segment and before the starting point of another. Nevertheless, the main purpose of the analysis of $\mu(L)$ and $f_w(L)$ was to see if the approximation leading to equation (2) was justified. Equation (2) was then calibrated against IDF curves that had been estimated with a sliding time window to find the annual maximum rain intensity irrespective of whether it started on the hour or not.
We have shown that equation (1) from Benestad et al [8] on 24 h precipitation can be extended to a set of return periods on sub-daily timescales. This kind of extrapolation has also been an implicit assumption in earlier work concerning PMP for 24 h precipitation [2]. The idea of scaling dependency is not new, since Koutsoyiannis et al [3] proposed a scheme in 1998 for estimating IDF curves by considering different types of distributions. However, they proposed more complicated expressions with coefficients that did not include the wet-day mean precipitation $\mu$ and wet-day frequency $f_w$, and hence had no connection to these physical aspects. Menabde et al [25] also observed linear log-log relations between intensity and duration as shown in figure 1, and argued that the IDF curves have a simple scaling property for durations of 0.5-24 h. They presented a simple analytical formula which embodied the scaling properties and enabled the estimation of IDF curves from 24 h rain gauge data, with typical parameter values of $\mu^* = 14.3$, $\rho^* = 7.6$ and $\eta = 0.65$, where $T$ is the return period in years. Their parameters do not have clear physical units, vary from one region to the next, and are different to those used in this study. Hence, the formula presented here is different from previous expressions, and equation (2) is more compatible with fractal behaviour in terms of scaling [26].
The prediction of the shape of IDF curves from daily rain gauge data can be regarded as a way of 'downscaling' information to sub-daily timescales on par with traditional downscaling based on different spatial scales. This form of downscaling is different to 'disaggregating' large-scale conditions, as the framework presented here can predict statistics on smaller temporal scales rather than just providing a plausible and consistent smaller scale realisation. This way, the framework presented here may be useful for regions such as Africa, Asia and Latin America with little sub-daily rain gauge data. Even though it is imperfect, it can provide useful first estimates for IDF curves that are not too far off the truth. Also, this framework can be used in connection with empirical-statistical downscaling for future local climate conditions if $\mu$ and $f_w$ can be downscaled for the local climate. In other words, this framework can involve downscaling in both space and time. These results also underscore the message that $\mu$ and $f_w$ are two essential climate variables that should be included in the set of climate indicators. Finally, the framework presented here does not assume stationary statistics, as the return levels are estimated analytically from equation (2), which accounts for changes in both the number of rainy days and the mean precipitation intensity. It is nevertheless possible that a change in climate also affects the power-law scaling exponent $\zeta$, for instance if there is an increase in the convective events and a decrease in the stratiform precipitation [9]. However, it may be possible to explore the dependency of $\zeta$ on different physical conditions by studying how it varies between different regions and different local climate types, and equation (2) can be used to quantify how changes in $\mu$, $f_w$ and $\zeta$ affect the return levels $x_L$.
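One way to make the last point concrete is to take the logarithm of equation (2) and differentiate. The following is a first-order sketch that assumes the correction factor $\alpha$ is held fixed and that the changes in $\mu$, $f_w$ and $\zeta$ are small.

```latex
\begin{align}
\ln x_L &= \ln\alpha + \ln\mu + \zeta \ln\left(\frac{L}{24}\right)
           + \ln\left[\ln(f_w \tau)\right] \\
\frac{\mathrm{d}x_L}{x_L} &\approx \frac{\mathrm{d}\mu}{\mu}
           + \frac{1}{\ln(f_w \tau)} \frac{\mathrm{d}f_w}{f_w}
           + \ln\left(\frac{L}{24}\right) \mathrm{d}\zeta
\end{align}
```

Under these assumptions, a relative change in $\mu$ carries over one-to-one to the return level, a relative change in $f_w$ is damped by the factor $1/\ln(f_w \tau)$ whenever $\ln(f_w \tau) > 1$, and an increase in $\zeta$ lowers the sub-daily return levels relative to the 24 h value because $\ln(L/24) < 0$ for $L < 24$.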

Conclusions
We utilised the dependencies across timescales $L$ in the wet-spell intensity and wet-spell frequency to estimate sub-daily rainfall return levels. This scale dependency makes it possible to downscale these parameters with respect to timescale. We also used these properties to derive an expression that approximately represents return levels and showed that it is able to predict 10 year return levels for independent sites with reasonable accuracy.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

A.1. Timescale dependency of wet-spell mean and frequency
In order to search for a method to estimate sub-daily return levels, we used a general version of equation (1) as a starting point:
$$x_{L,\tau} = \alpha(L) \, \mu(L) \ln[f_w(L) \tau] \qquad (\mathrm{A.1})$$
where $x_{L,\tau}$ is the return level for timescale $L$ (the duration of the measurements in hours) and return period $\tau$, $\alpha(L)$ is an empirical correction factor that accounts for the upper tail of the statistical distribution for precipitation not being exponential, $f_w(L)$ is the wet-spell frequency (how often time intervals $L$ have recorded rainfall), and $\mu(L)$ is the wet-spell mean precipitation (the mean precipitation intensity for the timescale $L$). In this case, we examined timescales $L$ between 1 and 24 h and a return period $\tau$ of 10 years (the reference to $\tau$ was dropped in the main text). The objective of the intermediate analysis was to use rain gauge data to explore ways to simplify equation (A.1) for providing approximate estimates of sub-daily rainfall return levels.
We used sub-daily rain gauge data from Oslo and started by examining how $\alpha(L)$, $\mu(L)$ and $f_w(L)$ depended on each timescale $L$. Since the measurements were restricted to rain (snow was problematic), we analysed only the 'warm' months (May-November) and calculated $\mu(L)$ and $f_w(L)$ for each year. Our first estimate for $x_L$ relied on a constant correction factor where $\alpha(L)$ was taken to be that of the 24 h rainfall [8]. Figure A1 shows the results of this intermediate analysis for Oslo-Blindern. The estimates from equation (A.1) indicated a more pronounced increase in the amounts with the timescale $L$ (slope of green curve compared to red and blue in figure A1) than the IDF curves from the Norwegian climate Change Services and [1]. The figure also shows the return level for 24 h precipitation based on equation (1) (red circle). The difference between the estimated results for 24 h precipitation based on equations (1) and (A.1) in figure A1 was due to different choices in estimating $\mu$ and $f_w$ (including different calendar months and time periods).
Next, we examined the respective dependency of $\alpha(L)$, $\mu(L)$ and $f_w(L)$ on timescale $L$ separately. Neither $\mu(L)$ nor $f_w(L)$ depends on $\tau$, and the discussion concerning them will ignore the reference to the return period. Our goal was to find a way to replace them with a function of timescale $L$ that could be substituted into equation (A.1) in order to derive a function that enabled us to estimate the 10 year return level $x_L$ as a function where only $\mu$, $f_w$, and $L$ need to be specified for a given site. Whereas the parameters $f_w(L)$ and $\mu(L)$ represent physical characteristics of the rainfall, the correction factor $\alpha$ corrects for the statistical bias due to the different shapes of the upper tail of the exponential distribution and the real statistical distribution of the data.

A.1.1. Investigation of how the correction factor α depends on the timescale
We studied how the correction factor $\alpha(L)$ depended on timescale $L$ through comparisons between the exponential and observed distributions for different timescales. The results indicated that the statistical distribution of the total amount recorded for each respective event could be approximated as an exponential distribution for amounts below 20 mm, with a slow divergence from the exponential distribution for higher values (see the supporting material). The observed distribution had a fatter upper tail in a similar fashion to the 24 h measurements [16]. Precipitation amounts measured over shorter intervals, however, have slightly different distributions, and equation (A.1) needs to accommodate this with a correction factor that varies with timescale $L$. We then analysed the dependency of $\alpha(L)$ on timescale by estimating the ratio of $\alpha(L)$ for sub-daily rainfall to $\alpha$ estimated for 24 h precipitation. This ratio was estimated by dividing the 10 year return levels from the IDF curve (taken from [1]) by the initial estimate from equation (A.2) that assumed $\alpha(L) = \alpha$. A scatter plot suggested that there was a near-linear relationship between $\ln[\alpha(L)]$ and $\ln(L/24)$, so we used a regression analysis to find the slope between the two. The best fit was $\ln[\alpha] = -0.187 \pm 0.02 - (0.34 \pm 0.01) \ln[L/24]$ ($R^2 = 0.99$, p-value $= 8.5 \times 10^{-6}$).

A.1.2. Wet-spell mean precipitation
We compared the distribution of the rainfall amount for different timescales to see if we could represent the wet-spell mean precipitation by a simple mathematical expression. We should expect to see an effect of timescale on $\mu(L)$ because the longer measurement times have a similar effect to having larger sample sizes (the wet-spell mean precipitation depends on timescale $L$ in accordance with $\mu(L) = \frac{1}{n_w} \sum_{i}^{n} \left[ \int_0^L x(t)\,\mathrm{d}t \right]_i$ as the rainfall is accumulated over longer intervals). We subsequently used a regression analysis to explore the dependency of the aggregated estimates of $\mu(L)$ on timescale $L$ and looked for linear relationships such as $\ln(\mu/L) \propto \ln(L/24)$ for the wet-spell intensity. Such a linear relationship would imply that
$$\mu(L) = \mu \left(\frac{L}{24}\right)^{\beta+1}$$
Because $\alpha(L) = \alpha (L/24)^{a}$ has the same power-law shape as $\mu(L)$, we were able to combine their dependencies and take $\zeta = \beta + 1 + a$.
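The combination of the two power laws can be written out by substituting them into equation (A.1). The sketch below only uses the expressions given in this appendix; the wet-spell frequency is left as $f_w(L)$, since replacing it with its 24 h value is a separate approximation.

```latex
\begin{align}
x_L &= \alpha(L) \, \mu(L) \ln[f_w(L) \tau] \\
    &= \alpha \left(\frac{L}{24}\right)^{a} \mu \left(\frac{L}{24}\right)^{\beta+1}
       \ln[f_w(L) \tau] \\
    &= \alpha \, \mu \left(\frac{L}{24}\right)^{\zeta} \ln[f_w(L) \tau],
       \qquad \zeta = \beta + 1 + a
\end{align}
```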

A.1.3. Wet-spell frequency
To get a better understanding of how the wet-spell frequency depends on the timescale, we estimated the duration and total amount for each rain event. A histogram was used to analyse the distribution of the wet-spell duration, and this statistical distribution was expected to define how $f_w(L)$ depends on the timescale (see the supporting material). For the wet-spell frequency, we explored a logarithmic tendency with timescale, $f_w(L) = f_0 + \gamma \ln[L/24]$, inspired by the relationship found between $\alpha$ and return period $\tau$ [8]. We used a regression analysis to find the best-fit scaling function for the wet-spell frequency. However, it would be more practical to use an expression with the form of equation (2) directly and calibrate $\zeta$ against IDF curves. We can justify using equation (2) to get approximate results if the factor involving $\ln[f_w(L) \tau]$ varies slowly with $L$ compared to the other factors. Furthermore, this simplification would give us only one parameter to fit, namely $\zeta$, which is the slope of the approximately linear curves obtained when plotting $\ln[x_L]$ from the IDF curves against $\ln[L/24]$ as done in figure 1. The results were not strongly sensitive to the choice of threshold $x_0 \in \{0.1, 0.5, 1\}$ mm (see the supporting material).