Positive correlation between wet-day frequency and intensity linked to universal precipitation drivers

Understanding precipitation is essential for quantifying weather- and climate-related risks. Analyses of changes in precipitation climatology typically treat precipitation frequency and intensity independently. Here we show that where it rains more often, it also rains harder. When global precipitation reanalysis data and observations from the past 40 years are grouped into regions of similar wet-day frequency, regardless of geographical separation, there is a strong correlation with wet-day intensity distributions. These wet-day-frequency regions are also more physically coherent than regions based on geographical location. We find that the coherent relationship between wet-day frequency and intensity distributions is partially explained by wet-day-frequency regions having similar vertical velocity and convective available potential energy distributions, once polar regions are excluded. These represent dynamic and thermodynamic processes that indicate how conducive wet-day-frequency regions are to large-scale and convective precipitation. This suggests that the main drivers of precipitation are universal. We also show that extreme-precipitation metrics are dependent on wet-day frequency within our framework. Our results imply that wet-day frequency could be used to derive estimates of extreme-precipitation climate indices and corresponding uncertainties, with these uncertainties related to local processes. Precipitation frequency and intensity across different geographic regions are positively correlated in reanalysis data and observations, suggesting universal precipitation-generating processes.

The spatio-temporal distribution of precipitation has critical impacts on the availability of water resources 1 , ecosystems 2 and economic growth rates 3 , and is also an important contributor to major hazards 4 . Precipitation distributions, and how they might change due to climate change, are also critical for quantifying societal and economic risks 4 and are of particular importance for extreme events. Wet-day intensity and wet-day-frequency metrics are also commonly used, independently of each other, to evaluate the simulation of precipitation within numerical weather-prediction models, reanalyses and climate models 5,6 .
The exact distribution of wet-day intensity is important because the impact of extreme-precipitation events escalates drastically with increasing intensity 7 . A small number of heavy-rainfall days contribute disproportionately to total precipitation compared with a large number of days of light precipitation 8 . For example, across Global Historical Climatology Network (GHCN) weather stations, half the annual precipitation falls during the wettest 12 days of the year 8 . The importance of the wet tail of the distribution is also highlighted in work that examined the geographic distribution of rainfall in different intensity classes and the frequency of each class 6,9 .
In this Article, ECMWF Reanalysis v5 (ERA5) 10 output is examined to determine the frequency of precipitation, which is used to group regions by their wet-day frequency. Wet-day intensity distributions are then derived over these regions. This analysis is repeated for satellite, gauge and other reanalysis datasets to determine the robustness of these relationships. In this framework, the mean wet-day intensity is shown to be strongly and positively correlated with the wet-day frequency. A range of extreme-precipitation metrics are also shown to be strongly positively correlated with the wet-day frequency. Potential drivers of the observed relationships, such as large-scale and convective-precipitation processes, vertical motion and convective available potential energy, are examined to determine the physical basis for these relationships.

Relationship between wet-day frequency and intensity
To understand the relationship between the frequency and intensity of precipitation, we use a framework for analysing precipitation over regions with a similar wet-day frequency. Figure 1a shows the geographic pattern of the wet-day frequency across 40 years (1980-2019) of ERA5 output. This frequency is determined using a wet-day threshold of 1 mm d⁻¹, which is commonly used within the community 11,12 . Previous work 13 has examined precipitation changes under a changing climate averaged over wet and dry regions. We take this further by averaging together intensity-distribution data from regions with the same wet-day frequency. This is in contrast to many studies 14-17 that analyse precipitation over geographical regions. Distributions of the wet-day intensity derived over consistent wet-day-frequency regions are shown in Fig. 1b and show the likelihood of particular rainfall intensities occurring over a range of wet-day frequencies. Figure 1a shows physically interpretable structures, such as regions of high wet-day frequency over the inter-tropical convergence zone and regions of low precipitation frequency over deserts. The effects of orography on the wet-day frequency can also be seen over the Andes and Himalayas. As the wet-day frequency increases in Fig. 1b, the wet-day intensity distributions shift towards higher values. These changes demonstrate that where it rains more often, it also rains more intensely. Figure 1b also shows that the shape of the wet-day intensity distribution changes with wet-day frequency. In particular, the form of the curves in Fig. 1b changes from sub-exponential at low wet-day frequencies to super-exponential as the wet-day frequency increases. This highlights that regions with low wet-day frequencies tend to experience larger proportions of their precipitation at low precipitation rates. Conversely, regions with higher wet-day frequency experience larger proportions of their precipitation at greater precipitation rates.
This result is robust to the choice of the wet-day threshold (Extended Data Figs. 1 and 2). Separating precipitation into land and ocean regions also shows that our results are relatively invariant to surface type (Extended Data Fig. 3). We also show that aggregating precipitation measurements by wet-day frequency tends to produce similar results (in terms of wet-day percentiles and internal variance) to analysing precipitation across similarly sized geographical regions (Extended Data Fig. 4). This suggests that subsetting precipitation data by wet-day frequency is as valid as analysis based on geographical region, even though it averages spatially disparate regions together.
It has already been established that there can be a substantial disparity between observational and reanalysis products when it comes to daily precipitation intensity [18][19][20] . To demonstrate that the relationships displayed in Fig. 1 are not an artefact of the ERA5 reanalysis, they have also been calculated using the GHCN gauge dataset 21 , Multi-Source Weighted-Ensemble Precipitation (MSWEP) v2.8 22 , the CPC MORPHing technique (CMORPH) 23 and Integrated Multi-satellite Retrievals for GPM (IMERG) 24 satellite datasets, and the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) 25 reanalysis. Once geographic sampling differences between the ERA5 and GHCN datasets are accounted for (as explained in Methods), the two show very similar results (Extended Data Fig. 5). The same aggregation methodology applied to the other datasets (Extended Data Fig. 6) also gives results consistent with the patterns observed in ERA5. Thus, this key relationship between how often it rains and how much it rains is not a product of how precipitation is represented in the ERA5 reanalysis but represents a previously unidentified universal relationship.

Connecting the wet-day frequency to climate indices
To quantify the connection between wet-day frequency and wet-day intensity visible in Fig. 1b, mean wet-day precipitation intensities were determined for regions aggregated by wet-day frequency. A weighted linear least-squares regression was then calculated between the natural logarithm of the mean wet-day intensity and the wet-day frequency (Fig. 2a). Data were weighted by the inverse square of the ratio between the standard deviation and the value at each wet-day frequency. Figure 2a shows a strong positive correlation between the wet-day frequency and the logarithm of the mean wet-day intensity across the aggregated wet-day-frequency regions. To test whether this correlation was a result of averaging the precipitation intensity, the correlation between wet-day intensity and wet-day frequency was computed for each precipitation-intensity percentile (assessing the correlation horizontally across Fig. 1b). Extended Data Fig. 7 displays coefficients of determination ranging from r² = 0.65 to r² = 0.92, with the weakest correlations found in the lowest-intensity percentiles. This consistent positive correlation indicates that the frequency of precipitation is a strong control on precipitation intensity. These relationships do not depend on an assumption about the form of the wet-day intensity distribution, unlike previous work that needed to assume an exponential distribution 26,27 . These strong relationships provided motivation to complete similar statistical tests on extreme-precipitation indices commonly used in climate studies 28 .
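The weighted regression of the log mean wet-day intensity on wet-day frequency can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: wet-day frequency is taken as a fraction of days, the "value" in the weighting is taken to be the mean wet-day intensity of each wet-day-frequency bin, and the function name `weighted_log_fit` is hypothetical. One possible rationale for this weighting is error propagation: the standard error of ln(x) is approximately σ/x, so (x/σ)² acts as an inverse-variance weight on the log-transformed means.

```python
import numpy as np

def weighted_log_fit(freq, mean_intensity, std_intensity):
    """Weighted least-squares fit of ln(mean wet-day intensity) on wet-day frequency.

    Each point is weighted by (value / std)**2, i.e. the inverse square of the
    std/value ratio. Returns (slope, intercept, weighted r^2).
    """
    freq = np.asarray(freq, dtype=float)
    value = np.asarray(mean_intensity, dtype=float)
    std = np.asarray(std_intensity, dtype=float)
    y = np.log(value)
    w = (value / std) ** 2
    # np.polyfit applies `w` to the unsquared residuals, so pass sqrt(w)
    # in order to minimise sum(w * residual**2).
    slope, intercept = np.polyfit(freq, y, deg=1, w=np.sqrt(w))
    y_hat = slope * freq + intercept
    y_bar = np.average(y, weights=w)
    ss_res = np.sum(w * (y - y_hat) ** 2)
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic check: mean intensity grows exactly exponentially with frequency.
freq = np.linspace(0.05, 0.95, 19)        # wet-day frequency as a fraction (assumed)
mean_int = 2.0 * np.exp(3.0 * freq)       # mm/day, made-up values
std_int = 0.2 * mean_int + 0.1 * freq     # made-up spread, so the weights vary
slope, intercept, r2 = weighted_log_fit(freq, mean_int, std_int)
print(slope, intercept, r2)
```

Because the synthetic means are exactly exponential in frequency, the fit recovers a slope of 3 and an intercept of ln 2 whatever the weights; with real binned data, the weights decide how strongly noisy bins (typically those at low wet-day frequency) pull on the line.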
We examine two extreme-precipitation indices based on the number of days per year above a precipitation threshold: r10 and r20 are the annual counts of days with accumulated precipitation greater than or equal to 10 mm d⁻¹ and 20 mm d⁻¹, respectively. Two metrics that quantify the annual total precipitation above the 95th and 99th percentiles, r95 and r99, are also examined. Finally, the prcptot metric is the annual total precipitation on wet days (precipitation greater than or equal to 1 mm d⁻¹). Figure 2b-f shows strong positive correlations between the extreme-precipitation climate indices and wet-day frequency in each case. In terms of the standard deviation around the data, Fig. 2a-d displays small variability, except at low wet-day frequencies. This is not the case for the r95 and r99 metrics, which show notably higher variability within each wet-day frequency. This suggests that, in our framework, many extreme-precipitation climate indices are heavily dependent on the wet-day frequency, but the large variability in r95 and r99 weakens this result for these metrics.
One might argue that this strong correlation is due to a sampling bias introduced when averaging into 100 wet-day-frequency regions. However, a corresponding analysis that instead averages spatially over 100 geographically coherent regions (Extended Data Fig. 8) shows much poorer correlations than those shown in Fig. 2b-f. There is therefore an advantage to working in the aggregated wet-day-frequency framework to identify these relationships.

Identifying physical drivers
Our analysis shows that the strong relationships between wet-day occurrence and other precipitation metrics are only notable when geographically disparate data are clustered based on wet-day frequency.
A myriad of processes impact precipitation including large-scale atmospheric dynamics, meso-scale convective and storm dynamics and local precipitation microphysics 29 . To provide physical justification for why precipitation changes across these regions coherently, the different processes that control precipitation were investigated.
Previous studies 7,30-32 have shown that changes in vertical velocity are an important influence on precipitation, with larger upward velocities relating to larger precipitation intensities. Given this, we derived distributions of ERA5 vertical velocities (in pressure coordinates, with negative values identifying ascent) grouped by wet-day frequency. The literature also identifies that both dynamical and thermodynamical drivers can affect the wet-day intensity 29,31 . Therefore, we also analysed the convective available potential energy (CAPE) metric 33,34 to examine how the vertical thermodynamic structure of the atmosphere might affect the observed relationship. Figure 3a shows distributions of ERA5 vertical velocity at the 850 hPa level, and Fig. 3b shows ERA5 CAPE, both grouped by wet-day frequency. To connect back to precipitation processes, the amounts of large-scale and convective precipitation in the ERA5 dataset were also determined as a function of wet-day frequency. As previous work has identified errors in the detection of snowfall at high latitudes [35][36][37] , areas poleward of 60° latitude have been removed from this analysis. For these latitude-restricted wet-day intensity distributions, Fig. 3c shows the amount of large-scale precipitation as a fraction of the total precipitation.
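Grouping an ERA5 field such as 850 hPa vertical velocity or CAPE by wet-day frequency, with latitudes poleward of 60° removed, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the array layout (time, latitude, longitude), equal-width frequency bins on [0, 1] (the text refers to 100 wet-day-frequency regions) and the helper names `group_by_wet_day_frequency` and `large_scale_fraction` are assumptions. The large-scale fraction is computed here as summed large-scale over summed total precipitation in each bin across all time steps, which may differ from how Fig. 3c was constructed.

```python
import numpy as np

def group_by_wet_day_frequency(wdf, field, lat, n_bins=100, max_abs_lat=60.0):
    """Pool values of `field` into wet-day-frequency bins, dropping polar latitudes.

    wdf:   (nlat, nlon) wet-day frequency as a fraction in [0, 1] (assumed)
    field: (ntime, nlat, nlon) variable to group, e.g. 850 hPa omega or CAPE
    lat:   (nlat,) latitudes in degrees
    Returns a list with one 1D array of pooled values per bin.
    """
    keep = np.abs(np.asarray(lat)) <= max_abs_lat    # remove rows poleward of 60 degrees
    wdf = np.asarray(wdf)[keep, :]
    field = np.asarray(field)[:, keep, :]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize returns 1..n_bins+1; shift to 0-based and put wdf == 1 in the last bin
    idx = np.clip(np.digitize(wdf, edges) - 1, 0, n_bins - 1)
    return [field[:, idx == b].ravel() for b in range(n_bins)]

def large_scale_fraction(wdf, lsp, tp, lat, n_bins=100):
    """Large-scale share of total precipitation in each bin (NaN for empty bins)."""
    lsp_bins = group_by_wet_day_frequency(wdf, lsp, lat, n_bins)
    tp_bins = group_by_wet_day_frequency(wdf, tp, lat, n_bins)
    return np.array([l.sum() / t.sum() if t.sum() > 0 else np.nan
                     for l, t in zip(lsp_bins, tp_bins)])
```

Pooling every time step of every grid cell in a bin, rather than averaging cells first, keeps the full distribution needed for histograms like those in Fig. 3a and 3b.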
Vertical velocity distributions in Fig. 3a show that regions with larger wet-day frequencies, and therefore higher precipitation intensities, have higher ascent rates and distributions skewed towards ascent. The correlation between vertical velocity and wet-day frequency (Extended Data Fig. 9) has a coefficient of determination of r² = 0.27, meaning that changes in vertical velocity potentially account for 27% of the variability in the wet-day frequency. Figure 3b shows that regions with the highest wet-day frequency, and therefore intensity, also have the greatest CAPE. A similar spatial analysis for CAPE (Extended Data Fig. 9) gives a coefficient of determination of r² = 0.23. These distributions of higher ascent rates and higher CAPE values for regions with higher wet-day frequency can both be interpreted as reflecting a higher occurrence of environments conducive to precipitation. Furthermore, Fig. 3c shows a shift in the fraction of large-scale precipitation as the wet-day frequency increases. Excluding polar regions, convective precipitation dominates in places with low wet-day frequency, while large-scale precipitation dominates elsewhere. Figure 3 indicates that most extreme-precipitation events are dominated by large-scale processes, such as cyclones, fronts and the slow ascent of air in synoptic systems. This is consistent with the fact that, while convective-precipitation events are associated with high rainfall rates, they generally have a short duration. Conversely, large-scale precipitation events have lower rainfall rates occurring over a larger portion of the day 38,39 . Thus, when precipitation intensities are averaged over a day, large-scale precipitation will tend to produce higher daily intensities. Our result that the distributions of vertical velocity and CAPE change considerably as a function of wet-day frequency therefore indicates that both these metrics are important drivers of large-scale precipitation-generating processes.
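The coefficients of determination quoted above can be read as the fraction of variance in one variable captured by a straight-line fit on another. The sketch below uses synthetic data and a hypothetical helper `r_squared`; it computes r² as 1 - SS_res/SS_tot for an ordinary least-squares fit and checks that this equals the squared Pearson correlation, which holds for a single-predictor linear fit with an intercept. It does not reproduce Extended Data Fig. 9, and pairing the variables per grid cell is an assumption.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, deg=1)            # slope, intercept
    residuals = y - (a + b * x)
    ss_res = np.sum(residuals ** 2)           # variance left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)      # total variance about the mean
    return 1.0 - ss_res / ss_tot

# Synthetic illustration only: a modest linear link plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(size=500)
r2 = r_squared(x, y)
pearson_r = np.corrcoef(x, y)[0, 1]
print(f"r^2 = {r2:.3f}; squared Pearson r = {pearson_r ** 2:.3f}")
```

An r² of 0.27 therefore describes a statistical association of a linear fit; it does not by itself separate cause from effect.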
This provides a physical explanation of the relationship between wet-day frequency and the wet-day intensity distribution observed in Fig. 1b.

Discussion and outlook
Our work focuses on analysing precipitation over regions of similar wet-day frequency, effectively aggregating wet and dry regions in a consistent way globally. The central result of this study is that there is a strong relationship between the frequency of wet days and the intensity distribution of precipitation on those days across disparate regions of the globe. We have shown that wet-day-frequency regions have similar variability to geographical regions (Extended Data Fig. 4) and correspond to places with distributions of vertical velocity and CAPE that are conducive to precipitation. This suggests that using wet-day frequency to analyse precipitation is more physically coherent than using geographical regions, as our framework reveals a connection between precipitation and its dynamic and thermodynamic drivers. This previously unquantified relationship provides a framework for understanding precipitation and how it might change in a warming world, which we believe could have important ramifications for model evaluation and for fundamental understanding of the relative importance of drivers of precipitation in different regions. Our results imply that, fundamentally, the processes that drive precipitation are universal.
Our framework provides an aggregated overview of the properties of precipitation and important physical drivers. When assessing local distributions of precipitation, other factors and processes would cause uncertainties in any estimates. Extended Data Fig. 9 shows that CAPE and vertical velocity together account for approximately 50% of the variance in the wet-day frequency. However, we would expect local factors, such as orographic forcing, to dominate in some regions and cause deviations from the distributions of wet-day intensity seen in Fig. 1b. This provides an interesting opportunity to examine anomalies from the norm to identify regional drivers. We also note that the temporal resolution of the datasets used could affect the results of the aggregation, given that different precipitation types (convective and stratiform) scale differently 38,40 .
While some studies identify the importance of the combination of wet-day frequency and wet-day intensity, they have not fully explored the relationship between these factors. An inspection of previous literature does show that this relationship has been implied before 8,17,19,27,41 . For example, one study 27 used principal component analysis to derive the most important relations between observed precipitation and the two precipitation metrics from thousands of rain gauges. However, the intensity-frequency relationship would have been very difficult to identify given the limited global coverage of the gauge network. Another study 41 examining the climatological characteristics of precipitation identifies a link between the geographical pattern of the most common precipitation intensity and the geographical pattern of the precipitation-frequency peak. While those authors show the existence of a frequency-intensity link, they do not explore the link between frequency and the rainfall-intensity distribution detailed in this study or examine the implications of their findings.
One common set of precipitation metrics used to compare gauge-based, satellite, reanalysis and climate-model outputs is the set of extreme climate indices. Their standardization by the Expert Team on Climate Change Detection and Indices (ETCCDI) 42 allows comparison across various timescales and resolutions, even for gauge-based datasets that suffer from sampling biases due to sparse coverage and short observational records. Our results show that, when averaging by wet-day frequency, there is a strong positive correlation between the wet-day frequency and many of the precipitation-related extreme climate indices. While variance within these mean statistics exists, especially for the r95 and r99 indices, the appearance of such a strong correlation when framing the analysis in a different way highlights that many of these climate metrics are not independent. However, most studies of precipitation-related climate indices [43][44][45] assess them as independent metrics, and their conclusions could ultimately be affected by our result.
Effectively, the results in this study suggest that getting the wet-day frequency correct can provide understanding of some of the more complex climate indices. This is useful for extreme-precipitation-related indices (for example, r95 and r99), as long-term datasets are required to obtain an accurate representation of the tail of the precipitation distribution. In addition, the relationships between wet-day frequency and intensity could be used to help further identify the relative importance of physical drivers of precipitation and their impact on the precipitation-intensity distribution. Research efforts have produced global analyses that fit generalized extreme-value parameters to precipitation-intensity distributions geographically 46 , but given their statistical nature these have not offered opportunities for physical insight. Past research suggests that many precipitation products should be used in studies, as none provides a single best estimate of precipitation 47,48 . While six different datasets were used in this study to verify our findings, a limitation of this study remains the lack of a broader intercomparison. Forthcoming work will focus on evaluating a larger number of commonly used precipitation datasets, reanalyses and Coupled Model Intercomparison Project Phase 6 (CMIP6) climate-model outputs to more completely assess how the precipitation frequency-intensity relation, climate indices and the physical processes that drive them are represented across these products.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41561-023-01177-4.

ERA5
ECMWF Reanalysis v5 (ERA5) is a climate reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), combining observational measurements with model output to provide a consistent gridded product for recent climate 10 . ERA5 total, convective and large-scale precipitation components, pressure vertical velocity and convective available potential energy output between 1980 and 2019 are used in this study. ERA5 total precipitation consists of both rain and snow at Earth's surface and is the sum of the ERA5 large-scale and convective-precipitation fields, which are also used in this study. This variable does not include precipitation that evaporates in the atmosphere before it reaches Earth's surface, nor does it include fog and dew 10 . All the ERA5 outputs used in this study are available on a 0.25° × 0.25° spatial grid at hourly temporal resolution, and were sampled at a three-hourly interval.
In the underlying ERA5 model, large-scale precipitation is produced by the large-scale (stratiform) cloud microphysical processes and denotes both rain and snow, while convective precipitation is produced by the convective parameterization scheme and relates to a single grid-box column when convection is diagnosed. Convective available potential energy is a metric of the stability of the atmosphere and can be used to assess the potential for the development of convection. Using pressure as the vertical coordinate, negative vertical velocities represent regions of ascent and positive vertical velocities represent descending air. Regions of ascent are important in the formation of precipitation 30 .

GHCN station data
In addition to the ERA5 reanalysis dataset, gauge data included in the Global Historical Climatology Network 21 (GHCN) were also analysed to investigate the robustness of the ERA5 results. Gauge data must be quality controlled to ensure that data artefacts are removed. In this study, only gauge stations that operated from 1980 to 2019 were considered. We also used the GHCN quality flag system to remove potentially low-quality data, and we removed stations north of 60° N or south of 60° S to avoid regions with a risk of frequent snow in the gauges. These stations were then aggregated onto a global grid with the same resolution as the ERA5 data to reduce sampling biases due to the uneven distribution of stations. There will, however, still be strong sampling biases in the gauge data due to regions, such as Asia and Africa, that contain large areas without station coverage, and the complete absence of data over the oceans. Because of these biases, when these data are compared with ERA5 (Extended Data Fig. 5), only ERA5 grid cells that have a corresponding GHCN measurement are examined.
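The gridding and co-location steps above might look like the following sketch. Assumptions not stated in the text: stations are averaged into 0.25° cells whose edges start at 90° S and 0° E (a simplification of ERA5's regular 0.25° grid, whose points lie on multiples of 0.25°), the per-station quantity (for example, a station's wet-day frequency) is already computed, and the helper names `grid_station_values` and `colocate` are hypothetical.

```python
import numpy as np

def grid_station_values(st_lat, st_lon, values, res=0.25):
    """Average station values onto a regular lat-lon grid with spacing `res` degrees.

    Returns (grid_mean, has_station), both of shape (nlat, nlon); cells without
    any station are NaN in grid_mean.
    """
    nlat, nlon = int(round(180 / res)), int(round(360 / res))
    # Row index counted from 90 S, column index from 0 E (longitudes wrapped to 0-360)
    i = np.clip(((np.asarray(st_lat, dtype=float) + 90.0) // res).astype(int), 0, nlat - 1)
    j = ((np.asarray(st_lon, dtype=float) % 360.0) // res).astype(int) % nlon
    total = np.zeros((nlat, nlon))
    count = np.zeros((nlat, nlon))
    np.add.at(total, (i, j), np.asarray(values, dtype=float))  # handles repeated cells
    np.add.at(count, (i, j), 1)
    has_station = count > 0
    grid_mean = np.full((nlat, nlon), np.nan)
    grid_mean[has_station] = total[has_station] / count[has_station]
    return grid_mean, has_station

def colocate(era5_field, has_station):
    """Keep only ERA5 cells that contain at least one GHCN station (NaN elsewhere)."""
    return np.where(has_station, era5_field, np.nan)
```

`np.add.at` is used instead of fancy-index assignment because several stations can fall in the same cell, and plain `total[i, j] += values` would keep only one of them.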

MSWEP
We also used the Multi-Source Weighted-Ensemble Precipitation (MSWEP) v2.8 dataset 22 , which combines satellite, gauge and reanalysis datasets between 1980 and 2019. MSWEP v2.8 combines IMERG data 24 with other datasets (including ERA5) to produce a consistent dataset over both ocean and land at daily resolution on a 0.1° × 0.1° grid. Given the presence of gauge and satellite data in MSWEP v2.8 and its coverage of the entire 40-year period used for ERA5, we believe MSWEP is a valuable product for assessing the robustness of our conclusions across datasets.

CMORPH
We also used the CPC Morphing technique (CMORPH) high-resolution global satellite precipitation dataset 23 . Satellite data are merged into a precipitation product that is then bias corrected using CPC daily gauge data over land and Global Precipitation Climatology Project (GPCP) gauge data over the ocean. We use data between 2001 and 2019 at daily resolution on a 0.25° × 0.25° grid. Note that CMORPH data only cover the region between 60° N and 60° S.

IMERG
We also used the Integrated Multi-satellitE Retrievals for GPM (IMERG) V05 precipitation dataset 24 . IMERG combines information from multiple satellites in the core GPM constellation. The IMERG dataset aims to derive precipitation by intercalibrating and merging 'all' the available microwave sensors, along with microwave-calibrated infrared satellite estimates and precipitation gauge analyses. V05 of the IMERG product has a 0.1° × 0.1° spatial resolution, and daily data were used between 2001 and 2019. Due to an observed discontinuity in the wet-day frequency at 60° N and 60° S, polar regions were excluded.

MERRA2
Finally, we use the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) dataset 25 . MERRA2 is a reanalysis dataset that combines observational data with climate-model output to provide a consistent gridded product. We use data between 1980 and 2012 at daily resolution on a 0.625° × 0.5° (longitude × latitude) spatial grid.

Wet-day frequency
The geographical distribution of wet-day frequency was derived for each grid cell of each precipitation product, irrespective of its resolution. A period was marked as precipitating if the accumulated precipitation was greater than a 1 mm-per-day threshold and non-precipitating otherwise; this threshold is commonly used within the community 11,12 and is also used in a number of the extreme-precipitation indices examined 28 . For products with sub-daily resolution, a period was identified as precipitating if its accumulation since the last measurement corresponded to an equivalent accumulation of 1 mm over a full day.
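The threshold rule above, including the scaling for sub-daily products, reduces to a few lines. In this sketch the 1 mm d⁻¹ threshold is multiplied by the period length divided by 24 h (so a three-hourly accumulation is compared with 0.125 mm), the comparison is strictly "greater than" as worded here, and the function name `wet_fraction` is hypothetical. For sub-daily inputs it returns the fraction of wet periods, as literally described, rather than first aggregating to days.

```python
import numpy as np

WET_DAY_THRESHOLD_MM = 1.0  # mm per day, as stated in the text

def wet_fraction(accum_mm, step_hours=24.0):
    """Fraction of periods classed as wet at each grid cell.

    accum_mm:   (ntime, ...) precipitation accumulated over each period, in mm
    step_hours: length of each period; the daily threshold is scaled to the
                equivalent accumulation over that period
    """
    threshold = WET_DAY_THRESHOLD_MM * step_hours / 24.0
    return np.mean(np.asarray(accum_mm, dtype=float) > threshold, axis=0)

# Daily data for one cell: two of four days exceed 1 mm
daily = np.array([[0.5], [1.5], [2.0], [0.0]])
print(wet_fraction(daily))                                       # fraction per cell
# Three-hourly data: threshold becomes 0.125 mm per period
print(wet_fraction(np.array([0.1, 0.2, 0.125, 0.0]), step_hours=3.0))
```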

Wet-day intensity
Precipitation data were then aggregated into regions with the same wet-day frequency. For these regions, precipitation was grouped together to derive cumulative precipitation-intensity distributions. As for the wet-day frequency, a 1 mm d⁻¹ equivalent threshold was applied so that only wet days were considered. This process averages spatially disparate regions together, but this averaging produces a range of distributions similar to those obtained when calculating regional or zonal means (Extended Data Fig. 4), which are common in the literature [14][15][16][17] . We also show in the main text (Fig. 3) that key properties that help define the likelihood and intensity of precipitation, such as vertical velocity [30][31][32] and convective available potential energy 33,34 , are coherent across these regions.
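The aggregation described above, pooling the wet days of every grid cell that shares a wet-day-frequency bin, can be sketched as follows. This is a simplified illustration under assumptions that the text does not fix: daily data on a flattened (days, cells) array, equal-width frequency bins on [0, 1], a wet day defined as at least 1 mm d⁻¹, and a few percentiles summarizing each pooled distribution; the function name is hypothetical. Area weighting of grid cells is not applied here.

```python
import numpy as np

def intensity_percentiles_by_frequency(daily_mm, n_bins=100,
                                       percentiles=(50, 90, 99), threshold=1.0):
    """Pool wet-day intensities from all grid cells that share a wet-day-frequency bin.

    daily_mm: (ndays, ncells) daily precipitation in mm/day
    Returns (bin_centres, table), where table[b, k] is the k-th requested
    percentile of pooled wet-day intensity in bin b (NaN if the bin is empty).
    """
    daily_mm = np.asarray(daily_mm, dtype=float)
    wet = daily_mm >= threshold                    # wet days only
    wdf = wet.mean(axis=0)                         # wet-day frequency of each cell
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(wdf, edges) - 1, 0, n_bins - 1)
    table = np.full((n_bins, len(percentiles)), np.nan)
    for b in range(n_bins):
        cells = idx == b
        pooled = daily_mm[:, cells][wet[:, cells]]  # every wet-day value in this bin
        if pooled.size:
            table[b] = np.percentile(pooled, percentiles)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, table
```

Pooling all wet days before computing percentiles, rather than averaging each cell's percentiles, is what lets geographically distant cells contribute to one distribution per wet-day frequency.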

Climate indices
Commonly used extreme-precipitation indices based on the recommendations of the Expert Team on Climate Change Detection and Indices (ETCCDI) are also examined in this study 28,42 . Five indices (prcptot, r10, r20, r95 and r99) were chosen, with a particular focus on indices that represent extreme precipitation. r10 and r20 are defined as the annual count of days on which precipitation is greater than or equal to 10 mm d⁻¹ and 20 mm d⁻¹, respectively. prcptot is defined as the annual total precipitation on wet days, on which precipitation is greater than or equal to 1 mm d⁻¹. r95 and r99 are defined as the annual total precipitation that occurs above the 95th and 99th percentiles of the precipitation distribution. Climate indices were determined globally across the 1980-2019 period for each ERA5 grid cell and then averaged when the wet-day-frequency clustering was applied.
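The index definitions above can be computed for a single grid cell as in the sketch below. Assumptions not stated in the text: the percentiles for r95 and r99 are taken from wet days (at least 1 mm d⁻¹) over the whole record, and r95/r99 sum precipitation strictly above them (the ETCCDI definitions instead use wet-day percentiles from a fixed base period); annual values are averaged over years before any wet-day-frequency clustering; and the function name is hypothetical.

```python
import numpy as np

def extreme_precip_indices(daily_mm, years):
    """Multi-year mean of annual prcptot, r10, r20, r95 and r99 for one grid cell.

    daily_mm: (ndays,) daily precipitation in mm/day
    years:    (ndays,) calendar year of each day
    """
    p = np.asarray(daily_mm, dtype=float)
    years = np.asarray(years)
    wet = p >= 1.0
    q95, q99 = np.percentile(p[wet], [95, 99])   # wet-day percentiles, whole record
    annual = {k: [] for k in ("prcptot", "r10", "r20", "r95", "r99")}
    for y in np.unique(years):
        d = p[years == y]
        annual["prcptot"].append(d[d >= 1.0].sum())       # total on wet days
        annual["r10"].append(np.count_nonzero(d >= 10.0))  # days >= 10 mm/day
        annual["r20"].append(np.count_nonzero(d >= 20.0))  # days >= 20 mm/day
        annual["r95"].append(d[d > q95].sum())             # total above 95th pct
        annual["r99"].append(d[d > q99].sum())             # total above 99th pct
    return {k: float(np.mean(v)) for k, v in annual.items()}

# Two short made-up "years" of daily data
precip = [0.0, 5.0, 12.0, 25.0, 0.5, 1.0, 10.0, 20.0, 0.0, 3.0]
yrs = [2000] * 5 + [2001] * 5
print(extreme_precip_indices(precip, yrs))
```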

Data availability
The ERA5 reanalysis products were obtained from the Copernicus Climate Data Store (https://cds.climate.copernicus.eu/). GHCN station data are available from the National Oceanic and Atmospheric Administration National Centers for Environmental Information website (https://www.ncei.noaa.gov/data/global-historical-climatology-network-daily). MSWEP data are available through Google Drive with the access process