Intercomparison of annual precipitation indices and extremes over global land areas from in situ, space-based and reanalysis products

A range of in situ, satellite and reanalysis products on a common daily 1° × 1° latitude/longitude grid were extracted from the Frequent Rainfall Observations on Grids database to facilitate intercomparison and analysis of precipitation extremes on a global scale. Twenty-two products met the criteria for this analysis, namely that daily data were available over global land areas from 50°S to 50°N since at least 2001. From these daily gridded data, 10 annual indices representing aspects of extreme precipitation frequency, duration and intensity were calculated. Results were analysed for individual products and also for four cluster types: (i) in situ, (ii) corrected satellite, (iii) uncorrected satellite and (iv) reanalyses. Climatologies based on a common 13-year period (2001–2013) showed substantial differences between some products. Timeseries (ranging in length from 13 to 67 years) also highlighted some substantial differences between products. A coefficient of variation showed that the in situ products were most similar to each other while reanalysis products had the largest variations. Reanalyses, however, agreed better with in situ observations over extratropical land areas than the satellite clusters did, although reanalysis products tended to fall into ‘wet’ and ‘dry’ camps overall. Some indices were more robust than others across products, with daily precipitation intensity showing the least variation between products and days above 20 mm showing the largest. In general, the results of this study show that global space-based precipitation products have potential for climate-scale analyses of extremes. While we recommend caution for all products, depending on their intended application, this applies particularly to reanalyses, which show the most divergence across results.


Introduction
Precipitation indices (e.g. annual wettest day, consecutive dry days, days above 20 mm) are widely used in the climate literature to assess global and regional trends in precipitation extremes (e.g. Alexander et al 2006, Donat et al 2013, Sillmann et al 2013, Zhou et al 2016). Indeed, the last three Intergovernmental Panel on Climate Change Assessment Reports (IPCC 2001, IPCC 2007, IPCC 2013) have drawn global conclusions on extreme precipitation trends using such indices. While indices limit some inferences regarding the full distribution of daily data, they offer a mechanism to increase the quality and amount of data that can be shared among researchers to examine extremes (e.g. Alexander 2016). Most of the literature to date has focused primarily on in situ data because these offer a sufficiently long record to determine climate-scale trends, although their spatial coverage is limited. However, some satellite records now span several decades, potentially making them suitable for long-term global assessment (Roca et al 2019, others). In addition, reanalysis products have been assessed for their suitability to study long-term changes in precipitation extremes (Donat et al 2014), although it has generally been recommended that these products be used with caution for longer-term assessments because of inhomogeneities in the data network used for assimilation (Thorne and Vose 2010, Bosilovich et al 2011). Despite their limitations, each of these product types (in situ-based, satellite and reanalyses) is widely used in the scientific literature, and each offers advantages over the others in terms of, e.g., temporal resolution, spatial representativeness, measurement accuracy and homogeneity.
Therefore, it seems necessary to evaluate each product with respect to the others and to make recommendations for the assessment of global precipitation extremes. However, this type of intercomparison has not previously been easy because of the lack of standardisation across products (e.g. resolution, formats) and in how precipitation extremes are defined and analysed by different research groups. Some groups have focussed on a small subset of available products (globally e.g. …). As part of the International Precipitation Working Group (IPWG; http://isac.cnr.it/~ipwg/), the GEWEX Data and Analysis Panel (GDAP; https://gewex.org/panels/gewex-data-and-analysis-panel/) and the World Climate Research Programme Grand Challenge on Weather and Climate Extremes (WCRP GC Extremes; https://wcrpclimate.org/gc-extreme-events), a database has been developed that enables a comparison of multiple products at a common daily 1° × 1° resolution. This database, named Frequent Rainfall Observations on GridS (FROGS), is freely available from ftp://ftp.climserv.ipsl.polytechnique.fr/FROGs/ and contains data from in situ, satellite, blended and reanalysis datasets. Here we assess how these products compare using a standard suite of precipitation indices in order to determine whether they can be used for long-term assessment of precipitation extremes over land. We start by introducing the data used, followed by a description of the methods employed, and end by analysing product spread using a range of standard metrics.

Data and methods
We extract data products from FROGS (Roca et al 2019) that have global land coverage (or at least cover land areas from 50°S to 50°N). Table 1 shows the 22 products that meet these criteria, with GSMAP-gauges-RNLv6.0 covering the shortest period (13 years) and REGEN_ALL_v1.1/REGEN_LONG_v1.1 covering the longest (67 years). All products share a common overlapping period of 2001-2013. For each product, a suite of 10 'ETCCDI' precipitation indices (Zhang et al 2011) is calculated (table 2), resulting in one value per year and representing annual measures of frequency, duration and intensity (see also figure 1 of Alexander et al 2019 in this Focus Collection). Our aim is to intercompare datasets using some basic statistics to determine whether conclusions with respect to extreme precipitation climatologies and trends are robust and product independent. If so, this could inform, e.g., the IPCC as to the reliability of studies that use products other than in situ-based ones for long-term climate assessment. To achieve our goal we first calculate climatologies for each product over the common 13-year period (2001-2013) when all products have data (figure 1). Then timeseries are intercompared over their full period of record. For this analysis we also group the datasets by product type: (i) in situ data, (ii) satellite products that have been corrected using in situ data, (iii) uncorrected satellite products and (iv) reanalyses (figure 2). A coefficient of variation (cov) is calculated for each cluster (that is, the interproduct spread in %), with covs shown for a selection of indices that cover precipitation intensity (SDII), frequency (R10mm) and duration (CDD) (figure 3). The covs are then averaged over land for the globe (50°S-50°N), the extratropics (20°S-50°S and 20°N-50°N) and the tropics (20°S-20°N) for all indices from table 2 (figures 4(a)-(c)). Finally, we compare trends from 1988 to 2013 for each product that covers this period with a sufficient amount of non-missing data.

Supplementary material for this article is available online at stacks.iop.org/ERL/15/055002/mmedia.

Table 1. […] and analysed in this study. Products are organised into four cluster groups representing those that contain (1) only station (in situ) data, (2) satellite data that have been corrected using in situ data, (3) satellite-only measurements and (4) reanalysis data. The number of products in each product cluster is indicated in the first column.

Dataset name | Years available | References
In situ-based (…)

Table 2. ETCCDI precipitation indices (Zhang et al 2011) calculated for each product.

Category | Index | Descriptive name | Definition | Units
Intensity | Rx1day | Max 1 d precipitation amount | Annual maximum 1 d precipitation (index also available as monthly maximum) | mm
Intensity | Rx5day | Max 5 d precipitation amount | Annual maximum consecutive 5 d precipitation (index also available as monthly maximum) | mm
Intensity | SDII | Simple daily intensity index | The ratio of annual total precipitation to the number of wet days (>= 1 mm) | mm/day
Intensity | R95p | Very wet days | Annual total precipitation from days > 95th percentile | mm
Intensity | R99p | Extremely wet days | Annual total precipitation from days > 99th percentile | mm
Intensity | PRCPTOT | Annual total wet-day precipitation | Annual total precipitation from days >= 1 mm | mm
Frequency | R10mm | Number of heavy precipitation days | Annual count when precipitation >= 10 mm | days
Frequency | R20mm | Number of very heavy precipitation days | Annual count when precipitation >= 20 mm | days
Duration | CDD | Consecutive dry days | Annual maximum number of consecutive days when precipitation < 1 mm | days
Duration | CWD | Consecutive wet days | Annual maximum number of consecutive days when precipitation >= 1 mm | days
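A minimal sketch of how the table 2 indices could be computed for one year of daily data at a single 1° grid cell is shown below, in Python with NumPy. It is not the code used in this study: the function names are hypothetical, missing-data handling is omitted, spells are confined to the calendar year, and the R95p/R99p thresholds are assumed to be wet-day percentiles from a caller-chosen base period (the base period is not stated in this section).

```python
import numpy as np

WET_DAY_MM = 1.0  # wet-day threshold used in the table 2 definitions


def longest_run(mask):
    """Length of the longest run of consecutive True values in a 1-D boolean array."""
    best = current = 0
    for flag in mask:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def wet_day_percentiles(base_pr):
    """95th/99th percentiles of wet-day precipitation over an (assumed) base period."""
    base_pr = np.asarray(base_pr, dtype=float)
    wet_values = base_pr[base_pr >= WET_DAY_MM]
    return np.percentile(wet_values, 95), np.percentile(wet_values, 99)


def annual_indices(pr, p95, p99):
    """Table 2 indices for one year of daily precipitation (mm) at one grid cell.

    Hypothetical helper: p95/p99 are thresholds supplied by the caller, and
    CDD/CWD spells are cut at the start and end of the year.
    """
    pr = np.asarray(pr, dtype=float)
    wet = pr >= WET_DAY_MM
    prcptot = pr[wet].sum()
    n_wet = int(wet.sum())
    # Rolling 5-day totals: convolving with a window of ones sums each 5-day block
    rx5 = np.convolve(pr, np.ones(5), mode="valid").max() if pr.size >= 5 else np.nan
    return {
        "Rx1day": pr.max(),
        "Rx5day": rx5,
        "SDII": prcptot / n_wet if n_wet else np.nan,
        "R95p": pr[pr > p95].sum(),
        "R99p": pr[pr > p99].sum(),
        "PRCPTOT": prcptot,
        "R10mm": int((pr >= 10.0).sum()),
        "R20mm": int((pr >= 20.0).sum()),
        "CDD": longest_run(~wet),
        "CWD": longest_run(wet),
    }
```

Because every index reduces a year of daily values to a single number, the same function can be applied cell by cell and year by year to each product to build the annual fields that are compared in the following sections.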

Climatology
Figures 1 and S1-S9 show a 13-year climatology for each of the indices from table 2 for all 22 precipitation products considered. Globally, spatial patterns across climatic zones are similar for all products over the 2001-2013 period. However, there are many important caveats. For annual total wet-day precipitation (PRCPTOT), for example, the 'wettest' product is >50% wetter than the 'driest' product (1256.5 mm (MERRA2) versus 821.1 mm (CPC_v1.0), globally averaged over the 13-year period). Overall, extreme precipitation indices in reanalyses tend to be less spatially consistent than those in the in situ and satellite datasets, with, for example, tropical Africa being very wet in some reanalyses and much drier in others (figures 1, S1-S9). This masks the fact, however, that reanalysis products often fall into two groups ('wet' and 'dry'). This can be seen more clearly in table 3, where all the datasets are ranked for each index from driest to wettest based on the global (50°S-50°N) land average over the 2001-2013 period. Interestingly, MERRA2 mostly falls into the wet group and MERRA1 mostly falls into the dry group (see also figure 2). This means that the spread of actual values across reanalyses is, on the whole, larger than in any of the other product clusters. MERRA2 and CFSR are the wettest products overall, ranking highest and second highest in 6 out of the 10 precipitation indices and in all but one of the precipitation intensity metrics, respectively. The uncorrected satellite products tend to fall at the dry end of the scale, except for 3B42_IR_v7.0, which is much wetter. CHIRP_v2 has some of the driest precipitation intensity, frequency and duration measures. Despite this, it also has one of the longest durations of consecutive wet days (CWD) compared with other products. Its in situ-corrected version, CHIRPS_v2.0, tends not to be quite so dry and is wetter in most metrics except R95p, R99p and CWD.
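As a worked check of the '>50%' figure, using only the two global PRCPTOT means quoted above, the relative difference between the wettest and driest products is:

```latex
\frac{1256.5\,\mathrm{mm} - 821.1\,\mathrm{mm}}{821.1\,\mathrm{mm}}
  = \frac{435.4}{821.1} \approx 0.53
```

That is, on this measure MERRA2 is roughly 53% wetter than CPC_v1.0 when averaged over global land from 50°S to 50°N for 2001-2013.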
In general, the satellite products that are corrected to in situ measurements are more mixed in terms of rankings and tend to vary from index to index, although 3B42RT_v7.0 stands out as being particularly wet (note that this product has minimal correction compared with others in this cluster). The in situ-based products tend to have very similar spatial patterns and global means, although GPCC_FDD_2018 is distinctly wetter than the other in situ products and indeed than most of the satellite-based products. This is particularly evident in the intensity-based metrics such as Rx1day (figure S7), with the wettest day in GPCC_FDD_2018 being around 40% wetter on average than in the other in situ products (see the climatological global averages inserted in figure S7). Differences are clearest in the tropics, as also highlighted in Bador et al (2020) and Roca (2019). Indeed, GPCC_FDD_2018 has the largest land-based daily precipitation intensity (9.3 mm d⁻¹) of any product (figure S9), with the other products ranging from 5.5 mm d⁻¹ (CHIRP_v2) to 9.0 mm d⁻¹ (3B42_IR_v7.0) on average over the 2001-2013 period. Overall, CHIRP_v2 has the lowest average ranking (driest) and MERRA2 has the highest average ranking (wettest). While not shown, difference plots between all products and REGEN_All_2019 were created for each index (similar to figures S4 and S7 in Bador et al (2020) for Rx1day and PRCPTOT, respectively) to determine whether the products that were consistently wetter or drier in table 3 had a global or a more regional difference signature. We found that for the more extreme indices (e.g. Rx1day, R99p), products are broadly classified as wetter or drier everywhere, i.e. there is a global-scale signature. However, for some of the more 'moderate' extremes (e.g. R10mm) there is some regional variation. While these regions vary, the Amazon, central Africa and south-east Asia often stand out as areas of high contrast.
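The driest-to-wettest ranking behind table 3, and the 'average ranking' used to single out CHIRP_v2 and MERRA2, can be sketched as follows. This is an illustrative Python/NumPy sketch, not the authors' code: `rank_products` is a hypothetical name, ties are broken by product order, and we assume (consistent with how CDD is read in the text) that a larger CDD counts as 'drier', so its sign is flipped before ranking.

```python
import numpy as np


def rank_products(means, index_names, drier_when_larger=("CDD",)):
    """Rank products from driest (1) to wettest (n) for each index.

    means: array (n_products, n_indices) of 2001-2013 land-average values.
    Indices listed in drier_when_larger (assumed: CDD) are negated so that a
    longer dry spell gives a lower ('drier') rank. Ties are broken by product order.
    """
    values = np.asarray(means, dtype=float).copy()
    for j, name in enumerate(index_names):
        if name in drier_when_larger:
            values[:, j] = -values[:, j]
    # argsort of argsort converts values into 0-based ranks along the product axis
    ranks = values.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable") + 1
    # Average rank per product across all indices
    return ranks, ranks.mean(axis=1)
```

For example, with three products whose PRCPTOT means are 800, 1000 and 1200 mm and whose CDD means are 40, 30 and 20 days, the first product is ranked driest (1) on both indices and has an average rank of 1.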
Table 3 also highlights other interesting features of the data, such as the contrast between CDD and CWD for some products. For example, GPCC_FDD_2018 is ranked 'dry' for both CDD and CWD while MERRA2 is ranked 'wet' for both. The former implies long dry spells interspersed with short wet spells, while the latter implies long wet spells interspersed with short dry spells. Other products have contrasting features in these indices (e.g. CDD is dry while CWD is wet in CHIRP_v2, and CDD is wet while CWD is dry in 3B42_IR_v7.0), which implies that in some products long (or short) dry spells alternate with long (or short) wet spells. Further work is planned to address these intriguing features both within and between products.

Figure 2 shows the global (50°S-50°N) land average timeseries for each product (table 1) and each precipitation index (table 2) over the period of available data. Note that because the calculation of CDD and CWD can cross the year end (in order not to artificially break a dry or wet season), we do not include the final year of each product in the timeseries plots for those two indices. All other indices include all the years noted in table 1. Notably, reanalyses generally have a larger spread than the other product clusters (especially for CWD, PRCPTOT, R20mm, R95p, R99p, Rx1day and Rx5day) and seem to form wet and dry camps (as noted above). The in situ products (perhaps with the exception of GPCC_FDD_2018) tend to band in the middle of the multi-product ensemble and are well aligned with the median of the boxplot, although they have some of the highest consecutive dry days (CDD) (figure 2(a)) and the fewest CWD (figure 2(b)). In fact, there is no overlap between the ranges of in situ and reanalysis CWD (right-hand side of figure 2(b)). Many of the multi-product distributions are heavily skewed (e.g. CWD, PRCPTOT), but this can mostly be explained by the inclusion of the reanalysis data, which might be affected by the excessive drizzle found in atmospheric models (e.g. Stephens et al 2010). This is consistent with Herold et al (2016), who found that reanalysis products generally had more wet days than in situ-based products. Overall, the in situ and two satellite-based clusters are more closely aligned with each other than with the reanalyses. There are also clear inhomogeneities in some of the products that require further investigation. For example, PERSIANN_v1_r1 has an obvious jump around the mid-1980s (also noted by Herold et al 2016) and CHIRP_v2 shows what looks like a shift around 2012. In addition, there are clear trends in some of the timeseries that are not apparent in other products. For example, in R95p and R99p, MERRA2 has an increasing trend throughout the timeseries while CFSR appears to show a trend starting around 2000. These trends are likely data artefacts, as they are not present in any of the other products, and could be responsible for the non-stationary systematic errors identified in Funk et al (2019) in this Focus Collection. These issues will be investigated further as part of the wider remit of the IPWG/GDAP/WCRP extreme precipitation project.
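Why the final year is dropped for CDD and CWD can be made concrete with a small sketch in which spells are allowed to cross the year end. The convention here is a hypothetical one chosen for illustration (not necessarily the one used for the products above): each spell is attributed to the year of its final day, so a dry season running from December into January is counted once at full length, and the last year of the record is discarded because its final spell may continue beyond the end of the data.

```python
import numpy as np


def annual_max_spell(pr, years, wet=False, threshold=1.0):
    """Annual maximum spell length (days) where spells may cross the year end.

    Hypothetical convention: a spell is attributed to the year of its last day.
    The final year is dropped because its last spell may be truncated by the
    end of the record.
    """
    pr = np.asarray(pr, dtype=float)
    years = np.asarray(years)
    in_spell = pr >= threshold if wet else pr < threshold
    result = {int(y): 0 for y in np.unique(years)}
    run = 0
    for i, flag in enumerate(in_spell):
        run = run + 1 if flag else 0
        # A spell ends on the last day of the record or when the next day breaks it
        spell_ends = run > 0 and (i == len(pr) - 1 or not in_spell[i + 1])
        if spell_ends:
            y = int(years[i])
            result[y] = max(result[y], run)
    result.pop(int(years.max()), None)
    return result
```

With this convention, a five-day dry spell that starts on the last two days of one year and continues for three days into the next is counted as five days in the second year, whereas a within-year calculation would split it into spells of two and three days.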

Timeseries and interproduct spread
To investigate some of the product differences from a more regional perspective, we calculate the coefficient of variation (cov) for each product cluster. Figure 3 shows the results for SDII, CDD and R10mm, and figures S10 and S11 show the same information for the remaining precipitation indices studied. As we move through the product clusters from in situ to corrected satellite to uncorrected satellite to reanalyses, the cov increases, particularly in the tropics and high-altitude regions. In addition, some indices agree better than others, that is, they are less sensitive to the choice of dataset. For example, SDII is less sensitive to dataset choice than CDD, and CDD is in turn less sensitive than R10mm. This agrees well with the results of Herold et al (2017), who used a smaller satellite-only subset of products. The cov should be read purely as interproduct spread: a small cov does not mean that those products are closer to the 'truth'. They may simply agree better because they rely on similar source data (as in the case of the in situ-based and satellite-based datasets). There are, however, clear regional differences in how well products agree. In regions of high station density, e.g. North America, Europe, East Asia and Australia, products and indices are in much closer agreement. Conversely, in regions of low station density, e.g. the Sahel, there is a very large range of precipitation index values (>100% in many cases). However, even in regions of high station density, reanalyses and satellites can have large interproduct spread, meaning that caution should be applied when assessing and interpreting results at the regional scale. Figure 4 intercompares the average covs for each index and each product cluster for global land, tropical land and extratropical land. The broad message here is that, in general, in situ-based products have the least interproduct spread while reanalyses have the largest spread, apart from extratropical land, where reanalyses in many cases are much closer to each other than the satellite products are.

Table 3. Ranking of precipitation datasets (table 1) averaged over global land areas from 50°S to 50°N for each precipitation index (table 2), from driest (dark brown) to wettest (dark green), over the period 2001-2013.
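The cluster cov and its regional land averages can be sketched as below. This is a minimal Python/NumPy illustration under stated assumptions rather than the study's implementation: the cov is taken as the population standard deviation across a cluster's products divided by their mean (in %), cells with a non-positive mean are masked, and the regional averages use cos(latitude) area weights over land cells, a weighting choice this section does not specify. The function names and the `REGIONS` table are hypothetical; the latitude bands follow the methods.

```python
import numpy as np


def interproduct_cov(fields, ddof=0):
    """Coefficient of variation (%) across products at each grid cell.

    fields: array (n_products, n_lat, n_lon) holding one index climatology per
    product in a cluster. ddof=0 (population std) is an assumption.
    """
    fields = np.asarray(fields, dtype=float)
    mean = fields.mean(axis=0)
    std = fields.std(axis=0, ddof=ddof)
    with np.errstate(divide="ignore", invalid="ignore"):
        cov = 100.0 * std / mean
    return np.where(mean > 0, cov, np.nan)


def regional_land_mean(field, lats, land_mask, lat_bands):
    """cos(latitude)-weighted mean over land cells inside the given latitude bands."""
    lat2d = np.broadcast_to(np.asarray(lats, dtype=float)[:, None], field.shape)
    in_region = np.zeros(field.shape, dtype=bool)
    for lo, hi in lat_bands:
        in_region |= (lat2d >= lo) & (lat2d < hi)
    valid = in_region & land_mask & np.isfinite(field)
    weights = np.cos(np.deg2rad(lat2d))
    return float((field[valid] * weights[valid]).sum() / weights[valid].sum())


# Latitude bands as described in the methods (1° cell centres avoid edge ambiguity)
REGIONS = {
    "globe": [(-50.0, 50.0)],
    "extratropics": [(-50.0, -20.0), (20.0, 50.0)],
    "tropics": [(-20.0, 20.0)],
}
```

As a simple check, two products with values of 2 and 4 at a cell give a mean of 3, a population standard deviation of 1 and hence a cov of about 33%.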
Finally, we calculate trends over the period 1988-2013 for each product that has at least 25 years of data available in that period (table 4). This period is chosen to maximise the length of record and the number of products available for comparison, and to ensure that all in situ-based products are included. We find that the majority of products show significant increases in the wettest day (Rx1day), wettest consecutive five days (Rx5day), daily precipitation intensity (SDII), total precipitation (PRCPTOT), days above 10 mm (R10mm) and days above 20 mm (R20mm), despite the different data sources. It should be noted, though, that the magnitudes of the trends can be quite different. For example, for indices where all but one of the trends are significantly increasing (R95p, R20mm), the highest trends are a factor of ∼4-5 larger than the lowest. The coherence in trend signatures tends to break down for CDD and CWD, with mixed sign and significance of trends across products. This is the case even among the in situ-based products: for CDD, for example, the two REGEN datasets and GPCC_FDD_v2018 have non-significant decreasing trends (−0.41, −0.41 and −0.65 days/decade, respectively) while GPCC_FDD_v1.0 has a significant increasing trend (1.68 days/decade).
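The trend method named in the table 4 caption (a Sen's slope estimator with Mann-Kendall significance at the 5% level) can be sketched in plain Python/NumPy as follows. This is an illustrative implementation, not the authors' code: the function names are hypothetical, the Mann-Kendall test uses the standard normal approximation with a tie correction and no adjustment for autocorrelation, and the 25-year minimum over 1988-2013 is taken from the text. Slopes are multiplied by 10 to give units per decade.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def sen_slope(t, x):
    """Theil-Sen estimator: median of all pairwise slopes."""
    slopes = [(x[j] - x[i]) / (t[j] - t[i]) for i, j in combinations(range(len(t)), 2)]
    return float(np.median(slopes))


def mann_kendall_p(x):
    """Two-sided Mann-Kendall p-value (normal approximation with tie correction)."""
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i, j in combinations(range(n), 2))
    tie_sizes = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(c * (c - 1) * (2 * c + 5) for c in tie_sizes)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


def decadal_trend(years, values, start=1988, end=2013, min_years=25, alpha=0.05):
    """(trend per decade, significant?) or None if fewer than min_years valid years."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = (years >= start) & (years <= end) & np.isfinite(values)
    if keep.sum() < min_years:
        return None
    t, x = years[keep], values[keep]
    return 10.0 * sen_slope(t, x), mann_kendall_p(list(x)) < alpha
```

Because the Sen's slope is a median of pairwise slopes, a single anomalous year (such as a step change from an inhomogeneity) moves the estimate far less than it would an ordinary least-squares fit, although it can still affect the sign test.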

Discussion and conclusions
We have compared the largest available consistent database of daily gridded land-based precipitation products for their representation of a range of annual precipitation indices and extremes. We conclude that, on global average, products can appear reasonably similar in terms of their spatial patterns, but the range of values of precipitation extremes over space and time can differ markedly between them. Splitting the 22 products from the FROGS database that met our selection criteria into product clusters, we found that in situ-based products are the most similar to each other and reanalyses the least similar. The largest differences between products occur in the tropics, the driest regions and areas with high topographic contrasts (where in situ data are limited), and differences are particularly marked in South-East Asia and Africa. Some products are particularly 'wet' or 'dry', especially in the tropics, and such differences are particularly marked in reanalysis products. CHIRP_v2 and CHIRPS_v2.0 are among the driest products. Reanalyses fall into two camps, 'dry' or 'wet', and interestingly members of the same product family (e.g. MERRA) fall into different camps. Similarly, the GPCC family of in situ-based products falls into two camps, with GPCC_FDD_2018 much wetter than its predecessor GPCC_FDD_v1.0. The inclusion of much more data and improved quality control could be part of the reason, but this is unlikely to explain the global-scale 'wettening' between the products. The other main difference between GPCC_FDD_v1.0 and GPCC_FDD_2018 is the change in interpolation algorithm from kriging to spheremap. Investigating only monthly precipitation averages and PRCPTOT, for example, does not always reveal the 'quirks' that we have discovered in analysing the more extreme ends of the precipitation distribution, but all of the differences between the products highlighted here require much further investigation.

Table 4. […] (… 1 for units) for each product that has data covering the period 1988-2013. Products that have too much missing data over this period are not included (e.g. PERSIANN). Trends were calculated using a Sen's slope estimator, and significance was tested at the 5% level using a Mann-Kendall test. Dark (blue/orange) cells indicate significant trends (increases/decreases) in each index except for CDD, where the colours are swapped.
There are obviously issues that we have not addressed in this paper, such as the timing of extremes or how the 'drizzle effect' (Stephens et al 2010) might affect some of the indices. In addition, we have not discussed in detail the problems associated with remote sensing of precipitation extremes (such as instrument sensitivity to capturing all types of precipitation, or retrieval method uncertainties) or the fact that satellites measure instantaneous rain rates while the in situ products generally measure 24 h accumulations. Another issue is the mismatch between station-based products, which convert point measurements to grids, and areal average precipitation (such as that produced by reanalyses). Indeed, we broadly assume that the indices calculated from each product are meaningful and directly comparable between the product types. We have also not included dataset uncertainties in this analysis (that is, the uncertainties often provided by dataset developers with their products), mainly because not all datasets come with such information and/or the information provided (e.g. interpolation errors) might not be comparable across products. It should also be remembered that FROGS is a 'living database', so not all freely available products are currently included. One extension of this work would be to refine the clustering of the products (using new products added to FROGS as they become available, such as IMERG from NASA, CMORPH-2 from NOAA or precipitation from ERA5), in particular to explore within the satellite ensembles the sensitivity to the use of constellation data versus a single platform. This could further reduce the cluster spread, as shown for the tropics over a short period of time (Roca 2019, this Focus Collection).
However, based on the information that we have gathered we can make some recommendations on the use of global products for understanding observed precipitation extremes.
• First, know the product you are using and its inherent issues (e.g. GPCP has a parameter setting that, if used, limits daily values to a maximum of 100 mm, which is known to be unreasonably low in some regions; see Bador et al 2020 for details of the impacts on precipitation extremes).
• Second, note that some indices are more robust than others in terms of their similarity amongst products. For example, SDII shows the most interproduct consistency while R20mm shows the least, and, broadly speaking, precipitation intensity measures are more robust than frequency or duration measures.
• Third, although the magnitude of trends can vary substantially, all product types (in situ-based, uncorrected satellite, corrected satellite and reanalyses) show broad consistency in the sign of the trends for most extremes indices, especially those that are intensity-based. This should give us some confidence in the robustness of at least the sign, if not the magnitude, of the thermodynamic component of global trend estimates of precipitation extremes.
• Lastly, and not surprisingly, we have more confidence in regions with an abundance of in situ observations, and our largest uncertainties are over regions that are poorly sampled. In these poorly sampled regions, it is possible that the uncorrected satellite products offer a generally more realistic view of precipitation extremes because of problems with bias correction algorithms, although this is yet to be thoroughly tested.
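The practical effect of a daily cap, such as the optional 100 mm GPCP setting mentioned in the first recommendation, can be illustrated with a short, hypothetical Python sketch: the cap saturates Rx1day and removes precipitation from capped days, while counts for thresholds below the cap (here R20mm) are unaffected. The function names and example values are illustrative only and do not reproduce GPCP processing.

```python
import numpy as np


def cap_daily(pr, cap_mm=100.0):
    """Truncate daily totals at cap_mm, mimicking a product-level daily maximum."""
    return np.minimum(np.asarray(pr, dtype=float), cap_mm)


def compare_cap_effect(pr, cap_mm=100.0):
    """(uncapped, capped) values of a few table 2 indices for one daily series."""
    raw = np.asarray(pr, dtype=float)
    capped = cap_daily(raw, cap_mm)
    return {
        "Rx1day": (raw.max(), capped.max()),
        "PRCPTOT": (raw[raw >= 1.0].sum(), capped[capped >= 1.0].sum()),
        "R20mm": (int((raw >= 20.0).sum()), int((capped >= 20.0).sum())),
    }
```

For a toy series of 0, 5, 30, 150 and 80 mm, capping lowers Rx1day from 150 to 100 mm and PRCPTOT from 265 to 215 mm, while R20mm stays at 3 days, which is why intensity-based extremes are the indices most exposed to such a setting.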
Overall, our results indicate that satellite-derived precipitation datasets, if properly assessed, could provide useful information on long-term trends and could fill gaps in regions with limited gauge density, provided that long-term homogeneous satellite-based data are available. We recommend that IPCC reports make more use of these data in current and future assessments of extreme precipitation, while acknowledging the shortcomings of all observationally based datasets, especially reanalyses.