Attributing observed increase in extreme precipitation in China to human influence

This paper examines new evidence from observational and detection and attribution studies of changes in extreme precipitation in China since the early 1960s. We have also designed a series of sensitivity tests to explore the robustness of detection and attribution results to the differences in sample size, in extreme precipitation index, and in data processing procedure. Our analyses used the most recent update of observational records as well as simulations conducted with the climate models participated in the Coupled Model Intercomparison Project Phase 6. Based on the existing studies and our additional analyses, we found that human influence is detectable in extreme precipitation in China regardless of the period, extreme precipitation index, or data treatment considered, in both China as a whole and in northern and southern China separately. We also found, as is often encountered in detection and attribution studies, it is difficult to separate the contribution from anthropogenic forcing from that of natural external forcing, and it is also challenging to decompose the anthropogenic component into a greenhouse gas forcing component and a component that reflects other anthropogenic forcing agents (dominantly, aerosols).


Introduction
As the global surface temperature increases, 'the frequency and intensity of heavy precipitation events have likely increased at the global scale over a majority of land regions' and 'heavy precipitation has likely increased on the continental scale over three continents: North America, Europe, and Asia' [1]. There is also evidence of an increase in extreme precipitation over China [2][3][4][5][6]. Yet, longterm changes in the indices of extreme precipitation in the observations exhibit a large spatial heterogeneity between different parts of China and between northern and southern China. For example, in recent decades, southern China has experienced an increase in severe flooding, with direct economic losses caused by extreme precipitation increasing, while parts of northern China have experienced more droughts [7,8]. It is essential to understand if anthropogenic drivers such as the emission of greenhouse gases (GHG) have contributed to these observed changes in China.
Studies about observed changes in extreme precipitation in China and their possible causes have focused on two main aspects. One aspect is about if observed changes are already detectable with respect to the observed warming. Li et al [9] fitted generalized extreme value distribution to annual maximum oneday precipitation amount (Rx1day) over more than 2000 observing stations in China with time or the global mean surface temperature as a covariate. Based on a field significance test, they found that while there was evidence of an increase in different parts of China, that increase might still be too weak to be robustly detectable for China as whole. They also found that by 2030, an increase in extreme precipitation would have become robustly detectable, using different methods, in the simulations conducted with the climate models participated in the Coupled Model Intercomparison Project Phase 5 (CMIP5). Wang et al [6] examined the power of detecting a significant change by field significance test. They found that the methods used in trend detection influenced the power of detecting a field significant change. When the non-parametric Mann-Kendall test was applied to Rx1day data from close to 600 basic stations of China over the period 1961-2017, a significant increase was detected. Additionally, a significant increase was also detected when an extreme value distribution was fitted to the data with a covariate in the location parameter. But a significant change was not detected if the covariate was also included in the scale parameter, an indication that the added complexity in the statistical model and the need to estimate an additional parameter with limited observational data may have weakened the power of detecting a significant change. Taking together the results from Li et al [9] and Wang et al [6] and considering the characteristics of the methodologies that were used, the evidence does point to a detectable, and possibly robust, increase in heavy precipitation at least in Rx1day over China since the 1960s.
Another aspect relates to identifying possible causes of the observed changes. Li et al [3] compared the observed changes during 1961-2005 in Rx1day and in annual maximum amount of 5 d precipitation (Rx5day) with those simulated by CIMP5 models using an optimal fingerprint approach. They found that the response to anthropogenic forcing (ANT, including both GHG and anthropogenic aerosol (AER) forcings) was robustly detectable at the 10% level. But the response to the combined effect of anthropogenic and natural external forcing (ALL), which should be closer to reality than that of ANT, was surprisingly not detected. A follow up study by the same group [10] using the same observational dataset (but extended to 2014) and simulations by the climate models participated in the Coupled Model Intercomparison Project Phase 6 (CMIP6). They confirmed their earlier findings that ANT was robustly detected but again, ALL was not detected. The inconsistency between detection results for ALL and ANT poses a question if the detection of ANT or the lack of detection of ALL is robust. The analysis also included several other indices representing various aspects of extreme precipitation including annual amount of precipitation contributed by the highest percentiles (95th and 99th) of daily precipitation (R95p and R99p). In both analyses, the observational datasets were gridded daily precipitation on 0.25 • × 0.25 • grids of Wu and Gao [11] and gridded data were converted to probability following [3,10] before being averaged to regional mean values. In a detection and attribution analysis for Asia with observational data coverage mostly from China, Dong et al [12] showed a clear detection of ALL signal simulated by CMIP5 models over 1958-2012 when station observations were aggregated to regional average. Additionally, detection of ALL signal was also robust in various aspects of extreme precipitation for Asian continent [13].
China's surface temperature has clearly increased since the 1960s and the warming can be attributed to human influence [14,15]. The warming has also resulted in an increase in atmospheric moisture [16] as predicted by the Clausius-Clapyren equation that dictates the relation between air temperature and saturation moisture pressure. The observed increase in atmospheric moisture is expected to translate to an increase in extreme precipitation, as is the case in model projections [17,18]. While anthropogenic aerosols can partially mask the greenhouse warming and result in reduction in Asian summer monsoon precipitation [19][20][21][22][23], if aerosol induced reduction in total precipitation will translate to a detectable decrease in extreme precipitation on the backdrop of warming induced increase in extreme precipitation is unclear. In fact, model simulations show an increase in extreme precipitation even in regions where total precipitation is projected to decrease [17].
Sun et al [14] concluded that 'the evidence of human influence on extreme precipitation is emerging' . In light of recent publications [6,24], the availability of longer observations and the simulations from the Detection and Attribution Model Intercomparison Project (DAMIP) [25], we revisit the question about extreme precipitation attribution with an aim to produce a synthesized view that would update relevant assessment in Sun et al [14]. Here, we will design some specific analyses addressing potential weakness in the earlier studies. We will examine sensitivity of detection results to the use of different observational datasets, different data processing procedures, different data sample sizes, and outputs from different climate models. We will draw conclusions based on earlier findings along with our own analyses. Section 2 describes the datasets we use. We present the results of our analysis in section 3 and our final assessment is provided in section 4.

Extreme precipitation indices
We consider six daily precipitation derived indices to represent different aspects of extreme precipitation. They are Rx1day and Rx5day, annual total precipitation accumulations from wet days (the days with precipitation amount larger than 1 mm) with precipitation exceeding the 95th and 99th percentiles of daily amount during the 1961-1990 base period (R95p and R99p), and two indices reflecting contributions of extreme precipitation to annual total precipitation on wet days (PRCPTOT), that is, R95pTOT and R99pTOT. For convenience, these indices are termed as the intensity indices (RX1day and Rx5day), percentile-based indices (R95p and R99p) and fractional indices (R95pTOT and R99pTOT) hereafter.

Datasets
Multiple observational datasets have been used in the previous analysis. These include (a) the daily precipitation dataset for 839 national basic station of Wang et al and Li et al used in [6,9], (b) homogenized daily precipitation for 2419 stations of Dong et al used in [12], (c) gridded daily precipitation at 0.25 • × 0.25 • resolution, CN05 of Li et al and Xu et al used in [3,10]. The national basic stations are a subset of the 2419 stations, they typically have longer observational history compared with their neighboring stations. The full 2419 stations include almost all long-term observing stations in China. The CN05 is gridded daily dataset based on the daily precipitation values from the 2419 stations. The interpolation uses the 'anomaly approach' by first interpolating station daily precipitation anomalies onto the regular grid using angular distance weighting and then adding station daily precipitation climatology interpolated with thin-plate smoothing splines [11]. Given the vast land mass of China, gridding daily precipitation to 0.25 • × 0.25 • resolution even with data from 2419 stations is very challenging, the results can be problematic in regions such as in northwestern China where observing network is very sparse and precipitation variability is large when compared with climatology. Our additional analysis is mainly based on the quality-controlled and homogenized daily precipitation data at the 2419 stations for the period 1961-2020. The station data were provided by China National Meteorological Information Center [26]. We also use the CN05 dataset [11], updated to 2020, for some analyses for comparison.
Our detection and attribution analyses are conducted on regional mean values for China, northern and southern China (detailed later). There can be different means to produce regional averages. Li et al [3] followed Min et al's [27] method by converting grid box daily values into probability and then averaging probabilities over the region. This probability transform converts all index values to a dimensionless scale ranging from zero to one, with the same scale used everywhere in China. This effectively means that each grid cell influences a regional or national average in proportion to its area. Yet as areas with large variability (relative to climatology) would have larger yearto-year variability in probabilities, they may play a larger role in the year-to-year variability of the national or regional average and thus influences the detection and attribution analysis.
Dong et al [13] computed regional average by first computing station anomalies relative to their 1961-1990 climatology and then averaging station anomalies. Regional mean is obtained as the averages from the grid boxes with valid values. The average of anomalies will be more strongly influenced by areas where there is higher anomaly variabilitywhich for precipitation related quantities, will tend to be areas with higher precipitation amounts. Quantities like Rx1day will most strongly reflect areas where Rx1day tends to be large. Thus, a national average may, in these cases, be more dominated by variability in only part of the country. Due to very large land mass and large differences in climates in different parts of China, the use of different averaging matric can mean giving different weights to different regions. As there is no perfect averaging scheme, we use both methods of computing regional average to examine the robustness of the relevant analyses.
To convert station values to regional mean, we first gridded station data into grid box values and then averaging the grid box values into regional mean. Our working grid box resolution is 1.875 • × 1.25 • , the one used in HadEX3. A grid box is marked as missing if there is not station observation within the grid box. There can be sensitivity to the use of different spatial resolutions in the regional mean values. We tested this sensitivity to various spatial resolutions including We found that qualitative conclusions about detection are not affected using these varying sizes of the grid boxes (not shown).
The model data comes from the DAMIP [25] experiment of CMIP6. We use the simulations from the experiments under ALL, GHG, NAT and AER forcings. We mainly use output from five models that have at least three members under each forcing. While there are additional models providing simulations under some of the forcings such as ALL and making use of these simulations has the potential to improve estimate of relevant signals, we opted the smaller set as in this case we can be sure the differences in the model responses to different forcings are not compounded by the differences also in the combination of models under different forcing. In total, there are 25 runs for each experiment under individual forcing agent during 1961-2020 (table 1). Because the ALL experiment ends at 2014, the model data under the Shared Socioeconomic Pathways (SSP) 2-4.5 scenario from 2014 to 2020 are used to extend the historical ALL simulation. The SSP4.5 represents a central emission pathway [28]. The results from preindustrial (CTL) experiments of 16 models are used to estimate the noise, i.e. internal variability for the detection and attribution analysis (table 1). All the precipitation indices are calculated at the model's original grid boxes, they are then converted to anomalies (relative to  or probability and bilinearly interpolated onto the 1.875 • × 1.25 • grid boxes. These values are masked by the availability of relevant observational data and then used to compute regional mean. Table 1. List of multi-model simulations used in this study. Numbers indicate ensemble sizes for historical simulations under all (ALL), greenhouse gas (GHG), aerosol (AER) and natural (NAT) forcing or the number of 60 years chunks for the CTL simulations. Additional model runs or total number of model runs used in the estimating precipitation sensitivity, the relationship between model simulated changes in extreme precipitation and in global mean surface air temperature (GMST), are in bold italic font. Transient climate response (TCR) for individual models are taken from IPCC AR6 report [29]. IPSL-CM6A-LR - We consider China as whole and additionally northern and southern China separately. China's climate is influenced by large topographical divides. The west-east divide Qingling Mountains-Huaihe River line has shaped not only the climates, but also other geographical conditions and to some extent the way of people's lifestyles. Here we divide China into northern and southern China along 35 • N, roughly corresponding to the Qinling Mountains-Huaihe River line, which is also the dividing line with annual precipitation of 800 mm.

Detection and attribution methods
Regression-based methods, known as fingerprinting methods, have been widely used in detection and attribution. We utilize the optimal fingerprinting method (TLS [30]), where observed changes are regressed onto a model-generated response pattern to a set of forcings (or a single forcing). The method: where Y represents observations, X represents signal patterns, v represents the internal (unforced) variability in the signal patterns of simulations, β represents regression coefficients, ε represents the residual vector or the internal variability in the observations which are assumed to follow Gaussian distributions. Our detection and attribution analyses are conducted on the regional mean values of extreme precipitation, averages of Rx1day or Rx5day values or their probability-transferred values across a large number of sites or grid boxes, these regional mean values shall in general follow Gaussian distributions according to central limiting theorem. Signals or model-simulated responses are computed as multi-model ensemble mean by first computing the arithmetical average of ensemble runs from each model and then averaging them for the region. The variance-covariance matrix of ε is obtained from CTL chunks. The 60 years CTL chunks are divided into two sets. One set is used for optimization for the estimation of the scaling factors β and the other set is used for testing, to obtain the 5%-95% range of β and to conduct the residual consistency test [30,31]. When the 90% uncertainty range of a scaling factor is above zero, the corresponding signal is said to have been detected. If the estimated confidence interval also includes 1, the observed change is consistent with the modeled response or attributable to the relevant forcing. If the residual consistency test indicates modeled internal variability is too small, the uncertainty range of the corresponding scaling factors could be underestimated, and a detection claim could be spurious.
The regional mean time series are averaged to produce nonoverlapping 5 years means to reduce temporal dimension, resulting in 12 temporal data points during 1961-2020. Detection and attribution analyses for northern and southern China are conducted on their respective 5 years regional mean series. But the analyses for China as whole are conducted by combining the series for the northern and southern China as two space dimensions, this way, possible difference in precipitation response over the space can be considered. We conduct one-, two-and three-signal analyses. For the one-signal analyses, the observed precipitation changes are regressed onto ALL signal. For the two-signal analyses, the observed precipitation changes are regressed onto ALL and NAT responses jointly to produce scaling factors corresponding to anthropogenic forcing (ANT, represented by ALL-NAT) and NAT, respectively. Note that as ALL simulations also include anthropogenic forcing such as land use and land cover changes, estimate for ANT as the difference between ALL and NAT may better reflect all anthropogenic forcing than the sum of GHG and AER responses. For the three-signal analysis, GHG, AER and NAT are jointly considered, and the observed precipitation changes are regressed onto these three factors.

Observed and simulated changes
The observed trends of six extreme precipitation indices during 1961-2020 over China are shown in figure 1. Overall, extreme precipitation has increased as many more stations showing an increase trend than a decrease trend even though the spatial pattern can be complex. This is consistent with the findings of Wang et al [6] who found field significance in the increase in extreme precipitation. The spatial patterns of Rx1day and Rx5day trends are very similar to those of percentile-based indices (R99p and R95p). The largest increase in the amount of extreme precipitation is observed in southeastern China, where larger precipitation climatology generally occurs. A decrease is mainly seen along a belt from northeastern China to southwestern China. There is also a clear increase of extreme precipitation in most parts of western China, especially in Rx1day and Rx5day. The spatial pattern of R99pTOT and R95pTOT trends (the bottom two panels) show increases and decreases similar to those of other indices, but the magnitude of trends of these two fractional indices are more even across the country, reflecting a generally homogeneous increase of the contribution to total precipitation by heavy precipitation events. The smaller difference in the magnitude of fractional indices trends over the space than the percentile-based indices is due to the normalization by the total wet day precipitation. Trends computed for 1961-2014 are similar to those computed for 1961-2020, though their magnitudes are smaller in northern China.
The trends in the multi-model ensemble mean under the ALL, GHG, AER and NAT forcings are shown in figure 2. The ALL simulations produce an increase trend, similar to that in the observations. Compared with those of observations, there is little noise over the space, reflecting the fact that the multi-model ensembles have very small natural variability due to averaging. The larger amount of increase (in absolute terms) in the observations of Rx1day and Rx5day in eastern China is also reproduced in the ALL simulations though the magnitude in the modeled response is smaller that of observations. The percentile-based indices (R99p and R95p) and fractional indices (R99pTOT and R95pTOT) also show consistent increase across China. Under the GHG forcing, the spatial patterns of trend are similar to those under ALL forcing but the magnitudes of trend are larger. This indicates precipitation increase due to GHG forcing is offset by other forcings. However, compared with the observation, the simulated increase of extreme precipitation is still smaller in the GHG experiments. For the AER experiments, most areas show a decrease of extreme precipitation, especially in southern China where aerosol emissions have been larger in general. This is consistent with the thermodynamic effect of aerosol cooling. Additionally, cooling induced by aerosol emission can cause the reduced land-sea thermal contrast between the Asian continent and adjacent oceans and thereby weakening eastern Asian monsoon [32,33]. The effect of aerosol on cloud microphysics that affects precipitation may also come into play. The magnitude of precipitation response to GHG forcing is larger than that to AER and decrease due to AER does not fully offset that the increase caused by GHG. The magnitude of the response to NAT forcing is weak overall, suggesting little contribution to the observed changes by NAT forcing.
The time series of 5 years mean anomalies for the observed and modeled extreme precipitation indices are displayed in figure 3. For observations, extreme precipitation shows an upward increase in China and the two sub-regions, along with clear decadal and inter-decadal variations. In northern China, observed extreme precipitation dropped in the early 2000s. But precipitation has recovered since, leading to a positive trend of extreme precipitation in the region. Judging from the time series, trend for shorter time periods could be negative, consistent to previous studies [8,34]. In contrast, extreme precipitation in southern China has consistently increased in the past decades. For the modeled responses, all indices show gradual increase under ALL and GHG forcings while decrease under AER forcing and almost no change under NAT forcing in China and the two sub-regions. The simulated ALL and GHG responses closely follow the observed changes. For the simulated AER response, the decrease in southern China is larger than those in northern China, reflecting larger precipitation climatology, though larger aerosol loading in the region may also play a role ( figure 2). For all the indices, the GHG results show similar changes between northern and southern China while the spatial difference of AER influence is very clear. For R99pTOT and R95pTOT, most characteristics of changes are similar to those  of the other indices, but the changes are obviously smoother. Figure 4 shows the scaling factors and their 90% confidence intervals for Rx1day and Rx5day from our sensitivity tests for whole China. The sensitivity tests include the detection and attribution analysis for the following quantities and time periods: (a) average of anomalies for shorter period 1961-2014, (b) average of anomalies for 1961-2020, (c) average of probability for shorter period 1961-2014, (d) average of probability for 1961-2020. The analyses are also conducted for the two observational datasets: homogenized daily precipitation for 2419 stations and the gridded daily precipitation at 0.25 • × 0.25 • resolution, CN05. In all cases, the 5th percent of scaling factor is above zero. Additionally, the use of a longer time period corresponds to a smaller confidence interval of the scaling factor, indicating that the response to ALL becomes more easily detectable in the longer time period because the increase in sample size and in the strength of signal due to continued warming. We also note that the 90% confidence interval includes one in most cases but above one in one case. These lead us to conclude that extreme precipitation response to ALL can be robustly detected. But there is not overwhelming evidence to indicate if the models have properly simulated the magnitudes of changes in the observation. Having resolved this question, for simplicity we will report results from average of anomalies for the longer period 1961-2020 in the rest of the paper. Figure 5 displays the best estimates of the scaling factors and their 90% confidence intervals of the ALL signal for other indices and for China, and the two sub-regions for the period 1961-2020. As shown in the figure, ALL is robustly detected in China and in the two sub-regions. The residual consistency test can be passed for most indices, especially in China and southern China, meaning that the detection claims are reliable. The best estimates of scaling factors and their 5%-95% confidence intervals in three-signal analyses for GHG, AER, and NAT are shown in figure 7. A general impression is that scaling factors are associated with large confidence intervals, though the scaling factors for the percentile-based and fractional indices for GHG do seem to be well constrained and that the scaling factor for the AER forcing is hardly well constrained. Additionally, detection conclusions fluctuate when the combination of models used for the signal estimate or when the time period of analyses changes. These together suggest that the three-signal detection is still uncertain at regional scale and the separation of GHG and AER signal is difficult against the natural internal variability. The two-signal and three-signal results show that it can be difficult to separate the contribution from anthropogenic forcing from that of natural external forcing, and it is even more challenging to decompose the anthropogenic component into a greenhouse gas forcing component and aerosol forcing component. Figure 8 shows simulated changes in different precipitation indices under GHG, AER and ALL forcing during 1961-2020 for individual models and for the observations. For the same precipitation     index, most models show an increase in the response to GHG forcing but a decrease to AER forcing, and the magnitude of response to GHG forcing is larger. The net anthropogenic response is dominated by GHG forcing. Figure 9 shows precipitation sensitivity to warming, indicating that that the magnitude of extreme precipitation response to GHGinduced warming in individual models is closely linked to the modeled GMST responses to the forcing. The models with a high climate sensitivity such as HadGE3-GC31-LL (Transient Climate Response, TCR = 2.55 • C) and CanESM5 (TCR = 2.74 • C), produce larger precipitation changes, while the models with low climate sensitivity such as NorESM2-LM (TCR = 1.48 • C) yield smaller precipitation changes (table 1). The scaling rates for Rx1day and Rx5day roughly follow the Clausius-Clapeyron relationship (∼7% K −1 ), illustrating the importance of saturation moisture increase to the changes in extreme precipitation. The scaling rates for percentile-based indices are larger than 7% K −1 , at about 20% K −1 for R99p and 18% K −1 for R95p. This is because percentile-based indices are accumulations of daily precipitation exceeding a defined threshold and as such are affected by both the changes in precipitation intensity as well as by the number of exceedances [35]. As precipitation intensity increases, the number of exceedances also increases, resulting in temperature scaling rate larger than that of Rx1day. The scaling rates for fractional indices are like the percentilebased indices, but they are lower than percentilebased indices. Within the three types of indices, the extremes that are rarer (e.g. R99p as opposed to R95p) tend to have larger scaling rate. This is consistent with model simulated future projections [36]. Generally, the scaling rates of all the indices in northern China are slightly smaller than in southern China.

Understanding inter-model difference
Precipitation responses to AER by individual models are more complex (figure 1), there can be large difference in the spatial pattern of the response. The differences among the modeled responses are the results of not only the differences in the aerosol loadings used by modeling centers but also differences in aerosol processes in the models. The inclusion of indirect aerosol effect in the models can lead to larger decrease in both temperature and precipitation [37]. Relatively small ensemble size in the backdrop of weak aerosol signal may have made it difficult to actually characterize aerosol response. Additionally, aerosol forcing used to drive CMIP6 models may not well reflect the reality over the region [38]. For example, there is evidence that CMIP6 models may not able to capture the observed dipole pattern in the trend of aerosol optical depth over Asia during 2006-2014 [39]. These, along with the generally weaker aerosol forcing, must have contributed to the lack of detection of AER responses in precipitation.

Conclusions and discussions
We started the paper with reviewing recent observational studies about changes in extreme precipitation and concluded that extreme precipitation, at least that represented by Rx1day, has likely increased in China since the 1960s. We then conducted detection and attribution analysis using the most recent observational datasets and CMIP6 multimodel ensemble simulations. When conducting a detection and attribution analysis, multiple choices must be made regarding the region and period under study, indices of climate, methods for aggregating station observations into a form suitable for comparison with climate model simulations etc. As different choices and their combinations may result in different detection and attribution results, we designed a series of tests including the use of longer time periods, different metrics for regional precipitation, and a different data processing procedures to explore sensitivities of the analyses to these choices.
Overall, we detected precipitation response to ALL and the result is robust to different choices used in the sensitivity analyses. We found it is hard to determine if models have correctly simulated the magnitude of changes; while some analyses show underestimation by models, others do not. Increase in extreme precipitation is predominately due to the GHG emission. Aerosol should have played role in offsetting the effect of GHG, but it is contribution to the observed change is difficult to determine due to deficiency in the AER forcing implementation for the regions as well as difference in model's parameterization and the fact that AER signal is weaker compared with that of GHG. We also found that changes in extreme precipitation scale with warming, consistent with findings of earlier model projection studies.
Compared with evidence examined in Sun et al [14], we are now seeing more robust and consistent evidence of increase in extreme precipitation in the observations. The addition of more recent records in the analysis, which increases sample size that improves power of detection and enhances signal strength due to continued warming, has led to the more robust detection and attribution results. Moreover, the observations, and detection and attribution results are also consistent with expectation from warming and with model projected future changes. We thus conclude that extreme precipitation in China has likely increased since the 1960s and the increase is likely due to human influence on the climate system.

Data availability statement
The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.