Uncertainty analysis of future summer monsoon duration and area over East Asia using a multi-GCM/multi-RCM ensemble

This study examines the spatiotemporal characteristics of the summer monsoon rainy season over East Asia using six regional climate models (RCMs) participating in the Coordinated Regional Climate Downscaling Experiment (CORDEX) East Asia Phase II project. The framework combining multiple global climate models (GCMs) with multiple RCMs produces a larger spread in summer monsoon characteristics than the driving GCMs alone, enabling a better quantification of uncertainty sources. On average, the RCM simulations reproduce the observed summer monsoon duration and area better than the corresponding boundary GCMs, indicating the added value of downscaling. Both the area and duration of the East Asian summer monsoon are projected to increase by the late 21st century, more strongly under high emission scenarios than under low emission scenarios, particularly in China. Differences between scenarios, which indicate the benefits of warming mitigation, become significant only in the late 21st century because of large intersimulation uncertainties. Analysis of variance (ANOVA) results show that uncertainty in future monsoon area and duration is larger between boundary GCMs than between RCMs over East Asia and its coastal subregions. A strong intersimulation relationship between RCM and GCM projections indicates that boundary GCMs substantially diversify downscaled RCM projections through their different climate sensitivities. Furthermore, the distinct subregional responses in future monsoon area and duration emphasize the importance of fine-resolution projections with appropriate uncertainty measures for preparing region-specific adaptation plans.


Introduction
Monsoons are a vital component of the global climate system, and understanding and predicting changes in summer monsoon characteristics is critical because half of the annual precipitation occurs within the rainy season (Wang and LinHo 2002, Zhisheng et al 2015, Wang et al 2020). Researchers have conducted numerical model experiments to understand the impacts of global warming on Asian summer monsoon characteristics. These studies suggest that the summer rainy season will lengthen (Kitoh et al 2013, Sun et al 2022) and the summer monsoon area will expand (Hsu et al 2012, Kitoh et al 2013, Wang et al 2020) over Asia under global warming scenarios. Changes in rainy season characteristics are closely related to precipitation responses to warming, and thus higher emission scenarios usually lead to stronger increases in monsoon precipitation (Kitoh et al 2013). Despite differences among global climate models (GCMs), multi-GCM simulations from the Coupled Model Intercomparison Project Phases 5 and 6 (CMIP5 and CMIP6) agree on the warming-induced intensification of Asian summer monsoon precipitation (e.g. Piao et al 2021).
Multi-GCM simulations enable the quantitative assessment of future projections on subcontinental scales (Giorgi and Raffaele 2022), but the coarse resolution of GCMs limits subregional or country-level projections. In particular, smaller-scale responses are more sensitive to complex topography and coastlines (e.g. Suzuki-Parker et al 2018, Lee et al 2020, Sørland et al 2021). In this respect, regional climate models (RCMs) have been developed and used to downscale GCM outputs for target regions. For East Asia, multiple RCM simulations were performed at 50 km resolution over the Coordinated Regional Climate Downscaling Experiment (CORDEX; Giorgi et al 2009) Phase I East Asia domain (e.g. Park et al 2016) and at 12.5 km resolution over a smaller Northeast Asia domain (e.g. Ahn et al 2016, Lee et al 2017). Overall, these simulations projected an increase in East Asian summer monsoon rainfall and extremes, depending on the scenario and the boundary forcing GCM. RCM simulations under the CORDEX East Asia Phase II framework, which have a finer resolution of 25 km (Gutowski et al 2016) than Phase I, have recently been completed, and overall they perform better than the Phase I simulations (e.g. Sørland et al 2021). Several studies have analyzed these simulations to assess RCM performance for tropical cyclones and East Asian mean climate, as well as future projections of heat stress (Juzbašic et al 2022), Köppen-Trewartha climate types (Jo et al 2019), potential solar energy production (Park et al 2022), and mean and extreme ocean wave heights (Li et al 2022).
To understand the likely range of future projections, it is essential to decompose and quantify the sources of uncertainty (e.g. Hawkins and Sutton 2011). Many RCM studies have investigated future projections over East Asia, but the accompanying uncertainties remain largely unexplored. The limited number of ensemble members has been one of the major obstacles to attributing future projection uncertainty to its individual sources. Suzuki-Parker et al (2018) investigated future projection uncertainties in mean and extreme precipitation around Japan and found comparable contributions from GCMs and RCMs. Kim et al (2020) used an analysis of variance (ANOVA) approach to quantify model uncertainties over Northeast Asia, but only for present-day precipitation. Christensen and Kjellström (2020, 2022) decomposed the sources of precipitation projection uncertainty over Europe and found larger contributions from GCMs than from RCMs to the intersimulation uncertainty. One exception was mountainous areas, where the RCM contribution was dominant owing to a better representation of orography and the associated snow and ice.
Given the lack of uncertainty analyses for East Asian monsoon characteristics, this study aims to identify sources of future projection uncertainty using a multi-GCM/multi-RCM ensemble framework, focusing on summer monsoon duration and area. We start by evaluating how well the multiple RCMs capture observed summer monsoon characteristics over East Asia and its subregions. Next, intersimulation spreads of future RCM projections are investigated to assess the contribution of GCM differences in comparison with that of RCM differences. The paper is structured as follows. Section 2 describes the model simulations and analysis methods, including the definitions of monsoon duration and area and the ANOVA approach used to quantify the sources of uncertainty. Section 3 presents the RCM evaluations, the future projections of East Asian summer monsoon characteristics under high- and low-emission scenarios, and the GCM and RCM contributions to future projection uncertainties. A summary and discussion are given in section 4.

Data
We use 15 RCM simulations obtained from a four-GCM/six-RCM ensemble matrix participating in the CORDEX East Asia Phase II project (table 1). Surface and lateral boundary conditions for the historical and future (2025-2099) periods are obtained from four different GCMs: three from CMIP5 (GFDL-ESM2M, HadGEM2-AO, and MPI-ESM-LR, hereafter GFD, HG2, and MPI, respectively) and one from CMIP6 (UKESM1-0-LL, hereafter UKE). To account for future projection uncertainties arising from greenhouse gas emission pathways, two scenarios based on the representative concentration pathways (RCPs) and the shared socioeconomic pathways (SSPs; O'Neill et al 2016) are considered: low emission (LE; RCP2.6 for CMIP5 and SSP1-2.6 for CMIP6) and high emission (HE; RCP8.5 for CMIP5 and SSP5-8.5 for CMIP6). Different physics and dynamics in the RCMs can lead to broad ranges of downscaled results even under an identical driving GCM. Details of the RCM configurations are provided in table S1. We interpolate all model data onto the 0.25° × 0.25° grid of the Asian Precipitation-Highly-Resolved Observation Data Integration Towards Evaluation (APHRODITE) observations (Yatagai et al 2012), which are used to evaluate RCM performance. Only land grid cells, as defined by the APHRODITE land mask, are considered; a comparison of the RCMs' land masks with the observational mask indicates negligible differences (not shown). We obtain multimodel ensemble means (hereafter MME) from all 15 RCM simulations with no weighting applied to the RCMs or GCMs (table 1).

Summer monsoon duration and area
Our study focuses on the spatial (area) and temporal (duration) characteristics of the summer monsoon over East Asia, which are defined based on daily precipitation following previous studies (Wang and LinHo 2002, IPCC AR5 2013, Kitoh et al 2013). First, we exclude 29 February from all analyses and calculate the daily climatological precipitation for 365 (or 360) days from 1 January to 31 (or 26) December, accounting for the different calendar types in the RCM simulations (table 1); for instance, HG2 and UKE use a 360 d calendar. We then obtain daily precipitation anomalies relative to the January mean climatology, January being the driest month in East Asia. To remove submonthly variability from these daily anomalies, we apply a harmonic analysis and retain only the first 12 harmonics (see examples in figure S1). Finally, the onset and retreat dates of the summer monsoon are defined as the days on which the smoothed precipitation anomaly first exceeds 5 mm d⁻¹ and subsequently drops below 5 mm d⁻¹, respectively (Wang and LinHo 2002). By definition, the summer monsoon duration is the number of days between onset and retreat, and the summer monsoon area is the total area of the grid cells in which a monsoon rainy season occurs. Grid cells with a monsoon duration shorter than 5 d, the minimum threshold used for a monsoon rainy season, are excluded. When a minimum duration threshold of 10 d is used instead (cf Ha et al 2020), the main findings remain unaffected (not shown).
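The onset, retreat, and duration definitions above can be sketched for a single grid cell as follows. This is a minimal NumPy sketch under stated assumptions, not the study's code: the input is one cell's daily climatology in mm d⁻¹ (365 or 360 values starting on 1 January, 29 February already removed), "January" is taken as the first 31 (or 30) days, the harmonic filter keeps the mean plus harmonics 1-12 of a real FFT, and the function names `smooth_harmonics` and `monsoon_season` are hypothetical.

```python
import numpy as np

def smooth_harmonics(series, n_harm=12):
    """Low-pass a periodic daily series, keeping the mean and harmonics 1..n_harm."""
    spec = np.fft.rfft(series)
    spec[n_harm + 1:] = 0.0          # index 0 is the mean; drop submonthly harmonics
    return np.fft.irfft(spec, n=len(series))

def monsoon_season(daily_clim, jan_days=31, threshold=5.0, min_days=5, n_harm=12):
    """Return (onset, retreat, duration) as 0-based day indices, or None if no monsoon.

    jan_days: 31 for a 365-day calendar, 30 for a 360-day calendar (assumption).
    """
    clim = np.asarray(daily_clim, dtype=float)
    anom = clim - clim[:jan_days].mean()           # anomaly relative to January mean
    smooth = smooth_harmonics(anom, n_harm)
    wet = np.flatnonzero(smooth > threshold)
    if wet.size == 0:
        return None                                # threshold never exceeded
    onset = int(wet[0])                            # first exceedance
    below = np.flatnonzero(smooth[onset:] < threshold)
    retreat = onset + int(below[0]) if below.size else len(smooth)
    duration = retreat - onset                     # days between onset and retreat
    if duration < min_days:
        return None                                # shorter than the minimum season
    return onset, retreat, duration
```

Applying the function cell by cell, and storing NaN wherever it returns `None`, gives a duration field from which the monsoon area can be summed.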
When assessing future changes in monsoon duration, we consider only the grid cells that belong to the monsoon area in both the current and future periods. Five subregions are defined to explore the detailed behavior of East Asian summer monsoon changes: southern, eastern, and northern China (SCN, ECN, and NCN), Korea (KOR), and Japan (JPN). The union of these five domains is defined as East Asia (EAS). The observed monsoon characteristics, including onset, retreat, duration, and area, for EAS and its subregions are provided in table S2.
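Monsoon area and the common-grid duration change can be computed from per-cell duration fields roughly as follows. This is a sketch under assumptions not stated in the text: a spherical Earth of radius 6371 km, cell-centre latitudes on the regular 0.25° grid, NaN for non-monsoon and non-land cells, and an area-weighted regional mean for the duration change (the weighting is an assumption). All function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical Earth assumed

def cell_areas_km2(lats, nlon, dlat=0.25, dlon=0.25):
    """Area (km^2) of each cell of a regular lat-lon grid, shape (len(lats), nlon)."""
    lat = np.deg2rad(np.asarray(lats, dtype=float))
    half = np.deg2rad(dlat) / 2.0
    # exact spherical cell area: R^2 * dlon * (sin(upper edge) - sin(lower edge))
    band = EARTH_RADIUS_KM**2 * np.deg2rad(dlon) * (np.sin(lat + half) - np.sin(lat - half))
    return np.repeat(band[:, None], nlon, axis=1)

def monsoon_area_km2(duration, lats, min_days=5):
    """Sum the areas of cells whose monsoon duration is defined and >= min_days."""
    duration = np.asarray(duration, dtype=float)
    areas = cell_areas_km2(lats, duration.shape[1])
    mask = np.isfinite(duration)
    mask[mask] = duration[mask] >= min_days      # avoid comparing NaNs
    return float(areas[mask].sum())

def duration_change_common(dur_hist, dur_fut, lats):
    """Area-weighted mean (future - current) duration over cells that are monsoon cells in both periods."""
    dur_hist = np.asarray(dur_hist, dtype=float)
    dur_fut = np.asarray(dur_fut, dtype=float)
    common = np.isfinite(dur_hist) & np.isfinite(dur_fut)
    w = cell_areas_km2(lats, dur_hist.shape[1])[common]
    return float(np.sum(w * (dur_fut - dur_hist)[common]) / np.sum(w))
```

Restricting the duration change to the common cells keeps newly added monsoon cells (whose current duration is undefined) out of the duration statistic; they are instead reflected in the area change.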

Uncertainty quantification by ANOVA
Following the approach of Suzuki-Parker et al (2018) and Kim et al (2020), we apply an ANOVA to our 15 simulations to identify sources of uncertainty in the historical and future simulations. Since the standard ANOVA is designed for a fully filled matrix, we use a revised ANOVA applicable to a partially filled matrix such as our GCM × RCM samples (table 1). We consider two sources of variability (sums of squares, SS): those due to different RCMs (SS_RCM) and those due to different boundary GCMs (SS_GCM). Scenario uncertainty is not included in the decomposition; instead, results are compared between scenarios to assess the influence of global warming mitigation (see below). Our analysis therefore corresponds to a two-way ANOVA (GCM and RCM). The grand mean of our samples (Y**) and the total sum of squares (SS_T) can be expressed as equations (1) and (2). From left to right, SS_T is decomposed into the GCM effect (SS_GCM), the RCM effect (SS_RCM), and their interaction (SS_INT). Although the number of RCM simulations forced by a given GCM (n_i; e.g. GFD: two, UKE: five) and the number of simulations performed with a given RCM (n_j; e.g. GRIMs: one, CCLM: three) are unequal across models (table 1), the grand mean, SS_T, SS_GCM, and SS_RCM can still be computed arithmetically from an incomplete matrix. The incomplete matrix causes a computational problem only for the interaction term, SS_INT. Rather than filling the missing values based on statistical assumptions (Christensen and Kjellström 2020, 2022), we estimate this last term as a residual (equation 3), considering that uncertainties due to GCM-RCM interactions (SS_INT) are generally much smaller than the GCM and RCM effects (Suzuki-Parker et al 2018, Christensen and Kjellström 2020, Kim et al 2020). Equation (2) can then be expressed as equation (4). To supplement our findings, we also compare our two-way ANOVA results with those obtained using the fully filled matrix.
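One standard way to write the unbalanced main-effect sums of squares described above is sketched here. The notation is an assumption: Ω denotes the set of the 15 available GCM-RCM pairs (i, j), Y_ij the value for GCM i and RCM j, and Y_i* and Y_*j the GCM and RCM group means over the available runs. This form is consistent with the definitions in the text but is not a reproduction of the paper's equations (1)-(4).

```latex
\begin{aligned}
Y_{**} &= \frac{1}{N}\sum_{(i,j)\in\Omega} Y_{ij}, \qquad N = |\Omega| \\
SS_T &= \sum_{(i,j)\in\Omega} \left(Y_{ij} - Y_{**}\right)^2 \\
SS_{GCM} &= \sum_{i} n_i \left(Y_{i*} - Y_{**}\right)^2, \qquad Y_{i*} = \frac{1}{n_i}\sum_{j:(i,j)\in\Omega} Y_{ij} \\
SS_{RCM} &= \sum_{j} n_j \left(Y_{*j} - Y_{**}\right)^2, \qquad Y_{*j} = \frac{1}{n_j}\sum_{i:(i,j)\in\Omega} Y_{ij} \\
SS_{INT} &= SS_T - SS_{GCM} - SS_{RCM}
\end{aligned}
```

In an unbalanced design the GCM and RCM main effects are not orthogonal, so the residual SS_INT defined this way is not guaranteed to be non-negative.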
Following previous studies (Christensen and Kjellström 2020, 2022), we estimate the nine missing elements of our GCM-RCM matrix from the expected values of the corresponding GCM (Y_i*) and RCM (Y_*j) minus the grand mean (Y**), as Y_ij = Y_i* + Y_*j − Y**, and then apply a two-way ANOVA to the full set of GCM-RCM members. In short, our main findings are unaffected by the number of samples or by the details of the method, supporting our results based on the revised two-way ANOVA.
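Both the revised ANOVA on the incomplete matrix and the additive filling of missing members can be sketched in a few lines of NumPy. This is a minimal sketch assuming the GCM × RCM matrix is stored with GCMs as rows, RCMs as columns, and NaN for missing combinations, with every GCM and RCM having at least one run; `two_way_anova_incomplete` and `fill_additive` are hypothetical names.

```python
import numpy as np

def two_way_anova_incomplete(Y):
    """Main-effect sums of squares for a GCM x RCM matrix with NaN for missing runs.

    The interaction term is the residual SS_T - SS_GCM - SS_RCM.
    Assumes every row (GCM) and column (RCM) has at least one finite entry.
    """
    Y = np.asarray(Y, dtype=float)
    ok = np.isfinite(Y)
    grand = np.nanmean(Y)                       # Y**
    n_i = ok.sum(axis=1)                        # runs per GCM
    n_j = ok.sum(axis=0)                        # runs per RCM
    mean_i = np.nanmean(Y, axis=1)              # Y_i*
    mean_j = np.nanmean(Y, axis=0)              # Y_*j
    ss_t = float(np.nansum((Y - grand) ** 2))
    ss_gcm = float(np.sum(n_i * (mean_i - grand) ** 2))
    ss_rcm = float(np.sum(n_j * (mean_j - grand) ** 2))
    return {"T": ss_t, "GCM": ss_gcm, "RCM": ss_rcm, "INT": ss_t - ss_gcm - ss_rcm}

def fill_additive(Y):
    """Fill missing members as Y_ij = Y_i* + Y_*j - Y** (additive estimate)."""
    Y = np.asarray(Y, dtype=float)
    grand = np.nanmean(Y)
    est = np.nanmean(Y, axis=1)[:, None] + np.nanmean(Y, axis=0)[None, :] - grand
    return np.where(np.isfinite(Y), Y, est)
```

Relative contributions follow as `r["GCM"] / r["T"]` and `r["RCM"] / r["T"]`; the full-matrix variant is `two_way_anova_incomplete(fill_additive(Y))`, since a matrix without NaNs reduces to the ordinary balanced case.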

Model evaluations
The spatial patterns of boreal summer (June-July-August (JJA)) mean precipitation, summer monsoon duration, and monsoon area for the current period are displayed in figure 1. Note that we use only JJA to evaluate spatial patterns and subregional averages of boreal summer monsoon precipitation over East Asia, whereas monsoon duration and area are examined over all months covering May-September (table S2). Compared with the observations (figures 1(a) and (d)), RCM_MME (figures 1(c) and (f)) shows a similar magnitude of mean precipitation, rainy season length, and coverage over ECN and NCN. However, it overestimates precipitation amount and monsoon duration over SCN and southeastern Japan (figure S1(c)) and underestimates them over KOR and southern Japan. RCM_MME performs better than GCM_MME, with reduced wet biases (figures 1(b) and (c)), but the bias patterns of RCM_MME are very similar to those of GCM_MME, suggesting a strong influence of the GCM boundary conditions. Statistically significant intersimulation relationships are found between JJA mean EAS precipitation and monsoon duration and area (figures 1(g) and (h)), and between monsoon duration and area (figure 1(i)). These relationships imply that biases in summer mean precipitation are closely related to performance in spatiotemporal monsoon characteristics, and that these rainy season characteristics are mutually connected. In the observations, an area of about 29.0 × 10⁵ km² experiences a summer monsoon lasting 56 d on average over EAS. Whereas GCMs tend to overestimate precipitation and summer monsoon duration and area, most RCMs are distributed around the observed values, leading to better performance of RCM_MME but also to large inter-RCM spreads.
GCM and RCM skill in simulating summer precipitation, monsoon duration, and monsoon area is evaluated over EAS and the five subregions (figure 2). As in figure 1, RCM_MME performs better than GCM_MME but with larger intersimulation spreads. In the observations, the longest monsoon duration (91 d) occurs in JPN, from 11 June to 10 September, and the shortest (24 d) occurs in both NCN and ECN, from 14 July to 7 August and from 1 to 25 July, respectively (table S2). The monsoon area is largest in SCN, covering a large portion (45.5%) of EAS, while NCN has the smallest share (4.8%). Most subregions show strong intersimulation relationships between JJA precipitation and monsoon area and duration (r > 0.51, statistically significant at the 5% level). A weaker relationship is found for monsoon duration in JPN because the monsoon season there extends beyond JJA (figure S1). Compared with the GCMs, which overestimate summer monsoon characteristics over SCN and underestimate them over JPN, the RCMs show improved performance in both subregions. This added value of RCMs is consistent with the improved monsoon duration over SCN and JPN relative to the GCMs (figures 1(e) and (f)). The largest inter-RCM spread in monsoon duration occurs over SCN and ECN, where individual RCMs simulate diverse climatological patterns and only a few RCMs realistically capture the observations (figure S2).
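The 0.51 value matches the two-tailed 5% significance threshold for a Pearson correlation computed across 15 simulations, which can be checked from Student's t distribution with n − 2 degrees of freedom. This pure-Python sketch assumes n = 15 (one value per RCM run) and uses Simpson's rule plus bisection instead of a statistics library; `t_cdf` and `critical_r` are hypothetical helper names.

```python
import math

def t_cdf(x, df, steps=2000):
    """CDF of Student's t at x >= 0, integrating the density over [0, x] by Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps                       # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + s * h / 3              # symmetric density: P(T <= 0) = 0.5

def critical_r(n, alpha=0.05):
    """Smallest |r| significant in a two-tailed t-test of Pearson r with n samples."""
    df = n - 2
    lo, hi = 0.0, 50.0
    for _ in range(60):                 # bisection for the (1 - alpha/2) t quantile
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t / math.sqrt(df + t * t)    # inverts t = r * sqrt(df / (1 - r^2))
```

For n = 15 this gives a threshold of about 0.514. If GCM values were also pooled into the correlation, n and therefore the threshold would change.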
The intersimulation spreads of the RCMs are much larger than those of the GCMs (figures 1, 2, and S1). As described in section 2.3, the sources of uncertainty are decomposed arithmetically using the revised ANOVA (hereafter 4 × 6 RAW) and a two-way ANOVA based on the fully filled GCM × RCM matrix (4 × 6 FULL). Figure S3 illustrates the decomposed means grouped by boundary GCM or RCM and the portions of uncertainty from GCMs, RCMs, and their interactions (residuals). Despite differences in methods and sample sizes, both 4 × 6 RAW (15 samples) and 4 × 6 FULL (24 samples, including nine reconstructed samples) show a larger uncertainty originating from the choice of RCM than from the choice of GCM, suggesting that GCM biases in climatology can be amplified by dynamical downscaling with different RCMs in present-day simulations. This supports the conclusion of Kim et al (2020), who evaluated mean precipitation using a 3 GCM × 3 RCM matrix.

Future projections and uncertainties
Changes in summer monsoon characteristics in the late 21st century are assessed over EAS and its subdomains under the HE and LE scenarios relative to the current climate. The RCM_MME results (figure 3) show future increases in summer mean precipitation (LE: 0.32 mm d⁻¹, HE: 0.74 mm d⁻¹), monsoon duration (LE: 8 d, HE: 20 d), and monsoon area (LE: 4.2 × 10⁵ km², HE: 7.7 × 10⁵ km²). The LE and HE projections remain comparable during the early (2025-2049) and mid-21st century, implying that the effects of global warming mitigation (HE minus LE) become noticeable only in the late 21st century. This also indicates that near-term global warming impacts are unavoidable even with a sharp reduction in greenhouse gas emissions, consistent with previous studies (Boer and Arora 2013, O'Neill et al 2016).
RCM projections carry large uncertainties, with changes of different signs in some cases (figures 3(c), (f), and (i)). The intersimulation spread (1σ, defined as noise, N) exceeds the MME value (defined as signal, S) for monsoon precipitation, duration, and area in most cases, meaning that the signal-to-noise ratio (S/N) is less than unity. The S/N of duration change exceeds unity only under HE in the late 21st century, whereas area exhibits a significant change (S/N > 1) under both HE (1.24) and LE (1.11). This implies that additional greenhouse gas emissions will shift the distribution of monsoon duration and area changes in most RCM simulations, resulting in a robust temporal and spatial expansion of the summer monsoon season. Results from individual RCM runs are provided in figures S4 and S5 for monsoon duration and area, respectively. RCMs forced by UKE generally project a longer duration and a broader area expansion than RCMs forced by MPI, suggesting an important role of the GCM boundary forcing.
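The signal-to-noise measure above can be sketched as follows, assuming S is the unweighted mean of the 15 projected changes and N is their sample standard deviation (the `ddof=1` choice is an assumption; the text says only "1σ"). The function name `signal_to_noise` is hypothetical and the example values are illustrative, not study results.

```python
import numpy as np

def signal_to_noise(changes):
    """Return (S, N, S/N) for an ensemble of projected changes.

    S: multimodel ensemble mean (no weighting).
    N: intersimulation 1-sigma spread (sample standard deviation, an assumption).
    """
    changes = np.asarray(changes, dtype=float)
    signal = changes.mean()
    noise = changes.std(ddof=1)
    return float(signal), float(noise), float(signal / noise)

# Hypothetical duration changes (days) from five runs, for illustration only.
s, n, sn = signal_to_noise([1.0, 2.0, 3.0, 4.0, 5.0])
robust = abs(sn) > 1.0   # "robust" in the sense used in the text: |S| exceeds 1-sigma
```

Using |S/N| rather than S/N keeps the criterion meaningful for projected decreases as well as increases.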
The relative contributions of GCM and RCM differences to future projection uncertainties are quantified for EAS and the five subregions using ANOVA. Figure S6 illustrates the group means for each GCM (G1-G4) and RCM (R1-R6) and their relative contributions to the total uncertainty of the multiple RCM simulations for EAS. The G4 (UKE) group has the largest mean, while the G3 (MPI) group has the smallest. In contrast, the RCM groups show smaller differences, indicating that RCMs project similar changes in EAS mean monsoon duration and area when forced by the same GCM. Accordingly, the choice of GCM is the dominant source of uncertainty for changes in both monsoon duration and area. GCM contributions are larger for monsoon duration (explaining 63%-83%) than for area (explaining 44%-71%), whereas differences between the HE and LE results are small. Figure 4 shows subregional results for the changes in monsoon duration and area, the RCM simulation uncertainties (SS_T), and the relative contributions of GCMs and RCMs (SS_GCM and SS_RCM) to the total uncertainty (equation 4). RCM_MME projects a lengthening of the monsoon rainy season and an expansion of the monsoon area in the late 21st century over all five subregions, with a stronger S under HE than under LE (figures 4(a) and (d)). Warming mitigation (HE minus LE) has different regional impacts on monsoon characteristics, with larger changes in China (SCN, ECN, and NCN) than in KOR and JPN. However, S/N is smaller than one in many regions for both duration and area, indicating large uncertainties, which are stronger under HE than under LE (figures 4(b) and (e)). The quantified uncertainties show that SS_T is largely explained by the sum of SS_GCM and SS_RCM in most regions, although GCM-RCM interactions remain considerable in some cases (e.g. duration in ECN, NCN, and KOR under LE).
The relative contributions of GCM differences to the total projection uncertainties are found to be larger than the RCM contributions, which is more evident in the HE results (figures 4(c) and (f)). This is consistent with the EAS case (figure S6). GCM differences explain more than half of the total projection uncertainties over the coastal subregions (SCN, KOR, and JPN), while the inland regions (ECN and NCN) exhibit similar or even slightly larger contributions from RCM differences. This contrast suggests that coastal regions tend to be affected more by GCM differences (through prescribed sea surface temperatures) than regions located farther inland with complex terrain, where RCM differences in physics and dynamics exert impacts comparable to those of GCM differences (Im et al 2008, Li et al 2016, Park et al 2019, Coppola et al 2021). We further checked the influence of GCM differences on the RCM projection spread using intersimulation correlation, following Nishant and Sherwood (2021). Statistically significant intersimulation correlation coefficients are obtained for both monsoon duration (r = 0.81) and area (r = 0.68; figure S7), indicating the important role of GCMs in determining inter-RCM spreads. This supports our ANOVA-based results as well as the findings of Nishant and Sherwood (2021), who analyzed mean and extreme precipitation over Australia. Based on the equilibrium climate sensitivity (ECS) values provided by Meehl et al (2020), RCMs forced by a GCM with high sensitivity (UKE, ECS = 5.3) project larger increases in monsoon duration and area than those forced by GCMs with lower sensitivity (MPI and GFD, ECS = 3.6 and 2.4, respectively). These differences are consistent with the corresponding global warming projections under HE in the late 21st century (GFD: 2.6 °C, HG2: 4.2 °C, MPI: 3.5 °C, and UKE: 6.1 °C).
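One way to set up the GCM-RCM intersimulation correlation is to pair each RCM run's projected change with the change simulated by its own driving GCM and correlate across all runs. This sketch assumes that pairing (one reading of the Nishant and Sherwood (2021) approach, not necessarily the exact procedure used here); the function name `gcm_rcm_correlation`, the GCM labels, and the values are hypothetical.

```python
import numpy as np

def gcm_rcm_correlation(gcm_change, runs):
    """Pearson r between each RCM run's change and its driving GCM's change.

    gcm_change: dict mapping GCM name -> change projected by the GCM itself.
    runs: list of (gcm_name, rcm_change) pairs, one per RCM simulation.
    """
    x = np.array([gcm_change[g] for g, _ in runs], dtype=float)  # driving-GCM value per run
    y = np.array([c for _, c in runs], dtype=float)              # RCM value per run
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical example: RCM changes that scale linearly with their driving GCM.
gcm = {"A": 1.0, "B": 2.0, "C": 3.0}
runs = [("A", 3.0), ("A", 3.0), ("B", 5.0), ("C", 7.0)]
r = gcm_rcm_correlation(gcm, runs)
```

Because a GCM driving several RCMs contributes repeated x values, the correlation is high only when the RCMs sharing a GCM also agree with one another, which is what links it to the ANOVA result.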
Considering that the forcing difference between the SSP and RCP versions of each scenario is unlikely to be large for HE or LE (O'Neill et al 2016), our results highlight that the choice of GCMs with different climate sensitivities is more important than the difference between scenarios (Giorgi and Raffaele 2022).

Summary and discussion
This study examines future changes in summer monsoon duration and area over East Asia using 15 RCM simulations from the CORDEX East Asia Phase II experiments, in which six RCMs are forced by four GCMs in a multi-GCM/multi-RCM framework. The evaluation shows that, on average, RCMs simulate mean summer precipitation and monsoon duration and area over East Asia better than GCMs. However, individual RCM simulations show a wide range of biases, which, according to the ANOVA results, are mainly due to RCM differences rather than GCM differences, consistent with previous studies based on summer precipitation. Future projections based on the 15 RCM runs indicate that monsoon duration and area will increase, more strongly under HE than under LE scenarios. The spatiotemporal expansion of monsoon precipitation is robust only in the late 21st century (multimodel mean signals larger than the intermodel standard deviation), while near-term and midterm projections remain uncertain. Regional analysis shows that the global warming mitigation effect (HE minus LE) in the late 21st century is most discernible in China (SCN, ECN, and NCN). The ANOVA results show that GCM differences explain most of the RCM projection uncertainty over the coastal subregions (SCN, KOR, and JPN). For the inland subregions (ECN and NCN), GCMs and RCMs contribute comparably to the total uncertainty, depending on the variable and scenario. The results remain similar when the ANOVA is repeated using a fully filled GCM × RCM matrix. Significant intersimulation correlations between the four GCMs and the 15 RCM runs also support the important role of GCM differences in shaping RCM projections of monsoon characteristics.
Our results are largely consistent with previous studies that analyzed multi-RCM projection uncertainties in precipitation over Japan (Suzuki-Parker et al 2018) and Europe (Christensen and Kjellström 2020). Our findings illustrate the importance of the climate sensitivity of driving GCMs when producing fine-scale outputs with multiple RCMs. As many GCM studies have found for mean precipitation projections, a broader range of driving GCMs needs to be considered to assess the plausible range of uncertainty in future monsoon behavior. Further investigation based on a GCM × RCM matrix with no missing elements and a larger number of GCMs is warranted to confirm our findings.
Nevertheless, caveats remain due to the lack of air-sea interactions (Cha et Ryu et al 2022). In addition, the present study focuses on the monsoon rainy season defined from precipitation only, whereas some studies have analyzed the summer monsoon season in terms of atmospheric circulation (e.g. Sabeerali et al 2018, Maharana et al 2019). Although their projections remain inconsistent, these studies indicate the important role of dynamical as well as thermodynamic changes (e.g. Kitoh 2014, Lee et al 2018). All these factors are known to affect regional precipitation over East Asia, and their individual and combined impacts on future East Asian monsoon characteristics need to be examined with improved RCM simulations.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://esg-dn1.nsc.liu.se/search/cordex/.