Skilful seasonal predictions of global monsoon summer precipitation with DePreSys3

We assess the skill of the Met Office’s DePreSys3 prediction system at forecasting summer global monsoon precipitation at the seasonal time scale (2–5 month forecast period). DePreSys3 has significant skill at predicting summer monsoon precipitation (r = 0.68), but the skill varies by region and is higher in the northern hemisphere (r = 0.68) than in the southern hemisphere (r = 0.44). To understand the sources of precipitation forecast skill, we decompose the precipitation into several dynamic and thermodynamic components and assess the skill in predicting each. While dynamical changes of the atmospheric circulation are the primary contributor to global monsoon variability, skill at predicting shifts in the atmospheric circulation is relatively low. This lower skill partly relates to DePreSys3’s limited ability to accurately simulate changes in atmospheric circulation patterns in response to sea surface temperature forcing. Skill at predicting the thermodynamic component of precipitation is generally higher than for the dynamic component, but thermodynamic anomalies only contribute a small proportion of the total precipitation variability. Finally, we show that the use of a large ensemble improves skill for predicting monsoon precipitation, but skill does not increase beyond 20 members.


Introduction
Global monsoon precipitation variability has substantial effects on about two thirds of the world's population (Wang and Ding 2006). Therefore, understanding the factors that drive monsoon variability, and its predictability, is societally important. However, simulating and predicting monsoon precipitation is still challenging, with many prediction systems exhibiting moderate-to-no skill on seasonal to multi-annual time scales (Bellucci et al 2013, Saha et al 2016). Recently developed prediction systems have shown substantial skill at predicting tropical precipitation, on both seasonal and multi-annual time scales (King et al 2020). However, there are relatively few studies focusing on understanding the predictability of global monsoon precipitation (e.g. Saha et al 2016).
The predictability of tropical precipitation on seasonal time scales relies on the slowly varying lower boundary conditions (Charney and Shukla 1981), and thus is largely dependent on the ability of models to predict anomalous sea surface temperatures (SSTs) and their remote effects on monsoon precipitation. The El Niño Southern Oscillation (ENSO) is well known as a driver of tropical climate variability and is key to predicting monsoon precipitation (Shukla and Paolino 1983, Wang et al 2018, Sohn et al 2019, Dunstone et al 2020). In addition, anomalous North Atlantic and Indian Ocean SSTs also allow prediction of variations in monsoon precipitation (Mohino et al 2016, Wang et al 2018). Therefore, our ability to predict monsoon precipitation could depend on SST conditions (e.g. ENSO phase) and on factors that can modulate SST-monsoon teleconnections (e.g. internal climate variability and external forcing) (Annamalai et al 2007, Chen et al 2010, Monerie et al 2018). There has also been little work to understand the processes leading to skill in global monsoon predictions. For instance, precipitation variability is associated with both thermodynamic and dynamic mechanisms (Seager et al 2014). Thermodynamic changes are due to changes in surface temperature and specific humidity and, hence, may be highly predictable on regional or even global scales. However, dynamic changes are associated with changes in the strength and pattern of the atmospheric circulation, which might lead to reduced prediction skill. It is not clear which of these mechanisms is the dominant contributor to skill. We fill this gap by decomposing precipitation anomalies following Chadwick et al (2016) and quantifying skill at predicting each component.
Unpredictable noise acts to reduce our ability to predict monsoon precipitation. The large-ensemble approach aims to reduce unpredictable noise and hence elevate prediction skill by focusing on the predictable signal. However, the ensemble size required for predicting the global monsoon has not been assessed. DePreSys3 offers a unique opportunity to analyse skill with a large ensemble, for which 40 members are available. In addition, DePreSys3 allows assessment of prediction skill over a long period while previous prediction systems cover a shorter period.
This study aims to fill the aforementioned gaps, focusing on the causes of skill at predicting global monsoon precipitation at a seasonal time scale. In addition, the results help define the ensemble size needed to predict global monsoon precipitation with a single climate model.
We address the following questions:
• Can we predict global monsoon precipitation, up to six months ahead?
• What are the sources of skill for monsoon precipitation in terms of dynamic and thermodynamic contributions?
• What ensemble size do we need to predict monsoon precipitation?
The paper is organised as follows: section 2 describes the simulations and the methodologies used. In section 3 we quantify skill at predicting monsoon precipitation. Sources of skill are shown in section 4, and section 5 provides an estimate of the ensemble size required to reduce unpredictable noise. Section 6 concludes.

Model and methods
Two sets of hindcasts have been performed. The first set is performed by initialising simulations on 1 November each year between 1959 and 2016 (i.e. 58 start dates). In the second set, hindcasts are started from 1 May each year between 1960 and 2016 (i.e. 57 start dates). Forty ensemble members are generated for both hindcast sets by using different seeds to a stochastic physics scheme (Dunstone et al 2016). Each hindcast is forced by the historical evolution of external forcings (greenhouse gases, aerosols, ozone, solar radiation and volcanoes). After 2005, external forcing is taken from the RCP4.5 scenario, as in the Coupled Model Intercomparison Project (CMIP5) protocol (Taylor et al 2012). DePreSys3 is full-field initialised by relaxing a coupled integration of HadGEM3-GC2 towards gridded observations (see Dunstone et al 2016). Three-dimensional ocean temperature and salinity are relaxed towards the Met Office statistical ocean reanalysis (Murphy 2007, Smith et al 2015) and sea-ice concentration is relaxed towards HadISST (Hadley Centre Sea Ice and Sea Surface Temperature; Rayner et al 2003), both with a one-day relaxation time scale. The atmosphere model is initialised from ERA-40 before 1979, and from ERA-Interim afterwards, with a six-hourly relaxation time scale.
DePreSys3 is the Met Office decadal prediction system; however, in this study we focus on the seasonal time scale (2-5 month forecast period). DePreSys3 is based upon the same physical climate model, and is at the same resolution, as the Met Office operational seasonal prediction system (GloSea5; MacLachlan et al 2015), but the longer DePreSys3 hindcast period (1960-2016 vs 1992-2016 for GloSea5) allows for a more robust evaluation of seasonal hindcast skill over the global monsoon regions.

Observations and reanalysis
Prediction skill is evaluated using observations and reanalysis. For precipitation we use the Global Precipitation Climatology Centre (GPCC) version 7 (Schneider et al 2014). GPCC is available over 1901-present on a 0.5° grid. We also use data from the Climate Research Unit (CRU; Harris et al 2014), version 4.03, which spans 1901-present. For a large range of atmospheric variables we use data from the National Centers for Environmental Prediction (NCEP) reanalysis (R-1; Kanamitsu et al 2002). NCEP is provided at a 2.5° resolution (144 × 72) with 17 vertical levels and spans 1948 to present.

Precipitation metrics
Observed and simulated precipitation are first interpolated onto a common 1° horizontal resolution grid when computing the monsoon precipitation indices. Precipitation is interpolated to a common 2.5° resolution prior to assessing skill at each grid point. The monsoon domains are shown in figure 1(a) and are named NAM (North America), NAF (North Africa), SAS (South Asia), EAS (East Asia), SAM (South America), SAF (South Africa), and AUS (Australia). Since we seek enough spread for a probabilistic forecast, we remove the first month of each simulation. We assess the ability of DePreSys3 to simulate precipitation over the northern hemisphere using hindcasts initialised in May, focusing on JJAS, i.e. on a 2-5 month forecast period. Over the southern hemisphere, the 2-5 month forecast period is defined using hindcasts initialised in November and focusing on DJFM.
We also assess global monsoon predictability, averaging together precipitation initialised in May for the northern hemisphere precipitation (focusing on JJAS) and initialised in November for the southern hemisphere precipitation (focusing on DJFM). This metric is hereafter called GM_nm.

Bias adjustment
Once initialised from reanalysis, models drift towards their preferred (and imperfect) mean climatology. We remove the drift following the procedure described in the World Climate Research Programme recommendation (ICPO 2011), as: $$dr(\tau) = \frac{1}{n\,m}\sum_{j=1}^{n}\sum_{i=1}^{m}\left[Y_{ij}(\tau) - X_{ij}(\tau)\right],$$ where Y and X are given for a member i and a start date j for, respectively, DePreSys3 and the corresponding observations/reanalysis, spanning n start dates and m members. The drift, dr, is only lead-time (τ) dependent and is assumed to be start-date independent. Here, we assume that the ICPO method reliably removes drift for a large range of variables and over several regions. Note that the drift correction method does not impact our estimation of the model skill (anomaly correlation coefficient (ACC) values).
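As an illustration, the drift adjustment described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the processing chain used for DePreSys3: the function name `remove_drift`, the array layout (start date, member, lead time) and the synthetic data are all hypothetical, and the observations are taken to be identical for every member.

```python
import numpy as np

def remove_drift(hindcast, obs):
    """Remove a lead-time-dependent mean drift (illustrative sketch).

    hindcast: array (n_start, n_member, n_lead), model values Y
    obs:      array (n_start, n_lead), verifying values X
    """
    # Drift = mean model-minus-observation difference over all start
    # dates and members; it depends on lead time only.
    drift = (hindcast - obs[:, None, :]).mean(axis=(0, 1))
    return hindcast - drift, drift

# Synthetic example (assumed sizes: 57 start dates, 40 members, 5 lead months).
rng = np.random.default_rng(0)
n_start, n_member, n_lead = 57, 40, 5
obs = rng.normal(size=(n_start, n_lead))
true_drift = np.linspace(0.0, 2.0, n_lead)  # drift growing with lead time
hindcast = (obs[:, None, :] + true_drift
            + rng.normal(scale=0.5, size=(n_start, n_member, n_lead)))

adjusted, drift = remove_drift(hindcast, obs)
```

Because the same constant is subtracted from every start date at a given lead time, the correlation between the ensemble mean and the observations is unchanged, which is consistent with the statement that the adjustment does not affect the ACC.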

Evaluation of the model skill
We evaluate skill using the ACC between the ensemble-mean prediction from DePreSys3 and observations. The statistical significance of each ACC value is assessed with a Monte Carlo resampling procedure (5000 permutations). For a given lead time, we randomly resample the time series (of 57 and 58 years) in blocks of 5 years, concatenating blocks until the length of the original time series is reached, so as to preserve multi-annual variability. The correlation between the DePreSys3 and observed/reanalysed time series is then computed for each permutation to form a distribution of ACC scores. An ACC is considered significant at p ⩽ 0.05 when it exceeds the 95th percentile of the permutation distribution (i.e. a one-sided test).
In this study the skill is always shown relative to the long-term trend, removing a linear trend for each grid point and for each monsoon index. Note that removing the linear trend does not dramatically impact the results on prediction skill (not shown).
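The skill evaluation can be illustrated with the following sketch, which detrends both series, computes the ACC and builds a null distribution by resampling in 5 year blocks. Several choices here are assumptions rather than details given in the text: the function names are hypothetical, the forecast series (not the observed one) is the one resampled, block start years are drawn with replacement, and the data are synthetic.

```python
import numpy as np

def detrend(x):
    """Remove a least-squares linear trend from a 1-D series."""
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def block_resample(x, block, rng):
    """Concatenate randomly chosen blocks until the original length is reached."""
    n = x.size
    n_blocks = int(np.ceil(n / block))
    starts = rng.integers(0, n - block + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block)).ravel()[:n]
    return x[idx]

def acc_significance(forecast, obs, n_perm=5000, block=5, seed=0):
    """Detrended ACC and a one-sided 95th-percentile threshold."""
    rng = np.random.default_rng(seed)
    f, o = detrend(forecast), detrend(obs)
    score = np.corrcoef(f, o)[0, 1]
    null = np.array([np.corrcoef(block_resample(f, block, rng), o)[0, 1]
                     for _ in range(n_perm)])
    threshold = np.percentile(null, 95)
    return score, threshold, score > threshold

# Synthetic 57 year example with a shared signal and a common linear trend.
rng = np.random.default_rng(1)
years = np.arange(57)
signal = rng.normal(size=57)
obs = signal + 0.02 * years
forecast = 0.8 * signal + 0.6 * rng.normal(size=57) + 0.02 * years

score, threshold, significant = acc_significance(forecast, obs)
```

Resampling in blocks rather than single years keeps short runs of consecutive years together, so the null distribution retains some of the multi-annual persistence of the original series.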

Decomposition
We decompose precipitation anomalies into terms documenting precipitation anomalies due to thermodynamic and dynamic changes. Held and Soden (2006) assumed that precipitation can be approximated by $$P = M^{*} q,$$ where P is precipitation, M* is a proxy for convective mass-flux from the boundary layer to the free troposphere (with M* = P/q), and q is the near-surface specific humidity.
A change in precipitation (∆P = ∆(M* q)) is computed for each month, relative to the 1959-2016 and 1960-2016 mean periods. Anomalies in precipitation are reformulated in terms of thermodynamic (∆P therm), dynamic (∆P dyn) and cross non-linear (∆P cross) components, following Chadwick et al (2013, 2016), as: $$\Delta P = M^{*}\,\Delta q + q\,\Delta M^{*} + \Delta M^{*}\,\Delta q = \Delta P_{therm} + \Delta P_{dyn} + \Delta P_{cross},$$ where ∆P therm is the anomaly in precipitation due to a change in specific humidity, with no change of the atmospheric circulation (constant M*), ∆P dyn is the anomaly in precipitation due to a change in the atmospheric dynamics, with no change in specific humidity value (q), and ∆P cross is the anomaly in precipitation due to changes in both dynamics and specific humidity.

Figure 1. (a) […] 1960-2016 (1959-2016). Stippling indicates that ACC is significantly different to zero according to a Monte-Carlo procedure with 5000 permutations and a 95% confidence level (see text for details). Black contours show the monsoon domains computed from GPCC. (b) Skill at predicting precipitation at the 2-5 month forecast period for different monsoon domains, the global monsoon, and the northern (NH) and southern (SH) hemispheres. Results for JJAS (DJFM) with the simulations initialised in May (November) are shown in red (blue). The combined GM definition (GM_nm) is shown in green. All bars are significant at the 95% level, method as in panel (a). ACC are computed with respect to GPCC and using ensemble-mean values for DePreSys3.
Further decomposition of ∆P dyn allows us to document changes that are due to the strength of the tropical mean circulation (∆P weak) and to a shift in the pattern of the circulation (∆P shift), as $$\Delta P_{dyn} = \Delta P_{weak} + \Delta P_{shift}.$$ In this decomposition, α is scaled by the strength of the mean tropical circulation. Finally, the decomposition is performed at the monthly time step (as in Chadwick et al 2016, Rowell and Chadwick 2018) prior to computing the seasonal means and the area-weighted averages.
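The three-term decomposition can be sketched as follows for gridded monthly fields. This is a minimal illustration under stated assumptions: the function name `decompose` is hypothetical, a single time mean stands in for the reference climatology, the anomaly is taken relative to the reference state formed by the mean M* and mean q, and the input fields are synthetic. The further split of the dynamic term into weakening and shift components is not attempted here.

```python
import numpy as np

def decompose(P, q):
    """Split precipitation anomalies into thermodynamic, dynamic and cross terms.

    P, q: arrays with time on axis 0 (precipitation and near-surface
    specific humidity); q must be strictly positive.
    """
    M = P / q                                  # proxy mass flux, M* = P / q
    M_bar, q_bar = M.mean(axis=0), q.mean(axis=0)
    dM, dq = M - M_bar, q - q_bar
    therm = M_bar * dq                         # humidity change, fixed circulation
    dyn = q_bar * dM                           # circulation change, fixed humidity
    cross = dM * dq                            # joint change
    dP = P - M_bar * q_bar                     # anomaly relative to the reference state
    return dP, therm, dyn, cross

# Synthetic monthly fields on a small grid (assumed units: mm/day and kg/kg).
rng = np.random.default_rng(0)
P = rng.uniform(1.0, 10.0, size=(120, 4, 6))
q = rng.uniform(0.010, 0.020, size=(120, 4, 6))

dP, therm, dyn, cross = decompose(P, q)
```

With this definition the three terms sum exactly to the anomaly, which provides a simple consistency check before seasonal means and area averages are taken.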

Quantification of the variability
We use a covariance analysis to quantify the part of the precipitation variance that is due to each term. Following Kent et al (2015), the precipitation variance can be written as the sum of the covariance matrix for all components: $$\mathrm{var}(\Delta P) = \sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{cov}(\Delta P_{i}, \Delta P_{j}).$$ Here, n = 3 when decomposing precipitation using ∆P therm, ∆P dyn and ∆P cross, and n = 4 when the dynamical term is decomposed further (i.e. using ∆P therm, ∆P shift, ∆P weak and ∆P cross). Subscripts i and j denote different precipitation terms. Hereafter, cov denotes the covariance between two terms (∆P i and ∆P j), or the variance of a term when i = j. cov(∆P i *) denotes the sum of the covariances between ∆P i and all terms (including ∆P i itself). It is worth noting that the thermodynamic component is negatively correlated with ∆P weak because both ∆P therm and ∆P weak are associated with changes in tropical SST (Ma et al 2011, Kent et al 2015). Therefore, cov(∆P weak, ∆P therm) is negative. As a result, the variance explained by a term can exceed 100%. For instance, in DePreSys3, cov(∆P dyn *) is 110% of the total precipitation variance for several monsoon domains.

Figure 1 shows the precipitation skill at a 2-5 month forecast period, for each grid point and when averaged over each monsoon domain. Displayed time series (figure 2) show the ability of DePreSys3 to predict precipitation anomaly magnitude.
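The covariance analysis of section 2.7 can be illustrated with a short sketch. The function name `variance_contributions` and the synthetic series are hypothetical; the example only demonstrates that the row sums of the covariance matrix add up to the variance of the total, and that a term anticorrelated with another can account for more than 100% of it.

```python
import numpy as np

def variance_contributions(terms):
    """Percentage of the variance of the summed series attributed to each term.

    terms: dict mapping a term name to a 1-D time series. The contribution
    of term i is the sum of its covariances with all terms (itself included).
    """
    names = list(terms)
    X = np.vstack([terms[k] for k in names])
    C = np.cov(X)            # n x n covariance matrix
    total = C.sum()          # equals the variance of the summed series
    return {k: 100.0 * C[i].sum() / total for i, k in enumerate(names)}

# Synthetic terms: the thermodynamic term partly offsets the dynamic one.
rng = np.random.default_rng(0)
dyn = rng.normal(size=500)
therm = -0.2 * dyn + 0.05 * rng.normal(size=500)
cross = 0.05 * rng.normal(size=500)

contrib = variance_contributions({"dyn": dyn, "therm": therm, "cross": cross})
```

In this synthetic case the dynamic term exceeds 100% while the thermodynamic term is negative, and the three percentages still sum to 100.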

Skill at predicting tropical precipitation
DePreSys3 exhibits substantial skill at predicting summer precipitation over the tropics (figure 1(a)). Significant ACC values are found over North and South America, the Sahel, southern Africa, northern India and Australia (figures 1(a) and (b), 2(b)-(j)). We have also evaluated ACC using CRU TS observations and find similar results (figure S1 (available online at stacks.iop.org/ERL/16/104035/mmedia)). In terms of monsoon domains, skill is strongest over the NAM and NAF (ACC = 0.61) domains, while skill is moderate, but still significant, over the EAS (ACC = 0.39) and SAS (ACC = 0.37) domains. Skill is also significant over the southern hemisphere, for the SAM (ACC = 0.45), SAF (ACC = 0.45) and AUS (ACC = 0.39) monsoon domains.
When assessed over the relatively long 1950-2016 period, the skill at predicting summer monsoon precipitation is due to DePreSys3's ability to simulate both interannual and multi-annual fluctuations in summer monsoon precipitation. Skill at predicting interannual and multi-annual variations in precipitation is shown for most of the monsoon domains (table 1; figures S2 and S3).
A substantial amount of skill in predicting NAF summer precipitation is due to the ability of DePreSys3 to simulate the multi-annual monsoon precipitation variability (ACC = 0.71; table 1; figure S2(e)), with the drying trend to the 1980s and the limited precipitation recovery of the 1990s (figures 2(e) and S2(e)). This seesaw in precipitation has been previously associated with Atlantic multidecadal variability (AMV) (Martin and Thorncroft 2014b). Therefore, we attribute part of the skill in predicting NAF precipitation to the high ACC values of the North Atlantic SSTs (figure S4), as shown in Mohino et al (2016). The skill at predicting NAM interannual variability is high (ACC = 0.71; table 1; figures 2(d) and S3) and is suggested to be due to the ability of DePreSys3 to simulate tropical Pacific SSTs (figure S4), which have strong effects on NAM precipitation (figures S5 and S6). The skill at predicting AUS precipitation mostly arises from the ability of DePreSys3 to simulate interannual precipitation variability (ACC = 0.61; table 1; figures 2(j) and S3), while skill is rather low for the multi-annual variability (ACC = 0.03; figure S2).
Skill at predicting global monsoon precipitation (GM_nm) is high (ACC = 0.68) and the model is also able to capture the large magnitude of anomalies in GM precipitation. We expect common sources of variability across the northern and the southern hemispheres because of internal variability (e.g. AMV, ENSO, Interdecadal Pacific Oscillation). We expect skill at simulating monsoon precipitation variability to be associated with the ability of DePreSys3 to simulate ocean modes of variability. ENSO is an important source of skill at a seasonal lead time (Robertson et al 2015). We find that DePreSys3 has a high skill over the tropical Pacific Ocean at the 2-5 month forecast period (figure S4), which is consistent with the high skill in predicting interannual summer monsoon precipitation variability (table 1 and figure S3). In addition, low-frequency modes of variability (e.g. AMV and Interdecadal Pacific Oscillation (IPO)) and their respective impacts over land could also contribute to skill at predicting summer monsoon precipitation (table 1).

Table 1. Skill (Pearson's correlation between GPCC and DePreSys3) at predicting precipitation time series, for the total (interannual + multi-annual), and interannual and multi-annual variability separately in global and regional monsoon precipitation. The multi-annual evolution is extracted by performing a four year running mean. The interannual precipitation evolution is defined as the deviation of precipitation from the multi-annual component. One (two) stars are added when the ACC is significantly different to zero according to a Monte-Carlo procedure with 5000 permutations and a 90% (95%) confidence level.
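The separation into multi-annual and interannual components described in the table 1 caption can be sketched as below. The four year window comes from the caption; the function name `split_timescales`, the alignment of the even-length window and the dropping of end points are assumptions made for this illustration.

```python
import numpy as np

def split_timescales(x, window=4):
    """Split a yearly series into multi-annual and interannual parts.

    The multi-annual part is a running mean over `window` years; the
    interannual part is the deviation from it. Years without a full
    window are dropped (an assumed edge treatment).
    """
    kernel = np.ones(window) / window
    multi = np.convolve(x, kernel, mode="valid")   # length n - window + 1
    offset = (window - 1) // 2                      # assumed alignment of each window
    aligned = x[offset:offset + multi.size]
    inter = aligned - multi
    return multi, inter, aligned

# Synthetic 57 year index: slow oscillation plus year-to-year noise.
rng = np.random.default_rng(0)
years = np.arange(57)
index = np.sin(2 * np.pi * years / 30.0) + 0.5 * rng.normal(size=57)

multi, inter, aligned = split_timescales(index)
```

Skill on each time scale would then be obtained by applying the same split to the observed and predicted series and correlating the matching components.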

Explaining summer monsoon precipitation variance
Section 3 did not indicate mechanisms responsible for the skill. Therefore, we further analyse the sources of skill by decomposing precipitation into terms representing the dynamic and thermodynamic contributions to the summer monsoon precipitation variability and by analysing the ability of DePreSys3 to capture the evolution of these precipitation terms. Tropical precipitation can be decomposed into different components, including dynamic (∆P dyn ), thermodynamic (∆P therm ) and cross non-linear (∆P cross ) terms (see section 2.6). Thus, we use this decomposition to explore the reasons for skill. However, we first need to assess the relative importance of each term in explaining the total monsoon precipitation variability before assessing the skill of each precipitation component separately.
We find that precipitation variability is mostly associated with changes in the atmospheric dynamics (r ranging from 0.7 to 0.9 over the monsoon domains between precipitation and ∆P dyn; not shown). However, the terms are not independent, and we document the importance of each term with a covariance analysis (see section 2.7). The result is expressed as a percentage of the precipitation variance explained, in order to stress the respective importance of each term to summer monsoon precipitation variability (figure 3). For clarity, we only show the covariances that explain most of the summer monsoon precipitation variance. Figure 3(a) shows that, in reanalysis, ∆P dyn is the dominant driver of the precipitation variance: cov(∆P dyn *) explains ∼90% of precipitation variance for all monsoon domains. ∆P shift is the main contributor to ∆P dyn, and is the main source of the summer monsoon precipitation variance (figure 3(a)). The dominance of ∆P dyn (and of ∆P shift) shows that monsoon precipitation variability is mostly driven by changes in atmospheric circulation. However, the relative importance of the terms depends on the monsoon domain. The contribution of ∆P dyn to summer monsoon precipitation variance is highest over the NAM and SAM monsoon domains and is lowest over the AUS and EAS monsoon domains (figure 3(a); ∼40% in reanalysis). In contrast, the ∆P therm and ∆P cross terms only account for a moderate part (less than 10%) of the summer monsoon precipitation variance (figure 3(a)).
DePreSys3 summer monsoon precipitation variance is also dominated by the dynamic components, while thermodynamic and cross components only moderately contribute to the precipitation variance (figure 3(b)). As in the reanalysis, the covariance between ∆P dyn and all terms (i.e. cov(∆P dyn *)) contributes more strongly to the precipitation variance over the NAM and SAF monsoon domains than over the SAS, EAS and AUS monsoon domains. Over the Australian and southern African monsoon domains, the contributions of ∆P weak and ∆P therm are relatively large.

Skilful prediction of the thermodynamic and dynamic terms
We now assess the predictability of the different precipitation components. Figure 4 shows the skill at predicting the different components of precipitation with DePreSys3, and figure 5 shows the skill for the terms averaged over the monsoon domains, both at a 2-5 month forecast period.
The skill at predicting the thermodynamic term (∆P therm) is high over the tropics, and particularly over the American monsoon domains, South Africa, Indonesia and East Asia (figure 4(a)). We also find that the skill at predicting ∆P therm is statistically significant for all monsoon domains, and for GM_nm (figure 5(a)). The significant skill at predicting ∆P therm is consistent with the ability of DePreSys3 to predict surface air temperature and specific humidity (figures S4 and S7). The precipitation variability is strongly associated with changes in atmospheric circulation (figures 3(a) and (b)). However, skill in ∆P dyn is rather low over the tropics (figure 4(b)), although it is significant when averaged over most of the monsoon domains (figure 5(b)). The lowest skill is evident at the grid-box scale for ∆P dyn, but recovers to near the level of ∆P therm for the area-average indices, and skill in ∆P dyn is even larger than skill in ∆P therm for NAF. We hypothesize that the low skill at the grid-box scale in ∆P dyn may be due to noise in the verifying data.

Figure 4. […] 1960-2016 (1959-2016). Stippling indicates that ACC is significantly different to zero according to a Monte-Carlo procedure with 5000 permutations and a 95% confidence level. ACC are computed with respect to observations/reanalysis.
The skill at predicting ∆P shift and ∆P cross is relatively low when assessed for each grid point (figures 4(c) and (e)). Nevertheless, there is significant skill in ∆P cross when averaged over the monsoon domains (figure 5(c)). However, the skill at predicting precipitation associated with shifts of the circulation (∆P shift) is the lowest, which highlights deficiencies in predicting atmospheric circulation variability (figure 5(e)). The skill at predicting ∆P weak is high and is the same for each grid point for a given hemisphere (figures 4(d) and 5(d)), because it is a tropical-mean quantity.
It is unclear whether the low skill is due to DePreSys3's inability to simulate large-scale or regional changes in atmospheric circulation. Wang et al (2018) proposed a NH monsoon circulation index, which is significantly positively correlated with global monsoon precipitation. This index is defined by computing changes in zonal wind shear, between the 850 hPa westerlies and the 200 hPa easterlies, averaged over a large area (0°-20°N, 120°W-90°E). In DePreSys3, the skill is high for the wind shear averaged over the tropics, in both JJAS (ACC = 0.7) and DJFM (ACC = 0.8) (significant at the 95% confidence level) (figure S9). Therefore, we conclude that DePreSys3 can predict the important large-scale atmospheric dynamics associated with the northern hemisphere summer monsoon. Hence, the relatively low ACC values of ∆P shift are likely to be due to errors in simulating regional-scale atmospheric circulation.
The skill at predicting summer monsoon precipitation is not solely due to the ability of DePreSys3 to predict ∆P dyn. This suggests that, even if accounting for a relatively small proportion of the explained precipitation variance, ∆P therm and ∆P cross could be helpful for predicting precipitation. For instance, DePreSys3 has skill at predicting EAS and AUS precipitation (figure 1(b)) that cannot be attributed to a prediction of ∆P dyn (figure 5(b)) or ∆P shift (figure 5(e)). However, we show in figures 3(a) and (b) that the contribution of ∆P dyn and ∆P shift is anomalously low for the EAS and AUS precipitation variability, compared to the other monsoon domains, and the other pairs of covariances could be important, for instance through feedbacks between precipitation drivers.

Figure 5. Filled bars indicate that the ACC is significantly different to zero according to a Monte-Carlo procedure with 5000 permutations and a 95% confidence level. ACC are computed with respect to observations/reanalysis.
The relatively low skill at predicting ∆P shift shows that simulating changes in atmospheric circulation is challenging. The low skill at predicting ∆P shift could be due to the inability of DePreSys3 to simulate remote effects of SSTs on precipitation. For instance, we assess the ability of DePreSys3 to simulate the effects of SSTs on circulation shifts affecting SAS and AUS monsoon precipitation (figure S8) and note strong differences between observations and DePreSys3. Therefore, we suggest that the low skill in SAS and AUS monsoon precipitation is partly due to errors in simulating teleconnections between the tropical Pacific SSTs and the monsoons.

Ensemble size
We assume that a substantial proportion of summer monsoon precipitation variability might be unpredictable. Therefore, low skill at predicting summer monsoon precipitation and ∆P shift might be due to unpredictable variability rather than model errors. However, it is well known that increasing the ensemble size will reduce stochastic and unpredictable noise and, hence, increase skill, as in Scaife and Smith (2018) for predicting the North Atlantic Oscillation. Over the tropics, the minimum number of members needed to extract substantial skill at predicting summer monsoon precipitation on seasonal time scales has not so far been assessed explicitly for all monsoons using a single large ensemble. We fill this gap using the large DePreSys3 ensemble. To test this, we have resampled the hindcast dataset to create new synthetic time series, randomly selecting m ensemble members for each start date. The ensemble mean of each m-member ensemble is computed before calculating the ACC relative to the observed time series. m varies from 1 to 40 and 50 000 permutations are used. We have defined the result as significant when at least 95% of the 50 000 ensemble means produce significant skill (as defined here with a Student's t-test at the 95% confidence level) at predicting summer monsoon precipitation.

Figure 6. Skill at predicting summer monsoon precipitation, depending on the number of ensemble members (solid lines) (ranging from 1 to 40 members). The new ensemble members are computed by randomly resampling the data set and compared to observations (solid lines). A total of 50 000 permutations have been performed to resample the data. Bold lines indicate that at least 95% of the 50 000 ensemble members show significant correlations (as defined here with a Student's t-test at the 95% confidence level). We show the median of the 50 000 correlations. ACC are computed with respect to GPCC.
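The ensemble-size resampling can be sketched as follows. This is an illustration under stated assumptions rather than the code used for figure 6: the function name `skill_vs_ensemble_size` is hypothetical, members are drawn without replacement and independently for each start date, only the median ACC is returned (the Student's t-test step is omitted), and far fewer permutations than the 50 000 used in the study are drawn to keep the example quick.

```python
import numpy as np

def skill_vs_ensemble_size(hindcast, obs, sizes, n_perm=1000, seed=0):
    """Median ACC of m-member ensemble means for each m in `sizes`.

    hindcast: array (n_start, n_member); obs: array (n_start,).
    """
    rng = np.random.default_rng(seed)
    n_start, n_member = hindcast.shape
    rows = np.arange(n_start)[:, None]
    member_ids = np.tile(np.arange(n_member), (n_start, 1))
    medians = []
    for m in sizes:
        accs = np.empty(n_perm)
        for k in range(n_perm):
            # Shuffle member indices separately for each start date and
            # keep the first m, i.e. m distinct members per start date.
            cols = rng.permuted(member_ids, axis=1)[:, :m]
            ens_mean = hindcast[rows, cols].mean(axis=1)
            accs[k] = np.corrcoef(ens_mean, obs)[0, 1]
        medians.append(np.median(accs))
    return np.array(medians)

# Synthetic hindcasts: a weak predictable signal buried in member noise.
rng = np.random.default_rng(2)
n_start, n_member = 57, 40
signal = rng.normal(size=n_start)
obs = signal + 0.5 * rng.normal(size=n_start)
hindcast = 0.5 * signal[:, None] + rng.normal(size=(n_start, n_member))

sizes = [1, 5, 10, 20, 40]
median_acc = skill_vs_ensemble_size(hindcast, obs, sizes, n_perm=200)
```

Averaging more members damps the member noise while leaving the shared signal intact, so the median ACC rises with m and flattens once the noise in the ensemble mean is small relative to the signal.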
Figure 6 indicates that, as expected, skill increases with m, a larger ensemble allowing for better skill.
For predicting global monsoon and northern and southern hemisphere summer monsoon precipitation, only a limited ensemble size is needed: skill converges with ensembles of ∼6-10 members (figures 6(a)-(d)). A limited number of members is also sufficient for most of the monsoon domains, as seen for the NAM, NAF and SAM monsoon domains (figures 6(e), (f) and (i)). However, an ensemble of 20 members or more could be needed for predicting EAS, SAS and AUS summer monsoon precipitation (figures 6(h), (j) and (k)). This highlights the importance of using relatively large ensembles over some monsoon domains.

Discussion and conclusions
We find that DePreSys3 (Dunstone et al 2016) has significant skill at predicting summer monsoon precipitation on a 2-5 month forecast period, using predictions initialised annually over 1959-2016. However, the skill depends on the specific monsoon domains being considered. The highest skill is found for the NAM and NAF monsoon domains (ACC = 0.61), while the lowest skill is found for the SAS monsoon domain (ACC = 0.37). Skill at predicting interannual monsoon variability is high (ACC = 0.72) for global monsoon precipitation and is associated with the high skill of the tropical Pacific SSTs (figure S4). There is also significant skill on multi-annual time scales, but it depends on the monsoon domain: it is low over the Australian monsoon domain (ACC = 0.03) and high over the NAF monsoon domain (ACC = 0.71). Although skill at predicting monsoon precipitation was shown to be strongly model dependent (Rodrigues et al 2014), DePreSys3 has comparable skill to other prediction systems for global monsoon precipitation (Saha et al 2016) and the individual monsoon domains. However, results of this study are not directly comparable to other prediction systems because of differences in ensemble size and length of the hindcast period covered.
We have assessed whether predictability in monsoon rainfall arises from thermodynamically (∆P therm) and dynamically (∆P dyn) driven components by using the decomposition method of Chadwick et al (2016). Significant skill is obtained for both ∆P therm and ∆P dyn, although ∆P dyn generally contributes less skill than ∆P therm. Overall, we show that the interannual variability of monsoon precipitation is primarily due to shifts of the atmospheric circulation (∆P shift), which are not well captured in predictions. Hence, it is critical to improve predictions of ∆P shift to improve the skill of monsoon precipitation prediction. However, we acknowledge that prediction skill could depend on the observations/reanalysis that are used for verification, and that the prediction skill of ∆P shift is expected to be more uncertain than the prediction skill of ∆P therm.
We find that deficiencies in the predictions of monsoon precipitation are largely explained by DePreSys3's inability to simulate ∆P shift . More specifically, we show that skill is low at capturing ∆P shift over the Australian and South Asian monsoon domains, because of errors in the simulated teleconnections between Pacific and Indian SSTs and land monsoon precipitation. However, ∆P shift only explains a moderate part of the precipitation variance of AUS and SAS precipitation, suggesting that the improvement of the skill in these regions could be limited even if predictions of ∆P shift were improved. Therefore, additional efforts should be devoted to understanding feedbacks between the different precipitation terms and their biases in DePreSys3, especially over South Asia, Australia and Indonesia.
Another way to increase skill at predicting precipitation is to reduce the unpredictable noise by increasing the ensemble size of the predictions. For most of the monsoon domains, and for global monsoon precipitation, we show that an ensemble of 5-10 members is necessary to extract significant skill for prediction. However, at least 20 members are necessary to get useful predictions of precipitation over southern Africa and over the Australian monsoon domain. Increasing the number of members beyond this offers no significant increase in skill for monsoon forecasts on this time scale.
In summary, our results suggest that improving our ability to simulate shifts of the circulation would lead to an important improvement of the predictive skill of summer monsoon precipitation. This improvement could result from detailed assessment of the role of systematic biases in the mean-state simulation of SST and monsoon circulation on ENSO-monsoon teleconnections (e.g. Turner et al 2005) and prediction skill (e.g. Lee et al 2010). Nevertheless, we expect model error and prediction skill to be model-dependent and, hence, analysing the skill of several prediction systems will be important to further define the sources of skill. In addition, a multi-model combination could further improve skill because of differences in structural model biases (e.g. Dunstone et al 2020).

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.
for Service Partnership (CSSP) China as part of the Newton Fund; P A M and J R were also part-funded via the partnership via the DOVE project. We thank Doug Smith, Leon Hermanson and the Met Office for providing the DePreSys3 output. We also thank the two anonymous reviewers for their comments and suggestions.