Forecasting South China Sea Monsoon Onset Using Insight From Theory

Monsoon onset over the South China Sea occurs in April–May, marking the start of the wet season over East Asia. Skillful prediction of onset timing remains an open challenge. Recently, theoretical studies using idealized models have revealed feedbacks at work during the seasonal transitions of the Hadley cells and have shown that these are relevant to monsoon onset over Asia. Here, I hypothesize that monsoon onset occurs earlier in years when the atmosphere over the South China Sea is already in a state where these feedbacks are more easily triggered. I find that local anomalies in lower‐level moist static energy in the preceding January–March are well correlated with South China Sea Monsoon onset timing. This relationship remains relatively consistent on decadal timescales, while correlations with other teleconnections vary, and is used to develop a simple forecast model for onset timing that shows skill competitive with that of more complex models.

South China Sea monsoon (SCSM) onset timing shows both interannual variability and slower interdecadal trends (Kajikawa & Wang, 2012; Figure 1). The El Niño Southern Oscillation (ENSO) has been identified as one clear source of interannual variability, with La Niña (El Niño) events linked to early (late) monsoon onset via their influence on the Western North Pacific subtropical high (Zhou & Chan, 2007). However, the strength of the relationship between the SCSM and ENSO varies dramatically on decadal timescales, likely influenced by the Pacific Decadal Oscillation (PDO) and the Atlantic Multidecadal Oscillation (AMO; Fan et al., 2018). In addition to ENSO, SCSM onset variability has been related to a range of factors in the preceding winter and spring: thermal and mechanical forcing over the Tibetan Plateau (G. Wu & Zhang, 1998); temperature contrasts between the South China Sea and Western North Pacific, and the land surface temperature to the north (P. Liu et al., 2009); and the cross-equatorial flow over the South China Sea (Hu et al., 2018; Lin et al., 2017).
Physical-Empirical models have been developed to predict SCSM onset timing and intensity based on correlations with sea surface temperature (SST), sea level pressure, and temperature tendency anomalies in the preceding months. These models can show high forecast correlation skill over the time periods analyzed (e.g., r = 0.72; Zhu & Li, 2017). Recently, dynamical seasonal forecasting ensembles have also been found to give skillful predictions of SCSM onset in hindcasts (Fan et al., 2016; Martin et al., 2019). However, the skill of both types of model stems from teleconnections such as ENSO, whose correlations with the SCSM vary on decadal timescales.
Here, I try a different approach. First, I identify the processes found to be most important to monsoon onset in idealized modeling studies (e.g., Geen et al., 2018). I then explore whether these insights can help to identify direct, local precursors to the monsoon. My aim is to find common pathways via which multiple teleconnections affect onset timing. In Section 2, I motivate the precursors that I hypothesize to be relevant and detail the datasets used in the paper. In Section 3, I examine the correlation between SCSM onset and these precursors over different time periods. In Section 4, I develop a simple forecast of SCSM onset and test its predictive skill. Section 5 concludes.
GEEN 10.1029/2020GL091444 2 of 10
Figure 1. (a) Onset timing of the SCSM in pentads (5-day means; black circles). The dotted line shows the 11-year rolling mean, which I use to distinguish longer term trends from interannual variability. The overall mean onset timing is pentad 28 (solid line). Green (orange) circles show predicted onset pentads for each year based on MSE over 5°-15°N, 110°-125°E (SST over 5°S-5°N, 160°-210°E). Details of how these forecasts are generated are given in Section 4. (b) Rolling skill statistics for the forecasts in (a), showing how skill varies in time. Darker lines show Pearson r and lighter lines show root mean square error, both based on centered 31-year rolling windows. An equivalent plot for CERA-20C data is given in Figure S3.

Hypothesized Precursors
To select potential mechanistic pathways, I apply results from idealized modeling studies that compare the seasonal behavior of the Hadley cells in aquaplanets with Earth's monsoons (Geen et al., 2020; Hill, 2019). One important expectation from this theoretical work is that, if the tropical atmosphere is near convective quasi-equilibrium (CQE) and the influence of extratropical eddies on the Hadley cell is weak, then the zero-streamfunction line separating the Hadley cells is colocated with the maximum in subcloud moist static energy (MSE). If this maximum is off the Equator, as is the case during the monsoon season, then the maximum ascent and associated rainfall lie just equatorward of it (Nie et al., 2010; Privé & Plumb, 2007).
Here, MSE is defined as
$$h = c_p T + L q + g z,$$
where c_p = 1,004.6 J K⁻¹ kg⁻¹ is the specific heat capacity at constant pressure, T is temperature, L = 2.507 × 10⁶ J kg⁻¹ is the latent heat of vaporization of water, q is specific humidity, g = 9.80665 m s⁻² is the gravitational acceleration, and z is geopotential height. Values for constants are those used in the JRA-55 reanalysis.
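The minimal Python sketch below computes MSE (c_p T + L q + g z) from the constants listed above, along with its area mean over the SCS box. It is illustrative only and is not the code used for this study. NumPy is assumed and the function names are hypothetical. The cos(latitude) area weighting in `box_mean` is also an assumption, because the weighting behind the area means is not stated here.

```python
import numpy as np

# Constants quoted in the text (values used in the JRA-55 reanalysis).
CP = 1004.6    # J K^-1 kg^-1, specific heat capacity at constant pressure
LV = 2.507e6   # J kg^-1, latent heat of vaporization of water
G = 9.80665    # m s^-2, gravitational acceleration


def mse_components(t, q, z):
    """Return the c_p*T, L*q and g*z terms of MSE (each in J kg^-1).

    t: temperature (K), q: specific humidity (kg kg^-1),
    z: geopotential height (m). If a data set provides geopotential
    (m^2 s^-2) instead, it can be used directly in place of g*z.
    """
    t, q, z = (np.asarray(a, dtype=float) for a in (t, q, z))
    return CP * t, LV * q, G * z


def moist_static_energy(t, q, z):
    """MSE h = c_p*T + L*q + g*z, evaluated elementwise (broadcasts)."""
    cpt, lq, gz = mse_components(t, q, z)
    return cpt + lq + gz


def box_mean(field, lats, lons, lat_range=(5.0, 15.0), lon_range=(110.0, 125.0)):
    """cos(latitude)-weighted mean of a (..., lat, lon) field over a lat-lon box.

    The default box is the SCS predictor region; the cos(latitude)
    weighting is an assumption of this sketch.
    """
    field = np.asarray(field, dtype=float)
    lats, lons = np.asarray(lats, dtype=float), np.asarray(lons, dtype=float)
    in_lat = (lats >= lat_range[0]) & (lats <= lat_range[1])
    in_lon = (lons >= lon_range[0]) & (lons <= lon_range[1])
    sub = field[..., in_lat, :][..., in_lon]
    weights = np.cos(np.deg2rad(lats[in_lat]))[:, None] * np.ones(int(in_lon.sum()))
    return (sub * weights).sum(axis=(-2, -1)) / weights.sum()
```

For example, `moist_static_energy(290.0, 0.015, 1500.0)` returns about 3.44 × 10⁵ J kg⁻¹, of which the latent term L q contributes about 3.8 × 10⁴ J kg⁻¹. Because `mse_components` returns the three terms separately, it is easy to check which term dominates a given correlation.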
The connection between the overturning circulation and the MSE distribution results in two feedbacks during monsoon onset. First, diabatic heating by insolation warms the summer hemisphere. In response, the ITCZ shifts into the summer hemisphere and the winter-hemisphere Hadley cell becomes cross-equatorial. This cross-equatorial cell advects cooler, drier air up the MSE gradient, while diabatic processes increase MSE poleward. As a result, the MSE maximum shifts farther poleward and the cell becomes more cross-equatorial. This positive feedback between the circulation and the thermal forcing causes the convergence zone to jump abruptly into the summer hemisphere. The second feedback relates to the tropical upper-level easterlies generated by a cross-equatorial Hadley cell. These limit the propagation of eddies into lower latitudes. As a result, the cell becomes primarily thermally driven rather than eddy-driven, and it responds strongly to changes in the MSE distribution. This strengthens the cell and so further enhances the easterlies (Geen et al., 2019). Although these ideas have been developed in an idealized framework, they appear to apply to both the climatology (Geen et al., 2018; Ma et al., 2019; Nie et al., 2010) and the interannual variability (Hurley & Boos, 2013) of the Asian monsoons in reanalysis data.
In this study, I further hypothesize that in the months prior to monsoon onset, both local and remote influences may place the atmosphere in a state where these feedback cycles begin more readily, so that onset may then occur earlier in the season. Based on this, I suggest that early SCSM onset will be associated with positive 850-hPa MSE anomalies and negative (easterly) 200-hPa zonal wind anomalies in the SCSM region. The present study is motivated by an interest in prediction of climate over China. However, some initial exploration of the correlations with monsoon onset over the Bay of Bengal and India is shown in Figure S1.

Data and Metrics
Results are presented for the JRA-55 reanalysis data set (Japan Meteorological Agency/Japan, 2013; Kobayashi et al., 2015) for the years 1958-2019, with SSTs taken from the COBE SST data set (Japan Meteorological Agency, 2006; Japan Meteorological Agency, Ongoing). In addition, results are shown in the supporting information (Figures S2 and S3) for the CERA-20C data set (Laloyaux et al., 2016), confirming that similar relationships are seen over a longer record. Daily mean 850-hPa zonal wind data were used to establish the SCSM onset pentad using the criteria developed by B. Wang et al. (2004). SCSM onset is defined as the first pentad after April 25th (i.e., from pentad 24 onwards) in which the zonal wind averaged over 5°-15°N, 110°-120°E, U_SCS, is westerly, provided that U_SCS is positive in at least three of the four pentads starting with the onset pentad and that the accumulative 4-pentad mean of U_SCS exceeds 1 m s⁻¹. The onset pentads identified in the JRA-55 data set are shown by the black circles in Figure 1. These are broadly consistent with dates evaluated in previous studies using the NCEP/NCAR (Kalnay et al., 1996; B. Wang et al., 2004) or ERA-Interim reanalyses (Martin et al., 2019; Uppala et al., 2005). JRA-55 data are presented here because of the long data record and the use of 4D-Var data assimilation. Correlations were also checked using the ERA-Interim, NCEP/NCAR, and NCEP/DOE-R2 (Kanamitsu et al., 2002) data sets, with similar conclusions obtained overall (not shown).
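The onset criteria can be sketched in Python as below. This is a hedged illustration, not the original implementation. NumPy is assumed and the function names are hypothetical. The sketch also makes three further assumptions:

- Leap days have been removed, so each year has 73 five-day pentads.
- `u_scs` is already the area-mean 850-hPa zonal wind over 5°-15°N, 110°-120°E.
- The accumulative 4-pentad mean is the mean over the candidate pentad and the following three pentads.

```python
import numpy as np


def pentad_means(daily):
    """Average a 365-day daily series into 73 pentad (5-day) means.

    Assumes leap days have been removed, so pentad n (1-based) covers
    day-of-year 5n-4 to 5n; pentad 24 is then April 26-30.
    """
    daily = np.asarray(daily, dtype=float)
    if daily.shape[-1] != 365:
        raise ValueError("expected 365 daily values (leap day removed)")
    return daily.reshape(daily.shape[:-1] + (73, 5)).mean(axis=-1)


def scsm_onset_pentad(u_scs, first_pentad=24, min_mean=1.0):
    """Return the 1-based SCSM onset pentad, or None if no pentad qualifies.

    u_scs: 73 pentad means of 850-hPa zonal wind (m s^-1) averaged over
    5-15N, 110-120E. A candidate pentad p (searched from `first_pentad`)
    is taken as onset if
      (i)   U_SCS at p is westerly (> 0),
      (ii)  U_SCS > 0 in at least 3 of the 4 pentads p..p+3, and
      (iii) the mean of U_SCS over p..p+3 exceeds `min_mean`
            (this sketch's reading of the "accumulative 4-pentad mean").
    """
    u = np.asarray(u_scs, dtype=float)
    for p in range(first_pentad, len(u) - 2):  # pentads p..p+3 must exist
        window = u[p - 1 : p + 3]  # 0-based slice holding pentads p..p+3
        if u[p - 1] > 0 and np.count_nonzero(window > 0) >= 3 and window.mean() > min_mean:
            return p
    return None
```

The persistence conditions (ii) and (iii) prevent a single westerly pentad that is followed by a return to easterlies from being counted as onset.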

In this study, I aim to explore precursors for interannual variability, but slower decadal trends are also present in the data. When investigating correlations, the data were therefore detrended with an 11-year rolling mean, which is illustrated for the onset dates by the dashed line in Figure 1. This ensures that the correlations presented relate to interannual variability, rather than to coincident trends in variables due to, for example, global warming or variations in multidecadal modes. Note that the initial and final 5 years are detrended using the mean of the initial and final 11 years, respectively, so that these years can be included.
Figure 2 shows correlations between the hypothesized predictors, averaged from January to March, and the SCSM onset pentad, with both detrended with an 11-year rolling mean. A negative correlation indicates that a positive anomaly is associated with earlier monsoon onset. Looking first over the full reanalysis record, I see correlations that are consistent with the hypothesized relationships. January-March 850-hPa MSE over the South China Sea is negatively correlated with SCSM onset timing (Figure 2a), while 200-hPa zonal wind over East Asia is weakly positively correlated with onset timing (Figure 2b). MSE anomalies can be expected to relate to anomalies in SST and MSE advection. Over the full record, SCSM onset is correlated with SST over the Philippines to the east (Figure 2c), but this correlation is weaker than that with MSE. This suggests that the MSE pattern is partially, but not completely, related to local SST anomalies. Breaking MSE down into its contributions from internal, latent, and potential energy, I found that the majority of the correlation comes from the latent heat term (not shown). The seasonal evolution of the MSE correlation with SCSM onset is shown in Figure S1. A dipole is seen in April and May, with enhanced MSE over the SCS and reduced MSE to the south associated with earlier onset.
By June, onset has occurred in most years, but a significant correlation remains to the south, indicating that late onset is linked to enhanced MSE over and to the south of Indonesia.
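The rolling-mean detrending, including the treatment of the first and last 5 years, can be sketched as follows. NumPy is assumed and the names are hypothetical. Each year's trend is the mean of an 11-year window. The window is centered on the year where possible; near the ends of the record it is clamped to the first or last 11 years. Pearson r is then computed between the detrended predictor and the detrended onset pentads.

```python
import numpy as np


def detrend_rolling(x, window=11):
    """Subtract a rolling mean (11 years by default) from an annual series.

    Interior years use the centered window. For the first and last
    window // 2 years, the window is clamped so that they use the mean of
    the first and last `window` years, respectively.
    """
    x = np.asarray(x, dtype=float)
    n, half = len(x), window // 2
    if n < window:
        raise ValueError("series is shorter than the rolling window")
    trend = np.empty(n)
    for i in range(n):
        start = min(max(i - half, 0), n - window)  # clamp window to the record
        trend[i] = x[start : start + window].mean()
    return x - trend


def detrended_corr(predictor, onset, window=11):
    """Pearson r between the detrended predictor and detrended onset pentads."""
    a = detrend_rolling(predictor, window)
    b = detrend_rolling(onset, window)
    return np.corrcoef(a, b)[0, 1]
```

Under this sign convention, a negative r means that positive predictor anomalies accompany earlier onset, as in Figure 2.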

Hypothesis Testing
To investigate whether these relationships are consistent throughout the reanalysis record, I also divided the data into two 31-year sections: an early period spanning 1958-1988 (Figures 2d-2f) and a later period spanning 1989-2019 (Figures 2g-2i). A statistically significant negative correlation between January-March 850-hPa MSE over the South China Sea and SCSM onset timing is present in both of these periods. In contrast, for upper-level zonal wind, it becomes clear that the correlation is dominated by the later period, with the earlier period showing no statistically significant relationship over East Asia. The mean SCSM onset pentad shifted earlier in 1994, but I find that dividing the data into 1958-1993 and 1994-2019 gives similar results (not shown).
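Comparing correlations across the two 31-year periods requires a significance threshold. The test used for Figure 2 is not specified in this section, so the sketch below substitutes a common approximation, the Fisher z-transform. Under this approximation, atanh(r) is roughly normal with standard error 1/√(n−3) when the true correlation is zero. For n = 31, this gives a two-sided 95% threshold of |r| ≈ 0.35. NumPy is assumed and the function names are hypothetical.

```python
import numpy as np


def critical_r(n, z_crit=1.96):
    """Approximate |r| needed for two-sided 95% significance with n samples.

    Fisher z-transform: under the null hypothesis of zero correlation,
    atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    return float(np.tanh(z_crit / np.sqrt(n - 3)))


def split_period_corr(years, predictor, onset, periods):
    """Pearson r and a significance flag for each inclusive (start, end) period.

    predictor and onset are expected to be detrended annual series
    aligned with `years`.
    """
    years = np.asarray(years)
    predictor, onset = np.asarray(predictor, dtype=float), np.asarray(onset, dtype=float)
    results = {}
    for start, end in periods:
        sel = (years >= start) & (years <= end)
        r = np.corrcoef(predictor[sel], onset[sel])[0, 1]
        results[(start, end)] = (r, abs(r) > critical_r(int(sel.sum())))
    return results
```

For example, `split_period_corr(years, mse_anom, onset_anom, [(1958, 1988), (1989, 2019)])` would return one (r, significant) pair per period. Here `mse_anom` and `onset_anom` are hypothetical detrended series.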
It has been noted that the ENSO-SCSM relationship appears to have strengthened from the late 1970s onwards (B. Wang, Huang, et al., 2009) and that this provides a strong source of predictability for SCSM onset (Martin et al., 2019). The correlations over the later period are consistent with the ENSO teleconnection exerting a strong influence on SCSM onset. The MSE correlation pattern shows a clear east-west asymmetry across the Pacific basin (Figure 2g), while the zonal wind correlation resembles the upper branch of the Walker cell (Figure 2h) and the SST correlation shows a clear ENSO pattern (Figure 2i). The lack of correlation between 200-hPa zonal wind and SCSM onset in the earlier period suggests two possibilities. First, upper-level easterly anomalies in the preceding spring may not be causally connected to an earlier Hadley cell regime change as hypothesized in Section 2.1. Instead, they may simply coincide with SST anomalies that result in warm, humid air converging over the SCS region. Alternatively, these upper-level easterlies may contribute causally to an earlier transition, but may only persist if SST anomalies are present to support the zonal flow anomalies. Systematic model simulations would be needed to distinguish between these possibilities. Overall, I conclude that upper-level zonal wind does not provide a consistent predictor for SCSM onset.
In contrast, in the earlier period there is no clear connection between ENSO and SCSM onset. Instead, the strongest correlation is with MSE over Australia (Figure 2d). This correlation is not captured by looking purely at SSTs (Figure 2f) and is predominantly due to the latent heat term (not shown). Specifically, in this period, higher January-March MSE over Australia was associated with later onset of the SCSM. The dipole around the Equator in Figure 2d suggests that, in this period, meridional thermal gradients and their influence on the Hadley circulation were more important precursors for monsoon onset than zonal thermal gradients and the Walker circulation, which dominate in the later period. This is supported by Figure 3, which partitions the overturning circulation into local Walker and Hadley components following Schwendike et al. (2014). In the earlier period, Walker circulation anomalies show little correlation with SCSM onset timing, but a weakening of the Northern Hemisphere Hadley cell in January-March is associated with earlier monsoon onset. In the later period, a strong correlation can be seen between both the Walker and Hadley circulations and SCSM onset, reflecting the influence of ENSO on the circulation. Figure 2j shows running correlations over 31-year windows, indicating how the teleconnections vary in time.
(Figure caption excerpt: for all panels, data have been detrended relative to an 11-year rolling mean.)
The blue and orange lines show MSE averaged over the boxes shown in panels (a) and (d), while the green line shows SST averaged over the Niño 4 region (panel (i)). For SCS MSE, a statistically significant negative correlation is present over almost the entire record, although the strength of the correlation varies in time. In the early part of the record, the correlation of onset with Australian MSE is in fact stronger than that with SCS MSE, but it begins to drop off after 1983, while the relationship with Niño 4 SST strengthens at this point. The SCS MSE correlation approximately follows both of these teleconnections but remains consistent across the record. Previous studies (Zhu & Li, 2017) identified different precursors from those presented here; I find that rolling correlations with these precursors also show strong interdecadal variations (Figure S4). Figure S2 confirms that similar behavior is seen in the CERA-20C data set. There, the SCS MSE correlation does dip below the 95% confidence level, but it nonetheless remains steadier than the other teleconnections. I note that all correlations are low in the very early years of the data set (prior to ∼1930). This might result from either a lack of predictability or sparse observations in this period; an in-depth analysis of the available observations would be needed to explore this issue.
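The running correlations in Figure 2j can be sketched as a centered 31-year window slid along the record. This is a minimal illustration, and NumPy and the function name are assumed. It treats each plotted value as centered on its year and leaves years without a full window as NaN. The inputs would be the detrended predictor and onset series.

```python
import numpy as np


def rolling_corr(x, y, window=31):
    """Centered rolling Pearson r between two annual series.

    Entry i is the correlation over years i - window//2 .. i + window//2
    (window assumed odd); years without a full window are NaN.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, half = len(x), window // 2
    r = np.full(n, np.nan)
    for i in range(half, n - half):
        sl = slice(i - half, i + half + 1)
        r[i] = np.corrcoef(x[sl], y[sl])[0, 1]
    return r
```

With a 62-year record such as 1958-2019, this leaves the first and last 15 years undefined. Values can be compared against a fixed significance threshold for n = 31.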

A Simple Forecast
The correlations in Figure 2 suggest that the area-mean MSE over 5°-15°N, 110°-125°E could provide a useful predictor of SCSM onset. To assess its predictive skill, I test how well data from previous years can be used to predict the onset timing for the following year. To mimic a plausible operational forecast, an expanding-window method is applied to the un-detrended data. The prediction for a given year is estimated by using least squares regression to fit a linear model between the observed onset dates and SCS MSE from all previous years. Each year, the window used for generating the forecast expands as new observations are incorporated. This approach was repeated using the SST averaged over the Niño 4 region for two reasons: to confirm that the skill is not purely a result of the correlation with ENSO, and to provide a model for comparison in periods not covered by previous studies (specifically Martin et al., 2019; Zhu & Li, 2017).
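The expanding-window procedure can be sketched as below, assuming NumPy; the function name is hypothetical. The default `min_train=10` mirrors the 10 training years available for the first plotted forecast (1968). Applying the same function to Niño 4 SST would give the comparison forecast.

```python
import numpy as np


def expanding_window_forecast(predictor, onset, min_train=10):
    """One-step-ahead hindcasts from a linear fit to all previous years.

    For each year i with at least `min_train` earlier years, fit
    onset = slope * predictor + intercept by least squares on years 0..i-1
    (un-detrended data), then predict year i from predictor[i].
    Years without enough training data are NaN.
    """
    x, y = np.asarray(predictor, dtype=float), np.asarray(onset, dtype=float)
    forecast = np.full(len(x), np.nan)
    for i in range(min_train, len(x)):
        slope, intercept = np.polyfit(x[:i], y[:i], 1)  # fit uses years before i only
        forecast[i] = slope * x[i] + intercept
    return forecast
```

Because each fit uses only years before the target year, no information from the forecast year leaks into the regression.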
The forecasts produced are shown in Figure 1a from 1968 onwards, providing 10 years of training data for the first forecast plotted. Figure 1b shows rolling skill metrics evaluated with a 31-year window (cf. Figure 2j). Note that skill is expected to be initially low, as the early years are forecast based on a limited amount of data. An equivalent forecast for CERA-20C is given in Figure S3, helping to distinguish whether lower skill results from a lack of training data or reduced correlation with MSE. In general, for both JRA-55 and CERA-20C, the SCS MSE based forecast has a higher correlation with the observed onset pentads and a lower RMSE than the forecast using the Niño 4 SST, although there are short windows where the Niño 4 forecast shows higher correlation with the observed dates. Skill is low for both models prior to 1930, reflecting the weak correlations seen in Figure 2j in this period.
Averaging over a subset of years allows comparison with previous efforts using more complex models. Zhu and Li (2017) applied three predictors to model the SCSM onset dates in the NCEP/DOE-R2 data set, achieving a correlation of 0.72 (RMSE 2.08 pentads) over their test period, 2005-2014. Martin et al. (2019) found that the Met Office GloSea5 ensemble (MacLachlan et al., 2015; Williams & Coauthors, 2015) could predict SCSM onset with a correlation of 0.5 over a study period from 1993 to 2015. Over these two periods, the predictions using SCS MSE show correlations with observed onset dates of 0.70 (p = 0.03; RMSE 1.96 pentads) and 0.67 (p = 0.0005; RMSE 2.00 pentads), respectively. The Niño 4 forecasts show lower skill: 0.23 (p = 0.5; RMSE 2.93 pentads) and 0.27 (p = 0.2; RMSE 2.88 pentads), respectively. It is worth noting that the correlations and forecast skill vary interdecadally (Figures 1, 2, S2, and S3), and that correlations were relatively strong throughout 1993-2015. The stronger correlation with ENSO in the later period may therefore have given optimistic estimates of forecast skill in previous studies. Where possible, it would be helpful to produce longer hindcasts to give a more complete picture of model skill.
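The skill statistics quoted above can be computed as in this minimal sketch: Pearson r, and RMSE in pentads, over a fixed test period. NumPy and the function names are assumptions, and p-values are not computed here.

```python
import numpy as np


def forecast_skill(forecast, observed):
    """Pearson r and RMSE (pentads), ignoring years with NaN in either series."""
    f, o = np.asarray(forecast, dtype=float), np.asarray(observed, dtype=float)
    ok = ~(np.isnan(f) | np.isnan(o))
    r = np.corrcoef(f[ok], o[ok])[0, 1]
    rmse = float(np.sqrt(np.mean((f[ok] - o[ok]) ** 2)))
    return r, rmse


def period_skill(years, forecast, observed, start, end):
    """Skill over the inclusive period start..end, e.g. 2005-2014 or 1993-2015."""
    years = np.asarray(years)
    sel = (years >= start) & (years <= end)
    return forecast_skill(np.asarray(forecast, dtype=float)[sel],
                          np.asarray(observed, dtype=float)[sel])
```

Restricting to a period before computing skill is what makes results comparable with studies that evaluated shorter windows.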
Repeating the predictions using only MSE averaged over January-February, or over January alone, I find that the correlations remain high when only earlier data are used. For the 2005-2014 and 1993-2015 periods, January-February averaged MSE gives correlations of 0.69 (RMSE 1.92 pentads) and 0.65 (RMSE 1.98 pentads), respectively, while January-mean MSE gives correlations of 0.55 (RMSE 2.13 pentads) and 0.60 (RMSE 2.06 pentads). Using only a single precursor, this simple model is able to give an initial estimate of SCSM onset timing with roughly 3-4 months of lead time (given that onset occurs in May on average). It also shows correlations competitive with those of previous models.
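The sketch below makes the lead-time arithmetic concrete by converting pentads to calendar dates with Python's standard `datetime` module. It rests on two assumptions made for illustration:

- A non-leap year is used, so pentad n spans day-of-year 5n-4 to 5n.
- Lead time is measured from the last day of each predictor averaging window to the first day of the mean onset pentad (pentad 28).

```python
from datetime import date, timedelta


def pentad_dates(pentad, year=2021):
    """First and last calendar day of a 1-based pentad.

    Pentad n spans day-of-year 5n-4 .. 5n; a non-leap year is used by default.
    """
    start = date(year, 1, 1) + timedelta(days=5 * (pentad - 1))
    return start, start + timedelta(days=4)


# Mean onset (pentad 28) relative to the end of each predictor averaging window.
onset_start, onset_end = pentad_dates(28)
lead_days = {
    "January": (onset_start - date(2021, 1, 31)).days,
    "January-February": (onset_start - date(2021, 2, 28)).days,
    "January-March": (onset_start - date(2021, 3, 31)).days,
}
```

Under these assumptions, the mean onset pentad falls on May 16-20. That is about 105 days (roughly 3.5 months) after the end of January, and about 46 days after the end of March.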
Last, I note that the SCS MSE-based forecast in Figure 1 appears reasonably well correlated with the observed variability, but it does not capture the extremes in onset timing, which might have the highest impact on agriculture. For the JRA-55 data, this issue appears to be reduced by applying exponential smoothing to account for non-stationary statistics in the forecasting model. For example, with exponential smoothing applied, RMSE decreases for both the GloSea5 and Zhu and Li (2017) study periods (Figure S5). However, this improvement was not reproduced when forecasting the longer CERA-20C record. An explanation of the method and figures showing the generated forecasts are included in the supporting information (Text S1 and Figure S5) for interested readers.
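Text S1 is not reproduced here, so the sketch below shows only one plausible way to let the regression adapt to non-stationary statistics. It uses an expanding-window fit in which training years are down-weighted exponentially with age. This weighted-least-squares variant, its `decay` parameter, and the function name are all hypothetical, and they may differ from the exponential smoothing actually applied in this study. NumPy is assumed.

```python
import numpy as np


def weighted_expanding_forecast(predictor, onset, decay=0.9, min_train=10):
    """Expanding-window hindcast with exponentially decaying sample weights.

    HYPOTHETICAL variant: when forecasting year i, training year j < i gets
    weight decay**(i - 1 - j), so recent years dominate the fit; decay=1
    recovers ordinary least squares. Not necessarily the scheme in Text S1.
    """
    x, y = np.asarray(predictor, dtype=float), np.asarray(onset, dtype=float)
    forecast = np.full(len(x), np.nan)
    for i in range(min_train, len(x)):
        weights = decay ** np.arange(i - 1, -1, -1, dtype=float)  # oldest year smallest
        # np.polyfit applies w to the unsquared residuals, so pass sqrt(weights)
        slope, intercept = np.polyfit(x[:i], y[:i], 1, w=np.sqrt(weights))
        forecast[i] = slope * x[i] + intercept
    return forecast
```

Smaller `decay` values track recent changes in the predictor-onset relationship more closely, at the cost of noisier fits.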

Discussion
Based on theoretical insights into controls on the meridional overturning circulation that have been developed in aquaplanets, I set out with two hypothesized predictors for SCSM onset: lower-level MSE and upper-level zonal wind in the preceding January-March. While the latter does not correlate well with SCSM onset timing, I find that MSE is a useful predictor of interannual variability in SCSM onset. Although the strength of individual teleconnections to the South China Sea varies in time, looking at local MSE allows us to take a step farther along the mechanistic chain from a remote forcing to a local impact on monsoon onset. Multiple processes can produce local MSE anomalies, but these anomalies are consistently correlated with SCSM onset timing across the JRA-55 record (Figure 2). They also show relatively steady correlations throughout the twentieth century in the CERA-20C data set (Figure S2). The correlation strength was also tested in other reanalysis data sets (not shown). In ERA-Interim, correlation patterns and strengths are similar to those shown in Figure 2. Correlations in NCEP/NCAR and NCEP/DOE-R2 follow similar patterns, although they are weaker in magnitude. These findings therefore appear robust across data sets.
Using a simple linear regression model based on this single predictor, I produced predictions of onset timing from 1993 onwards via an expanding-window approach. I find predictive skill comparable with that of ensemble forecasts, using MSE from as early as January. I conclude that local MSE in the months preceding SCSM onset is a useful source of predictability. It merits further exploration, both for use in Physical-Empirical forecast models and for guiding the development of dynamical forecasting ensembles.
Despite this favorable comparison in skill, onset was poorly predicted in several years. Subseasonal factors such as intraseasonal oscillations (R. Wu, 2010) and tropical cyclones (B. Liu & Zhu, 2020; Mao & Wu, 2008) have been found to trigger onset in individual years, limiting seasonal predictability. Although MSE anomalies do appear to precondition the region for early or late onset, it is therefore also important to account for subseasonal systems that may cause deviations from the seasonal forecast. Recent results suggest that seasonal and subseasonal prediction may be best approached in parallel. The subseasonal character of SCSM onset differs between early and late onsets, so improved subseasonal prediction of SCSM onset is expected to benefit from consideration of interannual variability (H. Wang et al., 2018). I also note that although the correlation indicates skill, the predictions in Figure 1 do not capture the extremes in onset well. Exponential smoothing was explored as a method to address this issue (Text S1 and Figure S5).
The present study focuses on SCSM onset, motivated by an interest in seasonal prediction of Chinese climate. However, it is also interesting to explore whether this relationship with MSE applies elsewhere. Figure S1 shows how the correlation with MSE evolves over spring and summer for the SCSM, the Bay of Bengal monsoon (BOBM), and the Indian summer monsoon (ISM). These correlations indicate that BOBM onset is strongly related to ENSO, but that delayed onset is also associated with enhanced MSE to the south, which is physically consistent with the aquaplanet physics. However, the BOBM region differs in two respects. The reversal of the meridional MSE gradient does not appear to be concurrent with onset, and the 850-hPa wind reversal used to define onset is associated with the development of a shallow, rather than deep, overturning circulation (not shown). MSE reversal occurs later, in approximately pentad 28, at which time the cross-equatorial cell deepens. The aquaplanet dynamics may therefore not be a suitable model for BOBM onset.
For the ISM, a weak correlation with MSE over the Tibetan Plateau is seen in January-March, but it does not persist into April. A strong negative correlation over India develops in May and evolves into a dipole in June, similar to that seen for the SCSM (left column). ISM onset occurs in pentad 30 on average, corresponding to the end of May, with a standard deviation of 1.7 pentads. The correlation seen in May might therefore indicate some late-stage predictability from MSE, or it could simply reflect the increase in MSE that occurs during monsoon onset.
This theory-motivated approach has successfully identified a predictor whose correlation with SCSM onset persists across the record and highlights the direct benefits of better understanding the controls on the large-scale circulation. Consistent behavior is observed in other regions but with reduced lead time. Further work is needed to examine where the aquaplanet is an appropriate model for monsoon onset and where MSE might be used to predict regional monsoon onset. However, the second proposed predictor, 200-hPa zonal wind, did not correlate well with monsoon onset. This could relate to a lack of memory in the system for upper-level wind anomalies or could indicate issues with applying ideas from highly idealized models to Earth. The theoretical foundation for understanding the climatological monsoon still shows some key gaps, in particular the role of zonal asymmetries and transient weather systems in the seasonality of the Hadley cells (Geen et al., 2020). The results presented here suggest that bridging these gaps may provide further opportunities for improved seasonal forecasts.

Data Availability Statement
Data sets for this research are available in these in-text data citation references: Japan Meteorological Agency/Japan (2013); Japan Meteorological Agency (Ongoing); ECMWF (2016).