Two deep learning-based bias-correction pathways improve summer precipitation prediction over China

As most global climate models (GCMs) suffer from large biases in simulating and predicting summer precipitation over China, it is of great importance to develop suitable bias-correction methods. This study proposes two bias-correction pathways that incorporate deep learning (DL) models. One is the deterministic pathway (DP), in which the bias correction is applied directly to the precipitation forecasts. The other, the probability pathway (PP), corrects the forecasted precipitation anomalies using a conditional probability method before they are added to the observational climatology. The two pathways have been applied to correct the precipitation forecasts of a GCM prediction system, the Nanjing University of Information Science and Technology Climate Forecast System version 1.0 (NUIST-CFS1.0). Applying DL models in both pathways yields corrected predictions at a higher resolution than the uncorrected ones. Both pathways improve summer precipitation predictions at 4-month lead. Moreover, the DP correction performs better in predicting extreme precipitation, while the PP is proficient in correcting the spatial pattern of precipitation anomalies over China. The present results highlight the importance of applying an appropriate correction strategy for different prediction purposes.


Introduction
Most areas of China are exposed to the influence of the East Asian summer monsoon (EASM), where the summer precipitation is largely subject to the interannual variations of the EASM and modulated by multi-scale variations ranging from intra-seasonal to interdecadal time scales [1]. In particular, the variations of the Mei-yu rain band extending along the Yangtze River valley of China exert great impacts on this densely populated region. For example, Zhou et al showed that the historic Yangtze flooding in summer 2020 caused 141 deaths and approximately 11.75 billion US dollars of direct economic losses [2]. Given its widespread socio-economic impacts, numerous efforts have been dedicated to improving the summer precipitation forecast by developing or updating dynamical climate model forecast systems and/or statistical methods [3][4][5][6][7][8].
Since the 1990s, dynamical prediction methods based on numerical climate models have become one of the main tools for climate predictions. However, due to the simplification of physical processes, semi-empirical physical parameterizations and low-resolution grids, the predictions of climate models are often vulnerable to systematic errors [9][10][11]. Although many state-of-the-art dynamical models can provide skillful predictions of large-scale features of atmospheric variables [12], they still have a limited capability in predicting summer rainfall [13,14]. Therefore, it is of great importance to reduce the biases of the dynamical models' predictions in order to improve their prediction skills.
There are two main categories of bias correction: (a) dynamical correction based on the physical framework of regional climate models [15][16][17][18] and (b) statistical correction based on the statistical features of the predictions [19][20][21]. As the former corrects the predictions using regional climate models, the products are physically explainable. However, such predictions are sensitive to the choice of boundary conditions and parameterization schemes, and the process consumes large computational resources. The latter is based on the statistical characteristics of the predictions from the global climate models (GCMs). For example, the Quantile Mapping (QM) method is widely used in precipitation bias correction due to its simplicity and non-parametric configuration, which offers an immediate improvement on precipitation forecasts [20,22,23]. However, QM changes the range of the predicted variable by mapping each single point to the observed cumulative distribution function, and therefore shows poor continuity in the corrected data. In addition, it is ineffective in correcting phase changes of the climate. Another statistical approach, based on empirical orthogonal function (EOF) analysis, can be used to correct predictions involving phase changes [21,24,25]. However, EOF-based correction is inevitably influenced by the fitting ability of the model in which the predicted principal component is regressed onto the observations. This correction method is also far from satisfactory for variables with strong non-linear characteristics, such as precipitation [26].
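To make the single-point mapping concrete, the following is a minimal sketch of empirical quantile mapping at one grid point. It is an illustration under stated assumptions, not the specific QM configuration of [20]: the function name `quantile_map`, the synthetic gamma-distributed data and the choice of empirical (rather than fitted) distributions are all hypothetical.

```python
import numpy as np

def quantile_map(forecast, hindcast, observed):
    """Empirical quantile mapping for a single grid point (illustrative).

    forecast : model values to be corrected
    hindcast : model values from the training period (model climate)
    observed : observations from the training period
    """
    hindcast_sorted = np.sort(np.asarray(hindcast, dtype=float))
    observed = np.asarray(observed, dtype=float)
    # Non-exceedance probability of each forecast value in the model climate.
    ranks = np.searchsorted(hindcast_sorted, forecast, side="right")
    probs = np.clip(ranks / hindcast_sorted.size, 0.0, 1.0)
    # Read off the observed value at the same probability.
    return np.quantile(observed, probs)

# Synthetic data only: a long-tailed "observation" and a biased "model".
rng = np.random.default_rng(0)
obs = rng.gamma(shape=2.0, scale=3.0, size=300)
model = 0.5 * rng.gamma(shape=2.0, scale=3.0, size=300) + 1.0
corrected = quantile_map(model, model, obs)
```

Because each value is mapped independently through the two cumulative distributions, the ranking of the forecasts is preserved: a forecast that is in the wrong phase (wet instead of dry) stays in the wrong phase after the mapping.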
Recently, deep learning (DL) methods have made encouraging progress in atmosphere-ocean sciences [27][28][29][30][31][32][33]. DL allows a model to be fed with raw data as predictors without detailed feature extraction and transformation [34]. Numerous studies have applied DL methods to the bias correction of short-lead predictions ranging from weather to sub-seasonal time scales [29,30,35-37] or to the bias correction of climate models' long-term simulation results [31,38]. Fewer DL-based bias-correction models have been built for improving seasonal climate prediction. Since the mechanisms of summer precipitation in East Asia are very complex, a deterministic approach alone may not reproduce the large uncertainty in the precipitation variations well. In this study, we propose both deterministic and probabilistic pathways of bias correction using DL techniques to improve the seasonal precipitation forecasts of a real-time climate forecast system, and compare their results with three widely used bias-correction methods. Unlike previous studies on correcting precipitation bias, in which a single method was employed, our study is the first attempt to explore both deterministic and probabilistic strategies equipped with DL structures for improving summer precipitation bias correction in East Asia. In addition, the annual cycle encoded in the DL module further improves the correction results.

Data source
The monthly precipitation forecasts corrected in this study are provided by the global climate prediction system NUIST-CFS1.0, which has performed real-time forecasting on a monthly basis since 2005 [4,39]. Not only does it perform well in tropical climate prediction, but it is also one of the top models for predicting the atmospheric circulation and precipitation in East Asia [40][41][42]. Note that, since daily weather signals are driven by stochastic atmospheric internal processes and cannot be predicted at the seasonal timescale, the predicted daily signals on a given day cannot be paired with the observed daily data on the same day. Therefore, the daily hindcasts of the climate model are not applicable for supervised learning. Instead, the monthly hindcasts from NUIST-CFS1.0 are used to feed the DL models with large-scale physical information (such as surface air temperature, specific humidity, and zonal and meridional winds at 500 hPa and 850 hPa). They are produced by nine member forecasts of NUIST-CFS1.0 at 4-month lead (e.g. the June value is forecasted from 1 March) spanning from April 1983 to December 2020, with the period of 2010-2020 used for testing and the remainder for training and validation. Each single member is treated as one sample to increase the number of input samples.
The gridded monthly precipitation dataset over China (i.e. CN05.1), with a spatial resolution of 0.25° × 0.25°, covers the period from 1983 to 2016 only [43]. The CN05.1 dataset is used as the training label because of its higher spatial resolution. Another gridded monthly observational precipitation dataset over China, namely the Surface dataset, is used for testing due to its better accuracy. The Surface dataset, published by the China Meteorological Administration (CMA) [44], has a spatial resolution of 0.5° × 0.5° and is updated up to 2020. In this study, the period of 2010-2020 is used for testing. The details of these datasets are shown in table 1.

Deterministic methods for bias correction
Precipitation is often characterized by a long-tailed distribution due to its intrinsic non-linearity and discontinuity. Such a distribution is susceptible to extreme values, whose statistical features are difficult to extract because of their scarcity. In contrast, the precipitation bias of the NUIST-CFS1.0 forecasts follows a normal distribution, probably as a combined result of systematic and random biases, and is therefore easier to regress in statistical models than a long-tailed distribution (figure S1). We therefore attempt to directly correct the precipitation bias using the atmospheric signals forecast by NUIST-CFS1.0. Accordingly, a deterministic bias-correction pathway (DP) is proposed that uses the precipitation bias as the correction target (figure S2). If features of the bias can be extracted and statistical relationships can be constructed with the GCM's predictions of large-scale atmospheric variables, the task of bias correction becomes easier. Guided by this pathway, an Auto-Encoder (AE [45]) is introduced to help extract specific features of the bias y, and a residual network (ResNet-18 [46]) is constructed to build the relationship between a specific GCM-predicted atmospheric variable x and the bias features. In the AE, a convolutional network (named the Encoder) extracts features according to the spatial distribution of the bias and encodes the bias features as latent vectors, while a transposed convolutional network (named the Decoder) maps the latent vectors back to the original spatial distribution. In order to learn the monthly characteristics of the biases, a one-hot encoding with 12 month features (i.e. calendar month data, c) is added to the AE. It has been verified that the AE reconstruction has a high correlation with the label, indicating that, with the one-hot encoding, the Encoder of the AE has a better feature extraction ability and the Decoder has a better reconstruction ability (figure S3).
Moreover, when the latent vector is set to zero and only the calendar month data is input, the reconstructed bias shows an apparent annual cycle (figure S4), indicating that the effect of the one-hot encoding is significant. The relationship between the latent vector features and the GCM-predicted atmospheric variables is then constructed using ResNet-18 with the calendar data encoding. After all models are trained, we can feed the GCM prediction into ResNet-18 to produce the latent vectors of the bias, reconstruct the bias using the Decoder of the AE, add the GCM's precipitation output, and finally obtain the bias-corrected precipitation forecasts. These processes are summarized by simplified formulae in which D, E, O, and R denote the Decoder, Encoder, one-hot encoding and ResNet-18 operations, respectively.
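The three steps described above can be written compactly as follows. This is a sketch consistent with the definitions in the text, not necessarily the authors' exact notation. The symbols $\hat{y}$ (reconstructed bias), $\hat{h}$ (latent vector predicted by ResNet-18), $P_{\mathrm{GCM}}$ and $P_{\mathrm{corr}}$ are introduced here for illustration only. Two assumptions are made: the bias is taken as observation minus forecast, so that it is added back to the GCM output, and the one-hot encoding is shown entering the Decoder and ResNet-18, with its exact entry point into the Encoder left open.

```latex
\begin{align*}
  \hat{y} &= D\big(E(y),\, O(c)\big)
    && \text{(AE training: reconstruct the bias)} \\
  \hat{h} &= R\big(x,\, O(c)\big) \approx E(y)
    && \text{(ResNet-18 training: predict the latent vector)} \\
  P_{\mathrm{corr}} &= P_{\mathrm{GCM}} + D\big(\hat{h},\, O(c)\big)
    && \text{(inference: add the reconstructed bias)}
\end{align*}
```

Written this way, the two-step structure is explicit: any error in $\hat{h}$ is passed through the Decoder, which is one way errors can accumulate between the two trained models.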

Probabilistic methods for bias correction
The low ability of GCMs to accurately predict summer precipitation anomalies over China may be partly due to errors in parameterization schemes, and not all physical processes affecting summer precipitation anomalies can be accurately reproduced. However, the precipitation anomaly in different months follows a certain probability distribution. Therefore, guided by the law of large numbers, a more accurate precipitation distribution can be simulated when multiple predictions are generated using a DL approach that reproduces the probability distribution of precipitation anomalies.
To efficiently address this issue, we introduce the Conditional Variational Auto-Encoder (CVAE [47,48]). In this study, we aim to learn the conditional distribution $p(y|x, m)$ of the precipitation anomaly $y$, with the anomaly of the relevant atmospheric variable $x$ predicted by NUIST-CFS1.0 and the calendar month data $m$ as conditions. A parameterized function $p_\theta(y|x, m)$ can be used to approximate the conditional distribution by the maximum likelihood method, and a latent variable $z$ is also needed in the CVAE to characterize the original distribution of $y$. The formula is given as below:

$$p_\theta(y|x, m) = \int p_\theta(y|x, m, z)\, p(z|x, m)\, \mathrm{d}z \qquad (4)$$

The stochastic gradient variational inference [49] is used to simplify the integration in equation (4), replacing sampling from the prior distribution $p(z|x, m)$ with sampling from the posterior distribution $p_\theta(z|y, x, m)$, which is approximated by the inference model $q_\phi(z|y, x, m)$. In addition, a generative model $p_\varphi(y|x, m, z)$ is used to approximate $p_\theta(y|x, m, z)$ with the assistance of a conditional network that embeds the conditions into the model (see Text S1 for the detailed derivation). Further details of the CVAE can be found in Kingma [47] and Sohn [48].
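For orientation, the variational step described above is commonly expressed as a lower bound on the conditional log-likelihood. The form below is the standard CVAE objective of Sohn [48] rewritten in the notation of this section. It is a sketch for the reader, and the derivation actually used in this study is the one given in Text S1.

```latex
\begin{align*}
  \log p_\theta(y \mid x, m) \;\ge\;
    \mathbb{E}_{q_\phi(z \mid y, x, m)}
      \big[\log p_\varphi(y \mid x, m, z)\big]
    \;-\; \mathrm{KL}\big(q_\phi(z \mid y, x, m)\,\big\|\,p(z \mid x, m)\big)
\end{align*}
```

The first term rewards accurate reconstruction of the precipitation anomaly, and the second keeps the inferred latent distribution close to the prior, so that at prediction time, when $y$ is unavailable, $z$ can be drawn from the prior and passed through the generative model.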
The inference networks, generative networks and conditional networks all extract (reconstruct) the features by employing a convolutional (transposed-convolutional) architecture (figure S2 and Text S2 in Supplementary Information). The generative model generates bias-corrected precipitation anomalies for 100 members by sampling randomly in the Gaussian latent space, under the conditions provided by the calendar month data and the ensemble-mean anomalies of the atmospheric variables from the nine member forecasts of NUIST-CFS1.0. The schematic diagram of the two bias-correction pathways is shown in figure 1.
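The member-generation step can be sketched as follows. This is a toy illustration only: the linear `decode` function, the weight matrix `W`, the array `condition_effect` and all array sizes are hypothetical stand-ins for the trained convolutional generative and conditional networks, which are not reproduced here. Only the number of members (100) and the Gaussian sampling come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n_members, latent_dim, n_lat, n_lon = 100, 8, 4, 5  # sizes other than 100 are arbitrary

# Hypothetical stand-ins for trained networks.
W = rng.normal(size=(latent_dim, n_lat * n_lon))     # "generative network" weights
condition_effect = rng.normal(size=(n_lat, n_lon))   # "conditional network" output

def decode(z, condition_effect):
    """Map latent samples (member, latent_dim) to anomaly maps (member, lat, lon)."""
    return 0.1 * (z @ W).reshape(-1, n_lat, n_lon) + condition_effect

# One latent sample per member, drawn from a standard normal distribution.
z = rng.standard_normal((n_members, latent_dim))
members = decode(z, condition_effect)

ensemble_mean = members.mean(axis=0)    # deterministic use of the ensemble
ensemble_spread = members.std(axis=0)   # basis for probabilistic forecasts
```

The same fixed conditions are used for every member, so the differences among members come only from the latent samples.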

Results
We first assessed the performances of the deterministic and probabilistic pathways of bias correction and compared their results with three other widely used statistical or DL methods, namely the QM method [20], the EOF method with 30 leading modes [24] and a Unet model [50] that has the same inputs as the probabilistic pathway (PP)-correction method. Since all the bias corrections were applied to the summer precipitation forecasts over China from NUIST-CFS1.0, we also assessed the skills of the original NUIST-CFS1.0 predictions for comparison. In this analysis, the anomaly correlation coefficients (ACCs) and root mean square errors (RMSEs) are employed as the assessment metrics. The spatial distributions of ACCs and RMSEs, along with the ACC and RMSE differences between the corrected and uncorrected predictions, are shown in figure 2. All the skill assessments are calculated for the evaluation period of 2010-2020. Without the bias corrections, NUIST-CFS1.0 shows moderate skills in predicting June-July-August (JJA) precipitation over the middle reaches of the Yangtze River and northern China, where the ACCs exceed the baseline of 0.5 and the RMSEs are confined within 2.0 mm d⁻¹ (figures 2(a6) and (c6)).
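The two metrics can be computed grid point by grid point as in the sketch below. It assumes arrays of shape (year, lat, lon) and anomalies centered over the evaluation years. The function names and the synthetic data are illustrative, and the exact anomaly definition used in the study may differ.

```python
import numpy as np

def acc_map(pred_anom, obs_anom):
    """Temporal anomaly correlation coefficient at each grid point."""
    p = pred_anom - pred_anom.mean(axis=0)
    o = obs_anom - obs_anom.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))
    return (p * o).sum(axis=0) / denom

def rmse_map(pred, obs):
    """Root mean square error over the time axis at each grid point."""
    return np.sqrt(((pred - obs) ** 2).mean(axis=0))

# Synthetic example: 11 "years" (as in 2010-2020) on a 3 x 4 grid.
rng = np.random.default_rng(7)
obs = rng.normal(size=(11, 3, 4))
pred = obs + rng.normal(scale=0.5, size=obs.shape)

acc = acc_map(pred, obs)
rmse = rmse_map(pred, obs)
```

With only 11 evaluation years, each grid-point ACC is estimated from 11 pairs, which is worth keeping in mind when reading skill differences between methods.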

The deterministic skill assessment of bias-correction results
To evaluate the improvements in prediction skill of the corrected JJA precipitation, the ACCs and RMSEs of the precipitation predictions are calculated and comparisons are made between the bias-corrected and original predictions (figure 2). The interannual variations of the precipitation after the DP-correction are better predicted in the middle and lower reaches of the Yangtze River, with ACCs exceeding 0.5 in most areas and even above 0.75 in some areas (figure 2(a1)), and an overall improvement in ACC skill of up to 0.5 can also be found in these areas (figure 2(b1)). However, poor prediction skills are still found in parts of northeastern and northwestern China, where the DP-correction shows little improvement or even degrades the skill. The precipitation prediction with the PP-correction shows an overall positive ACC skill (figure 2(a2)), and the skill is improved across most of the country compared to NUIST-CFS1.0, particularly in southwestern and southern China, with some places showing an improvement greater than 0.75 (figure 2(b2)). However, the correcting effect of the PP method over the Yangtze River is not as good as that of the DP-correction method. Compared with the two DL approaches, the two traditional statistical bias-correction methods show either little improvement (i.e. the QM method; figure 2(b3)) or even worse skills (i.e. the EOF method; figure 2(b4)) over most parts of China. It is worth noting that the Unet-correction also helps improve the prediction skills in many areas (figure 2(b5)), but the skill improvement is slightly weaker than that of the PP-correction. This indicates that our redesigned deterministic and probabilistic correction strategies are more effective.
For all the corrected JJA precipitation anomalies, large RMSEs appear over eastern and southeastern China, which is consistent with the RMSE distribution of the original prediction (figures 2(c1)-(c6)). It is evident that the RMSEs are remarkably reduced in all DL-based bias-correction methods (figures 2(d1), (d2) and (d5)) compared to the two traditional methods (figures 2(d3) and (d4)). As the QM method focuses on the interval distribution of the original data, we also compared the correction results of the DP-correction method with those of the QM method for the predicted total precipitation, including extreme precipitation events (e.g. the historic Yangtze flooding in 2020; figure S5). The result suggests that, even in the areas where QM has its highest capability, the DP-correction method outperforms the QM method in predicting the total precipitation.
Figure 1. The schematic diagram of two deep learning-based bias-correction pathways that are used to improve the summer precipitation forecasts over China. In the deterministic pathway, the selected predictions of relevant atmospheric variables are inputted into the Auto-Encoder and the ResNet-18, and the bias in the predicted precipitation (Pr) is used as the target for correction. In the probability pathway, the predicted anomalies of the atmospheric variables are used as inputs of the CVAE, and the Pr anomaly is used as the target for correction. The CVAE can generate predictions n times by random sampling from a normal distribution, where n is set as 100. The selected predictions of relevant atmospheric variables include specific humidity, zonal and meridional winds at the 850 hPa and 500 hPa levels (Q850, Q500, U850, U500, V850 and V500), and surface air temperature (Ts). All the input variables cover the region of 0°-360°E and 70°S-70°N. Further details of the DL models' modules for the deterministic and probabilistic pathways are shown in figure S2.

Finally, the pattern correlation coefficients (PCCs) of JJA precipitation anomalies over China are assessed for each year during 2010-2020 (figure 3). The radar chart facilitates the assessment of the prediction skill for individual years, with a larger area of blue shading indicating a better performance. It is evident that the DP-correction (figure 3(a)) and PP-correction (figure 3(b)) methods show better skills than the other traditional methods (figures 3(c) and (d)). In addition, the DP-correction shows a better bias-correction ability for years with high JJA precipitation, especially for the extreme events in 2016 and 2020. The averaged PCC of 2010-2020 with the DP-correction method reaches 0.1394.
Compared to the DP method, the PP method produces a relatively higher PCC for almost all years, with an average PCC of 0.233. This suggests its overall advantage in improving the summer precipitation prediction.
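For reference, a pattern correlation for a single year can be computed as in the sketch below. It assumes a centered, unweighted spatial correlation over valid land points. Whether the study applies area weighting is not stated, so this choice, the function name and the synthetic maps are assumptions.

```python
import numpy as np

def pattern_correlation(pred_anom, obs_anom):
    """Centered spatial correlation between two anomaly maps (lat, lon)."""
    p = np.asarray(pred_anom, dtype=float).ravel()
    o = np.asarray(obs_anom, dtype=float).ravel()
    valid = np.isfinite(p) & np.isfinite(o)   # skip masked (NaN) grid points
    p = p[valid] - p[valid].mean()
    o = o[valid] - o[valid].mean()
    return float((p * o).sum() / np.sqrt((p ** 2).sum() * (o ** 2).sum()))

# Synthetic maps for one "year"; one point is masked as NaN.
rng = np.random.default_rng(3)
obs_map = rng.normal(size=(6, 8))
obs_map[0, 0] = np.nan
pred_map = 0.5 * obs_map + rng.normal(scale=0.5, size=obs_map.shape)

pcc = pattern_correlation(pred_map, obs_map)
```

Unlike the grid-point ACC, which correlates over years at one location, the PCC correlates over locations within one year, so it measures how well the spatial pattern of that summer's anomalies is captured.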

The probabilistic skills assessment of bias-correction results
With a probabilistic pathway, it is possible to generate multiple predictions and make probabilistic forecasts from the simulated prior distributions. Among all the bias-correction methods, only the precipitation corrected with the PP-correction method can be used for probabilistic forecasts, while the other methods are not suitable because their distributional variations are reduced as the biases are corrected towards the observation. We adopted the Relative Operating Characteristic skill score (ROCSS) as an evaluation metric, for which a positive value indicates that the model has predictive skill at the grid point. We classify wet and dry events as precipitation anomalies greater than the 66th percentile and less than the 33rd percentile, respectively. Figure 4 shows the spatial distribution of ROCSS for probabilistic predictions of the JJA precipitation anomalies over China with and without the PP bias-correction. In general, the PP-correction method improves the prediction skills for wet and dry events over most parts of China. In contrast, NUIST-CFS1.0 only shows comparable skills in predicting normal events (i.e. with precipitation anomalies between the 33rd and 66th percentiles).
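A minimal sketch of the score at one grid point is given below, using the common definition ROCSS = 2 × (area under the ROC curve) − 1. The study's exact computation is not specified here, so this definition, the function names, the synthetic ensemble and the use of the observed 66th percentile as the wet threshold for both observations and members are assumptions.

```python
import numpy as np

def roc_area(prob, event):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=bool)
    pos, neg = prob[event], prob[~event]
    if pos.size == 0 or neg.size == 0:
        return np.nan  # undefined without both events and non-events
    # Fraction of (event, non-event) pairs ranked correctly; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def rocss(prob, event):
    """ROC skill score: 1 is perfect, 0 is no skill, negative is worse than chance."""
    return 2.0 * roc_area(prob, event) - 1.0

# Synthetic example: 11 years, 100 members at a single grid point.
rng = np.random.default_rng(1)
obs = rng.normal(size=11)
members = obs[:, None] + rng.normal(scale=1.0, size=(11, 100))

wet_threshold = np.percentile(obs, 66)
wet_prob = (members > wet_threshold).mean(axis=1)   # forecast probability of a wet event
wet_event = obs > wet_threshold                      # observed occurrence
score = rocss(wet_prob, wet_event)
```

The forecast probability is simply the fraction of members in the event category, which is why a method that outputs a single corrected field cannot be scored this way without an ensemble.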

Summary and discussions
Based on the hindcasts of NUIST-CFS1.0, this study explores bias-correction methods for improving summer precipitation prediction over China. We proposed two pathways to improve the prediction capability using DL models by combining the characteristics of precipitation with the prediction results of one dynamical climate model real-time forecast system. The results show that the two DL-based pathways, as a benchmark, can be successfully applied to correct the summer precipitation bias of the NUIST-CFS1.0 forecasts. They perform better than the two traditional bias-correction methods. Considering the enormous impacts of summer precipitation in East Asia, improving the prediction skill may help reduce socio-economic losses. In addition, the two correction pathways are designed based on physical relationships and are not purely driven by data. This may provide a good way to improve climate models' predictions of summer precipitation over China.
The results suggest that the two DL-based pathways have their own merits and drawbacks. For example, the bias-corrected variable in the DP method is transformed from total precipitation to the bias in the GCM predictions. We believe this is a simple but effective attempt to change the form of the data distribution while retaining the characteristics of the precipitation data, and this approach effectively transforms the correction problem into a problem of matching the bias in the GCM's precipitation prediction with the predicted atmospheric variables. A significant benefit found here is that the DP method can better correct the bias in some extreme events, such as the extreme flooding during summer 2020 in the Yangtze River area, probably because each extreme event corresponds to a large bias in total precipitation. Since large GCM biases may frequently occur due to semi-empirical physical parameterizations, low resolution, and/or other model systematic errors, the task may become easier when the question of predicting extreme events is transformed into the prediction of large biases in the GCM's outputs. This is equivalent to increasing the sampling of rare extreme events in the training process of the DL model, which may subsequently help improve the prediction of extreme events. However, it must be acknowledged that the model structure (AE-ResNet) may not yet be optimal, and it still has many shortcomings. For instance, the constructed two-step model may lead to an accumulation of errors and thus reduce the bias-correction capability. In addition, how to separate the biases in climatological mean states from those in interannual anomalies, in order to further improve the prediction of precipitation anomalies, remains a challenge. Another major and long-standing challenge is how to insert physical constraints into the DL models to improve the prediction skills and the interpretability of the results. All these issues warrant future studies.
As the PP bias-correction method produces 100-member predictions using the GCM-predicted atmospheric variables and calendar month data as conditions, it partly avoids the problem that the DL model relies on the prediction accuracy of the atmospheric variables. Even if the predictions of the atmospheric variables have a large bias, the PP can reduce the errors by averaging the results over multiple members. As shown by the present results, the PP is an effective approach to improve the climate model's predictions of summer precipitation. However, the correction of extreme events remains a challenge. One important issue is whether the PP bias-correction capability depends on the number of ensemble forecasts. To explore this issue, we conducted eight sensitivity experiments in which the number of ensemble member forecasts was increased from 1 to 100 in order to estimate its impact on prediction skill. Figure S6 shows that the spatially averaged ACC skill score of the summer precipitation prediction over China increases with the number of ensemble members. However, the ACC score becomes nearly saturated, with the smallest spread, when the number of ensemble members reaches 64. This is probably because the generative model in the PP-correction method cannot generate more than 64 distinctly different member forecasts. The diversity among the ensemble member forecasts may be one of the main factors that limit the PP bias-correction ability. In addition, we have also examined the separate roles that the two conditions (i.e. the atmospheric variable inputs and the calendar month data) play in the CVAE model to further explore the interpretability of our DL models. When only the atmospheric variable inputs are changed, the patterns of the mean and standard deviation of the generated results are similar to the original results, albeit with a weaker intensity (figure S7).
This simple analysis finds that the calendar month data controls the variance and mean values of the CVAE-generated precipitation anomalies during 2010-2020, while changes in the atmospheric variable inputs control the spatial distribution of the variance and mean values. This is only a first attempt at interpretability analysis, and more models designed based on conditional decoupling are required to better correct the model bias in the future.

Data availability statement
The code of deep learning models used in this study is available on request for anyone who is interested in using it.
The data that support the findings of this study are openly available at the following URL/DOI: https://icar.nuist.edu.cn/en/111/list.htm.