Downscaling and Projection of Multi-CMIP5 Precipitation Using Machine Learning Methods in the Upper Han River Basin

Downscaling considerably alleviates the drawbacks of regional climate simulation by general circulation models (GCMs). However, little information is available regarding the downscaling using machine learning methods, specifically at hydrological basin scale. *is study developed multiple machine learning (ML) downscaling models, based on a Bayesian model average (BMA), to downscale the precipitation simulation of 8 Coupled Model Intercomparison Project Phase 5 (CMIP5) models using model output statistics (MOS) for the years 1961–2005 in the upper Han River basin. A series of statistical metrics, including Pearson’s correlation coefficient (PCC), root mean squared error (RMSE), and relative bias (Rbias), were used for evaluation and comparative analyses. Moreover, the BMA and the best ML downscaling model were used to downscale precipitation in the 21st century under Representative Concentration Pathway 4.5 (RCP4.5) and RCP8.5 scenarios. *e results show the following: (1)*e performance of the BMA ensemble simulation is clearly better than that of the individual models and the simple mean model ensemble (MME).*e PCC reaches 0.74, and the RMSE is reduced by 28%–60% for all the GCMs and 33% compared to theMME. (2) *e downscaled models greatly improved station simulation performance. Support vector machine for regression (SVR) was superior to multilayer perceptron (MLP) and random forest (RF). *e downscaling results based on the BMA ensemble simulation and SVR models were regarded as the best performing overall (PCC, RMSE, and Rbias were 0.82, 35.07,mm and − 5.45%, respectively). (3) Based on BMA and SVRmodels, the projected precipitations show a weak increasing trend on the whole under RCP4.5 and RCP8.5. Specifically, the average rainfall during the mid(2040–2069) and late (2070–2099) 21st century increased by 3.23% and 1.02%, respectively, compared to the base year (1971–2000) under RCP4.5, while they increased by 4.25% and 8.30% under RCP8.5. Additionally, the magnitude of changes during winter and spring was higher than that during summer and autumn. Furthermore, future work is recommended to study the improvement of downscaling models and the effect of local climate.


Introduction
General circulation models (GCMs) are widely utilized to study regional meteorological and hydrological responses under climatic changes. e low resolution of GCMs hinders their applications at a basin scale; thus, downscaling techniques are vital to obtain data at a local scale. Many studies have addressed this task based on various downscaling techniques in various regions of the globe [1][2][3]. ere are two large groups for downscaling including dynamical downscaling (DD), which uses regional climate models (RCMs) to downscale GCMs, and statistical downscaling (SD), which aims to achieve a statistical or empirical relationship between large-scale atmospheric variables (termed predictors) and regional variables (termed predictands). SD has the advantages of a low computational cost, relatively simple implementation, and reliable accuracy.
us, SD techniques may be convenient if the study mainly concentrates on the downscaling and projection of precipitation at a basin scale. e SD methods are generally subdivided into two categories: perfect prognosis (PP) and model output statistics (MOS) [2]. For PP, the statistical relationship is established between a predictand and the observed largescale predictors, while for MOS, GCM-simulated predictors are exploited to establish the relationship with the observed predictand. Given the availability of a considerable database of historical forecasts and the ability to explicitly account for GCM-inherent bias, MOS has recently gained considerable popularity as an alternative to PP [4,5]. It is possible for MOS to provide more information regarding the dynamic controls of precipitation or temperature, especially combining precipitation (temperature) data with circulation data as predictors [6]. MOS also has a high potential for projections of climate change [7,8]. During recent years, MOS models based on regression have become popular for establishing the relationship between GCM-simulated variables and observed rainfall.
ere are successful studies using MOS downscaling based on regression techniques. For example, Jonathan et al. exploited maximum covariance analysis and principal component regression to downscale MOS in the extratropics and concluded that MOS clearly outperformed PP methods, which exploited observed largescale predictors [9].
However, the relationship between local precipitation and the spatial patterns of GCM hindcasts is often very complex. ere have been several comparative studies that demonstrated the superiority of machine learning (ML) approaches compared to other SD approaches such as statistical downscaling models or conventional linear regression models [10][11][12]. ML approaches have proven to be efficient for modeling highly nonlinear relationships [7]. e artificial neural network (ANN) is competent to be applied for climate downscaling due to its ability to establish the nonlinear relationships between predictors and predictands [13]. Meanwhile, support vector machine for regression (SVR) models have also been successfully used to capture highly nonlinear relationships by applying kernel functions to map the low-dimensional input data to a high-dimensional feature space [10]. Furthermore, another ML method, random forest (RF), has been regarded as a competent and robust algorithm for representing complex relationships because it can implement different types of input variables and operates flexibly.
is method was also successfully applied to precipitation downscaling [12]. However, the applicability of ML methods in MOS downscaling has not yet been widely explored. Sa'adi et al. compared two ML approaches, RF and SVR, to downscale monthly precipitation based on MOS. e results demonstrated that such strategies were capable of properly downscaling GCM precipitation, while the SVR showed a more accurate performance overall [7]. is result inspired the implementation of ML methods to downscale precipitation, based on MOS, in other regional basins.
Some studies have demonstrated that multimodel ensemble projections outperform the "best" single model over the long term, as they were effective to reduce the uncertainties [14,15]. In addition, recent studies have suggested that the GCM is the largest quantified source of uncertainty in the projected effects of climate change on hydrological processes. ere are some proposed ensemble methods that have been proven to alleviate the GCM uncertainty, such as the simple model average (MME), Bayesian model average (BMA) [16,17], and reliability ensemble average (REA) [18]. BMA is especially popular, as it assigns weights to each output model according to ensemble forecasts from different sources into a consensus probability density function (PDF). It is also popular to combine ensemble techniques with downscaling models for regional climate studies. For example, a simple statistical downscaling algorithm and superensemble methods were used to address forecast accuracy and regional predicted monsoon precipitation; it was demonstrated that such strategies outperformed each of the member models and their downscaling counterparts [19]. ANN and SVR were used for PP downscaling, and BMA was further exploited to obtain a multi-GCM ensemble. e corrected ensemble forecasts could represent the historical climate well [20].
However, there are no studies regarding the capacity of the projections for a combination of BMA-based ensembles and ML downscaling strategies based on MOS in the upper Han River. us, the major objectives of this study were to (1) develop MOS-based BMA_ML ensemble downscaling models based on MLP, SVR, and RF methods in the upper Han River basin; (2) study the performance of the BMA method for ensemble model outputs; and (3) evaluate precipitation during the 21st century under RCP4.5 and RCP8.5 in upper Han River basin. Furthermore, based on the study results, a beneficial reference for future long-term management strategies of water resources, disaster mitigation, ecological layout, and other applications at a regional level is expected. e remainder of the study is organized as follows: Section 2 describes the study area topography and the observational station data used. e applied methods are discussed in Section 3, including the data processing methods, assembling ensemble models, and downscaling. In Section 4, the results are discussed in detail. Lastly, the authors conclude and propose several prospects based on this study.

Study Area and Data
e Han River is the largest tributary of the Yangtze River, which covers 159 thousand km 2 . In the region, the main terrain types are mountains and hills. e average observed annual rainfall is approximately 700∼1100 mm and shows an obviously seasonal character. During the summer, heavy rainfall occurs due to the descending cold air from the south. Furthermore, flood and drought disasters in the basin are severe due to natural fluctuations and human intervention. In this study, the main region of the mountain and hill terrain of the upper Han River were selected, which contain 13 meteorological stations. Table 1 shows the position information of each station. e observed daily rainfall data for the years from 1961 to 2005 were utilized to calibrate and validate the downscaled models, and the data in 2006-2018 were used to validate the projected precipitation. ese data were obtained from the China Meteorological Data Website (http://data.cma.cn/). Figure 1 shows the water system and topography of the Han River and station distributions.
For the CMIP5 data, the simulated historical and future precipitation under RCP4.5 and RCP8.5 from the GCMs were chosen for the regional simulation and projection. Eight GCMs were used, as they showed good performance in simulating precipitation in the study area. Table 2 provides a general description of these GCMs and Table 3 introduces RCP4.5 and RCP8.5. e GCM-simulated data were acquired from the Coupled Model Intercomparison Project website (http://cmip-pcmdi.llnl.gov/cmip5/). e resolution of these GCMs is not the same. To obtain consistent resolution of downscaling input, 8 GCMs are interpolated to 1°( longitude) × 1°(latitude), which is about 95 km × 111 km, based on bilinear interpolation. It is important to determine the appropriate climate domain for the reliable downscaling of climatic factors. e considered domain size should be large enough to capture the rainfall mechanism without increasing the cost of CPU [21]. us, in this study, from the author's consideration that the domain should cover the area affected by climatic circulation patterns of precipitation in the study area, a spatial domain consisting of 171 grid points was selected, which covers the region between latitudes 99°E-117°E and longitudes 29°N-37°N ( Figure 2). en, the precipitation datasets of these 171 grid points were chosen to be the predictors.

Methods
is study aimed to develop an MOS-based BMA_ML ensemble for downscaling models based on the MLP, SVR, and RF methods for the upper Han River basin. e main framework is shown in Figure 3.
e methodology of the present study encompasses the following steps: (i) predictor selection, (ii) GCM ensemble generation, (iii) downscaling using machine learning, and (iv) result evaluation.

Predictors Selection.
is section mainly discusses the method for obtaining the selected predictors' data and extracting the specific principal components (PCs) based on principal component analysis (PCA). e aim of PCA is to reduce the number of input variables without losing much information. PCA was first proposed by Hotelling in 1933 [22]. PCA is a multivariate statistical method that uses the central idea of dimensionality reduction to transform multiple indicators into several comprehensive indicators on the premise of reduced information loss. In statistics, PCA is an analysis for simplifying data. e main steps of PCA are as follows: (i) Computing the covariance matrix of sample data (ii) Calculating the eigenvalues and eigenvectors of this covariance matrix (iii) Calculating the cumulative contribution rate e detailed equation can be seen in [22]. In this study, the data were standardized to eliminate the influence of single-sample data before PCA.
e PCs of 171 GCM precipitation variables were calculated based on PCA, and the first few PCs were selected as predictors when the cumulative contribution rate was greater than 95% among all the PCs. e new variables express nearly the entirety of the raw variables and are regarded as predictors. ese PCs are also used as the inputs of the ML downscaling models.

GCM Ensemble Generation.
In this study, MME and BMA were used to generate a GCM ensemble. e MME only exploits the arithmetic mean of 8 precipitation models to obtain an ensemble simulation. As a statistical postprocessing method, BMA combines multiple models. It has been recently applied to correct underdispersion in ensemble prediction and alleviate model uncertainty based on posterior probability [23]. e predictive PDF of the BMA for future variables is based on a weighted average, which centres on the individual downscaling forecasts [24]. In this study, the precipitation values from 8 CMIP5 models were used to obtain the ensemble output. Assuming that y is the target variable and that D is a given model's data, the combined forecast PDF of y is where M � M 1 , M 2 , . . . , M K are alternative CMIP5 models. K is equal to 8 in the current study. P(y | M k , D) is the predictive PDF given for model M k alone; k. P(M k | D) denotes the posterior probability of model M k , which is represented as where P(D | M i ) is the marginal likelihood, expressed as where θ i is regarded as the parameter of model M i , with P(θ i | M i ) denoting the prior probability for θ i . e respective weight for the 8 CMIP5 models was determined based on BMA. en, the ensemble result was obtained based on these weights and models. Importantly, the posterior distribution of the multimodel was acquired based on a Markov chain Monte Carlo (MCMC) simulation algorithm [25]. e BMA procedure was run in R3.5.1, using the "BMS" package.

Machine Learning Downscaling.
e ML methods applied in this study include MLP, SVR, and RF, which were used to downscale individual GCMs, MME, and BMA. e following subsections introduce these methods.

Multilayer Perceptron.
As the classical type of neural network, MLP was used in this study [26]. e backpropagation (BP) algorithm was selected to train the MLP network [27]. e MLP consists of one input layer, one or more hidden layers, and one output layer. Figure 4 depicts a typical three-layer MLP network. In addition, equation (4) represents the output value of a three-layer MLP network [28]: where w ji are the weights in the hidden layer connecting the i-th neuron in the input layer and j-th neuron in the hidden layer, w jo is the bias for the j-th hidden neuron, f h is the   Radiative forcing is stable at 4.5 W/m 2 (∼650 ppm CO 2 eq) by 2100 RCP8. 5 Radiative forcing increase to 8.5 W/m 2 (∼1370 ppm CO 2 eq) after 2100 activation function of the hidden neuron, w ki is the weight between the j-th neuron in the hidden layer and k-th neuron in the output layer, w ko is the bias for k-th output neuron, and f o is the activation function for the output neuron. e main steps of BP algorithm are as follows: (i) Forward propagation: it includes the calculation of the sums of the input neurons, followed by the use of an activation function and the calculation of output neurons. (ii) Backpropagation: it includes the calculation of derivative of loss function, backpropagation of error signal through the network, and weight adjustment to minimize the overall error. (iii) Iteration of the above processes until overall error is satisfactorily small.
For the MLP network, the choice of hyperparameters is essential, such as the structure of the network, activation function, learning algorithm, and learning rate scheduling type and parameters [29]. In this study, the authors used Bayesian hyperparameter optimization to choose MLP hyperparameters. More details are provided in Section 4.2 and Section 1 of the supplementary materials.

Support Vector
Machine. SVM is another machine learning method based on VC theory and the rule of structural risk minimization [30]. SVR can solve nonlinear regression problems by applying kernel functions to map the low-dimensional input data to a high-dimensional feature space. SVR methods have been successfully applied in precipitation downscaling based on MOS [8]. ere are no documented MOS-based BMA_SVR downscaling applications. Assuming a training sample D � {(x 1 , y 1 ), (x 2 , y 2 ), . . . , (x n , y n )} x i ∈ R n , y i ∈ R, a linear model is represented as

Advances in Meteorology
where w and b denote the support vector and the constant, respectively. Generally, w should be as small as possible. T represents the transpose of matrix. e expectation is to yield an f (x) that is as close as possible to y. SVR assumes that it can tolerate the most deviations between f (x) and y. We define it as ε. Furthermore, a slack variable should be added to the model to overcome overfitting. us, an optimal hyperplane is selected based on In equation (6), ξ i and ξ i denote slack variables, and C is the penalty factor, determining the tradeoff between the minimal loss and maximal margins. As an optimization problem, equation (6) can be solved based on Lagrange's dual formulas. e solutions to equation (6) can be represented as where α i and α i are Lagrange multipliers. So far, equation (7) is the linear model. e kernel functions are exploited to achieve nonlinear regression.
where K denotes the exploited kernel function. Regarding SVR, the general kernel functions include linear, polynomial, and Gaussian kernels. ere are also some hyperparameters in SVR such as C, kernel function, as previously mentioned above, and epsilon, which represents the tolerance of the termination criterion. e choice of these hyperparameters plays a great role in an SVR model. In this study, the authors used Bayesian hyperparameter optimization to choose SVR hyperparameters. More details are provided in Section 4.2 and Section 3 of the supplementary materials.

Random
Forest. RF was proposed by Breiman [31] as a novel machine learning algorithm composed of multiple classification and regression decision trees (CART) that may avoid overfitting and can regulate different types of input variables. For more details on CART analysis, refer to Breiman et al. [32]. RF can generate multiple independent trees and then make a final decision, which is determined by its characteristic of nonparametric statistical regression and randomness. Accordingly, the decision-making ability of the RF model depends on each CART. Each decision tree consists of a root node, subnode, and leaf node. Each subnode contains a judgement rule, and a leaf node corresponds to a judgement level.
e RF model steps are approximately divided into the following: (i) First, in this study, a training set is extracted from the original dataset using a bootstrapping method. Each training set is approximately 2/3 of the original dataset. e other 1/3 of the dataset is described as out-of-bag (OOB). A total of n rounds of extraction are carried out such that n training sets are formed. ese training sets are independent of each other, but the elements can be duplicated.
(ii) en, one training set is used to build a regression decision tree. Many decision trees form a forest. e principle adopted is the minimum mean square error. (iii) In the last step, based on the prediction results of every tree, the result of the forest is the average of the predicted values of all trees.
Using OOB, the RF can internally conduct cross validation. In this study, the authors exploited the OOB error (E OOB ) to estimate the internal error: Regarding RF, the number of trees in the forest and the maximum depth of trees are the main RF hyperparameters. e authors also used Bayesian hyperparameter optimization to choose RF hyperparameters. More details are provided in Sections 4.2 and 2 of the supplementary materials.

Evaluation of Downscaling Models.
To evaluate the performance of the established model, the following statistical metrics were used for the verification period: (i) Pearson's correlation coefficient (PCC) evaluates the linear correlation between modeled and observed precipitation series; a PCC equal to 0 denotes no correlation while 1 represents complete correlation. (ii) Root mean squared error (RMSE) represents errors between modeled and observed precipitation series; the smaller the RMSE, the better the results. (iii) Relative bias (Rbias) evaluates the relative deviation between simulated and observed data; the smaller the Rbias, the smaller the relative deviation. Table 4 shows the detailed equations and description of these statistical metrics.

Ensemble Downscaling Simulation of BMA and MME.
First, the authors evaluated the precipitation simulation using individual GCMs. Based on the interpolated grid data of the CMIP5 models, the 16 nearest grid points to each station were used to simply downscale. An average value of the 16 nearest grid points to each station was used to represent the GCM precipitation simulation for each station. Tables 4-6 provide the monthly simulation performance for each station and the regional mean. erein, the mean value of the whole region was acquired based on the arithmetic mean of the 13 stations. Regarding BMA, the selected period was from 1993 to 2005. Each model has a certain reliability to simulate real rainfall, though there are great differences, as shown in the tables. Clearly, ACCESS1.0, GFDL-ESM2M, and GISS-E2-R outperformed the other models in this region, as the PCC reaches 0.64-0.68, and the RMSE is approximately 57-82 mm. e good PCCs are paired with poor RMSEs and Rbias in some cases. Furthermore, the differences for each station are apparent. e simulation performance results of all the models for stations 5 and 13 are relatively poor. ese results may be due to a local microclimate as the GCMs cannot consider regional simulation.
is theme is worthy of further study. As described in the Methods, BMA was applied for ensemble simulation of 8 CMIP5 models, and MME was compared to BMA. MME exploits the arithmetic mean of precipitation of 8 models to yield an ensemble simulation. Figure 5 shows the weighted value of the 8 models. Tables 5-7 also provide the performance results of the MME and BMA. e region mean presents the arithmetic mean precipitation of 13 stations.
e MME provides an improvement compared to all single models. e PCC is improved from 0.50-0.68 to 0.76, and the RMSE is slightly less than that of ACCESS1.0 and IN for the region mean. Furthermore, the BMA results prove its superiority. e PCC, RMSE, and Rbias values between the BMA simulation and observed data are distinctly better than those of the single individual models. e PCC reaches 0.74, and the RMSE is reduced by 28%-60%. Compared with that of MME, the RMSE decreases from 60.86 mm to 41.03 mm and the Rbias is greatly reduced, although the PCC exhibits little change. For this region, the given weights of each model via BMA are shown in Figure 5. e performances of the 8 single models, MME, and BMA for precipitation simulation are shown through a Taylor diagram, as shown in Figure 6. e PCC, RMSE, and standard deviation are represented in the Taylor diagram. e results graphically conclude that the BMA has a greatly improved simulation performance in this study region. e satisfactory performance of multimodel ensemble simulation demonstrated their capacity to alleviate the structural differences in each model and the uncertainty that exists in individual models, as demonstrated by the identical conclusion of previous studies [33,34].
From the results, the great performance of BMA proves that it is more reliable than the other approaches, as BMA reduced the errors. More effort is still needed to yield precipitation simulation at regional scales because it is difficult for the GCMs to represent regional climate situations [35]. Accordingly, there are large differences in precipitation simulation for each station, which is likely because the local climatic circulation of precipitation events can induce a greater interannual variability in rainfall [36]. In addition to BMA, there are other statistical ensemble methods that have been successfully applied, such as reliability ensemble averaging (REA) and the generalized linear model (GLM) [7,18]. Recently, the GLM performed well with ensembles of downscaled GCM projections [7]; this finding could inspire further study, which would refer to the comparison of ensemble-downscaling and downscalingensemble strategies, applying the BMA and GLM methods. Huge uncertainties remain, requiring further efforts to reduce them.

Validation and Comparison of ree Machine Learning Downscaling Models.
e authors conducted the process of hyperparameter choice of MLP, SVR, and RF based on Bayesian hyperparameter optimization. e Bayesian optimization maps hyperparameters to the scoring probability of the objective (e.g., loss of model performance) and can infer information on the unknown function [37]. e authors adopted the tree-structured Parzen estimator (TPE) algorithm under the framework of sequential model-based global optimization (SMBO) as better results are reported using it in several difficult learning problems [38]. In addition, a 10-fold cross-validation was used to promote the choice of optimal hyperparameters. More details regarding the hyperparameter optimization process and results can be found in Sections 1-3 of the supplementary materials. All ML downscaling models were built based on the optimal hyperparameters. Eventually, these models were evaluated based on PCC, RMSE, and Rbias, whose definitions are provided in Table 4. e authors used Python's "hyperopt" package to implement Bayesian hyperparameter optimization for MLP and RF, and MATLAB's "fitrsvm" function for SVM. Furthermore, the authors used 10-fold cross-validation to identify the optimal parameters, where the data during the period of 1961-2005 were divided into 10 equal sized subdatasets. Nine out of 10 subdatasets were used for training, and the remaining one was used for model validation. Mean squared error (MSE) was used as the crossvalidation score (objective) for MLP and RF, while the loss function was used as the objective for SVR because it is a selfcontained function of "fitrsvm." Tables 8-10 provide the hyperparameter results of MLP, SVR, and RF for the region mean, respectively. e detailed description for each parameter and search range for optimization are provided in Sections 1-3 in the supplementary materials, as well as the optimization processes for each station. Figure 7 shows the selected PC for each GCM, MME, and BMA. e selected PCs are the input of ML methods, while the observed precipitation of each station and region mean are the output. Table 11 shows a comparison of the three ML downscaling methods for the performance evaluation based on statistical metrics. Due to the length Advances in Meteorology limitation, Table 11 only provides the regional mean, and the assessment results of individual stations are represented by Tables S4-S6 in the supplementary materials. No method has absolute superiority, as the performance of all individual GCMs, MME, and BMA is not superior to that of the other methods. Encouragingly, all the downscaled models show greatly improved station simulation performance compared with the raw GCM output. For the downscaling of individual GCMs, the marked improvements reveal that the PCC increased from 0.54-0.68 to 0.72-0.81, the RMSE changed from 57.18-100.15 to 36.66-41.97, and the Rbias values were clearly reduced. Moreover, there are relatively fewer improvements for MME and BMA. Downscaling of BMA especially showed that the RMSE decreased by approximately 5.4 mm. is situation may be because the BMA simulation before machine learning training is relatively good, making it more difficult to significantly improve. From the final results, MME and BMA downscaling perform better than any of the single models. Figures 8-10 show scatter plots between the downscaled results and the observed precipitation in the region. e modeled data are closer to observed data if these scatters are more concentrated on the line. e SVR plots clearly show that the modeled data are closer to observed data than those of MLP and RF. According to Table 1, it is clear that there are relative differences in each downscaling method. According to the performance assessed by PCC and RMSE, the SVR results outperform the others in most cases, especially for the GCM downscaling of GR, PCC 2 n is the sample size; x i , y i are individual samples; x, y are the arithmetic mean of x and y, respectively RMSE RMSE � ����������������� n i�1 (y Obs − y pre ) 2 /n y Obs denotes observed data; y pre is the prediction value; n expresses the sample size Rbias Rbias � n i�1 (y pre − y Obs )/ n i�1 y Obs Same as the description of RMSE equation  e difference in performance at specific stations (Tables S4-S6 in the supplementary materials) is maybe due to the differences in local climate and localized rainfall features. However, we did not explore this theme.
In this study, the BMA and SVR models were used for the downscaling future projection of precipitation. e overall outperformance of SVR based on the MOS method over RF has also been proposed in other regions, but there is no absolute superiority for every GCM and station [7].
is study also came to a similar conclusion. Although neural networks have been popularly applied for downscaling [39,40], they were first tested on MOS. Compared with MLP and RF, the results of the MLP show a higher PCC and a lower RMSE. ose results demonstrated that there are relative uncertainties among the downscaling methods. ere are some satisfactory studies using multimethod ensemble for modeling [41].
is has inspired future work using ensemble methods to reduce the uncertainties.
Generally, the modeling performance of the ML methods depends on their input and parameters [42]. For        AC13  18  3  512  1604  GG  5  3  498  1581  GH  4  6  985  1551  GM  6  4  127  1630  GR  12  3  967  1657  IN  6  3  934  1549  Nor  13  2  451  1530  MME  4  2  622  1440  BMA  4  2  the input, it is clear that the more reliable the GCM, the better the performance of the downscaled results. For parameter set, though the authors used Bayesian hyperparameter optimization to choose the optimal hyperparameters of the ML method, there may be room for improving the optimization algorithm itself.

Projected Changes in Precipitation during the 21st
Century.
e projected precipitation outputs from 2006 to 2009 under RCP 4.5 and RCP 8.5 were assembled into an ensemble and downscaled through the SVR models using the established BMA, i.e., using the same GCM weights and selecting the same principal components. Firstly, we validated monthly precipitation projections against observational data for the 2006-2018 period. Table 12 shows the validation results. It can be seen that the projected precipitation under RCP4.5 is closer to the real data than RCP8.5, as the PCC is 0.78 and the RMSE is 43.57. In addition, the values of Rbias suggest that the real emission trend for the 2006-2018 period was somewhere between the RCP4.5 and RCP8.5 ones and was closer to the RCP4.5 one. As the change of climate is complex and uncertain, the changes of precipitation during the mid-(2040-2069) and late (2070-2099) 21st under RCP4.5 and RCP8.5 are worthy of study. e monthly rainfall simulation was converted to annual and seasonal time series. erein, the four seasons are spring (March-May), summer (June-August), autumn (September-November), and winter (December-February). Figure 11 shows the projected future annual and seasonal precipitation in this region. Considering 1971-2000 as the historical baseline, the average rainfall during the mid-and late 21st century increases by 3.23% and 1.02%, respectively, compared with the base year under RCP4.5; meanwhile, they increase by 4.25% and 8.30% under RCP8.5. From Figure 11, it can be seen that the rainfall during the 21st century has a weak increasing trend overall, and there is a slight downward fluctuation in the midterm. Under the RCP8.5 scenario, the increase in rainfall is more significant. For seasonal change, the mean value was obvious during winter and spring, while there was less change during summer and autumn. Figures 12  and 13 compare the statistics of the historical baseline and mid-and late 21st century time series based on quantile-quantile plots under RCP4.5 and RCP8.5. It can be seen that most rain distributions are near the normal distributions. In each subfigure, three baselines that represent the corresponding normal distributions are shown. e interception and slope of these lines represent the mean and variance. Approximately, compared to the period of 1971-2000, the precipitation of the mid-and late 21st century under RCP4.5 and RCP8.5 clearly increases. e changes in magnitude during the spring and winter are great. is result may be due to the greater uncertainty in the downscaling models during the dry season or the bias in RCP4.5 and RCP8.5. e trend of change throughout the 21st century in this region is consistent with that of previous studies, though the specific values are not the same [15,44]. is difference is acceptable because the selected GCMs and downscaling strategies differ. e specific projection of each station is shown in Table S7 in the supplementary materials. e projection of precipitation change for each station is distinct. Significant precipitation seasonality exists, though this study did not take measures to remove it; thus, the evaluation of seasonal rainfall has more uncertainties.
ere are several studies that consider seasonality by separately training downscaling models based on each calendar season and month [43]. e authors considered the sufficient sample size for the training of the ML methods and thus used all monthly data, so far, for modeling.
ere are further plans to train monthly and seasonal models based on daily data, though much uncertainty exists in the daily rainfall. ere are successful studies that have assessed extreme precipitation events based on daily downscaled precipitation [44]. Daily downscaled precipitation is also worth studying in a future work.

Conclusion
Statistical downscaling of precipitation in complex-topography hydrological basins is of major importance, since the low resolution of GCMs hinders the effective study of climate change at this scale. Improving downscaling through machine learning techniques has been investigated in this study, through the use of three MOSbased machine learning methods (MLP, SVR, and RF). Prior to downscaling, the BMA model was developed into

12
Advances in Meteorology a multi-GCM ensemble, while MME was applied as a reference. A series of statistical metrics including PCC, RMSE, and Rbias were used to conduct an evaluation and comparative analyses. en, BMA and SVR were applied to downscale the precipitation projection during the 21st century under RCP4.5 and RCP8.5. A validation of projected monthly precipitation in 2006-2018 was done for region mean, and the changes of precipitation during the mid-and late 21st century were analyzed. e conclusions are as follows: (i) MME and BMA both demonstrated their superiority over the individual GCMs. Specifically, the PCC, RMSE, and Rbias between the BMA ensemble simulation and the observed data were clearly better than that of the single models. e PCC reached 0.74, and the RMSE was reduced by 28%-60% over all the GCMs. Compared with the MME, the RMSE decreased from 60.86 mm to 41.03 mm and the Rbias was greatly reduced, though the PCC showed little change.   (ii) All the downscaled models have greatly improved station simulation performance compared with that of the raw GCM output. e PCC increased from 0.54-0.68 to 0.72-0.81, and the RMSE changed from 57.18-100.15 mm to 36.66-41.97 mm. e SVR showed the best performance on the whole. e downscaling results based on the BMA ensemble simulation and the SVR models were regarded as the best performing (the PCC, RMSE, and Rbias were 0.82, 35.07 mm, and − 5.45%, respectively) (iii) e projected rainfall under RCP4.5 is closer to the real data than RCP8.5 in 2006-2018. And the regional rainfall during the 21st century has a weak increasing trend on the whole, under RCP4.5 and RCP8.5, while there is a slight decreasing fluctuation in the midterm. Specifically, the average rainfall during the mid-and late 21st century increased by 3.23% and 1.02%, respectively, compared with the base year under RCP4.5; meanwhile, they increased by 4.25% and 8.30% under RCP8.5, and the magnitude of changes during winter and spring was higher than that during summer and autumn.
is study first developed MOS-based BMA_ML ensemble downscaling models and provided preliminary downscaled projections of precipitation during the 21st century for the region, which can be referenced in studies of future climate, hydrological, economic change, etc. However, unresolved uncertainties remain. It may be advisable for researchers to study the improvement of downscaling models and the applicability for other regions for follow-up work, subsequently considering the effect of local climate and seasonality, based on multiple time scales.
Data Availability e observed daily rainfall data for the years from 1961 to 2005 were utilized to calibrate and validate the downscaled models, and they were obtained from the China Meteorological Data Website (http://data.cma.cn/). e GCM-simulated data were acquired from the Coupled Model Intercomparison Project website (http://cmip-pcmdi.llnl.gov/cmip5/).

Conflicts of Interest
e authors declare no conflicts of interest.