An Application of ANN Ensemble for Estimating of Precipitation Using Regional Climate Models

Climate change scenarios are used for predicting future precipitation. More detailed regional climate change scenarios are obtained by dynamic downscaling of global circulation model results. There is a global tendency to utilize simulated precipitation data from downscaled regional climate models (RCMs) suitable for each country. In Korea, studies have sought to improve the accuracy of climate change scenario precipitation forecasts relative to observed precipitation. In this study, the precipitation of five regional climate models and the observed precipitation provided in Korea are applied to an artificial neural network (ANN) in order to suggest ways of improving the prediction accuracy for precipitation. The ANN ensemble of RCMs simulates the observed precipitation more accurately than the individual RCMs. In particular, it is more effective inland than in coastal areas, where precipitation patterns are complex. The Pearson correlation coefficient of the ANN is about 0.04 higher than that of multiple regression analysis (MRA). More detailed analysis is expected to be possible if the method is applied not only to four cities but also to other regions in Korea. If observed precipitation data are collected in sufficient quantity, the applicability of the ANN model will widen.


Introduction
Construction of future climate scenarios under global warming conditions, at a local or regional scale, is necessary for the assessment of climate change impacts on economic activities such as agriculture or energy production [1]. General circulation models (GCMs) represent the most satisfactory approach to predicting future climate changes [2,3], but their present low spatial resolution measuring a few hundred kilometers makes their output problematic for use in impact studies [4,5].
Various developed methods have been applied to downscale from GCM coarse resolution output to finer spatial scale (e.g., local stations or river catchment). Physically based numerical models such as regional dynamical models that work at a finer spatial scale are nested within a GCM [6]. Current applications of this method have revealed its capacity to reproduce fine-scale features of regional climates [7][8][9]. In spite of the substantial improvement in computer abilities in recent decades, this numerical model method remains computationally very demanding [10].
In recent years, a number of reports and papers within the meteorological community have adopted neural networks as a method to downscale from large-scale atmospheric circulation to regional climate variables [11,12]. Several application methods have been developed for the purpose of constructing climate change scenarios [3,10].
ANN models can be trained to establish the closest mathematical relationship between atmospheric circulation and the local climate, without predefined limitations. Thus, this method is able to capture some of the nonlinear relationships between the local climate and large-scale circulation [1]. In particular, it has been shown that there is always a certain configuration of the multilayer perceptron that can approximate any continuous function arbitrarily well [26,27]. In this sense, it is helpful to regard the multilayer perceptron as a very powerful multiple regression technique [28]. Gardner and Dorling (1998) have shown that most ANN applications in the atmospheric sciences have employed the standard feed-forward configuration of the multilayer perceptron [1,29].
Downscaling methods are broadly classified as dynamic or statistical. Statistical downscaling is the development of statistical relationships between local climate variables and large-scale predictors. Dynamic downscaling is based upon nesting a finer scale regional climate model (RCM) of up to 10 km horizontal resolution within GCMs. Downscaling methods can be used to improve the spatial resolution of GCM output in order to overcome the challenge of assessing climate-related impacts on water resources at the catchment scale [30][31][32].
For the prediction of future climate conditions, global climate models were used. The grid resolution of current GCMs is on the order of hundreds of kilometers. However, the resolution required by environmental impact models is less than 10 km. Therefore, as Tsanis et al. indicate, changes in precipitation and/or temperature predicted by GCMs for the Special Report on Emissions Scenarios (SRES) in small Mediterranean river basins cannot be used to assess extreme hydrological impacts without previously applying a downscaling methodology. Two main downscaling techniques are available to represent this subgrid variability: (a) a dynamic approach using regional climate models (RCMs) and (b) a statistical approach [33].
Compared with RCMs, statistical methods involve simpler computational procedures and can produce results that are directly comparable with the regional observed data [34]. Therefore, statistical downscaling techniques have been widely used for climate impact studies [32][35][36][37][38][39].
An ensemble of several models can be applied to reduce uncertainty in climate change scenarios [40]. Ensemble methods comprise the single-model ensemble (SME), which combines various initial conditions, and the multimodel ensemble (MME), which combines the results of multiple models.
These include an ensemble method that averages the values of climate models, a Reliability Ensemble Average method that combines weights according to the accuracy of global models, and a method of using a median for each model. A method of synthesizing the values is also used [41][42][43].
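The combination rules mentioned above can be illustrated with a minimal Python sketch. The function names, the precipitation values, and the weights are hypothetical and chosen only for illustration; in particular, the weighted version is a generic weighted mean and does not reproduce the Reliability Ensemble Average weighting procedure.

```python
from statistics import mean, median


def ensemble_mean(values):
    """Simple multimodel ensemble: arithmetic mean of the model values."""
    return mean(values)


def ensemble_median(values):
    """Median-based ensemble, less sensitive to a single outlying model."""
    return median(values)


def weighted_ensemble(values, weights):
    """Weighted mean; the weights are assumed to reflect model accuracy."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical monthly precipitation (mm) simulated by five models
simulated = [120.0, 95.0, 140.0, 110.0, 85.0]
# Hypothetical accuracy-based weights (not taken from any study)
weights = [0.4, 0.2, 0.1, 0.2, 0.1]

print(ensemble_mean(simulated))
print(ensemble_median(simulated))
print(weighted_ensemble(simulated, weights))
```

An ANN ensemble can be seen as replacing these fixed combination rules with a nonlinear combination learned from observed data.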
In this study, observed precipitation data were compared with precipitation data estimated by the RCMs. An ANN ensemble based on the multilayer perceptron was then applied to the precipitation data generated by the RCMs in order to improve the accuracy of precipitation prediction. For this purpose, RCM data presented by the Korea Meteorological Administration and the Coordinated Regional Climate Downscaling Experiment (CORDEX) were collected and applied to artificial neural networks.
Data for the artificial neural networks comprised the monthly precipitation of five RCMs used in Korea from 2006 to 2015. During that period, simulated precipitation by RCM at the same point as the observed rainfall was compared through the Pearson correlation coefficient in four major cities in Korea.
Through the coefficient of determination, the model closest to the actual precipitation was identified, and new precipitation was then estimated by using the RCM precipitation data as independent variables of the artificial neural network.
Based on the Pearson correlation coefficient, the optimal RCM for each city was selected. An ANN model was then built showing precipitation by the selected RCM as an independent variable. ANN models using five RCMs of four observatories were simulated, and ANN simulation cases were determined by the RCM's combination in order of lowest determination coefficients. Accuracy assessment was performed in order to compare the precipitation data generated by the ANN model with the data actually observed.

Theoretical Background
Artificial neural network (ANN) was used as a statistical analysis method to find correlations between measured and simulated values of precipitation by the RCM. In addition, accuracy assessment was performed to compare the ANN-derived simulation results with measured data. Section 2 covers the theories of the statistical methods used.

ANN Analysis for Determining Precipitation from RCM.
The analysis procedure of the artificial neural network consists of input, hidden, and output layers, as shown in Figure 1. Learning and output are done through the backpropagation algorithm.
According to Haykin (1994), a neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge that can then be made available for use. The neural network resembles the human brain in two respects: knowledge is acquired by the network through a learning process, and interneuron connection strengths (known as synaptic weights) are used to store the knowledge. The ANN procedure used is a feedforward network type with input, hidden, and output layers [44].
Neurons in the input layer simply act as a buffer. Those in different layers are interconnected by means of weights. The neurons in the hidden and output layers apply an activation function, which here is a sigmoidal activation function. The input to each neuron $j$ in the hidden layer is the sum of the weighted input signals $x_i$, $\mathrm{net}_j = \sum_i w_{ji} x_i$, in which $w_{ji}$ is the interconnecting weight between neuron $j$ in the hidden layer and neuron $i$ in the input layer. Output $y_j$ from the neuron is given by [44,45]

$$y_j = f(\mathrm{net}_j),$$

where $f$ is the sigmoidal activation function.

Analysis by the error backpropagation algorithm compares the value calculated by the neural network with the target value and adjusts the connection strengths (weighting factors) so that the sum of squared errors is minimized. This is repeated until the error meets a specified value. When the iteration is finished, the final value is calculated as output.
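The forward computation of one layer can be sketched in Python as follows. This is only an illustration under stated assumptions: the logistic function is used as one possible sigmoidal activation, the bias term is an optional addition not mentioned in the text, and the function names, weights, and inputs are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, assumed here as the sigmoidal activation."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases=None):
    """Compute y_j = f(net_j) for every neuron j of one layer.

    weights[j][i] is the weight between neuron j of this layer and
    neuron i of the previous layer. The bias is an assumption and
    defaults to zero for every neuron.
    """
    if biases is None:
        biases = [0.0] * len(weights)
    outputs = []
    for w_j, b_j in zip(weights, biases):
        # net_j: weighted sum of the input signals
        net_j = sum(w_ji * x_i for w_ji, x_i in zip(w_j, inputs)) + b_j
        outputs.append(sigmoid(net_j))
    return outputs


# Hypothetical example: two inputs feeding two hidden neurons
hidden = layer_forward([1.0, 2.0], [[0.5, -0.25], [0.3, 0.1]])
print(hidden)
```

Backpropagation then adjusts the entries of `weights` so that the sum of squared differences between the network output and the target values decreases.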

Accuracy Assessment for Selecting Optimal ANN Model.
To evaluate the accuracy of the estimates produced by the ANN developed in this research, an error analysis was performed on the difference between the actual and model values. Accuracy can be assessed by comparing the actual measured value with that generated by model simulation.
The mean absolute error (MAE), mean square error (MSE), BIAS, and G value were used to evaluate the accuracy of the estimation results. Each index is calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|z(x_i) - \hat{z}(x_i)\right|,$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left[z(x_i) - \hat{z}(x_i)\right]^2,$$

$$\mathrm{BIAS} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{z}(x_i) - z(x_i)\right],$$

$$G = \left(1 - \frac{\sum_{i=1}^{n}\left[z(x_i) - \hat{z}(x_i)\right]^2}{\sum_{i=1}^{n}\left[z(x_i) - \bar{z}\right]^2}\right) \times 100,$$

where $z(x_i)$ is the observed value at $i$, $\hat{z}(x_i)$ is the estimated value at $i$, and $\bar{z}$ is the mean value of the observed data.
The smaller the MAE and MSE, the more accurate the estimate. The nearer the BIAS is to zero, the less biased the estimation result is. A G value of 100 indicates a perfect estimation. A negative G value indicates that the estimate is less reliable than using the average of the data values as a predictor [45].
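A minimal Python sketch of these four indices is given below, assuming their usual forms: MAE and MSE as the mean absolute and mean squared differences, BIAS as the mean of estimated minus observed values (the sign convention is an assumption), and G as one minus the ratio of the squared error to the squared deviation of the observations from their mean, multiplied by 100. The function name and the sample values are hypothetical.

```python
def accuracy_metrics(observed, estimated):
    """Return MAE, MSE, BIAS, and G for paired observed/estimated values."""
    n = len(observed)
    # Assumed sign convention: estimated minus observed
    errors = [e - o for o, e in zip(observed, estimated)]
    mae = sum(abs(d) for d in errors) / n
    mse = sum(d * d for d in errors) / n
    bias = sum(errors) / n
    mean_obs = sum(observed) / n
    ss_res = sum(d * d for d in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # G = 100 for a perfect estimate, 0 when no better than the mean,
    # negative when worse than using the mean as the predictor
    g = (1.0 - ss_res / ss_tot) * 100.0
    return {"MAE": mae, "MSE": mse, "BIAS": bias, "G": g}


# Hypothetical monthly precipitation values (mm)
observed = [10.0, 20.0, 30.0]
print(accuracy_metrics(observed, [10.0, 20.0, 30.0]))  # perfect estimate
print(accuracy_metrics(observed, [20.0, 20.0, 20.0]))  # mean as predictor
print(accuracy_metrics(observed, [30.0, 20.0, 10.0]))  # worse than the mean
```

The three calls illustrate the interpretation of G: a perfect estimate, an estimate equal to the observed mean, and an estimate that performs worse than the mean.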

Pearson Coefficient for Correlation Analysis.
In statistics, the Pearson correlation coefficient (PCC), also referred to as Pearson's r, is a measure of the linear correlation between the variables X and Y. It has a value between +1 and −1, where 1 is a total positive linear correlation, 0 is no linear correlation, and −1 is a total negative linear correlation. Widely used in the sciences, PCC was developed by Karl Pearson in the 1880s from a related idea introduced by Francis Galton [46][47][48].
Correlation analysis is a method of analyzing the linear relationship between two variables in probability theory and statistics. The two variables may be independent of each other or correlated, and the strength of the relationship between them is expressed by the correlation coefficient, defined as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}},$$

where $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$. The PCC ($r$) is also used to determine the relationship between two variables through the coefficient of determination ($r^2$). The PCC explains the linear relationship between the two variables. However, it may prove insufficient to determine the accuracy between the target and simulated values.
Even when two variables are positively correlated, the correlation coefficient does not reflect the slope and constant value of the fitted line, so the agreement between the values cannot be judged from it alone; an error rate analysis is therefore also performed when evaluating accuracy. When a linear relationship between the compared variables is found, the closeness of the relationship is judged from the tendency of the trend line and the degree of convergence of the values around it [45].
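The limitation described above can be seen in a short Python sketch. The function name and the data are hypothetical; the example shows an estimate that is perfectly correlated with the observations (r = 1) yet offset from them by a constant 10 mm, which a correlation coefficient alone would not reveal.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical observed values and an estimate shifted by +10 mm
observed = [1.0, 2.0, 3.0]
shifted = [11.0, 12.0, 13.0]

r = pearson_r(observed, shifted)
mean_error = sum(e - o for o, e in zip(observed, shifted)) / len(observed)
print(r, r ** 2, mean_error)
```

Here the correlation is perfect while the mean error is 10 mm, which is why error indices are evaluated alongside the PCC.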

Provision of RCM for Generating Precipitation Data.
The RCP scenarios in the IPCC's fifth assessment report are input into climate models and are provided through the IPCC's data distribution center. The development of global climate change scenarios (CMIP5) produces global data based on RCP scenarios. Based on the global data, regional climate change scenarios generated by RCMs are available from the Coordinated Regional Climate Downscaling Experiment (CORDEX) [49]. CORDEX calculates regional scenarios of climate change by dividing the world into 14 regions, including North America, Europe, Africa, and East Asia, at a resolution of 0.44° (about 50 km). With regard to Korea, climate change forecasts by the RCMs HadGEM3-RA, RegCM4, SNU-MM5, SNU-WRF, and YSU-RSM are available in CORDEX-EA for the East Asia region [49]. Each website providing climate change scenarios provides data through various models, data periods, variables, and resolutions. This paper collected data from CORDEX-EA and the website of the Korea Global Atmosphere Watch Center (KGAWC) [50], which provides RCM results of several variables in RCP climate change scenarios.
The World Climate Research Program (WCRP) initiated the CORDEX framework in order to improve worldwide regional climate change projections and provide a framework for better coordination of regional climate downscaling [51].
The CORDEX experiments provided an opportunity to evaluate the relative and absolute performances of various RCMs over predefined regions. As one of 14 branch domains within the CORDEX framework, CORDEX-East Asia (CORDEX-EA) covers a large area of East Asia with a horizontal resolution of approximately 50 km [52].
In the framework of the CORDEX project, five RCMs driven by ERA-Interim reanalysis were evaluated in terms of their ability to simulate the climatology of the East Asia domain (CORDEX-EA) in comparison with the Aphrodite observational data set [53]. The models are affected by a temperature bias ranging from 1.1 (RegCM4) to 0.2°C (SNU-MM5) from June to August and from 5.5 (SNU-MM5) to 0.9°C (RegCM4) from December to February [53].

Collection of Precipitation Data from RCM.
To compare the results of the RCMs with actual data in order to determine the forecasting accuracy of the climate models, data were selected for use based on the status of the climate change scenarios. The climate change scenarios collected are shown in Table 1.
Data for the same period are required to compare climate change scenarios and observations. Thus, 50 km resolution precipitation data for 2006∼2015 overlapping with the previous period's data were collected from the HadGEM3-RA, RegCM4, SNU-WRF, SNU-MM5, and YSU-RSM prediction data of CORDEX-EA. Only monthly SNU-WRF data were available from 2006 to 2010. Daily SNU-WRF data for 2011 to 2015 were therefore collected and converted into monthly data, so that monthly data from all RCMs were available for ANN simulation. The collected RCM data were interpolated using Universal Kriging, the optimal spatial interpolation method, to obtain data at the same locations as the actual observatory sites [49]. The interpolated precipitation data are used in the study as shown in Section 3.3.
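The conversion of daily data into monthly data can be sketched in Python as follows. This is an illustration only: it assumes that the daily values are precipitation depths in mm that are summed into monthly totals, and the function name and sample records are hypothetical. If the source data were stored as daily rates, a different aggregation would be needed.

```python
from collections import defaultdict
from datetime import date


def daily_to_monthly(daily_records):
    """Sum daily precipitation (mm) into monthly totals.

    daily_records is an iterable of (date, precipitation_mm) pairs.
    Returns a dict keyed by (year, month), sorted chronologically.
    """
    totals = defaultdict(float)
    for day, precipitation_mm in daily_records:
        totals[(day.year, day.month)] += precipitation_mm
    return dict(sorted(totals.items()))


# Hypothetical daily records
records = [
    (date(2011, 1, 1), 2.0),
    (date(2011, 1, 15), 3.5),
    (date(2011, 2, 1), 1.0),
]
monthly = daily_to_monthly(records)
print(monthly)
```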

Observation Data Collection for Comparison with RCM.
Precipitation data are available through the KMA website and the Korea Global Atmosphere Watch Center [54]. Ground observation data are classified into the automated synoptic observation system (ASOS), automatic weather system (AWS), and automated agricultural observing system (AAOS). Data are provided in CSV and PDF formats.
ASOS comprises ground observations performed at the same time at all observatories in order to determine the atmospheric conditions at any given time. Air pressure, temperature, wind direction, wind speed, relative humidity, precipitation, solar radiation, daylight time, ground temperature, and vertical temperature are all automatically observed. Each observation station measures precipitation in one-minute time units. Observations are currently made at more than 80 sites in Korea [54].
Observed precipitation data for 2006∼2015, consistent with the RCM data's period, were collected [46]. Observation points were selected from the synoptic meteorological network, where no data were missing. No movement of precipitation observation equipment was seen over the measurement period.
Location information on the 4 selected observation points of the precipitation data collected through the Korea Global Atmosphere Watch Center is shown in Figure 2.

Status of Collected Data.
In all four cities, monthly precipitation by RCMs and observed precipitation patterns were similar. In the case of Seoul, observed precipitation in the summer season was higher than the RCM simulation. In the case of Busan, RCM simulated precipitation was frequently higher than the observed precipitation.
Park (2017) analyzed five RCMs in Korea and found a similar correlation between observed and RCM precipitation, whether located inland or in northern parts [49]. Figure 3 shows that precipitation forecasting patterns for Seoul and Daejeon, both situated further north than Daegu and Busan, are more stable. As shown in the precipitation comparison graph, the monthly precipitation of five RCMs demonstrates a differing pattern. Table 2 shows the Pearson coefficients for the actual precipitation and the RCM simulated precipitation in four major Korean cities. As can be seen in the table, all have a positive correlation of 0.2∼0.6 for each model.
In particular, in the Busan area the correlation with observed precipitation is low for all models except HadGEM3-RA. The correlation in the other three cities also varies depending on the model used. Of the five RCMs, the HadGEM3-RA model generally shows a high correlation, while the YSU-RSM shows a high correlation in Daejeon and Daegu.

Simulation Model Construction.
ANN simulation cases were selected to produce ensemble precipitation forecasting using five RCMs in four major cities in Korea. Observed precipitation was used as a dependent variable while 5 RCM precipitation data were used as independent variables.
In constructing an ANN simulation, 80% of the total data are used for training and 20% for verification of the constructed neural network. The hyperbolic tangent function is used as the activation function, and the conjugate gradient method is used as the optimization algorithm. The conjugate gradient method is an algorithm for finding the nearest local minimum of a function of n variables which presupposes that the gradient of the function can be computed [55]. The minimum variation of the learning error is 0.0001 in the multilayer perceptron of the ANN, and the minimum relative variation of the learning error is 0.001. The calculation is repeated until the error is minimized. ANN simulation cases were selected according to the coefficient of determination calculated in Table 2, using five RCMs. The RCM independent variables were removed one by one in descending order of the coefficient of determination. Five cases were then selected for each city, as shown in Table 3. The number of neurons in the ANN model was set to twice the number of independent variables, which is the optimal simulation condition, so that the number of independent variables was taken into account [44,45].
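The case construction and data split described above might be organized as in the following Python sketch. Several points are assumptions made for illustration: the coefficients are hypothetical and are not the values of Table 2, the least correlated model is dropped first when a case is reduced, and the 80/20 split is drawn at random with a fixed seed, since the text does not state how the records were divided. All function names are hypothetical.

```python
import random


def build_cases(coefficient_by_model):
    """Return nested model subsets, from all models down to one.

    Assumption: models are ranked by coefficient and the lowest-ranked
    model is removed first.
    """
    ranked = sorted(coefficient_by_model,
                    key=coefficient_by_model.get, reverse=True)
    return [ranked[:k] for k in range(len(ranked), 0, -1)]


def hidden_neurons(case):
    """Twice the number of independent variables, as stated in the text."""
    return 2 * len(case)


def split_train_test(rows, train_fraction=0.8, seed=0):
    """Randomly split rows into training and verification sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical coefficients for one city (not the values of Table 2)
coefficients = {"HadGEM3-RA": 0.55, "RegCM4": 0.45, "SNU-MM5": 0.40,
                "SNU-WRF": 0.35, "YSU-RSM": 0.30}
cases = build_cases(coefficients)
for case in cases:
    print(len(case), hidden_neurons(case), case)

# Ten years of monthly records (2006 to 2015) give 120 samples
train, test = split_train_test(range(120))
print(len(train), len(test))
```

With five RCMs this yields five cases per city, and the 120 monthly records are divided into 96 training and 24 verification samples.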

Accuracy Assessment Results.
Using ANN, the most reliable case for estimating the precipitation of the 4 major cities was selected. Accuracy was assessed using the MAE, MSE, BIAS, and G value (goodness of prediction). Figure 4 shows the accuracy assessment results for all simulation cases. Where the calculated values of the MAE and MSE are small and the G value is close to 100, the simulated value is close to the measured value.
The results of the ANN ensemble are more accurate than the results of RCM estimated precipitation. In all cities, the precipitation calculated by ANN is lower in the MAE and MSE values than the RCM results.
Additionally, since ANN simulation results show that the G value is closer to 100 than the conventional RCM, the ANN can be considered to be effective in precipitation prediction.
However, it is difficult to use as an estimation model if the G value is negative. All ANN simulation results show positive values, while some RCMs show negative values.
In the case of BIAS, RCMs demonstrate bias according to location and model, whereas the ANN ensemble model is relatively unbiased. Detailed results of the ANN ensemble accuracy assessment are shown in Table 4. Pearson co. (avr.) is averaged PCC for selected RCMs.
As shown in Figure 4, HadGEM3-RA for Seoul and Busan, YSU-RSM for Daejeon, and RegCM4 for Daegu were selected as the most accurate RCMs. Therefore, various RCMs are required for accurate model selection in each region. The Pearson coefficient of the ANN is higher than that of the averaged RCM precipitation in all cases.
In Table 4, the most accurate ANN simulation cases were selected by analyzing the MAE, MSE, and G values, which are the comparison indices of measured and predicted values. The ANN showed the highest accuracy when 4 or 5 RCMs were used.
In the case of Seoul, accuracy was high except when using the YSU-RSM model, which had the lowest Pearson coefficient of 0.31 in Table 2. In the case of Daejeon, accuracy was high except when using the SNU-MM5 model, and in the case of Busan, accuracy was high except when using the RegCM4 model, which had the lowest Pearson coefficient. In the case of Daegu, the case using all RCMs was the most accurate. Figure 5 shows ANN ensemble simulation results for the optimal cases selected in Table 4. Table 5 gives a basic statistical analysis of the precipitation in Figure 5. The ANN ensemble was selected on the basis of the MAE, MSE, and G values and was compared with the RCM that had the highest correlation among the 5 RCMs.

Graph Analysis.
In Seoul, Case 1-2 is the most accurate. It uses 4 RCMs, not including the YSU-RSM model. Some monthly precipitation values estimated for Seoul exceed 1,000 mm; that is, an amount of precipitation corresponding to annual rainfall is simulated for a single month. The HadGEM3-RA had the highest accuracy among the five RCMs used. However, it was not reliable in predicting extreme precipitation. Instead, predictions of monthly rainfall patterns proved better than in other cities.
The ANN ensemble produced a pattern similar to that of the HadGEM3-RA precipitation, which was the optimal RCM for Seoul. In the case of Daejeon and Daegu, the YSU-RSM model achieved the results closest to the actually observed precipitation. In certain periods, precipitation by RCM was overestimated. The model selected through the ANN ensemble was adjusted to be close to the observed precipitation.

Advances in Civil Engineering
In Busan, the ANN ensemble shows a similar pattern to the optimal RCM. However, it was overestimated in certain periods.
The observed average monthly precipitation in Busan is 124 mm, whereas the average monthly precipitation by HadGEM3-RA is 98 mm, an underestimate compared with the observed data in Table 5. The ANN ensemble average is 116 mm, which is closer to the observed rainfall than HadGEM3-RA. However, against the observed maximum monthly rainfall of 886 mm, the maximum was 452 mm for HadGEM3-RA and 316 mm for the ANN ensemble. Even during periods when no precipitation was observed, a certain amount of precipitation was predicted by the models. The results showed that the ANN ensemble predicted precipitation closer to the observed precipitation than the single RCM. In particular, the ANN ensemble produced more reliable patterns than the overestimating RCM results in inland cities such as Daejeon and Daegu.

Comparison between ANN and Multiple Regression Analysis
In Section 5, multiple regression analysis (MRA) was performed and compared with the ANN results. MRA is one of the most widely used statistical techniques. The ANN simulation results of the target cities were compared with it using the various statistical analysis methods.

Generation of Equation by Multiple Regression Analysis.
Multiple regression analysis (MRA) is used to identify relationships between highly correlated variables and to predict the values of variables. A regression model with one independent variable is called simple regression analysis; when one independent variable does not sufficiently explain the dependent variable, multiple regression with several independent variables is used. The multiple linear regression model with $k$ independent variables is expressed as [56,57]

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,$$

where $x_i$ ($i = 1, \ldots, k$) is the independent variable (precipitation by RCMs), $y$ is the dependent variable (observed precipitation), $\beta_i$ ($i = 1, \ldots, k$) is the regression coefficient, and $\beta_0$ is the intercept of $y$. MRA was performed on the ANN simulation case selected as most appropriate for each city. The selected cases are Case 1-2, Case 2-2, Case 3-1, and Case 4-2. The MRA equations generated for these 4 cases are shown in equations (8)-(11). The precipitation unit is mm per month for the 4 cities.
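How the coefficients of such a model can be obtained is sketched below in Python. This is a minimal illustration using ordinary least squares solved through the normal equations; it is not the software or procedure used in the study, and the function names and data are hypothetical.

```python
def fit_mra(x_rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    x_rows holds one list of k predictor values per sample.
    Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + list(r) for r in x_rows]  # leading 1 for the intercept
    p = len(rows[0])
    # Normal equations (X^T X) b = X^T y as an augmented matrix
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for k in range(p):
            if k != col:
                factor = m[k][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[col])]
    return [m[i][p] for i in range(p)]


def predict_mra(beta, x):
    """Evaluate the fitted regression equation for one sample."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Hypothetical data: two predictors standing in for two RCMs
x_rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y_obs = [3.5, 5.5, 5.2, 8.4, 7.9]
beta = fit_mra(x_rows, y_obs)
residuals = [predict_mra(beta, x) - t for x, t in zip(x_rows, y_obs)]
print(beta)
print(sum(residuals) / len(residuals))
```

Because the model includes an intercept, the residuals of a least-squares fit average to zero on the data used for fitting, up to rounding error.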

Comparison between ANN and Multiple Regression Analysis.
Multiple regression equations (8) to (11) were used to generate precipitation; these results were then compared with the ANN precipitation through the accuracy assessment indices, as shown in Table 6. As shown in Table 6, the estimation accuracy of ANN was higher than that of MRA in all analyzed cities. For the ANN, the MAE and MSE values are lower and the G value is closer to 100, which indicates higher estimation accuracy.
In the case of MRA, the BIAS value is 0 in all cities, so the MRA estimate is less biased than that of the ANN. Comparing ANN and MRA by the MAE, MSE, and G values, the ANN ensembles reproduced the observations appropriately in all cities. Figure 6 is a graph comparing ANN and MRA to actual precipitation. The estimated precipitation was lower than the observed precipitation in all four cities. Overall, the ANN and MRA precipitation distributions were similar. The results show that the accuracy of precipitation estimation using ANN and MRA is higher than that of the RCM precipitation, but the estimates still differ from the observed precipitation in all analyzed cities. Table 7 shows the results of basic statistical analysis for the observed, ANN, and MRA precipitation. The mean precipitation values estimated by ANN and MRA are close to the observed value. In particular, the MRA mean is closer to the observed mean than the ANN mean.
In the case of minimum precipitation, months with no precipitation occurred in the observations, but the RCM-based estimates did not reproduce them well, and it is difficult to judge which of ANN and MRA is more suitable. In the case of maximum precipitation, the observed precipitation tended to be higher in each city, and neither ANN nor MRA showed a similar range of precipitation.
Since the predicted precipitation by ANN and MRA follows the results of RCMs, precipitation by reliable RCM should be prioritized. In particular, ANN ensembles did not estimate extreme precipitation appropriately because RCM precipitation did not predict actual extreme precipitation well.
According to previous studies, the uncertainty of precipitation in RCMs for East Asia is 1 to 1.5 mm/day [58]. Converted to a monthly value, this corresponds to a difference of less than 50 mm (1.5 mm/day over about 30 days is roughly 45 mm), and the differences in monthly precipitation found in this study exceed that range. In the four cities, monthly precipitation amounting to 50% of the annual average precipitation was observed, which is one reason for the larger difference between observed and estimated precipitation.

Conclusions
This study is an application of ANN in order to improve the accuracy of estimated precipitation according to climate change scenarios in Korea. For this purpose, RCM precipitation data were collected from CORDEX and the Korea Meteorological Administration. ANN was applied to major cities in Korea and the following conclusions were drawn. The five RCMs provided in Korea were compared with the observed precipitation, and the ANN ensemble model was applied to combine the advantages of each model. The results show that the ANN ensemble simulation estimates observed precipitation more appropriately than a single RCM. In addition, irregular precipitation patterns in single RCM results became more stable when the ensemble model was used.
In applying ANN to predicting the precipitation of major cities in Korea, it is most accurate to apply 4 or 5 RCMs as independent variables. When ANN is applied, optimal weighting factors should be adjusted for low and high correlation models.
In the time series analysis, the ANN ensembles are closer to the observed precipitation pattern than the single used RCM. In particular, it is shown that precipitation pattern forecasting is more effective in inland areas such as Daejeon and Daegu than coastal areas such as Busan, where the precipitation pattern is more irregular.
ANN and MRA were compared as methods for building the RCM ensemble. As a result, ANN was selected as the more accurate model in representative cities in Korea. However, the accuracy of the quantitative estimation of precipitation still needs to be improved.
An RCM of 50 km resolution was extracted by the spatial interpolation method and used for this study. Estimated precipitation may vary according to the statistical downscaling method using RCM, so that different optimal ANN models may be selected by different downscaling results. A more detailed analysis will be possible if applied to other regions in the future, and the applicability of the ANN model will see growth following the collection of more observation data. To increase model accuracy, M5Tree, MARS, LSSVR, and GEP models can be applied in future research. Also, this study was conducted on four major cities and needs to be supplemented through research in various domestic or overseas regions with different city characteristics in the future.
Since it is not possible to predict optimal precipitation using all RCMs, an additional RCM selection methodology is required for ANN application. The study classifies simulation cases using Pearson correlation coefficients, but further studies are needed to select the optimal ANN cases.

Data Availability
The RCM data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The author declares no conflicts of interest.