Development of an integrated method for long-term water quality prediction using seasonal climate forecast

The APEC Climate Center (APCC) produces climate prediction information utilizing a multi-climate model ensemble (MME) technique. In this study, four different downscaling methods, distinguished by the degree to which they utilize the seasonal climate prediction information, were developed in order to improve predictability and to refine the spatial scale. These methods include: (1) the Simple Bias Correction (SBC) method, which directly uses APCC’s dynamic prediction data with a 3 to 6 month lead time; (2) the Moving Window Regression (MWR) method, which indirectly utilizes dynamic prediction data; (3) the Climate Index Regression (CIR) method, which predominantly uses observation-based climate indices; and (4) the Integrated Time Regression (ITR) method, which uses predictors selected from both CIR and MWR. Then, a sampling-based temporal downscaling was conducted using the Mahalanobis distance method in order to create daily weather inputs to the Soil and Water Assessment Tool (SWAT) model. Long-term predictability of water quality within the Wecheon watershed of the Nakdong River Basin was evaluated. According to the Korean Ministry of Environment’s Provisions of Water Quality Prediction and Response Measures, modeling-based predictability was evaluated by using 3-month lead prediction data issued in February, May, August, and November as model input of SWAT. Finally, an integrated approach, which takes into account various climate information and downscaling methods for water quality prediction, was presented. This integrated approach can be used to prevent potential problems caused by extreme climate in advance.


Introduction
Demand from water resources managers for seasonal climate prediction information with a lead time of several months is increasing, as this information can provide key knowledge on issues such as long-term dam inflow and water quality prediction. Long-term water quality forecasts are particularly important in watershed management because they allow managers to implement proactive water quality control techniques. Utilizing long-term forecasts for proactive water quality management is becoming more important, particularly in non-point source pollution cases. Non-point source pollution flows into water bodies during rainfall events and gradually induces water quality problems such as algae (e.g. Meng et al., 2010; Xu et al., 2012). However, seasonal forecast information has yet to be widely utilized in this manner, mainly due to its high uncertainty. In addition to the issue of high uncertainty, there are large differences in spatio-temporal scale between the forecast data necessary for water quality management and the data currently provided. Therefore, spatio-temporal downscaling techniques must be developed in order to fully apply long-term climate prediction information to water quality management in a large region.
There are two approaches to downscaling: (1) dynamic, and (2) statistical. The dynamic downscaling approach is not commonly used, mainly because it is more time- and cost-intensive than the statistical approach. For this reason, the statistical downscaling approach has been used to enhance the utilization of data created using dynamical models. In previous studies, a regression method was applied to improve predictability: instead of directly using the target variables, it indirectly uses atmosphere-ocean variables from dynamic prediction data that are highly correlated with the local target variables as predictors (Kang et al., 2009, 2014). Other statistical approaches that improve predictability use only those lagged climate indices that have a high correlation with local climate variables. The lag-time-based climate index approach has been used in regions that have strong teleconnection between global climate indices and local climate variables (Bridhikitti, 2013; Hamlet and Lettenmaier, 1999; Schepen et al., 2012; Räsänen and Kummu, 2013). In Korea, there has been research on seasonal prediction using lagged climate indices that have a high correlation with precipitation and temperature on the Korean Peninsula (Kim et al., 2007; Kim and Kim, 2010). In addition, various studies have been conducted using climate indices in connection with the precipitation of the East Asian region, including the Korean Peninsula (Lee et al., 2008; Wang et al., 2008). Recently, the hybrid approach, which combines dynamically and statistically predicted climate information, has been applied to improve the predictability of the seasonal forecast (Robertson and Wang, 2012; Schepen et al., 2012).
Therefore, the objectives of this study are: (1) to develop a hybrid downscaling technique for predicting long-term precipitation and temperature on the Korean Peninsula for water resources management, by considering both the multi-model based prediction data provided by the APEC Climate Center (APCC) and the statistical prediction information based on teleconnection; and (2) to evaluate the applicability of the seasonal forecast information in long-term water quality predictions by using the predicted climate information as input to watershed modeling.

Methodology
Figure 1 shows a flow chart of the overall study: (1) Seasonal Climate Forecasting (steps j-n in the figure); (2) Temporal Downscaling (steps o and p); and (3) Long-Term Water Quality Forecasting (steps q and r). Steps j-n include the development and application of four downscaling methods to predict the monthly precipitation and temperature depending on the available weather information. Steps o and p include converting monthly forecast information to daily data at the weather station level using temporal downscaling based on the average, minimum (MIN), and maximum (MAX) of the multi-model ensemble (MME). Steps q and r include assessing the predictability of monthly water quality, obtained by aggregating daily water quality output, using predicted daily weather data as input to the watershed modeling.

Seasonal climate forecasting
The purpose of seasonal climate forecasting is to generate monthly precipitation and temperature prediction data for the target area while improving predictability by using different sources of global climate information. The overall seasonal climate forecasting technique combines four different downscaling methods according to the degree of using dynamic prediction data produced by global climate models (GCMs). These methods include: (1) the Simple Bias Correction (SBC) method, which directly uses APCC's climate prediction data with a 3 to 6 month lead time; (2) the Moving Window Regression (MWR) method, which indirectly utilizes the dynamic prediction data; (3) the Climate Index Regression (CIR) method, which predominantly uses the observation-based climate indices without using any prediction data; and (4) the Integrated Time Regression (ITR) method, which uses predictors selected from both CIR and MWR. Since predictability on the Korean Peninsula may differ depending on the target month and selected method, predictability was evaluated using the simple average of all available forecast information.
Simple Bias Correction (SBC) is a forecast-based direct downscaling method that uses GCM prediction data and adjusts the monthly mean of predicted precipitation and temperature through a simple bias correction. For example, if precipitation and temperature prediction data for the Korean Peninsula are needed, SBC directly uses the grid values of the precipitation and temperature variables produced by GCMs over the corresponding area. The systematic bias is adjusted by a ratio for precipitation and by an additive offset for temperature, so that the monthly average of the prediction equals the average of the observation for the same period.
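The ratio and additive corrections described for SBC can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names and all numerical values are hypothetical, and the correction is applied to a single grid-averaged monthly series.

```python
import numpy as np

def bias_correct_precip(forecast, hindcast_mean, observed_mean):
    """Multiplicative correction: scale by the ratio of observed to predicted means."""
    return np.asarray(forecast, dtype=float) * (observed_mean / hindcast_mean)

def bias_correct_temp(forecast, hindcast_mean, observed_mean):
    """Additive correction: shift by the difference of observed and predicted means."""
    return np.asarray(forecast, dtype=float) + (observed_mean - hindcast_mean)

# Made-up values for one calendar month over three years (illustration only).
gcm_precip = np.array([80.0, 120.0, 100.0])   # mm/month, mean 100
obs_precip = np.array([100.0, 140.0, 120.0])  # mm/month, mean 120
gcm_temp = np.array([10.0, 12.0, 14.0])       # deg C, mean 12
obs_temp = np.array([11.0, 13.0, 15.0])       # deg C, mean 13

corrected_precip = bias_correct_precip(gcm_precip, gcm_precip.mean(), obs_precip.mean())
corrected_temp = bias_correct_temp(gcm_temp, gcm_temp.mean(), obs_temp.mean())
```

After correction, the mean of each predicted series matches the observed mean over the same period, which is the property the method is designed to enforce.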
Moving Window Regression (MWR), which is similar to Kang et al. (2009) in concept and methodology, is a forecast-based indirect statistical downscaling method. It uses proxy variables produced by GCMs as predictors of a regression model when a high correlation exists between the proxy variables and the regional target variables. If there are limitations in directly predicting target variables such as precipitation (PREC) and temperature at 2 m (T2M) in the target area, the MWR method uses oceanic and atmospheric circulation variables as predictors to improve seasonal predictability in the target region. Available proxy variables provided by APCC include temperature at 850 hPa (T850), zonal wind at 200 hPa (U200), meridional wind at 200 hPa (V200), zonal wind at 850 hPa (U850), meridional wind at 850 hPa (V850), geopotential height at 500 hPa (Z500), sea level pressure (SLP), and sea surface temperature (SST). In this study, only climate information from the latitudinal range of −40° to 40° (centered on the equator) was used for the predictor selection procedures.
Climate Index Regression (CIR) is an observation-based indirect statistical downscaling method that can be used when there is a high lagged correlation between global climate indices and regional target variables. For real-time operation of CIR in predicting monthly precipitation and temperature using climate indices, the lag time between the monthly precipitation/temperature and the indices should be larger than the lead time. The CIR method is similar to the MWR method in that both methods indirectly utilize the correlation between regional target variables and global-scale climate variables related to oceanic and atmospheric circulation. The two methods differ in how predictors are selected to forecast future seasonal target variable values. While the MWR method uses simultaneous proxy predictors that are predicted by GCMs, the CIR method uses climate information observed a few months earlier, taking the lag time into account. A 2 to 6 month range of lag time was used in this study for 3-month lead forecasting.
Integrated Time Regression (ITR) is an indirect statistical downscaling method that uses both forecast-based and observation-based predictors, from the MWR and CIR methods respectively. As a result, it can be used only when the MWR and CIR methods both select predictors for a particular target period. From the best predictors determined by the MWR and CIR methods, the final predictors for the multivariate regression model are selected through Akaike Information Criterion (AIC) analysis.
The statistical downscaling model was constructed separately for each month and target variable. The concepts of both cross-validation and split-validation were applied in developing statistical downscaling methods such as MWR and CIR in order to prevent overfitting, which can occur when constructing statistical forecasting models. The leave-one-out cross-validation (LOOCV) technique was applied to the observation period. In other words, when predicting target variables for a specific target period (year/month), all predictors for the same target period are removed from the model construction procedure in order to reproduce the same conditions as real-time forecasting. For example, when predicting for January 1983, only predictors from January 1984 to 2013 are utilized in constructing the regression model. Predictions are made in the same way for the rest of the simulation period. For each cross-validation process, the split-validation approach was applied, and the best predictors that showed consistent performance for both training and verification periods were finally selected.
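The lag screening behind CIR can be illustrated with a short sketch that correlates a target month's series with a climate index observed 2 to 6 months earlier. The function name, the array layout (years by calendar months), and the synthetic data are assumptions made for illustration; the study's actual selection procedure also involves cross-validation, which is not shown here.

```python
import numpy as np

def lagged_correlations(index_monthly, target_by_year, target_month, lags=range(2, 7)):
    """Correlate a target-month series with a climate index at several lags.

    index_monthly: array of shape (n_years, 12), index value per year and month.
    target_by_year: array of shape (n_years,), target value in `target_month` (1..12).
    Returns {lag_in_months: Pearson correlation}.
    """
    index_monthly = np.asarray(index_monthly, dtype=float)
    target = np.asarray(target_by_year, dtype=float)
    n_years = index_monthly.shape[0]
    flat = index_monthly.ravel()  # chronological monthly series
    result = {}
    for lag in lags:
        # Position of the predictor month in the flattened series for each year.
        pos = np.arange(n_years) * 12 + (target_month - 1) - lag
        valid = pos >= 0  # drop years whose predictor month precedes the record
        if valid.sum() < 3:
            continue
        result[lag] = float(np.corrcoef(flat[pos[valid]], target[valid])[0, 1])
    return result

# Synthetic example: July precipitation driven by the index five months earlier.
rng = np.random.default_rng(0)
index = rng.normal(size=(31, 12))
july_precip = index[:, 1] + 0.1 * rng.normal(size=31)  # February index plus noise

corrs = lagged_correlations(index, july_precip, target_month=7)
best_lag = max(corrs, key=lambda k: abs(corrs[k]))
```

With this synthetic input the 5-month lag stands out, mirroring the way a lagged index would be picked as a candidate predictor.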
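A minimal sketch of AIC-based predictor selection for a multivariate regression is given below. It assumes ordinary least squares with Gaussian errors and an exhaustive search over subsets of the candidate predictors; the source does not state which search strategy was used, and the predictor names and data are hypothetical.

```python
import itertools
import numpy as np

def ols_aic(X, y):
    """AIC of an OLS fit with intercept under Gaussian errors: n*ln(RSS/n) + 2k."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def select_by_aic(candidates, y):
    """Return (names, aic) of the predictor subset with the lowest AIC."""
    names = list(candidates)
    best_names, best_aic = None, np.inf
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(names, r):
            X = np.column_stack([candidates[s] for s in subset])
            aic = ols_aic(X, y)
            if aic < best_aic:
                best_names, best_aic = subset, aic
    return best_names, best_aic

# Synthetic candidates: two MWR-style proxies and one CIR-style lagged index.
rng = np.random.default_rng(1)
n = 31
candidates = {
    "mwr_sst": rng.normal(size=n),
    "mwr_u200": rng.normal(size=n),
    "cir_wp_lag5": rng.normal(size=n),
}
target = (2.0 * candidates["mwr_sst"]
          + 1.5 * candidates["cir_wp_lag5"]
          + 0.3 * rng.normal(size=n))

best_names, best_aic = select_by_aic(candidates, target)
```

Exhaustive search is practical only because the candidate list is short after the MWR and CIR screening; a stepwise search would be the usual alternative for larger candidate sets.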
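The leave-one-out step can be sketched as below for a single predictor and a simple linear regression. This is an illustration only: the function name and data are hypothetical, and the split-validation applied within each fold in the study is not reproduced.

```python
import numpy as np

def loocv_predictions(x, y):
    """Predict each year from a regression fitted on all other years."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i            # withhold the target year
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        preds[i] = slope * x[i] + intercept      # predict the withheld year
    return preds

# Made-up, exactly linear data for 31 years (for example, 1983 to 2013).
predictor = np.arange(31.0)
target = 2.0 * predictor + 1.0
cv_preds = loocv_predictions(predictor, target)
```

Because the withheld year never enters the fit, the skill computed from `cv_preds` approximates what would be obtained in real-time forecasting rather than in-sample fit.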

Temporal downscaling
In addition to predicted monthly precipitation and temperature, additional long-term climate variables, including wind speed, solar radiation, and relative humidity, are necessary at the daily time scale in order to use the seasonal forecast information as inputs to watershed modeling. From a geographical standpoint, considering the spatial correlation among weather stations is very important in the temporal downscaling procedure. In this study, a sampling approach that extracts daily weather variables from past observations within the target region was selected. First, the year/month in the observed record whose regional averages are most similar to the predicted climate pattern is determined, considering both precipitation and temperature simultaneously. This was done using the Mahalanobis distance method (Mahalanobis, 1936), which accounts for the covariance between precipitation and temperature. Then, the daily weather variables for each weather station are extracted from the selected year/month.
proc-iahs.net/374/175/2016/ Proc. IAHS, 374, 175-185, 2016
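The analog-month selection used for temporal downscaling can be sketched with a Mahalanobis distance computed over precipitation and temperature. The function name and the historical values are hypothetical, and the sketch assumes the covariance is estimated from the same historical regional averages that are searched.

```python
import numpy as np

def most_similar_month(hist_precip, hist_temp, pred_precip, pred_temp):
    """Return (index, distance) of the historical record closest to the prediction."""
    hist = np.column_stack([hist_precip, hist_temp]).astype(float)
    cov = np.cov(hist, rowvar=False)          # 2x2 covariance of precip and temp
    inv_cov = np.linalg.inv(cov)
    diff = hist - np.array([pred_precip, pred_temp], dtype=float)
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared distances
    idx = int(np.argmin(d2))
    return idx, float(np.sqrt(d2[idx]))

# Made-up regional averages for the same calendar month in five past years.
years = [2001, 2002, 2003, 2004, 2005]
hist_precip = [250.0, 310.0, 180.0, 400.0, 290.0]   # mm/month
hist_temp = [24.1, 25.6, 23.5, 24.9, 26.0]          # deg C

idx, dist = most_similar_month(hist_precip, hist_temp, 305.0, 25.4)
analog_year = years[idx]  # daily station records would be copied from this year/month
```

Sampling whole observed months keeps the daily variables at all stations physically consistent with one another, which is the motivation given above for a sampling approach.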

Evaluating Predictability
It is necessary to conduct calibration and validation procedures first in order to evaluate the modeling-based water quality predictability using a watershed-scale model. In the case of the flow rate, the total error (Err) and the Nash-Sutcliffe Efficiency index (NSE) were used to evaluate the performance of the model based on simulated and monitored data.
For water quality components such as Total Nitrogen (TN) and Total Phosphorus (TP), a graphical approach based on trial and error was used to evaluate model performance.
In the Err and NSE expressions, O_i is the observed value, P_i is the predicted value, Ō is the observed mean, and n is the number of simulations.
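For reference, the Nash-Sutcliffe efficiency in its standard form and one common definition of total (volume) error can be written with these symbols as follows. The Err expression is an assumption; the exact form used in this study may differ.

```latex
\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} \left( O_i - P_i \right)^2}{\sum_{i=1}^{n} \left( O_i - \bar{O} \right)^2},
\qquad
\mathrm{Err}\,(\%) = \frac{\sum_{i=1}^{n} P_i - \sum_{i=1}^{n} O_i}{\sum_{i=1}^{n} O_i} \times 100
```

Under these definitions, NSE equals 1 for a perfect simulation and falls to 0 when the simulation is no better than the observed mean, while a negative Err indicates that the simulated total is lower than the observed total.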
According to the Provisions of Water Quality Prediction and Response Measures of the Korean Ministry of Environment, modeling-based predictability was evaluated by using 3-month lead prediction data forecasted in February, May, August, and November as model inputs in the Soil and Water Assessment Tool (SWAT). The ultimate goal of the modeling-based predictability evaluation is to compare the water quality output generated using long-term weather prediction information as watershed model input with the observed water quality. However, the water quality output generated using observed weather data as model input was also compared with the predicted water quality, in order to distinguish the uncertainty caused by parameterization of the watershed model from that caused by long-term climate prediction.

Study watershed and watershed model
The Wecheon watershed within the Nakdong River basin was selected for this study mainly because land use within the watershed has changed relatively little over a long period of time. The watershed has a high percentage of rural areas and a low percentage of wastewater treatment plants. As a result, in 2011, non-point source pollution was found to cause around 80 % of the total Biological Oxygen Demand (BOD) pollutant load within the watershed, which makes the load sensitive to rainfall characteristics. Therefore, the Wecheon watershed is suitable for evaluating the applicability of seasonal forecast information for watershed management purposes, given that climate factors have a greater impact than human factors.
Figure 2 shows the locations of the weather, water level, and water quality monitoring station network. The Younggok and Wecheon-B stations were selected to calibrate the flow and water quality related parameters of the selected watershed model, respectively, because their measurement points are located relatively close to each other.
The Soil and Water Assessment Tool (SWAT) model was selected in this study to simulate TN and TP movement from upland areas to streams. SWAT, which was developed by the US Department of Agriculture to predict the long-term behavior of hydrology and contaminants at large watershed scale, divides the watershed into multiple sub-watersheds in order to capture the different spatial characteristics (Neitsch et al., 2005). Each sub-watershed is subsequently subdivided into multiple Hydrologic Response Units (HRUs), grouped according to similar properties such as land use, soil, and slope. In addition to the hydrological processes, soil erosion can be simulated using the Modified Universal Soil Loss Equation (MUSLE). The transport mechanisms of substances such as nitrogen, phosphorus, and pesticides can also be simulated. Water, sediments, and nutrients that are introduced from the HRUs into the water body are simulated through a reaction mechanism within the water body (Neitsch et al., 2005).
In this study, the Wecheon watershed was divided into 29 sub-watersheds and 1866 HRUs by combining land use, soil, and the Digital Elevation Model (DEM). Two years of warm-up simulations were conducted for the initialization of the model parameters. Then, the model was calibrated and validated using measured data for 2007-2008 and 2010-2011, respectively. The trial-and-error method was used for the model calibration procedure: manual calibration of the flow rate was conducted first, and then the sediment and nutrient related parameters were calibrated sequentially.

Climate information
APCC has been collecting monthly dynamic prediction data produced by 16 institutions and has been producing 3-month and 6-month lead Multi-Model Ensemble (MME) climate forecasts every month. In this study, 3-month lead seasonal forecast data from 10 individual Global Climate Models (GCMs), regridded to a 2.5° × 2.5° resolution, were used for the SBC and MWR downscaling methods. Table 1 describes the 10 GCMs used.
For the CIR method, we used 25 real-time climate indices as predictors: 16 climate indices that are updated on a monthly basis by NOAA through its webpage (http://www.esrl.noaa.gov/psd/data/climateindices/list/), and 9 indices that are extracted monthly at APCC using the NCEP/NCAR Reanalysis 1 data (Table 2). This study utilizes the observed monthly precipitation and temperature data from the Korean Peninsula, based on 57 Korean weather stations (Fig. 1).

Evaluation of the downscaling method
Table 3 shows the prediction models that were selected for each case (month and variable). When using the Simple Bias Correction (SBC) method, a total of 21 and 36 models were selected for precipitation and temperature, respectively. For precipitation forecasting, a similar number of models were selected for different lead times (6, 7, and 8 models for 1, 2, and 3 month lead times, respectively). For temperature forecasting, the number of models varied between different lead times (20, 10, and 8 models for 1, 2, and 3 month lead times, respectively). The MWR method selected 19 models for precipitation forecasting and 9 models for temperature forecasting. For precipitation forecasting, the number of models varied between different lead times (6, 4, and 9 models for 1, 2, and 3 month lead times, respectively). For temperature forecasting, a similar number of models were selected for different lead times. When utilizing the Climate Index Regression (CIR) method, only one index (the Western Pacific Index, "WP", with a 5-month lag) was selected to predict precipitation in July, while two indices, the Pacific Warm Pool (PACWARM) with a 6-month lag and the Atlantic Tripole SST EOF (ATLTRI) with a 3-month lag, were selected to predict temperature in September and October, respectively. As a result, the ITR method was selected to forecast precipitation in July and temperature in September and October, when MWR model selections were available. Overall, the SBC method, which is based on dynamic prediction data, shows the highest number of model selections, followed by the statistical downscaling methods MWR and CIR/ITR. The SBC method shows its highest number of selected models for 1-month lead temperature prediction for September, with 6 models, while the MWR method shows its highest number for 1-month lead precipitation prediction for September, with 3 models. Figure 3 shows an example of the spatial distribution of the three predictors that were selected by the MWR method for 1-month lead precipitation prediction for September.
An evaluation of predictability when issuing forecasts every month was conducted, as shown in Fig. 4. For example, when we predict precipitation for August during the month of July, all three prediction results (the 1-month lead prediction issued in July, the 2-month lead prediction issued in June, and the 3-month lead prediction issued in May) can be used. Figure 5 illustrates an evaluation of predictability using a simple average of multi-model predictions.
Figure 5 shows the temporal correlation coefficient for each month according to changes in lead time. For precipitation prediction, there were difficulties in selecting models for January, March, May, and June. The months of February, July, and December show Temporal Correlation Coefficient (TCC) values that are greater than 0.6 for most lead times. In December, when the selected models are based on dynamic model predictions, TCC values decrease as the lead time increases. Figure 5 also shows that there are difficulties in selecting prediction models for temperature predictions in February, March, April, and June. The greatest TCC values occurred in September, when most of the model selections are based on the SBC method, directly using dynamic prediction data.
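The simple averaging of available predictions and the TCC score can be sketched as follows. This is an illustration with made-up numbers: the function names are hypothetical, TCC is taken here to be the Pearson correlation across years between predicted and observed monthly values, and NaN marks a lead time for which no model was selected.

```python
import numpy as np

def simple_average(predictions_by_lead):
    """Average over lead times (rows), ignoring entries with no selected model (NaN)."""
    return np.nanmean(np.asarray(predictions_by_lead, dtype=float), axis=0)

def tcc(predicted, observed):
    """Temporal correlation coefficient: Pearson correlation across years."""
    return float(np.corrcoef(predicted, observed)[0, 1])

# Made-up August precipitation (mm) for five years.
observed = np.array([110.0, 95.0, 130.0, 80.0, 120.0])
by_lead = np.array([
    [105.0, 100.0, 125.0, 85.0, 115.0],   # 1-month lead, issued in July
    [100.0, np.nan, 120.0, 90.0, 110.0],  # 2-month lead, issued in June
    [115.0, 90.0, 135.0, 75.0, 125.0],    # 3-month lead, issued in May
])

mme_mean = simple_average(by_lead)
score = tcc(mme_mean, observed)
```

Averaging over whatever lead times are available lets a forecast be issued every month even when some lead times have no selected model.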

Long-term water quality forecast
Table 4 shows the monthly Total Error (Err) and Nash-Sutcliffe Efficiency Index (NSE) for streamflow during the calibration and validation periods. The absolute Err values were less than 5 % for both calibration and validation periods (−1.6 and −1.1 %, respectively). NSE values were greater than 0.9 for both calibration and validation periods (0.95 and 0.92, respectively). This indicates that the streamflow simulation using the Soil and Water Assessment Tool (SWAT) is satisfactory. Figures 6 and 7 compare the simulated and observed TN and TP loads, respectively, for the calibration and validation periods. Figure 8 shows the comparison of observed pollutant loads (Observed-WQ), simulated pollutant loads using observed weather data (SWAT-Observed), and simulated pollutant loads using forecasted MME data (SWAT-Forecast). When comparing Observed-WQ and SWAT-Forecast, the temporal correlation coefficient (TCC) was 0.22 for TN. This low TCC value may be due to the overestimation by SWAT-Forecast during the period of 2007-2010. However, TCC increased to 0.58 when comparing SWAT-Forecast to SWAT-Observed, due to their good temporal agreement. A comparison of TP load shows a trend similar to the TN results, with TCC values of 0.05 and 0.49 when SWAT-Forecast was compared to Observed-WQ and SWAT-Observed, respectively. We found that the uncertainty from the SWAT parameterization was higher than the uncertainty from the climate forecast, as illustrated by the large differences in their TCCs. The high uncertainty in model parameterization may be caused by the lack of sampled data used to estimate monthly pollutant loads. TN and TP concentrations were measured around four times every month, and the monthly pollutant load was estimated by multiplying the concentration and flow rate. As a result, monthly pollutant loads can be affected by the flow rate at the time of sampling.
Table 5 shows the monthly comparison of TCC values according to the reference used as observation, either Observed-WQ or SWAT-Observed. When SWAT-Forecast is compared to Observed-WQ, only March has a TCC value that is greater than 0.5 for TN, and all months have TCC values that are less than 0.5 for TP. However, when SWAT-Forecast is compared to SWAT-Observed, the three months from March to May have TCC values that are greater than 0.5 for both TN and TP predictions. Even though there were difficulties in selecting models between March and May for both precipitation and temperature, this period had the highest predictability in long-term water quality forecast. On the other hand, water quality predictions in July and December, when predictability in precipitation prediction was high, had low TCC values. This indicates that marginally higher predictability in precipitation and temperature predictions does not guarantee higher predictability in long-term water quality prediction.
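The sensitivity of the estimated monthly load to sampling time can be illustrated with a small calculation. The sketch assumes concentrations in mg/L, flows in m³/s, and a monthly load obtained by averaging the instantaneous loads at the sampling times and scaling by the number of days; the function name, the aggregation rule, and all values are hypothetical.

```python
def monthly_load_kg(conc_mg_per_l, flow_m3_per_s, days_in_month):
    """Estimate a monthly load (kg) from a few paired concentration/flow samples."""
    if len(conc_mg_per_l) != len(flow_m3_per_s):
        raise ValueError("concentration and flow samples must be paired")
    # 1 mg/L = 1 g/m^3, so conc * flow is in g/s; 86400 s/day and 1000 g/kg
    # give a factor of 86.4 to convert to kg/day.
    daily_loads = [c * q * 86.4 for c, q in zip(conc_mg_per_l, flow_m3_per_s)]
    return sum(daily_loads) / len(daily_loads) * days_in_month

# Four samples in a 30-day month, all taken at base flow.
load_base = monthly_load_kg([2.0, 2.0, 2.0, 2.0], [10.0, 10.0, 10.0, 10.0], 30)

# Same concentrations, but one sample happens to coincide with a storm flow.
load_storm = monthly_load_kg([2.0, 2.0, 2.0, 2.0], [10.0, 10.0, 40.0, 10.0], 30)
```

With only about four samples per month, a single sample taken during high flow changes the estimated load substantially, which is consistent with the explanation given above for the low agreement with Observed-WQ.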

Conclusions
Four different downscaling methods, distinguished by the degree to which they utilize seasonal climate prediction data, were developed in order to improve predictability and refine the spatial scale. These methods include: (1) the Simple Bias Correction (SBC) method, which directly uses the APCC's climate prediction data with a 3-month lead; (2) the Moving Window Regression (MWR) method, which indirectly utilizes prediction data; (3) the Climate Index Regression (CIR) method, which predominantly uses observation-based climate indices; and (4) the Integrated Time Regression (ITR) method, which uses predictors selected from both the CIR and MWR methods. Overall, the SBC method, based on dynamic prediction data, shows the highest number of model selections, followed by the statistical downscaling methods MWR and CIR/ITR. Then, sampling-based temporal downscaling using the Mahalanobis distance method was conducted in order to create daily weather inputs to the Soil and Water Assessment Tool (SWAT) model. Long-term predictability of water quality within the medium-size Wecheon watershed of the Nakdong River Basin was evaluated. According to the Provisions of Water Quality Prediction and Response Measures of the Korean Ministry of Environment, modeling-based predictability was evaluated by using 3-month lead prediction data forecasted in February, May, August, and November as model inputs of SWAT. The results indicate that marginally higher predictability in precipitation and temperature predictions does not guarantee higher predictability in long-term water quality predictions.
Finally, we presented an integrated approach that takes into account various climate information and downscaling methods for water quality prediction, which can be used to proactively prevent potential problems caused by extreme climate events.

Data availability
Seasonal forecast data by individual models are available at the APCC Data Service System website (APCC) (http://adss.apcc21.org/). Monthly climate indices data are available at the Climate Indices website (APCC) (http://www.apcc21.

Figure 1. Flow chart of integrated seasonal climate and water quality forecasting based on a modeling approach.

Figure 2. (a) Location of the monitoring network; (b) land use; and (c) elevation of the Wecheon watershed.

Figure 3. Spatial distribution of selected variables by the NCEP, PNU, and POAMA models for 1-month lead precipitation predictions in September (yellow indicates most frequent selection through the cross-validation procedures from 1983 to 2013).

Figure 4. Description of the monthly prediction approach and use of different amounts of prediction data according to lead time.

Figure 5. Temporal correlation coefficients (TCC) according to changes in lead time for predicting precipitation (top) and temperature (bottom) using multi-model ensemble (MME) average.

Figure 6. Comparison of simulated and observed total nitrogen (TN) load for calibration (top) and validation (bottom) periods.

Figure 7. Comparison of simulated and observed total phosphorus (TP) load for calibration (top) and validation (bottom) periods.

Figure 8. Comparison of observed pollutant loads (Observed-WQ), simulated pollutant loads using observed weather data (SWAT-Observed), and simulated pollutant loads using forecasted MME data (SWAT-Forecast) at Wecheon-B water quality monitoring station.

Table 1. Description of ten GCMs used as predictors.

Table 2. Monthly updated climate indices used for seasonal prediction.
* Data source is NOAA; the remaining indices are from APCC.

Table 3. Selected downscaling method and models for each month according to different lead times and variables.

Table 4. Results of SWAT calibration and validation for monthly streamflow.