The development of statistical downscaling methods for assessing the effects of climate change on the precipitation isotopes concentration

In recent years, stable isotopes of the water molecule (oxygen-18 and deuterium) have become a useful tool for tracking the water cycle. The concentration of these tracers changes as water molecules move through the water cycle. Because of this feature, global large-scale isotope models have been developed. In parallel, numerous local and global networks have been created to monitor the concentration of isotopes in precipitation. The main problem with the simultaneous use of these local stations and the large-scale isotope datasets is their temporal and spatial mismatch. To use both isotope databases for monitoring the hydrological cycle at the local scale, it is necessary to downscale the outputs of the large-scale models. In this research, a downscaling approach is proposed for isotope concentrations using three statistical models: multiple linear regression, generalized linear, and weighted least squares regression models. The results indicate that the statistical downscaling method, when combined with data preprocessing based on seasonal changes and spatial variations and with the selection of a suitable method, is a useful tool for monitoring the climate change of a region using information on the stable oxygen-18 isotope.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
doi: 10.2166/wcc.2020.208

Maryam Mosaffa
Sara Nazif (corresponding author)
School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
E-mail: snazif@ut.ac.ir
Youssef Khalaj Amirhosseini
Water and Environmental Kowsar Research Center, No. 33, Fatemi St., Tehran, Iran


INTRODUCTION
Along with the other available methods for identifying basins and the connections among the components of the hydrological cycle, isotope tracing has become an effective tool for better understanding hydrological processes, offering new insight into present and past climate.
During recent decades, many researchers have used the stable isotopes of the constituent elements of water (hydrogen and oxygen) to investigate the connections among the components of the hydrological cycle (Bowen & Revenaugh ).
Differences in the physical properties of water molecules lead to changes in the stable isotope composition of water through fractionation during phase changes (Kim & Lee ).
In general, the isotopologues of the water molecule behave differently during phase transitions.
The fractionation of water isotopologues leads to a significant correlation between isotope concentrations and climatic variables (Sutanto et al. ). Among the tracers used in hydrology, the oxygen-18 concentration is sensitive to the phase changes of water during its circulation. It has therefore been used as a natural tracer of the hydrological cycle since the 1960s, because the isotopic characteristics of water depend on its phase changes during evaporation and condensation as well as on geographic and temporal variations. Using isotope information in precipitation and vapor, one can study atmospheric vapor cycling processes on various scales, such as large-scale transport and in-cloud processes (Rozanski & Gonfiantini ). With the assistance of the World Meteorological
In addition to simulating the spatio-temporal distribution of the isotopes, the IsoGCM models, which simulate the concentrations of the isotopes of the water molecule, can also interpret the effects of kinetic and equilibrium fractionation on the isotope concentrations during a global hydrological cycle (Haese et al. ).
Researchers created the IsoGCM models by accounting for the fractionation process in all transfer phases, such as evaporation, condensation, precipitation, and atmospheric vapor, and adding it to the GCMs. In the general modeling of the isotopes, the Earth's atmosphere is divided into hundreds of air columns. The parameters of the dynamic differential equations of the atmosphere, such as solar radiation, cloud, and convective flow, are calculated in these columns and transferred to the neighboring columns. In each transfer, the concentration of isotopes is updated based on the fractionation among the different phases of the air mass.
Depending on the conditions and structure of each of the applied IsoGCM models, the use of isotopes in the simulated atmospheric variables also varies (Hoffmann et
Although large-scale isotopic models are used to improve our understanding of the physical relationships between climate variables and water isotopes, the outputs of these models do not have sufficient resolution to assess climate change at the regional scale (Dansgaard ). Climate change assessment studies are usually carried out at high spatial resolution over small areas, so the outputs of these models need to be downscaled. Choosing the technique for downscaling the global simulation models' information is one of the primary challenges in using data from these models at a local or regional scale. Regional dynamic models and statistical methods are the two common techniques for downscaling the information of large-scale models. To use isotopic data in climate change studies, the data must be spatially downscaled so that they can be used alongside the weather characteristics. This issue has received little attention in the literature, and few models and methods are available for this purpose. The limited number of isotopic measurements makes the task more difficult. Therefore, the aim of this study is to develop a downscaling model that takes the limited available data into account by developing relations between weather and isotopic variables.

THE STUDY AREA
Shiraz, the capital of Fars province with an area of about 8,725 square kilometers, accounts for approximately 7.6% of Iran's total area and its altitude is 1,486 m above sea level. Shiraz is located at 53°34′ east of the Greenwich meridian and 29°36′ north of the equator. Based on the meteorological data for the 45-year statistical period for this region, the average annual temperature in this city is 18.04 °C and the total annual precipitation in the province is 320.2 mm. The average long-term annual precipitation of Shiraz is reported as 100.4 mm. The majority of rainfall in the study area is due to winter air masses. The Mediterranean Sea is the main moisture source of the study area during winter and affects the area from the northwest, west, and southwest.

DATASETS
In this study, two datasets are used for the calibration and verification of the proposed statistical downscaling model of the oxygen-18 concentration in the monthly precipitation of Shiraz. The first set includes the IsoGCM model data. The second set, used for model validation, consists of the data recorded in the study area, which include rainfall, temperature, humidity, and oxygen-18 concentration in precipitation. Since the number of local measurements is very small, this information is only used to evaluate the performance of the equations at the regional level. To study the effects of climate change, the isotope concentration has first been weighted by the precipitation amount of the model, and its monthly average has then been used for the simulation (Delavau et al. ). Each of these databases is introduced in detail below.

Local database
To evaluate the efficiency of the proposed downscaling models in the study area, the records of the meteorological station located in Shiraz have been used.
The meteorological station data include daily temperature, precipitation, wind speed, and relative humidity. The observation period of the meteorological data at this station is from 1983 to 2017. The daily data have been provided by the Iran Meteorological Organization and aggregated to a monthly time scale for use in this study. The average monthly temperature recorded at this station is 17.5 °C and its total annual precipitation is 312 mm. Along with the meteorological station data, 12 precipitation samples from the city of Shiraz were analyzed for the isotopic concentrations of deuterium and oxygen-18 during the data downscaling period.
The isotope concentrations of the precipitation samples are expressed in δ notation relative to VSMOW (Vienna Standard Mean Ocean Water). The isotopic abundance ratios are expressed in parts per mil (‰) as deviations from VSMOW, and δ is calculated according to the following relation:
$$\delta = \left(\frac{\alpha_{\text{sample}}}{\alpha_{\text{standard}}} - 1\right) \times 1000 \qquad (1)$$
where α represents the concentration ratio of heavy to light isotope in the measured and standard samples. Currently, VSMOW is declared by the IAEA as the standard defining the isotopic composition of water. The concentrations of the oxygen-18 and deuterium isotopes of the samples have been measured using the Los Gatos Research (LGR) setup. The isotope concentration is weighted by the precipitation amount and its weighted average is then evaluated:
$$\delta^{18}O_{w} = \frac{\sum_{i} P_i \, \delta^{18}O_i}{\sum_{i} P_i} \qquad (2)$$
where i, $P_i$, and $\delta^{18}O_i$ denote the month index, the average monthly precipitation, and the monthly isotopic concentration of oxygen-18, respectively.
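The δ notation and the precipitation-weighted average described above can be illustrated with a minimal Python sketch. The function names and all numerical values are hypothetical and chosen only for illustration; they are not data from the study, and the formulas are the standard textbook forms, assumed here to match the paper's relations.

```python
def delta_permil(ratio_sample, ratio_standard):
    """Delta value in per mil: relative deviation of the sample's
    heavy-to-light isotope ratio from that of the standard."""
    return (ratio_sample / ratio_standard - 1.0) * 1000.0


def precipitation_weighted_mean(precip, delta18o):
    """Amount-weighted mean: sum(P_i * delta_i) / sum(P_i)."""
    if len(precip) != len(delta18o):
        raise ValueError("precip and delta18o must have the same length")
    total = sum(precip)
    if total <= 0:
        raise ValueError("total precipitation must be positive")
    return sum(p * d for p, d in zip(precip, delta18o)) / total


# Hypothetical monthly values (not measurements from Shiraz).
monthly_precip_mm = [40.0, 10.0, 50.0]
monthly_delta18o = [-6.0, -2.0, -8.0]  # per mil
weighted_delta = precipitation_weighted_mean(monthly_precip_mm, monthly_delta18o)
print(f"precipitation-weighted d18O: {weighted_delta:.2f} per mil")
```

Months with more rainfall dominate the weighted mean, which is why a wet month with a strongly depleted signal pulls the result below the simple arithmetic average.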
In the downscaling of the IsoGSM model data, 70% of the data in the study period have been randomly selected to train the regression model and the remaining 30% used for validating the model. The IsoGSM output has a 6-hour time step and their monthly average is
Two spatial scenarios are considered:
(1) using Point 1 and Point 3 information in developing the downscaling model, because the local sampling point and Points 1 and 3 are located at approximately the same latitude;
(2) using Point 1 to Point 4 information in developing the downscaling model, considering the sources of moisture in the study area and the wind direction.
isotope in precipitation (step D). Figure 2 depicts the flowchart of the methodology used in the present study. The steps outlined above are briefly introduced below.
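The random 70%/30% division of the data into training and validation subsets mentioned above could be sketched as follows. The function name, the fixed seed, and the 120 placeholder records are assumptions made for reproducibility of the illustration; the study does not state how its random selection was implemented.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and validation subsets."""
    shuffled = list(records)
    # A seeded generator makes the illustrative split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical example: 120 monthly records identified by their index.
monthly_records = list(range(120))
train, validation = split_train_validation(monthly_records)
print(len(train), len(validation))
```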
Step A Since a method is being developed for estimating the concentration of the oxygen-18 isotope in rainfall, the days on which no rainfall has occurred or the amount of precipitation is small must be deleted from the database. In this study, a boundary value of 0.5 mm of precipitation per day has been used. The effects of season and temperature on the concentration of precipitation isotopes are considered by categorizing the data according to the seasons.
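As a minimal sketch of Step A, the filtering of dry days and the seasonal grouping might look like the following. The record layout, the month-to-season mapping (meteorological seasons), and the choice to keep days with exactly 0.5 mm are assumptions; the study's own definitions may differ.

```python
WET_DAY_THRESHOLD_MM = 0.5  # boundary value stated in the text

# Assumed meteorological seasons; the study's season definition may differ.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def keep_wet_days(records, threshold=WET_DAY_THRESHOLD_MM):
    """Drop days whose precipitation is below the threshold."""
    return [r for r in records if r["precip_mm"] >= threshold]


def group_by_season(records):
    """Group records into a dict keyed by season name."""
    groups = {}
    for record in records:
        season = SEASON_BY_MONTH[record["month"]]
        groups.setdefault(season, []).append(record)
    return groups


# Hypothetical daily records (month number, precipitation in mm).
daily = [
    {"month": 1, "precip_mm": 12.0},
    {"month": 1, "precip_mm": 0.2},
    {"month": 4, "precip_mm": 0.0},
    {"month": 4, "precip_mm": 3.1},
    {"month": 11, "precip_mm": 0.5},
]
seasonal = group_by_season(keep_wet_days(daily))
print({season: len(rows) for season, rows in seasonal.items()})
```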
Step B In this step, the predictor variables are selected.
Step C Three regression models (MLR, GLM, and WLSR) are used for downscaling.

C-1) MLR
The MLR model is defined as:
$$y = \beta_0 + \sum_{i=1}^{n} \beta_i X_i \qquad (3)$$
where y and $X_i$ denote the value of the dependent variable ($\delta^{18}O$) and the predictor variables selected in step B (precipitation, temperature, and relative humidity), respectively. $\beta_0$ is the intercept and $\beta_i$ stands for the regression coefficient, estimated by means of the least squares method. In the MLR approach, it is assumed that there is no multicollinearity among the predictor variables. This assumption is assessed using the variance inflation factor (VIF), which can be obtained as below:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} \qquad (4)$$
where $R_i^2$ is the coefficient of determination of the model in which the independent variable $X_i$ is regressed on the remaining ones.
This factor indicates how much the variance of the estimated coefficients is inflated compared to the case in which the predictor variables are not linearly correlated. As an empirical rule, VIF values greater than 5 indicate high multicollinearity. A high level of collinearity means that the independent variables are correlated with one another, and the selection of the predictor variables should be reconsidered. Since the independent and dependent data have different measurement units and the range of values differs for each variable, the data must be standardized before the analysis according to the following relation:
$$Nx_i = \frac{x_i - \mu}{\sigma} \qquad (5)$$
where μ and σ stand for the average and standard deviation of the data, respectively, and $Nx_i$ is the standardized value, which has a mean of 0 and a variance of 1.
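The standardization, the least squares fit of a multiple linear regression, and the VIF check described above can be sketched in plain Python as follows. All function names and data values are hypothetical; the population standard deviation is assumed in the standardization, and the normal equations are solved with a small Gaussian elimination routine so that the sketch needs no external libraries.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return (x - mean) / std, giving mean 0 and variance 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least squares coefficients [b0, b1, ...] for y = b0 + sum(b_j * X_j)."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    k = len(design[0])
    xtx = [[sum(d[p] * d[q] for d in design) for q in range(k)] for p in range(k)]
    xty = [sum(d[p] * yi for d, yi in zip(design, y)) for p in range(k)]
    return solve(xtx, xty)


def r_squared(columns, y):
    """Coefficient of determination of the OLS fit of y on the columns."""
    beta = fit_ols(columns, y)
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row))
              for row in zip(*columns)]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF_i = 1 / (1 - R_i^2), regressing each predictor on the others."""
    factors = []
    for i, target in enumerate(columns):
        others = columns[:i] + columns[i + 1:]
        factors.append(1.0 / (1.0 - r_squared(others, target)))
    return factors


# Hypothetical monthly predictors (not data from the study).
precip = [30.0, 12.0, 45.0, 5.0, 22.0, 60.0, 8.0, 35.0]
temp = [8.0, 15.0, 6.0, 22.0, 12.0, 5.0, 20.0, 9.0]
humidity = [62.0, 48.0, 70.0, 35.0, 50.0, 74.0, 41.0, 66.0]
predictors = [standardize(c) for c in (precip, temp, humidity)]
vifs = vif(predictors)
print([round(v, 2) for v in vifs])
```

With real data, any predictor whose VIF exceeds the empirical limit of 5 would be a candidate for removal before fitting the regression.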

C-2) GLM
The GLM is a method used to simulate the linear relationships between several independent variables and a dependent one, and it is assumed in this method that all the investigated data follow a normal distribution. The GLM regression model is defined as:
$$y_i = \beta_0 + \sum_{j=1}^{n} \beta_j X_{ij} + \varepsilon_i \qquad (6)$$
in which i indexes the observations, j indexes the predictor variables, and $\varepsilon_i$ is the random error with normal distribution, $\varepsilon_i \sim N(0, \sigma^2)$. Since the prediction of the relation between the dependent and independent variables is not accurate, the considered random variable as the model error prevents
In this research, the Kolmogorov-Smirnov test is implemented due to the large amount of input data. When examining the normality of the data, the null hypothesis is tested at a 5% significance level. If the p-value of the test is greater than 0.05, there is no reason to reject the null hypothesis that the data are normally distributed.
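A normality check of the kind described above can be sketched with a one-sample Kolmogorov-Smirnov statistic. This sketch is an illustration, not the study's procedure: it compares the statistic with the large-sample 5% critical value of about 1.36/√n rather than computing a p-value, and it estimates the mean and standard deviation from the sample itself, in which case the critical value is only approximate and tends to be too lenient (a Lilliefors-type correction would be stricter). All names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample):
    """Largest distance between the empirical CDF and a normal CDF
    whose mean and standard deviation are estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # Check the gap just after and just before each step of the ECDF.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def looks_normal(sample, coefficient=1.36):
    """True if D does not exceed the approximate 5% critical value."""
    return ks_statistic(sample) <= coefficient / math.sqrt(len(sample))


# Hypothetical samples: normal quantiles versus a strongly skewed series.
bell_shaped = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [0.0] * 45 + [100.0] * 5
print(looks_normal(bell_shaped), looks_normal(skewed))
```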

C-3) WLSR
The WLSR method used here for downscaling is very similar to the MLR one, with the difference that in this regression approach a weight W is assigned to each of the independent variables. This weight is equal to the inverse square of the linear regression variance fitted on the independent variables by the stepwise regression method.
The WLSR method is used to weight the independent variables of a regression when the error variance is not constant.
The weights assigned to the data in the least squares fit reflect the random behavior of the error in the simulation model.
If the parameters of a regression model are estimated using the ordinary least squares method, the error terms are assumed to be independent of each other and to have the same variance. When this assumption does not hold, more weight is assigned to the data that are more accurate and less weight to those with lower accuracy, which makes the data behave better with respect to the overall variance. Therefore, weighting the independent variables in the regression equation reduces the influence of deviating values in the time series. The equation governing the WLSR is defined in Equation (7), where $w_i$ denotes the weight of the ith variable, given by Equation (8).
Step D In order to evaluate and analyze the performance of esti-
The root mean square error is used as a measure of the difference between the simulated and measured values.
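The idea behind weighted least squares can be illustrated with a short sketch. It uses the textbook form, in which each observation is weighted by the inverse of its error variance and the weighted sum of squared residuals is minimized; this is an assumption and may differ in detail from the weighting scheme of the study. The function names, data, and variances are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_wls(columns, y, weights=None):
    """Coefficients [b0, b1, ...] minimizing sum(w_i * residual_i ** 2).

    With weights=None every observation has weight 1, which reduces
    to ordinary least squares.
    """
    design = [[1.0] + list(row) for row in zip(*columns)]
    w = list(weights) if weights is not None else [1.0] * len(y)
    k = len(design[0])
    xtwx = [[sum(wi * d[p] * d[q] for wi, d in zip(w, design)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(wi * d[p] * yi for wi, d, yi in zip(w, design, y))
            for p in range(k)]
    return solve(xtwx, xtwy)


# Hypothetical data: y = 1 + 2x except for one unreliable last point.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 30.0]
error_variance = [1.0, 1.0, 1.0, 1.0, 1.0e6]  # assumed known for illustration
weights = [1.0 / v for v in error_variance]

ols_coefficients = fit_wls([x], y)
wls_coefficients = fit_wls([x], y, weights)
print(ols_coefficients, wls_coefficients)
```

In this example the unweighted fit is pulled strongly toward the unreliable point, while the weighted fit stays close to the line defined by the four reliable points.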
This criterion, which is defined by Equation (9), is the most commonly used error index. Equation (10) describes the Nash-Sutcliffe efficiency (NSE) coefficient. An NSE value of 1 indicates a perfect match between the observed and simulated data.
This criterion indicates the relative magnitude of the residual variance compared with the variance of the observations. The ratio of the standard deviation of the simulations to the standard deviation of the observations is another way of assessing the simulation (Equation (11)).
Equation (12) represents the coefficient of determination for evaluating the simulation results. This coefficient indicates how much of the variation of the dependent variable is explained by the independent variables, with the remaining variation related to other factors: In the above equations, O io defines the simulated con-
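The four evaluation criteria named above can be sketched as follows. Since the original equations are not reproduced here, the sketch uses commonly used definitions as an assumption: RMSE, NSE as one minus the ratio of the residual sum of squares to the observed sum of squares about the mean, the ratio of simulated to observed standard deviation as worded in the text, and the coefficient of determination as the squared Pearson correlation. Function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(mean([(s - o) ** 2 for o, s in zip(observed, simulated)]))


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency; 1 means a perfect match."""
    o_bar = mean(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - ss_res / ss_obs


def std_ratio(observed, simulated):
    """Ratio of simulated to observed (population) standard deviation."""
    return pstdev(simulated) / pstdev(observed)


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    o_bar, s_bar = mean(observed), mean(simulated)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(observed, simulated))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# Hypothetical d18O values in per mil (not results from the study).
observed = [-6.0, -4.0, -8.0, -2.0]
simulated = [-5.0, -4.0, -7.0, -3.0]
print(rmse(observed, simulated), nse(observed, simulated),
      std_ratio(observed, simulated), r_squared(observed, simulated))
```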

RESULTS AND DISCUSSION
This section presents the results of the statistical downscaling of the oxygen-18 isotope concentration, following the steps described in the previous sections.
Step B The IsoGSM large-scale model has 19 outputs as defined in Tables 5 and 6 for both scenarios. As can be seen from the results of Table 5, the VIF value in both scenarios is lower than 2.5; therefore, the correlation among the independent variables can be ignored.
Step C The downscaling results of the oxygen-18 isotope concentration via the MLR (Equation (3)), GLM (Equation (6)), and WLSR (Equation (7)) models are given in Table 7. As described earlier, 70% of the data were randomly selected to train the regression model and the remaining 30% used for the model validation. In this section, the regression models introduced in step C for simulating the data of the large-scale IsoGSM model are evaluated using appropriate statistical methods.
Investigating the standardized residual values. Figure 3 depicts the standardized residuals in the verification data (30% of the whole data, separated by season) for each regression model and spatial scenario. All graphs show a random pattern in the residuals. This randomness
Box plot investigation. The simulated concentration of the oxygen-18 isotope in winter using the different models is plotted in Figure 4 for the first spatial scenario (Points 1 and 3).
Probability density function (pdf) investigation. The pdf investigation is another test for evaluating the simulation performance of the modeling methods. In both scenarios, the GLM and MLR models have been able to partially simulate the minimum limit values.
Table 7 and using the weather data of the region and then compared with the results of the observational data (Table 1). Figure 7 shows the comparison of the regression models' results with the regional observational data in both spatial scen-
The above methodologies have their limitations. The main limitation of this study is the small sample size of isotopic data, which is mostly due to the high cost of these kinds of measurements. Using a larger sample of isotopic data would be expected to improve the accuracy of the study results. Furthermore, other weather variables such as geopotential height and surface pressure could be taken into account in developing the downscaling model. They are not considered in this study because of limitations in data accessibility.
The results of the three basic statistical downscaling models have been compared with the large-scale isotopic model data (used for validation) and with the observational results in terms of their ability to statistically downscale the oxygen-18 isotope concentration in the precipitation of Shiraz, southwest Iran. Comparison of the simulation results with the large-scale model data and the regional observations shows that the first spatial scenario, which uses data from nodes 1 and 3, performs better than the second scenario, which uses the information from four nodes around the study area. The standardized residual error around the x-axis in the first spatial scenario (Points 1 and 3) is more symmetric than in the second one. This is mainly because nodes 1 and 3 lie at approximately the same latitude as the study area. Comparison of the seasonally separated results also shows the effect of season and time variations on the concentration of isotopes.
Simulating oxygen-18 isotope concentration using the seasonal equations is more accurate than their annual simulation.
Therefore, it can be concluded that the statistical downscaling method, when combined with data pre-processing based on seasonal and spatial changes and with the selection of a suitable method, is a useful tool for monitoring the climate change of a region based on stable oxygen-18 isotope information. Comparing the results obtained via the improved statistical simulation models with the observational data and with those of the global large-scale isotopic model (used for verification) indicates that the WLSR and MLR models perform best among the three selected statistical methods.
Using the equations presented in this research, one can draw the monthly concentration distribution map of oxygen-18 in the precipitation of Shiraz from meteorological variables (precipitation, temperature, and humidity) and use it for long-term ecological, hydrological, and hydrogeological studies by means of isotope tracers.
A review of technical literature shows that the isotopic concentration is affected by the type of precipitation. For further studies, downscaled isotopic data can be used for the investigation of climate change impacts on the type of rainfall.