Application of Weighted Semivariogram Model (WSVM) based on fitness to experimental semivariogram on estimation of rainfall amount

mated semivariograms and weighted factors, which are related with the inverse of the objective function value associated with the optimal parameters of theoretical semivariogram models (TSVMs). A WSVM can save computation time in the estimation of rainfall amount without the identification of the best-fit TSVM commonly carried out by the cross-validation. Ten rainstorm events recorded at fourteen rain-gauges in North 10


Introduction
The Kriging method is a well-known geostatistical method widely applied in the estimation of hydrological variables, such as precipitation (e.g. Goovaerts, 2000;Teegavarapu and Chandramouli, 2005;Legleiter and Kyriakidis, 2008), hydraulic conduc-20 tivity (e.g. Cassiani et al., 1998;Ouellon et al., 2008), soil moisture (e.g. Buttafuoco et al., 2005;Perry and Niemann, 2008), and shallow water table (e.g. Desbarats et al., 2001;Lyon et al., 2006). Moreover, the optimal number and location of rainfall gauges can be determined by the Kriging method (e.g. Pardo-Iguzauiza, 1998). The modification of the Kriging method is still in progress based on the characteristics of spatial vari- Taylor expansion approximation in order to evaluate the influence of parameter uncertainty in the Kriging method. This model is applied in the estimation of average annual precipitation over the Veneto Region in Italy. From experiment results, this model effectively assesses the influence of parameter estimation uncertainty in the Kriging. Ortiz and Deutsch (2002) proposed an approach for the evaluation of the uncertainty in the 5 semivariogram, and developed a methodology to transfer this uncertainty value into geostatistical simulation and decision making. Teegavarpu et al. (2005) presented a stochastic data-driven model incorporating an artificial neural network and the Kriging method for the estimation of missing precipitation records. Their results indicate that the proposed model can improve the estimation of missing precipitation and its accu-10 racy is better than the commonly used inverse distance method. Skoien et al. (2006) suggested the topological Kriging (Top-kriging) to estimate 100-year flood in ungauged catchments in two Austrian regions. Their results proved that Top-kriging can provide more plausible and indeed more accurate estimates than Ordinary Kriging. Walter et al. (2007) developed a methodology for improving the semivariogram estimation when 15 low sample size is applied in generating spatial autocorrelation of oyster abundance. This proposed method can reduce the likelihood of failing to obtain a variogram from a set of samples and improves the efficiency of variogram estimation.
Although several modified Kriging methods have been developed, a good best-fit theoretical semivariogram model of a spatial phenomenon is still necessary (Delay 20 and Marsily, 1994). Delay and Marsily (1994) proposed a method of the integral of the semivariogram (ISV) to overcome the problem of grouping the pairs of experimental points into classes of distances when the data are not distributed on a regular grid. However, there are often limited data available in early stage of geostatistical modeling which leads to considerable the uncertainty in statistical parameters, including 25 the variogram (Ortiz et al., 2002). Hence, the identification and parameter-calibration of the best-fit theoretical semivariogram model, and the estimation of spatial data, become uncertain and unreliable. The uncertainty probably results in measurement error, equipment failure, or other errors of spatial correlation and so on. Unfortunately, the 4231 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | above uncertainty in the calculation of the variogram is hardly eliminated (Cressie and Hawkins, 1980;Genton, 1998). Although Barancourt et al. (1992) indicated that the selection of the theoretical semivariogram slightly influences the estimation of averaged monthly precipitation fields, the accuracy of predicted precipitation by the Kriging method in the high-resolution rain-gauge network is significantly affected by the theo-5 retical semivariogram model in which the best-fit type is difficultly identified using the dispersive experimental semivariogram (Dirks et al., 1998). As with the above, Verworn and Haberlandt (2011) presented that the precipitation interpolation performance is less influenced by the effect of different semivariogram types, but its performance significantly varied with the event. In addition, the Kriging method separates two steps 10 of the selection of the best-fit semivariogram model and the calibration of associated parameters. The selection of the best-fit semivariogram model is commonly carried out by the leave-one-out cross-validation method. In detail, cross-validation leaves one sample out and predicts for the sample location based on remaining samples (Santra et al., 2008). In doing so, the selection of the best-fit model may involve a long compu-15 tation time, especially for a big sample size. Moreover, the above separation probably leads to the selection of best-fit model based on the variance of the errors of estimated data through a semivariogram model with previously calibrated parameters. Hence, the uncertainty in model parameters will influence the selection of the best-fit model (Todini, 2001).

20
The aim of this study is to present a weighted semivariogram model based on theoretical semivariogram models to reduce the probability of failing to select the best-fit semivariogram models and associated parameters so as to effectively produce accurate spatial estimators.

Brief review of theoretical semivariogram model
Before introducing theoretical semivariogram models, the definition of an experimental semivariogram γ (h) is expressed as in which h is the distance between two spatial points, N(h) is the number of points within the distance h, z(x) and z(x + h) are spatial data at two points x and x + h. The commonly used theoretical semivariogram models are introduced as follows (Davis, 1973): 1. Spherical model: 2. Exponential model: 3. Gaussian model: Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 4. Power model: 5. Linear model: 6. Cubic model: 7. Pentaspherical model: in which a 0 andC 0 are the influence range and the scale (or sill), respectively. This study implements the parameter calibration using the genetic algorithm method with 10 an objective function F obj as: To reduce the uncertainty of failing to select the best-fit theoretical semivariogram model and associated parameters, this study refers to the combined forecasts method (Fischer and Harvey, 1999) to develop a weighted semivariogram model. The com-5 bined forecasts method is a well-established procedure to improve forecasting accuracy which takes advantage of the availability of both multiple information and computing resources for data-intensive forecasting (Bunn, 1989). Basically, the proposed weighted semivariogram model combines results from theoretical semivariogram models to provide the weighted average of semivariogram γ w (h) by using the following 10 equation: in which w sv is the weighted factor and γ m (h) denotes the estimated semivariogram estimated m-tj theoretical semivariogram model. In view of Eq. (7) where F obj (m) denotes the objective function for the m th theoretical semivariogram model associated with the optimal model parameters. Note that the sum of w sv should be equal to one. Figure 1 shows the graphical illustration of theoretical semivariogram models and the weighted semivariogram model.

5
Substituting the weighted semivariogram γ w (h) into the Kriging equation system, the Kriging weight λ can be solved where µ is the Lagrange multiplier. h 0,j is the distance between the point x 0 of which dataẑ(x 0 ) would be estimated as well as point x j ; of which data z(x j ) are known, and 10 K is the number of locations measured. Eventually, the estimate at the point x 0ẑ (x 0 ) can be arrived at d through the following equation with the measured data at points x j z(x j ).

15
To derive the proposed weighted semivariogram model, the procedure of model development is expressed as: - Step 1: Calculate the experimental semivariogram. - Step 2: Calibrate the parameters of theoretical semivariogram models using the experimental semivariogram. - Step 3: Calculate the weighted factors of theoretical semivariogram models using Eq. (11).

-
Step 4: Estimate the semivariograms using the theoretical semivariogram models and calculate the corresponding weighted average through Eq. (10) - Step 5: Solve the Kriging equation system composed of the weighed theoretical semivariogram to obtain the Kriging weights using Eq. (12) and then predict the spatial data at the objective points through Eq. (13).

Model validation
In this study, the model validation is made by comparing the estimated rainfall amount at rain-gauges by the Kriging method with weighed and theoretical semivariogram models. To investigate the effect of sample size on the estimation of rainfall amount, the cross validation method is implemented in the mode validation. In detail, some rain-15 gauges are randomly extracted from the catchment area, which are defined as calibration gauges, and the remaining gauges are regarded as validation gauges used for the model development and validation. Then, the rainfall amounts at validation gauges are estimated using TVMS and WSVM, in which the associated parameters are calibrated using the observed data at calibration gauges, and the associated model performance 20 indices are calculated for the model assessment. Note that the weighted and theoretical semivariogram models are named WSVM and TSVM respectively in this study.

4237
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | The performance indices commonly used in the model validation are introduced as: 1. Root mean square error (RMSE): in which N val is the number of validation gauges, R est and R obs are estimated and observed rainfall amount at the validation gauges estimated using Kriging method 5 with TSVMs or WSVM, respectively. Note that a small RMSE value indicates that the estimated rainfall amount could be closer to the observed one.

Model reliability index (KG) (Leggett and Williams, 1981):
Note that KG approaching one implies that the spatial variation of the estimated 10 rainfall amount could resemble observed one.
3. Probability of performance indices of estimated rainfall amount using WSVM less than those for TVSM.
Since calibration gauges are randomly selected by means of the bootstrap method in this study, the resulting performance index RMSE and KG values 15 are probably dependent on the locations and number of calibration gauges extracted.
In addition to comparison of RMSE and KG, this study evaluates WSVM and TSVM by calculating probability of performance indices for WSVM superior to those for TSVM. Thus, using a number of estimated rainfall amounts by means of WSVM and TSVM, the corresponding probability Pr(RMSE WSVM < RMSE TSVM ) is calculated. Similarly, the probability of WSVM closer 5 to one than that for TSVMs Pr((KG WSVM − 1) < (KG TSVM − 1)) is computed. By comparing Pr(RMSE WSVM < RMSE TSVM ) and Pr((KG WSVM − 1) < (KG TSVM − 1)) the effect of the number and location of calibration gauges to WSVM and TSVM in the estimation of rainfall amounts at ungauged zones can be analyzed, and the results can be referred to the evaluation of the proposed WSVM.

Results and discussion
In this section, the estimated rainfall amounts are produced by the Kriging method with the best-fit TSVM and WSVM and compared using the performance indices calculated. Since the estimation of rainfall amount probably is probably influenced by rainstorm events and number, as well as location, of rain-gauges, the model validation has two 15 parts: one is to consider the uncertainty of rainstorm events and the other is to take into account the effects of number and locations of rain-gauges on the estimation of rainfall amount.

Study area and data used
For the model development and validation, the Shinmen reservoir catchment is adopted 20 as the study area (see Fig. 2). The Shinmen Reservoir is located upstream of the Dahan River basin in northern Taiwan, and serves a number of purposes, including irrigation, hydroelectric power, fresh water supply, flood prevention and sightseeing. Ten rainstorm events recorded at the fourteen rain-gauges from 2004 to 2008 in Shinmen reservoir catchment are used as the study data, as shown in Tables 1 and 2.

Consideration of uncertainty in rainstorm events
To reflect the uncertainty in rainstorm events, the observed rainfall amounts of ten rainstorm events (see Table 2) are used in the model validation. Note that seven gauges of the fourteen rain-gauges in the Shinmen reservoir catchment are selected as calibration gauges, whereas the remaining gauges are validation gauges (see Table 1). 5

Identification of best-fit TSVM
Using the Kriging method to estimate spatial data, the best-fit TSVM should be identified in advance. In general, the leave-one-out cross-validation method is widely applied in the identification of the best-fit model based on the standardized average error (SKAE) and the standardized kriging variance (SKV) (Evrendilek and Frtekin, 2007;10 Kumar and Remadevi, 2006) as: where N is the sample size. z * (x i ) and z(x i ) are the estimated and observed data at the point x i . σ ki denotes the estimated kriging variance at the point x i . SKAE accounts for 15 an indicator of prediction errors, which means the degree of bias in model prediction, and it is required to be close to zero. Moreover, SKAE is supposed to be in the range 1 ± 2 √ 2N. SKV reveals the comparison of the error variance to the kriging variance, and should be close to one. SKV greater and less than one means that the predictions are underestimated and overestimated, respectively. In summary, the best-fit TSVM 20 should satisfy the criteria, i.e. SKAE ∼ = 0 and SKV ∼ = 1. In addition, before calculating the experimental semivariogram used in the identification of the best-fit model and parameter-calibration, the rainfall amount should be non-dimensionalized through the equation R * (x) = R (x)/σ R in which R * (x) is a dimensionless value of rainfall amount R(x) and σ R is the standard deviation of rainfall amounts. Table 3 lists SKAE and SKV values of TVSMs for eight rainstorm events. As shown, 5 TSVM has significantly different SKAE and SKV values for rainstorm events, so the selected best-fit TSVM varies with the rainstorm event, based on the abovementioned criteria. Specifically, Power and Gaussian models have the maximum SKAE (on average 4.09) and SKV (on average 18.86) respectively, which indicates that Power and Gaussian models are unlikely to be selected as the best-fit models. According to SKAE, the best-fit TSVMs for rainstorm events are Spherical (EV2 and EV4), Exponential (EV1), Gaussian (EV5, EV6, EV10), Linear (EV3), Cubic (EV7), and Pentaspherical (EV8). However, referring to SKV, the best-fit TSVMs are Spherical (EV1 and EV5), Exponential (EV2, EV9, and EV10), Power (EV3 and EV8), Linear (EV7), and Cubic (EV6). In summary, the Spherical, Exponential, and Gaussian models are frequently 15 identified as the best-fit models.
Although the best-fit TSVM can be determined based on SKAE and SKV as shown in Table 1, it is observed that the corresponding SKAE and SKV values are obviously greater than zero and one, respectively. This implies the best-fit model has a significant model bias and the resulting predictions may be underestimated.

Calculation of weighted factors of TSVM
As with deriving WSVM, the weight factors of TSVMs should be calculated in advance. According to Eq. (10), the weighted factors are based on the objective function values associated with the optimal parameters of TSVMs. Using the genetic algorithm with the objection function Eq. (8) in this study, the optimal parameters of TSVMs can be calibrated, and the corresponding objective function values are computed as shown in Table 4. Thus, the weighted factors of TSVM used in WSVM can be quantified (see Table 5) and then the weighted semivariogram can be calculated. On average, the 4241 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | average weighted factors are 0.152 (Spherical), 0.144 (Exponential), 0.147 (Gaussian), 0.097 (Power), 0.136 (Linear), 0.177 (Cubic), and 0.147 (Pentaspherical). Therefore, except for the Power model, the weighted factors of TSVMs approxi-mate 0.15. It can be also said that TSVMs (except for the Power model) may pro-vide equivalent contributions to the estimated semivariogram so that the estimated 5 semi-variogram by all TSVMs should be taken into account. This result differs from the consequences for SKAE and SKV criteria in which Spherical, Exponential, and Gaussian models are identified as the best-fit models. As a result, theoretically, the proposed WSVM could reduce the uncertainty in the selection of the best-fit TSVM so as to enhance the reliability of estimated rainfall amount.

Comparison of estimated rainfall amount
Through the Kriging system equation associated with estimated semi-variogram by the best-fit models, which are determined based on SKAE and SKV criteria, and WSVM for ten rainstorm events, the rainfall amounts at seven validation points are estimated. Fig-ure 3 shows the graphical comparison of observed and the estimated rainfall amounts 15 at validation gauges by the best-fit TSVMs and WSVM. It can be observed that the es-timated rainfall amount by WSVM significantly differs from those by the best-fit TSVMs for ten rainstorm events. Although the fitness of estimated rainfall amount to observed data varies with the rainstorm event, the estimated rainfall amount visually fits the ob-served data better than those by the best-fit TSVMs, except for EV2 and EV7 in which 20 the estimated rainfall by WSVM resembles those by the best-fit TSVMs. Therefore, it is shown that WSVM could capture the behavior of rainfall amount better than the best-fit TSVMs.
The performance index RMSE values are also calculated with the observed and esti-mated rainfall amounts at validation gauges by WSVM and the best-fit TSVM as shown 25 in Fig. 4. From Fig. 4, the RMSE values of estimated rainfall amounts by WSVM are less than or approximately equal to those by the best-fit TSVM. Specifically, the av-erage RMSE for WSVM (1.372) is significantly less than those for the best-fit TSVMs, 4242 i.e. 1.458 (SKAE) and 1.508 (SKV). It follows that the proposed WSVM could effectively provide more accurate estimation of rainfall amount than the best-fit models identified by SKAE and SKV criteria by taking into account the estimated semivariogram by TSVMs.
In addition to the performance index RMSE, this study also calculates the model 5 reliability index KG to evaluate the spatial variation of estimated rainfall amount (see Fig. 5). From Fig. 5, similar to RMSE, the KG values of estimated rainfall amount by WSVM are less or approximately equal to those by the best-fit TSVM. On average, the KG value of estimated rainfall amount by WSVM approximates 1.372 which is less than those by the best-fit TSVMs, i.e. 1.458 (SKAE) and 1.508 (SKV). It can be also 10 said that the KG values of estimated rainfall amounts by WSVM are closer to one than those by the best-fit TSVMs. Therefore, WSVM could capture the behavior of rainfall amount better than the best-fit TSVM.
To sum up the above results, the proposed WSVM can reduce the uncertainty resulting from unsuitable best-fit TSVMs so as to produce more accurate and reliable 15 estimated rainfall amount.

Consideration of uncertainty in number and location of rain-gauges
To evaluate the effect of number and locations of rain-gauges on the estimation of rainfall amount by the Kriging system equation with WSVM and the best-fit TSVM, 4-11 rain-gauges are randomly extracted as calibration gauges and the remaining gauges 20 are defined as the validation gauges by fifty times. According to results from Table 1, Spherical, Gaussian, and Exponential models have a high likelihood of being selected as the best-fit model. Therefore, this study adopts Spherical, Gaussian, and Exponential models as the best-fit TSVMs and compares them with WSVM in the estimation of rainfall amount under the consideration of various number and locations of calibration 25 gauges. Figure 6 shows the comparison of average RMSE of estimated rainfall amount by WSVM and the best-fit models with various number of calibration points. The average 4243 ber of calibration gauges are mostly greater than 50 %. Specifically, the average values of Pr(RMSE WSVM < RMSE TSVM ) are 63.4 % (Spherical), 61.6 % (Gaussian), and 55.6 % (Exponential), respectively. As for Pr (KG WSVM − 1) < (KG TSVM − 1) , t he average for Spherical, Gaussian, and Exponential models are 62.4 %, 62.9 %, and 51.7 % , respectively. In the cases of Spherical and Gaussian models being the best-fit models, 20 WSVM can produce the estimation of rainfall amounts with a corresponding probability of 60 % which are more accurate and reliable than the best-fit models. Even for the Exponential model, which has fewer probabilities Pr(RMSE WSVM < RMSE TSVM ) and Pr (KG WSVM − 1) < (KG TSVM − 1) as compared to the Spherical and Gaussian models, WSVM has a 50 % probability of capturing the behavior of estimated rainfall amount 25 in scale and space better than the Exponential model. In summary, varying according to number and location of rain-gauges, WSVM has a high likelihood of capturing the rainfall amount at validation gauges, that is, WSVM is significantly superior to the best-fit model in the estimation of rainfall amount under the consideration of uncertainty in the number and location of rain-gauges. In addition, since WSVM estimates the rainfall amount without identifying the best-fit TSVM through cross-validation, it takes less computation time and is more effective than TSVM.

Conclusions
This study proposes a weighted semivariogram model (WSVM) to reduce the uncertainty in the identification of the best-fit theoretical semivariogram models (TSVMs) and associated parameters. The proposed WSVM mainly calculates the weighted average of the semivariogram, which is the sum of product of estimated semivariogram and weighted factors resulting from the inverse of the objective function value associated with optimal parameters. WSVM can effectively produce the rainfall amount without 10 determining the best-fit TSVM commonly carried out by the cross-validation method. The results of the graphical comparison and performance indices for ten rainstorm events recorded in the Shinmen reservoir catchment indicate the proposed WSVM not only improves the accuracy of estimated rainfall amount without determining the bestfit model, but also has a high probability of capturing the real rainfall amount under 15 the consideration of uncertainties in rainstorm events and number as well as locations of rain-gauges. Consequently, it is shown that the WSVM can effectively reduce the uncertainty of failing to select an unsuitable model and provide the more accurate and reliable rainfall amount.

EV10
Obs WSVM Best-fit model_SKAE Best-fit model_SKV        Fig. 7. Probabilities of RMSE and KG of estimated rainfall amount using WSVM less than those using TSVM with different number of calibration gauges.