Identification of hydrological model parameters for flood forecasting using data depth measures

The development of methods for estimating the parameters of hydrological models considering uncertainties has been of high interest in hydrological research over the last years. Besides the very popular Markov Chain Monte Carlo (MCMC) methods, which estimate the uncertainty of model parameters in the setting of a Bayesian framework, depth based sampling methods, also entitled robust parameter estimation (ROPE), have recently been developed.


Introduction
Conceptual hydrological models are designed to approximate the general physical mechanisms which govern the rainfall-runoff process within a specific catchment. Many of the currently available models serve engineers in practice and hydrologists in research. Most of these models require calibration because the available input data is rarely sufficient to parameterise the models directly to the desired accuracy. The success of a model application is thus strongly dependent on a good calibration.
Traditionally, models are calibrated manually. This is very labour-intensive and the success of the calibration is highly dependent on the experience and hydrological knowledge of the modeller. Therefore, automatic methods for model calibration have recently evolved significantly (e.g. Duan et al., 1992; Gupta et al., 1998; Kobold et al., 2003; Theiner and Wieczorek, 2006) and have found common acceptance and broad use in the hydrological community (e.g. Hogue et al., 2000; Cullmann, 2006; Kunstmann et al., 2006; Marx, 2007). These approaches are based on solving a mathematical optimisation problem which is formulated with the help of a purpose-oriented objective function that evaluates the model performance. The result is traditionally a single best performing parameter vector. However, regardless of the model and optimisation algorithm used, many studies applying such methods have reported problems in estimating unique best performing parameter vectors (Duan et al., 1992; Gupta et al., 1998; Wagener et al., 2004). The possibility of estimating the same model performance for different parameter vectors was described by Beven and Binley (1992) as the equifinality problem. In terms of the optimisation problem, the reason for this is the existence of multiple local optima with both small and large regions of attraction, discontinuities in the first derivatives, and curving ridges in the multidimensional parameter space. The difficulties in estimating single best performing parameter vectors are even stronger when the focus is put on specific aspects of the catchment behaviour, e.g. flood events (e.g. Cullmann, 2006; Cullmann and Wriedt, 2008; Fenicia et al., 2007). In this context, hydrological models are not yet able to describe equally well the full range of processes that drive the runoff generation. This holds both for simple conceptual models and detailed process models with physically based components. One of the main reasons for this lack of "process fidelity" are the highly
dynamic characteristics of such events. Besides the discussed issues in model calibration, the uncertainties in the used observations are often completely neglected.
To overcome this problem a whole variety of Markov Chain Monte Carlo (MCMC) methods has recently been developed in order to get a well-founded estimate of the uncertainty of model parameters in the setting of a Bayesian framework. The approaches developed by Vrugt et al. (2003) and Kuczera et al. (2006) have recently attracted much scientific interest and enjoyed rising popularity in the hydrological community (e.g. Bos and de Vreng, 2006; Thyer et al., 2006; Frost et al., 2007; Grundmann, 2010). One major advantage of a Bayesian framework is the possibility to describe all relevant sources of uncertainty in a closed form and consequently to obtain mathematically well-founded results. However, also for this kind of approach a modeller has to make assumptions about all sources of uncertainty to be considered. Often these assumptions are quite arbitrary because the information for a well-founded decision is not available. Subsequently these decisions have a non-negligible influence on the results. Thus, the uncertainty estimates might get a rather subjective touch, a fact that contradicts the original intention of the application of a Bayesian framework. Furthermore, in many real-world applications modellers call for purpose-specific objectives in calibration, which are difficult to integrate into the likelihood function of a Bayesian uncertainty framework. For example, the formulation and implementation of a likelihood function considering both the peak flow difference and the Nash-Sutcliffe efficiency is not straightforward.
A completely new kind of approach to address this problem was presented by Bárdossy and Singh (2008), who applied the concept of data depth in order to sample robust model parameter vectors. In a previous study (Bárdossy, 2007) it was shown that sets of good parameter vectors are geometrically well structured for commonly used hydrological models. This result was also found by other studies dealing with model calibration (e.g. Kuczera and Parent, 1998) and by preliminary studies with WaSiM in the Rietholzbach catchment (Pompe, 2009). However, the actual goal of a successful model calibration is not just a good model performance in calibration, but the estimation of robust parameter vectors. A good definition of robustness of hydrological model parameters is given by Bárdossy and Singh (2008). We call all parameter vectors robust if they:

1. lead to good model performance over the selected time period
2. lead to a hydrologically reasonable representation of the corresponding processes
3. are not sensitive: small changes of the parameters should not lead to very different results
4. are transferable: they perform well for other time periods and might also perform well on other catchments (i.e.
they can be regionalised)

Studies of computational geometry and multivariate statistics (e.g. Liu et al., 2006; Bremner et al., 2008) showed that members geometrically deep within a set are more robust in representing the whole set. These points can be estimated by the concept of data depth, which has recently attracted a lot of research interest in multivariate statistics and robust modelling (e.g. Cramer, 2003; Liu et al., 2006). The estimation of a set of deep parameter vectors can be done in an evolutionary process, as presented by Bárdossy and Singh (2008) in a very first sketch. The estimated results are very promising. Therefore we reviewed the presented methods, implemented improved methods to sample deep parameter vectors and well-founded stopping criteria, and applied a further developed version of the ROPE method, called AROPE MC, in order to calibrate a hydrological model. Compared to the study of Bárdossy and Singh (2008) we applied the method to a distributed process-oriented model with a higher temporal resolution. The concepts of this paper will be illustrated with examples from the Rietholzbach catchment. Out of a time series from 1981-2008, 24 significant flood events were selected for model calibration and validation. The hydrological model chosen is the model WaSiM-ETH. A short description of the catchment, the used data and the model is provided in this section.

Study area
The real-world case studies presented within this paper were carried out in the Rietholzbach catchment. This basin has been observed as a research catchment by the ETH Zurich since 1975. The outlet drains a 3.18 km² hilly pre-alpine watershed with an average precipitation of 1600 mm per year, generating a mean annual runoff of 1046 mm. As a sub-basin of the Thur catchment it is located in the northeast of Switzerland. Its geographical location and basic land-use characteristics are listed in Table 1. A significant number of studies have been conducted in this basin. For further information refer to Gurtz et al. (1999), Zappa (2002) and the website http://www.iac.ethz.ch/research/rietholzbach.

Hydrological model
The hydrological model used is WaSiM-ETH 6.4 (in the following referred to as WaSiM).
It is a spatially distributed process-oriented rainfall-runoff model and was developed by Schulla (1997) at the ETH Zurich. WaSiM has been used successfully for modeling the rainfall-runoff process in several studies in catchments located within mid mountain ranges (e.g. Grundmann, 2010) and especially also in the pre-alpine Rietholzbach catchment (Gurtz et al., 1999, 2003a,b). Additionally WaSiM-ETH has been used for the extrapolation of extreme flood events by Cullmann (2006). For this study we used the version with the Richards approach for the simulation of the unsaturated zone. In the case studies discussed in this work, WaSiM is calibrated for the simulation of extreme discharges. Therefore the main focus of this short model presentation is on the model part representing the unsaturated zone. WaSiM transforms rainfall into runoff according to the scheme shown in Fig. 3. Here, three exemplary soil water compartments receive infiltration which is computed by a modified approach according to Green and Ampt (1911). This module is also used to determine the direct runoff Q_d in the model. Q_d is then routed via a flow-time grid and finally projected cell-wise to the catchment outlet by means of a simple bucket type function (Eq. 1). The recession coefficient of this function is the model parameter k_d.
Q_d(t) = Q_d(t − ∂t) · e^(−∂t/k_d),  (1)

where Q_d(t) is the direct runoff at time step t with time step length ∂t.
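The bucket-type routing above can be illustrated with a short sketch of a single linear storage. This is a hedged illustration, not WaSiM's actual discretisation: here the inflow of each step is assumed to be mixed into the storage within the step, so outflow decays with the recession coefficient k between inputs.

```python
import math

def linear_reservoir(inflow, k, dt=1.0):
    """Route an inflow series through a single linear storage (bucket).

    Each time step the stored outflow decays by exp(-dt/k), where k is the
    recession coefficient (cf. k_d in Eq. 1); new inflow is added with the
    complementary weight so that a constant inflow is reproduced at steady
    state.  Illustrative only; WaSiM's actual scheme may differ.
    """
    q, out = 0.0, []
    decay = math.exp(-dt / k)
    for q_in in inflow:
        q = q * decay + q_in * (1.0 - decay)
        out.append(q)
    return out

# a unit pulse of direct runoff produces the characteristic recession limb
print(linear_reservoir([1.0, 0.0, 0.0, 0.0], k=2.0))
```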
The soil water movement through the different soil layers is modeled by means of the discrete form of the Richards equation, which can be written as

∂Θ/∂t = ∂q/∂z = (q_in − q_out)/∂z,  (2)

where ∂Θ denotes the change in soil water content, ∂t defines the time step and ∂q is the change in specific flux. The fluxes q_in and q_out characterise the influx and efflux from the specific neighboring soil layers respectively. The thickness of the soil layers is defined by ∂z.
In the model each soil layer produces interflow (Q_ifl) according to Eq. (3), which is cell-wise scaled with the scalar model parameter dr:

Q_ifl = k_s(Θ_l) · ∂z · dr · tan β,  (3)

where k_s denotes the hydraulic conductivity at the water content Θ_l in the considered soil layer l, dr is a conceptual model parameter to be estimated and β characterises the local slope in the grid cell. Corresponding to the direct runoff, the interflow is again projected to the catchment outlet by means of a flow-time grid and a second bucket type function. Here the model parameter k_i represents the recession coefficient in analogy to Eq. (1). For further details about WaSiM refer to Schulla (1997) and the official website of the model, http://www.wasim.ch. Table 2 gives the model parameters considered for calibration.
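The two per-layer balance terms can be sketched as follows. This is a simplified, explicit illustration with hypothetical variable names and values, not the actual WaSiM implementation:

```python
import math

def soil_layer_step(theta, q_in, q_out, dz, dt):
    """Explicit update of the water content of one soil layer: the change
    in theta is the flux divergence (q_in - q_out) over the layer
    thickness dz, as in the discrete Richards balance (cf. Eq. 2)."""
    return theta + dt * (q_in - q_out) / dz

def interflow(ks_theta, dz, dr, beta_deg):
    """Sketch of the per-layer interflow term (cf. Eq. 3): hydraulic
    conductivity at the actual water content, scaled by the layer
    thickness, the drainage-density parameter dr and the local slope."""
    return ks_theta * dz * dr * math.tan(math.radians(beta_deg))

# water content rises when the influx exceeds the efflux (units: m, s)
theta = soil_layer_step(theta=0.30, q_in=2e-6, q_out=1e-6, dz=0.2, dt=3600.0)
print(round(theta, 3))
# steeper cells generate more interflow
print(interflow(ks_theta=1e-6, dz=0.2, dr=20.0, beta_deg=10.0))
```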

These are the storage coefficients of direct runoff and interflow, k_d and k_i, and the drainage density dr, which is a scaling parameter of the interflow generation. In previous studies (Cullmann, 2006; Pompe, 2009; Grundmann, 2010) these three parameters have proven to be sensitive with respect to modelling flood events. Besides the specified upper and lower boundaries of the model parameters, the additional constraint k_i ≥ 1.05 k_d was introduced in order to account for the basic consideration that the direct runoff from a cell has a shorter travel time to the catchment outlet than the interflow generated in the unsaturated zone.
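The constraint k_i ≥ 1.05 k_d can be honoured during random sampling by simple rejection, as sketched below. The bounds used here are illustrative placeholders, not the values of Table 2:

```python
import random

def sample_parameters(bounds, seed=0):
    """Draw one parameter vector (kd, ki, dr) uniformly within the given
    bounds and reject draws that violate the travel-time constraint
    ki >= 1.05 * kd.  Bounds are hypothetical, for illustration only."""
    rng = random.Random(seed)
    while True:
        kd = rng.uniform(*bounds["kd"])
        ki = rng.uniform(*bounds["ki"])
        dr = rng.uniform(*bounds["dr"])
        if ki >= 1.05 * kd:        # interflow recedes more slowly than direct runoff
            return kd, ki, dr

p = sample_parameters({"kd": (5.0, 50.0), "ki": (5.0, 200.0), "dr": (1.0, 100.0)})
print(p)
```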

Data
Due to its long-term observation as a research catchment and its limited size, the Rietholzbach catchment has a long record of hourly data sets, and the perturbing impact of data heterogeneity is relatively small in this catchment. The data we based our study upon is a time series consisting of 27 years of meteorological and discharge measurements. Out of this time series we selected a set of 24 flood events with a peak flow of at least 1 mm h⁻¹. All events are in the period from May until October to avoid the problem of modeling snow accumulation and melting processes. An overview of all selected 24 flood events with their specific characteristics and pre-conditions is given in Table 3.

Objective criteria
Within this study commonly used objective functions are applied. There are global criteria, which try to assess the general quality of the fit between model and catchment behavior, whereas local criteria just focus on a specific attribute, e.g. peak flow values or snow melt periods. The efficiency criterion according to Nash and Sutcliffe (1970) (NS) has been widely used to quantify the global performance of hydrological models. The relative deviation in peak flow (rPD) is a simple local criterion to assess the model's skill to represent the catchment behavior for flood events. In order to obtain both a good estimate of the peak flow values and a minimum reasonable representation of the catchment behavior, we aggregated the global criterion NS and the local criterion rPD into a performance criterion we call flood skill (FloodSkill). An overview of all criteria used as an objective within the case studies of this paper is given in Table 4.
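The two component criteria have standard definitions; a minimal sketch follows. Note that the aggregation in `flood_skill` is a hypothetical equal weighting for illustration only; the weighting actually used in the study is the one given in Table 4.

```python
import numpy as np

def nash_sutcliffe(q_obs, q_sim):
    """Global criterion NS: 1 - SSE / variance of observations (1 = perfect)."""
    q_obs, q_sim = np.asarray(q_obs, float), np.asarray(q_sim, float)
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def rel_peak_dev(q_obs, q_sim):
    """Local criterion rPD: relative deviation of the simulated peak flow."""
    return (np.max(q_sim) - np.max(q_obs)) / np.max(q_obs)

def flood_skill(q_obs, q_sim):
    """Hypothetical aggregation of NS and |rPD| (not the study's Table 4 form):
    rewards overall fit while penalising peak-flow errors."""
    return nash_sutcliffe(q_obs, q_sim) - abs(rel_peak_dev(q_obs, q_sim))

q_obs = [0.2, 0.5, 1.4, 0.9, 0.4]       # hypothetical event hydrograph, mm/h
print(flood_skill(q_obs, q_obs))         # a perfect simulation scores 1.0
```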

Data depth
Data depth is a statistical method for multivariate data analysis which assigns a numeric value to a point with respect to a set of points based on its centrality. This approach provides center-outward orderings of points in Euclidean space of any dimension and enables a non-parametric multivariate statistical analysis in which no distributional assumptions are needed. Tukey (1975) first introduced this concept in order to estimate the center of a multivariate dataset. To be called a depth function, a function D on the d-dimensional space R^d has to fulfill specific properties (Zuo and Serfling, 2000). The concept of data depth is illustrated in Fig. 4 by a small 2-dimensional example. For a random point set in R^2 the data depth was computed for each point of the set with respect to the point set itself. The used data depth function was the halfspace depth, one of the best known among the data depth measures in nonparametric statistics and in discrete and computational geometry. According to Tukey (1975), the halfspace depth of a point θ ∈ R^d with respect to a data set Z = {z_1, ..., z_n} is defined as the smallest number of data points in any closed halfspace with boundary through θ. This is also called the Tukey or location depth, and it can be written as

HD(θ|Z) = min_{||u||=1} #{i : u^T z_i ≥ u^T θ},

where u ranges over all vectors in R^d with ||u|| = 1.
Very often the halfspace depth is normalized by division by the number of points in the set Z, i.e. D(θ|Z) = HD(θ|Z)/n. The first publication of Tukey (1975) was followed by many generalizations and other definitions of this concept (e.g. Oja, 1983; Donoho and Gasko, 1992; Rousseeuw and Struyf, 1998; Rousseeuw and Hubert, 1999; Vencálek, 2008). A good overview of a broad range of different definitions of the concept of data depth and its application to multivariate data analysis is given by Hugg et al. (2006) and Liu et al. (2006). In the following the symbol D is used for an arbitrary depth function.
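The halfspace depth can be illustrated numerically. In the sketch below the minimum over all directions is approximated by random unit vectors, which suffices for illustration; exact algorithms (e.g. the one of Rousseeuw and Struyf, 1998, used in this study) exist:

```python
import numpy as np

def halfspace_depth(theta, Z, n_dir=2000, seed=0):
    """Approximate Tukey's halfspace depth of theta w.r.t. the point set Z:
    the minimum, over unit directions u, of the number of points z_i with
    u.(z_i - theta) >= 0, estimated over n_dir random directions."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_dir, Z.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    proj = (Z - theta) @ u.T              # projections onto each direction
    return (proj >= 0).sum(axis=0).min()  # points in the emptiest closed halfspace

# central points receive a high depth, extreme points a low one
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 2))
center = Z.mean(axis=0)
extreme = Z[np.argmax(np.linalg.norm(Z - center, axis=1))]
print(halfspace_depth(center, Z) / len(Z))   # relatively large
print(halfspace_depth(extreme, Z) / len(Z))  # relatively small
```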
For a given data set Z, the set D_k of all points in R^d with depth at least k is called the contour of depth k in statistics (cf. Donoho and Gasko, 1992). The application of that concept, i.e. the sampling of parameter vectors with at least a given depth with respect to a set of good parameter vectors, is the underlying approach of the algorithm presented in this work. We use the definition of data depth introduced by Tukey (1975) with an implementation according to Rousseeuw and Struyf (1998). Furthermore, it has proven to be a very robust measure in order to identify the center of a multivariate dataset (e.g. Rousseeuw and Struyf, 1998; Cramer, 2003; Serfling, 2006). For a study of further data depth functions with the algorithm presented in this paper, refer to Krauße (2011).

The ROPE MC algorithm according to Bárdossy and Singh (2008) can be summarised as follows:

1: Select d model parameters to be considered for calibration and identify prior boundaries for all selected parameters
2: n random parameter vectors forming the set X_n are generated within the defined boundaries
3: repeat
4: The hydrological model is run for each parameter vector in X_n and the corresponding model performances are calculated
5: The subset X*_n of the best performing parameter vectors is identified. This might be for example the best 10%
6: m random parameter vectors forming the set Y_m are generated, such that ∀θ ∈ Y_m : D(θ|X*_n) ≥ 1
7: The set Y_m is relabeled as X_n and steps 3-6 are repeated
8: until the performance corresponding to X_n and Y_m does not differ more than what one would expect from the observation errors
9: return Y_m

In principle, the general proceeding of this algorithm can be divided into three important parts. After the input and a pre-processing, a set of good parameter vectors is identified (lines 4 and 5). Then a set of parameter vectors deep with respect to this set (i.e. within the good ones) is generated (line 6). These two operations are repeated in an evolutionary manner and after each iteration a stopping criterion is checked (line 8). The general approach of the presented algorithm is well-founded, and a case study with an application to the calibration of the conceptual hydrological model HBV for a catchment in south-west Germany on a daily time step resulted in parsimonious and reasonable results. However, in first studies we experienced some problems, particularly with the application of the latter two parts of the procedure: the generation of deep parameter vectors and the exact definition of the stopping criterion. In the following we will give a brief overview of the problems and explain how the new AROPE MC algorithm addresses these shortcomings.
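The depth-based resampling loop of the ROPE MC algorithm can be sketched as follows. This is a simplified, hypothetical implementation: membership in the convex hull of the good set stands in for the condition "depth ≥ 1", a fixed iteration count replaces the performance-based stopping rule, and a cheap analytical function replaces the hydrological model:

```python
import numpy as np
from scipy.spatial import Delaunay

def rope_sketch(model_perf, lb, ub, n=2000, frac=0.1, iters=5, seed=0):
    """ROPE-style loop: sample, keep the best fraction, then resample
    candidates lying inside the convex hull of the good set (a proxy for
    halfspace depth >= 1) and repeat."""
    rng = np.random.default_rng(seed)
    d = len(lb)
    X = rng.uniform(lb, ub, size=(n, d))
    for _ in range(iters):
        perf = np.array([model_perf(x) for x in X])
        good = X[np.argsort(perf)[-int(frac * n):]]   # best-performing subset
        hull = Delaunay(good)                         # triangulation of the good set
        lo, hi = good.min(axis=0), good.max(axis=0)
        deep = []
        while len(deep) < n:                          # rejection sampling in the bbox
            cand = rng.uniform(lo, hi, size=(n, d))
            deep.extend(cand[hull.find_simplex(cand) >= 0])
        X = np.array(deep[:n])
    return X

# toy "model performance": negative distance to a hypothetical optimum
opt = np.array([2.0, -1.0])
final = rope_sketch(lambda x: -np.linalg.norm(x - opt), lb=[-10, -10], ub=[10, 10])
print(final.mean(axis=0))  # the cloud contracts toward the optimum
```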
One of the major premises of the application of the concept of data depth is the assumption that the set of good parameter vectors is geometrically well-structured. In concrete terms we rely on the assumption that the depth contours will be indicative of the shape of the cloud of good parameter vectors while generating deep parameters. However, for most depth functions this does not hold for point sets that are distributed in non-convex position (Hugg et al., 2006). Unfortunately the parameter space of most hydrological models is dominated by distinct regions of attraction and non-convex multidimensional ridges (e.g. Duan et al., 1992; Sorooshian et al., 1993; Grundmann, 2010).
To overcome this conflict we propose to substitute the generation of deep parameter vectors with the strategy, entitled GenDeep, as given below in Algorithm 5.2.
Additionally we implemented alternative strategies for the sampling of candidate points for the sets of deep parameter vectors. A simple strategy is a uniform sampling of candidates within the bounding box of the considered set of good parameter vectors. This strategy gets ineffective and computationally intensive for higher dimensions. That is due to the fact that the volume ratio of the set of parameter vectors itself to its bounding box decreases with rising dimension. This issue is illustrated by Fig. 5, where the ratio between the volume of the unit sphere and the unit cube is plotted. Additionally, the computational complexity of most depth functions increases tremendously for higher dimensions. To address this problem we suggest a cluster analysis of the set X*_n, e.g. with the expectation maximization (EM) algorithm according to Dempster et al. (1977), which identifies the most probable number of clusters k in X*_n and assigns all members of the set X*_n to one (in case of ambiguity also to more than one) of the clusters c_i, where i ∈ {1, ..., k}. Furthermore, it is possible to approximate the considered set of good parameter vectors by a Gaussian mixture model (GMM) whose parameters can be estimated by the EM algorithm, which is called anyway in order to do the cluster analysis in the presented strategy. For further details of the proposed strategy for the generation of deep parameter vectors refer to Krauße (2011).
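The GMM-based proposal can be sketched in its simplest form. The snippet below shows only the one-component special case (a single Gaussian fitted to the good set); the full strategy fits a k-component mixture with the EM algorithm so that non-convex, multi-modal good sets are covered cluster by cluster:

```python
import numpy as np

def sample_candidates_gaussian(good, n, seed=0):
    """Sample candidate parameter vectors from a Gaussian fitted to the
    set of good parameter vectors (one-component special case of the
    GMM-based proposal; the study's strategy uses an EM-fitted mixture)."""
    rng = np.random.default_rng(seed)
    mean = good.mean(axis=0)
    cov = np.cov(good, rowvar=False)       # sample covariance of the good set
    return rng.multivariate_normal(mean, cov, size=n)

# hypothetical "good" set concentrated around (1, 3)
good = np.random.default_rng(1).normal([1.0, 3.0], [0.2, 0.5], size=(300, 2))
cand = sample_candidates_gaussian(good, 1000)
print(cand.mean(axis=0))   # candidates concentrate where the good set lies
```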
Another issue of the ROPE MC algorithm is the loosely defined stopping criterion: "until the performance corresponding to X_n and Y_m does not differ more than what one would expect from the observation errors" (Bárdossy and Singh, 2008, p. 1280). The problem is that there are countless possibilities in the prior estimation of the tolerance in the model performance due to uncertainty in the observation data, and this tolerance can hardly be determined exactly. A broad definition of this tolerance can lead to sets with inferior model performance, whereas a tighter tolerance can easily result in overfitting. This is a severe shortcoming because it undermines the actual goals of the algorithm. Overfitting in the context of robust parameter estimation means that the model performance on the calibration data can still be increased by further shrinking the estimated set of deep model parameter vectors, whereas the model performance on (reasonably similar) control data decreases by further shrinking. Figure 6 illustrates this fact with the results of the calibration of WaSiM in the Rietholzbach catchment with respect to flood events. The FloodSkill criterion was used as objective and the flood events no. 4 and no. 14 were used as calibration and control data, respectively. It is evident that the model performance on the control data considerably decreases from iteration 3 onwards, whereas the model performance on the calibration data could still be increased by further iterations.
To address this problem we implemented two changes to the algorithm. First, we slightly changed the evolutionary shrinking of the generated deep parameter vectors.
To avoid the unintended exclusion of possibly robust parameter vectors close to the boundary of the initial set of good model parameters, we suggest merging the set of generated deep parameter vectors and the identified good parameter vectors into the initial set for the next iteration, i.e. X_n ← Y_m ∪ X*_n. Furthermore we introduced a new stopping criterion in order to avoid overfitting. We suggest splitting the data used for model calibration into a calibration and a control set. Only the calibration set is used for the actual model calibration, whereas the control set is used to supervise the calibration process in order to avoid overfitting. In each iteration of the algorithm the model performance is estimated both on the calibration and the control set. The moment the performance on the control set stops improving, the algorithm is stopped. This kind of approach is a state of the art method in the supervised training of artificial neural networks in order to avoid overfitting (Tetko et al., 1995). The relevant steps of the AROPE MC algorithm read:

1: Select d model parameters to be considered for calibration and identify prior boundaries [x_lb, x_ub] for all selected parameters
2: n random parameter vectors forming the set X_n are generated in the d-dimensional rectangle bounded by the defined boundaries
3: repeat
…
7: The hydrological model is run for each parameter vector in X_n and the corresponding model performances on the calibration data are calculated
8: The subset X*_n of the best performing parameter vectors is identified. This might be for example the best 10%
9: The hydrological model is run for each parameter vector in X*_n and the corresponding model performances on the control data are calculated
…

Case study I: test functions

As test functions we used Rosenbrock's function and Rastrigin's function (1991). The latter is a fairly difficult problem due to its large search space and its large number of local minima. The formal definition of both functions is given in Eqs. (8) and (9) respectively. Figure 8 shows the plot of both functions for two variables, to give the reader a better impression of the nature of the problem.
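For reference, the commonly used standard forms of the two test functions are sketched below; they may differ from Eqs. (8) and (9) in scaling constants:

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock's valley: a curving ridge with the global minimum
    f = 0 at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rastrigin(x):
    """Rastrigin's function: highly multimodal with a regular grid of
    local minima and the global minimum f = 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

print(rosenbrock([1.0, 1.0, 1.0]))  # 0.0 at the global optimum
print(rastrigin([0.0, 0.0]))        # 0.0 at the global optimum
```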
We applied the original ROPE MC algorithm and the AROPE MC algorithm for estimating the minimum of both test functions for dimensions 2-4. As boundaries for the parameters x we chose [−10, 10]. In order to have a reference performance, we applied the genetic algorithm (GA) according to Conn et al. (1997) in the same feasible space with a comparable number of maximum function evaluations and the same tolerance. To get a fair mean best result for comparison, we ran the GA iteratively until the value for the overall mean of the best estimates got stable. For each estimate we computed its fitness value as the absolute difference between its function value and the known global optimum. Figure 9 illustrates the principle of operation of the algorithm by the scatter plots of the estimated parameter vectors after several iteration steps for the 2-dimensional case. It is evident that new parameter vectors are sampled deep with respect to the set estimated in the previous iteration. For Rastrigin's function several clusters can be clearly identified, which improves the performance of the sampled parameter vectors tremendously.
Table 5 presents the comparison of the fitness values for all used algorithms. It is evident that the AROPE MC algorithm achieves a reasonable performance and estimates a set of model parameters in the region with the highest possible performance for all test cases. In most of the cases the deepest estimated parameter vectors have a better performance than the mean of the estimated set. For Rosenbrock's function both the ROPE MC and the AROPE MC algorithm perform comparably well for all dimensions. The results for Rastrigin's function show an improvement of AROPE MC with respect to the ROPE MC algorithm for higher dimensions due to the cluster based sampling with the GenDeep strategy. However, the results for Rastrigin's function also show that the proposed algorithm still suffers from the general shortcomings of a Monte-Carlo type approach for high dimensional problems with a very large number of regions of attraction. Too small a sample size can result in an inaccurate clustering and consequently undermine the improvements of this strategy. Note that this problem can easily be compensated by a higher sample size at the cost of more computation time.
As another possible solution for high-dimensional problems with multiple regions of attraction we propose the prior use of a proven evolutionary search strategy for high-dimensional parameter spaces, e.g. the particle swarm concept.

Case study II: calibration of the hydrological model WaSiM with focus on flood events
In a second case study we examined the influence of observation errors on the results of calibrating the process-oriented hydrological model WaSiM for flood events with AROPE MC in comparison with state of the art optimisation algorithms. As optimisation algorithms we used the interior-point method (IPM) according to Waltz et al. (2006), which is a gradient based method, and the GA already used in the previous case study.
We assume that the influence of observation errors in the temperature measurements is negligible for the simulation of flood events, whereas the uncertainty of the measured precipitation can be expressed by an ensemble. To keep the problem computationally feasible, we do not consider the influence of the uncertainties in the observed precipitation on the estimated parameter sets and just use the ensemble mean for the model calibration; instead we focus on the influence of the observation errors of the measured discharge. Following the assumptions of Bárdossy and Singh (2008) we assume an accuracy of the measured discharge q_obs(t) of 5%. Thus, the real but unknown discharge q(t) can be written as

q(t) = q_obs(t) · (1 + ε(t)),

with ε(t) being a random error. This random error is due to uncertainties of the rating curve, non-uniqueness of the stage-discharge relationship, changes of the cross section etc. (Bárdossy and Singh, 2008). Like many other authors (e.g. Kuczera et al., 2006; Bárdossy and Singh, 2008) we assume that this error obeys a normal distribution with a standard deviation equal to the measurement accuracy: ε ~ N(0, 0.05). For each observed discharge time series we used this model and produced an ensemble with 100 members. With both the IPM and the GA we calibrated the model for each of the 24 flood events with respect to every single ensemble member and validated the set of 100 estimated best parameter vectors. The same was done for the AROPE MC algorithm. In order to reduce the computation time, we previously checked the stability of the estimated robust parameter sets with AROPE MC for a subset of the discharge ensemble members and subsequently just used the ensemble mean for calibration. The objective used in all of the following case studies was the proposed FloodSkill criterion.
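The discharge error model can be sketched as a short ensemble generator. The example series below is hypothetical; the multiplicative normal error follows the assumption stated in the text:

```python
import numpy as np

def discharge_ensemble(q_obs, n_members=100, sd=0.05, seed=0):
    """Generate an ensemble of plausible 'true' discharge series from an
    observed series, assuming q(t) = q_obs(t) * (1 + eps(t)) with
    eps ~ N(0, sd), i.e. a 5 % relative measurement accuracy."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sd, size=(n_members, q_obs.size))
    return q_obs[None, :] * (1.0 + eps)

q_obs = np.array([0.4, 0.9, 1.6, 1.1, 0.6])   # hypothetical hourly discharge, mm/h
ens = discharge_ensemble(q_obs)
print(ens.shape)                               # one row per ensemble member
```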

Model calibration with limited observation data
At first we calibrated WaSiM with limited observation data. Therefore the hydrological model was calibrated for each of the 24 flood events and subsequently validated on all 24 flood events. Flood event 14 was used as control set, because WaSiM can portray this event very well, so serious outliers in the measurements and a massive occurrence of non-considered runoff processes should not be a problem for this event. Furthermore its peak flow value is close to the center with respect to all observed flood events considered in this study. For the calibration of event 14 itself we chose event 12 as control event. Table 6 shows the calibration performance for the estimated parameter vectors for each algorithm. It is evident that all calibration algorithms achieve a reasonable calibration performance. For most of the cases the AROPE MC algorithm and the IPM are outperformed by the GA. That is not disappointing, because the goal of the AROPE MC algorithm is primarily not to achieve a better calibration performance but to estimate more robust parameter vectors with a preferably better validation performance than the parameter vectors estimated by classical optimisation. For the events marked with "*" overfitting was an issue and was limited by the control event. An example of the effectiveness of the overfitting stopping criterion for the calibration with flood event 4 is given in Fig. 6 (this example was already discussed within the presentation of the principles of the AROPE MC algorithm in the previous section).
The results of the mean model performance over all validation events for the specific flood events used for calibration are shown in Fig. 11. Detailed statistics of the overall validation performance averaged over all calibration events are given in Table 7. From the plots in Fig. 11 it is obvious that, regardless of the used parameter estimation algorithm, the validation performance of the single event calibration estimates is very volatile and strongly dependent on the used calibration event. Referring to the FloodSkill criterion and the NS, the validation performance of the AROPE MC estimates averaged over the results of all 24 single event calibrations is slightly better than that of the estimates of GA and IPM. Surprisingly the parameter vectors estimated by the GA have by far the worst validation performance. These results show that one flood event is not sufficient to estimate a stable solution and therefore also cannot confirm the supposed advantages of the approach.
A deeper investigation of the geometrical structure of the estimated sets of parameter vectors reveals possible explanations for these results and can also be an indicator for the stability of the achieved solution. Consider again that the underlying assumption of the sampling of parameter vectors by data depth is that the model performance, both on the calibration and validation data, of parameter vectors deep within the set of good ones is within the upper range of the set and has a smaller spread than that of the ones at the boundary. The scatter plots in Fig. 10 show the correlation of the validation performance of parameter vectors estimated by AROPE MC with their data depth. Referring to the NS, parameter vectors with higher depth have a consistently good model performance, whereas for the results calculated by the rPD criterion such a relationship cannot be shown at all. These results are strongly confirmed by the validation results calculated by the NS and the rPD (cf. Table 7). For the NS criterion the validation results of the AROPE MC estimates are significantly better in the mean than the ones estimated by pure optimisation and have a smaller standard deviation. However, referring to the rPD criterion the AROPE MC estimates perform even worse than the ones estimated by IPM. The problems for the rPD criterion are due to the fact that, for the calibration with one single flood event, the rPD is calculated by the comparison of just two values. Random errors in the observed peak value and problems of the model structure to simulate that value can result in a spiky and not well-defined structure of the set of parameter vectors with good model performance. It is obvious that the application of data depth to such kind of problems does not make sense. This might be avoided both by the use of more flood events for calibration and by the use of "smooth" performance criteria which do not depend on just a very small number of observations.

Calibration with multiple flood events
After the analysis of the results for the calibration with one single flood event, we decided to study the improvement of the validation results of AROPE MC in comparison with IPM and GA using more observation data for calibration and control. In particular we studied the geometrical shape and the stability of the estimated solutions. Therefore we divided the set of 24 flood events into three subsets, a set of calibration events, a set of control events and a set of validation events, as given in Table 8. The assignment of the events to one of the three subsets was rather arbitrary, but we tried to keep the proportion of events with high and low peak flow values balanced within the subsets. The same holds true for the type of precipitation (convective vs. stratiform). Again we used the FloodSkill criterion as objective, in particular to be able to compare the results with those of the calibration with single flood events.
Figure 12 illustrates how the sets of candidate parameter vectors evolve after each iteration of the AROPE MC algorithm and form a geometrically well-structured cloud. A scatter plot of the estimated parameter vectors of all three compared parameter estimation algorithms is given in Fig. 13. The sets of parameter vectors estimated by IPM and GA form geometrically less well-defined clouds. Furthermore, the central region of the set estimated by GA is roughly approximated by the set estimated by AROPE MC, whereas the set estimated by the gradient based IPM algorithm has a different shape. We validated all estimated results for all flood events in the validation set (see Table 8). For better comparison we also computed, on the validation set of this case study, the model performance for the parameter vectors estimated by the single event calibration in the previous case study. A boxplot of the overall validation results referring to the FloodSkill criterion is given in Fig. 14. Detailed statistics for all referred performance criteria are given in Table 9. It is evident that the use of more data for model calibration improves the model performance in validation tremendously for all three approaches. AROPE MC outperforms IPM and GA for the calibration objective, the FloodSkill criterion, and for the NS. Furthermore, for the calibration objective the standard deviation of the validation results is also significantly smaller for the parameter vectors estimated by AROPE MC than for those estimated by IPM and GA, which indicates that the transfer of these parameters is more reasonable. Referring to the rPD, the parameter vectors estimated by AROPE MC have approximately the same validation performance as those estimated by IPM and GA. However, the best parameter vectors of the cloud estimated by AROPE MC perform slightly better than the best ones estimated by IPM and GA.
Additionally we checked whether the set of parameter vectors estimated by AROPE MC forms a stable solution, i.e. whether vectors with higher data depth not just have a smaller standard deviation in their corresponding model performance but also tend to have a better model performance on the validation data. Therefore we calculated the correlation between the data depth of each parameter vector in the final set with respect to the complete set and its model performance on the validation events. The results are given in Table 10. Except for event 16, the parameter vectors with higher data depth tend to have a better model performance on each single validation event and on the overall set of validation events. The correlation is much stronger for the NS than for the rPD. Thus, the estimated set seems to be more robust with respect to the NS than to the rPD. This might be due to a still too small number of calibration events or due to problems of the model structure to represent the global system behaviour and the peak flow values equally well. Possibly the good parameter sets with respect to the rPD criterion for the given calibration events still do not form a well-structured geometrical set. Consider that the rPD for three calibration events is computed by comparison with just three observations. This might be tackled by more calibration data or by the use of approaches other than iterative Monte-Carlo simulations for the identification of parameter vectors with good model performance.
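The stability check described above can be sketched as follows. This is a minimal illustration, not the study's MATLAB implementation; the function name and the use of the Pearson coefficient are assumptions (the text does not state which correlation coefficient was used for Table 10):

```python
import numpy as np

def depth_performance_correlation(depths, performances):
    """Correlation between the data depth of each parameter vector in the
    final set and its model performance (e.g. NS) on a validation event.
    Pearson correlation is assumed here for illustration."""
    d = np.asarray(depths, dtype=float)
    p = np.asarray(performances, dtype=float)
    return float(np.corrcoef(d, p)[0, 1])
```

A positive correlation for a validation event indicates that deeper (more central) parameter vectors also transfer better, which is exactly the stability property examined in Table 10.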
Figure 15 shows the simulated discharge for four validation events computed by the estimated parameter sets of all three compared parameter estimation algorithms. The catchment characteristics are in general better represented by the model runs with the AROPE MC estimates, which perform better in the mean and have a lower standard deviation. However, the results for flood event 4 also show the slight disadvantage of the AROPE MC estimates compared to the GA estimates with respect to the rPD criterion.
Due to the high process dynamics and the small catchment size (uncertainties and errors are not reduced to the same degree by averaging over many values), the confidence bounds of the model performance are wider than those presented by Bárdossy and Singh (2008). Nevertheless, the results of the multiple event calibration study confirm the outcome of Bárdossy and Singh (2008): the application of the principle of data depth can be very useful to estimate a set of robust parameter vectors considering uncertainties. However, considering the results of the single event calibration it is also evident that in cases where no geometrically well-structured set of model parameter vectors with good model performance can be identified, the application of the principle of data depth is not suitable. This problem might be avoided by the selection of appropriate performance criteria and of the required amount of observation data to be used for calibration. Furthermore, the prior application of approved population based algorithms might be useful in order to identify sets of good parameter vectors with a more complicated geometrical structure, before deep parameter vectors are selected.

- The first case study reveals that the AROPE MC algorithm estimates a set of parameter vectors in the region with the highest possible performance for two benchmark functions. The new sampling strategy leads to a better performance of AROPE MC compared to the original ROPE MC algorithm for problems with multiple regions of attraction.
- The results of the second case study in this paper show that even the small observational uncertainty of the discharge leads to a high variability of the model performance in validation. Parameter vectors with equal model performance on the calibration data can lead to very different results in validation. The proposed method of an evolutionary sampling of model parameter vectors by the help of data depth functions can help to identify sets of robust parameter vectors. Parameters with low data depth lie near the boundary of the set, are sensitive to small changes and transfer less well to other time periods than high-depth ones.
- Especially for processes with high dynamics (short time steps in the models), the selection of appropriate performance criteria and the required amount of observation data have to be considered to estimate robust model parameter vectors.
- In this paper, model performance was expressed by just one aggregated objective function. The presented algorithm can be easily altered to a general multi-objective parameter estimation procedure.
We propose further research on the application of data depth functions for parameter estimation. We suggest merging the concept of depth based sampling with the strength of approved search strategies for high-dimensional parameter spaces, e.g. the particle swarm concept, in order to overcome the shortcomings of the Monte Carlo based approach for generating sets of good parameter vectors.

Table 4. Objective functions used in this study, where x_i and y_i(θ) are the observed and the simulated discharge (simulated by the parameter vector θ) at time step i, respectively, and n is the number of observation points.
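Table 4 itself lists the formulas; as an illustration, the two criteria discussed most in the text can be sketched as follows. The NS is the standard Nash-Sutcliffe efficiency; the form of the rPD (relative peak difference) is an assumption based on the statement that, for a single event, it compares just the observed and simulated peak values:

```python
import numpy as np

def nash_sutcliffe(x, y):
    """Nash-Sutcliffe efficiency NS = 1 - sum_i (x_i - y_i)^2 / sum_i (x_i - mean(x))^2,
    where x is the observed and y the simulated discharge."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2)

def relative_peak_difference(x, y):
    """Assumed form of the rPD criterion: relative deviation of the simulated
    peak flow from the observed peak flow (one value per flood event)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return abs(y.max() - x.max()) / x.max()
```

The sketch makes the argument in the text concrete: for a single calibration event the rPD reduces to a comparison of two numbers, whereas the NS aggregates over all n time steps of the hydrograph.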

∂Θ/∂t = ∂q/∂z

where ∂Θ denotes the change of the soil water content, ∂t defines the time step and ∂q is the change of the fluxes q_in and q_out characterising the exchange with the specific neighbouring soil layers; the thickness of the soil layers is defined by ∂z. In the model each soil layer drains according to Equation (3), which includes the scalar model parameter dr. Here k_s denotes the hydraulic conductivity depending on the water content Θ_l in the considered layer, dr is the scalar model parameter to be estimated and [...] the slope in the grid cell.

Corresponding to the direction of the flow, [...] is projected to the catchment on a [...] grid, and a second bucket-type storage whose parameter k_i represents the [...] in analogy to Equation (1). For further details refer to Schulla (1997) and the model documentation at http://www.wasim.ch.
To be called a depth function, D has to fulfill specific properties (Zuo and Serfling, 2000). The concept of data depth is illustrated in Figure 4 by a small 2-dimensional example. For a random point set in R^2 the data depth was computed for each point of the set with respect to the point set itself. The depth function used was the halfspace depth. It is one of the best known data depth measures in nonparametric statistics and in discrete and computational geometry. According to Tukey (1975) and Donoho and Gasko (1992) the halfspace depth of an arbitrary point θ ∈ R^d with respect to a d-dimensional data set Z is defined as the smallest number of data points in any closed halfspace with boundary through θ. This is also called the Tukey or location depth, and it can be written as

D(θ; Z) = min_{||u||=1} #{z_i ∈ Z : u^T z_i ≥ u^T θ},

where u ranges over all vectors in R^d with ||u|| = 1.
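The definition above can be sketched numerically. The following is a minimal 2-dimensional illustration, not the implementation used in the study: it approximates the minimum over all unit vectors u by scanning a finite number of directions, which gives the exact Tukey depth for points in general position when enough directions are used.

```python
import numpy as np

def halfspace_depth_2d(theta, Z, n_dirs=360):
    """Approximate Tukey (halfspace) depth of the point theta with respect to
    the 2-D point set Z: the smallest number of points of Z in any closed
    halfplane whose boundary passes through theta, estimated by scanning
    n_dirs unit directions u and counting points with u . (z - theta) >= 0."""
    Z = np.asarray(Z, dtype=float) - np.asarray(theta, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    U = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit directions
    counts = (U @ Z.T >= 0).sum(axis=1)  # points in each closed halfplane
    return int(counts.min())
```

A point at the centre of a cloud receives a high depth (every halfplane through it contains many points), whereas a point on the convex hull receives the minimal depth, matching the shading in Fig. 4.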
Very often the halfspace depth is normalised by division with the number of points in the set Z, i.e. D_norm(θ; Z) = D(θ; Z)/#Z. [...] a Gaussian mixture model (GMM), whose parameters can be estimated by the EM algorithm, which is called anyway in order to do the cluster analysis in the presented strategy. For further details of the proposed strategy for the generation of deep parameter vectors refer to Krauße (2011).
Another issue of the ROPE MC algorithm is the loosely defined stopping criterion: "until the performance corresponding to X_n and Y_m does not differ more than what one would expect from the observation errors" (Bárdossy and Singh, 2008, p. 1280). The problem is that there are countless possibilities in the prior estimation of the tolerance in the model performance due to uncertainty in the observation data, and it can hardly be determined exactly. A broad definition of this tolerance can lead to sets with inferior model performance, whereas a tighter tolerance can easily result in overfitting. This is a severe shortcoming because it undermines the actual goals of the algorithm. Overfitting in the context of robust parameter estimation means that the model performance on the calibration data still increases for the estimated set, whereas the model performance on the control data decreases. We illustrate this fact with results for the Rietholzbach catchment, where the FloodSkill criterion was used and flood events no. 4 and no. 14 were used for calibration and control, respectively (cf. Fig. 6). It is evident that the performance on the control data deteriorates considerably, whereas the model performance on the calibration data is still improved by further iterations.
To address these problems we modified the algorithm. First, to avoid the shrinking of the generated sets and the unintended exclusion of parameter vectors close to the boundary of the good parameters, we suggest merging the sampled deep parameter vectors and the good ones as the initial set for the next iteration. Furthermore, we introduced a control data set in order to avoid overfitting to the data used for model calibration. Just the calibration set is used for parameter identification, whereas the control set is only used in the control process in order to stop the iteration. In each iteration of the algorithm the model performance on the calibration and the control data is evaluated: the model η is run for the calibration and control set, and the corresponding model performances are calculated by a purpose-specific objective function U for all θ ∈ X_n. Then the subset X*_n of the good performing parameter vectors in X_n is identified, e.g. such that X*_n comprises the best performing 10 % of all parameter vectors in X_n.

If the model performance does not improve anymore for the control set, the algorithm is stopped. This kind of approach is a state-of-the-art method in the supervised training of artificial neural networks in order to avoid overfitting (Tetko et al., 1995). The new algorithm, entitled Advanced Robust Parameter Estimation by Monte Carlo (AROPE MC), is given in a brief form in pseudocode in Algorithm 5.3. A more detailed illustration of the approach is given in Figure 7. The algorithm was implemented in the MATLAB programming language. The implementation is open source and available from the author.
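The two stopping checks (improvement on the calibration data falling below a tolerance; performance on the control data getting worse) resemble early stopping in neural network training. A minimal sketch, under the assumption that larger performance values are better; the function name, the `patience` mechanism and all default values are hypothetical illustrations, not taken from the paper's Algorithm 5.3:

```python
def should_stop(cal_history, ctrl_history, tol=1e-3, patience=2):
    """Decide whether the iteration should stop.
    cal_history / ctrl_history: performance per iteration on the calibration
    and control set (larger is better). Stop when the calibration improvement
    drops below tol, or when the control performance has worsened for
    `patience` consecutive iterations (overfitting guard)."""
    # (2) improvement on calibration data smaller than the tolerance?
    if len(cal_history) >= 2 and cal_history[-1] - cal_history[-2] < tol:
        return True
    # (3) performance on control data monotonically decreasing?
    if len(ctrl_history) > patience:
        recent = ctrl_history[-(patience + 1):]
        if all(b < a for a, b in zip(recent, recent[1:])):
            return True
    return False
```

The control-set branch is what prevents the overfitting behaviour shown in Fig. 6: calibration performance may still rise, but the loop terminates once the control performance starts to decline.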
6 Case studies

6.1 Case study I: estimating the minimum of the Rosenbrock and Rastrigin function

The goal of the AROPE MC algorithm is not the estimation of the global optimum of a problem. However, the estimated set of robust points should encompass a region close to the global optimum. We investigate the performance of the AROPE MC algorithm for the estimation of the minimum of two simple test functions of the form f : R^n → R, often used as performance test problems for optimisation algorithms: the Rosenbrock function and the Rastrigin function. The former is a non-convex function with a unique minimum value of 0 attained at the point (1, ..., 1). Finding the minimum is a challenge since it has a shallow minimum inside a deeply curved valley. The Rastrigin function is a typical example of a non-linear multimodal function. It was first proposed by Rastrigin as a 2-dimensional function and has been generalised to multiple dimensions by Mühlenbein et al. (1991). This function is a fairly difficult problem due to its large search space and its large number of local minima. The formal definition of both functions is given in Equations 8 and 9, respectively. Figure 8 shows the plot of both functions for two variables, to give the reader a better impression of the nature of the problem.
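The two benchmark functions (Eqs. 8 and 9) can be written compactly in their standard n-dimensional form, consistent with the description above (minimum 0 at (1, ..., 1) and at the origin, respectively); the function names are illustrative:

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function: sum_i [100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2];
    global minimum 0 at x = (1, ..., 1), inside a deeply curved valley."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (1.0 - x[:-1]) ** 2))

def rastrigin(x):
    """Rastrigin function: 10 n + sum_i [x_i^2 - 10 cos(2 pi x_i)];
    global minimum 0 at the origin, with many regularly spaced local minima."""
    x = np.asarray(x, dtype=float)
    return float(10.0 * x.size
                 + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))
```

The cosine term of the Rastrigin function creates the large number of local minima mentioned above, which is what makes it a useful stress test for the cluster based sampling of the AROPE MC algorithm.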
We applied the original ROPE MC algorithm and the AROPE MC algorithm for estimating the minimum of both test functions for the dimensions 2-4. As boundary for the parameters x we chose [−10, 10]. In order to have a reference performance, we applied the genetic algorithm (GA) according to Conn et al. (1997) in the same feasible space with a comparable maximum number of function evaluations and the same tolerance. To get a fair mean best result for comparison, we ran the GA repeatedly until the overall mean of the best estimates stabilised. For each estimate we computed its fitness value as the absolute difference between its function value and the known global optimum.
Figure 9 illustrates the principle of operation of the algorithm by the scatter plots of the estimated parameter vectors after several iteration steps for the 2-dimensional case. It is evident that new parameter vectors are sampled deep with respect to the estimated set of the previous iteration. For the Rastrigin function several clusters can be clearly identified, which improves the performance of the sampled parameter vectors tremendously.
For the Rosenbrock function both the ROPE MC and the AROPE MC algorithm perform comparably well for all dimensions. The results for the Rastrigin function show an improvement of AROPE MC with respect to the ROPE MC algorithm for higher dimensions due to the cluster based sampling with the GenDeep strategy. However, the results for the Rastrigin function also show that the proposed algorithm still suffers from the general shortcomings of a Monte-Carlo type approach for high dimensional problems with a very large number of areas of attraction: a too small sample size can result in an inaccurate estimate and consequently undermine the improvements of the strategy. Note that this problem can easily be compensated by a higher sample size at the cost of more computation time. As another possible solution for high-dimensional problems with multiple regions of attraction we propose the combined use of an approved evolutionary search strategy for high-dimensional parameter spaces, e.g. the particle swarm concept.

6.2 Case study II: calibration of the hydrological model WaSiM with focus on flood events

In a second case study we studied the influence of observation errors on the calibration results with AROPE MC in comparison with state-of-the-art optimisation algorithms for the process-oriented hydrological model WaSiM with focus on flood events. As classical optimisation algorithms we chose the interior-point method (IPM) according to Waltz et al., which is a gradient based method, and the GA already used in the previous case study.
We assume that the influence of observation errors in the temperature measurements is negligible for the simulation of flood events, whereas the uncertainty of the measured precipitation can be expressed by an ensemble. [...]








7 Discussion and conclusions

This paper presents a depth based parameter estimation method which is well suited for the robust calibration of hydrological models considering uncertainties. The Advanced Robust Parameter Estimation by Monte Carlo (AROPE MC) algorithm is a modified version of the depth based parameter estimation procedure presented by Bárdossy and Singh (2008). There are two differences between the AROPE MC algorithm and the original ROPE MC algorithm. The further development enables sampling from different non-convex regions of attraction and at the same time prevents AROPE MC from overfitting the calibration data. We compared the effectiveness of the newly developed algorithm for estimating robust model parameter vectors with the original ROPE MC algorithm and with the GA and IPM algorithms in three case studies.


Fig. 1.
Fig. 1. The Rietholzbach catchment with the main measurement site "Büel" on 7 May 2005 at the beginning of the summer season. One can see the typical pastures which cover two thirds of the catchment area and some of the sporadic small patches of forest in the background.

Fig. 4.
Fig. 4. 2-dimensional point set shaded according to the assigned depth. A darker point represents a higher depth. The depth function used was the halfspace depth.


Fig. 5.
Fig. 5. Volume ratio of the unit sphere to the unit cube in n dimensions as a continuous function of n.


Fig. 6.
Fig. 6. Overfitting for the calibration of the model WaSiM when calibrated with the method according to Bárdossy and Singh (2008); flood event no. 4 was used for calibration and event no. 14 was used as control set.

Fig. 8.
Fig. 8. Contour plot of the Rosenbrock (upper) and Rastrigin (lower) function for the 2-dimensional case; the known global optima are marked by a red cross.


Fig. 9.
Fig. 9. Scatter plot of the results of the AROPE MC algorithm for the Rosenbrock (left) and Rastrigin (right) function for the 2-dimensional case after the first, second, fourth and final iteration (from top to bottom).

Fig. 10.
Fig. 10. Validation performance of the parameter vectors estimated by AROPE MC with respect to their data depth.

Fig. 11.
Fig. 11. Statistics of the mean FloodSkill on the validation data for the estimates of the calibrations with single flood events.

Fig. 12.
Fig. 12. Evolution of the candidate model parameter vectors before each iteration in the AROPE MC algorithm for the multiple event calibration.

Fig. 13.
Fig. 13. Scatter plot of the final estimated parameter vectors of the three compared parameter estimation algorithms.


Fig. 14. Statistics of the mean FloodSkill on the validation data for the estimates of the calibration with single flood events each.

Fig. 15. Simulated hydrographs for the flood events 4 (a), 8 (b), 9 (c) and 19 (d) computed by the parameter vectors estimated by all three compared algorithms; the mean value is plotted as thick solid line and the confidence interval of the parameter uncertainty (Q.95 − Q.05) is plotted as thin dash-dot line.

Table 1. Overview of the most important basin characteristics.

Table 2. Overview of the model parameters considered for calibration; the reference parameter vector θ wb was estimated in order to use WaSiM for water-balance simulations in the Rietholzbach catchment.

Table 3. Overview of the database of the 24 flood events used for calibration and validation, sorted by peak flow value.

Table 5. Comparison of the fitness values for the estimates computed by the three used algorithms.

Table 6. Mean calibration results (FloodSkill) for the three compared algorithms.

Table 7. Mean overall validation results of the single event calibration for the three compared algorithms.

Table 10. Correlation between data depth and validation performance of all parameter vectors estimated with AROPEMC for the multiple event calibration.

Fig. 3. Scheme of the WaSiM soil module with location of impact of conceptual model parameters (bold).

Table 8. Sub-division of all flood events in a calibration, control and validation set.

Krauße and Cullmann: Identification of model parameters using data depth measures