Robust multi-objective calibration strategies – possibilities for improving flood forecasting

Process-oriented rainfall-runoff models are designed to approximate the complex hydrologic processes within a specific catchment and in particular to simulate the discharge at the catchment outlet. Most of these models exhibit a high degree of complexity and require the determination of various parameters by calibration. Recently, automatic calibration methods became popular in order to identify parameter vectors with high corresponding model performance. The model performance is often assessed by a purpose-oriented objective function. Practical experience suggests that in many situations one single objective function cannot adequately describe the model’s ability to represent any aspect of the catchment’s behaviour. This is regardless of whether the objective is aggregated of several criteria that measure different (possibly opposite) aspects of the system behaviour. One strategy to circumvent this problem is to define multiple objective functions and to apply a multiobjective optimisation algorithm to identify the set of Pareto optimal or non-dominated solutions. Nonetheless, there is a major disadvantage of automatic calibration procedures that understand the problem of model calibration just as the solution of an optimisation problem: due to the complex-shaped response surface, the estimated solution of the optimisation problem can result in different near-optimum parameter vectors that can lead to a very different performance on the validation data. Bárdossy and Singh (2008) studied this problem for single-objective calibration problems using the example of hydrological models and proposed a geometrical sampling approach called Robust Parameter Estimation (ROPE). This approach applies the concept of data depth in order to overcome the shortcomings of automatic calibration procedures and find a set of robust parameter vectors. Recent studies confirmed the effectivity of this method. However, all ROPE approaches published so far just identify robust model parameter vectors with respect to one single objective. The consideration of multiple objectives is just possible by aggregation. In this paper, we present an approach that combines the principles of multi-objective optimisation and depth-based sampling, entitled Multi-Objective Robust Parameter Estimation (MOROPE). It applies a multi-objective optimisation algorithm in order to identify non-dominated robust model parameter vectors. Subsequently, it samples parameter vectors with high data depth using a further developed sampling algorithm presented inKrauße and Cullmann(2012a). We study the effectivity of the proposed method using synthetical test functions and for the calibration of a distributed hydrologic model with focus on flood events in a small, pre-alpine, and fast responding catchment in Switzerland.


Introduction
Hydrologic models are simplified, conceptual representations of a part of the hydrologic cycle.They relate rainfall to streamflow on a continuous basis.Many of those models are driven by a vector of parameters that cannot be measured directly, but must be determined through indirect methods.This is an important aspect of model calibration.Efficient and effective parameter estimation techniques are a crucial factor for the successful application of these models.In the process of parameter estimation, the values of the Published by Copernicus Publications on behalf of the European Geosciences Union.
model parameters are adjusted until the catchment behaviour is closely matched.
Traditionally, this calibration is performed manually adjusting the parameters while visually inspecting the agreement between observations and model predictions.For the application of automatic approaches, the calibration is formulated as an optimisation problem.A purpose-specific objective function f quantifies the agreement between observations and simulation results.Many practical studies suggest that one single objective function, no matter how carefully chosen, is often insufficient to represent all characteristics of the system behaviour (Gupta et al., 1998;Cullmann, 2006;Gill et al., 2006;Fenicia et al., 2007).For instance, the mean absolute error of the discharge at the catchment outlet might be a good indicator for the ability to represent the water balance; however, it is likely to be inadequate to measure the model performance for flood forecasts where a correct simulation of the peak flow value and timing is crucial.Consequently, single-objective calibration approaches that provide one unique, global best parameter vector are in many cases not considered acceptable by experienced hydrologists.The most elementary solution to circumvent this problem is to aggregate several objective functions.However, this approach involves a great deal of subjective judgment and neglects the global bests for individual objective functions.Another advanced option is a multi-objective view of the optimisation problem referring to the concept of Pareto optimality (Yapo et al., 1998).A multi-objective optimisation algorithm approximates the set of non-dominated (i.e.Pareto optimal) solutions with respect to a set of given objectives.The set of Pareto optimal solutions is denoted as the Pareto optimal set or Pareto set P, and the image of P under the mapping of all considered objectives F is called Pareto front.Often, both terms are used synonymously.This can however lead to misunderstandings.Consider that the term Pareto optimal set means the set of true or estimated solutions in the parameter space, whereas the term Pareto front means its image in the objective space.This difference is important for the major concept of this paper.The Pareto optimal set or Pareto set reflects the trade-offs among all given objectives.Evolutionary algorithms, i.e. genetic algorithms (GA), and particle swarm optimisation (PSO) or other population-based algorithms can be used to perform this task.There are many different strategies with different strengths and shortcomings.Recently, multi-method algorithms have found favour in order to approximate this set.The approximated Pareto optimal set is indicated by Ã. Bárdossy and Singh (2008) studied one of the major shortcomings of automatic calibration procedures that understand the problem of model calibration just as an optimisation problem: the parameters of fitted hydrologic models depend upon the input data.The quality of input data cannot be assured as there may be measurement errors for both input and state variables.That is why the estimated best parameter vectors can lead to a very different performance on the validation data.The actual goal of a good model calibration should not be to find parameter vectors that perform best for the calibration period but to find parameter vectors that are robust.In this context, the term robust is closely related to the terms transferable and less sensitive.Robust parameter vectors should not just perform well on the calibration data but also provide a sufficient model performance in the validation.Hence, these parameters can be transferred to other time periods.Furthermore, robust parameter vectors should be as insensitive as possible.This means that a small change of the estimated parameter vectors should not lead to a significant change (usually a decrease) of the model performance.Against this background, the term sensitive is always related to a good model performance and to the general sensitivity of the considered model parameters.This means that a sensitive model parameter will always stay sensitive.However, a small change of this parameter within the region of robust model parameter vectors should have a smaller effect on the (good) model performance than in other regions.This requirement is important, because the set of optimal parameters cannot be exactly determined and may even slightly change under slightly different conditions.
The application of advanced parameter estimation approaches that do not understand a model calibration as a pure optimisation task is a necessary though not sufficient condition for the determination of robust model parameters.The success of a robust modelling strategy also strongly depends on a good implementation of the principle of parsimony.The goal of this concept is to obtain a model that represents a desired structure with as few parameters as possible.The essential number of parameters depends on the available information, e.g. the number of gaging stations in a catchment or the temporal resolution of the observations.The selection of a parsimonious model structure was subject of many studies in hydrology (e.g.Beven, 1989;van der Linden and Woo, 2003;Boyle et al., 2006;Wagener and Wheater, 2006).Another essential prerequisite for the estimation of robust model parameters is the selection of appropriate calibration data.The data used for calibration should be both representative for the considered processes and contain as much information as possible that can be used to identify the parameters that describe the considered processes best.The development of well-founded statistical methods based on information theory that are suitable to perform this task for both data-driven and process-oriented models has attracted rising scientific interest (see Maier and Dandy, 2000;Wagener and Wheater, 2002;Wagener and Gupta, 2005;Thyer et al., 2006;Singh, 2010).Of course also the selection of appropriate objective functions plays an important role for a successful model calibration.Typically, the formulation of uncorrelated criteria that use several considered model outputs adds new information.A comprehensive overview and a more detailed discussion of these issues within the scope of multi-objective optimisation are given in Efstratiadis and Koutsoyiannis (2010).
Hydrol.Earth Syst.Sci., 16, 3579-3606, 2012 www.hydrol-earth-syst-sci.net/16/3579/2012/ The approach presented in this paper focusses on the enhancement of the results provided by approved evolutionary multi-objective optimisation methods.The underlying principle to achieve this goal is the application of data depth metrics in order to sample robust parameter vectors with respect to a set of identified parameter vectors with good model performance.Bárdossy and Singh (2008) developed a first method using this approach and called it Robust Parameter Estimation Method (ROPE).Recent studies using further developed versions of this methodology (e.g.Krauße and Cullmann, 2012a) confirmed the potentials of the depth-based parameter sampling for the estimation of robust hydrologic parameter vectors.
However, all ROPE approaches published so far just identify robust model parameter vectors with respect to one single objective.The consideration of multiple objectives is just possible by aggregation.In this paper we present a new method, entitled Multi-Objective Robust Parameter Estimation (MO-ROPE) that synthesizes the advantages of both the multi-objective view and robust parameter estimation.In order to quantify the uncertainty of the parameterisation with respect to the given objectives, the method estimates a set of robust model parameter vectors applying a two-step approach.Within the first step, a suitable multi-objective optimisation algorithm is used to approximate the Pareto optimal set.In a second step, parameter vectors with high data depth (with respect to the Pareto set) are sampled assuming that those parameter vectors are more robust than the complete Pareto optimal set.Notwithstanding the already raised fact that the application of advanced parameter estimation approaches is not sufficient for the estimation of robust model parameters, we will follow the notation of Bárdossy and Singh (2008) and denote the parameter estimation using data depth metrics by the abbreviation ROPE.
In the following section, we introduce a multi-objective parameter estimation technique that applies evolutionary multi-objective optimisation algorithms and the concept of data depth in order to estimate a robust set of parameter vectors for a given multi-objective calibration problem.First, we discuss suitable techniques for the approximation of the Pareto optimal set.We suggest the application of a modified version of the advanced multi-method calibration framework AMALGAM first presented by Vrugt et al. (2009).It applies an additional hybrid search strategy entitled MO-PSO-GA that is a modified version of a search strategy that has proven successful within the frame of another single-objective ROPE strategy.Afterwards, we briefly introduce and discuss the principle of data depth and the depthbased sampling approach.The developed solution is tested on a set of well-known multi-objective test problems.We study the effectivity and efficiency of the suggested multiobjective optimisation strategy and show the advantages of the depth-based sampling strategy for several low-and highdimensional test problems that are subject to uncertainty.Furthermore, a real world case study shows the success of the developed methodology calibrating a distributed hydrologic model in a small catchment with high process dynamics focussing on flood events.

Combining multi-objective optimisation and depth-based parameter sampling
The proposed MO-ROPE approach synthesizes the concepts of multi-objective model calibration and the depth-based ROPE approach.Therefore, we will thoroughly introduce both concepts in this section.

Effective and efficient approximation of the Pareto set
An essential prerequisite for a successful application of the proposed MO-ROPE method is an effective and efficient approximation of the Pareto set for a given calibration problem.
Recently, several evolutionary search strategies, e.g.genetic algorithms (GA) and particle swarm optimisation (PSO), became popular to fulfill this task.Multi-method approaches even go one step further and apply several search strategies in parallel with the goal to exploit their strengths, and to minimise their weaknesses.A very popular and effective stateof-the-art method in this category is the AMALGAM approach presented by Vrugt and Robinson (2007) that merges the strengths of four well-founded multi-objective optimisation strategies.Benchmark results using a set of well-known multi-objective test problems show that AMALGAM approaches a factor of 10 improvement over current optimization algorithms for more complex, higher-dimensional problems.Following Wolpert and Macready's "no free lunch" theorem, showing that it is impossible to develop a single search algorithm that will always be superior to any other algorithm, an outperformance of AMALGAM by a single newly developed algorithm is thus virtually excluded.However, the existence of a superior multi-method approach should not completely prevent the development of alternative multi-objective calibration strategies.In case that there is a new method that provides at least in some cases a superior performance to other considered single calibration strategies, development and application within the frame of a multi-method framework might be successful.Furthermore, there are many real world multi-objective calibration problems that are simple enough to be solved by such approaches.That is why we adapted a search strategy called PSO-GA u that is based on PSO and GA in order to approximate the Pareto set1 .It has proven to be successful in estimating a set of good parameter vectors with a given uncertainty bound within the scope of a ROPE framework that is provided in Krauße and Cullmann (2012a).The developed approach is entitled Multi-Objective optimisation by Algorithm 1 MO-PSO-GA.
initialise the non-dominated front: Ã ← ∅ for all particles i do initialise position and local best x i , xi ∈ U [x lb , x ub ] and velocity v i ← 0 end for while stop criteria not met do for all particles i do in case that the particle x i is not dominated by all members in Ã, add it to this set Ã ← Ã ∪ x i remove all members from Ã that are dominated by x i update the local best xi as the best position found so far from the particle with index i in case that it weakly dominates all positions explored by this particle so far end for assign to each particle a random member of the archive ĝi ∈ Ã as a "personal" global best discard the worst n ψ ← ψ • #{particles} from the population initialise genetic offspring o ga ← ∅ for i = 1 to n ψ 2 do select a pair {x 1 , x 2 } from the population by tournament selection apply the VPAC operator to generate new offspring end for for all particles i do update velocity using equation update position using equation x i ← x i + v i end for merge new population with genetic offspring particles ← particles ∪ o ga end while Particle Swarm Optimisation and Genetic Algorithm (MO-PSO-GA).A pseudocode listing of the algorithm is given in Algorithm 1.The algorithm was set up according to the ideas provided by Settles and Soule (2005).They introduced a hybrid between a GA and PSO.The algorithm is controlled by a parameter called the breeding ratio ψ.This parameter controls the proportion of the population that is not moved according to the PSO strategy but is transformed using the GA.Thus, values for the breeding ratio parameter range from Settles and Soule (2005) propose a default value of 0.5, with the expectation that the best results would be with an even mix of both GA and PSO.However, other values for the breeding ratio may provide better results depending on the characteristics of different considered calibration problems.For further details, we refer to Settles and Soule (2005) and referred literature.The evolution of the particles by the GA is done using the following technique: from the pool of possible candidates, a subset of parameter vectors is selected by tournament selection.They are recombined by an oper-Algorithm 2 VPAC operator.
1: pick random numbers φ 1 , φ 2 ∼ U (0, 1) 2: update positions using equations − φ 2 v 2 3: reset the particles memory p 1 ← x 1 and p 2 ← x 2 4: update the velocities: ator that is called Velocity Propelled Averaged Crossover (VPAC).The goal of its application is to create two children whose positions are in between the parent's positions, but accelerated away from the parent's current direction (negative velocity) in order to increase diversity in the population.This might be effective because, towards the end of a typical PSO run, the population tends to be highly concentrated in a small portion of the search space which reduces the effective search space.Algorithm 2 shows how the new child position vectors and velocities are calculated using VPAC.The used PSO strategy is an adaption of the strategy used in the PSO-GA u approach provided in Krauße and Cullmann (2012a).The movement of the particles is controlled by three parameters: the particle inertia weight ω, the cognitive attraction φ 1 and the social attraction φ 2 .The parameter ω determines the velocity of the proper motion of the particles.φ 1 specifies the degree of movement towards the local optimum, and φ 2 controls the movement towards the global optimum of the whole particle swarm.We set the algorithm's parameters according to literature recommendations (see Table 1).For further details and studies regarding the setting of the algorithm's parameters, refer to Perez and Behdinan (2007) and Settles (2005) who also provide references to additional literature and materials.The used stopping criterion is either a fixed number of iteration steps that has to be set according to the given problem or a maximum number of members in the set of good parameter vectors.Another option might be a check that assesses the stability of the estimated set.We suggest to carry out some test runs with different limits and check the stability of the estimated parameter vectors.For further details regarding this issue, we refer to Gill et al. (2006) and Cabrera and Coello (2010).
In contrast to normal single-objective approaches, it stores all so far approximated Pareto optimal parameter vectors in a set Ã which is used to direct the search towards the complete region enclosing the Pareto optimal parameter vectors.Differently to other MO-PSO approaches (e.g.Gill et al., 2006), the the so-called "personal" global best for each particle is not the closest particle of the so far approximated Pareto optimal set Ã, but a random member of Ã.Furthermore, each particle has a local best that is just updated by the current position in case that it weakly dominates all so far found positions of the particle.After a given number of iterations, the set Ã holds a number of parameter vectors that represent the Pareto set of the given multi-optimisation problem.We assume that the introduction of additional features to a normal PSO search strategy can outperform other simple multi-objective calibration strategies and might be a useful complement to existing advanced multi-method optimisation strategies, e.g. the AMALGAM algorithm.Therefore, we integrated the developed algorithm as an additional search strategy into the AMALGAM framework.In the following, the extended AMALGAM approach will be denoted by AMALGAM * .The AMALGAM framework contains a simple handling of boundary constraints.Infeasible solutions that are out of the bounds are just set to the bounds in order to preserve their feasibility.In order to enable the framework to deal with more complex bounds, we added some features of a constraint handling technique based on adaptive penalty functions and a distance measure proposed by Woldesenbet et al. (2007).This method uses the number of feasible individuals in the population in order to be able to control whether a modified objective function focusses just on the objective values or the constraint violation.As long as all members of the current population are feasible, the objectives remain unchanged.

Robust parameter estimation
The application of automatic calibration procedures for model parameter estimation often completely neglects possible uncertainties in the observations used to quantify the matching of simulated values and measurements.Bárdossy and Singh (2008) studied this problem: due to the complexshaped response surface and the erroneous observations, the solution of the optimisation problem can lead to very different near-optimum parameter vectors that correspond to a much different model performance on the validation data.
The actual goal of a good model calibration should not be to find parameter vectors that perform best for the calibration period but to find parameter vectors that -lead to good model performance over the selected time period; -are not sensitive: small changes of the parameters should not lead to very different results; -are transferable: they perform well for other time periods and might also perform well on other catchments.2 Typically, such parameter vectors also lead to a hydrologically reasonable representation of the corresponding processes in the context of the possibilities of the used model.According to Bárdossy and Singh (2008), we call such parameter vectors robust.There are two possibilities to improve existing calibration approaches in order to identify robust parameter vectors.One starting point that recently attracted ris- ing scientific interest is a more intelligent selection of the calibration data (see Wagener and Wheater, 2002;Wagener and Gupta, 2005;Thyer et al., 2006); another one is the development of advanced methods for the identification of robust model parameters.
A methodology that focusses on the latter approach is the robust parameter estimation approach (ROPE).It was first presented by Bárdossy and Singh (2008).The ROPE approach is based on the application of the principle of data depth in order to sample robust parameter vectors.Data depth is a statistical method used for multivariate data analysis which has recently attracted a lot of research interest (e.g.Cramer, 2003;Liu et al., 2006).A specific data depth function assigns a numeric value to a given point which corresponds to its centrality, with respect to a set of points.This approach provides a center-outward ordering of points in Euclidean space of any dimension with respect to a given point set or distribution.This provides the possibility of a new nonparametric multivariate statistical analysis in which no distributional assumptions are needed.Recent studies of computational geometry and multivariate statistics (e.g.Liu et al., 2006;Bremner et al., 2008) showed that members with high depth with respect to a given point set, are more robust in order to represent the underlying distribution than a whole set.The deep points can be identified using a data depth metric implemented in a data depth function.Most proposed metrics used in data depth functions are inherently geometric, with a numeric value assigned to each data point that represents its centrality within the given dataset.The depth median, the point of maximal depth, is the depth-based estimator for the center of the dataset.Depth contours can be used to visualize and quantify the data.For instance they can be used to define central regions enclosing a part of the space with high depth.The concept of data depth is illustrated in Fig. 1 by a simple 2-dimensional example.For a random point set in R 2 , the data depth was computed for each point of the set with respect to the point set itself.The used depth function is the halfspace depth.According to its originator, it is also known as Tukey depth (Tukey, 1975).It is one of the best known among the data depth measures in nonparametric statistics, and in discrete and computational geometry and has proved to be a very robust measure in order to identify the center of a multivariate dataset (e.g Rousseeuw and Struyf, 1998;Cramer, 2003;Serfling, 2006) where u ranges over all vectors in R d with ||u|| = 1.Within this equation, the symbol just indicates that the labelled vector is transposed.
Very often the halfspace depth is normalised by dividing the "hdepth" value by the number of points in the set Z: The principle of the halfspace depth of a point θ with respect to a point set Z in R 2 is illustrated in Fig. 2. The plot shows some lines through θ that split the parameter space into two halfspaces.Obviously, each possible halfspace through θ contains at least two points of the given set.Hence, the non-normalised halfspace depth hdepth(θ | Z) = 2.For any further details of the halfspace depth, refer to Tukey (1975); Donoho and Gasko (1992); Rousseeuw and Struyf (1998).
As mentioned above, Bárdossy and Singh (2008) developed a method entitled ROPE that applies the principle of data depth in order to identify a set of robust model parameters.The approach was further developed by Krauße and Cullmann (2012a).In general all ROPE approaches consist of two steps: 1.In a first step, a set of model parameter vectors with a reasonable good model performance is identified.The  depth with respect to a given point set, are more robust in order to represent the underlying distribution than whole set The deep points can be identified using a data depth metri implemented in a data depth function.Most proposed met rics used in data depth functions are inherently geometric with a numeric value assigned to each data point that rep resents its centrality within the given data set.The depth median, the point of maximal depth, is the depth based esti mator for the center of the data set.Depth contours can b used to visualize and quantify the data.For instance they can be used to define central regions enclosing a part of the spac with high depth.The concept of data depth is illustrated in Fig. 1 by a simple 2-dimensional example.For a random point set in R 2 the data depth was computed for each poin of the set with respect to the point set itself.The used depth estimated parameter vectors achieve the best possible performance with respect to the given objectives and calibration data.Thus, they are from now on called the good parameter vectors.For single-objective problems, the good parameter vectors can be expressed by a set comprising the global optimum plus an uncertainty bound that depends on the assumed uncertainties.
Considering multi-objective calibration, we suggest to express the good parameter vectors by a set of nondominated parameter vectors, i.e. the Pareto optimal set.
2. Afterwards, a set of parameter vectors deep with respect to the previously identified set of good parameter vectors is generated under the assumption that those parameter vectors are more robust than the complete set of good parameter vectors. 3 The depth-based sampling with ROPE can be very useful for the estimation of robust hydrologic model parameters.Different studies with the semi-distributed HBV model focussing on the simulation of the water balance (Bárdossy and Singh, 2008) and the modelling of flood events with a distributed hydrologic model (Krauße and Cullmann, 2012a) illustrate the advantages and potential of this approach.
3 "The reason for this is that one assumes that the low depth points can be regarded as an iso-hypersurface corresponding to the selected level.If one assumes continuity of the objective function then higher values of the function are expected in the interior of the set" (Bárdossy andSingh, 2008, p. 1278 Krauße and Cullmann (2011a).Consider that the data depth is computed within the parameter space.3: return D Ã

Synthesizing multi-objective optimisation and robust parameter estimation
We propose to synthesize the strengths of multi-objective optimisation and robust parameter estimation with data depth functions in a new algorithm, entitled multi-objective robust parameter estimation (MO-ROPE).It synthesizes the advantages of a multi-objective view and the ROPE approach.The approach firstly applies a suitable multi-objective optimisation algorithm in order to estimate a set of non-dominated solutions Ã, thus the Pareto optimal set.A suitable method to do this task could be one of the methods discussed above, e.g. the AMALGAM approach.In a second step, the algorithm samples a set of parameter vectors that are deep with respect to the previously identified set Ã.The result is a set of parameter vectors D Ã that has high data depth with respect to the set Ã. We propose to do the sampling using a stratified approach called GenDeep.It provides advantages for the sampling from non-elliptic, banana-shaped and multimodal distributions.That might be a crucial in case of Pareto optimal sets that are distributed non-linearly or even over different distinct regions in the parameter space.The Gen-Deep strategy is thoroughly discussed in Krauße and Cullmann (2012b).It was already successfully applied for the sampling of deep parameter vectors within the frame of a further developed single-objective ROPE strategy (Krauße and Cullmann, 2012a).Consider that MO-ROPE samples the deep parameters in the parameter space and not in the objective space.Hence, the estimated robust solutions are just dependent on the geometrical structure of the previously identified Pareto optimal set.Any further assumptions regarding realistic ranges of the considered objectives are not required.This makes it an in principle different approach in comparison to other published approaches that distinguish robust and non-behavioural solutions by subjective cut-off thresholds in the objective space (e.g.Efstratiadis and Koutsoyiannis, 2010).Although this solution can easily select robust parameter vectors, it requires additional assumptions on the optimal range of the considered objectives.Furthermore, the non-behavioural solutions might not always exclusively correspond to the extreme tails of the Pareto front.Hence, the depth-based sampling completely forgets about the location of the Pareto optimal solutions in the objective space.
It just tries to identify parameter vectors that are located in the central regions of the Pareto optimal set in the parameter space.A pseudocode listing of the proposed MO-ROPE approach is provided in Algorithm 3. The developed solution enables the possibility to apply the ROPE approach on multiobjective calibration problems.The algorithm was implemented in MATLAB and C ++ and is embedded in a robust parameter estimation framework which comprises other published robust parameter estimation approaches.The framework is open source and available from the author.
3 Case studies

Preliminary case study: investigating the effectivity and efficiency of different multi-method multi-objective algorithms using common test problems
The published results of single-objective robust parameter estimation approaches have shown that the quality of the set of good parameter vectors plays an important role for the robustness of the estimated model parameter vectors (e.g.Krauße and Cullmann, 2012a).This is why an effective estimation of the non-dominated solutions in the first algorithmic step is an important prerequisite for a successful application of a multi-objective robust parameter estimation approach.Thus, in a preliminary case study, we applied the multi-objective optimisation algorithms discussed in the previous section in order to approximate the Pareto set for several benchmark problems with analytical solutions.We compare the performance of the stand-alone MO-PSO-GA approach with the results obtained by other simple evolutionary multi-objective optimisation strategies.Furthermore, we discuss the advantages of the integration of this hybrid approach into the advanced multi-method algorithm AMALGAM.We compare the original version of AMALGAM that already applies four approved multi-objective optimisation algorithms4 and an extended version called AMALGAM * that comprises the MO-PSO-GA approach as well.
For the first problem, entitled COELLO function, we followed the settings in Gill et al. (2006).We ran the the algorithms with a population size of 30 and set the maximum number of function evaluations to 5000.The true front of and the non-dominated front estimated by MO-PSO-GA are shown in Fig. 3.It is obvious that the true front is approximated well and all sections of the front are uniformly covered.The same holds true for the estimates of both considered AMALGAM modifications.Thus, for an objective comparison of the approximated front against the true front, we calculated two performance metrics and compared them where 2 ≤ j ≤ (N + 1) Table 3. Performance metrics for the non-dominated fronts estimated by several evolutionary multi-objective optimisation algorithms edited according to Gill et al. (2006).
The results are provided in Table 3.The metrics considered in this comparison are the generational distance (GD) metric introduced by van Veldhuizen and Lamont (1998) that measures the distance between the elements of the estimated non-dominated set and the known true front, and the spacing metric (SP) according to Schott (1995) that measures the mutual distance between the elements of the estimated nondominated front.Lower values of GD and SP denote a better approximation and a more uniform spread, respectively, with zero being the optimum.We abstain from a more detailed introduction of these measures, as they are just used for this small comparison.The results show that all three suggested algorithms provide excellent results for the COELLO problem.The MO-PSO-GA approach outperforms existing PSO approaches and obtains results that are almost equivalent to the results of AMALGAM and AMALGAM * .In a second test, we consider a problem that was proposed and used as a benchmark by Vrugt et al. (2003).That is why we refer to this benchmark as VRUGT.It is originally defined for two dimensions with three objectives.The Pareto solution set consists of a triangular-shaped area in the parameter space, having the corner points (0, 0), (0, 1) and (1, 0) for x 1 and x 2 , respectively.We defined an extended version that is defined for any dimension greater than or equal to two.It is defined by N + 1 objectives.In a first simple experiment, we estimated the Pareto set for the 2-dimensional case.A higher dimensional version with 30 dimensions is considered in a following case study that studies the advantages of deep parameter vectors.According to Vrugt et al. (2003), we set set the population size to values of 10, 20, 50, and 100, but limited the number of iterations to 50.Thus, the maximum number of function evaluations is limited to 500, 1000, 2500 and 5000.
The estimated parameters for the MO-PSO-GA and AMAL-GAM are given in Figs. 4 and 5.The results of AMALGAM * are just about equal and are thus neglected here.The results estimated by MO-PSO-GA and AMALGAM have a similar quality.A detailed visual comparison with published results estimated by MOSCEM (Vrugt et al., 2003) and two other multi-objective PSO implementations (Coello et al., 2004;Gill et al., 2006) confirms that the non-dominated front estimated by MO-PSO-GA is less clustered and provides a fairly better approximation of the true Pareto front.Already the run with population size 20 and a corresponding number of function evaluations of just 1000 provides better results.This underlines the efficiency of the MO-PSO-GA algorithm in comparison with other single-method search strategies.The previous benchmarks underlined the efficiency of the MO-PSO-GA algorithm in comparison with other simple multi-objective optimisation.However, the test problems were still far too easy to notice any differences between the different suggested algorithms.That is why we tested the suggested solutions on another set of more complex and well-known multi-objective benchmark problems provided by Zitzler et al. (2000), Deb et al. (2002) and Fonseca and Fleming (1993) that are commonly used in literature (e.g.Vrugt and Robinson, 2007) where 2 ≤ j ≤ (N + 1)

Convex
True front Estimated front  plies four approved multi-objective optimisation algorithms 3 and an extended version called AMALGAM * that comprises the MO-PSO-GA approach as well.
For the first problem, entitled COELLO function we followed the settings in Gill et al. (2006).We ran the the algorithms with a population size of 30 and set the maximum number of function evaluations to 5.000.The true front of and the non-dominated front estimated by MO-PSO-GA are 3 Table 3. Performance metrics for the non-dominated fronts estimated by several evolutionary multi-objective optimisation algorithms edited according to Gill et al. (2006) Algorithm GD SP AMALGAM * 0.0005 0.0570 AMALGAM 0.0005 0.0581 MO-PSO-GA 0.0006 0.0621 MOPSO (Gill et al., 2006) 0.0122 0.1415 MOPSO (Coello et al., 2004)  shown in Fig. 3.It is obvious that the true front is approximated well and all sections of the front are uniformly covered.The same holds true for the estimates of both considered AMALGAM modifications.Thus, for an objective comparison of the approximated front against the true front we calculated two performance metrics and compared them with a published overview according to Gill et al. (2006).
The results are provided in Table 3.The metrics considered in this comparison are the generational distance (GD) metric introduced by van Veldhuizen and Lamont (1998) that measures the distance between the elements of the estimated non-dominated set and the known true front, and the spacing metric (SP) according to Schott (1995) that measures the over the used benchmarks is given in Table 4. Pregenerated, uniformly spaced, reference sets representing the known exact Pareto optimal sets for the given optimisation problems are provided by http://www.tik.ee.ethz.ch/sop/download/supplementary/testproblems.We chose the versions with 1000 members each.According to the studies conducted in Vrugt and Robinson (2007), we applied each considered optimisation algorithm for the approximation of the Pareto optimal sets for the given test problems using a population size of 100 points in combination with 150 generations.Each optimisation run was repeated 30 times in order to avoid an interference of the results by outliers.In addition to the three considered algorithms MO-PSO-GA, AMAL-GAM and AMALGAM * , we also considered the NSGA-II algorithm according to Deb et al. (2002).It was subject of attention in many scientific publications and demonstrated superiority over many existing multi-objective optimisation methods.Furthermore, it already served as a reference in the first study with AMALGAM presented by Vrugt and Robinson (2007).Thus, it is a good indicator to assess the performance of the hybrid MO-PSO-GA algorithm.
The goal of a multi-objective optimisation algorithm is to approximate the Pareto optimal set of a given optimisation problem.The Pareto set in the parameter space corresponds to a Pareto front in the objective space.Thus, a good algorithm should estimate solutions that correspond to a non-dominated front that approximate the true Pareto front as close as possible.With the evolution of multi-objective  zero being the optimum.We abstain from a more detailed introduction of these measures, as they are just used for this small comparison.The results show that all three suggested algorithms provide excellent results for the COELLO problem.The MO-PSO-GA approach outperforms existing PSO approaches and obtains results that are almost equivalent to the results of AMALGAM and AMALGAM * .In a second test we consider a problem that was proposed and used as a benchmark by Vrugt et al. (2003).That is why we refer to this benchmark as VRUGT.It is originally defined for two dimensions with three objectives.The Pareto solution set consists of a triangular-shaped area in the parameter space, having the corner points (0,0), (0,1) and (1,0) for x 1 and x 2 , respectively.We defined an extended version that is defined for any dimension greater or equal two.It is defined by N + 1 objectives.In a first simple experiment we estimated the Pareto set for the 2-dimensional case.A higher dimensional version with 30 dimensions is considered in a following case study that studies the advantages of deep parameter vectors.According to Vrugt et al. (2003) we set set the population size to values of 10, 20, 50, and 100, but limited the number of iterations to 50.Thus, the maximum number of function evaluations is limited to 500, 1.000, 2.500 and 5.000.The estimated parameters for the MO-PSO-GA and AMALGAM are given in Fig. 3.1 and Fig. 4 respectively.The results of AMALGAM * are just about equal and are thus neglected here.The results estimated by MO-PSO-GA and AMALGAM have a similar quality.A detailed visual comparison with published results estimated by MOSCEM (Vrugt et al., 2003) and two other multi-objective PSO implementations (Coello et al., 2004;Gill et al., 2006) confirms that the non-dominated front estimated by MO-PSO-GA is less clustered and provides a fairly better approximation of the true Pareto front.Already the run with population size 20 and a corresponding number of function evaluations of just 1.000 provides better results.This underlines the efficiency of the MO-PSO-GA algorithm in comparison with other single-method search strategies.
The previous benchmarks underlined the efficiency of the MO-PSO-GA algorithm in comparison with other simple multi-objective optimisation.However the test problems were still far too easy to notice any differences between the different suggested algorithms.That is why we tested the suggested solutions on another set of more complex and well known multi-objective benchmark problems provided by Zitzler et al. (2000), Deb et al. (2002) and Fonseca and Fleming (1993) that are commonly used in literature (e.g.Vrugt and Robinson, 2007).An overview over the used benchmarks is given in Table 4. Pre-generated uniformly spaced reference sets representing the known exact Pareto optimal sets for the given optimisation problems are provided by http://www.tik.ee.ethz.ch/sop/download/supplementary/testproblems.We chose the versions with 1.000 members each.According to the studies conducted in Vrugt and Robinson (2007) we ap- zero being the optimum.We abstain from a more detailed introduction of these measures, as they are just used for this small comparison.The results show that all three suggested algorithms provide excellent results for the COELLO problem.The MO-PSO-GA approach outperforms existing PSO approaches and obtains results that are almost equivalent to the results of AMALGAM and AMALGAM * .In a second test we consider a problem that was proposed and used as a benchmark by Vrugt et al. (2003).That is why we refer to this benchmark as VRUGT.It is originally defined for two dimensions with three objectives.The Pareto solution set consists of a triangular-shaped area in the parameter space, having the corner points (0,0), (0,1) and (1,0) for x 1 and x 2 , respectively.We defined an extended version that is defined for any dimension greater or equal two.It is defined by N + 1 objectives.In a first simple experiment we estimated the Pareto set for the 2-dimensional case.A higher dimensional version with 30 dimensions is considered in a following case study that studies the advantages of deep parameter vectors.According to Vrugt et al. (2003) we set set the population size to values of 10, 20, 50, and 100, but limited the number of iterations to 50.Thus, the maximum number of function evaluations is limited to 500, 1.000, 2.500 and 5.000.The estimated parameters for the MO-PSO-GA and AMALGAM are given in Fig. 3.1 and Fig. 4 respectively.The results of AMALGAM * are just about equal and are thus neglected here.The results estimated by MO-PSO-GA and AMALGAM have a similar quality.A detailed visual comparison with published results estimated by MOSCEM (Vrugt et al., 2003) and two other multi-objective PSO implementations (Coello et al., 2004;Gill et al., 2006) confirms that the non-dominated front estimated by MO-PSO-GA is less clustered and provides a fairly better approximation of the true Pareto front.Already the run with population size 20 and a corresponding number of function evaluations of just 1.000 provides better results.This underlines the efficiency of the MO-PSO-GA algorithm in comparison with other single-method search strategies.
The previous benchmarks underlined the efficiency of the MO-PSO-GA algorithm in comparison with other simple multi-objective optimisation.However the test problems were still far too easy to notice any differences between the different suggested algorithms.That is why we tested the suggested solutions on another set of more complex and well known multi-objective benchmark problems provided by Zitzler et al. (2000), Deb et al. (2002) and Fonseca and Fleming (1993) that are commonly used in literature (e.g.Vrugt and Robinson, 2007).An overview over the used benchmarks is given in Table 4. Pre-generated uniformly spaced reference sets representing the known exact Pareto optimal sets for the given optimisation problems are provided by http://www.tik.ee.ethz.ch/sop/download/supplementary/testproblems.We chose the versions with 1.000 members each.According to the studies conducted in Vrugt and Robinson (2007) we ap- optimisation algorithms, a lot of metrics have emerged in order to quantify the effectivity and efficiency of compared approaches.We follow the suggestions of Vrugt and Robinson (2007) and use the following three metrics proposed in Deb (2001) and Deb et al. (2002) in order to measure the effectivity and efficiency of the compared optimisation algorithms.
The convergence metric (Y) measures the extent of convergence of the approximated set of non-dominated solutions Ã to the known true set of Pareto optimal solutions P .The metric Y computes by the average of the Euclidean distance of each point in Ã to its closest neighbour in P .The closer the value of Y is to zero, the better is the matching of the estimated solutions to the true Pareto optimal set.Consider that the distances are computed in the objective space.
Another important performance criterion for evolutionary algorithms for multi-objective optimization problems is the diversity metric .It measures the extent of the spread or diversity of the estimated solutions along the true Pareto front.In order to calculate this metric, we compute the Euclidean distances between two consecutive solutions in Ã.These distances are stored in a vector d with length L − 1 where L denotes the cardinality of the set Ã.In a second step, we calculate the Euclidean distances between the extrapolated extreme solutions of the true Pareto front and the boundary solutions of the obtained non-dominated set, d f and d l .Thus, d f denotes the distance between the first (leftmost) point in the obtained approximation and the beginning of the true front.The same applies for d l and the last (rightmost) point of the obtained Pareto approximation.The metric is a measure of the deviation of the individual distance between the consecutive solutions in Ã from its mean value weighted by the distance of the extreme tails of Ã from the tails of the true front P .Thus is computed by A perfect approximation of the true Pareto front would have a value of zero.A perfect distribution, whose tails are ideally identical to those of the true front, would correspond to d f = d l = 0. Furthermore, such an approximation should be perfectly distributed so that all d i = d.This leads to = 0. Hence, smaller values for this metric indicate a better spread and uniform distribution along the front.
The relative hypervolume indicator (RHV) quantifies both convergence and diversity of the so far estimated approximation within a single measure, and is therefore one of the best unary measures available to diagnose whether the nondominated solution set estimated by an optimisation algorithm is a good approximation of the true Pareto set.This metric is computed by one minus the ratio between the Hydrol.Earth Syst.Sci., 16, 3579-3606, 2012 www.hydrol-earth-syst-sci.net/16/3579/2012/ objective space dominated by the estimated non-dominated set, and the objective space dominated by the given true Pareto set.
RHV = 1 − HV( Ã) Thus, a perfect fit of the true front corresponds to a relative hypervolume indicator of zero.Following Vrugt and Robinson (2007), we use a relative hypervolume indicator with value of less than 0.005 as a criterion indicating that the estimated solution Ã is sufficiently close to the true Pareto set P , and counts the function evaluations carried out so far in order to measure the efficiency of the optimisation algorithms.
The effectivity of the considered optimisation algorithms in terms of the metrics Y and is provided in Table 5.The obtained results for the existing algorithms AMALGAM and NSGA-II are close to the values published in literature (Vrugt and Robinson, 2007).This indicates that the framework was set up correctly.Of particular relevance are however the results of the proposed approaches MO-PSO-GA and AMALGAM * .Aside from the ROT problem, the MO-PSO-GA approach provides results that are comparable or better to those estimated by NSGA-II.Notwithstanding these convincing results, the achieved performances are one step worse than the results yielded by AMALGAM.However, the already excellent results of the AMALGAM approach can even be slightly improved by an extension with the MO-PSO-GA approach.For all test problems, the AMALGAM * provides the best results for all benchmark problems considered.An exception is again the ROT problem where the additional MO-PSO-GA approach does not provide any additional improvements.These results both confirm the superiority of a well-founded multi-method approach, such as the AMAL-GAM approach.Furthermore, the benefits of additional single or hybrid search strategies can be used in order to improve the results of advanced multi-method approaches.It also confirms Wolpert and Macready's "no free lunch" theorem that shows that it is impossible to construct one single (parameter) search algorithm that will always outperform any other algorithm (Wolpert and Macready, 1997).Further research is required to estimate a perfect set of strategies in order to obtain a multi-method approach that yields the best possible results.According to the obtained results, we suggest to use the AMALGAM * approach for the estimation of the Pareto optimal set.

Studying the effectivity of depth-based sampling
for multi-objective calibration problems that are subject to uncertainty

Uncertainty in synthetic test cases
In the previous case study, we studied different strategies that are suitable to approximate the Pareto optimal set of a given multi-objective optimisation problem.This is a nec- 7. The region defined by the deep parameter vectors corresponds to central parts of the Pareto front.However, there is no direct relationship to a simple cutoff value or even a direct one to one relationship that maps just deep parameter vectors to the central part of the Pareto front.This is in particular underlined by the distribution of the hull that corresponds to larger parts of the tails but also contains members in the central region of the Pareto front.The advantage of the depth based sampling approach is illustrated by Fig. 8.It shows the density of the distribution of all possible Pareto fronts for VRUGT u 2 that were estimated for the different realisations of f and x .Furthermore there is an overlay with scatterplots that represent the distribution of the estimated approximation of the Pareto set of VRUGT ( Ã), the corresponding deep parameter vectors (D Ã) and the hull (H  essary and important step towards the identification of robust parameter vectors.The main thrust of the MO-ROPE approach is however the sampling of deep parameter vectors that are considered to be robust and thus a more representative solution in environments that are affected by a considerable degree of uncertainty.In such situations that are common for real-world problems, the Pareto set cannot be uniquely identified.The erroneous data can even lead to very different optimal parameter vectors.This issue is related to the requirement that robust parameter vectors should be as insensitive as possible.In this regard the estimated parameter vectors are considered as sensitive if a small change of the whole vector might lead to a big change in the performance of the model.To consider this issue, we just slightly altered the estimated parameter vectors and studied the results for the whole identified set, the deep parameters and the hull.The alteration was done using the following equation: where x is a multi-dimensional random independent Gaussian error with mean zero and a standard deviation that equals one percent of the range of the parameter in the estimated Pareto set in each considered dimension.Consider that this procedure alters the parameter vectors even less than in a similar experiment for sets of good parameters estimated by single-objective optimisation algorithms conducted by Bárdossy and Singh (2008).Furthermore, we assumed that the actual values of the objective function cannot be exactly identified.That is due to errors in the observations.We consider this issue as follows.For a given multi-objective calibration problem F = {f 1 . . .f m }, we defined a corresponding problem  an additional uncertainty term.The uncertainty is subject to the following error model: where f i is a random independent Gaussian error with constant variance.We assumed mean and standard deviation of just one percent of the range of the objective f i in the true Pareto front of the problem original problem.Even though the defined error models are quite subjective, the resulting disturbances of both the objectives and the parameters are still well below the typical changes observed in hydrologic model calibration and should not affect the performance to a too large extent.We now conducted the following experiment in order to study the advantages of the deep parameter vectors and the effectivity of the depth-based sampling: 1.In a first step, we applied the MO-ROPE framework in order to estimate robust parameter vectors for a multiobjective calibration problem F .According to the results of the previously presented case study we chose the AMALGAM * for the approximation of the Pareto optimal set Ã.    Pareto-optimal set.

Uncertainty in the synthetic application of a hydrological model
In the previous case study we showed the advantages of the depth-based sampling in comparison with a pure multiobjective optimisation using synthetical benchmark experiments.One last experiment should establish that the MO-ROPE method really produces robust parameter vectors in the context of hydrological model calibration.The experiment is done carrying out the following steps: 1. We chose a simple lumped hydrological model and used observation data to generate s synthetic discharge series.
2. Afterwards we added noise to the observations and the synthetic discharge.The noisy data was splitted into a calibration and validation period in three different ways.Pareto-optimal set.

Uncertainty in the synthetic application of a hydrological model
In the previous case study we showed the advantages of the depth-based sampling in comparison with a pure multiobjective optimisation using synthetical benchmark experiments.One last experiment should establish that the MO-ROPE method really produces robust parameter vectors in the context of hydrological model calibration.The experiment is done carrying out the following steps: 1. We chose a simple lumped hydrological model and used observation data to generate s synthetic discharge series.
2. Afterwards we added noise to the observations and the synthetic discharge.The noisy data was splitted into a calibration and validation period in three different ways.parameter vectors D Ã. Additionally, we generated another set that comprises all vectors in Ã with shallow depth.We call it the hull of Ã and denote it by H Ã. Consider once again that the deep parameters are sampled in the parameter space and not in the objective space.This means that the depth-based sampling does not require any a priori knowledge of the used objectives.
2. In a second step, we compared the performance of the complete estimated Pareto set and the sampled deep parameter vectors on the problem F u .We generated 10 000 different realisations of the errors f i and x .
For each realisation, we altered the parameter vectors and identified the corresponding Pareto optimal set as a subset of Ã and D Ã. Now we compared the performance of the complete approximated Pareto set Ã, the sampled deep parameter vectors D Ã and its hull H Ã regarding their ability to represent the set of true Pareto sets for F u .
We performed this experiment for the test problems VRUGT, FON and ROT in lower and higher dimensions that were already considered in the previous case study.For the problems VRUGT and FON, a subscript indicates the used number of dimensions.The Pareto set of the problem VRUGT is a multi-dimensional more or less convex region where the sampling of deep parameter vectors might be possible.The same applies, to a lesser extent, for the prob-lems FON and ROT where the true Pareto set is a multidimensional line that is usually approximated by a cigarshaped set.The same applies for reasonable models that simulate natural or technical processes where the Pareto optimal parameters also form such geometric structures (e.g.Bárdossy, 2007;Vrugt et al., 2003).We did not consider the problems ZDT1-ZDT6 where the corresponding Pareto set is a one-dimensional line with just one free parameter.In such cases, the sampling of deep parameters is not useful.
The principle of the depth-based sampling from the Pareto optimal sets is illustrated using the example of the problem VRUGT u 2 .The estimated non-dominated solutions Ã, the accordingly sampled deep parameter vectors and the hull in both parameter and objective space are provided by Figs. 6  and 7.The region defined by the deep parameter vectors corresponds to central parts of the Pareto front.However, there is no direct relationship to a simple cutoff value or even a direct one to one relationship that maps just deep parameter vectors to the central part of the Pareto front.This is in particular underlined by the distribution of the hull that corresponds to larger parts of the tails but also contains members in the central region of the Pareto front.The advantage of the depth-based sampling approach is illustrated by Fig. 8.It shows the density of the distribution of all possible Pareto fronts for VRUGT u 2 that were estimated for the different realisations of f and x .Furthermore, there is an overlay with scatter plots that represent the distribution of the estimated  parameter vectors in terms of the performance metrics Y and , and the mean sum of all squared objective functions.The metrics were calculated with respect to the individually identified non-dominated front (as a subset of the originally approximation) for each realisation of the perturbances x and f .The statistics represent the mean values over 10 000 realisations.

Uncertainty in the synthetic application of a hydrological model
In the previous case study we showed the advantages of the depth-based sampling in comparison with a pure multiobjective optimisation using synthetical benchmark experiments.One last experiment should establish that the MO-ROPE method really produces robust parameter vectors in the context of hydrological model calibration.The experiment is done carrying out the following steps: 1. We chose a simple lumped hydrological model and used observation data to generate s synthetic discharge series.
2. Afterwards we added noise to the observations and the synthetic discharge.The noisy data was splitted into a calibration and validation period in three different ways.approximation of the Pareto set of VRUGT ( Ã), the corresponding deep parameter vectors (D Ã) and the hull (H Ã) in the objective space of the problem VRUGT u 2 .In comparison with the complete Pareto set, the deep parameter vectors do not just show a better convergence to the set of true Pareto fronts for the problem considering uncertainty.According to the distributions of the distribution metric given in Fig. 9, they also provide a better mean approximation of the spread of the true Pareto fronts for the problem VRUGT u 2 .The advantages of the depth-based sampling are not limited to simple low-dimensional problems.The results for all considered problems including uncertainty are quantified in Table 7.It compares both performance metrics used in the previous case, i.e. the convergence metric Y that measures the extent of convergence to the actual Pareto sets for F u and the diversity metric that provides information about the spread of the solutions along the Pareto front.Furthermore, it provides the mean sum of the squares of all individual objectives.For each realisation of the errors x and f , the metrics were computed with respect to the individually identified non-dominated front (as a subset of the originally approximation Ã).For all given test problems, the sampled deep parameter vectors provide, in comparison with the complete estimated Pareto set, a better mean approximation of the actual Pareto sets in the objective space considering uncertainty regarding both the matching and a uniform distribution.Furthermore, the mean sum of the squares of all objectives is less for the deep parameter vectors.The effectivity of the depth-based sampling becomes even more objectionable if one considers the results for the parameter vectors with shallow depth, i.e. the hull.The parameters in the hull show the least suitable regarding its ability to represent a robust solution for the set of potential Pareto solution in an uncertain environment.Consider that the benefit of the depthbased sampling is greater for the problem VRUGT where the Pareto optimal set comprises a larger region.Typically, this is the case for hydrologic models.The results show that the depth-based sampling might be a suitable method to identify robust solutions with respect to a previously identified Pareto optimal set.

Uncertainty in the synthetic application of a hydrologic model
In the previous case study, we showed the advantages of the depth-based sampling in comparison with a pure multiobjective optimisation using synthetical benchmark experiments.One last experiment should establish that the MO-ROPE method really produces robust parameter vectors in the context of hydrologic model calibration.The experiment is done carrying out the following steps: 1. We chose a simple lumped hydrologic model and used observation data to generate synthetic discharge series.
2. Afterwards, we added noise to the observations and the synthetic discharge.The noisy data were split into a calibration and validation period in three different ways.
Hydrol  [h] 20 60 3. The model was now calibrated and validated for each realisation comparing AMALGAM and the proposed MO-ROPE approach.
The used observations were generated using the measurements in the Rietholzbach catchment.The used hydrologic model is described below.The model is a very simple lumped model consisting of two linear storage units.It takes both precipitation (prec) and potential evapotranspiration (etp) as input values and computes the corresponding discharge with two linear storage units.In a first step the precipitation and potential evapotranspiration are used to compute the effective precipitation as given in Eq. ( 7).
The effective precipitation p eff is distributed to two linear storage units using the parameter a as defined in Eq. ( 8).
The linear storage units are filled with the computed inflows.Each of both linear storage units generates discharge proportional to the stored water content V w as defined in Eq. ( 10).
The storage cons k describes how fast the storage unit drains the stored water content to the outlet.
The total discharge is the sum of the two outflows q out 1 and q out 2 .Thus, the calibration problem consists of the three parameters a, k 1 and k 2 with the boundaries given in Table 8.The boundaries for the storage coefficients k 1 and k 2 account for the fact that one linear storage unit should represent the faster runoff components while the other one models the interflow and groundwater discharge.The initial storage content was fixed to V w 1 = 0.015 for the faster responding storage unit and V w 2 = 5 for the less responsive one.
The data basis was the meteorological observations for 24 historic flood events in the Rietholzbach catchment (for details, see Sect. 4).We combined the time periods in three different ways for calibration and validation sets as shown in Table 9.The observations were altered by adding Gaussian noise as given in Eq. ( 11).This was repeated 100 times resulting in 100 different possible observation time series.Additionally, the generated synthetic discharge time series were altered as defined in Eq. ( 13). with The lumped model was now calibrated with respect to two objective functions: the relative deviation of simulated and observed peak runoff (rPD) and the Nash-Sutcliffe efficiency (NS).For details, see Table 13 in the following section.The mean validation results for both a pure multi-objective calibration using the AMALGAM algorithm and the corresponding deep parameter vectors are given in Tables 10 and 11.It is obvious that the deep parameters provide a slightly better model performance in mean and have less outliers on the negative side of the performance scale in terms of both considered performance criteria.This is also indicated by a slightly smaller standard deviation of the model performances.These results show that the advantages of the depthbased sampling approach apply not only for completely synthetic benchmarks but also for the calibration of a simple hydrologic model in a controlled environment.This underlines the possibilities of the depth-based sampling for the estimation of robust hydrologic model parameters.

Case study area and the hydrologic model
In a real world application, the MO-ROPE approach is tested on the calibration of the distributed hydrologic model WaSiM-ETH /6.4 model (further referred to as WaSiM). he model was developed by Schulla (1997).WaSiM transforms rainfall into runoff according to the scheme shown in Fig. 10.
It exemplary shows that the soil water compartments receive infiltration which is computed by a modified approach according to Green and Ampt (1911).This module is also used to determine the direct runoff Q d and the interflow Q ifl in    of hourly measurements (precipitation, temperature, wind speed, global radiation, and streamflow), 24 significant flood events were identified, such that the whole range of possible flood characteristics occurring in the catchment is well covered.An exception was made in that we did not use any winter events to avoid a blurring of the results due to the dynamics of snow accumulation and melt.For further details and a more comprehensive overview, refer to Krauße and Cullmann (2011a).
The WaSiM model has been chosen in this study because its physically based unsaturated zone module maintains the characteristic physics of dynamic rainfall-runoff processes even for unobserved events.This is especially important for correctly portraying the pre-event catchment conditions."WaSiM is therefore -amongst the available models -one of the best suited for extrapolation into the range of extreme flood events" (Cullmann, 2006, p. 20).That is why it has been widely used for the modelling of flood events (e.g.Marx, 2007;Cullmann et al., 2008;Grundmann, 2010).We performed this case study in the Rietholzbach catchment because it has a long record of hourly datasets and the perturbing impact of data heterogeneity is relatively small in this catchment.Furthermore, the WaSiM model has been thoroughly tested within this catchment (Schulla, 1997).Notwithstanding these advantages, the modelling of flood events in such a small catchment is a challenging task.Typically, the achievable model performance is just moderate and the modelling process is subject to many uncertainties that can hardly be quantified.However, the results of previous studies dealing with flood forecasting suggest that an improved parametrisation of headwater catchments can have a big impact on the model performance for gauging stations in the lower reaches that are usually monitored by operational flood forecasting centres (e.g.Cullmann, 2006;Grundmann, 2010).Many studies dealing with model calibration focussing on flood events showed that there is a tremendous tradeoff between a correct modelling of the peak flow values and a good representation of the catchment behaviour in terms of the streamflow at the outlet (e.g.Moussa and Chahinian, 2009;Grundmann, 2010).The goal of this case study is thus to study the advantages of the developed MO-ROPE strategy for the calibration of a process-oriented hydrologic model focussing on flood events with respect to multiple objectives.

Calibrating WaSiM for flood events using two objectives
In a first case study, we calibrated WaSiM using two objectives.The relative peak flow deviation (rPD) quantifies the agreement between observed and simulated peak flow value, whereas the coefficient of efficiency according to Nash and Sutcliffe (1970)   Table 13.Objective functions used in this study, where q x i , q y i (θ), and Θ x i , Θ y i (θ) are the observed and simulated discharge and m moisture at time-step i, respectively.The simulated values are computed by the parameter vector θ.n denotes the number of obser Name Description Formula NS Nash-Suttcliffe efficiency 1 − FloodSkill aggregate between NS and rPD N S − rP D r Θ moisture correlation coefficient The WaSiM model has been chosen in this study because its physically based unsaturated zone module maintains the characteristic physics of dynamic rainfall-runoff processes even for unobserved events.This is especially important for correctly portraying the pre-event catchment conditions."WaSiM is therefore -amongst the available models -one of the best suited for extrapolation into the range of extreme flood events" (Cullmann, 2006, p. 20).That is why it has been widely used for the modelling of flood events (e.g.Marx, 2007;Cullmann et al., 2008;Grundmann, 2010).We performed this case study in the Rietholzbach catchment because it has a long record of hourly data sets and the perturbing impact of data heterogeneity is relatively small in this catchment.Furthermore the WaSiM model has been thoroughly tested within this catchment (Schulla, 1997).Notwithstanding these advantages, the mode flood events in such a small catchment is a challengi Typically the achievable model performance is just ate and the modelling process is subject to many un ties that can hardly be quantified.However, the re previous studies dealing with flood forecasting sugg an improved parametrisation of headwater catchme have a big impact on the model performance for stations in the lower reaches that are usually monit operational flood forecasting centres (e.g.Cullman Grundmann, 2010).Many studies dealing with mo ibration focussing on flood events showed that th tremendous tradeoff between a correct modelling of flow values and a good representation of the catchm haviour in terms of the streamflow at the outlet (e.g.Table 12.Overview of the used model parameters considered for calibration; the reference parameter vector θ wb was estimated i use WaSiM for water-balance simulations in the Rietholzbach catchment; the parameterisation of the soil hydraulic parameters is each soil according to the pedotransfer functions provided in Wösten et al. (1999)   Table 13.Objective functions used in this study, where q x i , q y i (θ), and Θ x i , Θ y i (θ) are the observed and simulated discharge and moisture at time-step i, respectively.The simulated values are computed by the parameter vector θ.n denotes the number of obser

Name
Description Formula NS Nash-Suttcliffe efficiency 1 − FloodSkill aggregate between NS and rPD N S − rP D r Θ moisture correlation coefficient The WaSiM model has been chosen in this study because its physically based unsaturated zone module maintains the characteristic physics of dynamic rainfall-runoff processes even for unobserved events.This is especially important for correctly portraying the pre-event catchment conditions."WaSiM is therefore -amongst the available models -one of the best suited for extrapolation into the range of extreme flood events" (Cullmann, 2006, p. 20).That is why it has been widely used for the modelling of flood events (e.g.Marx, 2007;Cullmann et al., 2008;Grundmann, 2010).We performed this case study in the Rietholzbach catchment because it has a long record of hourly data sets 1997).Notwithstanding these advantages, the mod flood events in such a small catchment is a challeng Typically the achievable model performance is jus ate and the modelling process is subject to many u ties that can hardly be quantified.However, the r previous studies dealing with flood forecasting sug an improved parametrisation of headwater catchm have a big impact on the model performance for stations in the lower reaches that are usually moni operational flood forecasting centres (e.g.Cullman Grundmann, 2010).Many studies dealing with m ibration focussing on flood events showed that th Table 13.Objective functions used in this study, where q x i , q y i (θ), and x i , y i (θ) are the observed and simulated discharge and mean soil moisture at time step i, respectively.The simulated values are computed by the parameter vector θ .n denotes the number of observations.Name Description Formula NS Nash-Sutcliffe efficiency 1 − 1 n n i=1 (q x i −q y i (θ )) 2 1 n n i=1 q x i −q x 2 rPD rel.peak flow deviation FloodSkill aggregate between NS and rPD NS-rPD WaSiM, both criteria have a small correlation and are most suitable to quantify the model performance focussing on flood events.In a further case study, we additionally considered the correlation between the simulated soil moisture and the observed soil moisture in terms of the lysimeter weight in the catchment.An overview of objective criteria referred to in the following case studies is provided in Table 13.The model parameters considered for calibration are the conceptual model parameters k d , k i , "dr" and k rec .k d and k i are storage coefficients that control the outflow of linear storage events that collect the generated direct runoff and interflow, whereas "dr" is a scalar value that controls the generation of interflow.The conceptual soil parameter k rec determines the decrease of the saturated soil conductivity with increasing depth.Furthermore, we considered the soil hydraulic parameters of the dominating soils SL and SiL as additional calibration parameters.A reasonable a priori estimation of the distribution of the soil hydraulic parameters and a scaling of these values to one scaling parameter per soil (β SL and β SiL ) were done using an approach presented in Grundmann (2010).Previous studies with WaSiM focussing on flood events (e.g.Cullmann, 2006;Grundmann, 2010) showed that the selected set of model parameters is the most sensitive and suitable calibrating WaSiM for the modelling of flood events in fast responding catchments.These results were confirmed by preliminary sensitivity analyses with WaSiM in the Rietholzbach catchment (Cullmann, 2006;Seifert, 2010).All calibration parameters with their a priori distribution are clearly summarised in  possible flood characteristics occurring in the catchment is well covered.An exception was made in that we did not use any winter events to avoid a blurring of the results due to the dynamics of snow accumulation and melt.For further details and a more comprehensive overview refer to Krauße and Cullmann (2011a).We used five flood events with different characteristics 5 for calibration, and the estimated parameter vectors were validated on further 19 flood events.
We applied MO-ROPE using the given setup in order to estimate a set of robust model parameter vectors for the modelling of flood events using the two objectives rPD and NS.The Pareto set was estimated using the extended AMALGAM * framework.We checked the convergence of the algorithm using the development of the hypervolume dominated by the so far estimated non-dominated Pareto front in the consecutive generations as given in Fig. 12.It is evident that the hypervolume reaches stationary after approximately 80 generations.With a population size of 50, this corresponds to 4000 function evaluations.We decided to consider an additional buffer and thus set the maximum number of function evaluations to 6000.The trade-off surface for both objectives rPD and NS on the basis of all parameter vectors evaluated by AMALGAM * and the estimated approximation of the Pareto optimal set Ã is given in Fig. 11.The evaluation of the model performances corresponding to the 5 We considered flood events caused by convective and advective precipitation, different peak flow values and events with single and multiple flood crests.

Krauße et al.: Robust multi-objective calibration strategies 19
and Chahinian, 2009;Grundmann, 2010).The goal of this case study is thus to study the advantages of the developed MO-ROPE strategy for the calibration of a process-oriented hydrologic model focussing on flood events with respect to multiple objectives.

Calibrating WaSiM for flood events using two objectives
In a first case study we calibrated WaSiM using two objectives.The relative peak flow deviation (rPD) quantifies the agreement between observed and simulated peak flow value whereas the coefficient of efficiency according to Nash and Sutcliffe (1970) (NS) assesses the global fit of the observed and simulated hydrograph.According to our experience with WaSiM both criteria have a small correlation and are most suitable to quantify the model performance focussing on flood events.In a further case study we additionally considered the correlation between the simulated soil moisture and the observed soil moisture in terms of the lysimeter weight in the catchment.An overview over objective criteria referred to in the following case studies is provided in Table 13.The model parameters considered for calibration are the conceptual model parameters k d , k i , dr and k rec .k d and k i are storage coefficients that control the outflow of linear storage events that collect the generated direct runoff and interflow whereas dr is a scalar value that controls the generation of interflow.The conceptual soil parameter k rec determines the decrease of the saturated soil conductivity with increasing depth.Furthermore, we considered the soil hydraulic parameters of the dominating soils SL and SiL as additional calibration parameters.A reasonable a priori estimation of the distribution of the soil hydraulic parameters and a scaling of these values to one scaling parameter per soil (β SL and β SiL ) was done using an approach presented in Grundmann (2010).Previous studies with WaSiM focussing on flood events (e.g.Cullmann, 2006;Grundmann, 2010) showed that the selected set of model parameters are the most sensitive and suitable calibrating WaSiM for the modelling of flood events in fast responding catchments.These results were confirmed by preliminary sensitivity analyses with WaSiM in the Rietholzbach catchment (Cullmann, 2006;Seifert, 2010).
All calibration parameters with their a priori distribution are clearly summarised in Table 12.We used five flood events with different characteristics 4 for calibration and the estimated parameter vectors were validated on further 19 flood events.
We applied MO-ROPE using the given setup in order to estimate a set of robust model parameter vectors for the modelling of flood events using the two objectives rPD and NS.The Pareto set was estimated using the extended AMALGAM * framework.We checked the convergence of 4 We considered flood events caused by convective and advective precipitation, different peak flow values and events with single and multiple flood crests.the algorithm using the development of the hypervolume dominated by the so far estimated non-dominated Pareto front in the consecutive generations as given in Fig. 12.It is evident that the hypervolume reaches is stationary after approximately 80 generations.With a population size of 50 this corresponds to 4.000 function evaluations.We decided to consider an additional buffer and thus set the maximum number of function evaluations to 6.000.The trade-off surface for both objectives rPD and NS on the basis of all parameter vectors evaluated by AMALGAM * and the estimated approximation of the Pareto optimal set Ã is given Fig. 11.Evolution of the trade-off surface of the WaSiM parameter vectors in the two-dimensional objective space.The grey dots represent the whole set of parameter vectors that were evaluated during the approximation of the Pareto optimal set Ã, indicated by red dots.
case study is thus to study the advantages of the developed MO-ROPE strategy for the calibration of a process-oriented hydrologic model focussing on flood events with respect to multiple objectives.

Calibrating WaSiM for flood events using two objectives
In a first case study we calibrated WaSiM using two objectives.The relative peak flow deviation (rPD) quantifies the agreement between observed and simulated peak flow value whereas the coefficient of efficiency according to Nash and Sutcliffe (1970) (NS) assesses the global fit of the observed and simulated hydrograph.According to our experience with WaSiM both criteria have a small correlation and are most suitable to quantify the model performance focussing on flood events.In a further case study we additionally considered the correlation between the simulated soil moisture and the observed soil moisture in terms of the lysimeter weight in the catchment.An overview over objective criteria referred to in the following case studies is provided in Table 13.The model parameters considered for calibration are the conceptual model parameters k d , k i , dr and k rec .k d and k i are storage coefficients that control the outflow of linear storage events that collect the generated direct runoff and interflow whereas dr is a scalar value that controls the generation of interflow.The conceptual soil parameter k rec determines the decrease of the saturated soil conductivity with increasing depth.Furthermore, we considered the soil hydraulic parameters of the dominating soils SL and SiL as additional calibration parameters.A reasonable a priori estimation of the distribution of the soil hydraulic parameters and a scaling of these values to one scaling parameter per soil (β SL and β SiL ) was done using an approach presented in Grundmann (2010).Previous studies with WaSiM focussing on flood events (e.g.Cullmann, 2006;Grundmann, 2010) showed that the selected set of model parameters are the most sensitive and suitable calibrating WaSiM for the modelling of flood events in fast responding catchments.These results were confirmed by preliminary sensitivity analyses with WaSiM in the Rietholzbach catchment (Cullmann, 2006;Seifert, 2010).
All calibration parameters with their a priori distribution are clearly summarised in Table 12.We used five flood events with different characteristics 4 for calibration and the estimated parameter vectors were validated on further 19 flood events.
We applied MO-ROPE using the given setup in order to estimate a set of robust model parameter vectors for the modelling of flood events using the two objectives rPD and NS.The Pareto set was estimated using the extended AMALGAM * framework.We checked the convergence of 4 We considered flood events caused by convective and advective precipitation, different peak flow values and events with single and multiple flood crests.the algorithm using the development of the hypervolume dominated by the so far estimated non-dominated Pareto front in the consecutive generations as given in Fig. 12.It is evident that the hypervolume reaches is stationary after approximately 80 generations.With a population size of 50 this corresponds to 4.000 function evaluations.We decided to consider an additional buffer and thus set the maximum number of function evaluations to 6.000.The trade-off surface for both objectives rPD and NS on the basis of all parameter vectors evaluated by AMALGAM * and the estimated approximation of the Pareto optimal set Ã is given estimated parameter vectors shows a clear tradeoff between both used objectives.That means that the best parameter vectors with respect to the peak flow value cannot represent the global behaviour of the catchment for flood events equally well.The following final step, the sampling of deep parameter vectors, was done using the GenDeep strategy.We compared the results of the estimated approximation of Pareto set Ã and the deep parameter vectors D Ã. Additionally, we compared this set with the parameter vectors determined by the single-objective robust parameter estimation algorithm ROPE-PSO, using both the singular objectives rPD and NS, and an aggregated criterion, called FloodSkill (see Table 13), as objective functions.
The estimated Pareto set and the sampled deep parameter vectors in the parameter space are provided in Fig. 13   Mean ± Std 3.82 ± 0.87 50.5 ± 6.22 48.5 ± 6.73 0.50 ± 0.30 0.31 ± 0.17 0.20 ± 0.17 dr is forced to higher values (more interflow generation), the soils are adjusted more conductive (higher values of the scaling parameters β SL and β SiL ) in all soil layers (higher values of k rec ).Furthermore the variance of these parameters gets smaller as several flow components with different temporal dynamics have to represented by just one component the interflow.On the other hand the estimates for the efficiency criterion NS show in general a larger spread.The NS criterion quantifies the agreement of the observed and simulated catchment behaviour in a global way with an overweighting of the time steps with higher streamflow.A calibration with this criterion does not necessary force the model to its limits and can neglect a part of the fast runoff components in order to achieve a high NS value.The corresponding peak flow deviations are higher but the flow components can be represented by more model components and as a consequence the variance of the estimated parameter distributions gets larger.The parameter distributions estimated by the multi-objective MO-ROPE approach have mean values that are in-between the mean values of the single objective calibrations using the rPD and the aggregated compromise criterion FloodSkill.In comparison with the single-objective calibration runs, the use of the multi-objective MO-ROPE approach obtains tighter variation intervals.These results confirm the outcome of previous studies dealing with the multi-objective calibration of conceptual hydrologic models focussing on flood events (e.g.Engeland et al., 2006;Moussa and Chahinian, 2009).
A comparison of the complete Pareto set Ã, its hull H Ã (thus the points with shallow depth) and the sampled deep parameter vectors D Ã in the objective space for both the calibration and the validation data is provided in Fig. 14 (a   Mean ± Std 3.82 ± 0.87 50.5 ± 6.22 48.5 ± 6.73 0.50 ± 0.30 0.31 ± 0.17 0.20 ± 0.17 dr is forced to higher values (more interflow generation), the soils are adjusted more conductive (higher values of the scaling parameters β SL and β SiL ) in all soil layers (higher values of k rec ).Furthermore the variance of these parameters gets smaller as several flow components with different temporal dynamics have to represented by just one component the interflow.On the other hand the estimates for the efficiency criterion NS show in general a larger spread.The NS criterion quantifies the agreement of the observed and simulated catchment behaviour in a global way with an overweighting of the time steps with higher streamflow.A calibration with this criterion does not necessary force the model to its limits and can neglect a part of the fast runoff components in order to achieve a high NS value.The corresponding peak flow deviations are higher but the flow components can be represented by more model components and as a consequence the variance of the estimated parameter distributions gets larger.The parameter distributions estimated by the multi-objective MO-ROPE approach have mean values that are in-between the mean values of the single objective calibrations using the rPD and the aggregated compromise criterion FloodSkill.In comparison with the single-objective calibration runs, the use of the multi-objective MO-ROPE approach obtains tighter variation intervals.These results confirm the outcome of previous studies dealing with the multi-objective calibration of conceptual hydrologic models focussing on flood events (e.g.Engeland et al., 2006;Moussa and Chahinian, 2009).
A comparison of the complete Pareto set Ã, its hull H Ã (thus the points with shallow depth) and the sampled deep parameter vectors D Ã in the objective space for both the calibration and the validation data is provided in Fig. 14    Mean ± Std 3.82 ± 0.87 50.5 ± 6.22 48.5 ± 6.73 0.50 ± 0.30 0.31 ± 0.17 0.20 ± 0.17 dr is forced to higher values (more interflow generation), the soils are adjusted more conductive (higher values of the scaling parameters β SL and β SiL ) in all soil layers (higher values of k rec ).Furthermore the variance of these parameters gets smaller as several flow components with different temporal dynamics have to represented by just one component the interflow.On the other hand the estimates for the efficiency criterion NS show in general a larger spread.The NS criterion quantifies the agreement of the observed and simulated catchment behaviour in a global way with an overweighting of the time steps with higher streamflow.A calibration with this criterion does not necessary force the model to its limits and can neglect a part of the fast runoff components in order to achieve a high NS value.The corresponding peak flow deviations are higher but the flow components can be represented by more model components and as a consequence the variance of the estimated parameter distributions gets larger.The parameter distributions estimated by the multi-objective MO-ROPE approach have mean values that are in-between the mean values of the single objective calibrations using the rPD and the aggregated compromise criterion FloodSkill.In comparison with the single-objective calibration runs, the use of the multi-objective MO-ROPE approach obtains tighter variation intervals.These results confirm the outcome of previous studies dealing with the multi-objective calibration of conceptual hydrologic models focussing on flood events (e.g.Engeland et al., 2006;Moussa and Chahinian, 2009).
A comparison of the complete Pareto set Ã, its hull H Ã (thus the points with shallow depth) and the sampled deep parameter vectors D Ã in the objective space for both the calibration and the validation data is provided in Fig. 14    Mean ± Std 3.82 ± 0.87 50.5 ± 6.22 48.5 ± 6.73 0.50 ± 0.30 0.31 ± 0.17 0.20 ± 0.17 dr is forced to higher values (more interflow generation), the soils are adjusted more conductive (higher values of the scaling parameters β SL and β SiL ) in all soil layers (higher values of k rec ).Furthermore the variance of these parameters gets smaller as several flow components with different temporal dynamics have to represented by just one component the interflow.On the other hand the estimates for the efficiency criterion NS show in general a larger spread.The NS criterion quantifies the agreement of the observed and simulated catchment behaviour in a global way with an overweighting of the time steps with higher streamflow.A calibration with this criterion does not necessary force the model to its limits and can neglect a part of the fast runoff components in order to achieve a high NS value.The corresponding peak flow deviations are higher but the flow components can be represented by more model components and as a consequence the variance of the estimated parameter distributions gets larger.The parameter distributions estimated by the multi-objective MO-ROPE approach have mean values that are in-between the mean values of the single objective calibrations using the rPD and the aggregated compromise criterion FloodSkill.In comparison with the single-objective calibration runs, the use of the multi-objective MO-ROPE approach obtains tighter variation intervals.These results confirm the outcome of previous studies dealing with the multi-objective calibration of conceptual hydrologic models focussing on flood events (e.g.Engeland et al., 2006;Moussa and Chahinian, 2009).
A comparison of the complete Pareto set Ã, its hull H Ã (thus the points with shallow depth) and the sampled deep parameter vectors D Ã in the objective space for both the calibration and the validation data is provided in Fig. 14  of shape of the complete set.A comparison of the estimated sets of robust parameter vectors for both the MO-ROPE run as well as the single-objective ROPE-PSO is given in Table 14.It provides the distributions of the estimated robust parameter vectors and some basic statistical properties, i.e. the mean value, the standard deviation and their range.Obviously, there is an strong dependence of the parameters k d and k rec on the used performance criterion.The more a correct representation of the observed peak flow value is measured by the used objective criterion, the lower the value of k d and the higher the value of k rec .In general these results are reasonable and consistent with the model structure and the corresponding understanding of the hydrologic system.Lower values of k d increase the dynamics of the generated direct runoff; a higher value of k rec decreases the effective saturated conductivity of deeper soil layers.This leads to a faster generation of direct runoff.Within the Rietholzbach catchment, direct runoff on the surface has hardly ever been observed, not even during large and intensive convective storm events.However, there are many underground pipes that drain the slopes.The effect of these drainage system becomes significantly large during times of intensive precipitation.
The estimated parameters indicate that the drainage flow can be effectively represented by the direct runoff component in WaSiM.In terms of the spread of the parameters, there exists an in general strong relationship between the used performance criterion and the spread of the parameters.Typically, the variance of the estimated parameter distributions is smaller the more the used performance criterion quantifies the peak flow deviation.As explained above, it is necessary to force the direct runoff storage coefficient k d to extremely low values in order to represent the fast runoff components with very high dynamics.As a consequence, most of the remaining fast and intermediate runoff components have to be represented by the interflow component.Hence, the storage coefficient of the interflow k i is also forced to relatively small values, the value of the drainage "dr" forced to higher values (more interflow generation), and the soils adjusted more conductively (higher values of the scaling parameters β SL and β SiL ) in all soil layers (higher values of k rec ).Furthermore, the variance of these parameters becomes smaller as several flow components with different temporal dynamics have to represented by just one component of the interflow.
On the other hand, the estimates for the efficiency criterion NS show in general a larger spread.The NS criterion quantifies the agreement of the observed and simulated catchment behaviour in a global way with an overweighting of the time steps with higher streamflow.A calibration with this  in Fig. 11.The evaluation of the model performances corresponding to the estimated parameter vectors show a clear tradeoff between both used objectives.That means that the best parameter vectors with respect to the peak flow value cannot represent the global behaviour of the catchment for flood events equally well.The following final step, the sampling of deep parameter vectors was done using the GenDeep strategy.We compared the results of the estimated approximation of Pareto set Ã and the deep parameter vectors D Ã. Additionally we compared this set with the parameter vectors determined by the single-objective robust parameter estimation algorithm ROPE-PSO, using both the singular objectives rPD and NS, and an aggregated criterion, called FloodSkill (see Table 13) as objective functions.
The estimated Pareto set and the sampled deep parameter vectors in the parameter space are provided in Figure 13.The Pareto set is almost convex which is advantageous for the depth based sampling.The deep parameter vectors cover the central region of the set and its convex hull is indicative of the shape of the complete set.A comparison of the estimated sets of robust parameter vectors for both the MO-ROPE run as well as the single-objective ROPE-PSO is given in Table 4.2.It provides the distributions of the estimated robust parameter vectors and some basic statistical properties, i.e. the mean value, the standard deviation peak flow value is measured by the used objective criterion, the lower the value of k d and the higher the value of k rec .In general these results are reasonable and consistent with the model structure and the corresponding understanding of the hydrological system.Lower values of k d increase the dynamics of the generated direct runoff, a higher value of k rec decreases the effective saturated conductivity of deeper soil layers.This leads to a faster generation of direct runoff.Within the Rietholzbach catchment direct runoff on the surface has hardly ever been observed, even not during large and intensive convective storm events.However, there are many underground pipes that drain the slopes.The effect of these drainage system gets significantly large during times of intensive precipitation.The estimated parameters indicate that the drainage flow can be effectively represented by the direct runoff component in WaSiM.In terms of the spread of the parameters there exists an in general strong relationship between the used performance criterion and the spread of the parameters.Typically, the variance of the estimated parameter distributions is smaller the more the used performance criterion quantifies the peak flow deviation.As explained above, it is necessary to force the direct runoff storage coefficient k d to extremely low values in order to represent the fast runoff components with very high dynamics.As a consequence, most of the remaining fast and intermediate runoff criterion does not necessary force the model to its limits and can neglect a part of the fast runoff components in order to achieve a high NS value.The corresponding peak flow deviations are higher, but the flow components can be represented by more model components and as a consequence the variance of the estimated parameter distributions becomes larger.The parameter distributions estimated by the multi-objective MO-ROPE approach have mean values that are in between the mean values of the single objective calibrations using the rPD and the aggregated compromise criterion FloodSkill.In comparison with the single-objective calibration runs, the use of the multi-objective MO-ROPE approach obtains tighter variation intervals.These results confirm the outcome of previous studies dealing with the multi-objective calibration of conceptual hydrologic models focussing on flood events (e.g.Engeland et al., 2006;Moussa and Chahinian, 2009).
A comparison of the complete Pareto set Ã, its hull H Ã (thus the points with shallow depth) and the sampled deep parameter vectors D Ã in the objective space for both the calibration and the validation data is provided in Fig. 14a and  b.These figures illustrate the advantage of the depth-based sampling.In the objective space for the calibration data, the sampled deep parameter vectors are obviously concentrated in a central part of the Pareto front, whereas the parameter vectors in the hull are distributed over all parts of the Pareto front with high density in the area of the tails and low density in the central part of the front.The sampled deep parameter vectors show a better performance on the validation data than the complete Pareto set.The deep parameter vectors have less outliers with a worse performance and are a good approximation of the (theoretical) Pareto front in the objective space based on the validation data.The distribution of the deep parameter vectors suggests that the tails of the Pareto front estimated in the calibration are not required for a robust set of model parameter vectors.For example, the best parameter vectors with respect to rPD on the calibration data do not have a better rPD in the validation than the sampled deep parameter vectors.However, these vectors correspond to clearly worse NS values.This shows that the deep parameter vectors are better transferable to other periods and events and thus more robust.Notwithstanding the fact that several shallow parameter vectors also correspond to central parts of the Pareto front, we compared the deep parameter vectors with the results of an approach using subjective cutoff thresholds to select "behaviourial" and reject "non-behavioural" solutions as proposed by Efstratiadis and Koutsoyiannis (2010).We simply used cutoff thresholds defined by the boundaries of the model performances of the deep parameter vectors on the calibration data.The result is given in Fig. 14c  less outliers with a worse performance and are a good approximation of the (theoretical) Pareto front in the objective space based on the validation data.The distribution of the deep parameter vectors suggests that the tails of the Pareto front estimated in the calibration are not required for a robust set of model parameter vectors.For example the best parameter vectors with respect to rPD on the calibration data do not have a better rPD in the validation than the sampled deep parameter vectors.However these vectors correspond to clearly worse NS values.This shows that the deep parameter vectors are better transferable to other periods and events and thus more robust.Notwithstanding the fact that several shallow parameter vectors also correspond to central parts of the Pareto front, we compared the deep parameter vectors with the results of an approach using subjective cut-off thresholds to select "behaviourial" and reject "non-behavioural" solutions as proposed by Efstratiadis and Koutsoyiannis (2010).
We simply used cutoff-thresholds defined by the boundaries of the model performances of the deep parameter vectors on the calibration data.The result is given in Fig. 14 (c).It is evident that the behaviourial parameter vectors identified by the cut-off thresholds are better suitable to approximate the (theoretical) Pareto front in the objective space based on the validation data than the complete Pareto set identified for the calibration events.Nonetheless the deep parameter vectors provide a closer, denser and more uniform approximation of the Pareto front on the validation data.Moreover the depth based sampling does not require any assumptions regarding the threshold values5 and additionally provide the  of the FloodSkill criterion7 and its data depth with respect to the approximated Pareto set Ã.The parameter vectors with high data depth show a on average better model performance (higher FloodSkill values) with less variance than the parameter vectors with low data depth.This implies that parameter vectors with low data depth are more likely to be an outlier with bad model performance.The estimated results confirm the underlying assumption of the ROPE approach for multiobjective calibration tasks.
The model performances of the parameter vectors on the validation events estimated by both the multi-objective MO-ROPE and the single-objective robust parameter estimation runs are provided in Fig. 16.These results once again illustrate the evident tradeoff between the two criteria rPD and NS.The parameter vectors estimated by the single-objective ROPE-PSO considering just one criterion (rPD or NS) show indeed a good validation performance on those criteria.However, the validation performance in terms of the complementary performance criterion that is not considered for calibration is not sufficient for a robust modelling of flood events.For instance, the NS values for the ROPE-PSO estimates using the rPD as calibration objective are in the range of approximately 0.2-0.4,which is not acceptable, whereas the NS estimates correspond to peak flow deviations of 30-50 %, which is not sufficient either.Also the aggregated FloodSkill criterion just slightly improves the results.It still overweighs the NS criteria, and the parameter vectors estimated using the FloodSkill criterion do not fully exploit the model's abilities    in terms of the peak flow deviation.The MO-ROPE results however provide a good model performance on the validation data with respect to a good global representation of the catchment behaviour and a good modelling of the peak flow values.This confirms the assumption that the use of an advanced multi-objective calibration technique can improve the performance of hydrologic models used for the simulation of flood events.The corresponding hydrographs of three validation events and the corresponding parameter and model uncertainties for both the robust multi-objective estimates and the single-objective estimates, using the aggregated FloodSkill criterion as a kind of compromise solution, are given in Fig. 17.The complete model uncertainty was computed by two normal distributions fitted on the positive and negative discharge errors, transformed with the normal quantile transformation (NQT) (Krzysztofowicz, 1997) according to a method presented by Engeland et al. (2010).Contrary to Engeland et al. (2010), we considered observations with a discharge greater than or equal to 0.33 mm h −1 to account for the focus on the correct simulation of higher flow values.The hydrographs confirm the results of the previous analysis of the model performances.The peak flow values are better represented by the parameter vectors estimated by the multi-objective calibration.The confidence band of the parameter uncertainty is slightly smaller for the multiobjective calibration than for the single-objective calibration.The variation intervals of the resulting complete model uncertainty are approximately the same for both approaches.
The remaining uncertainty is due to the discussed shortcomings of the model structure and other neglected uncertainties in the observations.spatially distributed soil moisture measurements can help to improve the performance of hydrologic models used for flood forecasting (e.g.Oudin et al., 2003).The Rietholzbach catchment has been intensively monitored as a scientific research catchment since 1975.Lysimeter measurements deliver the necessary soil moisture information on hourly basis.Due to the small size of the basin and the location of the lysimeter station in the centre of the basin situated on grassland, which is the major land use in the catchment, the measurements are on average representative for the whole catchment for the long term (Gurtz et al., 2003).Unfortu-nately, there are no other soil moisture measurements that have a substantial length and no other significant studies for the catchment that examine the soil moisture dynamics for storm events in a more distributed way.Therefore, we studied the temporal dynamics of the soil moisture measurements at the lysimeter station and found a relatively strong correlation between the lysimeter measurements and the simulated soil moisture in the upper layers up to a soil depth of 0.5 m also during periods with high streamflow values, i.e. flood events (Müller, 2009).Hence, we chose the correlation between observed and simulated soil moisture as an objective.We call this performance criterion soil moisture correlation coefficient and denote it by r Θ .Its definition is given in the already referenced Table 13.The underlying idea considering the soil moisture correlation coefficient as an additional objective is that parameter vectors that do not only correspond to a good representation of catchment's behaviour in terms of the measured discharge but also the mean soil moisture dynamics are potentially better transferable to other flood events and can be used for extrapolation, i.e. they are potentially more robust.The r Θ has a relatively low correlation with both objectives that are already taken into account and should thus provide additional information for the parameter estimation.Nonetheless this is no guarantee that this will really improve the calibration results as "the possibility of operational use of the variational method depends heavily on the availability and quality of the datasets, especially regarding soil moisture data" (Oudin et al., 2003, p. 685).Doubts about the effectivity of the available soil moisture information are for instance reinforced by a further evaluation of the results estimated in the previous case study.Fig. 18 provides information about the dependance between r Θ and the data depth with respect to the estimated Pareto set.Obviously the deep parameter vectors in D Ã that are robust in terms of the objectives rPD and NS provide no better correlation between observed and simulated soil moisture.additional objective.We call this performance criterion soil moisture correlation coefficient and denote it by r .Its definition is given in the already referenced Table 13.The underlying idea considering the soil moisture correlation coefficient as an additional objective is that parameter vectors do not only correspond to a good representation of catchment's behaviour in terms of the measured discharge, but also the mean soil moisture dynamics are potentially better transferable to other flood events and can be used for extrapolation, i.e. they are potentially more robust.The r has a relatively low correlation with both objectives that are already taken into account and should thus provide additional information for the parameter estimation.Nonetheless, this is no guarantee that this will really improve the calibration results as "the possibility of operational use of the variational method depends heavily on the availability and quality of the datasets, especially regarding soil moisture data" (Oudin et al., 2003, p. 685).Doubts about the effectivity of the available soil moisture information are for instance reinforced by a further evaluation of the results estimated in the previous case study.Figure 18 provides information about the dependance between r and the data depth with respect to the estimated Pareto set.Obviously, the deep parameter vectors in D Ã that are robust in terms of the objectives rPD and NS provide no better correlation between observed and simulated soil moisture.We repeated the calibration of WaSiM using the MO-ROPE algorithm using the three introduced objectives rPD, NS and r .All settings of the model and calibration setup remain the same as in the previous case study.In two runs robust parameter vectors were estimated in terms of all three objectives and just the rPD and NS.From now on we refer to both calibration runs as 2-objective and 3-objective calibration.The distributions of the estimated parameter vec- to 0.29 for the 2-objective calibration.In terms of the NS values the results are comparable.The 3-objective estimates correspond to NS values in the range of 0.51-0.67instead of 0.47-0.67 for the 2-objective calibration.In brief, the soil moisture measurements available in the Rietholzbach catchment are no suitable additional criterion for the calibration of the hydrologic model WaSiM focussing on flood events.The negligibly small improvement in terms of the representation of the soil moisture dynamics and the global NS criterion bring a slight drop of the model performances focussing on    tors for both runs are provided in Table 4.2.Obviously, the results for the 2-objective calibration are the same as those of the previous case study.For the calibration run using all three objectives the distribution of the most sensitive parameter k d tends to result in smaller values than for the calibration run considering just the objectives rPD and NS.The distribution of the parameter k i has approximately the same mean and variance but a significantly larger negative skewness.The mean value of the parameter "dr" is approximately the same with a significantly lower variance.The same holds for the soil parameters β SL and β SiL .An obvious difference can be observed in the distribution of the parameter k rec .The mean value is significantly lower with a higher variance.However, the correlation between the parameter k rec and the other model parameters is fundamentally different for the 3objective calibration run.

Calibrating
The corresponding model performances on the calibration and validation data for both the 2-objective calibration and the 3-objective calibration considering the soil moisture dynamics are provided in Fig. 19.The scatter plots show the rPD and the NS for the approximated Pareto optimal set Ã and the subsequently sampled deep parameter vectors D Ã.
Evaluating the Pareto front of the 3-objective problem, consider that the Pareto front for this problem is actually a curved 3-dimensional face.Hence, the 2-dimensional plot of this 3dimensional front is not a non-dominated front with respect to just 2 objectives.The calibration results for the 3-objective calibration are the same in terms of the best values for the individual objectives.However, the variation intervals for the two objectives rPD and NS are clearly larger.The NS reaches values down to 0.5 instead of 0.63 and the rPD goes up to values of 0.28 instead of 0.22.The differences between the two calibration runs in terms of the criterion r are just 0.015.Referring to the validation results, the improvements of the 3-objective calibration in terms of the correlation between the observed and simulated soil moisture are again negligibly small.An increase of the correlation coefficient of just one hundredths is no substantial improvement.However, the validation performance in terms of the criteria assessing a good representation of the observed runoff is slightly worse, particularly for the rPD.For the 3-objective calibration, the rPD values on the validation events are approximately in the range of 0.24 up to 0.34, whereas they reach from 0.19 up to 0.29 for the 2-objective calibration.In terms of the NS values, the results are comparable.The 3-objective estimates correspond to NS values in the range of 0.51-0.67instead of 0.47-0.67 for the 2-objective calibration.In brief, the soil moisture measurements available in the Rietholzbach catchment are no suitable additional criterion for the calibration of the hydrologic model WaSiM focussing on flood events.The negligibly small improvement in terms of the representation of the soil moisture dynamics and the global NS criterion brings a slight drop of the model performances focussing on an exact representation of the streamflow values.One possible explanation for the disappointingly small improvement might be the shortcomings in the model structure, already discussed in the previous case study.The soils of the Rietholzbach catchment are characterised by many macropores and some smaller drainages.This induces the generation of preferential flow which is a very fast runoff component.However, the water movement in the unsaturated zone is described by the model in terms of the Richards equation that accounts just for the matrix flow in the soil but cannot represent the preferential flow component natively.Thus, the fast runoff components have to be represented by the direct runoff component.The identified parameter vectors provide a good representation of the catchment's behaviour in terms of the discharge at the outlet.In such a situation an additional criterion that requires a reasonable representation of the observed soil moisture dynamics can even be counterproductive.Another possible explanation for the expected improvements that have not been achieved is the question of whether the soil moisture measurements are representative for the soil moisture dynamics in the whole catchment.Recent studies using soil moisture data for the improvement of hydrologic modelling emphasise the importance of a good quality of these datasets (e.g Oudin et al., 2003;Norbiato et al., 2008).Focussing on flood events, this requires not only a good accuracy and temporal resolution of the soil moisture information but also many spatially distributed measurements.Unfortunately, these measurements are not available in the Rietholzbach catchment.We suggest further studies in operational flood forecasting studies for fast responding medium-scale catchments where such information is available.Although this case study did not fully succeed, it confirms the advantages of the depth-based sampling.The sampled deep parameters provide a robust approximation of the Pareto front with tighter variation intervals in terms of all considered objectives.Furthermore, it underlines that a successful robust modelling does not require just an advanced parameter estimation procedure but also the selection of a as parsimonious as possible model structure, representative calibration data and appropriate calibration objectives.The combination of multi-objective optimisation and depth-based parameter sampling can be a good tool to obtain robust model parameters.On its own, the depth-based sampling is however not sufficient to achieve robustness.algorithms.In a second step, we apply the concept of data depth for the sampling of robust model parameter vectors with respect of the identified Pareto optimal set.We study the efficiency and effectiveness of the developed solution by means of a set of synthetical benchmarks and the calibration of a process-oriented hydrologic model focussing on flood events.
-A set of synthetical benchmarks studies the efficiency and effectiveness of the AMALGAM framework extended by the newly developed MO-PSO-GA algorithm in order to estimate the Pareto optimal set for a given (constrained) multi-objective optimisation problem.For the given test problems, the developed framework proved its reliability and efficiency in comparison with other established approaches.The new MO-PSO-GA strategy is a useful search strategy that is able to outperform approved single-strategy multi-objective optimisation algorithms.As a consequence within the frame of the AMALGAM framework, it improves the effectivity and efficiency of this approach using four approved search strategies.
-Another case study shows the effectivity of the depthbased sampling approach for several multi-objective test problems that are subject to uncertainty.The sampled deep parameter vectors are less sensitive to small changes with respect to their ability to provide a robust approximation of the set of the possible Pareto fronts in the objective space considering uncertainties.
-In a real world case study, we compared the multiobjective MO-ROPE approach with the single-objective robust parameter estimation approach ROPE-PSO estimating three conceptual model parameters and three soil parameters of the hydrologic model WaSiM in the small Swiss research catchment Rietholzbach focussing on flood events.Previous studies have already shown that the model has problems to represent the global catchment behaviour and the peak flow values equally well.This applies in particular to small and fast responding catchments.The tradeoff between the representation of the peak flow values and the global catchment behaviour suggests the application of a multiobjective calibration strategy.The results of this study show that the application of MO-ROPE is a preferable option in order to identify robust solutions for such calibration problems.We showed that the parameter vectors in the approximated Pareto optimal set on the calibration data can lead to very different results in validation.
The sampled parameters with high depth show a robust performance with less negative outliers.They represent a robust approximation of the (theoretical) Pareto set on the validation data.
-In the scope of a second application, we studied the calibration of WaSiM considering three objectives.Besides the peak flow deviation and the Nash-Sutcliffe efficiency criterion, we used the agreement of the observed and simulated soil moisture dynamics for the estimation of robust parameter vectors for the modelling of flood events.Although the deep parameter vectors show once again the advantages of the depth-based sampling, the improvements in comparison with the calibration using just two objectives are negligibly small.This is due to shortcomings in the model structure and a limited significance of the soil moisture measurements at only one single spot.We strongly propose similar applications at larger scale with a sufficient set of spatially distributed measurements.This case study underlines that the depth-based parameter sampling can be a very useful technique for the identification of robust parameter vectors for multi-objective calibration problems.
-This paper introduced a new multi-objective method for the estimation of robust parameter vectors the can improve the results of classical multi-objective optimisation techniques.Despite all of the presented benefits, it must not be forgotten that the estimation of robust parameter vectors cannot be reduced to the development of robust parameter estimation methods.The shortcomings of all kind of calibration techniques in the last case study clearly show necessity that more effort should also go into finding better process descriptions resulting in hydrologic models that are able to represent catchments in a better way.Furthermore, better methods have to be developed to improve the identification of representative calibration data, i.e. the basis of a subsequent model calibration.The achievement of robustness requires the combination of improvements in all of these areas.
In this paper, the estimation of the Pareto optimal sets was done using an extended version of AMALGAM approach.Consider that one might substitute this component with any suitable multi-objective optimisation technique.The application of the technique of depth-based sampling is a relatively new method which was applied to a limited number of case studies.We strongly propose its application to further models, catchments and also other fields of study where measurement errors with unknown distribution and model structures that cannot be easily identified are present.

θFig. 2 .
Fig. 2. Illustration of the halfspace depth in the two-dimensiona space.The point θ has depth 2 with respect to the given point set.

Fig. 2 .
Fig. 2. Illustration of the halfspace depth in the two-dimensional space.The point θ has depth 2 with respect to the given point set.

Fig. 3 .
Fig. 3. True Pareto front and the non-dominated front estimated by MO-PSO-GA using Coello's Function.

Fig. 3 .
Fig. 3. True Pareto front and the non-dominated front estimated by MO-PSO-GA using Coello's function.

Fig. 4 .Fig. 5 .
Fig. 4. Scatterplots for the rank 1 points (blue dots) for 50 evaluation generations estimated by MO-PSO-GA with a population size of (a) 10, (b) 20, (c) 50, and (d) 100 respectively.The boundaries of the triangle enclosing the true front are the blue dotdashed blue lines.

Fig. 4 .NoFig. 4 .Fig. 5 .
Fig. 4. Scatter plots for the rank 1 points (blue dots) for 50 evaluation generations estimated by MO-PSO-GA with a population size of (a) 10, (b) 20, (c) 50, and (d) 100 respectively.The boundaries of the triangle enclosing the true front are the blue dot-dashed blue lines.

Fig. 5 .
Fig. 5. Scatter plots for the rank 1 points (blue dots) for 50 evaluation generations estimated by AMALGAM with a population size of (a) 10, (b) 20, (c) 50, and (d) 100 respectively.The boundaries of the triangle enclosing the true front are the blue dot-dashed blue lines.

Fig. 6 .
Fig. 6.Scatter plot of the approximated Pareto set for the problem VRUGT 2 in the 2-dimensional parameter space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 6 .
Fig. 6.Scatter plot of the approximated Pareto set for the problem VRUGT 2 in the 2-dimensional parameter space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 7 .
Fig. 7. Scatter plot of the approximated Pareto set for the problem VRUGT2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 8 .
Fig. 8. Contour plot indicating the distribution of the actual Pareto fronts for the problem VRUGT u 2 .w uncertainty.catter plot of the approximated Pareto set for the problem VRUGT2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 9 .
Fig. 9. Distribution of the diversity metric ∆ for the approximated Pareto set for the problem VRUGT2, its the shallow solutions (hull) are indicated by red and blue dots.

Fig. 7 .
Fig. 7. Scatter plot of the approximated Pareto set for the problem VRUGT 2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots. 15

Fig. 7 .
Fig. 7. Scatter plot of the approximated Pareto set for the problem VRUGT2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 8 .
Fig. 8. Contour plot indicating the distribution of the actual Pareto fronts for the problem VRUGT u 2 .w uncertainty.catter plot of the approximated Pareto set for the problem VRUGT2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 9 .
Fig. 9. Distribution of the diversity metric ∆ for the approximated Pareto set for the problem VRUGT2, its the shallow solutions (hull) are indicated by red and blue dots.

Fig. 8 .
Fig. 8. Scatter plot indicating the distribution of the actual Pareto fronts for the problem VRUGT u 2 considering uncertainty.Scatter plot of the approximated Pareto set for the problem VRUGT 2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 7 .
Fig. 7. Scatter plot of the approximated Pareto set for the problem VRUGT2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 8 .Fig. 9 .
Fig. 8. Contour plot indicating the distribution of the actual Pareto fronts for the problem VRUGT u 2 .w uncertainty.catter plot of the approximated Pareto set for the problem VRUGT2 in the 3-dimensional objective space; the deep and the shallow solutions (hull) are indicated by red and blue dots.

Fig. 9 .
Fig. 9. Distribution of the diversity metric for the approximated Pareto set for the problem VRUGT 2 ; its shallow solutions (hull) are indicated by red and blue bars.

Fig. 10 .
Fig. 10.Scheme of the WaSiM soil module with location of impact of soil hydraulic and conceptual model parameters (bold)

Fig. 10 .
Fig. 10.Scheme of the WaSiM soil module with location of impact of soil hydraulic and conceptual model parameters (bold).

Fig. 11 .Fig. 12 .
Fig. 11.Evolution of the trade-off surface of the WaSiM parameter vectors in the two-dimensional objective space.The grey dots represent the whole set of parameter vectors that were evaluated during the the identified approximation of the Pareto optimal set Ã, whereas the gray crosses represent other dominated parameter vectors that were evaluated by the algorithm.The best single-criteria solutions (NS and rPD respectively) are indicated by black dots.

Fig. 11 .Fig. 12 .
Fig. 11.Evolution of the trade-off surface of the WaSiM parameter vectors in the two-dimensional objective space.The grey dots represent the whole set of parameter vectors that were evaluated during the the identified approximation of the Pareto optimal set Ã, whereas the gray crosses represent other dominated parameter vectors that were evaluated by the algorithm.The best single-criteria solutions (NS and rPD respectively) are indicated by black dots.

Fig. 12 .
Fig. 12. Illustration of the development of the hypervolume of the estimated non-dominated front as a function of the generation number; the hypervolume was calculated with respect to the point [1, 0] which corresponds to 100 % peak flow deviation and a Nash value of zero.
. The Pareto set is almost convex which is advantageous for the depth-based sampling.The deep parameter vectors cover the central region of the set, and its convex hull is indicative Hydrol.Earth Syst.Sci., 16, 3579-3606, 2012 www.hydrol-earth-syst-sci.net/16/3579/2012/ ) and (b).These figures illustrate the advantage of the depth based sampling.In the objective space for the calibration data the sampled deep parameter vectors are obviously concentrated in a central part of the Pareto front, whereas the parameter vectors in the hull are distributed over all parts of the Pareto front with high density in the area of the tails and low density in the central part of the front.The sampled deep parameter vectors show a better performance on the validation data than the complete Pareto set.The deep parameter vectors have (a) and (b).These figures illustrate the advantage of the depth based sampling.In the objective space for the calibration data the sampled deep parameter vectors are obviously concentrated in a central part of the Pareto front, whereas the parameter vectors in the hull are distributed over all parts of the Pareto front with high density in the area of the tails and low density in the central part of the front.The sampled deep parameter vectors show a better performance on the validation data than the complete Pareto set.The deep parameter vectors have (a) and (b).These figures illustrate the advantage of the depth based sampling.In the objective space for the calibration data the sampled deep parameter vectors are obviously concentrated in a central part of the Pareto front, whereas the parameter vectors in the hull are distributed over all parts of the Pareto front with high density in the area of the tails and low density in the central part of the front.The sampled deep parameter vectors show a better performance on the validation data than the complete Pareto set.The deep parameter vectors have (a) and (b).These figures illustrate the advantage of the depth based sampling.In the objective space for the calibration data the sampled deep parameter vectors are obviously concentrated in a central part of the Pareto front, whereas the parameter vectors in the hull are distributed over all parts of the Pareto front with high density in the area of the tails and low density in the central part of the front.The sampled deep parameter vectors show a better performance on the validation data than the complete Pareto set.The deep parameter vectors have

Fig. 13 .
Fig. 13.Distribution of the Pareto optimal set estimated by MO-ROPE (applying the AMALGAM * strategy) and the sampled deep parameter vectors for the WaSiM model using two objectives (rPD and NS).

Fig. 13 .
Fig. 13.Distribution of the Pareto optimal set estimated by MO-ROPE (applying the AMALGAM * strategy) and the sampled deep parameter vectors for the WaSiM model using two objectives (rPD and NS).

Fig. 14 .
Fig. 14.Comparison of the identified approximation of the Pareto optimal set Ã (gray dots) in the objective space spanned by the two objectives rPD and NS.The plots in the left column are computed on the calibration events whereas the right column shows the results on the validation data.Subplot (a) additionally emphasises the parameter vectors in the hull A Ã (indicated by blue dots), subplot (b) shows the sampled deep parameter vectors (indicated by red dots) and subplot (c) provides the results for a simple cut-off procedure with threshold values defined by the performance of the deep parameter vectors on the calibration data.The selected parameter vectors are indicated by yellow dots.

Fig. 14 .Fig. 15 .
Fig. 14.Comparison of the identified approximation of the Pareto optimal set Ã (grey dots) in the objective space spanned by the two objectives rPD and NS.The plots in the left column are computed on the calibration events, whereas the right column shows the results on the validation data.Subplot (a) additionally emphasises the parameter vectors in the hull Ã (indicated by blue dots); subplot (b) shows the sampled deep parameter vectors (indicated by red dots), and subplot (c) provides the results for a simple cut-off procedure with threshold values defined by the performance of the deep parameter vectors on the calibration data.The selected parameter vectors are indicated by green dots.

Fig. 15 .
Fig. 15.Correlation between data depth and overall validation performance in terms of the FloodSkill criterion of all parameter vectors in the approximated Pareto set Ã estimated by the MO-ROPE calibration using the objectives rPD and NS.The blue dots indicate the hull H Ã, i.e. the parameter vectors with shallow data depth; the red dots indicate the ones with high data depth D Ã, and the grey ones are members of the complete Pareto set that are have a depth less than the deep ones and are not members of the hull.

Fig. 16 .Fig. 16 .
Fig. 16.Comparison of the robust model parameter vectors estimated by the multi-objective MO-ROPE using two objectives (rPD/NS) and the single-objective robust parameter estimation algorithm ROPE-PSO using the objectives rPD, FloodSkill and NS, respectively.The plot shows the estimated solution in the objective space based on the validation events.

Fig. 17 .
Fig. 17.Hydrograph prediction uncertainty associated with the uncertainty in the model (lighter shading) and parameter estimates (darker shading) for the flood events 6 (a), 8 (b) and 13 (c), estimated by MO-ROPE (left column) and ROPE-PSO using the FloodSkill criterion (right column).The dots correspond to the observed streamflow data.The shaded areas of uncertainty correspond to the 95 % confidence intervals.

Fig. 18 .
Fig. 18.Correlation between data depth and overall validation performance in terms of the soil moisture correlation r of the Pareto set Ã estimated during the 2-objective calibration.The blue dots indicate the hull H Ã, i.e. the parameter vectors with shallow data depth and the red dots indicate the sampled parameter vectors with high data depth D Ã.

Fig. 19 .
Fig. 19.Comparison of the Pareto optimal sets Ã and the subsequently sampled set of deep parameter vectors D Ã for the 2objective (left) and 3-objective (right) calibration run in the objecspace based on the calibration events (a) and validation events (b).Additionally each plot contains a boxplot providing information about the distribution of the corresponding soil moisture correlation for the sampled deep parameter vectors each (below).

Fig. 19 .
Fig. 19.Comparison of the Pareto optimal sets Ã and the subsequently sampled set of deep parameter vectors D Ã for the 2objective (left) and 3-objective (right) calibration run in the objective space based on the calibration events (a) and validation events (b).Additionally each plot contains a boxplot providing information about the distribution of the corresponding soil moisture correlation for the sampled deep parameter vectors each (below).

Fig. 19 .
Fig. 19.Comparison of the Pareto optimal sets Ã and the subsequently sampled set of deep parameter vectors D Ã for the 2objective (left) and 3-objective (right) calibration run in the objective space based on the calibration events (a) and validation events (b).Additionally, each plot contains a boxplot providing information about the distribution of the corresponding soil moisture correlation for the sampled deep parameter vectors each (below).

5
Discussion and conclusions -This paper presents a hybrid parameter estimation approach, entitled multi-objective robust parameter estimation (MO-ROPE), that merges the strength of evolutionary multi-objective optimisation algorithms and depth-based parameter sampling.In a first step, the algorithm employs a suitable multi-objective optimisation algorithm in order to approximate the Pareto optimal set of model parameter vectors for the given calibration problem.Within our framework, we apply an extended version of the advanced multi-method framework AMALGAM.We extended the original version of AMALGAM with a hybrid evolutionary optimisation algorithm (MO-PSO-GA) that employs the concepts of particle swarm optimisation and genetic programming in order to effectively estimate the Pareto optimal set.An adapted version of this search strategy has proven successful within the frame of single-objective ROPE www.hydrol-earth-syst-sci.net/16

Table 1 .
Default values of the parameters controlling the MO-PSO-GA algorithm.
A darker point represents higher depth.The lines indicate convex hulls enclosing the 25%, 50%, 75% and 100% deepest points.Th used depth function is halfspace depth.

Table 2 .
Description of two simple multi-objective test problems used in this study

Table 4 .
Description of complex multi-objective benchmark problems used in this study.

Table 2 .
Description of two simple mult-iobjective test problems used in this study

Table 5 .
Effectivity of the four presented algorithms in terms of the performance metrics Y and for the given more complex test benchmarks.The statistics represent mean values over 30 optimisation runs.

Table 6 .
Efficiency of the four presented algorithms in terms of the number of model runs that are required to result in a relative hypervolume smaller than 0.005.The statistics represent mean values over 30 optimisation runs.
* Number of runs that have failed to converge are shown in parentheses.* * None of the 30 optimisation runs have converged after 150 generations.

Table 7 .
Effectivity of the estimated Pareto optimal set ( Ã) and the corresponding deep (D Ã) and shallow (H Ã)

Table 8 .
Overview of the used model parameters considered for the synthetic hydrologic model calibration.

Table 9 .
Overview of the used combination of calibration and validation events.

Table 10 .
Validation performance of the solutions estimated by AMALGAM and MO-ROPE in terms of the Nash-Sutcliffe efficiency (NS).

Table 11 .
Zappa (2002) (1999)nce of the solutions estimated by AMALGAM and MO-ROPE in terms of the relative peak deviation (rPD).Q d is then routed via a flow-time grid and finally projected cell-wise to the catchment outlet by means of a simple bucket-type function.The recession coefficient of this function is the model parameter k d .The soil water movement through the different soil layers is modelled by means of the discrete form of the Richards equation.The WaSiM model was calibrated for the Rietholzbach catchment, a small pre-alpine catchment located in the north-east of Switzerland.A significant number of hydrologic studies have been conducted in this basin.It has been observed as a research catchment by the ETH Zurich since 1975.The outlet drains a 3.18 km 2 hilly pre-alpine watershed with an average annual precipitation of 1600 mm, generating a mean runoff of 1046 mm per year.For further information, refer toGurtz et al. (1999);Zappa (2002)and the website http://www.iac.ethz.ch/research/rietholzbach. In this case study, WaSiM will be calibrated for the simulation of extreme discharges.Out of a time series of 27 yr (

Table 12 .
Wösten et al. (1999) andBrakensiek et al. (1984) for calibration; the reference parameter vector θ wb was estimated in order to use WaSiM for water-balance simulations in the Rietholzbach catchment; the parameterisation of the soil hydraulic parameters is done for each soil according to the pedotransfer functions provided inWösten et al. (1999) andBrakensiek et al. (1984).

Table 12 .
Wösten et al. (1999) andBrakensiek et al. (1984) for calibration; the reference parameter vector θ wb was estimated in use WaSiM for water-balance simulations in the Rietholzbach catchment; the parameterisation of the soil hydraulic parameters is each soil according to the pedotransfer functions provided inWösten et al. (1999) andBrakensiek et al. (1984)

Table 14 .
Distribution of the robust parameter vectors of the WaSiM model estimated by MO-ROPE and the three different single-objective ROPE-PSO runs.

Table 14 .
Distribution of the robust parameter vectors of the WaSiM model estimated by MO-ROPE and the three different single-objective ROPE-PSO runs.

Table 14 .
Distribution of the robust parameter vectors of the WaSiM model estimated by MO-ROPE and the three different single-objective ROPE-PSO runs.

Table 14 .
Distribution of the robust parameter vectors of the WaSiM model estimated by MO-ROPE and the three different single-objective ROPE-PSO runs.

Table 14 .
Distribution of the robust parameter vectors of the WaSiM model estimated by MO-ROPE and the three different single-objective ROPE-PSO runs.

Table 15 .
Distribution of the robust parameter vectors of the WaSiM model estimated by MO-ROPE using the 2-objective (a) and 3-objective calibration (b).

3606, 2012 certainty
is approximately the same for both approaches.The remaining uncertainty is due to the discussed shortcomings of the model structure and other neglected uncertainties in the observations.Correlation between data depth and overall validation performance in terms of the soil moisture correlation rΘ of the Pareto set Ã estimated during the 2-objective calibration.The blue dots indicate the hull H Ã, i.e. the parameter vectors with shallow data depth and the red dots indicate the sampled parameter vectors ones with high data depth D Ã.
www.hydrol-earth-syst-sci.net/16/3579/2012/ Hydrol.Earth Syst.Sci., 16, 3579- means that the effective saturated conductivity for deeper soil layers decreases faster.This means that the lower soil layers in the model have a lower saturated conductivity.The correlations between the most important conceptual model parameters are approximately the same for both calibration runs.