Metamodeling for Uncertainty Quantification of a Flood Wave Model for Concrete Dam Breaks

Uncertainties in instantaneous dam-break floods are difficult to assess with standard methods (e.g., Monte Carlo simulation) because of the lack of historical observations and high computational costs of the numerical models. In this study, polynomial chaos expansion (PCE) was applied to a dam-break flood model reflecting the population of large concrete dams in Switzerland. The flood model was approximated with a metamodel and uncertainty in the inputs was propagated to the flow quantities downstream of the dam. The study demonstrates that the application of metamodeling for uncertainty quantification in dam-break studies allows for reduced computational costs compared to standard methods. Finally, Sobol’ sensitivity indices indicate that reservoir volume, length of the valley, and surface roughness contributed most to the variability of the outputs. The proposed methodology, when applied to similar studies in flood risk assessment, allows for more generalized risk quantification than conventional approaches.


Introduction
Safety standards applied to large dams must be maintained at a high level to provide sufficient protection for the downstream population. The safety standards are normally based on three pillars: thorough design, systematic monitoring, and an emergency concept [1]. A dam is normally designed by evaluating its performance in all possible operational conditions (including maximum loads), whereas monitoring aims to detect structural defects or external threats to dam safety so that appropriate safety measures can be taken. Finally, the emergency concept aims to minimize the potential consequences of a dam failure, and in some countries (e.g., Switzerland) it is included in measures of civil protection [2,3]. To help the decision-making process within the emergency concept, risk is commonly assessed by using a computational model of a potential dam break. Three essential steps of such an assessment are: (i) the simulation of the outflow hydrograph resulting from the dam break, (ii) the propagation of this hydrograph in the downstream topography, and (iii) the estimation of loss of life (LL). This study moves toward the estimation of LL due to a dam break by focusing on the assessment of the flow quantities reached downstream of the dam in case of its failure, addressed through the first two steps above.
In this study, a risk assessment covering the first two steps mentioned above was performed for hydropower dams in Switzerland, a country with one of the highest densities of large dams. Polynomial chaos expansion (PCE) belongs to the class of spectral expansions (similar to, e.g., the Fourier series expansion), a characteristic that allows it to directly provide variance-decomposition-based sensitivity indices (Sobol' indices) [24]. The latter is the main motivation for the choice of PCE as a surrogate model in this contribution (for a comparison with the second most-adopted choice, kriging, see [25]). Based on these premises, the novel contribution of this study is the quantitative assessment of uncertainties in the flow quantities of a hydrodynamic dam-break flood model by using metamodeling techniques. This overcomes the limitations of observation-based methods and of methods with high computational cost. For this, first a computational model of the release of water from the reservoir and the resulting flood propagation was set up. Next, the output of this computational model was approximated by a PCE metamodel. The PCE allowed a direct propagation of the input uncertainties, both aleatory and epistemic, through the metamodel and the calculation of uncertainty in the metamodel output. Finally, global sensitivity analysis was performed on the metamodel and the importance of each model input for critical outputs was evaluated [26].
Furthermore, the metamodel output aims to reflect the risk level for hydropower in Switzerland, i.e., the metamodel needs to be generic for the population of large hydropower dams in Switzerland. For this purpose, all uncertain model inputs were parameterized considering the population of large concrete arch dams, which can be considered representative for hydropower dams in Switzerland. A generic metamodel can be favorable for application in risk analysis and within an energy policy perspective because it allows for a quick estimation of flow quantities without running intensive numerical computations or using laboratory scale models [27].
The structure of this study is composed of the following parts. The methodology for uncertainty quantification and sensitivity analysis is given in Section 2, along with the main assumptions, simplifications, and calculation steps. Afterwards, the computational model and its parametrized inputs are given in Sections 3.1 and 3.2. Finally, results for uncertainties of calculated flow quantities, as well as sensitivity of the model output, are given in Sections 3.3 and 3.4 and discussed in Section 4.

Uncertainty Quantification Framework
In this study, the framework for uncertainty quantification in engineering by Sudret [28] and De Rocquigny et al. [29] was applied. The procedure is represented schematically in Figure 1: Step A, computational model; Step B, model input uncertainty description; Step C, uncertainty propagation; Step D, sensitivity analysis of the model output.
In step A, a computational model is set up to solve the physical problem, i.e., the flood wave propagation. All inputs required to run the model, as well as all output quantities of interest (QoIs), are defined. In step B, sources of uncertainty are identified. Available information is gathered or, alternatively, expert judgement is applied to define marginal probability distributions for each uncertain input. In addition to the marginal distributions, dependence among probabilistic inputs must be described. In step C, the input uncertainty is propagated through the computational model. As a result of this propagation, statistics of the model QoIs, such as their means and variances, are usually given. Finally, a sensitivity analysis completes the framework (step D) by identifying the inputs that contribute most to the output variability.

Step A. Computational Model
Prior to the metamodeling for uncertainty quantification, the appropriate computational model needs to be set up. This model should be able to simulate the detailed dynamics of a flood wave resulting from a dam break using up-to-date methods available for hydrodynamic models.
In this study, a complete and instantaneous dam break was assumed. Thus, with the dam failure occurring in a sufficiently short time, the dam-break flood was treated as a generalized dam-break problem, also known as a Riemann problem [30]. In this regard, a common approach for modeling such flood waves is the non-linear shallow water equations (SWEs) [31].
The dam-break flood wave was approximated by a one-dimensional (1D) numerical model based on the SWEs, where the flood wave propagation occurs only in the main flow direction [32,33]. This is a reasonable simplification because the flow in narrow valleys is markedly 1D [9,34]. Furthermore, the main model simplifications and assumptions are: hydrostatic pressure distribution; uniform flow velocity over the channel cross section; a cosine of the channel slope of approximately unity; and head losses in unsteady flow modeled using a steady-state resistance law. Finally, the channel geometry is fixed and not subject to erosion or deposition.
The governing equations are the continuity equation (Equation (1)) and the momentum equation (Equation (2)) [35]:

∂A/∂t + ∂Q/∂x_1 = 0,  (1)

∂Q/∂t + ∂(Q^2/A)/∂x_1 + g A ∂z_S/∂x_1 + g A S_f = 0,  (2)

where A is the wetted cross section; Q is the discharge; z_S is the water surface elevation; g is the gravitational acceleration; t is the time; x_1 is the coordinate in the main flow direction; and S_f is the friction slope. For the simulation, the software BASEMENT [36] was used, where the governing equations are discretized for cross sections using a first-order cell-centered finite volume method, and time integration is performed by an explicit first-order Euler scheme. Hydrodynamic fluxes are computed employing a Roe-type approximate Riemann solver [30], and the friction source term is solved using a robust splitting technique [37].
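The idealized dam-break (Riemann) problem described above can be illustrated with a minimal numerical sketch. This is not the scheme used in BASEMENT (which employs a Roe-type solver over general cross sections with friction): it is a first-order finite-volume discretization of the 1D SWEs for a rectangular, frictionless channel with an HLL approximate Riemann flux, and all depths and domain dimensions are illustrative.

```python
import numpy as np

def hll_flux(hL, uL, hR, uR, g=9.81):
    """HLL approximate Riemann flux for the 1D shallow water equations."""
    FL = np.array([hL * uL, hL * uL**2 + 0.5 * g * hL**2])
    FR = np.array([hR * uR, hR * uR**2 + 0.5 * g * hR**2])
    cL, cR = np.sqrt(g * hL), np.sqrt(g * hR)
    sL = min(uL - cL, uR - cR)          # fastest left-going wave speed
    sR = max(uL + cL, uR + cR)          # fastest right-going wave speed
    if sL >= 0.0:
        return FL
    if sR <= 0.0:
        return FR
    UL = np.array([hL, hL * uL]); UR = np.array([hR, hR * uR])
    return (sR * FL - sL * FR + sL * sR * (UR - UL)) / (sR - sL)

def dam_break(h_res=10.0, h_tail=0.1, L=2000.0, n=400, t_end=20.0, g=9.81):
    """Instantaneous dam break at mid-domain on a wet, frictionless bed."""
    dx = L / n
    x = (np.arange(n) + 0.5) * dx - L / 2       # cell centers; dam at x = 0
    h = np.where(x < 0, h_res, h_tail)           # reservoir depth vs. tailwater
    q = np.zeros(n)                               # unit discharge h*u
    t = 0.0
    while t < t_end - 1e-9:
        u = q / h
        dt = min(0.9 * dx / np.max(np.abs(u) + np.sqrt(g * h)), t_end - t)
        F = np.zeros((n + 1, 2))
        for i in range(1, n):
            F[i] = hll_flux(h[i - 1], u[i - 1], h[i], u[i])
        F[0] = F[1]; F[n] = F[n - 1]             # zero-gradient boundaries
        h = h - dt / dx * (F[1:, 0] - F[:-1, 0])
        q = q - dt / dx * (F[1:, 1] - F[:-1, 1])
        t += dt
    return x, h, q

x, h, q = dam_break()   # depth plateau forms between reservoir and tailwater
```

The first-order explicit update mirrors the cell-centered finite-volume and Euler time-stepping structure mentioned in the text, just with a simpler flux function.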

Identification of The Sources of Uncertainties
The sources of uncertainty in dam-break modeling are numerous and can be attributed to input parameters, model simplifications, and lack of knowledge [38]. These uncertainties can be categorized into two types, namely aleatory and epistemic [39].
The model inputs that are not explicitly known, due to a lack of knowledge, contribute to the epistemic uncertainty [40]. Among them is the reservoir volume (i.e., the water volume retained at the time of the break), which is not an exact parameter but can be measured rather accurately for large hydropower reservoirs. Furthermore, epistemic uncertainty exists in the parameters that were set using certain approximations (e.g., the same slope over the length of the valley, the same roughness coefficient for the valley-bed land cover) [41]. Another example of epistemic uncertainty is the surface roughness, for which data is missing for some areas of the computational domain. Finally, epistemic uncertainty due to implicit parameters, such as the failure mode (e.g., a complete instantaneous failure) or the water and flow properties, was either restricted or disregarded (e.g., the varying density of the water-sediment mixture from mobilized aggradations in the reservoir) [42].
On the other hand, the input parameters of the model contain aleatory uncertainty, which originates from the natural variability within one topography (e.g., surface roughness), but also from the natural variability within the considered population of dams and valleys. For example, the height and crest length of a concrete dam are normally known inputs; however, in a generic model as proposed in this study, these parameters contain variability of all physically admissible values taken into account in the considered dam population.

Modeling the Sources of Uncertainties
All model inputs contributing to the sources of aleatory or epistemic uncertainty were included in the model as marginal probability distributions and not as deterministic values. This form of representation allows a quantitative description of the uncertainty of each parameter. Furthermore, distributions of these parameters were defined for all existing values including outliers; thus, it allowed the metamodel to be generic for the defined population of dams rather than a model for a specific dam.
The first step in building the marginal distributions was to gather information for all model input parameters (see Section 3.1.1). Based on the availability of the collected data, different methods for distribution fitting can be applied. In particular, when operating with almost no data, the maximum entropy principle can be applied to select the distribution family [43]. In some fields of science and engineering there are joint databases and codes providing probability distributions for common parameters (e.g., the Joint Committee on Structural Safety [44]). Thus, these distributions can be used as a priori generated distributions in studies lacking data for the particular parameters.
In contrast, when plentiful data is available, the maximum likelihood method combined with information criteria can be used. The former is based on the calculation of the maximum likelihood estimator (MLE). The MLE of a parameter θ is the value that maximizes the likelihood function, i.e., the probability of observing the sample set X that was indeed observed: θ_ML = arg max_θ L(θ; X) [45].
Additionally, several probability distributions can be tested for their fit to the data by calculating information criteria (IC) (e.g., the Bayesian or Akaike information criterion [46,47]). The distribution with the smallest IC value is commonly chosen because it represents a good compromise between maximizing the likelihood and minimizing model complexity. Furthermore, when scarce data is available, Bayesian methods are typically used. Bayes' theorem describes the probability of an event based on prior knowledge, the prior distribution P(θ). The prior distribution can be built from existing studies or from expert knowledge (e.g., with the maximum entropy principle). Then, the prior information is updated with newly collected information (i.e., the scarce data) through the likelihood function P(y|θ). The posterior P(θ|y) represents the probability of a parameter value given some data, proportional to the product of the prior distribution and the likelihood: P(θ|y) ∝ P(y|θ)·P(θ). Therefore, the two extreme scenarios here are a case with no data (handled with the maximum entropy principle) and a case with plentiful data (handled with the MLE or IC).
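As an illustration of the MLE-plus-IC selection step, the sketch below fits two candidate families with closed-form MLEs and compares their Akaike information criterion (AIC). The lognormal sample stands in for a positively skewed quantity such as reservoir volume; all values are synthetic, not data from the Swiss dam population.

```python
import numpy as np

def aic_normal(x):
    """AIC of a normal fit; mean and (population) std are the closed-form MLEs."""
    mu, sig = x.mean(), x.std()
    ll = np.sum(-0.5 * np.log(2 * np.pi * sig**2) - (x - mu)**2 / (2 * sig**2))
    return 2 * 2 - 2 * ll          # AIC = 2k - 2 ln L, with k = 2 parameters

def aic_lognormal(x):
    """AIC of a lognormal fit; MLEs are the moments of the log-data."""
    lx = np.log(x)
    mu, sig = lx.mean(), lx.std()
    ll = np.sum(-np.log(x) - 0.5 * np.log(2 * np.pi * sig**2)
                - (lx - mu)**2 / (2 * sig**2))
    return 2 * 2 - 2 * ll

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=2.0, sigma=0.5, size=500)   # synthetic skewed data
best = min([("normal", aic_normal(sample)),
            ("lognormal", aic_lognormal(sample))],
           key=lambda p: p[1])      # smallest IC wins
```

With real data, the same comparison would simply loop over all candidate families considered for a given input parameter.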
Finally, in this study, dependence among parameters was described by the copula formalism [48]. A general two-dimensional copula C is a joint cumulative distribution function on the unit hypercube with uniform marginals. In this study, a Gaussian copula is used, a particular class of copula constructed from a multivariate normal distribution [48,49]. The Gaussian copula can be directly parametrized through the sample rank correlation coefficient estimated from the available data.
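A Gaussian copula of the kind described can be sketched as follows: correlated standard normals are mapped to uniform marginals through the normal CDF, and a rank (Spearman) correlation computed from data can serve to parametrize it. The correlation value and the uniform marginal used for dam height are illustrative assumptions, not estimates from the Swiss dam data.

```python
import math
import numpy as np

def gaussian_copula_sample(n, rho, rng):
    """Draw (U1, U2) on [0,1]^2 whose dependence follows a Gaussian copula."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # map correlated normals to uniforms with the standard normal CDF
    phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return phi(z)

def spearman(a, b):
    """Sample rank correlation, usable to parametrize the copula from data."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

rng = np.random.default_rng(42)
u = gaussian_copula_sample(20000, 0.8, rng)
# inverse-CDF transform to a physical marginal (hypothetical uniform bounds):
H = 100.0 + 150.0 * u[:, 0]     # illustrative dam height in [100, 250] m
```

For a Gaussian copula with parameter ρ = 0.8, the implied Spearman correlation is (6/π)·arcsin(ρ/2) ≈ 0.79, which the sample estimate approaches.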

Step C. Uncertainty Propagation
Once the uncertainty of the model input parameters was quantitatively described, Step C of the uncertainty quantification (UQ) framework (Figure 1) was conducted. First, the applied computational model was considered as the model Y = M(X) of the response of the quantity of interest, Y, to the random input, X (X ∈ R^M), with a joint probability density function (PDF) f_X. Then, this model was substituted with a computationally inexpensive analytical approximation, namely the metamodel. In this study, polynomial chaos expansion (PCE) was the metamodel of choice [21,50]. PCE is a spectral decomposition of the form:

M^PCE(X) = Σ_{α ∈ N^M} y_α Ψ_α(X),  (3)

where M^PCE is the PCE metamodel response, y_α are real coefficients, and the multi-index α determines the degree of the multivariate polynomials Ψ_α, which are products of underlying univariate orthonormal polynomials. This method assumes independence between the model input parameters, so that f_X(x) = ∏_{i=1}^{M} f_{X_i}(x_i). In the case of dependent input parameters, an isoprobabilistic transformation can be applied to decorrelate the input parameters prior to the expansion [51]. The coefficients y_α of the PCE for a given basis can be calculated using non-intrusive methods, e.g., least-squares analysis. The coefficients result from the post-processing of a training sample of the input random vector, X = {x^(1), ..., x^(N)}, x^(i) ∈ R^M, the experimental design (ED), and the corresponding set of model responses, Y = {y^(1), ..., y^(N)}:

ŷ = arg min_{y_α} (1/N) Σ_{i=1}^{N} ( y^(i) − Σ_α y_α Ψ_α(x^(i)) )².  (4)

The training sample X (Table 1) was created using Latin Hypercube Sampling (LHS) [52]. When calculating the coefficients for high-degree expansions and a limited experimental design, overfitting should be avoided. For this purpose, strategies like least angle regression (LARS [53]) are commonly used. In comparison with other methods, e.g., quadrature or ordinary least squares (OLS), LARS requires a much smaller experimental design to converge properly, even in relatively high dimensions (i.e., M ~ 100).
Because the PCE is in principle an infinite series, a truncation is applied within Equation (5): the maximum polynomial degree is limited to between 1 and 15, and a sparse truncation scheme using the hyperbolic norm with q = 0.75 was applied.
In this study, the construction and use of the metamodel for the purpose of uncertainty quantification was performed using PCE based on LARS as introduced in Blatman and Sudret [54] and as implemented in UQLab, an uncertainty quantification toolbox for MATLAB [55,56].
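The least-squares construction of a PCE can be sketched outside UQLab with a toy example: orthonormal Legendre polynomials for inputs uniform on [-1, 1], a total-degree truncation instead of the hyperbolic scheme, and OLS instead of LARS. The model Y = x1² + 0.5·x1·x2 is purely illustrative.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import legval

def legendre_orthonormal(xi, deg):
    """Legendre polynomial of given degree, orthonormal w.r.t. U(-1, 1)."""
    c = np.zeros(deg + 1); c[deg] = 1.0
    return legval(xi, c) * np.sqrt(2 * deg + 1)

def pce_fit(X, Y, max_deg):
    """OLS coefficients for a total-degree-truncated PCE (Eq. (3)-(4) style)."""
    N, M = X.shape
    alphas = [a for a in product(range(max_deg + 1), repeat=M)
              if sum(a) <= max_deg]
    Psi = np.column_stack([
        np.prod([legendre_orthonormal(X[:, i], a[i]) for i in range(M)], axis=0)
        for a in alphas])
    coeffs, *_ = np.linalg.lstsq(Psi, Y, rcond=None)
    return alphas, coeffs

def pce_eval(alphas, coeffs, X):
    """Evaluate the fitted expansion at new input points."""
    Psi = np.column_stack([
        np.prod([legendre_orthonormal(X[:, i], a[i])
                 for i in range(X.shape[1])], axis=0)
        for a in alphas])
    return Psi @ coeffs

# toy "computational model" and a random experimental design
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
Y = X[:, 0]**2 + 0.5 * X[:, 0] * X[:, 1]
alphas, coeffs = pce_fit(X, Y, max_deg=3)
```

Because the basis is orthonormal, the coefficient of the constant term equals the model mean (here E[x1²] = 1/3), which is the property the variance decomposition in Step D builds on.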
The agreement between the original model and the calculated PCE model was evaluated by calculating the normalized leave-one-out cross-validation error (ε_LOO) [57]:

ε_LOO = Σ_{i=1}^{N} ( M(x^(i)) − M^{PCE\i}(x^(i)) )² / Σ_{i=1}^{N} ( M(x^(i)) − μ_Y )²,  (6)

where M(x^(i)) is the response of the computational model to the i-th point in the experimental design, μ_Y is the mean value of the experimental design response, and M^{PCE\i}(x^(i)) is the response of a PCE constructed on all points except x^(i). In other words, in Equation (6) a total of N different metamodels were built. Each metamodel was built on a reduced experimental design X\x^(i) = {x^(j), j = 1, ..., N, j ≠ i}, i.e., excluding the i-th observation (x^(i) with the associated response y^(i)). Then, this observation was used as a single validation point for this metamodel. Finally, the mean square error was estimated and normalized by the estimated variance of the computational model.
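Equation (6) can be sketched generically: refit the surrogate N times, each time leaving out one design point, and normalize the accumulated squared validation errors by the spread of the responses. The polynomial surrogate and the sine test function below are placeholders for the PCE and the flood model, chosen only so the error behaves visibly.

```python
import numpy as np

def loo_error(x, y, fit, predict):
    """Normalized leave-one-out error: N refits, one held-out point each."""
    N = len(x)
    sq = []
    for i in range(N):
        mask = np.arange(N) != i                      # reduced design X \ x(i)
        model = fit(x[mask], y[mask])
        sq.append((y[i] - predict(model, np.array([x[i]]))[0]) ** 2)
    return np.sum(sq) / np.sum((y - y.mean()) ** 2)   # normalize by variance

# illustrative surrogate: polynomial least-squares fit of a given degree
make_fit = lambda d: (lambda x, y: np.polyfit(x, y, d))
predict = lambda c, x: np.polyval(c, x)

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x)                 # stand-in "computational model"
err3 = loo_error(x, y, make_fit(3), predict)   # underfitting surrogate
err7 = loo_error(x, y, make_fit(7), predict)   # richer surrogate
```

A lower ε_LOO for the richer surrogate is exactly the signal used when selecting the PCE degree within the 1-15 range mentioned above.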

Step D. Global Sensitivity Analysis
To meet the objectives of this study, the sensitivity of each model output, Y_i, to the input parameters needs to be quantified. The calculated sensitivity indices indicate the contribution of each model input to the overall uncertainty of the output due to structural effects of the particular computational model, the so-called "structural" sensitivity analysis. In general, such an analysis aims to locate important sources of discrepancy in the computational model and, consequently, to improve the model. Presumably, the improved model allows for a better representation of the true uncertainty in the flow quantities in the valley downstream of the dam.
To perform the sensitivity analysis, the preference was given to a global sensitivity analysis (GSA) and not a local sensitivity analysis. The latter concentrates only on the local impact of the particular input variation on the model output in vicinity of a reference input configuration, while values of other input parameters are kept constant. In contrast, GSA allows for the assessment of the relative contribution of individual input parameters even when the input-output relationship is nonlinear.
For the GSA, Sobol' sensitivity indices were chosen in this case [58]. The Sobol' method is arguably one of the most adopted techniques for GSA in engineering applications because it requires no assumptions on the form of the model and allows for convergence of the calculated indices to the exact relative contributions of the inputs and their interaction with the variance of the output [59].
Generally, the Sobol' indices are defined as the ratio of the partial variance due to the input variable X_i, D_i, to the total variance, D [60]. For example, the first-order Sobol' indices are defined as:

S_i = D_i / D,  (7)

where D_i is calculated as D_i = Var_{X_i}[ E(Y | X_i) ], i = 1, ..., M. The first-order Sobol' index, S_i, represents the expected fraction of the total variance attributable to a single input variable when all other inputs are fixed. It indicates what reduction of the output variance to expect if the parameter X_i is fixed to some value x_i, averaged over all possible values of x_i.
In this study, once the PCE metamodel is built, the Sobol' indices can be calculated directly from the PCE coefficients without the need for additional sampling [61]. The partial variance D_i can be obtained as a sum of squares of selected coefficients, D_i = Σ_{α ∈ A_i} y_α²; then, the first-order indices can be computed as:

S_i = Σ_{α ∈ A_i} y_α² / Σ_{α ∈ N^M, α ≠ 0} y_α²,  (8)

where A_i = {α ∈ N^M : α_i > 0, α_j = 0 for j ≠ i} is the set of multi-indices of multivariate polynomials that depend only on X_i, and M is the dimension of the computational model. In this study, because we are interested in the parameters that dominate the response of the model, we performed the sensitivity analysis on independent inputs (a process known as "deep model exploration" [62]).
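Once the coefficients of an orthonormal expansion are known, Equation (8) reduces to bookkeeping over the multi-index set, as the following sketch shows. The four-term expansion and its coefficient values are invented purely for illustration.

```python
def sobol_first_order(alphas, coeffs):
    """First-order Sobol' indices from orthonormal PCE coefficients (Eq. (8))."""
    M = len(alphas[0])
    # total variance: sum of squared coefficients, excluding the mean term
    total = sum(c**2 for a, c in zip(alphas, coeffs) if any(a))
    S = []
    for i in range(M):
        # A_i: multi-indices active only in variable i
        Di = sum(c**2 for a, c in zip(alphas, coeffs)
                 if a[i] > 0 and all(a[j] == 0 for j in range(M) if j != i))
        S.append(Di / total)
    return S

# hypothetical 2D expansion: mean term, two main effects, one interaction
alphas = [(0, 0), (1, 0), (0, 1), (1, 1)]
coeffs = [2.0, 3.0, 1.0, 0.5]
S = sobol_first_order(alphas, coeffs)   # S1 = 9/10.25, S2 = 1/10.25
```

The shortfall of S1 + S2 from one (here 0.25/10.25) is the variance share carried by the interaction term, which higher-order Sobol' indices would attribute.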

Results and Discussion
In this section, the setup of the computational model and the subsequent approximation by the metamodel are described. This includes the results for uncertainty and sensitivity of the model output, as well as results for all intermediate modeling steps defined in Figure 2.
As mentioned earlier, the focus of this study is on representative large hydropower dams in Switzerland, i.e., concrete arch dams taller than 100 m. Based on the information about these dams, 14 dams and their downstream areas were chosen for further analysis (Table S1).

Step A: Computational Model
Following the methodology described in Section 2.2, all inputs required to run the model (Section 3.1.1) and the model outputs (Section 3.1.2) were defined. Based on this, the numerical model can be set up. At the beginning of the simulation, the reservoir is initialized with a water volume while the channel downstream of the dam is considered to be dry ( Figure 2).

Required Model Input
To set up the flood propagation model, all input parameters required to sufficiently describe the computational domain need to be defined. Among these parameters are dimensions and shape of the dam, reservoir, and downstream area, as well as roughness according to land cover.
To describe the shape and dimensions of the dam and reservoir, it was assumed that the dam cross section can be described as an isosceles trapezoid with height equal to the dam height, width of the upper base equal to the length of the dam crest, and side slopes equal to the abutment slopes of the dam [63]. The width of the bottom base of the dam is assumed to be the same as the average bottom width of the valley downstream of the dam. The reservoir created by the dam has a planar bed and a trapezoidal cross section equal to that of the dam. Representing the worst-case scenario, the water depth in the reservoir is equal to the dam height. The parameter values for the dimensions and shape of the dam and reservoir are constant over the length of the domain, i.e., the reservoir has a prismatic shape.
With these assumptions, three input parameters are required to define the shape and dimensions of the dam and its reservoir: the dam height, H (m); the length of the dam crest, L_cr (m); and the volume of the reservoir, V (m^3). Units and definitions of these parameters are given in Table 1; for visualization purposes, these parameters are shown in Figure 2. Other characteristics of the dam and reservoir (e.g., the reservoir cross section, A (m^2), the reservoir length, L_re (m), etc.) can be estimated using the abovementioned inputs.
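Under the prismatic-trapezoid assumptions above, the derived reservoir characteristics follow from elementary geometry; a minimal sketch is given below. The numeric values are illustrative, not data for any specific dam, and the bottom-width parameter is the valley bottom width assumed equal to the dam's lower base.

```python
def reservoir_geometry(H, L_cr, V, W_bottom):
    """Derived reservoir characteristics under the prismatic-trapezoid assumption.

    H        -- dam height (m); water depth equals H in the worst case
    L_cr     -- dam crest length (m), the trapezoid's upper base
    V        -- reservoir volume (m^3)
    W_bottom -- valley bottom width (m), the trapezoid's lower base
    """
    A = 0.5 * (L_cr + W_bottom) * H    # trapezoidal cross-section area, m^2
    L_re = V / A                        # reservoir length for a prismatic shape, m
    return A, L_re

# illustrative numbers only
A, L_re = reservoir_geometry(H=150.0, L_cr=400.0, V=90e6, W_bottom=50.0)
```

This is the sense in which A and L_re in Table 1 are not independent inputs but are estimated from H, L_cr, and V.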

To describe the shape and dimensions of the downstream area (i.e., the valley), a simplified geometry was proposed [64]. To define what kind of simplified geometry can rather accurately reflect the population of valleys downstream of the representative hydropower dams in Switzerland, a classification of the valleys downstream of the 14 Swiss concrete arch hydropower dams taller than 100 m (Table S1) was performed. It should be noted that classification of natural valleys can be controversial because each natural valley is unique. Therefore, the classification performed in this study considers only major features of the valleys.
For the purpose of classification, slope maps were generated for the area downstream of each of these 14 dams. Slope data were extracted from the Swiss high-resolution digital elevation model, swissALTI3D [65], and then all slope values were partitioned into three groups: "flat", flatter than 20°; "moderately steep", between 20° and 45°; and "steep", steeper than 45°. Based on the obtained data and on the work on the categorization of natural valleys conducted by Rosgen [66] and Rosgen et al. [67], three classes of valleys were defined (Table 2).
Among the defined classes, the V-shaped valleys (Class 1) have a narrow bottom with moderately steep side slopes (i.e., 20°-63°). This class corresponds to valleys with steep, moderately steep, or gentle-sloping sides as defined in Rosgen [66] and Rosgen et al. [67] (Table 2). A good approximation for valleys of this class is a channel with a triangular cross section. The U-shaped valleys (Class 2) have a distinct valley bottom and moderately steep slopes forming the valley walls, which start from the edges of the valley bed. This class corresponds, in Rosgen's categorization, to moderately steep, U-shaped glacial-trough valleys. Valleys of Class 2 can be represented by a channel with a trapezoidal cross section. Finally, valleys of Class 3 are bedrock-like valleys with steep side slopes; they also have a pronounced bottom area as in Class 2, but their sides are steeper, i.e., almost vertical (≥75°). Therefore, this class is represented by a rectangular cross section.
Based on the simplified geometries defined for the three classes, a generic model of the downstream valley reflecting all three classes could be defined as a channel with a trapezoidal cross section, with the rectangular and triangular cross sections as the two limiting cases. Therefore, the shape and dimensions of the channel with the simplified geometry can be characterized using four parameters: the channel length relative to the height of the dam, L_ch-rel (m/m); the channel width, W (m), uniform over the entire length of the channel; the slope of the channel bed, S_b (m/m), the same along the entire channel length; and the side slope of the channel, S_s (°), the same on the right and left sides of the channel and constant along the entire channel length.
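The four channel parameters above determine the wetted geometry directly; the small helper below shows how the trapezoid degenerates into the rectangular and triangular limiting cases. The helper and its numeric inputs are illustrative, not part of the model code.

```python
import math

def wetted_area(depth, W, side_slope_deg):
    """Wetted area of the simplified trapezoidal channel at a given flow depth.

    side_slope_deg -> 90 recovers the rectangular limit (Class 3);
    W -> 0 recovers the triangular limit (Class 1).
    """
    # horizontal run per unit rise of the side slopes
    z = 1.0 / math.tan(math.radians(side_slope_deg))
    return depth * (W + z * depth)
```

For example, a 2 m flow depth in a 10 m wide channel gives an area of 20 m² with vertical walls and 24 m² with 45° side slopes.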
Finally, to account for the surface resistance to the flow, the surface of the channel is described by roughness coefficients, where the roughness values for the channel bed, M_b (s/m^(1/3)), and the channel side slopes, M_s (s/m^(1/3)), are distinguished. The latter are assumed to be the same on both side slopes.

Table 2. Classes of dam-downstream valleys in Switzerland and their simplified geometries.

Dam-downstream valleys in Switzerland | Simplified geometry
Class 1: Moderately steep, gentle-sloping side slopes (often colluvial valleys) | Channel with a triangular cross section
Class 2: Moderately steep, U-shaped glacial-trough valleys | Channel with a trapezoidal cross section
Class 3: Other topographies (e.g., the Zevreila dam) | Channel with a rectangular cross section
The U-shaped valleys (Class 2) have a distinct valley bottom and moderately steep slopes forming the valley walls that start from the edges of the valley bed. Therefore, this class is reflected in the categorization by Rosgen as moderately steep, U-shaped glacial-through valleys. Valleys of Class 2 can be represented by a channel with a trapezoidal cross section. Finally, valleys of Class 3 are like bedrock valleys with steep side slopes; they also have a pronounced bottom area as in Class 2, but their sides are steeper, i.e., almost vertical (≥75°). Therefore, this class is represented by rectangular-shape cross sections.  (Table 1). A good approximation for valleys of this class can be a channel with a triangular cross section. The U-shaped valleys (Class 2) have a distinct valley bottom and moderately steep slopes forming the valley walls that start from the edges of the valley bed. Therefore, this class is reflected in the categorization by Rosgen as moderately steep, U-shaped glacial-through valleys. Valleys of Class 2 can be represented by a channel with a trapezoidal cross section. Finally, valleys of Class 3 are like bedrock valleys with steep side slopes; they also have a pronounced bottom area as in Class 2, but their sides are steeper, i.e., almost vertical (≥75°). Therefore, this class is represented by rectangular-shape cross sections. Based on the simplified geometries defined for the three classes, a generic model of the downstream valley reflecting all three classes could be defined as a channel with the trapezoidal cross section with two extreme cases, namely the rectangular and triangular cross sections. 
Therefore, the shape and dimensions of the channel with the simplified geometry can be characterized using four parameters: the channel length relative to the height of the dam, L_ch-rel (m/m); the channel width, W (m) (uniform for the entire length of the channel); the slope of the channel bed, S_b (m/m) (the same along the entire channel length); and the slope of the channel sides, S_s (°) (the same on the right and on the left side of the channel and constant along the entire channel length).
Finally, to account for the surface resistance to the flow, the surface of the channel is described by roughness coefficients, where the roughness values for the channel bed, M_b (s/m^(1/3)), and the channel side slopes, M_s (s/m^(1/3)), are distinguished. The latter are assumed to be the same on both side slopes.
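Since the generic valley is a symmetric trapezoidal channel that degenerates into the rectangular and triangular limits, its basic hydraulics can be sketched as below. This is an illustrative helper, not part of the study's numerical model: the function name is an assumption, and the roughness is taken to enter through Manning's formula, consistent with the s/m^(1/3) units above.

```python
import math

def channel_hydraulics(h, W, S_s_deg, S_b, n):
    """Flow area, wetted perimeter, hydraulic radius, and Manning discharge
    for a symmetric trapezoidal channel at water depth h (m).
    W: bed width (m); S_s_deg: side-slope angle (deg), where 90 deg gives the
    rectangular limit and W = 0 the triangular limit; S_b: bed slope (m/m);
    n: Manning roughness coefficient (s/m^(1/3))."""
    z = 1.0 / math.tan(math.radians(S_s_deg))          # horizontal run per unit rise
    area = h * (W + z * h)                             # flow area (m^2)
    perimeter = W + 2.0 * h * math.sqrt(1.0 + z * z)   # wetted perimeter (m)
    radius = area / perimeter                          # hydraulic radius (m)
    discharge = area * radius ** (2.0 / 3.0) * math.sqrt(S_b) / n  # (m^3/s)
    return area, perimeter, radius, discharge
```

In the rectangular limit (S_s = 90°), for example, a 4 m wide bed flowing 2 m deep on a 1% slope with n = 1/30 gives a discharge of about 24 m^3/s.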

Required Model Response
As output, the model provides time series of the flow quantities downstream of the dam. The flow quantities, namely the outflow hydrograph, the maximum water depth, h_max, and the maximum water velocity, v_max, at the last downstream cross section of the channel are the outputs of interest. Although the parameters h_max and v_max are self-explanatory, the shape of the hydrograph generally can be described using the following features. Taking into account that the downstream area is initially dry (i.e., no base flow) and that the flood was caused by a water release from the dam reservoir and not by a rainfall event (i.e., the basin lag time is substituted by the time to the peak discharge), the peak discharge, the time to the peak discharge, the time of the flood arrival, and the recession constant can sufficiently describe the shape of the hydrograph (Table 3) [68].

Table 3. Features describing the shape of the outflow hydrograph.
- Peak discharge, Q_peak (m^3/s): the maximum outflow reached during the flood event.
- Time-to-peak discharge, t_peak (s): the time interval between the start of the computational time and the peak discharge.
- Time-to-flood arrival, t_ar (s): the time interval between the start of the computational time and the time of the first non-zero discharge value.
- Recession constant, k (m^3/s^2): the recession limb begins at the time to peak and continues while the value of the discharge decreases; this limb is characterized by the recession constant k of the line between the peak discharge and a discharge at time t after the peak discharge.

Knowing t_peak, t_ar, and Q_peak, the rising limb of the hydrograph can be built, i.e., the curve reflecting the increase of the discharge.

Although values for all the hydrograph features can be set based on their definition, the recession constant, k (m^3/s^2), can be calculated as [69]:

k = (Q_peak − Q_t) / ∆t,

where Q_t (m^3/s) is the discharge at time t, which corresponds to the total run time, and ∆t (s) is the time between t_peak and t. In this study, the total run time was set to a conservative value of 9000 s, allowing for the computation of the model output and also for extreme-case scenarios.
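For a dry downstream domain, the four features can be read directly off a simulated discharge series. A minimal sketch (the function name is illustrative; k follows its definition as the slope of the line between the peak and the discharge at the total run time):

```python
def hydrograph_features(t, q):
    """Extract Q_peak (m^3/s), t_peak (s), t_ar (s), and the recession
    constant k (m^3/s^2) from times t and discharges q. The domain is
    initially dry, so the flood arrives with the first non-zero discharge."""
    q_peak = max(q)
    i_peak = q.index(q_peak)
    t_peak = t[i_peak]                                   # time-to-peak discharge
    t_ar = next(ti for ti, qi in zip(t, q) if qi > 0.0)  # time-to-flood arrival
    # Recession constant: slope between the peak and the discharge Q_t
    # at the total run time (here, the last sample of the series).
    k = (q_peak - q[-1]) / (t[-1] - t_peak)
    return q_peak, t_peak, t_ar, k
```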

Step B. Modeling the Sources of Uncertainties
For the majority of the model input parameters, the initial dataset contained 14 data points. Therefore, the maximum entropy principle was used to define distributions for this model (Section 2.3.2). According to the maximum entropy principle, the least biased probability density function (PDF), f_X(x), with support D_X ⊂ R, for representing information about a continuous variable X is the one that maximizes its entropy H[f_X] (as defined in Equation (10)) while satisfying certain constraints based on the available information, e.g., bounds, moments, etc.
In this study, the entropy was compared between uniform and beta distributions (Table 4), which are normally used when little knowledge about the parameter is available (e.g., little confidence in the computed mean or median of the data). Furthermore, both distributions are bounded, which allows setting the limits based on the range of values for the population of dams. For each of the two distribution types, the equations for the entropy are given in Table 4. To identify the parameters of the distributions (Table 4), the maximum likelihood principle was used, as explained in Section 2.3.2.
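The comparison can be illustrated with the closed-form differential entropies of the two candidates. These helpers are an illustrative sketch, not the study's implementation; the digamma function is approximated numerically from the log-gamma function:

```python
import math

def _digamma(x, h=1e-5):
    # Numerical digamma via a central difference of log-gamma.
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def entropy_uniform(a, b):
    # Differential entropy of U(a, b): H = ln(b - a).
    return math.log(b - a)

def entropy_beta(alpha, beta, a=0.0, b=1.0):
    # Differential entropy of a Beta(alpha, beta) distribution rescaled to [a, b].
    log_B = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    h01 = (log_B
           - (alpha - 1.0) * _digamma(alpha)
           - (beta - 1.0) * _digamma(beta)
           + (alpha + beta - 2.0) * _digamma(alpha + beta))
    return h01 + math.log(b - a)
```

On a fixed interval, the uniform distribution always has the larger entropy (Beta(1, 1) reproduces it exactly), which is why the beta candidate is retained only when the data justify its additional moment constraints.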
To model the distributions of the physical characteristics of the dam and reservoir, information about the population of dams in Switzerland was gathered. In order to build probability distributions for H (m), L_cr (m), and V (m^3), data about 14 concrete arch dams taller than 100 m (Table S1) were used (Swiss Committee on Dams [5]). By testing the entropy for the two probability distribution types (Section 2.3.2), it was found that the length of the dam crest was uniformly distributed [Figure 3a]. In the case of the dam height, it is known that lower values of the distribution range have higher probabilities, i.e., the frequency of having a dam with a lower height is higher. Therefore, a beta distribution was used to describe the dam height [Table 6, Figure 3b], where a_B and b_B are the bounds of the height range for the Swiss concrete arch dams taller than 100 m. Because the reservoir volume is strongly correlated with the dam height, it was assumed that V also follows a beta distribution [Table 6, Figure 3c].
(For the uniform distribution, the bounds follow from the moments as a = µ − √3·σ and b = µ + √3·σ; a uniform distribution is applied when little is known about the parameter, e.g., distance, loads.)

To build probability distributions for the parameters of the channel (i.e., L_ch-rel, W, S_s, S_b), information was extracted from the topographic maps and digital elevation maps [65] of the areas downstream of the 14 arch dams considered (Table S1).

The channel length was estimated as the distance between the dam and the nearest locality with at least 500 inhabitants situated downstream of the dam [Figure 4a]. The distance corresponds to the flow path, i.e., considering all bends in the reach. For each dam, the locality situated either along the channel or at the end of the channel was chosen (Table S1). For example, in the Mauvoisin dam case, the channel length is measured between the dam and the town of Bagnes, with a population of 8057 as of 2016 [70]. Based on these measurements, the channel length relative to the height of the dam, L_ch-rel, was described with a uniform distribution [Table 6, Figure 4b].
The channel slope S_b was defined by the difference in elevation between the dam and the locality and the absolute value of the channel length (i.e., not normalized by the dam height). Based on the collected data, S_b was found to follow a beta distribution truncated over the boundaries corresponding to the smallest and the largest existing values (Table 6).

According to the definition given in Section 3.1.1, the channel bed is defined using the swissALTI3D slope map [65]. Then, multiple lines were drawn across the bed area perpendicular to the channel axis, where each line represented the width of the channel bed in a specific location [as shown, for example, in Figure 5a]. For each channel, the average value of W was taken into account to build a generic distribution representing all channels. The channel bed width was uniformly distributed [Table 6, Figure 5b].

To characterize the side slopes, the area of the valley walls was delineated for each valley [Figure 6a, rose polygon]. In the next step, the slope value at each point falling within the defined polygon was extracted from the swissALTI3D slope map [Figure 6b]. Then, the mean of the slope was computed, assuming that S_s for each valley was normally distributed. Figure 6c presents an empirical histogram of all 14 calculated S_s values, showing that S_s is uniformly distributed (Table 6).

The downstream environment is characterized by the roughness coefficients M_b and M_s. Figure 7a-c shows the three steps performed to build a probability distribution for M_b based on the data about the 14 valleys; the process of building a probability distribution for M_s is analogous. In the first step [Figure 7a], the types of land cover in the areas downstream of the dam were collected using data from the Swiss topographic landscape model, swissTLM3D [71]. The swissTLM3D differentiates between six types of land cover (Table 5).
In this study, water bodies were excluded as a land cover type, because the downstream area of the dam was treated as a dry domain, given the much smaller size of any water body in comparison with the dam reservoir.
Manning's roughness coefficients for the different land cover types were defined using information available in the literature. Because many studies [72,73] used the NLCD classification (i.e., the National Land Cover Dataset by the US Geological Survey [74]), land cover types in the SwissALTI3D classification were allocated to the types of the NLCD classification using the available description (Table 5).
These results can be supported by findings of other research conducted in the past; for example, the first example of correlation is reflected in the GRanD database guidelines for estimating missing reservoir volumes based on the reservoir area and dam height [75].
In the second step [Figure 7b], the roughness of each land cover type was described by the average areal roughness with mean, µ_i, and variance, σ²_i, calculated by assuming that the roughness coefficient within each land cover type is distributed uniformly. Furthermore, an areal probability, P_i, was assigned to each land cover type by defining the fraction of the area occupied by the land cover type in the total area of the downstream valley bed. Then, using the equations given in Figure 7b, the mean, µ_areal, and the variance, σ²_areal, of the average areal roughness for the entire area of the downstream valley bed were defined. Finally, using the mean µ_areal and variance σ²_areal calculated with the method of moments, the maximum entropy principle was used to calculate the parameters of a beta probability distribution, which yielded a better fit to the M_b data.
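The second and third steps can be sketched as follows, assuming the equations in Figure 7b are the standard mixture moments over the land cover types (the function names are illustrative):

```python
def areal_moments(P, mu, var):
    """Mean and variance of the average areal roughness from the areal
    probabilities P_i, means mu_i, and variances var_i of the land cover
    types (mixture moments)."""
    mu_areal = sum(p * m for p, m in zip(P, mu))
    var_areal = sum(p * (v + m * m) for p, m, v in zip(P, mu, var)) - mu_areal ** 2
    return mu_areal, var_areal

def beta_from_moments(mu, var, a=0.0, b=1.0):
    """Method-of-moments estimates of the Beta(alpha, beta) parameters
    on the interval [a, b]."""
    m = (mu - a) / (b - a)        # mean rescaled to [0, 1]
    v = var / (b - a) ** 2        # variance rescaled to [0, 1]
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common
```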
Probability distributions of the average areal roughness, M_b, for each valley are displayed as dashed lines in Figure 7c. To construct the distribution of the roughness reflecting the population of all considered valleys, the roughness distributions of all distinct land cover types in the different valleys were treated together and their areal probabilities were normalized by the total number of considered valleys. This resulted in the beta distribution shown as a red solid line in Figure 7c. The information about the chosen marginal distributions, their hyperparameters, truncation ranges, and moments is summarized for all input parameters in Table 6.

Finally, all input parameters of the model were checked for possible dependence. The results showed strong positive, as well as negative, correlations between some of the inputs. For example, both the correlation between the reservoir volume and the dam height and the correlation between the crest length and the reservoir volume had a value of 0.57.

A negative correlation was found between the dam height and the relative length of the channel (ρ_H,L_ch-rel = −0.51), and between the slope of the channel bed and the roughness coefficient of the channel sides (ρ_S_b,M_s = −0.52). The latter, for example, can indicate that it is harder for vegetation such as forest trees to grow on very steep slopes. Correlation coefficients for the other combinations are given in Table 7.
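The dependence between inputs is quantified with Spearman rank correlation, which can be computed from scratch as the Pearson correlation of the ranks (a self-contained sketch with illustrative function names):

```python
def rank(x):
    # Average ranks (1-based), with tied values sharing their mean rank.
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank-transformed samples.
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```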

Step C. Uncertainty Propagation
The generic PCE metamodel for each of the defined model outputs was calculated using an experimental design (ED) with a sample size of 2000. The correlation between the model inputs was not considered when building the metamodel, in order to achieve better metamodeling performance [76]. Afterwards, the constructed PCEs were evaluated on a new sample set of size 1,000,000 using the coefficients calculated in the previous step. Prior to the forward propagation of the input uncertainties, the information about the correlation between the model inputs, i.e., a Gaussian copula with the calculated Spearman rank correlation coefficients, was added to the information about the marginal distributions and their bounds. In this way, the final PDFs reflecting the uncertainty of the model outputs were built, including the information about the dependency between the input parameters.
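Sampling dependent inputs through a Gaussian copula can be sketched for two dimensions as below. This is a minimal illustration, not the study's implementation: the function name and marginals are assumptions, and the parameter passed in is the Gaussian correlation (which in practice is derived from the Spearman coefficients).

```python
import math
import random

def gaussian_copula_sample(rho, inv_cdfs, n, seed=0):
    """Draw n samples of two dependent inputs via a Gaussian copula with
    correlation rho; inv_cdfs are the inverse marginal CDFs (e.g., of the
    uniform or beta marginals of the input parameters)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlate the standard normals (2x2 Cholesky factor).
        g1 = z1
        g2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # Map to uniforms through the standard normal CDF ...
        u1 = 0.5 * (1.0 + math.erf(g1 / math.sqrt(2.0)))
        u2 = 0.5 * (1.0 + math.erf(g2 / math.sqrt(2.0)))
        # ... and to physical values through the marginal inverse CDFs.
        out.append((inv_cdfs[0](u1), inv_cdfs[1](u2)))
    return out
```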
In Figure 8a-f, the histograms of the computational model responses, Y_ED,1-6, and the histograms of the PCE metamodel responses, M_PCE,1-6, are shown. The latter, together with the calculated moments of the metamodel response, reflect the variability of the modeled parameters. In particular, the mean value of the peak discharge is 0.99 × 10^5 m^3/s and the arrival of this peak outflow corresponds to 1067 s (17.78 min), whereas the arrival of the flood itself corresponds to 756 s (12.60 min). However, the range of possible values of the corresponding parameters is large, and the values of the peak outflow and the time to the peak outflow can grow up to 7 × 10^5 m^3/s and 9000 s (150 min), respectively. The results for the recession constant describing the hydrograph indicate that the descending limb is a flat curve (i.e., the mean value of the recession constant k is 0.98 × 10^-3 m^3/s^2), demonstrating that the water tends to remain at the location, i.e., the decrease in water depth is slow. The mean value of the maximum flow velocity is 26.19 m/s and the mean value of the maximum water depth is 34.42 m, which are rather high considering that they indicate the most severe moment when the flood wave arrives.

Quantitative validation of the calculated PCEs was done by estimating the leave-one-out cross-validation error (ε_LOO). Results are given in Figure 9 and Table S2 for different experimental designs. The PCE degree shown in the appendix is the PCE degree with the lowest ε_LOO in the array of the given degrees (i.e., 1-15) [54]. The results showed that with an increasing ED the value of ε_LOO decreases, and the degree of the calculated PCE changes as well. The residual ε_LOO for some of the model outputs is about 10% (e.g., k, v_max).

Because of the relatively high residual ε_LOO, in addition to the built-in error estimation, the mean squared error (MSE) was used as another error metric for the convergence of the mean values of the model outputs. For this analysis, the ED was divided into two samples, where one was used for the construction of the PCE, X_NED, and the other as a validation sample, X_VAL. While X_VAL was represented by the same sample set of 100 points, X_NED was increased stepwise from 200 to 1900 sample points (see Figure 9). Then, the validation error evolution, using the prediction of the PCE, Y_NED, and the true output of the computational model, Y_VAL, was calculated as:

MSE = (1/N_VAL) · Σ_{i=1..N_VAL} (Y_NED,i − Y_VAL,i)²,

where N_VAL is the size of the validation sample.
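For a given train/validation split, the validation error is a plain mean squared error between the metamodel predictions on X_VAL and the true model responses. An illustrative helper (the variance-normalized variant is an assumption added so the error can be reported as a percentage, matching the ~10% residuals discussed here):

```python
def validation_errors(y_pred, y_true):
    """Mean squared validation error of metamodel predictions against the
    true model output, plus a variance-normalized (relative) version."""
    n = len(y_true)
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n
    mean = sum(y_true) / n
    var = sum((t - mean) ** 2 for t in y_true) / n
    return mse, mse / var
```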

The calculated MSE (Figure 9 and Table S2) decreases for all the model outputs with a larger size of the ED. For some of the parameters, the decrease of the MSE value is rather constant and well-pronounced (e.g., Q_peak, k, h_max); for some others, the MSE value decreased to 10% rather quickly and then stayed rather constant at the same level with the enlarging ED (e.g., t_peak, t_ar). Similar to ε_LOO, the residual MSE for some parameters is about 10%. The overall tendency of the MSE to decrease with an increasing ED is an indicator that the applied computational model is complex. In addition, the repeatability error calculated for each of the model outputs is close to 0.1% or substantially lower (Protocol S1), which is another indicator of the model complexity.

The validation results with a residual error (both ε_LOO and MSE) of 10% might compromise the advantages of using PCE as a tool for UQ. However, studies conducted in the past were able to prove that if the metamodel is used for the estimation of the statistical properties of the model output (e.g., mean, variance) and also for the calculation of sensitivity indices (Section 3.4), then it is rather accurate even with a residual error of about 10% [25]. Another factor that can potentially contribute to the relatively high validation error (both ε_LOO and MSE) is the nature of the model outputs defined in this study. Most of them represent maxima (i.e., Q_peak, v_max, h_max), and peaks are intrinsically harder to metamodel.
Finally, other types of surrogate modeling techniques, such as low-rank approximation (LRA) [16] and Gaussian process modeling (i.e., kriging [77]), were tested. Although LRA is particularly promising for high-dimensional problems (i.e., the number of unknown coefficients in LRA grows only linearly with the input dimension [16]) and kriging takes a different approach to approximating Y (i.e., it focuses on the local alterations of Y, whereas PCE approximates the global behavior of Y [78]), neither of these metamodeling approaches revealed a better fit to the computational model.

Step D. Global Sensitivity Analysis
In general, the PCE approach reduces the cost of GSA by calculating the Sobol' indices directly from the PCE coefficients, y_α, without the need for additional sampling (see Section 2.5). However, in real applications different issues can arise, for example, in the case of high-dimensional models such as the one applied in this study (i.e., the model has nine inputs). For these models, a UQ sample of size 10^2 to 10^3 still allows a good agreement with the computational model to be reached, but the calculation of the Sobol' indices normally requires a sample of size 10^4 to 10^5. In other words, even if the computational cost of building the metamodel itself does not depend on the dimensionality of the model, the computational cost of the GSA increases with larger model dimensions. However, for the particular case of the "structural" GSA (Section 2.5), a sample of size 10^2 to 10^3 could be adopted, and the metamodel built in Section 3.3 on the ED of 2000 points was also used for GSA.
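The coefficient-based shortcut described above can be sketched as follows (hypothetical Python with numpy; a two-input toy model replaces the flood solver, and a tensorized orthonormal Legendre basis stands in for the actual PCE of the study). The first-order Sobol' index S_i sums the squared coefficients of all basis terms depending on input i alone, divided by the total variance Σ_{α≠0} y_α²; no extra sampling is needed once the coefficients are known.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import legval

rng = np.random.default_rng(1)

def model(x):  # toy model with uniform inputs on [-1, 1]^2
    return x[:, 0] + 0.5 * x[:, 1] + x[:, 0] * x[:, 1]

# Orthonormal Legendre polynomial of degree n w.r.t. U(-1, 1).
def psi(n, x):
    c = np.zeros(n + 1); c[n] = 1.0
    return np.sqrt(2 * n + 1) * legval(x, c)

# Multi-index set: total degree <= 2 in 2 dimensions.
alphas = [a for a in product(range(3), repeat=2) if sum(a) <= 2]

# Least-squares PCE fit on a random experimental design.
X = rng.uniform(-1, 1, (200, 2))
A = np.column_stack([psi(a[0], X[:, 0]) * psi(a[1], X[:, 1]) for a in alphas])
y_alpha, *_ = np.linalg.lstsq(A, model(X), rcond=None)

# Sobol' indices straight from the squared coefficients.
var_total = sum(c**2 for a, c in zip(alphas, y_alpha) if any(a))
S1 = sum(c**2 for a, c in zip(alphas, y_alpha) if a[0] > 0 and a[1] == 0) / var_total
S2 = sum(c**2 for a, c in zip(alphas, y_alpha) if a[1] > 0 and a[0] == 0) / var_total
print(S1, S2)  # analytic values for this toy model: 12/19 and 3/19
```

Since this toy model lies exactly in the span of the degree-2 basis, the recovered indices match the analytic ones; the remaining variance fraction, 4/19, belongs to the interaction term x1·x2.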
The calculated first-order Sobol' indices (Figure 10a-f) indicate that among the parameters characterizing the dam and its reservoir (i.e., H, V, L_cr), the reservoir volume, V, is the most important for the outputs Q_peak and h_max. In other words, the volume of water released from the reservoir determines how much water reaches the downstream area, expressed as outflow or water depth. Meanwhile, the dam height, H, and crest length, L_cr, did not influence any of the flow quantities estimated at the end of the channel.
Among the parameters of the channel, the relative length, L_ch−rel, strongly affected all model outputs. In other words, moving further downstream from the dam changed not only the maximum reachable flow quantities (Q_peak, v_max, h_max) but also the shape of the hydrograph (t_peak, t_ar, k). The width of the channel bed, W, is relatively important only for h_max: the water simply has more space to spread. Lastly, the slopes of the channel bed, S_b, and sides, S_s, did not show influence over any of the model outputs.
Finally, the roughness coefficients influenced the characteristics of the hydrograph (Q_peak, t_peak, t_ar, k) and the maximum velocity reached downstream, v_max. Roughness corresponds to flow resistance: the velocity decreases, and it takes more time for the discharge to arrive and to reach its peak. The influence of M_s on the output parameters is slightly higher than that of M_b, which may simply be related to the fact that the channel sides cover a substantially larger surface area than the channel bed.
The calculated sensitivity indices identified the model input parameters that are influential for the model describing large hydropower dams in Switzerland. When the variability of the output has to be limited, the influential inputs need to be investigated further to decrease their uncertainty.
For example, in the context of dam risk assessment, to better predict the potential loss of life due to the impact of the dam-break flood wave, the uncertainty of the outflow, flow velocity, and maximum water depth at the location of the city needs to be decreased. According to the sensitivity results, this requires the volume of the dam reservoir to be measured or monitored accurately. Furthermore, the roughness, especially of the channel sides, may be defined more specifically. For example, the average areal roughness used in this study can be refined to reflect different roughness on various parts of the valley.

Conclusions
The current study demonstrated the application of metamodeling for quantitative assessment of uncertainties for the complete and instantaneous break of a hypothetical concrete dam located in Switzerland. A 1D model based on the Shallow Water Equations was used for numerical simulation of the flood wave propagation. The model configuration included nine input parameters characterizing the dam, its reservoir and the downstream area and six model outputs defined as the flow quantities reached downstream of the failed dam. This computational model was approximated with a metamodel.
A polynomial chaos expansion (PCE) was able to reproduce the computational model with good agreement using 2000 runs of the original model. This offers the opportunity to reduce computational effort compared, for example, to Monte Carlo sampling, which was applied in Froehlich [79] and required 100,000 trials to estimate confidence limits of dam-breach flood calculations. Furthermore, the quantitative assessment of uncertainties enabled by the metamodel is beneficial for further phases of modeling the dam-break event. For example, it allows the flow quantities reached downstream to be described in the form of probability distributions, which can then be used as uncertain inputs to the model for loss-of-life estimation in the downstream area.
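The computational saving comes from the fact that, for an orthonormal basis, the output mean and variance are available in closed form from the PCE coefficients (mean = y_0, variance = Σ_{α≠0} y_α²), while brute-force Monte Carlo needs many runs of the original model. A minimal sketch (hypothetical Python with numpy; a one-input toy model, not the flood solver):

```python
import numpy as np
from numpy.polynomial.legendre import legval

rng = np.random.default_rng(2)

# Orthonormal Legendre basis w.r.t. the uniform input on [-1, 1].
def psi(n, x):
    c = np.zeros(n + 1); c[n] = 1.0
    return np.sqrt(2 * n + 1) * legval(x, c)

model = lambda x: 2.0 + x**2           # toy stand-in for the flood solver

# Fit a degree-2 PCE on a small experimental design (here 30 runs).
x_ed = rng.uniform(-1, 1, 30)
A = np.column_stack([psi(n, x_ed) for n in range(3)])
y_alpha, *_ = np.linalg.lstsq(A, model(x_ed), rcond=None)

# Moments in closed form from the coefficients -- no sampling at all.
mean_pce = y_alpha[0]                  # mean = y_0
var_pce = np.sum(y_alpha[1:] ** 2)     # variance = sum of squared higher coefs

# For comparison: brute-force Monte Carlo needs many model runs.
x_mc = rng.uniform(-1, 1, 100_000)
print(mean_pce, np.mean(model(x_mc)))  # both close to 7/3
print(var_pce, np.var(model(x_mc)))    # both close to 4/45
```

The 30-run ED here plays the role of the 2000-run ED of the study: once the coefficients are fitted, the statistical properties of the output cost nothing further.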
The applied metamodeling approach also allowed sensitivity analysis by calculating the first-order PCE-based Sobol' indices. These indices helped to understand how the variability of each model input affected the variability of the model output and to identify important sources of discrepancy within the model. In particular, the reservoir volume, length of the channel, and surface roughness in the downstream area were the important parameters driving the model output, as reflected by the higher values of their Sobol' indices. The dam dimensions, as well as the slopes of the channel sides and bed, did not significantly influence the flow quantities at the end of the channel, i.e., at the locality of the population at risk.
Finally, the constructed metamodel can support decision-making processes in risk management for large concrete arch hydropower dams in the mountain regions of Switzerland. By setting certain parameters to deterministic values and by further specifying the distributions of the remaining parameters based on the specific location, this generic metamodel is converted into a preliminary model of the failure of a dam to be built in the future. The flow quantities that can be reached downstream can then be approximated quickly without running any computationally expensive model, which is useful for risk assessment and decision support. These findings eventually provide important inputs for decision makers when formulating energy policy strategies.
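The conversion from generic to site-specific metamodel can be sketched as follows (hypothetical Python with numpy; the two-input toy surrogate and the chosen site values are illustrative, not taken from the study). One input is fixed to a deterministic, surveyed value; the other keeps a narrowed, location-specific distribution; and the cheap surrogate is resampled in place of the solver.

```python
import numpy as np
from itertools import product
from numpy.polynomial.legendre import legval

rng = np.random.default_rng(3)

# Orthonormal Legendre basis w.r.t. U(-1, 1).
def psi(n, x):
    c = np.zeros(n + 1); c[n] = 1.0
    return np.sqrt(2 * n + 1) * legval(x, c)

alphas = [a for a in product(range(3), repeat=2) if sum(a) <= 2]

def surrogate(X, coefs):
    A = np.column_stack([psi(a[0], X[:, 0]) * psi(a[1], X[:, 1]) for a in alphas])
    return A @ coefs

# Generic metamodel fitted beforehand (toy model standing in for the dam break).
model = lambda X: X[:, 0] + 0.5 * X[:, 1] + X[:, 0] * X[:, 1]
X_ed = rng.uniform(-1, 1, (200, 2))
A_ed = np.column_stack([psi(a[0], X_ed[:, 0]) * psi(a[1], X_ed[:, 1]) for a in alphas])
y_alpha, *_ = np.linalg.lstsq(A_ed, model(X_ed), rcond=None)

# Site-specific use: fix input 2 to its surveyed value, narrow input 1's
# distribution, and resample the cheap surrogate instead of the solver.
x2_fixed = 0.4
x1_site = rng.uniform(0.2, 0.6, 10_000)
X_site = np.column_stack([x1_site, np.full_like(x1_site, x2_fixed)])
flows = surrogate(X_site, y_alpha)
print(flows.mean(), np.quantile(flows, 0.95))
```

The 10,000 surrogate evaluations take a fraction of a second, whereas the same study with the original solver would require 10,000 expensive simulations.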
As an outlook of the current research, if 2D flood propagation models are adopted to account for non-prismatic channels and valleys with more complex topographies, the robustness of PCE for quantitative assessment of uncertainties has to be investigated. Furthermore, correlations among input parameters and their influence on the variability of the model output can be addressed further in sensitivity analysis. For this purpose, alternative techniques, e.g., Borgonovo indices [80] or copula-based approaches [81,82], can be applied. However, their application should be done with caution, since it strongly depends on the question asked before the analysis. Finally, the step of life-loss estimation due to a dam break will be addressed by the authors in follow-up research.
Supplementary Materials: The following are available online at http://www.mdpi.com/1996-1073/13/14/3685/s1, Table S1: Information about 19 large concrete arch hydropower dams located in Switzerland, Table S2: The degree of PCE and values of the ε_LOO and MSE errors (expressed as fractions) calculated for each model output using different sizes of the experimental design, Protocol S1: Calculation of the repeatability error, RE, for each model output.
Author Contributions: The conceptualization of the paper was done by A.K., M.S. and P.B.; A.K. compiled the data set and M.S. provided feedback; A.K., M.S., D.F.V., C.W. and P.B. contributed to the conceptual model development and discussions on its refinement; A.K. and D.F.V. implemented the Basement model; A.K. implemented the uncertainty quantification in UQLab, and S.M. and B.S. provided support and feedback on improving the model and its performance; M.S., D.F.V., S.M., B.S. and C.W. provided feedback on specific metamodeling aspects. A.K. and M.S. wrote the first draft of the manuscript; M.S., P.B., S.M. and C.W. reviewed the paper draft; A.K. managed the reviewing and editing process. All authors have read and agreed to the published version of the manuscript.
Funding: This research project is part of the National Research Programme "Energy Turnaround" (NRP 70) of the Swiss National Science Foundation (SNSF). Further information on the National Research Programme can be found at www.nrp70.ch. Additionally, this work is supported by the Swiss Competence Center on Energy Research (SCCER) Supply of Electricity (SoE).

Acknowledgments:
The authors express their sincere thanks to Christopher Robinson (Swiss Federal Institute of Aquatic Science and Technology and Swiss Federal Institute of Technology, Zurich, Switzerland) for his valuable feedback and support to this work. We are grateful for the constructive comments of four anonymous reviewers on an earlier version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Nomenclature
The following symbols are used in this paper: ε_LOO = leave-one-out cross-validation error; µ_Y = mean value of the computational model; ρ^s_ij = Spearman coefficient between two input parameters (ith and jth); Ψ_α = multivariate polynomials, i.e., the products of the underlying orthonormal polynomials; and φ_α = orthonormal polynomials.