Probabilistic modeling of flood characterizations with parametric and minimum information pair-copula model

This paper highlights the usefulness of the minimum information and parametric pair-copula construction (PCC) for modeling the joint distribution of flood event properties. Both of these models outperform other standard multivariate copulas in modeling multivariate flood data that exhibit complex patterns of dependence, particularly in the tails. In particular, the minimum information pair-copula model shows greater flexibility and produces a better approximation of the joint probability density, and the corresponding measures are better suited to effective hazard assessment. The study demonstrates that any multivariate density can be approximated to any desired degree of precision using the minimum information pair-copula model, which can be used in practice for probabilistic flood hazard assessment.


Introduction
Operational planning and design of flood defence systems, irrigation water management systems and hydroelectric schemes require accurate estimation of flood hazard and/or specified exceedance probabilities of river flow. Flood frequency analysis (FFA) is traditionally used to assess flood hazard under the assumption that annual maximum floods are a stationary, independent, identically distributed random process (Kidson and Richards, 2005). Conventionally, FFA is performed using either 'block (annual) maxima' or 'peaks over threshold (POT)' methods on partial series of data (Hosking et al., 1985). Although univariate FFA is widely used in hydrology, many studies have highlighted its unreliability and suggested that univariate frequency analysis methods cannot sufficiently characterize inflow hydrographs or reduce uncertainty in flood analysis (Cunnane, 1988; Bobee and Rasmussen, 1994). Indeed, most hydrologic events are multivariate in nature and defined by a group of correlated random variables (e.g. flood peak, volume, and duration). Therefore, multivariate FFA would be more suitable to describe the uncertainties associated with these events.
In recognition of the limitations of univariate FFA, multivariate flood frequency analysis methods were developed. Many early multivariate studies focused on the bivariate normal distribution, with later researchers considering multivariate Gaussian (Krstanovic and Singh, 1987), gamma (Yue et al., 2001; Nadarajah and Gupta, 2006), exponential (Choulakian et al., 1990), Gumbel (Bacchi et al., 1994) and other distributions. Durrans et al. (2003) applied the Pearson Type III distribution to perform joint frequency analysis. Yue and Wang (2004) developed Gumbel mixed and Gumbel logistic models and compared their performance in flood analysis. However, traditional distribution-based univariate and multivariate analysis methods have mathematical weaknesses that limit their potential for practical applications. These flaws are that (a) the mathematical formulation becomes complicated when the number of variables is high, (b) it is not possible to distinguish the marginal and joint behavior of the studied variables, (c) the marginal distributions must be of the same type, or normal, or independent, and (d) the joint distributions are valid only in a limited space (Song and Singh, 2010).
Recently, the application of copulas in hydrology, as well as in other earth and environmental sciences, has received increasing attention. Copulas are efficient mathematical tools capable of combining several univariate marginal cumulative distribution functions into their joint cumulative distribution function (Sklar, 1959). The application of copulas in hydrology largely began after De Michele and Salvadori (2003) highlighted the suitability of the Frank copula for the joint distribution of negatively associated storm intensity and storm duration data, whilst Grimaldi and Serinaldi (2006a,b) applied several trivariate copulas to determine joint and conditional distributions among design hyetograph variables. Recent works on the analysis of multivariate hydrological extreme events (Salvadori and De Michele, 2006) have popularized copulas as a tool for extreme value applications in rainfall (Évin and Favre, 2008; Wang et al., 2010; Zhang et al., 2012), floods (Zhang and Singh, 2007; Chowdhary et al., 2011), and droughts (Shiau, 2006; Song and Singh, 2010; Zhang et al., 2012; Ma et al., 2013). A brief review of the application of copulas in various engineering and science fields can be found in . They have also identified plausible copula candidates for flood peak flow and volume data in FFA. Dupuis (2007) used 5 copulas (Normal, Student-t, Frank, Clayton, Gumbel, and associated Clayton) and warned about ignoring the tail dependence characteristics of flood data. Their analysis showed that the Frank copula performed relatively well in comparison to other approaches. Karmakar and Simonovic (2009) found that the generalized hyperbolic copula is better at obtaining pair-wise joint distributions among flood peak flow, volume and duration. Leonard et al. (2008) used copulas for bivariate analysis of rainfall and stream flow extremes, accounting for seasonal and climatic partitions. Huard et al. (2006) and Silva and Lopes (2008) used Bayesian copula selection methods for estimating marginal and dependence parameters.
The set of higher dimensional copulas proposed in the literature is limited and is not rich enough to model all possible mutual dependencies among all variables (see Kurowicka and Cooke, 2006 for details). In addition, Aas et al. (2009) show that multivariate copulas (in particular, the multivariate t-copula) cannot efficiently model multivariate data exhibiting complex patterns of dependence in the tails (which are common in the analysis of extreme events). These limitations motivated Joe (1997) and Bedford and Cooke (2001, 2002) to propose a far more efficient way of constructing complex, highly dependent multivariate models, called the vine or pair-copula construction (Aas et al., 2009). The principle behind this method is to model dependency using simple local building blocks based on conditional independence, known as pair-copulae. The modeling scheme is then based on a decomposition of a multivariate density into a cascade of pair-copulae, applied to the original variables and to their conditional and unconditional distribution functions. There is a growing literature on the use of pair-copula models in real world applications, including finance, economics and insurance (Aas et al., 2009; Czado and Min, 2010; Min and Czado, 2010; Bauer et al., 2012; Dissmann et al., 2013; Brechmann et al., 2014), risk management (Brechmann et al., 2014), energy (Czado et al., 2011), and hydrological drought frequency analysis (Song and Singh, 2010). These references give an idea of recent advances in pair-copula applications across different fields. In hydrology, Gyasi-Agyei and Melching (2012) used PCC to model the dependence structure of storm event properties using hourly rainfall data from Cook County, Illinois, USA. Song and Kang (2011) demonstrated pair-copula based trivariate discharge modeling considering variables like flood duration, severity, and severity peak. Vernieuwe et al. (2015) constructed a continuous rainfall model based on vine copulas and compared the vine model with ensemble synthetic rainfall series. In a similar study, Xiong et al. (2014) developed an annual rainfall-runoff model using the canonical vine copula derivation approach and applied it to 40 watersheds in two large basins in China.
The multivariate copula models have also been used in different applications in the domain of spatial statistics. Bárdossy (2006) was one of the first who applied copulas in a geostatistical context. Gräler and Pebesma (2011) propose a more efficient approach for modeling spatial data (including extremes) using the vine copula model. One of the advantages of their approach was its flexibility in choosing appropriate parametric copula families through bivariate spatial copulas. Gräler (2014) extends this methodology further by adding several spatial trees at the foundation of the selected vine. These additional spatial trees add valuable information on the dependence of the higher order neighbors leading to an improved model of the spatial data. The predictive accuracy of the spatial vine copula outperforms other spatial multivariate copulas, including spatial Gaussian copula which used to be a very common method (as suggested by Bárdossy, 2006).
In a more relevant study, Gräler et al. (2013) use the vine copula model to construct a joint probability distribution for the flood variables, including peak discharge, duration, and volume. However, their main purpose of modeling the dependencies between the flood variables using the vine copula model and other multivariate copula models was to estimate design events for a given return period and to discuss their differences in a practical application. They concluded that the vine copula approach is the way to go for constructing flexible multivariate distribution functions for the same reasons mentioned above and discussed in further details in the next section.
It should be noted that the use of a copula to model dependency is simply a translation of one difficult problem into another. By using a (parametric) copula, the difficulty of specifying the full joint distribution is reduced to the difficulty of specifying the copula. The advantage is the technical one that copulas are normalized to have support on the unit square and uniform marginals. As many authors restrict the copulas to a particular parametric class (Gaussian, multivariate t, etc.), the potential flexibility of the copula approach is not realized in practice. Bedford et al. (2015) proposed a so-called minimum information pair-copula that uses the vine structure to approximate any given multivariate copula to any required degree of approximation, and showed how this can be operationalized for use in practice. The only technical assumptions required are that the multivariate copula density under study is continuous and non-zero. This approach, by contrast to the parametric methods mentioned above, allows a lot of flexibility in copula specification. It involves the use of minimum information copulas that can be specified to any required degree of precision based on the data available and are then stacked together to produce the multivariate copula and density function.
Based on the above discussion, we extend the parametric vine copula model (Gräler et al., 2013) for modeling flood characterizations with the minimum information pair-copula model. This model shows greater flexibility and produces a better approximation of the joint probability density, and the corresponding measures are better suited to effective hazard assessment. We also present an approximation method by which any multivariate density can be approximated to any desired degree of precision using the minimum information pair-copula model, and which can be applied in practice for assessing probabilistic flood hazard. We finally illustrate the methods described above by modeling the flood event properties of the Himalayan River Beas. Himalayan rivers in north India are highly influenced by both the monsoon and the intra-annual release of stored water in the snow cover and glacier ice of the Himalayas and its nearby foothills. The response of Himalayan rivers to precipitation and temperature is highly variable, as it depends on the extent of snow cover and volume of snowpack in their catchment, snow melt behavior, and the unpredictability of monsoon patterns in the region. Most Himalayan river basins have witnessed serious economic, agricultural and social impacts due to extreme hydrological events, such as floods, storms and droughts. To the best of our knowledge, no study has focused specifically on flood frequency analysis and flood hazard assessment for these rivers. One of the main challenges of modeling flood characterizations of Himalayan rivers is data scarcity. This problem is another reason for choosing the minimum information PCC for modeling the flood variables in the presence of limited data. More precisely, in this study we apply three methods to analyze the flood event data (common multivariate copulas, the parametric PCC, and the minimum information pair-copula model) and compare their performance in modeling flood characterizations.
The remainder of the paper is organized as follows. A brief description of copulas and their mathematical formulation is given in Section 2. We then introduce the parametric pair-copula model and how it can be fitted to the data. In Section 3, we introduce the minimum information pair-copula model and show how minimum information copulas can be used to approximate a multivariate density to any required degree of precision based on the observed data. In Section 4, we present and analyze the results of fitting the copula models discussed in this paper to the stream flow data of the Himalayan River Beas. In that section, we also compare these models using various statistics and graphical tools to show the benefit of the pair-copula models (particularly, minimum information copulas) for uncertainty modeling in risk analysis. Section 5 is dedicated to using the selected models in flood risk management by computing various measures that are widely used in risk analysis of flood data. We finally conclude our study in Section 6.

Multivariate dependence modeling using vine constructions
In many areas of applied science, and particularly in modeling flood and other hydrological data, it is necessary to model multiple uncertain quantities using an appropriate multivariate distribution. Bayesian networks and multivariate copulas are widely used for this purpose. Bayesian networks are more popular in general decision support settings, but their usage is limited to the multivariate normal and multinomial distributions for continuous and discrete variables, respectively. In recent years, multivariate copulas have attracted users in other disciplines, particularly for modeling financial data and for risk and uncertainty analysis of extreme events (including flood risk assessment), owing to their flexibility in modeling dependence among multiple variables, both discrete and continuous.
There is a growing literature on the use of copulas to model dependencies among multiple uncertain quantities (Joe, 1997; Nelsen, 2006). In particular, these models have been widely used in multivariate analysis of hydrological data (Genest et al., 2009; Grimaldi and Serinaldi, 2006; Serinaldi and Grimaldi, 2007; Yan et al., 2007; Zhang and Singh, 2007; Song and Singh, 2010). A copula is a joint distribution on the unit hypercube which enables us to uniquely determine a joint distribution of $n$ random variables by specifying their marginal distributions and an appropriate copula function. More formally, a copula is any multivariate distribution, $C$, with uniformly distributed marginals $U(0,1)$ on $[0,1]^n$, where the corresponding joint distribution function $F$ of $(x_1,\dots,x_n)$ can be written as

$$F(x_1,\dots,x_n) = C(u_1,\dots,u_n;\theta), \qquad (1)$$

where $C:[0,1]^n \to [0,1]$ is an appropriate copula distribution function, $\theta$ denotes the association parameters, $u_i = F_i(x_i)$, $i=1,\dots,n$, and $F_1,\dots,F_n$ are the marginal distribution functions of $X_1,\dots,X_n$, respectively. This formula can be used constructively to define $F$ in terms of a given copula function $C$ and marginals $F_1,\dots,F_n$ under reasonable conditions. For example, the 'Gaussian copula', widely used in many applications, can be obtained from the joint normal distribution and is parameterized by the correlation matrix. The details of constructing the multivariate Archimedean copulas (e.g., Clayton, Gumbel and Frank) and multivariate canonical copulas (Student-t and Gaussian) can be found in Nelsen (2006) and Joe (1997).
In addition, given that $F_i$ and $C$ are differentiable, the joint density function $f(x_1,\dots,x_n)$ can be written as

$$f(x_1,\dots,x_n) = c\big(F_1(x_1),\dots,F_n(x_n)\big)\prod_{i=1}^{n} f_i(x_i), \qquad (2)$$

where $f_i(x_i)$ is the density function corresponding to $F_i(x_i)$, and $c = \partial^n C/(\partial F_1 \cdots \partial F_n)$ is called the copula density function (Nelsen, 2006). Building or approximating a high-dimensional copula is generally considered a difficult task. For instance, Venter et al. (2007) reported that most multivariate copula densities get increasingly difficult to approximate as the dimension increases. Joe (1997) (and later Kurowicka and Cooke, 2006) pointed out that although certain copula families, including the multivariate Gaussian, the Student-t, the exchangeable multivariate Archimedean copula and the nested Archimedean constructions, offer a substantial improvement in modeling multivariate data, they are still rather limited, computationally intractable, and not rich enough to model all possible mutual dependencies among the variables.
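The factorisation of a joint density into a copula density and marginal densities can be illustrated with a minimal sketch for two variables. It assumes a bivariate Gaussian copula and exponential marginals with made-up rates; the function names, the rates and the correlation value are illustrative assumptions, not quantities taken from the flood data.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for 0 < u, v < 1."""
    x, y = _STD_NORMAL.inv_cdf(u), _STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus_r2)
    return math.exp(exponent) / math.sqrt(one_minus_r2)

def joint_density(x1, x2, rho, marginals):
    """f(x1, x2) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2)."""
    (pdf1, cdf1), (pdf2, cdf2) = marginals
    return gaussian_copula_density(cdf1(x1), cdf2(x2), rho) * pdf1(x1) * pdf2(x2)

def exponential(rate):
    """Hypothetical exponential marginal, returned as a (pdf, cdf) pair."""
    pdf = lambda x: rate * math.exp(-rate * x)
    cdf = lambda x: 1.0 - math.exp(-rate * x)
    return pdf, cdf

# Illustrative marginals and dependence parameter (assumed values).
marginals = (exponential(0.5), exponential(2.0))
f_dep = joint_density(1.0, 0.4, 0.7, marginals)   # dependent case
f_ind = joint_density(1.0, 0.4, 0.0, marginals)   # rho = 0: product of marginals
```

With `rho = 0` the copula density is identically 1, so the joint density collapses to the product of the marginal densities, which is a quick sanity check on any copula implementation.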
In 2002, Bedford and Cooke introduced a probabilistic construction of multivariate distributions based on a flexible graphical model called a vine, which was later called the pair-copula construction (PCC) in Aas et al. (2009). This flexible structure allows for a free specification of (at least) $n(n-1)/2$ bivariate copulas between $n$ given variables. In other words, a vine on $n$ variables is a nested set of trees, where the edges of tree $j$ are the nodes of tree $j+1$ (for $j=1,\dots,n-2$), and each tree has the maximum number of edges. A vine in which two edges in tree $j$ are joined by an edge in tree $j+1$ only if these edges share a common node, $j=1,\dots,n-2$, is called a regular vine. The formal definition of vines and regular vines can also be found in Kurowicka and Cooke (2006).
The class of regular vines is generally quite broad and consists of many possible pair-copula decompositions. Among them, the canonical vine and the D-vine are two special regular vines, each of which gives a specific way of decomposing a multivariate density function. The D-vine is more widely used in practice. In a D-vine with $n$ variables, no node in any tree is connected to more than two edges. The canonical vine is more useful when a particular variable is known to be a key variable that controls interactions in the data. It is then recommended to place this variable at the root of the vine. As a result, each tree $T_j$ in a canonical vine has a unique node that is connected to $n-j$ edges. We briefly introduce these two vines and how they can be used to model multivariate data through a simple example (further details can be found in Aas et al., 2009; Kurowicka and Cooke, 2006).
As an example, a D-vine structure will first be used to model the multivariate density function of the random variables $(X_1, X_2, X_3)$ with given marginal densities $f_1, f_2, f_3$, respectively. A D-vine structure is normally selected based on the association measures between variables (see also Section 2.1). The one chosen for these variables is shown in Fig. 1, with the following joint density decomposition

$$f(x_1,x_2,x_3) = f_1(x_1)\,f_2(x_2)\,f_3(x_3)\; c_{12}\big(F(x_1),F(x_2)\big)\; c_{23}\big(F(x_2),F(x_3)\big)\; c_{13|2}\big(F(x_1\mid x_2),F(x_3\mid x_2)\big), \qquad (3)$$

where $c_{ij}(F(x_i),F(x_j))$ denotes the bivariate copula between $x_i$ and $x_j$, and $c_{ik|j}(F(x_i\mid x_j),F(x_k\mid x_j))$ denotes the bivariate copula fitted to the conditional distributions $F(x_i\mid x_j)$ and $F(x_k\mid x_j)$ (see Appendix A for the details of this decomposition). Similarly, a multivariate density decomposition can be derived based on a canonical vine structure. The canonical vine structure depends strongly on the root node. In other words, in a canonical vine, each tree $T_j$ has a unique node that is connected to $n-j$ edges. The D-vine structure shown in Fig. 1 can be converted to a canonical vine with the same density factorisation given in (3) if $x_2$ is considered as the root node (see also Aas et al., 2009).
The above decomposition of the joint density gives us a constructive approach to build a multivariate distribution given a vine structure: if we make choices of marginal densities and copulae, then the above formula gives us a multivariate density. In other words, we associate a vine distribution with a vine by specifying a copula for each edge of the first tree $T_1$ (as shown in Fig. 1) and a family of conditional copulas for the conditional variables given the conditioning variables in the second tree $T_2$.
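The constructive recipe above can be sketched for three variables on the copula scale (that is, after each variable has been transformed to a uniform by its marginal distribution function). The sketch assumes Gaussian pair-copulas on every edge purely for convenience; the function names and parameter values are illustrative, and any other bivariate family could be substituted edge by edge.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gauss_pair_density(u, v, rho):
    """Bivariate Gaussian pair-copula density."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    s = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)) / math.sqrt(s)

def gauss_h(u, v, rho):
    """Conditional distribution F(u | v) implied by a Gaussian pair-copula."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))

def dvine3_copula_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Product c12 * c23 * c13|2 for the D-vine 1 - 2 - 3."""
    c12 = gauss_pair_density(u1, u2, rho12)            # tree T1, edge (1, 2)
    c23 = gauss_pair_density(u2, u3, rho23)            # tree T1, edge (2, 3)
    u1_given_2 = gauss_h(u1, u2, rho12)                # F(x1 | x2)
    u3_given_2 = gauss_h(u3, u2, rho23)                # F(x3 | x2)
    c13_2 = gauss_pair_density(u1_given_2, u3_given_2, rho13_2)  # tree T2
    return c12 * c23 * c13_2

# Illustrative evaluation with assumed parameter values.
value = dvine3_copula_density(0.3, 0.6, 0.8, 0.5, 0.4, 0.3)
```

Multiplying the returned value by the three marginal densities evaluated at the original observations gives the full joint density for this vine.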
One of the main objectives of this paper is to address the advantages of the vine or PCC models over the standard multivariate copulas in modeling hydrological data, particularly for flood risk management. One of the advantages of vine models is that different bivariate copulas can be fitted to different pairs of variables, instead of fitting a single fixed multivariate copula to all variables. As a specific example, the multivariate t-copula is widely used in finance and hydrology, where multivariate data exhibit complex patterns of dependence in the tails. The issue with the multivariate t-copula is that a single degrees-of-freedom parameter drives the tail dependence of all pairs of variables. This problem can be dealt with using the vine model. Aas et al. (2009) demonstrate the superiority of a D-vine copula with bivariate t-copulas over a single multivariate t-copula approach. We adopt and extend this approach, and address its superiority and flexibility in modeling the flood event data over the current alternatives. In the next section, we address the methods that can be used to fit a parametric pair-copula model to the data.

Estimation methods for pair-copula models
Fitting a vine or PCC model to data involves a number of steps. The first step is to identify an appropriate vine tree structure. Such a structure may either be given by the data itself or has to be selected manually or through expert knowledge. This step is quite similar to structure learning in graphical models (footnote 1), with the difference that a vine structure can be easily determined in terms of association measures or appropriate graphical tools. For a given vine structure, adequate copulas have to be selected and then estimated. This step is shared with fitting a single multivariate parametric copula to the data.
There are different tools that can be used to determine an appropriate vine tree structure: scatter plots, chi ($\chi$) plots, Kendall (K) plots and $\lambda$ plots, correlation coefficients, and independence tests. It should be noted that some of these tools can also be used to select a suitable parametric bivariate copula family. The definitions and further details of these tools can be found in Schirmacher and Schirmacher (2008).
In the next step of analyzing the data, we need to select suitable bivariate copula models describing the dependencies between the variables according to the selected vine structure. There are various graphical and analytical methods available to select an appropriate copula for the underlying data. Among the graphical methods, Kendall's plot (K-plot) and the chi-plot are more appropriate for selecting the best bivariate copula models directly (see  and references therein for more details, and Section 4 for an illustration). The $\lambda$-function (proposed by Genest and Rivest, 1993) is another useful analytical measure for selecting a suitable bivariate copula. This function provides a characteristic for each copula family and is defined as follows

$$\lambda(t;\theta) := t - K(t;\theta), \qquad (4)$$

where $K(t;\theta) = \Pr\big(C(U,V;\theta) \le t\big)$ is Kendall's distribution function for a copula $C$ with parameters $\theta$, $t \in [0,1]$, and $U$ and $V$ are uniform random variables on $[0,1]$, jointly distributed according to $C$.
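As a rough illustration of how the function $t - K(t)$ can be used for family selection, the sketch below computes an empirical version from paired observations and compares it with the closed form for one candidate family. The Clayton family is chosen here only as an example (for Archimedean copulas the function equals the generator divided by its derivative); the helper names are assumptions made for this sketch.

```python
def empirical_kendall(xs, ys):
    """Return the empirical Kendall distribution function K_n(t)."""
    n = len(xs)
    w = []
    for i in range(n):
        # Fraction of the other points that lie strictly below-left of point i.
        count = sum(1 for j in range(n)
                    if j != i and xs[j] < xs[i] and ys[j] < ys[i])
        w.append(count / (n - 1))

    def K(t):
        return sum(1 for wi in w if wi <= t) / n

    return K

def empirical_lambda(xs, ys, t):
    """Empirical counterpart of t - K(t)."""
    return t - empirical_kendall(xs, ys)(t)

def clayton_lambda(t, theta):
    """Closed form t - K(t; theta) = t * (t**theta - 1) / theta for Clayton."""
    return t * (t ** theta - 1.0) / theta
```

In use, one would plot `empirical_lambda` over a grid of `t` values and overlay the theoretical curve of each candidate family evaluated at its fitted parameter; the family whose curve tracks the empirical one most closely is preferred.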
There is also a range of test-based analytical tools that can evaluate the strength of dependence between the variables of interest and select the most appropriate copula. These are independence tests and goodness-of-fit (GOF) tests. The Cramér-von Mises and Kolmogorov-Smirnov tests, and the Vuong and Clarke tests, are among the best-known GOF tests (see also Vuong, 1989; Clarke, 2007).
Once the appropriate pair-copula families have been selected, the parameters can be estimated via maximum likelihood. A brief explanation of the parameter estimation procedure for the vine structure shown in Fig. 1 is presented below (further details can be found in Aas et al., 2009). Suppose there is a 3-dimensional distribution function (as presented above) along with $N$ observations. We denote by $x_j = (x_{1j}, x_{2j}, x_{3j})$ the vector of observations for the $j$-th point, with $j = 1, 2, \dots, N$. The parameterised likelihood function for the D-vine decomposition given in (3) is as follows

$$L(\theta \mid x_1,\dots,x_N) = \prod_{j=1}^{N} f_1(x_{1j})\,f_2(x_{2j})\,f_3(x_{3j})\; c_{12}\big(F(x_{1j}),F(x_{2j});\theta_{12}\big)\; c_{23}\big(F(x_{2j}),F(x_{3j});\theta_{23}\big)\; c_{13|2}\big(F(x_{1j}\mid x_{2j}),F(x_{3j}\mid x_{2j});\theta_{13|2}\big).$$

By taking logarithms and removing each of the marginal distribution terms, the log-likelihood function is given by

$$\ell(\theta) = \sum_{j=1}^{N}\Big[\log c_{12}\big(F(x_{1j}),F(x_{2j});\theta_{12}\big) + \log c_{23}\big(F(x_{2j}),F(x_{3j});\theta_{23}\big) + \log c_{13|2}\big(F(x_{1j}\mid x_{2j}),F(x_{3j}\mid x_{2j});\theta_{13|2}\big)\Big].$$

We can then use numerical optimization techniques to maximize the log-likelihood over all parameters simultaneously. Aas et al. (2009) presented a detailed algorithm for likelihood evaluation and parameter estimation for a D-vine construction model. We briefly explain this algorithm as adapted to the D-vine model shown in Fig. 1.
Fig. 1. A D-vine structure with 3 variables, 2 trees and 4 edges, where each edge may be associated with a pair-copula.

Footnote 1: However, unlike the graphical models, the vine models can benefit from using different models of conditional dependence as building blocks in building the multivariate distribution.

The parameters of the copulas in the first tree (i.e., $(\theta_{12}, \theta_{23})$) can be estimated from the original data, simply by fitting the bivariate copulas to the observations. For the copula parameters identified in the second tree, one first has to transform the data using the conditional distribution function, also known as the h-function, which can be derived from the corresponding bivariate copula using the following formula

$$h(x, y; \theta) = F(x \mid y) = \frac{\partial C_{x,y}(x, y; \theta)}{\partial y}.$$

The h-function is needed to derive the appropriate conditional distribution function, using the estimated parameters, to determine the realizations needed in the second tree. For instance, in order to estimate the parameters of $c_{13|2}$, we first need to transform the observations $\{u_{1j}, u_{2j}, u_{3j};\ j = 1,\dots,N\}$ to $u_{1|2,j} := h(u_{1j} \mid u_{2j}; \hat\theta_{12})$ and $u_{3|2,j} := h(u_{3j} \mid u_{2j}; \hat\theta_{23})$, where $\hat\theta_{12}$ and $\hat\theta_{23}$ are the estimated parameters in the first tree. We can now estimate $\theta_{13|2}$ based on $\{u_{1|2,j}, u_{3|2,j};\ j = 1,\dots,N\}$ (see Aas et al. (2009) for further details).
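The tree-by-tree estimation just described can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the paper: every edge uses a Clayton pair-copula, each one-parameter fit is done by a golden-section search rather than a general-purpose optimizer, and the data are synthetic draws generated on the copula scale. All function names are hypothetical.

```python
import math
import random

EPS = 1e-6  # keep pseudo-observations strictly inside (0, 1)

def clip(u):
    return min(max(u, EPS), 1.0 - EPS)

def clayton_logpdf(u, v, theta):
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta) - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))

def clayton_h(u, v, theta):
    """h(u | v; theta) = dC(u, v; theta) / dv for the Clayton copula."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return clip(v ** (-theta - 1.0) * s ** (-1.0 - 1.0 / theta))

def clayton_sample_given(v, theta, rng):
    """Draw u from F(u | v) by inverting the h-function (synthetic data only)."""
    w = clip(rng.random())
    return clip(((w ** (-theta / (1.0 + theta)) - 1.0) * v ** (-theta) + 1.0) ** (-1.0 / theta))

def fit_clayton(us, vs, lo=0.01, hi=10.0, iters=60):
    """Maximise the pair log-likelihood over theta by golden-section search."""
    def nll(theta):
        return -sum(clayton_logpdf(u, v, theta) for u, v in zip(us, vs))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nll(c), nll(d)
    for _ in range(iters):
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nll(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nll(d)
    return 0.5 * (a + b)

def fit_dvine3(u1, u2, u3):
    """Sequential estimation: tree 1 first, then tree 2 on h-transformed data."""
    th12 = fit_clayton(u1, u2)
    th23 = fit_clayton(u2, u3)
    u1_2 = [clayton_h(a, b, th12) for a, b in zip(u1, u2)]
    u3_2 = [clayton_h(c, b, th23) for c, b in zip(u3, u2)]
    th13_2 = fit_clayton(u1_2, u3_2)
    return th12, th23, th13_2

# Synthetic example: variables 1 and 3 are conditionally independent given 2.
rng = random.Random(0)
u2 = [clip(rng.random()) for _ in range(1500)]
u1 = [clayton_sample_given(v, 2.0, rng) for v in u2]
u3 = [clayton_sample_given(v, 3.0, rng) for v in u2]
estimates = fit_dvine3(u1, u2, u3)
```

The sequential estimates are commonly used as starting values for a joint maximization of the full log-likelihood over all parameters; that refinement step is omitted here.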
In this section, we present a constructive approach to build a multivariate distribution given a vine structure and selected appropriate marginal densities and bivariate copulas. That means, the vine models can be used to model general multivariate densities. In practice, the copulas must be chosen from a convenient class, and this class should ideally be one that allows us to estimate any copula to an arbitrary degree. By having this class of copulas, we can approximate any multivariate distribution using any vine structure. This issue will be investigated in the following.

Building minimum information pair-copula model
As demonstrated in the previous section, the vine models are flexible enough for modeling high-dimensional multivariate data by cascading different fitted parametric bivariate copulas together to construct the corresponding joint density function. Bedford et al. (2015) show that building higher-dimensional copulas by fitting a parametric copula family to the data is generally a difficult problem, and choosing the parametric family for this purpose is even more difficult. They argue that if the copulae are restricted to be chosen from a particular parametric class (Archimedean, t-student, Gaussian, etc.), their potential flexibility will not be acknowledged. To overcome this difficulty, a new non-parametric vine model is introduced that can be easily implemented in practice and is able to approximate the underlying multivariate copula density to any arbitrary degree of precision.
The method presented in this section is similarly constructive and uses the minimum information technique to approximate the copula density as precisely as possible. This approximation method is very flexible and allows a fixed finite dimensional family of copulas to be used in a vine construction, with the promise of a uniform level of approximation (Bedford et al., 2015).
We first need to introduce the minimum information copula, and then briefly explain how this copula can be approximated based on the observed data (or experts' stated information). Assuming that $f_1$ and $f_2$ are bivariate densities, the relative information of $f_1$ with respect to $f_2$ is defined (Bedford and Meeuwissen, 1997) as

$$I(f_1 \mid f_2) = \iint f_1(x,y)\,\log\frac{f_1(x,y)}{f_2(x,y)}\,dx\,dy.$$

This information is a measure of the degree of deviation of $f_1$ from $f_2$ and attains its minimum value, 0, when $f_1 = f_2$. It is straightforward to show that the relative information of $f_1$ with respect to $f_2$ is the same as that between the copula for $f_1$ with respect to $f_2$. Therefore, it can be used to scale the strength of dependency in a copula, in the sense that if the marginal distributions associated with $f_1$ and $f_2$ are similar, then $I(f_1 \mid f_2)$ will be equal to the information measure derived in terms of the copula of $f_1$ relative to the independent copula.
A natural way to build a minimum information copula, or to specify dependency constraints, is through the use of moments. Following Bedford et al. (2015), we consider moment constraints in which real-valued functions $\phi_1,\dots,\phi_k$ are required to take expected values $\alpha_1,\dots,\alpha_k$, respectively. A minimum information copula can then be fitted to satisfy these constraints. The fitted copula has minimum information, with respect to the uniform (independence) copula $C(u,v) = uv$, among the class of all copulas satisfying those constraints. Before presenting a general computational framework for constructing a minimum information copula satisfying the constraints, we explain the idea behind the methodology used in this paper. Suppose we have uniform variables $u, v$ and the copula density we wish to find is $c(u,v)$. Further suppose that we wish to find a copula which, for some functions of the uniform variables $\phi_1(u,v),\dots,\phi_k(u,v)$, assumed to be continuous on $[0,1]^2$, satisfies the constraints $E[\phi_i(u,v)] = \alpha_i$, for some values $\alpha_i$.
If we make the assumption that a copula satisfying the constraints exists, then this problem is, in general, underdetermined. To select a unique copula distribution, we wish to find the copula with minimum information with respect to the uniform copula that satisfies these expectations. The relative information of $c(u,v)$ with respect to the uniform copula is given by

$$\int_0^1\!\!\int_0^1 c(u,v)\,\log c(u,v)\,du\,dv.$$

It is straightforward to show that if $c(u,v)$ is to be a copula density, the marginal distributions for $u$ and $v$ must be uniform, which results in additional constraints:

$$\int_0^1 c(u,v)\,du = 1 \ \ \forall v \in [0,1], \qquad \int_0^1 c(u,v)\,dv = 1 \ \ \forall u \in [0,1].$$

In order to find a copula density function satisfying the constraints introduced above, we need to solve the continuous optimization problem. However, to do so, we shall first consider the associated measurable optimization problem. We can then use this to give a solution in the continuous case. Thus, the measurable optimization problem we wish to solve is

$$\text{minimize } \int_0^1\!\!\int_0^1 c(u,v)\,\log c(u,v)\,du\,dv \quad \text{subject to} \quad \int_0^1\!\!\int_0^1 \phi_i(u,v)\,c(u,v)\,du\,dv = \alpha_i \ (i=1,\dots,k), \quad \int_0^1 c(u,v)\,du = 1, \quad \int_0^1 c(u,v)\,dv = 1, \quad c(u,v) \ge 0. \qquad (5)$$

We shall determine the unique solution to this measurable optimization problem. The solution of this optimization problem is called the minimum information bivariate copula (Bedford et al., 2015), which leads us to the minimum information pair-copula construction model. It can be shown that if a minimum information copula satisfies each of the constraints (based on moments, rank correlation, etc.), then the approximated multivariate density will also be minimally informative given those constraints (see also Bedford et al., 2015). The representation given in (6) with the kernel given in (7) forms a minimum information copula satisfying the constraints. In other words, the copula given in (6) is the unique solution of the optimization problem introduced in (5).
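For orientation, the solution of an entropy problem of this kind is usually written in exponential-family form. The display below is a sketch under that assumption, with the symbols $d_1$, $d_2$ and $A$ chosen to match the $D_1AD_2$ notation used in the remainder of this section; it is not a restatement of the authors' own derivation.

```latex
c(u,v) \;=\; d_1(u)\, d_2(v)\, A(u,v),
\qquad
A(u,v) \;=\; \exp\!\Big( \sum_{i=1}^{k} \lambda_i\, \phi_i(u,v) \Big).
```

In this form the $\lambda_i$ act as Lagrange multipliers for the moment constraints, and $d_1$, $d_2$ are positive functions chosen so that both margins of $c$ are uniform.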
There is a non-linear relationship between the set of $(\lambda_1, \ldots, \lambda_k)$ and $(\alpha_1, \ldots, \alpha_k)$. Bedford et al. (2015) give a detailed discussion of how this relationship can be determined. They also present a discrete version of the optimization problem given in (5) in terms of matrices, which is briefly explained below.
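Suppose that both $u$ and $v$ are discretized into $n$ points, $\{u_i\}$ and $\{v_j\}$ for $i, j = 1, \ldots, n$. We write $A = (a_{ij})$ with $a_{ij} = \exp\left(\sum_{l=1}^{k} \lambda_l \varphi_l(u_i, v_j)\right)$, $D_1 = \mathrm{diag}(d_1^{(1)}, \ldots, d_n^{(1)})$ and $D_2 = \mathrm{diag}(d_1^{(2)}, \ldots, d_n^{(2)})$, where 'diag' stands for a diagonal matrix. We define the matrix $D_1 A D_2$ with uniform marginals as follows:

$$\sum_{j=1}^{n} d_i^{(1)} d_j^{(2)} a_{ij} = \frac{1}{n}, \ i = 1, \ldots, n, \qquad \sum_{i=1}^{n} d_i^{(1)} d_j^{(2)} a_{ij} = \frac{1}{n}, \ j = 1, \ldots, n.$$

The idea behind the $D_1 A D_2$ algorithm is simple: it starts with arbitrary positive initial vectors for the diagonals of $D_1$ and $D_2$, and new vectors are then successively defined by iterating the maps

$$d_i^{(1)} \mapsto \frac{1}{n \sum_{j} d_j^{(2)} a_{ij}}, \qquad d_j^{(2)} \mapsto \frac{1}{n \sum_{i} d_i^{(1)} a_{ij}}.$$

It can be shown that this iteration scheme converges geometrically to the requested vectors (Bedford et al. (2015) and references therein). Note that to compare different discretizations (for different $n$) we should multiply each cell weight $d_i^{(1)} d_j^{(2)} a_{ij}$ by $n^2$, as this quantity approximates the continuous copula density with respect to the uniform distribution.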
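The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the midpoint grid, the single basis function $uv$, the multiplier value 2.0 and the function name `d1ad2` are all choices made here for the example.

```python
import math

def d1ad2(kernel, n, tol=1e-12, max_iter=10_000):
    """Scale a positive kernel on an n x n grid so row and column sums equal 1/n."""
    grid = [(i + 0.5) / n for i in range(n)]  # midpoints: an assumed discretization
    a = [[kernel(u, v) for v in grid] for u in grid]
    d1 = [1.0] * n  # arbitrary positive starting vectors
    d2 = [1.0] * n
    for _ in range(max_iter):
        # Update d1 so that every row of D1 A D2 sums to 1/n (given the current d2).
        new_d1 = [1.0 / (n * sum(d2[j] * a[i][j] for j in range(n))) for i in range(n)]
        # Update d2 so that every column of D1 A D2 sums to 1/n (given the new d1).
        new_d2 = [1.0 / (n * sum(new_d1[i] * a[i][j] for i in range(n))) for j in range(n)]
        change = max(max(abs(x - y) for x, y in zip(new_d1, d1)),
                     max(abs(x - y) for x, y in zip(new_d2, d2)))
        d1, d2 = new_d1, new_d2
        if change < tol:
            break
    # Cell weights multiplied by n^2 approximate the copula density on the grid.
    density = [[n * n * d1[i] * a[i][j] * d2[j] for j in range(n)] for i in range(n)]
    return grid, density

lam = 2.0  # hypothetical multiplier for the single basis function phi(u, v) = u * v
grid, dens = d1ad2(lambda u, v: math.exp(lam * u * v), n=20)
row_means = [sum(row) / len(row) for row in dens]
col_means = [sum(dens[i][j] for i in range(len(dens))) / len(dens)
             for j in range(len(dens))]
```

After convergence, every row mean and column mean of the scaled density should be close to 1, which is the discrete counterpart of uniform marginals.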
The mapping from the set of vectors of $\lambda$'s onto the set of vectors of resulting expectations of the functions $(\varphi_1, \ldots, \varphi_k)$ has to be found numerically. Bedford et al. (2015) propose an optimization procedure to determine the $\lambda_i$'s and the corresponding copula for the given expectations $\alpha_i$, where the expectations are calculated using the discrete copula density $D_1 A D_2$. Hence, to determine the $\lambda_i$'s whilst satisfying the constraints, the following set of equations has to be solved numerically:

$$L_l(\lambda_1, \ldots, \lambda_k) := \sum_{i=1}^{n} \sum_{j=1}^{n} d_i^{(1)} d_j^{(2)} a_{ij} \, \varphi_l(u_i, v_j) - \alpha_l = 0, \qquad l = 1, \ldots, k.$$

The left-hand sides of these equations are functions of the $\lambda$'s only, and their roots can be found with optimization algorithms. We must find the simultaneous roots of these functions, and so we minimize

$$L_{sum}(\lambda_1, \ldots, \lambda_k) = \sum_{l=1}^{k} L_l^2(\lambda_1, \ldots, \lambda_k).$$

One possible solver for this task is FSOLVE, MATLAB's optimization routine. An alternative is another MATLAB optimization procedure called FMINSEARCH, which implements the Nelder-Mead simplex method (see Lagarias et al., 1998). The choice of the basis functions, $(\varphi_1, \ldots, \varphi_k)$, greatly influences the copula density approximation described above. A two-dimensional ordinary polynomial series is normally used to approximate the bivariate copula density. This approximation can be improved by using an orthonormal polynomial series or Legendre multiwavelets, which are studied in detail in Daneshkhah et al. (2015) and are also investigated in this paper to improve the minimum information pair-copula model fitted to the flood data.
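To make the numerical search for the multipliers concrete, the sketch below handles the simplest case of a single constraint, where a one-dimensional bisection can stand in for FSOLVE or FMINSEARCH. Everything here is an illustrative assumption: the basis function $uv$, the target value 0.29, the midpoint grid, the fixed number of scaling sweeps, and the helper names `fitted_expectation` and `solve_lambda`.

```python
import math

def fitted_expectation(lam, phi, n=12, sweeps=200):
    """E[phi] under the discretized minimum information copula with multiplier lam."""
    grid = [(i + 0.5) / n for i in range(n)]  # assumed midpoint grid
    a = [[math.exp(lam * phi(u, v)) for v in grid] for u in grid]
    d1, d2 = [1.0] * n, [1.0] * n
    for _ in range(sweeps):  # fixed number of D1AD2 sweeps, for brevity
        d1 = [1.0 / (n * sum(d2[j] * a[i][j] for j in range(n))) for i in range(n)]
        d2 = [1.0 / (n * sum(d1[i] * a[i][j] for i in range(n))) for j in range(n)]
    # Expectation of phi with respect to the cell weights d1_i * a_ij * d2_j.
    return sum(d1[i] * d2[j] * a[i][j] * phi(grid[i], grid[j])
               for i in range(n) for j in range(n))

def solve_lambda(alpha, phi, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection on L(lam) = E_lam[phi] - alpha over the bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fitted_expectation(mid, phi) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

phi = lambda u, v: u * v   # single illustrative basis function
alpha = 0.29               # hypothetical target expectation (0.25 under independence)
lam_hat = solve_lambda(alpha, phi)
residual = fitted_expectation(lam_hat, phi) - alpha
```

With several basis functions the same residuals $L_l$ would be squared and summed, and a multivariate optimizer would replace the bisection.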

Study area and datasets
A study was performed with daily discharge data from the Beas River, which originates in the Himalayas (Fig. 2) and flows for approximately 470 km before joining the Sutlej River. The Beas River, on which two major dams (Pong dam and Pandoh dam) are located, is one of the five major rivers of the Indus basin, India. The downstream Pong reservoir drains a catchment area of 12,561 km², of which the permanent snow cover is 780 km² (Jain et al., 2007). The active storage capacity of the Pong reservoir is 7051 Mm³. Monsoon rainfall between July and September is a major source of water inflow into the reservoir, in addition to snow and glacier melt. The dam acts as a store for flood flows, and reservoir regulation prevents the inundation of downstream areas during the monsoon season. Apart from its use for generating hydropower, the Pong reservoir meets irrigation water demands of 8896 Mm³/year, which are spread relatively uniformly throughout the year. The Pandoh dam is a diversion dam which diverts nearly 4716 Mm³ of Beas waters into the Sutlej River. Daily reservoir inflows to the Pong reservoir for January 1998 to December 2010 (12 years) were used in this study. The peaks over threshold method is suitable for the Beas, as flash floods are common in the Himalayan region. Criteria for the selection of independent POT data can be found in Bayliss (1999) and Bacova-Mitkova and Onderka (2010), but the threshold value is typically chosen so that the POT data series contains an average of around 4 values per year. For this study, the data series of peak discharges of 500 m³/s and above, with corresponding hydrograph volumes and durations, were used for the analyses. The graphical method was used for independent event separation to obtain hydrograph volumes and durations (Fig. 3). Table 1 provides descriptive statistics of the flood event variables (flood peak discharge, P; hydrograph volume, V; and hydrograph duration, D).
The kurtosis coefficients are quite high, and their skewness coefficients are positive indicating that these flood variables can be best modeled by non-symmetric heavy tailed distributions.
It should be noted that, due to data scarcity, only 109 flood episodes could be extracted from the database. From a statistical point of view, this sample may be small for investigating a multivariate problem. Unfortunately, this is a typical situation when multivariate copulas are used for modeling extreme data (e.g., Gaál et al., 2015; Favre et al., 2004). However, the target here is not to provide an ultimate extreme flood model, and no practical project of hydrological works (as considered, for example, in Gräler et al., 2013) is undertaken. Instead, one of the main motivations of this study is to demonstrate how the proposed methodology can be used in practice, and to exhibit its flexibility and efficiency over alternative multivariate copulas in modeling flood data, particularly when the data is limited. In other words, this is primarily a methodological paper.
Indeed, another motivation of the methodology developed in our paper is modeling joint uncertainties in a probabilistic way and particularly when the data is limited or no data is available.
For the latter case, the presented methodology can be viewed as an expert elicitation approach where the expert is asked to specify the expected values of some functions which is beyond the scope of this paper (see Bedford et al. (2015) for further details). However, this methodology can be more effective and efficient when it is used for approximating uncertainty modeling of the limited data which is very common in extreme value theory and risk analysis.
The size of the observed data could be considered a source of potential error when the minimum information copula is applied to a high-dimensional problem. As the dimensionality (or number of uncertain variables) increases, the number of trees representing the structure of the pair-copula model also increases. The conditional distributions/expectations at lower levels of a deeper pair-copula model must then be estimated from fewer data points, and can therefore be less accurate and noisier (Gräler (2014) reported a similar problem in modeling extreme data using spatial vine copulas). This problem could be resolved by ignoring some unnecessary conditional dependencies (the so-called simplifying assumption) in the sense discussed in Acar et al. (2012) and Stöeber et al. (2013). An alternative method is to approximate fully conditional pair-copula models using Gaussian processes (Lopez-Paz et al., 2013). This simplified pair-copula model is more appropriate for high-dimensional problems and is beyond the scope of this study. However, based on the demonstrated results, the approximations based on the minimum information pair-copula models for 3 variables are quite accurate (and can be made more accurate by adding more basis functions and making the discretization grid finer), and their performance in comparison with other methods is much better, as discussed in Section 4.3.

Before modeling the dependencies between flood variables using the multivariate copula models, it is necessary to check whether the individual time series associated with each flood variable is stationary and exhibits no autocorrelation. Ljung and Box (1978) developed a statistical test, known as the Ljung-Box test, to check whether any of a group of autocorrelations of a time series are different from zero. In this test, instead of testing randomness at each distinct lag, the "overall" randomness based on a number of lags is tested.
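The null and alternative hypotheses of this test are defined as:

$$H_0: \rho_1 = \rho_2 = \cdots = \rho_h = 0 \quad \text{versus} \quad H_1: \rho_k \neq 0 \ \text{for at least one } k \in \{1, \ldots, h\}.$$

The test statistic, known as the Q-statistic, is defined as:

$$Q = n(n + 2) \sum_{k=1}^{h} \frac{\hat{\rho}_k^2}{n - k},$$

where $n$ is the sample size, $\hat{\rho}_k$ is the sample autocorrelation at lag $k$, and $h$ is the number of lags being tested. Under the null hypothesis, this statistic asymptotically follows a chi-square distribution with $h$ degrees of freedom.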
The Q-statistics and their corresponding p-values for each time series are shown in Table 2. Based on the computed p-values given in this table, the null hypothesis that there is no autocorrelation cannot be rejected at the 5% significance level. In other words, there is no serial correlation in the time series associated with the flood variables.
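A minimal sketch of the Q-statistic computation is given below, assuming the usual sample autocorrelation estimator. The function name `ljung_box_q` and the alternating toy series are illustrative and are not the flood data; the p-value would be obtained by comparing $Q$ with a chi-square distribution with $h$ degrees of freedom (for $h = 1$ the 5% critical value is 3.841).

```python
def ljung_box_q(x, h):
    """Ljung-Box Q-statistic for lags 1..h of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# Toy series with strong negative lag-1 autocorrelation (not the flood data).
series = [1.0, -1.0] * 50
q1 = ljung_box_q(series, h=1)
```

For this toy series the lag-1 autocorrelation is -0.99, so $Q$ is far above the critical value and the null hypothesis of no autocorrelation would be rejected, in contrast to the result reported for the flood variables.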

Trivariate copula models
In this section, we model the dependencies between the flood event variables by fitting a trivariate copula model. A wide range of multivariate copulas suitable to model the flood data including the well-known Archimedean and elliptical copulas introduced above have been evaluated. The marginal distribution of each variable is first selected based on the computed Akaike information criterion (AIC) given in Table 3 along with the estimated parameters (using maximum likelihood method). Using the results presented in this table, the Inverse Gaussian distribution is best fitted to the peak flow and flood volume, while the best fitted distribution to the flood duration is Log-Normal. Fig. 4 shows the cumulative distribution functions (cdfs), pdfs and q-q plots of the selected distributions to the data which supports our choices of distributions reported in Table 3.
We then select the best fitted trivariate copula model using common goodness-of-fit measures, including the log-likelihood and AIC values given in Table 4. Based on the results given in this table, it can be concluded that the trivariate t-student copula outperforms the other proposed copula models (including Frank, Gumbel, Clayton, etc.). The estimated parameters of the selected copula (the pairwise correlation measures and the degrees of freedom) are as follows:

$$\hat{\rho}_{VD} = 0.6534, \quad \hat{\rho}_{DP} = 0.2668, \quad \hat{\rho}_{VP} = 0.7839, \quad \text{and} \quad \hat{\nu} = 10.5168.$$

These results suggest that an elliptical copula is more suitable to model the dependencies of the flood variables. The n-dimensional t-Student copula has been widely used for modeling hydrological data (Ganguli and Reddy, 2013; Sraj et al., 2014). As mentioned above (and demonstrated in Aas et al., 2009), the main issue with the multivariate t-copula is that it uses only a single degrees-of-freedom parameter, which drives the tail dependence of all pairs of variables. Therefore, if the tail dependencies of different pairs of the flood event variables are different, the dependence structure can be better described by the pair-copula models, which are discussed in the next section.

Modeling flood data using PCC models
In this section, we study the flood data using the PCC models and compare the fitted pair-copula model with the trivariate copula model selected in the previous section, in order to verify the claim reported in the literature that the PCC model is generally superior to other multivariate copula models. In order to fit a PCC model to the flood data, we use the methods described in Section 2.1 to first identify an appropriate vine tree structure, then select the most appropriate copula families for the pair-copulas and estimate their parameters. Finally, the derived model is evaluated and compared to the alternatives. A first impression of the dependency structure of the flood event data is given in Fig. 5. The upper diagonal part of this figure shows scatter plots, and the lower diagonal part shows contour plots. There is evidently stronger dependence between $(V, P)$ than between the other pairs of variables, $(D, P)$ and $(D, V)$. The correlation coefficients and p-values reported in Table 5 support the conclusions drawn from the pairs plot. The strongest dependencies are between $(P, V)$ and $(V, D)$. This means that $V$ should be placed between the other two variables, as illustrated in Fig. 6, and hence that a D-vine copula model will be used for modeling the flood data. Aas et al. (2009) also reported that D-vines are more flexible than canonical vines. This is mainly because for canonical vines we must specify the relationships between one specific pilot variable and the others, while in the D-vine structure we can select more freely which pairs to model, as demonstrated above (see also Czado et al. (2013) for a detailed discussion of regular vine model class selection). Fig. 7 shows the chi-plots (first row) and Kendall's plots (second row) of the variables $(D, V)$ (first column), $(V, P)$ (second column) and $(D, P)$ (third column), which indicate strong positive dependencies between these pairs of variables.
Evidence of symmetric tail dependence between the flood variables is also visible in these plots. Based on the properties of the different plausible copula candidates and their chi and Kendall's plots, we can conclude that t-Student, Gaussian or Frank copulas are most appropriate for these pairs of variables. In addition to these plots, comparing the empirical and theoretical $\lambda$-functions (given in Eq. (4)) gives an indication as to which copula family is more suitable to describe the observed dependencies. On the left panel of Fig. 8, we present the empirical $\lambda$-function (black line) and the theoretical $\lambda$-function of a Gaussian copula fitted to the pair of variables $(P, V)$ with the estimated parameters (gray line), as well as the independence and comonotonicity limits (dashed lines). The right panel of this figure shows the theoretical $\lambda$-function of a t-student copula fitted to $(D, V)$. The closeness of the theoretical $\lambda$-functions of the suggested copulas to the empirical $\lambda$-functions supports the choices suggested by the chi and Kendall's plots. An R package called CDVine provides the functions and tools used above for statistical inference of canonical vine and D-vine copulas (see Brechmann and Schepsmeier, 2013). The scoring test based on the Vuong and Clarke tests described above strongly tends to select a Gaussian copula for the pair of variables $(V, P)$, with the estimated parameter $\hat{\rho}_{VP} = 0.7971014$. The same method selects a bivariate t-student copula for $(V, D)$ with the following estimated parameters:

$$\hat{\rho}_{(D,V)} = 0.6386490 \quad \text{and} \quad \hat{\nu}_{(D,V)} = 7.572639.$$

The same copula models are chosen if the AIC, log-likelihood, Cramér-von Mises or Kolmogorov-Smirnov test statistics are applied as goodness-of-fit measures. It should be noted that the selected copulas are chosen from a wide range of alternative copulas, including Frank, Gumbel, Joe, etc., and the reported copula represents the best fit among them.
In the next step, an appropriate copula between $(P|V, D|V)$ is selected using the goodness-of-fit methods. Based on the computed goodness-of-fit measures, we select a t-copula with the following estimated parameters as the best fitted copula to $(P, D)$ conditional on $V$:

$$\hat{\rho}_{(P|V, D|V)} = -0.5185675 \quad \text{and} \quad \hat{\nu}_{(P|V, D|V)} = 5.830839.$$

The AIC for this PCC based on the fitted bivariate copulas presented above is $-192$, which is lower than that of the best fitted trivariate copula (i.e., the t-student copula with AIC $= -185.27$). This means that the PCC model is more appropriate to model the flood data. Unlike the trivariate t-copula for all flood variables, this model enables us to use different copula models for each pair of the flood variables. Furthermore, the PCC model generally represents a more flexible and intuitive way of extending bivariate copulae to higher dimensions. Several studies have reported considerable improvement from using the PCC model rather than a standard multivariate copula (particularly the multivariate t-copula) in modeling multivariate data which exhibit complex patterns of dependence in the tails, including Aas et al. (2009), Bauer et al. (2012), and Kurowicka and Joe (2011). Both of the models compared here share the drawback that the chosen copulas are restricted to a particular parametric class (Gaussian, multivariate t, etc.), so that the potential flexibility of the copula approach is not realized in practice. The minimum information pair-copula model applied to analyze the flood event data, by contrast, allows a great deal of flexibility in copula specification and results in a better fit.
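For reference, the D-vine structure with $V$ in the middle corresponds to the following standard pair-copula factorization of the trivariate density (see Aas et al., 2009). The notation $c_{DV}$, $c_{VP}$ and $c_{DP|V}$ is assumed here for the three pair-copula densities; in the parametric PCC fitted above these would be the t, Gaussian and conditional t-copulas, respectively, with the conditional copula taken not to depend on the value of $V$.

```latex
f(d, v, p) = f_D(d) \, f_V(v) \, f_P(p)
  \cdot c_{DV}\big(F_D(d), F_V(v)\big)
  \cdot c_{VP}\big(F_V(v), F_P(p)\big)
  \cdot c_{DP|V}\big(F_{D|V}(d \mid v), F_{P|V}(p \mid v)\big)
```

The first two pair-copulas form the first tree of the vine, and the conditional pair-copula forms the second tree.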

Modeling flood data using minimum information pair-copula
In this section, we fit a joint probability distribution to the flood data using the minimum information pair-copula described in Section 3. The same pair-copula structure illustrated in Fig. 6 is used here. It is more convenient to present the minimum information copula in terms of functions of the so-called copula variables, denoted by $X = F_1(D)$, $Y = F_2(V)$, $Z = F_3(P)$, where $F_i(\cdot)$ denote the marginal CDFs of the flood variables derived above. The functions used to construct a minimum information copula between $(D, V)$ are given by

$$\varphi_i(X, Y) = \varphi'_i\big(F_1^{-1}(X), F_2^{-1}(Y)\big), \qquad i = 1, \ldots, k,$$

and these should clearly have the same specified expectation, that is, $E[\varphi'_i(D, V)] = E[\varphi_i(X, Y)]$. We begin by constructing the minimum information copulas between each pair of adjacent variables in the first tree, that is, $C(D, V)$ and $C(V, P)$. To implement this, one needs to decide which basis functions should be chosen for each pair of these variables. We show the detailed procedure only for estimating the copula between $(D, V)$. As mentioned above, a two-dimensional ordinary polynomial series can be used to approximate the log-density of a bivariate copula function by truncating the series at a point where the approximation is satisfactory. Since Daneshkhah et al. (2015) show that this approximation can be improved by using an orthonormal polynomial series, we also use orthonormal polynomial basis functions to approximate the bivariate copula function of interest. We therefore briefly define the orthonormal polynomial functions on $[0, 1]$ and then give a procedure to select an appropriate series of these basis functions.
Two polynomial functions $h_1$ and $h_2$ are called orthonormal on $[0, 1]$ if

$$\int_0^1 h_1(x) h_2(x) \, dx = 0 \quad \text{and} \quad \int_0^1 h_1^2(x) \, dx = \int_0^1 h_2^2(x) \, dx = 1.$$

We follow the Gram-Schmidt procedure to construct the orthonormal polynomial (OP) basis functions. Using this method, the OP series $\varphi_1(\cdot), \varphi_2(\cdot), \ldots$ is obtained, and the two-dimensional OP basis functions are then given by the products $\varphi_i(u) \varphi_j(v)$. In order to choose the most suitable basis functions to approximate the density of interest, we use a method similar to the stepwise regression procedure (Bedford et al., 2015). In this method, at each stage, we evaluate the change in the log-likelihood after adding each additional basis function, and then choose the basis function with the largest increase in the log-likelihood. By applying this method to the proposed OP basis functions, we select the following four bases:

$$\varphi_1(D)\varphi_1(V), \quad \varphi_2(D)\varphi_2(V), \quad \varphi_4(D)\varphi_5(V), \quad \varphi_5(D)\varphi_1(V).$$

It should be noted that there is no longer a jump in the log-likelihood when adding a fifth basis function, although the approximation can be slightly improved by adding more basis functions, which is not considered here. We use this stepwise technique to choose all of the remaining basis functions for the other pairs in this case study. The expected values of the selected basis functions, $\alpha_1, \ldots, \alpha_4$, are calculated from the observed data. The minimum information copula $C(D, V)$ with respect to the uniform distribution, given these constraints, can now be constructed. In order to do this, we also need to decide on the number of discretization points, or grid size. Bedford et al. (2015) show that a larger grid size provides a better approximation to the log-density of the copula but increases the computational time. They also illustrate that more iterations of the $D_1 A D_2$ algorithm result in a more accurate density approximation. In order to balance the level of accuracy against the computational time, we choose a grid size of $200 \times 200$ and fix the approximation error at $1 \times 10^{-12}$.
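The Gram-Schmidt construction can be illustrated with a short Python sketch that orthonormalizes the monomials $1, x, x^2, \ldots$ on $[0, 1]$. This is an assumption-laden illustration: the starting monomials, the coefficient-list representation, the helper names, and the indexing (here `basis[0]` is the constant function, which may differ from the paper's $\varphi_1, \varphi_2, \ldots$) are all choices made for the example.

```python
import math

def inner(p, q):
    """Integral over [0, 1] of p(x) * q(x), with polynomials as coefficient lists."""
    return sum(a * b / (i + j + 1) for i, a in enumerate(p) for j, b in enumerate(q))

def orthonormal_polynomials(max_degree):
    """Gram-Schmidt on 1, x, x^2, ... with respect to the inner product above."""
    basis = []
    for deg in range(max_degree + 1):
        p = [0.0] * deg + [1.0]  # the monomial x**deg
        for q in basis:
            c = inner(p, q)
            # Subtract the projection onto q (q has lower degree, so pad with zeros).
            p = [pc - c * (q[i] if i < len(q) else 0.0) for i, pc in enumerate(p)]
        norm = math.sqrt(inner(p, p))
        basis.append([pc / norm for pc in p])
    return basis

def evaluate(p, x):
    """Evaluate the polynomial with coefficient list p at x."""
    return sum(c * x ** i for i, c in enumerate(p))

def basis_2d(basis, i, j):
    """Two-dimensional product basis function phi_i(u) * phi_j(v)."""
    return lambda u, v: evaluate(basis[i], u) * evaluate(basis[j], v)

basis = orthonormal_polynomials(3)
phi_11 = basis_2d(basis, 1, 1)
```

Under these assumptions the degree-one element is $\sqrt{3}(2x - 1)$, and a product such as `phi_11` can be passed as the function $\varphi$ whose sample mean over the copula-scale data defines a moment constraint.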
The Lagrange coefficients of this density approximation are obtained by satisfying Eq. (8). The approximated minimum information copula, $C(D, V)$, is shown in the left plot of Fig. 9.
The copula density between $(V, P)$ can be approximated similarly. First, we select the most suitable OP basis functions using the stepwise-like method described above. These functions are as follows:

$$\varphi_1(P)\varphi_1(V), \quad \varphi_2(P)\varphi_2(V), \quad \varphi_1(P)\varphi_4(V), \quad \varphi_4(P)\varphi_3(V).$$

The corresponding constraints, the means of the above functions, are calculated using the observed data.

Fig. 9. The minimally informative copula given moment constraints between the variables: Left plot, minimum information copula between $(D, V)$; Right plot, minimum information copula between $(V, P)$.
By fitting the minimum information copula to these data and constraints, the following Lagrange multipliers are obtained:

$$\lambda_1 = 1.7189, \quad \lambda_2 = 0.26523, \quad \lambda_3 = 0.45487, \quad \lambda_4 = -0.1945.$$

The corresponding approximated minimum information copula, $C(V, P)$, is shown in the right plot of Fig. 9. The conditional copula, $C(D|V, P|V)$, located in the second tree of the PCC illustrated in Fig. 6, can be approximated similarly. In order to calculate the minimum information copula between $D|V$ and $P|V$, we first split the support of $V$ into some arbitrary subintervals or bins (4 bins in this example) and then approximate the corresponding copula on each bin using the minimum information copula. The basis functions are selected in the same way as discussed above. Table 6 shows the selected basis functions, their corresponding expected values, Lagrange coefficients and log-likelihoods for each bin.
We now compare the methods used in this paper to model the dependencies between the flood variables based on the computed AIC of the fitted copula illustrated in Table 7. The AIC of the overall minimum information pair-copula model is considerably less than the AICs of the trivariate copula and less than the parametric D-vine copula model. That means the minimum information PCC model fits the observed data better than other models, and all dependencies are better captured using this method (see Table 8).
In addition to the correlation measures reported in Table 8, we can validate the proposed minimum information copula approximation using simulations drawn from the fitted models. In the next section, we first introduce a simulation method, which is used to validate our approximation and then to compare it with the alternative methods.
It should be noted that the best model should be selected by trading off the goodness of fit of the candidate model against its complexity (e.g., using the AIC). The proposed approximation method is general and can be applied to approximate any multivariate distribution with any degree of complexity to any required degree of approximation. Indeed, the flexibility of vines gives us the potential to capture any fine-grain structure within a multivariate distribution and, unlike Bayesian networks, the PCC can be modeled in terms of conditional dependence, which could result in a much simpler model structure search. In addition, unlike multivariate Gaussian copulas, the method proposed in this paper allows the explicit modeling of non-constant conditional dependence. Serinaldi (2013) discusses the widespread belief that increasingly refined mathematical structures of probability functions increase the accuracy and credibility of the fitted models (particularly in extrapolating the upper tails of the fitted models), and we have found mixed conclusions regarding simplified vine models and the surrounding assumptions. The deeper a bivariate copula is in the vine hierarchy, the more variables it is conditioned on. Thus, if the aforementioned conditional dependencies are neglected, pair-copula constructions are a direct method of building flexible multivariate models using standard parametric bivariate copulas as building blocks. Acar et al. (2012) argue that, although ignoring conditional dependencies (the so-called simplifying assumption) can lead to reasonably precise approximations of the underlying copula (as claimed by Haff et al., 2013), it can in general be misleading, and they develop an approach to condition parametric bivariate copulas on a single scalar variable. Stöeber et al. (2013) repeated this concern after studying several examples, concluding that the simplifying assumption for pair-copula construction models is often too restrictive, and that the assumption of an absolutely continuous pair-copula construction model is sometimes too strong. The latter assumption is used to make the pair-copula models tractable for inference and model selection (a pair-copula construction model is called absolutely continuous if all bivariate copula families occurring in the construction have densities with a parameter vector). Lopez-Paz et al. (2013) also reported that the simplifying assumption can lead to oversimplified estimates in practice. They extended the work of Acar et al. by developing a method for the estimation of fully conditional vines using Gaussian processes. This model shows promising results, with better predictive performance than the method that ignores conditional dependencies.

Validation by simulation
We now discuss the simulation of data from the PCC model. We follow the simulation method proposed by Kurowicka and Cooke (2006), based on sampling from the cumulative distributions. Their sampling strategy is as follows: sample three independent variables distributed uniformly on the interval $[0, 1]$, denoted by $U_1, U_2, U_3$, and calculate values of the original variables using the following equations:

$$x_1 = u_1, \qquad x_2 = F^{-1}_{2|1}(u_2 \mid x_1), \qquad x_3 = F^{-1}_{3|1,2}(u_3 \mid x_1, x_2),$$

where $x_i$ is the realized value of $X_i$, and $u_i$ is the realized value of $U_i$. More details and pseudo code can be found in Daneshkhah et al. (2015) (see also Cooke et al., 2015). Table 8 shows the rank correlations between the pairs of the flood variables calculated from the original observed data, and based on simulated data of size 1000 taken from the fitted trivariate t-student copula, the parametric PCC model, and the minimum information pair-copula. Both methods (the PCC model and the minimum information copula) reproduce the overall correlation structure fairly well. In the following section, we further investigate and compare the tail dependence of the minimum information copula with the other copulas proposed above, based on a simulation study.

Table 7. The results of fitting different copula functions to the flood data.

Type of copula                AIC
t-student                     −185.27
Gaussian                      −179.4
Parametric pair-copula        −192
Minimum information copula    −204.8
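The sampling scheme can be sketched for a three-variable D-vine as follows. This is a simplified illustration, not the models fitted in the paper: all three pair-copulas are taken to be Gaussian (the paper fits t-copulas to two of them) so that the conditional distribution functions and their inverses have closed forms, the parameter values are the rounded estimates reported above, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()

def _clip(p, eps=1e-12):
    """Keep probabilities strictly inside (0, 1) before applying inv_cdf."""
    return min(max(p, eps), 1.0 - eps)

def h(u, v, rho):
    """Conditional cdf F(u | v) of a Gaussian pair-copula with correlation rho."""
    z = (N.inv_cdf(_clip(u)) - rho * N.inv_cdf(_clip(v))) / math.sqrt(1.0 - rho * rho)
    return N.cdf(z)

def h_inv(w, v, rho):
    """Inverse of h in its first argument."""
    z = N.inv_cdf(_clip(w)) * math.sqrt(1.0 - rho * rho) + rho * N.inv_cdf(_clip(v))
    return N.cdf(z)

def sample_dvine(n, rho_dv, rho_vp, rho_dp_v, seed=0):
    """Draw n triples (d, v, p) on the copula scale from the D-vine D - V - P."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        w1, w2, w3 = rng.random(), rng.random(), rng.random()
        d = w1                              # x1 = u1
        v = h_inv(w2, d, rho_dv)            # x2 from F(v | d)
        d_given_v = h(d, v, rho_dv)         # F(d | v), input to the second tree
        t = h_inv(w3, d_given_v, rho_dp_v)  # invert the conditional pair-copula
        p = h_inv(t, v, rho_vp)             # x3 from F(p | v)
        out.append((d, v, p))
    return out

def normal_score_corr(a, b):
    """Pearson correlation of the normal scores of two copula-scale samples."""
    za = [N.inv_cdf(_clip(x)) for x in a]
    zb = [N.inv_cdf(_clip(x)) for x in b]
    ma, mb = sum(za) / len(za), sum(zb) / len(zb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(za, zb))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in za) * sum((y - mb) ** 2 for y in zb))

draws = sample_dvine(4000, rho_dv=0.6386, rho_vp=0.7971, rho_dp_v=-0.5186)
d_s, v_s, p_s = zip(*draws)
corr_dv = normal_score_corr(d_s, v_s)
corr_vp = normal_score_corr(v_s, p_s)
```

The copula-scale draws would then be mapped back to duration, volume and peak through the inverse marginal distribution functions. For the non-parametric minimum information copulas, the closed-form `h` and `h_inv` would be replaced by numerical conditional distributions computed on the discretization grid.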

Probabilistic analysis of flood variables
The frequency analysis of multivariate extreme events is very useful for understanding critical hydrologic behavior of flood events at a river basin scale through consideration of multiple interacting flood characteristics. The understanding gained from such analyses would be very helpful in measuring nonstructural safety, and in developing flood hazard mitigation strategies, as the impacts of extreme flood events with similar peak flows can differ greatly depending on event duration and hence volume (i.e. long duration-high volume floods compared to short duration-moderate volume flash floods).
The objective of frequency analysis of hydrologic data is then to relate the magnitude of extreme events to their frequency of occurrence through the use of probability distributions (Chow et al., 1988; Ganguli and Reddy, 2013). For the multivariate case, in which the flood variables $D, V, P$ exceed their respective thresholds $(D > d^*, V > v^*, P > p^*)$, the joint return period is computed using the inclusive probability ("OR" and "AND" cases) of all three events, known as primary return periods (Salvadori, 2004). The joint primary return period in the "OR" case, denoted by $T^{OR}_{D,V,P}$ (for annual flood analysis), is defined as

$$T^{OR}_{D,V,P} = \frac{1}{1 - F_{D,V,P}(d^*, v^*, p^*)} = \frac{1}{1 - C(u_1, u_2, u_3)}.$$

The joint primary return period in the "AND" case, denoted by $T^{AND}_{D,V,P}$ (for annual flood analysis), is defined as

$$T^{AND}_{D,V,P} = \frac{1}{1 - F_D(d^*) - F_V(v^*) - F_P(p^*) + F_{D,V}(d^*, v^*) + F_{D,P}(d^*, p^*) + F_{V,P}(v^*, p^*) - F_{D,V,P}(d^*, v^*, p^*)}$$

$$= \frac{1}{1 - F_D(d^*) - F_V(v^*) - F_P(p^*) + C(u_1, u_2) + C(u_1, u_3) + C(u_2, u_3) - C(u_1, u_2, u_3)},$$

where $u_1 = F_D(d^*)$, $u_2 = F_V(v^*)$, $u_3 = F_P(p^*)$, and $C(u_1, u_2)$, $C(u_2, u_3)$, and $C(u_1, u_3)$ are bivariate copulas between the cdfs of the flood variables. Table 9 presents the return periods obtained using the univariate marginal distributions of peak flow, volume, and duration, and the joint return periods for the "AND" and "OR" cases for the different trivariate distributions presented in this paper. In this table, $T^{AND}_{TT}$, $T^{AND}_{PC}$ and $T^{AND}_{MI}$ denote the joint return periods for the "AND" case approximated by the trivariate t-copula, the PCC model and the minimum information pair-copula model, respectively. The differences between the joint return periods for the "AND" (and "OR") case are due to the approximation methods of the trivariate and bivariate copulas required in the definitions of $T^{AND}_{D,V,P}$ and $T^{OR}_{D,V,P}$. The joint return period in the "AND" case, using any approximation method, is greater than the joint return period in the "OR" case. Hence, the simultaneous occurrence of the trivariate flood characteristics is less frequent in the "AND" case and more frequent in the "OR" case. Fig. 10 shows the joint bivariate return periods for the OR and AND cases for the pairs of flood variables.
Ganguli and Reddy (2013) reported that the joint bivariate return period in ''AND" case is greater than the joint bivariate return period in ''OR" case. A similar finding is concluded here.
The study shows that the joint return period $T^{AND}(D, V, P)$ is largest for the minimum information copula, followed by the parametric pair-copula and the trivariate t-copula, indicating that the other two methods underestimate the flood hazard under high value combinations. In Table 9, for individual flood event characteristics with return periods of 100 years, the maximum return periods $T^{AND}(D, V, P)$ and $T^{OR}(D, V, P)$ in the case of the minimum information copula are 1101 years and 94 years, respectively, which represent the range of possible Beas River flood hazards.
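The two return period definitions translate directly into code. The sketch below is only an illustration of the formulas for annual flood analysis: the function name and argument names are hypothetical, and the example uses the independence copula with all marginal probabilities equal to 0.99 (a 100-year univariate event) instead of any of the copulas fitted in this paper.

```python
def joint_return_periods(u1, u2, u3, c12, c13, c23, c123):
    """Return (T_OR, T_AND) from marginal cdf values and copula values."""
    # "OR": at least one of the three variables exceeds its threshold.
    t_or = 1.0 / (1.0 - c123)
    # "AND": all three variables exceed their thresholds (inclusion-exclusion).
    t_and = 1.0 / (1.0 - u1 - u2 - u3 + c12 + c13 + c23 - c123)
    return t_or, t_and

# Independence copula as a stand-in: C(u1, u2) = u1 * u2, C(u1, u2, u3) = u1 * u2 * u3.
u = 0.99
t_or, t_and = joint_return_periods(u, u, u, u * u, u * u, u * u, u ** 3)
```

Under independence the "AND" return period is about one million years, far larger than the values in Table 9, which illustrates how strongly the positive dependence between duration, volume and peak shortens the joint "AND" return period.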

Analyzing tail dependence: a simulation study
Based on the results presented above and demonstrated in Bedford et al. (2015) and Daneshkhah et al. (2015), the pair-copula model constructed from minimum information copulas can model any dependence structure. In many fields, including hydrology, extreme weather forecasting and financial risk prediction, where the fitted copula would lie within non-Gaussian multivariate families (Joe, 1997), tail dependence properties and behavior are of particular importance. We therefore investigate the tail behavior of the minimum information copula for data simulated from the fitted copulas introduced above.
Table 9. Comparison of return periods for flood characteristics calculated based on the trivariate t-copula (denoted by $T_{TT}$), the pair-copula model ($T_{PC}$) and the minimum information pair-copula model ($T_{MI}$).

Tail dependence in a bivariate distribution can be represented by the probability that the first variable exceeds its $q$-quantile, given that the other exceeds its own $q$-quantile. In order to study the tail behavior of the fitted minimum information copulas, we first utilize the scatter-plot, Chi-plot and K-plot, which can detect bivariate dependence using the ranks of the data, as explained in Section 2. The first column of Fig. 11 illustrates a scatter-plot of a random sample (of size 1000) taken from the fitted Normal copula (as fitted to the $(V, P)$ variables) with correlation coefficient $\rho = 0.7971014$, and the corresponding Chi and K-plots. The second column shows the same plots for a random sample of the same size taken from the minimum information copula fitted to $(V, P)$. By comparing the scatter-plots, it can be concluded that the minimum information copula captures the general behavior of the Normal copula well. The upper and lower tail dependence can be assessed from the Chi and K-plots. For example, if there is no upper or lower tail dependence, the $\chi$ values at the rightmost part of the Chi-plot have to return to the zero line. This can be clearly observed in the Chi-plots of the Normal and corresponding minimum information copulas. The same tail dependence behavior can be observed in the K-plots of these copulas. Similarly, the first column of Fig. 12 shows a scatter-plot of a random sample (of size 1000) taken from the fitted t-copula (as fitted to the $(D, V)$ variables) with parameters $\hat{\rho} = 0.6386490$ and $\hat{\nu} = 7.572639$, and the corresponding Chi and K-plots; the corresponding plots for the fitted minimum information copula are shown in the second column. By comparing the scatter-plots, it can be concluded that the minimum information copula captures the general behavior of the t-copula well.
A similar upper tail dependence can be observed for these two copulas by comparing their Chi- and K-plots. The minimum information copula is able to capture the upper tail behavior found in other copulas, including the Gumbel and Tawn copulas, as well as the lower tail behavior of copulas such as the Frank copula (see also Bedford et al. (2015) for similar findings). In addition to these graphical tools for detecting and studying the tail behavior of the fitted copulas, we also present some analytical tools to measure tail dependence.
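The Chi-plot used in the graphical comparison above can be computed directly from a paired sample. The following Python sketch follows the usual Fisher and Switzer construction of the pairs $(\lambda_i, \chi_i)$; the function name `chi_plot_points` and the simple pairwise counting are assumptions made for illustration and are not taken from the paper. Note that $\lambda_i$ here is the Chi-plot's distance measure, not a tail dependence coefficient.

```python
import math


def chi_plot_points(xs, ys):
    """Return the (lambda_i, chi_i) pairs of a Chi-plot for paired data.

    Hypothetical helper for illustration; uses O(n^2) pairwise counting.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    points = []
    for i in range(n):
        # Empirical marginal and joint distribution values at observation i,
        # computed from the other n - 1 observations.
        f = sum(1 for j in range(n) if j != i and xs[j] <= xs[i]) / (n - 1)
        g = sum(1 for j in range(n) if j != i and ys[j] <= ys[i]) / (n - 1)
        h = sum(1 for j in range(n)
                if j != i and xs[j] <= xs[i] and ys[j] <= ys[i]) / (n - 1)
        denom = f * (1 - f) * g * (1 - g)
        if denom <= 0:
            continue  # chi_i is undefined at the extreme observations
        chi = (h - f * g) / math.sqrt(denom)
        # Signed distance of the observation from the centre of the data.
        ft, gt = f - 0.5, g - 0.5
        sign = (ft * gt > 0) - (ft * gt < 0)
        lam = 4 * sign * max(ft * ft, gt * gt)
        points.append((lam, chi))
    return points
```

Plotting $\chi_i$ against $\lambda_i$ gives the Chi-plot: under independence the $\chi_i$ scatter around zero, and the points with $\lambda_i$ close to 1 come from observations far from the centre of the data, which is where tail behavior becomes visible.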
To study the occurrence of extreme events such as floods, a pair-wise analysis of the upper tail dependence of the flood variables can be carried out using the fitted copula models. The coefficient of upper tail dependence of two variables of interest $X$ and $Y$ is denoted by $\lambda_U(X, Y)$ and defined as
$$\lambda_U(X, Y) = \lim_{\alpha \to 1^-} P\{X > F_X^{-1}(\alpha) \mid Y > F_Y^{-1}(\alpha)\},$$
where $\alpha$ is a threshold value associated with the upper tail dependence between these variables. This coefficient can also be expressed in terms of the copula, as given in Joe (1997). If $0 < \lambda_U \le 1$, the variables are said to be asymptotically dependent in the upper tail, or, equivalently, the copula $C$ coupling these variables is said to have upper tail dependence; if $\lambda_U = 0$, the variables are said to be asymptotically independent in the upper tail.
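As a sketch of the copula form referred to above, the standard representation of the upper tail dependence coefficient is written out below, followed by a short worked example for the Gumbel copula. The notation (the symbol $\lambda_U$, the diagonal $C(u, u)$ and the Gumbel parameter $\theta \ge 1$) is assumed here for illustration; the cited reference should be consulted for the exact statement used in the paper.

```latex
% Upper tail dependence in terms of the copula diagonal C(u, u)
\lambda_U = \lim_{u \to 1^-} \frac{1 - 2u + C(u, u)}{1 - u}

% Worked example (assumed for illustration): Gumbel copula with parameter \theta \ge 1,
% for which C(u, u) = u^{2^{1/\theta}}. By L'Hopital's rule,
\lambda_U = \lim_{u \to 1^-} \frac{-2 + 2^{1/\theta}\, u^{2^{1/\theta} - 1}}{-1}
          = 2 - 2^{1/\theta}
```

For $\theta = 1$ (independence) this gives $\lambda_U = 0$, and $\lambda_U$ approaches 1 as $\theta$ grows, which is the sense in which the Gumbel copula has upper tail dependence.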
In flood hazard management, it is crucial to take the tail-dependence coefficient into account when modeling joint flood characteristics. Ignoring it can lead to a serious underestimation of the hazard and to under-design of flood protection works, with well-known consequences. Computing the tail-dependence coefficients as precisely as possible therefore helps to avoid such underestimation. The method of approximating a bivariate copula using the minimum information technique can be used to estimate the tail-dependence coefficient to any desired level of approximation. In this section, we analyze the tail dependence between the flood variables using the different copula modeling methods demonstrated above.
Tail dependence may be studied either graphically, using the Chi-plot, or numerically, from an empirical copula, a given family of multivariate distributions, or a given family of copula functions. Closed-form formulas for the tail dependence of the bivariate Student's t and Gaussian copulas are given in Table 10.
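As an illustration of such closed forms, the Python sketch below evaluates the textbook expressions: zero tail dependence for the Gaussian copula with $|\rho| < 1$, and $\lambda = 2\, t_{\nu+1}\!\left(-\sqrt{(\nu + 1)(1 - \rho)/(1 + \rho)}\right)$ for the Student's t copula, where $t_{\nu+1}$ is the Student's t distribution function with $\nu + 1$ degrees of freedom. These formulas are assumed to match those in Table 10. The helper names are hypothetical, and the t distribution function is obtained by simple numerical integration only to keep the example free of dependencies; a library routine such as `scipy.stats.t.cdf` could be used instead.

```python
import math


def student_t_cdf(x, nu, steps=4000):
    """Student's t CDF with nu degrees of freedom (composite Simpson's rule)."""
    if x == 0:
        return 0.5
    norm = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

    def pdf(t):
        return norm * (1 + t * t / nu) ** (-(nu + 1) / 2)

    # Integrate the density between 0 and |x|; `steps` must be even.
    b = abs(x)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def gaussian_copula_tail_dependence(rho):
    """Tail dependence of the Gaussian copula: zero unless rho equals 1."""
    return 1.0 if rho == 1 else 0.0


def t_copula_tail_dependence(rho, nu):
    """Upper (and, by symmetry, lower) tail dependence of the t copula."""
    if not (-1 < rho <= 1) or nu <= 0:
        raise ValueError("require -1 < rho <= 1 and nu > 0")
    arg = -math.sqrt((nu + 1) * (1 - rho) / (1 + rho))
    return 2 * student_t_cdf(arg, nu + 1)


# Parameters of the t-copula fitted to (D, V), as reported in the text.
lambda_dv = t_copula_tail_dependence(0.6386490, 7.572639)
```

The coefficient increases with $\rho$ and decreases with $\nu$, so a t copula with many degrees of freedom behaves almost like a Gaussian copula in the tails.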
To calculate the tail dependence associated with the fitted minimum information pair-copula, we can use non-parametric estimators of the tail dependence. We use the Capéraà-Fougères-Genest estimator, denoted by $\hat{\lambda}_U^{CFG}$ and suggested by Capéraà et al. (1997), to compute the tail dependence between the pairs of flood variables fitted by the minimum information PCC.
To calculate $\hat{\lambda}_U^{CFG}$, a random sample $\{(U_1, V_1), \ldots, (U_n, V_n)\}$ drawn from the underlying copula $C(\cdot, \cdot)$ is required. The bivariate upper tail dependence estimate $\hat{\lambda}_U^{CFG}$ is then given by
$$\hat{\lambda}_U^{CFG} = 2 - 2\exp\left[\frac{1}{n}\sum_{i=1}^{n}\log\left\{\frac{\sqrt{\log\frac{1}{U_i}\,\log\frac{1}{V_i}}}{\log\frac{1}{\max(U_i, V_i)^2}}\right\}\right].$$
Table 11 shows the tail dependence coefficients for the different pairs of flood variables and the different copula models used to capture the dependence structure. These coefficients are calculated from samples drawn from the multivariate copulas fitted to the flood data in this paper. For instance, the tail dependence coefficients $\lambda_U^{TT}$ for each pair of flood variables are calculated using (11), based on a sample drawn from the trivariate t-copula fitted to the flood data. In this table, TT denotes the trivariate t-copula, PC the pair-copula model, and MI the minimum information copula.
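The sample-based calculation described above can be sketched in Python as follows. The code implements the Capéraà-Fougères-Genest estimator in its commonly quoted form, $2 - 2\exp\{n^{-1}\sum_i \log[\sqrt{\log(1/U_i)\log(1/V_i)} / \log(1/\max(U_i, V_i)^2)]\}$. The function name, the clipping constant `eps` and the two synthetic samples are assumptions made for illustration; they stand in for samples drawn from the fitted copulas, which are not reproduced here.

```python
import math
import random


def cfg_upper_tail_dependence(sample, eps=1e-12):
    """CFG estimate of upper tail dependence from copula-scale pairs (u, v)."""
    if not sample:
        raise ValueError("sample must be non-empty")
    total = 0.0
    for u, v in sample:
        # Keep values strictly inside (0, 1) so every logarithm is finite.
        u = min(max(u, eps), 1.0 - eps)
        v = min(max(v, eps), 1.0 - eps)
        x = -math.log(u)          # log(1 / u)
        y = -math.log(v)          # log(1 / v)
        m = 2.0 * min(x, y)       # log(1 / max(u, v)^2)
        total += 0.5 * (math.log(x) + math.log(y)) - math.log(m)
    return 2.0 - 2.0 * math.exp(total / len(sample))


# Synthetic stand-ins for samples from a fitted copula (illustration only).
rng = random.Random(0)
independent = [(rng.random(), rng.random()) for _ in range(5000)]
comonotone = [(u, u) for u, _ in independent]

lambda_independent = cfg_upper_tail_dependence(independent)
lambda_comonotone = cfg_upper_tail_dependence(comonotone)
```

The two synthetic samples bracket the possible range: the estimate is close to 0 for the independent pairs and equal to 1, up to rounding, for the perfectly dependent pairs. The estimator is usually motivated under the assumption that the underlying copula is well approximated by an extreme-value copula, so estimates for other copulas are best read as approximations.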
Based on the results shown in Table 11, for the bivariate copula between $(D, P)$, the tail dependence coefficient for the pair-copula model is 24 times, and for the minimum information copula 29 times, higher than the corresponding value for the trivariate t-copula. The practical implication of this difference in tail dependence is that the probability of observing a long-duration flood is much higher under the PCC model and the minimum information pair-copula model than under the trivariate t-copula.

Conclusions
The aim of this paper was to demonstrate the use and usefulness of pair-copulas and minimum information pair-copulas in flood hazard management. We developed a flexible D-vine and a minimum information PCC with the same structure to model multivariate data exhibiting complex patterns of dependence in the tails. The developed methodology was used to analyze the dependence structure among flood data collected from the Beas basin. In these analyses, the models developed in this paper were carefully compared with relevant benchmark models such as multivariate copula models, in particular the multivariate t-copula. Although standard multivariate copulas add some flexibility, this flexibility is insufficient in higher-dimensional applications and in applications involving extreme events. Pair-copula models can fill this gap by drawing on the rich class of existing bivariate parametric copula families or on the more flexible class of non-informative pair-copulas.
In order to compare the proposed models with the standard multivariate copulas, we first selected the best trivariate copula to model the joint density of the flood variables. Using different graphical and analytical goodness-of-fit criteria, the t-copula was chosen as the best trivariate copula. This copula has been chosen as the most appropriate model for analyzing multivariate flood data in several other studies (see Ganguli and Reddy, 2013 and references therein). We show that the drawbacks of this copula explained above can be resolved by using the D-vine copula model and the minimum information D-vine copula. In addition to the general statistical comparisons between these models, we also computed the primary return periods of the flood data using these copulas and analyzed them in detail, concluding that the minimum information pair-copula gave the best prediction of the primary return periods and the trivariate t-copula the worst among these three models. We also calculated the tail dependence coefficients between every pair of flood variables using these three models, and reached the same conclusion. We show that the vine model constructed from minimum information copulas can represent any dependence structure. The minimum information copula can be used to model multivariate data with various kinds of tail dependence, including heavy, symmetric, and nonsymmetric tails. It can model weak to strong upper tail dependence of the kind found in other suitable parametric copulas, including the t, Gumbel, and Tawn copulas (see also Bedford et al., 2015). In this study, we show that the minimum information copula is very useful for precisely estimating the tail dependence coefficients and primary return periods, which are vital in flood hazard management, and that it allows an improved representation of the interdependencies between flood event peak, duration and volume to be taken into account in flood analysis.
Fig. 12. Scatter plots, Chi-plots and K-plots of the Student's t copula and the minimum information (MI) copula fitted to the $(D, V)$ variables.
The minimum information copula that we propose for uncertainty modeling in flood hazard management accommodates the common correlation-based approaches to determining dependence, and also provides a precise probabilistic approximation given the wide range of constraints and uncertainty available in the data. Our approach can be considered a subjectivist approach, following a tradition in which expectation values are used to specify uncertain quantities. For instance, within a Bayesian approach, the method proposed in this paper may be thought of as a way to generate an informative prior distribution. In the Bayesian framework of risk assessment, the elicitation of a joint probability distribution from experts is among the key research areas, and minimum information pair-copulas can be considered a promising way to approximate a multivariate prior distribution based on the experts' probabilistic statements. In addition, pair-copula models can be used in conjunction with MCMC methods to update the models probabilistically, which is useful for detailed uncertainty analysis (see Min and Czado (2010) for further details).