Modeling Dependence with C-and D-Vine Copulas: The R Package CDVine

Flexible multivariate distributions are needed in many areas. The popular multivariate Gaussian distribution is however very restrictive and cannot account for features like asymmetry and heavy tails. Therefore dependence modeling using copulas is nowadays very common to account for such patterns. The use of copulas is however challenging in higher dimensions, where standard multivariate copulas suﬀer from rather inﬂexible structures. Vine copulas overcome such limitations and are able to model complex dependency patterns by beneﬁting from the rich variety of bivariate copulas as building blocks. This article presents the R package CDVine which provides functions and tools for statistical inference of canonical vine (C-vine) and D-vine copulas. It contains tools for bivariate exploratory data analysis and for bivariate copula selection as well as for selection of pair-copula families in a vine. Models can be estimated either sequentially or by joint maximum likelihood estimation. Sampling algorithms and graphical methods are also included.


Introduction
In search for flexible multivariate distributions, copula modeling has recently become increasingly popular in many fields of application. Standard references on copula theory include the books by Joe (1997) and Nelsen (2006). The most fundamental theorem, which constitutes the important role of copulas for describing dependence in statistics, is the theorem of Sklar (1959). It establishes the link between multivariate distribution functions and their univariate margins.
Let F be the d-dimensional distribution function of the random vector X = (X 1 , . . . , X d ) with margins F 1 , . . . , F d . Then there exists a copula C such that for all x = (x 1 , . . . , F (x) = C(F 1 (x 1 ), . . . , F d (x d )). (1) C is unique if F 1 , . . . , F d are continuous. Conversely, if C is a copula and F 1 , . . . , F d are distribution functions, then the function F defined by (1) is a joint distribution function with margins F 1 , . . . , F d . In particular C can be interpreted as the distribution function of a ddimensional random variable on [0, 1] d with uniform margins. Corresponding densities will be denoted by a small letter c. Furthermore, the random variables X 1 , . . . , X d will be assumed to be continuous in the following.
The pratical implication of Sklar's theorem is that the modeling of the marginal distributions can be conveniently separated from the dependence modeling in terms of the copula. The problem in practical applications is how to identify this copula. For the bivariate case, a rich variety of copula families is available and well-investigated (see Joe 1997;Nelsen 2006). However, in arbitrary dimension, the choice of adequate families is rather limited. Standard multivariate copulas such as the multivariate Gaussian or Student-t as well as exchangeable Archimedean copulas lack the flexibility of accurately modeling the dependence among larger numbers of variables. Generalizations of these offer some improvement, but typically become rather intricate in their structure and hence exhibit other limitations such as parameter restrictions.
Vine copulas do not suffer from any of these problems. Initially proposed by Joe (1996) and developed in more detail in Cooke (2001, 2002) and in Kurowicka and Cooke (2006), vines are a flexible graphical model for describing multivariate copulas built up using a cascade of bivariate copulas, so-called pair-copulas. Such pair-copula constructions (PCCs) decompose a multivariate probability density into bivariate copulas, where each pair-copula can be choosen independently from the others. This allows for a enormous flexibility in dependence modeling. In particular, asymmetries and tail dependence can be taken into account as well as (conditional) independence to build more parsimonious models. Vines thus combine the advantages of multivariate copula modeling, that is separation of marginal and dependence modeling, and the flexibility of bivariate copulas. Their "statistical breakthrough" was due to Aas, Czado, Frigessi, and Bakken (2009) who described statistical inference techniques for the two classes of canonical (C-) and D-vines.
So far publicly available and reliable software for C-and D-vine copula inference has been lacking. Only the software tool UNICORN (Kurowicka and Cooke 2009) includes some functionality for vines but only to a rather limited extent. We therefore try to fill this gap with the package CDVine for the statistical software R (R Core Team 2012). It includes functions for statistical inference of C-and D-vine copulas as well as, due to the underlying pair-copula structure, tools for bivariate data analysis. Some other R packages for copula modeling are available on the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/): the comprehensive package copula described in Yan (2007) and Kojadinovic and Yan (2010b), the package fCopulae (Wuertz et al. 2009) and finally the package nacopula (Hofert and Mächler 2011) for so-called nested Archimedean copulas, a generalization of Archimedean copulas. Furthermore, our package depends on the packages igraph0 (Csardi and Nepusz 2006) for illustrations of vine trees and mvtnorm (Genz et al. 2012), which provides efficient implementations of multivariate Gaussian and Student-t distributions. These will be loaded (if not already loaded) when loading the package CDVine by R> library("CDVine") In the following, we assume that this has been done. Additionally, to allow for reproducibility of the results, we preliminarily fix a seed.
R> set.seed (10) The remainder of the paper is structured as follows. The required methodological background on vine copulas is provided in Section 2. In Section 3 we then discuss methods for bivariate data analysis, while those for statistical inference of C-and D-vine copulas are treated in Section 4. An illustrative example is presented in Section 5. Section 6 concludes and provides an outlook to further software implementations of the vine copula methodology.

Methodological background
Vines are a graphical representation to specify so-called pair copula constructions (PCCs) as introduced by Aas et al. (2009). Before we provide more general expressions for C-and D-vines, we motivate the PCC in three dimensions. For this let X = (X 1 , X 2 , X 3 ) ∼ F with marginal distribution functions F 1 , F 2 and F 3 and corresponding densities. By recursive conditioning we can write f (x 1 , x 2 , x 3 ) = f 1 (x 1 )f (x 2 |x 1 )f (x 3 |x 1 , x 2 ).
The three-dimensional joint density (2) can therefore be represented in terms of bivariate copulas C 1,2 , C 1,3 and C 2,3|1 with densities c 1,2 , c 1,3 and c 2,3|1 , so-called pair-copulas, which may be chosen independently of each other to achieve a wide range of different dependence structures. Typically it is assumed that the conditional copula C 2,3|1 is independent of the conditioning variable X 1 to facilitate inference (see Aas et al. 2009, andHobaek Haff, Aas, andFrigessi 2010).
Since the decomposition in (2) is not unique, there exist many such iterative PCCs. To classify them Cooke (2001, 2002) introduced the graphical model called vine, which is also treated in detail in Kurowicka and Cooke (2006) and Kurowicka and Joe (2011). Vines arrange the d(d−1)/2 pair-copulas of a d-dimensional PCC in d−1 linked trees (acyclic connected graphs with nodes and edges). In the first C-vine tree, the dependence with respect to one particular variable, the first root node, is modeled using bivariate copulas for each pair. Conditioned on this variable, pairwise dependencies with respect to a second variable are modeled, the second root node. In general, a root node is chosen in each tree and all pairwise dependencies with respect to this node are modeled conditioned on all previous root nodes, i.e., C-vine trees have a star structure (see the left panel of Figure 1). This gives the following decomposition of a multivariate density, the C-vine density w.l.o.g. with root nodes 1, . . . , d (otherwise nodes can be relabeled), (4) where f k , k = 1, . . . , d, denote the marginal densities and c i,i+j|1:(i−1) bivariate copula densities with parameter(s) θ i,i+j|1:(i−1) (in general i k : i m means i k , . . . , i m ). Here, the outer product runs over the d − 1 trees and root nodes i, while the inner product refers to the d − i pair-copulas in each tree i = 1, . . . , d − 1. Our three-dimensional example can be interpreted as a C-vine with X 1 as first root node. A more detailed discussion of the C-vine construction and its likelihood can be found in Aas et al. (2009) and in Czado et al. (2012).
Similarly, D-vines are also constructed by choosing a specific order of the variables. Then in the first tree, the dependence of the first and second variable, of the second and third, of the third and fourth, and so on, is modeled using pair-copulas, i.e., if we assume the order 1, . . . , d, we model the pairs (1, 2), (2, 3), (3, 4), etc. In the second tree, conditional dependence of the first and third given the second variable (the pair (1, 3|2)), the second and fourth given the third (the pair (2, 4|3)), and so on, is modeled. In the same way, pairwise dependencies of variables a and b are modeled in subsequent trees conditioned on those variables which lie between the variables a and b in the first tree, e.g., the pair (1, 5|2, 3, 4). That is each D-vine tree has a path structure (see the right panel of Figure 1). This then leads to the D-vine density which also conveniently decomposes a d-dimensional density (as above the order is w.l.o.g. chosen as 1, . . . , d; otherwise nodes can be relabeled): Again the outer product runs over the d − 1 trees, while the pairs in each tree are designated   by the inner product. By ordering the variables appropriately, the above three-dimensional example corresponds to a D-vine with order 2, 1, 3.
The crucial question for inference is how to obtain the conditional distribution functions F (x|v) for an m-dimensional vector v. For a pair-copula term in tree m + 1, this can easily be established using the pair-copulas of the previous trees 1, . . . , m and by sequentially applying the relationship where v j is an arbitrary component of v and v −j denotes the (m − 1)-dimensional vector v excluding v j (Joe 1996). Further C xv j |v −j is a bivariate copula distribution function with parameter(s) θ specified in tree m. The notion of the h-function is introduced for convenience (see Aas et al. 2009).
By allowing arbitrary bivariate copulas for each pair-copula term in the decompositions (4) and (5), the multivariate copulas obtained from C-and D-vine structures, so-called C-and Dvine copulas, constitute very flexible models, since bivariate copulas can easily accommodate complex dependence structures such as asymmetric dependence or strong joint tail behavior (see Joe, Li, and Nikoloulopoulos 2010). For this purpose a wide range of bivariate copula families is implemented in the package CDVine. Examples of five-dimensional C-and D-vine trees are shown in Figure 1. Here, the order of root nodes in the C-vine is 1, . . . , 5, which also is the order of the first D-vine tree. Edge labels show the indices of the corresponding pair-copula terms.  Fitting a vine copula model involves different steps: First an appropriate vine tree structure has to be identified. Such a structure may either be given by the data itself or has to be selected manually or through expert knowledge. For a given vine structure, adequate copulas have to be selected and, in the next step, estimated. Finally, models need to be evaluated and compared to alternatives. The workflow shown in Figure 2 allocates functions in CDVine to these different steps of data analysis and model building.

Bivariate data analysis methods
Since C-and D-vine copulas as pair-copula constructions are based on bivariate copulas as building blocks, CDVine includes a range of tools for bivariate data analysis and inference of bivariate copula families. We hence discuss these methods before turning to functions for statistical inference of C-and D-vine copulas in Section 4.
In the following we further assume that the data we are working with has approximately uniform margins in [0, 1], so-called copula data. For general data sets this is typically established either by non-parametrically transforming the data with the empirical marginal distribution functions or by choosing (and fitting) appropriate marginal distributions and then applying the parametric distribution functions to the data (see Sklar's Theorem (1)).

Bivariate copula families
The package CDVine provides a wide range of bivariate copula families from the two major classes of elliptical and Archimedean copulas (see Joe 1997;Nelsen 2006). Elliptical copulas are directly obtained by inverting Sklar's Theorem (1). Given a bivariate distribution function F with invertible margins F 1 and F 2 , then examples, which are also implemented in CDVine, are the bivariate Gaussian copula and the bivariate Student-t copula with dependence parameter ρ ∈ (−1, 1) and degrees of freedom parameter ν > 2 for the Student-t copula. Φ ρ denotes the bivariate standard normal distribution function with correlation parameter ρ and Φ −1 the inverse of the univariate standard normal distribution function. Similarly, t ρ,ν is the bivariate Student-t distribution function with correlation parameter ρ and ν degrees of freedom, while t −1 ν denotes the inverse univariate Student-t distribution function with ν degrees of freedom. Both copulas are obviously symmetric and hence lower and upper tail dependence coefficients are the same.
In CDVine we implemented the most common single parameter Archimedean families such as the Clayton, Gumbel, Frank and Joe. Furthermore, the packages provides functionality for four Archimedean copula families with two parameters, namely the Clayton-Gumbel, the Joe-Gumbel, the Joe-Clayton and the Joe-Frank. Following Joe (1997) we simply refer to them as BB1, BB6, BB7 and BB8, respectively. Their more flexible structure allows for different non-zero lower and upper tail dependence coefficients. As boundary cases they include the Clayton and Gumbel, the Joe and Gumbel, the Joe and Clayton as well as the Joe and Frank copulas, respectively.
To each family we assigned a number which is called by the argument family in many functions (see the respective first columns of Tables 1 and 2). Corresponding parameters are called by the arguments par and par2, where par2 is needed for the degrees of freedom parameter of the Student-t copula as well as for the δ-parameter of the BB1, BB6, BB7 and BB8 copulas. By default par2 is set to zero. The used notation and properties (relationship of parameter(s) to Kendall's τ as well as to lower and upper tail dependence coefficients; see Joe 1996;Nelsen 2006, for further details) are shown in Table 1 for bivariate elliptical and in Table 2 for bivariate Archimedean copulas, respectively.
By 0 we denote the independence copula, which is a boundary case of the implemented bivariate copulas, e.g., for the elliptical copulas with ρ = 0 and the Frank copula with θ → 0. As a reminder of the coding of the copula families the function BiCopName transforms a copula family name to its number analogue and vice versa.
The cumulative distribution functions (CDFs) and probability density functions (PDFs) of the bivariate copula families can be found in the books of Joe (1997) and Nelsen (2006) and are implemented in CDVine as the functions BiCopCDF and BiCopPDF, respectively. The example code illustrates the CDF and the PDF of a Student-t copula (family = 2) with dependence parameter ρ = 0.7 (par = 0.7) and 4 degrees of freedom (par2 = 4). Perspective plots are shown in Figure 3. Corresponding code can be found in the replication file v52i03.R, accompanying this manuscript.
Conditional bivariate distribution functions, the so-called h-functions defined in (6), can be evaluated using the function BiCopHfunc. For bivariate copula data u1 and u2 and given bivariate copula family (family) and parameter(s) (par and par2) it returns the h-functions of u2 given u1, h(u 2 |u 1 , θ), in the first (hfunc1) and of u1 given u2, h(u 1 |u 2 , θ), in the second argument (hfunc2).
To account for the relationship between bivariate copula parameter(s) and Kendall's τ and vice versa, the package CDVine contains the functions BiCopPar2Tau and BiCopTau2Par. However note that the inverse relationship (Kendall's τ to copula parameter(s)) is only welldefined for one parameter bivariate copulas, i.e., the families 1, 3, 4, 5, 6 and the rotated versions of the one parameter Archimedean copulas.
Similarly, the relationship between the copula parameter(s) and the tail dependence coefficients as tabulated in Tables 1 and 2 is implemented in the function BiCopPar2TailDep.
Simulation of general bivariate copula families can easily be established using the probability integral transform. For this, let C be the bivariate copula under consideration with parameter(s) θ. Further, let v 1 and v 2 be two independent uniform samples. Then u = (u 1 , u 2 ) given by with the h-function as defined in (6), is a sample from the bivariate copula C with uniform margins.
This is implemented in the function BiCopSim which returns a sample of size N for given bivariate copula family and parameter(s). To illustrate rotated bivariate Archimedean copulas, we simulate samples of size N = 500 from Clayton copulas rotated by 0 and 33). Corresponding scatter plots are shown in Figure 4. The code can again be found in the replication file v52i03.R.

Tools for bivariate exploratory data analysis
When analyzing (bivariate) data, the true copula describing the dependence is however always unknown. Hence, we require tools to determine an appropriate bivariate copula family to describe the observed dependence pattern (see Step 2 of the proposed workflow in Figure 2). CDVine provides graphical as well as analytical tools.

Graphical tools
One of the most common graphical tools beside the standard scatter plot is the contour plot.
BiCopMetaContour either plots a bivariate contour plot corresponding to a bivariate meta distribution with specified margins (out of a set of possible margins; one common distribution for both margins) and specified copula family and parameter(s) or creates an empirical contour plot based on bivariate copula data. The choice of margins for BiCopMetaContour is summarized in Table 3, where additional parameters for the margins can be set by the argument margins.par. Standard normal margins are chosen as default, since they allow for direct comparisons to multivariate normal shapes and bring out characteristic features such as sharpe corners which indicate tail dependence. Figure 5 shows an empirical contour plot (contours based on an estimated bivariate density) as well as theoretical contour plots (contours based on the theoretical bivariate density) with standard normal and Gamma margins for a Gumbel copula with parameter θ = 2. Code is  given in the replication file v52i03.R.
While contour plots are rather general tools, there also exist specialized graphical tools to investigate bivariate copula dependence directly. Kendall's plot (K-plot) and the χ-plot (or chi-plot) for detecting dependence are well-described in Genest and Favre (2007). The corresponding functions in CDVine are BiCopKPlot and BiCopChiPlot, respectively. Examples of both can be found in Figure 8 of Section 5. Genest and Rivest (1993) introduced a further method-the λ-function. The λ-function is characteristic for each copula family and defined as is Kendall's distribution function for a copula C with parameter(s) θ, v ∈ [0, 1] and (U 1 , U 2 ) distributed according to C. Note that for Archimedean copulas the λ-function is explicitly given in terms of the generator function ϕ and its derivative ϕ as λ(v, θ) = ϕ(v)/ϕ (v) (see Genest and Rivest 1993, for more details).
In BiCopLambda we implemented the λ-function for the copula families 1 -10. However note that for the bivariate Gaussian and Student-t copulas no closed form expressions of the theoretical λ-functions exist. Therefore they are simulated based on samples of size 1000. The plot of the theoretical λ-function also shows bounds of the λ-function corresponding to independence and comonotonicity (λ = 0). For rotated bivariate copulas one can transform the input arguments u1 and/or u2 in order to use the λ-function. For copulas rotated by 90 degrees u1 has to be set to 1 -u1, for 270 degrees u2 to 1 -u2 and for 180 degrees u1 and u2 to 1 -u1 and 1 -u2, respectively. Then λ-functions of the corresponding non-rotated copula families can be considered.
Comparing empirical to theoretical λ-functions gives an indication which copula family might be appropriate to describe the observed dependence (see Section 5). An illustrative example for the Joe copula with parameter θ = 2 is shown here: we first produce a plot of the empirical λ-function, then of the theoretical one, and finally a plot showing both (see Figure 6; code can be found in the replication file v52i03.R).

Analytical tools
In addition to the graphical tools we implemented a range of analytical tools, too, where the numerical output of the plotting functions (set PLOT = FALSE) can of course also be considered as analytical. Typically a good start of a bivariate data analysis is an independence test, in particular if the strength of dependence appears to be rather small. In this regard Genest and Favre (2007) propose the use of a simple bivariate independence test based on Kendall's τ . The test exploits the asymptotic normality of the test statistic where N is the number of observations andτ the empirical Kendall's τ of the data. The approximate p value of the null hypothesis of bivariate independence hence is Test statistic and p value are computed by the function BiCopIndTest.
A copula goodness-of-fit test based on Kendall's process for bivariate data, as investigated by Genest and Rivest (1993), is implemented in the function BiCopGofKendall. It computes the Cramér-von Mises and Kolmogorov-Smirnov test statistics as well as the corresponding estimated p values by bootstrapping (the default are B = 100 bootstrap samples; note that, if B is chosen rather large, computations may take very long). For rotated copulas the input arguments are transformed and the goodness-of-fit procedure for the corresponding non-rotated copula is used (see the discussion of the λ-function above).
A second goodness-of-fit test implemented in CDVine is based on a scoring approach. Given a set of bivariate copula families, the function BiCopVuongClarke performs for each possible pair of families the asymptotic tests by Vuong (1989) and by Clarke (2007). The Vuong as well as the Clarke test compare two non-nested models against each other and based on their null hypothesis, allow for a statistically significant decision among the two models (see below).
In the goodness-of-fit test proposed by Belgorodski (2010) this is used for bivariate copula selection. It compares a bivariate copula model C 0 to all other possible bivariate copula models under consideration in order to determine which family fits the data better than the other families. If copula model C 0 is favored over another copula model, a score of "+1" is assigned and similarly a score of "-1" if the other copula model is determined to be superior.
No score is assigned, if the respective test cannot discriminate between two copula models. The total score is the sum of the scores from all pairwise comparisons.
The Vuong and the Clarke tests are suitable to compare two non-nested models. Both are likelihood ratio based and related to the common Kullback-Leibler information criterion, which measures the distance between two statistical models. In the following let c 1 and c 2 be two competing bivariate copula densities with estimated parametersθ 1 andθ 2 , respectively. For the Vuong test we then compute the standardized sum, ν, of the log differences of their pointwise likelihoods m i := log(c 1 (u i,1 , u i,2 |θ 1 ))−log(c 2 (u i,1 , u i,2 |θ 2 )) for observations u i,j , i = 1, . . . , N, j = 1, 2, i.e., Vuong (1989) showed that ν is asymptotically standard normal. We hence prefer copula model 1 to copula model 2 at level α if Similarly, if ν < −Φ −1 1 − α 2 , we choose model 2. If, however, |ν| ≤ Φ −1 1 − α 2 , no decision among the models is possible, that is the null hypothesis that both models are statistically equivalent cannot be rejected (H 0 : E(m i ) = 0 ∀i = 1, .., N ).
The null hypothesis of statistical indistinguishability in the Clarke test, on the other hand, is H 0 : P (m i > 0) = 0.5 ∀i = 1, .., N.
The intuition behind this null hypothesis is, that under statistical equivalence of the two models the log-likelihood ratios of the single observations are uniformly distributed around zero and in expectation 50% of the log-likelihood ratios are greater than zero. The test statistic where 1 is the indicator function, was proposed by Clarke (2007) and is asymptotically distributed Binomial with parameters N and p = 0.5. Based on this, critical values can easily be obtained (see Clarke 2007). Model 1 is interpreted as statistically equivalent to model 2 if B is not significantly different from the expected value N p = N 2 . Both test statistics (7) and (8) can be corrected for the number of parameters used in the models, either using the Akaike or the parsimonious Schwarz correction, which correspond to the penalty terms of the AIC (Akaike 1973) and the BIC (Schwarz 1978), respectively. These can be specified using the argument correction, while the significance level of the tests is set by level.
An example of this scoring goodness-of-fit test will be given in Section 5. The set of copula families to compare is specified by the argument familyset.
Commonly used alternative criteria to discriminate among models are the above-mentioned AIC and BIC. They are however less reliable when non-nested models are compared. By correcting the log-likelihood for the number of parameters used in a model, they allow for an efficient comparison based on single numbers, namely among a class of models the model with smallest AIC/BIC is chosen. We implemented this selection procedure in the function BiCopSelect which estimates copula parameters for a given set of families to choose from (familyset) using maximum likelihood estimation (see the discussion of BiCopEst in Section 3.3) and then selects the family based on the AIC (default) or the BIC. Furthermore, a preliminary independence test (see the description of BiCopIndTest above) can be performed to accommodate that an independence copula might be appropriate for the given bivariate data anyway. The function returns the selected bivariate copula family and the estimated parameter(s).

Estimation of bivariate copula families
Having selected an appropriate bivariate copula family for given observations, e.g., using the graphical and analytical tools discussed above, the corresponding copula parameter(s) has/have to be estimated (see Step 3 of the suggested workflow in Figure 2). This can be established using the function BiCopEst which performs either a method of moments (inversion of Kendall's τ (method = "itau"); see Tables 1 and 2 and the function BiCopTau2Par) or maximum likelihood estimation (MLE; method = "mle"). Note again that the inversion of Kendall's τ is however not available for all bivariate copula families but only for the one parameter ones. If possible, starting values for the MLE are obtained by inversion of Kendall's τ , while optimization is performed using the L-BFGS-B algorithm for constraint optimization to account for the parameter ranges (see Tables 1 and 2). Furthermore, standard errors for both estimation methods are provided, too (if se = TRUE). For MLE standard errors are based on inversion of the Hessian matrix, while for inversion of Kendall's τ they are obtained as described in Kojadinovic and Yan (2010a).
As noted above, CDVine always assumes that marginally uniform data is given. The MLE used here therefore corresponds to the inference functions from margins (IFM; Joe 1997) or maximum pseudo likelihood method (MPL; Genest, Ghoudi, and Rivest 1995) depending on whether the transformation to [0, 1] was parametric or rank based.
To stabilize numerical computations, upper bounds for the degrees of freedom parameter of the Student-t copula as well as for the parameters of the BB copulas (in absolute values) can be specified using the arguments max.df for the Student-t copula and max.BB for the BB copulas. The default values are based on experience and work quite well in most cases. In certain circumstances, lower or higher values might however be sensible to improve results. In particular, if the degrees of freedom parameter of the Student-t copula is estimated to be quite large (as a rule of thumb 20-30 degrees of freedom can already be regarded as "large"), the Student-t is very similar to the Gaussian copula and therefore it is preferable to work with the Gaussian because it has only one parameter and is thus more efficient when doing inference. A corresponding warning message is returned if this is the case.

Statistical inference of C-and D-vine copulas
Having discussed techniques for bivariate data analysis, we now turn to the main part of CDVine: methods for statistical inference of C-and D-vine copulas. Before discussing estimation and model selection, the coding of C-and D-vines is introduced. Finally, some numerical issues are discussed.

Specification of C-and D-vine copula models and data simulation
As discussed in Section 2, one has to select an order of the variables when specifying C-and D-vine copulas. For the D-vine, the order of the variables in the first tree has to be chosen and for the C-vine, the root nodes for each tree need to be determined. Functions for inference of C-and D-vine copulas in the package CDVine assume that the order of the variables in the data set under investigation exactly corresponds to this C-or D-vine order. E.g., in a C-vine the first column of a data set is the first root node, the second column the second root node, etc.. According to this order arguments have to be provided to functions for C-and D-vine copula inference. After choosing which vine type we are working with (type = 1 or "CVine" denotes a C-vine, while type = 2 or "DVine" corresponds to a D-vine), the copula families (family) and parameters (par and par2) have to be specified as vectors of length d(d − 1)/2, where d is the number of variables. In a C-vine, the entries of this vector correspond to the following pairs and associated pair-copula terms . . . , (1, d|2, . . . , d − 1). (Tree d − 1) As an example consider the following four-dimensional C-vine copula model involving the pair-copula terms c 12 , c 13 , c 14 , c 23|1 , c 24|1 and c 34|12 : R> type <-1 R> family <-c(1, 3, 6, 2, 1, 5) R> par <-c(0.5, 1.3, 2.1, -0.3, 0.2, 1.7) R> par2 <-c(0, 0, 0, 3, 0, 0) In particular, the pair-copula c 2,3|1 is a Student-t with dependence parameter ρ = −0.3 and 3 degrees of freedom, while pair c 3,4|1,2 in the last tree is modeled by a Frank copula with parameter θ = 1.7. The strength of dependence modeled by each pair-copula term can be illustrated by transforming the parameter(s) of each pair-copula term into the corresponding Kendall's τ value using CDVinePar2Tau (see BiCopPar2Tau).
To simulate from a vine copula specification, the function CDVineSim can be used. The corresponding algorithms are given in Aas et al. (2009). They are based on the same idea as the bivariate simulation described in Section 3.1.

Estimation
Having decided the structure of the C-or D-vine to be used, one has to select pair-copula families for each (conditional) pair of variables as described in Section 3.2 or using the function CDVineCopSelect (Step 2 in Figure 2). Based on BiCopSelect, this function selects for a given copula data set (data) and vine type (type), appropriate bivariate copula families from a set of possible copula families (familyset) according to the AIC (default) or the BIC. As in BiCopSelect preliminary independence tests can also be performed for each (conditional) pair to obtain more parsimonious models.
This copula selection proceeds tree by tree, since the conditional pairs in trees 2, . . . , d − 1 depend on the specification of the previous trees through the h-functions (see Section 2).
Hence, initially C-and D-vine copula models are typically fitted sequentially by proceeding iteratively tree by tree and thus only involving bivariate estimation for each individual paircopula term (see, e.g., Czado et al. 2012 for a detailed description of sequential estimation in C-vines; Step 3 in Figure 2). This can be established using the function CDVineSeqEst which internally calls the function BiCopEst described in Section 3.3. Therefore, estimation of the parameter(s) of each pair-copula can be carried out using inversion of Kendall's τ or MLE (method = "itau" or "mle"), standard errors can be computed (se = TRUE or FALSE) and upper bounds for the Student-t degrees of freedom and BB copula parameters can be set by max.df and max.BB. A detailed example will be given in Section 5. Even though these sequential estimates often provide a good fit, one typically is interested in maximizing the (log-)likelihood of a vine copula specification (see (4) and (5)) for observations u = (u k,j ) k=1,...,N, j=1,...,d : The C-vine copula log-likelihood with parameter set θ CV is given by where F j|i 1 :im := F (u k,j |u k,i 1 , . . . , u k,im ) and the marginal distributions are uniform, i.e., f k (u k ) = 1 [0,1] (u k ). Note that F j|i 1 :im depends on the parameters of pair-copula terms in tree 1 up to tree i m .
The log-likelihood of a vine copula for given data (data), pair-copula families (family) and parameters (par and par2) can be obtained using the function CDVineLogLik which implements the algorithms given in Aas et al. (2009).
Using these log-likelihood calculations, we can now estimate parameters jointly using MLE-in contrast to the pairwise sequential estimation discussed above. This can be established using the function CDVineMLE with arguments for the given data (data), the pair-copula families (family) and corresponding starting values for the parameters (start and start2), the vine type (type) as well as the maximum number of iterations of the optimizer (maxit), where the L-BFGS-B algorithm for constraint optimization problems is again used here. Upper bounds for the Student-t degrees of freedom and BB copula parameters can also be set by max.df and max.BB. Starting values, if not provided, are obtained using the function CDVineSeqEst. The usage of the estimation methods and the calculation of the log-likelihood will be illustrated in Section 5.

Selection among vine copula models
Having fitted different vine copula models to a given data set, one typically is interested in determining the "best" model in terms of one or more criteria (Step 4 in the suggested workflow in Figure 2). Besides the classical AIC and BIC, implemented in CDVineAIC and CDVineBIC, two such criteria are the Vuong and the Clarke tests described in Section 3.2. They allow for pairwise comparisons of two competing models, e.g., a C-and a D-vine copula model, and can be performed using the functions CDVineVuongTest and CDVineClarkeTest. In these functions, models have to be specified as usual: Model1.family, Model1.par, Model1.par2 and Model1.type for the first model and similarly for the second model. For each model an order of the variables has to be given, since the orders of C-vine root nodes or of the nodes in the first D-vine tree may be chosen differently in the two models. The arguments Model1.order and Model2.order therefore specify these orders corresponding to the respective vine type. As output, both functions return test statistics with and without correction for the number of parameters as well as corresponding p values.
Furthermore, obtained vine specifications can be illustrated using the function CDVineTreePlot which plots one or all trees of a specified vine model (either tree = "ALL" or a tree number in {1, . . . , d − 1}). If no parameters are provided, these are obtained using sequential estimation, where arguments for CDVineSeqEst can be specified. The trees are plotted using the igraph0 package with individually chosen edge labels. As edge labels the user is free to combine the following information in a vector or choose edge.labels = FALSE for no edge labels: "family": pair-copula family names (default), "par": pair-copula parameters, "par2": second pair-copula parameters, "theotau": theoretical Kendall's τ values corresponding to pair-copula families and parameters, or "emptau": empirical Kendall's τ values, which are available only if data for sequential estimation is provided.
Positions of the nodes are either determined automatically (default) or can be set by the argument P which gives x-and y-coordinates of the nodes. Node labels can be specified by the argument names. An example with code will be given in Figure 9 in Section 5 and in the replication file v52i03.R.

Implementation and numerical issues
In order to speed up computations we implemented the major parts of the algorithms in C. In particular, the MLE is considerably faster by coding the log-likelihood of C-and D-vine copula models in C. We also implemented the method by Knight (1966) for efficiently computing the empirical Kendall's tau.
Even more important is the question of numerical stability. As noted in Section 3.3, it is advisable to set prudent upper bounds for the estimation of the degrees of freedom parameter of the Student-t copula as well as of the BB1, BB6, BB7 and BB8 copula parameters. In general, the user should be careful when working with parameters that correspond to extreme choices of Kendall's τ , that is Kendall's τ values close to −1, 0 and 1. This may for example lead to problems in sequential estimation of pair-copulas in higher order trees of C-and D-vines. For such pair-copulas, dependence is typically rather small and inevitable rounding errors are amplified, so that weak negative dependence might be observed even if the dependence should actually be positive. If this pair is modeled by a copula family that can only accommodate positive dependence such as the standard Clayton, Gumbel or Joe copulas, the sequential estimation will abort. In such a case, it may be helpful to identify the "problematic" term by setting progress = TRUE in CDVineSeqEst and then check the copula choice for example using BiCopSelect. A simple countermeasure is often to simply set this pair-copula term to a Gaussian copula because copulas close to independence are rather similar anyway. Alternatively, running CDVineCopSelect also estimates parameters sequentially and chooses appropriate pair-copula families so that no such problems will occur.
When estimating copula parameters, mostly some bounds have to be set in order to respect the parameter ranges (see Tables 1 and 2). This has been done based on experience and extensive stability tests. Similarly, copula data is bounded to the interval [10 −10 , 1 − 10 −10 ] because values too close to 0 or 1 lead to severe numerical problems. Apart from that, additional measures have been taken to improve the stability, but we will not go into the details here.

Example: Major world stock indices
As an example we choose the worldindices data set which is included in the package CDVine. This data set contains transformed standardized residuals of daily log returns of major world stock indices in 2009 and 2010 (396 observations). The considered indices are the leading stock exchanges of the six largest economies in the world: the US American S&P 500 (ˆGSPC), the Japanese Nikkei 225 (ˆN225), the Chinese SSE Composite Index (ˆSSEC), the German DAX (ˆGDAXI), the French CAC 40 (ˆFCHI) and the British FTSE 100 Index (ˆFTSE). Each time series is filtered using an ARMA(1,1)-GARCH(1,1) model with Student-t innovations and standardized residuals are transformed non-parametrically to copula data using the respective empirical distribution function.  Figure 7: Pairs plot of the worldindices data set with scatter plots above and contour plots with standard normal margins below the diagonal. Axes of the contour plots range from −3 to 3 other than indicated here (see the replication file for code how to generate this plot).

R> data("worldindices")
For a first impression of the data Figure 7 shows a pairs plot with scatter plots above and contour plots with standard normal margins below the diagonal. In particular among the European indices there is evidently strong dependence, while the dependence to the two Asian indices is rather weak.
Following the proposed workflow in Figure 2 in Section 2 we will perform a detailed exploratory data analysis (EDA) of one particular variable pair and specify a C-vine copula model including copula selection, sequential estimation and MLE as well as log-likelihood computations and plotting of C-vine trees. This will illustrate the usefulness of our functions and their handling. The specification of a D-vine copula model is not explicitly discussed here, but could be covered in essentially the same way. Such a D-vine copula model is then compared to the selected C-vine copula model at the end of our presentation. Using the C-vine structure selection criterion described by Czado et al. (2012) we determinê FCHI as the first root node (C-vine tree with strongest dependencies in terms of absolute empirical values of pairwise Kendall's τ 's; this is the first part of Step 1 in the proposed workflow in Figure 2). We now exemplarily show the EDA for the pair (ˆFCHI,ˆFTSE) using the graphical tools BiCopMetaContour (see row 6, column 5 of Figure 7), BiCopKPlot, BiCopChiPlot and BiCopLambda, as well as the analytical tools BiCopIndepTest, BiCopVuongClarke and BiCopSelect in order to choose the best fitting copula (Step 2 in Figure 2). The scatter plot in row 5, column 6 of Figure 7 as well as the K-and chi-plots in Figure 8 show that the variables are strongly positively dependent. Evidence of symmetric tail dependence is also visible. The empirical contour plot confirms these properties which are characteristic for a Student-t copula. Additionally, the corresponding theoretical contour plot of the bivariate Student-t copula (not shown here) has a similar shape as the empirical one in Figure 7. The λ-function in the right panel supports our choice.
The following pro forma independence test with a p value of zero confirms the strong dependence.
R> mlePar <-CDVineMLE(dat, family = family, start = seqPar$par, + start2 = seqPar$par2, type = 1) CDVineMLE returns the parameters found, the optimized log-likelihood as well as information about the optimization. A direct comparison of the log-likelihoods using CDVineLogLik shows the slight improvement of the jointly estimated parameters over the sequential ones in terms of the log-likelihood.
R> P <-CDVineTreePlot(data = NULL, family = family, par = mlePar$par, + par2 = mlePar$par2, names = colnames(dat), type = 1, tree = 1, + edge.labels = c("family", "theotau")) Similarly we also fitted a D-vine copula model with order of variables order_dvine, paircopula families family_dvine and corresponding parameters par_dvine and par2_dvine. In order to determine the better fitting vine copula model for the worldindices data set, we perform a Vuong test comparing both models (see Step 4 of Figure 2).  Figure 9: First tree of the specified C-vine for the worldindices data set with pair-copula families and Kendall's τ values corresponding to pair-copula parameters as edge labels. The test statistics close to zero (irrespective of the correction considered) and the large p values indicate that the C-and the D-vine copula models for the worldindices data set cannot be distinguished statistically. Results from a Clarke test between both models, which are not reported here, confirm this.

R>
To summarize, the above analysis showed strong positive dependencies among the European stock indices, where the French CAC 40 was determined to be central for explaining the overall dependence observed in the data. Further, we found evidence of medium to strong tail dependence as well as of some asymmetries in the dependence structure. Based on the data we could however not discriminate among fitted C-and D-vine copula models, where it should be noted that both models provide additional insights due to their specific structures.

Conclusion and outlook
In this paper, we present the R package CDVine for statistical inference of C-and D-vine copulas and demonstrate its use and usefulness in a substantial example. For the first time, the package CDVine provides extensive functionality for vine copula inference and related data analysis. In the future, we are planning to extend this to the more general class of regular vines as defined in Kurowicka and Cooke (2006). Inference and model selection of these are treated in Dißmann, Brechmann, Czado, and Kurowicka (2013) and Brechmann, Czado, and Aas (2012), while a large scale financial application can be found in . Further possible extensions are Bayesian inference and model selection techniques as used in Min and Czado (2010) and Min and Czado (2011).