Simplified vine copula models: Approximations based on the simplifying assumption

Abstract: Vine copulas, or pair-copula constructions, have become an important tool in high-dimensional dependence modeling. Commonly, it is assumed that the data generating copula can be represented by a simplified vine copula (SVC). In this paper, we study the simplifying assumption and investigate the approximation of multivariate copulas by SVCs. We introduce the partial vine copula (PVC), a particular SVC where a j-th order partial copula is assigned to every edge. The PVC generalizes the partial correlation matrix and plays a major role in the approximation of copulas by SVCs. We investigate to what extent the PVC describes the dependence structure of the underlying copula. We show that, in general, the PVC does not minimize the Kullback-Leibler divergence from the true copula if the simplifying assumption does not hold. However, under regularity conditions, stepwise estimators of pair-copula constructions converge to the PVC irrespective of whether the simplifying assumption holds or not. Moreover, we elucidate why the PVC is often the best feasible SVC approximation in practice.


Introduction
Copulas constitute an important tool to model dependence [28,17,25]. While it is easy to construct bivariate copulas, the construction of flexible high-dimensional copulas is a sophisticated problem. The introduction of simplified vine copulas (Joe [16]), or pair-copula constructions (Aas et al. [2]), has been an enormous advance for high-dimensional dependence modeling. Simplified vine copulas (SVCs) are hierarchical structures, constructed from a sequence of bivariate unconditional copulas, which capture the conditional dependence between pairs of random variables if the data generating process satisfies the simplifying assumption. In this case, all conditional copulas of the data generating vine collapse to unconditional copulas and the true copula can be represented in terms of a SVC. Vine copula methodology and applications have been extensively developed under the simplifying assumption [6,11,18,20,29], with studies showing the superiority of SVC models over elliptical copulas and nested Archimedean copulas (Aas and Berg [1], Fischer et al. [8]).
Although some copulas can be expressed as a SVC, the simplifying assumption is not true in general. Hobaek Haff, Aas and Frigessi [14] point out that the simplifying assumption is in general not valid and provide examples of multivariate distributions which do not satisfy the simplifying assumption. Stöber, Joe and Czado [37] show that the Clayton copula is the only Archimedean copula for which the simplifying assumption holds, while the Student-t copula is the only SVC arising from a scale mixture of normal distributions. In fact, it is very unlikely that the unknown data generating process satisfies the simplifying assumption in a strict mathematical sense. As a result, researchers have recently started to investigate new dependence concepts that are related to the simplifying assumption. In particular, studies on the bivariate partial copula, a generalization of the partial correlation coefficient, have (re-)emerged lately [5,9,10,35,31]. Estimators for non-simplified vine copula models are proposed by Schellhase and Spanhel [33] and Vatter and Nagler [38] using a non-parametric approach and generalized additive models, respectively.
We introduce the partial vine copula (PVC), which generalizes the partial correlation matrix. The PVC is a particular SVC where a j-th order partial copula is assigned to every edge. We investigate several properties of the PVC and show to what extent the dependence structure of the underlying distribution is captured. The PVC plays a crucial role in terms of approximating a multivariate copula by a SVC. We show that stepwise estimators of SVCs converge to the PVC regardless of whether the simplifying assumption holds. However, we also prove that the PVC may not minimize the Kullback-Leibler divergence from the true copula and thus may not be the best approximation in the space of SVCs. This result is rather surprising, because it implies that it may not be optimal to specify the true copulas in the first tree of a SVC approximation. Moreover, joint and stepwise estimators of SVCs may no longer converge to the same probability limit if the simplifying assumption does not hold. Nevertheless, the PVC is often the best SVC approximation in practice because only a stepwise estimation is feasible. The PVC is used by Nagler and Czado [27] to construct a new non-parametric estimator of a multivariate distribution that can outperform classical non-parametric approaches. Moreover, Kurz and Spanhel [24] apply the PVC to test the simplifying assumption in high-dimensional vine copulas. All in all, these facts highlight the practical importance of the PVC for multivariate dependence modeling.
The rest of this paper is organized as follows. (Simplified) vine copulas, the simplifying assumption, conditional and partial copulas, are discussed in Section 2. The PVC and j-th order partial copulas are introduced in Section 3. Properties of the PVC and examples are presented in Section 4. In Section 5 we analyze the role of the PVC for SVC approximations and explain why the PVC is the best feasible approximation in practical applications. A parametric estimator for the PVC is presented in Section 6 and implications for the stepwise and joint maximum likelihood estimator of SVCs are illustrated. Section 7 contains some concluding remarks.

Simplified vine copulas, conditional and partial copulas
This section introduces vine copulas, the simplifying assumption and related concepts. First, notation and assumptions are stated. Then, we discuss (simplified) vine copulas and the simplifying assumption. Thereafter, we introduce the partial copula which can be considered as a generalization of the partial correlation coefficient and as an approximation of a bivariate conditional copula.

Notation and assumptions
The following notation and assumptions are used throughout the paper. We write X_{1:d} := (X_1, . . . , X_d), so that F_{X_{1:d}}(x_{1:d}) := P(∀i = 1, . . . , d: X_i ≤ x_i). Let dx_{1:d} := dx_1 . . . dx_d denote the variables of integration in f_{X_{1:d}}(x_{1:d})dx_{1:d}. C^⊥ refers to the independence copula. X ⊥ Y means that X and Y are stochastically independent. For 1 ≤ k ≤ d, the partial derivative of g w.r.t. the k-th argument is denoted by ∂_k g(x_{1:d}). We write 1{A} = 1 if A is true, and 1{A} = 0 otherwise. For simplicity, we assume that all random variables are real-valued and continuous. Let d ≥ 3, if not otherwise specified, and let C_d be the space of absolutely continuous d-dimensional copulas with positive density (a.e.). The distribution function of a random vector U_{1:d} with uniform margins is denoted by F_{1:d} = C_{1:d} ∈ C_d. We set I^d_l := {(i, j): j = l, . . . , d − 1, i = 1, . . . , d − j} and S_{ij} := (i + 1):(i + j − 1) = (i + 1, . . . , i + j − 1). We focus on D-vine copulas, but all results carry over to regular vine copulas (Bedford and Cooke [4], Kurowicka and Joe [23]). An overview of the notation used can be found in Table 1. All proofs are deferred to the Appendix.

Definition 2.1. (Simplified D-vine copula or pair-copula construction -Joe [16], Aas et al. [2])
Simplified (regular) vine copulas can be considered as an ordered sequence of trees, where j refers to the number of the tree and a bivariate unconditional copula C^SVC_{i,i+j;S_{ij}} is assigned to each of the d − j edges of tree j (Bedford and Cooke [4]). The left-hand side of Figure 1 shows the graphical representation of a simplified D-vine copula for d = 4, i.e., the factorization

c^SVC_{1:4}(u_{1:4}) = c^SVC_{12}(u_1, u_2) c^SVC_{23}(u_2, u_3) c^SVC_{34}(u_3, u_4) c^SVC_{13;2}(u_{1|2}, u_{3|2}) c^SVC_{24;3}(u_{2|3}, u_{4|3}) c^SVC_{14;2:3}(u_{1|2:3}, u_{4|2:3}),

where the arguments u_{k|S} are obtained recursively from the pair-copulas of the lower trees. The bivariate unconditional copulas C^SVC_{i,i+j;S_{ij}} are also called pair-copulas, so that the resulting model is often termed a pair-copula construction. By means of SVC models one can construct a wide variety of flexible multivariate copulas because each of the d(d − 1)/2 bivariate unconditional copulas C^SVC_{i,i+j;S_{ij}} can be chosen arbitrarily. The resulting model is always a valid d-dimensional copula. Moreover, a pair-copula construction does not suffer from the curse of dimensionality because it is built upon a sequence of bivariate unconditional copulas. As a result, SVCs are very attractive for high-dimensional applications. Obviously, not every multivariate copula can be represented by a SVC. However, every copula can be represented by the following (non-simplified) D-vine copula.
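As an illustration of how such a factorization is evaluated, the following sketch computes the density of a three-dimensional simplified D-vine built from Gaussian pair-copulas. All copula families, parameter values and function names here are our own illustrative assumptions, not from the paper:

```python
import numpy as np
from scipy.stats import norm

def gauss_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with parameter rho."""
    x, y = norm.ppf(u), norm.ppf(v)
    return np.exp((2 * rho * x * y - rho**2 * (x**2 + y**2))
                  / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2)

def h_func(u, v, rho):
    """Gaussian h-function F_{U|V}(u|v), i.e. a conditional distribution function."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def svc_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Simplified D-vine density c12 * c23 * c13;2 evaluated at (u1, u2, u3)."""
    return (gauss_copula_density(u1, u2, rho12)
            * gauss_copula_density(u2, u3, rho23)
            * gauss_copula_density(h_func(u1, u2, rho12),
                                   h_func(u3, u2, rho23), rho13_2))
```

Note how the second-tree copula is evaluated at transforms computed from the first-tree pair-copulas only, which is exactly the hierarchical structure described above.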

Definition 2.2. (D-vine copula -Kurowicka and Cooke [22])
Let U_{1:d} be a random vector with cdf F_{1:d} = C_{1:d} ∈ C_d. For (i, j) ∈ I^d_2, let C_{i,i+j;S_{ij}} denote the conditional copula of F_{i,i+j|S_{ij}} (Definition 2.5) and let u_{k|S_{ij}} := F_{k|S_{ij}}(u_k|u_{S_{ij}}) for k = i, i + j. The density of a D-vine copula decomposes the copula density of U_{1:d} into d(d − 1)/2 bivariate conditional copula densities c_{i,i+j;S_{ij}} according to the following factorization:

c_{1:d}(u_{1:d}) = ∏_{i=1}^{d−1} c_{i,i+1}(u_i, u_{i+1}) ∏_{j=2}^{d−1} ∏_{i=1}^{d−j} c_{i,i+j;S_{ij}}(u_{i|S_{ij}}, u_{i+j|S_{ij}} | u_{S_{ij}}).

Contrary to a simplified D-vine copula in Definition 2.1, a bivariate conditional copula C_{i,i+j;S_{ij}}, which is a function of j + 1 variables, is assigned to each edge of a D-vine copula in Definition 2.2. The influence of the conditioning variables on the conditional copulas is illustrated by dashed lines in the right-hand side of Figure 1. In applications, the simplifying assumption is typically imposed. That is, it is assumed that all bivariate conditional copulas of the data generating vine copula degenerate to bivariate unconditional copulas.
If the data generating copula satisfies the simplifying assumption it can be represented by a SVC, resulting in fast and simple statistical inference. Several methods for the consistent specification and estimation of pair-copula constructions have been developed under this assumption (Hobaek Haff [13], Dißmann et al. [6]). However, in view of Definition 2.2 and Definition 2.1 it is evident that it is extremely unlikely that the data generating vine copula strictly satisfies the simplifying assumption in practical applications.
Several questions arise if the data generating process does not satisfy the simplifying assumption and a simplified D-vine copula model (Definition 2.1) is used to approximate a general D-vine copula (Definition 2.2). First of all, what bivariate unconditional copulas C^SVC_{i,i+j;S_{ij}} should be chosen in Definition 2.1 to model the bivariate conditional copulas C_{i,i+j;S_{ij}} in Definition 2.2 so that the best approximation w.r.t. a certain criterion is obtained? If the simplifying assumption does not hold for the data generating vine copula, what is the SVC model that established stepwise procedures (asymptotically) specify and estimate? What are the properties of an optimal approximation? Before we address these questions in Section 5, it is useful to recall the definition of the conditional and partial copula in Section 2.3 and introduce the partial vine copula in Section 3 and Section 4.

Definition 2.4. (Conditional probability integral transform (CPIT))
Let U_{1:d} ∼ F_{1:d} ∈ C_d, (i, j) ∈ I^d_2 and k = i, i + j. The conditional probability integral transform of U_k with respect to U_{S_{ij}} is defined by U_{k|S_{ij}} := F_{k|S_{ij}}(U_k|U_{S_{ij}}).
It can be readily verified that, under the assumptions in Definition 2.4, U_{k|S_{ij}} ∼ U(0, 1) and U_{k|S_{ij}} ⊥ U_{S_{ij}}. Thus, applying the random transformation F_{k|S_{ij}}(·|U_{S_{ij}}) to U_k removes possible dependencies between U_k and U_{S_{ij}}. The CPIT U_{k|S_{ij}} can be interpreted as the remaining variation in U_k that cannot be explained by U_{S_{ij}}. This interpretation is crucial for understanding the conditional and partial copula, which are related to the (conditional) joint distribution of CPITs. The conditional copula has been introduced by Patton [30] and we restate its definition here.¹
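These two properties of the CPIT are easy to check by Monte Carlo. The following sketch is our own illustration, assuming a bivariate Gaussian copula with ρ = 0.7, for which the CPIT is given by the Gaussian h-function:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
rho = 0.7
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=50_000)
u2 = norm.cdf(z[:, 1])
# CPIT U_{1|2} = F_{1|2}(U_1|U_2): for the Gaussian copula this is the h-function
u1_given_2 = norm.cdf((z[:, 0] - rho * z[:, 1]) / np.sqrt(1.0 - rho**2))
# U_{1|2} should be (approximately) uniform and uncorrelated with U_2
print(round(u1_given_2.mean(), 3), round(np.corrcoef(u1_given_2, u2)[0, 1], 3))
```

The sample mean is close to 0.5 and the sample correlation with the conditioning variable is close to zero, in line with U_{k|S_{ij}} ∼ U(0, 1) and U_{k|S_{ij}} ⊥ U_{S_{ij}}.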

Definition 2.5. (Bivariate conditional copula -Patton [30])
Let U_{1:d} ∼ F_{1:d} ∈ C_d and (i, j) ∈ I^d_2. The (a.e.) unique conditional copula C_{i,i+j;S_{ij}} of the conditional distribution F_{i,i+j|S_{ij}} is defined by

C_{i,i+j;S_{ij}}(a, b|u_{S_{ij}}) := F_{i,i+j|S_{ij}}(F^{−1}_{i|S_{ij}}(a|u_{S_{ij}}), F^{−1}_{i+j|S_{ij}}(b|u_{S_{ij}}) | u_{S_{ij}}).

Equivalently, we have that F_{i,i+j|S_{ij}}(u_i, u_{i+j}|u_{S_{ij}}) = C_{i,i+j;S_{ij}}(F_{i|S_{ij}}(u_i|u_{S_{ij}}), F_{i+j|S_{ij}}(u_{i+j}|u_{S_{ij}}) | u_{S_{ij}}).

¹ Patton's notation for the conditional copula is C_{i,i+j|S_{ij}}. Originally, this notation has also been used in the vine copula literature [2,23,3]. However, the current notation for a(n) (un)conditional copula that is assigned to an edge of a vine is C_{i,i+j;S_{ij}} [18,37,21]. In order to avoid possible confusion, we use C_{i,i+j;S_{ij}} to denote a conditional copula and C^SVC_{i,i+j;S_{ij}} to denote an unconditional copula.
Thus, the effect of a change in u_{S_{ij}} on the conditional distribution function F_{i,i+j|S_{ij}}(u_i, u_{i+j}|u_{S_{ij}}) can be separated into two effects. First, the values of the CPITs, (F_{i|S_{ij}}(u_i|u_{S_{ij}}), F_{i+j|S_{ij}}(u_{i+j}|u_{S_{ij}})), at which the conditional copula is evaluated may change. Second, the functional form of the conditional copula C_{i,i+j;S_{ij}}(·, ·|u_{S_{ij}}) may vary. In comparison to the conditional copula, which is the conditional distribution of two CPITs, the partial copula is the unconditional distribution and copula of two CPITs.

Definition 2.6. (Bivariate partial copula -Bergsma [5])
The partial copula C^P_{i,i+j;S_{ij}} of the distribution F_{i,i+j|S_{ij}} is defined by

C^P_{i,i+j;S_{ij}}(a, b) := P(U_{i|S_{ij}} ≤ a, U_{i+j|S_{ij}} ≤ b) = E[C_{i,i+j;S_{ij}}(a, b|U_{S_{ij}})].

Since U_{i|S_{ij}} ⊥ U_{S_{ij}} and U_{i+j|S_{ij}} ⊥ U_{S_{ij}}, the partial copula represents the distribution of random variables which are individually independent of the conditioning vector U_{S_{ij}}. This is similar to the partial correlation coefficient, which is the correlation of two random variables from which the linear influence of the conditioning vector has been removed. The partial copula can also be interpreted as the expected conditional copula, and be considered as an approximation of the conditional copula. Indeed, it is easy to show that the partial copula C^P_{i,i+j;S_{ij}} minimizes the Kullback-Leibler divergence from the conditional copula C_{i,i+j;S_{ij}} in the space of absolutely continuous bivariate distribution functions. The partial copula was first mentioned by Bergsma [5], who applies the partial copula to test for conditional independence. Recently, there has been a renewed interest in the partial copula. Spanhel and Kurz [35] investigate properties of the partial copula and mention some explicit examples, whereas Gijbels, Omelka and Veraverbeke [9,10] and Portier and Segers [31] focus on the non-parametric estimation of the partial copula.
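To make the "distribution of two CPITs" view concrete, here is a small Monte Carlo sketch (our own illustration; the trivariate Gaussian copula and all parameter values are assumptions, not from the paper). For a Gaussian copula the partial copula is again a Gaussian copula whose parameter equals the partial correlation, which we check through Spearman's ρ of simulated CPITs:

```python
import numpy as np
from scipy.stats import norm, spearmanr

rng = np.random.default_rng(1)
r12, r23, r13 = 0.6, 0.5, 0.55
cov = np.array([[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]])
z = rng.multivariate_normal(np.zeros(3), cov, size=100_000)
# CPITs of U1 and U3 given U2 (the conditional distributions are Gaussian)
u1_2 = norm.cdf((z[:, 0] - r12 * z[:, 1]) / np.sqrt(1.0 - r12**2))
u3_2 = norm.cdf((z[:, 2] - r23 * z[:, 1]) / np.sqrt(1.0 - r23**2))
# partial correlation rho_{13;2} and the implied Spearman's rho of the
# Gaussian partial copula: rho_S = (6/pi) * arcsin(rho/2)
rho_p = (r13 - r12 * r23) / np.sqrt((1.0 - r12**2) * (1.0 - r23**2))
rho_s_theory = 6.0 / np.pi * np.arcsin(rho_p / 2.0)
rho_s_mc = spearmanr(u1_2, u3_2)[0]
```

The Monte Carlo estimate `rho_s_mc` agrees with `rho_s_theory`, illustrating that the joint law of the two CPITs is exactly the partial copula.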

Higher-order partial copulas and the partial vine copula
A generalization of the partial correlation coefficient that is different from the partial copula is given by a higher-order partial copula. To illustrate this relation, let us recall the common definition of the partial correlation coefficient. Assume that all univariate margins of Y_{1:d} have zero mean and finite variance. For k = i, i + j, let P(Y_k|Y_{S_{ij}}) denote the best linear predictor of Y_k w.r.t. Y_{S_{ij}} which minimizes the mean squared error, so that Ẽ_{k|S_{ij}} = Y_k − P(Y_k|Y_{S_{ij}}) is the corresponding prediction error. The partial correlation coefficient of Y_i and Y_{i+j} given Y_{S_{ij}} is then defined by ρ_{i,i+j;S_{ij}} = Corr[Ẽ_{i|S_{ij}}, Ẽ_{i+j|S_{ij}}]. An equivalent definition is given as follows. For i = 1, . . . , d − 2, let

E_{i|S_{i2}} := Y_i − P(Y_i|Y_{i+1}),   E_{i+2|S_{i2}} := Y_{i+2} − P(Y_{i+2}|Y_{i+1}).   (3.1)

Moreover, for j = 3, . . . , d − 1, and i = 1, . . . , d − j, define

E_{i|S_{ij}} := E_{i|S_{i(j−1)}} − P(E_{i|S_{i(j−1)}}|E_{i+j−1|S_{i(j−1)}}),   E_{i+j|S_{ij}} := E_{i+j|S_{(i+1)(j−1)}} − P(E_{i+j|S_{(i+1)(j−1)}}|E_{i+1|S_{(i+1)(j−1)}}).   (3.2)

It is easy to show that E_{k|S_{ij}} = Ẽ_{k|S_{ij}} for all k = i, i + j and (i, j) ∈ I^d_2. That is, E_{k|S_{ij}} is the error of the best linear prediction of Y_k in terms of Y_{S_{ij}}. Thus, ρ_{i,i+j;S_{ij}} = Corr[E_{i|S_{ij}}, E_{i+j|S_{ij}}]. However, the interpretation of the partial correlation coefficient as a measure of conditional dependence is different depending on whether one considers it as the correlation of (Ẽ_{i|S_{ij}}, Ẽ_{i+j|S_{ij}}) or (E_{i|S_{ij}}, E_{i+j|S_{ij}}). For instance, ρ_{14;23} = Corr[Ẽ_{1|23}, Ẽ_{4|23}] can be interpreted as the correlation between Y_1 and Y_4 after each variable has been corrected for the linear influence of Y_{2:3}, i.e., Corr[g(Ẽ_{k|23}), h(Y_{2:3})] = 0 for all linear functions g and h. The idea of the partial copula is to replace the prediction errors E_{1|23} and E_{4|23} by the CPITs U_{1|23} and U_{4|23}, which are independent of Y_{2:3}. On the other hand, ρ_{14;23} = Corr[E_{1|23}, E_{4|23}] is the correlation of (E_{1|2}, E_{4|3}) after E_{1|2} has been corrected for the linear influence of E_{3|2}, and E_{4|3} has been corrected for the linear influence of E_{2|3}.
Consequently, a different generalization of the partial correlation coefficient emerges if we do not only decorrelate the involved random variables in (3.1) and (3.2) but render them independent by replacing each expression of the form X − P(X|Z) in (3.1) and (3.2) by the corresponding CPIT F X|Z (X|Z). The joint distribution of a resulting pair of random variables is given by the j-th order partial copula. The set of these copulas together with a vine structure constitute the partial vine copula.
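The equivalence of the one-shot and the tree-wise prediction errors can be checked numerically. The following sketch is our own illustration (an arbitrary four-dimensional Gaussian sample and in-sample least-squares projections as the linear predictors); it computes ρ_{14;23} both ways:

```python
import numpy as np

rng = np.random.default_rng(2)
# AR(1)-type correlation matrix for (Y1, Y2, Y3, Y4)
r = 0.5
cov = np.array([[r ** abs(i - j) for j in range(4)] for i in range(4)])
y = rng.multivariate_normal(np.zeros(4), cov, size=10_000)

def resid(target, predictors):
    """Error of the best (in-sample) linear prediction, intercept included."""
    X = np.column_stack([np.ones(len(target)), predictors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

# one-shot errors: remove the linear influence of (Y2, Y3) from Y1 and Y4
e1_23a = resid(y[:, 0], y[:, [1, 2]])
e4_23a = resid(y[:, 3], y[:, [1, 2]])
# tree-wise recursion (3.1)-(3.2): first-tree errors, then second-tree errors
e1_2, e3_2 = resid(y[:, 0], y[:, [1]]), resid(y[:, 2], y[:, [1]])
e2_3, e4_3 = resid(y[:, 1], y[:, [2]]), resid(y[:, 3], y[:, [2]])
e1_23b = resid(e1_2, e3_2[:, None])
e4_23b = resid(e4_3, e2_3[:, None])
rho_a = np.corrcoef(e1_23a, e4_23a)[0, 1]
rho_b = np.corrcoef(e1_23b, e4_23b)[0, 1]
```

Both routes give the same partial correlation (here close to zero, since Y1 and Y4 are conditionally independent given (Y2, Y3) under the assumed AR(1)-type Gaussian model), mirroring the identity E_{k|S_{ij}} = Ẽ_{k|S_{ij}}.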

Definition 3.1. (Partial vine copula (PVC) and j-th order partial copulas)
Consider the D-vine copula C_{1:d} ∈ C_d stated in Definition 2.2. In the first tree, we set for i = 1, . . . , d − 1:

C^PVC_{i,i+1} := C_{i,i+1},

and, with the convention U^PVC_{k|S_{i1}} := U_k, we recursively define for j = 2, . . . , d − 1 and i = 1, . . . , d − j:

U^PVC_{i|S_{ij}} := ∂_2 C^PVC_{i,i+j−1;S_{i(j−1)}}(U^PVC_{i|S_{i(j−1)}}, U^PVC_{i+j−1|S_{i(j−1)}}),
U^PVC_{i+j|S_{ij}} := ∂_1 C^PVC_{i+1,i+j;S_{(i+1)(j−1)}}(U^PVC_{i+1|S_{(i+1)(j−1)}}, U^PVC_{i+j|S_{(i+1)(j−1)}}),
C^PVC_{i,i+j;S_{ij}}(a, b) := P(U^PVC_{i|S_{ij}} ≤ a, U^PVC_{i+j|S_{ij}} ≤ b).

We call the resulting SVC C^PVC_{1:d} the partial vine copula (PVC) of C_{1:d}. Its density is given by

c^PVC_{1:d}(u_{1:d}) = ∏_{i=1}^{d−1} c_{i,i+1}(u_i, u_{i+1}) ∏_{j=2}^{d−1} ∏_{i=1}^{d−j} c^PVC_{i,i+j;S_{ij}}(u^PVC_{i|S_{ij}}, u^PVC_{i+j|S_{ij}}),

where the arguments u^PVC_{k|S_{ij}} are obtained by applying the above recursion to u_{1:d}. Note that the first-order partial copula coincides with the partial copula of a conditional distribution with one conditioning variable. If j ≥ 3, we call C^PVC_{i,i+j;S_{ij}} a higher-order partial copula. It is easy to show that, for all (i, j) ∈ I^d_2 and k = i, i + j, U^PVC_{k|S_{ij}} ∼ U(0, 1). Thus, PPITs are uniformly distributed and higher-order partial copulas are indeed copulas.
However, in general it is not true that U^PVC_{i|S_{ij}} ⊥ U_{S_{ij}}, as the following lemma clarifies.

Lemma 3.1. (Relation between PPITs and CPITs)
Lemma 3.1 expresses the PPIT U^PVC_{k|S_{ij}}, for (i, j) ∈ I^d_2 and k = i, i + j, in terms of the corresponding CPITs. Consequently, if a higher-order partial copula does not coincide with the partial copula, it describes the distribution of a pair of uniformly distributed random variables which are neither jointly nor both individually independent of the conditioning variables of the corresponding conditional copula. If the simplifying assumption holds, then C_{1:d} = C^PVC_{1:d}, i.e., higher-order partial copulas, partial copulas and conditional copulas coincide. This insight is used by Kurz and Spanhel [24] to develop tests for the simplifying assumption in high-dimensional vine copulas. The (j − 1)-th order partial copula C^PVC_{i,i+j;S_{ij}} is determined by C_{i,i+j;S_{ij}}, F_{i|S_{ij}}, F_{i+j|S_{ij}} and F_{S_{ij}}, i.e., it depends on C_{i:i+j}. Moreover, C^PVC_{i,i+j;S_{ij}} also depends on the PPITs U^PVC_{i|S_{ij}} and U^PVC_{i+j|S_{ij}}, which are determined by the regular vine structure. Thus, the corresponding PVCs of different regular vines may be different. In particular, if the simplifying assumption does not hold, higher-order partial copulas of different PVCs which refer to the same conditional distribution may not be identical. This is different from the partial correlation coefficient or the partial copula, which do not depend on the structure of the regular vine.
In general, higher-order partial copulas do not share the simple interpretation of the partial copula because they cannot be considered as expected conditional copulas. However, higher-order partial copulas can be more attractive from a practical point of view. The estimation of the partial copula of C_{i,i+j;S_{ij}} requires the estimation of the two j-dimensional conditional cdfs F_{i|S_{ij}} and F_{i+j|S_{ij}} to construct pseudo-observations from the CPITs (U_{i|S_{ij}}, U_{i+j|S_{ij}}). As a result, a non-parametric estimation of the partial copula is only sensible if j is very small. In contrast, a higher-order partial copula is the distribution of two PPITs (U^PVC_{i|S_{ij}}, U^PVC_{i+j|S_{ij}}) which are made up of only two-dimensional functions (Definition 3.1). Thus, the non-parametric estimation of a higher-order partial copula does not suffer from the curse of dimensionality and is also sensible for large j [27]. In a parametric framework, too, the specification of the model is much easier for PPITs than for CPITs. This renders higher-order partial copulas very attractive from a modeling point of view. As we show in Section 6, the PVC is also the probability limit of many estimators of pair-copula constructions and thus of great practical importance.
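To see that PPITs involve only bivariate building blocks, the following sketch (our own illustration; the Gaussian pair-copulas and all parameters are assumptions) computes the pseudo-observations for the third tree of a four-dimensional D-vine by composing one bivariate h-function per edge of the lower trees:

```python
import numpy as np
from scipy.stats import norm

def h(u, v, rho):
    """h-function (partial derivative) of the bivariate Gaussian copula."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1.0 - rho**2))

def third_tree_ppits(u, rho_t1, rho_t2):
    """PPITs (U^PVC_{1|2:3}, U^PVC_{4|2:3}) from tree-1 parameters
    (rho12, rho23, rho34) and tree-2 parameters (rho13;2, rho24;3)."""
    u1, u2, u3, u4 = u.T
    # tree 1: CPITs, one bivariate h-function per edge
    u1_2, u3_2 = h(u1, u2, rho_t1[0]), h(u3, u2, rho_t1[1])
    u2_3, u4_3 = h(u2, u3, rho_t1[1]), h(u4, u3, rho_t1[2])
    # tree 2: second-order PPITs, again only bivariate h-functions
    return h(u1_2, u3_2, rho_t2[0]), h(u4_3, u2_3, rho_t2[1])

u = np.random.default_rng(5).random((1_000, 4))
p1, p4 = third_tree_ppits(u, (0.5, 0.4, 0.3), (0.2, 0.1))
```

No j-dimensional conditional cdf is ever estimated or evaluated; this is the reason the non-parametric estimation of higher-order partial copulas escapes the curse of dimensionality.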

Properties of the partial vine copula and examples
In this section, we analyze to what extent the PVC describes the dependence structure of the data generating copula if the simplifying assumption does not hold. We first investigate whether the bivariate margins of C^PVC_{1:d} match the bivariate margins of C_{1:d} and then take a closer look at conditional independence relations. By construction, the bivariate margins C^PVC_{i,i+1} of the PVC given in Definition 3.1 are identical to the corresponding margins C_{i,i+1}, for i = 1, . . . , d − 1. That is because the PVC explicitly specifies these d − 1 margins in the first tree of the vine. The other bivariate margins C^PVC_{i,i+j}, where (i, j) ∈ I^d_2, are implicitly specified. The relation between the implicitly specified bivariate margins of the PVC and the underlying copula is summarized in the following lemma.

Lemma 4.1. (Implicitly specified margins of the PVC)
The next example provides a three-dimensional PVC and illustrates the results of Lemma 4.1. Other examples of PVCs in three dimensions are given in Spanhel and Kurz [35].
Example 4.1. Let C^A(γ) denote the asymmetric version of the FGM copula given in [28], Example 3.16. Elementary computations show that the implicit margin C^PVC_{13} of the corresponding PVC C^PVC_{1:3} is a copula with quartic sections in u_1 and quadratic sections in u_3.

Higher-order partial copulas can also be used to construct new measures of conditional dependence. For instance, if X_{1:d} is a random vector with copula C_{1:d} ∈ C_d, higher-order partial Spearman's ρ and Kendall's τ of X_i and X_{i+j} given X_{S_{ij}} can be defined in terms of C^PVC_{i,i+j;S_{ij}}. Note that all dependence measures that are derived from a higher-order partial copula are defined w.r.t. a regular vine structure. They coincide with their conditional analogues if the simplifying assumption holds. A partial correlation coefficient of zero is commonly interpreted as an indication of conditional independence, although this can be quite misleading if the underlying distribution is not close to a normal distribution (Spanhel and Kurz [35]). Therefore, one might wonder to what extent higher-order partial copulas can be used to check for conditional independencies. If C^PVC_{i,i+j;S_{ij}} equals the independence copula, we say that X_i and X_{i+j} are (j-th order) partially independent given X_{S_{ij}}. The following theorem establishes that there is in general no relation between conditional independence and higher-order partial independence.

Theorem 4.1. (Conditional independence and j-th order partial independence)
The next five-dimensional example illustrates higher-order partial copulas, PPITs, and the relation between partial independence and conditional independence.

Example 4.2.
Consider the following exchangeable D-vine copula C_{1:5} which does not satisfy the simplifying assumption:

Tree 1: …
Tree 2: C_{13;2}(a, b|c) = C_{24;3}(a, b|c) = C_{35;4}(a, b|c) = …
Tree 3: …
Tree 4: …

The left panel of Figure 2 illustrates the D-vine copula of the data generating process. All conditional copulas of the vine copula in Example 4.2 correspond to the independence copula except for those in the second tree.² We now investigate the PVC of C_{1:5}, which is illustrated in the right panel of Figure 2. Since C_{1:5} and C^PVC_{1:5} are exchangeable copulas, we only report the PPITs U^PVC_{1|2}, U^PVC_{1|2:3} and U^PVC_{1|2:4} in the following lemma.

² The three-dimensional FGM copula is defined as …

Lemma 4.2 demonstrates that j-th order partial copulas may not be independence copulas, although the corresponding conditional copulas are independence copulas. In particular, under the data generating process the edges of the third tree of C_{1:5} are independence copulas. Neglecting the conditional copulas in the second tree and replacing them with first-order partial copulas induces spurious dependencies in the third tree of C^PVC_{1:5}. The introduced spurious dependence also carries over to the fourth tree, where we in fact have (conditional) independence. Nevertheless, the bivariate margins of C^PVC_{1:5} match the bivariate margins of C_{1:5} in Example 4.2. Moreover, the mutual information in the third and fourth tree is larger if higher-order partial copulas are used instead of the true conditional copulas. Thus, the spurious dependence in the third and fourth tree decreases the Kullback-Leibler divergence from C_{1:5} and therefore acts as a countermeasure for the spurious (conditional) independence in the second tree. Lemma 4.2 also reveals that U_{1|2:4} is a function of U_2 and U_3, i.e., the true conditional distribution function F_{1|2:4} depends on u_2 and u_3.
In contrast, F^PVC_{1|2:4}, the model for F_{1|2:4} that is implied by the PVC, depends only on u_4. That is, the implied conditional distribution function of the PVC depends only on the conditioning variable that actually has no effect.

Approximations based on the partial vine copula
The specification and estimation of SVCs is commonly based on procedures that asymptotically minimize the Kullback-Leibler divergence (KLD) in a stepwise fashion. For instance, if a parametric vine copula model is used, the step-by-step ML estimator (Hobaek Haff [12,13]) is often employed in order to select and estimate the parametric pair-copula families of the vine. In this case, one estimates tree after tree and sequentially minimizes the estimated KLD conditional on the estimates from the previous trees. But also the non-parametric methods of Kauermann and Schellhase [20] and Nagler and Czado [27] proceed in a stepwise manner and asymptotically minimize the KLD of each pair-copula separately under appropriate conditions. In this section, we investigate the role of the PVC when it comes to approximating non-simplified vine copulas.

Tree-by-tree KLD minimization
Let C_{1:d} ∈ C_d and C^SVC_{1:d} ∈ C^SVC_d. The KLD of C^SVC_{1:d} from the true copula C_{1:d} is given by

D_KL(C_{1:d} || C^SVC_{1:d}) = E[log c_{1:d}(U_{1:d}) − log c^SVC_{1:d}(U_{1:d})],

where the expectation is taken w.r.t. the true distribution C_{1:d}. We now decompose the KLD into the Kullback-Leibler divergences related to each of the d − 1 trees. For this purpose, let j = 1, . . . , d − 1 and let T_j denote the set of possible specifications of tree j, so that T_{1:j} = ×_{k=1}^j T_k represents all possible SVCs up to and including the j-th tree. Let T_j ∈ T_j and T_{1:j−1} ∈ T_{1:j−1}. The KLD of the SVC associated with T_{1:d−1} can be written as

D_KL = D^(1)_KL(T_1) + Σ_{j=2}^{d−1} D^(j)_KL(T_j(T_{1:j−1})),

where D^(1)_KL(T_1) denotes the KLD related to the first tree and D^(j)_KL(T_j(T_{1:j−1})) denotes the KLD related to tree j = 2, . . . , d − 1.
For instance, if d = 3, the KLD can be decomposed into the KLD related to the first tree, D^(1)_KL(T_1), and the KLD related to the second tree, D^(2)_KL(T_2(T_1)), as follows: D_KL = D^(1)_KL(T_1) + D^(2)_KL(T_2(T_1)).
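For d = 3 the tree-wise terms can be evaluated by Monte Carlo. The sketch below is our own illustration with Gaussian pair-copulas (a case in which the simplifying assumption holds, so the true density factorizes exactly); it estimates both terms for an SVC whose first-tree parameter is deliberately misspecified:

```python
import numpy as np
from scipy.stats import norm

def log_c(u, v, rho):
    """Log-density of the bivariate Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1.0 - rho**2)
            + (2.0 * rho * x * y - rho**2 * (x**2 + y**2)) / (2.0 * (1.0 - rho**2)))

def h(u, v, rho):
    """Gaussian h-function (CPIT)."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1.0 - rho**2))

rng = np.random.default_rng(3)
r12, r23, r13 = 0.6, 0.5, 0.55
r13_2 = (r13 - r12 * r23) / np.sqrt((1.0 - r12**2) * (1.0 - r23**2))
cov = np.array([[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]])
u = norm.cdf(rng.multivariate_normal(np.zeros(3), cov, size=200_000))
u1, u2, u3 = u.T

s12, s23, s13_2 = 0.4, r23, r13_2   # candidate SVC with a misspecified first tree
# KLD related to tree 1 and tree 2 (expectations under the true copula)
d1 = np.mean(log_c(u1, u2, r12) + log_c(u2, u3, r23)
             - log_c(u1, u2, s12) - log_c(u2, u3, s23))
d2 = np.mean(log_c(h(u1, u2, r12), h(u3, u2, r23), r13_2)
             - log_c(h(u1, u2, s12), h(u3, u2, s23), s13_2))
d_total = d1 + d2
```

Note that the tree-2 term is evaluated at pseudo-observations computed from the candidate's own first tree, which is exactly the dependence of D^(2)_KL on T_1 discussed below.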
Note that the KLD related to tree j depends on the specified copulas in the lower trees because they determine at which values the copulas in tree j are evaluated. The following theorem shows that, if one sequentially minimizes the KLD related to each tree, then the optimal SVC is the PVC. According to Theorem 5.1, if the true copulas are specified in the first tree, one should choose the first-order partial copulas in the second tree. Moreover, the second-order partial copulas should then be specified in the third tree, and so on, to minimize the KLD tree-by-tree. Theorem 5.1 also remains true if we replace C_2 in the definition of T_j by the space of absolutely continuous bivariate cdfs. The PVC ensures that the random variables entering higher trees are uniformly distributed, since they are PPITs. If one uses a different approximation, such as the one used by Hobaek Haff, Aas and Frigessi [14] and Stöber, Joe and Czado [37], then the random variables in higher trees are not necessarily uniformly distributed and pseudo-copulas (Fermanian and Wegkamp [7]) can be used to further minimize the KLD.

Global KLD minimization
The previous sequential minimization neglects that the KLD related to a tree depends on the copulas that are specified in the former trees. For instance, if d = 3, the KLD of the first tree, D^(1)_KL(T_1), is minimized over the copulas (C^SVC_12, C^SVC_23) in the first tree T_1. However, the effect of the chosen copulas in the first tree T_1 on the KLD related to the second tree, D^(2)_KL(T_2(T_1)), is not taken into account. Therefore, we now analyze whether the PVC also globally minimizes the KLD. Note that specifying the wrong margins in the first tree T_1, e.g., (C^SVC_12, C^SVC_23) ≠ (C_12, C_23), increases D^(1)_KL(T_1) in any case. Thus, without any further investigation, it is a priori indeterminate whether the definite increase in D^(1)_KL(T_1) can be overcompensated by a possible decrease in D^(2)_KL(T_2(T_1)) if another approximation is chosen. The next theorem shows that the PVC is in general not the global minimizer of the KLD. Theorem 5.2 states that, if the simplifying assumption does not hold, the KLD may not be minimized by choosing the true copulas in the first tree, first-order partial copulas in the second tree and higher-order partial copulas in the remaining trees (see (5.4)). It follows that, if the objective is the minimization of the KLD, it may not be optimal to specify the true copulas in the first tree, no matter what bivariate copulas are specified in the other trees (see (5.5)). This rather puzzling result can be explained by the fact that, if the simplifying assumption does not hold, then the approximation error of the implicitly modeled bivariate margins is not minimized (see Lemma 4.1). For instance, if d = 3, a departure from the true copulas (C_12, C_23) in the first tree increases the KLD related to the first tree, but it can decrease the KLD of the implicitly modeled margin C^SVC_13 from C_13.
As a result, the increase in D^(1)_KL can be overcompensated by a larger decrease in D^(2)_KL, so that the KLD can be decreased. It is an open problem whether and when the PVC can be the global minimizer of the KLD. Unfortunately, the SVC approximation that globally minimizes the KLD is not tractable. However, if the SVC approximation that minimizes the KLD does not specify the true copulas in the first tree, the random variables in higher trees are not CPITs. Thus, it is not guaranteed that these random variables are uniformly distributed, and we could further decrease the KLD by assigning pseudo-copulas (Fermanian and Wegkamp [7]) to the edges in the higher trees. It can easily be shown that the resulting best approximation is then a pseudo-copula. Consequently, the best approximation satisfying the simplifying assumption is in general not a SVC but a simplified vine pseudo-copula if one considers the space of regular vines where each edge corresponds to a bivariate cdf.
While the PVC may not be the best approximation in the space of SVCs, it is often the best feasible SVC approximation in practical applications. That is because the stepwise specification and estimation of a SVC is feasible for (very) large dimensions, which is not true for a joint specification and estimation. For instance, assume all pair-copula families of a parametric vine copula are chosen simultaneously and the selection is done by means of information criteria. In this case, we have to estimate K^{d(d−1)/2} different models, where d is the dimension and K the number of possible pair-copula families that can be assigned to each edge. On the contrary, a stepwise procedure only requires the estimation of Kd(d − 1)/2 models. To illustrate the computational burden, consider the R package VineCopula [34], where K = 40. For this number of pair-copula families, a joint specification requires the estimation of 64,000 (d = 3) or more than four billion (d = 4) models, whereas only 120 (d = 3) or 240 (d = 4) models are needed for a stepwise specification. For many non-parametric estimation approaches (kernels [27], empirical distributions [15]), only the sequential estimation of a SVC is possible. The only exception is the spline-based approach of Kauermann and Schellhase [20]. However, due to the large number of parameters and the resulting computational burden, a joint estimation is only feasible for d ≤ 5 [19].
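The counts quoted above follow directly from combinatorics: a joint selection compares K^{d(d−1)/2} family combinations, while a stepwise selection requires only K·d(d−1)/2 fits. A quick check (illustrative only):

```python
def n_joint(K, d):
    """Models to compare when all d(d-1)/2 pair-copula families are chosen jointly."""
    return K ** (d * (d - 1) // 2)

def n_stepwise(K, d):
    """Models to estimate when families are chosen edge by edge."""
    return K * d * (d - 1) // 2

# K = 40 families, as in the VineCopula example from the text
print(n_joint(40, 3), n_joint(40, 4), n_stepwise(40, 3), n_stepwise(40, 4))
# 64000 4096000000 120 240
```

The exponential growth of the joint count is what makes only the stepwise route, and hence the PVC, feasible in practice.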

Convergence to the partial vine copula
If the data generating process satisfies the simplifying assumption, consistent stepwise procedures for the specification and estimation of parametric and non-parametric SVC models asymptotically minimize the KLD from the true copula. Theorem 5.1 implies that this is not true in general if the data generating process does not satisfy the simplifying assumption. An implication of this result for the application of SVCs is pointed out in the next corollary. Let θ̂_S denote the (semi-parametric) step-by-step ML estimator and θ̂_J the (semi-parametric) joint ML estimator defined in Hobaek Haff [12,13]. Under regularity conditions (e.g., Condition 1 and Condition 2 in [36]) and for N → ∞, it holds that:

Step-by-step and joint ML estimates: Theory
Corollary 6.1 shows that the step-by-step and joint ML estimators may not converge to the same limit (in probability) if the simplifying assumption does not hold for the data generating vine copula. For this reason, in the following we investigate the difference between the step-by-step and joint ML estimators in finite samples. Note that the convergence of kernel-density estimators to the PVC has recently been established by Nagler and Czado [27]. However, in this case only a sequential estimation of a SVC is possible, and thus the best feasible approximation in the space of SVCs is given by the PVC.

Step-by-step and joint ML estimates: Simulation study
We compare the step-by-step and the joint ML estimator under the assumption that the pair-copula families of the PVC are specified for the parametric vine copula model. For this purpose, we simulate data from two three-dimensional copulas C_{1:3} with sample sizes N = 500, 2500, 25000, perform a step-by-step and a joint ML estimation, and repeat this 1000 times. For ease of exposition, and because the qualitative results do not differ, we consider copulas where C_{12} = C_{23} and only present the estimates for (θ_{12}, θ_{13;2}).
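To make the setup concrete, the following Python sketch illustrates how one can sample from a three-dimensional simplified D-vine with Frank pair-copulas by conditional inversion. It is a minimal illustration under our own assumptions, not the authors' simulation code, and it uses an ordinary Frank copula in the second tree as a stand-in for the partial Frank copula C^{P-Fr}:

```python
import numpy as np

def frank_h(u, v, theta):
    """Conditional CDF h(u | v) = dC(u, v)/dv of the bivariate Frank copula."""
    num = np.expm1(-theta * u) * np.exp(-theta * v)
    den = np.expm1(-theta) + np.expm1(-theta * u) * np.expm1(-theta * v)
    return num / den

def frank_h_inv(w, v, theta):
    """Inverse of h(. | v), used for conditional inversion sampling."""
    a = w * np.expm1(-theta) / (np.exp(-theta * v) - w * np.expm1(-theta * v))
    return -np.log1p(a) / theta

def sample_dvine3(n, th12, th23, th13_2, rng):
    """Sample from a simplified 3-dim D-vine (trees 12, 23; 13|2),
    all pair-copulas Frank."""
    w = rng.random((n, 3))
    u1 = w[:, 0]
    u2 = frank_h_inv(w[:, 1], u1, th12)
    # tree 2: invert h_{13;2}, conditioning on the CPIT of u1 w.r.t. u2
    v = frank_h_inv(w[:, 2], frank_h(u1, u2, th12), th13_2)
    u3 = frank_h_inv(v, u2, th23)
    return np.column_stack([u1, u2, u3])

rng = np.random.default_rng(1)
U = sample_dvine3(2000, 5.74, 5.74, 5.74, rng)  # theta = 5.74, tau ~ 0.5
```

A step-by-step fit would then estimate θ_{12} and θ_{23} from the pairs (U_1, U_2) and (U_2, U_3) first, and θ_{13;2} from the resulting CPITs, whereas a joint fit maximizes the full vine likelihood over all three parameters at once.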

Example 6.1. (PVC of the Frank copula)
Let C^{Fr}(θ) denote the bivariate Frank copula with dependence parameter θ and C^{P-Fr}(θ) the partial Frank copula [35] with dependence parameter θ. Let C_{1:3} be the true copula with (C_{12}, C_{23}, C_{13;2}) = (C^{Fr}(5.74), C^{Fr}(5.74), C^{P-Fr}(5.74)), i.e., C_{1:3} = C^{PVC}_{1:3}, and let C^{SVC}_{1:3}(θ) = (C^{Fr}(θ_{12}), C^{Fr}(θ_{23}), C^{P-Fr}(θ_{13;2})) be the parametric SVC that is fitted to data generated from C_{1:3}. Example 6.1 presents a data generating process that satisfies the simplifying assumption, implying θ^{PVC} = θ. It is the PVC of the three-dimensional Frank copula with Kendall's τ approximately equal to 0.5. Figure 3 shows the corresponding box plots of the joint and step-by-step ML estimates and of their difference. The left panel confirms the results of Hobaek Haff [12,13]: although the joint ML estimator is more efficient, the loss in efficiency of the step-by-step ML estimator is negligible and both estimators converge to the true parameter value. Moreover, the right panel of Figure 3 shows that the difference between the joint and step-by-step ML estimates is never statistically significant at the 5% level. Since the computational time for a step-by-step ML estimation is much lower than for a joint ML estimation [12], the step-by-step ML estimator is very attractive for estimating high-dimensional vine copulas that satisfy the simplifying assumption. Moreover, the step-by-step ML estimator is then inherently suited for selecting the pair-copula families in a stepwise manner. However, if the simplifying assumption does not hold for the data generating vine copula, the step-by-step and joint ML estimators can converge to different limits (Corollary 6.1), as the next example demonstrates.

Example 6.2. (Frank copula)
Let C_{1:3} be the three-dimensional Frank copula with dependence parameter θ = 5.74, i.e., C_{1:3} ≠ C^{PVC}_{1:3}, and let C^{SVC}_{1:3}(θ) = (C^{Fr}(θ_{12}), C^{Fr}(θ_{23}), C^{P-Fr}(θ_{13;2})) be the parametric SVC that is fitted to data generated from C_{1:3}. Example 6.2 is identical to Example 6.1, with the only difference that the conditional copula varies in such a way that the resulting three-dimensional copula is a Frank copula. Although the Frank copula does not satisfy the simplifying assumption, it is rather close to a copula for which the simplifying assumption holds, because the variation in the conditional copula is strongly limited for many Archimedean copulas (Mesfioui and Quessy [26]). Nevertheless, the right panel of Figure 4 shows that the step-by-step and joint ML estimates for θ_{12} are significantly different at the 5% level if the sample size is 2500 observations. The difference between the step-by-step and joint ML estimates for θ_{13;2} is less pronounced, but also highly significant for sample sizes of 2500 observations or more. Thus, in Example 6.2 the step-by-step ML estimator is not a consistent estimator of the SVC model that minimizes the KLD from the underlying copula, whereas the joint ML estimator is. A third example, in which the distance between the data generating copula and the PVC, and thus the difference between the step-by-step and joint ML estimates, is more pronounced, is given in Appendix A.9.

Conclusion
We introduced the partial vine copula (PVC) which is a particular simplified vine copula (SVC) that coincides with the data generating copula if the simplifying assumption holds. The PVC can be regarded as a generalization of the partial correlation matrix where partial correlations are replaced by j-th order partial copulas. While a higher-order partial copula of the PVC is related to the partial copula, it does not suffer from the curse of dimensionality and can be estimated for high-dimensional data [27]. We analyzed to what extent the dependence structure of the underlying distribution is reproduced by the PVC. In particular, we showed that a pair of random variables may be considered as conditionally (in)dependent according to the PVC although this is not the case for the data generating process.
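For intuition about the analogy drawn above, recall the classical first-order partial correlation that the PVC generalizes; a minimal Python sketch (the function name is ours):

```python
import math

def partial_corr(r13, r12, r23):
    """First-order partial correlation rho_{13;2} computed from the
    pairwise correlations, by the standard recursion
    rho_{13;2} = (rho_13 - rho_12 rho_23) / sqrt((1 - rho_12^2)(1 - rho_23^2))."""
    return (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))

# If rho_13 happens to equal rho_12 * rho_23, the partial correlation is zero,
# although variables 1 and 3 need not be conditionally independent given 2 --
# mirroring the caveat about (in)dependence according to the PVC.
print(partial_corr(0.36, 0.6, 0.6))
```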
We also revealed the importance of the PVC for the modeling of high-dimensional distributions by means of SVCs. Up to now, the estimation of vine copulas has almost always been based on the assumption that the data generating process satisfies the simplifying assumption, and the implications that follow if the simplifying assumption does not hold have not been investigated. We showed that the PVC is the SVC approximation that minimizes the Kullback-Leibler divergence in a stepwise fashion. Since almost all estimators of SVCs proceed sequentially, it follows that many estimators of SVCs converge to the PVC regardless of whether the simplifying assumption holds. However, we also proved that the PVC may not minimize the Kullback-Leibler divergence from the true copula and thus may not be the best SVC approximation in theory. Nevertheless, due to the prohibitive computational burden, or simply because only a stepwise model specification and estimation is possible, the PVC is often the best feasible SVC approximation in practice.
The analysis in this paper showed the relative optimality of the PVC when it comes to approximating multivariate copulas by SVCs. Obviously, it is easy to construct (theoretical) examples where the PVC does not provide a good approximation in absolute terms. But such examples do not provide any information about the appropriateness of the simplifying assumption in practice. To investigate whether the PVC is a good approximation in applications, one can use Lemma 3.1 to develop tests for the simplifying assumption, see Kurz and Spanhel [24]. Moreover, even in cases where the simplifying assumption is strongly violated, an estimator of the PVC can yield an approximation that is superior to competing approaches. Recently, it has been demonstrated in Nagler and Czado [27] that the PVC can be used to obtain a constrained kernel-density estimator that outperforms unconstrained kernel-density estimators.

A.3. Proof of Theorem 4.1
W.l.o.g. assume that the margins of X_{1:d} are uniform. Let C^{FGM3}(u_{1:3}) [...] and let i ∈ {1, . . . , d − 2} be fixed. Assume that C_{1:d} has the following non-simplified D-vine copula representation: [...] This proves that C_{i,i+2;i+1} = C_⊥ ⇐ C^{PVC}_{i,i+2;i+1} = C_⊥ is not true in general and that, for j ≥ 3, neither the statement C_{i,i+j;[...]} [...]
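For reference, and keeping the paper's notation, a three-dimensional non-simplified D-vine density factorizes as below, where the conditional copula density c_{13;2} may depend on the conditioning value u_2; the simplifying assumption states precisely that it does not:

```latex
c_{1:3}(u_{1:3})
  = c_{12}(u_1, u_2)\, c_{23}(u_2, u_3)\,
    c_{13;2}\bigl(C_{1|2}(u_1 \mid u_2),\, C_{3|2}(u_3 \mid u_2);\, u_2\bigr).
```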

A.4. Proof of Lemma 4.2
We show a more general result and set [...] in ([...]). For i = 1, 2, 3, the copula in the second tree of the PVC is given by [...] and, by symmetry, [...]. For i = 1, 2, the joint distribution of these first-order PPITs is a copula in the third tree of the PVC, which is given by [...], where θ := 4(∫_{[0,1]} u g(u) du)^2 > 0 by the properties of g. Thus, a copula in the third tree of the PVC is a bivariate FGM copula, whereas the true conditional copula is the independence copula. The CPITs of U_1 or U_5 w.r.t. [...] are given by [...], whereas the corresponding second-order PPITs are given by [...]. For the copula in the fourth tree of the PVC it holds that [...] = 0. By setting γ := −2∫_{[0,1]} u g(u) du, we can write the copula function as [...]; the quantile function is given by (cf. Remillard [32]) [...]. Evaluating the density shows that C^{PVC}_{15;2:4} is not the independence copula.

A.5. Proof of Theorem 5.1
The KLD related to tree j, D_KL(T_j(T_{1:j−1})), is minimized when the negative cross entropy related to tree j is maximized. The negative cross entropy related to tree j is given by [...]. To minimize the KLD related to tree j + 1 =: n w.r.t. T_n, conditional on T_{1:n−1} = T^{PVC}_{1:n−1}, we have to maximize the negative cross entropy [...], which is maximized if [...] is maximized for all i = 1, . . . , d − n. Using the substitutions [...], we obtain [...].

Equation (5.3) is obvious, since C^{PVC}_{1:d} is the data generating process. Equation (5.5) immediately follows from equations (5.1) and (5.4). Using the same arguments as in Appendix A.2, the validity of (5.4) for d = 3 implies its validity for d ≥ 3. However, even for d = 3, the KLD is a triple integral that has no analytical expression if the data generating process is a non-simplified vine copula. Thus, the hard part is to show that there exists a data generating copula which does not satisfy the simplifying assumption and for which the PVC does not minimize the KLD. We prove equation (5.4) for d = 3 by means of the following example.
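For d = 3 and a SVC approximation C^{SVC}_{1:3}, the KLD in question is the triple integral (notation as in the main text):

```latex
D_{\mathrm{KL}}\bigl(C_{1:3} \,\|\, C^{\mathrm{SVC}}_{1:3}\bigr)
  = \int_{(0,1)^3} c_{1:3}(u_{1:3})
    \log \frac{c_{1:3}(u_{1:3})}{c^{\mathrm{SVC}}_{1:3}(u_{1:3})}
    \,\mathrm{d}u_{1:3},
```

which has to be evaluated numerically whenever c_{1:3} is a non-simplified vine density.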
We now derive necessary and sufficient conditions such that [...]. Whether the condition in Lemma A.1 is satisfied, i.e., whether D_KL(C_{1:3} || C^{SVC}_{12}(0), C_{23}, C^{PVC}_{13;2}) is an extremum, depends on the data generating process, as we illustrate in the following. If θ^{PVC}_{13;2} = 0, then K(u_1; θ^{PVC}_{13;2}) = 0 for all u_1 ∈ (0, 1); if g does not depend on u_2, then h(u_1; g) = 0 for all u_1 ∈ (0, 1). In both cases the integrand in (A.16) is zero and we have an extremum. Assuming that θ^{PVC}_{13;2} ≠ 0 and that g depends on u_2, we see from (A.16) that g and C^{SVC}_{12} determine whether we have an extremum at θ_{12} = 0. Depending on the copula family that is chosen for C^{SVC}_{12}, it may be possible that the copula family alone determines whether D_KL(C_{1:3} || C^{SVC}_{12}(0), C_{23}, C^{PVC}_{13;2}) is an extremum. For instance, if C^{SVC}_{12} is a FGM copula, we obtain [...]. This symmetry of h across 0.5 implies that (A.16) is satisfied for all functions g.
If we do not impose any constraints on the bivariate copulas in the first tree of the SVC approximation, then D_KL(C_{1:3} || C^{SVC}_{12}(0), C_{23}, C^{PVC}_{13;2}) may not even be a local minimum of the KLD. For instance, if C^{SVC}_{12} is the asymmetric FGM copula given in (4.1), we find that [...]. If Λ := ∫_0^1 (1 − 2u_2) g(u_2) du_2 ≠ 0, e.g., if g is a non-negative increasing function, say g(u_2) = u_2, then, depending on the sign of Λ, either h(0.5 + u_1; g) > h(0.5 − u_1; g) for all u_1 ∈ (0, 0.5), or h(0.5 + u_1; g) < h(0.5 − u_1; g) for all u_1 ∈ (0, 0.5), so that the integrand in (A.16) is either strictly positive or strictly negative and thus D_KL(C_{1:3} || C_{12}, C_{23}, C^{PVC}_{13;2}) cannot be an extremum at θ_{12} = 0. Since θ_{12} ∈ [−1, 1], it follows that D_KL(C_{1:3} || C^{SVC}_{12}(0), C_{23}, C^{PVC}_{13;2}) is not a local minimum. As a result, relative to the PVC, we can further decrease the KLD from the true copula if we adequately specify "wrong" copulas in the first tree and choose the first-order partial copula in the second tree of the SVC approximation.

A.7. Proof of Lemma A.1
The KLD attains an extremum if and only if the negative cross entropy attains an extremum. The negative cross entropy is given by [...], where ∂_1 c^{SVC}_{13;2}(u, v; θ^{PVC}_{13;2}) is the partial derivative w.r.t. u, and we have used Leibniz's integral rule to differentiate under the integral sign in the second-to-last equality, which is valid since the integrand and its partial derivative w.r.t. θ_{12} are both continuous in u_{1:3} and θ_{12} on (0, 1)^3 × (−1, 1).
To compute the integral, we observe that [...]. Note that if θ^{PVC}_{13;2} = 0, then K(u_1; θ^{PVC}_{13;2}) = 0 for all u_1 ∈ (0, 1), and if g does not depend on u_2, then h(u_1; g) = 0 for all u_1 ∈ (0, 1); in both cases the integrand is zero and we have an extremum.

A.8. Proof of Corollary 6.1
Corollary 6.1 (i) and (ii) follow directly from Theorem 1 in Spanhel and Kurz [36], which states the asymptotic distribution of approximate rank Z-estimators if the data generating process is not nested in the parametric model family. Corollary 6.1 (iii) follows then from Theorem 5.2 and Theorem 5.1.

A.9. An example where the difference between θ̂_S and θ̂_J is more pronounced
Example A.2.
Note that g is a sigmoid function with (g(0), g(1)) = (−0.2, √7/5), so that Spearman's rho of the conditional copula C^{Sar}(g(u_2)) varies in the interval (−0.2, √7/5), because ρ_{C^{Sar}(α)} = α. Figure 5 shows that the difference between the step-by-step and joint ML estimates for the two parameters of the first copula in the first tree is already (individually) significant at the 5% level if the sample size is 500 observations. Thus, the difference between step-by-step and joint ML estimates can be relevant for moderate sample sizes if the variation in the conditional copula is strong enough. Once again, the difference between the step-by-step and joint ML estimates is less pronounced for the parameters of C^{SVC}_{13;2}, but it also becomes highly significant with sufficient sample size.