Stationary vine copula models for multivariate time series

Multivariate time series exhibit two types of dependence: across variables and across time points. Vine copulas are graphical dependence models that can conveniently capture both types in a single model. We derive the maximal class of graph structures that guarantees stationarity under a natural and verifiable condition called translation invariance. We propose computationally efficient methods for estimation, simulation, prediction, and uncertainty quantification, and establish their validity through asymptotic results and simulations. The theoretical results allow for misspecified models and, even when specialized to the iid case, go beyond what is available in the literature. Their proofs are based on new results for general semiparametric method-of-moment estimators, which should be of independent interest. The new model class is illustrated by an application to forecasting returns of a portfolio of 20 stocks, where it shows excellent forecast performance. The paper is accompanied by an open-source software implementation.


Introduction
In multivariate time series there are two types of dependence: cross-sectional and serial. The first is dependence between variables at a fixed point in time. The second is dependence of two random vectors at different points in time. Copulas are general dependence models and have been used for both types. One line of research considers copula models for serial dependence in univariate Markov processes (including Darsow et al., 1992, Chen and Fan, 2006b, Chen et al., 2009, Beare, 2010). An orthogonal, but equally popular approach is to filter serial dependence by univariate time series models, like the ARMA-GARCH family, and model the cross-sectional dependence by a copula for the residuals (Patton, 2006, Hu, 2006, Chen and Fan, 2006a, Oh and Patton, 2017, Chen et al., 2021); see also Patton (2009, 2012) and Aas (2016) for surveys in the context of financial and economic time series.
Copulas can be used to capture both types of dependence in a single model (e.g., Rémillard et al., 2012, Simard and Rémillard, 2015). In this context, vine copulas (Bedford and Cooke, 2002, Aas et al., 2009) have proven particularly useful. Vine copulas are graphical models that build a d-dimensional dependence structure from two-dimensional building blocks, called pair-copulas.
The underlying graph structure consists of a nested sequence of trees, called a vine. Each edge is associated with a pair-copula, and each pair-copula encodes the (conditional) dependence between a pair of variables. Smith (2015), Beare and Seo (2015), and Brechmann and Czado (2015) proposed different vine structures suitable for time series models. The three models are quite similar. The vine graphs start with copies of a cross-sectional tree that connects variables observed at the same point in time. These trees are constrained to be either stars (Brechmann and Czado, 2015) or paths (Smith, 2015, Beare and Seo, 2015). Cross-sectional trees are then linked by a specific building plan. Inspired by these three works, we propose more flexible vine models for stationary time series (Section 3). But we approach the problem from the opposite direction. Previous models aim to guarantee stationarity of the model through a condition called translation invariance: pair-copulas stay the same when the corresponding random variables are shifted in time.
Translation invariance is a necessary condition for stationarity and the only practicable condition to check. We derive a characterization of the class of vines for which translation invariance is also sufficient for stationarity (Theorems 1 and 2). The class allows for general vine structures for cross-sectional dependence and leaves flexibility for linking them across time. The class includes the D-vine and M-vine models of Smith (2015) and Beare and Seo (2015) as special cases, but not the COPAR model of Brechmann and Czado (2015). Hence, the COPAR model is not stationary in general (see Example 1).
For practical purposes, it is convenient to restrict to Markovian models, which are easily obtained by placing independence copulas on most edges in the vine (Theorem 3). Parameters of such models can be estimated by adapting the popular sequential maximum-likelihood method to take time invariances into account. In Section 4, we show consistency and normality of parametric and semiparametric versions of this method (Theorems 4 to 7). By simulating from an estimated model, conditionally on the past, we can easily compute predictions. In Section 5, we translate this into a theoretical framework and prove consistency and asymptotic normality. The results also cover Monte Carlo integrals from estimated iid models, which are widely used but have not yet been formalized. We further propose a computationally efficient bootstrap method for both parameter estimates and predictions in Section 6. In Section 7, we apply the methodology to forecast portfolio returns, showing that our generalized models improve both performance and interpretability. Section 8 offers concluding remarks. Abstract mathematical results on general method-of-moment estimators, which empower all our main theorems, are stated in Appendix A. Additional illustrations, simulation results, and all proofs are provided in the supplementary materials. All methodology is implemented in the open source R package svines (available at https://github.com/tnagler/svines), which is built on top of the C++ library rvinecopulib.

Multivariate time series based on vine copulas

Copulas
Copulas are models for the dependence in a random vector. By Sklar's theorem (Sklar, 1959), any multivariate distribution F of a d-dimensional random vector X = (X_1, ..., X_d) with marginal distributions F_1, ..., F_d can be expressed as
$$F(x_1, \dots, x_d) = C\{F_1(x_1), \dots, F_d(x_d)\}$$
for some function C : [0, 1]^d → [0, 1] called the copula. It characterizes the dependence in F because it determines how the margins interact. If F is continuous, then C is the unique joint distribution function of the random vector U = (F_1(X_1), ..., F_d(X_d)). A similar formula can be stated for the density provided F is absolutely continuous:
$$f(x_1, \dots, x_d) = c\{F_1(x_1), \dots, F_d(x_d)\} \prod_{j=1}^{d} f_j(x_j),$$
where c is the density of C, called the copula density, and f_1, ..., f_d are the marginal densities.

Regular vines
Vine copulas are a particularly flexible class of copula models. They are based on an idea of Joe (1996, 1997) to decompose the copula into a cascade of bivariate copulas. This decomposition is not unique, but all possible decompositions can be organized as a graphical model, called a regular vine (R-vine) (see Bedford and Cooke, 2001, 2002). We shall briefly outline the basics of R-vines; for more details, we refer to Dissmann et al. (2013), Joe (2014), and Czado (2019). Additional illustrations of the following definitions can be found in Section S1 of the supplementary materials.
A regular vine is a sequence of nested trees. A tree (V, E) is a connected acyclic graph consisting of vertices V and edges E.
Definition 1. A collection of trees V = (T_k)_{k=1}^{d-1} with T_k = (V_k, E_k) on a set V_1 with d elements is called an R-vine if (i) T_1 is a tree with vertices V_1 and edges E_1; (ii) for k = 2, ..., d − 1, T_k is a tree with vertices V_k = E_{k−1}; (iii) (proximity condition) for k = 2, ..., d − 1: if vertices a, b ∈ V_k are connected by an edge e ∈ E_k, then the corresponding edges a = {a_1, a_2}, b = {b_1, b_2} ∈ E_{k−1} must share a common vertex: |a ∩ b| = 1.
Figure 1 shows a graphical example of a special regular vine, called a D-vine. We call a vine a D-vine if each tree is a path. A tree is a path if each vertex is connected to at most two other vertices.
Such a structure is most natural when there is a natural ordering (e.g., in time or space) of the variables. When one tree of the vine is a path, all trees at higher levels are fixed uniquely by the proximity condition. Another prominent sub-class is C-vines, where each tree is a star (see also Figure S2 in the supplementary material). A tree is a star if there is one vertex that is connected to all remaining vertices. This structure is most natural when there is a single variable driving the others (e.g., a market factor driving individual stocks). In that case, the proximity condition poses no restrictions on the next tree.
The connection of regular vines to a decomposition of the dependence becomes apparent through a specific labeling of the edges. Each edge corresponds to a pair of random variables conditioned on some others. This is encoded in the conditioned and conditioning sets of an edge. We first need the definition of a complete union.
Definition 2. The complete union of an edge e ∈ E_k is given by U_e = {i ∈ V_1 | i ∈ e_1 ∈ e_2 ∈ · · · ∈ e for some (e_1, ..., e_{k−1}) ∈ E_1 × · · · × E_{k−1}}; for a singleton i ∈ V_1, it is given by the singleton itself, i.e., U_i = {i}.
In other words, the complete union of an edge e ∈ E_k is simply the set of all vertices from the first tree T_1 that are involved in the iterative construction of the edge e.
Definition 3. (i) The conditioning set of an edge e connecting v_1 with v_2 is D_e = U_{v_1} ∩ U_{v_2}. (ii) The conditioned set of an edge e connecting v_1 with v_2 is defined as (a_e, b_e), where a_e = U_{v_1} \ D_e and b_e = U_{v_2} \ D_e.
We will then label an edge by e = (a_e, b_e | D_e).
Definition 3 complements Definition 1 by relating each edge in the R-vine to a pair-wise conditional distribution, as we shall see in the following section. Note that the conditioning set D e is empty for edges of the first tree level.
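For a D-vine whose first tree is the path 1 − 2 − · · · − d, these sets take an explicit form: the edge in tree k that starts at variable i has conditioned set (i, i + k) and conditioning set {i + 1, ..., i + k − 1}. The following Python sketch (purely illustrative; not part of the paper's software) enumerates the edge labels:

```python
# Hypothetical helper: enumerate the edges (a_e, b_e, D_e) of a D-vine
# whose first tree is the path 1 - 2 - ... - d. In tree k, the edge
# starting at variable i has conditioned set (i, i + k) and conditioning
# set {i + 1, ..., i + k - 1}.
def dvine_edges(d):
    """Return D-vine edge labels as (a_e, b_e, D_e) triples."""
    edges = []
    for k in range(1, d):                    # tree level k = 1, ..., d - 1
        for i in range(1, d - k + 1):        # starting variable of the edge
            a, b = i, i + k                  # conditioned set (a_e, b_e)
            D = tuple(range(i + 1, i + k))   # conditioning set D_e
            edges.append((a, b, D))
    return edges

edges = dvine_edges(5)
# Tree 1 edges have empty conditioning sets, e.g. (1, 2, ()); the deepest
# edge is (1, 5, (2, 3, 4)): U_1 and U_5 given U_2, U_3, U_4.
```

For d = 5 this yields d(d − 1)/2 = 10 edges in total, with tree k contributing d − k of them; the conditioning set is empty exactly for the first-tree edges, as noted above.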

Vine copulas
By fixing the marginal distributions F 1 , . . . , F d , it suffices to consider a random vector U = (F 1 (X 1 ), . . . , F d (X d )) with standard uniform margins to describe the dependence structure of X.
A vine copula model for U identifies each edge of an R-vine with a bivariate copula. We shall write the model as (V, C(V)), where V = (V_k, E_k)_{k=1}^{d−1} is the vine structure, d the number of variables, and C(V) = {c_e : e ∈ E_k, k = 1, ..., d − 1} the set of associated bivariate copulas. As an example, consider the regular vine shown in Figure 1. The vertices in the first tree represent the random variables U_1, ..., U_5. All edges connecting them are identified with a bivariate copula (or pair-copula). The edge (a_e, b_e) then encodes the dependence between U_{a_e} and U_{b_e}. In the second tree, the edges have labels (a_e, b_e | D_e) and encode the dependence between U_{a_e} and U_{b_e} conditional on U_{D_e}. In the following trees, the number of conditioning variables increases. Bedford and Cooke (2001) showed that the density of such a copula model for the vector U has a product form:
$$c(u) = \prod_{k=1}^{d-1} \prod_{e \in E_k} c_{a_e, b_e | D_e}\bigl(u_{a_e | D_e}, u_{b_e | D_e} \mid u_{D_e}\bigr),$$
where u_{a_e|D_e} := C_{a_e|D_e}(u_{a_e} | u_{D_e}), u_{D_e} := (u_l)_{l ∈ D_e} is a subvector of u = (u_1, ..., u_d) ∈ [0, 1]^d, and C_{a_e|D_e} is the conditional distribution of U_{a_e} given U_{D_e}. The functions c_{a_e,b_e|D_e} are copula densities describing the dependence of U_{a_e} and U_{b_e} conditional on U_{D_e} = u_{D_e}. For every e ∈ E_k, the conditional distributions C_{a_e|D_e} can be expressed recursively as
$$C_{a_e|D_e}(u_{a_e} \mid u_{D_e}) = \frac{\partial C_{a_{e'}, b_{e'} | D_{e'}}\bigl(u_{a_{e'}|D_{e'}}, u_{b_{e'}|D_{e'}} \mid u_{D_{e'}}\bigr)}{\partial u_{b_{e'}|D_{e'}}},$$
where e' ∈ E_{k−1}, a_{e'} = a_e, b_{e'} ∈ D_e, and D_{e'} = D_e \ b_{e'}. At the end of the recursion, the right-hand side involves an edge e' ∈ E_1, for which u_{a_{e'}|D_{e'}} = u_{a_{e'}} and u_{b_{e'}|D_{e'}} = u_{b_{e'}}.
To make the model tractable, one commonly ignores the influence of u_{D_e} on the pair-copula density c_{a_e,b_e|D_e}. Under this assumption, the density simplifies to
$$c(u) = \prod_{k=1}^{d-1} \prod_{e \in E_k} c_{a_e, b_e | D_e}\bigl(u_{a_e | D_e}, u_{b_e | D_e}\bigr).$$
Since each pair-copula can be modeled separately, simplified vine copulas remain quite flexible.
Conditional independence properties can be imposed by setting appropriate pair-copulas to the independence copula. This shall prove convenient when we construct Markovian time series models in Section 3.5. We further note that a similar factorization holds when some variables are discrete (see, Stöber, 2013, Section 2.1). Although both continuity and the simplifying assumption are immaterial for our theoretical results, we will stick to the simplified, continuous case for convenience.
For a more extensive introduction to vine copulas, we refer to Aas et al. (2009) and Czado (2019).

Vine copula models for multivariate time series
Now suppose (X_t)_{t=1,...,T} = (X_{t,1}, ..., X_{t,d})_{t=1,...,T} is a stationary time series, whose cross-sectional and temporal dependence structure we model by a vine copula. Throughout the paper, stationarity refers to strict stationarity. By fixing the stationary marginal distributions F_1, ..., F_d, it suffices to consider a time series (U_t)_{t=1,...,T} = (U_{t,1}, ..., U_{t,d})_{t=1,...,T} of marginally standard uniform variables. Note that the random variables U_{t,j} in our model have two sub-indices. The first sub-index t indicates the time point and the second sub-index j determines the marginal variable. In the time series context, each vertex of a vine's first tree is identified with a tuple (t, i), where t is the time index and i is the variable index. The vertex (t, i) corresponds to the random variable U_{t,i}. In particular, the components of edge labels e = (a_e, b_e | D_e) (i.e., a_e, b_e as well as the elements of D_e) are tuples.
All existing regular vine models for multivariate time series follow the same idea. There is a vine capturing cross-sectional dependence of U_t ∈ R^d for all time points t = 1, ..., T. The first trees of the cross-sectional structures at time t and t + 1 are then linked by one edge connecting a vertex from the structure at t to one vertex from the one at t + 1. Because the time series is stationary, it is reasonable to assume that the cross-sectional structure and the linking vertices are time invariant. The existing models make specific choices for the cross-sectional structure and connecting edge. Their first tree for a four-dimensional model on three time points is illustrated in Figures 2 to 4. Graphs of the first five trees T_1, ..., T_5 and additional details can be found in Section S2 of the supplementary materials. In short, there are three different models: • D-vine of Smith (2015): (i) the cross-sectional structure is a D-vine, (ii) two cross-sectional D-vines at time points t and t + 1 are connected at the two distinct vertices that lie at opposite borders of the D-vine trees. For the first tree, illustrated in Figure 2, we can assume without loss of generality that these vertices are (t, d) and (t + 1, 1). With these choices, there is only one global vine model satisfying the proximity condition, which is a long D-vine spanning all variables at all time points.
• M-vine of Beare and Seo (2015): (i) the cross-sectional structure is a D-vine, (ii) two cross-sectional D-vines at time points t and t + 1 are connected at one vertex that lies at the same border of the D-vine trees. For the first tree, illustrated in Figure 3, we may assume without loss of generality that the vertices (t, 1) and (t + 1, 1) are connected. With the additional restriction that vertices of adjacent time points are connected first, this also uniquely fixes all further trees of the vine.
• COPAR of Brechmann and Czado (2015): (i) the cross-sectional structure is a C-vine, (ii) the first trees of two C-vines at time points t and t + 1 are connected at the root vertex of the C-vine. For the first tree, illustrated in Figure 4, we may assume without loss of generality that the vertices (t, 1) and (t + 1, 1) are connected. This leaves a lot of flexibility for higher trees, and the authors settled on a specific set of rules. In particular, the model contains all edges of a D-vine on the variables U_{1,1}, U_{2,1}, ..., U_{T,1}.
There is obvious potential for generalization. First, we would like to allow for arbitrary R-vines in the cross-sectional structure. Second, we would like to connect two cross-sectional trees at arbitrary variables. Specific versions of such models were constructed in preliminary work (a so-called temporal vine) and in unpublished work by Harry Joe. But where should we stop? In principle, we could take any (T × d)-dimensional vine as a model for the vector (U_1, ..., U_T). We address this question comprehensively in the following section.

Stationary vine copula models
The time series context is special. To facilitate inference, it is common to assume that the series is stationary, i.e., that its distribution is invariant in time. When a time series is stationary, its copula must also satisfy certain invariances. This is a blessing and a curse: invariances reduce the complexity of the model, but not all vine structures guarantee stationarity under practicable conditions on the pair-copulas. We shall derive a generalization of the existing models that is maximally convenient in this sense. All proofs are collected in Section S5 of the supplementary material.

Stationary time series
As explained in Section 2.4, any stationary time series can be transformed into one with uniform marginal distributions by the probability integral transform. To ease our exposition, we shall therefore assume uniform marginal distributions in what follows. Let (V, C(V)) be a vine copula model for the random vector (U_1, ..., U_T) ∈ R^{T×d}. The time series U_1, ..., U_T ∈ R^d is strictly stationary if and only if (U_{t_1}, ..., U_{t_m}) and (U_{t_1+τ}, ..., U_{t_m+τ}) have the same joint distribution for all m ∈ N, time points t_1, ..., t_m, and shifts τ such that all indices remain in {1, ..., T}. For vine copulas, this condition can involve intractable functional equations. The reason is that only some pairwise (conditional) dependencies are explicit in the model. Explicit pairs are those that correspond to edges in the vine V. All other dependencies are only implicit, i.e., they are characterized by the interplay of multiple pair-copulas. By only focusing on the explicit pairs, we see that translation invariance, defined below, is a necessary condition for stationarity. Recall that, in the time series context, the elements of the set V_1 of a vine are tuples (t, j) with time index t = 1, ..., T and variable index j = 1, ..., d.
Definition 4 (Translation invariance). A vine copula model (V, C(V)) on the set V_1 = {1, ..., T} × {1, ..., d} is called translation invariant if c_{a_e,b_e|D_e} = c_{a_{e'},b_{e'}|D_{e'}} holds for all edges e, e' ∈ ⋃_{k=1}^{Td−1} E_k for which there is τ ∈ Z such that
a_{e'} = a_e + (τ, 0),  b_{e'} = b_e + (τ, 0),  D_{e'} = D_e + (τ, 0),   (1)
where the last equality is short for D_{e'} = {v + (τ, 0) : v ∈ D_e}.
Remark 1. The notation e' = e + (τ, 0) will be used as shorthand for (1) and indicates a shift in time by τ steps. For example, if a_e = (t, j), then a_{e'} = (t + τ, j), and similarly for b_{e'} and D_{e'}.
Translation invariance was formally defined in earlier work on vine-based time series models and implicitly motivates all three models discussed above. For example, in the M-vine shown in Figure 3, the copulas associated with the edges (1, 1) − (1, 2) and (2, 1) − (2, 2) must be the same. As mentioned above, this is only a necessary condition for stationarity. For all non-explicit pairs, stationarity requires more complex integral equations to hold. Provided with sufficient computing power, they could be checked numerically for any given model. But even if stationarity holds for a specific model, a slight change in the parameter of a single pair-copula may break it. This is problematic in practice. Hence, the practically relevant vine structures are those for which translation invariance is also a sufficient condition for stationarity.
These structures are characterized in what follows.

Preliminaries
First we need some graph-theoretic definitions. The first is a version of a definition used in earlier work on M-vines.
Definition 5 (Restriction of vines). Let V = (V_k, E_k) be a vine on {1, ..., T} × {1, ..., d}. The graph V_{t,t+m} obtained by deleting all vertices and edges whose labels involve a time index outside {t, ..., t + m} is called the restriction of V on the time points t, ..., t + m.
Simply put: to restrict a vine on time points t to t + m, we delete all edges and vertices where time indices outside the range [t, t + m] appear in the labels. Note that the restriction is not necessarily a vine; the graphs (V_k, E_k) can be disconnected (and hence are no longer trees). For example, if the first tree of the vine V contains a path (1, i) − (3, i) − (2, i), the vertices (1, i) and (2, i) will be disconnected in V_{1,2}.
The translation of a vine V is obtained by shifting all vertices and edges in time by the same amount.
Definition 6 (Translation of vines). Let m ≥ 0, let V = (V_k, E_k) be a vine on {t, ..., t + m} × {1, ..., d}, and let V' = (V'_k, E'_k) be a vine on {s, ..., s + m} × {1, ..., d}. We say that V' is a translation of V (denoted by V ∼ V') if for all k and edges e ∈ E_k, there is an edge e' ∈ E'_k such that e' = e + (s − t, 0) (and vice versa).
Remark 2. We shall call two edges e, e' satisfying e' = e + (τ, 0) translations of one another and write e ∼ e'. This defines an equivalence relation between edges.
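The shift notation of Remark 1 and the equivalence of Remark 2 are easy to make concrete. In the sketch below (illustrative names, not from the paper's implementation), an edge is a triple (a_e, b_e, D_e) of (time, variable) tuples, and two edges are translations of one another if shifting every tuple by (τ, 0) maps one label onto the other:

```python
# Illustrative sketch: edges are triples (a_e, b_e, D_e), with a_e, b_e
# being (time, variable) tuples and D_e a frozenset of such tuples.
def shift(edge, tau):
    """Shift an edge label in time by tau steps, i.e. e + (tau, 0)."""
    (a, b, D) = edge

    def mv(v):
        return (v[0] + tau, v[1])

    return (mv(a), mv(b), frozenset(mv(v) for v in D))

def are_translations(e1, e2):
    """Check e1 ~ e2, i.e. e2 = e1 + (tau, 0) for some integer tau."""
    (a1, b1, D1), (a2, b2, D2) = e1, e2
    tau = a2[0] - a1[0]   # candidate shift read off the first components
    return shift((a1, b1, frozenset(D1)), tau) == (a2, b2, frozenset(D2))

e  = ((1, 1), (1, 2), frozenset())   # edge (1,1)-(1,2)
e2 = ((2, 1), (2, 2), frozenset())   # its shift by tau = 1
```

For the M-vine example from the text, `are_translations` confirms that the edges (1, 1) − (1, 2) and (2, 1) − (2, 2) are translations of one another, so translation invariance forces their pair-copulas to coincide.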
Additional illustrations of these concepts are provided in Section S1 of the supplementary material.

Stationary vines
The last definitions are key to ensuring stationarity in vine copula models. If, for all time points s, t and gaps m, the restriction of a vine on t, ..., t + m is a translation of the restriction on s, ..., s + m, translation invariance will guarantee stationarity.
An important word in this condition is all. There are vines violating it that are stationary for a specific choice of C(V). For example, c_e ≡ 1 for all edges always leads to a stationary model. But such structures are impractical, because they limit the choices of C(V) to a restrictive and unknown set. The condition can be seen as a graph-theoretic notion of stationarity for vine structures; we call vines satisfying it S-vines. S-vines have a distinctive structure. There is a d-dimensional vine V^{(0)} that contains only pairs for cross-sectional dependence. We will therefore call V^{(0)} the cross-sectional structure of V. Next, there is a 2d-dimensional vine V^{(1)} that nests two duplicates of V^{(0)}. Besides these cross-sectional parts, the vine contains d^2 pairs for dependence across two subsequent time points that are not yet constrained by translation invariance. A similar principle applies for vines V^{(m)}, m ≥ 2, with d^2 unconstrained edges entering in every step (i.e., going from V^{(m−1)} to V^{(m)}). An illustrative example for a five-dimensional S-vine on three time points is given in Section S2.5 of the supplementary materials.
It is easy to check that the M-vines and D-vines of Beare and Seo (2015) and Smith (2015) are stationary. As the following example shows, the structure of the COPAR model of Brechmann and Czado (2015) is not stationary, however. In particular, the graph V_{t,t+1} is not a vine for t ≥ 2, because the second level of the restricted graph is disconnected. This poses an additional constraint on the choice of pair-copulas that seemingly went unnoticed.
Example 1. Consider the COPAR structure for d = 2, T = 3. The second tree of the model is given in Figure 5; see also Figure S11 in the supplement for the remaining trees. For simplicity, we assume that all pair-copulas in trees k = 1 and k ≥ 3 are independence copulas. The restriction V_{2,3} of the model is obtained by deleting all vertices and edges where a time index 1 occurs. Clearly, V_{2,3} is not a vine, because the vertex (U_{2,1}, U_{2,2}) is disconnected from the others. Now let us see why this is problematic. The joint copula density of the vertices (U_{2,1}, U_{2,2}) and (U_{2,1}, U_{3,1}) equals the product of the copulas associated with the edges along the path joining them, integrated over all intermediate vertices. By translation invariance, it must then hold that
$$c_{(1,2),(2,1)|(1,1)}(u, v) = \int_0^1 c_{(1,1),(2,2)|(2,1)}(w, u)\, c_{(1,1),(3,1)|(2,1)}(w, v)\, dw.$$
The copula on the left-hand side is an explicit dependence in the model, because it is associated with an edge in the graph (the leftmost one). The equation thus relates three pair-copulas of the model that are not constrained by translation invariance. For most combinations of pair-copulas, the equality does not hold and the model is not stationary.

An explicit characterization of stationary vines
Stationary vines can also be characterized more explicitly. Somewhat surprisingly, it suffices to pick a cross-sectional structure V^{(0)} and two permutations of (1, ..., d).
As mentioned earlier, S-vines generalize previous models. For example, choosing the indices i_s, j_s iteratively for s ≥ 2 as the smallest compatible indices yields the temporal vine model mentioned above.
Compared to the models of Smith (2015) and Beare and Seo (2015), S-vines do not require the cross-sectional structure to be a D-vine. Further, we have some degrees of freedom in how we connect variables across different time points. This relaxation can improve both the interpretability and the performance of the associated copula models, as illustrated in Section 7 and Section S5 of the supplementary materials. The explicit characterization of Theorem 2 also makes it easy to establish conditions for the existence and uniqueness of a stationary vine. The first step is to show that a compatible permutation always exists.
Lemma 1. For any d-dimensional vine V and any i_1 ∈ {1, ..., d}, there exists at least one compatible permutation of (1, ..., d) starting with i_1.
The following result is now an immediate consequence of Theorem 2 and Lemma 1.

Corollary 1.
(i) (Existence) For any vine V*, there exists a stationary vine with cross-sectional structure V^{(0)} = V*. (ii) (Uniqueness) Given a cross-sectional structure V^{(0)} and two sequences of compatible in- and out-vertices, the stationary vine is unique.
In a stationary vine copula model, cross-sectional dependencies are associated with the same pair-copulas for each time point. Similarly, serial dependencies are modeled with identical copulas for each lag. This significantly reduces the number of free pair-copulas in the model. We only need to specify (T − 1)d^2 + d(d − 1)/2 = O(Td^2) of them; all other pair-copulas are constrained by translation invariance. When the time series contains more than a few dozen time points, this is still too much. Most popular time series models also satisfy the Markov property: the conditional distribution of U_t given the entire past depends only on the p most recent vectors U_{t−1}, ..., U_{t−p}. The Markov property limits complexity further. For the M-vine model, Beare and Seo (2015, Theorem 4) showed that it is equivalent to an independence constraint on the pair-copulas, which is used similarly by the other Markovian models. The same arguments apply to the general class of stationary vines. In a stationary Markov model of order p, the independence copula is assigned to all edges reflecting serial dependence at lags larger than p. This reduces the number of distinct pair-copulas further to pd^2 + d(d − 1)/2 = O(pd^2). Table 1 shows the number of distinct copulas in an unrestricted model for the full time series, a stationary vine model, and a stationary vine model with Markov order p = 1, 2. We can see a significant reduction when imposing stationarity and the Markov property.
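The pair-copula counts discussed above can be tabulated directly. The unrestricted count follows from the number of edges of a vine on all T·d variables, and the stationary count is stated in the text; the Markov-order formula p·d² + d(d − 1)/2 is inferred from the surrounding discussion and should be read as an assumption. A small sketch:

```python
# Counts of distinct pair-copulas, as discussed in Section 3:
# - unrestricted vine on all T*d variables: one copula per edge,
# - stationary vine: (T - 1)*d^2 + d*(d - 1)/2 distinct copulas,
# - stationary Markov(p) vine: p*d^2 + d*(d - 1)/2 (inferred formula,
#   treat as an assumption rather than a quote from the paper).
def n_unrestricted(T, d):
    n = T * d
    return n * (n - 1) // 2   # number of edges of a vine on n variables

def n_stationary(T, d):
    return (T - 1) * d * d + d * (d - 1) // 2

def n_markov(p, d):
    return p * d * d + d * (d - 1) // 2
```

For T = 10 and d = 4, for instance, this gives 780 pair-copulas for the unrestricted model, 150 for the stationary vine, and 22 (p = 1) or 38 (p = 2) under the additional Markov constraint, mirroring the reduction visible in Table 1.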

Parameter estimation
Joint maximum likelihood is unpopular for vine copula models, because they have many parameters even in moderate dimensions. A version of the popular step-wise maximum-likelihood estimator (Aas et al., 2009) has been discussed for M-vine copula models, but without theoretical guarantees. We shall introduce such a method for the more general class of stationary vines and prove its validity, allowing for either parametric or nonparametric marginal models. The step-wise method is fast even for large models, but incurs a small loss in efficiency according to Hobaek Haff (2013).

Estimation of marginal models
We follow the common practice of estimating marginal models first. Given estimates F̂_1, ..., F̂_d of the marginal distributions, the copula parameters can then be estimated based on 'pseudo-observations' Û_{t,j} = F̂_j(X_{t,j}). Suppose we are given parametric models f_j(·; η_j), j = 1, ..., d, for the marginal densities. Then the parameters can be estimated by the maximum-likelihood-type estimator
$$\hat\eta_j = \arg\max_{\eta_j} \sum_{t=1}^{T} \log f_j(X_{t,j}; \eta_j).$$
Given estimates of the marginal parameters, we then generate pseudo-observations from the copula as Û_{t,j} = F_j(X_{t,j}; η̂_j). We can also consider semiparametric copula models by estimating the marginal distributions by (rescaled) empirical distribution functions
$$\hat F_j(x) = \frac{1}{T+1} \sum_{t=1}^{T} \mathbf{1}(X_{t,j} \le x).$$
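As a concrete illustration of the nonparametric route, the pseudo-observations are simply rescaled ranks. The sketch below (stdlib only, ignoring ties; not the svines implementation) computes them, with the customary T + 1 rescaling keeping the values strictly inside (0, 1):

```python
# Nonparametric pseudo-observations: U_hat[t][j] = rank(X[t][j]) / (T + 1),
# the usual empirical-margin transform. Illustrative sketch only; ties are
# not handled specially.
def pseudo_observations(X):
    """X: list of T rows, each a list of d floats; returns T x d ranks/(T+1)."""
    T, d = len(X), len(X[0])
    U = [[0.0] * d for _ in range(T)]
    for j in range(d):
        col = [X[t][j] for t in range(T)]
        order = sorted(range(T), key=lambda t: col[t])   # indices by value
        for rank, t in enumerate(order, start=1):
            U[t][j] = rank / (T + 1)
    return U
```

For example, the column (3, 1, 2) maps to (3/4, 1/4, 2/4): each margin of the output is (approximately) standard uniform, as required for the copula estimation step.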

Estimation of copula parameters
Let θ_e denote the parameters of the pair-copula c_e associated with edge e, and define θ = (θ_e)_e as the stacked parameter vector.
The joint (pseudo-)log-likelihood ℓ(θ) of a stationary vine copula model sums the log pair-copula densities over all time points and edges, where (Û_1, ..., Û_T) can be either the parametric or nonparametric pseudo-observations. The joint MLE, arg max_θ ℓ(θ), is often too demanding. The step-wise MLE of Aas et al. (2009) estimates the parameters of each pair-copula separately, starting from the first tree. We can adapt it to the setting of a Markov process of order p: for k = 1, ..., d(p + 1) − 1 and every e ∈ E_k, the parameters θ_e are obtained by maximizing the pair-specific likelihood, whose arguments only contain parameter estimates from previous trees, i.e., ones that were already found in earlier iterations.
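To convey the step-wise principle with a runnable example, the sketch below fits a single pair-copula by inverting the empirical Kendall's tau instead of maximizing a likelihood; for a Clayton copula, θ = 2τ/(1 − τ). This moment-based stand-in is purely illustrative and is not the sequential MLE used in the paper:

```python
# Illustrative stand-in for one step of the sequential estimator: fit a
# Clayton pair-copula to one pair of pseudo-observations by inverting
# Kendall's tau (theta = 2*tau / (1 - tau)). The paper uses sequential
# maximum likelihood; this moment-based variant is only a runnable sketch.
def kendalls_tau(u, v):
    """Empirical Kendall's tau of two samples of equal length."""
    n, conc = len(u), 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (u[i] - u[j]) * (v[i] - v[j])
            conc += 1 if s > 0 else -1 if s < 0 else 0
    return conc / (n * (n - 1) / 2)

def fit_clayton(u, v):
    """Clayton parameter obtained by inverting Kendall's tau."""
    tau = kendalls_tau(u, v)
    return 2 * tau / (1 - tau)

u = [0.1, 0.3, 0.5, 0.7, 0.9]
v = [0.2, 0.1, 0.6, 0.8, 0.9]   # mostly concordant with u
```

In a full step-wise fit, this per-pair step would be repeated tree by tree, with the fitted copulas of earlier trees supplying the (conditional) pseudo-observations for later ones.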

Asymptotic results
In what follows, we establish consistency and asymptotic normality of the parametric and semiparametric parameter estimates. Their proofs are given in Section S6 of the supplementary material.
All results are derived as consequences of the more general Theorem A.1 and Theorem A.2 given in Appendix A. A discussion of the results is given at the end of this section.
We shall assume in the following that the series (X_t)_{t∈Z} is strictly stationary and absolutely regular. For p = 1, d = 1, it is sufficient that the copula density is strictly positive on a set of measure 1 (Longla and Peligrad, 2012, Proposition 2). The proof can be easily extended to p, d ≥ 1, which leads to the mild sufficient condition that all pair-copula densities are strictly positive on (0, 1)^2. In what follows, ‖·‖ denotes the Euclidean norm.

Parametric estimator
To state the asymptotic results, it is convenient to introduce some more notation. For S-vines of Markov order p, one can check that for any edge e ∈ E_k, 1 ≤ k ≤ d(p + 1) − 1, the set {a_e, b_e, D_e} only contains variables at most p time points apart. Up to a finite number of terms, the parametric step-wise MLE (η̂^{(P)}, θ̂^{(P)}) is then defined as the solution of an estimating equation based on the step-wise score functions. To allow for misspecified parametric models, we further define pseudo-true values (η*, θ*) as the solution of the population version of this equation, and note that they agree with the true parameters if the model is correctly specified. Here and in the sequel, all expectations are taken with respect to the unknown true distribution of X_1, ..., X_{1+p}.
We impose the following regularity conditions (stated here in abbreviated form): (P1) the pseudo-true values (η*, θ*) lie in the interior of H × Θ and are identifiable; (P2)-(P3) the score function φ^{(P)}_{η,θ} is continuously differentiable with respect to (η, θ) and satisfies standard moment conditions; (P4) a moment-mixing trade-off condition. Condition (P1) ensures identifiability of the model parameters, (P2) and (P3) are standard regularity conditions for maximum-likelihood methods. Condition (P4) quantifies a trade-off between moments of the 'score' function φ^{(P)}_{η*,θ*} and the mixing rate; see the discussion in Section 4.3.3.
Note that iid models are included as a special case.
Corollary 2. If (X_t)_{t∈N} is iid and p = 0, then Theorem 4 and Theorem 5 hold with a correspondingly simplified asymptotic variance. If the latter expectation exists, condition (P4) can be dropped.

Semiparametric estimator
Similarly to the parametric case, we define a score function and pseudo-true values for the semiparametric estimator.

For a generic vector of functions, the semiparametric step-wise MLE θ̂^{(SP)} is then defined as the solution of the corresponding estimating equation. For u ∈ (0, 1) and some γ ∈ [0, 1), define the weight function w(u) = u^γ(1 − u)^γ. We impose the following regularity conditions (again in abbreviated form): (SP1) The pseudo-true value θ* lies in the interior of Θ and is identifiable; the relevant functions are continuously differentiable with respect to θ and their arguments for any compact Θ̃ ⊆ Θ, t = 1, ..., 1 + p, j = 1, ..., d.
(SP2)-(SP3) The relevant derivatives are continuous in u ∈ (0, 1)^{(p+1)d} and in θ in a neighborhood of θ*, and there is δ > 0 such that suitable envelope conditions hold for all t = 1, ..., 1 + p, j = 1, ..., d. Similar to the parametric case, (SP1) ensures identifiability of the model parameters and (SP4) is a standard regularity condition. Conditions (SP2)-(SP3) are more involved due to the suprema over function classes F_δ. This is typical for semiparametric copula models (see Genest et al., 1995, Tsukahara, 2005, Chen and Fan, 2006b, Hobaek Haff, 2013, Chen et al., 2020). Derivatives of copula functions tend to blow up in the corners of the unit hypercube. This can be offset by exploiting stronger convergence properties of the empirical margins in these corners. The function w is used to strengthen the metric accordingly. (SP5) quantifies the trade-off between moments of the 'score' function φ^{(SP)}_{θ*} and the mixing rate; see also our discussion below.
Corollary 3. If (X_t)_{t∈N} is iid and p = 0, then Theorem 6 and Theorem 7 hold with a correspondingly simplified asymptotic variance.

Discussion
The √ T -convergence predicted by our theorems is confirmed in numerical experiments in Section S4.1 of the supplementary materials. The results extend the existing literature in various ways.
The results on the fully parametric sequential MLE (Theorems 4 and 5) appear to be new, even in the iid case. Joe (2005) provided a similar result when there are only two steps: one for the marginal parameters and one for the copula parameters. For semiparametric models (Theorems 6 and 7), a similar result was obtained by Rémillard et al. (2012) for p = 1 and a joint MLE for the copula parameters. It does not apply to the step-wise MLE commonly used in vine copula models, however. The only known results for the semiparametric step-wise MLE were provided by Hobaek Haff (2013) in the iid (p = 0) case. These results assume a D-vine structure and a correctly specified copula model. The latter assumption is especially questionable in view of the common simplifying assumption (see Section 2.3).
Also the results in Chen and Fan (2006b) are obtained as a special case with d = 1, p = 1. The regularity conditions here are slightly weaker than theirs, and also than those of Tsukahara (2005) and Hobaek Haff (2013). If the tail decay is fast, we can therefore use weaker moment conditions. The latter two conditions already appeared in similar form in Chen and Fan (2006b). For p = 1, d = 1, a sizeable literature (including Chen and Fan, 2006b; Chen et al., 2009; Beare, 2010; Longla and Peligrad, 2012) suggests that all popular parametric models exhibit exponentially decaying mixing coefficients, which is stronger than necessary. However, extending these results to the multivariate case is nontrivial and poses an important open problem.

Prediction
Vine copula models are quite complex and rarely allow closed-form expressions of conditional means, quantiles, or the predictive distribution. One may instead simulate (conditionally) from the estimated model and approximate such quantities by Monte Carlo methods. The standard simulation algorithm (e.g., Czado, 2019, Chapter 6) poses unnecessary computational demands, however. An efficient algorithm exploiting the Markov property is given in Section S3.2 of the supplementary material.
With the ability to simulate conditionally on the past, it is easy to compute predictions for all sorts of quantities, like conditional means or quantiles. More specifically, suppose we are interested in a functional of the predictive distribution F_{k,p}(· | x_{t−1}, . . . , x_{t−p}) of the next k time points given the past. We construct an estimator of this functional by simulating from the estimated model (either parametric or semiparametric; see Section 4).
When simulating from an estimated parametric model F_{k,p}(· | x_{t−1}, . . . , x_{t−p}; η^(P), θ^(P)), we call the resulting estimator μ^(P); when simulating from a semiparametric model, we call the resulting estimator μ^(SP). The corresponding pseudo-true value μ* is defined analogously for the parametric and semiparametric cases, respectively.
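As a concrete illustration of this plug-in principle, the following sketch estimates a predictive functional by averaging over conditional simulations. The vine simulator is replaced by a toy Gaussian AR(1) stand-in (`ar1_sim`); all names here are illustrative and not part of the paper's implementation.

```python
import numpy as np

def predict_functional(simulate_conditional, past, horizon, n_sims, functional):
    """Monte Carlo estimate of a predictive functional.

    simulate_conditional: draws one path of `horizon` steps given `past`
    functional: maps a simulated path to the quantity of interest
    """
    draws = np.array([functional(simulate_conditional(past, horizon))
                      for _ in range(n_sims)])
    return draws.mean()

# Toy stand-in for the fitted model: a Gaussian AR(1) with phi = 0.5.
rng = np.random.default_rng(1)
def ar1_sim(past, horizon, phi=0.5):
    x, path = past[-1], []
    for _ in range(horizon):
        x = phi * x + rng.standard_normal()
        path.append(x)
    return np.array(path)

# Estimate the conditional mean of the cumulative value over k = 5 steps.
mu_hat = predict_functional(ar1_sim, past=np.array([1.0]), horizon=5,
                            n_sims=2000, functional=np.sum)
```

For the stationary vine model, `simulate_conditional` would be replaced by the efficient Markov simulation algorithm of Section S3.2.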
The following results account for the fact that we simulate from an estimated model. They are an immediate consequence of Theorem A.3 in Appendix A. In general, we assume that the map where I η * ,θ * , J η * ,θ * are defined in Section 4.3.1.

For any function
.
and D(X t ) given in Theorem 7.
Simulation-based prediction from vine copula models has been used widely in the last decade, despite a lack of theoretical justification. A consistency result for extreme quantile estimation in semiparametric iid models was previously established by Gong et al. (2015, Theorem 1). In contrast, the results above allow for both parametric and semiparametric models, time series data, and a generic prediction target. In addition, Theorems 8 and 9 characterize a distributional limit for such predictions. This is of practical importance because it allows one to properly assess estimation and prediction uncertainty. Of course, the results specialize to the iid case similarly to Corollaries 2 and 3. The asymptotic covariances are generally unknown and must be estimated. We propose computationally efficient methods in the following section.

Uncertainty quantification
In principle, the asymptotic covariances in the preceding theorems can be estimated by HAC methods (e.g., Andrews, 1991). In the prediction context, such methods become numerically demanding, especially for the semiparametric estimator. (Semi-)parametric or block bootstrap methods (Künsch, 1989; Chen and Fan, 2006b; Genest and Rémillard, 2008) are general alternatives, but similarly demanding because the model has to be fit many times. We propose a more efficient bootstrap method based on an asymptotic approximation of the parameter estimates. In essence, we avoid refitting the entire model by performing only a single Newton-Raphson update on the bootstrapped likelihood.
We employ a dependent multiplier bootstrap scheme similar to .
Its idea is as follows. Let ℓ_T be a sequence with ℓ_T → ∞. We simulate a stationary time series of bootstrap weights ξ_1, . . . , ξ_T that is ℓ_T-dependent, independent of the data, and satisfies E(ξ_1) = Var(ξ_1) = 1 and Cov(ξ_1, ξ_{1+t}) = 1 − o(t/ℓ_T). Given a sufficiently regular stationary time series (Z_t), multiplier-weighted and unweighted empirical averages of Z_t converge to independent copies of the same random variable (see Bühlmann, 1993; Bücher and Kojadinovic, 2019). We shall apply this principle to bootstrap the step-wise log-likelihood and (if necessary) the empirical marginal distributions.
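A minimal construction satisfying these requirements is a scaled moving average of iid innovations; the sketch below illustrates the required properties and is not necessarily the exact scheme used in the implementation.

```python
import numpy as np

def multiplier_weights(T, ell, rng):
    """ell-dependent multiplier weights with E(xi_1) = Var(xi_1) = 1 and
    Cov(xi_1, xi_{1+t}) = max(0, 1 - t/ell), built from iid N(0, 1) innovations."""
    z = rng.standard_normal(T + ell - 1)
    # moving average over windows of length ell, scaled to unit variance
    return 1.0 + np.convolve(z, np.ones(ell), mode="valid") / np.sqrt(ell)

rng = np.random.default_rng(0)
xi = multiplier_weights(T=100_000, ell=10, rng=rng)
```

Each weight depends on at most `ell` shared innovations, so the series is `ell`-dependent by construction.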
Consider the bootstrapped estimating equation. We define our bootstrap replicates as an approximate one-step Newton-Raphson update from (η^(P), θ^(P)). Since the required quantities have already been evaluated when computing (η^(P), θ^(P)), this update has negligible computational cost. One may then show that the replicates converge in distribution, with limiting variable independent of that of (η^(P), θ^(P)).
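The update can be sketched in a simple toy model (estimating a mean via its score equation); iid multipliers are used here for brevity, whereas the dependent multipliers described above would be used for time series. All names are illustrative, not the paper's implementation.

```python
import numpy as np

def one_step_bootstrap(x, theta_hat, score, hessian_inv, xi):
    """One Newton-Raphson step on the multiplier-weighted estimating
    equation, starting from the original estimate (no full refit)."""
    weighted_score = np.mean(xi[:, None] * score(x, theta_hat), axis=0)
    return theta_hat - hessian_inv @ weighted_score

# Toy model: estimating a mean via the score phi(x; theta) = theta - x.
rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=5000)
theta_hat = np.array([x.mean()])
score = lambda x, th: (th[0] - x)[:, None]
hess_inv = np.array([[1.0]])          # d(P_T phi)/dtheta = 1 for this score

reps = np.array([one_step_bootstrap(x, theta_hat, score, hess_inv,
                                    1 + rng.standard_normal(x.size))[0]
                 for _ in range(500)])
# sd of the bootstrap replicates approximates the sd of theta_hat
boot_se = reps.std()
```

Each replicate only reweights the already-computed score contributions, which is what makes the update essentially free compared with refitting.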
In semiparametric models, we also have to bootstrap the empirical marginal distribution: Now consider the bootstrapped estimating equation and the approximate one-step update Note that the function φ (SP ) θ (SP ) on the far right is evaluated at the bootstrapped margins. This is necessary to account for the estimation uncertainty in the margins. It also makes the update slightly more demanding, since we have to evaluate the function φ (SP ) θ (SP ) for every bootstrap replication. This cost is manageable, however. One may again show that θ (SP ) and θ (SP ) converge in distribution to two iid variables.
2. Compute a bootstrapped model (parametric or semiparametric) as outlined above.
3. Compute µ r as in Section 5, where X

Application
Vine copula models are widely used in finance, in particular for modeling cross-sectional dependence in time series of financial returns (Aas, 2016). The most common approach is to model marginal series with ARMA/GARCH-models and the cross-sectional dependence of their residuals with a vine copula. Stationary vine copula models are different; they incorporate both serial and cross-sectional dependence in a single vine copula model.
We consider daily stock returns of 20 companies retrieved from Yahoo Finance. These companies

In-sample analysis
We start with an in-sample illustration of models fit to the whole data set. We first fit skew-t distributions to the individual time series of each company. We then apply the probability integral transform to obtain pseudo-observations for the copula model. We consider the M-vine, D-vine, and general stationary (S-)vine models from the previous section, each with Markov order p = 1 (higher-order models were fit in preliminary experiments, but did not improve fit or performance).
Vine structures are selected by a modification of the algorithm of Dissmann et al. (2013), and the pair-copula families by the AIC criterion; we refer to Section S3 of the supplementary materials for details. We allow for all parametric families implemented in the rvinecopulib R package. This includes families without tail dependence (e.g., the Gaussian copula), and with tail dependence in one (e.g., the Clayton copula), two (e.g., the BB7 copula), or all four tails (e.g., the two-parameter t copula).
In Figure 6 we illustrate the first trees of the M- and D-vine obtained via the previously described approach. We observe that the cross-sectional D-vine for both approaches is described by a path. The corresponding tree of the S-vine can be seen in Figure 7. Its cross-sectional connection is described by a regular vine. We can identify some clusters of industry branches: IT (variables 11-15), insurance (variables 1-5), and oil and gas (variables 16-18). Interestingly, regional factors seem to be more important for Ping An, a Chinese insurance company, which makes sense economically. Further, this link did not appear in the cross-sectional parts of either of the vine models. So while the cross-sectional dependence between the companies is comparably weak, their inter-temporal dependence is still quite strong.
Table 3: Akaike's information criterion for the three vine copula time series models.

The fit of the models is compared by AIC in Table 3. We only consider parametric vine models, but also include three popular competitor models:
• VAR: A vector autoregressive model of order 1 (using the vars R package Pfaff, 2008).
• GARCH-vine: A combination of ARMA-GARCH marginal models (with skew-t residuals) and a vine copula for the residuals (using the rugarch and rvinecopulib packages; Ghalanos, 2020; Nagler and Vatter, 2020). The ARMA-GARCH orders are selected for each marginal series individually by AIC. This model is an instance of the general residuals method proposed by . Other choices for the marginal models would also be possible.
The VAR model clearly performs worst, since it cannot account for heteroscedasticity. We further see that the vine models outperform the GARCH-vine and DCC-GARCH models. The S-vine provides the best fit.

Out-of-sample predictions
We now compare the forecasting abilities of the models by a backtest. We fit all models on three years of data (one year has 252 trading days). On each of the following days, we make predictions for the cumulative portfolio return over the next day or week and compare them to the observed data. Every half year, the models are refitted on the most recent three years of data.
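The backtest scheme can be sketched as a rolling loop; `fit` and `predict` are placeholders for the actual model fitting and simulation-based prediction, and the historical-mean "model" below is only a stand-in.

```python
import numpy as np

def rolling_backtest(returns, fit, predict, window=3 * 252, refit_every=126):
    """Rolling backtest: fit on `window` past days, predict one day ahead,
    refit every `refit_every` days (half a trading year)."""
    preds, model = [], None
    for t in range(window, len(returns)):
        if (t - window) % refit_every == 0:
            model = fit(returns[t - window:t])
        preds.append(predict(model, returns[t - window:t]))
    return np.array(preds)

# Toy stand-in: the "model" is the historical mean return.
rng = np.random.default_rng(3)
r = rng.normal(0.0, 0.01, size=3 * 252 + 252)
preds = rolling_backtest(r, fit=lambda hist: hist.mean(),
                         predict=lambda m, hist: m)
```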
Our predictions take the form of a Monte-Carlo sample drawn from the predictive distribution.
They are evaluated with three types of measures:
• logS: The predictive log-score, i.e., the log-likelihood of the realized observation under the predictive distribution.
• CRPS: The continuous ranked probability score of Gneiting and Raftery (2007).
• VaR95, VaR99: The check-loss known from quantile regression (e.g., Koenker and Xiao, 2002), computed for predicted quantiles at levels 0.95 and 0.99.
The forecast performance is shown in Figure 8. The dots are the average measure over the full period; the error bars indicate 90%-confidence intervals (adjusted for serial dependence). The left panel corresponds to 1-day-ahead, the right to 1-week-ahead forecasts. Scores are centered such that the S-vine has score 0. Some observations:
• A general observation is that the uncertainty (as indicated by the confidence intervals) is rather large compared to the differences between models. Everything that follows should therefore be taken with a grain of salt.
• Overall all methods seem to provide reasonable predictions, including the classical residuals method (GARCH-vine).
• The three stationary vine models perform similarly in all scenarios. The S-vine and M-vine tend to perform slightly better than the long D-vine. The S-vine is uniformly best for 1-day-ahead forecasts, but slightly outperformed by the M-vine for 1-week-ahead forecasts (except for logS). After all, the vine structures are found by heuristics, and there is no guarantee that the best one is found.
• For 1-day-ahead forecasts, the S-vine performs best for all measures, especially for extreme quantiles and the predictive log-likelihood (logS). For 1-week-ahead forecasts, the stationary vine models are outperformed by the competitor models for CRPS, and by the DCC-GARCH for VaR95; they compare favorably for the other measures.

We conclude that stationary vine models provide good forecasts for financial time series. This is somewhat remarkable since, in contrast to the vine models, the GARCH-vine and DCC-GARCH models were specifically designed for such data. Recently, some copula families have been specifically designed for modeling serial dependence in economic time series (e.g., Bladt and McNeil, 2020; Loaiza-Maya et al., 2018), but they were not used in this article. We expect that incorporating such families will lead to a further increase in performance.
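Given a Monte Carlo sample from the predictive distribution, the two main scoring rules above can be estimated as follows; this is a generic sketch, not the paper's implementation.

```python
import numpy as np

def crps(sample, y):
    """CRPS estimated from a Monte Carlo sample of the predictive law:
    E|X - y| - 0.5 E|X - X'| (Gneiting and Raftery, 2007)."""
    s = np.sort(sample)
    n = s.size
    term1 = np.abs(s - y).mean()
    # E|X - X'| via the sorted-sample identity
    term2 = 2.0 * np.sum((2 * np.arange(1, n + 1) - n - 1) * s) / (n * n)
    return term1 - 0.5 * term2

def check_loss(sample, y, alpha):
    """Pinball (check) loss of the predicted alpha-quantile against y."""
    q = np.quantile(sample, alpha)
    return (alpha - (y < q)) * (y - q)
```

For a standard normal predictive sample and realization y = 0, the CRPS is approximately 2φ(0) − 1/√π ≈ 0.234.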

Discussion
This work deals with vine copula models for the joint distribution of a stationary time series. We derived the maximal class of vine structures that guarantee stationarity under practicable conditions.
The underlying principle is intuitive: we start with a vine model for the dependence at a specific time point and connect copies of this model serially in a way that preserves time ordering. This class includes the previously proposed models of Smith (2015) and Beare and Seo (2015) as special cases.
The COPAR model of Brechmann and Czado (2015) was shown to be inadequate in this sense because it fails to guarantee stationarity under simple conditions. The simulations and application suggest that the added flexibility leads to improvements over the previous models. Another benefit is the greater interpretability of the model structure. More importantly, our contribution gives a final answer in the search for vine copula models suitable for stationary time series.
We developed methods for parameter estimation, model selection, simulation, prediction, and uncertainty quantification in such models. All methods are designed with computational efficiency in mind, such that the full modeling pipeline runs in no more than a few minutes on a customary laptop.
The proposed bootstrap procedure avoids refitting the models through a one-step approximation.
The method appears to be new and may prove useful beyond the current scope. The bootstrap technique also does not require explicit estimation of the rather complicated limiting variances in our theorems. It might be possible to achieve this even more efficiently using a blocking technique similar to Ibragimov and Müller (2010).
We further provide theoretical justifications in the form of asymptotic results. To the best of our knowledge, these are the first results applicable to vine copula models under serial dependence. Even when specialized to the iid case, they extend the existing literature in several ways. In particular, they provide post-hoc justification for what is already practiced widely: step-wise estimation and simulation-based inference in fully parametric, but usually misspecified R-vine models. Our main results are empowered by more abstract theorems given in Appendix A. They deal with general semiparametric method-of-moment type estimators with potentially non-negligible nuisance parameter. As this is a common setup, especially in copula models, these abstract results shall prove powerful beyond the present paper. For example, generalizations of the results in Tsukahara (2005) are obtained as easy corollaries. The results shall also help in other interesting extensions of our model, for example accounting for long-memory dependence or non-stationarity (see, e.g., Lentzas, 2017, Chen et al., 2020).
Despite confirmatory numerical experiments, a limitation of the results is an assumption on the decay of the mixing coefficients (required only for the asymptotic distribution). Judging from earlier work in a narrower context, we do not believe this poses a serious issue. However, we do not yet know any easily verifiable sufficient conditions. Investigating the mixing properties of stationary vine copulas, and multivariate copula models more generally, is therefore a pressing problem for future research.

Acknowledgements
We thank the Editor and three anonymous referees for many helpful suggestions that substantially improved our paper.

A. General results for semiparametric method-of-moment estimation
The proofs for the parametric and semiparametric cases are largely similar. To avoid duplication, we first establish general results that cover both cases. The statements and proofs make extensive use of empirical process techniques (e.g., van der Vaart and Wellner, 1996; Dehling and Philipp, 2002) and the associated shorthand notation P_T g = (1/T) Σ_{t=1}^T g(X_t) and Pg = E{g(X_t)} for the empirical measure and expectation of an arbitrary function g.
Suppose we want to estimate a Euclidean parameter α* ∈ A ⊆ R^p in the presence of a nuisance parameter ν* ∈ N, possibly infinite-dimensional. Let φ_{α,ν} = (φ_{α,ν,1}, . . . , φ_{α,ν,r}) be a map R^s → R^r such that Pφ_{α*,ν*} = 0. Given an estimator ν̂ of ν*, define α̂ as the solution to P_T φ_{α̂,ν̂} = 0. We shall assume that N is a subset of a Banach space. We impose the following general conditions: (C1) The series (X_t)_{t∈Z} is strictly stationary and absolutely regular.
where Q is the inverse survival function of φ α * ,ν * .
Conditions (C2) and (C6) make convergence of ν̂, the estimator of the nuisance parameter, a prerequisite. The other conditions concern the regularity of the time series and the identifying functions φ_{α,ν}. The following theorems establish consistency and asymptotic normality of α̂, our estimator of the parameter of interest.
(ii) If T = o(N) and T^{1/2}{(α̂, ν̂) − (α*, ν*)} converges weakly to a tight process (A, N), then In the context of our paper, ν̂ is a vector of empirical distribution functions. Lemma 4.1 of Chen and Fan (2006b) establishes (C2) and (C6), but under conditions slightly stronger than our (C1) and (C8). The following lemma improves their result accordingly. For the sake of completeness, we give a detailed proof in Section S6.6 of the supplementary material.

S1 Graph theoretical concepts: additional illustrations
In this section we provide additional illustrations for graph theoretical concepts. For convenience, we repeat the definitions from the main manuscript (with numbering preserved).
A graphical example of a regular vine on four variables is shown in Figure S1. Let us discuss the three conditions of the definition in the context of this example: (i) The tree T 1 consists of vertices V 1 = {1, 2, 3, 4} and edges E 1 = {(1, 2), (2, 3), (3, 4)}. We see that the graph is connected and contains no cycles, so it is indeed a tree.
(ii) The edges in E 1 become the vertices V 2 of the second tree. That is, V 2 = E 1 = {(1, 2), (2, 3), (3, 4)}. The vertices in V 2 are connected by some edges E 2 . The edges E 2 then become the vertices in T 3 , i.e., V 3 = E 2 . The labeling of edges in the second and third tree of the figure is explained below.

S1.2 Labeling of edges
The edge labels in Figure S1 follow the convention dictated by Definitions 2 and 3.
Definition 2. The complete union of an edge e ∈ E_k is given by U_e = {i ∈ V_1 | i ∈ e_1 ∈ e_2 ∈ · · · ∈ e_{k−1} ∈ e for some (e_1, . . . , e_{k−1}) ∈ E_1 × · · · × E_{k−1}} and for a singleton i ∈ V_1 it is given by the singleton itself, i.e., U_i = {i}.
(ii) The conditioning set of an edge e connecting v_1 with v_2 is D_e = U_{v_1} ∩ U_{v_2}, and the conditioned set is (a_e, b_e), where {a_e} = U_{v_1} \ D_e and {b_e} = U_{v_2} \ D_e. We then label an edge by e = (a_e, b_e | D_e).
As an example, consider the edge e ∈ E_2 connecting v_1 = (1, 2) ∈ V_2 to v_2 = (2, 3) ∈ V_2 in Figure S1. The complete union of an edge is obtained by collecting all variable indices appearing in the edges. In particular, we have U_e = {1, 2, 3}, D_e = {2}, and the edge label (1, 3 | 2). Figure S1 displays a four-dimensional D-vine. A D-vine is a vine where each tree is a path, i.e., each vertex is connected to at most two other vertices. The proximity condition then already fixes all the following trees. Edge labels indicate the pair-copulas involved in the four-dimensional D-vine copula density c = c_{1,2} c_{2,3} c_{3,4} c_{1,3|2} c_{2,4|3} c_{1,4|2,3}.

S1.3 D- and C-vines
We omit the arguments of the copula densities for simplicity. Figure S2 displays a four-dimensional C-vine. A C-vine is a vine where each tree is a star, i.e., there is one vertex connected to all the others. In the first tree, vertex 1 is connected with the other three vertices. One may check that the proximity condition does not impose any constraints on the second tree. Each vertex in the second tree can be connected with any of the remaining ones, as long as no cycle occurs (otherwise the graph would not be a tree). In the second tree of Figure S2, vertices (1,2) and (1,3) as well as vertices (1,2) and (1,4) are connected. Because each star on three variables also forms a path, the resulting third tree is then fixed.
The matrix representation is unique as soon as the diagonal entries are fixed. For details, we refer to Dissmann et al. (2013) or Czado (2019), who use lower-left or upper-right triangular matrices, respectively.
is called the restriction of V to the time points t, . . . , t + m.
For tree k, the entries i k and j k of the compatible permutations (i 1 , . . . , i d ) and (j 1 , . . . , j d ), respectively, indicate how cross-sectional trees are connected. A cross-sectional vertex at time t with (t, i k ) in the conditioned set is connected to a vertex at time t + 1 with (t + 1, j 1 ) in the conditioned set. The conditioning set of the connecting edge label consists of (t, i k−1 ), . . . , (t, i 1 ).
Conversely, the index j_k indicates that a cross-sectional vertex at time t + 1 with (t + 1, j_k) in the conditioned set is connected to a vertex at time t with (t, i_1) in the conditioned set. The conditioning set of the connecting edge label consists of (t, j_{k−1}), . . . , (t, j_1). For V^(0), the compatible permutation (i_1, . . . , i_d) determines the out-vertices of its cross-sectional trees and the compatible permutation (j_1, . . . , j_d) determines the in-vertices of its cross-sectional trees.
S2 Vine copula models for multivariate time series: additional illustrations

Suppose (X_t)_{t=1,...,T} = (X_{t,1}, . . . , X_{t,d})_{t=1,...,T} is a strictly stationary time series whose cross-sectional and temporal dependence we model by a vine copula. By fixing the stationary marginal distributions F_1, . . . , F_d, it suffices to consider time series of marginally standard uniform variables. The existing models make specific choices for the cross-sectional structure and the connecting

S2.1 D-vine of Smith (2015)
In the D-vine of Smith (2015), the cross-sectional structure is a D-vine and two cross-sectional D-vines at time points t and t + 1 are connected at the two distinct variables (vertices) that lie at their opposite borders. Without loss of generality, we can assume that the last variable at time t is connected with the first variable at time t + 1, i.e. an edge is added for (U t,d , U t+1,1 ).
With these choices, there is only one global vine model satisfying the proximity condition: a long D-vine spanning all variables at all time points. Here a technical advantage of using a D-vine for the cross-sectional dependence becomes clear: all trees of the D-vine are uniquely specified, and the cross-sectional D-vines are combined into one global D-vine. Therefore, no further rules for tree construction are needed. Figure S5 illustrates the first five trees of a four-dimensional D-vine on three time points. For better visibility, we drop edge labels, since they appear as vertex labels of the subsequent trees. Variables U_{t,i} for t = 1, 2, 3 and i = 1, 2, 3, 4 are expressed through their sub-indices in parentheses as (t, i). Figure S6 shows the R-vine matrix of this long D-vine. The red entries of this matrix relate to the cross-sectional dependence at time points t = 1, 2, 3. If we ignore the first index t in (t, i), these entries are identical to the entries of the R-vine matrix of a four-dimensional D-vine.
The black entries of this matrix that are not on the diagonal capture the cross-temporal dependence between U_1 and U_2 as well as between U_2 and U_3. Similarly, the blue entries of this matrix describe the cross-temporal dependence between U_1 and U_3. If the multivariate time series is a Markov process of order 1, then the pair-copulas corresponding to the blue entries are just independence copulas. If U_1, U_2, U_3 are independent, then all black and blue entries correspond to the independence copula.

S2.2 M-vine of Beare and Seo (2015)
In the M-vine of Beare and Seo (2015), the cross-sectional structure is a D-vine and two cross-sectional D-vines at time points t and t + 1 are connected at one variable that lies at the border of the D-vines. Without loss of generality, we can assume that the first variable at time t is connected with the first variable at time t + 1, i.e., an edge is added for (U_{t,1}, U_{t+1,1}). With the additional restriction that vertices of adjacent time points are connected first, this also fixes all further trees of the vine (see also Begin et al., 2020, for the connection to vector autoregressive models). Figure S7 illustrates the first five trees of a four-dimensional M-vine on three time points.
Again, variables are expressed through their sub-indices in parentheses. Figure S8 shows the corresponding R-vine matrix. The red entries of this matrix relate to the cross-sectional dependence at time points t = 1, 2, 3 and coincide with the red entries from Figure S6. Similarly, the black and blue entries of the matrix relate to the cross-temporal dependence at lags 1 and 2, respectively. A comparison of the R-vine matrices from Figures S6 and S8 shows that they have the same elements; only their placement differs, resulting in different vine copulas. Further, the red entries are identical due to the common cross-sectional D-vine.

S2.3 COPAR of Brechmann and Czado (2015)
In the COPAR model of Brechmann and Czado (2015), the cross-sectional structure is a C-vine and two C-vines at time points t and t + 1 are connected at the root vertex of the C-vine. Without loss of generality, we can assume that the root variable is the first variable, i.e., an edge is added for (U_{t,1}, U_{t+1,1}). This leaves a lot of flexibility for higher trees, and the authors settled on a specific set of rules.
In particular, the model contains all edges of a D-vine on the variables U 1,1 , U 2,1 , . . . , U T,1 .
Again, variable sub-indices are displayed in parentheses. Figure S10 shows the corresponding R-vine matrix. The red entries of this matrix relate to the cross-sectional dependence at time points t = 1, 2, 3. Unfortunately, the black entries cannot be related to the cross-temporal dependence at lag 1, since they involve variables from all of t = 1, 2, 3. In general, the same conclusion also holds for the blue entries, as can be observed for T = 4.

S2.4 Difference between M-vine and COPAR for bivariate time series
To illustrate the difference between the M-vine and COPAR, consider a two-dimensional time series with three time points, i.e., d = 2 and T = 3. Figure S12 shows the corresponding M-vine.

S2.5 S-vine
There is obvious potential for generalizing M-vines. First, we would like to allow for arbitrary R-vines in the cross-sectional structure. Second, we would like to connect two cross-sectional trees at arbitrary variables. Specific versions of such models were constructed in preliminary work by  (called temporal vine, or T-vine) and in unpublished work by Harry Joe.
But where should we stop? In principle, we could take any (T × d)-dimensional vine as a model for the vector (U 1 , . . . , U T ). Figure S13 illustrates the first five trees of a five-dimensional S-vine on three time points.
Again, variables are expressed through their sub-indices in parentheses. We consider a five-dimensional time series since four-dimensional vines are either D-vines or C-vines; more general R-vine structures appear starting from dimension 5.
in the conditioned set for t = 1, 2. Therefore, the corresponding edge label is given by (t, i k ) and (t + 1, j 1 ) conditioned on (t, i k−1 ), . . . , (t, i 1 ). This procedure is similar for the in-vertices.
In tree k for k = 1, . . . , 5, the cross-sectional in-vertex with (t + 1, j_k) in the conditioned set is connected to the vertex with (t, i_1) in the conditioned set for t = 1, 2. Therefore, the corresponding edge label is given by (t + 1, j_k) and (t, i_1) conditioned on (t, j_{k−1}), . . . , (t, j_1).
Theorem 2 implies that all stationary vines can be represented by a matrix of the form shown in Figure S14, where the stars correspond to the cross-sectional structure.
The class of S-vines also allows us to construct a stationary version of the COPAR model. More specifically, we keep the first tree of the COPAR, but construct later trees in a way that preserves stationarity. Five trees of the corresponding vine are illustrated in Figure S15. From the fifth tree onward, its trees coincide with those of the M-vine.

S3.1 Model selection
Parametric models are easy to select in a fully step-wise procedure. The most common selection criteria are AIC and BIC. For the margins, these criteria are computed from the maximal log-likelihood max_{η_j} Σ_{t=1}^{n} ln f_j(X_{t,j}; η_j), j = 1, . . . , d.
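As an illustration of this marginal selection step, the sketch below compares candidate margins by AIC using scipy (which lacks a skew-t, so a plain Student t serves as a stand-in for the paper's skew-t margins):

```python
import numpy as np
from scipy import stats

def marginal_aic(x, dist):
    """AIC of a fitted parametric margin: 2k - 2 * max log-likelihood."""
    params = dist.fit(x)
    loglik = np.sum(dist.logpdf(x, *params))
    return 2 * len(params) - 2 * loglik

# compare a Gaussian and a Student-t margin on heavy-tailed data
rng = np.random.default_rng(6)
x = stats.t.rvs(df=3, size=2000, random_state=rng)
aics = {name: marginal_aic(x, dist)
        for name, dist in [("norm", stats.norm), ("t", stats.t)]}
best = min(aics, key=aics.get)
```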
Then we continue step-wise with the pair-copulas, where AIC and BIC are computed from the maximal pair-copula log-likelihood. For the vine structure of stationary time series, we propose a similar heuristic extending the popular algorithm of Dissmann et al. (2013). Its idea is to capture the strongest dependencies as early as possible in the tree structure. In addition to the cross-sectional structure V^(0), we also need to select the in- and out-vertices (j_1, . . . , j_d) and (i_1, . . . , i_d).
We first focus on the cross-sectional structure. We compute the (absolute) empirical Kendall's τ between all pairs of variables and find the maximum spanning tree. Then we find the optimal in-/out-vertices by computing all pair-wise empirical Kendall's τ between the original time series and a lagged version, and choose the edge with maximal |τ|. Now the first tree is completely specified. We then estimate all parameters in this tree and generate pseudo-observations for the next tree. We again build a maximum spanning tree for the cross-sectional part, with the proximity condition as a side constraint. Then we find the compatible in-/out-vertices by maximizing the absolute Kendall's τ for the corresponding edges in Theorem 2 (ii). We continue this way until the first d trees are selected, which completely determines the remaining structure.
The procedure can be simplified for M- and D-vines by imposing appropriate constraints on the cross-sectional structure (which becomes a shortest-path problem) and the in-/out-vertices (where only i_1 is a free parameter). It must be emphasized that all these methods are heuristics and give no guarantee of finding the optimal model in their class.
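The first step of the heuristic, selecting the first cross-sectional tree as a maximum spanning tree in |τ|, can be sketched as follows; weights 1 − |τ| are used so that a minimum spanning tree routine applies, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.sparse.csgraph import minimum_spanning_tree

def first_tree(data):
    """First cross-sectional tree: the spanning tree maximizing the sum of
    absolute empirical Kendall's tau (Dissmann-style heuristic)."""
    d = data.shape[1]
    w = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = kendalltau(data[:, i], data[:, j])
            # minimizing sum of (1 - |tau|) over d - 1 edges
            # is equivalent to maximizing the sum of |tau|
            w[i, j] = 1.0 - abs(tau)
    mst = minimum_spanning_tree(w)
    return [(int(i), int(j)) for i, j in zip(*mst.nonzero())]

# Toy data with chain dependence 0 -> 1 -> 2
rng = np.random.default_rng(4)
x0 = rng.standard_normal(2000)
x1 = x0 + 0.3 * rng.standard_normal(2000)
x2 = x1 + 0.3 * rng.standard_normal(2000)
edges = first_tree(np.column_stack([x0, x1, x2]))
```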
Finally, the Markov order is selected by adding lags to the model as long as the overall AIC improves.

S3.2 Simulation algorithm
The simulation algorithm is based on the Rosenblatt transform, whose inverse also appears in the standard algorithm. The Rosenblatt transform applies certain conditional distribution functions to a random vector to turn it into independent uniforms. Conversely, the inverse Rosenblatt transform turns independent uniforms into a vector with arbitrary joint distribution.
Let M = (m_{i,j})_{i,j=1}^d be the R-vine matrix corresponding to a d-dimensional vine copula model (V, C(V)). The conditional distributions appearing in our version of the Rosenblatt transform can be computed recursively from the pair-copulas in C(V), as explained in Section 2.3. The inverse transformation can likewise be computed recursively from the pair-copulas in C(V); see Czado (2019).
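For a single Gaussian pair-copula, the conditional distributions are the well-known h-functions, and the (inverse) Rosenblatt transform reduces to the following sketch; the general algorithm recurses over the whole R-vine matrix.

```python
import numpy as np
from scipy.stats import norm

def h_gauss(u2, u1, rho):
    """Conditional cdf C(u2 | u1) of a bivariate Gaussian copula."""
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    return norm.cdf((z2 - rho * z1) / np.sqrt(1 - rho**2))

def h_gauss_inv(w, u1, rho):
    """Inverse of h_gauss in its first argument."""
    z1 = norm.ppf(u1)
    return norm.cdf(norm.ppf(w) * np.sqrt(1 - rho**2) + rho * z1)

# inverse Rosenblatt: independent uniforms -> Gaussian-copula pair
rng = np.random.default_rng(5)
w1, w2 = rng.uniform(size=10_000), rng.uniform(size=10_000)
u1, u2 = w1, h_gauss_inv(w2, w1, rho=0.6)
# forward Rosenblatt recovers the independent uniforms
r2 = h_gauss(u2, u1, rho=0.6)
```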
Algorithm 1 can also be used to simulate unconditionally by giving it an unconditional draw

S4 Numerical experiments
To validate the methodology proposed in the paper, we work with the following default setup: we repeatedly simulate time series of length T = 100, 400, 1 000, 2 500 from models of varying size (cross-sectional dimension d = 5, 10, Markov order p = 1, 2), containing either only Gaussian or only Clayton pair-copulas. The pair-copula parameters are set such that the Kendall's τ in tree k equals τ_0/k, with either τ_0 = 0.2 (weak dependence) or τ_0 = 0.7 (strong dependence). This decay in dependence strength serves two purposes. First, it provides control over the strength of dependence for implicit pairs; our specific choice produces models where the implicit pairs have a Kendall's τ of approximately the same magnitude as the explicit ones. Second, it reflects the common practice of working with vine structures that capture the strongest dependencies early (see Section S3.1). The marginal distributions are set to a heavy-tailed Student t distribution with mean zero, unit scale, and three degrees of freedom.
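Setting a pair-copula parameter to match a target Kendall's τ uses the standard one-to-one relations, e.g., θ = 2τ/(1 − τ) for the Clayton and ρ = sin(πτ/2) for the Gaussian copula:

```python
import numpy as np

def clayton_par(tau):
    """Clayton parameter matching Kendall's tau: theta = 2*tau/(1 - tau)."""
    return 2 * tau / (1 - tau)

def gauss_par(tau):
    """Gaussian-copula correlation matching Kendall's tau: rho = sin(pi*tau/2)."""
    return np.sin(np.pi * tau / 2)

# tau in tree k is tau0 / k, as in the simulation setup above
tau0 = 0.7
taus = [tau0 / k for k in (1, 2, 3)]
thetas = [clayton_par(t) for t in taus]
```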

S4.1 Parameter estimation
The step-wise estimation procedure of Section 4 is validated in Figure S16. The x-axis shows the sample size and the y-axis the average estimation error for a rescaled version of Kendall's τ. Both axes are on a logarithmic scale, so that √T-convergence corresponds to a straight line in the plot. We see that √T-convergence is achieved in all scenarios. However, the vertical upwards shift from the left to the right panels indicates that the estimation error increases with model size.

Figure S17: Coverage probabilities of bootstrapped confidence intervals for estimating the 90%-quantile of (X_{t,1} + · · · + X_{t,d})/d. The dotted line indicates the nominal level of 90%.
The difference between parametric and nonparametric margins is generally negligible.

S4.2 Uncertainty quantification
To validate the bootstrap method proposed in Section 6, we simulate from the same models as before, but now the goal is to estimate the conditional 90%-quantile of µ_t = (X_{t,1} + · · · + X_{t,d})/d given the past. We fit the models, estimate the quantile of µ_t, and construct 90%-confidence intervals using the bootstrap method. In all scenarios, we use the rule of thumb ℓ_T = 8T^{1/5} for the block length, which is known to give the optimal rate (see, among others, Bühlmann, 1993). Figure S17 shows the coverage probability on the y-axis with the target level of 0.9 indicated by the dotted line. In most scenarios the coverage is approximately right. Exceptions are the larger semiparametric Gaussian models and large models with the Clayton copula. The latter might be explained by the fact that the simulated time series mix fairly slowly. The coverage can be improved further by tuning the block length parameter ℓ_T, but our rule of thumb appears to give acceptable results in a wide range of scenarios.
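A minimal sketch of this kind of resampling is a moving block bootstrap with the block-length rule of thumb from above. The paper's actual bootstrap for vine models is more involved; this stand-alone version only illustrates the blocking and percentile-interval mechanics.

```python
import numpy as np

def moving_block_bootstrap(x, stat, n_boot=200, seed=0):
    """90% percentile confidence interval for stat(x) based on a
    moving block bootstrap with block length ceil(8 * T**(1/5)),
    the rule of thumb used in the experiments above."""
    rng = np.random.default_rng(seed)
    T = len(x)
    ell = int(np.ceil(8 * T ** 0.2))      # block length rule of thumb
    n_blocks = int(np.ceil(T / ell))      # blocks needed per resample
    stats = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, T - ell + 1, size=n_blocks)
        xb = np.concatenate([x[s:s + ell] for s in starts])[:T]
        stats[b] = stat(xb)               # recompute statistic on resample
    return np.quantile(stats, [0.05, 0.95])
```

Resampling contiguous blocks rather than individual observations preserves the serial dependence within each block, which is what makes the intervals (approximately) valid for mixing time series.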

S4.3 Model selection
We now assess the model selection heuristic proposed in Section S3.1 by comparing selected S-vine models to M- and D-vine models. We use a randomized version of the above setup to avoid a ground truth model that coincidentally favors one of the methods. Randomizing the vine structure and pair-copulas allows us to see how the structure and family selection heuristics perform over a wider range of data sets.
In each iteration, we draw a random S-vine structure. For the pair-copulas, we draw with equal probability from the Gaussian and Clayton copulas and rotate them by 0°, 90°, 180°, and 270° with equal probability. The parameters of the pair-copulas are set according to Kendall's τ values drawn from a Beta distribution with parameters α = 5τ_0/(k − τ_0) and β = 5, with τ_0 = 0.2 (weak dependence) or τ_0 = 0.7 (strong dependence). The expected value of Kendall's τ in tree k is then τ_0/k as before. We generate a time series from the model and run the selection heuristics outlined above, allowing for all parametric families implemented in rvinecopulib. The model fit is then measured in terms of the fitted models' KL-divergence from the true model (computed via Monte-Carlo integration). A smaller divergence indicates a better fit.
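The Beta parametrization can be checked directly: with α = 5τ_0/(k − τ_0) and β = 5, the mean is α/(α + β) = τ_0/k, matching the deterministic setup. A sketch of the draw:

```python
import numpy as np

def draw_tree_taus(tau0, k, size, seed=0):
    """Draw Kendall's tau values for tree k from Beta(a, b) with
    a = 5 * tau0 / (k - tau0) and b = 5, so that E[tau] = tau0 / k."""
    a = 5.0 * tau0 / (k - tau0)
    return np.random.default_rng(seed).beta(a, 5.0, size)
```

Fixing b = 5 while letting a shrink with the tree level k keeps the draws in (0, 1) and concentrates them around the decaying target τ_0/k.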
The x-axis in Figure S18 shows the sample size and the y-axis the KL-divergence between the true and estimated models. Again both axes are on a logarithmic scale, so we would expect to see straight lines for a correctly specified model. However, none of the lines are straight, indicating that the heuristics rarely identify the structure that generated the data. This underlines how important it is to base inference on results and tools that allow for misspecification. The M- and D-vine models are hardly distinguishable, but the S-vine model performs best in all scenarios.
The gap increases with increasing dependence strength and sample size. This suggests that the added flexibility of the general stationary model can make a difference in applications.
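The KL-divergence used as the performance measure can be approximated by Monte-Carlo integration: simulate from the true density f and average log f − log g under the fitted density g. A generic sketch, illustrated with univariate Gaussian densities rather than vine copula densities:

```python
import numpy as np

def kl_monte_carlo(sample_true, logpdf_true, logpdf_fit):
    """Monte-Carlo estimate of KL(f || g) = E_f[log f(X) - log g(X)],
    where sample_true contains draws from the true density f."""
    return float(np.mean(logpdf_true(sample_true) - logpdf_fit(sample_true)))

def norm_logpdf(mu, sigma):
    """Log-density of N(mu, sigma^2), used here as a stand-in model."""
    return lambda x: (-0.5 * ((x - mu) / sigma) ** 2
                      - np.log(sigma * np.sqrt(2.0 * np.pi)))
```

For example, KL(N(0, 1) || N(1, 1)) = 1/2, and the Monte-Carlo average converges to this value at the usual 1/√n rate; for vine models the two log-densities are simply replaced by the true and fitted model densities.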

S5 Proofs of graph theoretic results
Remark: Referenced equations or results not prepended by an 'S' refer to the main manuscript.

S5.1 Proof of Theorem 1
It is easy to see that (ii) together with translation invariance implies that the model is stationary. Now consider the reverse implication. Condition (2) is obviously true for m = T − 1. We first show that V_{1,T−1} must be a vine, i.e., a sequence of trees satisfying the proximity condition. The proximity condition cannot be violated because V is a vine. We thus need to show that the restriction V_{1,T−1} = (V_{1,T−1,k}, E_{1,T−1,k})_{k=1,…,(T−1)d−1} is a sequence of trees, i.e., connected, acyclic graphs. The restriction V_{1,T−1} cannot contain cycles, because it is constructed by deleting vertices and edges from the sequence of trees V. It thus remains to show that the graphs (V_{1,T−1,k}, E_{1,T−1,k}), k = 1, …, (T−1)d−1, are connected. We prove this by contradiction.
Let k ≥ 1 be the smallest level at which the graph (V_{1,T−1,k}, E_{1,T−1,k}) is not connected. Denote by E_k the corresponding edge set of the complete vine V. Recall that the kth graph (V_k, E_k) of the whole vine V is a tree and therefore connected. For the restricted graph to be disconnected, (V_k, E_k) must contain a path P = (e_1, …, e_{ℓ+1}), 1 ≤ ℓ ≤ d, connecting vertices (v, v_1, …, v_ℓ, v′) with v, v′ ∈ V_{1,T−1,k} and v_1, …, v_ℓ ∉ V_{1,T−1,k}. Suppose that all pair-copulas in trees below and above the kth level are independence copulas. Then the joint density of (U_v, U_{v′}) can be written as in (S1). Now consider the density of (U_w, U_{w′}) = (U_{v+(1,0)}, U_{v′+(1,0)}). Since (V_k, E_k) is connected, there must be another path P′ = (e′_1, …, e′_{L+1}), L ≥ 0, connecting vertices (w, w_1, …, w_L, w′). Hence, the joint density of (U_w, U_{w′}) can be written as in (S2). For the model to be stationary, (S1) and (S2) must be equal for all (u_v, u_{v′}) = (u_w, u_{w′}) ∈ (0, 1)².
The edges e_1 and e_{ℓ+1} connect vertices from V_{1,T−1,k} and V_k \ V_{1,T−1,k}. Because there is no time point T + 1, their translations e_1 + (1, 0) and e_{ℓ+1} + (1, 0) cannot be in E_k. Hence, the edges e′_1 and e′_{L+1} cannot be translations of e_1 and e_{ℓ+1}. Therefore, translation invariance of C(V) does not imply equality of (S1) and (S2), which contradicts our premise. Hence, V_{1,T−1} must be a vine. That V_{2,T} is also a vine follows from symmetric arguments.
It remains to show that V_{1,T−1} ∼ V_{2,T}. If V_{1,T−1} ≁ V_{2,T}, then there is an edge e_1 ∈ E_{k,1,T−1} for which there is no translation in E_{k,2,T}. Similarly, there must be an edge e_2 ∈ E_{k,2,T} for which there is no translation in E_{k,1,T−1}. Suppose all pair-copulas in V except those on translations of e_1 and e_2 are independence copulas. Then the joint densities of (U_1, …, U_{T−1}) and (U_2, …, U_T) are c_{1,n}(u) = ∏_{e ∼ e_1} c_e(u_{a_e}, u_{b_e}) and c_{2,n}(u) = ∏_{e ∼ e_2} c_e(u_{a_e}, u_{b_e}), where e ∼ e′ denotes that e is a translation of e′. Since neither edge is a translation of the other, these densities cannot coincide, contradicting stationarity.

S5.2 Proof of Theorem 2
It is easy to check that a vine satisfying (i)-(ii) is stationary. It is sufficient to specify the edges of the first d trees, since the dth tree is a path. This fact and the proximity condition fix all edges in later trees. Now consider the reverse implication. Let V be a stationary vine. In the first tree of V there can only be one edge between cross-sectional trees at adjacent time points, since E_1 must not contain cycles. We denote this edge by {(t, i_1), (t + 1, j_1)}. Now suppose that there is some k ≤ d − 1 such that (ii) holds for all k′ with 1 ≤ k′ ≤ k. We will show that it must also hold for k + 1.
First observe that the edges ⋃_{r=1}^{k} {e : a_e = (t, i_{k+1−r}), b_e = (t + 1, j_r)} form a path. This together with the proximity condition fixes the edges {e_{r,t} : r = 2, …, k}. For E_{k+1} to form a tree, there must be two more edges: one connects the path to an edge in E_{k+1} + (t, 0), the other connects it to an edge e_{r′,t} for some r, r′ ∈ {1, …, k}. Hence (ii) holds for all 1 ≤ k ≤ d and the two permutations must be compatible with V^{(0)}.

S5.3 Proof of Lemma 1
Let i_1 ∈ {1, …, d} be arbitrary. Because V is a vine, its first tree clearly contains an edge with conditioned set {i_1, i_2} for some i_2 ≠ i_1. Now suppose we have found 2 ≤ k < d − 1 indices i_1, …, i_k that do not violate the condition in Definition 8. In particular, there is e ∈ E_{k−1} with conditioned set {i_k, i_r} and conditioning set {i_1, …, i_{k−1}} \ {i_r} for some r ∈ {1, …, k − 1}. In the (k + 1)th level, this edge becomes a vertex. Because V is a vine, there must be an edge leaving this vertex. This edge must have conditioned set {i, j} and conditioning set {i_1, …, i_k} \ {i} for some j ∉ {i_1, …, i_k} and i ∈ {i_k, i_r}. Setting i_{k+1} = j, we see that the condition is also satisfied for k + 1.

S6 Proofs of asymptotic results
Remark: Referenced equations or results not prepended by an 'S' refer to the main manuscript.
S6.4 Proof of Theorem A.2
Condition (C2) implies that there is a sequence δ_T → 0 such that ‖ν̂ − ν*‖ ≤ δ_T with probability tending to 1. On events where ‖ν̂ − ν*‖ ≤ δ_T, the triangle inequality yields a bound on the supremum over α ∈ A(K) consisting of two terms. By (C4) and the ergodic theorem, the first term on the right is of order O_p(δ_T) = o_p(1).
The expectation of the first term on the right is o(1) by (C5). For the second term, note that the sequence T^{1/2}|φ_{α*,ν*+T^{−1/2}b,j} − φ_{α*,ν*,j}| is dominated by G‖b‖ and, thus, uniformly integrable. Then the weak law of large numbers for α-mixing triangular arrays (Kanaya, 2017, Theorem 1 and Remark 2) establishes condition (i). Therefore, (S4) holds, which establishes the first display of the theorem.
Part (ii) then follows from the continuous mapping theorem and Slutsky's lemma. The claimed uniform convergence then follows from the bracketing Glivenko-Cantelli theorem (van der Vaart and Wellner, 1996, Theorem 2.4.1) and Theorem 1 of Nobel and Dembo (1993), which gives the equivalence of the β-mixing and iid cases.