Time series copula models using d-vines and v-transforms

An approach to modelling volatile financial return series using stationary d-vine copula processes combined with Lebesgue-measure-preserving transformations known as v-transforms is proposed. By developing a method of stochastically inverting v-transforms, models are constructed that can describe both stochastic volatility in the magnitude of price movements and serial correlation in their directions. In combination with parametric marginal distributions it is shown that these models can rival and sometimes outperform well-known models in the extended GARCH family.


Introduction
The concept of a v-transform (McNeil, 2021) facilitates the application of copula models to time series where the dominant feature is stochastic volatility, such as financial asset return series. In the copula modelling approach to a single time series {x 1 , . . . , x n } the idea is to find an appropriate * Acknowledgement. Initial ideas were developed while AJM was a guest of the Institute for Mathematical Research (FIM) at ETH Zurich. MB would like to acknowledge financial support from the Swiss National Science Foundation Project 200021_191984. strictly stationary stochastic process (X t ) consisting of a continuous marginal distribution F X and a copula process (U t ), given by U t = F X (X t ) for all t, which models the serial dependencies in the data; the latter is a process of standard uniform variables with higher-dimensional marginal distributions that may be described by a family of copulas C U (u 1 , . . . , u d ) for d ⩾ 2.
For volatile return data, it is well known that serial dependence becomes more apparent under transformations like the absolute-value transformation T (x) = |x| or the squared-value transformation T (x) = x 2 , which remove directional information and summarise the magnitude of price movements in what we term a volatility proxy time series.
In McNeil (2021) general asymmetric volatility proxy transformations T (x) with change points µ T are considered; these are continuous functions which are increasing in (x − µ T ) for x > µ T and increasing in (µ t − x) for x ⩽ µ T . Under such transformations, the relationship between the terms of the copula process (U t ) of (X t ) and the terms of the copula process (V t ) of the volatility proxy process (T (X t )) can be described by a v-shaped function, known as a v-transform, which is a mapping V : [0, 1] → [0, 1] that preserves the uniformity of uniform random variables. The relationships between the transformations are shown in diagram (1).
The key idea is that, rather than modelling serial dependence of the time series at the level of (U t ), we can model it at the level of (V t ) and create a composite copula model consisting of a family of copulas C V (v 1 , . . . , v d ), d = 2, . . . , n and a v-transform V. In McNeil (2021) C V is modelled using the implied copula process of an ARMA model while in this paper we apply d-vine copula models (Aas et al., 2009;Smith et al., 2010).
The resulting copula models, when combined with suitable marginal distributions, yield an extremely flexible class of non-linear time series models for volatile data. In common with the popular GARCH family (Engle, 1982;Bollerslev, 1986;Ding et al., 1993;Glosten et al., 1993, among others) and models from the more general GAS (generalized autoregressive score) family (Creal et al., 2013), the resulting models are observation-driven volatility models, but they employ a rather different mechanism in which the key feature is the strict separation of marginal and serial dependence modelling. In the GARCH paradigm the mechanism employed to capture serial dependence behaviour can have a constraining effect on the resulting marginal behaviour, which is only partly mitigated by varying the innovation distribution. For example, the standard GARCH model yields marginal distributions with tails following power laws for a wide class of innovation distributions (Mikosch and Stărică, 2000) and we will show in our examples that these are not always appropriate for observed return series.
There is a large literature on copula models for time series and good starting points are the comprehensive review papers by Patton (2012) and Fan and Patton (2014). While the main focus of this literature has been on cross-sectional dependence between multiple time series, there is also a growing literature on modelling serial dependence within single series and lagged dependence across series. Markov copula models were first investigated by Darsow et al. (1992) and are further studied in Chen and Fan (2006). In the latter paper and in Chen et al. (2009) the theory of semiparametric estimation for these models is developed, while Beare (2010) studies mixing properties of the resulting processes; an application to data is given in Domma et al. (2009). Although theoretically interesting, first-order Markov models are not realistic candidates for modelling the persistent dependence and stochastic volatility that is found in typical financial return series.
A distinct approach to copula modelling in statistics uses pair-copula constructions. A key reference for applications in risk modelling is Aas et al. (2009), which builds on underpinning work on joint density decompositions by Joe (1996Joe ( , 1997 and on graphical dependence models by Cooke (2001a,b, 2002) and Kurowicka and Cooke (2006). The application of this methodology to modelling longitudinal dependence in time series with d-vines was developed in Smith et al. (2010) and the extension of this approach to bivariate processes with both serial and cross-sectional dependence using alternative vine structures is treated in Beare and Seo (2015), Brechmann and Czado (2015), Smith (2015) and .
The first-order Markov models in Chen and Fan (2006) are special cases of the d-vine copula approach of Smith et al. (2010) but the general d-vine pair-copula model allows higher-order Markov dependence of the kind analysed in Ibragimov (2009). In applications of these models, the paircopula building blocks used by researchers have tended to be limited to a number of well known bivariate copulas, such as the Gumbel, Clayton, Gaussian, t, Frank and Joe copulas, as well as rotations of certain copulas through 90, 180 and 270 degrees. None of these basic copulas are particularly effective at capturing the particular forms of serial dependence created by stochastic volatility, in which large price movements are followed be other large price movements, but of frequently changing sign. Loaiza-Maya et al. (2018) observe that these sign changes tend to lead to lag-plots of log returns on the copula scale that are cross-shaped. They address the shortcomings of standard pair copulas in d-vine models by creating mixtures of pair copulas and rotated pair copulas which can emulate these cross-shaped patterns. In this paper we will show that the standard copulas can be combined with v-transforms and d-vines to offer a parsimonious method of obtaining a similar effect. Moreover the approach has more econometric interpretability in that the driver of serial dependence is identified with a volatility proxy series.
The contributions of the paper are threefold: we extend the theory of v-transformed copula processes as presented in McNeil (2021) to allow models that can describe both the phenomenon of stochastic volatility, as well as serial correlation in the direction of price movements; we show how to apply the modelling framework to copula processes based on d-vines and develop an approach to estimation; we demonstrate that the resulting models, when combined with suitable marginal distributions, can rival and sometimes outperform popular models in the GARCH class.
The paper is structured as follows. In Section 2 we extend the theory of copula processes constructed using v-transforms; in particular, we explore a generalization of the concept of stochastic inversion of v-transforms. Section 3 shows how the theory applies to d-vine copula processes and Section 4 explains our approach to the estimation of models and empirical examples are presented in Section 5. We apply the fitted models to value-at-risk (VaR) estimation and analyse their outof-sample forecasting performance in Section 6; Section 7 concludes.

V-transforms of uniform random variables
In McNeil (2021) three equivalent definitions of v-transforms are provided. Suppose we consider absolutely continuous and strictly increasing cdfs F X on R and volatility proxy transformations T that are (i) continuous, (ii) strictly increasing for x ⩾ µ T , (iii) strictly decreasing for x < µ T and (iv) differentiable everywhere except at a change point parameter µ T which may or may not be zero; examples are T (x) = |x| as well as alternatives that are asymmetric around µ T . Then a v-transform is as a function V : [0, 1] → [0, 1] constructed from F X and T by where F T (X) denotes the cdf of T (X) for any random variable X with cdf F X . Observe that the composite transformation in (2) represents an excursion round three sides of the rectangle in diagram (1), from top right to bottom right.
V is thus a mapping of the probability-integral transform (PIT transform) of X to the PIT transform of T (X) since V(F X (X)) = F T (X) (T (X)). Clearly, by the properties of the PIT transform, such a transformation will preserve the uniformity of uniform random variables: if U ∼ U (0, 1) and A more visually interpretable definition is the following: with the following properties: 2. There exists a point δ known as the fulcrum such that 0 < δ < 1 and V(δ) = 0; where Ψ is a continuous and strictly increasing distribution function on [0, 1].
Parametric families of v-transforms may be obtained by assuming, for example, that Ψ is the cdf of a beta distribution or Ψ(x) = exp(−κ(−(ln x) ξ )) for κ > 0 and ξ > 0, which is the main family considered in McNeil (2021). Both families include the important special case of the linear v-transform which corresponds to a uniform cdf for Ψ and which subsumes the symmetric case V 0.5 (u) = |2u−1|.
If we write a volatility proxy transformation in the form for strictly increasing and continuous T 1 and T 2 satisfying T 1 (0) = T 2 (0), then it can be shown that the v-transform V in (2) is determined by F X , the value µ T and the profile function g T (x) = . Different volatility proxy transformations may share the same change point µ T and profile function g T . For a fixed distribution F X the pair (µ T , g T ) partitions the set of volatility proxy transformations into equivalence classes, each corresponding to a unique v-transform 1 . From this point of view, having selected F X , the selection of a v-transform amounts to an implicit choice of a class of volatility proxy transformations.
Example 1. Let F X be the distribution function of a standard Student t distribution with 5 degrees of freedom. Consider the v-transform V obtained by using the generator Ψ(x) = exp(−κ(−(ln x) ξ )) in (3) with parameter choices δ = 0.3, κ = 2.5 and ξ = 0.5.
The left panel of Figure 1 shows V together with the admissible area (in white) corresponding to the fulcrum value δ = 0.3; the restriction arises from the aforementioned square property. The centre and right panels show the implied volatility proxy transformation T (x) = Φ −1 (V(F X (x))), where Φ is the standard normal df, and the profile function g T ; note that T is just one possible member of the equivalence class defined by F X and V. The changepoint µ T = F X (δ) is negative and is marked by a dashed vertical line; two further dotted vertical lines show that an X value at -2 is associated with higher volatility than a value at 2. The profile function is determined by (see The dashed line shows the profile function for a symmetric volatility proxy transformation. In this example, the shape of the curve g T (x) indicates that, for x ⩾ 0, a realization at µ T − x has a greater effect on volatility than a realization at µ T + x, but that this effect wears off for larger x.
Note that, as the fulcrum parameter δ → 0 in (3) the v-transform tends towards the function V 0 (u) = u on (0, 1] and as δ → 1 it tends towards the function V 1 (u) = 1 − u on [0, 1). We refer to former corresponds to any strictly increasing transformations of a continuously distributed random variable X and the latter to any strictly decreasing function.

Stochastic inversion of a v-transform
Our aim is to develop stochastic processes for the process of uniform random variables (V t ) depicted in diagram (1), and hence to build processes (X t ) to model financial returns. To this end we need to be able to invert a v-transform V, but this is complicated by the fact that V is not one-to-one.
Stochastic inversion refers to the process of randomly reversing a v-transform to arrive back at one of the two dual points that yield the same value. To develop stochastic processes for (X t ) in which we have control over the marginal distribution, we need to be able to do this in such a way that uniformity is preserved under the stochastic inversion. We introduce some further notation. Let V −1 denote the partial inverse given by V If two uniform random variables are linked by the v-transform V = V(U ) then the joint distribution function of (U, V ) is a special kind of copula. McNeil (2021) showed that, conditional on where the function is referred to as the conditional down probability of the v-transform and satisfies E (∆(V )) = δ.
This allows the concept of the stochastic inversion function of a v-transform to be defined. This is simply a function that facilitates the construction of a Bernoulli event by which a value of V is randomly assigned to one of the dual points U and U * such that V(U ) = V(U * ) = V .
Definition 2 (Stochastic inversion function of a v-transform). Let V be a v-transform with conditional down probability ∆(·). The two-place function V −1 : is the stochastic inversion function of V.
It is obviously true that V(V −1 (v, w)) = v for any w. It is also simple to show that if V and W are independent U (0, 1) random variables and U = V −1 (V, W ), then U ∼ U (0, 1). This is because so U has the conditional distribution given in (5) and must be uniformly distributed.
When we apply a v-transform V(u) followed by a stochastic inversion of the v transform, then we either arrive back at the point u or at its dual point u * . In the next result we consider the sequence of uniformity-preserving transformations U → V(U ) → V −1 (V(U ), W ) for U and W independent and quantify the probability of arriving back at our starting point.
Proposition 1. Let U ∼ U (0, 1) and W ∼ U (0, 1) be independent random variables and let V be a vtransform with fulcrum δ and conditional down probability ∆(v).
We see that the probability P Ũ = U that we recover the original value of U is bounded below by δ 2 + (1 − δ) 2 . This value is attained for the linear v-transform in (7) since ∆(v) = δ for all v for that family. The global minimum value is 0.5, which is attained only for the symmetric v-transform V 0.5 . Interestingly, when asymmetry is present, there is a greater than 50% chance of recovering the original value.

V-transforms and inverse v-transforms of copulas
V-transforms and their stochastic inversion functions can be applied componentwise to random vectors. For vectors u, v and w in [0, 1] d we write Then we know that: 1. U is a uniform random vector or, in other words, its joint distribution is a copula. This is guaranteed by the independence of V i and W i for all i according to (8).
2. V(U ) = V regardless of the exact nature of the joint distribution of ( It is of interest to be able to determine the joint distribution of U under various assumptions about (V , W ). We give a result for the general case as well as the case where these vectors are independent of each other. This generalizes a result given in McNeil (2021) for the case where W 1 , . . . , W d are also iid. To state this result compactly we introduce the notation and the vector form of the latter p δ,u (x) = (p δ,u 1 (x 1 ), . . . , p δ,u d (x d )) ′ . Note that δ(u) is the probability that the v-transform of an observation at u is assigned to the left side of the fulcrum under stochastic inversion. We now state the main result of this section.
Theorem 2. Let V be a v-transform and let {(V 1 , W 1 ), . . . , (V d , W d )} be a set of pairs of uniform random variables with the property that V i is independent of W i for all i. Assume the copula of (V , W ) has a joint density. Then the copula density where c V ,W denotes the joint copula density of (V , W ). When V and W are independent this reduces to the simpler form where c V denotes the copula density of V and C p δ,u (W ) denotes the copula of p δ,u (W ). When, in In the applied sections of this paper the focus will be on models of type (11) and it is instructive to consider the structure of the copula in more detail. For d = 2, Figure 2 illustrates and c U (u 1 , u 2 ) for particular choices of parametric copulas for V and W and for the linear vtransform. We observe the characteristic cross-shape often observed in lag-plots for processes with stochastic volatility. The density in (11) is itself the product of two copula densities, c V (V(u 1 ), . . . , V(u d )) and the density To see that this is a density observe that c U (u 1 , . . . , u d ) = c W * (u 1 , . . . , u d ) when V is a vector of independent uniform variables so that c V (v 1 , . . . , v d ) = 1. In Figure 3 we illustrate for three different choices of v-transform the copula density c W * (u 1 , u 2 ), which controls the dependencies in directions of movements.

V-transforms of time series copula processes
The theory presented in the previous section can obviously be applied to the construction of time series copula processes (U t ) t∈Z that are suitable for modelling financial return data. By constructing a strictly stationary bivariate process (V t , W t ) t∈Z we obtain a strictly stationary process for (U t ) through the stochastic inversion construction of Theorem 2. In these models (V t ) can be thought of as accounting mainly for stochastic volatility (serial dependence in the magnitude of movements) while (W t ), if it is not an iid process, can account for extra serial dependence in the direction of price movements.
We restrict attention to the special case where (V t ) and (W t ) are independent processes as this results in the tractable joint density for (U t ) in (11) and permits likelihood-based inference. Within this framework we consider models where (W t ) is either strict white noise (an iid process) or a first-order Markov process. In comparison with stochastic volatility, dependence in the signs of asset returns is a relatively weak and transient phenomenon and first-order Markov models for (W t ) appear to be sufficient in the majority of datasets we have considered.
The process (V t ) will be modelled by using a stationary d-vine copula process of Markov order k yielding the class of vt-d-vine models, which complement the vt-ARMA class of models in McNeil (2021). First-order Markov dependence in (W t ) will be modelled using a d-vine process of order k = 1, i.e. the kind of model considered in Chen and Fan (2006) and Domma et al. (2009).
The aim of this section is to develop copula processes for (V t ) and (U t ) that fulfill the requirement of strict stationarity.

D-vine copula processes
Using the theory described in Smith et al. (2010) the multivariate copula density c V of a random vector V = (V 1 , . . . , V d ) ′ can be decomposed as a d-vine taking the form where S t,t+i = {t + 1, . . . , t + i − 1} denotes the set of indices of the variables lying between V t and V t+i , c t,t+i|S t,t+i is a pair copula density (i.e. a bivariate copula density) describing the dependence between variables V t and V t+i conditional on these variables and denotes the conditional cdf of variable j conditional on the intermediate variables; note that S t,t+1 = ∅ and so the conditioning set is dropped in this case.
The decomposition (13) (13) of an arbitrary joint density may result in pair copulas whose functional forms depend on the values of the conditioning variables in the sets S j,i . However, in applied statistics, (13) is used as a framework for constructing rather than deconstructing models and interest is usually confined to so-called simplified pair copula constructions in which the copula forms are invariant to the values of the conditioning variables and are chosen from a number of well-known parametric families. In this case we can simplify the copula notation to c t,t+i = c t,t+i|S t,t+i . There has been quite a lot of interest in the question of whether simplified constructions are sufficiently flexible and robust to model all dependence structures; see Haff et al. (2010), Stöber et al. (2013), Spanhel and Kurz (2019) and Mroz et al. (2021). While these authors draw attention to limitations, the simplifying assumption still admits a rich class of copulas which generalize the dependence inherent in classical autoregressive time series models, as we now explain.
It is possible to construct strictly stationary time series (V t ) t∈Z in which the d-dimensional joint densities of random vectors (V t+1 , . . . , V t+d ) for d ⩾ 2 are given by the simplified form of the decomposition (13). In this case the stationarity requirement imposes the restriction that the pair copula densities c t,t+i may only depend on i and we can further simplify notation by writing c i = c t,t+i . Moreover, by setting c i = 1 (corresponding to the independence copula) for i > k we obtain Markov processes of order k and reduce the number of copulas that need to be determined. Models of this kind are investigated by Brechmann and Czado (2015) (under the name COPAR), Smith (2015) and ; in the latter paper it is shown that d-vines are the only regular vines that can be used to construct stationary univariate time series.
for a sequence of bivariate copula densities c 1 , . . . , c k .
A d-vine(k) copula process is fully determined by the k copula densities corresponding to each of its conditional dependencies, or generalized lags. When k = 1, the copula process reduces to that of Chen and Fan (2006). When the copula densities are Gaussian, the dependence structure is that of a Gaussian AR(k) model, and the parameters of the pair copulas in the d-vine model are the partial autocorrelations of the underlying AR(k) process. In general, for k > 1, the main practical difficulty lies in the calculation of the expressions v t|S t,t+i and v t+i|S t,t+i for i > 1. This can be done where h , denotes the partial derivative or h-function of the copula C i . Thus the problem is recursively reduced to the problem of evaluating h-functions of bivariate copulas (Joe, 1996). Note that when C i is an exchangeable copula (satisfying but when C i is non-exchangeable then both partial derivatives must be calculated explicitly. The d-vine process of Definition 3 is strictly stationary by design but the questions of ergodicity and mixing are trickier. A number of authors including Chen and Fan (2006), Beare (2010) and Longla and Peligrad (2012) have studied the first-order process (k = 1). Longla and Peligrad (2012) show that, if the density of the absolutely continuous part of a copula is strictly positive almost everywhere, the resulting process is β-mixing (absolutely regular) and therefore ergodic.
A variety of results have been obtained on the rate of mixing and these typically depend on the tail dependence characteristics of the copula. The Gauss and Frank copulas can be shown to be ϕ-mixing (Longla and Peligrad, 2012) and thus geometrically β-mixing, while the Gumbel, Clayton and t copulas satisfy the weaker property of geometric ρ-mixing (Beare, 2010); see Bradley (2005) for more details of mixing conditions. Zhao et al. (2018) extend the Markov chain approach to models with order k > 1 and give general conditions for geometric ergodicity, but do not address specific copula choices.

Vt-d-vine copula processes
Definition 4. Let V be a v-transform, let (V t ) t∈Z be a d-vine(k) copula process and let (W t ) t∈Z be any strictly stationary copula process that is independent of (V t ). Let (U t ) t∈Z be defined componentwise by setting U t = V −1 (V t , W t ).
1. If (W t ) is an iid process (strict white noise) we say that (U t ) is a vt-d-vine(k) copula process.
It follows immediately from Theorem 2, by inserting the d-vine(k) density (14) in the general expression (11) and using the notation (12), that the joint density of a gvt-d-vine(k) process is .
Remark 1. Under the degenerate v-transform V 0 with fulcrum set at zero we have that V 0 (u) = u for u ∈ (0, 1]. In this case the joint density of a vt-d-vine(k) process reduces to c U (u 1 , . . . , u d ) = c V (u 1 , . . . , u d ) so that a d-vine(k) copula process model can be considered as a boundary case of a vt-d-vine(k) model.
As noted earlier, we will use d-vine(1) models for the process (W t ). In this case the term c W * (u 1 , . . . , u d ) in the joint density (16) can be fully expressed in terms of bivariate copula densities.
When d = 2 let the joint distribution function of (W 1 , W 2 ) be denoted C W = C W and let c W * = (1 − W 1 , W 2 ) respectively; these are the copulas obtained by rotating the distribution described by C W through 90, 180 and 270 degrees clockwise. We then have the following expression for c W * .
The conditional density of the resulting gvt-d-vine(k) copula process can be calculated from (16) and (17) and takes the form vt=V(ut) . Example 2. We construct a strictly stationary time series based on a gvt-d-vine(3) process with the follow specification. The underlying d-vine(3) copula process has a 180-degree-rotated Clayton copula (θ = 0.7) at lag 1, a t copula (ν = 5 and ρ = 0.2) at lag 2 and a Joe copula (θ = 1.5) at lag 3.
The v-transform V and marginal distribution F X are as in Example 1; in particular, the marginal distribution is Student t. The d-vine(1) copula process for (W t ) uses the Gaussian copula (ρ = 0.7). Figure 4 shows a realization of length n = 2000 from the resulting process (X t ) and Figure 5 shows that there is strong serial dependence in (|X t |) as well as serial dependence in (X t ) and between (X t ) and (|X t |).

Estimation
We now turn to the statistical estimation of the processes defined in the previous section. Let x = {x 1 , . . . , x n } denote a realization from a strictly stationary process with parametric marginal distribution F X (x; θ m ) and joint copula density c U (u 1 , . . . , u n ; θ c ). The full log-likelihood is In copula-based inference it is common to build up the model componentwise by estimating marginal distribution and copula model in successive steps. We follow the inference-functions-for-margins (IFM) approach (Joe, 1997) in which we first maximize L 1 (θ m ; x) = n i=1 log (f X (x i ; θ m )) to obtain estimates of the marginal parameters θ m and then maximize L 2 (θ c ; u) = log (c U (u 1 . . . , u n ; θ c )) using pseudo-copula data {u i = F X (x i ; θ m ), i = 1, . . . , n} to obtain θ c . Moreover, in the second step we use a variant on the sequential procedure proposed by Aas et al. (2009) to select pair copulas one by one; we refer to this as incremental copula inference. The final stage of our method is a joint maximization of (19) over all model parameters (with the exception of the fulcrum δ, for reasons we later explain) using the estimates from IFM as starting values.
Statistical theory for copula inference for time series is a developing area. For first-order Markov processes, Chen and Fan (2006) showed consistency and asymptotic normality of a two-step proce- In the following sections we elaborate first on the critical marginal modelling phase before discussing inference for the different elements of the copula density c U .

Marginal modelling
We are free to choose the best-fitting marginal distributions we can find for the data. In contrast, well-known econometric models are more constrained in the marginal behaviour they can model. In particular, many time series models in the GARCH family have the property that the resulting tails of the marginal distribution are regularly varying regardless of the choice of innovation distribution, i.e. they follow a power law (Mikosch and Stărică, 2000). However, in real asset return data we often encounter situations where tail behaviour differs in the two tails and one or both may be lighter than a regularly-varying law would dictate.
See, for instance, Figure 6. For three financial datasets exhibiting stochastic volatility, which will be analysed in Section 5, we show the well-known Hill estimator of the tail index of a regularlyvarying law (see Hill (1975) and De Haan and Resnick (1998) for the basic properties of the Hill estimator). These plots should stabilize towards the left-hand end at a value greater than zero if a power tail is justified, but all these plots appear to continue to decay towards zero.
To model this behaviour, as well as the asymmetry of tails, we introduce a simple mixture of positive-valued distributions as a model for marginal distributions. The specific families that we consider are the generalized gamma distribution and the Burr distribution. In terms of extreme value theory (EVT) the first of these belongs to the Gumbel domain of attraction, and the second belongs to the Fréchet domain of attraction (Embrechts et al., 1997). In both cases, the associated In general, we define a two-sided mixture corresponding by taking a density f 0 (· ; η) on the positive half-line and a parameter p ∈ [0, 1] and setting A mixture of this form can also be interpreted as the result of splicing densities at the origin in a non-continuous manner 2 . In the case of the generalized gamma distribution while for the Burr distribution Other choices for f 0 such as the log-gamma distribution can also give good results but the generalized gamma and Burr are good models for the data we analyse in this paper.
We have also compared these models with various two-sided distributions that are common in 2 It is also possible to splice densities continuously but our results suggest that this results in slightly inferior fits. modelling log-returns such as the Student t, skewed Student t and normal inverse-Gaussian (NIG) distributions. The mixtures generally give statistically superior fits.

Incremental copula inference
In our model, the first term L 2 (θ c ; u) = log (c U (u 1 . . . , u n ; θ c )) in the log-likelihood (19) is maximized with respect to the copula parameters θ c using pseudo-copula data {u i = F X (x i ; θ m ), i = 1, . . . , n}. The term L 2 (θ c ; u) further splits into a term coming from the copula c W * in (16) and a term coming from the d-vine copula model for (V t ) which may be written as where θ i = (θ 1 , . . . , θ i ) and θ i denotes the parameter of the ith pair copula. Note that the terms c i v t|S t,t+i , v t+i|S t,t+i ; θ i depend on θ i through the copula c i and on θ 1 , . . . , θ i−1 through the conditional distributions v t|S t,t+i and v t+i|S t,t+i . We assume to begin with that the parameter(s) θ v of the v-transform are fixed and postpone a discussion of their estimation to the next section.
We select pair copulas from a number of widely-used one-parameter families, these being Gauss, Gumbel, Clayton, Frank and Joe, as well as rotations of the Gumbel, Clayton and Joe copulas through 180 degrees. We have experimented with non-exchangeable generalizations of these copulas using the construction of Liebscher (2008). These break the time reversibility of the resulting process but only rarely deliver gains in fit to justify the extra complexity and additional parameters. We also omit the two-parameter t copula from consideration since it adds little in the examples we present and since the restriction to one-parameter copulas permits the use of fast sequential method-ofmoments estimation in our forecasting study.
In the incremental method we choose the copula families for c 1 , . . . , c k sequentially but we continually re-estimate all parameters of previously chosen copulas. We also use the incremental method to estimate the order k of the process. The algorithm consists of an initialization step and a continuation step: Step 1: Select the copula family c 1 and parameter value θ 1 that maximize L 1 (θ 1 , θ v ; u).
Step j: With copula families c 1 , . . . , c j−1 already determined, select the copula family c j and parameter values θ j that maximize j i=1 L i (θ i , θ v ; u). If the AIC of the resulting model of order j is lower than the previous model of order j − 1, continue; otherwise stop and set k = j − 1.
This differs from the sequential method used by Aas et al. (2009) and analysed in Nagler et al.
(2020) since we do not hold the parameters of previously selected copulas fixed at their previously estimated values. This should reduce the risk of parameter error percolating through the procedure and lead to estimates that are even closer to full ML estimates than those obtained using the sequential method.

Estimating the v-transform
A natural approach to estimating the parameters θ v of the v-transform is to optimize over these in an outer loop, while applying the incremental procedure of the previous section in an inner loop. This is feasible, albeit very computationally intensive, since the incremental procedure is itself a greedy algorithm.
Care must be taken in the optimization with respect to the fulcrum parameter δ. If δ = u i = F X (x i ; θ m ) for some original data point x i , then V(u i ) = 0 and the log-likelihood for the copula takes the value −∞. This implies that the profile likelihood of δ is not differentiable at such points and has multiple local maxima. We thus use a grid search for an optimal estimate of δ, avoiding the u i values, rather than continuous optimization in the interval [0, 1]. The key observation is that δ is a threshold parameter and pseudo copula observations u i on either side lead to different responses.
The situation is akin to that which arises for TAR and other threshold models in econometrics; see, for example, Tong (1983) and Hansen (2000). In these models it is typical to use conditional (or profile likelihood) inference for other parameters while the threshold parameter δ takes fixed values on a grid.
In our empirical examples we use a procedure that is simpler on two counts, and therefore faster.
First, we restrict attention to the linear v-transform. This is sufficient to capture the main features of stochastic volatility and outperform competitor models in the examples we present; moreover, it facilitates value-at-risk (VaR) forecasting and backtesting with the fitted models, as we show in Section 6. Second, we optimize over the fulcrum parameter δ in the d-vine(1) model of Step 1 above, rather than the full d-vine(k) model.

Estimating the copula C W
In a gvt-d-vine copula process the log-likelihood contains an extra additive term which can be fully expressed in terms of C W (δ(u t−1 ), δ(u t ); θ W ) for the bivariate copula C W in (17). A final step is added to the stepwise method in which this additional term is maximized with respect to θ W .
In practice, the extent to which the parameter(s) θ W of a copula C W can be identified from data depends on the extent to which the set of points S = {(δ(u t−1 ), δ(u t )), t = 2, . . . , d} fills the unit square [0, 1] 2 , which depends in turn upon the choice of v-transform. In the special case of the linear v-transform, S is identical to the singleton {(δ, δ)} meaning that we can only identify the value of C W at this point (see the left panel of Figure 3).
For the case of a non-linear v-transform we recommend picking a copula family C W that can model positive dependence, negative dependence and independence; an ideal candidate is the radially symmetric Frank copula which involves only simple functions and is quick to evaluate. For the linear transform we can actually circumvent the evaluation of C W entirely. The likelihood can be

A graphical method using generalized lagging
We use a graphical method to gain insight into the fitted pair copula model. Having fitted the model, we first construct pseudo-copula data {v i = V(F X (x i ; θ m ); θ V ), i = 1, . . . , n} from c V . We then use a sequential method to reconstruct, for i = 1, . . . , k, the datasets which are modelled by copula c i in the likelihood contribution (21). The points of the lag 1 dataset 2,t = (v t , v t+1 ), t = 1, . . . , n−1, and these are the points that would be used in a first-order lagplot. For i < k the recursive formulas (15) then allow the recursive construction The h-functions of the previous copula C i are always required to construct the next dataset D i+1 .
We refer to the iterative procedure as generalized lagging since a scatterplot of the points in the dataset D i may be thought of as a kind of lagplot at generalized lag i. In the graphical method we plot sample Kendall's tau values for each dataset D i against the theoretical values for the fitted copulas c i to see the degree of correspondence between sample and fitted measures.
The generalized lagging procedure also suggests a fast alternative to the sequential ML estimation method of Aas et al. (2009). We could use the dataset D i at lag i to estimate θ i by a method-of-moments procedure that exploits the one-to-one relationsip between θ i and Kendall's tau for the copula families of interest. This is a much faster method than our main incremental method and we use it in a rolling estimation and backtesting study at the end of the paper.

Data and models
In this section we consider three different financial datasets and their estimation using the models and methods introduced in the previous sections. The first dataset consists of the log-returns of the Bitcoin-USD exchange from January 1, 2016 to November 1, 2019. The second dataset comprises the log-returns of the WTI crude-oil price from January 1, 2015 to February 12, 2019. Finally, the third series consists of the log returns of the price of the PCL stock from January 1, 2006 to January 10, 2010. Each series consists of 1000 observations 3 .
The first two datasets do not exhibit strong serial correlation (as is usual for log-returns), but they do show stochastic volatility. In contrast, both serial correlation and stochastic volatility are present in the third dataset. The third dataset was identified by taking all S&P500 stocks for the 2006-10 time period and selecting those with the largest absolute values of first-order serial correlation in log-returns. Thus it can be considered as a more extreme example of the level of serial correlation that is present in raw log-returns and a good candidate series for exploring the added value of a gvt-d-vine model over a standard vt-d-vine-model.
In addition to the vt-d-vine model introduced in this paper we also consider a vt-ARMA model of the type presented in McNeil (2021). To examine the extent to which higher-order Markov models are necessary, we also compare with a vt-d-vine model of order 2. From the wider GARCH family inspired by the ideas of Engle (1982) we fit the standard GARCH model of Bollerslev (1986), the exponential GARCH model of Nelson (1991) and the GJR-GARCH model of Glosten et al. (1993).
We also vary the choice of innovation distribution for these models. The optimal vt-ARMA and GARCH model orders are selected by minimization of the AIC criterion.  Table 1 shows the estimates for the vt-d-vine models fitted to the datasets. The parameters of the pair copulas are simply denoted θ i and the names of the selected copulas are given. The marginal model used for both the vt-processes is a mixture of positive and negative distributions as in (20) amounting to 7 marginal parameters when using the generalized gamma (for the first two datasets) and Burr (for the third dataset) distributions. Marginal parameters are denoted η − j and η + j for the negative and positive tails respectively. The fulcrum estimates are δ = 0.45, 0.3, 0.55 for the three datasets. Table 2 summarizes measures of fit for the best fitting model within each of the five considered classes with each table relating to a different dataset. We see that vt-d-vine models (and sometimes also vt-ARMA models) fare favourably against the alternatives from the GARCH family. It is evident that the vt-d-vine model of order 2 is not at all competitive for the BTCUSD and PCL data, but it does much better for the WTI dataset, where it gives the second lowest AIC model after the vt-d-vine model of order 7.  For the PCL data, a gvt-d-vine model incorporating a Frank copula for the (W t ) process gave a significant improvement over the best vt-d-vine model according to a likelihood ratio test. In terms of log-likelihood, it was also superior to any combined ARMA-GARCH model using the GARCH specifications of Table 2 although the AIC value of the best ARMA-GARCH model does 'catch up'

Parameter estimates and model comparison
with the AIC value of the best gvt-d-vine model in this case.
Note that the Markov nature of the d-vine models means that more lags need to be considered in this approach to obtain similar behaviour to models in the GARCH family which effectively have a 'moving-average term' for modelling volatility.

Graphical analysis of fit
We assess the fit of the models graphically in Figures 7, 8 and 9. In addition to a time series plot of each dataset and a QQ-plot against the fitted marginal distribution, we show four other plots.
The first of these is a plot of a partial rank autocorrelation function in which the empirical values of Kendall's tau calculated from the generalized lagged datasets D i in (23)  The fourth plot for each series shows an estimate of the profile function g T of the volatility proxy transformation T as defined in Section 2.1. The fifth plot shows the volatility proxy function T (x) = Φ −1 (V(F X (x))). The points show an empirical estimate of this relationship while the curve is the parametric estimate implied by the fitted model. The final plot is an illustration of an interval-valued risk measure that we call ViVaR which is described in Section 6.1.
Overall, these sets of plots suggest satisfactory fits. For the PCL dataset, the curvature of the marginal QQ-plot does suggest a slight lack of fit in the tails. This seems to be introduced in the final joint optimization over all parameters and suggests a lack of fit in the copula component, possibly caused by some remaining misspecification of the tail dependencies in the copula model.
The estimated marginal distribution obtained in the first step of the IFM model yields a better looking QQ-plot (omitted).
The estimated profile function g T for the Bitcoin data is that of a symmetric volatility proxy function. Moreover the smallest value of the volatility proxy corresponds with a zero return. For the WTI and PCL data there is more asymmetry. The estimated values of µ T are not equal to zero and both profile functions satisfy g T (x) > x for large x. This can be interpreted as large negative log-returns contributing more to the volatility proxy variable.

Using the Model for Forecasting
In this section we consider the out-of-sample performance of the best models from the previous section, looking in particular at predictions of value-at-risk (VaR).

Prediction of value-at-risk measures
In our most general model, conditional quantiles of the forecast distribution F Ut|U t−1 ,...,U t−k can be calculated via numerical integration of the conditional density in equation (18). In the model with linear v-transform considerable simplification is possible. In this case, using the shorthand notation U t−1 = (U t−1 , . . . , U t−k ), the conditional distribution function satisfies, for u t ⩽ δ, where the second term satisfies where ρ U is the correlation parameter describing serial correlation in the (U t ) process in (22). In the case where ρ U = 0 we simply have d(u t−1 ; δ, ρ U ) = δ.
To find the α-VaR at time t for α ≫ 0.5 we have to solve the equation for v t and then compute VaR t,α = F −1 X (V −1 (v t )). The calculation is facilitated by the fact that V t depends on the past values of U t−1 only through the past values of V t−1 = V(U t−1 ) and hence where the final expression uses the notation of Section 3 and we recall that the conditional probabilities v t|S t−k−1,t can be calculated recursively.
While full calculation of VaR is relatively straightforward in models with a linear v-transform, the above calculations suggest that in general models an alternative measure of risk could be considered by concentrating on quantiles of the conditional distribution of V t in the first term of (24). For α ≫ 0.5 let v α,t be given by Iα,t ∁ and thus the interval labelled I α,t plays the role of a set-valued risk measure with the interpretation that its probability is exactly equal to α and all points outside it lead to values of the volatility proxy that exceed the α-quantile of volatility. We refer to I α,t as a volatility-implied VaR interval (ViVaR) and have calculated it for the examples of Section 5.

Out-of-sample forecasting experiments
We evaluate the predictive performance of the vt-d-vine models using a rolling backtesting procedure typical for financial risk applications; we construct models using a moving window of n = 500 data, calculate the one-step-ahead predictive distribution and then validate the model using probabilityintegral-transform (PIT) values and value-at-risk (VaR) exceptions. Data are taken for a 5-year period; these are the years 2015-19 for BTCUSD and WTI and the years 2006-10 for PCL.
To facilitate backtesting of vt-d-vine models in reasonable time, we use an accelerated fitting procedure based on the sequential estimation idea described at the end of Section 4.5. We follow the IFM approach and first estimate the marginal distribution F X . We then fix k so that we do not optimize over model order; k = 5 is sufficient to give good results for BTCUSD and WTI, while k = 10 is required for the PCL dataset. We estimate parameters using the sequential methodof-moments procedure based on Kendall's tau. For PCL we restrict attention to vt-d-vine models without serial dependence in (W t ).
The PIT method follows the general model validation approach of Diebold et al. (1998). Suppose at time t we have data x t−n , . . . , x t−1 . We estimate the marginal model F X and use this to obtain transformed data u t−n , . . . , u t−1 on the u-scale, which we use to estimate the copula model. The conditional cdf is given by (24) and this can be used to calculate a PIT value u t = F Ut|U t−1 (u t | u t−1 )   The VaR exception tests are based on comparing a sequence of α-VaR estimates (VaR t,α ) at times t with the corresponding realized values (x t ) and counting exceptions (x t < − VaR t,α ). In  Table 4: Results of two-sided binomial score test: n is number of trials; ne number of exceptions and pB is the test p-value. VaR is estimated at level α = 0.95.

Conclusion
The results in this paper and further unreported analyses suggest that many volatile time series of log-returns on financial assets can be successfully modelled using d-vine copula processes in conjunction with v-transforms and asymmetric mixed marginal distributions. In many cases the in-sample fits and out-of-sample predictions are superior to those obtained from all the most widely used members of the extended GARCH family. It is noteworthy that the best fitting models are higher-order Markov models with relatively short memory. In the examples of this paper the orders are 7, 10 and 13, corresponding to no more than 3 weeks of trading days.
The class of models in this paper could be extended in a number of directions. An obvious topic of interest is multivariate time series models for multiple volatile return series. The m-vine model of Beare and Seo (2015) and the recent work on s-vine models by  suggest directions for generalizing the vt-d-vine model to the bivariate and multivariate case.
Remaining in the univariate case, it would be interesting to find further tractable models for the bivariate process (V t , W t ) that is used to construct the process (U t ) under the stochastic inversion operation of Theorem 2. We have restricted attention to processes (V t ) and (W t ) that are fully independent of each other, although the requirement of the result is simply that the variables V t and W t are contemporaneously independent for all t. Cross dependencies between the processes at different lags could be useful for modelling feedback effects between the signs of log-returns at time t and volatility at future times t + k, or vice versa, although statistical estimation of the resulting models is likely to be challenging.
Despite the current popularity of vine copula models, there are two areas where more theoretical work is needed. The first is the study of the serial dependence or mixing properties of vine-copulabased time series. The papers of Beare (2010) and Longla and Peligrad (2012) make important contributions in the first-order Markov case but mixing and ergodicity results for more general processes would be valuable, particularly if the conditions can be checked for concrete combinations of pair copulas. The second area is statistical inference for vine-copula-based time series, where the recent paper of  is one of the first to address the challenge of underpinning widely-used and intuitive stepwise procedures with regularity conditions to guarantee the usual desirable properties of estimators.

Software
The analyses were carried out using R and the tscopula package (McNeil and Bladt, 2020) available on CRAN and at https://github.com/ajmcneil/tscopula. This makes use of code for vine copulas from the rvinecopulib package (Nagler and Vatter, 2020).
Now fix the point (u 1 , . . . , u d ) ∈ [0, 1] d and consider the set of events A i (u i ) defined by The probability P(A 1 (u 1 ), . . . , A d (u d )) is the probability of the orthant defined by the point (u 1 , . . . , u d ) and the copula density at this point is given by The event A i (u i ) can be written and hence the probability of the event d i=1 A i (u i ) can be written c V ,W (v 1 , . . . , v d , z 1 , . . . , z d )dv 1 · · · dv d dz 1 · · · dz d .
Taking the derivative of this expression with respect to u 1 , . . . , u d and using (A.1) and (A.2) yields (10).
When V and W are independent the joint density factorizes and (11) follows by noting that Clearly (11) reduces to the simple form c V (V(u 1 ), . . . , V(u d )) when C W is the independence copula.