Copulae: An overview and recent developments

Decades after their introduction, copulae remain a very powerful tool for modeling and estimating multivariate distributions. This work gives an overview of copula theory and summarizes the latest results. It recalls the basic definition and the most important cases of bivariate copulae, and then sketches how multivariate copulae are developed both from bivariate copulae and from scratch. Regarding higher dimensions, the focus is on hierarchical Archimedean, vine, and factor copulae, which are the most often used and most flexible ways to introduce copulae to multivariate distributions. We also provide an overview of how copulae can be used in various fields of data science, including recent results. These fields include, but are not limited to, time series and machine learning. Finally, we describe estimation and testing methods for copulae in general and their application to the presented copula structures, and we give some specific testing and estimation procedures for those copulae.


| INTRODUCTION
In Memory of Abe Sklar.
In the modern era, vast amounts of data have become readily available. This offers the opportunity not only to investigate how single variables behave but also to examine the joint distribution of two or more variables, be it in economics (Kole et al., 2007; Oh & Patton, 2018; Salvatierra & Patton, 2015), medicine (Gomes et al., 2019; Kuss et al., 2014; Lapuyade-Lahorgue et al., 2017), climate research (Oppenheimer et al., 2016; Schölzel & Friederichs, 2008), hydrology and geophysics (Liu et al., 2018; Salvadori et al., 2007; Valle & Kaplan, 2019), engineering (Kilgore & Thompson, 2011), biology (Dokuzoğlu & Purutçuoğlu, 2017; Konigorski et al., 2014), transportation research (Bhat & Eluru, 2009; Huang et al., 2017; Ma et al., 2017), or any other field. However, describing and estimating the properties of these multivariate distributions is complicated, even though computing power has also increased massively in the last few decades. Copulae are special bivariate and multivariate functions that offer an elegant way to express those multivariate distributions. Sklar (1959) showed that a copula exists for any multivariate distribution, such that the joint distribution equals the copula applied to the marginal distributions. This is very advantageous when estimating the multivariate distribution because the problem reduces to estimating the univariate distributions, which is much easier, and their dependency structure. It also enables us to take a wider class of distribution functions into account, and to model more complicated multivariate distributions more easily and efficiently. Apart from being interesting from a purely mathematical perspective, copulae have a wide range of applications. One of the first practical applications came from economics, where they were used to model economic risk; see Embrechts et al. (2002).
The interest in copulae reached a temporary peak among practitioners during the financial crisis in 2008, when they were used to model complex financial derivatives, especially Collateralized Debt Obligations (see Li, 2000). Nowadays, their field of application ranges from financial risk modeling and asset prediction to weather forecasting (Schefzik, 2015) and detection of Martian sand dunes (Carrera et al., 2019). Essentially, copulae can be useful whenever we encounter a joint distribution. Over the last few decades, researchers in the field have written a number of textbooks that are strongly recommended for everybody interested in copulae, including Joe (1997), Cherubini et al. (2004), Nelsen (2006), Kurowicka and Cooke (2006), Mai and Scherer (2012), Joe (2014), Durante and Sempi (2015), and Hofert et al. (2018), to name a few, as well as review articles such as Patton (2012), Okhrin et al. (2017), Durante and Sempi (2010), and Genest and Nešlehová (2014). Some very useful overviews of the historical development of copula theory are provided by Schweizer (1991) and Durante and Sempi (2010).
The rest of this article is structured as follows. First bivariate (Section 1) and then multivariate (Section 2) copulae are defined and the most prominent classes of copulae are introduced. Section 3 introduces copulae to other fields of statistics and Section 4 introduces copulae to machine learning. Section 5 takes a closer look at estimation, while Section 6 deals with dependency measures and testing copulae. The final section draws a conclusion.

| BIVARIATE COPULAE
In their simplest and most concise form, copulae can be defined as follows.
Definition 1 A copula $C(u, v)$ is the restriction to the unit square $[0, 1]^2$ of a bivariate distribution function with uniform margins.
The first fundamental result regarding copula theory was stated by Sklar (1959). This lays the foundation for all research in this area as the statement allows the decomposition of any bivariate distribution into the univariate distributions together with the dependence structure.
Theorem 1 (Sklar's Theorem) For any bivariate distribution function $F$ with margins $F_1$ and $F_2$, there exists a copula $C$ such that
$$F(x, y) = C\{F_1(x), F_2(y)\}.$$
If $F$ is continuous, then the copula $C$ is unique. Otherwise, it is uniquely determined on $F_1(\mathbb{R}) \times F_2(\mathbb{R})$. The converse is true as well: for any copula $C$ and univariate distribution functions $F_1$ and $F_2$, the function $C\{F_1(x), F_2(y)\}$ is a bivariate distribution function with margins $F_1$ and $F_2$.
Interestingly, Sklar (1959) did not state a formal proof of the theorem that now bears his name, although his work is a fundamental breakthrough in statistical thinking; see Durante and Sempi (2015). For many years, most results regarding copulae were contained in Schweizer and Sklar (1983). Since a copula is a distribution function, analogously to the density of any other differentiable distribution function, the density of a copula, if it exists, is given through
$$c(u, v) = \frac{\partial^2 C(u, v)}{\partial u\, \partial v}.$$
As shown in Nelsen (2006), copulae are invariant under strictly increasing transformations of the margins, which implies that they cannot model dependency structures that are not invariant under such transformations. To establish a better understanding of what copulae are and what they can look like, we will consider a few simple examples.
Probably the simplest and most straightforward copula is the independence copula $\Pi(u, v) = uv$. This is the copula of two stochastically independent random variables because for those it states that $F(x, y) = F_1(x) F_2(y)$. The other two very basic dependencies between two variables are perfect positive and perfect negative dependence. These can also be modeled via copulae. For the positive case, this gives $M(u, v) = \min(u, v)$. Thus, for random variables $X$, $Y$ with distribution functions $F_1(x)$, $F_2(y)$ and the joint distribution $M\{F_1(x), F_2(y)\}$, the random variable $X$ is almost surely an increasing function of $Y$, and vice versa, because the copula is symmetric. The converse, with decreasing instead of increasing, holds true for the joint distribution function $W\{F_1(x), F_2(y)\}$ with $W(u, v) = \max(0, u + v - 1)$, which is again a copula. These two copulae establish boundaries because any other copula $C$ has to lie between them, that is, $W(u, v) \le C(u, v) \le M(u, v)$; $M$ and $W$ are called the Fréchet-Hoeffding upper and lower bounds, respectively. Because both $M$ and $W$ are singular, concentrating all of their probability mass on a line in the unit square, they do not possess a density.
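The ordering between these three basic copulae can be checked numerically. The following is a minimal sketch in plain Python; the function names and the grid resolution are illustrative choices and not part of the cited literature. It evaluates a candidate function on a regular grid of the unit square and tests whether it stays between the lower bound W and the upper bound M.

```python
def indep(u, v):
    """Independence copula Pi(u, v) = u * v."""
    return u * v


def upper_bound(u, v):
    """Frechet-Hoeffding upper bound M(u, v) = min(u, v)."""
    return min(u, v)


def lower_bound(u, v):
    """Frechet-Hoeffding lower bound W(u, v) = max(0, u + v - 1)."""
    return max(0.0, u + v - 1.0)


def bounds_hold(copula, steps=50, tol=1e-12):
    """Check W <= C <= M on a regular grid of the unit square.

    This is only a grid check (a necessary condition), not a proof
    that the function is a copula.
    """
    grid = [i / steps for i in range(steps + 1)]
    return all(
        lower_bound(u, v) - tol <= copula(u, v) <= upper_bound(u, v) + tol
        for u in grid
        for v in grid
    )


print(bounds_hold(indep), bounds_hold(upper_bound), bounds_hold(lower_bound))
```

Any other bivariate copula passed to `bounds_hold` should pass the same check, while a function such as `min(1, u + v)` violates the upper bound and is rejected.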
The following subsections introduce the most important and most frequently used classes of copulae.

| Extreme-value copulae
Assuming that $X_1$ and $X_2$ are two random variables with margins $F_1$ and $F_2$, respectively, the extreme-value (EV) copulae model the dependence between the maximum of $n$ iid values generated from $F_1$ and the maximum of $n$ iid values generated from $F_2$. A copula $C$ is an EV copula if and only if it is max-stable, meaning that
$$C(u, v) = C(u^{1/n}, v^{1/n})^n$$
for all $u, v \in [0, 1]$ and $n \in \mathbb{N}$. Any other copula $C_F$ satisfying $C_F(u^{1/n}, v^{1/n})^n \to C(u, v)$ as $n \to \infty$ is said to be in the domain of attraction of the EV copula $C$. This definition can easily be extended to arbitrary dimensions, and these copulae are uniquely determined by a tail dependence function that is used when estimating them. EV copulae are often used in financial models, and they are also used in hydrology to model floods (cf. Sraj et al., 2014). More details on recent advances on EV copulae can be found in Gudendorf and Segers (2010), Kamnitui et al. (2019), Vettori et al. (2017), Andersen et al. (2018), and Kim et al. (2020).
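Max-stability is easy to probe numerically. The sketch below uses the standard closed forms of the Gumbel and Clayton copulae; the parameter value 2, the evaluation point and the helper names are assumptions made for illustration. It shows that the Gumbel copula is reproduced by the max-transformation for several n, that the Clayton copula is not, and that the transformed Clayton copula approaches the independence copula for large n.

```python
import math


def gumbel(u, v, theta):
    """Gumbel copula (an EV copula), theta >= 1, u and v in (0, 1]."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def clayton(u, v, theta):
    """Clayton copula (not an EV copula), theta > 0, u and v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def max_transform(copula, u, v, n):
    """Copula of componentwise maxima of n iid pairs: C(u^(1/n), v^(1/n))^n."""
    return copula(u ** (1.0 / n), v ** (1.0 / n)) ** n


u, v = 0.3, 0.7
gumbel2 = lambda a, b: gumbel(a, b, 2.0)
clayton2 = lambda a, b: clayton(a, b, 2.0)

# Max-stability: the Gumbel copula is reproduced for every n (up to rounding).
gumbel_gap = max(
    abs(max_transform(gumbel2, u, v, n) - gumbel2(u, v)) for n in (2, 10, 1000)
)

# The Clayton copula is not max-stable ...
clayton_gap = abs(max_transform(clayton2, u, v, 2) - clayton2(u, v))

# ... and its maxima approach the independence copula u * v as n grows.
clayton_limit_gap = abs(max_transform(clayton2, u, v, 10_000) - u * v)

print(gumbel_gap, clayton_gap, clayton_limit_gap)
```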

| Elliptical family
The elliptical family arises from modeling, for example, the dependence structure underlying a bivariate Gaussian distribution using a copula. The bivariate Gaussian distribution has Gaussian margins. However, the margins to which a copula is applied are not necessarily Gaussian, so by applying the Gaussian copula to arbitrary marginal distributions one obtains not only the bivariate normal distribution but any kind of two-dimensional distribution with a Gaussian dependency structure between the two variables. Such a distribution is called a meta-Gaussian distribution. The copula can be obtained directly from Sklar's Theorem as
$$C(u, v) = F\{F_1^{-1}(u), F_2^{-1}(v)\}.$$
The procedure works for any bivariate distribution $F(x, y)$ that has invertible marginal distributions $F_1$ and $F_2$, and it can be extended to multivariate distributions. However, the crux of this approach is that those inverses frequently do not possess an explicit representation. Thus, if one can explicitly give the inverse distribution functions, then this is a straightforward way to model the joint distribution via copulae. Let us have a closer look at the Gaussian copula with correlation coefficient $\rho$ between the two variables. The copula is given using the inverse of the univariate standard normal distribution function, $\Phi^{-1}$, and the distribution function of the bivariate normal distribution with correlation $\rho$, $\Phi_\rho$, that is, $C_\rho(u, v) = \Phi_\rho\{\Phi^{-1}(u), \Phi^{-1}(v)\}$, while the density has a closed-form representation.
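To make the meta-Gaussian construction concrete, the following sketch uses only the Python standard library. The closed-form density is the ratio of the bivariate normal density to the product of its standard normal margins; the sampler first draws from the Gaussian copula and then plugs the uniforms into arbitrary quantile functions. The function names, the seed and the choice of exponential and normal margins are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_copula_density(u, v, rho):
    """Closed-form density of the bivariate Gaussian copula, |rho| < 1."""
    x, y = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    r2 = rho * rho
    expo = -(r2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - r2))
    return math.exp(expo) / math.sqrt(1.0 - r2)


def sample_meta_gaussian(n, rho, inv_margin_1, inv_margin_2, seed=0):
    """Draw n pairs with a Gaussian copula (correlation rho) and given margins.

    inv_margin_1 and inv_margin_2 are the quantile functions of the margins.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        u, v = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)  # sample from the copula
        pairs.append((inv_margin_1(u), inv_margin_2(v)))  # plug in the margins
    return pairs


# Example: exponential(1) and standard normal margins, Gaussian dependence.
data = sample_meta_gaussian(
    500,
    0.6,
    inv_margin_1=lambda u: -math.log1p(-u),  # exponential quantile function
    inv_margin_2=STD_NORMAL.inv_cdf,
)
print(gaussian_copula_density(0.5, 0.5, 0.6))
```

Replacing the two quantile functions changes the margins but leaves the dependence structure, and hence the copula, untouched.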

| Archimedean copulae
The class of Archimedean copulae (AC) is a rich family of copulae that is not derived from Sklar's Theorem. They depend on a generator function $\phi : [0, \infty) \to [0, 1]$ with boundary conditions $\phi(0) = 1$, $\phi(\infty) = 0$ and complete monotonicity, $(-1)^j \phi^{(j)} \ge 0$ for all $j \in \mathbb{N}$. This generator can in general depend on several parameters, although most generators that are used to construct copulae depend on a single parameter $\vartheta$. The Archimedean copula is defined as
$$C(u, v) = \phi\{\phi^{-1}(u) + \phi^{-1}(v)\}.$$
This implies that AC are symmetric. If the inverse of the generator function does not exist, then the generalized inverse is used instead. An informative list of generators and their corresponding copulae can be found in Nelsen (2006). Here we recall a brief list of those that are most frequently used: the Gumbel (1960) copula
$$C(u, v) = \exp\left[-\left\{(-\log u)^{\vartheta} + (-\log v)^{\vartheta}\right\}^{1/\vartheta}\right], \quad \vartheta \ge 1,$$
the Clayton (1978) copula
$$C(u, v) = \left(u^{-\vartheta} + v^{-\vartheta} - 1\right)^{-1/\vartheta}, \quad \vartheta > 0,$$
and the Frank (1979) copula
$$C(u, v) = -\frac{1}{\vartheta} \log\left\{1 + \frac{(e^{-\vartheta u} - 1)(e^{-\vartheta v} - 1)}{e^{-\vartheta} - 1}\right\}, \quad \vartheta > 0.$$
While the Frank copula is the only radially symmetric Archimedean copula, the Gumbel copula applied to extreme-valued marginal distributions is the only bivariate extreme value distribution that belongs to the Archimedean family, as shown by Genest and Rivest (1989). In addition, any other Archimedean copula is in the domain of attraction of the Gumbel copula. To get a better intuition of what copulae look like, some contour plots of copulae applied to standard normal or t distributed (with ν = 2) margins are supplied; see Figure 1. A powerful property of AC is that they can be extended to higher dimensions, as we will see in the next section.
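The generator-based construction translates almost literally into code. The sketch below assumes the standard generators of the Gumbel, Clayton and Frank families in the convention where the generator maps $[0, \infty)$ to $[0, 1]$; the helper names are hypothetical. A single function builds the copula from a generator and its inverse.

```python
import math


def archimedean(phi, phi_inv):
    """Return C(u, v) = phi(phi_inv(u) + phi_inv(v)) for u, v in (0, 1]."""
    return lambda u, v: phi(phi_inv(u) + phi_inv(v))


def gumbel(theta):
    """Gumbel copula, theta >= 1; generator phi(t) = exp(-t^(1/theta))."""
    return archimedean(
        lambda t: math.exp(-t ** (1.0 / theta)),
        lambda u: (-math.log(u)) ** theta,
    )


def clayton(theta):
    """Clayton copula, theta > 0; generator phi(t) = (1 + t)^(-1/theta)."""
    return archimedean(
        lambda t: (1.0 + t) ** (-1.0 / theta),
        lambda u: u ** -theta - 1.0,
    )


def frank(theta):
    """Frank copula, theta > 0."""
    a = math.expm1(-theta)  # exp(-theta) - 1, negative for theta > 0
    return archimedean(
        lambda t: -math.log1p(a * math.exp(-t)) / theta,
        lambda u: -math.log(math.expm1(-theta * u) / a),
    )


for name, copula in (("Gumbel", gumbel(2.0)), ("Clayton", clayton(2.0)), ("Frank", frank(5.0))):
    print(name, copula(0.3, 0.7), copula(0.7, 0.3))
```

The symmetry of AC is visible directly: swapping the arguments only swaps the two summands inside the generator.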

| Archimax copula
The combination of AC and EV copulae yields Archimax copulae, a class of copulae first introduced in Capéràa et al. (2000) that allows us to model any asymptotic dependence between extremes. An Archimax copula depends on an Archimedean generator $\phi$, which blurs the extreme-value dependence structure, and on a convex function $A$, the so-called Pickands dependence function, satisfying $A(t) \ge \max\{t, 1 - t\}$ for $t \in [0, 1]$. It is given by
$$C(u, v) = \phi\left[\left\{\phi^{-1}(u) + \phi^{-1}(v)\right\} A\left\{\frac{\phi^{-1}(u)}{\phi^{-1}(u) + \phi^{-1}(v)}\right\}\right].$$
If $A(t) = 1$ for all $t$, then the copula is just a regular Archimedean copula. Meanwhile, if $\phi(t) = \exp(-t)$, then the copula reduces to an EV copula; hence the name Archimax. More information on Archimax copulae and their generalizations to higher dimensions can be found in Bacigal et al. (2011), Wysocki (2013), Mesiar and Jágr (2013), and Charpentier et al. (2014). Genest and Jaworski (2020) investigate how a prespecification of the degree of association affects the size of the class of Archimax copulae. This is done in general as well as for many of the dependence measures mentioned in Section 6.
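The two limiting cases can be verified with a few lines of code. The sketch below assumes the usual bivariate Archimax form, in which the generator is applied to the sum of the inverse generators multiplied by the Pickands function evaluated at their ratio. The Clayton generator and the Pickands function of the Gumbel copula are used as examples, and the function names are illustrative.

```python
import math


def archimax(phi, phi_inv, pickands):
    """Bivariate Archimax copula built from a generator and a Pickands function."""
    def copula(u, v):
        x, y = phi_inv(u), phi_inv(v)
        s = x + y
        if s == 0.0:  # only happens for u = v = 1
            return 1.0
        return phi(s * pickands(x / s))
    return copula


theta = 2.0

# Case 1: A(t) = 1 gives back a regular Archimedean (here: Clayton) copula.
clayton_archimax = archimax(
    lambda t: (1.0 + t) ** (-1.0 / theta),
    lambda u: u ** -theta - 1.0,
    lambda t: 1.0,
)

# Case 2: phi(t) = exp(-t) gives an EV copula; with the Pickands function
# A(t) = (t^theta + (1 - t)^theta)^(1/theta) this is the Gumbel copula.
gumbel_archimax = archimax(
    lambda t: math.exp(-t),
    lambda u: -math.log(u),
    lambda t: (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta),
)

clayton_closed = (0.3 ** -theta + 0.7 ** -theta - 1.0) ** (-1.0 / theta)
gumbel_closed = math.exp(
    -((-math.log(0.3)) ** theta + (-math.log(0.7)) ** theta) ** (1.0 / theta)
)
print(clayton_archimax(0.3, 0.7) - clayton_closed)
print(gumbel_archimax(0.3, 0.7) - gumbel_closed)
```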

| Generalized Pareto copula
Pareto copulae can further help to find and model multivariate extremes. They are defined in the following way: if there is a norm $\|\cdot\|$ such that $C(u, v) = 1 - \|(u, v) - (1, 1)\|$, then $C$ is a Pareto copula. Pareto copulae can be easily extended to higher dimensions by using norms on $\mathbb{R}^d$ instead of $\mathbb{R}^2$. It is worth mentioning here that not every norm induces a copula. It can be shown (see Aulbach et al., 2012) that a copula satisfies the extreme-value condition if and only if it is a generalized Pareto copula (GPC), implying that there is a connection between extreme-value copulae and GPC. The so-called δ-neighborhood of a GPC contains, roughly speaking, the copulae whose maxima converge at a polynomial rate, in the sense that
$$1 - C(u, v) = \|(u, v) - (1, 1)\| \left\{1 + O\left(\|(u, v) - (1, 1)\|^{\delta}\right)\right\} \quad \text{as } (u, v) \to (1, 1),$$
where $\|\cdot\|$ is the norm corresponding to the Pareto copula; for further details see Falk et al. (2011). Thus, it is of interest whether a copula is in the neighborhood of a GPC. A test based on $\chi^2$ to verify if this is the case can be found in Aulbach et al. (2019). GPC are of use in extreme value analysis to estimate exceedance probabilities (see Falk et al., 2019).

| Marshall-Olkin copulae
When describing the failure time of one specific part of a system, one often assumes it to be exponentially distributed. When a system consists of several components, the failure times are generally assumed to be independent of each other. Then the whole system's failure time, assuming one component's failure makes the system fail, can be computed straightforwardly. However, in reality, this independence assumption is rarely fulfilled. For example, common factors such as maintenance or common shocks can affect the lifetime of all of the components of a system. The Marshall-Olkin distribution takes this into account. This distribution was introduced in Marshall and Olkin (1967) and it relies on independent shocks affecting the lifetime of one or more components. One can then derive copulae describing this kind of dependency. An overview of how to define those copulae in general, together with their properties such as tail dependence coefficients, can be found in Lin and Li (2014) and Botev et al. (2016). When the univariate distributions are all continuous and there is only one shock that affects all the variables at once, the construction of the Marshall-Olkin copula is quite simple, using a function depending on the shock distribution, as shown in Durante et al. (2016). An idea of how to combine different Marshall-Olkin copulae has also been proposed. AC can also be applied to the study of Marshall-Olkin distributions because the resulting dependence structure can be modeled using the generator of a trivariate Archimedean copula. This limits the applicability of the approach because it implies exchangeability of the different survival times, a restriction that could instead be tackled using hierarchical AC (see later sections). However, it is still a widely usable approach that is easy to understand and to implement. Mulinacci (2018) investigated under which conditions the memoryless property of the Marshall-Olkin distribution also holds for the Archimedean-based Marshall-Olkin distribution.

| MULTIVARIATE COPULAE
In practice, multivariate dependencies are in many cases more relevant than bivariate ones. Consequently, the results obtained for bivariate distributions should be extended to multivariate ones. To that end, Sklar's Theorem can be extended to its multivariate version.
Theorem 2 For any multivariate distribution function $F$ with margins $F_1, \ldots, F_d$, there exists a copula $C$ such that
$$F(x_1, \ldots, x_d) = C\{F_1(x_1), \ldots, F_d(x_d)\}.$$
The converse is also true: for any copula $C$ and univariate distribution functions $F_1, \ldots, F_d$, the function $C\{F_1(x_1), \ldots, F_d(x_d)\}$ is a multivariate distribution function with margins $F_1, \ldots, F_d$.
Some multivariate copulae arise naturally from the bivariate copulae that were previously established. The independence copula and the upper Fréchet-Hoeffding bound can be extended directly. In contrast, the lower Fréchet-Hoeffding bound is not a copula in dimensions higher than two because perfect negative dependence among more than two variables cannot be defined. The family of elliptical copulae can also be extended to the multivariate case. This is done in the same way as constructing bivariate elliptical copulae, but this time one uses the multivariate distribution and the inverses of the marginals of that joint distribution. For AC, the extension to higher dimensions becomes a little trickier.

| Hierarchical Archimedean copula
With very little effort, one obtains a multivariate Archimedean copula as $C(u_1, \ldots, u_d) = \phi\{\phi^{-1}(u_1) + \ldots + \phi^{-1}(u_d)\}$. However, this construction has limited applicability because the distribution depends only on the parameter of the single Archimedean generator that is used. The variables are also exchangeable because the sum commutes. This is where hierarchical Archimedean copulae (HAC) are useful: to construct a three-dimensional distribution, one replaces one of the margins in the bivariate AC with another bivariate AC. So, with $\phi_2$ and $\phi_1$ being generators of bivariate AC,
$$C(u_1, u_2, u_3) = \phi_2\left[\phi_2^{-1}\left\{\phi_1\left(\phi_1^{-1}(u_1) + \phi_1^{-1}(u_2)\right)\right\} + \phi_2^{-1}(u_3)\right].$$
This procedure can be extended to any dimension by replacing one of the nested margins with another AC. In addition to this fully nested approach, it is also possible to construct partially nested copulae. In this case, when extending the construction to higher dimensions, one of the not yet nested margins is replaced with an AC.
The dependency structure partly determines the HAC: ϕ 1 models the dependency between two of the marginals, ϕ 2 models the dependency between the resulting bivariate random variable and another marginal, and so on. This results in a tree-like structure on the marginals, see Figure 2.
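The nesting can be sketched in a few lines for the Gumbel family. The code below is an illustration under stated assumptions: both levels use Gumbel generators, and the inner parameter is required to be at least as large as the outer one, which is the usual sufficient nesting condition for this family. The function names are hypothetical.

```python
import math


def gumbel_phi(t, theta):
    """Gumbel generator phi(t) = exp(-t^(1/theta))."""
    return math.exp(-t ** (1.0 / theta))


def gumbel_phi_inv(u, theta):
    """Inverse Gumbel generator, u in (0, 1]."""
    return (-math.log(u)) ** theta


def gumbel_ac(us, theta):
    """Exchangeable d-dimensional Gumbel copula phi(sum_i phi^{-1}(u_i))."""
    return gumbel_phi(sum(gumbel_phi_inv(u, theta) for u in us), theta)


def hac3_gumbel(u1, u2, u3, theta_inner, theta_outer):
    """Fully nested 3-dim Gumbel HAC: (u1, u2) are nested inside the outer AC."""
    if theta_inner < theta_outer:
        raise ValueError("nesting condition violated: need theta_inner >= theta_outer")
    inner = gumbel_ac((u1, u2), theta_inner)      # phi_1 couples u1 and u2
    return gumbel_ac((inner, u3), theta_outer)    # phi_2 couples the pair with u3


print(hac3_gumbel(0.3, 0.5, 0.7, theta_inner=3.0, theta_outer=2.0))
```

Setting one argument to 1 recovers the bivariate margins: the pair (u1, u2) is governed by the inner parameter, whereas the pairs (u1, u3) and (u2, u3) are governed by the outer one, so the variables are no longer exchangeable.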
Further information on HAC, their structure, properties, corresponding simulation procedures, and some extensions can be found in Hofert and Scherer (2011) and Okhrin and Ristig (2014). Recent results by Górecki et al. (2021) show that HAC can be vastly improved by outer power transformations.

| Multivariate Archimax copulae
As mentioned earlier, Archimax copulae can also be generalized to dimensions higher than two. To this end, as introduced in Charpentier et al. (2014), one uses the characterization of an EV copula via its tail dependence function $\ell$, for which there exists an EV copula $D$ such that
$$\ell(x_1, \ldots, x_d) = -\log D(e^{-x_1}, \ldots, e^{-x_d}).$$
The multivariate Archimax copula is then defined as
$$C(u_1, \ldots, u_d) = \phi\left[\ell\{\phi^{-1}(u_1), \ldots, \phi^{-1}(u_d)\}\right],$$
where $\phi$ is the generator of a multivariate Archimedean copula. Again, if $\phi$ is the generator of the independence copula, this is an EV copula, while for $\ell(x_1, \ldots, x_d) = x_1 + \ldots + x_d$ this is a multidimensional Archimedean copula. Therefore, the name Archimax copula still makes sense. Furthermore, for $d = 2$, this returns the definition of the bivariate Archimax copula. Further details on multivariate Archimax copulae can be found, for example, in Chatelain et al. (2020) and Mesiar and Jágr (2013).

| Factor copulae
Factor copulae are another family of copulae that is more than just an extension from the bivariate case. The idea of factor analysis is to explain observed realizations of the random variables via latent variables; see Härdle and Simar (2015). Factor copulae are based on this approach and arise from the representation of $X_i$ via a functional applied to latent variables and an error term. For example, using the linear additive model, $X_i = \sum_{j=1}^{m} \alpha_{ij} W_j + \varepsilon_i$, where $W_j$ are the latent variables and $\varepsilon_i$ are the mutually independent error terms. By replacing linearity with a non-linear function and Gaussian factors with non-Gaussian ones, one can represent, among many others, classical AC (see Oh & Patton, 2015).
Another, more general construction of factor copulae arises from the idea that the dependency between $U_1, \ldots, U_d$ comes solely from other random variables $V_1, \ldots, V_m$, meaning that, conditioned on the $V_j$, the $U_i$ are independent. Assuming that all random variables are uniform on $[0, 1]$ and that the $V_j$ are mutually independent, this gives us the following copula:
$$C(u_1, \ldots, u_d) = \int_{[0,1]^m} \prod_{i=1}^{d} C_{U_i \mid V_1, \ldots, V_m}(u_i \mid v_1, \ldots, v_m)\, dv_1 \cdots dv_m.$$
The product in this formula comes from the fact that the $U_i$ are independent when conditioned on the $V_j$, and the joint distribution function of independent random variables equals the product of their distribution functions. Assuming that $m = 1$ and that $C_{U_i, V_1}$ and $c_{U_i, V_1}$ are the joint cdf and density of a pair $(U_i, V_1)$, one can write the density of $C(u_1, \ldots, u_d)$ as
$$c(u_1, \ldots, u_d) = \int_0^1 \prod_{i=1}^{d} c_{U_i, V_1}(u_i, v_1)\, dv_1.$$
An overview of factor copula theory can be found in Joe (2014), and for further details we refer the reader to Krupskii and Joe (2013), Nikoloulopoulos and Joe (2015), and Krupskii and Joe (2015).
[Figure: The structure of a fully nested hierarchical Gumbel copula with parameters 2, 2.5, and 3 (left) and of a partially nested hierarchical Gumbel copula with parameters 2, 2.5, and 3 (right).]
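The one-factor density can be evaluated by one-dimensional numerical integration. The sketch below assumes Gaussian linking copulae between each $U_i$ and the single factor, with correlations called loadings here; it substitutes $v = \Phi(z)$ so that the integral runs over the real line, and it applies a plain trapezoidal rule. The names, the integration range and the step count are illustrative choices. For two variables the result should coincide with a Gaussian copula density whose correlation is the product of the two loadings, which gives a convenient check.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gauss_link_density(x, z, rho):
    """Gaussian copula density written in terms of the normal scores x and z."""
    r2 = rho * rho
    expo = -(r2 * (x * x + z * z) - 2.0 * rho * x * z) / (2.0 * (1.0 - r2))
    return math.exp(expo) / math.sqrt(1.0 - r2)


def one_factor_density(us, loadings, half_width=8.0, steps=1600):
    """Density of a one-factor copula with Gaussian linking copulae.

    Computes int_0^1 prod_i c_{U_i,V}(u_i, v) dv after the substitution
    v = Phi(z), which turns dv into phi(z) dz.
    """
    xs = [STD_NORMAL.inv_cdf(u) for u in us]
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        term = STD_NORMAL.pdf(z)
        for x, a in zip(xs, loadings):
            term *= gauss_link_density(x, z, a)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal rule
        total += weight * term
    return total * h


x1, x2 = STD_NORMAL.inv_cdf(0.3), STD_NORMAL.inv_cdf(0.8)
print(one_factor_density([0.3, 0.8], [0.7, 0.5]), gauss_link_density(x1, x2, 0.35))
```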

| Vine copulae
Vine copulae arise by dividing the dependency structure of $d$ variables into $\frac{1}{2} d(d-1)$ bivariate copulae, not only between pairs of variables but also between pairs of variables conditioned on one or more others. One typically starts with a graphical representation of the dependence structure. Aiming to decompose the dependency structure into bivariate copulae, one starts with a tree in which the nodes represent the single random variables and the edges represent the copulae between those random variables. The second step is a tree with the former edges as new nodes, whose edges represent copulae conditioned on one random variable. This procedure continues until a tree containing only two nodes is left. The idea of vine copulae was introduced in Joe (1996), and they were developed enormously in the 2000s, in particular by Bedford and Cooke (2002), Bedford and Cooke (2001), Kurowicka and Cooke (2006), and Aas et al. (2009).
There are three main dependency structures: the canonical vine (C-vine), the drawable vine (D-vine) and the regular vine (R-vine). The canonical vine assumes that in each level, one node is connected with every other node in that tree. For example, having random variables $X_1, \ldots, X_d$, in the first level, there is one copula for the dependence between $X_1$ and $X_2$, another for the dependence between $X_1$ and $X_3$, and so on, up to the one for the dependence between $X_1$ and $X_d$. Writing $u_{i|D}$ for the conditional distribution function of $U_i$ given the variables with indices in $D$, evaluated at the corresponding arguments, the copula density corresponding to the graphical representation of a four-dimensional C-vine is given through
$$c(u_1, \ldots, u_4) = c_{12}(u_1, u_2)\, c_{13}(u_1, u_3)\, c_{14}(u_1, u_4)\, c_{23|1}(u_{2|1}, u_{3|1})\, c_{24|1}(u_{2|1}, u_{4|1})\, c_{34|12}(u_{3|12}, u_{4|12}).$$
For the D-vine, at each level, for any node there are at most two edges connecting this node to other nodes, resulting in a line of copulae in the graphical representation. The corresponding four-dimensional copula density is thus
$$c(u_1, \ldots, u_4) = c_{12}(u_1, u_2)\, c_{23}(u_2, u_3)\, c_{34}(u_3, u_4)\, c_{13|2}(u_{1|2}, u_{3|2})\, c_{24|3}(u_{2|3}, u_{4|3})\, c_{14|23}(u_{1|23}, u_{4|23}).$$
This class is useful for quantile regression; see Kraus and Czado (2017). For an R-vine, it is necessary that in each tree there is an edge between two nodes if and only if those nodes represent edges from the former tree which have a node in common. For example, if there is a copula modeling the dependence between $X_1$ and $X_2$, and another one modeling the dependence between $X_2$ and $X_3$, then on the next level there is a copula modeling the dependence between $X_1$ and $X_3$ conditioned on $X_2$. C-vines and D-vines are special cases of R-vines. We refer readers who are interested in R-vines and recent advances to Joe (2014), Killiches et al. (2018), and Torre et al. (2019). A good overview of vine copulae can also be found in https://www.groups.ma.tum.de/statistics/forschung/vine-copulamodels/.
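The bookkeeping behind C-vines and D-vines can be illustrated by enumerating their pair copulae. The sketch below assumes the natural variable order 1, ..., d (root nodes 1, 2, ... for the C-vine, path 1, 2, ..., d for the D-vine); the function names and the tuple representation are illustrative. Each pair copula is returned as a pair of conditioned variables together with its conditioning set.

```python
def c_vine_pairs(d):
    """Pair copulae of a C-vine with root order 1, 2, ..., d - 1.

    Tree j contains the pairs (j, i | 1, ..., j - 1) for i = j + 1, ..., d.
    """
    pairs = []
    for j in range(1, d):
        conditioning = tuple(range(1, j))
        for i in range(j + 1, d + 1):
            pairs.append(((j, i), conditioning))
    return pairs


def d_vine_pairs(d):
    """Pair copulae of a D-vine along the path 1, 2, ..., d.

    Tree j contains the pairs (i, i + j | i + 1, ..., i + j - 1).
    """
    pairs = []
    for j in range(1, d):
        for i in range(1, d - j + 1):
            pairs.append(((i, i + j), tuple(range(i + 1, i + j))))
    return pairs


def label(pair):
    """Format ((a, b), (c1, c2, ...)) as 'a,b|c1,c2'."""
    (a, b), cond = pair
    text = f"{a},{b}"
    return text + ("|" + ",".join(map(str, cond)) if cond else "")


print([label(p) for p in c_vine_pairs(4)])
print([label(p) for p in d_vine_pairs(4)])
```

For d = 4, both enumerations contain six pair copulae, in line with the count d(d − 1)/2.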

| Copulae in time series
Time series analysis is an important field of statistics, with applications ranging from economics through biometrics to meteorology. There are important dependencies to be studied because there are either joint distributions of one random variable at different time points, such as the EUR-USD exchange rate on different days (see Patton, 2006), or joint distributions of several random variables at different time points, such as the temperature in different capitals on different days. Note that all of the statements in this section are made under the assumption that the random variables are continuous. Sklar's theorem also holds in a conditional version (see Patton, 2004). This means that, given a random vector $Y_t$ and an information set $\mathcal{F}_{t-1}$ being the canonical σ-algebra generated by $Y_{t-1}, Y_{t-2}, \ldots$, the multivariate conditional distribution of $Y_t$ given $\mathcal{F}_{t-1}$ can be expressed via a copula applied to the conditional marginal distributions:
$$F_t(y_1, \ldots, y_d \mid \mathcal{F}_{t-1}) = C_t\{F_{1t}(y_1 \mid \mathcal{F}_{t-1}), \ldots, F_{dt}(y_d \mid \mathcal{F}_{t-1}) \mid \mathcal{F}_{t-1}\}.$$
An introduction of copulae to ARCH, GARCH and t-GARCH models can be found in Patton (2004). A more general approach is considered in Patton (2012), where the marginals of $Y_t$ are modeled as
$$Y_{it} = \mu_i(Z_{t-1}; b) + \sigma_i(Z_{t-1}; b)\, \epsilon_{it}, \quad i = 1, \ldots, d,$$
where $\mu_i$ and $\sigma_i$ denote the conditional mean and the conditional standard deviation, $Z_{t-1} \in \mathcal{F}_{t-1}$ is the information obtained before time $t$, $b$ is the vector of parameters, $\epsilon_{it} \mid \mathcal{F}_{t-1} \sim F_{it}$, so that $\epsilon_{it}$ is conditionally distributed according to the marginal distribution, and $\epsilon_t = (\epsilon_{1t}, \ldots, \epsilon_{dt}) \mid \mathcal{F}_{t-1} \sim F_{\epsilon t} = C_t(F_{1t}, \ldots, F_{dt})$, which means that the joint distribution of the $\epsilon_{it}$ is captured by a copula applied to the marginals. Both parametric and nonparametric approaches are possible for the distribution $F_{it}$. The use of factor copulae in time series applications is discussed in depth in Oh and Patton (2018), while Stöber and Czado (2012) investigate how vine copulae can be used to model dependencies over time.
Markov chains can be understood as time series, and thus the corresponding finite-dimensional distributions can be uniquely specified by the marginal distribution function $F$ and the copula $C$ modeling the dependence between two consecutive time points. In that respect, the advantage of copulae is, as in their general use, that heavy-tailed dependency structures can be expressed this way. The conditions under which the Markov chain defined by a copula is stationary with geometric mixing rates can be found in Beare (2010). It was then shown by Beare (2012) that Markov chains generated using AC are ergodic if the generator of the copula is regularly varying at 0 and 1. Furthermore, Beare and Seo (2015) define the M-vine, which is unique on each set of nodes. The Markov property for this M-vine is achieved by requiring certain copulae in the vine copula construction to be the independence copula, while stationarity can be achieved by requiring some copulae to be equal.

| Clustering methods
At the latest since 2009, in the wake of the financial crisis, clustering of time series data has become important not only for business analysts optimizing portfolios but also in climate research. Copulae are very useful in clustering, as shown by Luca and Zuccolotto (2017) and Disegna et al. (2017). The clustering methods presented in these articles rely on different procedures. The first relies on time-varying copula-based estimators and on minimizing the value-at-risk rather than on other dependency measures. The second, the clustering algorithm COFUST, estimates the copula and then measures the distance between this copula and the Fréchet upper-bound copula $M$, which can be further updated using spatial information. Finally, a fuzzy clustering algorithm is applied.
The question of how to cluster data is not limited to time series models. Copulae can also be used to make existing clustering methods more flexible with respect to the dependence structure: many existing finite-mixture approaches rely on the marginals all belonging to the same distribution. This can be improved with Gaussian mixtures without an excessive increase in computational effort for high-dimensional data, as in Zhang and Baek (2019). Another approach to clustering was taken in Joe and Sang (2016), where the clustering algorithm relies on aggregation variables. When clustering hierarchically, an improvement to the clustering methods from Brechmann (2014) can be found in Su et al. (2019). Note that these clustering methods rely on hierarchical Kendall copulae with Archimedean clusters.

| Blind source separation
Statistical methods can also be applied to separate a set of source signals from a set of mixed signals with hardly any knowledge about the mixing process. This is useful when dealing with different signals; for example, when trying to listen to a specific conversation in a crowded room, in image processing, or in medical applications such as EEG or medical imaging. A special case of blind source separation is independent component analysis (ICA), where one assumes the source signals to be non-Gaussian independent variables that are mixed according to an invertible matrix. Consequently, the remaining task to recover these source signals is to estimate the matrix. Several approaches to do so were recently developed; an overview can be found in Hyvärinen (2012), while an approach using non-local learning methods was provided by Isomura and Toyoizumi (2016). However, copulae can also be used to perform ICA, leading to COPICA, an approach introduced by Chen et al. (2015). Furthermore, copulae can also be used to generalize ICA by replacing the independence assumption with a dependence structure represented by a copula (see Ma & Sun, 2007).

| USING COPULAE IN MACHINE LEARNING
Artificial intelligence and machine learning in general are among the most popular topics in both scientific research and popular science. As there is a broad intersection between statistical methods and machine learning, the use of copulae to improve existing machine learning methods or to develop new ones is an active area of research.
Bayesian networks are a kind of expert system and are useful thanks to their ability to enable deductive as well as abductive reasoning. However, when using them in a fully automated way, both the structure and the parameters need to be estimated, or learned, as it is called in the machine learning literature. One of the first overviews of how copulae can be used to describe properties of Bayesian networks can be found in Elidan (2013). Pircalabelu et al. (2017) build a Bayesian network that is based neither on expert knowledge nor on multivariate normal distributions but on vine copulae. When the observed random variables are not only continuous but also discrete, the presented models need to be adapted because copulae are only unique if the marginals are continuous. In the case of Bayesian networks, this is solved by Karra and Mili (2016).
Copulae are of further use in machine learning. The so-called Information Bottleneck (IB) is a technique from information theory that was initially used to find a trade-off between compressing data and retaining as much information as possible. In machine learning, it is used to find a trade-off between the information that any hidden layer of a deep neural network contains about the input and about the output. Copulae can be used to describe the IB equation, which reformulates the minimization of the IB as finding the best-fitting copula density for the dependence between input and hidden layer, as shown in Rey (2015). The classical IB methods face some difficulties (see Wieczorek et al., 2018), such as problems with monotone transformations and assumptions on the marginals that might not be true in practice. These can be overcome by transforming the observed random variables to Gaussian ones, which are called normal scores.
Fusing data from different sources is a field that also benefits from copulae. When combining the results from different classifiers, copulae are used in combining the respective probability scores (see Ozdemir et al., 2018). Different levels of deep neural networks and their corresponding classifiers are a special case of a general classifier. Vine copulae have also been used to fuse different classifiers for human activity recognition.

| ESTIMATION METHODS FOR COPULAE
For notation purposes, note that, while X is usually a random variable, x a realization of this random variable X. Under some regularity conditions, the full maximum likelihood (ML) estimator is efficient and asymptotically normal but it takes a lot of time and computing power, especially for higher dimensions. A computationally easier but not efficient estimator is given by the following procedure called inference for margins. First, the parameters of the margins are estimated using ML under the assumption that the random variables are independent and thus that the copula is the product copula. Then, the parameter of the copula is estimated via ML, using the estimated parameters of the marginals distributions.
To estimate with nonparametric marginals, one can use the rescaled empirical distribution function $\hat F_{nj}(x) = \frac{1}{n+1}\sum_{i=1}^{n} I(x_{ij} \le x)$ for $j = 1, \dots, d$, or kernel-based estimators. Then, the estimator of the copula parameter is obtained by maximizing the likelihood of the copula applied to the estimated marginals.
For a completely nonparametric approach to estimating the copula, the rank-based empirical copula is very often used, particularly for the testing procedures. However, this estimator is piecewise constant and not continuous. To smooth it, one can use, for example, the empirical beta copula, which replaces the indicator function with the distribution function of the beta distribution. Further details can be found in Segers et al. (2017), which also provides asymptotics of the empirical beta copula and the empirical Bernstein copula. In fact, the empirical beta copula is a special case of the empirical Bernstein copula, which relies on Bernstein polynomials; see Sancetta and Satchell (2004). When nonparametrically estimating a bivariate AC using Bernstein polynomials, the Kendall distribution function can be estimated using B-splines or Bézier curves, as in Susam and Ucer (2020). Lin et al. (2017) proposed a procedure for nonparametric estimation of multivariate conditional copulae using local linear smoothing in the kernel estimation and then using the Newton-Raphson method to minimize the estimated likelihood. Another approach to estimating the density of copulae, in both the parametric and nonparametric cases, is based on Legendre multiwavelets and was introduced by Chatrabgoun et al. (2017). One of the most practical procedures to estimate a copula is the method of moments. In the bivariate case, the (usually univariate) parameter is estimated by matching Kendall's τ, as proposed in Genest and Rivest (1993).
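The difference between the rank-based empirical copula and its beta-smoothed version can be sketched as follows. The sketch assumes data without ties and scales the ranks by n in the empirical copula; the function names are illustrative.

```python
import numpy as np
from scipy import stats


def ranks(x):
    """Column-wise ranks 1..n (assumes there are no ties)."""
    return np.apply_along_axis(stats.rankdata, 0, x)


def empirical_copula(x, u):
    """Rank-based empirical copula: C_n(u) = (1/n) sum_i prod_j 1{r_ij / n <= u_j}."""
    r = ranks(x)
    n = r.shape[0]
    return np.mean(np.all(r / n <= np.asarray(u), axis=1))


def empirical_beta_copula(x, u):
    """Empirical beta copula: each indicator is replaced by a Beta(r_ij, n + 1 - r_ij) cdf."""
    r = ranks(x)
    n = r.shape[0]
    cdf = stats.beta.cdf(np.asarray(u), r, n + 1 - r)  # shape (n, d) by broadcasting
    return np.mean(np.prod(cdf, axis=1))
```

Both functions take an (n, d) data array and a point u in [0, 1]^d. The first is a step function in u, whereas the second is smooth and, in the absence of ties, has exactly uniform margins.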

| Estimating hierarchical Archimedean copulae
Estimating HAC is difficult for two reasons: first, the bivariate parameters of all of the AC, which can be of the same or different families (e.g., Frank and Clayton), have to be estimated; and second, the dependency structure is not known in advance. Most research on HAC focuses on structures in which all copulae are of the same family but have different parameters. An estimator for this case was proposed by Okhrin et al. (2013a), using the fact that the dependencies between variables should decrease from the lowest to the highest hierarchical level of the copula structure, meaning that not all possible structures have to be evaluated. This procedure was further generalized by Górecki et al. (2017a) using Kendall's τ for estimating both the structure and the parameters of the copulae simultaneously. Their work includes a proof that the structure of a HAC can be recovered by agglomerative clustering based on Kendall's τ. Furthermore, Górecki et al. (2017b) tackle the problem of estimating HAC that include copulae from different Archimedean families by establishing an estimator that uses goodness-of-fit tests directly in the estimation process. Under certain regularity conditions, the presented method guarantees that the result is a proper copula. This approach by Górecki et al. (2017b) builds on the estimation of a HAC using only one Archimedean family. The difference is that the bivariate copula is estimated for all admissible families in every step. One then chooses the family that fits best according to a goodness-of-fit test. While presented using the estimation method from Górecki et al. (2017b) for the bivariate copulae, this procedure also works when applied with other estimation methods for the bivariate copulae and the dependency structure. Gunawan et al. (2019) introduce a method to estimate high dimensional AC (not HAC) with discrete margins, which is less computationally complex. 
This is achieved by using unbiased Bayesian estimators of the likelihood rather than the likelihood itself. A comparison between empirical Bayesian approaches and fully Bayesian approaches, which are computationally more efficient for experiments with mixed outcomes, can be found in Schifano et al. (2020).
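The idea of recovering a HAC structure by agglomerative clustering on Kendall's τ can be sketched as follows. This is a simplified illustration and not the algorithm of the cited papers: it assumes average linkage on the distance 1 − τ, a single Clayton family at every node (with θ = 2τ/(1 − τ)), and the hypothetical function names `kendall_matrix` and `hac_structure`.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def kendall_matrix(x):
    """Matrix of pairwise sample Kendall's tau."""
    d = x.shape[1]
    tau = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            t, _ = stats.kendalltau(x[:, i], x[:, j])
            tau[i, j] = tau[j, i] = t
    return tau


def hac_structure(x):
    """Cluster variables on 1 - tau; strongly dependent variables are merged first."""
    dist = 1.0 - kendall_matrix(x)
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    merges = z[:, :2].astype(int)      # cluster indices joined at each step
    taus = 1.0 - z[:, 2]               # average tau between the merged clusters
    thetas = 2.0 * taus / (1.0 - taus)  # Clayton parameter implied by tau (assumption)
    return merges, taus, thetas
```

Indices below d in `merges` refer to original variables, larger indices to previously formed clusters, following the SciPy linkage convention. The decreasing sequence of τ values from the first to the last merge mirrors the requirement that dependence decreases from the lowest to the highest hierarchical level.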

| Estimating Archimax copulae
In practice, the problem with Archimax copulae is that they are quite difficult to estimate. The first results not limited to the two-dimensional or three-dimensional case, when both the Archimedean generator and the tail dependence function belong to parametric Archimedean families, were obtained by Chatelain et al. (2020). The presented method of moments estimation assumes that estimating the Archimedean generator is a parametric problem. This means that only finitely many Archimedean families are considered, while no further information on the tail dependence function is given. Therefore, the parameter of the Archimedean generator needs to be estimated first. The estimated generator can then be used to estimate the tail dependence function. More precisely, the Pickands dependence function, which determines the tail dependence function, is estimated instead of the tail dependence function itself. To this end, consider a random sample from a d-dimensional distribution generated from the Archimax copula applied to the margins, let $\hat F_{nj}$ be the empirical distribution function of $x_{1j}, \dots, x_{nj}$, and let $Z$ be a random variable with survival function $\phi$. One then defines $\hat u_{ij} = \hat F_{nj}(x_{ij})$ and, for every $\omega$ in the d-dimensional simplex, $\hat\xi_i(\omega) = \min\{\phi(\hat u_{i1})/\omega_1, \dots, \phi(\hat u_{id})/\omega_d\}$. Based on these, the Pickands (1981) estimator and the Capéràa et al. (1997) type estimator are calculated. Since these estimators do not necessarily provide a Pickands function, they have to be further constrained, as pointed out in Rojo et al. (2001), who also provided some estimators satisfying the convexity condition and demonstrated the strong uniform convergence. Ahmadabadi and Ucer (2017) introduced a new estimator for the Pickands function based on the Bernstein copula and kernel regression. However, its inclusion in the Archimax copula estimation has not yet been studied, while in Kiriliouk et al. (2018) the empirical beta copula was used.
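To make the role of the quantities ξ_i(ω) concrete, the following sketch computes a Pickands-type estimate in the extreme-value special case only, that is, assuming that the transformation applied to the pseudo-observations is −log u, so that Z is standard exponential with mean one. For a general Archimedean generator the estimate would additionally have to be rescaled by the mean of Z; that case is not covered here. The function names are illustrative.

```python
import numpy as np
from scipy import stats


def pseudo_obs(x):
    """Rescaled ranks r_ij / (n + 1), column by column."""
    n = x.shape[0]
    return np.apply_along_axis(stats.rankdata, 0, x) / (n + 1.0)


def pickands_estimator(x, omega):
    """Pickands-type estimate at a point omega of the simplex (all entries > 0).

    Extreme-value special case: xi_i(omega) = min_j { -log(u_ij) / omega_j }.
    """
    u = pseudo_obs(x)
    omega = np.asarray(omega, dtype=float)
    xi = np.min(-np.log(u) / omega, axis=1)
    # Under the model, xi_i(omega) is exponential with rate equal to the target value.
    return 1.0 / np.mean(xi)
```

For independent components the estimate should be close to 1 at every point of the simplex, and for comonotone components close to the largest entry of ω.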

| Estimating factor copulae
The demanding part of estimating factor copulae is that, in the case of non-Gaussian margins and non-Gaussian factors, an analytical form of the density function is in general not available, and therefore ML estimation is not possible. To cope with this problem, Oh and Patton (2013) introduced a moment-based estimator, which is consistent and asymptotically normal under weak regularity conditions. However, as mentioned earlier, moment estimators for copulae tend to be highly inefficient. Thus, another estimation method, which is based on the copula representation in Equation (1) and the assumption of a parametric copula density, was proposed by Krupskii and Joe (2013). This estimation works similarly to the general approach to copula estimation by first estimating the margins and then using maximum likelihood to estimate the copula parameters.

| Estimating vine copulae
When estimating vine copulae, similar to HAC, both the copula structure and the resulting density function need to be estimated. The copula structure itself may be estimated using a greedy algorithm that captures the largest correlations in the lower trees, in the spirit of Okhrin et al. (2013b). This suffers from the usual problem of greedy algorithms, namely that locally optimal choices need not lead to a globally optimal structure. More sophisticated results come from an algorithm proposed in Zhu et al. (2020), which starts with a structure that might be obtained from a greedy or any other algorithm. The algorithm then searches for a regular vine structure that fits better while sharing at least two sampling orders with the initial structure. Better results than with the greedy algorithm may be obtained using the procedure proposed in Müller and Czado (2019), which involves structural equation models and LASSO regularization. In Chang et al. (2019), the tree-like structure of vine copulae is exploited using Monte Carlo tree search to determine the dependency structure. There are several approaches to estimating the density function. The first and most straightforward is ML estimation, either in one step or two, as discussed in Section 5. However, a semiparametric approach as introduced in Haff (2013) is more useful. Because the density of a vine copula can be decomposed into a product of conditional bivariate copula densities, the log-likelihood can be decomposed into the sum of logarithms of conditional bivariate densities. By replacing the conditional densities with unconditional ones, one gets a multistep ML procedure by first estimating the parameters in the first level and then using the estimated parameters to estimate the parameters of the copulae in the second level, and so on. This procedure can be further improved by replacing the unconditional densities with conditional densities that only depend on the principal factor and not on all factors, which can be numerous in higher dimensions. 
Consequently, this is a trade-off between decreased computation time and precision. Details on this procedure and on determining the principal factor can be found in Schellhase and Spanhel (2018). Another approach to estimating the conditional densities was developed by Zhang and Bedford (2018), where the authors use a mixture of basis functions.
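The level-by-level estimation can be sketched for a three-dimensional D-vine with variable order 1, 2, 3. The sketch makes several assumptions that are not part of the cited procedures: all pair copulae are Gaussian, the pair copula in the second tree does not depend on the value of the conditioning variable, and each parameter is obtained by inverting Kendall's τ rather than by ML, which keeps the code short.

```python
import numpy as np
from scipy import stats


def pseudo_obs(x):
    """Rescaled ranks r_ij / (n + 1), column by column."""
    n = x.shape[0]
    return np.apply_along_axis(stats.rankdata, 0, x) / (n + 1.0)


def rho_from_tau(u, v):
    """Gaussian pair-copula parameter from Kendall's tau: rho = sin(pi * tau / 2)."""
    tau, _ = stats.kendalltau(u, v)
    return np.sin(np.pi * tau / 2.0)


def h_gauss(u, v, rho):
    """Conditional distribution function P(U <= u | V = v) of a Gaussian pair copula."""
    z = (stats.norm.ppf(u) - rho * stats.norm.ppf(v)) / np.sqrt(1.0 - rho ** 2)
    return stats.norm.cdf(z)


def fit_gaussian_dvine3(x):
    """Sequential fit of the pair copulae (1,2), (2,3) and (1,3 | 2)."""
    u = pseudo_obs(x)
    # First tree: unconditional pairs.
    rho12 = rho_from_tau(u[:, 0], u[:, 1])
    rho23 = rho_from_tau(u[:, 1], u[:, 2])
    # Pseudo-observations of the conditional distributions given variable 2,
    # computed with the parameters estimated in the first tree.
    u1_given_2 = h_gauss(u[:, 0], u[:, 1], rho12)
    u3_given_2 = h_gauss(u[:, 2], u[:, 1], rho23)
    # Second tree: pair copula of (1, 3) given 2.
    rho13_2 = rho_from_tau(u1_given_2, u3_given_2)
    return rho12, rho23, rho13_2
```

For Gaussian data the second-tree parameter corresponds to the partial correlation of variables 1 and 3 given variable 2, which provides a simple sanity check of the sequential scheme.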

| Estimating and testing time series copulae
When estimating the multivariate distribution in a copula-based time series, one first estimates all the parameters of the univariate series, such as means and variances, and obtains the residuals $\epsilon_{it}$. The distribution of the latter can be estimated using either parametric or nonparametric approaches. In the case of a nonparametric approach, the distribution is mostly assumed to be constant in time and is then estimated using the empirical distribution or kernel estimators; see Fermanian and Scaillet (2003). When performing a fully parametric approach, also assuming the copula to be parametric, ML is the most efficient estimator. However, multistage ML is often only slightly less efficient and is much easier to compute; see Patton (2012).
One of the main advantages of using copulae in time series is that one can choose a semiparametric approach: while the marginals are estimated using nonparametric methods, the copula itself can then be estimated parametrically using ML. Although the standard ML theory cannot be applied because the likelihood depends on infinite-dimensional parameters, Chen and Fan (2006) showed that, under certain conditions, the estimator is asymptotically normal.
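A minimal sketch of this semiparametric approach is given below. It assumes, purely for illustration, two series with AR(1) conditional means fitted by least squares, nonparametric margins of the residuals based on rescaled ranks, and a Gaussian copula whose parameter is estimated by maximizing the copula likelihood; the function names are hypothetical.

```python
import numpy as np
from scipy import stats, optimize


def ar1_residuals(y):
    """Least-squares fit of y_t = c + a * y_{t-1} + e_t; returns the residuals."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return y[1:] - X @ coef


def gaussian_copula_loglik(rho, a, b):
    """Log-likelihood of a bivariate Gaussian copula at normal scores a and b."""
    return np.sum(-0.5 * np.log(1.0 - rho ** 2)
                  - (rho ** 2 * (a ** 2 + b ** 2) - 2.0 * rho * a * b)
                  / (2.0 * (1.0 - rho ** 2)))


def fit_semiparametric(y1, y2):
    """Filter both series, rank-transform the residuals, then fit the copula by ML."""
    e = np.column_stack([ar1_residuals(y1), ar1_residuals(y2)])
    n = e.shape[0]
    u = np.apply_along_axis(stats.rankdata, 0, e) / (n + 1.0)  # nonparametric margins
    a, b = stats.norm.ppf(u[:, 0]), stats.norm.ppf(u[:, 1])
    res = optimize.minimize_scalar(lambda r: -gaussian_copula_loglik(r, a, b),
                                   bounds=(-0.99, 0.99), method="bounded")
    return res.x
```

In applications the AR(1) filter would typically be replaced by a model that also captures time-varying variances, and the Gaussian copula by the parametric family of interest.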
When testing whether the stochastic behavior of a set of random variables changes over time, most studies deal with the case where the change happens abruptly at one specific time point rather than gradually. Quessy (2019) introduces a test procedure that allows testing for gradual changes, which often occur in reality, in the copula as well as in the marginal distributions. Further information on estimating time series models can be found in Härdle et al. (2013) and.
A completely model-free approach using high-frequency data was proposed by Fengler and Okhrin (2016), where the copula is estimated using the realized covariance matrix computed using high-frequency data and later applied to an autoregressive time series model. The obtained estimator is compared with other more classical estimation methods, such as the ML method, with respect to value at risk.
Vine copulae can also be applied to time series problems. This can be done by choosing time-dependent parameters. Using this to model dependencies leads to dynamic vine copula models, where the bivariate copulae, and also the dependence structure, can change over time; as discussed in Kreuzer and Czado (2019), which also includes a Bayesian approach to estimate parameters. The presented class of vine copulae is a generalization of C-vines and D-vines. A nonparametric approach to estimating dynamical vines is presented in Acar et al. (2019).

| Bivariate copulae and dependence measures
Because copulae model the dependence between random variables, there exists a connection between copulae and dependence measures such as Kendall's τ, Spearman's ρ, and Blomqvist's β. In contrast to Pearson's correlation coefficient, they do not measure linear dependence but instead measure monotone dependence, and are therefore well suited for applications with copulae. All of τ, ρ, and β are equal to one or minus one in the case of perfect positive or negative dependence, respectively, and are invariant under strictly increasing transformations.
Let $(X_1, X_2)$ and $(X_1', X_2')$ be two independent pairs of random variables, both distributed according to a bivariate distribution function $F$, and let $C$ be the corresponding copula. Then, Kendall's τ is defined as follows:

$$\tau = P\{(X_1 - X_1')(X_2 - X_2') > 0\} - P\{(X_1 - X_1')(X_2 - X_2') < 0\} = 4\int_{[0,1]^2} C(u_1, u_2)\,\mathrm{d}C(u_1, u_2) - 1.$$

Using this, for many copulae there is a one-to-one relationship between the parameter and Kendall's τ. For example, Kendall's τ of a Gaussian copula equals $\frac{2}{\pi}\arcsin(\rho)$. As mentioned in Section 5, this can be used to estimate this parameter by estimating Kendall's τ; for example, via $\hat\tau = \frac{4}{n(n-1)}P_n - 1$ with $P_n$ being the number of concordant pairs, see Genest et al. (1995).
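The estimator based on the number of concordant pairs, and the resulting moment estimator for the Gaussian copula parameter, can be written down directly. The sketch below assumes a sample without ties; the function names are illustrative.

```python
import numpy as np


def kendall_tau(x, y):
    """Sample Kendall's tau: 4 * P_n / (n * (n - 1)) - 1, P_n = number of concordant pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    upper = np.triu_indices(n, k=1)           # each pair (i, k) with i < k once
    p_n = np.sum(dx[upper] * dy[upper] > 0)   # pairs ordered the same way in x and y
    return 4.0 * p_n / (n * (n - 1)) - 1.0


def gaussian_rho_from_tau(tau):
    """Invert tau = (2 / pi) * arcsin(rho) for the Gaussian copula."""
    return np.sin(np.pi * tau / 2.0)
```

For example, for x = (1, 2, 3, 4) and y = (1, 3, 2, 4), five of the six pairs are concordant, which gives an estimate of 4 · 5 / 12 − 1 = 2/3.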
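Similar results can be obtained for Spearman's ρ, assuming that $X_1$ is distributed according to $F_1$ and $X_2$ according to $F_2$ in addition to the assumptions made for Kendall's τ:

$$\rho = 12\int\!\!\int F_1(x_1)F_2(x_2)\,\mathrm{d}F(x_1, x_2) - 3.$$

Schmid and Schmidt (2007) show that ρ can be expressed via

$$\rho = 12\int_{[0,1]^2} C(u_1, u_2)\,\mathrm{d}u_1\mathrm{d}u_2 - 3 = \frac{\int_{[0,1]^2} C(u)\,\mathrm{d}u - \int_{[0,1]^2} \Pi(u)\,\mathrm{d}u}{\int_{[0,1]^2} M(u)\,\mathrm{d}u - \int_{[0,1]^2} \Pi(u)\,\mathrm{d}u},$$

where $M(u_1, u_2) = \min(u_1, u_2)$ denotes the copula of perfect positive dependence; thus Spearman's ρ can be interpreted as the normalized average distance between C and the independence copula Π. However, in contrast to Kendall's τ, in the case of an Archimedean copula there is rarely an explicit representation with respect to the generator. Both Kendall's τ and Spearman's ρ can be used for a nonparametric estimation of the copula in the case of discrete margins, see Zhang et al. (2020) and Blumentritt and Schmid (2014). The exact region that is determined by Spearman's ρ and Kendall's τ is specified in Schreyer et al. (2017). Blomqvist's β is defined using the median $\tilde X_1$ of $X_1$ and the median $\tilde X_2$ of $X_2$:

$$\beta = P\{(X_1 - \tilde X_1)(X_2 - \tilde X_2) > 0\} - P\{(X_1 - \tilde X_1)(X_2 - \tilde X_2) < 0\}.$$

Compared with Kendall's τ and Spearman's ρ, its estimation needs less computation time as only medians have to be estimated. Using the fact that $P\{(X_1 - \tilde X_1)(X_2 - \tilde X_2) > 0\} = C(\tfrac12, \tfrac12) + \bar C(\tfrac12, \tfrac12)$, where $\bar C$ is the corresponding survival copula, one can express Blomqvist's β using copula notation in the following way:

$$\beta = \frac{C(\tfrac12, \tfrac12) - \Pi(\tfrac12, \tfrac12) + \bar C(\tfrac12, \tfrac12) - \bar\Pi(\tfrac12, \tfrac12)}{M(\tfrac12, \tfrac12) - \Pi(\tfrac12, \tfrac12) + \bar M(\tfrac12, \tfrac12) - \bar\Pi(\tfrac12, \tfrac12)} = 4\,C(\tfrac12, \tfrac12) - 1,$$

so Blomqvist's β can also be understood as a normalized distance between the copula C and the independence copula Π, similar to Spearman's ρ. This definition can be used to extend β to higher dimensions and to other measures of tail dependence by replacing $\tfrac12$ with other values between 0 and 1. Further details can be found in Schmid and Schmidt (2006), Genest et al. (2013), and Bukovsek et al. (2019).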
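For a bivariate copula given as a function, the copula-based expressions for Spearman's ρ and Blomqvist's β can be evaluated numerically. The sketch below uses the standard bivariate identities ρ = 12 ∫∫ C(u, v) du dv − 3 and β = 4 C(1/2, 1/2) − 1; the function names are illustrative.

```python
from scipy import integrate


def spearman_rho(copula):
    """Spearman's rho of a bivariate copula C: 12 * integral of C over [0, 1]^2 minus 3."""
    value, _ = integrate.dblquad(lambda v, u: copula(u, v), 0.0, 1.0, 0.0, 1.0)
    return 12.0 * value - 3.0


def blomqvist_beta(copula):
    """Blomqvist's beta of a bivariate copula C: 4 * C(1/2, 1/2) - 1."""
    return 4.0 * copula(0.5, 0.5) - 1.0
```

Evaluating both functions at the independence copula `lambda u, v: u * v` gives values of zero, while the copula of perfect positive dependence `lambda u, v: min(u, v)` gives values of one, up to numerical integration error in the case of ρ.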
The Kullback-Leibler (KL) divergence is a measure of how much one distribution differs from another. In general, it depends on both the marginal distributions and the dependency structure. When expressing the KL divergence for a random vector X, whose distribution can be expressed using the copula C with density c applied to the marginals of X, one gets an expression that involves only the copula density c and a d-dimensional uniform random variable U with independent margins, so that the KL divergence no longer depends on the marginals. Then, as shown in Blumentritt and Schmid (2012), nonparametric estimators for a copula can be based on the KL divergence.

| Goodness-of-fit tests
Several goodness-of-fit tests have been developed to test the quality of the estimated copula's fit to the data. Under the null hypothesis $H_0$, one assumes that the true copula C belongs to some parametric family. A straightforward way is to measure the difference between the estimated copula and the empirical copula from Equation (2), which is a consistent estimator of the true copula (see Fermanian et al., 2004; Gaensler & Stute, 1987). Then, tests can be defined using the empirical copula process

$$\mathbb{C}_n(u) = \sqrt{n}\,\{C_n(u) - C(u)\}, \quad u \in [0,1]^d,$$

where $C_n$ denotes the empirical copula. The empirical copula process is shown to have smaller variance than the standard empirical process under certain regularity conditions in Genest, Mesfioui, and Nešlehová (2019a). If the marginal distributions are continuous, then the empirical copula process converges weakly to a centered Gaussian process. To extend this to arbitrary margins, a multilinear extension of the empirical copula, the so-called empirical checkerboard copula (see Deheuvels, 1979), was introduced. As shown by Genest et al. (2017), for arbitrary marginal distributions, the corresponding checkerboard copula process is asymptotically a centered Gaussian process as long as there exists an open subset of $[0, 1]^d$ on which the partial derivatives of the checkerboard copula exist and are continuous. Some of the tests proposed by Fermanian (2005) and Genest and Rémillard (2008) are based on the Cramér-von Mises or Kolmogorov distances between the estimated parametric copula $C_{\hat\theta}$ and the empirical copula:

$$S_n = n\int_{[0,1]^d}\{C_n(u) - C_{\hat\theta}(u)\}^2\,\mathrm{d}C_n(u), \qquad T_n = \sqrt{n}\,\sup_{u \in [0,1]^d}\,|C_n(u) - C_{\hat\theta}(u)|.$$

While these tests all rely on the empirical copula, one can also construct tests using Kendall's distribution instead (see Genest & Rivest, 1993; Genest et al., 2006; Wang & Wells, 2000). Analogously to the empirical copula process, one defines the empirical Kendall's distribution $K_n$ as

$$K_n(v) = \frac{1}{n}\sum_{i=1}^{n} I\{C_n(\hat u_{i1}, \dots, \hat u_{id}) \le v\}, \quad v \in [0,1],$$

while Kendall's distribution $K$ is the distribution of the copula value, that is, $C(U_1, \dots, U_d) \sim K$. 
Thus, Kendall's process is

$$\mathbb{K}_n(v) = \sqrt{n}\,\{K_n(v) - K(v)\}, \quad v \in [0,1].$$

Subsequently, one can measure the deviation of the estimated Kendall's distribution from the empirical one using Cramér-von Mises or Kolmogorov-Smirnov distances. However, the null hypothesis when testing using Kendall's distribution is only equivalent to the initial $H_0$ when testing a bivariate Archimedean copula. Moreover, the approach using Kendall's distribution can also be used together with Bernstein polynomials and Bézier curves. Here we have the advantage that the power of the test can be adapted by choosing the degree of the Bernstein polynomial or Bézier curves, as shown in Susam and Ucer (2018, 2020). Another approach based on an integral transformation relies on the Rosenblatt (1952) transformation of a copula C, which is obtained via the mapping $\mathfrak{R}: (0,1)^d \to (0,1)^d$, $(u_1, \dots, u_d) \mapsto (e_1, \dots, e_d)$ with $e_1 = u_1$ and $e_i = C_d(u_i \mid u_1, \dots, u_{i-1})$, where $C_d(u_i \mid u_1, \dots, u_{i-1}) = P(U_i \le u_i \mid U_1 = u_1, \dots, U_{i-1} = u_{i-1})$ and $(U_1, \dots, U_d)$ is a random vector with uniform [0, 1] margins and joint distribution C. Then, the initial $H_0$ is equivalent to the $e_i$ being independent of each other and uniformly distributed on [0, 1]. Tests using this transformation rely on Anderson-Darling test statistics (see Breymann et al., 2003) or on deviations between estimated density functions (see Chen et al., 2004). A substantial improvement regarding the power of those tests can be achieved by using the copula of the transformed data, see Genest et al. (2009). One then uses the empirical distribution function $D_n(u)$ of the $e_i$ and afterward computes the Cramér-von Mises distance

$$S_n = n\int_{[0,1]^d}\{D_n(u_1, \dots, u_d) - \Pi(u_1, \dots, u_d)\}^2\,\mathrm{d}u_1 \cdots \mathrm{d}u_d.$$

Further goodness-of-fit tests can be designed using kernel density estimators (see Scaillet, 2007) or the comparison between two-step pseudo maximum likelihood and the delete-one-block pseudo maximum likelihood (see Zhang et al., 2016). 
The asymptotic distributions of most of those tests are not available in explicit form; therefore, p-values are obtained using parametric bootstrap methods (see Okhrin et al., 2017). Some further interesting testing procedures can be sketched as follows: (a) When looking at an experiment with two binary outcomes, a procedure to maximize the weighted probabilities of the different results (e.g., being healed and not poisoned by a drug) under different copula models is introduced in Deldossi et al. (2019). Because this maximization may depend on the copula model, it is also important to consider which copula fits the data better and is therefore more likely to be the real copula (and how much more likely); (b) To test whether a copula is symmetric, either in the sense of exchangeability ($C(U, V) = C(V, U)$ in distribution) or radial symmetry ($C(u, v) = \bar C(u, v)$, with $\bar C$ the survival copula), Beare and Seo (2020) introduced test statistics to check these hypotheses; (c) To test for independence, rank-based test statistics can be used in any dimension and with any marginal distributions; see Genest, Nešlehová, et al. (2019b); (d) As shown by Bianchi et al. (2020), copulae can be used to test for conditional independence, and the presented methods have power comparable to current state-of-the-art methods; (e) Copulae may also be useful when testing several hypotheses simultaneously. However, the question of whether or not copula-based tests are in general superior to classical multiple tests such as Bonferroni has yet to be examined. An introduction and a comparison for some Archimedean cases can be found in Neumann and Dickhaus (2020).
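A Cramér-von Mises type test based on the empirical copula, with the p-value obtained by parametric bootstrap, can be sketched as follows. The sketch assumes a bivariate sample, the Clayton family under the null hypothesis, and estimation of the parameter by inversion of Kendall's τ; these choices and all function names are illustrative and not prescribed by the cited references.

```python
import numpy as np
from scipy import stats


def pseudo_obs(x):
    """Rescaled ranks r_ij / (n + 1), column by column."""
    return np.apply_along_axis(stats.rankdata, 0, x) / (x.shape[0] + 1.0)


def clayton_cdf(u, theta):
    return (u[:, 0] ** -theta + u[:, 1] ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(n, theta, rng):
    """Conditional distribution method for the bivariate Clayton copula."""
    u, w = rng.uniform(size=n), rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
    return np.column_stack([u, v])


def fit_theta(u):
    """Clayton parameter from Kendall's tau: theta = 2 * tau / (1 - tau)."""
    tau, _ = stats.kendalltau(u[:, 0], u[:, 1])
    tau = np.clip(tau, 1e-3, 0.99)  # keep theta inside the admissible range
    return 2.0 * tau / (1.0 - tau)


def cvm_statistic(u, theta):
    """Sum over the sample of {C_n(u_i) - C_theta(u_i)}^2."""
    below = (u[None, :, 0] <= u[:, None, 0]) & (u[None, :, 1] <= u[:, None, 1])
    c_n = below.mean(axis=1)  # empirical copula at each pseudo-observation
    return np.sum((c_n - clayton_cdf(u, theta)) ** 2)


def gof_clayton(x, n_boot=200, seed=0):
    """Return the test statistic and its parametric bootstrap p-value."""
    rng = np.random.default_rng(seed)
    u = pseudo_obs(x)
    theta = fit_theta(u)
    s_obs = cvm_statistic(u, theta)
    s_boot = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate under the fitted null model and repeat the whole estimation.
        ub = pseudo_obs(sample_clayton(len(u), theta, rng))
        s_boot[b] = cvm_statistic(ub, fit_theta(ub))
    return s_obs, np.mean(s_boot >= s_obs)
```

It is essential that the parameter is re-estimated in every bootstrap replication, because the distribution of the statistic depends on the estimation step.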

| CONCLUSION
Copula theory and its applications form an important topic with many advantages for practitioners in almost every discipline. Even though many results already exist, there is still a lot of work to be done, especially in estimating higher dimensional dependencies and establishing new copulae.

ACKNOWLEDGMENT
Open access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTEREST
The authors have declared no conflicts of interest for this article.