Quantifying directed dependence via dimension reduction

Studying the multivariate extension of copula correlation yields a dimension reduction principle which turns out to be closely related to the `simple measure of conditional dependence' $T$ recently introduced by Azadkia & Chatterjee (2021). In the present paper, we identify and investigate the dependence structure underlying this dimension reduction principle, provide a strongly consistent estimator for it, and demonstrate its broad applicability. For that purpose, we define a bivariate copula capturing the scale-invariant extent of dependence of an endogenous random variable $Y$ on a set of $d \geq 1$ exogenous random variables ${\bf X} = (X_1, \dots, X_d)$, and containing the information whether $Y$ is completely dependent on ${\bf X}$, and whether $Y$ and ${\bf X}$ are independent. The dimension reduction principle becomes apparent insofar as the introduced bivariate copula can be viewed as the distribution function of two random variables $Y$ and $Y^\prime$ sharing the same conditional distribution and being conditionally independent given ${\bf X}$. Evaluating this copula uniformly along the diagonal, i.e. calculating Spearman's footrule, leads to Azadkia and Chatterjee's `simple measure of conditional dependence' $T$. On the other hand, evaluating this copula uniformly over the unit square, i.e. calculating Spearman's rho, leads to a distribution-free coefficient of determination (a.k.a. copula correlation). Several real data examples illustrate the importance of the introduced methodology.


Introduction
Detecting statistical association among several random variables is a ubiquitous task. Measures of association capture the many facets of dependence relationships, ranging from classical linear correlation coefficients to indices for detecting monotone associations, tail dependence, asymmetric dependence, etc. Most concepts of statistical association, however, consider association as an undirected property, i.e., the association among the variables remains unchanged when permuting the variables. This includes Pearson's correlation coefficient, measures of concordance (see, e.g., [8,14,20,24,29,48,54]), distance covariance and distance multivariance (see, e.g., [5,53]), the maximal information coefficient (see, e.g., [47]), the randomized dependence coefficient (see, e.g., [40]), coefficients of tail dependence (see, e.g., [30]), and various measures of asymmetry (see, e.g., [27,33,45]). In many situations, however, one variable may have a stronger influence on another variable than vice versa. This is why quantifying the degree of predictability or explainability of an endogenous random variable Y using the information contained in a set of d ≥ 1 exogenous random variables X = (X_1, . . . , X_d) requires measures of directed dependence. An index measuring such a degree should be capable of detecting perfect dependence, also known as complete dependence (see [25,38,55]): Y is said to be completely dependent on X if there exists a measurable function f such that Y = f(X) almost surely.
Quite recently, Azadkia and Chatterjee [2] introduced their so-called 'simple measure of conditional dependence' T given by
$$T = T(Y|\mathbf{X}) := \frac{\int \mathrm{var}\big(\mathbb{E}(\mathbf{1}_{\{Y \geq y\}} \mid \mathbf{X})\big)\, d\mu(y)}{\int \mathrm{var}\big(\mathbf{1}_{\{Y \geq y\}}\big)\, d\mu(y)} \qquad (1)$$
(with $\mu$ denoting the distribution of Y), which is based on [7,12], quantifies the scale-invariant extent of dependence of Y on X, and has attracted a lot of attention in the past two years; see, e.g., [1,4,6,11,26,28,49,51]. As T equals 1 if and only if Y is completely dependent on X, and T is 0 if and only if Y and X are independent, T belongs to a class of indices capable of detecting complete dependence and independence which also includes those studied in [17,25,31,55]. In order to estimate T, Azadkia and Chatterjee developed a nearest neighbour based estimation procedure that allows for a dimension reduction.
In the present paper we identify and investigate the bivariate dependence structure underlying this dimension-reducing estimation principle. We therefore extend the (d + 1)-dimensional random vector (X, Y) with continuous marginal distribution functions and connecting copula A and consider the (d + 2)-dimensional random vector (X, Y, Y′) with Y and Y′ sharing the same conditional distribution and being conditionally independent given X, so that the joint distribution function of (Y, Y′) is fully determined by the conditional distribution of Y given X. The copula of (Y, Y′) we then identify as the outcome of the dimension-reducing transformation ψ, determined by $\psi(A)(s,t) = P(G(Y) \leq s, \, G(Y') \leq t)$, where G denotes the distribution function of Y. The dimension reduction principle becomes apparent insofar as ψ transforms the (d + 1)-dimensional copula A (corresponding to (X, Y)) to the bivariate copula ψ(A) (corresponding to (Y, Y′)) but preserves the key information about the directed dependence of the variables Y and X, i.e., the bivariate copula ψ(A) itself contains the information (i) whether Y is completely dependent on X, and (ii) whether Y and X are independent. It is straightforward to show that the index T possesses a representation in terms of the well-known dependence measure Spearman's footrule (see, e.g., [21,23,44]): $T(Y|\mathbf{X}) = \phi(\psi(A)) = 6 \int_I \psi(A)(t,t)\, d\lambda(t) - 2$. Thus, the initial (d + 1)-dimensional problem of quantifying the scale-invariant extent of dependence of Y on X reduces to a 2-dimensional one. Considering T as a map of ψ(A) suggests evaluating ψ(A) also via other measures of bivariate dependence, leading to new indices with which independence, complete dependence and also other facets of dependence between Y and X can be quantified. For instance, evaluating ψ(A) via
1. Spearman's rank correlation (Spearman's rho) leads to a distribution-free coefficient of determination which provides a benchmark for the proportion of variance that can be explained in a copula-based (regression) model.

2. Gini's gamma leads to a measure of 'indifference' being able to detect whether the dependence structures of (X, Y) and (X, −Y) coincide, a kind of 'lack of association' property.
For each of these resulting dependence measures, we investigate their ability to identify certain facets of dependence between Y and X, show their invariance with respect to a variety of transformations of the random variables X_1, . . . , X_d, and verify the so-called information gain (in)equality. Even though we restrict our investigation to the measures of concordance Spearman's footrule, Spearman's rho and Gini's gamma, all leading to meaningful measures of dependence, ψ(A) can in principle be evaluated via any bivariate measure of concordance or dependence. For the bivariate copula ψ(A) we propose an estimator whose form is reminiscent of the empirical copula, but which is actually based on the graph-based estimator of T developed by Azadkia and Chatterjee [2]. By applying the tools given in [2] we show that the copula estimator is strongly consistent, from which strong consistency of the plug-in estimators of Spearman's footrule (coinciding with T), Spearman's rho (coinciding with the distribution-free R²) and Gini's gamma (coinciding with the measure of 'indifference') can be derived.
The rest of this contribution is organized as follows: In Section 2, we formally define the dimension-reducing transformation ψ mapping every (d + 1)-dimensional copula A to an exchangeable bivariate one, and show that the resulting bivariate copula ψ(A) captures independence and complete dependence, is invariant with respect to a variety of transformations of the original dependence structure A and satisfies the so-called information gain inequality. In Section 3 we then apply the resulting copula ψ(A) to (the measures of bivariate dependence) Spearman's footrule, Spearman's rho and Gini's gamma and demonstrate their ability to identify certain facets of dependence. Section 4 is devoted to the estimation of ψ(A). We prove strong consistency of the proposed estimator and illustrate its small/moderate sample performance. Finally, the potential and importance of the dimension-reducing methodology is illustrated by analyzing several real data examples regarding feature selection, copula-based regression analysis and the detection of reflection invariant dependence structures (Section 5). The proofs can be found in Section 6.
Throughout this paper we will write I := [0, 1] and let d ≥ 1 denote an integer which will be kept fixed. Bold symbols will be used to denote vectors, e.g., x = (x 1 , . . . , x d ) ∈ R d . The d-dimensional Lebesgue measure will be denoted by λ d , in case of d = 1 we simply write λ. We will let C d+1 denote the family of all (d+1)-dimensional copulas, M will denote the comonotonicity copula, Π the independence copula and, for d = 1, W will denote the countermonotonicity copula (we omit the index indicating the dimension since no confusion will arise). For every A ∈ C d+1 the corresponding probability measure will be denoted by µ A , i.e., µ A ([0, u] × [0, v]) = A(u, v) for all (u, v) ∈ I d × I; for more background on copulas and copula measures we refer to [16,44]. For every metric space (Ω, δ) the Borel σ-field on Ω will be denoted by B(Ω). For a copula A ∈ C d+1 we denote by A L the |L|-dimensional marginal of A with respect to the coordinates in L ⊆ {1, . . . , d + 1}, and for the l-dimensional marginal of A (l ≥ 2) with respect to the first l ∈ {1, . . . , d} coordinates we simply write A 1:l := A {1,...,l} .

One bivariate copula to capture it all
Consider a (d + 1)-dimensional random vector (X, Y) with continuous marginal distribution functions F_i of X_i, i ∈ {1, . . . , d}, and G of Y, and connecting copula A. Then (denoting U_i := F_i(X_i), i ∈ {1, . . . , d}, U := (U_1, . . . , U_d) and V := G(Y)) the random vector (U, V) has A as its joint distribution function. In what follows, we construct an exchangeable bivariate copula capturing the scale-invariant extent of dependence of Y on the random vector X in the sense that it allows one to detect (see Theorem 2.2 below) (i) whether Y and X (or, equivalently, V and U) are independent.
(ii) whether Y is completely dependent on X (or, equivalently, V is completely dependent on U).
To this end, we extend the random vector (U, V) and consider the (d + 2)-dimensional random vector (U, V, V′) with V and V′ sharing the same conditional distribution and being conditionally independent given U. Then, using disintegration, the distribution function of (V, V′) is a copula and can be expressed as
$$\psi(A)(s,t) := \int_{I^d} K_A(\mathbf{u}, [0,s]) \, K_A(\mathbf{u}, [0,t]) \, d\mu_{A_{1:d}}(\mathbf{u})$$
for all (s, t) ∈ I², where μ_{A_{1:d}} denotes the copula measure of the d-dimensional marginal copula A_{1:d} and K_A : I^d × B(I) → I denotes the Markov kernel of A ∈ C_{d+1} (with respect to the first d coordinates) 1. The map ψ : C_{d+1} → C_2 defined in this way then transforms every (d + 1)-dimensional copula to an exchangeable bivariate copula (see Figures 1 and 3 for an illustration), but preserves the key information about the directed dependence of the variables involved.
2.1 Remark. The copula ψ(A) may be interpreted as a generalization of the well-known bivariate Markov product of copulas: For two copulas B, C ∈ C_2 the Markov product B * C : I² → I given by
$$(B * C)(u, v) := \int_I \partial_2 B(u, t) \, \partial_1 C(t, v) \, d\lambda(t)$$
is a copula (see [10,56] and [16, Chapter 5]); here, B^t denotes the transpose of B, i.e. B^t(u, v) := B(v, u). In the bivariate case, i.e. for d = 1, we therefore have ψ(A) = A^t * A. Notice that, in this case, the fixed points of ψ are exactly those copulas that are idempotent with respect to the Markov product (i.e. those copulas A satisfying A * A = A).
In the remainder of this section, we present several key properties of the dimension-reducing transformation ψ, and we provide explicit formulas for ψ(A) in case A belongs to certain copula families.

1 According to [16, Theorem 3.4.3] and due to disintegration, every copula A ∈ C_{d+1} fulfills
$$A(\mathbf{u}, v) = \int_{[\mathbf{0}, \mathbf{u}]} K_A(\mathbf{s}, [0, v]) \, d\mu_{A_{1:d}}(\mathbf{s}),$$
where K_A is (a version of) the Markov kernel of A: A Markov kernel from I^d to B(I) is a mapping K : I^d × B(I) → I such that for every fixed F ∈ B(I) the mapping u ↦ K(u, F) is measurable and for every fixed u ∈ I^d the mapping F ↦ K(u, F) is a probability measure. Given a random vector U with uniformly distributed univariate marginals and a uniformly distributed random variable V on a probability space (Ω, A, P), we say that a Markov kernel K is a regular conditional distribution of V given U if K(U(ω), F) = P(V ∈ F | U)(ω) holds P-almost surely for every F ∈ B(I). It is well known that for each such random vector (U, V) a regular conditional distribution K(·, ·) of V given U always exists and is unique for P^U-a.e. u ∈ I^d, where P^U denotes the push-forward of P under U. For more background on conditional expectation and general disintegration we refer to [32,37]; for more information on Markov kernels in the context of copulas we refer to [16,36,43].
The following theorem shows that the copula ψ(A) characterizes complete dependence of Y on X and independence of Y and X; notice that Theorem 2.2 generalizes Theorem 11.1 in Darsow et al. [10]. For a copula C ∈ C_2, we denote by δ_C its diagonal, i.e. δ_C(t) := C(t, t) for all t ∈ I.

2.2 Theorem.
Consider a (d + 1)-dimensional random vector (X, Y) with continuous marginals and connecting copula A.
1. The following statements are equivalent: (a) Y is completely dependent on X. (b) ψ(A) = M.
2. The following statements are equivalent: (a) Y and X are independent. (b) ψ(A) = Π.
We now apply the transformation ψ to some well-known parametric copula families: the class of equicorrelated Gaussian copulas, the class of Marshall-Olkin extreme-value copulas, the class of Fréchet copulas and the class of EFGM copulas. It turns out that ψ transforms every (d + 1)-dimensional equicorrelated Gaussian copula to a bivariate Gaussian copula whose correlation parameter depends on the dimension of the conditioning random vector X. We further show that the classes of Marshall-Olkin copulas, Fréchet copulas and EFGM copulas are also to some extent closed with respect to ψ.
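For d = 1, the Gaussian closure property can be checked by simulation: with correlation parameter ρ = 0.6, ψ(A) should be the bivariate Gaussian copula with parameter 0.36, the values also used in the simulation study in Section 4. The following Monte Carlo sketch (function names are ours, not from the paper) exploits that ψ(A)(s, t) = P(G(Y) ≤ s, G(Y′) ≤ t) with Y, Y′ conditionally i.i.d. given X, and that a Gaussian copula with parameter r takes the closed-form value 1/4 + arcsin(r)/(2π) at (1/2, 1/2).

```python
import math
import random

def norm_cdf(z: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psi_gauss_mc(rho: float, s: float, t: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of psi(A)(s, t) for the bivariate (d = 1) Gaussian
    copula with parameter rho: draw Y, Y' conditionally i.i.d. given X and
    count the event {G(Y) <= s, G(Y') <= t}."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        v1 = norm_cdf(rho * x + c * rng.gauss(0.0, 1.0))
        v2 = norm_cdf(rho * x + c * rng.gauss(0.0, 1.0))
        if v1 <= s and v2 <= t:
            hits += 1
    return hits / n

# psi(A) should be Gaussian with parameter rho^2 = 0.36; at (1/2, 1/2) a
# Gaussian copula with parameter r equals 1/4 + arcsin(r) / (2 pi)
est = psi_gauss_mc(0.6, 0.5, 0.5)
ref = 0.25 + math.asin(0.36) / (2.0 * math.pi)
assert abs(est - ref) < 0.01
```

The agreement at a single point is of course only a sanity check, not a proof of the closure property.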

2.4 Corollary. Every copula ψ(A), A ∈ C_{d+1}, satisfies B ≤ ψ(A) ≤ M, where B denotes the Bertino copula with diagonal t ↦ Π(t, t) = t².
Notice that, although the Bertino copula B with diagonal t → Π(t, t) serves as a lower bound of ψ(C d+1 ), it is not a member of this class as shown in Subsection 3.3.

2.5 Example. According to Example 2.3, the copula …
Due to Example 2.5, the copula ψ(A) fails to be stochastically increasing, left tail decreasing (LTD) and positive quadrant dependent (PQD), in general; see [44] for more information on these dependence properties.
The values of ψ(A) outside the diagonal are bounded from above by the values along the diagonal; the result is immediate from Hölder's inequality:

2.6 Corollary. For every A ∈ C_{d+1} and all (s, t) ∈ I², $\psi(A)(s,t) \leq \sqrt{\delta_{\psi(A)}(s) \, \delta_{\psi(A)}(t)}$.

The copula ψ(A) satisfies the so-called information gain inequality (see, e.g., [25]) along the diagonal, i.e. the more conditioning variables are involved, the larger the value of ψ(A) along the diagonal.

2.7 Theorem. Consider some A ∈ C_{d+1} and some non-empty subset L ⊆ {1, . . . , d}. Then
$$\delta_{\psi(A_{L \cup \{d+1\}})}(t) \leq \delta_{\psi(A)}(t) \qquad (3)$$
holds for all t ∈ I. In particular, the inequality $\delta_{\psi(A_{\{1,\dots,l\} \cup \{d+1\}})}(t) \leq \delta_{\psi(A)}(t)$ holds for all l ∈ {1, . . . , d} and all t ∈ I.
In general, the information gain inequality does not hold outside the diagonal.

2.8 Example. Consider the copula A := … Then A_{13} = Π, hence ψ(A_{13}) = Π, and, according to Example 2.3, … for all (s, t) ∈ I². In accordance with Theorem 2.7 and Corollary 2.4, we obtain δ_Π = δ_{ψ(A_{13})} ≤ δ_{ψ(A)}.

Looking at Ineq. (3) from the perspective of a possible dimension reduction, the question arises under which conditions on A the information gain inequality (after a certain step) becomes an equality, i.e., no information is added by considering additional explanatory variables. The next result answers this question; Theorem 2.9 is immediate from Lemma 6.1 in Section 6.

2.10 Remark.
1. In view of the hierarchical feature selection performed in Subsection 5.1, Theorem 2.9 is of particular interest, as it makes it possible to derive additional information about the underlying conditional (in)dependence structure of the involved random variables when no improvement in predictability or explainability is seen after a certain step.
2. Notice that, if Y and the random vector (X 2 , . . . , X d ) are conditionally independent given X 1 , the information gain inequality (3) becomes an equality. This assumption is even weaker than the well-known so-called conditional independence assumption (see, e.g., [3,25]), requiring that the random variables X 2 , . . . , X d , Y are conditionally independent given X 1 .
The next result shows that the copula ψ(A) is invariant under a variety of measurable and bijective transformations of the first d coordinates of A; this includes permutations and reflections of copulas:

2.11 Corollary.
For A ∈ C d+1 , consider the identity map id : I → I and some measurable bijective transformation ζ :

Remark.
In terms of a random vector (X, Y) with continuous marginals and connecting copula A, Corollary 2.11 implies that the transformation ψ is invariant
1. with respect to permutations of X, and
2. with respect to coordinatewise continuous and strictly increasing (or decreasing) transformations of X.
Note that ψ is also invariant with respect to the linkage transformation of (X, Y) (considered, e.g., in [39,25]), which transforms the random vector (X, Y) to a random vector (U, V) with uniform univariate marginals such that U_1, . . . , U_d are independent.

Remark.
If A is absolutely continuous with Lebesgue density a, then ψ(A) is absolutely continuous as well. For an illustration, consider the checkerboard approximation of a copula A ∈ C_{d+1}: for N ∈ N, the checkerboard approximation CB_N(A) of A with resolution N is the copula spreading the mass that A assigns to each of the N^{d+1} cells of the uniform grid on I^{d+1} uniformly within that cell. Then ψ maps every ((d + 1)-dimensional) checkerboard copula to a bivariate checkerboard copula with the same resolution.
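For d = 1, the action of ψ on a checkerboard copula reduces to simple matrix arithmetic: if M is the N × N matrix of cell masses (each row and column summing to 1/N), then ψ(A) has cell masses S_{jk} = N Σ_i M_{ij} M_{ik}, a discretized version of the Markov product Aᵗ * A. A minimal sketch (function name ours):

```python
def psi_checkerboard(M):
    """Mass matrix of psi(A) for a bivariate (d = 1) checkerboard copula with
    N x N cell masses M (every row and column of M sums to 1/N):
    S[j][k] = N * sum_i M[i][j] * M[i][k], a discretized A^t * A."""
    N = len(M)
    return [[N * sum(M[i][j] * M[i][k] for i in range(N)) for k in range(N)]
            for j in range(N)]

N = 4
como = [[1.0 / N if i == j else 0.0 for j in range(N)] for i in range(N)]  # discretized M
indep = [[1.0 / N ** 2] * N for _ in range(N)]                             # discretized Pi

# both are fixed points of psi, mirroring Theorem 2.2: complete dependence
# and independence are preserved by the transformation
assert psi_checkerboard(como) == como
assert psi_checkerboard(indep) == indep

# the output is always exchangeable (symmetric in its two arguments)
S = psi_checkerboard([[0.4, 0.1], [0.1, 0.4]])
assert S[0][1] == S[1][0]
```

Note that the output matrix again has all row and column sums equal to 1/N, so ψ indeed maps checkerboard copulas to checkerboard copulas of the same resolution.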
In the context of Markov products, it has been recognized in [16, Theorem 5.2.10] that, for d = 1, the map A → A t * A fails to be continuous with respect to the topology of uniform convergence and hence uniform convergence of a sequence of copulas (A n ) n∈N to A does not automatically imply uniform convergence of the sequence (ψ(A n )) n∈N to ψ(A) (see also [50,Theorem 6]).

Applications
In the present section, three applications of the dimension reduction principle induced by ψ are discussed by applying the bivariate copula ψ(A) to existing measures of bivariate dependence. Evaluating ψ(A) via
1. Spearman's footrule leads to the so-called 'simple measure of conditional dependence' recently introduced by Azadkia and Chatterjee [2] (Subsection 3.1).
2. Spearman's rho leads to a distribution-free coefficient of determination providing a benchmark for the proportion of variance that can be explained by a copula-based (regression) model (Subsection 3.2).
3. Gini's gamma leads to a measure of 'indifference' being able to detect whether the dependence structures of (X, Y) and (X, −Y) coincide, a kind of 'lack of association' property (Subsection 3.3).
For each of these resulting measures, we investigate their ability to identify certain facets of dependence between Y and X, show their invariance with respect to a variety of transformations of the random variables X_1, . . . , X_d, and verify the information gain (in)equality.
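All three functionals are easy to evaluate numerically. The following sketch (function names ours) uses the standard copula forms φ(C) = 6∫₀¹ C(t,t) dt − 2, ρ_S(C) = 12∫_{I²} C dλ² − 3 and γ(C) = 4∫₀¹ (C(t,t) + C(t,1−t)) dt − 2 (see, e.g., [44]) and checks them against the known values at M, Π and W.

```python
def footrule(C, m=2000):
    """phi(C) = 6 * int_0^1 C(t, t) dt - 2, via the midpoint rule."""
    return 6.0 * sum(C((i + 0.5) / m, (i + 0.5) / m) for i in range(m)) / m - 2.0

def spearman_rho(C, m=400):
    """rho_S(C) = 12 * int_{I^2} C(s, t) ds dt - 3."""
    g = [(i + 0.5) / m for i in range(m)]
    return 12.0 * sum(C(s, t) for s in g for t in g) / m ** 2 - 3.0

def gini_gamma(C, m=2000):
    """gamma(C) = 4 * int_0^1 (C(t, t) + C(t, 1 - t)) dt - 2."""
    ts = [(i + 0.5) / m for i in range(m)]
    return 4.0 * sum(C(t, t) + C(t, 1.0 - t) for t in ts) / m - 2.0

M_cop = lambda s, t: min(s, t)              # comonotonicity copula M
Pi = lambda s, t: s * t                     # independence copula Pi
W_cop = lambda s, t: max(s + t - 1.0, 0.0)  # countermonotonicity copula W

# (phi, rho_S, gamma) equal (1, 1, 1) at M, (0, 0, 0) at Pi and
# (-1/2, -1, -1) at W
for C, vals in [(M_cop, (1.0, 1.0, 1.0)), (Pi, (0.0, 0.0, 0.0)),
                (W_cop, (-0.5, -1.0, -1.0))]:
    for f, v in zip((footrule, spearman_rho, gini_gamma), vals):
        assert abs(f(C) - v) < 1e-2
```

Applied to ψ(A) instead of M, Π or W, these three routines return (numerical approximations of) T, R² and Q, respectively.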

A simple measure of conditional dependence
Considering random vectors (X, Y) with continuous marginals, the 'simple measure of conditional dependence' T defined in (1) possesses an alternative representation in terms of the well-known dependence measure Spearman's footrule:

3.1 Theorem. Consider a (d + 1)-dimensional random vector (X, Y) with continuous marginals and connecting copula A. Then T fulfills $T(Y|\mathbf{X}) = \phi(\psi(A))$, where φ : C_2 → R denotes Spearman's footrule given by $\phi(C) = 6 \int_I C(t,t) \, d\lambda(t) - 2$. Thus, adopting the notation used in (2), T equals Spearman's footrule of the copula of (Y, Y′), with Y and Y′ sharing the same conditional distribution and being conditionally independent given X.
The following properties of T can be derived from Theorem 3.1 and the results in Section 2 (Theorem 2.2, Corollary 2.11, Theorem 2.7, and Theorem 2.9): 3.2 Corollary. Consider a (d+1)-dimensional random vector (X, Y ) with continuous marginals and connecting copula A. Then the following properties hold:

T(Y|X) is invariant with respect to continuous and strictly monotone transformations of X_1, . . . , X_d.
5. T fulfills the information gain inequality, i.e. $T(Y|(X_1, \dots, X_l)) \leq T(Y|(X_1, \dots, X_{l+1}))$ for all l ∈ {1, . . . , d − 1}.
In Subsection 5.1 we apply the 'simple measure of conditional dependence' T to perform a hierarchical feature selection for analyzing the influence of a set of thermal variables on annual precipitation in a global climate data set.

A nonparametric coefficient of determination
Let (X, Y) be a (d + 1)-dimensional random vector such that Y ∈ L². It is well known that the variance var(Y) of Y can be decomposed via
$$\mathrm{var}(Y) = \mathrm{var}\big(\mathbb{E}(Y \mid \mathbf{X})\big) + \mathbb{E}\big(\mathrm{var}(Y \mid \mathbf{X})\big),$$
where var(E(Y | X)) equals the part of the variance of Y explained by the regression function r(x) := E(Y | X = x), and
$$R^2(Y|\mathbf{X}) := \frac{\mathrm{var}\big(\mathbb{E}(Y \mid \mathbf{X})\big)}{\mathrm{var}(Y)}$$
(also known as Sobol index; see [22]) denotes the proportion of the variance that is explained by the regression function r.
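This decomposition can be illustrated numerically. The following sketch uses a toy linear model of our own choosing (not from the paper): for Y = X + σε with X, ε independent standard normal, r(x) = x and hence R²(Y|X) = var(X)/var(Y) = 1/(1 + σ²).

```python
import random
import statistics

# Toy model (for illustration only): Y = X + sigma * eps with X, eps
# independent standard normal, so r(x) = E(Y | X = x) = x and
# R^2(Y|X) = var(r(X)) / var(Y) = 1 / (1 + sigma^2)
rng = random.Random(7)
sigma, n = 0.5, 100_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x + sigma * rng.gauss(0.0, 1.0) for x in xs]

explained = statistics.variance(xs)  # var(r(X)), since r(x) = x in this model
r2 = explained / statistics.variance(ys)

# sample proportion of explained variance is close to 1 / 1.25 = 0.8
assert abs(r2 - 1.0 / (1.0 + sigma ** 2)) < 0.02
```

In practice r is unknown and must be estimated, which is exactly where the distribution-free benchmark developed below becomes useful.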

3.4 Theorem. Consider a (d + 1)-dimensional random vector (X, Y) with Y ∈ L². Then $R^2(Y|\mathbf{X}) = \rho_P(Y, Y')$,
where ρ_P denotes the Pearson correlation coefficient, and Y and Y′ are such that they share the same conditional distribution and are conditionally independent given X.
Again, assume that the random vector (X, Y ) has continuous marginals F i of X i , i ∈ {1, . . . , d}, and G of Y and connecting copula A. Then (denoting U i := F i (X i ), i ∈ {1, . . . , d}, and V := G(Y )) the distribution-free R 2 (V |U) possesses a representation in terms of the well-known dependence measure Spearman's rho; the result is immediate from Theorem 3.4:

3.5 Corollary.
Consider a (d + 1)-dimensional random vector (X, Y) with continuous marginals and connecting copula A. Then R²(V|U) fulfills $R^2(V|\mathbf{U}) = \rho_S(\psi(A))$, where ρ_S : C_2 → R denotes Spearman's rank correlation coefficient (a.k.a. Spearman's rho) given by $\rho_S(C) = 12 \int_{I^2} C(s,t) \, d\lambda^2(s,t) - 3$.

For the bivariate case, i.e. d = 1, a copula-based representation of R²(V|U) can already be found in Sungur [52]. Shih and Emura [50, Theorem 1] have further recognized that the copula correlation R²(V|U) can be expressed as Spearman's rho of the Markov product Aᵗ * A, which in this case coincides with ψ(A). Again, adopting the notation used in (2), R² equals Spearman's rho of the copula of (Y, Y′), with Y and Y′ sharing the same conditional distribution and being conditionally independent given X.
The following properties of R 2 (V |U) can be derived from Corollary 3.5 and the results in Section 2 (Theorem 2.2, Corollary 2.11, and Theorem 2.9). The information gain inequality (i.e. reducing the number of conditioning variables reduces R 2 ) does not follow from Theorem 2.7 but from Hilbert's projection theorem.
For d = 1, properties (1) and (2) in Corollary 3.6 are given in [50, Proposition 2]. Notice that R²(Y|X) and its distribution-free version R²(V|U) may differ: consider a random vector (U, V) with uniform marginals such that E(V|U = u) = 1/2 for λ-almost all u ∈ I; by Corollary 3.6 we then obtain R²(V|U) = 0. Now, consider the random vector (X, Y) := (U, V²). Since the map v ↦ v² is continuous and strictly increasing on I, the random vectors (X, Y) and (U, V) share the same copula and hence (F(X), G(Y)) ∼ (U, V), so that again R²(V|U) = 0; nevertheless, E(Y|X) = E(V²|U) may well depend on U, in which case R²(Y|X) > 0. For the copula families discussed in Example 2.3, R² can be calculated explicitly.
The distribution-free coefficient of determination provides a benchmark for the proportion of variance that can be explained by a copula-based (regression) model. In Subsection 5.2 we illustrate how this information can be used to judge the appropriateness of a selected copula family in the semiparametric copula-based regression model introduced by Noh et al. [46].

A measure of 'indifference' or 'reflection invariance'
We now evaluate ψ(A) via Gini's gamma, which leads to a measure of 'indifference' able to detect whether the dependence structures of (X, Y) and (X, −Y) coincide, a kind of 'lack of association' property that is fulfilled in case X and Y are independent. Notice that the copulas of (X, Y) and (X, −Y) coincide if and only if A(u, v) = A(u, 1) − A(u, 1 − v) for all (u, v) ∈ I^{d+1} (see, e.g., [15]), which translates into a corresponding reflection invariance of ψ(A). We define the map Q by letting $Q(Y|\mathbf{X}) := \gamma(\psi(A))$, where γ : C_2 → R denotes Gini's gamma given by $\gamma(C) = 4 \int_I \big( C(t,t) + C(t, 1-t) \big) \, d\lambda(t) - 2$. The following properties of Q(Y|X) can be derived from Eq. (5) and the results in Section 2 (Theorem 2.2, Corollary 2.11).
3. Q(Y|X) is invariant with respect to permutations of X_1, . . . , X_d.

Q(Y|X) is invariant with respect to continuous and strictly monotone transformations of X_1, . . . , X_d.
For the copula families discussed in Example 2.3, Q can be calculated explicitly.
Notice that the Bertino copula B with diagonal t → Π(t, t) discussed in Corollary 2.4 fulfills γ(B) = − 1 3 < 0 implying B / ∈ ψ(C d+1 ). In Subsection 5.3, we apply the introduced measure of 'indifference' to detect a reflection invariant dependence structure in the fuel spray data set discussed in [9].

Estimation
We propose an estimator for ψ(A) whose form is reminiscent of the empirical copula, but which is actually based on the graph-based estimation procedure developed by Azadkia and Chatterjee [2]. We show that this copula estimator is strongly consistent, from which strong consistency of the plug-in estimators of Spearman's footrule, Spearman's rho and Gini's gamma can be derived. To this end, we consider a (d + 1)-dimensional random vector (X, Y) with continuous univariate marginal distribution functions F_i of X_i, i ∈ {1, . . . , d}, and G of Y, and connecting copula A. Further, let (X_1, Y_1), . . . , (X_n, Y_n) be i.i.d. copies of (X, Y). Since the univariate marginals are continuous, ties only occur with probability 0. For each i, we denote by N(i) the index j such that X_j is the nearest neighbour of X_i with respect to the Euclidean metric on R^d; if there exist several nearest neighbours of X_i, ties are broken at random. For (s, t) ∈ I², we define the estimator D_n(s, t) given in (6), where G*_n denotes a renormalized version of the empirical distribution function of Y_1, . . . , Y_n, i.e. $G_n^*(y) = \frac{1}{n+1} \sum_{k=1}^{n} \mathbf{1}_{(-\infty, y]}(Y_k)$, and note that $G_n^*(Y_i) = \frac{R_i}{n+1}$ with R_i denoting the rank of Y_i among Y_1, . . . , Y_n. By adapting the ideas developed in [2], we prove consistency of our estimator (6); the proof of the main Theorem 4.1 can be found in Subsection 6.3.
In particular, in the bivariate case, i.e. when d = 1, the estimator D n is a strongly consistent estimator for the Markov product A t * A.

Simulation study
We illustrate the small and moderate sample performance of our estimator D_n for two dependence structures discussed in Example 2.3: (1) the equicorrelated Gaussian copula for d = 1 with correlation parameter ρ = 0.6, and (2) the Marshall-Olkin copula with parameter vector (α, β) = (1, 0.4). We mainly restrict ourselves to the case d = 1 in order to be able to interpret the results obtained also as estimators for the Markov product Aᵗ * A.
If X and Y have the Gaussian copula A with parameter ρ = 0.6 as connecting copula, then, by Example 2.3, the copula ψ(A) is Gaussian as well with parameter 0.36. To test the performance of our estimator D_n in this setting, we generated samples of size n ∈ {20, 50, 100, 200, 500, 1,000, 5,000, 10,000} and calculated D_n. These steps were repeated R = 1,000 times. Figure 5 depicts the d_∞-distance between our estimate and the true copula evaluated on a grid of size 50.
If X and Y have the Marshall-Olkin copula with parameter vector (α, β) = (1, 0.4) as connecting copula, we proceeded analogously: we generated samples of the same sizes, calculated D_n, and repeated these steps R = 1,000 times. Figure 6 depicts the d_∞-distance between our estimate and the true copula evaluated on a grid of size 50.
As can be seen from Figures 5 and 6, the copula estimate converges rather fast to the true copula.

Applications
Finally, we plug our consistent estimator (6) into the functionals Spearman's footrule, Spearman's rho and Gini's gamma (discussed in Section 3), which then leads to consistent estimators for the maps T , R 2 and Q.

A simple measure of conditional dependence
As estimator for T, Azadkia & Chatterjee [2] propose to use the statistic
$$T_n = T_n(Y|\mathbf{X}) := \frac{\sum_{i=1}^{n} \big( n \min\{R_i, R_{N(i)}\} - L_i^2 \big)}{\sum_{i=1}^{n} L_i (n - L_i)},$$
where R_i denotes the rank of Y_i among Y_1, . . . , Y_n, i.e. the number of j such that Y_j ≤ Y_i, and L_i denotes the number of j such that Y_j ≥ Y_i (see also [26]). In view of Theorem 3.1, it is worth noting that T_n equals the plug-in estimator φ(D_n) of Spearman's footrule (see [21,23,44]), with D_n being the consistent estimator for ψ(A) given in (6). Notice that there may exist more than one index i such that X_j is a nearest neighbour of X_i, implying that $\sum_{i=1}^{n} R_{N(i)}$ may fail to equal n(n + 1)/2. It has been proven in [2, Theorem 2.2] that T_n(Y|X) is a strongly consistent estimator for T(Y|X). In [49] the authors showed asymptotic normality of $\sqrt{n}\,T_n$ under independence and some regularity conditions; for a comprehensive summary of properties of T_n we refer to Han [26] and the references therein.
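For d = 1, the statistic of Azadkia and Chatterjee [2] can be sketched in a few lines. The following naive O(n²) implementation is our own illustration (nearest neighbours determined by |x_j − x_i|, continuous data so rank ties are ignored), not the authors' code:

```python
import random

def t_n(xs, ys):
    """Naive O(n^2) sketch of the Azadkia-Chatterjee statistic for d = 1:
    T_n = sum_i (n * min(R_i, R_{N(i)}) - L_i^2) / sum_i L_i * (n - L_i),
    with R_i = #{j : Y_j <= Y_i}, L_i = #{j : Y_j >= Y_i} and N(i) the
    index of the nearest neighbour of X_i."""
    n = len(xs)
    R = [sum(1 for yj in ys if yj <= yi) for yi in ys]
    L = [sum(1 for yj in ys if yj >= yi) for yi in ys]
    nn = [min((j for j in range(n) if j != i), key=lambda j: abs(xs[j] - xs[i]))
          for i in range(n)]
    num = sum(n * min(R[i], R[nn[i]]) - L[i] ** 2 for i in range(n))
    den = sum(L[i] * (n - L[i]) for i in range(n))
    return num / den

rng = random.Random(42)
x = [rng.random() for _ in range(500)]
assert t_n(x, x) > 0.9                                # complete dependence: near 1
assert abs(t_n(x, [rng.random() for _ in x])) < 0.15  # independence: near 0
```

Production implementations use k-d trees for the nearest-neighbour search, bringing the cost down to O(n log n).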

A nonparametric coefficient of determination
Motivated by Corollary 3.5, as estimator for R²(V|U) we propose to use the statistic $R_n^2(V|\mathbf{U}) := \rho_S(D_n)$, i.e. the plug-in estimator of Spearman's rho (see [44]) with D_n being the consistent estimator for ψ(A) given in (6). By Theorem 4.1, the estimator R²_n(V|U) is a strongly consistent estimator for R²(V|U). For the bivariate case, i.e. d = 1, Gamboa et al. [22] introduced an estimator for R²(Y|X) based on Pearson's correlation coefficient using the technique developed by Chatterjee [7].

A measure of "indifference" or "reflection invariance"
Motivated by (5), as estimator for Q(Y|X) we propose to use the statistic $Q_n(Y|\mathbf{X}) := \gamma(D_n)$, i.e. the plug-in estimator of Gini's gamma (see [44]) with D_n being the consistent estimator for ψ(A) given in (6). By Theorem 4.1, the estimator Q_n is a strongly consistent estimator for Q.

Real data example
Finally, we illustrate the potential and importance of the functionals discussed in Section 3 by analyzing several real data examples. In Subsection 5.1, by applying the coefficient T we perform a feature selection for a data set of bioclimatic variables and at the same time determine what proportion of the variance the selected variables are capable of explaining. In Subsection 5.2, we then point out how the distribution-free coefficient of determination can be used for judging the appropriateness of a selected copula-based (regression) model, and to which extent it can assist in choosing the best copula family out of a set of candidate families. We conclude this section by demonstrating how the measure of 'indifference' can help identify reflection invariant dependence structures in a fuel spray data set (Subsection 5.3).

Analysis of global climate data: Feature selection
We consider a data set of bioclimatic variables for n = 1862 locations homogeneously distributed over the global landmass from CHELSEA ([34,35]) and want to analyze the influence of a set of thermal variables on Annual Precipitation (AP). For this purpose, by applying the coefficient T we perform a hierarchical feature selection and identify those variables that best predict AP (= variable Y). Figure 7 depicts the order of the hierarchically selected variables based on the estimated value for T. There, the value in line k indicates the estimated value for T(Y|(X_1, . . . , X_k)), where X_1, . . . , X_k are the variables in lines 1 to k. As an additional feature, the last column in Figure 7 contains the estimated values for the distribution-free coefficient of determination R² in this model and hence provides a benchmark for the proportion of variance that can be explained by a copula-based model. The increase in the estimated values for T and R² with increasing number of explanatory variables is in accordance with the information gain inequalities discussed in Corollary 3.2 and Corollary 3.6.

Figure 7: Results of the hierarchical feature selection based on the coefficient T to identify those variables that best predict AP; the last column contains the estimated value for R² in this model.

Figure 7 indicates that it is sufficient to use only a selected number of variables (here 3 or 4) to build a model, as there is no improvement in the explained variance above this threshold. It is also interesting to see in Figure 8 that beyond a certain number of variables involved, not only do T and R² remain almost constant, but also the underlying dependence structure estimated for measuring T and R² no longer changes.

Copula-based regression: Explained variance
We now illustrate how the distribution-free coefficient of determination discussed in Subsection 3.2 can be used for judging the appropriateness of a selected copula-based regression model. To this end, let us consider the semiparametric copula-based regression estimator introduced by Noh et al. [46]. For a random vector (X, Y) with continuous univariate marginals and connecting copula A, the authors showed that the mean regression function r(x) = E(Y | X = x) can be written in terms of the density a of the copula A (also see [13]). Concerning the estimation of r, Noh et al. [46] suggest a semiparametric approach in which the marginal distribution functions are estimated nonparametrically and the copula is estimated parametrically from a given copula family. The authors showed that this regression estimator is asymptotically normal if the parametric copula family has been selected correctly. However, as pointed out by Dette et al. [13], "the quality of the estimate under misspecification of the parametric copula depends heavily on the specific structure of the unknown regression function". Dette et al. [13] underpinned their statement by considering a simple univariate (and nonmonotone) regression model with X_i being uniformly distributed on [0, 1] and ε_i being normally distributed with mean 0 and variance σ² = 0.01, i ∈ {1, . . . , n}. They showed that no copula from standard copula classes "reproduces the structure of the regression function in the resulting estimate", which is due to the fact that "none of the available parametric copula models" for the vector (X, Y) "yields a nonmonotone regression function". Indeed, although the estimated value for the distribution-free coefficient of determination R² equals 0.964, indicating the existence of a suitable copula-based model, the explained variance when estimating from the family of t copulas and the family of Clayton copulas is less than 0.01 (t: 0.0046, Clayton: 0.0001), as illustrated in Figure 9.
In a second step, we illustrate the extent to which the distribution-free coefficient of determination $R^2$ can assist in choosing the best copula family out of a set of candidate families. To this end, let us consider the data set faithful provided in the R package datasets. The data set contains $n = 272$ observations of the waiting times between eruptions (variable waiting) and the duration of the eruption (variable eruptions) for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA. By applying the above-mentioned semiparametric copula-based regression method, we want to estimate a regression function describing the duration of the eruptions in terms of the waiting times between eruptions, choosing from the following three parametric copula families: Gaussian, Clayton, and Joe. Calculating the explained variance of the different copula models (Gaussian: $0.582$; Clayton: $0.633$; Joe: $0.494$) indicates that the Clayton family-based regression model performs better than its two competitors; Figure 10 depicts the regression estimates for the data set faithful when estimating from the Gaussian, Clayton, and Joe families of copulas. However, comparing these values with the estimated value $0.715$ for the distribution-free coefficient of determination, which serves as a benchmark for the proportion of variance that can be explained by a copula-based regression model, it becomes apparent that even the Clayton model may not be the optimal choice for this data set.
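The model-selection step above amounts to comparing each candidate's explained variance against the benchmark $R^2$. A minimal sketch of this comparison (the function names and the interface via precomputed fitted values are illustrative assumptions, not a prescribed API):

```python
import numpy as np

def explained_variance(y, y_hat):
    # proportion of Var(Y) explained by the fitted regression values y_hat
    return 1.0 - np.var(np.asarray(y) - np.asarray(y_hat)) / np.var(y)

def select_family(y, fits, r2_benchmark):
    # fits: mapping family name -> fitted values of a copula-based regression;
    # returns the best family, its explained variance, and the gap to the
    # benchmark R^2 (a large gap hints that no candidate family is adequate)
    ev = {name: explained_variance(y, f) for name, f in fits.items()}
    best = max(ev, key=ev.get)
    return best, ev[best], r2_benchmark - ev[best]
```

For the faithful example, the gap between the Clayton value $0.633$ and the benchmark $0.715$ is exactly what this comparison would flag.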

Detecting reflection invariant dependence structures
We finally demonstrate how the measure of 'indifference' introduced in Subsection 3.3 can be used to detect reflection invariant dependence structures. For this purpose, we consider the fuel spray data set discussed in [9], which describes the behaviour of the fuel spray droplets for a specific jet engine operating condition. According to [9], in "jet engines the fuel is typically injected by so-called prefilming airblast atomizers", and the behaviour of the droplets is modeled using the variables drop size, x-position, y-position, x-velocity, and y-velocity. Interestingly, the joint behaviour of the variables x-velocity and y-position, as depicted in Figure 11 (left panel), exhibits a reflection invariant dependence structure with respect to the variable y-position, i.e., for a given velocity of the droplets along the x-coordinate, the droplets distribute symmetrically along the y-coordinate, a behaviour that is highly desirable for the combustion process. The reflection invariance of the dependence structure $A$ between x-velocity and y-position can easily be detected from the estimated values for $T$ and $Q$: since $Q_n = 0.0021$ and $T_n = 0.0589$, one may conclude that the dependence structure is reflection invariant with respect to y-position, but the two variables are not independent; the latter can also be deduced from Theorem 2.2 and the fact that the estimate for $\psi(A)$ does not coincide with the independence copula (as seen in the right panel of Figure 11).
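The phenomenon at work here — vanishing signed (Spearman-type) association despite clear directed dependence — can be illustrated without reproducing the estimators $T_n$ and $Q_n$ themselves. The sketch below uses Chatterjee's rank coefficient $\xi_n$ (to which Azadkia and Chatterjee's estimator is closely related in the case of a single exogenous variable) together with Spearman's rho, assuming continuous data without ties:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

def xi_n(x, y):
    # Chatterjee's rank correlation (formula for the no-ties case):
    # sort the sample by x, take the ranks r_i of the corresponding
    # y-values, and compute 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1)
    x, y = np.asarray(x), np.asarray(y)
    r = rankdata(y[np.argsort(x)], method="ordinal")
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (len(x) ** 2 - 1.0)
```

For a reflection-symmetric relationship such as $Y = X^2$ with $X$ standard normal, Spearman's rho is close to $0$ while $\xi_n$ is close to $1$ — the same dissociation as between $Q_n$ and $T_n$ in the fuel spray example.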
Proof. Fix $k \in \{1, \dots, d-1\}$. The equivalence of (a) and (b) follows from the identities … for all $(s,t) \in I^2$. This proves (c) and the additional result. Finally, assume that (c) holds. Then we have $\psi(A_{1:k,d+1})(t,t) = \psi(A)(t,t)$ and hence $0 = \dots$ for all $t \in I$. Since, for every $t \in I$ and a.e. $u \in I^k$, … , we finally obtain $0 = \int_{I^d} \cdots \; \mathrm{d}K_A(u,v)$, which proves the result.

Proof (of Theorem 3.4). Since … , this proves the result.
Proof (of Corollary 3.6). It remains to prove property (2), which is immediate from the identity $R^2(V \,|\, U) = 12 \int_{I^2} \psi(A)(s,t)\, \mathrm{d}\lambda^2(s,t) - 3$. This proves the assertion.
Proof (of Theorem 3.9). It remains to prove the identity … , which is immediate from … . This proves the result.

Proofs of Subsection 4
For proving consistency of (6) we use a modification of $D_n$ for which we choose the usual normalization of the ranks. For $(s,t) \in I^2$, define
$$C_n(s,t) := \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}_{[0,s]}\big(G_n(Y_k)\big)\, \mathbf{1}_{[0,t]}\big(G_n(Y_{N(k)})\big) \qquad (8)$$
where $G_n$ denotes the empirical distribution function of $Y_1, \dots, Y_n$, i.e. $G_n(y) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}_{(-\infty, y]}(Y_k)$. Due to Lemma 6.2 below, the estimators $C_n$ and $D_n$ are asymptotically equivalent (see also (9) below).
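A direct implementation of the estimator (8) is straightforward. The sketch below assumes $N(k)$ is the index of the Euclidean nearest neighbour of $X_k$ among the remaining sample points (as in [2]) and that there are no ties; a k-d tree makes the neighbour search efficient:

```python
import numpy as np
from scipy.spatial import cKDTree

def C_n(s, t, X, Y):
    # estimator (8): empirical df of Y evaluated at Y_k and at Y_{N(k)},
    # where N(k) is the nearest neighbour of X_k in the remaining sample
    X = np.atleast_2d(np.asarray(X, dtype=float).T).T  # shape (n, d)
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    _, idx = cKDTree(X).query(X, k=2)  # idx[:, 0] is the point itself
    N = idx[:, 1]
    Ys = np.sort(Y)
    G = lambda y: np.searchsorted(Ys, y, side="right") / n  # empirical df G_n
    return np.mean((G(Y) <= s) & (G(Y[N]) <= t))
```

Under complete dependence ($Y = f(X)$) the values $C_n(s,t)$ approach $\min(s,t)$, whereas under independence they approach $s \cdot t$, in line with the limiting behaviour of $\psi(A)$ described in the paper.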
For a realization $X_1, \dots, X_n$ and each $i \in \{1, \dots, n\}$, let $K_{n,i}$ be the number of indices $j$ such that $X_i$ is the nearest neighbour of $X_j$. The following result is given in [2]:

6.2 Lemma. [2, Lemma 11.4] There exists a constant $c(d)$ such that $K_{n,1} \leq c(d)$.

Notice that the upper bound $c(d)$ used in Lemma 6.2 depends only on the dimension $d$.
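For intuition, the quantities $K_{n,i}$ are easy to compute empirically (the function name below is hypothetical). In dimension $d = 1$ a point can be the nearest neighbour only of its two adjacent order statistics, so $c(1) = 2$, which the sketch illustrates:

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_in_degrees(X):
    # K_{n,i}: number of sample points whose nearest neighbour is X_i
    X = np.atleast_2d(np.asarray(X, dtype=float).T).T  # shape (n, d)
    _, idx = cKDTree(X).query(X, k=2)  # idx[:, 1] holds N(j) for each j
    return np.bincount(idx[:, 1], minlength=len(X))
```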
The following two results are key for proving consistency of (8):

6.3 Lemma. For every $(s,t) \in I^2$ we have $\lim_{n \to \infty} E(C_n(s,t)) = \psi(A)(s,t)$.
The proof of the following lemma is similar to that of [2, Lemma 11.9]:

6.4 Lemma. There are positive constants $M_1$ and $M_2$ depending only on the dimension $d$ such that, for any $n \in \mathbb{N}$ and any $\eta \in (0, \infty)$,
$$P\big(\{|C_n(s,t) - E(C_n(s,t))| \geq \eta\}\big) \leq M_1 \exp(-M_2 n \eta^2)$$
for every $(s,t) \in I^2$.