Marčenko-Pastur law for Kendall's tau

We prove that the empirical spectral distribution of Kendall's rank correlation matrix converges to an affine transformation of the Marčenko-Pastur law, under the assumption that the observations are i.i.d. random vectors X_1, . . . , X_n with components that are independent and absolutely continuous with respect to the Lebesgue measure. This is the first result on the empirical spectral distribution of a multivariate U-statistic.


Introduction
Estimating the association between two random variables X, Y ∈ IR is a central statistical problem. As such, many methods have been proposed, most notably Pearson's correlation coefficient. While this measure of association is well suited to the Gaussian case, it may be inaccurate in other cases. This observation has led statisticians to consider other measures of association, such as Spearman's ρ and Kendall's τ, which can be proved to be more robust to heavy-tailed distributions (see, e.g., [LHY+12]). In a multivariate setting, covariance and correlation matrices are preponderant tools to understand the interaction between variables. They are also used as building blocks for more sophisticated statistical questions such as principal component analysis or graphical models.
The past decade has witnessed an unprecedented and fertile interaction between random matrix theory and high-dimensional statistics (see [PA14] for a recent survey). Indeed, in high-dimensional settings, traditional asymptotics where the sample size tends to infinity fail to capture a delicate interaction between sample size and dimension, and random matrix theory has allowed statisticians and practitioners alike to gain valuable insight on a variety of multivariate problems.
The terminology "Wishart matrices" is often, though sometimes abusively, used to refer to p × p random matrices of the form $X^\top X/n$, where X is an n × p random matrix with independent rows (throughout this paper we restrict our attention to real random matrices). The simplest example arises when X has i.i.d. standard Gaussian entries, but the main characteristics are shared by a much wider class of random matrices. This universality phenomenon manifests itself in various aspects of the limit distribution, and in particular in the limiting behavior of the empirical spectral distribution of the matrix. Let $W = X^\top X/n$ be a p × p Wishart matrix and denote by λ_1, . . . , λ_p its eigenvalues; then the empirical spectral distribution μ_p of W is the distribution on IR defined as the following mixture of Dirac point masses at the λ_j's:
$$\mu_p = \frac{1}{p}\sum_{j=1}^{p} \delta_{\lambda_j}.$$
Assuming that the entries of X are independent, centered, and of unit variance, it can be shown that μ_p converges weakly to the Marčenko-Pastur distribution under weak moment conditions (see [EKYY12] for the weakest condition). While this development alone has led to important statistical advances, it fails to capture more refined notions of correlation, notably more robust ones involving ranks and therefore dependent observations. A first step in this direction was made by [YK86], where the matrix X is assumed to have independent rows with isotropic distribution. More recently, this result was extended in [BZ08, O'R12] and covers for example the case of Spearman's ρ matrix, which is based on ranks and is also a Wishart matrix of the form $X^\top X/n$.
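The convergence described above is easy to observe numerically. The following sketch (an illustration of ours, not taken from the paper; parameter values are arbitrary) builds a Wishart matrix from i.i.d. Gaussian entries and checks that its eigenvalues concentrate on the Marčenko-Pastur support $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 1000                  # aspect ratio gamma = p/n = 0.25
gamma = p / n

X = rng.standard_normal((n, p))    # i.i.d. centered, unit-variance entries
W = X.T @ X / n                    # p x p Wishart matrix
eigs = np.linalg.eigvalsh(W)       # eigenvalues lambda_1 <= ... <= lambda_p

# Marchenko-Pastur support edges for gamma <= 1.
lam_minus = (1 - np.sqrt(gamma)) ** 2   # = 0.25 here
lam_plus = (1 + np.sqrt(gamma)) ** 2    # = 2.25 here

# The empirical spectral distribution mu_p puts mass 1/p on each eigenvalue;
# its mean is Tr(W)/p, close to 1, and the eigenvalues fall near the
# interval [lam_minus, lam_plus].
print(eigs.min(), eigs.max(), eigs.mean())
```

With these dimensions the extreme eigenvalues land within a few percent of the support edges.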
The main contribution of this paper is to derive the limiting spectral distribution of Kendall's τ matrix, a cousin of Spearman's ρ matrix, which is not of the Wishart type but rather a matrix whose entries are U-statistics. Kendall's τ matrix is a very popular surrogate for correlation matrices, but an understanding of the fluctuations of its eigenvalues is still missing. Interestingly, Marčenko-Pastur results have been used as heuristics, without justification, precisely for Kendall's τ in the context of certain financial applications [CCL+15].
As it turns out, the limiting distribution of μ_p is not exactly Marčenko-Pastur, but rather an affine transformation of it. Our main theorem below gives the precise form of this transformation.
Theorem 1. Let X_1, . . . , X_n be n independent random vectors in IR^p whose components X_i(k) are independent random variables that have a density with respect to the Lebesgue measure on IR. Then, as n → ∞ and p/n → γ > 0, the empirical spectral distribution of the Kendall τ matrix converges in probability to the distribution of (2/3)Y + 1/3, where Y is distributed according to the standard Marčenko-Pastur law with parameter γ.

Kendall's Tau
The (univariate) Kendall τ statistic [Ess24, Lin25, Lin29, Ken38] is defined as follows. Let (Y_1, Z_1), . . . , (Y_n, Z_n) be n independent samples of a pair (Y, Z) ∈ IR × IR of real-valued random variables. Then the (empirical) Kendall τ between Y and Z is defined as
$$\tau = \frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} \operatorname{sign}(Y_i - Y_j)\,\operatorname{sign}(Z_i - Z_j).$$
The statistic τ takes values in [−1, 1] and it is not hard to see that it can be expressed as
$$\tau = \frac{\#\{\text{concordant pairs}\} - \#\{\text{discordant pairs}\}}{\binom{n}{2}},$$
where a pair (i, j) is said to be concordant if Y_i − Y_j and Z_i − Z_j have the same sign and discordant otherwise.
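As a quick sanity check (an illustration of ours, not part of the paper; function names are ad hoc), the two expressions for τ can be computed side by side. For continuous data they agree exactly, since ties occur with probability zero.

```python
import numpy as np
from itertools import combinations

def tau_sign_form(y, z):
    """Average of sign(Y_i - Y_j) * sign(Z_i - Z_j) over the C(n, 2) pairs i < j."""
    n = len(y)
    pairs = list(combinations(range(n), 2))
    return sum(np.sign(y[i] - y[j]) * np.sign(z[i] - z[j]) for i, j in pairs) / len(pairs)

def tau_concordance_form(y, z):
    """(#concordant pairs - #discordant pairs) / C(n, 2)."""
    n = len(y)
    conc = disc = 0
    for i, j in combinations(range(n), 2):
        prod = (y[i] - y[j]) * (z[i] - z[j])
        conc += prod > 0   # same sign: concordant
        disc += prod < 0   # opposite sign: discordant
    return (conc - disc) / (n * (n - 1) // 2)

rng = np.random.default_rng(1)
y, z = rng.standard_normal(200), rng.standard_normal(200)
t1, t2 = tau_sign_form(y, z), tau_concordance_form(y, z)
print(t1, t2)   # the two forms coincide, and both lie in [-1, 1]
```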
It is known that the Kendall τ statistic is asymptotically Gaussian (see, e.g., [Ken38]). Specifically, if Y and Z are independent, then as n → ∞,
$$\sqrt{n}\,\tau \xrightarrow{d} \mathcal{N}\!\left(0, \tfrac{4}{9}\right). \qquad (1)$$
This property has been central to the construction of independence tests between two random variables (see, e.g., [KG90]). Kendall's τ statistic can be extended to the multivariate case. Let X_1, . . . , X_n be n independent copies of a random vector X ∈ IR^p with independent coordinates X(1), . . . , X(p). The (empirical) Kendall τ matrix of X is defined to be the p × p matrix whose entries τ_kl are given by
$$\tau_{kl} = \frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} \operatorname{sign}\big(X_i(k) - X_j(k)\big)\,\operatorname{sign}\big(X_i(l) - X_j(l)\big). \qquad (2)$$
Note that τ can be written as the average of $\binom{n}{2}$ rank-one random matrices:
$$\tau = \frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} \operatorname{sign}(X_i - X_j)\,\operatorname{sign}(X_i - X_j)^\top, \qquad (3)$$
where the sign function is taken entrywise.
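The rank-one representation of the Kendall τ matrix translates directly into code. The sketch below (our own illustration, not the paper's; the function name is ad hoc) builds τ as an average of entrywise-sign outer products and verifies that it is symmetric, has unit diagonal, and is positive semidefinite, being an average of positive semidefinite rank-one matrices.

```python
import numpy as np
from itertools import combinations

def kendall_tau_matrix(X):
    """Kendall tau matrix: average over pairs i < j of the rank-one matrices
    sign(X_i - X_j) sign(X_i - X_j)^T, with the sign taken entrywise."""
    n, p = X.shape
    tau = np.zeros((p, p))
    for i, j in combinations(range(n), 2):
        s = np.sign(X[i] - X[j])     # entrywise signs, a vector in {-1, +1}^p
        tau += np.outer(s, s)        # rank-one contribution
    return tau / (n * (n - 1) / 2)

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))    # continuous coordinates: no ties a.s.
tau = kendall_tau_matrix(X)

# tau is symmetric with unit diagonal, and positive semidefinite.
print(np.diag(tau), np.linalg.eigvalsh(tau).min() >= -1e-12)
```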
It is easy to see that τ_ii = 1 for all i, so that Tr(τ)/p = 1 and the empirical spectral distribution of τ has mean 1. Together with (1), which shows that the off-diagonal entries are centered with variance of order 4/(9n), this suggests that if the empirical spectral distribution of τ converges to a Marčenko-Pastur-type distribution, it should be one with mean 1, as is the case for the standard Marčenko-Pastur distribution. This heuristic argument supports the affine transformation arising in Theorem 1: indeed, (2/3)Y + 1/3 has mean 1 whenever Y has mean 1. However, the matrix τ is not Wishart, and the Marčenko-Pastur limit does not follow from standard arguments. Nevertheless, Kendall's τ is a U-statistic, and U-statistics are known to satisfy the weakest form of universality, namely a central limit theorem under general conditions [Hoe48, dlPG99]. In this paper, we show that in the case of the Kendall τ matrix, this universality phenomenon extends to the empirical spectral distribution.

Proof of Theorem 1
For any pair (i, j) such that 1 ≤ i < j ≤ n, let A^{(i,j)} be the rank-one matrix defined by
$$A^{(i,j)} = \operatorname{sign}(X_i - X_j)\,\operatorname{sign}(X_i - X_j)^\top,$$
and recall from (3) that
$$\tau = \frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} A^{(i,j)}.$$
Akin to most asymptotic results on U-statistics, we utilize a variant of Hoeffding's (a.k.a. Efron-Stein, a.k.a. ANOVA) decomposition [Hoe48]:
$$A^{(i,j)} = \bar{A}^{(\cdot,j)} + \bar{A}^{(i,\cdot)} + \bar{A}^{(i,j)} + I_p, \qquad (4)$$
where
$$\bar{A}^{(i,\cdot)} = \mathbb{E}\big[A^{(i,j)} \,\big|\, X_i\big] - I_p, \qquad \bar{A}^{(\cdot,j)} = \mathbb{E}\big[A^{(i,j)} \,\big|\, X_j\big] - I_p,$$
and $\bar{A}^{(i,j)} = A^{(i,j)} - \bar{A}^{(\cdot,j)} - \bar{A}^{(i,\cdot)} - I_p$. It is easy to check that each of the first three matrices on the right-hand side of (4) is centered and that the four matrices are orthogonal to each other with respect to the inner product $\mathbb{E}\,\mathrm{Tr}(A^\top B)$ as long as i ≠ j.
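The orthogonality of the decomposition terms can be checked by simulation. The sketch below is our own illustration under an extra assumption: the coordinates are taken uniform on [0, 1], so that the marginal cdf is the identity and the conditional term Ā^{(i,·)} has the explicit form u u^⊤ − diag(u u^⊤) with u = 2x − 1. A Monte Carlo estimate of E Tr(Ā^{(i,·)} Ā^{(·,j)}) for i ≠ j then comes out near zero.

```python
import numpy as np

rng = np.random.default_rng(3)
p, trials = 4, 100_000

def a_bar(x):
    """Abar^{(i,.)} = E[A^{(i,j)} | X_i] - I_p for coordinates uniform on [0, 1]:
    off-diagonal entries u(k) u(l) with u = 2x - 1, and zero diagonal."""
    u = 2.0 * x - 1.0
    return np.outer(u, u) - np.diag(u * u)

# Monte Carlo estimate of E Tr( Abar^{(i,.)} Abar^{(.,j)} ) for i != j,
# using that Abar^{(.,j)} = Abar^{(j,.)} (symmetry of the sign product).
acc = 0.0
for _ in range(trials):
    xi, xj = rng.random(p), rng.random(p)
    acc += np.trace(a_bar(xi) @ a_bar(xj))
print(acc / trials)   # close to 0: the decomposition terms are orthogonal
```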
Proof. For any i ∈ [n], observe that since the components of X have a density, ties occur with probability zero:
$$\mathbb{P}\big(X_i(k) = X_j(k)\big) = 0, \qquad k \in [p],\ i \neq j.$$
Therefore, since the coordinates of X are independent, we have, for k ≠ l,
$$\mathbb{E}\big[\operatorname{sign}(X_i(k) - X_j(k))\,\operatorname{sign}(X_i(l) - X_j(l)) \,\big|\, X_i\big] = U_i(k)\,U_i(l),$$
where $U_i(k) := \mathbb{E}\big[\operatorname{sign}(X_i(k) - X_j(k)) \,\big|\, X_i\big] = 2F_k(X_i(k)) - 1$ and F_k denotes the cumulative distribution function of X(k). By symmetry, we have $\bar{A}^{(\cdot,j)} = \bar{A}^{(j,\cdot)}$. Thus
$$\frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} \big(\bar{A}^{(\cdot,j)} + \bar{A}^{(i,\cdot)}\big) = \frac{2}{n}\sum_{i=1}^{n} \bar{A}^{(i,\cdot)} = \frac{2}{n}\sum_{i=1}^{n} \Big(U_i U_i^\top - \operatorname{diag}\big(U_i U_i^\top\big)\Big).$$
Together with Lemma 2, this yields
$$\tau = I_p + \frac{2}{n}\sum_{i=1}^{n} \Big(U_i U_i^\top - \operatorname{diag}\big(U_i U_i^\top\big)\Big) + \frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} \bar{A}^{(i,j)}. \qquad (6)$$
Next, note that the coordinates of each U_i, i = 1, . . . , n, are mutually independent, so that
$$\mathbb{E}\big[U_i(k)^2\big] = \mathbb{E}\big[(2T-1)^2\big] = \frac{1}{3},$$
where T ∼ Unif([0, 1]). Thus Theorem 4 implies that, as n → ∞ and p/n → γ > 0, the empirical spectral distribution of $\frac{2}{n}\sum_{i=1}^{n} U_i U_i^\top$ converges in probability to the distribution of (2/3)Y, where Y is distributed according to the standard Marčenko-Pastur law with parameter γ. Moreover,
$$\frac{2}{n}\sum_{i=1}^{n} U_i(k)^2 \longrightarrow \frac{2}{3}$$
in probability as n → ∞ and p/n → γ > 0 by Chebyshev's inequality. Together with (6), it yields the following result.
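The moment computation behind the 2/3 factor is elementary and easy to confirm numerically (our own illustration): if T ∼ Unif([0, 1]) then U = 2T − 1 is uniform on [−1, 1] with E[U²] = 1/3, so the diagonal correction (2/n) Σ_i U_i(k)² concentrates around 2/3.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

t = rng.random(n)                 # T ~ Unif([0, 1])
u = 2.0 * t - 1.0                 # U = 2T - 1 ~ Unif([-1, 1])

second_moment = np.mean(u ** 2)          # law of large numbers: -> E[U^2] = 1/3
diag_correction = 2.0 * second_moment    # (2/n) * sum_i U_i(k)^2 -> 2/3

print(second_moment, diag_correction)
```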

Let μ^τ_p denote the empirical spectral distribution of τ and let μ_p denote the empirical spectral distribution of the main term in the decomposition (6). We are going to show that the Lévy distance between μ^τ_p and μ_p converges to zero, which implies Theorem 1 by Proposition 3. To that end, observe that by (4) and Lemma 5, convergence of the Lévy distance follows if we show that
$$\frac{1}{\binom{n}{2}} \sum_{1\le i<j\le n} \bar{A}^{(i,j)} \longrightarrow 0. \qquad (8)$$
To show that (8) holds, observe that it can be readily checked from its definition that the collection of matrices $\big(\bar{A}^{(i,j)}\big)_{1\le i<j\le n}$ satisfies
$$\mathbb{E}\,\mathrm{Tr}\big(\bar{A}^{(i,j)} \bar{A}^{(i',j')}\big) = \begin{cases} \mathbb{E}\,\mathrm{Tr}\big((\bar{A}^{(i,j)})^2\big) & \text{if } (i,j) = (i',j'), \\ 0 & \text{otherwise.} \end{cases}$$
It follows that, as n → ∞ and p/n → γ > 0, the empirical spectral distribution of τ converges in probability to the distribution of (2/3)Y + 1/3, where Y is distributed according to the standard Marčenko-Pastur law with parameter γ (see Theorem 4 for the appropriate definition).

Figure 1 illustrates numerically the result of Theorem 1.
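A simulation in the spirit of Figure 1 (our own sketch; the parameter values are arbitrary) generates data with independent continuous coordinates, forms the Kendall τ matrix, and compares the range of its eigenvalues with the support of (2/3)Y + 1/3 predicted by Theorem 1.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 600, 150                   # gamma = p/n = 0.25
gamma = p / n

X = rng.standard_normal((n, p))   # independent, absolutely continuous coordinates

# Kendall tau matrix, vectorized over the second pair index:
# tau = (1 / C(n, 2)) * sum_{i<j} sign(X_i - X_j) sign(X_i - X_j)^T.
tau = np.zeros((p, p))
for i in range(n - 1):
    S = np.sign(X[i] - X[i + 1:])   # signs against all j > i, shape (n-1-i, p)
    tau += S.T @ S
tau /= n * (n - 1) / 2

eigs = np.linalg.eigvalsh(tau)

# Theorem 1 predicts eigenvalues spread over the support of (2/3) Y + 1/3:
# [ (2/3)(1 - sqrt(gamma))^2 + 1/3 , (2/3)(1 + sqrt(gamma))^2 + 1/3 ].
lo = (2 / 3) * (1 - np.sqrt(gamma)) ** 2 + 1 / 3   # = 0.5 for gamma = 0.25
hi = (2 / 3) * (1 + np.sqrt(gamma)) ** 2 + 1 / 3   # ~ 1.83 for gamma = 0.25
print(eigs.min(), eigs.max(), (lo, hi))
```

Note that the mean of the empirical spectral distribution is exactly 1, since the diagonal entries of τ equal 1 almost surely for continuous data.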