A study of the power and robustness of a new test for independence against contiguous alternatives

Abstract: Various association measures have been proposed in the literature that equal zero when the associated random variables are independent. However, many measures (e.g., Kendall's tau) may equal zero even in the presence of an association between the random variables. To overcome this drawback, Bergsma and Dassios (2014) proposed a modification of Kendall's tau (denoted as τ*), which is non-negative and zero if and only if independence holds. In this article, we investigate the robustness properties and the asymptotic distributions of τ* and some other well-known measures of association under null and contiguous alternatives. Based on these asymptotic distributions, we study the asymptotic power of the test based on τ* under contiguous alternatives and compare its performance with that of other well-known tests available in the literature.

Another recently popularized measure of association is distance covariance (see Székely et al. (2007)), which is defined as

dcov = E|X_1 − X_2||Y_1 − Y_2| + E|X_1 − X_2| E|Y_1 − Y_2| − 2 E|X_1 − X_2||Y_1 − Y_3|,

where (X_1, Y_1), (X_2, Y_2) and (X_3, Y_3) are independent replications of (X, Y) ∈ R^p × R^q, p, q ≥ 1, and | · | denotes the Euclidean norm. It is straightforward to show that dcov = (1/4) E[h(X_1, X_2, X_3, X_4) h(Y_1, Y_2, Y_3, Y_4)], where h(z_1, z_2, z_3, z_4) = |z_1 − z_2| + |z_3 − z_4| − |z_1 − z_3| − |z_2 − z_4|, and (X_1, Y_1), (X_2, Y_2), (X_3, Y_3), (X_4, Y_4) are independent replications of (X, Y). In addition, one can also show that

dcov = (1/(c_p c_q)) ∫_{R^p × R^q} |ψ_{X,Y}(s, t) − ψ_X(s) ψ_Y(t)|² / (||s||^{1+p} ||t||^{1+q}) ds dt

(see Székely et al. (2007)), where ψ_{X,Y}, ψ_X and ψ_Y are the characteristic functions of (X, Y), X and Y, respectively, and c_p and c_q are normalizing constants. This definition implies that dcov = 0 ⇔ X ⊥⊥ Y. For given data (x_1, y_1), ..., (x_n, y_n), the sample version of dcov is defined as

dcov_n = (1/C(n, 4)) Σ_{1 ≤ i < j < k < l ≤ n} (1/4) h(x_i, x_j, x_k, x_l) h(y_i, y_j, y_k, y_l).

However, dcov is not expected to be a robust measure of association since it is moment based, whereas τ and τ* are expected to be robust against outliers since these measures are based on the ranks or the positions of the observations. For this reason, it is also expected that the test based on τ*_n will be more powerful than the test based on dcov_n when the null and the alternative distributions are associated with two well-separated distinct populations, i.e., when the data from one population can be considered as outliers relative to the data cloud formed by the observations obtained from the other population.
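The quadruple-sum form of dcov_n can be computed directly, if expensively, from its definition. Below is a minimal pure-Python sketch; the enumeration of all C(n, 4) quadruples is exactly the O(n⁴) cost noted later in the paper, so this is only practical for small n.

```python
import itertools

def h(z1, z2, z3, z4):
    # Kernel from the text: |z1 - z2| + |z3 - z4| - |z1 - z3| - |z2 - z4|.
    return abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4)

def dcov_n(xs, ys):
    # Sample version: average of (1/4) h(x-quadruple) h(y-quadruple)
    # over all C(n, 4) index quadruples i < j < k < l -- O(n^4) operations.
    n = len(xs)
    total, count = 0.0, 0
    for i, j, k, l in itertools.combinations(range(n), 4):
        total += 0.25 * h(xs[i], xs[j], xs[k], xs[l]) * h(ys[i], ys[j], ys[k], ys[l])
        count += 1
    return total / count
```

For a perfectly dependent toy sample such as xs = ys = [0, 1, 2, 3], the single quadruple gives h = −2 for both coordinates and hence dcov_n = 1.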
Along with the issue of robustness of different measures of independence, it is also of interest whether we can determine, based on τ, τ* and dcov, whether the pair of random variables is independent. To investigate this testing of hypothesis problem, one should ideally carry out the tests based on the exact distributions of τ_n, τ*_n and dcov_n. However, since these exact distributions are not tractable, we estimate the size and the power of the tests based on the asymptotic distributions of τ_n, τ*_n and dcov_n. In addition, since the tests based on τ*_n and dcov_n are both consistent against fixed alternatives, we investigate here the asymptotic powers of the tests under contiguous alternatives. Loosely speaking, contiguity formalizes when a sequence of probability measures Q_n defined on measurable spaces (Ω_n, A_n) remains asymptotically "close" to another sequence P_n defined on the same spaces, so that limit laws obtained under P_n can be transferred to limit laws under Q_n. The more technical issues related to contiguity are discussed at the beginning of Section 3.
The rest of the article is organized as follows. In Section 2, we study the robustness of the aforementioned measures of association. In Section 3, we obtain the asymptotic distributions of τ_n, τ*_n and dcov_n under null and contiguous alternatives, and based on these results, we investigate the asymptotic powers of the tests based on these statistics. Section 4 contains some concluding remarks. All technical details appear in the appendix.

Robustness study

Huber (2011, pp. 9, 11) discusses a concept of maximum bias to investigate the robustness of an estimator (or the corresponding functional), which is based on a contamination neighborhood. The maximum bias of T(·) is defined as

b(β; T(F_0)) = sup_{F ∈ M} |T((1 − β)F_0 + βF) − T(F_0)|,

where F_0 is the true distribution function, and M is the collection of probability measures such that the map F ↦ ∫ ψ dF from M into R is continuous whenever ψ is bounded and continuous. Motivated by the concept of maximum bias, we define a new measure b(β; T(F_0)) as follows. Let H be the Dirac measure at (h, k), i.e., H = δ_{(h,k)}, and let F_0 be the joint distribution function of (X, Y), whose associated random vector has independent components. Finally, b(β; T(F_0)) is defined as

b(β; T(F_0)) = lim_{h,k→∞} |T((1 − β)F_0 + βH) − T(F_0)|.

In other words, b(β; T(F_0)) measures the effect on T(F_0) of an arbitrarily large observation with mass β. Remark 1. It is also appropriate to mention here that one can define b(β; T(F_0)) when h, k → ±∞, and in view of the fact that τ, τ* and dcov are based on the absolute values of the differences between the observations, the values of the b(β; ·) measure for τ(F_0), τ*(F_0) and dcov(F_0) remain the same when h, k → −∞. However, for the sake of simplicity, we assume h, k → ∞ throughout the paper unless mentioned otherwise.
The following theorem states the behaviour of b(β; τ*(F_0)).

Theorem 1. Let F_0 be a joint distribution function of (X, Y) with associated marginal distribution functions G_X of X and H_Y of Y, and in addition, suppose F_0 = G_X H_Y. Then, for any β < 1/2, we have b(β; τ*(F_0)) = 4β²(1 − β)², and consequently, b(β; τ*(F_0)) < 1/4 for any β < 1/2.

Theorem 1 implies that in the presence of a proportion β ∈ [0, 1/2) of outliers in the data, the bias of the functional τ* evaluated at F_0 will be bounded by 4β²(1 − β)², and in particular by 1/4. In other words, the bias will not break down to 1 even in the presence of arbitrarily large outliers. Also, in view of the fact that τ*_n → τ* in probability as n → ∞, the bias of τ*_n will be bounded by 1/4 in probability when the data are obtained from a joint distribution function having independent marginal distribution functions. Proposition 1 discusses the behaviour of b(β; τ(F_0)) and b(β; dcov(F_0)).
Proposition 1. Let F_0 be as in Theorem 1, and let (X_1, Y_1) and (X_2, Y_2) be independent replications of (X, Y). Then, for any β < 1/2, b(β; τ(F_0)) is bounded, whereas b(β; dcov(F_0)) is unbounded.

The assertion in Proposition 1 implies that τ is also a robust measure in the sense of having bounded b(β; τ(F_0)), whereas, unlike b(β; τ(F_0)) and b(β; τ*(F_0)), b(β; dcov(F_0)) is unbounded. This fact implies that distance covariance is not robust against outliers. As mentioned in the Introduction, the nonrobustness of distance covariance is expected to be reflected in the asymptotic power study, which is discussed fully in the forthcoming section.
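The contrast in Proposition 1 is easy to see empirically. The sketch below contaminates an independent Gaussian sample with a point mass at an arbitrarily large location; the sample size, the 10% contamination level and the choice (h, k) = (10⁶, 10⁶) are illustrative assumptions. The rank-based τ_n moves by a bounded amount, while the moment-based dcov_n blows up.

```python
import itertools
import random

def sign(a):
    return (a > 0) - (a < 0)

def kendall_tau(xs, ys):
    # Sample Kendall's tau: average sign-concordance over all pairs.
    n = len(xs)
    s = sum(sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
            for i, j in itertools.combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))

def h(z1, z2, z3, z4):
    return abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4)

def dcov_n(xs, ys):
    # O(n^4) sample distance covariance from the quadruple-sum definition.
    quads = list(itertools.combinations(range(len(xs)), 4))
    return sum(0.25 * h(*(xs[t] for t in q)) * h(*(ys[t] for t in q))
               for q in quads) / len(quads)

random.seed(0)
n, n_out = 40, 4                      # 10% contamination (beta = 0.1)
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
# Replace a few points by an arbitrarily large outlier (h, k) = (1e6, 1e6).
xs_c = [1e6] * n_out + xs[n_out:]
ys_c = [1e6] * n_out + ys[n_out:]

tau_shift = abs(kendall_tau(xs_c, ys_c) - kendall_tau(xs, ys))
dcov_shift = abs(dcov_n(xs_c, ys_c) - dcov_n(xs, ys))
```

Quadruples containing two of the outliers contribute terms of order (10⁶)² to dcov_n, whereas each contaminated pair can change the concordance sum of τ_n by at most 2, so tau_shift stays small by construction.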

Asymptotic power study under contiguous alternatives
Besides the issue of robustness, since the tests based on both τ*_n and dcov_n are consistent (i.e., the power of the test tends to one as the sample size tends to infinity), a natural question is how the asymptotic powers of the tests based on τ*_n and dcov_n compare with other well-known tests (e.g., the test based on τ_n) under contiguous alternatives (e.g., see Hajek, Sidak, and Sen (1999), p. 249). Precisely, the sequence of probability measures Q_n is contiguous with respect to the sequence of probability measures P_n if P_n(A_n) → 0 implies Q_n(A_n) → 0 for every sequence of measurable sets A_n, where (Ω_n, A_n) is the sequence of measurable spaces, and P_n and Q_n are probability measures defined on (Ω_n, A_n). In order to characterise contiguity in terms of the asymptotic behaviour of the likelihood ratios between P_n and Q_n, Le Cam proposed some results popularly known as Le Cam's lemmas (e.g., see Hajek et al. (1999)). A consequence of Le Cam's first lemma is that the sequence Q_n will be contiguous with respect to the sequence P_n if log(dQ_n/dP_n) asymptotically follows a Gaussian distribution with mean −σ²/2 and variance σ² under P_n (e.g., see Hajek et al. (1999), p. 253, corollary to Le Cam's first lemma), where σ > 0 is a constant, and we use this fact to establish contiguity in this article (see the proof of Theorem 2).
Suppose that we now want to test H_0 : F = F_0, where F_0 is the joint distribution function of (X, Y) with the associated marginal distribution functions of X and Y being G_X and H_Y, respectively (so that F_0 = G_X H_Y under H_0), and we consider a sequence of contiguous or local alternatives H_n : F = F_n = (1 − γ n^{−1/2}) F_0 + γ n^{−1/2} K, where γ > 0 is a constant and K is a fixed bivariate distribution function. Here we should point out that A_n is a sequence of sets changing with n along with its σ-field A_n, and for that reason, it does not follow directly from the definition of contiguity that F_n is contiguous with respect to F_0. In Theorem 2, based on Le Cam's first lemma, we establish that the alternatives H_n are contiguous under certain conditions.
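To make the local alternatives concrete: a draw from the mixture F_n can be generated by flipping a coin with success probability γ/√n between F_0 and K. In the sketch below, F_0 is a product of two standard normals and K is a bivariate normal with correlation ρ; both choices are illustrative assumptions (not the paper's specific examples), picked so that K has a Lebesgue density as Theorem 2 requires.

```python
import math
import random

def sample_from_Hn(n, gamma, rho=0.9, rng=None):
    # Draw n i.i.d. points from F_n = (1 - gamma/sqrt(n)) F_0 + (gamma/sqrt(n)) K.
    # Assumptions for illustration: F_0 = N(0,1) x N(0,1) (independent
    # components) and K = bivariate normal with correlation rho, so K has a
    # Lebesgue density k as required by Theorem 2.
    rng = rng or random.Random()
    eps = gamma / math.sqrt(n)          # mixing weight of the contamination K
    sample = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        if rng.random() < eps:          # component K: dependent pair
            y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        else:                           # component F_0: independent pair
            y = rng.gauss(0, 1)
        sample.append((x, y))
    return sample

data = sample_from_Hn(400, gamma=1.0, rng=random.Random(42))
```

Note that the dependent component enters with weight γ/√400 = 0.05 here; as n grows the mixture weight shrinks, which is what keeps H_n "local" to H_0.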
In order to carry out the tests based on τ_n, τ*_n and dcov_n, one needs to know the distributions (or an approximation of the distributions) of these estimators. In this context, note that τ_n, τ*_n and dcov_n are U-statistics (e.g., see Lee (1990)), and to derive their asymptotic distributions, one needs to know the order of degeneracy of each of τ_n, τ*_n and dcov_n. For the sake of completeness, the definitions of a U-statistic and its order of degeneracy are given below. For given data X_1, ..., X_n and a symmetric kernel g(x_1, ..., x_m) of degree m with θ = Eg(X_1, ..., X_m), the associated U-statistic is U_n = C(n, m)^{−1} Σ_{1 ≤ i_1 < ··· < i_m ≤ n} g(X_{i_1}, ..., X_{i_m}), and U_n is said to have order of degeneracy d if E{g(x_1, ..., x_l, X_{l+1}, ..., X_m)} = θ for all x_1, ..., x_l whenever l ≤ d, while this fails for l = d + 1. The statistic τ_n has order of degeneracy 0, whereas τ*_n and dcov_n are of order 1 (see the proofs of Theorems 2, 3 and 4). Here it should further be pointed out that δ²_l = Var(E(g(X_1, ..., X_m) | X_1, ..., X_l)) for l = 1, ..., m (see, e.g., Serfling (1980), p. 182) is nondecreasing in l, which implies that for all k ≥ l, δ²_k > 0 when δ²_l > 0. This fact ensures the uniqueness of the order of degeneracy of a U-statistic in view of the definition above. The connection between the rate of convergence of a U-statistic and its order of degeneracy is discussed in Remark 3. In Theorems 2, 3 and 4, we describe the asymptotic behaviour of τ_n, τ*_n and dcov_n, respectively, under the contiguous alternatives H_n.

Theorem 2. Assume that F_0 and K have Lebesgue densities f_0 and k, respectively, and E_{f_0}(k/f_0 − 1)² < ∞. Then, the sequence of alternatives H_n is contiguous to H_0. Moreover, under H_n, √n(τ_n − τ) converges weakly to a Gaussian distribution with mean μ_1 and variance σ²_1.

Observe that E_{f_0}(k/f_0 − 1)² is essentially a first-order approximation of the entropy E_{f_0}(log(k/f_0)) that measures the dissimilarity between the two densities f_0 and k. In other words, E_{f_0}(1 − k/f_0)² is the mean square contingency (see Rényi (1959), p. 446) of f_0 and k. Further, we should mention that if k = f_0, we have k/f_0 − 1 = 0, i.e., k and f_0 coincide, while larger values of |k/f_0 − 1| indicate that k and f_0 are more dissimilar.
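The order-of-degeneracy claims can be checked numerically. The sketch below takes X uniform on {1, ..., 6} (an illustrative choice) and computes the first projections of the x-parts of the two kernels exactly by enumeration: the projection of Kendall's sign kernel varies with x_1 (order of degeneracy 0), whereas that of the dcov kernel h vanishes identically; under independence the product kernel factorizes, so its first projection vanishes too (order of degeneracy at least 1).

```python
import itertools

def sign(a):
    return (a > 0) - (a < 0)

def h(z1, z2, z3, z4):
    # x-part of the dcov kernel from the Introduction.
    return abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4)

# Illustrative choice (assumption): X uniform on {1, ..., 6}.
support = list(range(1, 7))

# First projection of the dcov kernel's x-part, E h(x1, X2, X3, X4),
# computed exactly by enumerating (X2, X3, X4); it vanishes for every x1,
# so delta_1^2 = 0 for the product kernel under independence.
proj_h = [sum(h(x1, x2, x3, x4)
              for x2, x3, x4 in itertools.product(support, repeat=3)) / 6 ** 3
          for x1 in support]

# First projection of Kendall's sign kernel, E sign(x1 - X2); it varies
# with x1, so tau_n is non-degenerate (order of degeneracy 0).
proj_k = [sum(sign(x1 - x2) for x2 in support) / 6 for x1 in support]
```

The vanishing of proj_h mirrors the argument in the proofs of Theorems 3 and 4: E|x_1 − X_2| − E|x_1 − X_3| and E|X_3 − X_4| − E|X_2 − X_4| cancel because X_2, X_3, X_4 are i.i.d.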
To summarize, Theorem 2 asserts that the sequence of alternatives H n will be contiguous with respect to H 0 when the mean square contingency of f 0 and k is finite.
To prove Theorem 2, Le Cam's third lemma is used to obtain the asymptotic normality of √n(τ_n − τ) under H_n, and Le Cam's third lemma uses the fact that log L_n converges weakly to a random variable associated with a normal distribution having certain location and scale parameters (see the proof of Theorem 2). We should point out that the asymptotic normality of log L_n is a sufficient but not a necessary condition for establishing the contiguity of Q_n with respect to P_n. Instead of Le Cam's third lemma, one can also follow the approach of Behnen and Neuhaus (1975), based on a specific truncation method, to establish the contiguity of the density functions associated with H_n with respect to the density function associated with H_0. Also, Behnen (1971) investigated the asymptotic relative efficiency of some tests for independence against general contiguous alternatives of positive quadrant dependence. However, neither Behnen (1971) nor Behnen and Neuhaus (1975) considered the distribution functions associated with H_n as a mixture distribution, as we do here. Recently, Banerjee (2005) studied the behaviour of the likelihood ratio statistics for testing a finite-dimensional parameter under local contiguous hypotheses. To obtain the local (or contiguous) alternatives, he perturbed the null hypothesized parameter, which is different from the perturbation of the distribution function considered by us.
Note that the sequence of contiguous alternatives H_n coincides with the null hypothesis H_0 when γ = 0, and hence the asymptotic distribution of √n(τ_n − τ) under H_0 follows directly from the assertion in Theorem 2 by choosing γ = 0. Corollary 1 states the asymptotic distribution of √n(τ_n − τ) under H_0.
Corollary 1. Assume that F 0 has density function f 0 . Then, under H 0 , √ n(τ n − τ ) converges weakly to a Gaussian distribution with mean zero and variance σ 2 1 , where σ 2 1 is the same as defined in Theorem 2.
Theorem 3. Assume the same conditions on F_0 and K as mentioned in Theorem 2. Then, under H_n, n(τ*_n − τ*) converges weakly to Σ_{i=1}^∞ λ_i{(Z_i + a_i)² − 1}, where the Z_i are i.i.d. standard Gaussian random variables, the λ_i are the eigenvalues of the integral equation on the kernel l(x, y) (with associated eigenfunctions g_i; see Remark 4), the constants a_i depend on γ, f_0 and k, and the kernel l(x, y) is defined through i.i.d. bivariate random vectors (X_1, Y_1), ..., (X_4, Y_4) with distribution F_0.
Theorem 4. Assume the same conditions on F_0 and K as mentioned in Theorem 2. Then, under H_n, n(dcov_n − dcov) converges weakly to Σ_{i=1}^∞ λ*_i{(Z_i + a*_i)² − 1}, where the Z_i are i.i.d. standard Gaussian random variables, the λ*_i are the eigenvalues of the integral equation on the kernel l*(x, y) (with associated eigenfunctions g*_i; see Remark 4), the constants a*_i depend on γ, f_0 and k, and the kernel l*(x, y) is defined through i.i.d. bivariate random vectors with distribution F_0.
Here again, when γ = 0, the sequence of contiguous alternatives H_n coincides with the null hypothesis H_0, and as a consequence, one can derive the asymptotic distributions of n(τ*_n − τ*) and n(dcov_n − dcov) under H_0 from their asymptotic distributions under H_n by setting γ = 0. Corollaries 2 and 3 state the asymptotic distributions of n(τ*_n − τ*) and n(dcov_n − dcov) under H_0, respectively.
Corollary 2. Assume the same conditions on F_0 as mentioned in Theorem 2. Then, under H_0, n(τ*_n − τ*) converges weakly to Σ_{i=1}^∞ λ_i(Z_i² − 1), where the λ_i and Z_i are as in Theorem 3, (X_1, Y_1), ..., (X_4, Y_4) are i.i.d. bivariate random vectors, and g_i(x) and g_i(y) are the eigenfunctions associated with the λ_i.

Corollary 3. Assume the same conditions on F_0 as mentioned in Theorem 2. Then, under H_0, n(dcov_n − dcov) converges weakly to Σ_{i=1}^∞ λ*_i(Z_i² − 1), where the λ*_i and Z_i are as in Theorem 4, and the g*_i are the associated eigenfunctions.

The assertion in Corollary 1 implies that √n(τ_n − τ) = O_p(1), which follows from Prohorov's theorem (e.g., see Van der Vaart (2000), p. 8), and consequently, we have τ_n − τ = o_p(1), which ensures that τ_n is a consistent estimator of τ. Similarly, along with a straightforward application of Prohorov's theorem (e.g., see Van der Vaart (2000), p. 8), it follows from Corollaries 2 and 3 that n(τ*_n − τ*) = O_p(1) and n(dcov_n − dcov) = O_p(1), respectively. These facts imply that τ*_n and dcov_n are consistent estimators of τ* and dcov, respectively. Remark 2. It is appropriate to mention here that one can directly establish the consistency of τ_n, τ*_n and dcov_n using results on the consistency of U-statistics. Among the three estimators, τ_n is a non-degenerate U-statistic, whereas τ*_n and dcov_n are degenerate U-statistics of order 1. The exact variance expressions of non-degenerate and degenerate U-statistics are given in Serfling (1980) on p. 183 (Lemma A) and on p. 189, respectively, and those variance terms converge to zero as n → ∞ (see part (iii) of Lemma A on p. 183 of Serfling (1980)). These facts establish the consistency of τ_n, τ*_n and dcov_n for their population counterparts. Remark 3. The rates of convergence of τ_n, τ*_n and dcov_n also follow from results on the rate of convergence of U-statistics. Based on the well-known projection method for U-statistics (see, e.g., Section 5.3.4 in Serfling (1980), pp. 189-190), one can directly derive the rate of convergence of τ_n by taking c = 1 in the expression given in Section 5.3.4 of Serfling (1980), and those of τ*_n and dcov_n by taking c = 2 in that expression. The aforementioned choices of c depend on the order of degeneracy of the corresponding U-statistic. In other words, this fact indicates how the rate of convergence of a U-statistic is associated with its order of degeneracy.
To summarize, for a U-statistic with order of degeneracy p, the rate of convergence is n^{(p+1)/2}, where n is the sample size and p is a non-negative integer (see, e.g., Section 5.3.4 in Serfling (1980), pp. 189-190).
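As a quick sanity check of the p = 0 case, one can simulate √n τ_n under independence and observe that its spread stabilizes as n grows (the analogous check at rate n for τ*_n and dcov_n is much costlier because of their quadruple sums). The Gaussian marginals, sample sizes and replication count below are illustrative assumptions.

```python
import itertools
import math
import random
from statistics import pstdev

def sign(a):
    return (a > 0) - (a < 0)

def kendall_tau(xs, ys):
    n = len(xs)
    s = sum(sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
            for i, j in itertools.combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))

def spread_of_root_n_tau(n, reps, rng):
    # Empirical standard deviation of sqrt(n) * tau_n under independence;
    # for a non-degenerate (order-0) U-statistic this stays of constant order.
    vals = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        vals.append(math.sqrt(n) * kendall_tau(xs, ys))
    return pstdev(vals)

rng = random.Random(7)
s_small = spread_of_root_n_tau(30, 300, rng)
s_large = spread_of_root_n_tau(120, 300, rng)
```

Both spreads should be of the same constant order (the limiting standard deviation of √n τ_n under independence is 2/3), confirming the √n rate for the order-0 case.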

Remark 4. We would like to end this section with a discussion of the eigenvalues and eigenfunctions that appear in the asymptotic distributions of τ*_n and dcov_n stated in Theorems 3 and 4. In view of the nonzero order of degeneracy of τ*_n and dcov_n, eigenvalues and eigenfunctions are involved in their asymptotic distributions (see, e.g., Section 5.5.2 in Serfling (1980), pp. 193-194). Further, using a spectral decomposition of the kernels l(x, y) and l*(x, y), we have l(x, y) = Σ_{k=1}^∞ λ_k g_k(x) g_k(y) and l*(x, y) = Σ_{k=1}^∞ λ*_k g*_k(x) g*_k(y), which hold in the L²-sense. Here, the g_k(·) are orthonormal eigenfunctions and the λ_k the corresponding eigenvalues of the integral equation on l(x, y) described in the statement of Theorem 3. Similarly, the g*_k(·) are orthonormal eigenfunctions and the λ*_k the corresponding eigenvalues of the integral equation on l*(x, y) described in the statement of Theorem 4. In addition, the orthonormality of the g_k(·) implies that E{g²_k(X)} = 1 and E{g_k(X) g_{k′}(X)} = 0 for all k ≠ k′. Similarly, due to the same reason, we have E{g*²_k(X)} = 1 and E{g*_k(X) g*_{k′}(X)} = 0 for all k ≠ k′. Moreover, it is appropriate to mention here that for n(dcov_n − dcov), Bergsma (2006) listed the exact forms of the eigenvalues and the eigenfunctions for various distributions, and as a result, asymptotic inference based upon Theorem 4 and Corollary 3 is feasible. However, the exact forms of the eigenvalues and the eigenfunctions associated with the asymptotic distribution of n(τ*_n − τ*) are not yet available in the literature.

Computation: Implementation of the tests and some examples
Theorem 2 helps us to compute the asymptotic power of the test based on τ_n for different values of γ, and the asymptotic critical value at level of significance α (denoted c_1(α)) can be obtained as the (1 − α)-quantile of the Gaussian distribution described in Corollary 1. Similarly, Theorems 3 and 4 enable us to compute the asymptotic powers of the tests based on τ*_n and dcov_n, and the corresponding asymptotic critical values (denoted c_2(α) and c_3(α), respectively) can be obtained from the (1 − α)-quantiles of the distributions described in Corollaries 2 and 3, respectively. However, since the infinite sum of weighted chi-squared distributions (see Theorems 3 and 4 and Corollaries 2 and 3), with weights given by the eigenvalues of the kernels associated with τ* (or dcov), is not easily tractable in practice, it is difficult to obtain the quantiles of this distribution. To overcome this problem of infinitely many eigenvalues and an infinite sum, we approximate the kernel function at n_1 × n_1 marginal quantile points and compute the eigenvalues of the n_1 × n_1 finite-dimensional matrix associated with the kernel function; the (i, j)-th element of the matrix is the kernel evaluated at the (i/n_1, j/n_1)-th marginal quantiles (see Babu and Rao (1988)) of the joint distribution associated with the bivariate random vector (X, Y), where i = 1, ..., n_1 and j = 1, ..., n_1. Then, we generate a large sample of size n_2 from the resulting finite sum of weighted chi-squared distributions, and the (1 − α)-quantile of that sample is taken as the approximate asymptotic critical value at level α. Similarly, in order to compute the power, we approximate the infinite sums of weighted chi-squared distributions described in Theorems 3 and 4 by appropriate finite sums of chi-squared distributions.
We simulate a large sample of size n_3 from the approximated distributions, and finally, the proportion of observations in the sample larger than the approximated critical value is taken as the value of the asymptotic power. Also, for distance covariance, we carry out an alternative procedure based on the exact forms of the first four eigenvalues, which together explain more than 90% of the variation (see Bergsma (2006)), and the corresponding eigenfunctions; the results obtained by this procedure are nearly the same as the reported results. In the asymptotic power studies of the different tests, we take n_1 = 10, n_2 = 100 and n_3 = 100 unless mentioned otherwise. In the following examples, we compute the asymptotic powers of the tests based on τ_n, τ*_n and dcov_n for different values of γ with various choices of f_0 and k. All results are summarized in Figure 1.
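The discretize-then-simulate recipe above can be sketched end to end. Since the kernel of τ*_n is not available in closed form, the code below substitutes the Brownian-bridge kernel l(x, y) = min(x, y) − xy on [0, 1] (whose eigenvalues are known to be 1/(kπ)²) as a stand-in, uses a midpoint grid in place of the marginal quantile points, and a small Jacobi routine for the eigenvalues; all of these choices are assumptions made purely for illustration.

```python
import math
import random

def jacobi_eigenvalues(A, sweeps=30):
    # Cyclic Jacobi iteration for a small symmetric matrix: repeatedly
    # rotate away off-diagonal entries; the diagonal converges to the
    # eigenvalues.
    A = [row[:] for row in A]
    n = len(A)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-14:
                    continue
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[p][p] - A[q][q])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):
                    A[p][k], A[q][k] = c * A[p][k] + s * A[q][k], -s * A[p][k] + c * A[q][k]
                for k in range(n):
                    A[k][p], A[k][q] = c * A[k][p] + s * A[k][q], -s * A[k][p] + c * A[k][q]
    return sorted((A[i][i] for i in range(n)), reverse=True)

def approx_critical_value(kernel, n1=32, n2=5000, alpha=0.05, seed=0):
    # Step 1: discretize the kernel on an n1 x n1 grid; the eigenvalues of
    # the matrix (kernel / n1) approximate those of the integral equation.
    grid = [(i + 0.5) / n1 for i in range(n1)]
    K = [[kernel(u, v) / n1 for v in grid] for u in grid]
    lams = [l for l in jacobi_eigenvalues(K) if l > 1e-10]
    # Step 2: simulate n2 draws from the finite weighted chi-squared sum
    # sum_k lam_k (Z_k^2 - 1) and return its (1 - alpha)-quantile.
    rng = random.Random(seed)
    draws = sorted(sum(l * (rng.gauss(0, 1) ** 2 - 1) for l in lams)
                   for _ in range(n2))
    return lams, draws[int((1 - alpha) * n2)]

bb_kernel = lambda u, v: min(u, v) - u * v   # stand-in kernel (assumption)
lams, cv = approx_critical_value(bb_kernel)
```

With this stand-in kernel, the largest recovered eigenvalue should be close to 1/π² and the eigenvalue sum close to the kernel's trace 1/6, which provides a built-in accuracy check for the discretization.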

Example 1. Consider
The results are reported in Table 1. The values in Table 1 indicate that the test based on τ*_n is more powerful than the test based on dcov_n because of the non-robustness of dcov_n. Comparing the tests based on τ_n and dcov_n, for small values of γ the τ_n-based test performs better, whereas for large values of γ the test based on dcov_n performs marginally better than the test based on τ_n.

Example 2. Consider
The results are reported in Table 2. In this example, since the right end point of the support of F_0 is distant from the left end point of the support of K, the test based on dcov_n does not perform well, as it is a moment-based procedure, i.e., non-robust against outliers, whereas, as expected, the test based on τ*_n performs well since it is robust against outliers.

Example 3. Consider f_0 and k having the same scatter matrices but well-separated location parameters, where (x, y) ∈ R². The results are reported in Table 3. The figures in Table 3 also indicate that the test based on τ*_n performs better than the test based on dcov_n, as expected in view of the fact that b(β; τ*(F_0)) is bounded whereas b(β; dcov(F_0)) is unbounded. The nature of b(β; ·) plays a crucial role in this power study because the distance between the location parameters of F_0 and K is large while the scatter matrices associated with F_0 and K are the same.

Example 4. Consider k as the standard bivariate Cauchy density function, where (x, y) ∈ R². The results are reported in Table 4. As expected, the figures in Table 4 indicate that the test based on τ*_n performs best whereas the test based on dcov_n does not, since the latter lacks robustness against the outliers generated from a heavy-tailed distribution K, namely, the standard bivariate Cauchy distribution.

Concluding remarks
The asymptotic power study in Section 3.1 indicates that the test based on τ * n performs well when the null and alternative distributions are far away from each other while the test based on dcov n does not perform well in this situation since distance covariance is not robust against outliers. On the other hand, performances of both measures are comparable when null and alternative distributions are close.
Recently, Weihs, Drton, and Leung (2016) provided an efficient method to compute τ*_n. Direct computation of τ*_n using the definition requires O(n⁴) operations. Similar to Christensen's (2005) idea for computing Kendall's τ, Weihs et al. observed that computing τ*_n relies only on the relative ordering of quadruples of points. Based on this fact, they derived an algorithm that computes τ*_n using only O(n² log n) operations.
We should also point out that one can carry out two-sample tests based on τ*_n (or dcov_n). Suppose that U = {U_1, ..., U_m} and V = {V_1, ..., V_n} are two independent sets of random variables associated with distribution functions F and G, respectively, and we want to test H_0 : F = G against H_1 : F ≠ G. It follows from Bergsma and Dassios (2014, Theorem 1) that this problem can be reformulated as a test for independence. In other words, the two-sample test (i.e., H_0 : F = G against H_1 : F ≠ G) is a special case of the test for independence.
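One way to operationalize this reduction: pool the two samples and attach a 0/1 group label, so that H_0 : F = G holds exactly when the pooled value is independent of its label. The sketch below runs a permutation test using Kendall's τ_n as the independence statistic purely for simplicity; the τ*_n-based version would use the same construction with the Bergsma-Dassios statistic in its place (an assumption of this illustration, not the paper's exact procedure).

```python
import itertools
import random

def sign(a):
    return (a > 0) - (a < 0)

def kendall_tau(xs, ys):
    n = len(xs)
    s = sum(sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
            for i, j in itertools.combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))

def two_sample_pvalue(u, v, n_perm=500, seed=0):
    # Pool the samples and attach 0/1 group labels; F = G holds iff the
    # pooled value is independent of its label, so any independence
    # statistic applies.  Kendall's tau_n is used here for simplicity.
    rng = random.Random(seed)
    values = list(u) + list(v)
    labels = [0] * len(u) + [1] * len(v)
    observed = abs(kendall_tau(values, labels))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(kendall_tau(values, labels)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

random.seed(1)
u = [random.gauss(0, 1) for _ in range(25)]
v = [random.gauss(3, 1) for _ in range(25)]   # clearly separated alternative
p_diff = two_sample_pvalue(u, v)
```

With a three-standard-deviation shift between the groups, the observed value-label association is far outside the permutation distribution, so the p-value is small.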
One can also use τ* to estimate the mixing proportion in a mixture distribution such as F_{X,Y} = (1 − ε) F_{1X} G_{1Y} + ε F_{2X} G_{2Y}, where ε ∈ (0, 1/2) is the mixing proportion, and F_{1X}, G_{1Y}, F_{2X} and G_{2Y} are distribution functions. Suppose that (X_1, Y_1), ..., (X_n, Y_n) are i.i.d. bivariate random vectors associated with F_{1X} G_{1Y}; as a consequence of the product form F_{1X} G_{1Y} of the joint distribution function, we have τ*(X, Y) = 0. Also, let (X*_1, Y*_1), ..., (X*_m, Y*_m) be i.i.d. bivariate random vectors associated with F_{2X} G_{2Y}; we have τ*(X*, Y*) = 0 in view of the product form F_{2X} G_{2Y} of the joint distribution function. We now combine these n vectors (X, Y) and m vectors (X*, Y*) and then randomly choose n vectors from the combined (n + m) vectors, which can be done in C(n + m, n) ways. We denote the j-th set of chosen random vectors by (X**_{1j}, Y**_{1j}), ..., (X**_{nj}, Y**_{nj}), where j = 1, ..., C(n + m, n), and compute τ*(X**_j, Y**_j) for each j. Finally, in view of the structure of the mixture distribution F_{X,Y}, one can propose an estimate ε̂_{n,m} of ε based on the statistics τ*(X**_j, Y**_j) and a threshold c, where c is a constant significantly larger than zero. The investigation of the properties of ε̂_{n,m} is a subject for future research.

Appendix
Proof of Theorem 1. It follows from the definition of τ* and an expansion of τ*((1 − β)F_0 + βH) in powers of β and (1 − β) that b(β; τ*(F_0)) = 4β²(1 − β)².

For dcov, in the analogous expansion of dcov((1 − β)F_0 + βH), all terms associated with either β³(1 − β) or β(1 − β)³ converge to zero as h, k → ∞, while among the terms associated with β²(1 − β)² there are terms that diverge as h, k → ∞, which yields the unboundedness of b(β; dcov(F_0)).

Proof of Theorem 3. We first note that τ*_n is a U-statistic having degeneracy of order 1, which follows from the arguments below. In order to establish that fact, it is enough to show that E{a(X_1, X_2, X_3, X_4) | X_1 = x_1} = 0 for all x_1.
Hence, we consider E{a(x_1, X_2, X_3, X_4)}, which equals zero for all x_1; the last step follows from the fact that X_2, X_3 and X_4 are i.i.d. random variables. Also, it is easy to see that E{a(X_1, X_2, X_3, X_4) | (X_1 = x_1, X_2 = x_2)} ≠ 0 for some x_1 and x_2. Hence, it is now established that τ*_n is a U-statistic having degeneracy of order 1.
Further, note that the densities (denoted q_n) associated with H_n are dominated by the density (denoted p_0) associated with H_0, with Radon-Nikodym derivative dq_n/dp_0 = 1 + n^{−1/2} h_n, where h_n = γ(k/f_0 − 1) ∈ L²(p_0) since E_{f_0}(k/f_0 − 1)² < ∞, as asserted in the statement of Theorem 2. Hence, q_n and p_0 satisfy the assumptions stated in Theorem 1 of Gregory (1977), which yields that n(τ*_n − τ*) converges weakly to Σ_{i=1}^∞ λ_i{(Z_i + a_i)² − 1} under H_n, where λ_i, Z_i and a_i are as defined in the statement of the theorem. This completes the proof.
Proof of Theorem 4. We first note that dcov_n is a U-statistic having degeneracy of order 1, which follows from the arguments below. In order to establish that fact, it is enough to show that E{h(X_1, X_2, X_3, X_4) | X_1 = x_1} = 0 for all x_1. Hence, we consider E h(x_1, X_2, X_3, X_4) = ∫_{R³} {|x_1 − x_2| + |x_3 − x_4| − |x_1 − x_3| − |x_2 − x_4|} Π_{i=2}^4 dG_X(x_i) = 0 for all x_1, in view of the fact that X_2, X_3 and X_4 are i.i.d. random variables. In addition, it is easy to see that E{h(X_1, X_2, X_3, X_4) | (X_1 = x_1, X_2 = x_2)} ≠ 0 for some x_1 and x_2. Hence, it is now established that dcov_n is a U-statistic having degeneracy of order 1.