Detection of sparse positive dependence

Abstract: In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main component. We first look at this problem in the context of a normal mixture model. In essence, the situation reduces to a univariate setting where the effect is a decrease in variance. In particular, a higher criticism test based on the pairwise differences is shown to achieve the detection boundary defined by the (oracle) likelihood ratio test. We then turn to a Gaussian copula model where the marginal distributions are unknown. Standard invariance considerations lead us to consider rank tests. In fact, a higher criticism test based on the pairwise rank differences achieves the detection boundary of the normal mixture model, although not in the very sparse regime, where we do not know of any rank test that has any power.


Introduction
The detection of rare effects has been an important problem for years in many settings, and may be particularly relevant today, for example, with the search for personalized care in the health industry, where a small fraction of a population may respond particularly well, or particularly poorly, to some given treatment [20].
Following a theoretical investigation initiated in large part by Ingster [16] and broadened by Donoho and Jin [10], we are interested in studying two-component mixture models, also known as contamination models, in various asymptotic regimes defined by how the small mixture weight converges to zero. Most of the existing work in the setting of univariate data has focused on models where the contamination manifests itself as a shift in mean [12,11,14,7,19] with a few exceptions where the effect is a change in variance [1], or a change in both mean and variance [6].
In the present paper, we are interested in bivariate data instead, and more specifically in a situation where the effect is felt in the dependence between the two variables being measured. This setting has recently been considered in the literature in the context of assessing the reproducibility of studies. For example, [18] aims to identify significant features from separate studies using an expectation-maximization (EM) algorithm. That work applies a copula mixture model and assumes that changes in the mean and covariance matrix differentiate the contaminated component from the null component. [23] studies another model where variables from the contamination are stochastically larger marginally. In both models, the marginal distributions carry some non-null effects. Similar settings have been considered within a multiple testing framework [5,22].
While existing work has focused on models motivated by questions of reproducibility, in the present work we come back to basics and directly address the problem of detecting a bivariate mixture with a component where the variables are independent and a component where the variables are positively dependent.

Gaussian mixture model
Ingster [16] and Donoho and Jin [10] started with a mixture of Gaussians, and we do the same. In our setting, this means we consider the following mixture model:

(X, Y) ∼ (1 − ε) N(0, I₂) + ε N(0, Σ_ρ), where Σ_ρ has unit diagonal and off-diagonal entry ρ, (1)

where ε ∈ [0, 1/2) is the contamination proportion and 0 ≤ ρ ≤ 1 is the correlation between the two variables under contamination. We consider the following hypothesis testing problem: based on (X₁, Y₁), ..., (Xₙ, Yₙ) drawn iid from (1), decide

H₀ : ε = 0 versus H₁ : ε > 0, ρ > 0. (2)

Note that under the null hypothesis, (X, Y) is bivariate standard normal. Under the alternative, X and Y remain standard normal marginally. Following the literature on the detection of sparse mixtures [16,10], we are most interested in a situation, asymptotic as n → ∞, where ε = εₙ → 0, and the central question is how large ρ = ρₙ needs to be in order to reliably distinguish these hypotheses.
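To fix ideas, data from model (1) can be simulated as follows (a minimal sketch; the function name `sample_mixture` is ours, not the paper's):

```python
import numpy as np

def sample_mixture(n, eps, rho, rng):
    """Draw n pairs from (1 - eps) N(0, I) + eps N(0, [[1, rho], [rho, 1]])."""
    contaminated = rng.random(n) < eps
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    # In the contaminated component, Y = rho X + sqrt(1 - rho^2) Z has
    # correlation rho with X; in the main component, Y = Z is independent of X.
    # Either way, X and Y are marginally standard normal.
    y = np.where(contaminated, rho * x + np.sqrt(1 - rho**2) * z, z)
    return x, y
```

Note that the marginal distributions are unaffected by (ε, ρ): only the dependence carries information.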
The formulation (1) suggests that the alternative hypothesis is composite, but if (ε, ρ) are assumed known under the alternative, then the likelihood ratio test (LRT) is optimal by the Neyman-Pearson lemma. We start by characterizing the behavior of the LRT, which provides a benchmark. We then study some other testing procedures that do not require knowledge of the model parameters:
• The covariance test rejects for large values of Σᵢ XᵢYᵢ, and coincides with Rao's score test in the present context. This is the classical test for independence, specifically designed for the case where ε = 1 and ρ > 0 under the alternative. We shall see that it is suboptimal in some regimes.
• The extremes test rejects for small values of minᵢ |Xᵢ − Yᵢ|. This test exploits the fact that, because ρ is assumed positive, the variables in the contaminated component are closer to each other than in the null component.
• The higher criticism test was suggested by John Tukey and deployed by [10] for the testing of sparse mixtures. We propose a version of that test based on the pairwise differences Uᵢ := (Xᵢ − Yᵢ)/√2. In detail, the test rejects for large values of

HC := max over u > 0 of √n (F̂(u) − Ψ(u)) / √(Ψ(u)(1 − Ψ(u))), (3)

where Ψ(u) := 2Φ(u) − 1, with Φ denoting the standard normal distribution function, and F̂(u) := (1/n) Σᵢ₌₁ⁿ I{|Uᵢ| ≤ u}, the empirical distribution function of |U₁|, ..., |Uₙ|.
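The three statistics above can be sketched as follows (an illustrative Python sketch; the function names are ours, and the sign and normalization of the higher criticism statistic are one common convention, assumed here):

```python
import math
import numpy as np

def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def covariance_stat(x, y):
    # Rejects for large values; coincides with Rao's score test here.
    return float(np.sum(x * y))

def extremes_stat(x, y):
    # Rejects for SMALL values: under contamination with rho near 1,
    # X_i and Y_i are close, so min_i |X_i - Y_i| tends to be small.
    return float(np.min(np.abs(x - y)))

def higher_criticism_stat(x, y):
    # U_i = (X_i - Y_i)/sqrt(2) has variance 1 under the null and 1 - rho under
    # contamination, so small |U_i| are over-represented under the alternative.
    u = np.sort(np.abs(x - y) / np.sqrt(2.0))
    n = len(u)
    psi = np.array([2.0 * Phi(v) - 1.0 for v in u])  # Psi(u) = P0(|U| <= u)
    f_hat = np.arange(1, n + 1) / n                  # empirical CDF at order stats
    denom = np.sqrt(np.maximum(psi * (1.0 - psi), 1e-12))
    return float(np.max(np.sqrt(n) * (f_hat - psi) / denom))
```

All three are functions of the data alone, requiring no knowledge of (ε, ρ).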
As is common practice in this line of work [16,10], under H₁ we set

ε = εₙ = n^{−β}, where β ∈ (0, 1) is fixed. (4)

The setting where β ≤ 1/2 is often called the dense regime and the setting where β > 1/2 is often called the sparse regime. Our analysis reveals the following:
(a) Dense regime. The dense regime is most interesting when ρ → 0. In that case, we find that the covariance test and the higher criticism test match the asymptotic performance of the likelihood ratio test to first order, while the extremes test has no power.
(b) Sparse regime. The sparse regime is most interesting when ρ → 1. In that case, we find that the higher criticism test still performs as well as the likelihood ratio test to first order, while the covariance test is powerless, and the extremes test is suboptimal.

Gaussian mixture copula model
From a practical point of view, the assumption that both X and Y are normally distributed is quite stringent. Hence, we would like to know if there are nonparametric procedures that do not require such a condition but can still achieve the same performance as the likelihood ratio test. In the univariate setting where the effect arises as a shift in mean, this was investigated in [2]. In the bivariate setting, in a model for reproducibility, [23] proposes a nonparametric test based on a weighted version of Hoeffding's test for independence.
Here, instead of model (1), we suppose (X, Y) follows a Gaussian mixture copula model (GMCM) [4], meaning that there is a latent random vector (Z₁, Z₂) drawn from the Gaussian mixture (1) such that

X = F⁻¹(Φ(Z₁)), Y = G⁻¹(Φ(Z₂)), (5)

where F and G are unknown distribution functions on the real line, and Φ is the standard normal distribution function, while ε ∈ [0, 1/2) is the contamination proportion and 0 ≤ ρ ≤ 1 is the correlation between Z₁ and Z₂ in the contaminated component, as before in model (1). [18] also uses a copula mixture model, but places emphasis on the mean while we focus on the dependence. We still consider the testing problem (2), but now in the context of model (5). The setting is nonparametric in that both F and G are unknown. Model (5) is crafted in such a way that the marginal distributions of X and Y contain absolutely no information that is pertinent to the testing problem under consideration.
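A sample from model (5) can be sketched by pushing the latent Gaussian mixture through arbitrary quantile functions (the marginals below, exponential for X and logistic for Y, are hypothetical choices for illustration only):

```python
import math
import numpy as np

def sample_gmcm(n, eps, rho, F_inv, G_inv, rng):
    """Draw from model (5): latent (Z1, Z2) from the Gaussian mixture (1),
    observed X = F_inv(Phi(Z1)), Y = G_inv(Phi(Z2)) for arbitrary marginals."""
    contaminated = rng.random(n) < eps
    z1 = rng.standard_normal(n)
    w = rng.standard_normal(n)
    z2 = np.where(contaminated, rho * z1 + np.sqrt(1 - rho**2) * w, w)
    phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    return F_inv(phi(z1)), G_inv(phi(z2))

# Hypothetical marginals: exponential for X, logistic for Y.
rng = np.random.default_rng(0)
x, y = sample_gmcm(10_000, eps=0.1, rho=0.9,
                   F_inv=lambda u: -np.log1p(-u),
                   G_inv=lambda u: np.log(u / (1 - u)),
                   rng=rng)
```

Since F⁻¹ and G⁻¹ are increasing, the ranks of the observations coincide with those of the latent Gaussian pairs, which is the invariance exploited by the rank tests below.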
The model is also attractive because of an invariance under all increasing marginal transformations of the variables. This is the same invariance that leads to considering rank-based methods such as the Spearman correlation test [17, Chp. 6]. In fact, we analyze the Spearman correlation test, which is the nonparametric analog of the covariance test, showing that it is first-order asymptotically optimal in the dense regime. We also propose and analyze a nonparametric version of the higher criticism based on ranks, which we show is first-order asymptotically optimal in the moderately sparse regime where 1/2 < β < 3/4. In the very sparse regime, where β > 3/4, we do not know of any rank-based test that has any power.

Gaussian mixture model
In this section, we focus on the Gaussian mixture model (1). We start by deriving a lower bound on the performance of the likelihood ratio test, which provides a benchmark for the other (adaptive) tests, which we subsequently analyze.
We say that a testing procedure is asymptotically powerful (resp. powerless) if the sum of its probabilities of Type I and Type II errors (its risk) has limit 0 (resp. limit inferior at least 1) in the large sample asymptote.
This only provides a lower bound on what can be achieved, but it will turn out to be sharp once we establish the performance of the higher criticism test in Proposition 2 below.
Proof. The proof techniques are standard and already present in [12,16], and many of the subsequent works.
Defining U := (X − Y)/√2 and V := (X + Y)/√2, model (1) is equivalently expressed in terms of (U, V): with probability 1 − ε, U and V are iid standard normal, while with probability ε, U ∼ N(0, 1 − ρ) and V ∼ N(0, 1 + ρ), independently of each other. Note that U and V are independent only conditionally on knowing which component they were sampled from. In terms of the (Uᵢ, Vᵢ)'s, the likelihood ratio is L = Πᵢ Lᵢ, where Lᵢ is the likelihood ratio for observation (Uᵢ, Vᵢ), which in the present case takes the following expression:

Lᵢ = 1 − ε + ε (1 − ρ²)^{−1/2} exp( ρVᵢ²/(2(1 + ρ)) − ρUᵢ²/(2(1 − ρ)) ).

The risk of the likelihood ratio test is equal to [17, Problem 3.10]

risk(L) := 1 − (1/2) E₀|L − 1|.

We show that risk(L) = 1 + o(1) under each of the stated conditions. We consider each regime in turn.
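The rotation to (U, V) can be checked numerically (a quick sketch, with `rho` fixed at an illustrative value): for a correlated bivariate normal pair, the rotated coordinates are uncorrelated with variances 1 − ρ and 1 + ρ.

```python
import numpy as np

# If (X, Y) is standard bivariate normal with correlation rho, then
# U = (X - Y)/sqrt(2) and V = (X + Y)/sqrt(2) satisfy
# Var(U) = 1 - rho, Var(V) = 1 + rho, Cov(U, V) = 0.
rng = np.random.default_rng(0)
rho = 0.8
x = rng.standard_normal(200_000)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(200_000)
u = (x - y) / np.sqrt(2)
v = (x + y) / np.sqrt(2)
print(round(float(np.var(u)), 1), round(float(np.var(v)), 1))  # prints: 0.2 1.8
```

This is why, seen through the Uᵢ's, the detection problem becomes one of detecting a decrease in variance.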
Dense regime. It turns out that it suffices to bound the second moment. Indeed, using the Cauchy-Schwarz inequality, we have risk(L) ≥ 1 − (1/2)(E₀[L²] − 1)^{1/2}, with E₀[L²] = E₀[L₁²]^n. Expanding E₀[L₁²] into three terms, the first two contribute (1 − ε)² + 2ε(1 − ε) = 1 − ε², while for the third term a Gaussian computation gives ε²/(1 − ρ²). Hence, we have E₀[L₁²] = 1 + ε²ρ²/(1 − ρ²), and, therefore, risk(L) = 1 + o(1) whenever nε²ρ² → 0, since ρ is assumed to be bounded away from 1. Under the specified parameterization, this happens exactly when γ > 1/2 − β.
Sparse regime. It turns out that simply bounding the second moment, as we did above, does not suffice. Instead, we truncate the likelihood ratio and study the behavior of its first two moments. Define the indicator variable Dᵢ := I{|Vᵢ| ≤ √(2 log n)} and the corresponding truncated likelihood ratio L̄ := Πᵢ LᵢDᵢ.

E. Arias-Castro et al.
Using the triangle inequality, the fact that L̄ ≤ L, and the Cauchy-Schwarz inequality, we have the following upper bound: For the first moment, we have, using the independence of U₁ and V₁ and taking the expectation with respect to U₁ first, an expression in which, for t ≥ 0, we used the fact that 1 − Ψ(t) ≍ e^{−t²/2}/t as t → ∞. Since ε = n^{−β} with β > 1/2 in the sparse regime, the bound holds for ρ sufficiently close to 1. For the second moment, the sum of the first two terms is bounded from above directly, while for the third term we use the fact that ρ ≤ 1. Hence, the second moment is controlled when ρ is sufficiently close to 1. This in turn yields the desired bound on the risk. Under the specified parameterization, this happens exactly when γ < 4β − 2.
In the dense regime, with ρ parameterized as in (6), we say that a test achieves the detection boundary if it is asymptotically powerful when γ < 1/2 − β, and in the sparse regime, with ρ parameterized as in (7), we say that a test achieves the detection boundary if it is asymptotically powerful when γ > 4(β − 1/2).

The covariance test
Recall that the covariance test rejects for large values of Tₙ := Σᵢ₌₁ⁿ XᵢYᵢ, calibrated under the null, where X₁, ..., Xₙ, Y₁, ..., Yₙ are iid standard normal.

Proposition 1. For the testing problem (2), the covariance test achieves the detection boundary in the dense regime, while it is asymptotically powerless in the sparse regime.
Proof. We divide the proof into the two regimes.
Dense regime. Under H₀, we have E₀(Tₙ) = 0 and Var₀(Tₙ) = n, so that, by Chebyshev's inequality, P₀(Tₙ ≥ aₙ√n) → 0 for any sequence (aₙ) diverging to infinity. Under H₁, we have E₁(Tₙ) = nερ and Var₁(Tₙ) = O(n), so that, by Chebyshev's inequality, P₁(Tₙ ≤ nερ − aₙ√n) → 0. Thus the test with rejection region {Tₙ ≥ aₙ√n} is asymptotically powerful when nερ eventually dominates aₙ√n. If we choose aₙ = log n, for example, and ρ is parameterized as in (6), this happens for n large enough when γ < 1/2 − β.
Sparse regime. To prove that the covariance test is asymptotically powerless when β > 1/2, we show that, under H₁, Tₙ converges to the same limiting distribution as under H₀. Under H₀, by the central limit theorem, Tₙ/√n ⇒ N(0, 1). Under H₁, the distribution of the (Xᵢ, Yᵢ)'s (which remain iid) depends on n, but the conditions for applying Lyapunov's central limit theorem are satisfied, where Z ∼ N(0, 1) and the inequality is Cauchy-Schwarz's, so that the test statistic still converges weakly to a normal distribution. In the present regime, we have E₁(Tₙ)/√(Var₁(Tₙ)) → 0 and Var₁(Tₙ) ∼ n, and thus we conclude by Slutsky's theorem that Tₙ/√n ⇒ N(0, 1) under H₁ as well.
Remark 1. There are good reasons to consider the covariance test in this specific form since the means and variances are known. It is worth pointing out that the Pearson correlation test, which is more standard in practice since it does not require knowledge of the means or variances, has the same asymptotic power properties.
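The dichotomy of Proposition 1 can be suggested by a small simulation (an illustrative sketch with scaled-down parameters; the one-sided 0.05 critical value 1.645 and the function names are our choices):

```python
import numpy as np

def covariance_z(x, y):
    # Under H0, T_n / sqrt(n) is asymptotically N(0, 1).
    return float(np.sum(x * y) / np.sqrt(len(x)))

def power(n, eps, rho, reps, rng, z_crit=1.645):
    """Monte Carlo power of the (one-sided, level 0.05) covariance test."""
    rejections = 0
    for _ in range(reps):
        contaminated = rng.random(n) < eps
        x = rng.standard_normal(n)
        z = rng.standard_normal(n)
        y = np.where(contaminated, rho * x + np.sqrt(1 - rho**2) * z, z)
        rejections += covariance_z(x, y) > z_crit
    return rejections / reps

rng = np.random.default_rng(0)
n = 10_000
# Dense-ish: eps = n^{-1/4}, so the mean of T_n/sqrt(n) is sqrt(n) eps rho = 5.
dense_power = power(n, eps=n**-0.25, rho=0.5, reps=50, rng=rng)
# Sparse: eps = n^{-3/4}, so sqrt(n) eps rho = 0.05 and power stays near the level.
sparse_power = power(n, eps=n**-0.75, rho=0.5, reps=50, rng=rng)
```

The contrast between `dense_power` (near 1) and `sparse_power` (near the nominal level) mirrors the statement of the proposition.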

The higher criticism test and the extremes test
Seen through the Uᵢ's, the problem becomes that of detecting a sparse contamination where the effect is in the variance. We recently studied this problem in detail [1], extending previous work by Cai et al. [6], who considered a setting where the effect is both in the mean and the variance. Borrowing from our prior work, we consider a higher criticism test, already defined in (3), and an extremes test, which rejects for small values of minᵢ |Uᵢ|.

Proposition 2. For the testing problem (2), the higher criticism test achieves the detection boundary in the dense and sparse regimes.
Proof. Set σ² = 1 − ρ, which is the variance of the contaminated component. In our prior work [1, Prop 3], we showed that the higher criticism test as defined in (3) is asymptotically powerful under a condition on (ε, σ) that translates directly into the present setting, yielding the stated result.

Proposition 3. For the testing problem (2), the extremes test is asymptotically powerless when ρ is bounded away from 1, while when ε is parameterized as in (4) and ρ as in (7), it is asymptotically powerful when γ > 2β, and asymptotically powerless when γ < 2β.
Proof. This is a direct corollary of our prior work [1]. Thus the extremes test is grossly suboptimal in the dense regime, while it is suboptimal in the sparse regime due to the fact that 2β − 4(β − 1/2) = 2 − 2β > 0.

Remark 2. The higher criticism and extremes tests are both based on the Uᵢ's. This was convenient, as it reduced the problem of testing for independence to the problem of testing for a change in variance (both in a contamination model). However, reducing the original data, meaning the (Xᵢ, Yᵢ)'s, to the Uᵢ's implies a loss of information: a lossless reduction would be to the (Uᵢ, Vᵢ)'s, with joint distribution given in (8). It just turns out that ignoring the Vᵢ's does not lead to any loss in first-order asymptotic power.

Numerical experiments
We performed some numerical experiments to investigate the finite-sample performance of the tests considered here: the likelihood ratio test, the Pearson correlation test (in place of the covariance test, for practical reasons), the extremes test, the higher criticism test, and also a plug-in version of the higher criticism test where the parameters of the bivariate normal distribution (the two means and two variances) are estimated under the null. The sample size is set to a large n = 10⁶ in order to capture the large-sample behavior of these tests. We tried four sparsity levels, setting β ∈ {0.2, …}. For each scenario, we repeated the process 200 times and calculated the fraction of p-values smaller than 0.05, representing the empirical power at the 0.05 level.
The results of this experiment are reported in Figure 1 and are broadly consistent with the theory developed earlier in this section. Although we show that the higher criticism test is first-order comparable to the likelihood ratio test in the dense regime, even with a large sample its power is much lower; the Pearson correlation test does better in that regime. The plug-in higher criticism test performs similarly to the higher criticism test in the dense regime, while it loses some power in the moderately sparse regime, and is powerless in the very sparse regime.

Gaussian mixture copula model
In this section we turn to the Gaussian mixture copula model introduced in (5). The setting is thus nonparametric, since the marginal distributions are completely unknown, and standard invariance considerations [17,Ch 6] lead us to consider test procedures that are based on the ranks. For this, we let R i denote the rank of X i among {X 1 , . . . , X n }, and similarly, we let S i denote the rank of Y i among {Y 1 , . . . , Y n }. (The ranks are in increasing order, say.) Although not strictly necessary, we will assume that F and G in (5) are strictly increasing and continuous. In that case, the ranks are invariant with respect to transformations of the form (x, y) → (p(x), q(y)) with p and q strictly increasing on the real line. In particular, for the rank tests that follow, this allows us to reduce their analysis under (5) to their analysis under (1).
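The ranks and their invariance can be sketched as follows (a minimal Python sketch; `ranks` is our naming):

```python
import numpy as np

def ranks(x):
    """Rank of each entry among x, in increasing order, starting at 1."""
    return np.argsort(np.argsort(x)) + 1

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)
r, s = ranks(x), ranks(y)
# Invariance: applying any strictly increasing map leaves the ranks unchanged,
# which is what lets the analysis under (5) reduce to the analysis under (1).
assert np.array_equal(ranks(np.exp(x)), r)
assert np.array_equal(ranks(y**3), s)
```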

The covariance rank test
The covariance rank test is the analog of the covariance test of Section 2.2. It rejects for large values of Tₙ := Σᵢ RᵢSᵢ (redefined). As is well known, this is equivalent to rejecting for large values of the Spearman rank correlation.

Theorem 1. For the testing problem (2) under the model (5), the covariance rank test achieves the detection boundary in the dense regime, while it is asymptotically powerless in the sparse regime.
Proof. We again divide the proof into the two regimes.
We now turn to the alternative hypothesis H₁. For convenience, we assume that the ranks run from 0 to n − 1. This does not change the test procedure, since Tₙ = −(1/2)Σᵢ(Rᵢ − Sᵢ)² + const, but makes the derivations somewhat less cumbersome. In particular, for the expectation, the relevant expectation is with respect to (X₁, Y₁), X₂, Y₃ independent, with (X₁, Y₁) drawn from the mixture (1), and X₂ and Y₃ standard normal.
Define U := X₁ − X₂ and V := Y₁ − Y₃. We note that (U, V) is bivariate normal with equal marginals. Moreover, when (X₁, Y₁) comes from the main component, U and V are uncorrelated, and therefore independent; while when (X₁, Y₁) comes from the contaminated component, U and V have correlation ρ/2. Therefore, the expectation can be expressed via Λ(ρ) := P(U > 0, V > 0), with (U, V) standardized bivariate normal with correlation ρ/2. We immediately have Λ(0) = 1/4, and in general Λ(ρ) = 1/4 + sin⁻¹(ρ/2)/(2π). We conclude that E₁(Tₙ) ≥ n³/4 + n³ερ/(4π) + O(n^{5/2}), using the fact that sin⁻¹(a) ≥ a for all a ≥ 0. For the variance, we start with the second moment, which then implies the same bound we had for Var₀(Tₙ). Thus, by Chebyshev's inequality, we have P₁(Tₙ ≤ n³/4 + n³ερ/(4π) − aₙn^{5/2}) → 0 for any sequence (aₙ) diverging to infinity. We consider the test with rejection region {Tₙ ≥ n³/4 + aₙn^{5/2}}. Our analysis implies that this test is asymptotically powerful when n³ερ eventually dominates aₙn^{5/2}. If we choose aₙ = log n, for example, and ρ is parameterized as in (6), this happens for n large enough when γ < 1/2 − β.
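The orthant-probability formula used above, P(U > 0, V > 0) = 1/4 + sin⁻¹(r)/(2π) for a standardized bivariate normal pair with correlation r, can be checked by Monte Carlo (an illustrative sketch; `orthant_prob` is our naming):

```python
import math
import numpy as np

def orthant_prob(r, n, rng):
    """Monte Carlo estimate of P(U > 0, V > 0) for standard bivariate normal
    with correlation r; the classical closed form is 1/4 + asin(r)/(2 pi)."""
    u = rng.standard_normal(n)
    v = r * u + math.sqrt(1 - r**2) * rng.standard_normal(n)
    return float(np.mean((u > 0) & (v > 0)))

rng = np.random.default_rng(0)
rho = 0.8
estimate = orthant_prob(rho / 2, 1_000_000, rng)   # correlation rho/2, as in the proof
closed_form = 0.25 + math.asin(rho / 2) / (2 * math.pi)
assert abs(estimate - closed_form) < 0.005
```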
Sparse regime. To prove that the covariance rank test is asymptotically powerless when β > 1/2, similarly to the covariance test, we show that, under H₁, Tₙ converges to the same limiting distribution as under H₀. Under H₀, we have [13, Ch. 11]

(Tₙ − ζₙ)/τₙ ⇒ N(0, 1), (10)

where ζₙ := E₀(Tₙ) and τₙ² := Var₀(Tₙ). We place ourselves under H₁, and show that (10) continues to hold. For this we use a simple coupling. We couple Tₙ with a new statistic T′ₙ, defined just like Tₙ, except that, for each pair (Xᵢ, Yᵢ) drawn from the contaminated component, we replace Yᵢ by Y′ᵢ ∼ N(0, 1) independent of Xᵢ and any other variable. Let M denote the number of pairs drawn from the contaminated component, and note that M is random, having the binomial distribution with parameters (n, ε). It is not hard to show that |Tₙ − T′ₙ| ≤ Mn², so that |Tₙ − T′ₙ| = O_P(n³ε). And by construction, T′ₙ has the same distribution as Tₙ under H₀. We use this in what follows: on the RHS of the resulting decomposition, the first term converges weakly to the standard normal distribution, while the second term is O_P(n³ε/τₙ) = o_P(1), since ε = n^{−β} with β > 1/2 and τₙ ≍ n^{5/2} by (9). We thus conclude that (10) continues to hold, with an application of Slutsky's theorem.
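The statistic Tₙ = Σᵢ RᵢSᵢ and its equivalence with the Spearman rank correlation can be sketched as follows (an illustrative sketch; since Σᵢ Rᵢ² = Σᵢ Sᵢ² = n(n+1)(2n+1)/6 is a fixed constant when there are no ties, Tₙ is an increasing affine function of −Σᵢ(Rᵢ − Sᵢ)², hence of Spearman's ρ):

```python
import numpy as np

def ranks(x):
    return np.argsort(np.argsort(x)) + 1

def covariance_rank_stat(x, y):
    r, s = ranks(x), ranks(y)
    return float(np.sum(r * s))

def spearman_rho(x, y):
    # Classical closed form in the absence of ties:
    # rho = 1 - 6 sum (R_i - S_i)^2 / (n (n^2 - 1)).
    r, s = ranks(x), ranks(y)
    n = len(x)
    return 1 - 6 * float(np.sum((r - s) ** 2)) / (n * (n**2 - 1))

# Identity linking the two: sum R_i S_i = C - (1/2) sum (R_i - S_i)^2,
# with C = n(n+1)(2n+1)/6.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(200), rng.standard_normal(200)
n = 200
t = covariance_rank_stat(x, y)
d2 = float(np.sum((ranks(x) - ranks(y)) ** 2))
assert t == -0.5 * d2 + n * (n + 1) * (2 * n + 1) / 6
```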

The higher criticism rank test
The analog of the higher criticism test of (3) is a higher criticism based on the pairwise differences in ranks, Dᵢ := |Rᵢ − Sᵢ|. To be specific, we define

HC_rank := max over 0 ≤ t ≤ n/2 of (Δ(t) − nu(t)) / √(nu(t)(1 − u(t))), where Δ(t) := Σᵢ₌₁ⁿ I{Dᵢ ≤ t},

and u(t) is the probability P₀(Dᵢ ≤ t), which can be expressed in closed form as u(t) = ((2t + 1)n − t(t + 1))/n². Note that in this definition the denominator is only an approximation to the standard deviation of the numerator. The standard deviation has a closed-form expression which can be derived from a more general result of Hoeffding [15, Th. 2], but it is cumbersome and relatively costly to compute (although its computation only needs to be done once for each n). Also, there is a fair amount of flexibility in the choice of the range of thresholds t considered; this particular choice seems to work well enough. Like any other rank test, it is calibrated by permutation (or Monte Carlo if there are no ties in the data).
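The null probability u(t) and the statistic can be sketched as follows (an illustrative sketch; the closed form for u(t) counts the pairs (r, s) in {1, ..., n}² with |r − s| ≤ t, and the binomial-style denominator is the approximation mentioned above, not the exact permutation standard deviation):

```python
import numpy as np

def u_null(t, n):
    """P0(|R_i - S_i| <= t) when R_i, S_i are each uniform on {1, ..., n}:
    there are n(2t+1) - t(t+1) pairs (r, s) with |r - s| <= t, out of n^2."""
    return (n * (2 * t + 1) - t * (t + 1)) / n**2

def hc_rank_stat(r, s):
    """Higher criticism over thresholds t = 0, ..., n/2, comparing the count of
    small rank gaps to its null expectation."""
    n = len(r)
    d = np.abs(r - s)
    best = -np.inf
    for t in range(0, n // 2 + 1):
        u = u_null(t, n)
        stat = (float(np.sum(d <= t)) - n * u) / np.sqrt(n * u * (1 - u))
        best = max(best, stat)
    return best
```

Note that u(0) = 1/n and u(n/2) = 3/4 + 1/(2n), matching the bounds used in the proof below.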

Theorem 2. For the testing problem (2) under the model (5), the higher criticism rank test achieves the detection boundary in the dense and in the moderately sparse regimes.
Proof. As usual, we first control the test statistic under the null, and then analyze its behavior under the alternative.

Under the null hypothesis
We start with the situation under the null hypothesis H₀, where we show that HC_rank is of order at most O(log n), based on a concentration inequality for randomly permuted sums. Fixing a critical value t, define Δ(t) := Σᵢ I{Dᵢ ≤ t}. Since X is independent of Y, as we are under the null, Δ(t) has the same distribution as Aₙ := Σᵢ₌₁ⁿ a_{i,πₙ(i)}, with a_{i,j} := I{|i − j| ≤ t}, where πₙ is a uniformly distributed random permutation of [n] := {1, ..., n}. This implies a moment bound: for q ≥ 1 and some constant c₁ > 0, using the fact that 1/n ≤ u(t) ≤ 3/4 + 1/(2n) when 0 ≤ t ≤ n/2, which is the range of t's we are considering. Hence, choosing q = 2c₁ log n and using the union bound, we have P₀(HC_rank ≥ q) ≤ 2(n/2 + 1) exp(−q/c₁) ≲ 1/n → 0.

Under the alternative hypothesis
We now consider the alternative H₁, and show that HC_rank ≫ log n in probability under the stated condition. For this, it suffices to find some t = tₙ ≤ n/2 such that, for some q = qₙ ≫ log n, the standardized count at t exceeds q with probability tending to 1 (under H₁). Since rank-based methods are invariant with respect to increasing transformations, in the following analysis we simply assume that F = G = Φ.
Let F̂ and Ĝ denote the empirical distribution functions of X₁, ..., Xₙ and Y₁, ..., Yₙ, respectively, and let K := sup|F̂ − Φ| ∨ sup|Ĝ − Φ|. These empirical distribution functions are useful because, by definition, Rᵢ = nF̂(Xᵢ) and Sᵢ = nĜ(Yᵢ). By the Dvoretzky-Kiefer-Wolfowitz (DKW) concentration inequality, there is a universal constant c₀ such that, for any b ≥ 0, the probability that K exceeds b is at most c₀ exp(−nb²/c₀). We choose k = (log n)√n, and with that choice we have I{K ≤ k/n} = 1 − Qₙ, where Qₙ is Bernoulli with parameter bounded by η := c₀ exp(−(log n)²/c₀) (so that Qₙ = O_P(η)).
As for the sum, the Mᵢ are iid, and for an observation (Xᵢ, Yᵢ) that comes from the null component, Xᵢ, Yᵢ are iid standard normal, while when it comes from the contaminated component, Xᵢ, Yᵢ are still marginally standard normal but no longer independent: Yᵢ = √(1 − ρ²) Ỹᵢ + ρXᵢ, where Ỹᵢ is independent of Xᵢ and also standard normal. In the dense regime, remember that 0 < β < 1/2 and ρ = n^{−γ}. We place ourselves above the detection boundary, meaning that we fix γ < 1/2 − β. Here we choose t = n/2 (assumed to be an integer for convenience) and let s := (t − k)/n = 1/2 − k/n. We note that v_s(0) is continuous in s (by dominated convergence), and because s → 1/2 in our setting, the limit may be evaluated at s = 1/2. Indeed, using the fact that Φ(z) ≤ 1/2 if and only if z ≤ 0, the inner integrals are positive by the symmetry of φ, and the inequalities are strict except at z = 0.
Moderately sparse regime. Let I₀ and I₁ index the observations coming from the null and contaminated components, respectively, and split Δ(t) = Δ₀(t) + Δ₁(t) accordingly. We lower bound both terms on the right-hand side, starting with Δ₀(t). To do this, we consider a slightly smaller threshold, specifically t₀ = (1 − ω)t with ω = o(1) specified below, and compare Δ₀(t) with Δ₀(t₀), the latter defined with R⁰ᵢ denoting the rank of Xᵢ among {Xⱼ : j ∈ I₀} and S⁰ᵢ denoting the rank of Yᵢ among {Yⱼ : j ∈ I₀}. Conditional on |I₀| = n₀, Δ₀(t₀) has the same distribution as Δ(t₀) in (11) under the null hypothesis but with n replaced by n₀, so that from (12) we deduce that it has expectation μ := (n₀(2t₀ + 1) − t₀(t₀ + 1))/n₀, and from (13) that Δ₀(t₀) ≥ μ − 8(log n)√(μ ∨ log n) with probability at least 1 − 2/n when n is large enough. (Again, this is conditional on |I₀| = n₀.) Because ε ≪ n^{−1/2} in the present regime, we have |I₀| ≥ n − (log n)√n with probability at least 1 − 1/n when n is large enough. Also, we will choose t below such that √n ≪ t ≪ n, and ω such that ω ≪ 1, so that t₀ ∼ t. Together, this implies that, eventually, the stated lower bound holds with probability at least 1 − 3/n.

We now claim that, with probability tending to 1, Δ₀(t) ≥ Δ₀(t₀). Indeed, by definition of the ranks Rᵢ and modified ranks R⁰ᵢ, we have Rᵢ = R⁰ᵢ + |I₁|F̂₁(Xᵢ), where F̂₁(x) := (1/|I₁|) Σ_{j∈I₁} I{Xⱼ ≤ x} is the empirical distribution function associated with the contaminated X observations. In particular, this holds when |I₀| = n₀, so that |I₁| = n − n₀ =: n₁, and is valid for all i ∈ I₀. At the same time, and with analogous notation, the same relation holds for the Sᵢ, valid for all i ∈ I₀. Combining these, we obtain a bound on Dᵢ in terms of D⁰ᵢ, valid for all i ∈ I₀. Letting F̂₀ denote the empirical distribution function of {Xⱼ : j ∈ I₀}, and K₀, K₁ the corresponding Kolmogorov distances, note that this is conditional on |I₀| = n₀ and that the distributions of K₀ and K₁ depend (implicitly) on n₀ (and n₁). We conclude that, conditional on |I₀| = n₀, a bound holds for any i ∈ I₀. Applying the DKW inequality with the tight constant, we have that K₀ ≤ (log n)/√n₀ and K₁ ≤ (log n)/√n₁ with probability at least 1 − 2/n when n is large enough, and when this is the case, Dᵢ ≤ (n/n₀)D⁰ᵢ + 2(log n)√n₁, assuming that n₀ ≥ n₁. This is given |I₀| = n₀ and (therefore) |I₁| = n₁, and we also know that |I₀| ≥ n − (log n)√n and |I₁| ≤ 2nε with probability at least 1 − 1/n when n is large enough. (We are using that |I₁| ∼ Bin(n, ε) with nε = n^{1−β}, β < 1.) Hence, with probability at least 1 − 3/n, the bound holds for any i ∈ I₀. In particular, if we choose ω = (log n)² max{1/√n, √(nε)/t}, then, with probability at least 1 − 2/n when n is large enough, D⁰ᵢ ≤ t₀ implies that Dᵢ ≤ t for any i ∈ I₀, implying that Δ₀(t) ≥ Δ₀(t₀).
We thus conclude the desired lower bound for Δ₀(t). As for Δ₁(t), as in (15), we have a representation in terms of the contaminated pairs. We choose k = (log n)√n as we did before, so that I{K ≤ k/n} = 1 + O_P(η), with the same η defined previously. As for the sum, Λ₁(t) has the same distribution as a sum of M iid indicators, where M is binomial with parameters (n, ε) and the underlying pairs are iid bivariate normal with standard normal marginals and correlation ρ. In particular, using the fact that Φ has derivative bounded by 1/√(2π) everywhere, the relevant probability is controlled, and applying Chebyshev's inequality, we thus obtain a lower bound on Δ₁(t) that holds as long as the right-hand side diverges.
In the moderately sparse regime, remember that 1/2 < β < 3/4 and ρ = 1 − n^{−γ}. We place ourselves just above the detection boundary, meaning that we fix γ > 4(β − 1/2). We focus on the harder sub-case where, in addition, γ < 2β. In that case, we can fix a such that 1/2 > a > γ/2 and 1/2 − β + γ/2 − a/2 > 0, and set t = n^{1−a}. Note that such a real number a exists, and that t ≤ n/2 with t ≫ k. We also have nε = n^{1−β} and u(t) ≍ t/n ≍ n^{−a}, and Ψ is differentiable at 0 with positive derivative. In particular, nελ((t − k)/n) ≍ n^{1−β+γ/2−a} → ∞. Putting everything together yields the stated conclusion.

We focus on the case where t ≪ n, as the case where t ≍ n can be dealt with in a very similar fashion.

Under the null hypothesis
We first consider the behavior of Δ(t) under the null hypothesis, and argue that Δ(t) is asymptotically normally distributed. This is based on an application of a combinatorial central limit theorem due to Hoeffding [15]. Remember that under H₀, Δ(t) has the distribution of Aₙ = Σᵢ₌₁ⁿ a_{i,πₙ(i)}, where πₙ is a uniformly distributed random permutation of [n] and a_{i,j} = I{|i − j| ≤ t}. We saw that the mean is nu(t), and, as derived in [15], we also have a formula for the variance. We thus have asymptotic normality under the null hypothesis, and therefore, together with the fact that 1 ≪ t ≪ n, we conclude the stated limit, again under the null hypothesis.

Under the alternative hypothesis
We now consider the alternative, again in the very sparse regime and in the most advantageous case where ρ = 1, and show that the same weak limit holds. For this, we follow the arguments of the proof of Theorem 2 in the moderately sparse regime, although in the reverse direction so-to-speak. We use the same notation.
Starting from the decomposition (16), we first show that the first term on the RHS is asymptotically standard normal, and then show that the second term converges to 0 in probability.
First term in (20). For i ∈ I₀, as in (18) but in reverse, we have the corresponding comparison of ranks with probability tending to 1 uniformly over i ∈ I₀. Assuming this is true, then Dᵢ ≤ t implies that D⁰ᵢ ≤ t₀. Hence, with probability tending to 1, Δ₀(t) ≤ Δ₀(t₀). As before, conditional on |I₀| = n₀, Δ₀(t₀) has the same distribution as Δ(t₀) in (11) under the null hypothesis but with n replaced by n₀. This, the fact that |I₀| ≥ n − O_P(√n), and (19), imply the asymptotic normality of the first term, where the O term is o(1) by the fact that t₀/n = o(1). Continuing, with probability tending to 1, the desired bound holds whenever t₀/t → 1 and (t₀ − t)/√t → 0 (using the fact that t ≤ t₀ ≪ n). This is the case exactly when t ≫ (log n)²nε. We now consider the complementary case; in fact, what follows applies whenever t ≤ √n. We use a slightly different strategy. Recall the relation between Rᵢ and R⁰ᵢ for i ∈ I₀; combined with the triangle inequality, and recalling that Xⱼ = Yⱼ when j ∈ I₁, we obtain a bound on Dᵢ. Consider the event Ω on which the corresponding concentration bounds hold, which happens with probability tending to one. Given Ω, we have a bound in terms of D⁰ᵢ. Given {(Xₖ, Yₖ) : k ∈ I₀}, and conditional on (|I₀|, |I₁|) = (n₀, n₁), Wᵢ is binomial with parameters n₁ and Pᵢ := |Φ(Xᵢ) − Φ(Yᵢ)|. As in (17), the latter is bounded by D⁰ᵢ/n + K₀, which itself is bounded (eventually) by 2(log n)/√n under Ω when D⁰ᵢ = d with d ≤ t + 2nε (since we work under the assumption that t ≤ √n). Thus, for such a d, eventually, a bound holds where c₀ is a universal constant. The factor of 2 in the second inequality comes from de-conditioning from {K₀ ≤ (log n)/√n}. In the last line we used the fact that P(Bin(m, q) ≥ k) ≤ (m choose k) q^k, referred to as the Giné-Zinn inequality in [9]. We also have, eventually, a corresponding bound, using the fact that P(D⁰ᵢ = d | |I₀| = n₀) ≤ 2/n₀. Together, this yields the claim. Hence, the second term on the RHS of (22) has expectation of order at most n times the last term in our derivations, which is of order at most (log n)√n ε = o(1).
Since that term is integer-valued, this implies that Δ₀(t) ≤ Δ₀(t₀) with probability tending to one. In particular, (21) applies.

Second term in (20). Consider i ∈ I₁. Because ρ = 1, we have Xᵢ = Yᵢ, and conditional on Xᵢ = z, Rᵢ − 1 and Sᵢ − 1 are iid with distribution Bin(n − 1, p), where p := Φ(z). In particular, Dᵢ has the distribution of |U − V| where U and V are iid with distribution Bin(n − 1, P) and P ∼ Unif(0, 1). Let u₂(t) denote the probability that Dᵢ ≤ t. We want to bound u₂(t) from above.

For p ∈ [0, 1], define g(p) as the probability that |U − V| ≤ t when U and V are iid Bin(n − 1, p), and note that u₂(t) = ∫₀¹ g(p) dp. Define σ² := 2(n − 1)p(1 − p), which is the variance of U − V, and also h(a) := P((U − V)/σ ≤ a). Using the fact that U − V is integer-valued, we compare h with Φ, the standard normal distribution function. Because Φ has derivative bounded by 1/√(2π) everywhere, the first term on the RHS is O(t/σ). For the second term, we use the Berry-Esseen inequality (seeing U and V, each, as a sum of n − 1 iid Ber(p) random variables) to get that it is O(1/σ). Therefore, since t ≥ 1, there is a universal constant c₀ such that g(p) ≤ c₀t/σ. Of course, being a probability, we also have g(p) ≤ 1. Hence, u₂(t) is controlled, and, by Markov's inequality and the fact that |I₁| is binomial with parameters (n, ε), the second term in (20) is o_P(1) for any choice of t when β > 3/4 (very sparse regime). The control under the alternative can be secured in exactly the same way. In particular, it holds that Δ₀(t) ≤ Δ₀(t₀) with probability tending to one, with Δ₀(t₀) having the same asymptotic distribution (Poisson with mean 2t + 1).

Numerical experiments
We consider the same setting as in Section 2 and compare the two nonparametric tests, the covariance rank test and the higher criticism rank test, to the parametric tests. The p-values for the higher criticism rank test are obtained based on 10 5 permutations, while the p-values for the covariance rank test are taken from the limiting distribution based on its correspondence with the Spearman rank correlation.
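The permutation calibration used for the higher criticism rank test can be sketched as follows (an illustrative sketch; `permutation_pvalue` is our naming, and the add-one correction is one standard way to obtain a valid Monte Carlo p-value):

```python
import numpy as np

def permutation_pvalue(stat_fn, r, s, n_perm, rng):
    """Monte Carlo permutation p-value for a rank statistic that rejects for
    large values: permuting one set of ranks simulates the null distribution."""
    observed = stat_fn(r, s)
    hits = 0
    for _ in range(n_perm):
        hits += stat_fn(r, rng.permutation(s)) >= observed
    return (1 + hits) / (1 + n_perm)    # add-one correction for validity

rng = np.random.default_rng(0)
n = 200
r = rng.permutation(n) + 1
s = rng.permutation(n) + 1              # independent ranks: H0 holds
spearman_like = lambda a, b: float(np.sum(a * b))
p = permutation_pvalue(spearman_like, r, s, 500, rng)
```

The same wrapper applies to the higher criticism rank statistic in place of `spearman_like`.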
The results are presented in Figure 2. In finite samples, the higher criticism rank test exhibits substantially more power than the higher criticism test in the dense and moderately sparse regimes. We have no good explanation for this rather surprising phenomenon. However, the higher criticism rank test has no power in the very sparse regime, and neither does the covariance rank test.

Discussion
The power residing in the Vᵢ's. In Proposition 2 we established that the higher criticism test based on U₁, ..., Uₙ achieves the detection boundary in the Gaussian mixture model. It is natural, however, to ask whether one could do better in finite samples by also utilizing V₁, ..., Vₙ. We performed some side experiments to quantify this by comparing the full LRT, meaning the LRT based on (U₁, V₁), ..., (Uₙ, Vₙ); the LRT based on U₁, ..., Uₙ only; and the LRT based on V₁, ..., Vₙ only. We did so in the same parametric setting of Section 2.4. The results are reported in Figure 3, and can be to some extent anticipated from our previous work [1]. In a nutshell, in the dense regime, what matters is the deviation of the variance from 1, and this is felt by all tests, so that the U-LRT and the V-LRT are seen to be as powerful as the full LRT. In the sparse regime, however, we can see that the V-LRT has essentially no power. This is due to the fact that the Vᵢ's in that case have variance 1 + ρ, which is bounded from above by 2, so that no test depending on the Vᵢ's can have any power, as we show in [1]. The U-LRT, which we know to be asymptotically optimal to first order, remains competitive, although now clearly less powerful than the full LRT.
The power of rank tests in the very sparse regime. In Proposition 5 we argued, we hope convincingly, that no test that resembles the higher criticism rank test has any power in the very sparse regime (β > 3/4). This seems clear from the experiments reported in Figure 2. This raises the question of whether there are any rank tests that have any (asymptotic) power in the very sparse regime. We do not know the answer to that question, but are willing to conjecture that there are no such tests.
We did not look at this model, in part because we wanted to test against a monotonic association (in the contamination component), which is perhaps the most popular alternative in a nonparametric context.