Detectability of nonparametric signals: higher criticism versus likelihood ratio

We study the signal detection problem in high-dimensional noisy data (possibly) containing rare and weak signals. Log-likelihood ratio (LLR) tests depend on unknown model parameters and are therefore not applicable in practice, but they are needed as benchmarks to judge the quality of detection tests since they determine the detection regions. The popular Tukey's higher criticism (HC) test was shown to achieve the same completely detectable region as the LLR test does for different, mainly parametric models. We present a novel technique to prove this result for very general signal models, including even nonparametric $p$-value models. Moreover, we address the following questions, which have been pending since the initial paper of Donoho and Jin: What happens on the border of the completely detectable region, the so-called detection boundary? Does HC keep its optimality there? In particular, we give a complete answer for the heteroscedastic normal mixture model. As a byproduct, we give some new insights about the LLR test's behaviour on the detection boundary by discussing, among others, Pitman's asymptotic efficiency as an application of Le Cam's theory.


Introduction
Signal detection in huge data sets is becoming more and more important in current research. The relevant information is often only a small part of the data set and hidden within it. In genomics, for example, it is often assumed that the major part of the genes of patients affected by some common diseases like cancer behaves like white noise and only a minor part is differentially expressed, and only slightly so ([8,15,21]). Consequently, the number of signals as well as the signal strength is small. This circumstance makes it difficult to decide whether there are any signals at all. Other application fields are disease surveillance ([30,34]), local anomaly detection ([35]), cosmology and astronomy ([7,27]). In the last decade Tukey's higher criticism (HC) test ([37,38,39]), modified by Donoho and Jin [12], became quite popular for this kind of problem. The reason for HC's popularity is that the area of complete detection coincides for the HC test and the log-likelihood ratio (LLR) test under different specific model assumptions ([2,3,5,6,12,26]). The LLR test, which achieves the highest power among all tests, cannot be applied since it requires knowledge of the unknown signal strength and proportion. But it serves as an important benchmark and, in particular, determines which kinds of signal alternatives are completely detectable at all. That the HC test can completely separate every completely detectable alternative was also shown within sparse linear regression models and binary regression models ([1,20,33]). To overcome the problem of an unknown noise distribution, Delaigle et al. [9] used a bootstrap version of HC. Moreover, Jager and Wellner [23] suggested a whole family of different tests sharing HC's complete detectability behaviour for the heterogeneous normal mixture model. Recently, Ditzhaus [11] verified that the same is true beyond this specific model.
A lot of related literature about HC's possibilities, even beyond signal detection, can be found in the survey paper of Donoho and Jin [13]. For instance, Hall et al. [18] applied HC for classification.
There are (only) a few results concerning the asymptotic power behaviour of the LLR test on the detection boundary, which separates the area of complete detection and the area of no possible detection, see Cai et al. [5] and Ingster [19] for the heteroscedastic and heterogeneous normal mixture models. Since Donoho and Jin [12] the following question has been pending: How does HC perform on the detection boundary? Does it keep its optimality? Donoho and Jin [12] explicitly pointed out: "Just at the critical point where r = ρ * (1 + o(1)), our result says nothing; this would be an interesting (but very challenging) area for future work." Our paper's purpose is twofold. First, we want to fill the theoretical gap concerning the tests' power behaviour on the detection boundary and give an answer to the question mentioned before. We quantify the asymptotic power of the LLR test by giving the LLR statistic's limit distribution. On the detection boundary the LLR test has nontrivial asymptotic power, whereas the HC test does not. Consequently, HC is not overall powerful. However, our message is not to scrap the idea of HC. Its power behaviour is still optimal beyond the detection boundary for a long list of models. The second purpose of our paper is to add to this list a p-value model with signals coming from a nonparametric alternative.
The paper is organized as follows. In Section 1.1 we introduce the general model and the detection testing problem. For the readers' convenience we add the illustrative Section 1.2, where all main results are presented by discussing our prime example. The asymptotic results about the benchmark LLR tests appear in Section 2. The following Section 3 is devoted to the HC statistic and introduces an "HC complete detection" theorem as well as a "trivial HC power" theorem. Whereas the previous two sections develop the general machinery, Section 4 contains the applications. We discuss a generalization of the illustrative results from Section 1.2 as well as the heteroscedastic normal mixture model. Although the latter was already studied in great detail, we can give some new insights for it. Further examples can be found in Ditzhaus [10,11]. All proofs are relegated to Appendix B.

The model
Let {k_n : n ∈ N} ⊂ N, where k_n → ∞ represents the number of observations. Throughout this paper, if not stated otherwise, all limits are meant as n → ∞. Let the following three mutually independent triangular arrays of rowwise independent random variables be given, where values in different spaces are allowed:
- (Z_{n,i})_{i≤k_n} representing the noisy background, where the distribution P_{n,i} of Z_{n,i} is assumed to be known. In the applications we often assume that P_{n,i} = P_0 depends neither on i nor on n, and P_0 may stand for a distribution of p-values under the null.
- (X_{n,i})_{i≤k_n} representing the signals, where the signal distribution μ_{n,i} of X_{n,i} is typically unknown.
- (B_{n,i})_{i≤k_n} representing the appearance of a signal, where B_{n,i} is Bernoulli distributed with typically unknown success probability 0 ≤ ε_{n,i} ≤ 1.
Instead of these random variables we observe

Y_{n,i} = (1 − B_{n,i}) Z_{n,i} + B_{n,i} X_{n,i}

for all 1 ≤ i ≤ k_n. The vector (Y_{n,1}, ..., Y_{n,k_n}) represents the noise data containing a random amount ∑_{i=1}^{k_n} B_{n,i} of signals. It is easy to check that the distribution Q_{n,i} of Y_{n,i} is given by

Q_{n,i} = (1 − ε_{n,i}) P_{n,i} + ε_{n,i} μ_{n,i} = P_{n,i} + ε_{n,i} (μ_{n,i} − P_{n,i}).   (1.1)

The additional index i, for instance μ_{n,i} instead of μ_n, allows us to treat two-sample or more general kinds of signal alternatives. We are interested in whether there are any signals in the noise data, i.e., whether B_{n,i} = 1 for at least one i = 1, ..., k_n. To be more specific, we study the testing problem

H_{0,n}: ε_{n,i} = 0 for all i   versus   H_{1,n}: ε_{n,i} > 0 for at least one i,   (1.2)

where we observe pure noise (Y_{n,1}, ..., Y_{n,k_n}) = (Z_{n,1}, ..., Z_{n,k_n}) under the null. We are especially interested in the case of rare signals in the sense that

max_{1≤i≤k_n} ε_{n,i} → 0.   (1.3)

In this setting, we distinguish between the sparse (∑_{i=1}^{k_n} ε²_{n,i} → 0), the classical (lim_{n→∞} ∑_{i=1}^{k_n} ε²_{n,i} ∈ (0, ∞)) and the dense signal case (∑_{i=1}^{k_n} ε²_{n,i} → ∞). In the rowwise identical setting, where all quantities, ε_{n,i} = ε_n etc., are independent of i, the parametrization ε_n = n^{−β} for β ∈ (0, 1) is standard. Then β < 1/2 and β > 1/2 correspond to the dense and the sparse case, respectively. We denote ε_n = n^{−1/2}, or in other words β = 1/2, as the classical case since it is the usual rate of convergence when discussing contiguous alternatives. In the classical case nontrivial power results can be obtained by choosing a signal distribution μ_n = μ ≠ P_0 = P_n, whereas in the sparse case, where fewer signals are present, only asymptotically singular μ_n and P_n lead to nontrivial power results. At the same time, asymptotically merging μ_n and P_n lead to nontrivial results in the dense case, where, relatively, a lot of signals occur. While our applications focus on the most interesting sparse case, the technical machinery applies to all three cases.
A huge class of examples for the dense case is examined by Ditzhaus [11].
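For intuition, the observation scheme can be simulated directly. The following sketch (our illustration, not code from the paper; the two sampler callables are hypothetical placeholders) draws one row of observations whose marginal law is the mixture (1.1).

```python
import numpy as np

def sample_row(k_n, eps, noise_sampler, signal_sampler, rng=None):
    """Draw one row (Y_1, ..., Y_{k_n}): Y_i equals the noise Z_i if B_i = 0
    and the signal X_i if B_i = 1, with B_i ~ Bernoulli(eps), so each Y_i
    has the mixture law (1 - eps) P + eps mu as in (1.1)."""
    rng = np.random.default_rng(rng)
    b = rng.random(k_n) < eps          # B_{n,i}
    z = noise_sampler(rng, k_n)        # Z_{n,i} ~ P_{n,i}
    x = signal_sampler(rng, k_n)       # X_{n,i} ~ mu_{n,i}
    return np.where(b, x, z), b

# Example: sparse N(2, 1)-signals in N(0, 1)-noise with eps_n = n**(-0.6)
n = 10_000
y, b = sample_row(n, n ** -0.6,
                  lambda rng, m: rng.standard_normal(m),
                  lambda rng, m: 2.0 + rng.standard_normal(m))
```

With eps_n = n^{−β} the expected number of signals is n^{1−β}, which illustrates why the rowwise identical parametrization is natural.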
Another typical assumption in the signal detection literature is the absolute continuity of the signal distributions with respect to the noise distributions,

μ_{n,i} ≪ P_{n,i} for all 1 ≤ i ≤ k_n and n ∈ N,   (1.4)

which we also suppose throughout this paper. In Section 2.4 we discuss what happens if the assumption of absolute continuity is violated. Following the ideas of Cai and Wu [6] we explain that every model can be reduced to a model such that (1.4) is fulfilled. Convention and notation: Observe that

dQ_{n,i}/dP_{n,i} = 1 + ε_{n,i} (dμ_{n,i}/dP_{n,i} − 1).
The distributions P_{n,i}, μ_{n,i}, Q_{n,i} and the densities (dQ_{n,i}/dP_{n,i}) ∘ pr_i shall lie on the same product space, where the projections pr_i onto the i-th coordinate are suppressed throughout the paper to improve readability. Moreover, we introduce the product measures P^{(n)} = ⊗_{i=1}^{k_n} P_{n,i} and Q^{(n)} = ⊗_{i=1}^{k_n} Q_{n,i}.

Illustration of the results and the main contents
In this illustrative section we give an overview of our results by studying a special nonparametric p-value model. For simplicity we set k_n = n and restrict ourselves to the rowwise identical case, i.e., μ_{n,i} = μ_n etc. Testing results are often presented in terms of p-values since they allow a comparison of different data types on the same platform. In our context, a quantile transformation like p_{n,i} = P_n((Y_{n,i}, ∞)) or p_{n,i} = P_n((−∞, Y_{n,i}]) may be used to obtain p-values. As long as the noise distribution P_n is continuous, the p-values p_{n,1}, ..., p_{n,n} follow under the null a uniform distribution P_0, say, on the unit interval (0, 1). To benefit from this universal platform without too many or too specific model assumptions, we assume in this illustrative section from the beginning that p-values are given and, in particular, P_n = P_0.
Typically, small p-values indicate that the alternative is true, or in our case that signals are present. Respecting this, we suggest signal distributions μ_n with a shrinking support [0, κ_n], where

κ_n = n^{−r} and ε_n = n^{−β}   (1.5)

for some r > 0. Clearly, μ_n and P_0 are asymptotically singular. Hence, this setting is an example of the sparse case and we restrict our considerations to β ∈ (1/2, 1). In order to obtain such μ_n, the interval (0, κ_n) is blown up to (0, 1) and a nonparametric shape function h is used. Let h : (0, 1) → (0, ∞) be a Lebesgue probability density, i.e., ∫ h dP_0 = 1, with ∫ h² dP_0 ∈ (0, ∞), and define the signal distribution by its rescaled Lebesgue density

(dμ_n/dP_0)(x) = κ_n^{−1} h(x/κ_n) 1{x ∈ (0, κ_n)}.   (1.6)

Since it could be too restrictive in practice to consider only measures with a shrinking support, in Section 4.1 we add a "small" perturbation to the densities.
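To make the rescaling construction concrete, here is a small sketch that samples from μ_n for one illustrative shape function of our own choosing, h(x) = 2(1 − x): a draw from h is generated by inverting its distribution function H(x) = 2x − x², and is then shrunk onto (0, κ_n) with κ_n = n^{−r} as in (1.5).

```python
import numpy as np

def sample_signal_pvalues(m, r, n, rng=None):
    """Sample m p-values from mu_n with support (0, kappa_n), kappa_n = n**(-r),
    for the illustrative shape h(x) = 2*(1 - x) on (0, 1); its cdf
    H(x) = 2x - x**2 has inverse H^{-1}(u) = 1 - sqrt(1 - u)."""
    rng = np.random.default_rng(rng)
    kappa_n = float(n) ** (-r)
    u = rng.uniform(size=m)
    return kappa_n * (1.0 - np.sqrt(1.0 - u))  # rescale a draw from h onto (0, kappa_n)
```

Any other density h on (0, 1) with finite second moment could be plugged in via its inverse distribution function.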
To sum up, we have a nonparametric testing problem which can be expressed heuristically as

H_{0,n}: "all p-values are uniformly distributed on (0, 1)" versus H_{1,n}: "a small proportion ε_n of the p-values follows the signal distribution μ_n".

The alternative H_{1,n} is composite since, for example in this specific setting, the signal proportion ε_n and the signal shape function h are unknown. When we talk about the LLR test below, the LLR test corresponding to the true but unknown ε_{n,true} and h_true is meant. This test is optimal for testing H_{0,n} against the simple alternative {ε_n = ε_{n,true}, h = h_true}. In contrast to that, the HC test is designed for the composite alternative while being asymptotically as good as the specific LLR test based on the unknown ε_{n,true} and h_true. The heuristic phrase "being asymptotically as good as" is explained below in more detail.
The following list of the seven problems I–VII and their solutions regarding our prime example gives the reader a first impression and an overview of the results which can be obtained by the general machinery developed in Sections 2 and 3.
I. Determination of the detection boundary: Since the paper of Donoho and Jin [12] the term detection boundary is of great interest for the detection problem. This boundary splits the r-β parametrisation plane into the completely detectable and the undetectable area. For each pair (r, β) from the completely detectable area the LLR test, the optimal test, can completely separate the null and the alternative asymptotically. This means that there is a sequence (ϕ_n)_{n∈N} of LLR tests with nominal levels E_{P^{(n)}}(ϕ_n) = α_n such that α_n → 0 and the power E_{Q^{(n)}}(ϕ_n) under the alternative tends to 1. For each (r, β) from the undetectable area the null H_{0,n} and the alternative H_{1,n} are asymptotically indistinguishable, i.e., the sum of error probabilities tends to 1 for every possible sequence of tests. Hence, no test yields asymptotically better results than a constant test ϕ ≡ α ∈ (0, 1). For the illustrative model we have a nonparametric detection boundary which is independent of the shape function h and given by

ρ(β) = 2β − 1, β ∈ (1/2, 1).   (1.7)

The area where r > ρ(β) (r < ρ(β), resp.) corresponds to the completely detectable area (undetectable area, respectively), see Figure 1.

II. Gaussian limits on the detection boundary? For some parametric models the limit distribution of the log-likelihood ratio test statistic T_n, see below, was determined, e.g. for the heteroscedastic and heterogeneous normal mixture models, see Cai et al. [5] and Ingster [19]. For our model with 1/2 < β < 1 and r = ρ(β) we obtain normal limits: T_n converges in distribution to N(−σ²(h)/2, σ²(h)) under the null and to N(σ²(h)/2, σ²(h)) under the alternative, where σ²(h) = ∫₀¹ h² dP_0.

III. Asymptotic relative efficiency: Suppose the statistician uses the LLR test ϕ_{n,β₂,h₂,α} of asymptotic level α corresponding to parameters (β₂, h₂) while (β₁, h₁) is the true, underlying model. The asymptotic power can then be expressed in terms of Pitman's asymptotic relative efficiency ARE, see Hájek et al. [17], where Φ denotes the distribution function of a standard normal distribution and u_α the corresponding α-quantile, i.e. Φ(u_α) = α. This formula quantifies the loss of power by choosing the wrong β or h. In particular, the LLR test ϕ_{n,β₂,h₂,α} cannot separate the null and the alternative asymptotically, i.e. ARE = 0, if the supports of h₁ and h₂ are disjoint, or if β₁ and β₂ are unequal.

IV. Beyond Gaussian limits on the detection boundary: Non-Gaussian limits of T_n may occur ([5,19]). Here, these limits can be observed if the second moment assumption on h is violated, i.e., if ∫₀¹ h² dP_0 = ∞. In this case the limits are infinitely divisible with nontrivial Lévy measure. These Lévy measures depend heavily on the special structure of h; details can be found in Theorem 4.5.

V. Extension of the detection boundary: We also discuss the case β = 1, whereas a lot of former research focused (only) on β < 1. The case β ≥ 1 was of minor interest since the probability that at least one signal is present equals 1 − (1 − ε_n)^n, which tends to 1 − e^{−1} and 0 if β = 1 and β > 1, respectively. In particular, a pair (β, r) with β > 1 and r > 0 always belongs to the undetectable area. Hence, β > 1 does not need to be studied further. But β = 1 should be taken into account since a new class of limits can be observed. To be more specific, for β = 1 and r > 1 we obtain nonstandard limits, presented in Theorem 4.3, which are expressed in terms of Dirac measures δ_a centered in a ∈ [−∞, ∞], i.e. δ_a(A) = 1{a ∈ A}. As far as we know, such nontrivial limits, where ξ₂ equals ∞ with positive probability, were not observed for the detection issue until now.

VI. Optimality of HC: As already known for different, mainly parametric models, we can show also for the illustrative nonparametric p-value model that the completely detectable regions of the LLR and the HC test coincide. By this we give a further reason why HC is a good candidate for the signal detection problem.

VII. No power of HC on the boundary: We show that on the detection boundary, i.e. β ∈ (1/2, 1) and r = ρ(β), the HC test cannot distinguish between the null and the alternative, whereas the LLR test has nontrivial power, compare to II.
Among others, we apply our results to the model (1.6) in a more general form, e.g. h_{n,i}, κ_{n,i} and ε_{n,i} may depend on i and n. We want to point out that these kinds of alternatives were already studied in the context of goodness-of-fit testing by Khmaladze [28], who used the name spike chimeric alternatives. Finally, we want to mention that our general model and the upcoming results also cover

VIII. discrete models such as the Poisson model of Arias-Castro and Wang [2] (note that only the results concerning LLR tests apply to discrete models).
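The phase-diagram classification from I can be coded in a few lines. As a sketch, we assume here that the nonparametric boundary takes the standard sparse-case form ρ(β) = 2β − 1 on β ∈ (1/2, 1), which is the rate suggested by the second-moment calculation n ε_n² / κ_n = n^{1−2β+r} for this model.

```python
def region(beta, r):
    """Classify a parameter pair (beta, r), beta in (1/2, 1), assuming the
    detection boundary rho(beta) = 2*beta - 1 for the p-value model."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    rho = 2.0 * beta - 1.0
    if r > rho:
        return "completely detectable"
    if r < rho:
        return "undetectable"
    return "detection boundary"
```

For instance, (β, r) = (0.75, 0.6) lies above the boundary ρ(0.75) = 0.5 and is hence completely detectable.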

Asymptotic power behaviour of LLR tests
In this section we discuss the asymptotic power behaviour of LLR tests, which are based on the log-likelihood ratio statistic T_n = log(dQ^{(n)}/dP^{(n)}). These tests depend on the unknown signals and, hence, are not applicable. But they serve as an important benchmark, and all newly suggested tests should be compared with the optimal LLR tests. It is well known that, at least along a subsequence, T_n converges in distribution to a random variable with values in the extended real line [−∞, ∞] under the null as well as under the alternative, see Lemma 60.6 of Strasser [36]. That is why we can assume without loss of generality that

T_n → ξ_1 in distribution under H_{0,n} and T_n → ξ_2 in distribution under H_{1,n},

where ξ_1 and ξ_2 are random variables on [−∞, ∞]. Regarding the phase diagram on the right side of Figure 1, we are interested in the following three different regions/cases:

(i) (Completely detectable) The LLR test ϕ_n = 1{T_n > c_n} with appropriate critical values c_n ∈ R can completely separate the null and the alternative asymptotically, i.e. the sum of error probabilities E_{H_{0,n}}(ϕ_n) + E_{H_{1,n}}(1 − ϕ_n) tends to 0. We will see that this corresponds to ξ_1 ≡ −∞ and ξ_2 ≡ ∞.

(ii) (Undetectable) No test sequence (ϕ_n)_{n∈N} can distinguish between the null and the alternative asymptotically, i.e. we always have E_{H_{0,n}}(ϕ_n) + E_{H_{1,n}}(1 − ϕ_n) → 1. This case corresponds to ξ_1 ≡ 0 ≡ ξ_2.

(iii) (Detectable) The LLR test ϕ_n = 1{T_n > c_n} with appropriate critical values c_n ∈ R can separate the null and the alternative asymptotically, but not completely, i.e. E_{H_{0,n}}(ϕ_n) + E_{H_{1,n}}(1 − ϕ_n) → c ∈ (0, 1).
In the following we refer to the completely detectable and the undetectable case as the trivial cases since the limits of T_n are degenerate. We start by discussing these and present a useful tool to verify these trivial cases/limits of T_n. After that we will see that the same tools can be used to determine the nontrivial limits in the detectable case. In the last two subsections we consider the asymptotic relative efficiency, compare to III from Section 1.2, and explain what to do when condition (1.4) is violated.

Trivial limits
In the proofs we work with different distances between probability measures, among others the Hellinger distance and the variational distance. Using these distances we can classify the different detection regions. We refer the reader to Appendix B for further details. Here, we only present our new tool. Let us introduce for all x > 0 the following two sums

Nontrivial limits
It turns out that only a special class of distributions, ν_1 and ν_2 say, of ξ_1 and ξ_2 may occur. The results fit into the more general framework of statistical experiments: all nontrivial accumulation points with respect to the weak topology of statistical experiments are infinitely divisible statistical experiments in the sense of Le Cam [31], see Le Cam and Yang [32] and [24]. In the following we explain what this means in our situation. Classical infinitely divisible distributions on (R, B) play a key role for our setting. That is why we recall that the characteristic function ϕ of an infinitely divisible distribution on (R, B) is given by the Lévy–Khintchine formula

ϕ(t) = exp( itb − σ²t²/2 + ∫ (e^{itx} − 1 − itx/(1 + x²)) dη(x) ), t ∈ R;

the triple (b, σ², η), consisting of a shift b ∈ R, a Gaussian variance σ² ≥ 0 and the Lévy measure η, is called the Lévy–Khintchine triple and is unique. See Gnedenko and Kolmogorov [16] for more details about infinitely divisible distributions. The following theorem gives us a characterisation of all possible limits of T_n.
According to Theorem 2.2(b) the Lévy–Khintchine triples of ν_1 and ρ = a^{−1} ν_2|_R are closely related to each other. This was already observed in the context of statistical experiments by Janssen et al. [24]. Now that we know the class of all possible limits, the question naturally arises how to determine the distributions of ξ_1 and ξ_2 for a given setting. To answer this question we first observe that by Theorem 2.2(i) the Lévy measures η_1 and η_2 are uniquely determined by their difference M = η_2 − η_1. Combining this with Theorem 2.2(ii) and Theorem 2.2(iii) yields that M, σ²_1 and a = ν_2(R) determine the distributions of ξ_1 and ξ_2 completely. We will see that these three quantities are determined by the limits of the sums given by (2.2) and (2.3). To give a first impression why this is the case, we briefly explain the impact of I_{n,1,x}. Since the summands of T_n fulfill the so-called condition of infinite smallness, i.e. a finite number of summands has no influence on the sum's convergence behaviour, well-known limit theorems for convergence to infinitely divisible random variables can be applied, see, for instance, Gnedenko and Kolmogorov [16]. In the case of real-valued ξ_1 we obtain from these theorems a description of η_1 in terms of the limits of I_{n,1,x} for all x from a dense subset of (0, ∞). If additionally ξ_2 is real-valued, then the same holds for η_2 when we replace P_{n,i} by Q_{n,i}. Combining these and (1.4) yields a corresponding description of M for all x coming from a dense subset of (0, ∞) if both ξ_1 and ξ_2 are real-valued. In the case of a = ν_2(R) = P(ξ_2 ∈ R) < 1 a similar convergence can be observed, namely for I_{n,1,e^x−1}. (a) There is a dense subset D of (0, ∞) and a measure M such that the corresponding equation holds for lim sup_{n→∞} and lim inf_{n→∞} simultaneously.
If (a) and (b) hold, then, using the notation from Theorem 2.2, the limit distributions of ξ_1 and ξ_2 are determined.
(ii) Consider the rowwise identical case with a noise distribution independent of n, i.e. P_{n,i} = P_0, μ_{n,i} = μ_n and ε_{n,i} = ε_n. Thus, Y_{n,1}, ..., Y_{n,k_n} are identically P_0-distributed under the null. By using techniques of extreme value theory it is sometimes possible to verify the required convergence directly. Hence, regarding (2.6), we get the corresponding connection to the Lévy measure η_1 of ξ_1 for all x coming from a dense subset of (0, ∞). This may be useful to get a first impression of how to choose μ_n and ε_n to obtain nontrivial limits.

Asymptotic relative efficiency
In the case of normally distributed limits we have ν_1 = N(−σ²/2, σ²) and ν_2 = N(σ²/2, σ²) for some σ ≥ 0, where N(0, 0) denotes the Dirac measure δ_0 centered in 0.
In the case σ = 0 no test sequence can separate the null and the alternative asymptotically, see Section 2.1. Observe that both normally distributed limits depend only on one parameter, namely σ². In Appendix A, see Theorem A.1, we give many different equivalent conditions for normally distributed ξ_1 and ξ_2; even the conditions of Theorem 2.2 can be simplified in this case. Further equivalent conditions and closely related results can be found in Sections A3 and A4 of Janssen [25]. In this section we restrict ourselves to this kind of limits, excluding the trivial case σ = 0, and discuss the LLR test's power behaviour if the "wrong" signal distributions and/or the "wrong" signal probabilities are chosen for the test statistic. To be more specific, we fix the triangular scheme of noise distributions {P_{n,i} : 1 ≤ i ≤ k_n, n ∈ N} and consider for j = 1, 2 triangular schemes of signal distributions μ^{(j)} = {μ^{(j)}_{n,i}} and signal probabilities ε^{(j)} = {ε^{(j)}_{n,i}}. Let θ_1 = (μ^{(1)}, ε^{(1)}) be the true, underlying model and θ_2 = (μ^{(2)}, ε^{(2)}) the model pre-chosen by the statistician for the LLR test. Denote by T_n(θ_j) and ϕ_n(θ_j) = 1{T_n(θ_j) > c_{n,j}} the LLR statistic and the LLR test for the model θ_j, j = 1, 2. Using Pitman's asymptotic relative efficiency, see Hájek et al. [17], we quantify the loss in terms of asymptotic power if ϕ_n(θ_2) instead of the optimal ϕ_n(θ_1) is used.
Remark 2.7. The assumption γ(θ_j, θ_j) = σ²_j is connected to the classical Lindeberg condition. It is often, but not always, fulfilled if (2.8) holds. For example, it is violated in the case β = 3/4 and r = ρ(β) for the heterogeneous normal mixture model, which is discussed in Section 4.2. The good news is that by a truncation argument we find for every model θ = (μ, ε), for which (2.8) holds, another model θ̃ = (μ̃, ε̃) such that the limit γ(θ̃, θ̃) from (2.9) exists and equals σ² from (2.8), and, moreover, the test's asymptotic behaviour is not affected by replacing θ by θ̃. The details are carried out in Appendix A, see Lemma A.3.
Note that Theorem 2.6 gives the sharp upper bound on the asymptotic power of all tests of asymptotic size α ∈ (0, 1) if (2.8) holds for the underlying model. The asymptotic relative efficiency ARE is a good tool to quantify the loss of power if the wrong LLR test is used. If ARE = 1 there is no loss of power by using ϕ_n(θ_2), and if ARE = 0 the test ϕ_n(θ_2) cannot distinguish between the null and the alternative asymptotically. Consider for a moment the rowwise identical case, i.e. P_{n,i} = P_{n,1}, μ^{(1)}_{n,i} = μ^{(1)}_{n,1} etc. If ARE ∈ (0, 1) then, heuristically, (1 − ARE) · 100% of the observations are wasted. To be more specific, it can be shown that ϕ_n(θ_2) based on all k_n observations (Y_{n,1}, ..., Y_{n,k_n}) achieves the same asymptotic power as the optimal test does when the latter is based on only a fraction ARE of the observations.

Violation of (1.4)
Here, we discuss how to handle a violation of (1.4). This issue was already discussed by Cai and Wu [6], see their Section III.C, in terms of the Hellinger distance to determine the detection boundary. Their idea can be used for our purpose to determine, more generally, the limits of T n , even on the boundary. Instead of the original model it is sufficient to analyse a "closely related" model for which (1.4) is fulfilled.
By Lebesgue's decomposition, see Lemma 1.1 of Strasser [36], there exist a constant λ_{n,i} ∈ [0, 1], a P_{n,i}-null set N_{n,i} as well as probability measures μ̃_{n,i} and ν_{n,i} such that μ̃_{n,i} ≪ P_{n,i}, ν_{n,i}(N_{n,i}) = 1 and μ_{n,i} = (1 − λ_{n,i}) μ̃_{n,i} + λ_{n,i} ν_{n,i}. Now, let Q̃_{n,i}, Q̃^{(n)} and T̃_n be defined as Q_{n,i}, Q^{(n)} and T_n with μ_{n,i} and ε_{n,i} replaced by μ̃_{n,i} and ε̃_{n,i} = (1 − λ_{n,i}) ε_{n,i}, respectively. Clearly, (1.4) is fulfilled for this new model and our results can be applied to determine the limits of T̃_n. Once these are known, we can immediately give the ones of T_n. We can state the results of Corollary 2.8 also in terms of distributions, where ν̃_j denotes the distribution of ξ̃_j.

Power of the higher criticism test
In the previous section we discussed the LLR test, which can be used to detect simple alternatives against the null. An adaptive and applicable test for alternatives from the whole completely detectable area is Tukey's HC test as modified by Donoho and Jin [12]. There are different versions of it. To relax the notation, we decided to use the one dealing with continuously distributed p-values, having a quantile transformation in mind, see also the explanations at the beginning of Section 1.2. The optimality of HC in a discrete model, namely the Poisson means model, was shown by Arias-Castro and Wang [2]. Our results about the LLR statistic in Section 2 are also valid for discrete models, but in this section we only regard continuous ones. The extension to discrete models is a possible project for the future.
The HC statistic for outcomes p_{n,i} ∈ [0, 1] is defined by

HC_n = sup_{t ∈ (0,1)} √k_n |F_n(t) − t| / √(t(1 − t)),

where F_n is the empirical distribution function of the observation vector (p_{n,i})_{i≤k_n}. For every t ∈ (0, 1) we compare the empirical distribution function with the null/noise distribution function t ↦ F(t) = t. This difference is normalized in the spirit of the central limit theorem: for fixed t the resulting fraction is asymptotically standard normally distributed. The interval (0, 1), over which the supremum is taken, can be replaced by (0, α_0), (k_n^{−1}, α_0) or (k_n^{−1}, 1 − k_n^{−1}) for some tuning parameter α_0 ∈ (0, 1), see Donoho and Jin [12]. The test statistic can also be defined without taking the absolute value of the fraction. All these versions of the HC statistic would lead here to the same power results. To improve the readability of this section, we give the results only for the HC version introduced above. By Jaeschke [22], see also Eicker [14], the limit distribution of HC_n is known under the null. We have

lim_{n→∞} P^{(n)}(a_n HC_n − b_n ≤ x) = Λ(x) for all x ∈ R,

where Λ is the distribution function of a standard Gumbel distribution and the following normalisation constants are used:

a_n = √(2 log log(k_n)) and b_n = 2 log log(k_n) + (1/2) log log log(k_n) − (1/2) log(π).
Hence, the test ϕ_{n,HC,α} = 1{HC_n > c_n(α)} with

c_n(α) = a_n^{−1} (b_n − log(−log(1 − α)))

is an asymptotically exact level-α test for α ∈ (0, 1), i.e. E_{H_{0,n}}(ϕ_{n,HC,α}) → α. But we cannot recommend using these critical values based on the limiting distribution since the convergence rate is very slow, see Khmaladze and Shinjikashvili [29]. Since the noise distribution is known, standard Monte Carlo simulations can be used to estimate the α-quantile of HC_n for finite sample sizes. Alternatively, finite recursion formulas for the exact finite-sample distribution can be found in the paper of Khmaladze and Shinjikashvili [29].
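The Monte Carlo recipe just described can be sketched as follows (a minimal illustration, not the authors' code). Since the normalized difference is monotone in t between consecutive jumps of the empirical distribution function, the supremum can be computed exactly by evaluating both one-sided values of F_n at each order statistic.

```python
import numpy as np

def higher_criticism(p):
    """Two-sided HC statistic sup_{t in (0,1)} sqrt(n) |F_n(t) - t| / sqrt(t(1-t)),
    evaluated at the order statistics (left and right limits of F_n)."""
    p = np.sort(np.asarray(p, dtype=float))
    n = p.size
    denom = np.sqrt(p * (1.0 - p))
    right = np.abs(np.arange(1, n + 1) / n - p) / denom  # F_n(p_(i)) = i/n
    left = np.abs(np.arange(0, n) / n - p) / denom       # left limit (i-1)/n
    return np.sqrt(n) * np.maximum(right, left).max()

def mc_critical_value(n, alpha=0.05, reps=2000, rng=None):
    """Estimate the (1 - alpha)-quantile of HC_n under the uniform null."""
    rng = np.random.default_rng(rng)
    stats = [higher_criticism(rng.uniform(size=n)) for _ in range(reps)]
    return np.quantile(stats, 1.0 - alpha)
```

A single very small p-value inflates the statistic strongly, which reflects HC's sensitivity to sparse signals.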
In the following we present our tool for HC.
Theorem 3.1 (Complete detection by HC). Let (v_n)_{n∈N} be a sequence in the interval (0, 1/2) such that a_n^{−1} H_n(v_n) → ∞ and lim inf_{n→∞} k_n v_n > 0. Then a_n HC_n − b_n → ∞ in Q^{(n)}-probability.
Basically, we compare the tails near 0 and 1 of the signal and the noise distribution. This verification method for HC's optimality is an extension of the ones used by Cai et al. [5] and Donoho and Jin [12]. Under the assumptions of Theorem 3.1 the sum of HC's error probabilities tends to 0 for appropriate critical values. In other words, HC can completely separate the null and the alternative.
The same H_n(v) can be used to show that HC has no power under the alternative, i.e. the sum of error probabilities tends to 1 independently of how the critical values are chosen.

Theorem 3.2 (Undetectable by HC). Suppose that P_{n,i} = P_n, ε_{n,i} = ε_n and μ_{n,i} = μ_n do not depend on i. Define H_n(v) as in Theorem 3.1. Moreover, assume that P^{(n)} and Q^{(n)} are mutually contiguous, compare to Remark 2.3. If the corresponding tail conditions hold for some sequences r_n, s_n, t_n, u_n ∈ (0, 1), then HC cannot distinguish between the null and the alternative asymptotically.

Remark 3.3. Suppose that a²_n ∑_{i=1}^{k_n} ε²_{n,i} → 0, which is usually fulfilled for sparse signals. From Hölder's inequality, (a_n/√k_n) ∑_{i=1}^{k_n} ε_{n,i} → 0 follows. Hence, it is easy to see that the statements of Theorems 3.1 and 3.2 remain true if H_n(v) is replaced accordingly.

Nonparametric alternatives for p-values
Here, we discuss a generalisation of the p-value model (1.6). In particular, we suppose P_{n,i} = P_0 = λ|_{(0,1)}, the uniform distribution on (0, 1). In contrast to Section 1.2, we now allow the shape function h_{n,i}, the shrinking parameter κ_{n,i} > 0 and the signal probability ε_{n,i} to depend on i. The assumption that the signal distribution has a shrinking support can be too restrictive in practice. But the approach allows an extension of the model in the sense that we add a perturbation r_{n,i}. Throughout this section we consider signal distributions μ_{n,i} whose densities consist of a rescaled shape part, where h_{n,i} is close to some h ∈ L¹(P_0), and a perturbation r_{n,i} which is "small" in an appropriate sense. Instead of (1.5) we suppose that max_{1≤i≤k_n}(ε_{n,i} + κ_{n,i}) → 0.
Since we already presented the results concerning this model for the rowwise identical case μ n,i = μ n and ε n,i = ε n in Section 1.2, the theorems are stated only in their general versions here.
for some h, h_{n,i} ∈ L²(P_0).

Let θ_j = {(μ^{(j)}_{n,i}, ε^{(j)}_{n,i})_{i≤k_n} : n ∈ N} denote a model for j = 1, 2 such that (4.3) and (4.5) hold for some K^{(j)} ∈ (0, ∞) and h^{(j)} ∈ L²(P_0). Then all assumptions of Theorem 2.6 are satisfied. The detection boundary introduced in (1.7) follows immediately from Theorem 4.1(a) and (b). The asymptotic behaviour on this boundary, discussed in II, can be deduced from Theorem 4.1(c). As stated in V, the case β = 1 is of special interest. If β = 1 and r < 1 then the pair (β, r) = (1, r) belongs to the undetectable area by Theorem 4.1(a). But if, in addition to β = 1, we have either r = 1 or r > 1, then we obtain non-Gaussian limits ξ_1 and ξ_2; note that (4.5) is not fulfilled anymore. Details about the actual limits' distributions are presented in the subsequent Theorem 4.3 and Remark 4.4. Using Theorem 4.1(d) we can calculate the asymptotic relative efficiency ARE if the LLR test ϕ_n(θ_2) is used although θ_1 is the underlying model, see III and the following Remark 4.2. In addition to the rowwise identical scenario, the general formulation of Theorem 4.1 also allows a discussion, for instance, of a two-sample alternative with mainly ε_{n,i} = 0 and only sparse positive ε_{n,i} > 0. If ε^{(1)}_{n,i} = ε^{(2)}_{n,i} and κ^{(1)}_{n,i} = κ^{(2)}_{n,i} in Theorem 4.1(d), then γ(θ_1, θ_2) can be expressed in terms of K^{(1)} = K^{(2)}, h^{(1)} and h^{(2)}. If ε_{n,i} = ε_n and κ_{n,i} = κ_n do not depend on i = 1, ..., k_n then (4.4) is fulfilled for r_n = [k_n/2] if and only if K = ∞ and k_n ε_n → ∞. Combining this and Theorem 4.1 yields the detection boundary presented in I from Section 1.2 and the Gaussian limits introduced in II on this boundary if β < 1. Next, we give the generalisation of the result stated in IV from Section 1.2 concerning the case β = 1. Note that for the statements in Theorem 4.3 and Remark 4.4 we only need h ∈ L¹(P_0), and not h ∈ L²(P_0) as in Theorem 4.1. It is also possible to determine the detection boundary if h ∉ L²(P_0).
In this case we get nontrivial Lévy measures on the whole detection boundary, depending heavily on the shape of h, comparable to the situation in Theorem 4.3(b). In the following we discuss an example for h ∈ L^1(P_0) \ L^2(P_0).
Let us now consider the HC test. Since the given model is one for p-values, the observations do not need to be transformed. Hence, the HC test is based on p_{n,i} = Y_{n,i}. Then the areas of complete detection of the HC and the LLR test coincide. HC cannot distinguish between the null and the alternative asymptotically if r ≤ 1 and r = ρ(β) or r = ρ^#(β, α), respectively, i.e. on the detection boundary.
Moreover, under the model assumptions of Theorem 4.3 with h_{n,i} = h_n, HC cannot distinguish between the null and the alternative asymptotically if β = r = 1.
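A minimal sketch of Tukey's higher criticism statistic in the standardized form of Donoho and Jin [12], computed directly from p-values. The restriction of the maximum to the smallest α_0·n p-values is one common convention and may differ in detail from the variant analysed in this paper:

```python
import numpy as np

def higher_criticism(p, alpha0=0.5):
    """Standardized HC statistic from a vector of p-values:
    HC_n = max_{i <= alpha0*n} sqrt(n) * (i/n - p_(i)) / sqrt(p_(i)(1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the ordered p-values."""
    p = np.sort(np.asarray(p, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    # standardized deviation of the empirical distribution from the uniform
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    keep = i <= alpha0 * n  # search only over the smallest p-values
    return float(hc[keep].max())
```

Under sparse alternatives a few very small p-values inflate the early order statistics, which is exactly what the maximum picks up.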

Heteroscedastic normal mixtures
The heteroscedastic normal mixture model has already been studied extensively in the literature (e.g., [5,12,19]). Nevertheless, as a further application of our results, we can give some new insights about it concerning the extension of the detection boundary and the asymptotic power of the HC test on the boundary. But first we introduce the model. Let k_n = n, P_{n,i} = P_0 = N(0, 1) and μ_{n,i} = μ_n = N(ϑ_n, σ_0^2), σ_0 > 0, where the parametrisation ε_{n,i} = ε_n = n^{−β} and ϑ_n = √(2r log n) with β ∈ (1/2, 1) and r > 0 is used. The detection boundary, given by

  ρ(β; σ_0) = (2 − σ_0^2)(β − 1/2)      if 1/2 < β ≤ 1 − σ_0^2/4 and σ_0^2 < 2,
  ρ(β; σ_0) = 0                          if 1/2 < β ≤ 1 − 1/σ_0^2 and σ_0^2 ≥ 2,      (4.8)
  ρ(β; σ_0) = (1 − σ_0 √(1 − β))^2       otherwise,

and the limits of T_n on it were already determined by Cai et al. [5] and Ingster [19]. The detection boundary is plotted for different σ_0 in Figure 2. Moreover, it was shown that the completely detectable areas of the LLR and HC tests coincide, see Cai et al. [5] and Donoho and Jin [12]. All these results can be proven by using our methods, see Ditzhaus [10]. Note that the HC test is applied to the vector (p_{n,i})_{i≤k_n} of p-values, which we obtain by transforming each observation Y_{n,i} to p_{n,i} = 1 − Φ(Y_{n,i}).
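To make the model concrete, here is a small simulation sketch of the heteroscedastic mixture just described, together with the p-value transform p_{n,i} = 1 − Φ(Y_{n,i}). The function names are ours and the snippet is an illustration, not part of the paper:

```python
import numpy as np
from math import erf, sqrt, log

rng = np.random.default_rng(0)

def sample_heteroscedastic(n, beta, r, sigma0):
    """Draw Y_1, ..., Y_n from (1 - eps_n) N(0,1) + eps_n N(theta_n, sigma0^2)
    with eps_n = n^{-beta} and theta_n = sqrt(2 r log n)."""
    eps = n ** (-beta)
    theta = sqrt(2 * r * log(n))
    signal = rng.random(n) < eps          # sparse signal indicators
    y = rng.standard_normal(n)            # pure noise N(0,1)
    y[signal] = theta + sigma0 * rng.standard_normal(signal.sum())
    return y

def p_values(y):
    """One-sided p-values p_i = 1 - Phi(y_i), via Phi(x) = (1 + erf(x/sqrt(2)))/2."""
    return np.array([1 - 0.5 * (1 + erf(v / sqrt(2))) for v in y])
```

The resulting p-value vector is exactly the input to which the HC test is applied.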
In (4.8) the detection boundary is (only) defined for β < 1. As we already did in the previous section, we can extend this boundary for β = 1 by an infinite vertical line starting at (r, β) = (1, 1), see Figure 2. Again, we observe unusual limits of T_n on this line. The results concerning ARE can also be applied to the heteroscedastic models. Fix the variance parameter σ_0 > 0. Let θ_1 = (β_1, r_1) and θ_2 = (β_2, r_2) represent two models from the linear part (I) of the detection boundary leading to Gaussian limits of T_n. Suppose that the models are different, i.e. β_1 ≠ β_2. By applying Theorem 2.6 and simple calculations, which are left to the reader, ARE = 0 can be shown. That means that the LLR test ϕ_n(θ_2) cannot distinguish between the null and the alternative asymptotically when θ_1 is the true, underlying model. As already mentioned, γ(θ_j, θ_j) = σ_j^2 does not hold if β_j = 1 − σ_0^2/4. In this case one makes use of the truncation Lemma A.3. Cai et al. [5] already considered the dense case β < 1/2. In this case σ_0^2 ≠ 1 always leads to the completely detectable case, independently of how the signal strength ϑ_n is chosen. Thus, only the heterogeneous case σ_0^2 = 1 is of real interest. In this case the parametrisation ϑ_n = n^{−r} is used for r > 0. The corresponding detection boundary is given by ρ(β) = 1/2 − β and is plotted in Figure 2. The HC test achieves the same region of complete detection, see Cai et al. [5]. Our results concerning the tests' power behaviour on the detection boundary can also be applied. In short, on the detection boundary (2.8) holds for some σ > 0 and the HC test has no asymptotic power there. This even extends to a general class of one-parametric exponential families including the dense heterogeneous normal mixtures. Further details concerning the dense case can be found in Ditzhaus [10,11].
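The piecewise detection boundary ρ(β; σ_0) of Cai et al. [5] can be evaluated numerically. The following sketch encodes our reading of the commonly cited form (including the degenerate branch ρ = 0 for σ_0^2 ≥ 2); the branch conditions should be checked against [5]:

```python
from math import sqrt

def rho(beta, sigma0):
    """Detection boundary rho(beta; sigma0) for the heteroscedastic sparse
    normal mixture, beta in (1/2, 1), as attributed to Cai et al. [5]:
    a linear part, a zero part (only for sigma0^2 >= 2), and a curved part."""
    s2 = sigma0 ** 2
    if s2 < 2 and beta <= 1 - s2 / 4:
        return (2 - s2) * (beta - 0.5)          # linear part (I)
    if s2 >= 2 and beta <= 1 - 1 / s2:
        return 0.0                              # detectable for every r > 0
    return (1 - sigma0 * sqrt(1 - beta)) ** 2   # curved part
```

For σ_0 = 1 this reduces to the classical boundary β − 1/2 for β ≤ 3/4 and (1 − √(1 − β))^2 for β > 3/4.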

Appendix A: Gaussian limits
Gaussian limits ξ_1 and ξ_2, compare to (2.8), are of special interest, for example regarding Theorem 2.6. Recall that the degenerate case is included as σ = 0. In the following we give several equivalent conditions for Gaussian limits, among them:
(a) ξ_1 and ξ_2 are Gaussian, or ξ_1 = ξ_2 ≡ 0 with probability one.
To apply Theorem 2.6, γ(θ, θ) = σ^2 is needed, where σ^2 comes from the previous section and θ denotes the underlying model, compare to the notation in Section 2.3. As already mentioned, there are examples for which this equation fails although ξ_1 and ξ_2 are normally distributed. But by truncation we can always ensure the equality without changing the asymptotic results. All our asymptotic results in this paper remain the same if we replace μ_{n,i} and ε_{n,i} by their truncated counterparts μ̃_{n,i} and ε̃_{n,i}.

Appendix B: Proofs
In the following we give all the proofs. These are not given in the order of their appearance since we apply, for example, Theorem 2.4 to verify Theorem 2.2. Before giving the proofs we introduce some useful properties of binary experiments and generalise limit theorems of Gnedenko and Kolmogorov [16] to infinitely divisible distributions.

B.1. Binary experiments and distances for probability measures
Binary experiments classify different types of signal detectability. This gives us a first rough insight into the different detection regions for our signal detection problem. This standard approach is recalled for a sequence of binary experiments {P^{(n)}, Q^{(n)}}, see Strasser [36]. It is easy to show that weak convergence of {P^{(n)}, Q^{(n)}} to {P^{(0)}, Q^{(0)}} implies convergence of the variational distance ||P^{(n)} − Q^{(n)}|| → ||P^{(0)} − Q^{(0)}||. Our three cases can be reformulated as:
• completely detectable: ||P^{(n)} − Q^{(n)}|| tends to 1;
• undetectable: ||P^{(n)} − Q^{(n)}|| tends to 0;
• nontrivial detectability: otherwise.
For product measures the Hellinger distance d is useful. To sum up, we get the following characterisation of the trivial detection regions, using the well-known connection between the variational distance and the Hellinger distance.
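Under one common normalisation (conventions differ by factors of 2 across references; Strasser [36] fixes the version used in the paper), the Hellinger distance d, the affinity ρ and the variational distance are related as follows, which is why the Hellinger distance is convenient for product measures:

```latex
d^2(P,Q) = \frac{1}{2}\int\bigl(\sqrt{dP}-\sqrt{dQ}\bigr)^2, \qquad
\rho(P,Q) = \int\sqrt{dP\,dQ} = 1 - d^2(P,Q),
\]
\[
\rho\Bigl(\bigotimes_{i=1}^{k_n} P_{n,i},\,\bigotimes_{i=1}^{k_n} Q_{n,i}\Bigr)
  = \prod_{i=1}^{k_n} \rho(P_{n,i}, Q_{n,i}),
\qquad
d^2(P,Q) \le \|P-Q\| \le d(P,Q)\sqrt{2-d^2(P,Q)}.
```

The multiplicativity of the affinity reduces the n-fold product experiment to a product of one-dimensional quantities, and the sandwich inequality transfers the trichotomy for ||P^{(n)} − Q^{(n)}|| to the Hellinger distance.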

B.2. Limit theorems
For the readers' convenience let us recall well-known convergence results of Gnedenko and Kolmogorov [16], which we use repeatedly. Let (Y_{n,i})_{1≤i≤k_n} be a triangular array of row-wise independent, infinitesimal, real-valued random variables on some probability space (Ω, A, P).
(i) There is a Lévy measure η on R \ {0} with η((−∞, 0)) = 0 such that the stated convergence holds for all x ∈ C_+(η), i.e. for all continuity points of t ↦ η((t, ∞)), t > 0.
M_n converges weakly to η and lim sup_{n→∞} ∫_{(0,τ_1)} t^2 dM_n(t) < ∞. Thus, we obtain ∫ min(t^2, 1) dη(t) < ∞, which proves that η is a Lévy measure. Define Z_{n,u} = Σ_{i=1}^{k_n} Y_{n,i} 1{Y_{n,i} ≤ u} for all u ∈ D, u > τ_0. By Theorem B.2, Z_{n,u} converges in distribution to X_u, where X_u is infinitely divisible with Lévy–Khintchine triplet (γ_u, σ^2, η_u) and Lévy measure η_u = η|_{(0,u]}. Since η is a Lévy measure, it is easy to verify γ_u → γ as D ∋ u → ∞. By this and Theorem 3.19.2 of Gnedenko and Kolmogorov [16], X_u converges in distribution to X as D ∋ u → ∞, where X ∼ ν. Now, let (u_n)_{n∈N} be a sequence in D which tends to ∞ slowly enough such that Σ_{i=1}^{k_n} P(Y_{n,i} > u_n) → M_0({∞}). Standard arguments, see Theorem 3.2 of Billingsley [4], imply that Z_{n,u_n} converges in distribution to X. The basic idea is to determine the limit distribution of Z_{n,u_n} conditioned on C_n, where the remainder term tends to 0. It remains to show that Z_{n,u_n} tends to X conditioned on C_n. Conditioned on C_n we have Z_{n,u_n} = Σ_{i=1}^{k_n} Y_{n,i} 1{Y_{n,i} ≤ u_n}, and (Y_{n,i} 1{Y_{n,i} ≤ u_n})_{i≤k_n} is a rowwise independent and infinitesimal triangular array. Hence, we can apply Theorem B.2 to Z_{n,u_n} conditioned on C_n. Finally, by basic calculations, Theorem B.2(i)–(iii) are fulfilled for the same η, σ^2 and γ given by the Lévy–Khintchine triplet of the limit X of Z_{n,u_n}.
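As a toy illustration (ours, not part of the proofs) of the triangular-array setting behind Theorem B.2: row sums of Bernoulli(λ/n) variables form a rowwise independent, infinitesimal array whose limit is the infinitely divisible Poisson(λ) law, with Lévy measure η = λδ_1 and Gaussian part σ^2 = 0.

```python
import numpy as np

# Rowwise independent, infinitesimal array: Y_{n,i} ~ Bernoulli(lam/n), i <= n.
# Classical Poisson limit theorem: sum_i Y_{n,i} -> Poisson(lam), an infinitely
# divisible limit with Levy measure lam * delta_1 and sigma^2 = 0.
rng = np.random.default_rng(1)
lam, n, reps = 2.0, 1000, 20000

row_sums = rng.binomial(n, lam / n, size=reps)  # samples of sum_i Y_{n,i}
mean, var = row_sums.mean(), row_sums.var()     # both should be close to lam
```

Mean and variance agreeing (both ≈ λ) is the Poisson fingerprint; no Gaussian component appears because each summand is infinitesimal but not vanishingly rare in aggregate.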

B.3.1. Proof of Theorem 2.1
The statement of Theorem 2.1 follows immediately from the following lemma.

Proof of Lemma B.4. We can deduce a pointwise bound from (B.5); applying it to the integrand in (B.13) with t = ε_{n,i}(dμ_{n,i}/dP_{n,i} · 1(A^c_{n,i,τ}) − 1) yields (B.10). We split the proof of (B.11) into two steps. First, for ε_n^max = max_{1≤i≤k_n} ε_{n,i} we can deduce from ε_n^max → 0 the corresponding estimate for all x > 0. Since dQ_{n,i}/dP_{n,i} is bounded from above by 1 + τ on A^c_{n,i,τ}, combining this with (B.16) gives us the first bound in (B.14) for an appropriate C_τ. Second, set C = 1/(√(τ/2 + 1) + 1) < 1/2 and note the bound for dQ_{n,i}/dP_{n,i} on A_{n,i,τ}. Consequently, the second bound in (B.14) follows.

B.3.2. Proof of Theorem 2.2(b)
The statement follows from Remark (8.6) and Lemma (8.7) of Janssen et al. [24], as we explain in the following. Let C^2_lok(R) be the set of all bounded functions f : R → R that are twice differentiable with continuous derivatives in some neighbourhood of 0. Denote by f^{(k)}(0) the k-th derivative of f at 0. The Lévy–Khintchine triplet of an infinitely divisible measure ν is equal to (γ, σ^2, η) if and only if the generating functional A : C^2_lok(R) → R admits the Lévy–Khintchine representation for all f ∈ C^2_lok(R). For the actual definition of A and more details about it we refer the reader to Janssen et al. [24], in particular to (8.1)–(8.4).
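Up to the choice of truncation function, the Lévy–Khintchine representation of such a generating functional has the classical form. The following sketch uses the truncation x·1{|x| ≤ 1}, which may differ from the centring convention of Janssen et al. [24] (a different truncation only shifts γ):

```latex
A(f) = \gamma f^{(1)}(0) + \frac{\sigma^2}{2} f^{(2)}(0)
  + \int \Bigl( f(x) - f(0) - f^{(1)}(0)\, x\,\mathbf{1}\{|x|\le 1\} \Bigr)\, d\eta(x),
\qquad f \in C^2_{lok}(\mathbb{R}).
```

The three summands correspond exactly to the shift γ, the Gaussian part σ^2 and the Lévy measure η of the triplet.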

B.3.3. Proof of Theorem 2.4
We carried out two different proofs for Theorem 2.4. The first one relies on infinitely divisible statistical experiments and accompanying Poisson experiments, using arguments from Chapters 4, 5, 9 and 10 of Janssen et al. [24]. The second one is based on traditional limit theorems for real-valued random variables. Since the second one is probably easier to follow for readers who are not experts in the field of statistical experiments, we decided to present only the second proof.
At the end of the proof we will verify the following lemma. Let us first assume that (a) and (b) are fulfilled. Define Y_{n,i} as in (B.19). Regarding Lemma B.8 and using typical sub-subsequence arguments, we can assume without loss of generality that Theorem B.2(i) and (ii) as well as Theorem B.3(a) and (b) hold for a measure M_1 (resp. M_2), σ_1 ≥ 0 (resp. σ_2 ≥ 0) and γ_1 ∈ R (resp. γ_2 ∈ R) under P^{(n)} (resp. Q^{(n)}). In particular, by Lemma B.8, σ_1^2 = σ^2. Note that η_j = M_j|_{(0,∞)} is a Lévy measure. From (B.15) we obtain M_1({∞}) = 0 and so ξ_1, the limit of T_n under P^{(n)}, is real-valued. Moreover, max_{1≤i≤k_n} ε_{n,i} → 0. Finally, the proof of the first assertion is completed by Theorem 2.2(b). Now, suppose that ξ_1 is not equal to −∞ with probability one. By Theorem 2.1(a) we have sup_{n∈N} (I_{n,1,τ} + I_{n,2,τ}) < ∞ for all τ > 0. Hence, for each subsequence there is a further subsequence such that (a) is fulfilled for some measure M and (b) for some σ^2. From Theorem 2.2(b) and the first assertion proved above we obtain: ξ_1 is real-valued, and M and σ^2 are uniquely determined by the distribution of ξ_1 and so do not depend on the special choice of the subsequence, which proves the second assertion (and Theorem 2.2(a)).
Proof of Lemma B.8. First, observe that by (B.15) and (1.3) the sum in Theorem B.3(a) is bounded from above under P^{(n)} as well as under Q^{(n)} for all τ > 0. The equivalence of (a)–(e) follows from (B.3) and is standard for binary experiments, see Strasser [36]. The equivalence of (g) and (h) follows from (1.

B.3.6. Proof of Lemma A.3
Let Q̃_{n,i} and Q̃^{(n)} be defined as Q_{n,i} and Q^{(n)} with μ_{n,i} and ε_{n,i} replaced by μ̃_{n,i} and ε̃_{n,i}. For the statement in Lemma A.3 it is sufficient to show that the binary experiments {Q^{(n)}, Q̃^{(n)}} tend weakly to the uninformative experiment. The main task for this purpose is to verify Σ_{i=1}^{k_n} ||Q_{n,i} − Q̃_{n,i}|| → 0, which is left to the reader.
That is why it is sufficient to show the corresponding estimate for some γ > 0. To verify this we apply Chebyshev's inequality: for every real-valued random variable Z on some probability space (Ω, A, P) with finite expectation and every δ > 0 we have

  P(|Z − E(Z)| ≥ δ) ≤ Var(Z)/δ^2.

Consequently, we first need to determine the expectation and variance of Z_n(v) for v ∈ {v_n, 1 − v_n}:

Suppose that (B.26) holds. Then log log(k_n) → ∞ and Var_{Q^{(n)}}(Z_n(v_n)) / E_{Q^{(n)}}(Z_n(v_n))^2 → 0. Let G_n be the distribution function of Q_{n,1}, i.e. G_n(v) = Q_{n,1}([0, v]), v ∈ (0, 1). Let U_1, U_2, . . . be a sequence of independent, uniformly on (0, 1) distributed random variables on the same probability space (Ω, A, P). Note that (U_1, . . . , U_{k_n}) ∼ P^{(n)} and (G_n^{−1}(U_1), . . . , G_n^{−1}(U_{k_n})) ∼ Q^{(n)}, where G_n^{−1} denotes the left continuous quantile function of Q_{n,1}. Moreover, denote the interval (r_n, s_n) ∪ (t_n, u_n) by J_{n,1} and [1 − u_n, 1 − t_n] ∪ [1 − s_n, 1 − r_n] by J_{n,2}. By (3.3) it is easy to see that we can replace r_n by any r̃_n ≥ r_n such that log(r̃_n) = (−1 + o(1)) log(n). In particular, we can assume without loss of generality that k_n r_n ≥ 1 and, analogously, u_n < 1/2. From Corollaries 2 and 3 as well as (1) and (2) of the Theorem of Jaeschke [22], which also hold for the statistics W_n, V_n, W̃_n introduced at the beginning of subsection 2 therein, we can deduce the convergence of a_n sup_{v∈(0,1)\(J_{n,1}∪J_{n,2})} (…) and of a_n sup (…), where the distribution function of Y equals Λ^2, see (3.1). By (B.28), the mutual contiguity of P^{(n)} and Q^{(n)} and the equivalence "G_n(v) ≥ u ⇔ v ≥ G_n^{−1}(u)", it is sufficient for (3.5) to verify a_n sup_{v∈J_{n,1}∪J_{n,2}}