Sharp minimax adaptation over Sobolev ellipsoids in nonparametric testing

Abstract: In the problem of testing for signal in Gaussian white noise, over a smoothness class with an L2-ball removed, minimax rates of convergence (separation rates) are well known (Ingster [24]); they are expressed as the rate at which the ball radius may tend to zero along with the noise intensity such that nontrivial asymptotic power remains possible. It is also known that, if the smoothness class is a Sobolev type ellipsoid of degree β and size M, the optimal rate result can be sharpened towards a Pinsker type asymptotics for the critical radius (Ermakov [9]). The minimax optimal tests in that setting depend on β and M; but whereas in nonparametric estimation with squared L2-loss, adaptive estimators attaining the Pinsker constant are known, the analogous problem in testing is open. First, for adaptation to M only, we establish that it is not possible at the critical separation rate, but is possible in the sense of the asymptotics of tail error probabilities at slightly slower rates. For full adaptation to (β, M), it is well known that a log log n-penalty over the separation rate is incurred. We extend a preliminary result of Ingster and Suslina [25] relating to fixed M and unknown β, and establish that sharp minimax adaptation to both parameters is possible. Thus a complete solution is obtained, in the basic L2-case, to the problem of adaptive nonparametric testing at the level of asymptotic minimax constants.


Introduction and main result
Consider the Gaussian white noise model in sequence space, where observations are Y_j = f_j + n^{−1/2} ξ_j, j = 1, 2, . . . , (1.1) with unknown, nonrandom signal f = (f_j)_{j=1}^∞ and noise variables ξ_j which are i.i.d. N(0, 1). We intend to test the null hypothesis of "no signal" against nonparametric alternatives described as follows. Here ρ^{1/2} is the radius of the open ball B_ρ; for brevity we call ρ itself the "radius". We study this hypothesis testing problem assuming that n → ∞, so that the noise size n^{−1/2} tends to zero, and we expect that for a fixed radius ρ, consistent α-testing in that setting is possible. More precisely, there exist α-tests with type II error tending to zero uniformly over the nonparametric alternative f ∈ Σ(β, M) ∩ B_ρ^c. If now the radius ρ = ρ_n tends to zero as n → ∞, the problem becomes more difficult, and if ρ_n → 0 too quickly, all α-tests will have the trivial asymptotic (worst case) power α.
According to a fundamental result of Ingster [24] there is a critical rate for ρ_n, the so-called separation rate ρ_n ≍ n^{−4β/(4β+1)} (1.2), at which the transition in the power behaviour occurs. More precisely, consider a (possibly randomized) α-test φ in the model (1.1) for the null hypothesis H_0: f = 0, that is, a test fulfilling E_{n,0} φ ≤ α, where E_{n,f}(·) denotes expectation in the model (1.1). For given φ, we define the worst case type II error over the alternative f ∈ Σ(β, M) ∩ B_ρ^c. The search for a best α-test in this sense leads to the minimax type II error; an α-test which attains the corresponding infimum for a given n is minimax with respect to type II error. Ingster's separation rate result can now be formulated as follows: if ρ_n ≍ n^{−4β/(4β+1)} and 0 < α < 1 then 0 < lim inf_n π_n(α, ρ_n, β, M) and lim sup_n π_n(α, ρ_n, β, M) < 1 − α.
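To make the separation-rate phenomenon concrete, here is a minimal Monte Carlo sketch (not from the paper; the sample size n = 4096, the flat signal shape, the truncation level N ≈ n^{2/(4β+1)} and the noncentrality value 5 are all illustrative choices). It simulates the sequence model (1.1) and applies a standardized chi-square statistic over the first N coordinates, the simplest member of the family of quadratic tests used below.

```python
import numpy as np

rng = np.random.default_rng(0)

def chi2_stat(Y, N, n):
    """Standardized chi-square statistic over the first N coordinates:
    U = (sum_{j<=N} Y_j^2 - N/n) / (sqrt(2N)/n); approximately N(0,1) under H_0."""
    return (np.sum(Y[:N] ** 2) - N / n) / (np.sqrt(2.0 * N) / n)

n, beta, alpha = 4096, 1.0, 0.05
N = int(round(n ** (2.0 / (4.0 * beta + 1.0))))   # N ~ n^{2/(4beta+1)}
z_alpha = 1.6449                                   # upper 5% normal quantile

# Flat signal on the first N coordinates, with ||f||^2 = rho calibrated so the
# noncentrality n * ||f||^2 / sqrt(2N) equals 5 (comfortably detectable).
# One checks sum_j j^2 f_j^2 ~ 2.4, i.e. f lies in a Sobolev ellipsoid of moderate size.
rho = 5.0 * np.sqrt(2.0 * N) / n
f = np.zeros(2 * N)
f[:N] = np.sqrt(rho / N)

reps, rej_H0, rej_H1 = 1000, 0, 0
for _ in range(reps):
    xi = rng.standard_normal(2 * N)
    rej_H0 += chi2_stat(xi / np.sqrt(n), N, n) >= z_alpha        # under H_0
    rej_H1 += chi2_stat(f + xi / np.sqrt(n), N, n) >= z_alpha    # under the alternative

level_hat, power_hat = rej_H0 / reps, rej_H1 / reps
print(level_hat, power_hat)
```

The noncentrality of the statistic is n‖f‖²/√(2N); requiring it to stay bounded away from 0 and ∞ with the optimal truncation N ≍ n^{2/(4β+1)} reproduces the squared-radius rate ρ_n ≍ n^{−4β/(4β+1)} of (1.2).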
Moreover, if ρ_n ≫ n^{−4β/(4β+1)} then π_n(α, ρ_n, β, M) → 0, and if ρ_n ≪ n^{−4β/(4β+1)} then π_n(α, ρ_n, β, M) → 1 − α. These minimax rates in nonparametric testing, presented here in the simplest case of an l_2-setting, have been extended in two ways. In the first of these, Ermakov [9] found the exact asymptotics of the minimax type II error π_n(α, ρ, β, M) (equivalently, of the maximin power) at the separation rate. The shape of that result and its derivation from an underlying Bayes-minimax theorem on ellipsoids exhibit an analogy to the Pinsker constant in nonparametric estimation. In another direction, Spokoiny [35] considered the adaptive version of the minimax nonparametric testing problem, where both β and M are unknown, and showed that the rate at which ρ_n → 0 has to be slowed down by a log log n-factor if nontrivial asymptotic power is to be achieved. Thus an "adaptive minimax rate" was specified, analogous to Ingster's nonadaptive separation rate (1.2), where the additional log log n-factor is interpreted as a penalty for adaptation. However, this result did not involve a sharp asymptotics of the type II error in the sense of [9].
It is noteworthy that in the problem of nonparametric estimation of the signal f over f ∈ Σ(β, M ) with l 2 -loss, where the risk asymptotics is given by the Pinsker constant, there is an array of results showing that adaptation is possible with neither a penalty in the rate nor in the constant, cf. Efromovich and Pinsker [7], Golubev [16], [17], Tsybakov [36]. The present paper deals with the question of whether the sharp risk asymptotics for testing in the sense of [9] can be reproduced in an adaptive setting, in the context of a possible rate penalty for adaptation.
Let us present the well known result on sharp risk asymptotics for testing in the nonadaptive setting. Let Φ be the distribution function of the standard normal, and for α ∈ (0, 1) let z_α be the upper α-quantile, such that Φ(z_α) = 1 − α. Write a_n ≫ b_n (or b_n ≪ a_n) iff b_n = o(a_n), and a_n ∼ b_n if a_n = b_n(1 + o(1)). Furthermore, we write a_n ≍ b_n if both a_n = O(b_n) and b_n = O(a_n) hold. Proposition 1.1 (Ermakov [9]). Suppose α ∈ (0, 1) and that the radius ρ_n tends to zero at the separation rate, more precisely ρ_n ∼ c n^{−4β/(4β+1)} (1.5) for some constant c > 0. This gives the sharp asymptotics for the minimax type II error at the separation rate, analogous to the Pinsker constant [33] for nonparametric estimation. The optimal test attaining the bound of (ii) above, as given in [9], depends on β and M. Concerning adaptivity in both of these parameters, the following result is known. Proposition 1.2 (Spokoiny [35]). Let J be a subset of (0, ∞) × (0, ∞) such that there exist M > 0, β_2 > β_1 > 0 and (i) If t_n ≪ (log log n)^{1/2} and ρ_n ∼ c · (t_n n^{−1})^{4β/(4β+1)} for some c > 0, then for any sequence of tests φ_n satisfying E_{n,0} φ_n ≤ α + o(1) we have sup_{(β,M)∈J} Ψ(φ_n, ρ_n, β, M) ≥ 1 − α + o(1).
(ii) For some β* > 1/2 and 0 < M_1 ≤ M_2, let the class be as above. Then there exists a constant c_1 = c_1(β*, M_1, M_2) and a sequence of tests φ_n satisfying the stated relations. Here the criterion used to evaluate a test sequence includes the worst case type II error over a whole range of β, M. Hence the critical radius rate (1.8) has to be interpreted as an adaptive separation rate. It differs by a factor (log log n)^{2β/(4β+1)} from the nonadaptive separation rate (1.2); this factor is an instance of the well-known phenomenon of a penalty for adaptation. Furthermore, as noted in [35], a degenerate behaviour occurs here, in that both error probabilities at the critical rate tend to zero. Thus any sequence φ_n of tests fulfilling (1.9) should be seen as adaptive rate optimal, comparable to rate optimal tests in the nonadaptive case (that is, tests fulfilling lim sup_n Ψ(φ_n, ρ_n, β, M) < 1 − α at ρ_n given by (1.2)). In Ingster and Suslina [25], chap. 7, the worst case adaptive error (1.9) is further analyzed, with a view to a sharp asymptotics; essentially, a test is developed there which is sharp minimax adaptive over β for known M. We address this subject in our Sections 1.2 and 4, where the results of [25] are extended towards full minimax sharp adaptivity over β and M.
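As a quick numeric illustration of the adaptive penalty (not from the paper; β = 2 and n = 10^6 are arbitrary choices), the following sketch compares the nonadaptive squared-radius scaling of (1.2)/(1.5) with the adaptive one and recovers the factor (log log n)^{2β/(4β+1)}:

```python
import math

def separation_radius(n, beta, c=1.0, adaptive=False):
    """Squared-radius scaling: nonadaptive rho_n = c * n^{-4beta/(4beta+1)} (1.2);
    adaptive rho_n = c * (t_n / n)^{4beta/(4beta+1)} with t_n = (log log n)^{1/2}."""
    e = 4.0 * beta / (4.0 * beta + 1.0)
    t_n = math.sqrt(math.log(math.log(n))) if adaptive else 1.0
    return c * (t_n / n) ** e

beta, n = 2.0, 10 ** 6
penalty = separation_radius(n, beta, adaptive=True) / separation_radius(n, beta)
# The ratio is exactly (log log n)^{2beta/(4beta+1)}:
print(penalty, math.log(math.log(n)) ** (2.0 * beta / (4.0 * beta + 1.0)))
```

Even at n = 10^6 the penalty factor is only about 1.5, which is consistent with its interpretation as a mild but unavoidable price for adaptation.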

Adaptation over M only
Assume initially that β is fixed while we aim for adaptation over the ellipsoid size M. First, we present a negative result for adaptation at the classical separation rate (1.2).
This result states that adaptation just over M is impossible at the separation rate.
We now modify the criterion by enlarging the radius slightly and examining how the minimax error approaches zero. To be specific, we replace the constant c in (1.5) by a sequence c_n tending to infinity slowly. In that case the minimax type II error bound of Proposition 1.1, namely Φ(z_α − A(c, β, M)/2), will tend to zero (since A(c, β, M) as defined in (1.6) contains a factor c^{2+1/(2β)}). When the log-asymptotics of this error probability is considered, as in moderate and large deviation theory, it turns out that adaptation to Ermakov's constant is possible.
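The switch to log-asymptotics can be illustrated numerically: for the normal tail one has log Φ(−x) = −(x²/2)(1 + o(1)) as x → ∞, which is the sense in which bounds of the form Φ(z_α − A(c_n, β, M)/2) are compared on the log scale. A small sketch (illustrative, not from the paper):

```python
import math

def log_norm_tail(x):
    """log Phi(-x), using the identity Phi(-x) = erfc(x / sqrt(2)) / 2."""
    return math.log(math.erfc(x / math.sqrt(2.0)) / 2.0)

# Moderate/large deviation log-asymptotics: log Phi(-x) = -(x^2/2)(1 + o(1)).
ratios = [log_norm_tail(x) / (-x * x / 2.0) for x in (2.0, 5.0, 10.0)]
print(ratios)  # decreases towards 1 as x grows
```

The first-order term −x²/2 dominates for large x, so two tests whose type II errors differ only in lower-order terms become indistinguishable in this criterion; this is exactly the slack that makes adaptation to M possible.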

P. Ji and M. Nussbaum
To complement this result, a formal argument is needed that no α-test can do better in the sense of the log-asymptotics over radii ρ_n for the error of second kind. In Ermakov [12] the nonadaptive sharp asymptotics is studied in the above setting, where the type II error probability tends to zero.
This result is implied by Theorem 3 in [12], and hence the proof is omitted. In conjunction with Theorem 1.2, this proposition implies that if one switches to an error criterion expressed in the rate exponent of a slowly decaying error probability, then there is no penalty for adaptation. It is obvious from [12] that the bound (1.10) is nontrivial, in the sense that it identifies as optimal the quadratic tests using the optimal filtering weights found by Ermakov [9]; the bound is not attained, e.g., by tests with projection weights (i.e. weights from {0, 1}).
It is of interest to consider a certain dual formulation of Theorems 1.1 and 1.2, where the radius ρ_n is allowed to depend on the ellipsoid parameters β and M, and a certain prescribed type II error level is to be attained, such as Φ(z_α − d) for fixed d > 0. This formulation might be called the variable radius approach. A test is then optimal if it attains a given type II error level over the complement of sufficiently small balls. The variable radius approach has been crucially used in [35] for expressing the rate penalty for adaptation (cf. Proposition 1.2); we will also adopt it here for our sharp adaptation results. Note that in this setting, the sharp type II error asymptotics is encoded in the radius ρ_n.
Consider first the nonadaptive setting of Proposition 1.1. In this case, the connection between type II error level and optimal radius can easily be obtained by rescaling from Proposition 1.1: for a given d > 0, the constant c is determined by the relation involving A_0(β) given by (1.7), or equivalently, the radius ρ_{n,M} is given by (1.12). Then there is no test φ_n satisfying E_{n,0} φ_n ≤ α + o(1) and both relations. (ii) Assume d_n → ∞ but d_n = o(n^K) for every K > 0, and that ρ_{n,M} is given by (1.13). As with Theorem 1.2, to complement Theorem 1.3 a formal argument can be given that no α-test, possibly depending on M, can do better in the sense of the log-asymptotics over radii ρ_{n,M} for the error of second kind. Such a result is analogous to Proposition 1.3 and is implicit in [12].

Adaptation over β and M
The adaptivity result of Spokoiny [35], discussed in Proposition 1.2, about the rate penalty for adaptation (log log n)^{2β/(4β+1)}, does not provide a sharp risk asymptotics in the sense of either Proposition 1.1 or our Theorem 1.3. Some important results in this direction, however, are presented in section 7.1.3 of Ingster and Suslina [25]. Indeed, in [25] the solution is presented for unknown β ∈ [β_1, β_2] but fixed M. It should be noted that adaptation to β only, with M assumed known, does not have a practical interpretation in the context of smooth functions. We will address here the problem of a sharp risk bound for adaptation to the full parameter (β, M). For the analogous problem in the estimation case (regarding the Pinsker bound), solutions have been presented by Golubev [17] and Tsybakov [36], sec. 3.7.
Our result can be summarized as follows: the lower asymptotic risk bound for known M , unknown β ∈ [β 1 , β 2 ] of [25] is achievable even for unknown M , by a refinement of the Bonferroni-type tests used to treat adaptation to β. Thus there is no further penalty for adaptation to M , in addition to the log log n-type penalty already incurred by adaptation to β.
We begin by stating the lower asymptotic risk bound for known M, unknown β ∈ [β_1, β_2], a variation of Theorem 7.1 in [25]. Assume that 0 < β_1 < β_2 are given, as well as some M > 0. Let D ∈ R be arbitrary and define a radius sequence ρ_{n,β,M} by (1.14). In [25] the l_2-Sobolev ellipsoids represent a boundary case and are therefore not covered, but the above bound can be proved by very similar methods (cf. Section 4.2 below). Note that part (i) of Proposition 1.2 is implied by (1.15) by letting D → −∞.
As to the attainability of this bound, the test provided in section 7.3 of [25] depends on M. Indeed, in [25] observations are assumed to be such that R → ∞ and r/R → 0 (the "power norm" case in [25], where p = q = 2, s = β; also r is ρ in [25]). This observation model is equivalent to ours upon setting R² = nM, r² = nρ, and then Y_j = n^{−1/2} X_j, f_j = n^{−1/2} v_j. The reasoning provided in section 7.3.2 of [25] makes it clear that the test constructed there uses solutions of an extremal problem in which r² = nρ_{n,β,M}, with ρ_{n,β,M} from (1.14) and β from a certain grid of values in (β_1, β_2). Since in particular R = n^{1/2} M^{1/2}, it turns out that the test depends on M, though it has been made independent of β ∈ (β_1, β_2). A version of such results for α_n-tests with α_n → 0 is given in [26].
The following theorem extends the result of [25] about attainability of the bound (1.15) for fixed M towards full adaptivity over (β, M). Theorem 1.4. Let D ∈ R be arbitrary and define a radius sequence ρ_{n,β,M} by (1.14). Assume a nonempty interval

Further discussion
To further discuss the context of the main results, we note the following points.
It is an open question whether an adaptive analog of (1.17) holds. For standardized sums S_n of independent random variables, if {S_n > x_n} is a large or moderate deviation event, theorems on the relative error incurred by replacing the exact distribution of S_n by its limiting distribution are sometimes called strong large or moderate deviation theorems, to distinguish them from first order results on log P(S_n > x_n). For background cf. [32], [22], [2], chap. 11.
The detection problem. Instead of focusing on the worst case type II error Ψ(φ, ρ, β, M) (1.3) of α-tests φ, one may consider minimization of the sum of errors, that is, of E_{n,0} φ + Ψ(φ, ρ, β, M), over all tests φ. This has been called the detection problem in the literature; in [25] this problem is largely treated in parallel to the one for α-tests. There and in [23] one finds the analog of the nonadaptive sharp asymptotics of Proposition 1.1. It may be conjectured that analogs of our Theorems 1.1-1.4 concerning adaptivity hold there as well.
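The detection criterion can be illustrated in the simplest simple-vs-simple Gaussian case (a sketch, not the nonparametric problem itself): for N(0, 1) against N(a, 1), the sum of errors of the threshold test {X > t} is minimized at t = a/2, with minimal value 2Φ(−a/2).

```python
import math

def Phi(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def sum_of_errors(t, a):
    """Type I plus type II error of the test {X > t} for N(0,1) vs N(a,1)."""
    return (1.0 - Phi(t)) + Phi(t - a)

a = 2.0
ts = [i / 1000.0 for i in range(-2000, 4001)]
t_best = min(ts, key=lambda t: sum_of_errors(t, a))
print(t_best, sum_of_errors(t_best, a), 2.0 * Phi(-a / 2.0))  # minimizer at a/2
```

Unlike the α-test formulation, no level constraint is imposed here, which is why in the nonparametric detection problem the two error probabilities are balanced rather than fixed at α.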
The sup-norm problem. Lepski and Tsybakov [29] proved a sharp minimax result in testing when the alternative is a Hölder class (denoted H(β, M), say) with a sup-norm ball removed, which is a testing analog of the minimax estimation result of Korostelev [27] and also a sup-norm analog of Ermakov [9]. For adaptive minimax estimation with unknown (β, M) in the sup-norm case cf. [19]; for the testing case where β is given, Dümbgen and Spokoiny [6] established a sharp adaptivity result with respect to the size parameter M only. The result in Theorem 2.2 of [6] can be seen as an analog of our Theorem 1.3, although the methodology in the sup-norm case is quite different due to the connection to deterministic optimal recovery, cf. [29]. The case of unknown (β, M) seems to be an open problem in the sup-norm testing case with regard to sharp minimaxity, although in [6] a test is given which is adaptive rate optimal without a log log n-type penalty. Rohde [34] discusses the sup-norm case for regression with non-Gaussian errors, combining methods of [6] with ideas related to rank tests.
Density, regression and other models. The phenomenon of the log log n-type penalty in the rate for adaptation when an L2-ball is removed, as found in [35], has also been established in a discrete regression model [15], and in density models with direct and indirect observations [13], [1]. Testing in a white noise model with composite hypotheses derived from a shifted curve model has been treated in [3]. In a regression context, composite null hypotheses given by a parametric family have been considered in [21]. For a review of adaptive separation rates and further results in a Poisson process model cf. [14]. For sharp minimax testing in non-Gaussian models (the nonadaptive theory) cf. Ermakov [10] and references therein; for the analogous topic in estimation cf. [18], [31]. An interesting connection to random matrix theory has recently been made in [5] by establishing a Pinsker type constant for estimation in high dimensional regression models.
The structure of the paper is as follows. In Section 2 we prove the negative results of Theorems 1.1 and 1.3 (i), that adaptation over M fails at the separation radius (at rate n^{−4β/(4β+1)}). Section 3 presents the proofs that adaptation over M is possible if the radius is slightly enlarged, i.e. the proofs of Theorem 1.2 and its dual version Theorem 1.3 (ii). The proof of Theorem 1.4, about the existence of adaptive tests in the two parameter framework (β, M), is presented in Section 4; for completeness, a proof of the lower bound of Proposition 1.4 is also included. In the Appendix some technical auxiliary results are collected.

Proof of the negative result at separation rate
The following lemma will serve to prove the result of Theorem 1.1 and its version in the variable radius setting (Theorem 1.3 (i)) in a unified way.
Assume there exists a test sequence φ_n satisfying, for some α > 0, both relations. The proof will be carried out in several steps. For brevity we write A_i = A(c_i, β, M_i), i = 1, 2, in this section (cp. (1.6)). Let λ(M, ρ), μ(M, ρ) be the solutions of (A.1) provided by Lemma A.1, and for some ε ∈ (0, 1) set the quantities below; then, according to (A.1), the stated identities hold. Proof. It suffices to show (2.5) and (2.6). We have, by the first relation of (2.4): according to Lemma A.1, relation (A.5), the latter quantity is O(ρ), which establishes (2.5). Furthermore, by the second relation of (2.4): according to Lemma A.1, relation (A.6), which establishes (2.6).
As a consequence, (2.7) holds. Recall Y_j = f_j + n^{−1/2} ξ_j. Let Q_{n,0} be the prior distribution where f = 0 a.s., and consider the resulting joint distributions of (Y_j)_{j=1}^∞ under the priors Q_{n,i}, i = 0, 1, 2. Combining this with (2.2), (2.1) and (2.7) gives the stated bound. The likelihood ratio of π_{n,i} against π_{n,0} is written using the notation γ_{j,i}^2 := n f_{0,j,i}^2. Set g̃_{j,i} := γ_{j,i}^2/(1 + γ_{j,i}^2); then by the factorization theorem it is seen that the bivariate statistic (2.8) is sufficient for the family of distributions {π_{n,i}, i = 0, 1, 2}. Since only finitely many g̃_{j,i} are nonzero, the scalar product ⟨g̃_i, z⟩ and the Euclidean norm ‖g̃_i‖ are well defined. Set g_i = g̃_i/‖g̃_i‖ and define the bivariate statistic T_n, which is equivalent to (2.8) and thus sufficient in {π_{n,i}, i = 0, 1, 2}. Write the induced family for T_n as {π^T_{n,i}, i = 0, 1, 2} with corresponding expectations E^{π,T}_{n,i}, and take the conditional expectation φ*_n(T_n) = E^π_{n,·}(φ_n | T_n). By sufficiency, the (possibly randomized) test φ*_n(T_n) for the null hypothesis π^T_{n,0} against the alternatives π^T_{n,i}, i = 1, 2 is as good as φ_n (cf. e.g. Theorem 4.66 in [30]). Then we have the following lemma, which is proved later. Lemma 2.3. As n → ∞, for fixed ε > 0, each distribution π^T_{n,i}, i = 0, 1, 2 converges in total variation to π^T_{0,i} := N(μ_i, Σ), i = 0, 1, 2, respectively, where μ_0 = (0, 0). The proof follows below. Here both the families π^T_{n,i}, i = 0, 1, 2 and their limits π^T_{0,i}, i = 0, 1, 2 depend on ε. It is then obvious that there exists a sequence ε_n → 0 such that π^T_{n,i}, i = 0, 1, 2 converges in total variation to a limit family defined as in the lemma above, with δ_ε replaced by 1, still denoted π^T_{0,i}, i = 0, 1, 2. Then by the weak compactness theorem (cf. [28], A.5.1), there exist a test φ* and a subsequence φ*_{n_k} such that φ*_{n_k} converges weakly to φ*.
Thus (2.14) holds. Consider now the Neyman-Pearson test for N(0, Σ) against a simple hypothesis N(μ_i, Σ). Here the type II error is Φ(z_α − ‖Σ^{−1/2} μ_i‖), and we find the expressions for μ_1, μ_2 (from (2.9)-(2.11) for ε = 0). Assume now that M_1/c_1 < M_2/c_2; then it follows that 0 < r < 1. Since μ_1 is a multiple of the vector (1, r) and μ_2 is a multiple of the vector (r, 1), it follows that μ_1, μ_2 and the origin are not on the same line. We shall show that in that situation, a UMP test for the alternative {N(μ_1, Σ), N(μ_2, Σ)} does not exist. Indeed, the log-likelihood ratio of N(μ_i, Σ) against N(0, Σ) has the stated form; but since these two types of tests can never coincide, for any choice of thresholds, no uniformly best test exists. To check the CLT in distribution for T_n under π_{n,0}, it suffices to verify the Lindeberg-type condition; this follows from relation (A.13) in Lemma A.2 in conjunction with N_1, N_2 → ∞, cf. (A.2). It remains to check the CLT in total variation. Consider the first component of T_n. Here at most [N_1] of the g̃_{j,1} are nonzero, so one may apply Lemma A.4, identifying the sample size m there with [N_1]. Then the required condition follows again from (A.13). To check the condition on the characteristic function of z_1, note that |φ|² is integrable. Hence the two marginal distributions of T_n satisfy the CLT in total variation. A straightforward extension of Lemma A.4 to bivariate coefficients c_{jn} = (c_{jn,1}, c_{jn,2}), for which the limit of Σ_j c_{jn,1} c_{jn,2} exists, establishes the result for the law of the vector T_n under π_{n,0}.

Proofs for adaptation over M
Assume f ∈ Σ(β, M), with M fixed, and set, for a fixed γ ∈ [0, (4β + 1)^{−1}), the radius as in (3.1), where c_n is a sequence fulfilling (3.2) such that ρ_n = o(1). Define the test statistic T_n. The following lemma concerning the nonadaptive test (3.6) is similar to Theorem 4 in [12]; for clarity of exposition we present a proof here.
Lemma 3.1. Under assumptions (3.1), (3.2) and (3.5), the test φ_n = φ_n(M, ρ_n) has the stated properties. Proof. Under H_0 we have f = 0. In this case T_n = U_n, where U_n is given by (3.9). In view of Σ_{1≤j≤N} d_j^2 = 1, E z_j = 0 and Var(z_j) = 1, for the convergence in law of U_n to N(0, 1) it suffices to show that max_{1≤j≤N} |d_j| = o(1). Recall that d_j = d̃_j/‖d̃‖, where d̃_j ≤ 1, and observe that, according to Lemma A.1, this maximum is indeed o(1), so that U_n ⇒ N(0, 1). Consider now E_{n,0} φ_n = P_{n,0}(U_n ≥ z(α_n)).
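The CLT step above can be checked numerically: with z_j = (ξ_j² − 1)/√2 and normalized weights satisfying max_j |d_j| small, the statistic U_n = Σ_j d_j z_j is approximately N(0, 1). A seeded sketch (the weight shape 1 − (j/N)² and N = 400 are illustrative choices, not the paper's extremal weights):

```python
import numpy as np

rng = np.random.default_rng(1)

N = 400
d_tilde = 1.0 - (np.arange(1, N + 1) / N) ** 2   # illustrative weights, 0 <= d_tilde_j <= 1
d = d_tilde / np.linalg.norm(d_tilde)            # normalization: sum_j d_j^2 = 1

# z_j = (xi_j^2 - 1)/sqrt(2) has mean 0 and variance 1; since max_j |d_j| is
# small, the weighted sum U_n = <d, z> should be close to N(0, 1).
reps = 5000
xi = rng.standard_normal((reps, N))
z = (xi ** 2 - 1.0) / np.sqrt(2.0)
samples = z @ d

print(d.max(), samples.mean(), samples.var())
```

The condition max_j |d_j| = o(1) is exactly what controls the Lindeberg terms here: a single dominant weight would leave a visible chi-square skewness in the limit.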
We now make use of some further results collected in the Appendix. By Lemma A.6, the set B_ρ^c in the supremum may be replaced by B⁰_ρ defined in (A.16). Set then T⁰_n(f) as below, where L_n(d, f) denotes the functional (A.17) and U_n is defined by (3.9). Note that, by using (3.10) and f ∈ B⁰_ρ, and then (A.2), the current choice ρ = ρ_{n,M} ≍ c_n n^{−4β/(4β+1)} implies (3.14). By splitting into the event −T⁰_n(f) ≤ 1 and its complement, we find (3.15). Defining ž_n = z(α_n) + 1, we now have the bound below. Again invoking Lemma A.5 of the Appendix, with c_{jm} and m as above but now setting Y_j = −z_j and a_m = h_n, we find the corresponding estimate. Note that by (3.14), (3.18) and (3.19) we have τ_n = o(1).

With (3.18) and (3.19) this yields
The uniformity claim can be checked using part (iii) of Lemma A.1 and the uniformity implicit in Lemma A.5. Proof. Let L_n → ∞ be a sequence satisfying L_n = O(c_n). Using the notation M^{(i)} for the smoothness bounds M_i, i = 1, 2 in the lemma, define a grid of values in [M^{(1)}, M^{(2)}]. Consider again the test statistic T_n given by (3.3) and observe that it depends on M and ρ (via (A.1)); we use the notation T_n(M, ρ) to indicate that dependence. Define the test statistics ψ_{n,l} accordingly. First check that φ⁰_n is an α-test: by Bonferroni's inequality, P_{n,0}(φ⁰_n = 1) ≤ Σ_{1≤l≤L_n} P_{n,0}(ψ_{n,l} = 1). (3.22) We claim that P_{n,0}(ψ_{n,l} = 1) ≤ α L_n^{−1}(1 + o(1)) uniformly over M ∈ [M^{(1)}, M^{(2)}], which implies (3.21).
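The Bonferroni step (3.22) can be illustrated by an exact computation in an idealized setting (a sketch, not the paper's construction: the grid statistics are treated here as independent N(0, 1) under H_0, whereas in the proof they are dependent and only asymptotically normal). With per-test level α/L_n, the overall size stays at most α:

```python
import math

def Phi(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def upper_quantile(p, lo=-10.0, hi=10.0):
    """z with 1 - Phi(z) = p, by bisection (illustrative helper)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - Phi(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha, L = 0.05, 25                 # L plays the role of the grid size L_n
z = upper_quantile(alpha / L)       # per-test threshold z(alpha / L_n)

# Exact size of "reject if any of L independent N(0,1) statistics exceeds z":
size = 1.0 - (1.0 - alpha / L) ** L
print(z, size)
```

Since 1 − (1 − α/L)^L = α(1 + O(α/L)), the Bonferroni test has size α(1 + o(1)) as L_n → ∞, matching the claim following (3.22).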
Proof of Theorem 1.2. In (3.1) set γ = 0 and note that the test φ⁰_n found in Lemma 3.2 has the property claimed in the theorem.
For l ∈ Λ_n define (β_l, M_l) and consider the grid of values {(β_l, M_l), l ∈ Λ_n} ⊂ J. Consider the test statistic T_n given by (3.3) with coefficients d_j determined by (3.4), where λ = λ(ρ_n, M), μ = μ(ρ_n, M) are the solutions of (A.1) provided by Lemma A.1, and N = (λ/μ)^{1/2β}. Thus the test statistic T_n is determined by β, M and ρ = ρ_n; we write T_n(β, M, ρ) to indicate that dependence. Consider the radius ρ_{n,β,M} given by (1.14) (recall that D ∈ R is arbitrary; it will be fixed throughout this section). For the radius ρ associated to (β_l, M_l) according to (4.1), we introduce the abbreviation ρ_{n,l} := ρ_{n,β_l,M_l}, and for the type II error we have the stated bound, uniformly over f ∈ Σ(β_l, M_l) ∩ B^c_{ρ_{n,l}} and l ∈ Λ_n, in view of (4.6). The test φ_n therefore fulfills the corresponding relation uniformly over f ∈ Σ(β_l, M_l) ∩ B^c_{ρ_{n,l}} and l ∈ Λ_n, in view of (4.7). For the supremal type II error Ψ(·) this holds uniformly over l ∈ Λ_n. In conjunction with (4.1), this implies that a lower bound for the rate of N_l → ∞ is given by n^{2/(4β_l+1)}.
This and (4.10) establish (4.9). As in the proof of Lemma 3.1, setting T_{n,l} = U_{n,l} as in (3.9) for every l ∈ Λ_n, from Lemma A.5 we now obtain the required bound uniformly over l ∈ Λ_n. The relation exp(−z_n^2/2) = (L_n L_{2,n})^{−1} now establishes (4.5).
To prove (4.6), we first note the inequality (3.15) obtained in the proof of Lemma 3.1, applied to the test ψ_{n,l} (4.12). Lemma A.1 now yields a relation for S_{n,l}, holding uniformly over l ∈ Λ_n, stated in (4.13) and (4.14). As a consequence of Lemma A.1 (iii), the o(1) term here is of algebraic rate, meaning that there is γ > 0 such that this term is actually o(n^{−γ}) uniformly over l ∈ Λ_n. This holds because (β_l, M_l) ∈ J and because there exists γ_1 > 0 such that ρ_{n,l} given by (4.2), (4.1) satisfies ρ_{n,l} = o(n^{−γ_1}) uniformly over l ∈ Λ_n. This implies that (4.14) can be strengthened to S_{n,l} = (2 log log n)^{1/2}(1 + o(n^{−γ})). Furthermore, for τ_{n,l} we find, for ρ = ρ_{n,l}, the expression below.

For z_n we find, in view of (4.4), z_n = (2 log(L_n L_{2,n}))^{1/2} = (2 log L_{1,n} + 4 log L_{2,n})^{1/2}. As a consequence, in conjunction with (4.15), we obtain the corresponding expression for ž_n. It now suffices to choose t_n = o(1) such that t_n^{−2} τ_n = o(1). The choice t_n = (log n)^{−1} clearly qualifies in view of (4.16). Now invoking (4.12) and the Lindeberg-Feller CLT for U_{n,l} (with uniformity over l ∈ Λ_n, cp. Lemma A.4) concludes the proof.
Define also another radius sequence η_n, for a constant c ∈ (1/2, 1); we thus have η_n ≫ γ_n. For (β, M) ∈ V_l we now have, in view of Lemma 4.2, the stated inclusion. As a consequence, there exists n_0 such that for n ≥ n_0 the corresponding bound holds. In conjunction with (4.17) we obtain, for n ≥ n_0 and (β, M) ∈ V_l, the required estimate. In analogy to (4.3), for any l ∈ Λ_n define the test statistic ψ̃_{n,l}; then 1 − E_{n,f} ψ̃_{n,l} ≤ Ψ(ψ̃_{n,l}, ρ̃_{n,l}, β_l, M_l).
Note that the quantity Ψ(ψ̃_{n,l}, ρ̃_{n,l}, β_l, M_l) is an exact analog of Ψ(ψ_{n,l}, ρ_{n,l}, β_l, M_l) considered in (4.11): in both cases, the test ψ is defined by the statistic T_n(β, M, ρ) (via (3.3), (A.1), (3.4)) with parameters (β_l, M_l) and radius ρ specified either as ρ̃_{n,l} or as ρ_{n,l}, and with the same critical value z_n. The proof of (4.22) is therefore entirely analogous to that of (4.6), where the change from ρ_{n,l} to ρ̃_{n,l} = ρ_{n,l}(1 − η_n) has to be taken into account. Let S̃_{n,l} for l ∈ Λ_n be the appropriate modification of the saddlepoint value S_{n,l}, i.e. S̃_{n,l} is given by (A.20) for the pertaining values (ρ, β, M) = (ρ̃_{n,l}, β_l, M_l). Analogously to (4.13) we obtain an expansion of S̃_{n,l} in which the o(1) is uniform over l ∈ Λ_n, and analogously to (4.14) we find the corresponding relation. This, in conjunction with the same argument about the o(1) term in (4.23) as used previously for (4.14), allows us to obtain the analog of (4.15), that is, S̃_{n,l} = (2 log log n)^{1/2}(1 + o(1)). The rest of the proof follows verbatim that of Lemma 4.1 after (4.15).
The proof of Theorem 1.4 is now concluded in the same way as before, in view of (4.6): the test φ̃_n fulfills the required bound uniformly over f ∈ Σ(β, M) ∩ B^c_{ρ_{n,β,M}} and (β, M) ∈ J. This implies Theorem 1.4.

Lower risk bound
The proof of the lower risk bound of Proposition 1.4 in [25] does not actually cover the present case of a Sobolev ellipsoid. Indeed, in [25] the problem of adaptive testing is considered for parameter spaces given by a quadruplet (p, q, r, s), where the corresponding space is an l_p-ellipsoid of smoothness r with an l_q-ellipsoid of smoothness s removed. In this generality, the prior measures required for the lower risk bound have to be non-Gaussian, but the model of the present paper, which corresponds to the case p = q = 2, r = β and s = 0, calls for Gaussian prior measures on the ellipsoids Σ(β, M). The two sets of quadruplets Ξ_{G01} and Ξ_{G02} treated in [25], p. 278, exclude the present case, which lies on the boundary between them. In this section we will provide the necessary details for this boundary case in an abbreviated fashion.
(4.25) The first step is to find prior probability measures Q_{n,l} such that, under the resulting mixture, y_j ∼ N(0, n f_{0,j,l}^2 + 1), while under H_0, y_j ∼ N(0, 1). Denote γ_{j,l}^2 = n f_{0,j,l}^2, and let Λ_{n,l} be the corresponding log-likelihood ratio (4.27). Note that, setting N_l = (λ/μ)^{1/2β_l}, we have γ_{j,l}^2 = 0 for j > N_l. Setting z_j = (y_j^2 − 1)/√2, such that E z_j = 0, Var(z_j) = 1, this can be written with R_{n,l} being the last two terms on the r.h.s. of (4.30), and we obtain convergence in distribution, uniform in l.
We formulate a version of the lower bound for the type II error proved in Corollary 7.2 of [25]. Introduce the following notation: let Φ(x, y; r) be the two-dimensional distribution function of jointly normal random variables Z_1, Z_2, each having marginal distribution N(0, 1) and correlation r, and let Φ(x) be the distribution function of N(0, 1). Let π^n_l, l = 0, . . . , L_n be a set of probability measures on a sample space (Ω_n, A_n), where π^n_l ≪ π^n_0, and define Λ_{n,l} = log(dπ^n_l/dπ^n_0). Assume α ∈ (0, 1). Then for any sequence of tests φ_n satisfying E[φ_n | π^n_0] ≤ α + o(1) one has the stated lower bound. The measures π^n_0 and π^n_l are those where y_j ∼ N(0, 1) and y_j ∼ N(0, f_{0,j,l}^2 + n^{−1}) (independent), respectively. Setting u_{n,l} = S̃_{n,l} from (4.32), we find, analogously to (4.24), S̃_{n,l} = (2 log log n)^{1/2}(1 + o(1)) uniformly over l = 1, . . . , L_n (note that in (4.24) the radius is given by ρ_{n,l}(1 − η_n), whereas now we use ρ̃_{n,l} = ρ_{n,l}(1 + η_n) with ρ_{n,l} given by (4.25), but it can be checked that (4.38) still holds true). Furthermore, set t_n = log_{(2)} n and note the following.
To determine the family ρ_{n,kl}, consider the expansion (4.31) of the log-likelihood ratios Λ_{n,l}. Define ρ_{n,kl} through ρ_{n,kl} u_{n,l} u_{n,k} = Cov(Λ_{n,l} − R_{n,l}, Λ_{n,k} − R_{n,k}).
We then have Lemma 4.4. The current choice L_n = log n/log_{(2)} n implies that condition (4.35) is fulfilled.
Proof. We first show that (4.39) holds uniformly over k < l. Indeed, since β_k < β_l and since N is related to the bandwidth parameter in a smoothing problem, we expect N_k > N_l. Note that according to (A.2), we have, uniformly over k = 1, . . . , L_n, the stated expansion for N_l. The function g(t) = (4t + 1)^{−1} and its derivative are monotone decreasing for t > 0, hence the corresponding bound holds, where C does not depend on n and k, l. Clearly the exponent tends to −∞, so (4.39) is proved. Now the asymptotics for λ_l, λ_k shown in (A.3) gives λ_l ≍ N_l^{−(2β_l+1)}. Observe that for l = k the above term would be of constant order; we can use (4.39) to show that it is o(1) for k < l. Equivalently, it can be shown, analogously to (4.39), that the corresponding bound holds, and the claim follows. We still need to establish the asymptotic normality required, namely (4.36) and (4.37). In the sequel we use the notation C for a generic constant which does not depend on n, j and l, and whose value can change, even on the same line.
Proof of (4.36). Recall that β ∈ [β^{(1)}, β^{(2)}] and define δ_n^2 := n^{−1/(8β^{(2)}+1)}, so that for some ε > 0 the required bound holds uniformly in l; in addition, using standard arguments, a corresponding bound holds uniformly in l. To establish the latter, denote by NR_{n,l} the nonrandom term of S^{−1}_{n,l} R_{n,l}; its absolute value is bounded as stated. In conjunction with (4.29), the stronger relation (4.45) was shown, with the supremum taken over j = 1, . . . , n. Furthermore, by (4.38) the term S_{n,l} grows like a power of log_{(2)} n. In view of n^{−1/(4β_l+1)} ≤ n^{−ε} δ_n^2, the last two facts imply |NR_{n,l}| = o(δ_n^2) uniformly in l. To consider the random term of S^{−1}_{n,l} R_{n,l}, apply Chebyshev's inequality to its first component: in view of (4.45) and (4.40), the latter term is o(δ_n^2). The other random term of S^{−1}_{n,l} R_{n,l} (namely Σ^n_{j=1} z_j O(γ_{j,l}^6)) is treated in a similar fashion, which establishes (4.44). For (4.43) we use the Berry-Esseen type result of [20]: suppose v_j, j = 1, . . . , n are independent random vectors in R^k with zero expectation, normalized in such a way that Σ^n_{j=1} v_j has unit covariance matrix. If C is the set of convex subsets of R^k and Φ is the standard normal measure, then the bound (4.46) holds, where ‖·‖ is the Euclidean norm and the constant C depends only on k. We use this result for k = 1, A = (−∞, x] and v_j as the standardized summands. Proof of (4.37). We will apply (4.46) for k = 2 and random vectors with covariance matrix Σ. Let ξ be a 2-vector of independent standard normals; then (4.46) implies that for any convex set A the bound (4.48) holds. Here Σ = I_2 + o(1) uniformly over k, l (cp. (4.35) and (4.38)), so that it suffices to estimate the remaining term. By a standard reasoning, the results (4.44) and (4.49) for rectangles B_{x,y} together imply the required convergence. Comparing notation with (4.37), the claim (4.37) follows since L_n^2 δ_n = o(1). Consider the intervals [j^{2β}, (j + 1)^{2β}) for j ≥ 1.
Solving for λ in (A.7), one sees that it suffices to prove that for the function g_3 = g_1/g_2 there is a unique solution t > 1 of g_3(t) = M/ρ. Suppose that t > 1; with k_t = ⌊t^{1/(2β)}⌋ we have: if 1 < t < 2^{2β}, then k_t = 1 and g_3(t) = 1. To show that g_3(t) is strictly monotone increasing for t > 2^{2β}, it suffices to show that on each interval [j^{2β}, (j + 1)^{2β}), j ≥ 2, we have g'_3(t) > 0. Now g_2^2(t) > 0, and the numerator can be analyzed as follows. Let Y be a random variable with the discrete uniform distribution on {1, . . . , k_t} and write k_t^{−1} Σ_{j=1}^{k_t} j^α = E Y^α for α > 0. Then the above expression can be written in terms of these moments. For t > 2^{2β} we have k_t > 1, and Var(Y^{2β}) > 0. This shows that g_3(t) is strictly monotone increasing for t > 2^{2β}. It is easy to see that g_3(t) ↑ ∞ as t → ∞, so that g_3(t) = M/ρ has a unique solution.
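The key step above, Var(Y^{2β}) > 0 whenever k_t > 1, can be checked numerically for the discrete uniform distribution (an illustrative sketch; β = 1.5 and the values of k are arbitrary):

```python
def moment(k, a):
    """E[Y^a] for Y uniform on {1, ..., k}."""
    return sum(j ** a for j in range(1, k + 1)) / k

beta = 1.5
# Var(Y^{2beta}) = E[Y^{4beta}] - (E[Y^{2beta}])^2
variances = {k: moment(k, 4 * beta) - moment(k, 2 * beta) ** 2 for k in (1, 2, 5, 20)}
print(variances)
```

The variance is zero only in the degenerate case k = 1 (that is, 1 < t < 2^{2β}, where g_3 ≡ 1) and strictly positive otherwise, which is exactly what drives the strict monotonicity of g_3 for t > 2^{2β}.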