Estimation and detection of functions from anisotropic Sobolev classes

Abstract: We consider the problems of estimating and detecting an unknown function f depending on a multidimensional variable (for instance, an image) observed in Gaussian white noise. It is assumed that f belongs to an anisotropic Sobolev class. The case of a function of infinitely many variables is also considered. An asymptotic study (as the noise level tends to zero) of the estimation and detection problems is carried out. In the estimation problem, we construct asymptotically minimax estimators and establish sharp asymptotics for the minimax integrated squared risk. In the detection problem, we construct asymptotically minimax tests and provide conditions for distinguishability.


Introduction
Recently, nonparametric estimation and detection of multivariate signals, in a variety of estimation and testing schemes, have aroused considerable interest. In this paper we study the problem of estimating and detecting a multivariate function f ∈ F ⊂ L_2([0,1]^d) = L_2^d, 1 ≤ d ≤ ∞, observed in the Gaussian white noise model

X_ε(dt) = f(t) dt + ε W(dt), t ∈ [0,1]^d, (1.1)

where W is a d-dimensional Gaussian white noise, ε > 0 is a small parameter (noise intensity), and F is a subset of L_2^d consisting of sufficiently smooth functions. In this model, the "observation" is the function X_ε : L_2^d → G taking its values in the set G of normal random variables such that if ξ = X_ε(φ), η = X_ε(ψ), where φ, ψ ∈ L_2^d, then E(ξ) = (f, φ), E(η) = (f, ψ), and Cov(ξ, η) = ε²(φ, ψ). For any f ∈ L_2^d, the observation X_ε determines the Gaussian measure P_{ε,f} on L_2^d with mean function f and covariance operator ε²I, where I is the identity operator (see [4, 17] for references). The corresponding expectation is denoted by E_{ε,f}. In this paper we study both the case of fixed and finite d and the case d = ∞.
We assume that f belongs to a Sobolev class F of functions with anisotropic regularity constraints. One problem of interest is to estimate the unknown signal f under quadratic loss. Another problem of interest is to detect f, that is, to test the hypothesis H_0 : f = 0 versus a family of nonparametric alternatives of the form H_{1ε} : f ∈ F, ‖f‖_2 ≥ r_ε, where ‖·‖_2 is the L_2-norm and r_ε is a family of positive numbers tending to 0.
Let {φ_l}_{l∈L} be a fixed orthonormal basis in L_2^d, with L being a countable set. Then model (1.1) can be equivalently represented by the Gaussian sequence space model

X_{ε,l} = θ_l + ε ξ_l, ξ_l i.i.d. ∼ N(0, 1), l ∈ L, (1.2)

where θ_l = (f, φ_l) are the Fourier coefficients of f with respect to the basis {φ_l}_{l∈L} and X_{ε,l} = X_ε(φ_l) are the empirical Fourier coefficients. The problems of interest can then be restated, in an obvious way, in terms of the Fourier coefficients. Anisotropic functional classes were studied in [4, 12, 13, 15], among others, in connection with estimating functions of a multidimensional variable. Anisotropic constraints allow the smoothness of the function to differ across directions. We will be interested in Sobolev functional classes described with the aid of Fourier coefficients. In this paper, assuming that f belongs to a Sobolev ball of varying radius, we construct asymptotically minimax estimators and provide optimal rates of convergence and exact asymptotic constants in the estimation problem (see Theorem 3.1 and Remark 3.1), cf. Theorem 1 of [15]. Also, for fixed and finite d, we construct a family of asymptotically minimax tests and establish sharp asymptotics of the minimax total error probability in the detection problem (Theorem 3.2).
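The equivalence between (1.1) and (1.2) is convenient computationally as well. As a minimal sketch (our own illustration, not from the paper; the coefficient decay θ_l = l^{-2} and the cutoff rule are hypothetical choices), one can simulate the sequence model and apply a projection estimator:

```python
import numpy as np

# Simulate the Gaussian sequence model (1.2): X_l = theta_l + eps * xi_l,
# then estimate theta by a simple projection (keep low frequencies only).
rng = np.random.default_rng(0)
eps = 0.01
L = 10_000
l = np.arange(1, L + 1)
theta = 1.0 / l**2                        # hypothetical Fourier coefficients
X = theta + eps * rng.standard_normal(L)  # empirical Fourier coefficients

# Projection estimator: keep the first N empirical coefficients, zero the rest.
N = int(eps ** (-2 / 3))                  # cutoff matched to the l^{-2} decay
theta_hat = np.where(l <= N, X, 0.0)

risk = np.sum((theta_hat - theta) ** 2)   # squared l2 loss
```

By Parseval's identity, the squared L_2 loss for f equals the squared l_2 loss for its Fourier coefficients, which is what the last line computes.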
The estimation and detection problems for the infinite-dimensional model (1.1) with d = ∞ or d = d_ε → ∞ were studied in [5, 7–10]. Compared to the d-dimensional case with d fixed and finite, the problems for infinite-dimensional Gaussian white noise have a much richer analytical content. This motivates the study of the case d = ∞. Another argument in favour of studying the infinite-dimensional model is that in reality the true dimensionality may not be known or may vary. The results for infinite d, in conjunction with those for finite d, then form a clearer picture of the state of nature.
In comparison with the case when d is fixed and finite, the problems of estimating and detecting an infinite-dimensional signal are more challenging from a mathematical point of view. So far, we have established logarithmic asymptotics in these two problems (Theorem 3.3). When studying the case of fixed and finite d we use some general results of the minimax theory given by Theorems 2.1 and 2.2. For d = ∞ the analysis is completely different and is done by using probabilistic methods for the study of the so-called count function.
The paper is organized as follows. First we introduce anisotropic Sobolev classes with d fixed and finite and then extend the definition to the infinite-dimensional case (Section 2.1). After that we formulate the problems of interest (Section 2.2) and introduce some general results of minimax theory used in the subsequent sections (Section 2.3). The main results are collected in Section 3. The last section of the paper, Section 4, is rather diverse and contains the tools of study, auxiliary results, and the proofs of the theorems.
The majority of limits in the paper are taken as ε → 0. The relation a_ε ∼ b_ε means lim_{ε→0} a_ε/b_ε = 1. The relation a_ε ≍ b_ε means that there exist constants 0 < c < C < ∞ and a number ε_0 > 0 such that c ≤ a_ε/b_ε ≤ C for all 0 < ε < ε_0.

Definition of anisotropic Sobolev balls
Assume that d is fixed and finite and consider functional classes indexed by a smoothness parameter σ = (σ 1 , . . . , σ d ), σ j > 0, j = 1, . . . , d, that are defined by seminorms. Such classes are introduced as follows.
First, assume that σ_j is a positive integer and that f is σ_j-smooth in the jth argument, j = 1, …, d. For such a function f, define the seminorm ‖f‖_{σ,2} by

‖f‖²_{σ,2} = Σ_{j=1}^d ‖∂^{σ_j} f/∂x_j^{σ_j}‖²_2, (2.1)

where ∂^{σ_j} f/∂x_j^{σ_j} is a (generalized) derivative of order σ_j in the jth direction (see, for example, [14, Sec. 4.1]), and denote by F_{σ,d} the anisotropic Sobolev ball, i.e.,

F_{σ,d} = {f ∈ L_2^d : ‖f‖_{σ,2} ≤ 1}.

In the general case of σ = (σ_1, …, σ_d) with σ_j > 0, we assume that all partial derivatives ∂^{m_j} f/∂x_j^{m_j} of order 0 ≤ m_j ≤ [σ_j], j = 1, …, d, are 1-periodic, that is, for all k = 1, …, d,

f(x)|_{x_k=0} = f(x)|_{x_k=1},

and define the seminorm in terms of the Fourier coefficients θ_l = (f, φ_l), l ∈ Z^d, with respect to the trigonometric basis:

‖f‖²_{σ,2} = Σ_{l∈Z^d} c_l² θ_l², c_l² = c_l²(σ) = Σ_{j=1}^d (2π|l_j|)^{2σ_j}. (2.2)

When the σ_j's are positive integers this corresponds to (2.1) under the periodic constraints.
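For intuition, the coefficients c_l²(σ) can be computed directly. The snippet below is our own illustration and assumes the coefficient form c_l² = Σ_j (2π|l_j|)^{2σ_j} referenced in (2.2); it shows how anisotropy acts, in that raising the frequency in a smoother direction inflates c_l² much faster:

```python
import numpy as np

# Ellipsoid coefficients c_l^2(sigma) = sum_j (2*pi*|l_j|)^(2*sigma_j)
# (assumed form, cf. (2.2)); membership in the Sobolev ball corresponds
# to sum_l c_l^2 * theta_l^2 <= 1 in the sequence space.
def c2(l, sigma):
    """Ellipsoid coefficient c_l^2 for a multi-index l and smoothness sigma."""
    return sum((2 * np.pi * abs(lj)) ** (2 * sj) for lj, sj in zip(l, sigma))

sigma = (1.0, 2.0)   # smoother in the second direction
# Doubling the frequency in direction 2 is far more expensive than in direction 1.
grow_1 = c2((2, 1), sigma) / c2((1, 1), sigma)
grow_2 = c2((1, 2), sigma) / c2((1, 1), sigma)
```

The ratio `grow_2` dominates `grow_1`: smoother directions admit fewer active frequencies, which is exactly the disparity the anisotropic constraints encode.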
We now move on to the case d = ∞ and recall the definition of the space L_2^∞ = L_2([0,1]^∞) of square-integrable functions of infinitely many variables (see [8]).

The space L_2^∞ is a Hilbert space with the standard scalar product. Its basis is easy to specify by using the following argument. For each d ∈ N, the standard projection P_d maps L_2^∞ onto the subspace of functions depending only on the first d variables, and the union of these subspaces is dense in L_2^∞ (so that L_2^∞ is a separable Hilbert space). Next, define the set Z_0^∞ that consists of infinite integer sequences with finitely many nonzero terms:

Z_0^∞ = {l = (l_j)_{j≥1} : l_j ∈ Z, l_j ≠ 0 for finitely many j}.

For l ∈ Z_0^∞, set φ_l(x) = Π_{j≥1} φ_{l_j}(x_j), where all but finitely many factors equal 1. Then the orthonormal system {φ_l(x)}_{l∈Z_0^∞} forms a basis of L_2^∞, its linear span Lin{φ_l(x)}_{l∈Z_0^∞} being dense in L_2^∞. Now we consider smoothness constraints that are applicable to functions of infinitely many variables. First, let σ = (σ_1, σ_2, …) be an infinite sequence of positive integers. Define the seminorm ‖f‖_{σ,2} as in (2.1), with the sum now taken over all j ≥ 1. Assume, as before, that f together with all its partial derivatives ∂^{m_j} f/∂x_j^{m_j} of order 0 ≤ m_j ≤ σ_j, j = 1, 2, …, is 1-periodic in each variable. Thus, in the general case of σ = (σ_1, σ_2, …) with σ_j > 0, j = 1, 2, …, we assume that all partial derivatives ∂^{m_j} f/∂x_j^{m_j} of order 0 ≤ m_j ≤ [σ_j], j = 1, 2, …, are 1-periodic, and set, as in (2.2),

‖f‖²_{σ,2} = Σ_{l∈Z_0^∞} c_l² θ_l², c_l² = Σ_{j≥1} (2π|l_j|)^{2σ_j}.

Under the periodic constraint, the anisotropic Sobolev ball for d = ∞ is given by F_{σ,∞} = {f ∈ L_2^∞ : ‖f‖_{σ,2} ≤ 1}.

Estimation and detection over anisotropic Sobolev balls
When dealing with the estimation problem, we follow a familiar pattern. If for an estimator f̂_ε of f based on the observation X_ε and a sequence δ_ε → ∞ we have δ_ε × (maximal risk of f̂_ε) ≤ C < ∞ for ε sufficiently small and, at the same time, for any estimator f̃_ε of f based on X_ε, δ_ε × (maximal risk of f̃_ε) ≥ c > 0 for ε sufficiently small, then the estimator f̂_ε is said to be rate optimal. The parameter δ_ε controls the best possible rate of convergence. A more delicate problem, called the sharp optimality problem, consists of finding a rate optimal estimator whose minimax risk is the smallest possible. That is, if one can find a rate optimal estimator f̂_ε for which the constants C and c above can be replaced by one and the same constant A, so that δ_ε × (maximal risk of f̂_ε) → A, then the estimator f̂_ε is called asymptotically minimax, and A is called an exact asymptotic constant.
To be precise, for 1 ≤ d ≤ ∞ define the minimax integrated squared risk by

R_ε²(F_{σ,d}) = inf_{f̂_ε} sup_{f∈F_{σ,d}} E_{ε,f} ‖f̂_ε − f‖²_2,

where the infimum is taken over all possible estimators f̂_ε of f based on the observation X_ε. In this paper, we wish to find an asymptotically minimax estimator and establish sharp asymptotics, including convergence rates and exact asymptotic constants, for the risk R_ε²(F_{σ,d}). Now we turn to the detection problem. For a meaningful minimax testing problem, the alternative hypothesis must have some neighborhood of the null hypothesis removed. Therefore, for r_ε > 0 and 1 ≤ d ≤ ∞, we put

F_{σ,d}(r_ε) = {f ∈ F_{σ,d} : ‖f‖_2 ≥ r_ε}

and consider testing the hypotheses

H_0 : f = 0 versus H_{1ε} : f ∈ F_{σ,d}(r_ε).

When dealing with the detection problem, we judge the quality of testing by the minimax criterion based on the total error probability. For a test ψ_ε based on the observation X_ε, define the type I error probability α(ψ_ε) = E_{ε,0} ψ_ε and the type II error probability β(ψ_ε, f) = E_{ε,f}(1 − ψ_ε). The maximum probability of type II error is then given by

β(ψ_ε, F_{σ,d}(r_ε)) = sup_{f∈F_{σ,d}(r_ε)} β(ψ_ε, f).

Yu. Ingster and N. Stepanova
The quantity

γ_ε(F_{σ,d}(r_ε)) = inf_{ψ_ε} [α(ψ_ε) + β(ψ_ε, F_{σ,d}(r_ε))],

where the infimum is taken over all tests ψ_ε based on X_ε, is called the minimax total error probability. A family of tests ψ*_ε is called asymptotically minimax if

α(ψ*_ε) + β(ψ*_ε, F_{σ,d}(r_ε)) = γ_ε(F_{σ,d}(r_ε)) + o(1).

We are interested in finding the asymptotics of γ_ε(F_{σ,d}(r_ε)) and determining the structure of asymptotically minimax tests. In the context of the signal detection problem, this is called the sharp optimality problem. It is always true that 0 ≤ γ_ε(F_{σ,d}(r_ε)) ≤ 1. If the parameter r_ε in the alternative hypothesis is too close to zero, then γ_ε(F_{σ,d}(r_ε)) → 1 as ε → 0, and one cannot distinguish between the null hypothesis and the alternative. Therefore the knowledge of the smallest r_ε for which distinguishability is still possible is of interest: if γ_ε(F_{σ,d}(r_ε)) → 1 whenever r_ε = o(r*_ε) and γ_ε(F_{σ,d}(r_ε)) → 0 whenever r*_ε = o(r_ε), then the family r*_ε is called the separation rate. Thus, another problem of interest to us is to find asymptotics for the separation rate r*_ε.

From a technical point of view, it is more convenient to deal with ellipsoids in sequence spaces rather than Sobolev balls in functional spaces. In the sequence space of Fourier coefficients, the ball F_{σ,d} with fixed and finite d corresponds to the ellipsoid

Θ_{σ,d} = {θ = (θ_l)_{l∈Z^d} : Σ_{l∈Z^d} c_l² θ_l² ≤ 1},

and for d = ∞ to the ellipsoid Θ_{σ,∞}, defined analogously with Z^d replaced by Z_0^∞. The estimation problem then transforms into constructing an asymptotically minimax estimator θ̂_ε of θ using the data X_{ε,l} in model (1.2), and establishing exact asymptotics for the minimax squared risk associated with the ellipsoid Θ_{σ,d}, 1 ≤ d ≤ ∞:

R_ε²(Θ_{σ,d}) = inf_{θ̂_ε} sup_{θ∈Θ_{σ,d}} E_{ε,θ} ‖θ̂_ε − θ‖².

It is well known that R_ε(Θ_{σ,d}) ≍ ε^{2/(2+σ^{-1})}, where σ^{-1} = Σ_{j=1}^d σ_j^{-1} (see [4, Sec. 16.3] and [15, Sec. 3]). Moreover, a sharp asymptotic relation for R_ε(Θ_{σ,d}) exists (see Theorem 1 of [15]). In Sections 3 and 4, we shall state and prove a similar result for the ball F_{σ,d} (see Theorem 3.1 and Remark 3.1), and provide the asymptotically minimax estimator of f. We do that to illustrate our approach, which is somewhat different from (and shorter than) the one in [15], and goes in parallel with deriving sharp asymptotics in the detection problem.
In the detection problem, the set F_{σ,d}(r_ε), 1 ≤ d < ∞, that specifies the alternative hypothesis corresponds to the ellipsoid with a small ball removed:

Θ_{σ,d}(r_ε) = {θ ∈ Θ_{σ,d} : Σ_{l∈Z^d} θ_l² ≥ r_ε²}. (2.6)

If M = M_ε is a positive family such that r_ε/M → 0 as ε → 0, then the results obtained for Θ_{σ,d}(r_ε) are immediately extended to the set Θ_{σ,d}(r_ε, M) defined similarly to (2.6), with the ellipsoid of radius 1 replaced by the ellipsoid of radius M and the constraint Σ_{l∈Z^d} θ_l² ≥ r_ε² retained. In both cases, the hypotheses to be tested become H_0 : θ = 0 versus H_{1ε} : θ ∈ Θ_{σ,d}(r_ε).

Some general results
When estimating and detecting an infinite-dimensional vector from an ellipsoid in a sequence space, the sharp asymptotics of the minimax squared risk and of the minimax error probabilities are obtained by solving similar extremal problems.
Solutions to these problems, in implicit form, nowadays constitute standard results of minimax theory. The first of these results, connected with the estimation problem, is largely due to Pinsker [16] (see also [1, Ch. 7]). In what follows, we use the notation (x)_+ = max(x, 0), x ∈ R.
Theorem 2.1. Let E_ε²(σ, d) be the value of the extremal problem on the set of real-valued bilateral sequences {v_l : l ∈ Z^d}:

E_ε²(σ, d) = sup { Σ_{l∈Z^d} ε² v_l²/(ε² + v_l²) : Σ_{l∈Z^d} c_l² v_l² ≤ 1 }, (2.7)

where c_l² = c_l²(σ) are given by (2.2). Then the extremal sequence satisfies

c_l² v_l² = ε² c_l (T − c_l)_+, l ∈ Z^d, (2.8)

where T = T_ε > 0 is determined by the equation

ε² Σ_{l∈Z^d} c_l (T − c_l)_+ = 1, (2.9)

and the value of the problem is

E_ε²(σ, d) = ε² Σ_{l∈Z^d} (1 − c_l/T)_+. (2.10)

Moreover, R_ε²(Θ_{σ,d}) = E_ε²(σ, d)(1 + o(1)). Jointly with (2.8) this yields the sharp asymptotics of the minimax risk. The asymptotically minimax estimator is the weighted projection-type estimator θ̂_ε = (θ̂_{ε,l})_{l∈Z^d} with θ̂_{ε,l} = (1 − c_l/T)_+ X_{ε,l}.

Theorem 2.2. Let u²_{ε,c}(F_{σ,d}(r_ε)) be the value of the extremal problem

u²_{ε,c}(F_{σ,d}(r_ε)) = (2ε⁴)^{-1} inf { Σ_{l∈Z^d} θ_l⁴ : θ ∈ Θ_{σ,d}(r_ε) }. (2.12)

Then γ_ε(F_{σ,d}(r_ε)) = 2Φ(−u_{ε,c}/2) + o(1), where Φ denotes the standard Gaussian distribution function.
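Theorem 2.1 is of Pinsker type, and its solution is easy to explore numerically. The sketch below is our own one-dimensional illustration (not the paper's construction): it uses weights w_l = (1 − c_l/T)_+ with the critical level T chosen so that ε² Σ_l c_l (T − c_l)_+ = 1, and compares the resulting risk with the rate ε^{4/(2+σ^{-1})}.

```python
import numpy as np

# Pinsker-type solution in dimension d = 1 (illustration): ellipsoid
# coefficients c_l = (2*pi*l)^sigma, weights w_l = (1 - c_l/T)_+,
# with T solving eps^2 * sum_l c_l * (T - c_l)_+ = 1.
eps, sigma = 0.001, 2.0
l = np.arange(1, 200_001, dtype=float)
c = (2 * np.pi * l) ** sigma

def constraint(T):
    """Normalization equation; increasing in T, root gives the critical level."""
    return eps**2 * np.sum(c * np.clip(T - c, 0.0, None)) - 1.0

# Bisection for T (constraint(0) = -1 < 0, constraint is increasing).
lo, hi = 0.0, 1e12
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if constraint(mid) < 0 else (lo, mid)
T = 0.5 * (lo + hi)

risk = eps**2 * np.sum(np.clip(1.0 - c / T, 0.0, None))  # value of the problem
rate = eps ** (4.0 / (2.0 + 1.0 / sigma))                # eps^{4/(2+sigma^{-1})}
```

The ratio `risk / rate` stays bounded as ε decreases, which is the rate statement behind the sharp asymptotics.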

Sharp asymptotics for fixed d
Based on Theorems 2.1 and 2.2, we now establish two results that solve the sharp optimality problems in connection with estimating and detecting a multivariate signal f. We keep the notation of Theorems 2.1 and 2.2 and put σ^{-1} = Σ_{j=1}^d σ_j^{-1}.

Theorem 3.1. Assume that the dimension d < ∞ and the smoothness parameter σ = (σ_1, …, σ_d) ∈ R_+^d are fixed. Then, as ε → 0,

R_ε²(F_{σ,d}) ∼ c(σ, d) ε^{4/(2+σ^{-1})},

where the exact asymptotic constant c(σ, d) is given by formula (3.1).

Theorem 3.2. Assume that the dimension d < ∞ and the smoothness parameter σ = (σ_1, …, σ_d) ∈ R_+^d are fixed. Then, as ε → 0, the sharp asymptotics of γ_ε(F_{σ,d}(r_ε)) given by Theorem 2.2 hold with

u_{ε,c}(F_{σ,d}(r_ε)) ∼ C(σ, d) ε^{-2} r_ε^{(4+σ^{-1})/2},

where the exact asymptotic constant C(σ, d) is given by formula (3.2).

Remark 3.1. By using rescaling arguments, it is straightforward to extend the results of Theorems 3.1 and 3.2 to the case of the anisotropic Sobolev ball F_{σ,d}(M) = {f ∈ L_2^d : ‖f‖_{σ,2} ≤ M} of radius M > 0. Indeed, setting c̃_l = c_l/M transforms the ellipsoid Θ_{σ,d}(M) = {θ = (θ_l)_{l∈Z^d} : Σ_{l∈Z^d} c_l² θ_l² ≤ M²} into the unit ellipsoid with coefficients c̃_l. If Ẽ_ε²(σ, d) stands for the value of extremal problem (2.7) with ε̃ = ε/M → 0 in place of ε, then by Theorem 3.1, as ε → 0, R_ε²(F_{σ,d}(M)) = M² Ẽ²_{ε̃}(σ, d)(1 + o(1)). Similarly, if u²_{ε,c}(F_{σ,d}(r_ε)) stands for the value of extremal problem (2.12) and r̃_ε = r_ε/M → 0, then by Theorem 3.2, as ε → 0, the corresponding sharp asymptotics hold with ε̃ and r̃_ε in place of ε and r_ε.

Remark 3.2. Theorem 3.1, together with Remark 3.1, extends Theorem 1 of [15] to the case of a Sobolev ball F_{σ,d}(M_ε) of varying radius M_ε ≫ ε (with the assumption M_ε ≫ r_ε in the detection problem). In addition, Theorems 3.1 and 3.2 extend Theorems 3 and 4 of [7]. Indeed, when σ_1 = … = σ_d = σ > 0, our results coincide with those of Theorems 3 and 4 of [7] for the corresponding norm. In this special (isotropic) case, the constants c(σ, d) and C(σ, d) are monotone in d: decreasing and increasing, respectively. In the general (anisotropic) case, when d = d_ε → ∞ or d = ∞, under the assumption Σ_{j=1}^∞ σ_j^{-1} < ∞, formulas (3.1) and (3.2) yield certain limiting expressions. Generally speaking, these asymptotics are not directly usable because Theorems 3.1 and 3.2 are proved for fixed d.
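The rescaling step behind Remark 3.1 can be recorded explicitly (a routine verification; the tilded quantities are as in the remark):

```latex
% If \sum_l c_l^2\theta_l^2 \le M^2, put \tilde\theta_l = \theta_l/M and
% \tilde X_{\varepsilon,l} = X_{\varepsilon,l}/M; then
\tilde X_{\varepsilon,l} = \tilde\theta_l + \tilde\varepsilon\,\xi_l ,
\qquad
\tilde\theta \in \Theta_{\sigma,d},
\qquad
\tilde\varepsilon = \varepsilon/M ,
\qquad
R^2_{\varepsilon}\bigl(\Theta_{\sigma,d}(M)\bigr)
    = M^2\, R^2_{\tilde\varepsilon}\bigl(\Theta_{\sigma,d}\bigr) .
```

That is, the radius-M problem at noise level ε is exactly the unit-radius problem at noise level ε/M, scaled by M².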

Logarithmic asymptotics for d = ∞
The study of the infinite-dimensional estimation and detection problems is carried out under the assumption

σ^{-1} = Σ_{j=1}^∞ σ_j^{-1} < ∞. (3.4)

We have the following theorem.

Theorem 3.3. Let (3.4) hold. Then, as ε → 0,

log R_ε²(F_{σ,∞}) ∼ (4/(2 + σ^{-1})) log ε and log r*_ε ∼ (4/(4 + σ^{-1})) log ε.
Proof of Theorem 3.2. It follows from (2.14), (2.15), and (2.17) that the proof reduces to studying the asymptotic behaviour of certain sums I_k, k = 0, 1, 2, as T → ∞. To this end, put m_j = T^{1/σ_j}/(2π), j = 1, …, d. Then, recalling (2.2), as T → ∞, the sums I_k, k = 0, 1, 2, are asymptotically equivalent to the integrals in (4.7)–(4.9). Next, we make the change of variables x_{l_j} = l_j/m_j, j = 1, …, d, in the integrals in (4.7)–(4.9); see (4.6). The integrals on the right-hand sides of (4.10)–(4.12) can be calculated using the Liouville formula (see, for example, [3, Ch. XVIII]):

∫_{x_i>0, x_1+⋯+x_d≤1} f(x_1 + ⋯ + x_d) x_1^{p_1−1} ⋯ x_d^{p_d−1} dx_1 ⋯ dx_d = (Γ(p_1) ⋯ Γ(p_d)/Γ(p_1 + ⋯ + p_d)) ∫_0^1 f(u) u^{p_1+⋯+p_d−1} du,

where p_i > 0, i = 1, …, d, and the integral on the right-hand side is absolutely convergent. Applying the Liouville formula, we obtain the limiting expressions for I_0, I_1, and I_2 as T → ∞.
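The Liouville (Dirichlet) formula is easy to sanity-check numerically. The sketch below is our own illustration of the d = 2 case with f ≡ 1, where the formula reduces to ∫∫_{x+y≤1} x^{p−1} y^{q−1} dx dy = Γ(p)Γ(q)/Γ(p+q+1):

```python
import math

# Midpoint-rule integration of x^(p-1) * y^(q-1) over the triangle
# {x, y >= 0, x + y <= 1}, compared with Gamma(p)Gamma(q)/Gamma(p+q+1).
p, q = 1.5, 2.5
n = 1000
h = 1.0 / n
num = 0.0
for i in range(n):
    x = (i + 0.5) * h            # midpoint in x
    m = int((1.0 - x) / h)       # number of full y-cells under the diagonal
    for j in range(m):
        y = (j + 0.5) * h        # midpoint in y
        num += x ** (p - 1) * y ** (q - 1) * h * h

exact = math.gamma(p) * math.gamma(q) / math.gamma(p + q + 1)
```

The numerical value agrees with the Gamma-function expression to within the discretization error of the staircase boundary.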
Whence, by (4.5) and in view of Theorem 2.2, we arrive at the statement of Theorem 3.2.
Proof of Theorem 3.1. Similarly to the proof of Theorem 3.2, we need to study equations (2.9) and (2.10) as T → ∞. Denote the corresponding sums by J_1 and J_2 (see (4.14)). Let x_{l_j} be defined as in (4.6), and let D_{σ,d} and Σ_d be as before. Applying the Liouville formula, we obtain the asymptotics of J_1 as T → ∞; similar calculations for the sum J_2 yield its asymptotics as T → ∞. Next, in view of (2.9) and (2.10), the critical level T = T_ε and the risk asymptotics are determined by these relations, and the statement of Theorem 3.1 follows.

Proof of Theorem 3.3
The proof of Theorem 3.3 utilizes the so-called count function and largely consists of studying its properties.

Count function
An important role in the analysis of the infinite-dimensional case is played by the count function N(t), defined for any t > 0 by

N(t) = card{l ∈ Z_0^∞ : c_l ≤ t}.

The count function can be thought of as the distribution function of the coefficients c_l. It satisfies N(t) → ∞ as t → ∞ and determines the rate asymptotics of the integrated squared risk in the estimation problem and of the separation rate in the detection problem. More precisely, for the estimation problem (see, for example, [10]),

R_ε²(F_{σ,∞}) ≍ ε² N(T_ε), where T_ε is determined by ε² T_ε² N(T_ε) ≍ 1, (4.16)

and for the detection problem

(r*_ε)² ≍ ε² N^{1/2}(1/r*_ε). (4.17)

In addition, under certain regularity constraints on N(t), this function controls the sharp asymptotics of the minimax integrated squared risk R_ε²(F_{σ,∞}) and of the minimax total error probability γ_ε(F_{σ,∞}(r_ε)) (see, for example, [10, Sec. 2] for details).
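In a finite-dimensional analogue, the count function can be evaluated by brute force, which illustrates the growth N(t) ≍ t^{σ^{-1}} that drives the rates. The snippet below is our own illustration and assumes the coefficient form c_l² = Σ_j (2π|l_j|)^{2σ_j}:

```python
import itertools
import numpy as np

# Count function N(t) = card{l : c_l <= t} for d = 2, computed by
# enumerating a bounding box of multi-indices.
sigma = (1.0, 2.0)
inv = sum(1.0 / s for s in sigma)   # sigma^{-1} = 1.5 here

def N(t):
    """Count multi-indices l with c_l <= t (brute-force enumeration)."""
    half = [int(t ** (1.0 / s) / (2 * np.pi)) + 1 for s in sigma]
    box = [range(-k, k + 1) for k in half]
    return sum(
        1
        for l in itertools.product(*box)
        if sum((2 * np.pi * abs(lj)) ** (2 * sj) for lj, sj in zip(l, sigma)) <= t * t
    )

# Continuum prediction: N(4t)/N(t) is close to 4**inv = 8.
ratio = N(4000.0) / N(1000.0)
```

The observed ratio is close to 4^{σ^{-1}}, matching the polynomial growth of N(t) in the fixed-d case.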

Probability measures
Due to (4.16) and (4.17), finding the rate asymptotics of R_ε(F_{σ,∞}) and of r*_ε requires studying the properties of N(t) as t → ∞. A general method consists of defining a family of prior distributions P_h on the set of indices Z_0^∞ and investigating the behaviour of the function N(t) = card{l ∈ Z_0^∞ : c_l ≤ t} using probabilistic and analytical tools.
First, let us define a family of probability measures P_h, depending on a positive parameter h, of product form. To this end, define the random variables Y_j(k) = (2π|k|)^{2σ_j}, j = 1, 2, …, k ∈ Z, and view the coefficients

c_l² = Σ_{j≥1} (2π|l_j|)^{2σ_j} = Σ_{j≥1} Y_j(l_j), l ∈ Z_0^∞,

as realizations of the random variables S(l) = Σ_{j≥1} Y_j(l_j), l ∈ Z_0^∞. Then N(t) = card{l ∈ Z_0^∞ : S(l) ≤ t²}. Next, for h > 0, define the probability measures P_{h,j} on Z by

P_{h,j}(k) = e^{−hY_j(k)}/z_j(h), z_j(h) = Σ_{k∈Z} e^{−hY_j(k)},

and put

P_h = ⊗_{j≥1} P_{h,j}, Z(h) = Σ_{j≥1} log z_j(h). (4.18)

Using arguments similar to those in [9, p. 16], it is readily shown that P_h is well defined for all h > 0. Since P_h(l) = e^{−hS(l)−Z(h)}, this leads to the representation, for any h > 0, t > 0,

N(t) = e^{Z(h)} E_h (e^{hS} 1{S ≤ t²}), (4.19)

where E_h denotes expectation with respect to P_h and S = S(l) with l distributed according to P_h. In particular, (4.19) yields log N(t) ≤ Z(h) + ht² for every h > 0.
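The exponential-tilting (Chernoff-type) bound contained in (4.19) can be checked numerically in a truncated setting. The sketch below is our own illustration: it truncates to three coordinates and a finite range of frequencies, computes Z(h) = Σ_j log z_j(h), and compares exp(Z(h) + ht²) with the exact count obtained by enumeration.

```python
import numpy as np
from itertools import product

# Truncated check of  N(t) <= exp(Z(h) + h*t^2)  for any h > 0, with
# z_j(h) = sum_k exp(-h * (2*pi*|k|)^(2*sigma_j)) and Z(h) = sum_j log z_j(h).
sigma = np.array([1.0, 2.0, 3.0])   # keep three coordinates
K = 10                              # keep frequencies |k| <= K
k = np.arange(-K, K + 1)

h, t = 1e-3, 40.0
Y = (2 * np.pi * np.abs(k))[None, :] ** (2 * sigma[:, None])  # Y_j(k)
z = np.exp(-h * Y).sum(axis=1)      # z_j(h), j = 1, 2, 3
Z = float(np.log(z).sum())          # Z(h)
bound = Z + h * t * t               # upper bound for log N(t)

# Exact count over the truncated index set, by enumeration.
count = sum(
    1
    for l in product(k, repeat=3)
    if sum((2 * np.pi * abs(lj)) ** (2 * s) for lj, s in zip(l, sigma)) <= t * t
)
```

Within the truncated index set the inequality log(count) ≤ Z(h) + ht² holds exactly, since the count never exceeds the tilted sum e^{ht²} Σ_l e^{−hS(l)}.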

Auxiliary results
This section contains some auxiliary results that will be used in the proof of Theorem 3.3.
Lemma 4.1. Let the sequence (σ_j)_{j≥1} be non-decreasing and let (3.4) hold true. Then σ_j/j → ∞ as j → ∞.

Proof. The lemma is easily proved by contradiction. Suppose there exist a subsequence (σ_{j_k})_{k≥1} and a constant B > 0 such that σ_{j_k} ≤ B j_k for all k. Then for j_k/2 ≤ j ≤ j_k we have σ_j ≤ σ_{j_k} ≤ 2Bj, and for k sufficiently large

Σ_{j_k/2 ≤ j ≤ j_k} σ_j^{-1} ≥ (2B)^{-1} Σ_{j_k/2 ≤ j ≤ j_k} j^{-1} ≥ (4B)^{-1} > 0. (4.20)

On the other hand, in view of (3.4), the tail sums Σ_{j ≥ j_k/2} σ_j^{-1} tend to zero as k → ∞, which contradicts (4.20). Hence the lemma follows.
In the sequel, without loss of generality the sequence (σ j ) j≥1 is assumed non-decreasing.
By (4.19), the upper bound for log N(t) is controlled by the term Z(h) + ht². The following lemma establishes the asymptotic behavior of Z(h).

Lemma 4.2. Let (3.4) hold. Then, as h → 0,

Z(h) = (σ^{-1}/2) log(h^{-1}) (1 + o(1)).

Proof. The key point is to split Z(h) appropriately into the main term, which gives the required asymptotics, and the remainder. We have (see (4.18))

Z(h) = Σ_{j≤J} log z_j(h) + Σ_{j>J} log z_j(h) =: S_1 + S_2,

where the parameter J = J(h) → ∞ as h → 0 is chosen so that S_1 = (σ^{-1}/2) log(h^{-1})(1 + o(1)), with bounds C_j h^{-1/(2σ_j)} ≤ z_j(h) ≤ B_j h^{-1/(2σ_j)}, j ≤ J, for some constants C_j ≍ 1 and B_j ≍ 1. Such a choice of J is possible because for "small" σ_j's, setting m_j = h^{-1/(2σ_j)}, the sum z_j(h) is of order m_j. It remains to show that S_2 = o(log(h^{-1})). By the inequality e^y − e^x ≥ e^x(y − x), which is the same as e^z − 1 ≥ z, we have, for j > J,

(2π)^{2σ_j} − (2π)^{2σ_J} ≥ (2π)^{2σ_J} c(σ_j − σ_J), c = 2 log(2π).

Therefore, using log(1 + x) ≤ x and noting that z_j(h) − 1 is small for j > J, the sum S_2 is bounded by a convergent series with vanishing terms, so that S_2 = o(log(h^{-1})), and Lemma 4.2 follows.

Turning to the detection part of Theorem 3.3, the required asymptotics of r*_ε follow from a lower bound (4.25) and an upper bound (4.26). For the validity of (4.26) it suffices to show that, for sufficiently small ε,

log N(T) ≤ (4σ^{-1} log(ε^{-1})/(4 + σ^{-1})) (1 + o(1)), (4.27)

where, by (4.19) and Lemma 4.2,

log N(T) ≤ Z(h) + hT² = (σ^{-1}/2) log(h^{-1})(1 + o(1)) + hT². (4.28)

Up to a vanishing term, the right-hand side of the inequality in (4.28) is equal to (σ^{-1}/2) log(h^{-1}) + hT². The minimum of the latter (as a function of h) is attained at the point h = σ^{-1}/(2T²). In other words, the minimum occurs when h ≍ ε² N^{1/2}(T); see (4.28). In this case hT² ≍ 1, and

log(h^{-1}) ∼ 2 log(ε^{-1}) − (1/2) log N(T).
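The optimization of the bound Z(h) + hT² over h, used in the last step, is elementary:

```latex
% Minimizing (\sigma^{-1}/2)\log(h^{-1}) + hT^2 over h > 0:
\frac{d}{dh}\Bigl[\tfrac{\sigma^{-1}}{2}\log(h^{-1}) + hT^2\Bigr]
   = -\frac{\sigma^{-1}}{2h} + T^2 = 0
\quad\Longrightarrow\quad
h = \frac{\sigma^{-1}}{2T^2},
\qquad
\min_{h>0}\Bigl[\tfrac{\sigma^{-1}}{2}\log(h^{-1}) + hT^2\Bigr]
   = \frac{\sigma^{-1}}{2}\log\frac{2T^2}{\sigma^{-1}} + \frac{\sigma^{-1}}{2}.
```

In particular, at the minimizing h one has hT² = σ^{-1}/2 ≍ 1, as used in the text.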
Therefore, using (4.28), we obtain (4.27), and hence the upper bound (4.26) follows. Combining (4.25) and (4.26), we get the required asymptotic expression for r*_ε. For the estimation problem the proof is completely analogous; cf. (4.16) and (4.17). The proof of Theorem 3.3 is complete.