Intensity estimation of non-homogeneous Poisson processes from shifted trajectories

This paper considers the problem of adaptive estimation of a non-homogeneous intensity function from the observation of n independent Poisson processes having a common intensity that is randomly shifted for each observed trajectory. We show that estimating this intensity is a deconvolution problem for which the density of the random shifts plays the role of the convolution operator. In an asymptotic setting where the number n of observed trajectories tends to infinity, we derive upper and lower bounds for the minimax quadratic risk over Besov balls. Non-linear thresholding in a Meyer wavelet basis is used to derive an adaptive estimator of the intensity. The proposed estimator is shown to achieve a near-minimax rate of convergence. This rate depends both on the smoothness of the intensity function and the density of the random shifts, which makes a connection between the classical deconvolution problem in nonparametric statistics and the estimation of a mean intensity from the observations of independent Poisson processes.


Introduction
Poisson processes have been intensively studied in statistical theory over the last decades. Such processes are well suited to model a wide range of phenomena, and they are used in various applied fields including genomics, biology and imaging. In this paper, we consider the problem of estimating nonparametrically a mean pattern intensity $\lambda$ from the observation of $n$ independent and non-homogeneous Poisson processes $N^1,\dots,N^n$ on the interval $[0,1]$. This problem arises when count data are collected independently from $n$ individuals according to similar Poisson processes. In many applications, such data can be modeled as independent Poisson processes whose non-homogeneous intensities have a common shape. A simple model, which is well studied for genomics applications [25], is to assume that the intensity functions $\lambda_1,\dots,\lambda_n$ of the Poisson processes $N^1,\dots,N^n$ are randomly shifted versions $\lambda_i(\cdot) = \lambda(\cdot-\tau_i)$ of a common intensity $\lambda$, where $\tau_1,\dots,\tau_n$ are i.i.d. random variables. The intensity $\lambda$ that we want to estimate is thus the same for all the observed processes, up to random translations.
Basically, such a model corresponds to the assumption that the recording of counts does not start at the same time (or location) from one individual to another, e.g. when reading DNA sequences from different subjects in genomics [22].
In more rigorous terms, let $\tau_1,\dots,\tau_n$ be i.i.d. random variables with known density $g$ with respect to the Lebesgue measure on $\mathbb{R}$. Let $\lambda : [0,1] \to \mathbb{R}^+$ be a real-valued function. Throughout the paper, it is assumed that $\lambda$ can be extended outside $[0,1]$ by periodization, i.e. by taking $\lambda(t) = \lambda(t \bmod 1)$ for $t \notin [0,1]$, where $t \bmod 1$ denotes the modulo operation. We suppose that, conditionally on $\tau_1,\dots,\tau_n$, the point processes $N^1,\dots,N^n$ are independent Poisson processes on the measure space $([0,1],\mathcal{B}([0,1]),dt)$ with intensities $\lambda_i(t) = \lambda(t-\tau_i)$ for $t \in [0,1]$, where $dt$ is the Lebesgue measure. Hence, conditionally on $\tau_i$, $N^i$ is a random countable set of points in $[0,1]$, and we denote by $dN^i_t = dN^i(t)$ the discrete random measure $\sum_{T\in N^i} \delta_T(t)$ for $t \in [0,1]$, where $\delta_T$ is the Dirac measure at point $T$. In other terms, conditionally on $\tau_1,\dots,\tau_n$, one has that for any set $A \in \mathcal{B}([0,1])$ and for each $1 \le i \le n$, the number of points of $N^i$ lying in $A$ is a random variable $N^i(A) = \int_A dN^i_t$ which is Poisson distributed with parameter $\int_A \lambda(t-\tau_i)\,dt$. Moreover, for any finite family of disjoint measurable sets $A_1,\dots,A_p$ in $\mathcal{B}([0,1])$, the random variables $N^i(A_1),\dots,N^i(A_p)$, $i=1,\dots,n$, are independent. For an introduction to non-homogeneous Poisson processes we refer to [13]. The objective of this paper is to study the estimation of $\lambda$ from a minimax point of view as the number $n$ of observed Poisson processes tends to infinity.
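As a concrete illustration of this observation model, the following sketch simulates $n$ randomly shifted Poisson processes on $[0,1]$ by thinning. The cosine bump intensity, the Gaussian shift density and all numerical values are assumptions made for this example only, not choices of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def lam(t):
    # Hypothetical common intensity, 1-periodic by construction.
    return 50.0 * (1.0 + np.cos(2.0 * np.pi * t))

LAM_MAX = 100.0  # upper bound on lam, used for thinning

def simulate_shifted_processes(n, shift_scale=0.05):
    """Draw tau_i ~ N(0, shift_scale^2) and, conditionally on tau_i, a Poisson
    process on [0,1] with intensity lam((t - tau_i) mod 1), via thinning."""
    shifts = rng.normal(0.0, shift_scale, size=n)
    processes = []
    for tau in shifts:
        m = rng.poisson(LAM_MAX)                     # candidate points
        cand = rng.uniform(0.0, 1.0, size=m)
        keep = rng.uniform(0.0, LAM_MAX, size=m) < lam((cand - tau) % 1.0)
        processes.append(np.sort(cand[keep]))
    return shifts, processes

shifts, procs = simulate_shifted_processes(200)
mean_count = np.mean([len(p) for p in procs])  # concentrates around ||lam||_1 = 50
```

Since the number of points of each trajectory is Poisson with mean $\int_0^1 \lambda(t)\,dt = 50$ here, the average count across trajectories concentrates around that value.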
Denote by $\|\lambda\|_2^2 = \int_0^1 |\lambda(t)|^2\,dt$ the squared norm of a function $\lambda$ belonging to the space $L^2([0,1])$ of square integrable functions on $[0,1]$ with respect to $dt$. Let $\Lambda \subset L^2([0,1])$ be some smoothness class of functions, and let $\hat\lambda_n \in L^2([0,1])$ denote an estimator of the intensity function $\lambda \in \Lambda$, i.e. a measurable mapping of the random processes $N^i$, $i=1,\dots,n$, taking its values in $L^2([0,1])$. Define the quadratic risk of the estimator $\hat\lambda_n$ as
$$R(\hat\lambda_n,\lambda) = \mathbb{E}\,\|\hat\lambda_n - \lambda\|_2^2,$$
and introduce the minimax risk
$$R_n(\Lambda) = \inf_{\hat\lambda_n}\,\sup_{\lambda\in\Lambda}\, R(\hat\lambda_n,\lambda),$$
where the above infimum is taken over the set of all possible estimators constructed from $N^1,\dots,N^n$. In order to investigate the optimality of an estimator, the main contributions of this paper are upper and lower bounds for $R_n(\Lambda)$ when $\Lambda$ is a Besov ball, and the construction of an adaptive estimator that achieves a near-minimax rate of convergence. The estimation of the intensity of a non-homogeneous Poisson process has recently attracted a lot of attention in nonparametric statistics. In particular, the problem of estimating a Poisson intensity from a single trajectory has been studied using model selection techniques [19] and non-linear wavelet thresholding [7], [14], [20], [23]. Adopting an inverse problem point of view, estimating the intensity function of an indirectly observed non-homogeneous Poisson process has been considered by [1], [6], [17]. Poisson noise removal has also been considered by [8], [24] for image processing applications. Deriving optimal estimators of a Poisson intensity from a minimax point of view has been considered in [6], [19], [20], [23], in the setting where the intensity of the observed process is $\lambda(t) = \kappa\lambda_0(t)$: the function to estimate is the scaled intensity $\lambda_0$, and $\kappa$ is a positive real, representing an "observation time", which is let to tend to infinity to study asymptotic properties.
In this paper, since we observe n independent Poisson processes, we adopt a different asymptotic setting where n tends to infinity. In this framework, our main result is that estimating λ corresponds to a deconvolution problem where the density g of the random shifts τ 1 , . . . , τ n is a convolution operator that has to be inverted. Hence, estimating λ falls into the category of Poisson inverse problems. A related model of randomly shifted curves observed with Gaussian noise has been considered by [2] and [3]. The results in [2] show that estimating a mean shape curve in such models is a deconvolution problem. However, to the best of our knowledge, the case of estimating a mean intensity from randomly shifted trajectories in the case of a Poisson noise has not been considered before. The presence of the random shifts significantly complicates the construction of upper and lower bounds for the minimax risk. In particular, to derive a lower bound, standard methods such as Assouad's cube technique that is widely used for standard deconvolution problems in a white noise model (see e.g. [18] and references therein) have to be carefully adapted to take into account the effect of the random shifts.
The rest of the paper is organized as follows. In Section 2, we describe an inverse problem formulation for the estimation of λ, and a linear but nonadaptive estimator of the intensity is proposed. Section 3 is devoted to adaptive estimation using non-linear Meyer wavelet thresholding, and to the construction of an upper bound on the minimax risk over Besov balls. In Section 4 a lower bound on the minimax risk is derived.

Linear estimation

Inverse problem formulation
For each observed counting process, the presence of a random shift complicates the estimation of the intensity $\lambda$. Indeed, for all $i \in \{1,\dots,n\}$ and any $f \in L^2([0,1])$ we have
$$\mathbb{E}\left[\int_0^1 f(t)\,dN^i_t \,\Big|\, \tau_i\right] = \int_0^1 f(t)\,\lambda(t-\tau_i)\,dt,$$
where $\mathbb{E}[\cdot\,|\,\tau_i]$ denotes the conditional expectation with respect to the variable $\tau_i$. Thus
$$\mathbb{E}\left[\int_0^1 f(t)\,dN^i_t\right] = \int_0^1 f(t)\,(\lambda\star g)(t)\,dt.$$
Hence, the mean intensity of each randomly shifted process is the convolution $\lambda\star g$ between $\lambda$ and the density $g$ of the shifts. This shows that a parallel can be made with the classical statistical deconvolution problem, which is known to be an inverse problem. This parallel is highlighted by taking a Fourier transformation of the data. Let $(e_\ell)_{\ell\in\mathbb{Z}}$ be the complex Fourier basis on $[0,1]$, i.e. $e_\ell(t) = e^{i2\pi\ell t}$ for all $\ell\in\mathbb{Z}$ and $t\in[0,1]$. For $\ell\in\mathbb{Z}$, define
$$\theta_\ell = \int_0^1 \lambda(t)\,e^{-i2\pi\ell t}\,dt \qquad\text{and}\qquad \gamma_\ell = \int_{\mathbb{R}} g(t)\,e^{-i2\pi\ell t}\,dt$$
as the Fourier coefficients of the intensity $\lambda$ and of the density $g$ of the shifts. Then, for $\ell\in\mathbb{Z}$, define $y_\ell$ as
$$y_\ell = \frac{1}{n}\sum_{i=1}^n \int_0^1 e^{-i2\pi\ell t}\,dN^i_t,$$
where we will use the notation
$$\hat\gamma_\ell = \frac{1}{n}\sum_{i=1}^n e^{-i2\pi\ell\tau_i}. \tag{2.3}$$
Hence, the estimation of the intensity $\lambda$ can be formulated as follows: we want to estimate the sequence $(\theta_\ell)_{\ell\in\mathbb{Z}}$ of Fourier coefficients of $\lambda$ from the sequence space model
$$y_\ell = \hat\gamma_\ell\,\theta_\ell + \xi_{\ell,n}, \qquad \ell\in\mathbb{Z}, \tag{2.4}$$
where the $\xi_{\ell,n}$ are centered random variables defined as $\xi_{\ell,n} = y_\ell - \hat\gamma_\ell\,\theta_\ell$. The model (2.4) is very close to the standard formulation of statistical linear inverse problems. Indeed, using the singular value decomposition of the considered operator, the standard sequence space model of an ill-posed statistical inverse problem is (see [5] and the references therein)
$$c_\ell = \gamma_\ell\,\theta_\ell + z_\ell, \tag{2.5}$$
where the $\gamma_\ell$'s are eigenvalues of a known linear operator, and the $z_\ell$'s represent an additive random noise. The issue in model (2.5) is to recover the coefficients $\theta_\ell$ from the observations $c_\ell$. A large class of estimators in model (2.5) can be written as
$$\hat\theta_\ell = \delta_\ell\,\gamma_\ell^{-1}\,c_\ell, \qquad \ell\in\mathbb{Z},$$
where $\delta = (\delta_\ell)_{\ell\in\mathbb{Z}}$ is a sequence of reals with values in $[0,1]$ called a filter (see [5] for further details). Equation (2.4) can be viewed as a linear inverse problem with Poisson noise for which the operator to invert is stochastic, with eigenvalues $\hat\gamma_\ell$ (2.3) that are unobserved random variables.
Nevertheless, since the density $g$ of the shifts is assumed to be known, and given that $\mathbb{E}\hat\gamma_\ell = \gamma_\ell$ and $\hat\gamma_\ell \approx \gamma_\ell$ for $n$ sufficiently large (in a sense which will be made precise later on), an estimation of the Fourier coefficients of $\lambda$ can be obtained by a deconvolution step of the form
$$\hat\theta_\ell = \delta_\ell\,\gamma_\ell^{-1}\,y_\ell, \tag{2.6}$$
where $\delta = (\delta_\ell)_{\ell\in\mathbb{Z}}$ is a filter whose choice has to be discussed. In this paper, the following type of assumption on $g$ is considered:

Assumption 2.1 The Fourier coefficients of $g$ have a polynomial decay, i.e. for some real $\nu > 0$, there exist two constants $C \ge C' > 0$ such that $C'|\ell|^{-\nu} \le |\gamma_\ell| \le C|\ell|^{-\nu}$ for all $\ell \in \mathbb{Z}$.
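The deconvolution step (2.6) can be sketched numerically as follows. This is our own illustration, not code from the paper, under the assumed choices of Gaussian shifts $\tau_i \sim \mathcal{N}(0,\sigma^2)$ (for which $\gamma_\ell = e^{-2\pi^2\ell^2\sigma^2}$), a cosine intensity, and a projection filter $\delta_\ell = \mathbb{1}_{\{|\ell|\le L\}}$.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma, n, L = 0.05, 500, 5  # shift scale, number of trajectories, frequency cut-off

def sample_process(tau):
    # One trajectory: Poisson process with intensity 50(1 + cos(2 pi ((t - tau) mod 1))),
    # simulated by thinning a homogeneous process of rate 100.
    m = rng.poisson(100.0)
    cand = rng.uniform(0.0, 1.0, size=m)
    lam = 50.0 * (1.0 + np.cos(2.0 * np.pi * ((cand - tau) % 1.0)))
    return cand[rng.uniform(0.0, 100.0, size=m) < lam]

taus = rng.normal(0.0, sigma, size=n)
procs = [sample_process(t) for t in taus]

ells = np.arange(-L, L + 1)
# Empirical Fourier coefficients y_l = (1/n) sum_i int_0^1 e^{-i 2 pi l t} dN^i_t.
y = np.array([np.mean([np.exp(-2j * np.pi * l * p).sum() for p in procs]) for l in ells])
gamma = np.exp(-2.0 * np.pi**2 * ells**2 * sigma**2)  # gamma_l for Gaussian shifts
theta_hat = y / gamma  # deconvolution step (2.6) with delta_l = 1{|l| <= L}

# Reconstruct the intensity on a grid from the deconvolved coefficients.
t = np.linspace(0.0, 1.0, 256, endpoint=False)
lam_hat = np.real(sum(th * np.exp(2j * np.pi * l * t) for th, l in zip(theta_hat, ells)))
```

Here `theta_hat[L]` estimates $\theta_0 = \|\lambda\|_1$; dividing by small $\gamma_\ell$ amplifies the noise at high frequencies, which is why a filter is needed.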
In standard inverse problems such as deconvolution, the optimal rate of convergence attainable by an arbitrary estimator typically depends on such smoothness assumptions on $g$. The parameter $\nu$ is usually referred to as the degree of ill-posedness of the inverse problem, and it quantifies the difficulty of inverting the convolution operator. We will also need the following technical assumption on the decay of the density $g$, which is not a very restrictive condition as $g$ is supposed to be an integrable function on $\mathbb{R}$.
Assumption 2.2 There exist a constant $C > 0$ and a real $\alpha > 1$ such that the density $g$ satisfies $g(x) \le \frac{C}{1+|x|^{\alpha}}$ for all $x \in \mathbb{R}$.
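As a simple example (ours, not taken from the paper) of a shift density satisfying both assumptions, one can take the Laplace density:

```latex
g(x) = \tfrac{1}{2}\,e^{-|x|}
\qquad\Longrightarrow\qquad
\gamma_\ell = \int_{\mathbb{R}} \tfrac{1}{2}\,e^{-|x|}\,e^{-i2\pi\ell x}\,dx
            = \frac{1}{1+4\pi^2\ell^2},
```

so that $|\gamma_\ell| \asymp |\ell|^{-2}$ for $\ell \neq 0$ (Assumption 2.1 holds with degree of ill-posedness $\nu = 2$), and $g(x) \le \frac{C}{1+|x|^{2}}$ for a suitable $C > 0$ (Assumption 2.2 holds with $\alpha = 2$).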

A linear estimator by spectral cut-off
First, we propose a non-adaptive estimator in order to derive an upper bound on the minimax risk. This part allows us to shed light on the connection between our model and a deconvolution problem. For a given filter $(\delta_\ell)_{\ell\in\mathbb{Z}}$ and using (2.6), a linear estimator of $\lambda$ is given by
$$\hat\lambda_\delta = \sum_{\ell\in\mathbb{Z}} \hat\theta_\ell\, e_\ell, \qquad \hat\theta_\ell = \delta_\ell\,\gamma_\ell^{-1}\,y_\ell,$$
whose quadratic risk can be written in the Fourier domain as
$$R(\hat\lambda_\delta,\lambda) = \mathbb{E}\sum_{\ell\in\mathbb{Z}} |\hat\theta_\ell - \theta_\ell|^2.$$
The following proposition illustrates how the quality of the estimator $\hat\lambda_\delta$ (in terms of quadratic risk) is related to the choice of the filter $\delta$.
Proposition 2.1 For any given non-random filter $\delta$, the risk of $\hat\lambda_\delta$ can be decomposed as
$$R(\hat\lambda_\delta,\lambda) = \sum_{\ell\in\mathbb{Z}} |1-\delta_\ell|^2\,|\theta_\ell|^2 + \frac{\|\lambda\|_1}{n}\sum_{\ell\in\mathbb{Z}} \frac{\delta_\ell^2}{|\gamma_\ell|^2} + \frac{1}{n}\sum_{\ell\in\mathbb{Z}} \delta_\ell^2\,|\theta_\ell|^2\,\frac{1-|\gamma_\ell|^2}{|\gamma_\ell|^2}. \tag{2.8}$$

Proof. Remark that
$$\hat\theta_\ell - \theta_\ell = (\delta_\ell - 1)\,\theta_\ell + \delta_\ell\left(\gamma_\ell^{-1}\hat\gamma_\ell - 1\right)\theta_\ell + \frac{\delta_\ell}{n}\sum_{i=1}^n \epsilon_{\ell,i},$$
where the $\epsilon_{\ell,i}$ are centered random variables defined as $\epsilon_{\ell,i} = \gamma_\ell^{-1}\left(\int_0^1 e^{-i2\pi\ell t}\,dN^i_t - e^{-i2\pi\ell\tau_i}\,\theta_\ell\right)$. Taking the expectation of $|\hat\theta_\ell - \theta_\ell|^2$ in the above expression yields the three terms of (2.8). Indeed, remark that given two integers $i \neq i'$ and the two shifts $\tau_i, \tau_{i'}$, the variables $\epsilon_{\ell,i}$ and $\epsilon_{\ell,i'}$ are independent with zero mean, so that the cross terms vanish. One then uses the equality $\mathbb{E}\left|\gamma_\ell^{-1}\hat\gamma_\ell - 1\right|^2 = \frac{1-|\gamma_\ell|^2}{n\,|\gamma_\ell|^2}$ together with
$$\mathbb{E}\left[\Big|\int_0^1 e^{-i2\pi\ell t}\,dN^i_t - e^{-i2\pi\ell\tau_i}\,\theta_\ell\Big|^2 \,\Big|\, \tau_i\right] = \int_0^1 \lambda(t-\tau_i)\,dt = \int_0^1 \lambda(t)\,dt,$$
where the last equality follows from the fact that $\lambda$ has been extended outside $[0,1]$ by periodization, which completes the proof.
Note that the quadratic risk of any linear estimator in model (2.4) is composed of three terms. The first two terms in the risk decomposition (2.8) correspond to the classical bias and variance in statistical inverse problems. The third term corresponds to the error related to the fact that the inversion of the operator is done using the deterministic eigenvalues $(\gamma_\ell)_{\ell\in\mathbb{Z}}$ instead of the (unobserved) random eigenvalues $(\hat\gamma_\ell)_{\ell\in\mathbb{Z}}$.

Upper bound of the minimax risk on Sobolev balls
There exist different types of filters in the inverse problems literature (see e.g. [5]). In this section, we consider the family of projection (or spectral cut-off) filters $\delta^M = (\delta_\ell)_{\ell\in\mathbb{Z}} = \left(\mathbb{1}_{\{|\ell|\le M\}}\right)_{\ell\in\mathbb{Z}}$ for some $M\in\mathbb{N}$, and we bound the risk of the corresponding estimator using Proposition 2.1. The proof follows immediately from the decomposition (2.10), the definition of $H^s(A)$ and Assumption 2.1. Hence, Proposition 2.2 shows that under Assumption 2.1 the quadratic risk $R(\hat\lambda_{\delta^M},\lambda)$ is of polynomial order of the sample size $n$, and that this rate deteriorates as the degree of ill-posedness $\nu$ increases. Such a behavior is a well-known fact for standard deconvolution problems, see e.g. [18], [12] and references therein. Proposition 2.2 shows that a similar phenomenon holds for our linear estimator. Hence, there is a close connection between estimating a mean pattern intensity from a set of non-homogeneous Poisson processes and the statistical analysis of deconvolution problems. However, the choice of $M_n$ depends on the a priori unknown smoothness $s$ of the intensity $\lambda$. Such a spectral cut-off estimator is thus non-adaptive and of limited interest for applications. Moreover, the result of Proposition 2.2 is only suited for smooth functions, since Sobolev balls $H^s(A)$ for $s > 1/2$ are not well adapted to model intensities $\lambda$ which may have singularities. In the following section, we thus consider the problem of constructing an adaptive estimator, and we study its minimax risk over Besov balls.
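The rate behind Proposition 2.2 can be recovered from the usual bias-variance tradeoff; the following sketch (ours, for $\lambda \in H^s(A)$ and under Assumption 2.1, ignoring constants) shows the computation:

```latex
\underbrace{\sum_{|\ell|>M}|\theta_\ell|^2}_{\text{bias}\ \lesssim\ A^2 M^{-2s}}
\;+\;
\underbrace{\frac{1}{n}\sum_{|\ell|\le M}\frac{1}{|\gamma_\ell|^{2}}}_{\text{variance}\ \lesssim\ M^{2\nu+1}/n}
\qquad\text{and}\qquad
M^{-2s} \asymp \frac{M^{2\nu+1}}{n}
\;\Longrightarrow\;
M_n \asymp n^{\frac{1}{2s+2\nu+1}},
```

which balances the two terms and yields a risk of order $n^{-\frac{2s}{2s+2\nu+1}}$: polynomial in $n$, and deteriorating as $\nu$ grows, as stated in the discussion above.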

Meyer wavelets
We will use Meyer wavelets to obtain a non-linear and adaptive estimator. Let us denote by $\psi$ (resp. $\phi$) the periodic mother Meyer wavelet (resp. scaling function) on the interval $[0,1]$ (see e.g. [18], [12] for a precise definition). The intensity $\lambda \in L^2([0,1])$ can then be decomposed as
$$\lambda = \sum_{k=0}^{2^{j_0}-1} c_{j_0,k}\,\phi_{j_0,k} + \sum_{j\ge j_0}\sum_{k=0}^{2^j-1} \beta_{j,k}\,\psi_{j,k},$$
where $j_0 \ge 0$ denotes the usual coarse level of resolution, and
$$c_{j_0,k} = \int_0^1 \lambda(t)\,\phi_{j_0,k}(t)\,dt, \qquad \beta_{j,k} = \int_0^1 \lambda(t)\,\psi_{j,k}(t)\,dt$$
are the scaling and wavelet coefficients of $\lambda$. It is well known that Besov spaces can be characterized in terms of wavelet coefficients (see e.g. [16]). Let $s > 0$ denote the usual smoothness parameter; then, for the Meyer wavelet basis and for a Besov ball $B^s_{p,q}(A)$ of radius $A > 0$ with $1 \le p, q \le \infty$, one has that $\lambda \in B^s_{p,q}(A)$ corresponds (up to the coarse-level term) to
$$\left(\sum_{j\ge j_0} 2^{jq(s+1/2-1/p)}\Big(\sum_{k=0}^{2^j-1}|\beta_{j,k}|^p\Big)^{q/p}\right)^{1/q} \le A,$$
with the respective sums replaced by maxima if $p = \infty$ or $q = \infty$. The parameter $s$ is related to the smoothness of the function $\lambda$. Note that if $p = q = 2$, then a Besov ball is equivalent to a Sobolev ball when $s$ is not an integer. For $1 \le p < 2$, the space $B^s_{p,q}(A)$ contains functions with local irregularities.
Meyer wavelets satisfy the fundamental property of being band-limited functions in the Fourier domain, which makes them well suited for deconvolution problems. More precisely, each $\phi_{j_0,k}$ and $\psi_{j,k}$ has a compact support in the Fourier domain, in the sense that
$$\phi_{j_0,k} = \sum_{\ell\in D_{j_0}} c_\ell(\phi_{j_0,k})\, e_\ell \qquad\text{and}\qquad \psi_{j,k} = \sum_{\ell\in \Omega_j} c_\ell(\psi_{j,k})\, e_\ell,$$
where $D_{j_0}$ and $\Omega_j$ are finite subsets of integers such that $\#D_{j_0} \le C 2^{j_0}$ and $\#\Omega_j \le C 2^j$ for some constant $C > 0$ independent of $j$. Then, from the unfiltered estimator $\hat\theta_\ell = \gamma_\ell^{-1} y_\ell$ of each $\theta_\ell$ (see equation (2.4)), one can build estimators of the scaling and wavelet coefficients by defining
$$\hat c_{j_0,k} = \sum_{\ell\in D_{j_0}} c_\ell(\phi_{j_0,k})\,\hat\theta_\ell \qquad\text{and}\qquad \hat\beta_{j,k} = \sum_{\ell\in \Omega_j} c_\ell(\psi_{j,k})\,\hat\theta_\ell.$$

Hard thresholding estimation
We propose to use a non-linear hard thresholding estimator defined by
$$\hat\lambda^h_n = \sum_{k=0}^{2^{j_0}-1} \hat c_{j_0,k}\,\phi_{j_0,k} + \sum_{j=j_0}^{j_1}\sum_{k=0}^{2^j-1} \hat\beta_{j,k}\,\mathbb{1}_{\{|\hat\beta_{j,k}|\ge \hat s_j(n)\}}\,\psi_{j,k}.$$
In the above formula, $\hat s_j(n)$ refers to possibly random thresholds that depend on the resolution level $j$, while $j_0 = j_0(n)$ and $j_1 = j_1(n)$ are the usual coarsest and highest resolution levels, whose dependency on $n$ will be specified later on. Let us now introduce some notation. For all $j\in\mathbb{N}$ and any $\gamma > 0$, define the quantities $\sigma^2_j$ and $\epsilon_j$ entering the thresholds (see (3.4)), where $K_i = \int_0^1 dN^i_t$ is the number of points of the counting process $N^i$ for $i=1,\dots,n$. Introduce also the class of intensity functions with bounded supremum norm.

Theorem 3.1 Suppose that $g$ satisfies Assumption 2.1 and Assumption 2.2.
Assume that $\hat\lambda^h_n$ is computed using the random thresholds $\hat s_j(n)$ with $\gamma \ge 2$, where $\sigma^2_j$ and $\epsilon_j$ are defined in (3.4). Define $j_0(n)$ as the largest integer such that $2^{j_0(n)} \le \log n$, and $j_1(n)$ as the largest integer such that $2^{j_1(n)} \le \left(\frac{n}{\log n}\right)^{\frac{1}{2\nu+1}}$. Then, as $n \to +\infty$, the quadratic risk of $\hat\lambda^h_n$ over Besov balls is, up to logarithmic factors, of order $n^{-\frac{2s}{2s+2\nu+1}}$. Hence, Theorem 3.1 shows that under Assumption 2.1 the quadratic risk of the non-linear estimator $\hat\lambda^h_n$ is of polynomial order of the sample size $n$, and that this rate deteriorates as $\nu$ increases. Again, this result illustrates the connection between estimating a mean intensity from the observation of Poisson processes and the analysis of inverse problems in nonparametric statistics. Note that the choices of the random thresholds $\hat s_j(n)$ and of the highest resolution level $j_1$ do not depend on the smoothness parameter $s$. Hence, contrary to the linear estimator proposed in Section 2, the non-linear estimator $\hat\lambda^h_n$ is adaptive with respect to the unknown smoothness $s$. Moreover, the Besov spaces $B^s_{p,q}(A)$ may contain functions with local irregularities. The non-linear estimator described above is thus suitable for the estimation of intensities which are not globally smooth.
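The thresholding rule itself is elementary. The following sketch is our own illustration, with a simplified deterministic threshold of the form $c\,2^{j\nu}\sqrt{\log n/n}$ standing in for the data-driven $\hat s_j(n)$; it shows that level-dependent hard thresholding kills pure-noise coefficients whose standard deviation grows like $2^{j\nu}/\sqrt{n}$.

```python
import numpy as np

def hard_threshold(beta_hat, s_j):
    # Keep a coefficient only if its magnitude exceeds the level threshold s_j.
    return np.where(np.abs(beta_hat) > s_j, beta_hat, 0.0)

n, nu, c = 10_000, 1.0, 2.0
rng = np.random.default_rng(2)

# Pure-noise wavelet coefficients at levels j = 3..6, with noise level 2^{j*nu}/sqrt(n)
# mimicking the variance inflation |gamma_l|^{-2} of the deconvolution step.
levels = {j: rng.normal(0.0, 2.0**(j * nu) / np.sqrt(n), size=2**j) for j in range(3, 7)}
den = {j: hard_threshold(b, c * 2.0**(j * nu) * np.sqrt(np.log(n) / n))
       for j, b in levels.items()}
```

With these choices the threshold equals $2\sqrt{\log n}$ noise standard deviations, so pure-noise coefficients are set to zero with overwhelming probability, while coefficients carrying a signal of larger magnitude would survive.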
In Section 4, we show that the rate n − 2s 2s+2ν+1 is a lower bound for the asymptotic decay of the minimax risk over a large scale of Besov balls. Hence, the wavelet estimator that we propose is almost optimal up to a logarithmic term which is usually called the price to be paid for adaptivity.

Proof of the upper bound
Following standard arguments in wavelet thresholding used to derive the rate of convergence of such non-linear estimators (see e.g. [18]), one needs to bound the centered moments of order 2 and 4 of $\hat c_{j_0,k}$ and $\hat\beta_{j,k}$ (see Proposition 3.1), as well as the deviation in probability between $\hat\beta_{j,k}$ and $\beta_{j,k}$ (see Proposition 3.2). In the proofs, $C, C', C_1, C_2$ denote positive constants that are independent of $\lambda$ and $n$, and whose value may change from line to line. The proof requires technical results that are postponed and proved in Section 3.3.2. First, define the quantities used below, where $(u_n(\gamma))_{n\ge 1}$ is a sequence of reals such that $u_n(\gamma) = o\!\left(\sqrt{\tfrac{\gamma\log n}{n}}\right)$ as $n \to +\infty$.

Proof of Theorem 3.1
As classically done in wavelet thresholding, we use the standard risk decomposition into the terms $R_1, R_2, R_3, R_4$. Bound on $R_4$: first, recall that under our assumptions, Lemma 19.1 of [11] implies that the tail term is bounded by $C\,2^{-2j_1 s^*}$, where $C$ is a constant depending only on $p, q, s, A$. Note that in the case $p \ge 2$ one has $s^* = s$, and thus $\frac{2s}{2\nu+1} > \frac{2s}{2s+2\nu+1}$. In the case $1 \le p < 2$, one has $s^* = s + 1/2 - 1/p$, and one can check that the conditions $s > 1/p$ and $s^* p > \nu(2-p)$ imply that $\frac{2s^*}{2\nu+1} > \frac{2s}{2s+2\nu+1}$. Hence, in both cases, $R_4$ is of smaller order than $n^{-\frac{2s}{2s+2\nu+1}}$. Bound on $R_1$: the bound follows from Proposition 3.1 and the inequality $2^{j_0} \le \log n$. Bound on $R_2$ and $R_3$: remark that $R_2 \le R_{21} + R_{22}$ and $R_3 \le R_{31} + R_{32}$. Applying the Cauchy-Schwarz inequality twice, we get, for all sufficiently large $n$, a bound in terms of the quantity $\Delta^n_{jk}(\gamma)$ defined in (3.6). Now, define the non-random threshold $s_j(n)$ obtained by replacing the random quantities in $\hat s_j(n)$ by their deterministic counterparts. Using that $V^2_j = o(\sigma^2_j)$ and $\delta_j = o(\epsilon_j)$ as $j \to +\infty$, and that $j_0(n) \to +\infty$ as $n \to +\infty$, it follows that the corresponding inequality holds for all sufficiently large $n$ and $j \ge j_0(n)$. From equation (3.32) (see below), one has that $\mathbb{P}\left(\|\lambda\|_1 \ge \hat K_n\right) \le 2n^{-\gamma}$, which implies that $s_j(n) \le \hat s_j(n)$ with probability larger than $1 - 2n^{-\gamma}$. Hence, by inequalities (3.10) and (3.12), the corresponding bound holds for all sufficiently large $n$ with probability larger than $1 - 2n^{-\gamma}$. Therefore, for all sufficiently large $n$, Proposition 3.2 and inequality (3.13) imply the desired bound for all $j_0(n) \le j \le j_1(n)$.
Proposition 3.1 There exists $C > 0$ such that for any $j \ge 0$ and $0 \le k \le 2^j - 1$, the centered moments of order 2 and 4 of $\hat c_{j_0,k}$ and $\hat\beta_{j,k}$ satisfy the stated bounds.

Proof: We only prove the proposition for the wavelet coefficients $\hat\beta_{j,k}$, since the same arguments yield the result for the scaling coefficients $\hat c_{j_0,k}$. Remark first that
$$\hat\beta_{j,k} - \beta_{j,k} = \sum_{\ell\in\Omega_j} c_\ell(\psi_{j,k})\,(\hat\theta_\ell - \theta_\ell) = Z_1 + Z_2,$$
where $Z_1$ and $Z_2$ are centered random variables. Control of the moments of $Z_1$: by arguing as in the proof of Proposition 3 in [2], one obtains that there exists a universal constant $C > 0$ such that the moment bounds (3.26) hold. The main arguments to obtain (3.26) rely on concentration inequalities for the variables $\tau_i$, $i = 1,\dots,n$.
Proof: Using the notation introduced in the proof of Proposition 3.1, write $\hat\beta_{j,k} - \beta_{j,k} = Z_1 + Z_2$, and remark that for any $u > 0$,
$$\mathbb{P}\left(|\hat\beta_{j,k} - \beta_{j,k}| \ge u\right) \le \mathbb{P}\left(|Z_1| \ge u/2\right) + \mathbb{P}\left(|Z_2| \ge u/2\right).$$
Now, arguing as in Proposition 4 in [2] and using Bernstein's inequality, one immediately controls the deviation of $Z_1$. Let us now control the deviation of $Z_2 = \frac{1}{n}\sum_{i=1}^n \int_0^1 \tilde\psi_{j,k}(t)\,d\tilde N^i_t$. First, remark that, conditionally on the shifts $\tau_1,\dots,\tau_n$, the process $\sum_{i=1}^n N^i$ is a Poisson process. Using an analogue of Bennett's inequality for Poisson processes (see e.g. Proposition 7 in [19]), we get, for any $s > 0$, a deviation bound in terms of the quantity $M^n_{jk}$. Remark that the quantity $M^n_{jk}$ is not computable from the data, as it depends on $\lambda$ and the unobserved shifts $\tau_1,\dots,\tau_n$. Nevertheless, it is possible to compute a data-based upper bound for $M^n_{jk}$. Indeed, note that Bernstein's inequality (see e.g. Proposition 2.9 in [15]) implies that
$$\mathbb{P}\left(M^n_{jk} > M_{jk} + \sqrt{\frac{2\,\tilde M_{jk} M_{jk}\,\gamma\log n}{n}} + \frac{\tilde M_{jk}\,\gamma\log n}{3n}\right) \le n^{-\gamma},$$
with $\tilde M_{jk} = \|\lambda\|_\infty\,\|\psi_{j,k}\|_2^2$. Obviously, $\tilde M_{jk}$ is unknown, but for all sufficiently large $n$ one has that $\tilde M_{jk} = \|\lambda\|_\infty\,\|\psi_{j,k}\|_2^2 \le \log n\,\|\psi_{j,k}\|_2^2$. Moreover, remark that $M_{jk} = \|\psi_{jk}\sqrt{\lambda\star g}\|_2^2 \le \|\psi_{jk}\|_2^2\,\|g\|_\infty\,\|\lambda\|_1$. Hence, to obtain a data-based upper bound for $M^n_{jk}$, it remains to obtain an upper bound for $\|\lambda\|_1$. Recall that we have denoted by $K_i$ the number of points of the process $N^i$. Conditionally on $\tau_i$, $K_i$ is a real random variable that follows a Poisson distribution with intensity $\int_0^1 \lambda(t-\tau_i)\,dt$. Since $\lambda$ is assumed to be periodic with period 1, it follows that for any $i = 1,\dots,n$, $\int_0^1 \lambda(t-\tau_i)\,dt = \int_0^1 \lambda(t)\,dt$, and thus $(K_i)_{i=1,\dots,n}$ are i.i.d. random variables following a Poisson distribution with intensity $\|\lambda\|_1 = \int_0^1 \lambda(t)\,dt$. Using standard arguments to derive concentration inequalities, one obtains for any $u > 0$ a lower-tail bound on $\bar K_n = \frac{1}{n}\sum_{i=1}^n K_i$ around $\|\lambda\|_1$. Now, define the function $h(y) = y^2 - \sqrt{2a}\,y - a/3$ for $y \ge 0$, with $a = u/n$. Then, the above inequality can be rewritten in terms of $h$. Since the restriction of $h$ to $[\sqrt{a}(\sqrt{30}+3\sqrt{2})/6, +\infty)$ is invertible, with $h^{-1}(y) = \sqrt{y + \frac{5a}{6}} + \sqrt{\frac{a}{2}}$, it follows that for $u = \gamma\log n$ and all sufficiently large $n$,
$$\mathbb{P}\left(\|\lambda\|_1 \ge \bar K_n + \frac{4\gamma\log n}{3n} + \sqrt{\frac{2\gamma\log n}{n}\,\bar K_n}\,\right) \le n^{-\gamma}.$$
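The data-based control of $\|\lambda\|_1$ described above is easy to implement. The following sketch is our own numerical check, with an assumed true value $\|\lambda\|_1 = 50$ and $\gamma = 2$; the bound uses the counts $K_i$ only.

```python
import numpy as np

rng = np.random.default_rng(3)

n, lam1, gamma = 1000, 50.0, 2.0
K = rng.poisson(lam1, size=n)  # K_i ~ Poisson(||lambda||_1), i.i.d. across trajectories
K_bar = K.mean()

# Data-based upper bound: with probability at least 1 - n^{-gamma},
# ||lambda||_1 <= K_bar + 4*gamma*log(n)/(3n) + sqrt(2*gamma*log(n)*K_bar/n).
bound = (K_bar
         + 4.0 * gamma * np.log(n) / (3.0 * n)
         + np.sqrt(2.0 * gamma * np.log(n) * K_bar / n))
```

For $n = 1000$ the bound exceeds the empirical mean by roughly $\sqrt{2\gamma\log n\,\bar K_n/n} \approx 1.2$, so it is a tight high-probability envelope of $\|\lambda\|_1$.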

Some properties of Meyer wavelets
Recall that the Meyer mother wavelet $\psi$ is not compactly supported. Nevertheless, the Meyer wavelet satisfies the following proposition, which will be useful for the construction of a lower bound on the minimax risk.
Proposition 4.1 There exists a universal constant $c(\psi)$ such that for any $j\in\mathbb{N}$ and any $(\omega_k)_{k=0,\dots,2^j-1} \in \{0,1\}^{2^j}$,
$$\Big\|\sum_{k=0}^{2^j-1} \omega_k\,\psi_{j,k}\Big\|_\infty \le c(\psi)\, 2^{j/2}.$$
Proof: Note that for periodic Meyer wavelets one has $\sup_{x\in\mathbb{R}} \sum_{k\in\mathbb{Z}} |\psi(x-k)| < +\infty$. Hence the proof follows from the definition $\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k)$.

Definitions and notations
Recall that $\tau_1,\dots,\tau_n$ are i.i.d. random variables with density $g$, and that for a given intensity $\lambda \in \Lambda_0$, we denote by $N^1,\dots,N^n$ the counting processes such that, conditionally on $\tau_1,\dots,\tau_n$, $N^1,\dots,N^n$ are independent Poisson processes with intensities $\lambda(\cdot-\tau_1),\dots,\lambda(\cdot-\tau_n)$. Then, the notation $\mathbb{E}_\lambda$ will be used to denote the expectation with respect to the distribution $\mathbb{P}_\lambda$ (tensorized law) of the multivariate counting process $N = (N^1,\dots,N^n)$. In the rest of the proof, we also assume that $p$ and $q$ satisfy $1 \le p \le \infty$ and $1 \le q \le \infty$, that $A$ is a positive constant, and that $s$ is a positive real such that $s > 2\nu + 1$, where $\nu$ is the degree of ill-posedness defined in Assumption 2.1.
A key step in the proof is the use of the likelihood ratio $\Lambda(H_0,H_1)$ between two measures associated to two hypotheses $H_0$ and $H_1$ on the intensities of the Poisson processes we consider. The following lemma, whose proof can be found in [4], is a Girsanov-like formula for Poisson processes.
The above lemma means that if $F(N)$ is a real-valued and bounded measurable function of the counting process $N$, then
$$\mathbb{E}_{H_1}\big[F(N)\big] = \mathbb{E}_{H_0}\big[\Lambda(H_0,H_1)\,F(N)\big],$$
where $\mathbb{E}_{H_1}$ denotes the expectation with respect to $\mathbb{P}_{\lambda_1}$ (hypothesis $H_1$), and $\mathbb{E}_{H_0}$ denotes the expectation with respect to $\mathbb{P}_{\lambda_0}$ (hypothesis $H_0$). Obviously, one can adapt this formula by exchanging the roles of the two hypotheses.

Lower bound for the minimax risk using Assouad's cube technique
Let us first describe the main idea of the proof. In Lemma 4.2, we provide a first result giving a lower bound on the quadratic risk of any estimator over a specific set of test functions. These test functions are appropriate linear combinations of Meyer wavelets, whose construction follows the ideas of Assouad's cube technique for deriving lower bounds on minimax risks (see e.g. [9,18]).
A key step in the proof of Lemma 4.2 is the use of the likelihood ratio formula (4.2). Then, we detail precisely in Lemma 4.3 the asymptotic behavior of the likelihood ratio (4.5) defined in Lemma 4.2 under well-chosen hypotheses H 1 and H 0 . The result of Theorem 4.1 then follows from these two lemmas.
Given an integer $D \ge 1$, introduce $\xi_D = c\,2^{-D(s+1/2)}$ for some constant $0 < c \le A/(2+c(\psi))$, where $c(\psi)$ is the constant introduced in Proposition 4.1. In what follows, we will use the likelihood ratio formula (4.2) with the intensity $\lambda_0 \equiv A/2$, which corresponds to the hypothesis $H_0$ under which all the intensities of the observed counting processes are constant and equal to $A/2$, where $A$ is the radius of the Besov ball $B^s_{p,q}(A)$. Next, for any $\omega = (\omega_k)_{k=0,\dots,2^D-1} \in \{0,1\}^{2^D}$, we denote by $\lambda_{D,\omega}$ the intensity built from the wavelets $(\psi_{D,k})_k$ with amplitude $\xi_D$. For the sake of convenience, we omit in what follows the subscript $D$ and write $\lambda_\omega$ instead of $\lambda_{D,\omega}$. First, remark that each function $\lambda_\omega$ can be written as $\lambda_\omega = \rho(A) + \mu_\omega$, where $\mu_\omega$ is a positive intensity belonging to $\Lambda_0$ by Proposition 4.1. Moreover, it can be checked that the condition $c \le A/(2+c(\psi))$ implies that $\lambda_\omega \in B^s_{p,q}(A)$. Therefore, $\lambda_\omega \in S_D(A)$ for any $\omega \in \{0,1\}^{2^D}$. The following lemma provides a lower bound over $S_D(A)$.
We detail in the next paragraph how to use Lemma 4.2 with a suitable value for the parameter D to obtain the desired lower bound on the minimax risk.

Quantitative settings
In the rest of the proof, we suppose that $D = D_n$ satisfies the asymptotic equivalence
$$2^{D_n} \sim n^{\frac{1}{2s+2\nu+1}} \quad\text{as } n \to +\infty. \tag{4.8}$$
To simplify the notation, we will drop the subscript $n$ and write $D = D_n$. For two sequences of reals $(a_n)_{n\ge 1}$ and $(b_n)_{n\ge 1}$, we write $a_n \asymp b_n$ if there exist two positive constants $C, C' > 0$ such that $C \le \frac{a_n}{b_n} \le C'$ for all sufficiently large $n$. Then, define $m_{D_n} = 2^{D_n/2}\,\xi_{D_n}$. Since $\xi_{D_n} = c\,2^{-D_n(s+1/2)}$, it follows that $m_{D_n} \asymp n^{-s/(2s+2\nu+1)} \to 0$ as $n \to \infty$. Remark also that the condition $s > 2\nu+1$ implies that $n\,m_{D_n}^3 \to 0$ as $n \to \infty$.
Lower bound of the "likelihood ratio" $Q_{k,\omega}$

The above quantitative settings, combined with Lemma 4.2, will allow us to obtain a lower bound on the minimax risk. For this purpose, let $0 < \delta < 1$, and remark that Lemma 4.2 implies inequality (4.9). The remainder of the proof is thus devoted to the construction of a lower bound in probability for the random variable $Q_{k,\omega}(N) := I_1 I_2$, where, to simplify the presentation of the proof, we have taken $\rho(A) = 1$, i.e. $A = 2$. Then, the following lemma holds (the result remains valid for $\rho(A) \neq 1$). We denote by $\|\lambda\|_2$ the $L^2$ norm of $\lambda$ and by $\|\lambda\|_\infty = \sup_{t\in[0,1]}\{|\lambda(t)|\}$ its supremum norm. In the proof, we repeatedly use the following inequalities, which hold for any $\omega \in \{0,1\}^{2^{D_n}}$:
$$\|\mu_\omega\|_2 \le \|\mu_\omega\|_\infty \le 2c(\psi)\,m_{D_n} \to 0, \tag{4.10}$$
$$\|\lambda_\omega\|_2 \le \|\lambda_\omega\|_\infty \le \rho(A) + 2c(\psi)\,m_{D_n} \to \rho(A), \quad\text{as } n\to+\infty.$$
Since for any $k$ one has $\int_0^1 \psi_{D,k}(t)\,dt = 0$, it follows that for any $\omega$ and $\alpha$, $\int_0^1 \mu_\omega(t-\alpha)\,dt = c(\psi)\,\xi_{D_n}\,2^{D_n/2} = c(\psi)\,m_{D_n}$. Let $z > 0$ be a positive real, and consider the following second order expansion of the logarithm:
$$\ln(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3(1+u)^3} \quad\text{for some } 0 \le u \le z. \tag{4.11}$$
Applying (4.11) and the inequalities (4.10), together with Markov's inequality, one controls the corresponding remainder terms as $n \to +\infty$. Hence, using inequality (4.12), then inequality (4.13), and combining the above inequalities with Fubini's relation, we obtain (4.14). Let $z \in \mathbb{R}$ and consider the following second order expansion of the exponential:
$$\exp(z) = 1 + z + \frac{z^2}{2} + \frac{z^3}{6}\exp(u) \quad\text{for some } -|z| \le u \le |z|. \tag{4.15}$$
By inequalities (4.10) and Markov's inequality, one has that $|z| = O_p(m_{D_n})$. Since $m_{D_n} \to 0$, we obtain by using (4.15) the corresponding expansion for each $i \in \{1,\dots,n\}$. From the definition of $J_1$ in (4.14), we can use a stochastic version of the Fubini theorem (see [10], Theorem 5.44) to exchange the order of integration. At this step, it is more convenient to work with the logarithm of the term $J_1$. Using again the second order expansion of the logarithm (4.11), we obtain an expansion of $\ln(J_1)$, and similar arguments apply to the term $J_2$ defined in (4.14). Combining the above equalities for $J_1$ and $J_2$, we obtain a lower bound for $\ln(Q_{k,\omega}(N))$ in terms of the quantities (4.16)-(4.20). In what follows, we show that, for all sufficiently large $n$, the terms (4.16)-(4.20) are bounded from below (in probability). Since $n\,m^3_{D_n} \to 0$, this will imply that there exist $c > 0$ (not depending on $\lambda_\omega$) and a constant $p(c) > 0$ such that, for all sufficiently large $n$, the bound stated in Lemma 4.3 holds.
Lower bound for (4.16): since, conditionally on the shifts, the processes are Poisson with intensities $\lambda_\omega(\cdot-\tau_i)$ for $1 \le i \le n$, we obtain the corresponding expression for the deterministic term. Remark that $\mu_\omega - \mu_{\omega^k} = \pm\,\xi_D\,\psi_{D,k}$. In what follows, we will repeatedly use the relation
$$\|g \star \psi_{D,k}\|_2^2 \asymp 2^{-2D\nu},$$
which follows from Assumption 2.2 combined with Parseval's relation, from the fact that $\#\Omega_D \asymp 2^D$, and from the fact that, under Assumption 2.1, $|\gamma_\ell| \asymp 2^{-D\nu}$ for all $\ell \in \Omega_D$. Therefore, there exists a constant $0 < c_0 < +\infty$ such that, for all sufficiently large $n$, the deterministic term (4.16) is bounded from below by $-c_0$. In the rest of the proof, we show that, for all sufficiently large $n$, the terms (4.17)-(4.20) are bounded from below in probability. Without loss of generality, we consider in what follows only the case $\mu_\omega - \mu_{\omega^k} = \xi_D\,\psi_{D,k}$.

Lower bound on $B^s_{p,q}(A)$
By applying inequality (4.9) and Lemma 4.3, we obtain that there exists 0 < δ < 1 such that for all sufficiently large n