Nonparametric estimation of the marks' distribution of an exponential shot-noise process

In this paper, we consider a nonlinear inverse problem occurring in nuclear science. Gamma rays randomly hit a semiconductor detector, which produces an impulse response of electric current. Because the sampling period of the measured current is larger than the mean inter-arrival time of photons, the impulse responses associated with different gamma rays can overlap: this phenomenon is known as pile-up. In this work, it is assumed that the impulse response is an exponentially decaying function. We propose a novel method to infer the distribution of gamma photon energies from the indirect measurements obtained from the detector. This technique is based on a formula linking the characteristic function of the photon energy density to a function involving the characteristic function of the observations and its derivative. We establish that our estimator converges to the mark density in uniform norm at a polynomial rate (up to a logarithmic factor). A limited Monte-Carlo experiment is provided to support our findings.


Introduction
In this paper, we consider a nonlinear inverse problem arising in nuclear science, namely in neutron transport or gamma spectroscopy. For the latter, a radioactive source, for instance an excited nucleus, randomly emits gamma photons according to a homogeneous Poisson point process. These high-frequency radiations can be associated with high-energy photons, which interact with matter via three phenomena: photoelectric absorption, Compton scattering and pair production (further details can be found in [14]). When photons interact with the semiconductor detector (usually a High-Purity Germanium (HPGe) detector) arranged between two electrodes, a number of electron-hole pairs proportional to the energy transferred by the photon is created. Accordingly, the electrodes generate an electric current, called the impulse response, whenever the detector is hit by a particle, with an amplitude corresponding to the transferred energy. In this context, a feature of interest is the distribution of this energy. Indeed, it can be compared to known spectra in order to identify the composition of the nuclear source. In practice, the electric current is not continuously observed, and the sampling period is typically larger than the mean inter-arrival time of two photons. Therefore, there is a high probability that several photons are emitted between two measurements, so that the deposited energies are superimposed in the detector, a phenomenon called pile-up. Because of the pile-up, it is impossible to establish a one-to-one correspondence between a gamma ray and the associated deposited energy.
This inverse problem can be modeled as follows. The electric current generated in the detector is given by a stationary shot-noise process X = (X_t)_{t∈R} defined by

X_t = Σ_{k∈Z} Y_k h(t − T_k) ,   (1)

where h is the (causal) impulse response of the detector and the following assumption holds.

(SN-1) Σ_k δ_{(T_k, Y_k)} is a Poisson point process whose times T_k ∈ R arrive homogeneously with intensity λ > 0, with i.i.d. marks Y_k ∈ R, independent of the arrival times, having a probability density function (p.d.f.) θ and cumulative distribution function (c.d.f.) F.
We wish to estimate the density θ from a regular observation sample X_1, …, X_n of the shot noise (1). Note that the sampling rate is set to 1 without meaningful loss of generality: if a different sampling rate is used, i.e. we observe X_δ, …, X_{nδ} for some δ ≠ 1, it amounts to changing λ and scaling h accordingly.
The process (1) is well defined whenever the following condition on the impulse response h and the density θ holds:

∫_0^∞ ∫_R min(1, |y h(s)|) θ(y) dy ds < ∞ .   (2)

As shown in [12], this condition is also necessary. Moreover, the marginal distribution of X belongs to the class of infinitely divisible (ID) distributions and has Lévy measure ν satisfying, for all Borel sets B of R ∖ {0},

ν(B) = λ ∫_0^∞ ∫_R 1_B(y h(s)) θ(y) dy ds .   (3)

The ID property of the marginal distribution shows that this estimation problem is closely related to the estimation of the Lévy measure ν. This property strongly suggests using estimators of the Lévy triplet, see for instance [16] and [8]. However, to the best of our knowledge, these estimators use the increments of the corresponding Lévy process, which are i.i.d., and they assume a finite Lévy–Khintchine measure. In contrast, our observations are not independent, and the Lévy measure of the process is infinite since, from (3), we have

ν(R ∖ {0}) = λ ∫_0^∞ P(Y_0 h(s) ≠ 0) ds = ∞

whenever h does not vanish and the marks are nonzero almost surely. In order to tackle this estimation problem, we therefore propose to bypass the estimation of ν and directly retrieve the density θ of the marks' distribution F from the empirical characteristic function of the measurements. Coarsely speaking, using (3), the Lévy–Khintchine representation provides an expression of the characteristic function ϕ_X of the marginal distribution as a functional of θ. The estimator is built upon replacing ϕ_X by its empirical version and inverting the mapping θ ↦ ϕ_X. A more standard marginal-based approach would be to rely on the p.d.f. of X. However, the density of X_0 is intractable, which precludes the use of likelihood inference methods. Consequently, although shot-noise models are widespread in applications (for example, such models were used to model internet traffic [1], river streamflows [4], spikes in neuroscience [11, 10], and signals in signal processing [19, 20]), theoretical results on the statistical inference of shot noise appear to be limited. Recently, Xiao et al. [24] provided consistent and asymptotically normal estimators for parametric shot-noise processes with specific impulse responses.
In this contribution, we consider the particular case given by the following assumption.
(SN-2) The impulse response h is an exponential function with decay rate α > 0:

h(t) = e^{−αt} 1_{[0,∞)}(t) .

Under (SN-2), the process (X_t)_{t∈R} is usually called an exponential shot noise. In this case, Condition (2) becomes

∫_R log(1 + |y|) θ(y) dy < ∞ .   (5)

Under (SN-2), the process (X_t)_{t≥0} can alternatively be introduced by considering the following stochastic differential equation (SDE):

dX_t = −α X_t dt + dL_t ,   (6)

where L = (L_t)_{t≥0} is a Lévy process defined as the compound Poisson process

L_t = Σ_{k≥1} Y_k 1_{{T_k ≤ t}} ,   (7)

where (T_k, Y_k)_{k≥0} satisfies (SN-1). The solution to the equation (6) is called an Ornstein–Uhlenbeck (O-U) process ([18, Chapter 17]) driven by L with initial condition X_0 = x and rate α. Note that L defined by (7) has Lévy measure λ F(dy). Thus, by [18, Theorem 17.5], this Markov process admits a unique stationary version if (5) holds, and this stationary solution corresponds to the shot-noise process (1).
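To fix ideas, the stationary exponential shot noise sampled at unit rate can be simulated exactly through the autoregression X_{k+1} = e^{−α} X_k + W_k implied by (6). The following minimal sketch (function and parameter names are ours, not from the paper) draws the Poisson arrivals interval by interval:

```python
import numpy as np

def simulate_shot_noise(n, lam, alpha, sample_marks, burn_in=500, seed=None):
    """Draw X_1, ..., X_n from the stationary exponential shot noise at unit
    sampling rate via X_{k+1} = exp(-alpha) X_k + W_k, where W_k collects the
    (discounted) marks of the photons arriving in (k, k+1]."""
    rng = np.random.default_rng(seed)
    x = 0.0
    out = np.empty(n)
    for k in range(-burn_in, n):
        m = rng.poisson(lam)                 # number of arrivals in (k, k+1]
        t = rng.uniform(0.0, 1.0, size=m)    # arrival offsets within the interval
        y = sample_marks(m, rng)             # i.i.d. marks (deposited energies)
        x = np.exp(-alpha) * x + np.sum(y * np.exp(-alpha * (1.0 - t)))
        if k >= 0:
            out[k] = x
    return out
```

For instance, with λ = 2, α = 1 and Exp(1) marks, the stationary mean λ E[Y_0]/α = 2 provides a quick sanity check on the output.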
In recent works, Brockwell and Schlemm [3] exploit the integrated version of (6) to recover the Lévy process L and show that the increments of L can be represented as

L_t − L_s = X_t − X_s + α ∫_s^t X_u du ,  s ≤ t .

These quantities are only well estimated from high-frequency observations, so that we cannot rely on this method in our regular sampling scheme.
To the best of our knowledge, the paper that best fits our setting is [13]. The authors propose a nonparametric estimation procedure from a low-frequency sample of a stationary O-U process which exploits the self-decomposability of the marginal distribution. They construct an estimator of the so-called canonical function k, defined on (0, ∞) by

ν(dx) = x^{−1} k(x) dx .

The two main additional assumptions are that k is decreasing on (0, ∞) and that ν satisfies the integrability condition ∫_0^∞ (1 ∧ x) ν(dx) < ∞. In our setting (i.e., when specifying the Lévy process to be the compound Poisson process defined in (7)), it is easily shown that these conditions hold and that the canonical function and the cumulative distribution function of the marks are related by the equation

k(x) = (λ/α) (1 − F(x)) ,  x > 0 .

In this article, we introduce an estimator of θ based on the empirical characteristic function and a Fourier inversion formula. First, this algorithm is numerically efficient, being able to handle the large datasets typically acquired in high-energy physics. Second, we establish an upper bound on the rate of convergence of our estimator which is uniform over a smoothness class of functions for the density θ.
The paper is organized as follows. In Section 2, we introduce some preliminaries on the characteristic function of an exponential shot-noise process and provide both the inversion formula and the estimator of the density θ. In particular, we derive an upper bound on the rate of convergence of our estimator over a broad class of densities, under the assumption that λ/α is known. In Section 3, we present in detail the algorithm used to perform the density estimation and illustrate our findings with a limited Monte-Carlo experiment. Section 4 provides error bounds for the empirical characteristic function based on discrete-time observations and exploits the β-mixing structure of the process. Finally, Section 5 is devoted to the proofs of the various theorems.

Inversion formula
As mentioned in the introduction, it is difficult to derive the probability density function of the stationary shot noise, except when the marks are exponentially distributed and the impulse response is an exponential function; in that case, the marginal distribution of the shot noise is Gamma-distributed (the reader can refer to [2] for details). In all other cases, we can only compute the characteristic function of the marginal distribution of the stationary version of the shot noise, by treating it as a filtered point process (see for example [17] for details). We have, for every real u,

ϕ_X(u) = E[e^{iuX_0}] = exp( λ ∫_0^∞ (ϕ_Y(u h(s)) − 1) ds ) .   (8)

From (8), the characteristic function of X_0 can be expressed as follows:

ϕ_X(u) = exp( λ ∫_R K_h(uy) θ(y) dy ) ,   (9)

where K_h, the kernel associated with h, is given by

K_h(x) = ∫_0^∞ (e^{ixh(s)} − 1) ds .

Note that if h is integrable, then K_h is well defined since, for any real x, ∫_0^∞ |e^{ixh(s)} − 1| ds ≤ |x| ∫_0^∞ |h(s)| ds. Moreover, if h is integrable, then K_h is a C¹(R, C) function whose derivative is bounded and equal to

K_h'(x) = i ∫_0^∞ h(s) e^{ixh(s)} ds .   (10)

Furthermore, if E[|Y_0|] < ∞, then the characteristic function of X_0 is differentiable and we have

ϕ_X'(u) = λ ϕ_X(u) ∫_R y K_h'(uy) θ(y) dy .   (11)

Under (SN-2), the kernel K_h takes the form

K_h(x) = (1/α) ∫_0^x (e^{iv} − 1) v^{−1} dv .

With (10), we obtain that

ϕ_X'(u) / ϕ_X(u) = (λ/α) (ϕ_Y(u) − 1) / u .   (12)

Since the marginal distribution of X is infinitely divisible, we have by [18, Lemma 7.5] that ϕ_X(u) does not vanish. If in addition ϕ_Y is integrable, (12) provides a way to recover θ knowing α/λ, namely, for all x ∈ R,

θ(x) = (1/2π) ∫_R e^{−iux} ( 1 + (α/λ) u ϕ_X'(u)/ϕ_X(u) ) du .   (13)

This relation shows that the estimation problem of the p.d.f. θ is directly related to the estimation of the second characteristic function.
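Under (SN-2), the link between the log-derivative of ϕ_X and ϕ_Y reduces to a one-line substitution; the computation can be sketched as follows:

```latex
\begin{align*}
\log \varphi_X(u)
  &= \lambda \int_0^\infty \bigl(\varphi_Y(u e^{-\alpha s}) - 1\bigr)\, ds
   = \frac{\lambda}{\alpha} \int_0^u \frac{\varphi_Y(v) - 1}{v}\, dv ,
\end{align*}
using the change of variables $v = u e^{-\alpha s}$. Differentiating in $u$ yields
\[
  \frac{\varphi_X'(u)}{\varphi_X(u)} = \frac{\lambda}{\alpha}\,\frac{\varphi_Y(u)-1}{u},
  \qquad\text{hence}\qquad
  \varphi_Y(u) = 1 + \frac{\alpha}{\lambda}\, u\, \frac{\varphi_X'(u)}{\varphi_X(u)} ,
\]
and Fourier inversion of $\varphi_Y$ recovers the density $\theta$.
```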
Remark 2.1. We assume in the following that the ratio α/λ appearing in the inversion formula (13) is a known constant, as it typically depends on the measurement device. Interestingly, however, an estimator of this constant can be derived from [12, Theorem 1], where it is shown that the marginal distribution G of the stationary shot noise is regularly varying at 0 with index λ/α, i.e.,

G(x) = x^{λ/α} L(x) ,

with L slowly varying at 0. Hence it is possible to estimate α/λ by applying Hill's estimator [9] to the sample X_1^{−1}, …, X_n^{−1}.
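As a sketch of how Remark 2.1 could be implemented (the function names and the number k of order statistics are our choices): since P(X ≤ x) behaves like x^{λ/α} near 0, the inverted sample 1/X_i is heavy-tailed with tail index λ/α, which Hill's estimator recovers from the largest order statistics.

```python
import numpy as np

def hill_tail_index(z, k):
    """Hill estimator of the tail index of a heavy-tailed positive sample z,
    based on the k largest order statistics."""
    z = np.sort(np.asarray(z, dtype=float))
    return 1.0 / np.mean(np.log(z[-k:] / z[-k - 1]))

def estimate_lambda_over_alpha(x, k):
    """P(X <= x) ~ x^{lambda/alpha} L(x) near 0 implies that Z = 1/X has a
    regularly varying tail with index lambda/alpha; apply Hill's estimator
    to the inverted sample."""
    return hill_tail_index(1.0 / np.asarray(x, dtype=float), k)
```

The choice of k is the usual bias/variance trade-off of Hill's estimator and would need to be tuned on real data.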

Nonparametric estimation
Let φ̂_n(u) := n^{−1} Σ_{j=1}^n e^{iuX_j} denote the empirical characteristic function (e.c.f.) obtained from the observations, and φ̂'_n its derivative. From (13), we are tempted to plug the e.c.f. of the observations into the inversion formula to estimate the p.d.f. θ. Let (h_n)_{n≥0} and (κ_n)_{n≥0} be two sequences of positive numbers such that lim_{n→∞} h_n = lim_{n→∞} κ_n = 0, and consider the following sequence of estimators:

θ̂_n(x) = (1/2π) ∫_{−h_n^{−1}}^{h_n^{−1}} e^{−iux} ( 1 + (α/λ) u φ̂'_n(u) 1{|φ̂_n(u)| ≥ κ_n} / φ̂_n(u) ) du .   (14)

Remark 2.2. We estimate 1/ϕ_X(u) by 1{|φ̂_n(u)| ≥ κ_n} / φ̂_n(u) with a suitable choice of a sequence (κ_n)_{n≥1} converging to zero. The constant κ_n is chosen such that |φ̂_n(u) − ϕ_X(u)| remains smaller than |φ̂_n(u)| and |ϕ_X(u)| with high probability, in order to avoid large errors when inverting φ̂_n(u). In [16], the authors deal with the empirical characteristic function of i.i.d. random variables; in that case, the deviations of √n (φ̂_n(u) − ϕ_X(u)) are bounded in probability, hence they use 1{|φ̂_n(u)| ≥ κ n^{−1/2}} / φ̂_n(u) as an estimator of 1/ϕ_X(u). Here we truncate the interval of integration R to [−h_n^{−1}, h_n^{−1}], where h_n is a bandwidth parameter. This allows us to bound the estimation error θ̂_n − θ in sup norm. The resulting deviation bound depends on h_n, see Theorem 4.1. The resulting κ_n is then taken slightly larger than n^{−1/2}.
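A direct (non-FFT) implementation of the estimator (14) might look as follows. This is a sketch with our own parameter names, assuming the ratio α/λ is known; its cost is proportional to the frequency-grid size times the sample size, so it is only meant for moderate n (Section 3 describes the FFT route for large samples):

```python
import numpy as np

def theta_hat(x_grid, sample, alpha_over_lam, h_n, kappa_n, n_u=201):
    """Plug-in estimator of the marks' density: Fourier inversion of
    1 + (alpha/lambda) * u * ecf'(u)/ecf(u), with the integral truncated to
    [-1/h_n, 1/h_n] and the ratio thresholded where |ecf| < kappa_n."""
    sample = np.asarray(sample, dtype=float)
    u = np.linspace(-1.0 / h_n, 1.0 / h_n, n_u)
    phi = np.empty(n_u, dtype=complex)    # empirical characteristic function
    dphi = np.empty(n_u, dtype=complex)   # its derivative
    for j, uj in enumerate(u):
        e = np.exp(1j * uj * sample)
        phi[j] = e.mean()
        dphi[j] = (1j * sample * e).mean()
    keep = np.abs(phi) >= kappa_n         # truncation 1{|ecf| >= kappa_n}
    ratio = np.where(keep, dphi, 0.0) / np.where(keep, phi, 1.0)
    phi_y = 1.0 + alpha_over_lam * u * ratio       # estimate of phi_Y
    du = u[1] - u[0]
    kernel = np.exp(-1j * np.outer(np.asarray(x_grid), u))
    return (kernel @ phi_y).real * du / (2.0 * np.pi)
```

As a sanity check, for Exp(1) marks and λ/α = 2 the stationary marginal is Gamma-distributed (Section 2); a computation not spelled out in the paper gives shape 2 and scale 1 with this parameterization, so an i.i.d. Gamma(2, 1) sample should return a density close to e^{−x} on the positive axis.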
In order to evaluate the convergence rate of our estimator, we consider particular smoothness classes for the density θ. Namely, we define, for any positive constants K, L, m and s > 1/2,

Θ(K, L, s, m) = { θ p.d.f. : ∫_R |y|^{4+m} θ(y) dy ≤ K and ∫_R (1 + u²)^s |F θ(u)|² du ≤ L² } ,   (15)

where F θ denotes the Fourier transform of θ. Hence L is an upper bound on the Sobolev semi-norm of θ. Note also that, under (SN-1)–(SN-2), θ ∈ Θ(K, L, s, m) entails in particular the moment bound E_θ[|Y_0|^{4+m}] ≤ K. In the following, under assumptions (SN-1)–(SN-2), we use the notation P_θ and E_θ, where the subscript θ added to the expectation and probability symbols indicates explicitly the dependence on the unknown density θ. The following result provides a bound on the risk P_θ(‖θ − θ̂_n‖_∞ > M_n) for well-chosen sequences (h_n), (κ_n) and (M_n), which is uniform over the densities θ ∈ Θ(K, L, s, m).
Theorem 2.1. Let K, L, m > 0, s > 1/2, and let C be a positive constant such that 0 < C < C_{K,L,m,λ/α}. Set

h_n = n^{−1/(2s+1+4λ/α)} ,  κ_n = (C/2) h_n^{λ/α} ,  M_n = M n^{−(2s−1)/(4s+2+8λ/α)} log(n)^{1/2} ,

and define θ̂_n by (14). Then, for n ≥ 3, the density estimator θ̂_n satisfies

sup_{θ ∈ Θ(K,L,s,m)} P_θ( ‖θ − θ̂_n‖_∞ > M_n ) ⟶ 0 as n → ∞ ,

where M > 0 is a constant only depending on C, K, L, m, s and λ/α.

Remark 2.3.
The constant C in Theorem 2.1 might be adaptively chosen. Indeed, the well-known relationship between the cumulant function of a filtered Poisson process and its intensity measure (see [5, Chapter 6] for example) implies that the mean µ_θ of X_0 is given by

µ_θ = λ ∫_R y θ(y) dy ∫_0^∞ h(s) ds = (λ/α) E_θ[Y_0] .

Since X is ergodic (see Section 4), the empirical mean µ̂_n of the sample X_1, …, X_n converges to µ_θ almost surely. Thus, replacing C in Theorem 2.1 by Ĉ_n := exp(−µ̂_n)/2 leads to the same rate of convergence, since Ĉ_n converges almost surely to exp(−µ_θ)/2.
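In code, the data-driven choice of Remark 2.3 is a one-liner (sketch; the function name is ours):

```python
import numpy as np

def adaptive_constant(sample):
    """C_hat_n = exp(-mean(X_1..X_n)) / 2 (Remark 2.3); by ergodicity the
    empirical mean converges a.s. to mu_theta = (lambda/alpha) * E[Y_0]."""
    return 0.5 * np.exp(-np.mean(sample))
```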

Remark 2.4.
This theorem shows that the error in uniform norm converges at least at the polynomial rate n^{−(2s−1)/(4s+2+8λ/α)} log(n)^{1/2}, which depends both on the ratio λ/α and on the smoothness coefficient s. For a given ratio λ/α, the convergence rate becomes faster as s increases and tends to behave as n^{−1/2} when s → ∞. On the other hand, for a given smoothness parameter s, the rate of convergence deteriorates as the ratio λ/α tends to infinity. This can be interpreted as a consequence of the pile-up effect, which occurs whenever the intensity λ is large or the impulse response coefficient α is close to zero.

Remark 2.5.
Based on the previous theorem, one might wonder whether the rates of convergence are optimal. In similar, though not identical, problems ([16], [8]) in which the authors nonparametrically estimate a Lévy triplet (with finite activity) from a low-frequency sample of the associated Lévy process, the optimal rates of convergence are identical to ours. Our estimation procedure relies on stationary but dependent infinitely divisible random variables associated with an infinite Lévy measure, so that these results do not apply here. We nevertheless believe that the rates obtained in Theorem 2.1 are also optimal in this dependent context. The proof of this conjecture is left for future work.

Experimental results
The estimation procedure based on the estimator θ̂_n given by (14) can be made time-efficient and is thus well suited to very large datasets. In nuclear applications, it is usual to deal with several million observations, while the intensity of the time-arrival point process can reach several thousand occurrences per second. Typically, the shot-noise process corresponding to the electric current is discretely observed for three minutes at a sampling rate of 10 MHz, and the mean number of arrivals between two observations lies between 10 and 100. Such large values for the intensity and the number of sampled points motivate us to present a practical way to compute the estimator (14).

Practical computation of the estimator
In Section 2, we defined the estimator of the marks' density by (14). Although it theoretically converges to the true density of the shot-noise marks, the evaluation of the empirical characteristic function and its derivative based on the observations X_1, …, X_n might be time-consuming when the sample size n is large. To circumvent this issue, we propose to compute the empirical characteristic function using the fast Fourier transform of an appropriate histogram of the vector X_1, …, X_n. More precisely, for a fixed h > 0, we consider the grid G = {hl : ⌊min_{k≤n}(X_k)/h⌋ ≤ l ≤ ⌈max_{k≤n}(X_k)/h⌉} and compute the normalized histogram H of the sample (X_l)_{1≤l≤n} with respect to the grid G, defined by

H_l = n^{−1} Σ_{k=1}^n 1{ X_k ∈ [hl, h(l+1)) } .

Denoting m_n := ⌊min_{k≤n}(X_k)/h⌋ and M_n := ⌈max_{k≤n}(X_k)/h⌉ − 1, remark that, for every real u, the e.c.f. φ̂_n(u) is close to the corresponding sum over the histogram; we thus get an approximation of the empirical characteristic function and of its derivative by defining

φ̂_{h,n}(u) := Σ_{l=m_n}^{M_n} H_l e^{iuhl}  and  φ̂'_{h,n}(u) := Σ_{l=m_n}^{M_n} i h l H_l e^{iuhl} .

For any real u, simple upper bounds show that these approximations are close to the true functions for small values of h and u. From these empirical characteristic functions, we construct an estimator of the marks' characteristic function ϕ_Y by setting, for any positive u,

φ̂_{Y,n}(u) := 1 + (α/λ) u φ̂'_{h,n}(u) 1{|φ̂_{h,n}(u)| ≥ κ_n} / φ̂_{h,n}(u) .

The advantage of using φ̂_{h,n} is that φ̂_{h,n}(u) and φ̂'_{h,n}(u) can be evaluated on a regular grid using the fast Fourier transform algorithm.
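A minimal sketch of this histogram-plus-FFT evaluation (the function name, the zero-padding factor used to refine the frequency grid, and the phase-shift bookkeeping are ours; NumPy's `ifft` carries the e^{+i·} sign convention needed here):

```python
import numpy as np

def ecf_fft(sample, h, pad=8):
    """Approximate the empirical characteristic function on a regular
    frequency grid from an h-spaced histogram, using a single FFT."""
    x = np.asarray(sample, dtype=float)
    lo = int(np.floor(x.min() / h))
    hi = int(np.ceil(x.max() / h))
    edges = np.arange(lo, hi + 1) * h            # grid G of bin edges
    counts, _ = np.histogram(x, bins=edges)
    H = counts / x.size                          # normalized histogram
    M = pad * H.size                             # zero-padding refines the u-grid
    u = 2.0 * np.pi * np.arange(M) / (M * h)     # frequency grid
    # phi_hat_h(u_j) = sum_l H_l exp(i u_j h (lo + l)); ifft supplies the
    # exp(+i 2*pi j l / M) factors, the extra phase accounts for lo != 0.
    phi = np.fft.ifft(H, n=M) * M * np.exp(1j * u * lo * h)
    return u, phi
```

The derivative φ̂'_{h,n} is obtained the same way by feeding `i * h * (lo + l) * H_l` to the FFT. Note that the returned grid only covers [0, 2π/h); frequencies above π/h are aliased copies of negative frequencies.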
The last step in the numerical computation of the estimator (14) consists in evaluating the integral

(1/2π) ∫_{−h_n^{−1}}^{h_n^{−1}} e^{−iux} φ̂_{Y,n}(u) du .

Using the inverse fast Fourier transform, we approximate this integral by a Riemann sum on a regular grid of points x.
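This final inversion can be sketched as a plain Riemann sum over the truncated frequency grid; for large grids one would substitute the inverse FFT mentioned above, but a direct matrix product shows the idea (function name ours):

```python
import numpy as np

def invert_cf(u, phi_y, x_grid):
    """Approximate theta(x) = (2*pi)^{-1} * integral of e^{-iux} phi_Y(u) du
    by a Riemann sum over the (equispaced, truncated) frequency grid u."""
    du = u[1] - u[0]
    kernel = np.exp(-1j * np.outer(np.asarray(x_grid), u))
    return (kernel @ phi_y).real * du / (2.0 * np.pi)
```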

Numerical results
We now illustrate the finite-sample behavior of our estimator on a simulated data set where the marks density (in keV energy units) is a Gaussian mixture. Moreover, in nuclear spectrometry, the bandwidth h_n is directly related to the known precision of the measuring instrument; for the following numerical experiment, we set it to 2.5, which is consistent with the detector resolution as described in [14], Chapter 4. Figure 1 below shows a simulated sample path of such a shot noise with its associated marked point process. As shown in Figure 2 below, our estimator θ̂_n defined by (14) recovers the three modes of the Gaussian mixture, as well as the corresponding variances, from a sample of size 10^5, which corresponds to a signal observed for one hundredth of a second. Current estimators used in nuclear spectrometry for similar data require much longer measurements (up to 10 seconds); the reason is that these estimators discard observations where pile-up is suspected to occur, thus throwing away a large part of the available information. Moreover, an estimation of the risk E_θ[‖θ − θ̂_n‖_∞] is provided in Table 1 for the shot-noise configuration described at the beginning of this section and for three different sample sizes.

Error bounds for the empirical characteristic function and its derivatives
To derive Theorem 2.1, since our constructed estimator involves the empirical characteristic function and its derivative, we rely on deviation bounds for

sup_{u ∈ [−h^{−1}, h^{−1}]} | φ̂_n^{(k)}(u) − ϕ^{(k)}(u) | ,   (19)

which are uniform over θ ∈ Θ(K, L, s, m), where the smoothness class Θ(K, L, s, m) is defined by (15). Here, ϕ^{(k)} and φ̂_n^{(k)} respectively denote the k-th derivative of the characteristic function and its empirical counterpart associated with the sample X_1, …, X_n. These bounds are of independent interest and are therefore stated in this separate section. Upper bounds on the deviations of the empirical characteristic function have been derived in the case of i.i.d. samples: [8, Theorem 2.2] provides upper bounds on (19) for i.i.d. infinitely divisible random variables, based on general deviation bounds for empirical processes of i.i.d. samples found in [22]. Here we are concerned with a dependent sample X_1, …, X_n and we rely instead on [7]. We obtain upper bounds with the same rate of convergence as in the i.i.d. case, but depending on the β-mixing coefficients; see Theorem 4.1. An additional difficulty in the nonparametric setting that we consider is to derive upper bounds that are uniform over smoothness classes for the density θ, and thus to carefully examine how the β-coefficients depend on θ; see Theorem 4.2.
Let us first recall the definition of the β-mixing coefficient (also called the absolute regularity or complete regularity coefficient) as introduced by Volkonskii and Rozanov [23]. For A, B two σ-algebras of Ω, the coefficient β(A, B) is defined by

β(A, B) = (1/2) sup Σ_{i∈I} Σ_{j∈J} | P(A_i ∩ B_j) − P(A_i) P(B_j) | ,

the supremum being taken over all finite partitions (A_i)_{i∈I} and (B_j)_{j∈J} of Ω respectively included in A and B. When dealing with a stochastic process (X_t)_{t≥0}, the β-mixing coefficient is defined for every positive s by

β(s) = sup_{t≥0} β( σ(X_u, u ≤ t), σ(X_v, v ≥ t + s) ) .

The process (X_t)_{t≥0} is said to be β-mixing if lim_{t→∞} β(t) = 0, and exponentially β-mixing if there exists a strictly positive number a such that β(t) = O(e^{−at}) as t → ∞. We first state a result, essentially following from [7], which specifies how the β-coefficients allow us to derive bounds on the estimation of the characteristic function and its derivatives.

Theorem 4.1. Let k be a non-negative integer and X_1, …, X_n a sample of a stationary β-mixing process. Suppose that there exist C ≥ 1 and ρ ∈ (0, 1) such that β_n ≤ C ρ^n for all n ≥ 1. Let r > 1 and suppose that E[|X_1|^{2(k+1)r}] < ∞.
Then there exists a constant A, only depending on C, ρ and r, such that the deviation bound on (19) holds for all h > 0 and n ≥ 1.

Proof. The proof is deferred to Section 5.3.

It turns out that the stationary exponential shot-noise process X defined by (1) is exponentially β-mixing as soon as the first absolute moment of the marks is finite; see [15, Theorem 4.3] for a slightly more general condition. However, in order to obtain a uniform bound on the risk of our estimator θ̂_n over a smoothness class, a more precise result is needed. In the sequel, we add a superscript θ to the β-mixing sequence to make explicit its dependence on the marks' density θ. The following theorem provides a geometric bound for the β-mixing coefficients of the shot noise which is uniform over the class Θ(K, L, s, m).
As a corollary of Theorems 4.1 and 4.2, we obtain error bounds for the empirical characteristic function based on observations X_1, …, X_n of the stationary shot-noise process given by (1).

Corollary 4.1. Let X_1, …, X_n be a sample of the stationary shot-noise process given by (1) satisfying (SN-1)–(SN-2). Let K, L, m > 0 and s > 1/2, and let k be an integer such that 0 ≤ k < 1 + m/2. Then there exists a constant B, only depending on k, λ, α, K, L, s and m, such that the bound of Theorem 4.1 holds for all h > 0 and n ≥ 1 with A replaced by B.

This result can be compared to [8, Theorem 2.2]. Note, however, that although our sample has infinitely divisible marginal distributions, it is not independent and the Lévy measure is not integrable.

Preliminary results on the exponential shot noise
We establish some geometric ergodicity results on the exponential shot noise that will be needed in other proofs.

Definition 5.1 (Geometric drift condition). A Markov kernel P on R is said to satisfy the geometric drift condition D(V, µ, b) if there exists a measurable function V : R → [1, ∞), µ ∈ (0, 1) and b < ∞ such that PV ≤ µV + b.

Definition 5.2 (Doeblin set). Given a positive integer m and a positive number ǫ, a set C is called an (m, ǫ)-Doeblin set if there exists a probability measure ν on R such that, for any x ∈ C and any Borel set A, P^m(x, A) ≥ ǫ ν(A).
The following proposition, borrowed from [21], relates explicitly the geometric drift condition to the convergence in V-norm (denoted by |·|_V) to the stationary distribution.

Proposition 5.1. Let P be a Markov kernel satisfying the drift condition D(V, µ, b). Assume moreover that, for some d > 2b/(1 − µ) − 1, m ∈ N ∖ {0} and ǫ ∈ (0, 1), the level set {V ≤ d} is an (m, ǫ)-Doeblin set. Then P admits a unique invariant measure π and P is V-geometrically ergodic, that is, there exist c > 0 and ρ ∈ (0, 1) such that, for any x and any n ≥ 0, |P^n(x, ·) − π|_V ≤ c ρ^n V(x).
In order to apply such a result to the sample (X_1, …, X_n) of the exponential shot noise defined by (1), observe that it is a sample of a Markov chain which satisfies the autoregression equation

X_{k+1} = e^{−α} X_k + W_k ,  k ≥ 0 ,

where
• (W_k)_{k≥0} are i.i.d. random variables, distributed as the sum Σ_j Y_j e^{−α(1−T_j)} over the points T_j of a homogeneous Poisson process on [0, 1] with intensity λ,
• the marks (Y_j) are i.i.d. r.v.'s with probability density function θ,
• all these variables are independent.
In the following, we denote by Q θ the Markov kernel associated to the Markov chain (X i ) i≥0 under (SN-1)-(SN-2).
Proof. We have, for all θ ∈ Θ(K, L, s, m) and x ∈ R, the required drift bound.

Proof. Let θ ∈ Θ(K, L, s, m). Denote by θ̃ the density of the random variable Y_1 e^{−αU_1}, where U_1 and Y_1 are two independent random variables, respectively uniformly distributed on [0, 1] and distributed with density θ. It is easy to show that, for all v ∈ R, θ̃ admits an explicit lower bound.

Remark 5.1. A similar result holds for the functions
The distribution of W_0 is thus given by the infinite mixture

e^{−λ} δ_0 + Σ_{k≥1} e^{−λ} (λ^k / k!) θ̃^{*k} ,

where δ_0 is the Dirac point mass at 0 and θ̃^{*k} denotes the k-th self-convolution of θ̃. In order to show that {V ≤ l} is a (1, ǫ)-Doeblin set for the kernels Q_θ, it is sufficient to exhibit a probability measure ν such that, for all |x| ≤ l − 1 and all Borel sets A, Q_θ(x, A) ≥ ǫ ν(A). Hence the proof boils down to showing that, for each θ ∈ Θ(K, L, s, m), one can find c(θ) < d(θ) such that the density of W_0 is bounded away from zero on [c(θ), d(θ)].

Proof of Theorem 4.2
As explained in Section 5.1, (X_i)_{i≥0} is a stationary V-geometrically ergodic Markov chain with Markov kernel denoted by Q_θ. By [6], the β-mixing coefficient of the stationary Markov chain (X_i)_{i≥0} can be expressed, for all n ≥ 1 and θ ∈ Θ(K, L, s, m), as

β_n^θ = ∫_R | Q_θ^n(x, ·) − π_θ |_{TV} π_θ(dx) ,

where π_θ is the invariant marginal distribution and |·|_{TV} denotes the total variation norm, i.e. the V-norm with V = 1. Combining Propositions 5.2, 5.3 and 5.1, we can find constants C > 0 and ρ ∈ (0, 1), only depending on λ, α, K, L, s and m, such that

| Q_θ^n(x, ·) − π_θ |_V ≤ C ρ^n V(x) ,

where V(x) = 1 + |x|. The last two displays yield β_n^θ ≤ C′ ρ^n for a constant C′ uniform over θ ∈ Θ(K, L, s, m), which concludes the proof.

Proof of Theorem 4.1
In [7], the authors establish a Donsker invariance principle for the normalized centered empirical process associated with a stationary sequence of β-mixing random variables (X_1, …, X_n) with marginal distribution P, indexed by a class of functions F satisfying an entropy condition. To be more precise, suppose that the sequence (X_i)_{i≥1} is β-mixing with Σ_{n∈N} β_n < ∞. The mixing rate function β is defined by β(t) = β_{⌊t⌋} if t ≥ 1 and β(t) = 1 otherwise, while its càdlàg inverse β^{−1} is defined by

β^{−1}(u) = inf{ t ≥ 0 : β(t) ≤ u } .

Further, for any complex-valued function f, denote by Q_f the quantile function of the r.v. |f(X_0)| and introduce the norm

‖f‖_{2,β}² = ∫_0^1 β^{−1}(u) Q_f²(u) du .

The space L_{2,β} is defined as the class of functions f such that ‖f‖_{2,β} < ∞. In [7], the authors proved that (L_{2,β}, ‖·‖_{2,β}) is a normed subspace of L². A useful and trivial consequence of the definition of the norm is the relation ‖f‖_2 ≤ ‖f‖_{2,β}. For any real r > 1, another useful (less trivial) result in [7] states that, under the condition Σ_{n≥0} β_n n^{r/(r−1)} < ∞, we have L^{2r} ⊂ L_{2,β} with the additional inequality

‖f‖_{2,β}² ≤ c_{r,β} ‖f‖_{2r}² ,   (29)

where c_{r,β} := 1 + r Σ_{n≥0} β_n n^{r/(r−1)} and ‖f‖_{2r} = E[|f(X_0)|^{2r}]^{1/(2r)} denotes the usual L^{2r}-norm. Now, we can state a result directly adapted from [7, Theorem 3] that will serve our goal to prove Theorem 2.1. For the sake of self-consistency, we recall that, given a metric space of real-valued functions (E, ‖·‖) and two functions l, u, the bracket [l, u] represents the set of all functions f such that l ≤ f ≤ u. For any positive ǫ, [l, u] is called an ǫ-bracket if ‖u − l‖ < ǫ.
Theorem 5.1. Suppose that the sequence (X_i)_{i≥1} is exponentially β-mixing: there exist C ≥ 1 and ρ ∈ (0, 1) such that β_n ≤ C ρ^n for all n ≥ 1. Let σ > 0 and let F ⊂ L_{2,β} be a class of functions such that ‖f‖_{2,β} ≤ σ for every f in F. Denote by N_[](u, F, ‖·‖_{2,β}) the bracketing number, that is, the minimal number of u-brackets with respect to the norm ‖·‖_{2,β} needed to cover F. Suppose that the two following assumptions hold.
(DMR1) F has an envelope function F such that ‖F‖_{2r} < ∞ for some r > 1.
(DMR2) The bracketing entropy integral ∫_0^1 (log N_[](u, F, ‖·‖_{2,β}))^{1/2} du is finite.

Then there exists a constant A > 0, only depending on C and ρ, such that the corresponding deviation bound holds for all integers n. Having this result at hand, we now remark that the supremum in (19), for a fixed integer k, can be seen as the supremum of an empirical process indexed by the class

F_h^k = { x ↦ x^k e^{iux} : u ∈ [−h^{−1}, h^{−1}] } .

The proof of Theorem 4.1, based on an application of the previous theorem, is as follows.
Proof. We apply Theorem 5.1 for a fixed integer k, with F = F_h^k, envelope function F = F_k and r = (4 + m)/4, where F_k : x ↦ |x|^k.

Assumption (DMR1):
Let k be a fixed integer. On the one hand, the function F_k is an envelope function of the class F_h^k; on the other hand, for any real r > 1, from (29), we have ‖F_k‖_{2,β}² ≤ c_{r,β} ‖F_k‖_{2r}² < ∞.

Assumption (DMR2):
For k a fixed integer, the class F_h^k is Lipschitz in the index parameter: indeed, for every s, t in [−h^{−1}, h^{−1}] and every real x,

| x^k e^{isx} − x^k e^{itx} | ≤ |x|^{k+1} |s − t| .

A direct application of [22, Theorem 2.7.11] to the classes F_h^k gives, for any ǫ > 0, a bound on the bracketing number, where N and N_[] are respectively called the covering number and the bracketing number (these numbers respectively represent the minimal number of balls and brackets of a given size necessary to cover a space with respect to a given norm). From (35), it follows that for any σ > 0, the entropy condition (DMR2) holds, because we supposed F_{k+1} ∈ L^{2r} and β_n ≤ C ρ^n which, from (29), implies that ‖F_{k+1}‖_{2,β} < ∞.

Conclusion of the proof
The application of Theorem 5.1 gives the announced deviation bound. Set c_{r,β} := 1 + r Σ_{n≥0} β_n n^{r/(r−1)}. From (36) and (33), we can write the bound in terms of ‖F_k‖_{2r} and c_{r,β}, and by Lemma A.6 we get the stated inequality for a universal constant B > 0. In the particular context of Corollary 4.1, we use the fact that ‖F_k‖_{2r} can be bounded by max(1, K^{4k/(4+m)}) and c_{r,β} by a constant only depending on the parameters K, L, s, m.

Proof of Theorem 2.1
Proof. We denote by θ_n^0 the function defined by

θ_n^0(x) := (1/2π) ∫_{−h_n^{−1}}^{h_n^{−1}} e^{−iux} ϕ_Y(u) du .

Since θ ∈ Θ(K, L, s, m) and s > 1/2, the Fourier transform F[θ] is integrable and, θ being a nonnegative density, Fourier inversion applies. We decompose the error in sup norm as

‖θ − θ̂_n‖_∞ ≤ ‖θ − θ_n^0‖_∞ + ‖θ_n^0 − θ̂_n‖_∞ .   (40)

For the first term, we get

‖θ − θ_n^0‖_∞ ≤ (1/2π) ∫_{|u| > h_n^{−1}} |F θ(u)| du ≲ L h_n^{s−1/2} ,

where we used the Cauchy–Schwarz inequality and the assumption that θ ∈ Θ(K, L, s, m). We conclude with a bound of the term involving ‖θ_n^0 − θ̂_n‖_∞ in (40). To this end, the following inequality, obtained from (12) and the mean-value theorem, will be useful. By (12), we can bound the term ‖θ_n^0 − θ̂_n‖_∞ through the deviations of the e.c.f.; the term A_{n,1} can be bounded as follows.
Thus, using (42), we get a bound whose two terms on the right-hand side can be controlled using Corollary 4.1 with r = (4 + m)/4. In the following, for two positive quantities P and Q, possibly depending on θ and n, we write P ≲ Q when P is less than Q up to a multiplicative constant uniform over θ ∈ Θ(K, L, s, m) and over n ≥ 3. We thus obtain the claimed rate, where we used the fact that κ_n^{−1} ≤ 2 C^{−1} h_n^{−λ/α} for any integer n.

A Useful lemmas
The following classical embedding will be useful.
On the one side, we have the upper bound and, on the other side, the lower bound, where the first inequality is obtained via an application of the Markov inequality, for some a ∈ (0, T_K]. Let δ ∈ (1, e^α ∧ (T_K + ∆)/T_K). Since (a + ∆)/a is a decreasing function of a for fixed ∆, and 0 < a ≤ T_K, we have (a + ∆)/a ≥ (T_K + ∆)/T_K, so that δ < (a + ∆)/a. For any v ∈ [a, (a + ∆)/δ], we have the required lower bound, which concludes the proof.
The following elementary lemma generalizes the previous result for convolutions of lower bounded functions.
A lower bound on the decay of the absolute value of the shot-noise characteristic function is given by the following lemma. Proof. From (9) and (11), we have, for all u ∈ R, the corresponding expression for |ϕ_X(u)|. It follows that the proof reduces to controlling the integral below. First, we have, for any real z and any function θ ∈ Θ(K, L, s, m), the bound 1 − Re(ϕ_{Y_0}(z)) ≤ z E_θ[|Y_0|]. We thus get that

∫_0^1 (1 − Re ϕ_{Y_0}(z)) z^{−1} dz ≤ K^{1/(4+m)} .