Bayesian methods for the Shape Invariant Model

Abstract: In this paper, we consider the so-called Shape Invariant Model, which is used to model a function f_0 subject to a random translation of law g_0 in a white noise. This model is of interest when the law of the deformations is unknown. Our objective is to recover the law of the process P_{f_0,g_0} as well as f_0 and g_0. To do this, we adopt a Bayesian point of view and find priors on f and g such that the posterior distribution concentrates at a polynomial rate around P_{f_0,g_0} as n goes to +∞. We then derive results on the identifiability of the SIM, as well as results on the functional objects themselves. We intensively use Bayesian non-parametric tools coupled with mixture models, which may be of independent interest in model selection from a frequentist point of view.


Introduction
This study deals with the so-called Shape Invariant Model (SIM), which describes a statistical process involving a random geometric deformation of a shape. This type of geometric deformation of a common shape corresponds to a particular case of Grenander's General Pattern Theory ([GM07]). This model may be applicable to a number of fields such as image processing ([AGP91], [PMRC10]) and medicine ([Big13]). It is also used in genetics when dealing with delayed activation curves of genes when drugs are administered to patients, and in ChIP-Seq estimations when translations in protein fixation yield randomly shifted counting processes ([MMW07] and [BGKM13]). It is found in econometrics as well when dealing with Engel curves ([BCK07]), and in landmark registration.
This model has received considerable attention in the statistical community, and most studies in this field focus on the estimation of an unknown functional object using noisy i.i.d. observations. Some studies consider a semi-parametric approach for the estimation of the deformation parameters (the self-modeling regression framework of [KG88]). [Cas12] uses a Bayesian approach to obtain statistical results on the SIM in a semi-parametric setting when the level of noise on the observations asymptotically vanishes. Older approaches generally focus on parametric problems (see [GM01] and the discussion therein for an overview). M-estimation is used in [BGL09], and [AKT10] proposes a stochastic algorithm to run estimations in the SIM. A recent study [CD11] used a testing strategy to obtain curve registration. Finally, [BG10] obtained minimax adaptive rates for non-parametric estimation when the law of the randomized translations is known. The SIM could be extended to more general situations of geometric deformations described through the action of a finite-dimensional Lie group (see [BCG12]); we restrict our work to the case of the one-dimensional Lie group of translations S^1 to warp the functional objects.
This work was inspired by several discussions about the work of [AKT10] concerning the study of the SIM. Our aim was to extend their theoretical parametric Bayesian result to non-parametric settings and to then study the behavior of some posterior distributions. We considered the general case where both the functional shape and the probability distribution of the deformations are unknown. This in fact corresponds to the more realistic case. To the best of our knowledge, significant statistical results have yet to be derived in this non-parametric situation.
Our paper describes the evolution of the posterior distribution when the number of observations grows to +∞ with a fixed noise level σ, with the aim of obtaining results on the estimation of f and g, which are assumed to be unknown in our work. This is considerably different from the asymptotically vanishing noise situation (σ → 0) that is studied in [BG12]. This is itself a special feature of the SIM: there is no obvious Le Cam equivalence of experiments (see [LCY00]) between the experiments when n → +∞ and when σ → 0. Very different minimax results occur in [BG10] (n → +∞) and in [BG12] (σ → 0). Bayesian non-parametric methods are used with mixture models to obtain the contraction rates of our Bayesian procedures, which rely on the important contributions of [BSW99], [GGvdV00], [GvdV01] and [GW00].
We first prove polynomial contraction rates for the posterior distribution of the law of the process generated by the SIM. This is a step toward semi-parametric results: we are then no longer interested in the whole process but rather in the common shape and in the law of the deformations that underlie it. We prove identifiability, but obtain only logarithmic posterior contraction rates for these functional objects. These slower contraction rates are linked to the logarithmic frequentist lower bounds that we also present here.
This paper therefore contributes to the understanding of the SIM under a Bayesian framework. It also illustrates that there is still numerical work to be done to efficiently estimate the common shape or the law of the deformations when both are unknown.
Section 2 presents a description of the SIM and standard notations, and ends with the statements of our main results (contraction rates, identifiability and lower bounds). Section 3 discusses some of the challenging issues that remain to be elucidated. Section 4 provides a metric description of the main probability spaces of the model, and Section 5 presents the proofs of the main results. Finally, Section 6 provides the proofs of our semi-parametric results. A few auxiliary results are given in the appendix.

Motivation
We next give a brief description of a possible application that presents the main features of the SIM, even though our primary interest is mainly theoretical. Although simulation results are not included in this study, they are nevertheless of major interest for applications. However, the development of efficient sampling algorithms is beyond the initial scope of this paper and requires further advances on stochastic algorithms (MCMC and Langevin diffusions).
An important application of the mean shape computation is the estimation of the average cardiac cycle using electrocardiogram (ECG) records. An ECG records the electrical activity of the heart and consists of a succession of similar cycles. An example taken from the MIT-BIH database [GR01] is shown at the top of Fig. 1. The estimation of the mean shape of the successive cycles is useful for the diagnosis of heart disease, among them arrhythmia. There are several types of arrhythmia and it is important to determine the variant that is affecting the patient, which can be determined on the basis of an ECG. Unfortunately, segmentation of the ECG records is a non-trivial task, as pointed out by [GK95], and is generally based on feature detection such as maxima, inflexion points, etc. A segmentation of cycles using the zero crossings of the ECG is shown in Fig. 1, and it can be observed that this segmentation is far from satisfactory. This phenomenon is even worse when dealing with arrhythmic cycles, which are more irregular than normal cycles and essential for establishing a medical diagnosis. In order to eliminate spatial fluctuations around the mean shape, it is possible to attempt to average the cycles but, as indicated in Fig. 1, the averaging step produces a type of Tikhonov regularization. This inverse problem phenomenon occurs as a result of the errors in the alignment procedure, and the law of these warping errors produces a convolution kernel that has a negative impact on the estimation of the mean shape.
The question arises as to whether or not it is possible to produce a sharper warping procedure that perfectly recovers the unknown random deformations between individual cycles with an asymptotically vanishing warping error. Indeed, as shown by Theorem 5 of [BG10], this search is hopeless for smooth signals when only the number of observations increases (n → +∞); it is only possible when the resolution of the recorded signals increases (the level of the noise should asymptotically vanish). Even worse, statistical estimation of the true shape with any warping estimator fails and cannot be consistent for regular shapes when the number of samples grows to +∞.
In the following section, we therefore attempt to obtain an estimation of the common shape together with a learning procedure for the deformation law. We are interested in the asymptotic setting where the resolution level is kept fixed, since in many practical problems it is not possible to reduce the noise level to 0 (i.e., below the precision of the recording machine), although it may be possible to multiply the observations (for example, by increasing the duration of the ECG recordings).

Statistical settings
Shape Invariant Model We recall here the definition of the Shape Invariant Model. Let f_0 ∈ F, a subset of smooth functions, and consider a probability measure g_0 ∈ M([0, 1]) (this last set stands for the set of probability measures on [0, 1]). We observe n realizations of noisy and randomly shifted complex-valued curves Y_1, ..., Y_n resulting from the following white noise model:

dY_j(x) = f_0(x − τ_j) dx + σ dW_j(x), x ∈ [0, 1], j = 1, ..., n. (2.1)

Here, f_0 is the mean pattern of the curves Y_1, ..., Y_n. The random shifts (τ_j)_{j=1...n} are sampled independently according to the probability measure g_0, and (W_j)_{j=1...n} are independent complex standard Brownian motions on [0, 1], which model the presence of noise in the observations. The noise level σ is kept fixed in our study and is set to 1 for the sake of simplicity. Note that in our white noise model, σ can be directly obtained from the quadratic variation of each individual curve, and is therefore known. Complex-valued curves are considered here to simplify notations, but our results can be adapted to the real-valued case. A complex standard Brownian motion W_t on [0, 1] is such that W_1 is a standard complex Gaussian random variable whose distribution is designated by N_C(0, 1). A standard complex Gaussian random variable has independent real and imaginary parts distributed according to N_R(0, 1/2). We aim to describe the behavior of posterior distributions given a sample (Y_1, ..., Y_n) when n → +∞. We intensively use "≲", which refers to an inequality up to a multiplicative absolute constant, and a ∼ b, which means that a/b → 1.
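To fix ideas, the observation model can be simulated directly in the Fourier domain, where it reads θ_ℓ(Y_j) = θ_ℓ(f_0) e^{−i2πℓτ_j} + σ ξ_{ℓ,j} for each frequency ℓ. The following sketch uses a hypothetical helper `sample_sim` (not from the paper) restricted to frequencies −L, ..., L:

```python
import numpy as np

def sample_sim(theta0, g_sampler, n, sigma=1.0, rng=None):
    """Draw the Fourier coefficients theta(Y_j), j = 1..n, of the SIM:
    theta_l(Y_j) = theta0_l * exp(-2i*pi*l*tau_j) + sigma * xi_{l,j}.
    theta0: complex array indexed by frequencies -L..L.
    g_sampler(rng, n): draws the n random shifts tau_j ~ g0 in [0, 1]."""
    rng = np.random.default_rng(rng)
    L = (len(theta0) - 1) // 2
    ells = np.arange(-L, L + 1)
    tau = g_sampler(rng, n)                                  # shifts tau_j ~ g0
    rot = np.exp(-2j * np.pi * np.outer(tau, ells))          # e^{-2i*pi*l*tau_j}
    noise = (rng.standard_normal((n, 2 * L + 1))
             + 1j * rng.standard_normal((n, 2 * L + 1))) / np.sqrt(2)  # N_C(0, 1)
    return rot * theta0 + sigma * noise                      # one row per curve Y_j
```

Each row is a noisy, randomly rotated copy of θ_0; with σ = 0 the moduli |θ_ℓ(Y_j)| coincide with |θ_ℓ(f_0)|, which illustrates that the shifts act purely through phase rotations.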

Notations and Definitions
Bayesian framework We briefly recall the Bayesian formalism following the presentation of [GGvdV00] but readers familiar with it can skip this paragraph.
The functional objects f_0 and g_0 that we are looking for belong to F ⊗ M([0, 1]). For any couple (f, g) ∈ F ⊗ M([0, 1]), Equation (2.1) describes the law of one continuous curve. This law is designated as P_{f,g} and possesses a density p_{f,g} with respect to the Wiener measure on the sample space. P_{f_0,g_0} belongs to a set P of probability measures over the sample space. This set P is the set of all possible measures described by Equation (2.1) when (f, g) varies over F ⊗ M([0, 1]). Given a prior distribution Π_n on P, Bayesian procedures are based on the posterior

Π_n(B | Y_1, ..., Y_n) = ∫_B ∏_{i=1}^n p(Y_i) dΠ_n(P) / ∫_P ∏_{i=1}^n p(Y_i) dΠ_n(P), for any measurable B ⊂ P,

which is a random measure on P that depends on the observations Y_1, ..., Y_n. Bayesian estimators can then be obtained using the mode, the mean or the median of the posterior distribution (the approach adopted, for example, in [AKT10]).
The posterior distribution is said to be consistent if it concentrates on arbitrarily small neighborhoods of P_{f_0,g_0} in P with a probability tending to 1 when n grows to +∞. If d is a distance on probability measures (e.g., the Hellinger distance d_H or the Total Variation d_TV), a frequentist contraction rate ǫ_n corresponds to

Π_n(P ∈ P : d(P, P_{f_0,g_0}) ≥ M ǫ_n | Y_1, ..., Y_n) → 0 in P_{f_0,g_0}-probability.

Such a frequentist property of the posterior distribution describes the contraction rate (ǫ_n)_{n≥0} of neighborhoods that still capture most of the posterior mass. We refer to [GGvdV00] (especially their Theorem 2.1) for a complete discussion of posterior concentration rates, as well as the links with classical non-parametric frequentist benchmarks (see [IH81]). Using Equation (2.1), we attempt to establish such a Bayesian property.
The notation H_s refers to the whole Sobolev subspace (A = +∞). Finally, given any integer ℓ, we define the "thresholded" elements of H_s as

H_ℓ := {f ∈ H_s : θ_k(f) = 0 for all |k| > ℓ}.

Mixture models and probability distributions Equation (2.1) can be written in the Fourier domain as

θ_ℓ(Y_j) = θ_ℓ(f_0) e^{−i2πℓτ_j} + ξ_{ℓ,j}, ℓ ∈ Z, j = 1, ..., n,

where θ_0 := (θ_ℓ(f_0))_{ℓ∈Z} denotes the true unknown Fourier coefficients of f_0, and θ_ℓ(Y_j) those of the observed signals. The white noise model implies that ξ_{ℓ,j} ∼ i.i.d. N_C(0, 1) for any ℓ, j. The density of the complex Gaussian random variable with mean μ and variance p is designated as

γ_{μ,p}(z) := (πp)^{−1} exp(−|z − μ|²/p), z ∈ C,

and will be referred to as γ when μ = 0 and p = 1. For any ℓ ∈ Z, Equation (2.1) implies that θ_ℓ(Y) ∼ ∫_0^1 γ_{θ_ℓ^0 e^{−i2πℓϕ}, 1}(·) dg(ϕ), which is a mixture of Gaussian laws. We use, for any phase ϕ ∈ [0, 1] and any θ ∈ ℓ²(Z), the notation

R_ϕ θ := (θ_ℓ e^{−i2πℓϕ})_{ℓ∈Z}.

This corresponds to a rotation of each coefficient θ_ℓ with an angle 2πℓϕ. Hence, conditionally on τ = ϕ, the vector θ(Y) is Gaussian with mean R_ϕ θ_0. From one frequency to another, the rotations used in θ(Y) are not independent, and the coefficients (θ_ℓ(Y))_{ℓ∈Z} are highly correlated. When θ has length k, p_{θ,g} will be the density w.r.t. the Lebesgue measure on C^k of the law P_{θ,g}:

p_{θ,g}(z) = ∫_0^1 ∏_ℓ γ(z_ℓ − θ_ℓ e^{−i2πℓϕ}) dg(ϕ).

We will use standard objects (see [vdVW96]) such as the Hellinger (d_H) or Total Variation (d_TV) distances between probability measures, as well as covering numbers of metric spaces such as D(ǫ, P, d), bracketing numbers N_{[ ]}(ǫ, P, d), etc.
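As an illustration of this mixture structure, the sketch below evaluates the density p_{θ,g} for a discrete mixing law g. The helpers `rotate` and `mixture_density` are hypothetical, and we assume the convention γ(u) = π^{−1} e^{−|u|²} for the standard complex Gaussian density:

```python
import numpy as np

def rotate(theta, phi):
    """Rotate each coefficient theta_l by the angle 2*pi*l*phi (frequencies -L..L)."""
    L = (len(theta) - 1) // 2
    return theta * np.exp(-2j * np.pi * np.arange(-L, L + 1) * phi)

def mixture_density(z, theta, phis, weights):
    """Density p_{theta,g}(z) w.r.t. Lebesgue measure on C^{2L+1} for the discrete
    mixing law g = sum_k weights[k] * delta_{phis[k]}; gamma(u) = exp(-|u|^2)/pi."""
    p = len(theta)
    val = 0.0
    for phi, w in zip(phis, weights):
        diff = z - rotate(theta, phi)                     # z - R_phi theta
        val += w * np.exp(-np.sum(np.abs(diff) ** 2)) / np.pi ** p
    return val
```

Note that a single rotation preserves all the moduli |θ_ℓ|, which is the geometric source of the strong correlation between frequencies mentioned above.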

Bayesian priors in the randomly shifted curves model
We detail here the prior Π_n on P, defined on H_s ⊗ M([0, 1]); Equation (2.1) converts it into a prior on P. We choose f and g independently as follows.
The selected frequencies are chosen with a distribution λ on N^*, which satisfies a tail condition of the type λ(ℓ) ≥ e^{−cℓ² log ℓ} (see Section 5). The prior π depends on the variance ξ_n² of the Gaussian laws used to sample the Fourier coefficients. We use a sample-size-dependent variance (see Remark 5.1):

ξ_n² := n^{−μ_s} (log n)^{−ζ},

where μ_s and ζ may depend on s (non-adaptive prior) or not (adaptive prior).
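A draw from this hierarchical prior on f can be sketched as follows. The helper `sample_prior_f` is hypothetical, and the truncation of λ to finitely many frequencies is purely for illustration:

```python
import numpy as np

def sample_prior_f(lam_probs, xi2, rng=None):
    """Draw f from the hierarchical prior: a cut-off l ~ lambda on {1, 2, ...},
    then Fourier coefficients theta_k ~ i.i.d. N_C(0, xi_n^2) for |k| <= l,
    and theta_k = 0 beyond the cut-off.
    lam_probs[i] is the (truncated) probability that l = i + 1."""
    rng = np.random.default_rng(rng)
    ell = rng.choice(len(lam_probs), p=lam_probs) + 1      # random cut-off l
    coeffs = (rng.standard_normal(2 * ell + 1)
              + 1j * rng.standard_normal(2 * ell + 1))
    return np.sqrt(xi2 / 2) * coeffs                       # variance xi_n^2 per coefficient

# sample-size-dependent variance, as in the adaptive choice mu = 1/3, zeta = 3/2
n = 1000
xi2 = n ** (-1 / 3) * np.log(n) ** (-3 / 2)
```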

Prior on g We propose in the sequel two priors on g.
Dirichlet Process prior As pointed out above, the model can be seen as a Gaussian mixture model, and natural priors on g may be built from a Dirichlet Process (see [GvdV01]). Given any measure α that is absolutely continuous w.r.t. the Lebesgue measure on [0, 1] with a positive continuous density, the Dirichlet Process DP_α generates a random probability measure g ∈ M([0, 1]): for any finite partition (A_1, ..., A_k) of [0, 1], the vector (g(A_1), ..., g(A_k)) on the simplex has a Dirichlet distribution Dir(α(A_1), ..., α(A_k)) (see details in [Fer73]).

Non-adaptive Gaussian Process prior We may use another prior to obtain smoothness results on g, and then extend our results further than a simple contraction on laws.
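A draw from DP_α can be approximated by truncated stick-breaking. The truncation level and the helper `sample_dp` are illustrative choices, not part of the paper:

```python
import numpy as np

def sample_dp(alpha_mass, base_sampler, n_atoms, rng=None):
    """Truncated stick-breaking draw from DP(alpha): returns atom locations in
    [0, 1] and weights, approximating a random probability measure g ~ DP_alpha."""
    rng = np.random.default_rng(rng)
    betas = rng.beta(1.0, alpha_mass, size=n_atoms)            # stick-breaking proportions
    sticks = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * sticks
    weights /= weights.sum()                                   # renormalize the truncation
    atoms = base_sampler(rng, n_atoms)                         # atoms from the base measure
    return atoms, weights

# example: mass 2 and uniform base measure on [0, 1]
atoms, w = sample_dp(2.0, lambda rng, k: rng.uniform(size=k), 100, rng=0)
```

The resulting g is almost surely discrete, which is precisely why this prior gives contraction on the law P_{f,g} but no smoothness on g itself.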
Smoothness is barely compatible with the D.P., and even a kernel convolution with the D.P. seems problematic in our situation. To get around this difficulty, we propose the use of Gaussian Processes. In this construction, we assume that g_0 ∈ H_ν(A), where ν ≥ 1/2 and A are both known, and we define the corresponding subspace M_ν([0, 1]) of densities. We define the integer k_ν := ⌊ν − 1/2⌋ to be the largest integer smaller than ν − 1/2 and follow the strategy of Section 4 of [vdVvZ08a]; the additional difficulty is to guarantee the periodicity of g. The construction is inspired by that of the Brownian bridge. Our prior is now built as follows. We first independently sample a real Brownian bridge (B_τ)_{τ∈[0,1]} and (Z_1, ..., Z_{k_ν}) ∼ i.i.d. N_R(0, 1). We next compute the process (w_τ)_{τ∈[0,1]} of Equation (2.5). Hence, this prior based on a GP yields a prior on densities on [0, 1], and p_w inherits the smoothness k_ν of the Gaussian process τ → w_τ. We now restrict this prior to the Sobolev balls of radius 2A to obtain a prior q_{ν,A} on M_ν([0, 1])(2A).
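The construction can be sketched as follows. The smooth correction terms and the exponentiation-normalization link from w to the density p_w are assumptions made for illustration, in the spirit of [vdVvZ08a], and not the exact Equation (2.5):

```python
import numpy as np

def sample_bridge_prior(k_nu, n_grid=512, rng=None):
    """Sketch of the Gaussian-process prior on g: a Brownian bridge (B_0 = B_1 = 0,
    hence periodic) plus k_nu independent N(0,1)-weighted smooth periodic terms,
    mapped to a density by exponentiation and normalization (assumed link)."""
    rng = np.random.default_rng(rng)
    t = np.linspace(0.0, 1.0, n_grid)
    dW = rng.standard_normal(n_grid - 1) / np.sqrt(n_grid - 1)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    bridge = W - t * W[-1]                                     # Brownian bridge B_t
    Z = rng.standard_normal(k_nu)
    smooth = sum(Z[k] * np.sin(2 * np.pi * (k + 1) * t) for k in range(k_nu))
    w = bridge + smooth                                        # Gaussian process w_t
    p = np.exp(w)
    p /= p.mean()                                              # normalize so that int p = 1
    return t, p
```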
In the sequel, we write Π_n^{DP} (resp. Π_n^{GP}) for the posterior distribution built from the prior obtained with the Dirichlet Process (resp. the Gaussian Process).
The values µ = 1/3 (independent of s) and ζ = 3/2 yield the following for a sufficiently large M: This result describes the posterior concentration around some Hellinger neighborhoods of P_{f_0,g_0} at a polynomial rate.
Our prior with µ = 1/3 is adaptive to the regularity s as soon as s ∈ [1/2, 2], where ξ_n² = n^{−1/3}(log n)^{−3/2}. In this case, the convergence rate is n^{−s/(2s+2)} up to a log term. When s ≥ 2, the above choice of ξ_n² is no longer optimal. However, taking µ < 1/3 yields an adaptive prior on the range s ∈ [1/(2µ) − 1, 1/µ − 1]. With this last prior, concentration is no longer guaranteed when s < 1/(2µ) − 1, and for s > 1/µ − 1 the rate is stuck at n^{−(1−µ)/2}. Finally, the non-adaptive prior based on ξ_n² = n^{−1/(2s+2)} recovers the rate n^{−s/(2s+2)} for any s larger than 1/2. To the best of our knowledge, the minimax frequentist rate is unknown when both f_0 and g_0 are unknown. Recall that in the standard regression model (without shift) on H_s, the minimax rate of estimation of f is n^{−s/(2s+1)}. The rate we obtain in Theorem 2.1 is slightly degraded owing to our upper bound on the statistical complexity of the model coming from the estimation of g; see Proposition 4.2.
We still obtain a polynomial contraction rate here, but the smoothness of g has a great impact on the rate (ǫ_n)_{n≥0}. This bound degrades when ν is close to 1/2, its lower limiting value, and we then obtain a contraction rate of n^{−1/4} if s ≥ 1. In a sense, Theorem 2.2 is weaker than Theorem 2.1, but it will make it possible to extend our mathematical analysis further, since it guarantees that the marginal on the g coordinate of the posterior distribution Π_n^{GP} is supported by smooth densities.

Identifiability
We now attempt to derive results on the objects f ∈ H_s and g ∈ M_ν([0, 1]) themselves. A crucial step is to guarantee the identifiability of the model, which is discussed below. Without any constraint on the first Fourier coefficient, it can easily be checked that identifiability fails. We therefore restrict our study to

F_s := {f ∈ H_s : θ_1(f) > 0},

and the prior used on F_s is the one induced by π, up to the restriction that the first Fourier coefficient is positive. We also constrain g ∈ M([0, 1])^* (a condition subsequently converted into an inverse problem assumption). Theorem 2.3. The SIM is identifiable as soon as (f_0, g_0) ∈ F_s × M([0, 1])^*: if P_{f_0,g_0} = P_{f,g} with (f, g) ∈ F_s × M([0, 1])^*, then (f, g) = (f_0, g_0). In fact, the assumption needed to yield identifiability is slightly more general than the restriction to M([0, 1])^*: one may define a larger set M([0, 1])^+, and we will see in the proof of Theorem 2.3 that F_s × M([0, 1])^+ is the minimal set for the identifiability of the model.
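The failure of identifiability without such a constraint can be checked numerically: shifting the shape by a fixed a and translating the mixing law by −a (mod 1) leaves the law of the observations unchanged, since the rotated Fourier coefficients coincide. Illustrative snippet, with a hypothetical helper `rotate`:

```python
import numpy as np

def rotate(theta, phi):
    """Rotate coefficient theta_l by the angle 2*pi*l*phi (frequencies -L..L)."""
    L = (len(theta) - 1) // 2
    return theta * np.exp(-2j * np.pi * np.arange(-L, L + 1) * phi)

theta = np.array([0.5 - 1j, 2.0 + 0j, 1.0 + 0.3j])   # Fourier coefficients of some f
a = 0.17                                              # an arbitrary deterministic shift

# The pair (f(. - a), g(. - a)) yields exactly the same observations as (f, g):
for phi in np.linspace(0.0, 1.0, 7):
    lhs = rotate(rotate(theta, a), phi - a)           # shape f(. - a), shift phi - a
    rhs = rotate(theta, phi)                          # shape f, shift phi
    assert np.allclose(lhs, rhs)
```

This is exactly the confusion that the positivity constraint on θ_1(f) and the condition on g are designed to remove.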

Contraction rate on functional objects
for a sufficiently large M, with a contraction rate of µ_n = (log n)^{−ν}. (ii) Assume that g_0 ∈ M_ν([0, 1]) satisfies the inverse problem assumption; then the analogous statement also holds, where the contraction rate µ̃_n is again of logarithmic order. The optimality of this result is, to the best of our knowledge, an open problem. Note that when β increases, the rate on f is seriously impacted, which is a common feature of statistical inverse problems. This result is consistent with the fact that estimating f when g is known is an inverse problem (see [BG10]). Our next result shows that it is impossible to obtain frequentist convergence rates better than a power of log n, even though our lower bound does not exactly match the upper bound obtained in the previous result.

Link with heteroscedastic deconvolution with unknown variance
Our problem is strongly related to the standard setting of deconvolution with unknown variance. For instance, the first Fourier coefficients satisfy θ_1(Y_j) = θ_1(f_0) e^{−i2πτ_j} + ξ_{1,j} and, up to a division by θ_1, this can also be parametrised as

Z_j = e^{−i2πτ_j} + ǫ_j, (2.6)

which is similar to the problem studied, for instance, by [Mat02], where ǫ ∼ N_C(0, σ²) with unknown variance σ². As pointed out in [Mat02] (see also the more recent work [BM05], where similar situations are extensively detailed), such a setting is unfavourable for statistical estimation, since convergence rates are generally of log order. The results of [Mat02] and [BM05] are obtained using the van Trees inequality, which is a Bayesian Cramér-Rao bound (see for instance [GL95] for further details). However, Proposition 6.1 gives a polynomial rate for the posterior contraction around θ_1^0 and, at first glance, this rate seems to contradict the results of [Mat02]. Indeed, [Mat02] considers lower bounds over a larger class than the estimation problem of θ_1 written as (2.6): from a minimax point of view, the supremum over all hypotheses is taken in a somewhat larger set than ours. Moreover, in (2.6) the law of e^{−i2πτ_j} is supported by S^1 instead of the whole complex plane, which would be the natural extension (2.7). Hence, g is a singular measure with respect to the noise measure, and the ability to go beyond the log rate is due to the degenerate nature of our problem with respect to the complex Gaussian noise on θ_1. This is an important piece of structural information which is not available when one considers general problems such as (2.7). A new proof is thus needed for the SIM.
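This degeneracy can be illustrated numerically: since e^{−i2πτ} lies on the unit circle, E|θ_1(Y_j)|² = |θ_1|² + 1 whatever g_0 is, so |θ_1|² is estimable at a parametric rate from the empirical second moment. Toy illustration, not the estimator used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
theta1 = 2.0 + 1.0j                                  # true first coefficient, |theta1|^2 = 5
n = 200_000
tau = rng.beta(2.0, 5.0, size=n)                     # arbitrary unknown shift law g0
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # N_C(0, 1)
y1 = theta1 * np.exp(-2j * np.pi * tau) + noise      # first Fourier coefficient of Y_j

# Because e^{-2i*pi*tau} has modulus 1, E|y1|^2 = |theta1|^2 + 1 regardless of g0:
# the modulus of theta1 is identified without deconvolving g0 at all.
est = np.mean(np.abs(y1) ** 2) - 1.0
assert abs(est - abs(theta1) ** 2) < 0.1
```

No such shortcut exists in the general model (2.7), where the mixing law spreads over the whole complex plane; this is the structural difference that separates the polynomial and logarithmic regimes.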

Frequentist lower bounds
, then a sufficiently small c exists so that the minimax rate over F_s × M_ν([0, 1]) satisfies the stated logarithmic lower bound. This result is far from contradicting the polynomial rates obtained in Theorems 2.1 and 2.2. A series of remarks can be made. First, Theorems 2.1 and 2.2 provide contraction rates on the probability distributions in P and not on the functional space F_s. Second, the link between (f_0, g_0) and P_{f_0,g_0} relies on identifiability, and the lower bound is derived with a set of functions (f_i, g_i)_i that are very hard to identify through I : (f, g) → P_{f,g}. On this set of functions, the injection is very "flat" and the pairwise differences of the I(f_i, g_i) are small, so that the pairs of functions (f_i, g_i) become very hard to distinguish. A typical feature of this set is that the convolution products f_i ⋆ g_i are very similar to each other. Typically, hard estimation problems occur when one faces a deconvolution problem with an unknown operator. It is shown in [BG10] that in the SIM, when n → +∞, it is impossible to recover the unknown true shifts. The abrupt degradation between the polynomial rates on probability laws in P and the logarithmic rates on functional objects in F_s × M_ν([0, 1]) also occurs because of this.

Discussion
In this paper, we exhibit a suitable prior that makes it possible to obtain a contraction rate of the posterior distribution near the true underlying distribution P_{f_0,g_0}. Moreover, this rate is polynomial in the number n of observations, even though our SIM is an inverse problem with an unknown translation operator that depends on g. From a technical point of view, the cornerstones of these results are the tight link between the white noise model and the Fourier expansion, as well as the smoothness of Gaussian laws, which makes it possible to obtain an efficient covering strategy. Up to a non-restrictive condition, we also obtain a large identifiability class, but the contraction of the posterior is dramatically degraded in this class, since we then obtain a logarithmic rate instead of a polynomial one. This last point cannot be much improved by using the standard L² distance to measure the neighborhoods of f_0, as pointed out by our last lower bound. Note that we do not obtain exactly the same rates in our lower and upper reconstruction bounds. This may be due to the rough inequality |ψ_a(ϕ)| ≥ |ψ_a(ϕ)|²/‖ψ_a‖_∞ used to obtain Equation (6.1), and may be the reason why we do not obtain optimal rates. Indeed, the degradation of the contraction rate occurs when attempting to invert the identifiability map I : (f, g) → P_{f,g}. This difficulty should be understood as a novel consequence of the impossibility of exactly recovering the random shift parameters when only n grows to +∞. This phenomenon is highlighted in several papers, including [BG10] and [BGKM13]. However, it may be possible to obtain a polynomial rate using a distance better adapted to our problem of randomly shifted curves, for instance a Fréchet-type distance d_Fréchet(f_1, f_2) = inf_{τ∈[0,1]} ‖f_1 − f_2(· − τ)‖. We plan to tackle this problem in a future study. The important requirement in this case is to find some relationships between the neighborhoods of P_{f_0,g_0} and the neighborhoods of f_0 according to the distance d_Fréchet.
Another interesting study would consider the SIM with a noise level σ depending on n in the Bayesian framework. This asymptotic setting is linked to the work of [BG12], in which J curves are sampled at the n points of a discrete design in [0, 1]. Finally, an open issue that remains to be elucidated is the search for a stochastic algorithm to approximate the posterior distribution in our SIM.

Metric description of the model
We follow the roadmap of Theorem 2.1 of [GGvdV00] through the introduction of a suitable sieve P_{ℓ_ǫ,w_ǫ} defined in Section 4.1.2 (see also [SW01] and [Zha00]), and find optimal calibrations of ǫ, ℓ_ǫ and w_ǫ with respect to n. We then need to find a lower bound on the prior mass around a type of Kullback-Leibler neighborhood of P_{f_0,g_0} ∈ P. Following [GGvdV00], these sets are defined as

V_ǫ(P_{f_0,g_0}, d_KL) := {P_{f,g} ∈ P : E_0 log(p_{f_0,g_0}/p_{f,g})(Y) ≤ ǫ², E_0 [log(p_{f_0,g_0}/p_{f,g})(Y)]² ≤ ǫ²}.

We will in fact consider Hellinger neighborhoods instead of Kullback-Leibler ones (a link is given in Section A). In Section 4.2, we exhibit suitable Hellinger neighborhoods.
Remark that, for any dimension p and any couple of points (z_1, z_2) ∈ C^p, if ‖z_1 − z_2‖ is the Euclidean distance in C^p, then the total variation distance between the corresponding Gaussian laws can be expressed through Φ, the c.d.f. of a real standard Gaussian variable. An ǫ-covering of A_θ is then simply obtained by covering the torus with intervals of radius of order (ǫ/(C‖f‖_{H^{1/2}}))², which leads to the announced metric entropy.
We next consider a continuous mixture for g, which is the natural case. Define P_f := {P_{f,g} : g ∈ M([0, 1])}. From now on, we will only consider functions f with null Fourier coefficients of order higher than ℓ_ǫ, and we will omit the dependence on ǫ with the notation ℓ. It would be tempting to use Lemmas 3.1 and 3.2 of [GvdV01] to bound the metric entropy of P_f. However, as pointed out by [MM11], this leads to too weak a result, since the upper bound on N(ǫ, P_f, d_H) would have a strong dependency on ℓ. It is thus necessary to adapt the proof of [GvdV01] to obtain a sufficiently sharp upper bound on the entropy of P_f (with respect to d_TV, which is easier to handle here).
These results remain true for d_H, using the classical comparison between d_H and d_TV. The second inequality opens the way to the case of unknown f, since we make the dependency on f and ℓ explicit. The rate s/(2s + 2) obtained in Theorem 2.1 comes from this upper bound of order ℓ² (to be compared with ℓ when no shift occurs in the standard regression model). We currently do not know whether smaller covering numbers can be obtained in the SIM.
Proof. To build an ǫ-covering of P_f, we approximate any mixture g by a finite one g̃ such that d_TV(P_{f,g}, P_{f,g̃}) ≤ ǫ/2. We first fix the notation p = 2ℓ + 1, which is the dimension of the multivariate mixture. For any R > 0, denote by E_R the centered ball of C^p of radius R. For the sake of simplicity, we will sometimes omit the dependence of p on ǫ. There exists an absolute constant a, used below in the choice of R.
Let ν be a measure on [0, 1] that dominates both g and g̃.
Term (A) We will pick R such that (A) is smaller than ǫ/2. First set R² > (1 + a)²p ≥ a^{−2}(1 + a)²‖θ‖². With this choice, and from the concentration of χ²_p statistics given by Lemma 1 of [IL06], this term is smaller than ǫ/2 if we pick c large enough, since log(1/ǫ) ≲ p.
Term (B) We follow the strategy of [GvdV01], which exploits the smoothness of Gaussian laws and exhibits suitable discrete mixtures. We translate their moment-matching conditions to the multivariate setting by exploiting Fourier analysis. For all z ∈ E_R, we decompose θ = (θ_{−ℓ}, ..., θ_ℓ) and z = (z_{−ℓ}, ..., z_ℓ) using polar coordinates, where the coefficients (a(r, m))_{r=1...k, m=−ℓ...ℓ} only depend on z and θ. Using Euler's identity, one obtains an expansion whose coefficients b form a complex vector given by the binomial formula.
Carathéodory's theorem shows that one can find g̃ with a finite support of controlled size. For such a finite mixture law g̃, we obtain, for all z ∈ C^p, a bound of order (1 + c)p and, using the volume of E_R and Stirling's formula (through p^p/p! ≤ C^p), we obtain, for a universal C and for k defined in (4.2) such that k ∼ bℓ with b sufficiently large, the announced control. In order to bound (B) by ǫ/2, we consider k_ǫ ∼ bℓ_ǫ for b large enough; since log(1/ǫ) ≲ ℓ_ǫ, we have found g̃ with a discrete support of cardinality s_ǫ ∼ 2bℓ_ǫ², with s_ǫ not depending on g, such that d_TV(P_{f,g}, P_{f,g̃}) ≤ ǫ/2. Now, the first inequality in Proposition 4.2 comes from Lemma 2 of [GW00]. The second inequality is deduced uniformly from the first one using ‖θ‖_{H¹} ≤ ℓ‖θ‖ when f ∈ H_ℓ.

General case
We describe the picture when f varies, which is our main objective. We assume that f ∈ H_s and define a sieve P_{ℓ,w} which depends on a cut-off ℓ and a size w. Remark 4.1. i) If we focus on the role of w_ǫ, we see in the proof of Proposition 4.2 that a w_ǫ of smaller order than √ℓ_ǫ would not decrease the obtained entropy number. At the same time, a larger radius w_ǫ entails a degraded entropy number of order ℓ_ǫ(ℓ_ǫ ∨ w_ǫ²) log(1/ǫ) + log ℓ_ǫ. In the end, this would also degrade the contraction rate of P_{f,g}. ii) Our model is a special case of Gaussian mixture models; nevertheless, it may be generalized to other cases within a growing-dimension setting. iii) We will use a choice of ℓ_ǫ larger than log(1/ǫ); this will be fixed in Section 5.1. iv) Note that we can derive the same upper bound for the metric entropy using the Hellinger distance.
The proof of Theorem 4.1 is based on two simple results. The first one is Girsanov's formula (see e.g. the Appendix of [BG10]), which, in our framework, gives the density p_{f,g} for any measurable trajectory Y. The second result is given as follows.
Proof. P_{f,g} is a mixture model:

and the last term is bounded by
If U denotes an N_C(0, 1) random variable, a standard argument using Girsanov's formula yields the claim. Proof of Theorem 4.1. We build an ǫ-covering of P_{ℓ,w} from ǫ/2-coverings for f and g. First, let P_{f,g} and P_{f̃,g̃} be two elements of P_{ℓ,w} and use the triangle inequality. We will look for a covering method that uses the inequality above and a tensorization argument; this requires bounding both terms. The second term is handled uniformly in f̃ by Proposition 4.2. We handle the first one with Lemma 4.1, deducing ǫ/2-coverings of P_{f,g} for fixed g from ǫ/√2-coverings of f for the ‖·‖ norm. Therefore, only the entropy obtained through Proposition 4.2 matters.

Hellinger neighborhoods
We describe Hellinger neighborhoods of P_{f_0,g_0} in terms of (f, g) and later turn them into Kullback-Leibler ones to compute their prior mass. For the sake of simplicity, E_0 F(Y) will refer to the expectation of F(Y) when Y ∼ P_{f_0,g_0}. For a cut-off ℓ_n, we denote by f_0^{ℓ_n} ∈ H_{ℓ_n} the truncation of f_0 at frequency ℓ_n; the triangle inequality then gives

d_H(P_{f_0,g_0}, P_{f,g}) ≤ d_H(P_{f_0,g_0}, P_{f_0^{ℓ_n},g_0}) + d_H(P_{f_0^{ℓ_n},g_0}, P_{f_0^{ℓ_n},g}) + d_H(P_{f_0^{ℓ_n},g}, P_{f,g}) =: (E_1) + (E_2) + (E_3).

Next, we provide sharp upper bounds on (E_1), (E_2), (E_3) to find a suitable lower bound of the prior mass of Hellinger neighborhoods.
Upper bound of (E_1) Since d_H² ≤ d_KL, the Girsanov formula (4.3) applies, and we obtain the upper bound of (E_1) through the next proposition.
Proposition 4.3. Assume that Y ∼ P_{f_0,g_0} and f_0 ∈ H_s; then (E_1) is controlled by the truncation error f_0 − f_0^{ℓ_n}. Proof. Denote by Y a random variable sampled from P_{f_0,g_0}. For any function F of the trajectory Y, we will denote by E_β F(Y) the expectation of F(Y) conditionally on the shift being equal to β, so that E_0 F(Y) = ∫_0^1 E_β F(Y) dg_0(β). For each possible value of β ∈ [0, 1], we define the quantities D_β(α) and X_β(α) used below. We can now split the randomness of the Brownian motion into two parts: the first one is spanned by the Fourier frequencies from −ℓ_n to ℓ_n, and the second part is its orthogonal complement (in L²): W = W_1 + W_2. Of course, W_1 and W_2 are independent.
For any fixed β, D_β(α) is measurable with respect to the filtration associated with W_1, and X_β(α) is independent of W_1. Jensen's inequality then applies. The notation E_β^{W_1} F(Y) (resp. E_β^{W_2} F(Y)) used above refers to the expectation of F(Y) with respect to W_1 (resp. with respect to W_2) with β fixed.
We can now exchange log and sup since log is increasing, and the Cauchy-Schwarz inequality yields (Ẽ 1 ) ≤ √2 ‖f 0 − f 0 ℓn ‖. Note that until now we did not use the hypothesis f 0 ∈ H s : it is only needed to obtain the last inequality in Proposition 4.3.
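The quantity ‖f 0 − f 0 ℓn ‖ that drives the bound above can be made concrete. As a hedged sketch, for an illustrative coefficient sequence lying in a Sobolev ball (the polynomial decay below is an assumption made for the example, not the paper's f 0 ), the truncation error decays like ℓ n −s :

```python
import math

def truncation_error(s, ell, kmax=100000):
    """L2 norm of f - f^ell for the illustrative Fourier coefficients
    theta_k = |k|^(-(s + 1/2)), a sequence belonging to the Sobolev space H^s.
    The squared tail is 2 * sum_{k > ell} |theta_k|^2, of order ell^(-2s)."""
    tail = 2.0 * sum(k ** (-2.0 * s - 1.0) for k in range(ell + 1, kmax))
    return math.sqrt(tail)
```

Doubling the cut-off roughly halves the error when s = 1, consistent with the rate ℓ n −s used in the sequel.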
Upper bound of (E 3 ) We are interested in d H (P f 0 ℓn ,g , P f,g ) when f 0 ℓn is close to f and ℓ n grows to +∞ (with the same mixture law g on [0, 1]). The important fact is that this bound depends exclusively on ‖f 0 ℓn − f ‖. The upper bound is immediate from Lemma 4.1. Upper bound of (E 2 ) We can use a discrete mixture with η-separated support points as follows.

Now, let g be in M([0, 1]); then an adaptation of Lemma 5.1 of [GvdV01] leads to a bound which allows us to conclude.
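The discrete approximation used for (E 2 ) can be sketched numerically. The construction below is generic and only in the spirit of the adaptation of Lemma 5.1 of [GvdV01]: quantizing a mixing law onto an η-grid yields η-separated support points while moving no mass further than η/2.

```python
from collections import Counter

def discretize(samples, eta):
    """Collapse points of [0, 1] onto an eta-grid. The resulting discrete
    mixing law has eta-separated support points, and every original point
    lies within eta/2 of its atom."""
    counts = Counter(round(x / eta) * eta for x in samples)
    total = len(samples)
    return sorted((atom, c / total) for atom, c in counts.items())
```

The small transport of mass (at most η/2 per point) is what keeps the Hellinger perturbation of the mixture under control.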

Description of neighborhoods
We can now gather the bounds on (E 1 ), (E 2 ) and (E 3 ) as follows.
There exists a constant C 0 depending only on f 0 such that ∀g ∈ G ǫn , ∀f ∈ F ǫn : d H P f 0 ,g 0 , P f,g ≤ C 0 ǫ n .

Proof of Theorem 2.1
We first establish a lower bound on the prior mass of the Kullback-Leibler neighborhoods.
We obtain a suitable lower bound as soon as λ(ℓ n ) ≥ e −cℓ 2 n log ℓn , and any λ possessing a heavier tail would also fit. But we also need to upper bound Π DP n (P \ P n ); see below. Proof. We have seen in the proof of Proposition A.1 that M 2 δ is uniformly bounded with respect to f and f 0 for a suitable choice of δ. We restrict our study to the elements f such that ‖f ‖ ≤ 2‖f 0 ‖. We know from Proposition A.1 that as soon as ǫ̃ n log(1/ǫ̃ n ) ≤ cǫ n with c small enough, Vǫ̃ n (P f 0 ,g 0 , d H ) := {P f,g ∈ P | d H (P f 0 ,g 0 , P f,g ) ≤ ǫ̃ n , ‖f ‖ ≤ 2‖f 0 ‖} ⊂ V ǫn (P f 0 ,g 0 , d KL ). This last condition on ǫ̃ n is true as soon as ǫ̃ n := c̃ ǫ n (log(1/ǫ n )) −1 (5.1) with c̃ small enough. Now, Proposition 4.5 permits us to describe a subset of Vǫ̃ n (P f 0 ,g 0 , d H ), with subsets Fǫ̃ n and Gǫ̃ n for f and g. Choose ℓ n := ǫ̃ −1/s n . We first bound the prior mass on Gǫ̃ n ; this follows from the bound given by Lemma 6.1 of [GGvdV00] for the DP. Note that J n ℓ 2 n = ǫ̃ −2/s n ≤ ǫ̃ −4 n and there exists an absolute constant a ∈ (0, 1] such that the condition J n ≤ 2(aǫ̃ n ) −4 is fulfilled. Hence one can find constants C and c such that the desired bound holds for n large enough. We next consider the prior mass on Fǫ̃ n . Remark that when n is large enough, any element of Fǫ̃ n satisfies ‖f ‖ ≤ 2‖f 0 ‖, so the additional condition on f in the definition of Vǫ̃ n (P f 0 ,g 0 , d H ) is automatically fulfilled. From our assumption on the prior λ, we have λ(ℓ n ) ≥ e −cℓ 2 n log ρ ℓn , and the value of the volume of the (4ℓ n + 2)-dimensional Euclidean ball of radius ǫ̃ 2 n yields the conclusion.
For n large enough we get the following. Proposition 5.2. For any sequences k n → +∞ and ǫ n → 0 as n → +∞, define w 2 n = 4k n + 2; then there exists a constant c such that Π n (P \ P kn,wn ) ≤ e −c[k 2 n log ρ (kn) ∧ k n ξ −2 n ] and log D (ǫ n , P kn,wn , d H ) ≲ k 2 n log k n + log(1/ǫ n ).

Proof. The upper bound on the packing number comes directly from Theorem 4.1 since w 2 n = 4k n + 2. To control the prior mass outside the sieve, remark that, owing to the construction of our prior, each θ k for −k n ≤ k ≤ k n follows a centered Gaussian law of variance ξ 2 n . Now, there exist constants c and C such that for sufficiently large n: Σ |k|≥kn λ(k) ≤ Cλ(k n ) ≤ e −ck 2 n log ρ (kn) . Regarding now the second term of the upper bound in (5.4), we use (4.1). The value of ξ n then yields Π n (P \ P kn,wn ) ≤ e −c[k 2 n log ρ (kn) ∧ k n ξ −2 n ] .
Remark 5.1. Note that the limited size of w n and the entropy of the sieve (see Proposition 4.2) prevent the use of a prior independent of the sample size. Indeed, we need in (5.5) a small enough ξ n to avoid too large a weight outside of the sieve P ℓn,wn . Note also that a variance ξ k,n that decays with the frequency k would not bypass this limitation.
We are now able to conclude the proof of the posterior consistency.
Non-adaptive prior This case is simpler; for instance, we can fix

Proof of Theorem 2.2
The entropy bounds of the model are still valid for the proof of Theorem 2.2. However, the way we previously described the closeness of P f,g and P f̃ ,g̃ is not satisfactory with a smooth GP prior: Lemma 6.1 of [GGvdV00], designed for the DP, is no longer convenient for the GP, and we must derive another neighborhood structure that is easily tractable with the GP. The next proposition, of independent interest for mixture models, does the job. Consider the inverse functions of the distribution functions defined by ∀u ∈ [0, 1], G −1 (u) = inf{t ∈ [0, 1] : g([0, t]) > u}, and recall that the Wasserstein (or Kantorovich) distance W t , for t > 0, is given by W t (g, g̃) t = ∫ 1 0 |G −1 (u) − G̃ −1 (u)| t du. Consider f ∈ H s for s ≥ 1/2, and let (g, g̃) ∈ M([0, 1]); then the comparison below holds. Proof of Proposition 5.3. The convexity of d T V and Lemma 4.1 yield the first inequality by Proposition 4.1. The last inequalities are classical; see for instance [GS02, Theorem 4].
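In one dimension, the quantile formula above gives a direct way to compute W 1 . A minimal sketch on empirical laws with equal sample sizes (the sorted-sample coupling is exactly the G −1 construction):

```python
def wasserstein_1(xs, ys):
    """W_1 distance between two empirical laws on [0, 1] with the same
    number of atoms, via the quantile coupling: sort both samples and
    average the pointwise gaps |G^{-1}(u) - Gtilde^{-1}(u)|."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

This monotone coupling is optimal on the real line, which is why the neighborhood structure built from W t is tractable for the GP prior.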
Proof of Theorem 2.2. We mimic the proof of Theorem 2.1. Complement of the sieve First, we consider the following sieve over P, where k n is a sequence such that k n −→ +∞ as n −→ +∞, and w 2 n = 4k n + 2. Our sieve is included in the set of all mixture laws (without any restriction on the smoothness of g), and since the marginal of the prior on the f coordinate is the same for the DP and the GP priors, we can apply Proposition 5.2 to get the corresponding bounds; then P f,g ∈ V ǫn (P f 0 ,g 0 , d KL ).
Following the arguments of Section 5.1, we can find γ > 0 and κ > 0 such that the desired concentration holds. Remark 5.2. Note that we have chosen to deal with a non-parametric structure on g for the sake of generality. Now, if g is assumed unknown but belonging to a parametric space of dimension d, it is reasonable to expect a better convergence rate. More precisely, one can use Lemma 4.1 and Proposition 5.3 to obtain an entropy number of order ǫ −1/s log(ǫ −1 ) and then deduce a standard non-parametric contraction rate on P f,g of order n −s/(2s+1) .

Semi-parametric results for Π GP n
In the SIM, an important issue is the identifiability with respect to the unknown curve f and mixture law g. We provide a generic identifiability condition and then deduce from Theorem 2.2 a contraction rate around the true f 0 and g 0 .

Identifiability of the model
In previous works, identifiability generally depends on a restriction on the support of g. For instance, [BG10] assume that g is centered and compactly supported in [−1/4, 1/4] (shapes were defined on [−1/2, 1/2] instead of [0, 1] in our paper), while f is assumed to have a non-vanishing first Fourier coefficient (θ 1 (f ) ≠ 0). The same kind of conditions are also assumed in [BG12]. While the condition on the first harmonic of f is imperative to obtain identifiability of g, the restriction on its support size seems artificial, and we detail in the sequel how one can avoid it. Recall that for any curve Y , θ 1 (Y ) = θ 0 1 e −i2πτ + ξ. Up to a change of variable, we can always modify g into g̃ so that θ 0 1 ∈ R + ; for instance, fix g̃(ϕ) = g(ϕ + α) where α is the complex argument of θ 0 1 . Consequently, w.l.o.g. we study identifiability of the SIM when f belongs to F s .
Proof of Theorem 2.3. We use three hierarchical steps. First, we prove that P f,g = P f̃ ,g̃ implies θ 1 (f ) = θ 1 (f̃ ). Then we deduce that g = g̃, and at last we obtain the identifiability of all other Fourier coefficients of f .
Note that as soon as ν > 1/2, g and g̃ admit densities w.r.t. the Lebesgue measure on [0, 1]. In the sequel we use the same notation g for the density of g.
Point 2: Identifiability of g We still assume that d T V (P 1 f,g , P 1 f̃ ,g̃ ) = 0, so that θ 1 = θ̃ 1 , and we want to infer that g = g̃. We use a polar change of variables. In the resulting expression, we denote h = g − g̃, and ψ a (ϕ) is defined accordingly. Of course, ‖ψ a ‖ ∞ ≤ 4πe a , and we roughly bound |ψ a (ϕ)| ≥ |ψ a (ϕ)| 2 /(4πe a ). Since ν > 1, h and ψ a may be expanded in Fourier series since h ∈ L 2 ([0, 1]). Thus, the L 2 norm of ψ a can be computed, and we obtain for a ∈ R a bound involving A n . The function A n is analytic and is not the null function; otherwise all its derivatives would also vanish, but remark that (cos u) n = Σ k<n α k T k (cos u) + 2 1−n T n (cos u), where T k denotes the k-th Chebyshev polynomial, and several differentiations yield ∫ [Σ k<n α k T k (cos u) + 2 1−n T n (cos u)] T n (cos u) du = 2 1−n π > 0.
Note that in the meantime, we also obtain that A (j) n (0) = 0 for all j < n, so that A n (a) ∼ a→0 (2 1−n π/n!) a n . (6.4) We can conclude the proof of the identifiability of g using (6.3) in (6.1).
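The Chebyshev identity used above, (cos u) n = Σ k<n α k T k (cos u) + 2 1−n T n (cos u), can be checked numerically by projecting cos n u onto cos(nu) with the orthogonality relation on [0, π]:

```python
import math

def leading_cheb_coefficient(n, grid=20000):
    """Coefficient of T_n(cos u) = cos(n u) in the Chebyshev expansion of
    (cos u)^n, computed as (2/pi) * int_0^pi cos(u)^n * cos(n u) du with a
    midpoint rule; the exact value is 2^(1 - n)."""
    h = math.pi / grid
    return (2.0 / math.pi) * h * sum(
        math.cos((i + 0.5) * h) ** n * math.cos(n * (i + 0.5) * h)
        for i in range(grid))
```

The non-vanishing leading coefficient 2^(1−n) is what guarantees that the integral against T n above stays strictly positive, hence that A n is not the null function.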
Point 3: Identifiability of the other frequencies of f Since the previous point implies g = g̃, it remains to show that θ k (f ) = θ k (f̃ ) for all k ∈ Z \ {0, 1}. A similar argument to that of Point 1 applies, but we cannot directly conclude here (no restriction is assumed on the phase of the Fourier coefficients θ k (f ), k ∈ Z \ {0, 1}). Write θ̃ k = θ k e iϕ ; then g = g̃ yields an integral identity in z.
Point 4: Minimal assumption on g We now briefly show that M([0, 1]) + is minimal for the identifiability of the model. Indeed, consider any integer k and a couple of functions (f, f̃ ) whose Fourier coefficients are θ 1 (f ) = θ 1 (f̃ ) = 1 and θ k (f ) = θ k (f̃ )e −iϕ ≠ 0. By construction, for any g ∈ M([0, 1]) + , P 1 f,g = P 1 f̃ ,g . Following the definition of F above, when f and f̃ have only two frequencies switched on, it is thus necessary and sufficient to consider the function F . As a holomorphic function, F is null iff all its derivatives at 0 vanish. We have already seen that F (0) = 0, and a simple induction then gives the form of the higher derivatives. Thus, if g ∈ M([0, 1]) + , there exists m such that F (m) (0) ≠ 0 and then F cannot be null, unless ϕ = 0. Conversely, if g ∉ M([0, 1]) + , there exists k ∈ Z such that c jk (g) = 0 for all integers j ∈ Z, and one cannot distinguish P f,g from P f̃ ,g for a suitable choice of ϕ. In particular, in this case the model is not identifiable.
The main difficulty was to show that d T V (P 1 f,g , P 1 f̃ ,g ) = 0 =⇒ g = g̃, and we will use this to obtain a contraction rate for (f, g) around (f 0 , g 0 ). The main inequality is (6.5), where h = g − g̃, and we first use it to study a contraction rate around g 0 .
6.2. Contraction rate of the posterior distribution around f 0 and g 0

At this stage, we assume that (f, g) ∈ F s × M ν ([0, 1]), with s ≥ 1 and ν > 1. Sketch of proof: the next proof will be split into three parts. First we show that the joint contraction of P f,g around P f 0 ,g 0 implies a contraction property of the first Fourier coefficient of f around θ 1 (f 0 ). We use this last property to obtain the contraction of g around g 0 . At last, we use the whole marginals of the complete process to derive contraction rates for the other Fourier coefficients of f around those of f 0 . Moreover, the contraction rate is n −1/3×[ν(s∧1)/(2ν(s∧1)+1)∧s/(2s+2)] (log n) 1/3 .

In conclusion, (6.6), (6.7) and (6.8) show that for M large enough the desired bound holds. We now use (6.5) applied with θ 1 = θ 0 1 to obtain our consistency rate. The equivalents given by Lemma C.1 (see Appendix C) show that the main part of the integral above corresponds to a ∈ [0, c √n]. One can find κ such that the corresponding bound holds. Now, we can apply the Stirling formula to obtain a quantity which is lower bounded by C(θ 0 1 )e −n log(n) . Such a lower bound in (6.9) yields the conclusion for (c, c̃) sufficiently small. We now end the proof of the theorem: choose a frequency cut-off k n that depends on n and remark that Equation (6.8) implies that the above term is lower than e ckn log kn ǫ 1/3 n + k −2ν n up to a multiplicative constant, with probability close to 1 as n goes to +∞. The optimal choice for k n satisfies [k n + 2ν] log k n = (1/3) log(1/ǫ n ), which ensures the announced rate. In the last lines, we used the knowledge of ν as well as the radius A of the Sobolev ellipsoid M ν ([0, 1])(2A) to build a suitable threshold k n . But we cannot easily control the posterior weights on M ν ([0, 1])(2A) from the posterior around P f 0 ,g 0 : this is why it is difficult to conclude with an adaptive prior.

Posterior contraction rate around f 0
We then obtain a result for the neighborhoods of f 0 . Proposition 6.1 leads to a polynomial order for θ 1 ; this is far from being the case for the other frequencies.
Proof of Theorem 2.4, ii). The proof of ii) is inspired by that of i).
Point 1: Triangle inequality For any f ∈ F s , we have for any k ∈ Z a comparison of total variation distances, and if ǫ n ≪ log(n) −ν , the posterior probability of the corresponding event given Y 1 , . . . , Y n tends to 1 as n −→ +∞. (6.10) We can use the Cauchy-Schwarz inequality as follows. Remark 6.1. The lower bound obtained on d T V (P k f,g 0 , P k f 0 ,g 0 ) is important to understand how one can build an appropriate net of functions (f j , g j ) ∈ F s × M ν ([0, 1]) that are hard to distinguish in the L 2 distance. When |θ k | ≠ |θ 0 k |, it is quite easy to distinguish the two hypotheses. On the contrary, when the moduli are the same, the behaviour of the Fourier coefficients of g 0 becomes important. This is a clue for exhibiting a "difficult" net.

Lower bound from a frequentist point of view
We have chosen to use the Fano lemma (see [IH81] for instance) instead of Le Cam's method, since we will only be able to find a discrete (instead of convex) set of pairs (f j , g j ) in F s × M ν ([0, 1]) that are close in the total variation distance.
Proof of Theorem 2.5. We are looking for a set (f j , g j ) j=1...pn such that the laws P fj ,gj are close to each other while the functional parameters f j or g j are rather different. Reading the Bayesian contraction rate carefully is informative for building p n hypotheses which are difficult to distinguish. First, we know that since each f j should belong to F s , we must impose for any f j that θ 1 (f j ) > 0. Proposition 6.1 shows that two laws P fj ,gj and P f j ′ ,g j ′ are statistically very different as soon as θ 1 (f j ) ≠ θ 1 (f j ′ ). Then we build our net using a common value for θ 1 (f j ). Point 1: Net of functions (f j ) j=1...pn We choose the following construction: ∀j ∈ {1 . . . p n }, f j (x) = e i2πx + p −s n e i2π(j−1)/pn e i2πpnx . (6.16) The number of elements p n will be adjusted later and will grow to +∞. Our construction naturally ensures that each f j ∈ F s since the modulus of the p n -th Fourier coefficient is of size p −s n . At last, we have for all (j, j ′ ) ∈ {1 . . . p n } 2 with j ≠ j ′ an explicit lower bound on ‖f j − f j ′ ‖. Point 2: Net of measures (g j ) j=1...pn The cornerstone of the lower bound is how to adjust the measures of the random shifts to make the distributions P fj ,gj , j = 1 . . . p n , as close as possible. First, remark that we will still use the total variation instead of the entropy between laws, since the Kullback-Leibler distance is difficult to handle with mixtures; we use a chain of inequalities relating the two. Hence, from the tensorisation of the entropy, we must find a net such that d T V (P fj ,gj , P f j ′ ,g j ′ ) ≤ η n with − √η n log η n = O(1/n) to obtain a tractable application of the Fano lemma. It imposes to find mixture laws such that d T V (P fj ,gj , P f j ′ ,g j ′ ) ≲ (n log n) −2 ; it is thus sufficient to build (g j ) j=1...pn satisfying ∀j ∈ {1 . . . p n }, d T V (P fj ,gj , P f1,g1 ) ≲ (n log n) −2 . (6.17) For the sake of convenience, we replace p n by p. In a similar way, θ j p = θ p (f j ) is given by θ j p = e i2παj θ 1 p where α j = (j − 1)/p n .
From (6.16), we obtain the distances between the hypotheses. Now, we use the smoothness of Gaussian densities: denote by F the function defined on R 4 by the corresponding difference of densities, where z = (x 1 + iy 1 , x 2 + iy 2 ) and θ j • ϕ = (e i2πϕ , θ j p e i2πpϕ ). To control F , we adapt the proof of Proposition 4.2; only the sketch of the proof is given here. We use a truncation in R Rn := B R 2 (0, R n ) 2 . Outside R Rn , we use the key inequality given by (4.2). Inside R Rn we need to satisfy some constraints on the Fourier coefficients. Since here the only non-null Fourier coefficients are of order 1 and p, we finally have to ensure that ∀m, ℓ ≤ d, ∀(s, s̃) ∈ {−1; +1} 2 : c sm+s̃ℓp (g j )e is̃ℓαj = c sm+s̃ℓp (g 1 ). (6.18) Hence, the maximal size of d is d = p/4. We have from (6.18) and (4.2): d T V (P fj ,gj , P f1,g1 ) = (1/2π 2 ) ∫ RRn |F (x 1 , y 1 , x 2 , y 2 )|dx 1 dy 1 dx 2 dy 2 + (1/2π 2 ) ∫ R c Rn |F (x 1 , y 1 , x 2 , y 2 )|dx 1 dy 1 dx 2 dy 2 ≲ e −R 2 n /2 + [(eR n ) p/4 /(p/4) p/4 ] 4 ≲ e −R 2 n /2 + (eR n ) p /(p/4) p , where the last point is deduced from inequality (4.2). We now choose R n := 3 √log n so that e −R 2 n /2 ≪ (n log n) −2 , as required in condition (6.17). Then, we control the last term of the inequality above: the Stirling formula yields (eR n ) p /(p/4) p ≲ e p log(3 √log n)−p log(p/4) .
If one chooses p n = κ log n with κ > 12, we then obtain that d T V (P fj ,gj , P f1,g1 ) ≲ e −Cpn log pn ≲ (n log n) −2 .
Such a choice of R n and p n ensures that (6.17) is fulfilled.
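The geometry of the net (6.16) is easy to verify numerically: the L 2 distance between two hypotheses equals p −s |e i2παj − e i2παj′ |, hence is of order p −s . A sketch (the grid evaluation is exact here because the squared difference of two hypotheses has constant modulus):

```python
import cmath
import math

def f_hypothesis(x, j, p, s):
    """Hypothesis f_j of the net (6.16): first harmonic fixed, p-th harmonic
    of modulus p^(-s) and phase 2*pi*(j - 1)/p."""
    return (cmath.exp(2j * cmath.pi * x)
            + p ** (-s) * cmath.exp(2j * cmath.pi * (j - 1) / p)
            * cmath.exp(2j * cmath.pi * p * x))

def net_distance(j, jp, p, s, n=1024):
    """L2([0, 1]) distance between f_j and f_jp via a Riemann sum."""
    tot = sum(abs(f_hypothesis(m / n, j, p, s)
                  - f_hypothesis(m / n, jp, p, s)) ** 2 for m in range(n))
    return math.sqrt(tot / n)
```

All hypotheses share the first harmonic, so only the phase of the p-th harmonic separates them, and the separation p −s shrinks with the smoothness s, as the lower bound requires.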
In our model, Hellinger neighborhoods are almost Kullback-Leibler ones (up to a log term) since a sufficiently large moment exists for q: q log q/q 1+δ tends to 0 when q tends to +∞, and a second-order expansion of q log q − q + 1 around 1 yields a term similar to [ √q − 1] 2 .
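The second-order comparison invoked above can be checked pointwise: near q = 1, both q log q − q + 1 and ( √q − 1) 2 behave like constant multiples of (q − 1) 2 , with ratio tending to 2. A numerical sketch:

```python
import math

def kl_term(q):
    """Pointwise Kullback-Leibler integrand q*log(q) - q + 1."""
    return q * math.log(q) - q + 1.0

def hellinger_term(q):
    """Pointwise squared Hellinger integrand (sqrt(q) - 1)^2."""
    return (math.sqrt(q) - 1.0) ** 2
```

For q close to 1, kl_term(q)/hellinger_term(q) is close to 2, which is the equivalence (up to constants, and up to the moment condition controlling large q) between the two divergences used here.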
Proof. This proposition uses a corollary of the Rice formula (see e.g. [AW09]) stated in Lemma A.1. We use Girsanov's formula (4.3), where the last line is obtained using the Cauchy-Schwarz inequality. We now set δ ∈ (0, 1] and define the set of trajectories E δ . Hence, following the definition of M 2 δ in (A.1), we obtain the corresponding bound for δ small enough. Integrating the last expectation by parts, the use of Lemma A.1 yields the desired control. Now, we can choose δ positive and small enough such that M 2 δ < ∞ since for u ≥ √e, e −log 2 (u)/(32δ 2 ‖f 0 ‖ 2 ) ≤ u −1/(32δ 2 ‖f 0 ‖ 2 ) , which is an integrable function as soon as δ 2 < 1/(32‖f 0 ‖ 2 ). The same result holds replacing f 0 by f , and M 2 δ is uniformly bounded if ‖f ‖ ≤ 2‖f 0 ‖. We now show that the technical inequality used in (A.2) is satisfied.
The second point is a simple consequence of the inequality ‖u ′ ‖ ≤ ℓ‖u‖.

Appendix B: Small ball probability for integrated Brownian bridge
Recall that p v defined by (2.5) refers to the probability distribution which is proportional to e v . We detail here a lower bound of the prior weight around g 0 . As a log-density model, it is enough to find a lower bound of the weight around w 0 , where g 0 ∝ e w 0 , according to Lemma 3.1 of [vdVvZ08a]. Recall the notation of the prior weight q ν,A (G ǫ ), where G ǫ was previously defined. Theorem B.1. There exists c such that for ǫ small enough, q ν,A (G ǫ ) ≥ ce −ǫ −1/(kν +1/2) .
Proof. Structure of the prior We denote w 0 := log g 0 , which is a k ν -differentiable function on [0, 1], extended to a 1-periodic element of C kν (R). We define q̃ as the prior given by (2.4) on this class of periodic functions (and omit the dependence on ν and A for the sake of simplicity). The prior q ν,A is then derived from q̃ through (2.5). We can remark that our situation looks similar to the one described in paragraph 4.1 of [vdVvZ08a] for integrated Brownian motion. The log-density w 0 is approximated by a "Brownian bridge started at random", where B is a real Brownian bridge between 0 and 1. We suppose B built as B t = W t − tW 1 on the basis of a Brownian motion W on [0, 1]. The key property of the operator J is that J kν (B)(0) = J kν (B)(1) = 0 and J k (f ) ′ = J k−1 (f ) − ∫ 1 0 J k−1 (f ). Hence, an induction argument yields J kν (B) (j) (0) = J kν (B) (j) (1) whenever j ∈ {1, . . . , k ν }. Hence, J kν (B) and its first k ν derivatives are 1-periodic. Since the functions ψ i are also 1-periodic and C ∞ (R), our prior q̃ generates admissible functions on [0, 1] to approximate w 0 . We will denote this set of admissible trajectories C kν 1 (1-periodic functions which are k ν times differentiable). The coefficients c i,k (W ) are explicit linear functionals that depend on W 1 and on the collection ( ∫ 1 0 (1 − t) k−j W t dt) 1≤j≤k (and not on t), and I k is the operator used in [vdVvZ08a] defined by I 1 (f )(t) = ∫ t 0 f and I k = I 1 • I k−1 for k ≥ 2. Hence, ∀t ∈ [0, 1], the identity T (B, Z 1 , . . . , Z kν ) (k) (t) = W t + c k,k (W )k! + c k+1,k (W )(k + 1)!t + Σ kν i=0 Z i ψ (k) i (t) implies that Z 1 i = Z 2 i for i ∈ {0, . . . , k ν }, and next that W 1 = W 2 and B 1 = B 2 . Thus, it is possible to apply Lemma 7.1 of [vdVvZ08b] to identify the RKHS associated to the Gaussian process.
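The two defining properties of J used above, namely J(f )(0) = J(f )(1) = 0 and J(f ) ′ = f − ∫ 1 0 f , can be checked on a discrete grid. A minimal sketch, using the explicit form J(f )(t) = ∫ t 0 (f − f̄ ) consistent with these identities (the grid scheme is an illustrative choice):

```python
def J(f_vals):
    """Discrete J on the grid t_m = m/n: J(f)(t) = int_0^t (f - mean(f)),
    approximated by a left Riemann sum. J(f)(0) = 0 by construction, and
    J(f)(1) = 0 because the centered values sum to zero."""
    n = len(f_vals)
    mean = sum(f_vals) / n
    out, acc = [], 0.0
    for v in f_vals:
        out.append(acc)          # value of J(f) at t_m
        acc += (v - mean) / n
    out.append(acc)              # value at t = 1, zero up to rounding
    return out
```

Iterating J keeps both endpoint values pinned at 0, which is the mechanism making J kν (B) and its derivatives 1-periodic.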

Transformation of the Brownian bridge
Extremal derivatives We study the process b := Σ kν i=0 Z i ψ i and look for realizations of (Z i ) i that suitably match arbitrary values of the derivatives w (j) 0 (0). If one denotes α k := 2πk, the vector of derivatives d 0 := (w (j) 0 (0)) j=0...kν , Z = (Z 0 , . . . , Z kν +1 ) and A 0 the square matrix of size (k ν + 1) × (k ν + 1), then we are looking for values of Z such that d 0 = A 0 Z. The matrix A 0 is invertible since it may be linked with the Vandermonde matrix. We can now establish that the support of the prior (closure of B) is exactly C kν 1 . Indeed, the support of the transformed Brownian bridge J k (B) is included in the set of elements of C kν 1 with at most k + 1 constraints on the values of their k ν + 1 first derivatives at the point 0. These constraints are given by the coefficients (c i,kν ) i=0...kν in (B.2). From the invertibility of the matrix A 0 , it is possible to match any vector of derivatives of w 0 at 0. Small ball probability estimates We now turn to the core of the proof of the theorem. Since the total variation distance is bounded from above by the Hellinger distance, an immediate application of Lemma 3.1 of [vdVvZ08a] shows that it is sufficient to find a lower bound of q̃(G̃ ǫ ). This integral is strongly related to the density of a continuous-time random walk: if B n (a) = e −a A n (a)/(2π), one has B n (0) = 0 for all n ≠ 0, B 0 (0) = 1 and, at last, B ′ n (a) = [B n−1 (a) + B n+1 (a)]/2 − B n (a).
We can recognize here the forward Kolmogorov equation, and B n (a) is the probability that a continuous-time random walk is located at n ∈ Z at time a. We then deduce equivalents of B n (a) from the Brownian approximation of the C.T.R.W., and (C.1) gives a different information for small a. We stopped our investigations here since the integral of (C.1) is much larger than the integral of (C.3).
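The connection stated above can be verified numerically: with B n (a) = e −a I n (a), where I n is the modified Bessel function of the first kind, B n solves the forward Kolmogorov equation B ′ n = (B n−1 + B n+1 )/2 − B n and the B n (a) sum to 1 over n ∈ Z. A sketch with a pure-Python power series for I n (the truncation level is an illustrative choice, ample for moderate a):

```python
import math

def bessel_i(n, a, terms=40):
    """Modified Bessel function I_n(a) via its power series
    sum_m (a/2)^(2m + n) / (m! * (m + n)!)."""
    return sum((a / 2.0) ** (2 * m + n)
               / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def B(n, a):
    """B_n(a) = e^{-a} I_n(a): probability that a rate-1 continuous-time
    random walk started at 0 is at site n at time a."""
    return math.exp(-a) * bessel_i(abs(n), a)
```

A finite-difference check of the master equation at one point, together with the normalisation over sites, confirms the probabilistic interpretation used to derive the equivalents of B n (a).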
For a ∈ [ √n, 2n], we have not found any satisfactory equivalent of the modified Bessel functions. The formula of [AS64] is still tractable but does not lead to a sufficiently uniform formula (we need to integrate this equivalent).