On the consistency of Fréchet means in deformable models for curve and image analysis

A new class of statistical deformable models is introduced to study high-dimensional curves or images. In addition to the standard measurement error term, these deformable models include an extra error term modeling the individual variations in intensity around a mean pattern. It is shown that an appropriate tool for statistical inference in such models is the notion of sample Fréchet means, which leads to estimators of the deformation parameters and the mean pattern. The main contribution of this paper is to study how the behavior of these estimators depends on the number n of design points and the number J of observed curves (or images). Numerical experiments are given to illustrate the finite-sample performance of the procedure.


A statistical deformable model for curve and image analysis
In many applications, one observes a set of curves or grayscale images which are high-dimensional data. In such settings, it is reasonable to assume that the data at hand $Y_j^\ell$, denoting the $\ell$-th observation for the $j$-th curve (or image), satisfy the following regression model: $$Y_j^\ell = f_j(t_\ell) + \sigma \varepsilon_j^\ell, \quad j = 1, \ldots, J, \ \text{and} \ \ell = 1, \ldots, n, \qquad (1.1)$$ where the $f_j : \Omega \to \mathbb{R}$ are unknown regression functions (possibly random) with $\Omega$ a convex subset of $\mathbb{R}^d$, the $t_\ell$'s are non-random points in $\Omega$ (deterministic design), the error terms $\varepsilon_j^\ell$ are i.i.d. normal variables with zero mean and variance 1, and $\sigma > 0$. In this paper, we will suppose that the $f_j$'s are random elements which vary around the same mean pattern. Our goal is to estimate such a mean pattern and to study the [...] with $\Omega = \mathbb{R}^2$, $P \subset \mathbb{R} \times \mathbb{R} \times \mathbb{R}^2$, where $R_\alpha = \begin{pmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{pmatrix}$ is a rotation matrix in $\mathbb{R}^2$, $e^a$ is an isotropic scaling and $b$ a translation in $\mathbb{R}^2$.
-Deformation by a Lie group action: the two above cases are examples of a Lie group action on the space $L^2(\Omega)$ (see [Hel01] for an introduction to Lie groups). More generally, assume that $G$ is a connected Lie group of dimension $p$ acting on $\Omega$, meaning that for any $(g, t) \in G \times \Omega$ the action $\cdot$ of $G$ onto $\Omega$ is such that $g \cdot t \in \Omega$. In general, $G$ is not a linear space but can be locally parametrized by its Lie algebra $\mathcal{G} \simeq \mathbb{R}^p$ using the exponential map $\exp : \mathcal{G} \to G$. With $P \subset \mathbb{R}^p$, this leads, for $(\theta, f) \in P \times L^2(\Omega)$, to the deformation operators $T_\theta f(t) := f(\exp(\theta) \cdot t)$.
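As an illustration of the exponential parametrization, the following minimal sketch treats the group of planar rotations (so $p = 1$), an assumption made only for this example. The helper names `mat_mul`, `mat_exp` and `deform` are hypothetical and do not come from the paper. The Lie algebra element is taken to be the skew-symmetric matrix with entries $\pm\theta$, whose exponential is the rotation matrix $R_\theta$.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(a, terms=30):
    """Matrix exponential of a 2x2 matrix via its truncated power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # zeroth term: identity
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = mat_mul(term, a)
        term = [[x / k for x in row] for row in term]  # now equals a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def deform(f, theta):
    """Return the function t -> f(exp(theta) . t) for planar rotations.

    theta is the single Lie algebra coordinate; the corresponding
    algebra element is [[0, -theta], [theta, 0]] (illustrative choice).
    """
    g = mat_exp([[0.0, -theta], [theta, 0.0]])

    def deformed(t):
        x = g[0][0] * t[0] + g[0][1] * t[1]
        y = g[1][0] * t[0] + g[1][1] * t[1]
        return f((x, y))

    return deformed
```

For a function `f` on the plane, `deform(f, theta)` plays the role of $T_\theta f$ in this special case. Other groups would need their own algebra basis and action.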
Then, in model (1.1), we assume that the $f_j$'s have a certain homogeneity in structure in the sense that there exists some $f \in L^2(\Omega)$ such that, for all $t \in \Omega$ and $j = 1, \ldots, J$, $$f_j(t) = T_{\theta_j^*}\left[f + Z_j\right](t), \qquad (1.3)$$ where $\theta_j^* \in P$, $j = 1, \ldots, J$, are i.i.d. random variables (independent of the $\varepsilon_j^\ell$'s) with an unknown density $g$ with compact support $\Theta$ included in $P$ satisfying: Assumption 1.1. The density $g$ of the $\theta_j^*$'s is continuously differentiable on $P$ and has a compact support $\Theta$ included in $P \subset \mathbb{R}^p$. We assume that $\Theta$ can be written $$\Theta = \left\{\theta = (\theta_1, \ldots, \theta_p) \in \mathbb{R}^p, \ |\theta_{p_1}| \le \rho, \ 1 \le p_1 \le p\right\}, \qquad (1.4)$$ where $\rho > 0$.
The function $f$ in model (1.3) represents the unknown mean pattern of the $f_j$'s. The $Z_j$'s are supposed to be independent of the $\varepsilon_j^\ell$'s and are i.i.d. realizations of a second order centered Gaussian process $Z$ taking its values in $L^2(\Omega)$. The $Z_j$'s represent the individual variations in intensity around $f$, while the random operators $T_{\theta_j}$ model geometric deformations in time or space. Then, if we assume that the $T_\theta$'s are linear operators, equation (1.3) leads to the following statistical deformable model for curve or image analysis: $$Y_j^\ell = T_{\theta_j^*} f(t_\ell) + T_{\theta_j^*} Z_j(t_\ell) + \sigma \varepsilon_j^\ell, \quad j = 1, \ldots, J, \ \text{and} \ \ell = 1, \ldots, n, \qquad (1.5)$$ where the $\varepsilon_j^\ell$ are i.i.d. normal variables with zero mean and variance 1. Model (1.5) could also be called a perturbation model, using the terminology in [Goo91,Huc10] for shape analysis. To be more precise, let $Y \in \mathbb{R}^{n \times 2}$ be a set of $n$ points in $\mathbb{R}^2$ representing a planar shape. Define a deformation operator $T_\theta$ for $\theta = (a, \alpha, b) \in \Theta = \mathbb{R} \times [0, 2\pi] \times \mathbb{R}^2$ acting on $\mathbb{R}^{n \times 2}$ in the following way: $$T_\theta Y = e^a Y R_\alpha + \mathbf{1}_n b', \quad \text{where } R_\alpha = \begin{pmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{pmatrix} \text{ and } \mathbf{1}_n = (1, \ldots, 1)' \in \mathbb{R}^n.$$ Consistent estimation of a mean shape was first studied in [Goo91] when a set of random shapes $Y_1, \ldots, Y_J$ is drawn from the following perturbation model: $$Y_j = T_{\theta_j^*}(\mu + \zeta_j), \quad j = 1, \ldots, J. \qquad (1.6)$$
Model (1.6) is similar to the statistical deformable model (1.5), where $\mu \in \mathbb{R}^{n \times 2}$ is the unknown perturbation mean to estimate, and the $\zeta_j$ are i.i.d. random vectors in $\mathbb{R}^{n \times 2}$ with zero mean. Nevertheless, there exist major differences between our approach and the one in [Goo91]. First, in model (1.5), the deformation parameters $\theta_j^*$ are assumed to be random variables following an unknown distribution, whereas they are just nuisance parameters in model (1.6) for shape analysis, see [Goo91,KM97]. In some applications (e.g. in biomedical imaging [JDJG04]), it is of interest to reconstruct the unobserved parameters $\theta_j^*$ and to estimate their distribution. One of the main contributions of this paper is then to construct upper and lower bounds for the estimation of such deformation parameters. Moreover, in model (1.5), there are two additive error terms, whereas model (1.6) only includes the error term $\zeta_j$. In model (1.5), $\varepsilon_j^\ell$ is an additive noise modeling the errors in the measurements, while the $Z_j$'s model (possibly smooth) variations in intensity of the individuals around the mean pattern $f$.
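The planar-shape operator $T_\theta Y = e^a Y R_\alpha + \mathbf{1}_n b'$ and the perturbation model (1.6) can be sketched as follows. This is only an illustration: the function names, the representation of a shape as a list of `(x, y)` rows, and the choice of i.i.d. Gaussian coordinates for $\zeta_j$ are assumptions, not part of the original text.

```python
import math
import random


def similarity_transform(shape, a, alpha, b):
    """Apply T_theta Y = e^a * Y R_alpha + 1_n b' to a list of (x, y) rows."""
    c, s = math.cos(alpha), math.sin(alpha)
    scale = math.exp(a)
    out = []
    for x, y in shape:
        # Row vector (x, y) multiplied on the right by R_alpha = [[c, -s], [s, c]].
        xr = x * c + y * s
        yr = -x * s + y * c
        out.append((scale * xr + b[0], scale * yr + b[1]))
    return out


def perturbed_shape(mu, theta, noise_sd, rng):
    """Draw Y_j = T_theta(mu + zeta_j), zeta_j with i.i.d. N(0, noise_sd^2) entries.

    The Gaussian, isotropic choice for zeta_j is an assumption for this sketch.
    """
    a, alpha, b = theta
    noisy = [(x + rng.gauss(0.0, noise_sd), y + rng.gauss(0.0, noise_sd))
             for x, y in mu]
    return similarity_transform(noisy, a, alpha, b)
```

For instance, `perturbed_shape(mu, (0.1, 0.3, (1.0, 0.0)), 0.05, random.Random(0))` would return one randomly perturbed, scaled, rotated and translated copy of `mu`.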
In [KM97], the authors studied the relationship between isotropy of the additive noise $\zeta_j$ and the convergence of Procrustean procedures to the perturbation mean $\mu$ as $J \to +\infty$. It is shown in [KM97] that, for isotropic errors, Procrustean means are consistent, but that, for non-isotropic errors, they may not converge to $\mu$. For a recent discussion on the issues of consistency of sample Procrustes means in perturbation models and extension to non-metrical Fréchet means, we refer to [Huc10] and [Huc11]. In this paper, we carefully analyze the role of the dimension $n$ and the number of samples $J$ on the consistency of Procrustean means in model (1.5). To obtain consistent procedures, we show that it is not required to impose very restrictive conditions on the error terms $Z_j$ such as isotropy for the $\zeta_j$ in (1.6) for shape analysis. Here, the key quantity is the dimension $n$ of the data (number of design points), which plays the central role in guaranteeing the convergence of our estimators. This point is another major difference with the approach of statistical shape analysis [Goo91], which does not take into account the dimensionality of the shape space to analyze the consistency of Procrustean estimators.
Note that a subclass of the deformable model (1.5) is the so-called shape invariant model (SIM) $$Y_j^\ell = T_{\theta_j^*} f(t_\ell) + \sigma \varepsilon_j^\ell, \quad j = 1, \ldots, J, \ \text{and} \ \ell = 1, \ldots, n, \qquad (1.7)$$ i.e. without incorporating in (1.5) the additive terms $Z_j$. The goal of this paper is twofold. First, we propose a general methodology for estimating $f$ and the $\theta_j^*$'s based on observations coming from model (1.5). For this purpose, we show that an appropriate tool is the notion of sample Fréchet mean of a data set [Fré48,Zie77,BP03] that has been widely studied in shape analysis [Goo91,KM97,Le98,LK00,Huc10] and more recently in biomedical imaging [JDJG04,Pen06]. Secondly, we study the consistency of the resulting estimators in various asymptotic settings: either when $n$ and $J$ both tend to infinity, or when $n$ is fixed and $J \to +\infty$, or when $J$ is fixed and $n \to +\infty$.
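To make the data-generating mechanism concrete, here is a minimal simulation sketch of model (1.5) in the special case of shifted curves, $T_\theta f(t) = f(t - \theta)$, on an equi-spaced design. Several ingredients are assumptions chosen only for illustration and are not taken from the paper: the uniform law of the shifts on $[-\rho, \rho]$, and the process $Z_j(t) = a_j \cos(2\pi t) + b_j \sin(2\pi t)$ with i.i.d. centered Gaussian $a_j, b_j$ (a simple stationary periodic Gaussian process). Setting `varsigma = 0` gives the SIM (1.7).

```python
import math
import random


def simulate_shifted_curves(f, J, n, rho, sigma, varsigma, seed=0):
    """Simulate Y_j^l = f(t_l - theta_j) + Z_j(t_l - theta_j) + sigma * eps_j^l.

    Returns (design, shifts, data) where data[j][l] is the l-th observation
    of the j-th curve. The laws of theta_j and Z_j are illustrative choices.
    """
    rng = random.Random(seed)
    design = [l / n for l in range(n)]  # equi-spaced design on [0, 1)
    shifts, data = [], []
    for _ in range(J):
        theta = rng.uniform(-rho, rho)  # random shift with compact support
        a = rng.gauss(0.0, varsigma)    # random amplitudes defining Z_j
        b = rng.gauss(0.0, varsigma)
        row = []
        for t in design:
            u = t - theta               # the shift acts on f and on Z_j alike
            z = a * math.cos(2 * math.pi * u) + b * math.sin(2 * math.pi * u)
            row.append(f(u) + z + sigma * rng.gauss(0.0, 1.0))
        shifts.append(theta)
        data.append(row)
    return design, shifts, data
```

The mean pattern `f` is expected to be a 1-periodic Python function, for example `lambda t: math.cos(2 * math.pi * t)`.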

Organization of the paper
Section 2 contains a description of our estimating procedure and a review of previous work in mean pattern estimation. In Section 3, we derive a lower bound for the quadratic risk of estimators of the deformation parameters. In Section 4, we discuss some identifiability issues in model (1.5). In Section 5 we derive consistency results for the Fréchet mean in the case (1.2) of randomly shifted curves. In Section 6 and Section 7, we give general conditions to extend these results to the more general deformable model (1.5). Section 8 contains some numerical experiments. A short conclusion with some perspectives is given in Section 9. All proofs are postponed to a technical Appendix.
2 The estimating procedure

A dissimilarity measure based on deformation operators
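To define a notion of sample Fréchet mean for curves or images, let us suppose that the family of deformation operators $(T_\theta)_{\theta \in P}$ is invertible in the sense that there exists a family of operators $(\tilde{T}_\theta)_{\theta \in P}$ such that, for any $(\theta, f) \in P \times L^2(\Omega)$, $$\tilde{T}_\theta T_\theta f = f.$$ Then, for two functions $f, h \in L^2(\Omega)$, introduce the following dissimilarity measure: $$d_T^2(h, f) = \inf_{\theta \in P} \int_\Omega \left(\tilde{T}_\theta h(t) - f(t)\right)^2 dt.$$ If $d_T^2(h, f) = 0$ then there exists $\theta \in P$ such that $f = \tilde{T}_\theta h$, meaning that the functions $f$ and $h$ are equal up to a geometric deformation. Note that $d_T$ is not necessarily a distance on $L^2(\Omega)$, but it can be used to define a notion of sample Fréchet mean of data from model (1.5). For this purpose, let $F$ denote a subspace of $L^2(\Omega)$ and suppose that the $\hat{f}_j$ are smooth functions in $F \subset L^2(\Omega)$ obtained from the data $Y_j^\ell$, $\ell = 1, \ldots, n$, for $j = 1, \ldots, J$; see Section 5.2 and Section 6.2 for precise definitions. Following the definition of a Fréchet mean in a general metric space [Fré48], define an estimator of the mean pattern $f$ as $$\hat{f} = \arg\min_{f \in F} \frac{1}{J} \sum_{j=1}^{J} d_T^2(\hat{f}_j, f). \qquad (2.1)$$ Note that $\hat{f}$ falls into the category of non-metrical sample Fréchet means whose definitions and asymptotic properties are discussed in [Huc10] for random variables belonging to Riemannian manifolds. However, unlike the usual approach in shape analysis, the Fréchet mean (2.1) is based on smoothed data. In what follows, we show that smoothing is a key preliminary step to obtain the convergence of $\hat{f}$ to the mean pattern $f$ in the deformable model (1.5). It can be easily shown that the computation of $\hat{f}$ can be done in two steps: first minimize the following criterion $$(\hat{\theta}_1, \ldots, \hat{\theta}_J) = \arg\min_{(\theta_1, \ldots, \theta_J) \in \Theta^J} M(\theta_1, \ldots, \theta_J), \qquad (2.2)$$ where $$M(\theta_1, \ldots, \theta_J) = \frac{1}{J} \sum_{j=1}^{J} \int_\Omega \Big(\tilde{T}_{\theta_j} \hat{f}_j(t) - \frac{1}{J} \sum_{j'=1}^{J} \tilde{T}_{\theta_{j'}} \hat{f}_{j'}(t)\Big)^2 dt, \qquad (2.3)$$ and then take $\hat{f} = \frac{1}{J} \sum_{j=1}^{J} \tilde{T}_{\hat{\theta}_j} \hat{f}_j$.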
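For shifted curves, a dissimilarity of the kind described in this section can be approximated numerically. The sketch below assumes 1-periodic functions, inverse operators of the form $\tilde{T}_\theta h(t) = h(t + \theta)$, a Riemann sum in place of the integral, and a plain grid search over $\theta \in [-\rho, \rho]$ in place of the exact infimum. The function name and its default arguments are hypothetical.

```python
import math


def shift_dissimilarity(h, f, n=256, grid=200, rho=0.5):
    """Approximate inf over theta of the squared L2 distance between
    h(. + theta) and f, for 1-periodic functions h and f.

    Returns (best_value, best_theta) found on a regular grid of shifts.
    """
    design = [l / n for l in range(n)]
    best_value, best_theta = float("inf"), 0.0
    for i in range(grid + 1):
        theta = -rho + 2.0 * rho * i / grid
        # Riemann-sum approximation of the integral over [0, 1].
        value = sum((h(t + theta) - f(t)) ** 2 for t in design) / n
        if value < best_value:
            best_value, best_theta = value, theta
    return best_value, best_theta
```

When `h` is a shifted copy of `f`, the returned value is numerically zero and the returned shift undoes the deformation. When the two functions differ in shape, the value stays positive whatever the shift.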

Previous work in mean pattern estimation and geometric variability analysis
Estimating the mean pattern of a set of curves that differ by a time transformation is usually referred to as the curve registration problem, see e.g. [GK92,Big06,RL01,WG97,LM04]. However, these papers do not study the consistency of estimators of the mean pattern $f$ as the number of curves $J$ and design points $n$ tend to infinity. For the SIM (1.7), a semiparametric point of view has been proposed in [GLM07] and [Vim10] to estimate non-random deformation parameters (such as shifts and amplitudes) as the number $n$ of observations per curve grows, but with a fixed number $J$ of curves. A generalisation of this semiparametric approach for two-dimensional images is proposed in [BGV09]. The case of image deformations by a Lie group action is also investigated in [BLV10] from a semiparametric point of view using a SIM.
In the simplest case of randomly shifted curves in a SIM, [BG10] have studied minimax estimation of the mean pattern $f$ by letting only the number $J$ of curves go to infinity. Self-modelling regression (SEMOR) methods proposed by [KG88] are semiparametric models where each observed curve is a parametric transformation of the same regression function. However, the SEMOR approach does not incorporate random fluctuations in intensity of the individuals around a mean pattern $f$ through an unknown process $Z_j$ as in model (1.5). The authors in [KG88] studied the consistency of the SEMOR approach using a Procrustean algorithm. Recently, there has also been a growing interest in the development of statistical deformable models for image analysis and the construction of consistent estimators of a mean pattern, see [GM01, BGV09, BGL09, AAT07, AKT09].

Lower bounds for the estimation of the deformation parameters
In this section, we derive non-asymptotic lower bounds for the quadratic risk of an arbitrary estimator of the deformation parameters under the following smoothness assumption of the mapping (θ, t) −→ T θ f (t).
The lower bound given in inequality (3.1) does not decrease as $J$ increases. Thus, if the number $n$ of design points is fixed, increasing the number $J$ of curves (or images) does not improve the quality of the estimation of the deformation parameters for any estimator $\hat{\theta}$. Nevertheless, this lower bound tends to 0 as the dimension $n \to +\infty$.

General model
The main difference between the general model (1.5) and the SIM (1.7) is the extra error terms $T_{\theta_j^*} Z_j$, $j = 1, \ldots, J$. In what follows, $E_\theta[\,\cdot\,]$ denotes expectation conditionally on $\theta \in \Theta^J$. Since the random processes $Z_j$ are observed through the action of the random deformation operators $T_{\theta_j^*}$, it is necessary to specify how the $T_{\theta_j^*}$'s modify the law of the process $Z_j$. Assumption 3.2. There exists a positive semi-definite symmetric $n \times n$ matrix $\Sigma_n(\Theta)$ such that the [...] This assumption means that the law of the random process $Z$ is somewhat invariant under the deformation operators $T_\theta$. Such a hypothesis is similar to the condition given in [KM97] to ensure consistency of Fréchet mean estimators in Kendall's shape space using a model similar to (1.5) with $\sigma = 0$. After a normalization step, the deformations considered in [KM97] are rotations of the plane, and the authors in [KM97] study the case where the law of the error term $Z$ is isotropic, that is to say, invariant under the action of rotations.
Again, the lower bound (3.2) does not depend on $J$. Thus, increasing the number $J$ of observations does not decrease the quadratic risk of any estimator of the deformation parameters. Moreover, the lower bound (3.2) tends to zero as $n \to +\infty$ only if $\lim_{n \to +\infty} n^{-1} s_n^2(\Theta) = 0$.

Identifiability conditions
4.1 The shifted curves model
Without any further assumptions, the randomly shifted curves model (3.3) is not identifiable. Indeed, if $\theta_0 \in \Theta$ satisfies $\theta_j^* + \theta_0 \in \Theta$, $j = 1, \ldots, J$, then replacing $f(\cdot)$ by $f(\cdot + \theta_0)$ and $\theta_j^*$ by $\theta_j^* + \theta_0$ does not change the formulation of model (3.3), since $f(t + \theta_0 - \theta_j^* - \theta_0) = f(t - \theta_j^*)$. Choosing identifiability conditions amounts to imposing constraints on the minimization of the criterion $$D(\theta) = \frac{1}{J} \sum_{j=1}^{J} \int_0^1 \Big(f(t - \theta_j^* + \theta_j) - \frac{1}{J} \sum_{j'=1}^{J} f(t - \theta_{j'}^* + \theta_{j'})\Big)^2 dt$$ for $\theta = (\theta_1, \ldots, \theta_J) \in \Theta^J$, which can be interpreted as a version without noise of the criterion (2.2) using the ideal smoothers $\hat{f}_j(\cdot) = f(\cdot - \theta_j^*)$. Obviously, the criterion $D(\theta)$ has a minimum at $\theta^* = (\theta_1^*, \ldots, \theta_J^*)$ such that $D(\theta^*) = 0$, but this minimizer of $D$ on $\Theta^J$ is clearly not unique. If the true shifts are supposed to have zero mean (i.e. $\int_\Theta \theta g(\theta)\,d\theta = 0$), it is natural to introduce the constrained set $$\Theta_0 = \left\{(\theta_1, \ldots, \theta_J) \in \Theta^J, \ \theta_1 + \cdots + \theta_J = 0\right\}. \qquad (4.2)$$ Under such assumptions, we will compute estimators of the random shifts by minimizing the criterion (2.2) over the constrained set $\Theta_0$ and not directly on $\Theta^J$. Consistency of such constrained estimators will then be studied under the following identifiability conditions: Assumption 4.1. The mean pattern $f$ is such that [...] Assumption 4.2. The support of the density $g$ is included in $[-\rho', \rho']$ for some $0 < \rho' \le \frac{\rho}{2} < 1/4$ and is such that $\int_\Theta \theta g(\theta)\,d\theta = 0$.
Under such assumptions, $D(\theta)$ can be bounded from below by the quadratic function $\frac{1}{J}\|\theta - \theta^*_{\Theta_0}\|^2$, which will be an important property to derive consistent estimators.
Assumption 4.2 and the condition that ρ < 1/16 in Proposition 4.1 mean that the support of the density g of the shifts is sufficiently small, and that the shifted curves f j (t) = f (t − θ * j ) are in some sense concentrated around the mean pattern f . Such an assumption of concentration of the data around the same mean pattern has been used in various papers to prove the uniqueness and the consistency of Fréchet means for random variables lying in a Riemannian manifold, see [Kar77,Le98,BP03,Afs11,Ken90].
Then, $\boldsymbol{\Theta}$ is the set over which we will carry out the minimization of the criterion $M(\theta)$ (2.3). In the case of shifted curves and under Assumptions 4.1 and 4.2, the only set onto which the criterion $D$ vanishes is the [...] An easy way to choose the set $\boldsymbol{\Theta}$ is to take a linear subset of $\Theta^J$, see Figure 1 for an illustration. By considering the subset [...]. More generally, if the deformation parameters $\theta_j$, $j = 1, \ldots, J$, are supposed to be random variables with zero mean, then optimizing $D(\theta)$ on $\Theta_0$ is a natural choice. Another identifiability condition for shifted curves is proposed in [GLM07] and [Vim10] by taking [...] where $e_1 = (1, 0, \ldots, 0) \in \mathbb{R}^J$. In this case, $\theta^*_{\Theta_1} = (0, \theta_2^* - \theta_1^*, \ldots, \theta_J^* - \theta_1^*)$. Choosing to minimize $D(\theta)$ on $\Theta_1$ amounts to choosing the first curve as a reference onto which all the other curves are aligned, meaning that the first shift $\theta_1^*$ is not random, see Figure 1. Following the classical guidelines in M-estimation (see e.g. [vdV98]), a necessary condition to ensure the convergence of M-estimators such as (2.2) is that the local minima of $D(\theta)$ over $\boldsymbol{\Theta}$ are well separated from the global minimum of $D(\theta)$ at $\theta = \theta^*_{\boldsymbol{\Theta}}$ (satisfying $D(\theta^*_{\boldsymbol{\Theta}}) = 0$). The following assumption can be interpreted in this sense.
for a constant C(Θ, F) > 0 independent of J.
In the shifted curves model, Assumption 4.4 is verified if Assumptions 4.1 and 4.2 hold (see Proposition 4.1).
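The identifiability issue for shifted curves can be checked numerically. The following sketch assumes that the noiseless criterion compares the realigned curves $f(t - \theta_j^* + \theta_j)$ with their average, which is one natural reading of a noise-free alignment criterion and is an assumption of this example. The helper names (`criterion`, `center_shifts`, `reference_first`) are hypothetical. It shows that the criterion vanishes along the whole line $\theta^* + c\,(1, \ldots, 1)$, and computes the two representatives discussed above: the zero-sum one and the one using the first curve as reference.

```python
import math


def criterion(theta, true_shifts, f, n=128):
    """Discretized noiseless alignment criterion for shifted curves."""
    J = len(theta)
    design = [l / n for l in range(n)]
    # Curve j is f(. - true_shift_j); realigning by theta_j gives f(t - ts + th).
    aligned = [[f(t - ts + th) for t in design]
               for ts, th in zip(true_shifts, theta)]
    mean = [sum(col) / J for col in zip(*aligned)]
    total = sum((v - m) ** 2 for row in aligned for v, m in zip(row, mean))
    return total / (J * n)


def center_shifts(theta):
    """Representative with zero sum (constraint of the set Theta_0)."""
    mean = sum(theta) / len(theta)
    return [t - mean for t in theta]


def reference_first(theta):
    """Representative with first coordinate zero (first curve as reference)."""
    return [t - theta[0] for t in theta]
```

Both `center_shifts(true_shifts)` and `reference_first(true_shifts)` give a zero criterion value, so a constraint is needed to single one of them out.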

Consistent estimation in the shifted curves model
In this section, we give conditions to ensure consistency of the estimators defined in Section 2 in the shifted curves model (3.3) with an equi-spaced design.

The random perturbations Z j
Following the assumptions of Theorem 3.3, $Z$ will be supposed to be a stationary process with covariance function $R : [0, 1] \to \mathbb{R}$. The law of $Z$ is thus invariant under the action of a shift. Conditionally on $\theta_j^* \in \Theta$, the covariance of the vector [...] is a Toeplitz matrix equal to [...] Let $\gamma_{\max}(\Sigma_n)$ be the largest eigenvalue of the matrix $\Sigma_n$. It follows from standard results on Toeplitz matrices (see e.g. [HJ90]) that $\gamma_{\max}(\Sigma_n) \le \lim$ [...]

Consistent estimation of the random shifts
Using low-pass filtering, and following the discussion in Section 4.1 on identifiability issues, the estimators of the random shifts $\theta_1^*, \ldots, \theta_J^*$ are given by [...] and $\Theta_0$ is the constrained set defined in (4.2).
Thus, Theorem 5.1 is consistent with the conclusions of Theorem 3.3, that is, if $n$ is fixed, then it is not possible to estimate $\theta^*$ by letting only $J$ grow to infinity. Hence, under the assumptions of Theorem 5.1, one can only prove the convergence in probability of $\hat{\theta}^\lambda$ to the true shifts $\theta^*$ by taking the double asymptotic $n \to +\infty$ and $J \to +\infty$, provided the smoothing parameter $\lambda = \lambda_n$ is well chosen.
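The low-pass filtering step can be sketched as follows for one curve observed on an equi-spaced design $t_\ell = \ell/n$. The formulas used here are the standard empirical Fourier coefficients $\hat{c}_k = \frac{1}{n}\sum_\ell Y^\ell e^{-i 2\pi k t_\ell}$ truncated at $|k| \le \lambda$; that this matches the exact smoother of the paper is an assumption, and the function names are hypothetical. The sketch also shows that evaluating the smoothed curve at $t + \theta$ costs nothing extra in the Fourier domain, which is what makes a search over shifts cheap.

```python
import cmath
import math


def fourier_coefficients(y, lam):
    """Empirical Fourier coefficients c_k, |k| <= lam, from n equi-spaced values.

    lam should be smaller than n / 2 to avoid aliasing.
    """
    n = len(y)
    return {
        k: sum(y[l] * cmath.exp(-2j * math.pi * k * l / n) for l in range(n)) / n
        for k in range(-lam, lam + 1)
    }


def smooth(coeffs, t, shift=0.0):
    """Evaluate the low-pass reconstruction at the point t + shift."""
    value = sum(c * cmath.exp(2j * math.pi * k * (t + shift))
                for k, c in coeffs.items())
    return value.real  # imaginary part vanishes (up to rounding) for real data
```

With `coeffs = fourier_coefficients(y, lam=7)`, the call `smooth(coeffs, t, shift=theta)` evaluates the smoothed curve after realignment by a candidate shift `theta`.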

Consistent estimation of the mean pattern
In the case of randomly shifted curves, the Fréchet mean estimator (2.1) of $f$ is $\hat{f}^\lambda(t) = \frac{1}{J} \sum_{j=1}^{J} \hat{f}_j^\lambda(t + \hat{\theta}_j^\lambda)$.
Theorem 5.2. Under the assumptions of Theorem 5.1, for any $\lambda \ge 1$ and $x > 0$ [...] where $A_1(x, J, n, \lambda, \sigma^2, \gamma)$ and $A_2(x, J)$ are defined in Theorem 5.1, $C_2(\Theta, F, f)$ and $C_3(\Theta, f)$ are positive constants depending only on $\Theta$, $F$, $f$, and $\|f^\lambda - f\|^2$ [...] Similar comments to those made on the consistency of the estimators of the shifts can be made. A double asymptotic in $n$ and $J$ is needed to show that the Fréchet mean $\hat{f}^\lambda$ converges in probability to the true mean pattern $f$. Moreover, if $\lambda_n$ is too large (e.g. such that $\lim_{n \to +\infty} \frac{\lambda_n}{n} \ne 0$, which corresponds to undersmoothing), then Theorem 5.2 cannot be used to prove that $\hat{f}^\lambda$ converges to $f$ in probability. This illustrates the fact that, to achieve consistency, a sufficient amount of pre-smoothing is necessary before computing the Fréchet mean (2.1).
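The averaging step that defines the Fréchet mean for shifted curves, $\frac{1}{J}\sum_j \hat{f}_j^\lambda(t + \hat{\theta}_j^\lambda)$, is easy to express once smoothed curves and estimated shifts are available. The sketch below represents curves as Python callables, which is an assumption made for brevity, and the function name is hypothetical.

```python
import math


def frechet_mean(smoothed_curves, shifts):
    """Return t -> (1/J) * sum_j f_j(t + theta_j).

    smoothed_curves: list of callables (the smoothed curves).
    shifts: list of estimated shifts, one per curve.
    """
    J = len(smoothed_curves)

    def mean(t):
        return sum(fj(t + th) for fj, th in zip(smoothed_curves, shifts)) / J

    return mean
```

In an idealized noise-free situation where curve $j$ is exactly $f(\cdot - \theta_j^*)$, plugging in the true shifts returns $f$, while plugging in the centered shifts $\theta_j^* - \bar{\theta}^*$ returns $f(\cdot - \bar{\theta}^*)$. The estimator can therefore only recover $f$ up to the average shift, which is small for large $J$ when the shifts have zero mean.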

A lower bound for the Fréchet mean
From the results of Theorem 3.3, it is expected that the Fréchet mean $\hat{f}^\lambda$ does not converge to $f$ in the setting $n$ fixed and $J \to +\infty$. To support this argument, consider the following ideal estimator [...] This corresponds to the case of an ideal smoothing step from the data (3.3) that would yield $\hat{f}_j = f_j$ for all $j = 1, \ldots, J$. Obviously, this quantity is not an estimator since it depends on the unobserved quantities $f$ and $\theta_j^*$, but we can consider it as a benchmark to analyse the convergence of the Fréchet mean $\hat{f}^\lambda$ to $f$. Theorem 5.3. Suppose that the assumptions of Theorem 3.3 are satisfied with $\rho < \frac{3}{4\pi}$. Then, for any [...] where the constant $C(f, \rho) > 0$ depends on $f$ and $\rho$.
Hence, in the setting $n$ fixed and $J \to +\infty$, even the ideal estimator does not converge to $f$ in expected quadratic risk. This illustrates the central role played by the dimension $n$ of the data to obtain consistent estimators.
6 Notations and main assumptions in the general case

Smoothness of the mean pattern and the deformation operators
In this part, the notation $(L_\theta)_{\theta \in P}$ is used to denote either $(T_\theta)_{\theta \in P}$ or their inverse $(\tilde{T}_\theta)_{\theta \in P}$.
Assumption 6.1. For all $\theta \in P$, $L_\theta : L^2(\Omega) \to L^2(\Omega)$ is a linear operator satisfying $L_\theta f \in F$ for all $f \in F$. There exists a constant $C(\Theta) > 0$ such that for any $f \in L^2(\Omega)$ and $\theta \in \Theta$, $\|L_\theta f\|^2_{L^2} \le C(\Theta) \|f\|^2_{L^2}$, and a constant $C(F, \Theta) > 0$ such that for any $f \in F$ and $\theta_1, \theta_2 \in \Theta$, [...] Assumption 6.1 can be interpreted as a Lipschitz condition on the mapping $(f, \theta) \mapsto L_\theta f$. The first inequality, that is $\|L_\theta f\|^2_{L^2} \le C(\Theta) \|f\|^2_{L^2}$, means that the action of the operator $L_\theta$ does not change the norm of $f$ too much when $\theta$ varies in $\Theta$. Such an assumption on $T_\theta$ and its inverse $\tilde{T}_\theta$ forces the optimization problem (2.2) to have non-trivial solutions by preventing the functional $M(\theta)$ in (2.3) from being arbitrarily small. It can be easily checked that Assumption 6.1 is satisfied in the case (1.2) of shifted curves with $F = H_s(A)$ and $s \ge 1$.

The preliminary smoothing step
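For $j = 1, \ldots, J$, the $\hat{f}_j$'s are supposed to belong to the class of linear estimators in the sense of the following definition: Definition 6.1. Let $\Lambda$ denote either $\mathbb{N}$ or $\mathbb{R}_+$ (set of smoothing parameters). To every $\lambda \in \Lambda$ is associated a non-random vector-valued function $S_\lambda : \Omega \to \mathbb{R}^n$ such that for all $j = 1, \ldots, J$ and all $t \in \Omega$, $$\hat{f}_j^\lambda(t) = \langle S_\lambda(t), Y_j \rangle,$$ where $\langle \cdot, \cdot \rangle$ denotes the standard inner product in $\mathbb{R}^n$ and $Y_j = \left(Y_j^\ell\right)_{\ell=1}^n \in \mathbb{R}^n$.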
Assumption 6.2. For all $\lambda \in \Lambda$ and all $\ell = 1, \ldots, n$, the function $t \mapsto S_\lambda^\ell(t)$ belongs to $L^2(\Omega)$, where $S_\lambda^\ell(t)$ denotes the $\ell$-th component of the vector $S_\lambda(t)$. Moreover, for all $\lambda \in \Lambda$, $f \in F$ and $\theta \in \Theta$, the function [...] In the case (1.2) of randomly shifted curves with an equi-spaced design, Assumption 6.2 holds [...]. Let us now specify how the bias/variance behavior of the linear estimators $\hat{f}_j^\lambda$ depends on the smoothing parameter $\lambda$. For this, consider for some function $f \in F$ the following regression model [...] where the $\varepsilon_\ell$'s are i.i.d. normal variables with zero mean and variance 1. The performances of a linear estimator $\hat{f}^\lambda$ [...] where $B_\lambda$ and $V_\lambda$ denote the usual bias and variance of $\hat{f}^\lambda$ given by [...]. Define also $V(\lambda) = \int_\Omega V_\lambda(t)\,dt$, and let us make the following assumption on the asymptotic behavior of the bias/variance of $\hat{f}^\lambda$: Assumption 6.3. There exist a constant $\kappa(F) > 0$ and a real-valued function $\lambda \mapsto B(\lambda)$ such that for all $f \in F$, [...]. Moreover there exists a sequence of smoothing parameters $(\lambda_n)_{n \in \mathbb{N}} \in \Lambda^{\mathbb{N}}$ with $\lim_{n \to +\infty} \lambda_n = +\infty$ such that $\lim_{n \to +\infty} B(\lambda_n) = 0$ and $\lim_{n \to +\infty} V(\lambda_n) = 0$.
Let us illustrate Assumption 6.3 in the case of shifted curves with an equi-spaced design, and a smoothing step obtained by low-pass Fourier filtering. As in Section 5, take $F = H_s(A)$ defined in (5.4). In this setting, $V(\lambda) = \frac{2\lambda + 1}{n}$. It can also be checked that $\|B_\lambda(f, \cdot)\|^2_{L^2} \le C(A) B(\lambda)$ for some positive constant $C(A)$ depending only on $A$, and $B(\lambda) = \frac{2\lambda + 1}{n} + \lambda^{-2s}$. Thus, Assumption 6.3 holds with $\lambda_n = n^{\frac{1}{2s+1}}$.
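The rates above can be checked with a few lines of arithmetic. The sketch below simply evaluates $V(\lambda) = (2\lambda+1)/n$ and $B(\lambda) = (2\lambda+1)/n + \lambda^{-2s}$ along the sequence $\lambda_n = n^{1/(2s+1)}$; the function names are hypothetical and the constants $\kappa(F)$ and $C(A)$ are ignored.

```python
def cutoff(n, s):
    """Smoothing parameter lambda_n = n^(1 / (2s + 1))."""
    return n ** (1.0 / (2.0 * s + 1.0))


def variance_term(lam, n):
    """Integrated variance V(lambda) = (2 * lambda + 1) / n."""
    return (2.0 * lam + 1.0) / n


def bias_bound(lam, n, s):
    """Bias bound B(lambda) = (2 * lambda + 1) / n + lambda^(-2s)."""
    return (2.0 * lam + 1.0) / n + lam ** (-2.0 * s)


def rates(sample_sizes, s):
    """Tabulate (n, lambda_n, V(lambda_n), B(lambda_n)) for each n."""
    table = []
    for n in sample_sizes:
        lam = cutoff(n, s)
        table.append((n, lam, variance_term(lam, n), bias_bound(lam, n, s)))
    return table
```

Calling `rates([10**2, 10**4, 10**6], s=1.0)` shows both quantities decreasing toward zero as $n$ grows, while $\lambda_n$ itself diverges. By contrast, a cut-off proportional to $n$ would keep $V(\lambda_n)$ bounded away from zero.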
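6.3 Random perturbation of the mean pattern $f$ by the $Z_j$'s
Assumption 6.4. For any $n \ge 1$, there exists a real $\gamma_n(\Theta) > 0$ such that for any $\theta \in \Theta$, $$\gamma_{\max}\left(E\left[\mathbf{T}_\theta Z \left(\mathbf{T}_\theta Z\right)'\right]\right) \le \gamma_n(\Theta),$$ where $\mathbf{T}_\theta Z = \left[T_\theta Z(t_\ell)\right]_{\ell=1}^n \in \mathbb{R}^n$, and $\gamma_{\max}(A)$ denotes the largest eigenvalue of a symmetric matrix $A$. Moreover, $$\lim_{n \to \infty} \gamma_n(\Theta) V(\lambda_n) = 0, \qquad (6.1)$$ where $V(\lambda_n)$ is the variance defined in Assumption 6.3.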
Intuitively, the condition (6.1) means that the variance of the linear smoother $S_\lambda(\cdot)$ has to be asymptotically smaller than the maximal correlations (measured by $\gamma_n(\Theta)$) between $T_\theta Z(t_\ell)$ and $T_\theta Z(t_{\ell'})$ for $\ell, \ell' = 1, \ldots, n$ and all $\theta \in \Theta$. In the case of randomly shifted curves with an equi-spaced design, a simple condition for which Assumption 6.4 holds is the case where $Z$ is a stationary process (see the arguments in Section 5.1).
7 Consistency in the general case
7.1 Consistent estimation of the deformation parameters
Consider for $\lambda \in \Lambda$ the following estimator of the deformation parameters [...] and $\boldsymbol{\Theta}$ is the constrained set introduced in Assumption 4.3. The estimator $\hat{\theta}^\lambda$ thus depends on the choice of $\boldsymbol{\Theta}$, and it will be shown that $\hat{\theta}^\lambda$ is a consistent estimator of the vector $\theta^*_{\boldsymbol{\Theta}} \in \mathbb{R}^{pJ}$ defined in Assumption 4.3. Note that depending on the problem at hand and the choice of the constrained set $\boldsymbol{\Theta}$, it can be shown that $\theta^*_{\boldsymbol{\Theta}}$ is close to the true deformation parameters $\theta^*$. For example, in the case of shifted curves, if $\boldsymbol{\Theta} = \Theta_0$ defined in (4.2) and if the density $g$ of the shifts has zero mean, then $\theta_{\Theta_0} = (\theta_1^* - \bar{\theta}^*, \ldots, \theta_J^* - \bar{\theta}^*)$ with $\bar{\theta}^* = \frac{1}{J} \sum_{j=1}^{J} \theta_j^*$ can be shown to be close to $\theta^*$ (see Lemma C.1 in the Appendix). This makes it possible to show the consistency of $\hat{\theta}^\lambda$ to $\theta^*$ as formulated in Theorem 5.1. Therefore, the next result only bounds the distance between $\hat{\theta}^\lambda$ and $\theta^*_{\boldsymbol{\Theta}}$. Theorem 7.1. Consider the model (1.5) and suppose that Assumptions 1.1, 4.3, 4.4 and 6.1 to 6.4 hold with $n \ge 1$ and $J \ge 2$. Then, for any $\lambda \in \Lambda$ and $x > 0$ [...] Using Assumptions 6.3 and 6.4, it follows that $\lim_{n \to +\infty} \gamma_n(\Theta)\left(\sqrt{\upsilon(x, J, \lambda_n)} + \upsilon(x, J, \lambda_n)\right) = 0$ for any $x > 0$ and $J \ge 2$. If $J$ remains fixed, Theorem 7.1 thus implies that $\hat{\theta}^\lambda$ converges in probability to $\theta^*_{\boldsymbol{\Theta}}$ as $n \to +\infty$. On the contrary, let us fix $n$, and consider an asymptotic setting where only $J \to +\infty$. For any $x > 0$ and $\lambda \in \Lambda$, $\lim_{J \to +\infty} \upsilon(x, J, \lambda) = V(\lambda)$. Therefore, Theorem 7.1 cannot be used to prove that $\hat{\theta}^\lambda$ converges to $\theta^*_{\boldsymbol{\Theta}}$ as $J \to +\infty$. This confirms that $\hat{\theta}^\lambda$ is not a consistent estimator of $\theta^*_{\boldsymbol{\Theta}}$ (and thus of $\theta^*$) as $n$ remains fixed and $J$ tends to infinity. [...] then $\bar{\theta}^* \approx 0$ for $J$ sufficiently large, and thus $f^*_{\boldsymbol{\Theta}}(t)$ is close to $f$, which makes it possible to show the consistency of $\hat{f}^\lambda$ to $f$ as formulated in Theorem 5.2.
The consistency of $\hat{f}^\lambda$ to $f^*_{\boldsymbol{\Theta}}$ is thus guaranteed when $n$ goes to infinity provided the level of smoothing $\lambda = \lambda_n$ is chosen so that $\lim_{n \to +\infty} V(\lambda_n) = \lim_{n \to +\infty} B(\lambda_n) = 0$. Again, if $n$ remains fixed and only $J$ goes to infinity, then Theorem 7.2 cannot be used to prove the convergence of $\hat{f}^\lambda$ to $f^*_{\boldsymbol{\Theta}}$.
We use Fourier low-pass filtering with spectral cut-off $\lambda = 7$, which is a reasonable value to reconstruct $f$ and represents a good trade-off between bias and variance. We present some results of simulations under various assumptions on the process $Z$ and the level $\sigma$ of additive noise in the measurements.
Shape invariant model (SIM). The first numerical applications illustrate the role of $n$ and $J$ in the SIM. Figure 2(b) gives a sample of the data used with $\sigma = 2$. The factors in the simulations are the number $J$ of curves and the number of design points $n$. For each combination of these two factors, we simulate $M = 20$ repetitions of model (3.3). For each repetition we computed $\frac{1}{J}\|\hat{\theta}^\lambda - \theta^*\|^2$ and $\|\hat{f}^\lambda - f\|^2_{L^2}$. Boxplots of these quantities are displayed in Figures 3(a) and 3(b) respectively, for $J = 20, 40, \ldots, 100$ and $n = 512$ (in gray) and $n = 1024$ (in black). As the smoothing parameter is fixed to $\lambda = 7$, increasing $n$ simply reduces the variance of the linear smoothers $\hat{f}_j^\lambda$. Recall that the lower bound given in Theorem 3.3 shows that $\frac{1}{J} E[\|\theta^* - \hat{\theta}^\lambda\|^2]$ does not decrease as $J$ increases but should be smaller when the number of points $n$ increases. This is exactly what we observe in Figure 3. Similarly, the quantity $\|\hat{f}^\lambda - f\|^2_{L^2}$ is clearly smaller with $n = 1024$ than with $n = 512$.
Complete model. We now add the terms $Z_j$ in (3.3) to model linear variations in amplitude of the curves around the template $f$. First, we generate a stationary periodic Gaussian process. To do this, the covariance matrix must be a particular Toeplitz matrix. As suggested in [Gre93], one possibility is to choose [...] where $\varphi$ is a strictly positive parameter (we took $\varphi = 4$) and $\varsigma$ a variance parameter. The level of additive noise is $\sigma = 8$, and we took $\varsigma = 4$. As an illustration, in Figure 2 [...] does not decrease as $J$ increases (see Figure 4(a)) and $\|\hat{f}^\lambda - f_{\Theta_0}\|^2$ has a smaller mean and variance as $n$ increases.
We finally run the same simulations with a non-stationary noise $Z_j(t) = \alpha_j \psi(t)$, where $\psi$ is a positive periodic smooth deterministic function such that $\|\psi\|_{L^2} = 1$ and $\alpha_j \sim N(0, \varsigma^2)$ with $\varsigma = 4$. Note that, in this case, the sequence $\gamma_n(\Theta)$ is of order $n$ and Assumption 6.4 is not verified. The levels of noise ($\sigma$ and

Conclusion and perspectives
We have proposed to use a Fréchet mean of smoothed data to estimate a mean pattern of curves or images satisfying a non-parametric regression model including random deformations. Upper and lower bounds (in probability and expectation) for the estimation of the deformation parameters and the mean pattern have been derived. Our main result is that these bounds go to zero as the dimension n of the data (the number of sample points) goes to infinity, but that an asymptotic setting only in J (the number of observed curves or images) is not sufficient to obtain consistent estimators. An interesting topic for future investigation would be to study the rate of convergence of such estimators and to analyze their optimality (e.g. from a minimax point of view).
A.2 Proof of Theorem 3.2
As above, let $Y \in \mathbb{R}^{nJ}$ be the column vector generated by model (1.5). Then, conditionally on $\theta^*$, $Y$ is a Gaussian vector and Assumption 3.2 ensures that its log-likelihood has the same expression as in equation (A.1) but with $$\Lambda = \Lambda(\Theta) = \left(\sigma^2 \mathrm{Id}_n + E_{\theta^*}\left[\mathbf{T}_{\theta_j^*} Z_j \left(\mathbf{T}_{\theta_j^*} Z_j\right)'\right]\right)^{-1} = \left(\sigma^2 \mathrm{Id}_n + \Sigma_n(\Theta)\right)^{-1}.$$ As the matrix $\Sigma_n(\Theta)$ is positive semi-definite with its smallest eigenvalue denoted by $s_n^2(\Theta)$ (see Assumption 3.2), the uniform bound (A.3) becomes [...] for all $p_1 = 1, \ldots, p$ and $j = 1, \ldots, J$. As above, the last inequality is a consequence of Assumption 3.1 and the rest of the proof is identical to the proof of Theorem 3.1.

A.3 Proof of Theorem 3.3
For all $\theta \in \mathbb{R}$, the operators $T_\theta f(\cdot) = f(\cdot - \theta)$ are isometric from $L^2([0, 1])$ to $L^2([0, 1])$, as a change of variable implies immediately that $\|T_\theta f\|^2$ [...] Finally, as the error terms $Z_j$ are i.i.d. stationary random processes, the covariance function is invariant under the action of the shifts and Assumption 3.2 is satisfied with $\Sigma_n(\Theta) = \Sigma_n$ defined in (5.1) (see Section 5.1 for further details). Then, the result of Theorem 3.3 follows as an application of Theorem 3.2.
D Proof of the results in Section 7
D.1 Proof of Theorem 7.1
[...] and the other terms contain the $Z_j$'s and $\varepsilon_j$'s error terms. Let $\mathbf{T}_{\theta_j^*} Z_j = \left[T_{\theta_j^*} Z_j(t_\ell)\right]_{\ell=1}^n$ and $\mathbf{T}_{\theta_j^*} f = \left[T_{\theta_j^*} f(t_\ell)\right]_{\ell=1}^n$ [...] Control of V. We give a control in probability of the stochastic quadratic terms $Q^Z_\lambda$ and $Q^\varepsilon_\lambda$. As previously, one can show that there is a constant $C(\Theta, F, f) > 0$ such that [...], where we have used the inequality $2ab \le a^2 + b^2$, valid for any $a, b > 0$, to control the term $R^{Z,\varepsilon}_\lambda$. The quadratic terms $Q^Z_\lambda$ and $Q^\varepsilon_\lambda$ are controlled by Corollaries E.1 and E.2 respectively. This immediately yields $P\left(V \ge C(\Theta, F, f)(\gamma_{\max}(n) + \sigma^2)\left(\sqrt{\upsilon(x, J, \lambda)} + \upsilon(x, J, \lambda)\right)\right) \le 2e^{-}$ [...]

D.2 Proof of Theorem 7.2
In this part, we use the notations introduced in the proof of Theorem 7.1. We have,