Estimation of the Hurst parameter from continuous noisy data

This paper addresses the problem of estimating the Hurst exponent of the fractional Brownian motion from a continuous-time noisy sample. Consistent estimation in the setup under consideration is possible only if either the length of the observation interval increases to infinity or the intensity of the noise decreases to zero. The main result is a proof of the Local Asymptotic Normality (LAN) of the model in these two regimes, which reveals the optimal minimax rates.


Introduction
Estimation of the Hurst parameter is an old problem in the statistics of time series. A benchmark model is the fractional Brownian motion (fBm) $B^H = (B^H_t,\, t \in \mathbb{R}_+)$, i.e., the centered Gaussian process with covariance function
$$\mathbb{E}\, B^H_t B^H_s = \tfrac12 t^{2H} + \tfrac12 s^{2H} - \tfrac12 |t-s|^{2H},$$
where $H \in (0,1)$ is the Hurst exponent. The fBm is a well studied stochastic process with a variety of interesting and useful properties, see, e.g., [24], [10]. Its increments are stationary and, for $H > \tfrac12$, positively correlated with long-range dependence. It is this feature which makes the fBm relevant to statistical modeling in many applications.
A basic problem is to estimate the Hurst parameter $H \in (0,1)$ and the additional scaling parameter $\sigma \in \mathbb{R}_+$ given the data $X^T := (\sigma B^H_t,\, t \in [0,T])$. Since both parameters can be recovered from $X^T$ exactly for any $T > 0$, a meaningful statistical problem is to estimate them from the discretized data
$$X^{T,\Delta} := \big(\sigma B^H_{\Delta}, \ldots, \sigma B^H_{n\Delta}\big), \qquad (1.1)$$
where $\Delta > 0$ is the discretization step and $n = \lfloor T/\Delta \rfloor$. The two relevant regimes, in which consistent estimation from (1.1) is feasible, are the large time asymptotics with a fixed $\Delta > 0$ and $T \to \infty$, and the high frequency asymptotics with $\Delta \to 0$ and a fixed $T > 0$.
A statistically harder problem is obtained if an independent noise is added to the sample. One possibility, explored so far in the literature, is estimation from the discrete noisy data
$$X^{T,\Delta} := \big(\sigma B^H_{\Delta} + \xi_{1,n}, \ldots, \sigma B^H_{n\Delta} + \xi_{n,n}\big), \qquad (1.2)$$
where $\xi_{j,n}$ are i.i.d. random variables independent of $B^H$. While the presence of noise does not alter the large time asymptotics in any drastic way, it does slow down the optimal minimax rates in the high frequency regime, see Section 4 for more details.
In this paper we consider an alternative scenario with a continuous time noisy sample, obtained by adding to the fBm an independent standard Brownian motion $B$:
$$X^T := \big(\sigma B^H_t + \sqrt{\varepsilon}\, B_t,\ t \in [0,T]\big), \qquad (1.3)$$
where $\varepsilon > 0$ is the known noise intensity. From the statistical standpoint, a peculiar feature of this process, called the mixed fBm, is that consistent estimation in the high frequency regime is no longer possible. Indeed, it was shown in [3] that for $H > \tfrac34$ the probability measures induced by the process (1.3) and the Brownian motion $\sqrt{\varepsilon} B$ are mutually absolutely continuous. This implies that the parameters in question cannot be recovered exactly from the sample $X^T$ for any finite $T$ and, a fortiori, from its discretization.
Our contribution is a proof of the local asymptotic normality (LAN) property in the large time ($T \to \infty$, fixed $\varepsilon > 0$) and small noise (fixed $T < \infty$, $\varepsilon \to 0$) asymptotic regimes. In both cases the analysis reveals the optimal minimax rates and yields explicit expressions for the relevant Fisher information matrices. The rest of the paper is organized as follows. Section 2 outlines the essential background needed for the formulation of the main results in Section 3. The results are discussed and compared to the related literature in Section 4. The proofs appear in Sections 5-7.

The LAN property
Let us briefly recall Le Cam's LAN property and its role in the asymptotic theory of estimation. A comprehensive account of the subject can be found in, e.g., [15]. An abstract parametric statistical experiment consists of a measurable space $(\mathcal{X}, \mathcal{A})$, where $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\mathcal{X}$, a family of probability measures $(P_\theta)_{\theta \in \Theta}$ on $\mathcal{A}$ with the parameter space $\Theta \subseteq \mathbb{R}^k$, and the sample $X \sim P_{\theta_0}$ for a true value $\theta_0 \in \Theta$ of the parameter variable. Asymptotic theory is concerned with a family of statistical experiments $(\mathcal{X}^h, \mathcal{A}^h, (P^h_\theta)_{\theta\in\Theta})$ indexed by a real valued variable $h > 0$.
Definition 2.1. A family of probability measures $(P^h_\theta)_{\theta\in\Theta}$ is locally asymptotically normal (LAN) at a point $\theta_0$ as $h \to 0$ if there exist nonsingular $k \times k$ matrices $\varphi(h) = \varphi(h, \theta_0)$ such that, for any $u \in \mathbb{R}^k$, the Radon-Nikodym derivatives (likelihood ratios) satisfy the scaling property
$$\log \frac{dP^h_{\theta_0 + \varphi(h)u}}{dP^h_{\theta_0}}(X^h) = u^\top Z_{h,\theta_0} - \tfrac12 \|u\|^2 + r_h(u, \theta_0), \qquad (2.1)$$
where the random vector $Z_{h,\theta_0}$ converges weakly under $P^h_{\theta_0}$ to the standard normal law on $\mathbb{R}^k$ and $r_h(u, \theta_0)$ vanishes in $P^h_{\theta_0}$-probability as $h \to 0$.

Define the set $\mathcal{W}_{2,k}$ of loss functions $\ell: \mathbb{R}^k \to \mathbb{R}_+$ which are continuous and symmetric with $\ell(0) = 0$, have convex sub-level sets $\{u : \ell(u) < c\}$ for all $c > 0$, and satisfy the growth condition
$$\lim_{\|u\| \to \infty} e^{-a\|u\|^2} \ell(u) = 0, \qquad \forall a > 0.$$
The following theorem establishes an asymptotic lower bound for the corresponding local minimax risks of estimators in LAN families.
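As a sanity check of Definition 2.1, recall the classical Gaussian shift experiment, in which the expansion (2.1) holds exactly; this standard worked example is not from the paper:

```latex
% Gaussian location model: X_1, ..., X_n i.i.d. N(theta, 1), h := 1/n, k = 1.
% With \varphi(h) = n^{-1/2} the log likelihood ratio is exactly quadratic:
\log\frac{dP^n_{\theta_0 + u/\sqrt{n}}}{dP^n_{\theta_0}}(X)
  = \frac{u}{\sqrt{n}}\sum_{i=1}^n (X_i - \theta_0) - \frac{u^2}{2}
  = u\, Z_n - \frac{u^2}{2},
\qquad Z_n := n^{-1/2}\sum_{i=1}^n (X_i - \theta_0) \sim N(0, 1).
```

Here $r_h \equiv 0$ and $Z_n$ is standard normal for every $n$, so the LAN property holds with no remainder; the Fisher information equals $1$, matching the normalization built into (2.1).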
Theorem 2.2 (Hájek). Let $(P^h_\theta)_{\theta\in\Theta}$ satisfy the LAN property at $\theta_0$ with matrices $\varphi(h, \theta_0) \to 0$ as $h \to 0$. Then for any family of estimators $\widehat\theta_h$, a loss function $\ell \in \mathcal{W}_{2,k}$ and any $\delta > 0$,
$$\liminf_{h \to 0}\ \sup_{\|\theta - \theta_0\| \le \delta} \mathbb{E}_\theta\, \ell\big(\varphi(h, \theta_0)^{-1}(\widehat\theta_h - \theta)\big) \ \ge\ \int_{\mathbb{R}^k} \ell(u)\, \gamma_k(u)\, du,$$
where $\gamma_k$ is the standard normal density on $\mathbb{R}^k$.
Estimators which achieve Hájek's lower bound are called asymptotically efficient in the local minimax sense. Often likelihood based estimators, such as the Maximum Likelihood or the Bayes estimators, are asymptotically efficient. However, they can be excessively complicated, and it is then desirable to construct simpler estimators which are at least rate optimal. In complex models this can be easier to carry out separately for each component of the parameter vector, following some ad hoc heuristics. Proving rate optimality of the estimators so obtained requires finding the best minimax rates for each parameter.
Let us explain how such entrywise rates can be derived using the bound from Theorem 2.2. Analysis of the likelihood ratio in (2.1) typically shows that in LAN families $\varphi(h, \theta_0)$ must satisfy the condition
$$\varphi(h, \theta_0)^\top M(h, \theta_0)\, I(\theta_0)\, M(h, \theta_0)^\top \varphi(h, \theta_0) \xrightarrow[h \to 0]{} \mathrm{Id}, \qquad (2.2)$$
where the matrices $M(h, \theta_0)$ and $I(\theta_0)$ are determined by the statistical model under consideration. The matrix $I(\theta_0)$ is positive definite and independent of $h$, and it can often be regarded as the analog of the usual Fisher information matrix.

Consider the Cholesky decomposition
$$L(h, \theta_0) L(h, \theta_0)^\top = M(h, \theta_0)\, I(\theta_0)\, M(h, \theta_0)^\top,$$
where $L(h, \theta_0)$ is the unique lower triangular matrix with positive diagonal entries. Then (2.2) holds with $\varphi(h, \theta_0)^\top := L(h, \theta_0)^{-1}$, and the last entry of the vector $\varphi(h, \theta_0)^{-1}(\widehat\theta_h - \theta)$ is given by
$$\big[\varphi(h, \theta_0)^{-1}(\widehat\theta_h - \theta)\big]_k = L_{kk}(h, \theta_0)\, (\widehat\theta_{h,k} - \theta_k).$$
Let $\widetilde\ell \in \mathcal{W}_{2,1}$ be a loss function of a scalar variable, $\widetilde\ell: \mathbb{R} \to \mathbb{R}_+$, and define $\ell(x) := \widetilde\ell(x_k)$, $x \in \mathbb{R}^k$. This loss function belongs to $\mathcal{W}_{2,k}$, and Hájek's bound implies
$$\liminf_{h \to 0}\ \sup_{\|\theta - \theta_0\| \le \delta} \mathbb{E}_\theta\, \widetilde\ell\big(L_{kk}(h, \theta_0)(\widehat\theta_{h,k} - \theta_k)\big) \ \ge\ \int_{\mathbb{R}} \widetilde\ell(x)\, \gamma_1(x)\, dx. \qquad (2.3)$$
This inequality identifies $L_{kk}(h, \theta_0)^{-1}$, the reciprocal of the last diagonal entry of $L(h, \theta_0)$, as the best minimax rate in estimation of $\theta_k$. A similar bound for an arbitrary entry is obtained by permuting the components of $\theta$ so that it becomes the last.

The most commonly encountered instance of (2.2) is when the matrix $M(h, \theta_0)$ is diagonal. Then $L(h, \theta_0) = M(h, \theta_0) S(\theta_0)$, where $S(\theta_0)$ is the Cholesky factor of $I(\theta_0)$. Since $I(\theta_0)$ is positive definite, all diagonal entries of $S(\theta_0)$ are positive (and constant in $h$) and, in view of (2.3), the best minimax rate is determined only by $M_{kk}(h, \theta_0)$. This is the case for our model in the large time asymptotic regime with $h := 1/T$ (see Theorem 3.1). In the small noise regime with $h := \varepsilon$, the matrix $M(h, \theta_0)$ is non-diagonal (see Theorem 3.2), which results in a logarithmic discrepancy between the best minimax rates in estimation of each parameter.
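The splitting $L = MS$ for diagonal $M$ can be checked numerically. The sketch below uses hypothetical rate entries for $M(h)$ and an arbitrary positive definite matrix in place of $I(\theta_0)$, purely for illustration:

```python
import numpy as np

# Illustrative 2x2 ingredients of condition (2.2): a diagonal rate matrix M(h)
# (hypothetical rates, not the paper's) and a fixed positive definite matrix I0.
h = 1e-3
M = np.diag([h**0.5, h**0.5 * np.log(1/h)])
I0 = np.array([[2.0, 0.5],
               [0.5, 1.0]])

L = np.linalg.cholesky(M @ I0 @ M.T)   # lower triangular, positive diagonal
S = np.linalg.cholesky(I0)             # Cholesky factor of I0

# For a diagonal M the factorization splits as L = M S, so L_kk = M_kk * S_kk:
# the minimax rate for theta_k, 1/L_kk, is governed by M_kk alone.
assert np.allclose(L, M @ S)
assert np.all(np.diag(L) > 0)
```

The identity holds by uniqueness of the Cholesky factorization: $MS$ is lower triangular with positive diagonal and $(MS)(MS)^\top = M I_0 M^\top$. For non-diagonal $M$ the factor $L$ mixes the rows of $M$, which is the source of the logarithmic rate discrepancy discussed above.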

3.1. Large time asymptotics. The covariance structure of the fBm with parameter variable $\theta = (H, \sigma^2) \in (\tfrac34, 1) \times (0, \infty) =: \Theta$ can be written in terms of the kernel
$$K_\theta(t) = \sigma^2 H(2H-1)\, |t|^{2H-2}, \qquad (3.1)$$
with the Fourier transform
$$\widehat K_\theta(\lambda) = \sigma^2 a_H\, |\lambda|^{1-2H}, \qquad (3.2)$$
where $a_H := \Gamma(2H+1)\sin(\pi H)$. The function $\widehat K_\theta(\lambda)$ does not decay sufficiently fast to be integrable on $\mathbb{R}$ and hence, strictly speaking, it is not a spectral density of a stochastic process in the usual sense. It can be thought of as the spectral density of the fractional noise, a formal derivative of the fBm.
Denote by $P^T_\theta$ the probability measure on the space of continuous functions $C([0,T], \mathbb{R})$ induced by the mixed fBm (1.3) with parameter $\theta$ and fixed $\varepsilon > 0$.
In view of the discussion in the previous section, this result implies that the rate $T^{-1/2}$ is minimax optimal for both $H$ and $\sigma^2$. As explained in Section 4.3, this rate is achievable and, moreover, the lower bound can be approached arbitrarily closely by estimators based on a sufficiently dense grid of discretized observations.

3.2. Small noise asymptotics. With a convenient abuse of notation, let $P^\varepsilon_\theta$ now denote the probability measure induced by the mixed fBm (1.3) with parameter $\theta$ and a fixed interval length $T > 0$, and define the matrix $M(\varepsilon, \theta_0)$.

Theorem 3.2. Assume that $\varphi(\varepsilon, \theta_0)$ satisfies the scaling condition (3.4) with $I(\theta_0, 1)$ defined in (3.3). Then the family $(P^\varepsilon_\theta)_{\theta\in\Theta}$ is LAN at any $\theta_0 \in \Theta$ as $\varepsilon \to 0$.

Condition (3.4) cannot be satisfied by any diagonal matrix $\varphi(\varepsilon, \theta_0)$, since in this case the limit, if it exists and is finite, must be a singular matrix. Otherwise the choice of $\varphi(\varepsilon, \theta_0)$ is not unique. As explained in the previous section, the upper and lower triangular Cholesky factors of the matrix $M(\varepsilon, \theta_0)\, I(\theta_0, 1)\, M(\varepsilon, \theta_0)^\top$ reveal the optimal minimax estimation rates for $H$ and $\sigma^2$, which are stated in Corollary 3.3.
If only one of the parameters is to be estimated while the other one is known, the relevant LAN property corresponds to the respective one-dimensional family. The following theorem shows that the optimal minimax rates in these cases improve by a logarithmic factor.

Theorem 3.4. 1) For any fixed $\sigma_0^2 \in (0, \infty)$, the family $\big(P^\varepsilon_{(H, \sigma_0^2)}\big)_{H \in (3/4, 1)}$ is LAN at any $H_0 \in (\tfrac34, 1)$ as $\varepsilon \to 0$. 2) For any fixed $H_0 \in (\tfrac34, 1)$, the family $\big(P^\varepsilon_{(H_0, \sigma^2)}\big)_{\sigma^2 \in (0,\infty)}$ is LAN at any $\sigma_0^2 \in \mathbb{R}_+$ as $\varepsilon \to 0$ with
$$\varphi(\varepsilon, \sigma_0) := \varepsilon^{1/(4H_0 - 2)}\, \frac{1}{\sqrt{T\, I_{22}(\theta_0, 1)}}.$$

A discussion
4.1. On the information matrix. The expression for the Fisher information matrix in (3.3) is known as Whittle's formula. It was discovered by P. Whittle [29, 30] and was originally derived for discrete time stationary Gaussian processes with continuous spectral densities, see also [28]. Its validity was extended in [6, 7] to processes with long range dependence, for which the spectral density has an integrable singularity at the origin. Whittle's formula in continuous time is a more subtle matter due to the complexity of the absolute continuity relation between Gaussian measures on function spaces. In fact, according to the survey [9], it was never rigorously verified beyond processes with rational spectra. One important class for which further generalization is plausible are processes observed with additive "white noise", that is,
$$X_t = Z_t + \sqrt{\varepsilon}\, B_t, \qquad t \in [0, T],$$
where $B$ is a standard Brownian motion and $Z$ is a centered Gaussian process with stationary increments. The mixed fBm is a special case of this class.
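Since the display (3.3) did not survive extraction, the sketch below assumes a Whittle-type matrix of the form $\frac{1}{4\pi}\int_{\mathbb{R}} \partial_i \log f_\theta(\lambda)\, \partial_j \log f_\theta(\lambda)\, d\lambda$ with $f_\theta(\lambda) = \varepsilon + \sigma^2 a_H |\lambda|^{1-2H}$; both the constant and the exact form are assumptions for illustration only. It evaluates the matrix by quadrature and finite-difference gradients, and confirms it is symmetric positive definite, consistent with the claim that $I(\theta_0)$ is a bona fide Fisher information:

```python
import numpy as np
from math import gamma, sin, pi

def spec(lam, H, s2, eps):
    """Assumed spectral density of the mixed fractional noise: eps + s2*a_H*|lam|^{1-2H}."""
    aH = gamma(2*H + 1) * sin(pi * H)
    return eps + s2 * aH * np.abs(lam)**(1 - 2*H)

def trap(y, x):
    """Trapezoidal rule on a non-uniform grid."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

def whittle_info(H, s2, eps, d=1e-6):
    lam = np.logspace(-6, 4, 20001)              # positive frequencies, log-spaced
    logf = lambda H_, s2_: np.log(spec(lam, H_, s2_, eps))
    grads = [(logf(H + d, s2) - logf(H - d, s2)) / (2*d),    # d/dH log f
             (logf(H, s2 + d) - logf(H, s2 - d)) / (2*d)]    # d/ds2 log f
    # Whittle-type matrix: (1/4pi) * int over R = (1/2pi) * int over R_+ (even integrand)
    return np.array([[trap(grads[i] * grads[j], lam) / (2*pi) for j in range(2)]
                     for i in range(2)])

info = whittle_info(H=0.85, s2=1.0, eps=0.5)
assert np.allclose(info, info.T) and np.all(np.linalg.eigvalsh(info) > 0)
```

Note that the integrand decays like $|\lambda|^{2-4H}$ at infinity, which is integrable precisely when $H > \tfrac34$, in line with the parameter restriction of the paper.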
Results in [25] imply that the probability measure induced by $X$ is equivalent to the Wiener measure if and only if a certain integral representation holds with some kernel $K_\theta \in L^2([0,T])$. In this case, the Radon-Nikodym derivative has the same form as in (5.4). Using the theory of finite section approximation from [13], it is indeed possible to prove Theorem 3.1 for such processes under the additional assumption $K \in L^1(\mathbb{R})$, which is crucial to the approach of [13].
This assumption is violated by the kernel (3.1), which makes the method of [13] inapplicable. This is not entirely surprising in view of the difficulties that had to be overcome in [6] to extend Whittle's theory to discrete time processes with long range dependence. The results in our paper are proved using a different approach, based on the ideas from [26] and their recent applications to processes with the fractional covariance structure [4].

4.2. On the joint and separate estimation. The logarithmic discrepancy in the minimax rates between joint and separate estimation, as in Corollary 3.3 and Theorem 3.4, is known to occur in the high-frequency regime in experiments with discrete data such as (1.1). The optimal rates for the separate estimation of $H$ and $\sigma^2$ with $\Delta = T/n$ are $n^{-1/2} \frac{1}{\log n}$ and $n^{-1/2}$, respectively, see [19] and references therein. These rates are achievable, e.g., by estimators based on discrete power variations as in [17], [20], [5].
It has long been noticed that analogous estimators achieve slower rates, degraded by a logarithmic factor,
$$n^{-1/2} \quad \text{and} \quad n^{-1/2} \log n, \qquad (4.1)$$
when both parameters are unknown. These rates were recently proved minimax optimal in [1], where the LAN property was shown to hold with a non-diagonal matrix $M(h, \theta_0)$ in (2.2). High frequency estimation from the noisy data (1.2) was considered in [12], where the optimal minimax rates for joint estimation of $H > \tfrac12$ and $\sigma^2$ were found to be $n^{-1/(4H+2)}$ and $n^{-1/(4H+2)} \log n$, respectively. These rates are slower than those in (4.1), confirming the intuition that noise should make the estimation problem harder. Estimators for the mixed fBm with $H \le \tfrac34$ in the high frequency regime were constructed using the power variations technique in [8].

4.3. Rate optimal estimators. It is plausible that the local minimax lower bounds guaranteed by the LAN property of Theorems 3.1-3.4 are attained by the Maximum Likelihood or the Bayes estimators with positive prior densities. Proving such asymptotic efficiency would require analysis which goes beyond the scope of this paper. However, these estimators are of little practical interest, since they require solving numerically the integral equation (5.3) and approximating the stochastic integrals in (5.4). Simpler rate optimal estimators can be constructed otherwise, as explained below.

4.3.1. Large time. A simple alternative is to base estimation on the discrete data $X_{k\Delta}$, $k = 1, \ldots, \lfloor T/\Delta \rfloor$, with a small discretization step $\Delta > 0$. In particular, one can use the discrete likelihood based estimators or the simpler Whittle spectral estimator. The theory from [6] implies that such estimators achieve the optimal $T^{-1/2}$ rate and, moreover, the limit risks can be made arbitrarily close to the bound provided by Theorem 3.1 by choosing $\Delta$ small enough.

4.3.2. Small noise. In the small noise regime the optimal rates of Corollary 3.3 and Theorem 3.4 can be achieved by a modification of the estimator suggested in [12]. Let us briefly sketch the idea. Take any mother wavelet function $\psi$ with compact support and two vanishing moments, and define its translates and dilations
$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k), \qquad j \in \mathbb{N},\ k \in \mathbb{Z}.$$

Consider the wavelet coefficients of $\sigma B^H$,
$$d_{j,k} = \int \sigma B^H_t\, \psi_{j,k}(t)\, dt, \qquad (4.2)$$
and define the energy of the $j$-th resolution level, $Q_j = \sum_k d_{j,k}^2$. Standard calculations show that these random variables satisfy the asymptotics (4.3), which relates the decay of $Q_j$ in $j$ to $H$ and $\sigma^2$. Natural estimators for the wavelet coefficients are obtained by replacing the fBm in (4.2) with its noisy observation and, accordingly, one obtains estimators $\widehat Q_j$ of the level energies. In view of (4.4), the method of moments suggests the corresponding estimators of $H$. The bias of these estimators decreases with $j$, whereas their variance increases. In view of the residual in (4.4) and the optimal rate, known from Corollary 3.3, it is reasonable to suggest that the optimal choice of $j$ should balance the two, leading to the balance equation (4.5). This choice is only an "oracle", since it requires $H$ to be known. To mimic this choice of $j$, the asymptotics (4.3) can be used again. To this end, (4.5) can be rewritten as $2^{j(2-2H)} = 2^j \varepsilon$ which, in view of (4.3), suggests a data-driven selector $\widehat J_\varepsilon$, where $J_\varepsilon = \lfloor 2 \log_2 \varepsilon^{-1} \rfloor$ and $J$ is an arbitrary nonessential constant. It can be shown that with high probability $\widehat J_\varepsilon$ will be close to $\frac{1}{2H-1} \log_2 \varepsilon^{-1} < J_\varepsilon$, and the ultimate estimator $\widehat H(\varepsilon)$ in (4.6) is set to be the method of moments estimator at the level $\widehat J_\varepsilon$.

Proposition 4.1. The estimation error $\varepsilon^{-1/(4H-2)} \big(\widehat H(\varepsilon) - H\big)$ is bounded in $P_\theta$-probability, uniformly over compacts in $\Theta$, as $\varepsilon \to 0$.
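The scale-energy idea behind (4.2)-(4.6) can be illustrated in a simplified, noise-free form using discrete power variations (the approach of [17], [20], [5]) rather than the paper's wavelet construction: second-order increments of fBm satisfy $\mathbb{E}(B^H_{t+m\Delta} - 2B^H_{t+m\Delta/2}\dots)$, more precisely $\mathbb{E}(B^H_{t+m} - 2B^H_t + B^H_{t-m})^2 = (4 - 2^{2H})\, m^{2H}$, so regressing log energies on log scales recovers $2H$. The estimator below is a hypothetical sketch, not the estimator of the paper:

```python
import numpy as np

def fbm_path(n, H, T=1.0, seed=0):
    """Exact fBm sample on a uniform grid via Cholesky of the covariance matrix."""
    t = np.linspace(T/n, T, n)
    C = 0.5 * (t[:, None]**(2*H) + t[None, :]**(2*H)
               - np.abs(t[:, None] - t[None, :])**(2*H))
    L = np.linalg.cholesky(C + 1e-12 * np.eye(n))
    return L @ np.random.default_rng(seed).standard_normal(n)

def estimate_H(x, scales=(1, 2, 4, 8)):
    """Slope of log mean squared second differences vs log scale equals 2H,
    since E(B_{t+m} - 2B_t + B_{t-m})^2 = (4 - 2^{2H}) m^{2H} for fBm."""
    logv = [np.log(np.mean((x[2*m:] - 2*x[m:-m] + x[:-2*m])**2)) for m in scales]
    slope = np.polyfit(np.log(scales), logv, 1)[0]
    return slope / 2

x = fbm_path(n=1500, H=0.8)
H_hat = estimate_H(x)
assert abs(H_hat - 0.8) < 0.2
```

Second differences play the role of a filter with two vanishing moments, the same requirement imposed on the mother wavelet $\psi$ above; with additive noise, the level selection $\widehat J_\varepsilon$ becomes the delicate part, as described in the text.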
Remark 4.2. In the context of Theorem 3.2, this result implies rate optimality for a particular class of loss functions of the form $\ell_M(u) = (|u| - M)_+ \wedge 1 \le \mathbb{1}_{\{|u| \ge M\}}$, since the above estimator satisfies the corresponding uniform risk bound for any compact $K \subset \Theta$ and all $M$ large enough.
Estimation of $\sigma^2$ can be based on (4.3) as well. The method of [12] implies that the plug-in estimator built with $\widehat H(\varepsilon)$, defined in (4.6), is rate optimal.
Similarly constructed estimators are rate optimal for the corresponding parameter when the other parameter is known.

A preview of the proofs
Let B " pB t , t P R `q and B H " pB H t , t P R `q be independent standard and fractional Brownian motions on a probability space pΩ, F, Pq.The mixed fBm (1.3) with θ " pH, σ 2 q P p 3 4 , 1q ˆR`s atisfies the canonical innovation representation [14] X t " where B is a Brownian motion with respect to F X t " σtX s , s ď tu, ρ t pX, θq " and the function gpt, s; θq solves the integral equation εgpt, s; θq with the kernel K θ p¨q defined in (3.1).This equation has the unique solution in L 2 pr0, tsq since its kernel is Hilbert-Schmidt for H ą 3 4 .The stochastic integral in (5.2) can therefore be defined in the usual way (see [23]).
Let $\overline{P}^T$ and $P^T_\theta$ be the probability measures on $C([0,T], \mathbb{R})$ induced by the Brownian motion $\sqrt{\varepsilon}\, B$ and the mixed fBm with parameter $\theta$, respectively. By the Girsanov theorem, applied to the innovation representation (5.1), $\overline{P}^T \sim P^T_\theta$ with the Radon-Nikodym derivative
$$\frac{dP^T_\theta}{d\overline{P}^T}(X) = \exp\left( \frac{1}{\varepsilon} \int_0^T \rho_t(X, \theta)\, dX_t - \frac{1}{2\varepsilon} \int_0^T \rho_t(X, \theta)^2\, dt \right). \qquad (5.4)$$
Thus the asymptotics of the likelihood ratios in (2.1) is determined by the limiting behavior of the solution to (5.3).
In the large time asymptotic regime with a fixed $\varepsilon > 0$ and $T \to \infty$, we show in Lemma 6.6 that the Laplace transform
$$\widehat g_t(z) = \int_{\mathbb{R}} g(t, s; \theta)\, e^{-zs}\, ds, \qquad z \in \mathbb{C},$$
where $g(t, s; \theta)$ is extended by zero outside $(0, t)$, satisfies a decomposition into a principal term and a residual $\widehat R_t(z)$. Here $X_c(z)$ is defined by the Cauchy type integral (6.20) of a certain explicit function. It can be shown that the inverse Fourier transform of $\widehat g(i\lambda) := 1 - 1/X(-i\lambda)$ solves the limit equation obtained from (5.3) by taking $t \to \infty$. This is a Wiener-Hopf equation of the second kind. Its solvability is not entirely obvious at the outset, since the classical theory requires that $K_\theta \in L^1(\mathbb{R})$ (see, e.g., [21]). Having an essentially explicit solution is instrumental for deriving the formula for the Fisher information (3.3). The residual $\widehat R_t(z)$ quantifies the proximity between the solutions to the equations on the finite and infinite intervals. The most challenging part of the proof is to estimate its growth as a function of $t$. This is what ultimately determines the correlation properties of the process $\rho_t(X, \theta)$ and its derivatives, see Lemmas 6.1-6.2. To obtain suitable estimates, we construct a representation of the solution to (5.3) in terms of certain auxiliary integral equations (6.27), which turn out to be more tractable for asymptotic analysis.
In the small noise asymptotic regime with $T$ fixed and $\varepsilon \to 0$, equation (5.3) degenerates in the limit to an integral equation of the first kind, which does not have a classical solution. Nevertheless, the structure of the kernel (3.1), corresponding to the self-similarity of the fBm, implies certain scaling properties of the solution to (5.3) (see Lemma 7.2), which can be used to derive the small noise asymptotics from the large time limit.

Proof of Theorem 3.1
In view of formula (5.4), the likelihood ratio in Definition 2.1 takes an explicit form, in which $X$ is the mixed fBm with parameter $\theta_0$ and $\varphi(T)$ is defined in Theorem 3.1. The matrix $I(\theta_0, \varepsilon)$ is invertible, and establishing the LAN property of Theorem 3.1 amounts to proving the corresponding convergence in probability for any $u \in \mathbb{R}^2$; by the CLT for stochastic integrals [22, Theorem 1.19], this convergence also implies the convergence in distribution with $Z \sim N\big(0, I(\theta_0, \varepsilon)\big)$.
6.2.2. An equivalent representation. Next we derive an alternative expression for the Laplace transform (6.8). The key observation to this end is that $\widehat g_t(z)$ is an entire function, and hence the discontinuity in the denominator in the right hand side of (6.9) must be removable: the limit of $\big(\Phi_0(z) + e^{-tz}\, \Phi_1(-z)\big)/\Lambda(z)$ as $z \to \tau$ exists for every $\tau \in \mathbb{R}$.
The functions $\Phi_0(z)$ and $\Phi_1(z)$ are sectionally holomorphic on $\mathbb{C} \setminus \mathbb{R}_+$ and satisfy the boundary conditions (6.19) and the growth estimates (6.11). Using the usual technique for solving Hilbert boundary value problems, such functions can be expressed in terms of solutions to certain auxiliary integral equations.
The first step towards the construction of this representation consists of finding a function $X(z)$, sectionally holomorphic on $\mathbb{C} \setminus \mathbb{R}_+$ and satisfying the homogeneous boundary condition, cf. (6.19),
$$X^+(\tau) - \frac{\Lambda^+(\tau)}{\Lambda^-(\tau)}\, X^-(\tau) = 0, \qquad \forall \tau \in \mathbb{R}_+.$$
This is a standard instance of the Hilbert boundary value problem [11].
Since the function $\log \frac{\Lambda^+(\tau)}{\Lambda^-(\tau)} = 2i\alpha(\tau)$ satisfies the Hölder condition on $\mathbb{R}_+ \cup \{\infty\}$, all solutions to this problem which do not vanish on $\mathbb{C} \setminus \{0\}$ have the form $X(z) = z^k X_c(z)$ for some integer $k \in \mathbb{Z}$, where the canonical part $X_c(z)$ is found by the Sokhotski-Plemelj formula (6.20). The following lemma summarizes some of its useful properties.
Lemma 6.5. The function defined in (6.20) satisfies the asymptotics (6.21) and is related to $\Lambda(z)$, defined in (6.10), through the identity (6.22).

Proof. The claimed asymptotics readily follows from (6.18). To prove (6.22), we write $\log X_c(z) X_c(-z)$ as a sum of two integrals. By changing the integration variable and using the symmetry (6.16), the second integral can be brought to the same form as the first. Since $\log\big(\Lambda^\pm(\tau)/\varepsilon\big) = O(\tau^{1-2H})$, this yields a contour integral representation for $\log X_c(z) X_c(-z)$. The function $\Lambda(z)$ is non-vanishing and holomorphic on the lower and upper half-planes, hence each of the integrals can be computed by standard contour integration. When $\mathrm{Im}(z) > 0$, the first integral gives $\log(\Lambda(z)/\varepsilon)$ and the second vanishes, which proves the validity of (6.22) in the upper half-plane. The same argument applies in the lower half-plane.
This function turns out to be real valued, where the dashed integral denotes the Cauchy principal value. In view of the estimates (6.12) and (6.21), the functions $S(-\tau)$ and $D(-\tau)$ have at most square integrable singularities at the origin if we choose $k \le 0$. From here on we fix $k = 0$, so that $X(z) = X_c(z)$. This choice is not the only one possible, but it makes further calculations simpler. Thus the expressions in the right hand side of (6.24) satisfy the Hölder condition on $\mathbb{R}_+ \cup \{\infty\}$ and therefore, by the Sokhotski-Plemelj theorem, the functions (6.23) satisfy (6.26). In the next subsection we will argue that, for all sufficiently large $t$, the equations (6.27) have unique solutions such that $q_t(\cdot) + \tfrac12$ and $p_t(\cdot) + \tfrac12$ belong to $L^2(\mathbb{R}_+)$. Setting $z := -\tau$ for $\tau \in \mathbb{R}_+$ in (6.26) shows that $S(-\tau)$ and $D(-\tau)$ solve (6.27) multiplied by $\varepsilon$. Since by construction $S(-\tau)$ and $D(-\tau)$ are square integrable near the origin, due to the uniqueness of the solutions to (6.27), they must coincide with $\varepsilon p_t(\tau)$ and $\varepsilon q_t(\tau)$ and, consequently, $S(z) = \varepsilon p_t(-z)$ and $D(z) = \varepsilon q_t(-z)$, where $q_t(z)$ and $p_t(z)$ are the unique sectionally holomorphic extensions to $\mathbb{C} \setminus \mathbb{R}_-$. Plugging these expressions along with (6.22) and (6.23) into (6.9), we obtain the following result.

The following lemma asserts that the operator $A_t$ appearing in (6.27) is a contraction on $L^2(\mathbb{R}_+)$ for all sufficiently large $t$.
The equations in (6.27) can be written as
$$f + \tfrac12 = \pm A_t\big(f + \tfrac12\big) \mp \tfrac12\, (A_t 1). \qquad (6.31)$$
A direct calculation shows that $A_t 1 \in L^2(\mathbb{R}_+)$. Hence these equations have unique solutions in $L^2(\mathbb{R}_+)$, given, e.g., by the Neumann series. The estimates for these solutions, derived in the next lemma, play a key role in the asymptotic analysis.
Lemma 6.8. For any closed ball $\mathcal{B} \subset \Theta$, there exist constants $r_{\max} > 0$, $T_{\min} > 0$ and $C > 0$ such that for any $r \in [0, r_{\max}]$ and all $t \ge T_{\min}$ the bound (6.32) holds, where $m_t(z)$ is any of the functions listed there. In view of (6.17) and, consequently, of (6.25), calculations as in (6.34) show that $\|\psi\| \le C t^{r - 1/2}$, and the claimed bounds for the next two functions in (6.32) are proved as above. The last two bounds, for the second order derivatives, are verified along the same lines.
6.3. Proof of Lemma 6.1. In this subsection we omit $\theta_0$ from the notation for brevity. The covariance function of the gradient process admits a decomposition analyzed below. The first bound in (6.6) is derived in the following lemma.
Proof. Let us estimate the growth of the integrand in (6.39), involving $\Lambda(i\lambda)$, at the origin and at infinity. In view of (6.36) and (6.20), combining the resulting estimate with (6.21) and formula (6.10) yields a bound for $f(\lambda)$. Similarly we can estimate the derivative $f'(\lambda)$ with respect to $\lambda$. Standard bounds for the Fourier integral of such functions [16] then imply a power decay with some constant $c > 0$. The claimed estimate follows by combining the two bounds.
The next lemma proves the second bound in (6.6).
Lemma 6.10. There exist constants $b \in (0, \tfrac12)$, $C > 0$ and $T_{\min} > 0$ such that for all $s, t \ge T_{\min}$ the residual terms satisfy the bounds (6.45).

Proof. Using the conjugacy $\overline{X(i\lambda)} = X(-i\lambda)$, the expression for $R^{(1)}_{ij}(s, t)$ in (6.40) can be bounded as in (6.46), with the functions $f_1$ and $f_2$ defined there. Due to the estimate (6.43), $f_1, f_2 \in L^2(\mathbb{R})$. By the estimate (6.32) with $r = 0$, the same bound is valid for the rest of the integrals in (6.47), and the first bound in (6.45) follows. The second bound is proved similarly. Due to (6.43), $\big|\partial_i \log X(i\lambda, \eta)\big|^2 \le C |\lambda|^{-r}$ for any $r \in (0, 1)$. Hence, with $r > 0$ small enough, Lemma 6.8 guarantees that all the integrals in (6.49) are bounded by $C t^{r-1}$. Applying the same argument to the second term in (6.48), we conclude that $\big|R^{(3)}_{ij}(s, t)\big| \le C s^{r/2 - 1/2}\, t^{r/2 - 1/2}$. This verifies the last bound in (6.45) with $b := 1/2 - r/2 \in (0, 1/2)$.
In view of (6.22), the expression in (6.39) can be simplified; the last equality holds since for any $r, \tau \in \mathbb{R}_+$ the integral in the brackets vanishes, as can be readily checked by standard contour integration.
6.4. Proof of Lemma 6.2. This lemma involves only one dimensional distributions of the process $\rho_t(X, \theta)$ and its partial derivatives. On the other hand, unlike in Lemma 6.1, $\theta$ may be distinct from $\theta_0$, the true value of the parameter, which determines the distribution of the sample $X^T$. In this subsection we stress this distinction by adding the relevant parameter value to the notation. We have to show that for all sufficiently small $\delta > 0$ there exist constants $C > 0$ and $T_{\min} > 0$ such that
$$\sup_{\|\theta - \theta_0\| \le \delta} \mathbb{E}\big(\partial_i \partial_j \rho_t(X, \theta)\big)^2 \le C, \qquad \forall t \ge T_{\min}.$$
Similarly to (6.37)-(6.38), the bound holds due to the decomposition (6.28). It remains to prove that both terms in the right hand side are bounded functions of $t \in [T_{\min}, \infty)$ for some $T_{\min} > 0$, uniformly over $\theta$ in a $\delta$-vicinity of $\theta_0$. This is done in the following two lemmas.

Lemma 6.12. For all sufficiently small $\delta > 0$, there exist constants $C > 0$ and $c > 0$ such that
$$\int \big|\partial_i \partial_j \widehat R_t(i\lambda; \theta)\big|^2\, \Lambda(i\lambda; \theta_0)\, d\lambda \le C t^{-c}, \qquad \forall t \ge T_{\min}.$$

Proof. The second order derivatives of $\alpha(\tau, \theta)$, defined in (6.17), are continuous in $\tau$ and satisfy a bound analogous to (6.36).

Thus the claim follows, where the last bound is true due to Lemma 6.8.
7. Proofs of Theorems 3.2 and 3.4. The LAN property in the small noise setting is derived from the large time asymptotics. It will be convenient to change some notations in order to emphasize the more relevant variables. In particular, we indicate the dependence of the solution to (5.3) on $\varepsilon$ by a subscript and keep in mind its dependence on $\theta$, omitting it from the notation. Thus equation (5.3) reads
$$\varepsilon\, g_\varepsilon(t, s) + \sigma^2 c_H \int_0^t g_\varepsilon(t, r)\, |r - s|^{2H-2}\, dr = \sigma^2 c_H\, |t - s|^{2H-2}, \qquad 0 < s < t,$$
where we defined $c_H := H(2H - 1)$.

7.1. The key lemmas. The following lemma reveals a useful relation between the derivatives of $g_\varepsilon(t, s)$ with respect to the parameter and the time variables.
Here $T$ is fixed and the dependence on $\varepsilon$ is emphasized by superscripts; cf. (6.3). We will argue that for an appropriate choice of $\varphi(\varepsilon) = \varphi(\varepsilon, \theta_0)$ the convergence (7.12) holds. Then the second term in (7.10) converges to $-\tfrac12 \|u\|^2$ in probability, and the stochastic integral converges in distribution to $u^\top Z$ with $Z \sim N(0, \mathrm{Id})$, see [18, Ch. IX.5].

7.3. Proof of Corollary 3.3. For brevity denote the matrices in (3.4) by $M := M(\varepsilon, \theta_0)$ and $I := I(\theta_0; 1)$. Consider the Cholesky decomposition $M I M^\top = L L^\top$, where $L$ is the unique lower triangular matrix with positive diagonal entries. A simple calculation shows that $L$ can be computed explicitly up to $o(1)$ terms as $\varepsilon \to 0$.

6.2. Equation (5.3). The probabilistic structure of the process $\rho_t(X, \theta)$ in (5.2) is determined by the solution to equation (5.3). Thus the first step towards the proof of Lemmas 6.1 and 6.2 is to derive a useful representation for it. In this subsection we show that the solution can be decomposed into the sum of a principal term, independent of $t$, and a residual term which vanishes as $t \to \infty$. The analysis is based on the Laplace transform (6.8). Since the integration in (6.8) is carried out over a bounded interval, $\widehat g_t(z)$ is an entire function and hence $g(t, \cdot)$ can be recovered by the inverse Fourier transform
$$g(t, s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \widehat g_t(i\lambda)\, e^{is\lambda}\, d\lambda.$$

6.2.1. The Laplace transform. The decomposition mentioned above is based on the following representation formula for the Laplace transform.
where the domain of $g(t, \cdot)$ is extended to $\mathbb{R}$ by zero outside the interval $(0, t)$.
Proof. Let us start with proving the bound for the first two functions in (6.32). The calculations are similar for both equations in (6.27), and we consider the first one for definiteness. Rearranging it as in (6.31) and multiplying by $s^{-r}$ shows that the function $\phi(s) := \big(p_t(s) + \tfrac12\big) s^{-r}$ solves the equation
$$\phi = B_t \phi + \psi, \qquad (6.33)$$
where $\psi(s) := -\tfrac12 (A_t 1)(s)\, s^{-r}$ with $A_t$ as in (6.30), and where $r < 1/2$ is assumed. Calculations as in the proof of Lemma 6.7 show that $B_t$ is a contraction on $L^2(\mathbb{R}_+)$ for all $t \ge T_{\min}$. Indeed, for any $f, g \in L^2(\mathbb{R}_+)$ and $r < 1/4$, using (6.35) and applying the Minkowski and Cauchy-Schwarz inequalities, one obtains a bound of order $C t^{r-1}$. This completes the proof for the first two functions in (6.32). The other two bounds are verified similarly, with $\phi(s) := \partial_j p_t(s)\, s^{-r}$.
The restriction of $\Lambda(z)$, defined in (6.10), to the imaginary axis is
$$\Lambda(i\lambda) = \varepsilon + \widehat K_{\theta_0}(\lambda), \qquad \lambda \in \mathbb{R} \setminus \{0\},$$
where $\widehat K_{\theta_0}$ is the Fourier transform (3.2). Hence, by extending the domain of $g(t, \cdot)$ to $\mathbb{R}$ by zero and applying Plancherel's theorem, the covariance $\mathbb{E}\, \partial_i \rho_s(X)\, \partial_j \rho_t(X)$ can be written as a frequency domain integral involving $p_t(i\lambda)$, $q_t(i\lambda)$ and $\partial_i \log X(i\lambda)$.
The limiting Fisher information is then expressed through the integral of $\partial_i \log X(i\lambda)\, \partial_j \log X(i\lambda)$ over $\lambda \in \mathbb{R}$.