Joint estimation for SDE driven by locally stable Lévy processes

Considering a class of stochastic differential equations driven by a locally stable process, we address the joint parametric estimation, based on high frequency observations of the process on a fixed time interval, of the drift coefficient, the scale coefficient and the jump activity of the process. This work extends [4], where the jump activity was assumed to be known, and also [3], where the LAN property and the estimation of the three parameters are obtained for a translated stable process. We propose an estimation method and show that the asymptotic properties of the estimators depend crucially on the form of the scale coefficient. If the scale coefficient is multiplicative, a(x, σ) = σa(x), the rate of convergence of our estimators is non diagonal and the asymptotic variance in the joint estimation of the scale coefficient and the jump activity is the inverse of the information matrix obtained in [3]. In the non multiplicative case, the results are better and we obtain a faster, diagonal rate of convergence with a different asymptotic variance. In both cases, the estimation method is illustrated by numerical simulations showing that our estimators are rather easy to implement. MSC 2010 subject classifications: Primary 60G51, 60G52, 60J75, 62F12; secondary 60H07, 60F05.


Introduction
In this paper, we consider a class of stochastic differential equations driven by a symmetric locally α-stable process and we study the joint estimation of (θ, σ, α) based on high-frequency observations of the process on the time interval [0, T] with T fixed (without loss of generality we will assume in the sequel that T = 1). In recent years, there has been growing interest in modeling with pure-jump Lévy processes (see for example Jing et al. [13] and [17]) and the estimation of such processes is of particular interest. A large literature is devoted to the parametric estimation of jump-diffusions from high-frequency observations and we know that, due to the Brownian component, the estimation of the drift coefficient is not possible without assuming that T goes to infinity. For pure-jump processes, assuming that the jump activity α ∈ (0, 2), the situation is completely different and we can estimate all the parameters on a fixed time interval. When X is a Lévy process, the first results in that direction have been established among others by Aït-Sahalia and Jacod [1], [2], Kawai and Masuda [14], [16], Masuda [18], Ivanenko, Kulik and Masuda [10] and more recently by Brouste and Masuda [5]. Concerning the parametric estimation of pure-jump driven stochastic equations, the literature is less abundant and only partial results are available. The estimation of (θ, σ) is performed by Masuda in [19], assuming that α is known and with the restriction α ∈ [1, 2). The estimation method proposed in [19] is based on an approximation (for small h) of the distribution of the normalized increment h^{-1/α}(X_{t+h} − X_t − hb(X_t, θ))/a(X_t, σ) by the α-stable distribution. However this approximation is not relevant if α < 1. To solve this problem, Clément and Gloter [6] consider the modified increment h^{-1/α}(X_{t+h} − ξ^{X_t}_h(θ))/a(X_t, σ), where (ξ^x_t(θ))_{t≥0} solves the ordinary equation dξ^x_t(θ)/dt = b(ξ^x_t(θ), θ), ξ^x_0(θ) = x. This makes it possible to estimate (θ, σ), for α ∈ (0, 2) known.
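To fix ideas, the flow ξ^x_h(θ) appearing in the modified increment can be computed with any standard ODE solver. The following Python sketch (a minimal illustration, not the authors' code, using a hypothetical linear drift b(x, θ) = −θx whose flow is known in closed form) checks the numerical solution against the exact one:

```python
import numpy as np
from scipy.integrate import solve_ivp

def xi(x, h, theta, b):
    """Solve d(xi_t)/dt = b(xi_t, theta), xi_0 = x, on [0, h]."""
    sol = solve_ivp(lambda t, y: [b(y[0], theta)], (0.0, h), [x],
                    rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

# Hypothetical linear drift: b(x, theta) = -theta * x, flow x * exp(-theta * h)
b_lin = lambda x, theta: -theta * x
x0, h, theta = 1.5, 1.0 / 256, 0.5
approx = xi(x0, h, theta, b_lin)
exact = x0 * np.exp(-theta * h)
```

For a general drift b there is of course no closed form, which is why the method of [6] works directly with the numerically computed flow.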
Turning to the efficiency of these estimation methods, the LAMN property is established in Clément et al. [7] for the estimation of (θ, σ), assuming that the scale coefficient a is constant and that (L^α_t)_t is a truncated stable process. In this paper, we perform the joint estimation of the three parameters (θ, σ, α), assuming that α ∈ (0, 2). Our methodology follows the ideas of [6] and is based on estimating functions (we refer to Sørensen [22] and to the recent survey by Jacod and Sørensen [12] for asymptotics of estimating function methods). Let us briefly recall the methodology developed in [6]. Observing that the conditional distribution of h^{-1/α}(X_{t+h} − ξ^{X_t}_h(θ))/a(X_t, σ) is close to the α-stable distribution (this is quantified in total variation distance in [6]), the idea is to approximate the transition density p_h(x, y) of the process (X_t)_t by (h^{-1/α}/a(x, σ)) ϕ_α(h^{-1/α}(y − ξ^x_h(θ))/a(x, σ)), where ϕ_α is the density of a symmetric α-stable variable S^α_1. This approximation permits the construction of a quasi-likelihood function, and a natural choice of estimating function is then the associated score function. In the present paper, the additional estimation of the jump activity α requires extending the total variation distance estimates and limit theorems established in [6] to non bounded functions, in order to prove the asymptotic properties of our estimators. We stress that these asymptotic properties are established without restriction on the jump activity α.
The estimation of θ achieves the optimal rate and the optimal information established in [7] for a simplified stochastic equation, but the rate of convergence and the asymptotic variance-covariance matrix in estimating (σ, α) depend on the function a. To take this new phenomenon into account, we distinguish between two cases.
If the function a is multiplicative (multiplicative case), a(x, σ) = σa(x), then we show that the rate of convergence is non diagonal and we compute the asymptotic variance of the estimator. This case extends the previous results established respectively in [18] and [5] for a translated α-stable process, where it is shown that the Fisher information matrix in estimating (σ, α) is singular with a diagonal norming rate, but that the LAN property holds with a non singular information matrix using a non diagonal norming rate. Furthermore, we can conjecture that in the multiplicative case our estimator is efficient, since the asymptotic variance in estimating (σ, α) is the inverse of the information matrix appearing in the LAN property established in [5] for the translated α-stable process. A consequence of the non diagonal rate is that the asymptotic errors in estimating σ and α jointly are proportional, which is also supported by our numerical simulations.
On the other hand, if the scale coefficient a does not separate σ and x (non multiplicative case), that is if s → (∂_σ a/a)(X_s, σ_0) is almost surely non constant, the result is new and surprising. Indeed, our estimator is asymptotically mixed normal with a diagonal norming rate, faster than in the multiplicative case. Moreover, this rate achieves the optimal rate of convergence in estimating σ and α marginally. In particular, this shows that, contrary to the multiplicative case, the rate in estimating (θ, σ) and α jointly coincides with the one obtained assuming that α is known. Note that efficiency in the non multiplicative case is still an open problem, since the LAMN property has not yet been established for a non constant scale coefficient a.
The paper is organized as follows. Section 2 introduces the notation and assumptions. In Section 3 we state our main results: estimation method and asymptotic properties of the estimators. The main limit theorems to prove consistency and asymptotic mixed normality of our estimators are established in Section 4. Section 5 contains some simulation results that illustrate the asymptotic properties of the estimators.

Notation and assumptions
We consider the class of one-dimensional stochastic equations

X_t = x_0 + ∫_0^t b(X_s, θ)ds + ∫_0^t a(X_{s−}, σ)dL^α_s, (2.1)

where (L^α_t) is a pure-jump locally α-stable process defined on a filtered space (Ω, F, (F_t)_{t∈[0,1]}, P). To simplify the notation we assume that θ and σ are real parameters. We observe the discrete time process (X_{t_i})_{0≤i≤n}, with t_i = i/n for i = 0, . . . , n, that solves (2.1) for the parameter value β_0 = (θ_0, σ_0, α_0), and our aim is to estimate the parameter β_0.
We make some regularity assumptions on the coefficients a and b that ensure in particular that (2.1) admits a unique strong solution. We also specify the behavior near zero of the Lévy measure of the process (L^α_t)_{t∈[0,1]}.
< ∞. This assumption is satisfied by a large class of processes: the α-stable process (g = 1), the truncated α-stable process (g = τ, a truncation function), and the tempered stable process (g(z) = e^{−λ|z|}, λ > 0).

Remark 2.1. Our results rely on Theorem 4.1 and Theorem 4.2 in [6], obtained under H2, which give a rate of convergence in total variation distance between, respectively, the rescaled distributions of X_{1/n} and L^α_{1/n} and the locally α-stable distribution and the stable distribution. The key point is that the rate of convergence ε_n satisfies √n ε_n → 0. However, as in [3], [10] and [24], we could consider, with some proof modifications (in this paper and in [6]), a more general class of locally stable processes and weaken H2. In particular, our methodology permits to consider ν symmetric admitting the decomposition

If ν_1 is supported on R \ {0}, we assume additionally that ν_1 is absolutely continuous for |z| ≤ η, with a density involving functions g_0 and g_1 that are continuously differentiable on {|z| ≤ η} and such that g_0(0) = 1. Then, setting g(z) = g_0(z) + g_1(z)|z|^{α−β}, we obtain the decomposition above. One can check that H2(b) is not satisfied for this function g, since ∂_z g is not bounded on {|z| ≤ η}. But it can be proven that the result of Theorem 4.1 in [6] remains true under the weaker assumption that z → z∂_z g(z) is bounded, which is satisfied by the g defined above. Turning to the result of Theorem 4.2 in [6] (established under the condition g(z) = 1 + O(|z|)), we can obtain (with a different proof) the slower rate of convergence ε_n = min(…).

The rate of convergence and the information in the joint estimation of (θ_0, σ_0, α_0) depend crucially on the function a, and we will prove that if a separates the parameter σ (multiplicative case), the rate of convergence is not diagonal.
We observe that in the multiplicative case assumption H1 can be written simply in terms of the function a, as soon as σ_0 > 0.
To estimate the parameter β_0 = (θ_0, σ_0, α_0), we extend the methodology proposed in [6] based on estimating equations (see also [22]). Considering X_{1/n} solution of (2.1) (with β = (θ, σ, α)) and introducing the ordinary differential equation

dξ^{x_0}_t(θ)/dt = b(ξ^{x_0}_t(θ), θ), ξ^{x_0}_0(θ) = x_0, (2.2)

it is proved in [6] (combining Theorem 4.1 and Theorem 4.2) that n^{1/α}(X_{1/n} − ξ^{x_0}_{1/n}(θ))/a(x_0, σ) converges in total variation distance to S^α_1, a stable random variable with characteristic function e^{−C(α)|u|^α}. Thus, if X_{1/n} admits a density, denoted by p_{1/n}(x_0, y, β), then p_{1/n} converges in L^1-norm to (n^{1/α}/a(x_0, σ)) ϕ_α(n^{1/α}(y − ξ^{x_0}_{1/n}(θ))/a(x_0, σ)), where ϕ_α is the density of S^α_1. We mention that the existence of the density p_{1/n} is established under stronger assumptions on the Lévy measure (essentially integrability conditions for the large jumps part), see for example [4] or [9], but is not required in our method. So, to estimate β, the previous convergence suggests considering the following approximation of the likelihood function:

L_n(β) = ∏_{i=1}^n (n^{1/α}/a(X_{(i−1)/n}, σ)) ϕ_α(z_n(X_{(i−1)/n}, X_{i/n}, β)), (2.3)

with z_n(x, y, β) = n^{1/α}(y − ξ^x_{1/n}(θ))/a(x, σ). (2.4)
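As a rough illustration of this construction (not the authors' implementation), the normalized increment z_n can be computed by solving the ODE numerically; the drift b and scale a below are placeholders supplied by the caller:

```python
import numpy as np
from scipy.integrate import solve_ivp

def z_n(x, y, theta, sigma, alpha, n, b, a):
    """Normalized increment n^{1/alpha} * (y - xi^x_{1/n}(theta)) / a(x, sigma),
    with xi the flow of the drift ODE started at x."""
    sol = solve_ivp(lambda t, u: [b(u[0], theta)], (0.0, 1.0 / n), [x],
                    rtol=1e-10, atol=1e-12)
    xi_h = sol.y[0, -1]
    return n ** (1.0 / alpha) * (y - xi_h) / a(x, sigma)

# Sanity check with b = 0 and a(x, sigma) = sigma: if the observation is
# y = x + sigma * n^{-1/alpha} * s, the normalized increment recovers s.
b0 = lambda x, theta: 0.0
a0 = lambda x, sigma: sigma
n, alpha, sigma, s = 256, 0.7, 2.0, 1.234
x = 1.0
y = x + sigma * n ** (-1.0 / alpha) * s
z = z_n(x, y, 0.0, sigma, alpha, n, b0, a0)
```

The quasi-likelihood then multiplies, over the n observed increments, the stable density evaluated at these normalized increments.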
Note that ϕ_α can be computed numerically (see for example [21]). A natural choice of estimating function is therefore the score function. This leads to the functions g_k, for k = 1, 2, 3, given in (2.6), (2.7), (2.8) below. To simplify the notation, we introduce the functions h_α, k_α and f_α. From DuMouchel [8], we know the behavior of ϕ_α and of its derivatives as |z| goes to infinity. This permits to deduce that h_α, ∂_z h_α, k_α, ∂_z k_α are bounded on R × (0, 2) and that, for |z| large enough, f_α is equivalent to a logarithm. We also observe that ∂_z f_α and z → z∂_z k_α(z) are bounded, and that z → z∂_α h_α(z) is bounded, for |z| large, by C log |z|.
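For instance, with the normalization C(α) = 1 (an arbitrary choice made here for illustration; the actual constant depends on the Lévy measure), ϕ_α can be evaluated by direct Fourier inversion of the characteristic function:

```python
import numpy as np
from scipy.integrate import quad

def phi(z, alpha):
    """Density of a symmetric alpha-stable variable with characteristic
    function exp(-|u|^alpha) (C(alpha) = 1 for illustration), computed by
    Fourier inversion: phi(z) = (1/pi) * int_0^inf cos(u z) exp(-u^alpha) du."""
    val, _ = quad(lambda u: np.cos(u * z) * np.exp(-u ** alpha),
                  0.0, np.inf, limit=200)
    return val / np.pi
```

For α = 1 this recovers the Cauchy density 1/(π(1 + z²)), a convenient check of the inversion; ∂_z ϕ_α and ∂_α ϕ_α can be obtained from analogous integral representations.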
Throughout the paper, we denote by C a generic constant whose value may change from line to line.

Main results
We estimate β by solving the equation G_n(β) = 0, where G_n is defined by (2.5) with g_1, g_2 and g_3 given by (2.6), (2.7), (2.8). We prove that the resulting estimator is consistent and asymptotically mixed normal. However, the rate of convergence and the asymptotic information matrix depend on the function a. We define the matrix rate u_n in terms of a sequence v_n, specified below, depending on the coefficient a.
Under the assumption NDNM, we obtain a diagonal rate of convergence, as stated in the following theorem.

Theorem 3.1. We assume that H1, H2 and NDNM hold. Then there exists an estimator (θ̂_n, σ̂_n, α̂_n) solving the equation G_n(β) = 0 with probability tending to 1, that converges in probability to (θ_0, σ_0, α_0). Moreover, we have the stable convergence in law with respect to σ(L^{α_0}_s, s ≤ 1), where N is a standard Gaussian variable independent of I(β_0).
Turning to the multiplicative case (assumption NDM), we have the following result.

Theorem 3.2.
We assume that H1, H2 and NDM hold, and moreover that v_{1,1} … Then there exists an estimator (θ̂_n, σ̂_n, α̂_n) solving the equation G_n(β) = 0 with probability tending to 1, that converges in probability to (θ_0, σ_0, α_0). Moreover, we have the stable convergence in law with respect to σ(L^{α_0}_s, s ≤ 1), where N is a standard Gaussian variable independent of I(β_0).

Remark 3.2.
If we have some additional information on the parameter α_0, we can replace the solution of the ordinary equation (2.2) by an approximation (see also Proposition 3.1 in [6]). In particular, if α_0 > 2/3, we can use the one-step Euler approximation ξ̄^x_{1/n}(θ) = x + b(x, θ)/n, whose error is of order ε_n, where n^{1/2}ε_n goes to zero. This control is sufficient to show that the results of Theorem 3.1 and Theorem 3.2 hold with the estimating functions Ḡ_n(β) = −∇_β log L̄_n(β), where L̄_n is the quasi-likelihood function obtained by replacing z_n by z̄_n in the expression (2.3).
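The quality of the one-step approximation can be checked numerically. The sketch below (with a hypothetical linear drift b(x, θ) = −θx, chosen only because its flow is smooth and easy to integrate) illustrates that the one-step error of the Euler scheme is of order 1/n², hence negligible at the rates involved here:

```python
import numpy as np
from scipy.integrate import solve_ivp

def flow(x, h, theta, b):
    """Accurate numerical solution of d(xi_t)/dt = b(xi_t, theta), xi_0 = x."""
    sol = solve_ivp(lambda t, y: [b(y[0], theta)], (0.0, h), [x],
                    rtol=1e-12, atol=1e-12)
    return sol.y[0, -1]

b = lambda x, theta: -theta * x   # hypothetical drift, for illustration only
x0, theta = 1.0, 0.5
errs = []
for n in (100, 200):
    h = 1.0 / n
    errs.append(abs(flow(x0, h, theta, b) - (x0 + b(x0, theta) * h)))
# halving the step divides the one-step error by about 4 (order h^2)
ratio = errs[0] / errs[1]
```

The O(1/n²) one-step error is why the Euler substitute is harmless when n^{1/2}ε_n → 0.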

Remark 3.3.
Since the matrices I(β_0) appearing in Theorem 3.1 and Theorem 3.2 are positive definite a.s., we can check that the estimator (θ̂_n, σ̂_n, α̂_n) proposed in Theorem 3.1 and Theorem 3.2 is also a local maximum of the quasi-likelihood function L_n defined by (2.3), on a set with probability tending to one (see Sweeting [23]).
For the reader's convenience, we recall the sufficient conditions established in Sørensen [22] to prove the existence, consistency and asymptotic normality of estimators based on estimating functions. To this end, we define the matrix J_n(β_1, β_2, β_3) and, for η > 0, a neighborhood of β_0, where ||·|| denotes a vector or matrix norm and A^T is the transpose of the matrix A.
With these notations, Theorem 3.1 and Theorem 3.2 are consequences of the two following conditions:
C1: a uniform convergence in probability of u_n^T J_n(β_1, β_2, β_3) u_n on a shrinking neighborhood of β_0;
C2: (u_n^T G_n(β_0))_n stably converges in law to W(β_0)^{1/2} N, where N is a standard Gaussian variable independent of W(β_0) and the convergence is stable with respect to the σ-field σ(L^{α_0}_s, s ≤ 1).
Before starting the proof, we compute explicitly u_n^T G_n(β_0) and J_n. This helps to understand how the conditions on the matrix v_n arise, depending on the assumptions on a. We use the short notation … with z_n defined by (2.4) and ξ solving (2.2). Using the relation ∂_α h_α = ∂_z f_α, we can express each term of the matrix J_n. From these computations and using the limit theorems established in Section 4, we can check conditions C1 and C2 and proceed to the proofs of Theorem 3.1 and Theorem 3.2. We first remark that in the above expressions we can replace … Furthermore, by a standard localization procedure we can assume that a is bounded. Indeed, setting a_K(x, σ) = a(x, σ)I_K(a(x, σ)), where I_K is a smooth real function equal to 1 on [−K, K] and vanishing outside [−2K, 2K], and considering the process X^K solution of (2.1) with coefficients b and a_K, we have X = X^K on Ω_K = {ω ∈ Ω; sup_{0≤t≤1} |a(X_{t−}(ω), σ_0)| ≤ K}, and P(Ω_K) → 1 as K goes to infinity. Consequently, in the next proof sections, we assume that a is bounded.
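The localization device can be made concrete as follows; this is a generic sketch (the precise smooth cutoff I_K is not specified in the text, so the classical C^∞ transition based on exp(−1/t) is used here as one possible choice):

```python
import numpy as np

def smooth_step(t):
    """C-infinity function: 0 for t <= 0, 1 for t >= 1, increasing in between,
    built from the classical bump f(s) = exp(-1/s)."""
    def f(s):
        return np.exp(-1.0 / s) if s > 0 else 0.0
    return f(t) / (f(t) + f(1.0 - t))

def I_K(x, K):
    """Smooth truncation: equal to 1 on [-K, K], vanishing outside [-2K, 2K]."""
    return 1.0 - smooth_step((abs(x) - K) / K)

def a_K(x, sigma, a, K):
    """Localized scale coefficient a_K(x, sigma) = a(x, sigma) * I_K(a(x, sigma))."""
    v = a(x, sigma)
    return v * I_K(v, K)
```

On the event where sup_t |a(X_{t−}, σ_0)| ≤ K the cutoff is inactive, so X and X^K coincide there, which is exactly what the localization argument uses.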

Condition C2
We recall that h_{α_0} and k_{α_0} are bounded and that f_{α_0} is asymptotically equivalent to the logarithm. Moreover, some straightforward computations show that Eh_{α_0}(S^{α_0}_1) = Ek_{α_0}(S^{α_0}_1) = Ef_{α_0}(S^{α_0}_1) = 0 and E(h_{α_0}k_{α_0})(S^{α_0}_1) = 0. Therefore, from Corollary 4.1 we deduce the convergence in probability, and from Theorem 4.1 we obtain the stable convergence in law, where I(β_0) is given by (3.1) and N is a standard Gaussian variable independent of I(β_0). Now, with u_n given above and using the approximation (3.10), the stable convergence in law of u_n^T G_n(β_0) follows.

Condition C1
We have to check the uniform convergence in probability of sup_{β_1,β_2,β_3 ∈ V} …, where the coefficients of the matrix J_n are given by (3.6)-(3.9).
After a meticulous study of each term appearing in the matrix u_n^T J_n(β_1, β_2, β_3) u_n, and using the approximations (3.10) and (3.11), condition C1 reduces to proving a uniform convergence in probability for functions f depending on a, b and their partial derivatives with respect to the parameters θ, σ, and for g_α belonging to the set of functions h_α, k_α, ∂_z k_α, … These functions satisfy the assumptions of Theorem 4.2. Moreover, using the symmetry of ϕ_α (ϕ_α and f_α are even) and the integration by parts formula, we can prove the equalities (3.12). The result then follows from Theorem 4.2 (convergences (4.3) and (4.4)).

Proof of Theorem 3.2
We first observe that, from NDM, ∂_σ a/a = 1/σ.

Condition C1
We will prove sup_{β_1,β_2,β_3 ∈ V} … Using the symmetry of J_n, the proof reduces to the following convergence in probability. From the expression of J_n given in (3.6)-(3.9) and using the approximations above, we just have to prove the uniform convergence in probability over β_2, β_3 ∈ V, with the norming matrix diag((log n)/α_0^2, 1).
To simplify the notation we introduce the following normalized sums:

E. Clément and A. Gloter
and from (3.7), (3.8), (3.9) we obtain the corresponding expressions; moreover, a simple computation gives … Then, using once again that v_n is bounded by log n, (3.16) is proved as soon as we have the following convergence in probability (with q ≤ 4): … Recalling the equalities (3.12), the above convergence results from (4.5) in Theorem 4.2.

Limit theorems
We state in this section some limit theorems (Central Limit Theorem and uniform Law of Large Numbers) that are crucial to obtain the asymptotic properties of our estimators. We follow the approach proposed in [6], extending the results to non bounded functions, with some uniformity with respect to the parameter α. The next key proposition extends to non bounded functions the control in total variation distance established in [6] (Theorem 4.1 and Theorem 4.2).

Proposition 4.1. Let f be a real function such that
for some constants C > 0 and q > 0. Then, assuming H1, H2 and a bounded, we have … where n^{1/2}ε_n → 0 as n goes to infinity.
Considering now (X^n_t)_{t∈[0,1]} that solves the equation …, we can check that the processes (X_{t/n}, n^{1/α_0}L^{α_0}_{t/n})_{t∈[0,1]} and (X^n_t, L^n_t)_{t∈[0,1]} have the same law, and (4.1) reduces to proving … We can split (L^n_t) into two parts (small jumps and large jumps), where μ_n and μ̃_n are respectively the Poisson random measure and the compensated Poisson random measure associated to (L^n_t). Since 2/δ > 1, we deduce, using successively Hölder's inequality and Burkholder's inequality, the required bound, since a is bounded. Considering now the large jumps part and using moreover that δ < α_0, i) follows. Observing that (4.1) implies E|n^{1/α_0}L^{α_0}_{1/n}|^δ ≤ C (taking b = 0 and a = 1), we obtain ii).
From this proposition, we obtain a Central Limit Theorem for non bounded functions.
Theorem 4.1. We assume H1, H2 and a bounded. Let h_i : R → R, i = 1, 2, 3, be C^1 functions such that … for some constants C > 0 and q > 0, and let f_i : R → R be continuous functions. We assume that Eh_i(S^{α_0}_1) = 0 for i = 1, 2, 3. Then we have the stable convergence in law with respect to σ(L^{α_0}_s, s ≤ 1), where z_n is defined by (2.4), N is a standard Gaussian variable independent of Σ, and, for 1 ≤ i, j ≤ 3, Σ_{i,j} is the corresponding limiting covariance. Proof. Using Proposition 4.1 and following the proof of Corollary 3.1 in [6], we obtain the convergence in probability for j = 1, 2, 3. Now we can extend the proof of Theorem 3.2 in [6] to non bounded functions h_j with logarithmic growth, and we obtain the stable convergence in law. An immediate consequence of Theorem 4.1 is the following convergence in probability.

Corollary 4.1. We assume H1, H2 and a bounded. Let h be a real function such that … for some constants C > 0 and q > 0, and Eh(S^{α_0}_1) = 0. Then we have the convergence in probability … We finally establish some uniform convergence results that extend Theorem 3.1 in [6].
where K_0 is a neighborhood of (θ_0, σ_0), and let (z, α) → g_α(z) be a C^1 function (with respect to (z, α)) such that ∂_z g_α is bounded (uniformly in α on compact subsets of (0, 2)) and such that

Lemma 4.2. Assuming
Numerical simulations

A multiplicative model driven by an α-stable process

We consider (X_t)_{t∈[0,1]} solution of a model with multiplicative scale coefficient, where (S^α_t)_t is a symmetric α-stable process with characteristic function u → e^{−|u|^α}. Assumption NDM holds true and we can apply the results of Section 3.1. As matrix rate, we choose v_{1,1} … Then, from the stable convergence result of Theorem 3.2, one has the convergence to Z of the vector of rescaled estimation errors. Thus, the rate of estimation is n^{1/α_0−1/2} for θ_0 and √n for α_0. Moreover, we get that the rate of estimation for σ_0 is √n/log(n), and that asymptotically the estimation errors for the parameters σ_0 and α_0 are proportional with a correlation tending to −1. Comparing with the situation of the non-multiplicative model, addressed in Theorem 3.1, we see that both parameters α_0 and σ_0 are estimated with a rate slower by a log(n) factor in the multiplicative case.
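For reproducibility, symmetric α-stable increments with characteristic function e^{−|u|^α} can be simulated with the Chambers-Mallows-Stuck representation; the sketch below then runs an Euler scheme for an illustrative multiplicative model dX_t = θX_t dt + σ dS^α_t (a hypothetical stand-in, since the exact simulated model is not reproduced here):

```python
import numpy as np

def stable_increment(alpha, size, rng):
    """Symmetric alpha-stable draws with characteristic function exp(-|u|^alpha),
    via the Chambers-Mallows-Stuck representation."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))

def simulate(theta, sigma, alpha, n, rng, x0=1.0):
    """Euler scheme on [0, 1] for the illustrative multiplicative model
    dX_t = theta * X_t dt + sigma dS^alpha_t (hypothetical drift)."""
    dS = n ** (-1.0 / alpha) * stable_increment(alpha, n, rng)
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        X[i + 1] = X[i] + theta * X[i] / n + sigma * dS[i]
    return X

X = simulate(0.5, 1.0, 0.7, 2048, np.random.default_rng(0))
```

For α = 1 the representation reduces to tan(V), i.e. a standard Cauchy variable, which gives a quick sanity check of the generator.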
From Table 1, we see that for α_0 = 0.7 the joint estimation of the three parameters works well. In particular, the estimator of the drift parameter performs extremely well for α_0 = 0.7, which is expected, since the rate of estimation is n^{1/0.7−1/2} ≈ n^{0.93}. For α_0 = 1.3 (Table 2), the estimation of σ_0 and α_0 works well, while the estimation of θ_0 has some bias which reduces slowly as n increases. For α_0 = 1.7, we found that the estimation of the drift parameter θ_0 has both a very large bias and a very large standard deviation. Actually, the convergence of the estimator θ̂_n occurs at the extremely slow rate n^{1/1.7−1/2} ≈ n^{0.0882}, and it seems impossible, in practice, to get a correct estimate of the drift parameter when α_0 = 1.7.
On the other hand, we see that the estimation of σ_0 and α_0 works well again. This means that the impossibility of correctly estimating the drift parameter for α_0 = 1.7 has no negative impact on the estimation of the other parameters.
In Tables 4-6, we give an estimate of the standard deviation of the estimation error, rescaled so that it theoretically converges to a Gaussian variable whose variance can be computed using (5.1). Let us stress that, as the asymptotic law of θ̂_n is mixed normal, the estimation error θ̂_n − θ_0 is rescaled by a factor involving a random quantity that we approximate, in practice, by a Riemann sum based on the simulated observations (X_{i/n})_{i=0,...,n}. As the entries of the matrix K_{α_0} given in (5.1) are not explicit, the theoretical asymptotic standard deviations for these rescaled errors are computed using numerical integration. These theoretical standard deviations are reported in the last line of each of Tables 4-6.
In Tables 4-6, we see that the asymptotic behavior of the estimator is exactly as predicted by the theoretical study: the rates of estimation for θ_0, σ_0 and α_0 are exactly n^{1/α_0−1/2}, n^{1/2}/log(n) and n^{1/2}. Moreover, the rescaled standard deviations are close to the theoretical ones.
In Figures 1-3, we plot the histograms of the distribution of the rescaled estimation errors, together with the density of their Gaussian limits. For the sake of brevity, we only plot the results for n = 2048 and α_0 ∈ {0.7, 1.3, 1.7}. It appears that the empirical distributions are close to their theoretical limits in all cases.
In Table 7, we display the empirical correlation between the estimatorsσ n andα n for different values of α 0 and n. As expected from the theory, in the multiplicative case this correlation tends to −1 as n → ∞.
Our last numerical experiment in the multiplicative case is related to Remark 3.2, where we state that for α_0 > 2/3 one can replace, in the contrast function, the quantity ξ^x_{1/n}(θ) by its one-step Euler approximation ξ̄^x_{1/n}(θ) = x + b(x, θ)/n. We see, by comparing Table 8 with Table 1, that the quality of estimation is the same when one uses the approximation of ξ^x_{1/n}(θ) as when one uses its true value.

Discussion about the implementation
The minimization of the contrast function (2.3) was conducted using quasi-Newton methods implemented in the Python scientific libraries. It requires computing numerically the values of the contrast function and of its derivatives, and thus involves numerous evaluations of the functions ϕ_α, ∂_z ϕ_α and ∂_α ϕ_α. These three functions are computed using their integral representations given in [20] and [21], which can be numerically intensive. However, the quantities ϕ_α(z_n(X_{(i−1)/n}, X_{i/n}, β)), ∂_z ϕ_α(z_n(X_{(i−1)/n}, X_{i/n}, β)) and ∂_α ϕ_α(z_n(X_{(i−1)/n}, X_{i/n}, β)), for the different values of i = 1, . . . , n, can be computed in parallel, using different threads for different values of i. In our numerical simulations, we used the CUDA programming language to implement the computation of the contrast function and of its derivatives with multi-threaded code on a GPU. Using an Nvidia GTX1080 GPU, the Monte-Carlo experiments presented in Tables 1-3 with n = 2048 and 1000 iterations take around 2 hours each. Hence, searching for the parameter values for one observation of length n = 2048 takes a few seconds, showing that our contrast method is implementable, and fast, in practice.
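As a toy version of this contrast minimization (much simpler than the full three-parameter problem: here b = 0 and α = 1 is assumed known, so ϕ_1 is the explicit Cauchy density and only σ is estimated), a quasi-Newton fit looks as follows:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, sigma0 = 2000, 1.0
# toy data: b = 0, alpha = 1 (Cauchy), so the increments are exactly
# sigma0 * n^{-1} * C_i with C_i standard Cauchy
dX = sigma0 / n * rng.standard_cauchy(n)

def neg_quasi_loglik(params):
    sigma = params[0]
    z = n * dX / sigma                        # normalized increments z_n
    # -log of (n / sigma) * phi_1(z), phi_1 the standard Cauchy density
    return -np.sum(np.log(n / sigma) - np.log(np.pi * (1.0 + z ** 2)))

res = minimize(neg_quasi_loglik, x0=[0.5], method="L-BFGS-B",
               bounds=[(1e-3, None)])
sigma_hat = res.x[0]
```

The full problem replaces the explicit Cauchy density by the numerically evaluated ϕ_α and optimizes jointly over (θ, σ, α), which is where GPU parallelism over i pays off.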
Theorem 3.2 states the existence of a zero of the gradient of the contrast function that yields a consistent estimator. However, it does not preclude the existence of other zeros that would not yield a consistent estimator. Nevertheless, in practice the maximization algorithms always find a consistent estimator and do not seem to be trapped at a local, or non-consistent, maximum of the quasi-likelihood function. Searching directly for the zeros of the gradient of the contrast function provides convergent estimators as well. This suggests that the zero of the gradient function might be unique for most simulations and correspond to the global maximum of the contrast function. To support this, we draw one sample path of observations (X_{i/n})_{i=0,...,n} with n = 2048, θ_0 = 0.5, σ_0 = 1, α_0 = 0.7, and explore the shape of the contrast function (2.3). In Figure 4, we plot cross sections of the contrast in (θ, σ) for α ∈ {0.5, 0.6, 0.7, 0.8, 0.9}. Figure 4a) plots the cross section for the true value α_0 = 0.7, and we see that the maximum in (θ, σ) is reached at a unique point close to the true value (θ_0, σ_0). In Figures 4b)-e), we see that the maximum over (θ, σ) is smaller than for α = α_0 = 0.7, showing that when maximizing the contrast function with respect to (θ, σ, α), the maximum will be reached for α close to α_0 = 0.7. Figure 5 shows the cross section of the contrast function at θ = θ_0 = 0.5. We see that the maximum in (α, σ) is reached near the true value (α_0, σ_0). Eventually, maximizing with respect to the three parameters with the Python routines yields β̂_n = (0.495, 0.849, 0.709), the quasi-Newton maximization algorithm converging after 18 steps.

A non multiplicative model driven by an α-stable process
We consider (X_t)_{t∈[0,1]} solution of a non multiplicative model, where (S^α_t)_t is a symmetric α-stable process. Assumption NDNM holds true, and thus we can apply Theorem 3.1. As a consequence, the rate of estimation is n^{1/α_0−1/2} for θ_0, √n for σ_0 and √n log(n) for α_0. Compared to the multiplicative case, the rate of estimation is faster by a log(n) factor for both parameters σ_0 and α_0. We run numerical simulations to see if the rate is indeed faster, in practice, in the non-multiplicative case than in the multiplicative one. The asymptotic law of the estimation error is mixed Gaussian by Theorem 3.1, and we define rescaled estimation errors that have Gaussian limit laws; from the stable convergence result of Theorem 3.1, these rescaled errors converge as in (5.2)-(5.4). In Tables 9-14, we present the results of numerical simulations conducted with the true values of the parameters θ_0 = 0.5, σ_0 = 1 and α_0 ∈ {0.7, 1.3, 1.7}. We show a Monte-Carlo evaluation, based on 1000 replications, of the mean and standard deviation of these estimators. Moreover, we evaluate the standard deviation of the rescaled errors of these estimators, defined as on the left hand side of (5.2)-(5.4), and we compare these standard deviations with the theoretical limits given by the standard deviations of the variables appearing on the right hand side of (5.2)-(5.4). From the results in Tables 9-11, we see that the estimation of the three parameters performs well for α_0 = 0.7 and α_0 = 1.3, and that the parameters σ_0 and α_0 are well estimated for α_0 = 1.7 as well. Moreover, from Tables 12-14, we see that the asymptotic behavior of the estimator is in practice very close to the description given by the theoretical results (5.2)-(5.4). In Figures 6-8, we plot the distributions of the rescaled estimation errors given by the left hand sides of (5.2)-(5.4), together with their Gaussian limits, when n = 2048 and α_0 ∈ {0.7, 1.3, 1.7}.
From these figures, we see again that the law of the estimator is very close to its theoretical description. In particular, we observe numerically that the rates of estimation of σ_0 and α_0 differ between this non-multiplicative model and the multiplicative model of Section 5.1. Another difference is that the estimation errors of σ_0 and α_0 are no longer asymptotically proportional in the non-multiplicative case, which is consistent with the numerical evaluation of the correlation between these two estimators given in Table 15.

Non linear drift S.D.E.
In this section, we consider a model with non linear drift: dX_t = ((X_t − θ)/(1 + X_t^2))dt + exp(σ sin(X_t))dL^α_t.
Here, the quantity ξ^x_{1/n}(θ) cannot be computed explicitly and we use instead the Euler approximation ξ̄^x_{1/n}(θ) = x + b(x, θ)/n. We focus on the case α_0 = 0.7; from Remark 3.2, this Euler approximation is valid since α_0 > 2/3. We compare the results for two different driving Lévy processes, one being exactly α-stable and the other one being locally α-stable.
The empirical means and standard deviations of the estimators are given in Table 16, for θ_0 = 1, σ_0 = 1 and α_0 = 0.7. In practice, we see that the estimators work well. However, the empirical standard deviation of θ̂_n seems unstable, as it does not decrease perfectly with n, and we found rather different values for different runs of the simulations (each with 1000 replications). In Figure 9, we plot the distributions of the rescaled errors together with their Gaussian limits. All errors outside the interval [−20, 20] are clipped to the interval borders. We see that the empirical distributions are close to their Gaussian limits, except for several larger values that fall outside the interval [−20, 20]. This explains why the estimators work well in practice, while the estimation of their variances can be unstable, due to a few extreme values.

Process driven by a tempered α-stable process
Here, we assume that the Lévy process (L^α_t)_t is a tempered α-stable process, whose Lévy measure is given by ν(dz) = (c_α/|z|^{1+α}) e^{−|z|} 1_{R\{0}}(z)dz. To simulate tempered stable random variables, we use the rejection-based method proposed in [15].
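A minimal rejection sampler in this spirit (a sketch under simplifying assumptions, with tempering parameter λ and unit scale; not the exact algorithm of [15]) draws one-sided positive stable variables and accepts each draw S with probability e^{−λS}, exploiting the fact that exponential tempering of a one-sided stable law amounts to an exponential tilt of its density; a symmetric draw is then the difference of two independent one-sided draws:

```python
import numpy as np

def one_sided_stable(alpha, size, rng):
    """Totally skewed (beta = 1) positive alpha-stable draws, alpha in (0, 1),
    via the Chambers-Mallows-Stuck representation (unit scale, up to a
    constant depending on alpha)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    B = np.pi / 2
    return (np.sin(alpha * (V + B)) / np.cos(V) ** (1.0 / alpha)
            * (np.cos(V - alpha * (V + B)) / W) ** ((1.0 - alpha) / alpha))

def tempered_one_sided(alpha, lam, size, rng):
    """Exponential tempering by rejection: accept a positive stable draw S
    with probability exp(-lam * S); accepted draws follow the tempered law."""
    out = np.empty(0)
    while out.size < size:
        S = one_sided_stable(alpha, 4 * size, rng)
        U = rng.uniform(0.0, 1.0, S.size)
        out = np.concatenate([out, S[U <= np.exp(-lam * S)]])
    return out[:size]

def symmetric_tempered(alpha, lam, size, rng):
    """Symmetric tempered stable draw: difference of two independent
    one-sided tempered draws (the symmetric Levy measure splits into its
    positive and negative parts)."""
    return (tempered_one_sided(alpha, lam, size, rng)
            - tempered_one_sided(alpha, lam, size, rng))
```

The acceptance probability decreases as the time step (hence the stable scale) grows, which is why rejection-based schemes are most effective for high-frequency increments.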
The empirical means and standard deviations of the estimator are given in Table 17, for θ_0 = 1, σ_0 = 1 and α_0 = 0.7. We see that the estimator works very well in practice. In particular, the estimation of the drift parameter has a smaller standard deviation than in the stable case. In Figure 10, we plot the distributions of the rescaled estimation errors together with their Gaussian limits. We see that the empirical distributions fit the theoretical ones very well. In particular, we do not observe extreme values, as was the case when the model is driven by a stable process.