Parametric estimation for discretely observed stochastic processes with jumps

Abstract: We consider a two-dimensional stochastic process (X, Y), which may have jump components and is not necessarily ergodic. An unknown parameter θ enters the coefficients of (X, Y). The aim of this paper is to estimate θ from a regularly spaced sample of (X, Y). When the dynamics of X are known, an estimator is constructed by a moment-based method. We show that this estimator works whenever the Blumenthal-Getoor index of the jump part of Y is less than 2. Perhaps most interesting is its rate of convergence: when that index does not exceed 1, the rate is $1/\sqrt n$, as when the underlying processes are not contaminated by jumps. When the dynamics of X are unknown, we introduce an approach based on spot volatility estimators to estimate θ. This approach works even if the sample is contaminated by microstructure noise.


Introduction
In this work, we consider a process (X, Y) defined by a stochastic differential equation driven by W, a standard Brownian motion, and J, a Lévy process with no Brownian part. The process (X, Y) depends on an unknown parameter θ. The goal of this note is to estimate θ from regularly spaced observations of (X, Y).
Parametric estimation for discretely observed processes has been intensively studied when the underlying process (X, Y) possesses some ergodic properties (see [12,33,34] and the references therein). To the best of our knowledge, there are very few results in the non-ergodic situation, and most of them concern continuous diffusion processes (see [11,31]).
This note is thus the first attempt to construct estimators that work even when the underlying processes X and Y contain jump components, without any ergodicity assumption. From a practical point of view, one may think of X, for instance, as the (log) price of an asset or an exchange rate process, and of Y as a state variable such as the (log) price of another asset correlated with X (see [4,9]). Allowing X and Y to have jump components is nowadays of great interest: recent research shows that models with jumps can fit skews and smiles that continuous models can hardly capture (see [5] and the references therein). Needless to say, many classical estimation schemes for continuous diffusion processes are not suitable for processes with jumps.
In this paper, we present two classes of estimators for the parameter θ. When the dynamics of X are known, an estimator is constructed by a moment-based method, in the spirit of Jacod's recent work [11] on continuous diffusion processes. As far as the author knows, there are basically two ways to overcome the difficulties caused by jumps. The first approach makes use of a threshold parameter (see [18,33]). Although this approach can deal with jump processes of infinite activity, its results depend very sensitively on the threshold, which is in general difficult to choose efficiently. We adopt here the second approach, the multipower variation method, which has been developed recently in [2,23,29,30]. We show that our estimator works whenever the Blumenthal-Getoor index α of J is less than 2. In particular, if α ≤ 1, the estimator $\hat\theta_n$ converges to the true parameter $\theta^*$ at the optimal rate $1/\sqrt n$ (as when the underlying processes are not contaminated by jumps, see [11]), in the sense that $n^{\delta}(\hat\theta_n - \theta^*) \xrightarrow{P} 0$ for any δ < 1/2. On the other hand, if α ∈ (1, 2), the level of activity of the jump process J does affect the behavior of $\hat\theta_n$: its rate of convergence is then $n^{-(1/\alpha - 1/2)}$, in the same sense.
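The multipower idea can be illustrated with its simplest member, bipower variation. The sketch below is not the estimator of this paper: it only assumes a toy path with constant volatility σ = 0.2 on [0, 1] and two hypothetical jumps of size 0.5, and compares the realized variance, which absorbs the squared jumps, with the bipower variation $(\pi/2)\sum_i |\Delta^n_i Y|\,|\Delta^n_{i-1} Y|$, which stays close to $\sigma^2 T$.

```python
import math
import random


def realized_variance(increments):
    """Sum of squared increments: includes the squared jumps."""
    return sum(d * d for d in increments)


def bipower_variation(increments):
    """(pi/2) * sum |d_i| |d_{i-1}|: a jump is multiplied by a small neighbouring increment."""
    return (math.pi / 2) * sum(abs(a) * abs(b) for a, b in zip(increments[1:], increments[:-1]))


def simulate_increments(n, sigma, T=1.0, jump_times=(0.3, 0.7), jump_size=0.5, seed=0):
    """Brownian increments with constant volatility plus a few fixed jumps (hypothetical toy setup)."""
    rng = random.Random(seed)
    dt = T / n
    incs = [sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(n)]
    for tau in jump_times:
        incs[min(int(tau / dt), n - 1)] += jump_size
    return incs


incs = simulate_increments(n=10_000, sigma=0.2)
rv = realized_variance(incs)  # close to 0.04 + 2 * 0.5**2 = 0.54
bv = bipower_variation(incs)  # close to sigma**2 * T = 0.04
```

With only finitely many jumps, their contribution to the bipower sum vanishes as the mesh shrinks; infinite-activity jumps require the finer analysis carried out in the paper.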
When the dynamics of X are unknown, we introduce a new method, the spot volatility estimator-based approach, to estimate θ. More precisely, under some assumptions, we approximate $\sigma(\theta, X(t^n_i))^2$, i = 1, ..., n, by statistics $\hat\sigma(t^n_i)^2$, i = 1, ..., n, which depend only on the observations of Y. The estimator $\hat\theta_n$ is then chosen to minimize a contrast function built from these statistics, with a suitable function A. An interesting feature of this method is that it works even if the observations of Y are contaminated by microstructure noise. This happens, for example, when Y is observed on a high-frequency time scale (e.g. intraday data) while X is observed on a lower-frequency time scale (e.g. daily data). A naïve way to avoid the effect of microstructure noise on the estimators is to sample Y on a coarser time scale; however, throwing away so much data can hardly be optimal. Naturally, the rate of convergence of $\hat\theta_n$ depends on the efficiency of the spot volatility estimators; hence, when microstructure noise and jumps are present, the rates are slower than in the noise-free case and the $1/\sqrt n$ rate cannot be attained. Nor can $\hat\theta_n$ easily reach the rate achieved when the dynamics of X are known. A comprehensive discussion of spot volatility estimation in various situations can be found in [1, 15-17, 19, 23-28] and [21].
The present paper is organized in the following way. The moment-based approach and spot volatility estimator-based approach are presented in Sections 2 and 3, respectively. In each section a numerical example is carried out to illustrate the behavior of the estimators. Some spot volatility estimation schemes are provided in Section 4.

Preliminary
Throughout this section, we consider the process (X, Y) defined on a filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge0}, P)$ by (2.1), where W and $\tilde W$ denote two Brownian motions, which may be correlated, such that the σ-algebra $\sigma\{W_t - W_s, \tilde W_t - \tilde W_s;\ t \ge s\}$ is independent of $\mathcal F_s$ for all s ∈ [0, T). The functions a(θ, x), σ(θ, x) and $\tilde\sigma(\theta, x)$ are known; the function b is unknown. The parameter θ belongs to a compact subset Θ of ℝ. J and $\tilde J$ are Lévy processes with no Brownian component. We also assume that the σ-algebra $\sigma\{J_t - J_s, \tilde J_t - \tilde J_s;\ t \ge s\}$ is independent of $\mathcal F_s$ for all s ∈ [0, T). Common assumptions in the literature require the jump processes J and $\tilde J$ to be either independent of $\sigma\{W, \tilde W\}$ or of finite activity (see [2]); our discussion, however, needs neither. The rate of convergence of our estimator depends on the Blumenthal-Getoor index α of J, defined by
$\alpha = \inf\{ r \ge 0 : \int_{|x| \le 1} |x|^r \, \nu(dx) < \infty \}$,
where ν is the Lévy measure of J. Necessarily, α ∈ [0, 2]. We suppose that J has a finite second moment and Blumenthal-Getoor index α < 2. Furthermore, we suppose that the Lévy process $\tilde J$ has a known characteristic triplet $(\tilde\mu, 0, \tilde\nu)$, where $\tilde\mu$ and $\tilde\nu$ denote the drift and the Lévy measure of $\tilde J$, respectively (see [5]). To simplify the argument, $\tilde J$ is also assumed to have finite moments of all orders. We now assume that the coefficients of equations (2.1) satisfy the following conditions.
(A1).
Let $\theta^*$ denote the true value of the parameter θ. Following Jacod [11], we introduce the following mild assumptions on the identifiability of $\theta^*$ from the diffusion coefficient σ(θ, x).
In the following we denote $\Delta_n = T/n$, $t^n_i = i\Delta_n$, $X^n_i = X_{t^n_{i-1}}$ and $\Delta^n_i Z = Z_{t^n_i} - Z_{t^n_{i-1}}$ for any process Z. Before constructing our estimator of θ, we make some remarks on the model (2.1). First, although X is observed, a discrete sample of X alone may not suffice for inference about θ. This happens, for example, when the dynamics of X do not depend on θ at all (see Section 2.3 for a non-trivial example). Second, the model (2.1) appears in the control and system theory of stochastic systems, where X and Y are respectively the input and output of a real-time stochastic system (see [35] and the references therein). Third, in the special case where X and Y coincide, the model (2.1) reduces to a one-dimensional jump-diffusion. Such jump-diffusion models are widely used, not only in finance to model asset prices [6,20,32], but also in soil moisture modelling [22], hydrology [3], population models [8], etc. However, as mentioned above, estimation of θ in the non-ergodic setting does not seem to have been discussed so far in the literature.
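The indexing conventions above translate directly into code. In this sketch, a sampled path is assumed to be a Python list holding the values at $t^n_0, \ldots, t^n_n$; the helper names are illustrative only.

```python
def time_grid(T, n):
    """Return [t^n_0, ..., t^n_n] with t^n_i = i * Delta_n and Delta_n = T / n."""
    delta_n = T / n
    return [i * delta_n for i in range(n + 1)]


def increments(z):
    """Return [Delta^n_1 Z, ..., Delta^n_n Z], where Delta^n_i Z = Z(t^n_i) - Z(t^n_{i-1})."""
    return [z[i] - z[i - 1] for i in range(1, len(z))]


def left_endpoints(x):
    """Return [X^n_1, ..., X^n_n], where X^n_i = X(t^n_{i-1}) is the value at the left end of interval i."""
    return x[:-1]


grid = time_grid(1.0, 4)
dz = increments([0.0, 1.0, 3.0, 6.0, 10.0])
xl = left_endpoints([0.0, 1.0, 3.0, 6.0, 10.0])
```

Note the shift in $X^n_i$: it is the value at the left end of the i-th interval, matched with the increment $\Delta^n_i Z$ over that interval.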

Estimator
Thanks to Assumption (A1), the above equation always has a unique solution. In practice, since the functions a and $\tilde\sigma$ and the characteristic triplet of $\tilde J$ are known, ψ can be computed, for instance, by Monte Carlo methods. The contrast function $U_n$ is defined by (2.5). We will show that $U_n$ is continuous in θ; hence it attains a minimum on the compact set Θ, and by the measurable selection theorem we can find a measurable random variable $\hat\theta_n$ such that
$U_n(\hat\theta_n) = \min_{\theta\in\Theta} U_n(\theta)$. (2.6)
Denote $\delta_0 = \min\{1/2,\ 1/\alpha - 1/2\}$. We now state the main result of this paper.
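The exponent $\delta_0$ governs the rate in Theorem 2.1, and a small helper makes the two regimes visible. The convention that $1/\alpha = +\infty$ when α = 0 is an assumption of this sketch.

```python
def delta0(alpha):
    """Return delta_0 = min{1/2, 1/alpha - 1/2} for a Blumenthal-Getoor index alpha in [0, 2).

    alpha = 0 is read as 1/alpha = +infinity, so delta_0 = 1/2.
    """
    if not 0.0 <= alpha < 2.0:
        raise ValueError("the Blumenthal-Getoor index must lie in [0, 2)")
    if alpha == 0.0:
        return 0.5
    return min(0.5, 1.0 / alpha - 0.5)


for a in (0.0, 0.7, 1.0, 1.2, 1.5):
    print(a, delta0(a))
```

For α ≤ 1 the helper returns 1/2; for α = 1.2, one of the indices used in the numerical example, it returns 1/3, and the exponent tends to 0 as α approaches 2.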
Theorem 2.1. Assume that $\theta^*$ is in the interior of Θ. Then $n^{\delta}(\hat\theta_n - \theta^*) \xrightarrow{P} 0$ for any δ less than $\delta_0$.
It should be noted that [33] proposes an estimator of $\theta^*$ whose rate of convergence is $\sqrt n$, independently of the jump behavior of J. However, the setting of [33] is quite different from ours: the authors treat ergodic processes and let the terminal time T tend to infinity.

Numerical Example
In this section we consider the following toy model, where J is an α-stable Lévy process with jumps truncated at 1; in other words, J is a Lévy process with no Brownian component whose Lévy measure is
$\nu(dx) = \Big( \frac{A}{x^{1+\alpha}} \mathbf 1_{\{0 < x \le 1\}} + \frac{B}{|x|^{1+\alpha}} \mathbf 1_{\{-1 \le x < 0\}} \Big)\, dx$
for some positive constants A and B (see [5]).
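Jumps of a truncated α-stable process can be approximated by keeping only those with |x| > ε, which form a compound Poisson process. The sketch below assumes the truncated-stable Lévy measure with density $A x^{-1-\alpha}$ on (0, 1] and $B|x|^{-1-\alpha}$ on [-1, 0), discards jumps with |x| ≤ ε, and omits the compensating drift $-t\int_{\varepsilon<|x|\le 1} x\,\nu(dx)$, which vanishes only when A = B. It is an illustrative approximation, not the simulation scheme behind Figures 1 and 2; the function names are hypothetical.

```python
import math
import random


def _intensity(c, alpha, eps):
    """Mass c * int_eps^1 x^(-1-alpha) dx of jumps with size in (eps, 1]."""
    if alpha == 0.0:
        return c * math.log(1.0 / eps)
    return c * (eps ** -alpha - 1.0) / alpha


def _sample_size(rng, alpha, eps):
    """Inverse-CDF draw from the density proportional to x^(-1-alpha) on [eps, 1]."""
    u = rng.random()
    if alpha == 0.0:
        return eps ** (1.0 - u)
    return (eps ** -alpha - u * (eps ** -alpha - 1.0)) ** (-1.0 / alpha)


def truncated_stable_jumps(A, B, alpha, T, eps=1e-3, seed=0):
    """Return time-sorted (time, size) pairs of the jumps with |size| > eps on [0, T].

    Positive and negative jumps are independent compound Poisson processes.
    Small jumps (|size| <= eps) and the compensating drift are ignored (approximation).
    """
    rng = random.Random(seed)
    jumps = []
    for c, sign in ((A, 1.0), (B, -1.0)):
        lam = _intensity(c, alpha, eps)
        t = rng.expovariate(lam)  # exponential waiting times give a Poisson process
        while t <= T:
            jumps.append((t, sign * _sample_size(rng, alpha, eps)))
            t += rng.expovariate(lam)
    return sorted(jumps)


jumps = truncated_stable_jumps(A=1.0, B=1.0, alpha=0.7, T=1.0, eps=1e-3, seed=1)
```

With A = B = 1, α = 0.7 and ε = 10⁻³, the expected number of retained jumps on [0, 1] is about 357; the count grows like $\varepsilon^{-\alpha}$ as ε decreases, which is why infinite activity is costly to simulate exactly.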
It is worth mentioning that in this model, although X does depend on θ, a discrete sample of X, or even a continuous record of it, does not allow consistent estimation of θ when the terminal time T is fixed (see [12]). It is easy to verify that this model satisfies conditions (A1) and (A2). The experiment is designed as follows: we fix $\theta = \theta^* = 2$ and simulate X and Y by Euler's method with a very small time-discretization step. We consider the error $\mathrm{Error}_n = \hat\theta_n - \theta^*$.
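The text specifies only that X and Y are simulated by Euler's method on a fine grid. A generic Euler scheme for a scalar equation $dX_t = a(X_t)\,dt + s(X_t)\,dW_t + dJ_t$ is sketched below. The coefficient functions are hypothetical placeholders, not the toy model's coefficients, and the jumps are supplied as a list of (time, size) pairs and added on the step that contains them.

```python
import math
import random


def euler_with_jumps(x0, drift, vol, jumps, T, n_steps, seed=0):
    """Euler scheme for dX = drift(X) dt + vol(X) dW + dJ on a uniform grid of n_steps steps.

    `jumps` is a list of (time, size) pairs; each jump is added on the step containing its time.
    `drift` and `vol` are hypothetical placeholder coefficient functions.
    Returns the list [X(0), X(dt), ..., X(T)].
    """
    rng = random.Random(seed)
    dt = T / n_steps
    jump_incr = [0.0] * n_steps
    for t, size in jumps:
        jump_incr[min(int(t / dt), n_steps - 1)] += size
    x, path = x0, [x0]
    for k in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Brownian increment over [k dt, (k+1) dt]
        x = x + drift(x) * dt + vol(x) * dw + jump_incr[k]
        path.append(x)
    return path


path = euler_with_jumps(1.0, lambda x: 0.0, lambda x: 0.0, [(0.5, 1.0)], T=1.0, n_steps=10)
drift_only = euler_with_jumps(0.0, lambda x: 1.0, lambda x: 0.0, [], T=1.0, n_steps=10)
```

Coefficients are evaluated at the left endpoint of each step, which is what makes this the (explicit) Euler scheme; a very small step is needed because the scheme's weak and strong errors shrink only polynomially in dt.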
We take successively n = 10^3 and n = 10^4 observations and Blumenthal-Getoor indices α = 0, α = 0.7 and α = 1.2. After repeating the simulation 1000 times in each case, we obtain the histograms of $\mathrm{Error}_n$ shown in Figures 1 and 2. The estimator behaves better for small α; its quality becomes slightly worse when the jump part has higher activity, but remains acceptable.

Proofs
From now on, the symbol C denotes a generic positive constant, which may change from line to line but does not depend on t, n or θ.

Estimates on moments
We first state a few lemmata which will be used later. This lemma can be proved by carefully following the argument in [13], Lemma 4.12.
The next lemma gives a bound for conditional moments of process X defined in (2.1).

Lemma 2.3.
Under assumption (A1), for each p ≥ 1, there exists a constant $C_p$ such that for any 0 ≤ s < t ≤ T, On the other hand, by Theorem 1 in [14], for any p ≥ 1, there exists a constant $K = K(p, T, \tilde J)$ such that for any 0 ≤ s < t ≤ T. Hence, it follows from Jensen's inequality, assumption (A1) and Lemma 2.2 that Applying Gronwall's inequality, we get the desired result.
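For reference, the form of Gronwall's inequality commonly used in moment bounds of this kind is recalled below. That the proof above uses exactly this form is an assumption, since its intermediate inequalities are not displayed.

```latex
\text{If } u \ge 0 \text{ is bounded and measurable on } [s,T] \text{ and, for some } a, b \ge 0,\quad
u(t) \le a + b \int_s^t u(r)\,dr \quad \text{for all } t \in [s,T],
\\
\text{then}\quad u(t) \le a\, e^{b(t-s)} \quad \text{for all } t \in [s,T].
```

Here u(t) would typically be a conditional moment of order p of $X_t - X_s$, with a and b depending on p, T and the constants in (A1).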
We now split the proof of Theorem 2.1 into several lemmata. First we give some estimates for the function ψ(θ, x).
hence, for any s ≤ 1, iii) By classical differentiation properties of stochastic differential equations (see for example [31]), we have Using condition (A1), Gronwall's lemma and a routine argument from SDE theory, we obtain where $C_p$ is a constant which depends only on p. Next, we have and hence This estimate implies iii) for j = 1. The proof for j = 2, 3 is similar and is omitted.
then there exists a constant C such that
(2.11)
where $\mathcal F^n_i = \mathcal F_{t^n_i}$.
Proof. By Lemma 2.2, it follows that Similarly, Now we prove (2.11) for j = 0. We have By Lemma 2.4, it follows that Taking into account (2.12) and (2.13), we get the desired result. The proof for j = 1, 2, 3 is similar and is omitted.
We recall the following moment estimate for Lévy processes (see [14,23]).
The following lemma gives a uniform estimate for U n (θ) −Ũ n (θ).
We introduce the following auxiliary function:
(2.15)
Lemma 2.9. $F(\theta, x)$ is three times differentiable in θ, and

Proof. It follows by Lemma 2.4 that
We denote By following an argument similar to the proof of Lemma 2.6, we can show that $E(\xi_j^2 \mid \mathcal F^n_{i-1}) \le C A(X^n_i)^2 \Delta_n^2$, j = 0, 1. This estimate implies the first part of the lemma. The second part can be proved by a similar argument.
Lemma 2.10. For each j = 0, 2,
Proof. Thanks to condition (A1), the quantities $A(X^n_i)^j\, \partial^j F(\theta, X^n_i)/\partial\theta^j$ are uniformly bounded by a constant. By applying Itô's formula to the functions F(θ, x) and $\partial^2 F(\theta, x)/\partial\theta^2$, This fact, together with Lemma 2.9, leads to On the other hand, it follows from Lemma 2.6 that This relation and (2.17) yield Furthermore, by Lemma 2.6, we have hence, for any $\theta_1, \theta_2 \in \Theta$, It also follows from Lemma 2.6 that there exists a constant C such that This fact, together with (2.18), (2.19) and Theorem 20 in [10], yields By taking into account Lemma 2.8, we get the desired result.
Lemma 2.11. For each $\delta \in (0, \delta_0)$, there exists a constant C such that hence it follows from Lemma 2.6 that Therefore, Combining this relation with Lemma 2.8 yields the desired result.

Proof of Theorem 2.1
We are now in a position to prove the main theorem. First, we show that
$\hat\theta_n \xrightarrow{P} \theta^*$ as $n \to \infty$. (2.20)
Let us denote and, for each ε, η > 0, Because of condition (A2), $\lim_{\eta\to0} P(C(\epsilon, \eta)) = 1$. Since taking into account Lemma 2.10, we get (2.20). Next, applying Taylor's expansion, we have where $\mu_n$ is a random point between $\theta^*$ and $\hat\theta_n$. Since $\theta^*$ is in the interior of Θ, for any n large enough we have $\partial U_n(\hat\theta_n)/\partial\theta = 0$, and It follows from (2.20) that $\mu_n \xrightarrow{P} \theta^*$, and by virtue of Lemma 2.10, we have where the last equality follows from condition (A2). By Lemma 2.11, for any $\delta \in (0, \delta_0)$, we have Combining this fact with (2.21) and (2.22) yields for any $\delta \in (0, \delta_0)$, and this completes the proof.
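The Taylor step in this proof follows the usual pattern for minimum-contrast estimators with a scalar parameter. Schematically, writing $\partial_\theta$ for the derivative in θ and using the mean value theorem on $\partial_\theta U_n$:

```latex
0 = \partial_\theta U_n(\hat\theta_n)
  = \partial_\theta U_n(\theta^*) + \partial_\theta^2 U_n(\mu_n)\,(\hat\theta_n - \theta^*),
\qquad\text{hence}\qquad
n^{\delta}(\hat\theta_n - \theta^*)
  = -\,\frac{n^{\delta}\,\partial_\theta U_n(\theta^*)}{\partial_\theta^2 U_n(\mu_n)}
\quad\text{whenever } \partial_\theta^2 U_n(\mu_n) \neq 0 .
```

Presumably, the denominator is kept away from zero in probability by Lemma 2.10 and condition (A2), while the numerator tends to zero in probability by Lemma 2.11, which yields Theorem 2.1.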

Preliminary
In this section, we consider a process (X, Y) defined on a filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\in[0,T]}, P)$ and given by the following stochastic differential equation, where W is a Brownian motion, J is a Lévy process without Brownian part, and the parameter θ belongs to a compact subset Θ of ℝ. Assume that we observe X without microstructure noise on the time grid $t^n_i = iT/n$, i = 0, 1, ..., n. The process Y is observed with microstructure noise on another time grid $t^m_j = jT/m$, j = 0, 1, ..., m. More precisely, at each time $t^m_j$ we observe not $Y(t^m_j)$ but $\tilde Y(t^m_j) = Y(t^m_j) + \epsilon(t^m_j)$, where ε(·) is microstructure noise. We suppose that ε(·) satisfies the following assumption.
(MN). i) For any p > 0, there exists a constant $\vartheta_p$ such that $\sup_{j,m} E|\epsilon(t^m_j)|^p \le \vartheta_p$.
ii) For each m, the random variables $\{\epsilon(t^m_j);\ j = 0, \ldots, m\}$ are independent and have the same expectation.
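The observation scheme can be simulated as below. The sketch assumes i.i.d. Gaussian noise $N(0, \varepsilon^2)$, the choice made in the numerical examples of this section; i.i.d. Gaussian noise has moments of all orders and a common mean, so it is one instance of assumption (MN). The function names are hypothetical.

```python
import random


def observation_grids(T, n, m):
    """Grids t^n_i = iT/n (noise-free observations of X) and t^m_j = jT/m (noisy observations of Y)."""
    return [i * T / n for i in range(n + 1)], [j * T / m for j in range(m + 1)]


def observe_with_noise(y_values, noise_sd, seed=0):
    """Return tilde Y(t^m_j) = Y(t^m_j) + eps(t^m_j) with i.i.d. N(0, noise_sd^2) noise."""
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, noise_sd) for y in y_values]


x_grid, y_grid = observation_grids(T=1.0, n=4, m=8)
y_true = [0.5 * t for t in y_grid]  # hypothetical stand-in for a path of Y on the fine grid
y_obs = observe_with_noise(y_true, noise_sd=0.01, seed=3)
```

Keeping the two grids separate reflects the setting where X is sampled at a lower frequency (n points) than Y (m points).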
In the following, we denote $X^n_i = X(t^n_i)$. The following assumption plays a key role in the construction of our estimators.
for some positive constants L and $\delta_0$. Here, $\theta^*$ is the true value of the parameter θ. Assume further that $m = O(n^{\kappa})$ for some κ > 0, and denote $\delta = \kappa\delta_0$.
In Section 4, under conditions on the integrability and Hölder continuity of the coefficients b and σ, we present several classes of estimators $\{\hat\sigma(t^n_i)^2\}$ which satisfy (B1). We will also use the following assumptions, gathered here for easy reference. (B2). The function σ(θ, x) is twice differentiable in θ, and there exists a function (3.6)

Estimator
The contrast function $g_n$ is defined by (3.7). Since the function θ ↦ $g_n(\theta)$ is continuous, it attains a minimum on the compact set Θ, and by the measurable selection theorem we can find a variable $\hat\theta_n$, measurable with respect to the observed σ-algebra at stage n, satisfying
$g_n(\hat\theta_n) = \inf_{\theta\in\Theta} g_n(\theta)$. (3.8)

Proof. For each n and ε > 0, we denote It follows from Assumption (B3) that, for any ε > 0, there exists $\epsilon_1 > 0$ such that the event $A_\infty = \{\liminf_{n\to\infty} \zeta_n(\epsilon) > \epsilon_1\}$ has probability larger than 1 − ε/3. We denote Since the sequence $\{A_k;\ k \ge 1\}$ is nondecreasing and On the other hand, we have Hence, it follows from (3.9) and Markov's inequality that
$\limsup_{n\to\infty} P(|\hat\theta_n - \theta^*| > \epsilon) \le \frac{\epsilon}{2} + \limsup_{n\to\infty} \frac{8}{\epsilon_1} E g_n(\theta^*)$. (3.10)
Furthermore, it follows from Hölder's inequality and Assumption (B1) that for any ε > 0. This relation yields the desired result.

Now we are able to state the main theorem of this section. It shows that the parametric estimator $\hat\theta_n$ converges at the same rate as the spot volatility estimator. The rate of convergence is discussed further at the end of Section 4.
Theorem 3.2. Suppose that Assumptions (B1)-(B4) hold. If the true parameter $\theta^*$ is in the interior of Θ, then the estimators $\hat\theta_n$ are $n^{\delta}$-consistent, in the sense that the sequence $n^{\delta}(\hat\theta_n - \theta^*)$ is tight.
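As an illustration of how a minimizer satisfying (3.8) might be computed on a compact interval Θ, the sketch below uses a hypothetical least-squares contrast $\frac1n\sum_i (\sigma(\theta, X^n_i)^2 - \hat\sigma(t^n_i)^2)^2$, which is an assumption and not necessarily the contrast (3.7), together with a hypothetical coefficient $\sigma(\theta, x) = \theta\sqrt{1+x^2}$. A coarse grid search is followed by golden-section refinement around the best grid point; this is one simple choice, not a scheme proposed in the paper.

```python
import math


def contrast(theta, xs, sigma2_hat, sigma):
    """Hypothetical least-squares contrast (1/n) * sum (sigma(theta, X_i)^2 - hat sigma_i^2)^2."""
    return sum((sigma(theta, x) ** 2 - s2) ** 2 for x, s2 in zip(xs, sigma2_hat)) / len(xs)


def minimize_on_interval(f, lo, hi, grid_size=200, tol=1e-8):
    """Grid search on [lo, hi], then golden-section search in the bracket around the best grid point."""
    step = (hi - lo) / grid_size
    best = min((lo + k * step for k in range(grid_size + 1)), key=f)
    a, b = max(lo, best - step), min(hi, best + step)
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c          # minimum lies in [a, d]
            c = b - r * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + r * (b - a)
    return (a + b) / 2.0


def sigma_model(theta, x):
    """Hypothetical diffusion coefficient sigma(theta, x) = theta * sqrt(1 + x^2)."""
    return theta * math.sqrt(1.0 + x * x)


xs = [0.1 * k for k in range(50)]
sigma2_hat = [sigma_model(2.0, x) ** 2 for x in xs]  # noise-free stand-in for spot volatility estimates
theta_hat = minimize_on_interval(lambda th: contrast(th, xs, sigma2_hat, sigma_model), 0.5, 5.0)
```

The grid step guards against local minima of a non-convex contrast, while golden-section search only assumes unimodality inside the final bracket.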

Example: X is a deterministic process
Let (X, Y, $\tilde Y$) be defined by where J is a stable Lévy process with stable index β = 0.7 and ε(·) has the normal distribution $N(0, \varepsilon^2)$.
After repeating the simulation 1000 times, we obtain the histogram of the error $\hat\theta_n - \theta^*$ shown in Figure 3.

Example: X is a stochastic process
Let (X, Y, $\tilde Y$) be defined by where W and $\tilde W$ are two Brownian motions, J is a stable Lévy process with stable index β = 0.7 and ε(·) has the normal distribution $N(0, \varepsilon^2)$.
After repeating the simulation 1000 times, we obtain the histogram of the error $\hat\theta_n - \theta^*$ shown in Figure 4.
Remark. In practice, an effective concrete scheme for computing $\hat\theta_n$ in (2.6) and (3.8) is essential. This problem will be discussed in future work.

Spot volatility estimators
In this section, we present some simple estimation schemes for the spot volatility and study their rates of convergence in the $L^1$ sense. Further discussion of these schemes can be found in [23], and an improvement of them is presented in [27]. Similar schemes for estimating integrated volatility were proposed in [29,30]. Other estimation schemes, based on the Fourier series method, were introduced in [15-17, 19, 21].
We consider a stochastic process $(Y(t))_{t\ge0}$ defined on the filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge0}, P)$ and given by where W is a standard Brownian motion, J a Lévy process with no Brownian part, A a measurable drift process, and B a process adapted to the filtration $(\mathcal F_t)_{t\ge0}$. Let α denote the Blumenthal-Getoor index of J. Before stating our estimators, we introduce the following assumptions: and $\mathcal F_t$ is independent of the σ-algebra $\sigma(W_s - W_t,\ s > t)$ for all t ≥ 0.
(C2). The volatility coefficient B satisfies the following Hölder continuity condition for some β ∈ (0, 1], where L is a constant. This proposition can be proved by following the arguments in the proofs of Theorem 3.1 in [23] and Proposition 2 in [25].
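One simple jump-robust spot volatility estimator in the spirit of the multipower schemes cited above is a local bipower variation over the k increments preceding t: $\hat B(t)^2 = \frac{\pi}{2}\,\frac{1}{(k-1)\Delta_m}\sum_j |\Delta_j Y|\,|\Delta_{j-1} Y|$. It is given only as an illustrative sketch; the schemes and window choices analysed in this section and in [23] may differ. The simulated path (constant B = 0.3, no drift, one jump of size 0.2) is hypothetical.

```python
import math
import random


def spot_bipower(y, T, t, window):
    """Local bipower estimate of B(t)^2 from observations y[0..m] on the grid j*T/m.

    Uses the `window` increments that end at the grid point closest to t.
    """
    m = len(y) - 1
    dt = T / m
    end = min(int(round(t / dt)), m)
    start = max(end - window, 0)
    d = [y[j] - y[j - 1] for j in range(start + 1, end + 1)]
    if len(d) < 2:
        raise ValueError("need at least two increments before t")
    pairs = sum(abs(a) * abs(b) for a, b in zip(d[1:], d[:-1]))
    # E|Z1||Z2| = 2/pi for independent standard normals, hence the pi/2 factor.
    return (math.pi / 2) * pairs / ((len(d) - 1) * dt)


# Hypothetical path: constant volatility B = 0.3, no drift, one jump of size 0.2 at t = 0.45.
rng = random.Random(7)
m, T, B = 100_000, 1.0, 0.3
y = [0.0]
for j in range(1, m + 1):
    jump = 0.2 if j == 45_000 else 0.0
    y.append(y[-1] + B * math.sqrt(T / m) * rng.gauss(0.0, 1.0) + jump)
est = spot_bipower(y, T, t=0.5, window=10_000)  # target: B**2 = 0.09
```

The window length trades bias from time variation of B (controlled by the Hölder exponent β in (C2)) against the variance of the local average, and the residual jump effect shrinks as the window grows.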

Noisy case
In this section, we consider the case where the observations of Y are corrupted by noise. In other words, the observed data are not $Y(t^m_i)$ but rather $\tilde Y(t^m_i) = Y(t^m_i) + \epsilon(t^m_i)$. We call ε(·) microstructure noise. It is clear that the estimator $\hat B(\cdot)^2$ explodes as the number of observations increases if we replace $Y(t^m_i)$ by $\tilde Y(t^m_i)$. When the process Y does not contain a jump part, i.e., J ≡ 0, we propose another estimator with a better rate of convergence, as follows. [23]. Table 1 summarizes the rates of convergence of the parametric estimator $\hat\theta_n$ of Theorem 3.2 in the various situations mentioned above. These rates of convergence may not be optimal: an improved spot volatility estimation scheme would lead to a better rate for $\hat\theta_n$. In particular, it has been shown in [21] that, under more restrictive assumptions on the model of Y, one can construct a scheme with better rates of convergence than in Proposition 4.3.

Table 1. Rates of convergence of the parametric estimator $\hat\theta_n$ (columns: Ideal case, Noisy case)
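The explosion under noise can be seen numerically. For i.i.d. noise with variance $\varepsilon^2$, each squared increment of $\tilde Y$ carries an extra $2\varepsilon^2$ on average, so the realized variance over m increments behaves like $\sigma^2 T + 2m\varepsilon^2$, and a local version divided by the window length $k\Delta_m$ picks up roughly $2\varepsilon^2/\Delta_m$. The sketch below, with hypothetical values σ = 0.2 and ε = 0.01 and no jumps, shows the global quantity.

```python
import math
import random


def noisy_path(m, sigma, noise_sd, T=1.0, seed=0):
    """Observations tilde Y(t^m_j) = Y(t^m_j) + eps_j of a driftless Brownian path with volatility sigma."""
    rng = random.Random(seed)
    y, obs = 0.0, [rng.gauss(0.0, noise_sd)]
    for _ in range(m):
        y += sigma * math.sqrt(T / m) * rng.gauss(0.0, 1.0)
        obs.append(y + rng.gauss(0.0, noise_sd))
    return obs


def realized_variance(obs):
    """Sum of squared increments of the observed (noisy) values."""
    return sum((b - a) ** 2 for a, b in zip(obs[:-1], obs[1:]))


rv_coarse = realized_variance(noisy_path(1_000, sigma=0.2, noise_sd=0.01))
rv_fine = realized_variance(noisy_path(100_000, sigma=0.2, noise_sd=0.01))
# Orders of magnitude: 0.04 + 2 * 1_000 * 1e-4 = 0.24 and 0.04 + 2 * 100_000 * 1e-4 = 20.04.
```

Increasing m from 10³ to 10⁵ multiplies the noise term by 100 while the target $\sigma^2 T = 0.04$ stays fixed, which is why noise-robust spot estimators are needed in this setting.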