Rate-optimal estimation of the Blumenthal-Getoor index of a Lévy process

The Blumenthal-Getoor (BG) index characterizes the jump measure of an infinitely active L\'evy process. It determines sample path properties and affects the behavior of various econometric procedures. If the process contains a diffusion term, existing estimators of the BG index based on high-frequency observations only achieve rates of convergence which are suboptimal by a polynomial factor. In this paper, a novel estimator for the BG index and the successive BG indices is presented, attaining the optimal rate of convergence. If an additional proportionality factor needs to be inferred, the proposed estimator is rate-optimal up to logarithmic factors. Furthermore, our method yields a new efficient volatility estimator which accounts for jumps of infinite variation. All parameters are estimated jointly by the generalized method of moments. A simulation study compares the finite sample behavior of the proposed estimators with competing methods from the financial econometrics literature.


Introduction
Models for continuous-time stochastic processes with jumps have gained increased interest in the statistical literature, most prominently in financial econometrics, where they are used to model asset prices (Andersen et al., 2002; Christensen et al., 2014). The jump behavior of such a process $X_t$ can be broadly characterized in terms of the jump activity index, given by
$$\alpha = \inf\Big\{ r \geq 0 : \sum_{s \leq t} |\Delta X_s|^r < \infty \Big\}. \qquad (1)$$
Here, $\Delta X_s = X_s - X_{s-}$ denotes the size of the jump at time s. If $X_t$ is a Lévy process, α is also known as the Blumenthal-Getoor index (Blumenthal and Getoor, 1961). The index α depends on the small jumps only, and for semimartingales, its range is α ∈ [0, 2]. Various qualitative properties of the process $X_t$ can be expressed in terms of the jump activity index. If the process has only finitely many jumps in total, then α = 0, and if the jumps are of finite variation, then α ≤ 1. Conversely, α < 1 implies jumps of finite variation. Furthermore, the value of α has implications for various econometric procedures. For example, if the jumps are treated as a nuisance, jump-robust estimation of integrated volatility requires α < 1 (Jacod and Reiss, 2014), and so does the efficient drift estimator of Gloter et al. (2018). In these applications, a higher jump activity typically induces a non-negligible bias which cannot easily be corrected if the jumps are treated as a nuisance. Hence, highly active jumps need to be modeled more explicitly, as done by Amorino and Gloter (2018) for drift estimation, and by Jacod and Todorov (2014, 2016) for volatility estimation.
As the jump activity index is a central property of infinite activity jump models, it is natural to consider statistical estimation of its precise value. Recent interest in this topic was initiated by Aït-Sahalia and Jacod (2009), who study the estimation of α based on discrete high-frequency observations $X_{i/n}$, i = 1, …, n, where X is an Itô semimartingale with a non-vanishing diffusion component. They specify (1) more precisely by defining α in terms of the spot jump compensator $\nu_t$, assuming that $\nu_t((-x, x)^c) = r_t |x|^{-\alpha} + O(|x|^{\delta-\alpha})$ as |x| → 0 for a predictable process $r_t$ and some δ > 0. The statistical challenge is that, based on discrete observations at a given frequency, the small jumps can hardly be distinguished from the continuous diffusion movement. The solution of Aït-Sahalia and Jacod (2009) is to introduce a threshold sequence $\tau_n \propto h_n^{\omega} \to 0$ and to consider
$$U(\tau_n) = \sum_{i=1}^{n} \mathbb{1}_{\{|X_{i/n} - X_{(i-1)/n}| > \tau_n\}}. \qquad (2)$$
If ω < 1/2, the contribution of the diffusion to the statistic $U(\tau_n)$ is negligible. The jump activity can then be identified via the approximate scaling relation $U(\tau_n) \propto \tau_n^{-\alpha}$, and Aït-Sahalia and Jacod (2009) show that this approach yields an estimator of α with rate of convergence $n^{\alpha/10}$. Replacing the indicator in (2) by a suitable smooth function, Jing et al. (2012) improve this rate to $n^{\alpha/8}$. So far, the best rates have been achieved by Reiß (2013) for the case that $X_t$ is a Lévy process, and by Bull (2016) for Itô semimartingales. Both authors construct estimators which converge at rate $n^{\alpha/4-\varepsilon}$ for arbitrary ε > 0. In both cases, the precise form of the estimator depends on the desired rate defect ε > 0.
In the considered high-frequency setting, the optimal rate of convergence for estimating α is conjectured to be $n^{\alpha/4}$, up to logarithmic factors. This lower bound is justified by the results of Aït-Sahalia and Jacod (2012), who study the diagonal entries of the Fisher information matrix of a fully parametric submodel consisting of the sum of a Brownian motion and a symmetric α-stable Lévy motion. A matching LAN result is not available since the off-diagonal entries have not been studied. This lower bound is discussed in Section 3. It should be highlighted that the achievable rate of convergence for estimating α depends on whether the process contains a non-vanishing diffusion component. For a pure-jump Itô semimartingale, the jump activity index can be estimated at rate $\sqrt{n}$ based on high-frequency observations (Todorov, 2015). Although the estimators of Reiß (2013) and Bull (2016) almost achieve the optimal rate of convergence, there is so far no procedure which attains the $n^{\alpha/4}$ lower bound, even in the case where $X_t$ is a Lévy process. This issue has also been formulated as an open problem by Reiß (2013). In this paper, we propose a new estimator of α for the Lévy case. If only α is unknown, the estimator achieves the optimal rate of convergence, matching the lower bound of Aït-Sahalia and Jacod (2012). If an additional proportionality factor r needs to be estimated, our estimator is rate-optimal up to a factor of log n for both r and α. Furthermore, we show that the diagonally rescaled Fisher information matrix in the submodel considered by Aït-Sahalia and Jacod (2012) is asymptotically singular for the combined parameter (α, r), and hence we conjecture that our rate of convergence is in fact optimal. Our procedure also yields an efficient estimator of the volatility $\sigma^2$ of the diffusion component of $X_t$ in the presence of jumps of infinite variation.
Under analogous conditions on the jump behavior, Jacod and Todorov (2014, 2016) derived a different efficient estimator of volatility which is robust to highly active jumps. Hence, our estimator is an alternative to the method of Jacod and Todorov (2014), although the latter is valid for Itô semimartingales, whereas we restrict our attention to Lévy processes. The proposed estimator is based on the generalized method of moments, and we estimate the jump and the diffusion parameters jointly in a single step as the solution of a system of estimating equations.
Our model allows for an asymmetric behavior of the small jumps. In particular, for a Lévy process $X_t$ with characteristic triplet $(\mu, \sigma^2, \nu)$, we suppose that the Lévy measure ν is locally stable in the sense that, for z close to 0, it behaves like a superposition of M stable Lévy measures with indices $\alpha_m$ and proportionality factors $r_m^\pm$ for the positive and negative jumps, see (3). Here, M is a natural number, $r_m^\pm \geq 0$, m = 1, …, M, and $0 < \alpha_M < \ldots < \alpha_1 < 2$ are the successive Blumenthal-Getoor indices, as introduced by Aït-Sahalia and Jacod (2012). The approximation in (3) will be made precise in the sequel. In particular, the BG index of $X_t$ is $\alpha = \alpha_1$. We construct an estimator for the parameter vector $\theta \in \mathbb{R}^{3M+1}$ consisting of the volatility $\sigma^2$, the indices $\alpha_m$, and the proportionality factors $r_m^\pm$.
The remainder of this paper is structured as follows. In Section 2, we present our model and the proposed estimator. A central limit theorem is given, establishing the rate $n^{\alpha/4}$ up to logarithmic factors. The rate of convergence and related lower bounds are discussed in Section 3. By means of a simulation study (Section 4), we compare the finite sample properties of our method with the jump activity estimators of Bull (2016) and Reiß (2013) and with the volatility estimator of Jacod and Todorov (2014). All technical results, which might be of independent interest, are outlined in Section 5.1, and the detailed proofs are gathered in Section 5.2.

Notation
For two real numbers a, b, we denote a ∧ b = min(a, b) and a ∨ b = max(a, b). The indicator function of a set A is denoted by $\mathbb{1}_A$. For a function f = f(a, b, …), $\partial_a f$ denotes the partial derivative w.r.t. a, and for a function $f(\theta) \in \mathbb{R}^m$ with $\theta \in \mathbb{R}^k$, the gradient matrix is denoted by $(D_\theta f)_{j,l} = \partial_{\theta_l} f_j$. For δ > 0, $B_\delta(0)$ is the ball around 0 with radius δ in $\mathbb{R}^k$, where k is evident from the context. $I_d \in \mathbb{R}^{d \times d}$ denotes the identity matrix. The multivariate normal distribution with mean 0 and covariance matrix Σ is denoted by N(0, Σ), and ⇒ denotes weak convergence of probability measures resp. random elements. The expectation operator is E, and dependence upon a parameter θ is denoted by $E_\theta$.
Let $X_t$ be a Lévy process with characteristic triplet $(\mu, \sigma^2, \nu)$, where the Lévy measure ν satisfies $\int (1 \wedge |z|^2)\, \nu(dz) < \infty$. We choose an odd truncation function ξ such that |ξ| ≤ 2 and ξ(z) = z for z ∈ (−1, 1). Then $X_t$ admits the Lévy-Itô decomposition
$$X_t = \mu t + \sigma B_t + \int_0^t \int \xi(z)\, \big(N(dz, ds) - \nu(dz)\, ds\big) + \int_0^t \int \big(z - \xi(z)\big)\, N(dz, ds), \qquad (4)$$
where N(dz, ds) is a Poisson point process with intensity measure ν(dz) ⊗ ds, and $B_t$ is a standard Brownian motion, independent of N. The value of µ depends on the choice of the truncation function ξ, but for our purposes, it will turn out that µ is negligible anyway. To make the approximation (3) precise, we suppose that (5) holds for some L > 0 and ρ > 0. The approximating measure $\tilde\nu$ is given by the Lebesgue density $\tilde\nu$ in (6), for some natural number M and parameters $\alpha = (\alpha_1, \ldots, \alpha_M) \in (0, 2)^M$ and $r = (r_1^+, r_1^-, \ldots, r_M^+, r_M^-) \in [0, \infty)^{2M}$. The remainder term in (5) is treated as a nuisance. In particular, this remainder may still consist of infinite activity jumps. Our main result will require $\rho < \alpha_M$, such that the nuisance jumps are in a sense less active than the Lévy measure $\tilde\nu$ and asymptotically negligible. The parameters of the modeled part are collected in the vector θ ∈ Θ, see (7), where Θ contains all parameter vectors θ as specified that additionally satisfy the constraints stated there. The value $\alpha = \alpha_1$ is of central importance. In particular, we need to impose the lower bound $\alpha_M > \alpha/2$ to ensure identifiability of the full parameter vector θ, see Aït-Sahalia and Jacod (2012). Note that the definition (6) is the same as given by Jacod and Todorov (2016) for the symmetric case.
In the high-frequency sampling setting considered here, we are given n observations $X_{ih_n}$, i = 1, …, n, with observation frequency $h_n \to 0$ such that $nh_n = T$ is constant. Without loss of generality, let T = 1 and $h = h_n = 1/n$. Equivalently, we observe the n increments $\Delta_{n,i}X = X_{ih_n} - X_{(i-1)h_n} \sim X_{h_n}$, which constitute a triangular array of random variables with iid rows. The law of $X_{h_n}$ is not fully described by the parameters $(\sigma^2, r, \alpha)$ due to the remainder in (5). Hence, we approximate it by a fully specified Lévy process $\tilde Z_t$ with characteristic triplet $(0, \sigma^2, \tilde\nu)$. The process $\tilde Z_t$ may be represented as
$$\tilde Z_t = \sigma B_t + \sum_{m=1}^{M} S_t^m,$$
where $B_t$ and $S_t^m$, m = 1, …, M, are independent Lévy processes, $B_t$ is a standard Brownian motion, and the $S_t^m$ are skewed $\alpha_m$-stable processes with Lévy measure $|z|^{-1-\alpha_m}(r_m^+ \mathbb{1}_{z>0} + r_m^- \mathbb{1}_{z<0})$.
We suggest estimating the parameter θ via the method of moments. In particular, we choose 3M + 1 functions $f_j: \mathbb{R} \to \mathbb{R}$, $f = (f_1, \ldots, f_{3M+1})$, and a suitable scaling factor $u = u_n$, and define $\hat\theta = \hat\theta_n$ to be a solution of the equation
$$F_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} f(u_n \Delta_{n,i} X) - E_\theta f(u_n \tilde Z_{h_n}) = 0. \qquad (8)$$
Here and in the following, $E_\theta f(\tilde Z_h)$ denotes the expectation such that $\tilde Z_h$ is determined by the parameter vector θ. Since $\tilde Z_h$ is a fully parametric approximation of $X_h$, the function $F_n(\theta)$ can be computed numerically, such that $\hat\theta_n$ is a feasible estimator. To distinguish a generic parameter value from the parameters governing $X_t$, we denote by $\theta_0$ the true parameter such that (5) holds.
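To make the estimating equation (8) concrete, the Python sketch below solves a deliberately reduced one-parameter version of it. It assumes a pure diffusion without jumps (so θ = σ² only), a single hypothetical moment function $f_1(x) = 1 - \exp(-x^2/2)$, which is approximately quadratic near zero and whose expectation under $N(0, \sigma^2 h)$ has a closed form, and the choice $u = h^{-1/2}$, which ignores Condition U because there are no jumps to separate. The names are hypothetical, and this is not the full estimator of the paper.

```python
import numpy as np
from scipy.optimize import brentq


def f1(x):
    """Hypothetical bounded moment function, approximately x**2 / 2 near zero."""
    return -np.expm1(-0.5 * x ** 2)


def model_moment(sigma2, u, h):
    """E_theta f1(u Z_h) for Z_h ~ N(0, sigma2 * h): equals 1 - 1 / sqrt(1 + u**2 h sigma2)."""
    return 1.0 - 1.0 / np.sqrt(1.0 + u ** 2 * h * sigma2)


def estimate_sigma2(increments, u, h):
    """Solve F_n(sigma2) = mean f1(u * increments) - E_theta f1(u Z_h) = 0 for sigma2."""
    empirical = np.mean(f1(u * increments))
    return brentq(lambda s2: empirical - model_moment(s2, u, h), 1e-8, 1e4)


rng = np.random.default_rng(1)
n, sigma2_true = 100_000, 0.5
h = 1.0 / n
dx = np.sqrt(sigma2_true * h) * rng.standard_normal(n)  # jump-free increments
sigma2_hat = estimate_sigma2(dx, u=1.0 / np.sqrt(h), h=h)
print(sigma2_hat)
```

Because the model moment is strictly increasing in σ², a bracketing root finder suffices here; in the full model with 3M + 1 parameters, a multivariate solver and numerically computed moments would be needed instead.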
To study the limit of $\hat\theta_n$, we employ the standard framework for estimating equations as reviewed by Jacod and Sørensen (2018). Under the assumptions imposed below, we show that $\hat\theta_n - \theta_0 \approx -(D_\theta F_n(\theta_0))^{-1} F_n(\theta_0)$, up to negligible terms. In order for $\hat\theta_n$ to have good asymptotic properties, the choices of the moment functions f and the scaling factor $u_n$ are crucial. In particular, to derive a central limit theorem for $F_n(\theta_0)$ (see Lemma 5.4), we need to control the sampling variance in (8) as well as the bias incurred by approximating $X_t$ by $\tilde Z_t$. Furthermore, the asymptotic behavior of $D_\theta F_n(\theta)$ as n → ∞ needs to be treated (see Lemma 5.5). To this end, the following properties turn out to be sufficient.
The smoothness imposed by Condition F1 is used to bound the bias incurred by approximating $E f(u X_{h_n})$ by $E_\theta f(u \tilde Z_{h_n})$, see Corollary 5.3 below. To control the sampling variance, we require not only smoothness of the employed moment functions, but also a specific shape.
Additional identifiability conditions are specified in Assumption I below. The first moment function $f_1$ is approximately quadratic near zero and serves to identify the volatility $\sigma^2$. The functions $f_j$, j ≥ 2, are smooth thresholds which distinguish the diffusion from the jump component. An example of suitable moment functions is given in Section 4. To ensure that the threshold is effective, we require that $u_n X_{h_n} \to 0$ in probability, i.e. $u_n = o(\sqrt{n})$. By choosing an appropriate scaling sequence as follows, the moments $E f_j(u_n \tilde Z_{h_n})$, j ≥ 2, will be dominated by the jump component.
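For intuition, the Python sketch below builds one hypothetical smooth threshold function of the kind described above: it vanishes identically on [−a, a], equals 1 for |x| ≥ b, and is infinitely differentiable in between (via the standard exp(−1/x) construction). It also evaluates a scaling sequence of the form $u_n = \tau/\sqrt{h_n|\log h_n|}$, which we assume here from the discussion following Lemma 5.1. The exact Condition U and the moment functions of Section 4 are not reproduced, and a, b, τ are illustrative choices.

```python
import numpy as np


def _psi(x):
    """C-infinity helper: exp(-1/x) for x > 0 and 0 for x <= 0."""
    x = np.asarray(x, dtype=float)
    positive = x > 0
    safe = np.where(positive, x, 1.0)  # avoid division by zero where x <= 0
    return np.where(positive, np.exp(-1.0 / safe), 0.0)


def smooth_threshold(x, a=1.0, b=2.0):
    """Even C-infinity function: 0 for |x| <= a, 1 for |x| >= b, increasing in |x| between."""
    t = (np.abs(x) - a) / (b - a)
    return _psi(t) / (_psi(t) + _psi(1.0 - t))


def scaling_sequence(n, tau=0.5):
    """Assumed form u_n = tau / sqrt(h_n |log h_n|) with h_n = 1/n."""
    h = 1.0 / n
    return tau / np.sqrt(h * abs(np.log(h)))


print(smooth_threshold(np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0])))
print(scaling_sequence(23_400))
```

Since $u_n/\sqrt{n} = \tau/\sqrt{\log n} \to 0$, this form satisfies $u_n = o(\sqrt{n})$, so the rescaled diffusion part of an increment eventually falls into the region where the threshold function vanishes.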
Although potentially not sharp, the upper bound on the factor τ is required to derive our asymptotic result. For details, see the technical Lemma 5.1 below and the subsequent discussion. When choosing $u_n$ in accordance with Condition U, it suffices to use a reasonable upper bound on σ. Furthermore, the simulation results presented in Section 4 show that larger values of $u_n$ also perform well in finite samples.
To formulate our main result on the asymptotic behavior of $\hat\theta$, we introduce auxiliary quantities, which exist if the relevant sup-norms of g and its derivative are finite. Furthermore, we introduce the matrices $\Gamma_n$ and the matrix $A(\theta) \in \mathbb{R}^{(3M+1) \times (3M+1)}$, defined entrywise for m = 1, …, M and j = 2, …, 3M + 1. These derivatives exist because the corresponding sup-norms of f are finite. Finally, we introduce the symmetric positive semidefinite matrix Σ(θ). If clear from the context, we omit the dependence on θ. Using this notation, we can formulate the remaining identifiability condition.
Remark 1. Analyzing the degrees of freedom of the equation |A(θ)| = 0 suggests that Condition I is, in fact, the generic case. To demonstrate this point, we construct a set of moment functions satisfying the identifiability condition. Consider the case M = 1, with $\alpha_1 = \alpha$ and $r_1^\pm = r^\pm$. We can construct a set of moment functions satisfying Condition I as follows. Let $f_1 = f$ and g be symmetric functions satisfying Condition F1 such that $f_1(0) = 0$, and g vanishes on … Hence, $A(\theta_0)$ is regular for $r^+ + r^- > 0$ and all α ∈ (0, 2) if g is chosen such that a = 0. This is in particular the case for the moment functions chosen for the simulation study in Section 4.
The main result of this paper is the consistency and asymptotic normality of $\hat\theta_n$, as summarized by the following theorem.
Theorem 2.1. Let $X_t$ be a Lévy process satisfying (5) with some ρ < α/2 and parameter vector $\theta_0 \in \Theta$. Let f satisfy Assumptions F1 and F2 and be such that $A(\theta_0)$ is regular, and let $u_n \to \infty$ be chosen according to U. Then there exists a sequence of random vectors $\hat\theta_n$ solving (8) such that $\hat\theta_n \to \theta_0$ in probability as n → ∞. This sequence is eventually unique and asymptotically normal as n → ∞. The resulting rate of convergence for the BG index $\alpha = \alpha_1$ is thus found to be $(n/\log n)^{\alpha/4}$, which improves upon existing estimators and matches the lower bound of Aït-Sahalia and Jacod (2012) up to logarithmic factors. However, the rate matrix of Theorem 2.1 is non-diagonal. The phenomenon of a non-diagonal rate matrix has also been observed in the pure-jump case, i.e. σ² = 0, see Brouste and Masuda (2018). We further discuss this aspect and the resulting marginal rates of convergence for $\hat\alpha_m$ and $\hat r_m^\pm$ in the next section. Nevertheless, the matrices $\Gamma_n^{-1}$, $A(\theta_0)$, and $\Sigma(\theta_0)$ are block-diagonal, such that the volatility estimator $\hat\sigma^2$ is asymptotically independent of the estimator of the jump part.
The presented central limit theorem also holds for the fully specified case without nuisance, i.e. L = 0 in (5). Even in this parametric case, we find that a simple GMM estimator based on 3M + 1 fixed moment functions, corresponding to $u_n = 1$, does not achieve the best rate of convergence. A careful construction of the estimating equation (8) is thus not only required to handle the nuisance term, but also for the underlying parametric problem itself.
The proposed estimator for α can be contrasted with existing methods in the literature. In an earlier study, Reiß (2013) suggests a test procedure for the value of α based on a statistic $T_n^m$ with tuning parameter m ∈ ℕ. Therein, it is established that $T_n^m \to Q(\alpha)$ as n → ∞ at rate $n^{\alpha/4 - \varepsilon(m)}$, where ε(m) → 0 as m → ∞. By inverting the function Q, this approach yields a near-optimal estimator for α. The statistics $T_n^m$ are constructed based on nonlinear sample moments as in (8), where the $f_j$ are linear combinations of trigonometric functions, i.e. $f_j(x) = \sum_k w_{k,j} \exp(i\lambda_k x)$. By choosing the weights $w_{k,j}$ carefully such that $\sum_k w_{k,j} \lambda_k^{2p} = 0$ for p = 1, …, m − 1, Reiß (2013) is able to reduce the variance of the corresponding sample moments. The arbitrarily small defect in the rate of convergence $n^{\alpha/4 - \varepsilon(m)}$ derived therein is thus due to the sampling variance. In contrast, by choosing the moment functions to vanish near zero according to Condition F2, we obtain a smaller variance of the sample moments.
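The effect of the moment conditions on the weights can be checked numerically. The Python sketch below uses the real part $\sum_k w_k \cos(\lambda_k x)$ of such a trigonometric combination, with hypothetical frequencies $\lambda_k = k/m$, k = 1, …, m, and a hypothetical normalization $\sum_k w_k \lambda_k^{2m} = 1$. It solves the resulting linear system for the weights and checks that $f(x) - f(0) = O(x^{2m})$ near zero, i.e. that the combination is very flat at the origin, where the rescaled diffusion part of an increment concentrates. It only illustrates the weighting idea and is not the statistic of Reiß (2013).

```python
import math

import numpy as np


def flat_weights(m):
    """Weights w_k for hypothetical frequencies lam_k = k/m, k = 1..m, such that
    sum_k w_k lam_k^(2p) = 0 for p = 1..m-1 and (normalization) sum_k w_k lam_k^(2m) = 1."""
    lam = np.arange(1, m + 1) / m
    powers = np.arange(1, m + 1)                    # p = 1, ..., m
    system = lam[None, :] ** (2 * powers[:, None])  # row p-1 holds lam_k^(2p)
    rhs = np.zeros(m)
    rhs[-1] = 1.0
    return lam, np.linalg.solve(system, rhs)


def increment_at_zero(x, lam, w):
    """f(x) - f(0) for f(x) = sum_k w_k cos(lam_k x), using cos(y) - 1 = -2 sin(y/2)^2."""
    return np.sum(w * (-2.0 * np.sin(lam * x / 2.0) ** 2))


m = 3
lam, w = flat_weights(m)
x = 0.05
# Orders x^2, ..., x^(2m-2) cancel, leaving (-1)^m x^(2m) / (2m)! to leading order.
ratio = increment_at_zero(x, lam, w) / x ** (2 * m)
print(ratio, (-1) ** m / math.factorial(2 * m))
```

Writing $\cos(y) - 1$ as $-2\sin^2(y/2)$ avoids the loss of precision that direct subtraction would cause for small arguments.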
An alternative estimator achieving the rate $n^{\alpha/4-\varepsilon}$ is presented by Bull (2016), which also uses functions vanishing near zero. Therein, the value $E f(u_n X_{h_n})$ is approximated by a finite series expansion, and extending this expansion reduces the rate defect ε. In contrast, we use the approximation $E f(u_n X_{h_n}) \approx E f(u_n \tilde Z_{h_n})$. Although the latter value is not available in explicit form and needs to be determined numerically, this approach allows us to decrease the bias of the estimating equation further than any finite series expansion. In particular, we only incur a bias due to approximating the Lévy measure of $X_t$, but not due to a discretization of the time evolution of the process. Thus, our method effectively circumvents the variance issue of Reiß (2013) and the bias issue of Bull (2016). This allows us to eliminate the polynomial rate defect and achieve a faster rate of convergence.

Asymptotic optimality
It is natural to ask whether our proposed estimator is asymptotically optimal. From Theorem 2.1, we find that
$$\sqrt{n}\,\big(\hat\sigma_n^2 - \sigma^2\big) \Rightarrow N(0, 2\sigma^4), \qquad (9)$$
which matches the optimal estimator in the situation without jumps. That is, $\hat\sigma_n^2$ is efficient. In general, jumps of infinite variation reduce the achievable rate of convergence for volatility estimators (Jacod and Reiss, 2014). Here, we are able to recover efficiency by modeling the infinite variation part of the jump measure explicitly via (5). The same methodology has been applied by Jacod and Todorov (2014, 2016) to construct an efficient estimator of $\sigma^2$. Note that the latter studies treat more general types of semimartingales, while we only derive a result for Lévy processes. In contrast to the existing estimators, which use a multi-step debiasing procedure, we determine $\hat\sigma^2$ by a single set of estimating equations. While our approach is conceptually simple, solving the estimating equations (8) is computationally expensive. A comparison of the finite sample performance is presented in Section 4.
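As a reference point for the efficiency bound $2\sigma^4$, recall that without jumps, i.e. $X_t = \sigma B_t$, the realized variance $\sum_i (\Delta_{n,i} X)^2$ is the maximum likelihood estimator of σ² and has variance exactly $2\sigma^4/n$. The Python sketch below checks this benchmark by Monte Carlo; σ², n, the number of replications, and the seed are arbitrary illustrative choices, and the sketch does not involve the proposed estimator.

```python
import numpy as np


def realized_variance_sd(sigma2, n, reps, rng):
    """Monte Carlo standard deviation of the realized variance of sigma * B on [0, 1]."""
    h = 1.0 / n
    increments = np.sqrt(sigma2 * h) * rng.standard_normal((reps, n))
    rv = np.sum(increments ** 2, axis=1)  # one realized variance per replication
    return rv.std(ddof=1)


rng = np.random.default_rng(2)
sigma2, n = 0.8, 2_000
mc_sd = realized_variance_sd(sigma2, n, reps=2_000, rng=rng)
theory_sd = sigma2 * np.sqrt(2.0 / n)  # sqrt(2 h sigma^4) with h = 1/n
print(mc_sd, theory_sd)
```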
As the asymptotic variance of the estimators $\hat\alpha_m$ and $\hat r_m^\pm$ depends on the choice of f, they cannot be expected to be variance efficient. Furthermore, they are coupled via $\Gamma_n$ and via the matrix $A(\theta_0)$, which is in general dense. Inspecting the limit in Theorem 2.1, we obtain the marginal rates of convergence (10). To assess these rates of convergence, we may compare them with the lower bound of Aït-Sahalia and Jacod (2012). Therein, the authors compute the diagonal terms of the Fisher information $I_\theta^n$ based on n observations of $\tilde Z_{1/n}$ for the symmetric case $r_m^+ = r_m^- = r_m$ and M = 2. Their analysis of the diagonal entries $I^n_{\alpha_m,\alpha_m}$ and $I^n_{r_m,r_m}$ suggests that an asymptotically optimal estimator $(\alpha_m^*, r_m^*)$ should satisfy (11). Notably, even for M = 1, the rates (11) are faster than (10) by a logarithmic factor. This difference could potentially be explained by the neglected off-diagonal terms of $I_\theta$. A similar phenomenon occurs in the pure-jump case σ² = 0, M = 1, where for any sequence of diagonal matrices $D_n$, the limit of $D_n I^n_{(\alpha,r)} D_n$ is singular, see (Masuda, 2015, Thm. 3.4) and (Aït-Sahalia and Jacod, 2008, Thm. 2). Recently, Brouste and Masuda (2018) studied this case and established the LAN property with a non-diagonal rescaling matrix $D_n$. They find that the optimal rate of convergence is slower than suggested by the diagonal entries of the Fisher information matrix, by a factor of log n. A similar phenomenon is observed when estimating the Hurst parameter of a fractional Brownian motion based on high-frequency observations (Brouste and Fukasawa, 2018). There is no LAN result available for estimation of the BG index in the case σ² > 0, and a full investigation of the LAN property in the present case is beyond the scope of this paper. Nevertheless, we can adapt the proof of Aït-Sahalia and Jacod (2012) to unveil the off-diagonal entries $I^n_{\alpha_1,r_1}$. It turns out that the diagonally rescaled Fisher information matrix is asymptotically singular, just as in the pure-jump case.
Proposition 3.1. Let $I_h$ denote the Fisher information matrix of $\tilde Z_h$ with M = 1, $\alpha_1 = \alpha$, and $r_1^+ = r_1^- = r$. Then, as h → 0, the diagonally rescaled matrix $I_h$ converges, and the limiting matrix is singular.
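The meaning of a singular limit can be made explicit for the (α, r)-block alone. The following standard identity, written for the normalization by the square roots of the diagonal entries (a choice made here for illustration), shows that the rescaled block becomes singular exactly when the correlation of the two score components tends to ±1:
$$D_h = \operatorname{diag}\Big((I_h)_{\alpha\alpha}^{-1/2},\ (I_h)_{rr}^{-1/2}\Big), \qquad D_h \begin{pmatrix} (I_h)_{\alpha\alpha} & (I_h)_{\alpha r} \\ (I_h)_{\alpha r} & (I_h)_{rr} \end{pmatrix} D_h = \begin{pmatrix} 1 & \rho_h \\ \rho_h & 1 \end{pmatrix}, \qquad \rho_h = \frac{(I_h)_{\alpha r}}{\sqrt{(I_h)_{\alpha\alpha}\,(I_h)_{rr}}}.$$
Hence the determinant of the rescaled block equals $1 - \rho_h^2$. Since the Fisher information is the covariance matrix of the score, $\rho_h$ is the correlation between the α-score and the r-score, and a singular limit means $|\rho_h| \to 1$: asymptotically, the two scores become linearly dependent, so that α and r cannot be separated at the individual diagonal rates.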
The diagonal entries of the Fisher information matrix should match the optimal rates of convergence in the case where only a single parameter is unknown, e.g. if $(\sigma^2, r_1^+, r_1^-)$ are known and $\alpha_1$ is to be estimated. In this situation, a natural version of our estimator is to consider only a single moment function f. Analogous to (8), for any m ∈ {1, …, M}, we may estimate $\alpha_m$ as the solution of
$$\tilde F_n(\alpha_m) = \frac{1}{n} \sum_{i=1}^{n} f(u_n \Delta_{n,i} X) - E_\theta f(u_n \tilde Z_{h_n}) = 0,$$
where all entries of θ except $\alpha_m$ are fixed at their known values. With a slight abuse of notation, we may also estimate $r_m^\pm$ by the equation $\tilde F_n(r_m^\pm) = 0$. To distinguish jumps and diffusion, we suppose that f satisfies the same conditions as $f_2, \ldots, f_{3M+1}$, i.e. it vanishes around zero.
Under the same conditions, and if all parameters except for $r_m^+$ resp. $r_m^-$ are known, there exists a consistent sequence of estimators $\tilde r_m^\pm$ solving $\tilde F_n(r_m^\pm) = 0$ such that, as n → ∞,
Since $u_n$ is of order $\sqrt{n/\log n}$, Proposition 3.2 establishes precisely the rates (11). In the setting of Aït-Sahalia and Jacod (2012), in particular M = 2, this shows that the single-parameter estimators of $\alpha_m$ resp. $r_m^\pm$ are rate-efficient if the remaining parameters are known. In contrast, if all parameters θ are unknown, $\hat\theta$ achieves the optimal rate of convergence up to a logarithmic factor. Due to the singularity of the Fisher information matrix, we conjecture that the achieved rates (10) are in fact optimal.

Simulation study
By means of a Monte Carlo study, we compare the finite sample performance of our estimator with the estimators of Reiß (2013) and Bull (2016) for the Blumenthal-Getoor index α, and with the volatility estimator of Jacod and Todorov (2014). To this end, we sample paths of the Lévy process $X_t$ defined in (13), which combines a Brownian motion $B_t$ with the stable processes $S_t^{\alpha,\beta}$ and $S_t^{0.5,0}$. We denote by $S_t^{\alpha,\beta}$ the α-stable Lévy motion with skewness parameter β ∈ (−1, 1); that is, $S_t^{\alpha,\beta}$ has the characteristic function of the standard stable parameterization (see e.g. Zolotarev (1986)). The Lévy measure corresponding to this standardization can be expressed in the form (6) with M = 1, $(r^+ - r^-)/(r^+ + r^-) = \beta$, and $r^+ + r^- = 1/(\Gamma(1-\alpha)\cos(\pi\alpha/2))$ if α ≠ 1. Here, we set β = −1/3 and study the cases α = 1.3 and α = 1.7. Then (5) is satisfied with ρ = 0.5, such that $S_t^{0.5,0}$ is a nuisance term, and $\tilde Z_t = B_t + S_t^{\alpha,\beta}$. In view of applications in financial econometrics, we consider the time horizon T = 1 and the sampling frequencies h = 0.2/23400, h = 1/23400, and h = 5/23400. These sampling schemes correspond to 0.2, 1, and 5 seconds per quote, respectively, on a trading day of 6.5 hours.
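The sampling design above can be mimicked with the following Python sketch. Since the display (13) and the characteristic function are not reproduced here, the sketch rests on stated assumptions: unit volatility and unit scale for both stable components, the standard characteristic function $E\exp(i\lambda S_1^{\alpha,\beta}) = \exp(-|\lambda|^\alpha(1 - i\beta\,\mathrm{sgn}(\lambda)\tan(\pi\alpha/2)))$ for α ≠ 1, and the Chambers-Mallows-Stuck representation for drawing stable variables. Increments over an interval of length h use the self-similarity $S_h \overset{d}{=} h^{1/\alpha} S_1$. The function names are hypothetical.

```python
import numpy as np


def stable_s1(alpha, beta, size, rng):
    """Chambers-Mallows-Stuck draws with
    E exp(i t S) = exp(-|t|**alpha * (1 - i beta sign(t) tan(pi alpha / 2))), alpha != 1."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    t = beta * np.tan(np.pi * alpha / 2)
    shift = np.arctan(t) / alpha
    scale = (1 + t ** 2) ** (1 / (2 * alpha))
    return (scale * np.sin(alpha * (v + shift)) / np.cos(v) ** (1 / alpha)
            * (np.cos(v - alpha * (v + shift)) / w) ** ((1 - alpha) / alpha))


def simulate_increments(n, h, alpha, beta, rng, sigma=1.0, nuisance=True):
    """n increments of sigma * B + S^{alpha,beta} (+ S^{0.5,0}) over intervals of length h."""
    dx = sigma * np.sqrt(h) * rng.standard_normal(n)
    dx += h ** (1 / alpha) * stable_s1(alpha, beta, n, rng)
    if nuisance:
        dx += h ** 2 * stable_s1(0.5, 0.0, n, rng)  # h**(1/0.5) = h**2 for the nuisance part
    return dx


rng = np.random.default_rng(3)
h = 1.0 / 23_400  # one quote per second on a 6.5-hour trading day
dx = simulate_increments(23_400, h, alpha=1.3, beta=-1 / 3, rng=rng)
print(dx[:5])
```

Simulating increments directly, rather than full paths, suffices here because the estimators only use the increments $\Delta_{n,i}X$, which are iid for a Lévy process.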
To determine the solution of the estimating equation (8), we need to compute the moments $E_\theta f(u\tilde Z_h)$ and their gradients. This can be done numerically by means of a continuous Fourier transform, since $E\exp(i\lambda \tilde Z_h)$ is available in closed form. The employed moment functions $f_1, \ldots, f_4$ are handcrafted to satisfy F1 and F2. In our simulations, we use a fixed scaling factor u. Although this choice of u is too large to comply with Assumption U, we found it to perform better than smaller values in the given sampling scenario.
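The Fourier approach can be sketched for one hypothetical moment function, $f_1(x) = 1 - \exp(-x^2/2)$, whose Fourier transform is Gaussian, so that $E f_1(Y) = 1 - (2\pi)^{-1/2}\int \exp(-\lambda^2/2)\, E\exp(i\lambda Y)\, d\lambda$ reduces to a one-dimensional integral of the characteristic function. The Python sketch below assumes $\tilde Z_t = \sigma B_t + S_t^{\alpha,\beta}$ with the standard characteristic exponent $|\lambda|^\alpha(1 - i\beta\,\mathrm{sgn}(\lambda)\tan(\pi\alpha/2))$, α ≠ 1. The handcrafted functions $f_1, \ldots, f_4$ of the simulation study are not reproduced, and `jump_weight` is a hypothetical switch that turns off the stable part for testing.

```python
import numpy as np
from scipy.integrate import quad


def char_fn_real(lam, h, sigma2, alpha, beta, jump_weight=1.0):
    """Real part of E exp(i lam Z_h) for lam >= 0, where Z_t = sigma B_t + S_t and
    log E exp(i lam S_t) = -t |lam|**alpha (1 - i beta sign(lam) tan(pi alpha / 2))."""
    stable = jump_weight * lam ** alpha
    modulus = np.exp(-h * (0.5 * sigma2 * lam ** 2 + stable))
    phase = h * beta * np.tan(np.pi * alpha / 2) * stable
    return modulus * np.cos(phase)


def expected_f1(u, h, sigma2, alpha, beta, jump_weight=1.0):
    """E f1(u Z_h) for f1(x) = 1 - exp(-x**2 / 2), using
    E exp(-Y**2 / 2) = (2 pi)**(-1/2) * int exp(-lam**2 / 2) phi_Y(lam) d lam, Y = u Z_h."""
    def integrand(lam):
        return np.exp(-0.5 * lam ** 2) * char_fn_real(u * lam, h, sigma2, alpha, beta, jump_weight)

    # phi_Y(-lam) is the complex conjugate of phi_Y(lam): integrate the real part
    # over [0, inf) and double it.
    value, _ = quad(integrand, 0.0, np.inf, epsabs=1e-10, epsrel=1e-10)
    return 1.0 - 2.0 * value / np.sqrt(2.0 * np.pi)


h = 1.0 / 23_400
print(expected_f1(50.0, h, 1.0, 1.3, -1 / 3))                   # with the stable part
print(expected_f1(50.0, h, 1.0, 1.3, -1 / 3, jump_weight=0.0))  # Gaussian part only
```

Without the stable part, the integral is Gaussian and reproduces the closed form $1 - (1 + u^2 h \sigma^2)^{-1/2}$, which provides a simple check of the numerical integration.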
The methods of Reiß (2013) and Bull (2016) each have a tuning parameter m ∈ ℕ, and larger values of m increase the rate of convergence. However, smaller values of m can be superior in finite samples. In our simulations, we found that the estimator of Bull performed best when setting m = 3, and the estimator of Reiß performed best when setting m = 2, across all observation frequencies. Furthermore, the method of Reiß involves a rescaling parameter $U_n$ and two weighting measures $w_1, w_2$. We choose the weighting measure $w_1$ to be supported on the set {1/m, 2/m, …, 1}, and $w_2$ to be supported on the set {2/m, 4/m, …, 2}. The truncation parameter is set to $U = h^{-(1-2m)/(4m-1)}$, as suggested by equation (3.8) therein.
In Table 1, we compare the simulated performance of our moment estimators for α and σ² with the estimators of Jacod and Todorov (2014), Reiß (2013), and Bull (2016). For the latter two, we choose the best tuning parameter m as specified above. The estimator of Jacod and Todorov (2014) is implemented as in equation (5.3) therein, with ζ = 1.5 and $u = |\log h|^{1/30}$. We find that the new estimators perform best in the considered setting. The good performance of the estimator of Reiß in the case α = 1.7 is somewhat surprising, since the analysis of Reiß (2013) only yields a suboptimal rate of convergence. However, for the latter estimator, no central limit theorem is available. Hence, it is possible that the estimator in fact converges at a rate which is faster than the rate derived by Reiß (2013). It should also be noted that all benchmarked methods require various tuning parameters. Most notably, all methods require some form of scaling factor. Furthermore, our new estimator depends on the employed moment functions $f_j$, and the estimator of Bull (2016) requires the choice of a truncation kernel function. It is thus possible that a very careful choice of these parameters might affect the ranking implied by Table 1.
The volatility estimator $\hat\sigma^2$ is efficient, and by (9), the error $\hat\sigma^2 - \sigma^2$ should be of order $\sqrt{2h\sigma^4}$. From the results of Table 1, we find that this asymptotic performance is not achieved for the considered sample sizes. This defect holds for our proposed estimator as well as for the benchmark method of Jacod and Todorov (2014), and it is larger for larger values of α. This is potentially due to the relatively large jump component of the simulated process (13). On the other hand, the asymptotic distribution of Theorem 2.1 yields a good approximation of the finite sample behavior of $\hat\alpha$, as shown in Figure 1. Clearly, the match with the asymptotic normal distribution improves for smaller h. Furthermore, the approximation is better for the smaller value α = 1.3.
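For orientation, the reference error size $\sqrt{2h\sigma^4}$ can be evaluated for the three sampling frequencies of the simulation study. The Python sketch assumes σ² = 1, as in $\tilde Z_t = B_t + S_t^{\alpha,\beta}$; it does not reproduce any entry of Table 1.

```python
import numpy as np

SECONDS_PER_DAY = 6.5 * 3600                # 23400 seconds in a 6.5-hour trading day
seconds_per_quote = np.array([0.2, 1.0, 5.0])
h = seconds_per_quote / SECONDS_PER_DAY     # sampling intervals with T = 1
n = np.rint(1.0 / h).astype(int)            # number of increments per day
sigma2 = 1.0                                # unit volatility of the simulated model
reference_error = np.sqrt(2.0 * h * sigma2 ** 2)
for s, n_i, err in zip(seconds_per_quote, n, reference_error):
    print(f"{s:>4} s/quote: n = {n_i:>6}, sqrt(2 h sigma^4) = {err:.4f}")
```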

Technical tools
In this section, we present the proofs of Theorem 2.1 and Propositions 3.1 and 3.2. Preliminary technical results are presented in Subsection 5.1, as they might be of independent interest, in particular Lemma 5.1 and Corollary 5.3. The detailed proofs are presented in Subsection 5.2.

Preliminary results
To study the asymptotic behavior of the estimating equation (8) by standard techniques (see e.g. Jacod and Sørensen (2018)), we need
• a central limit theorem for the term $\frac{1}{n}\sum_{i=1}^{n} f(u_n \Delta_{n,i} X) - E_\theta f(u_n \tilde Z_h)$, and
• properties of the derivatives $D_\theta E_\theta f(u_n \tilde Z_h)$.
To determine asymptotic variances, as well as for some technical steps of the following proofs, it is useful to derive some explicit approximations of $E f(u_n \tilde Z_h)$.
Lemma 5.1. Let $f \in C^2$ be such that f, f′ and f″ are bounded and f(0) = 0, and let $X_t$ be a Lévy process with characteristic triplet $(\mu, \sigma^2, \tilde\nu)$. The implicit constants in the following expressions depend on f and $(\mu, \sigma^2, \tilde\nu)$, but neither on t nor on u. Moreover, all O(·) and o(·) terms are bounded resp. vanishing uniformly on compacts in Θ.
(iii) If f (0) = 0, f (0) = 0 but f (4) = 0, and $f^{(3)}, f^{(4)}$ are bounded, then for any (iv) If f(0) = 0 and µ = 0, σ² = 0, then there exists a constant $\tilde C$, bounded uniformly on compacts, such that for all f and all u > 1, t ≥ 0,

The case (i), which is exploited several times in the proofs, imposes a subtle upper bound on u. Although this bound need not be sharp, the Lemma will not hold for $u = \tau/\sqrt{t|\log t|}$ if τ is too large. To make this plausible, note that for an α-stable process $S_t^\alpha$, the probability $P(|S_t^\alpha| \geq \eta\sqrt{t|\log t|}/\tau)$ tends to zero as t → 0, roughly polynomially in t. On the other hand, for the Brownian motion, $P(|B_t| > \eta\sqrt{t|\log t|}/\tau) = P(|B_1| > \eta\sqrt{|\log t|}/\tau) \to 0$ polynomially as well, but the polynomial order of this decay depends on the specific value of τ. For the jump term to dominate, as in case (i) of Lemma 5.1, τ must be small. The uniformity w.r.t. θ of the previous results will be used later on to derive the consistency of the estimator.
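A rough calculation makes the role of τ explicit. It uses only the standard Gaussian tail bound $P(|B_1| > x) \le 2e^{-x^2/2}$ and the jump order $O(u^\alpha t)$ from the proof of Lemma 5.1; it is a heuristic sufficient condition derived here for illustration, not the paper's Condition U. For 0 < t < 1,
$$u = \frac{\tau}{\sqrt{t|\log t|}} \ \Longrightarrow\ u^{\alpha} t = \tau^{\alpha}\, t^{1-\alpha/2}\, |\log t|^{-\alpha/2}, \qquad P\Big(|B_1| > \frac{\eta}{\tau}\sqrt{|\log t|}\Big) \le 2\exp\Big(-\frac{\eta^2 |\log t|}{2\tau^2}\Big) = 2\, t^{\eta^2/(2\tau^2)}.$$
Hence the Gaussian probability is $o(u^\alpha t)$ as soon as
$$\frac{\eta^2}{2\tau^2} > 1 - \frac{\alpha}{2} \quad\Longleftrightarrow\quad \tau < \frac{\eta}{\sqrt{2-\alpha}}.$$
For larger τ, this comparison no longer guarantees that the jump term dominates.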
Another ingredient to obtain a central limit theorem is a bias bound, i.e. a bound on the error of approximating $E f(u_n \Delta_{n,i} X)$ by $E_\theta f(u_n \tilde Z_h)$. For two random variables X and Y, recall that the 1-Wasserstein metric $d_W$ and the total variation distance $d_{TV}$ are defined as suprema of |E g(X) − E g(Y)|, where the supremum is taken over all bounded resp. Lipschitz continuous, measurable functions g: ℝ → ℝ. These distances are used in the proof of the following Lemma, which quantifies the error of approximation implied by the local stability assumption (5).
Note that the result of Corollary 5.3 cannot be directly formulated in terms of $d_{TV}$ or $d_W$, which distinguishes it from the results of Mariucci and Reiß (2018). An alternative bound on the total variation distance between $X_t$ and $\tilde Z_t$ is presented by (Clément and Gloter, 2018, Proposition 4) and (Amorino and Gloter, 2019, Proposition 2), stating that $d_{TV}(X_t, \tilde Z_t) \le C t^{1 \wedge 1/\alpha} |\log t|$ as t → 0. Their assumptions on the Lévy measure ν(dz) imply that our condition (5) holds with $\rho \le (\alpha - 1) \vee 0$. Thus, if α > 1 and $u \lesssim t^{-1/2}$, our bound (16) is sharper, since $t u^{\alpha-1} \lesssim t^{3/2 - \alpha/2} \ll t^{1/\alpha}$. In the case α ≤ 1, our bound is of the same order of magnitude as the one presented by Clément and Gloter (2018) and Amorino and Gloter (2019). Furthermore, our result may also be applied in the case ρ > α − 1. However, we impose additional smoothness assumptions on the considered function f, which is suitable for our statistical purposes because the moment functions are chosen by the statistician.
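The comparison of orders for α > 1 reduces to an elementary inequality between exponents. With $u \lesssim t^{-1/2}$, we have $t u^{\alpha-1} \lesssim t \cdot t^{-(\alpha-1)/2} = t^{(3-\alpha)/2}$, and
$$\frac{3-\alpha}{2} - \frac{1}{\alpha} = \frac{\alpha(3-\alpha) - 2}{2\alpha} = \frac{(\alpha-1)(2-\alpha)}{2\alpha} > 0 \qquad \text{for } \alpha \in (1, 2),$$
so that $t^{(3-\alpha)/2} = o(t^{1/\alpha})$ as t → 0. At the endpoints α = 1 and α = 2, the two exponents coincide.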
Corollary 5.3 and Lemma 5.1 allow us to derive the following central limit theorem for the estimated moments. In particular, we use Lemma 5.1 to control the sampling variance, and Corollary 5.3 to control the bias.
Lemma 5.4. Let $nh_n = T = 1$ be constant, i.e. $h_n = 1/n$, and choose $u_n \to \infty$ according to U. Let f satisfy F1 and F2, and suppose that the Lévy process $X_t$ satisfies (5) with some ρ < α/2. Then, as n → ∞, a central limit theorem holds for the estimated moments $F_n(\theta_0)$. Note that the rate of convergence for the first moment $f_1$ is slower than for $f_j$, j ≥ 2. This is due to our special choice of $f_j$, j ≥ 2, which vanish near zero. Hence, these moments are primarily driven by the jump component, which is of smaller order than the diffusion term. On the other hand, the jump parameters $\alpha_m, r_m^\pm$ are harder to identify, as made precise in the following Lemma.
These results allow us to establish the consistency of $\hat\theta_n$. We do not consider global uniqueness of the solution of the estimating equation (8). Hence, we only obtain the existence of a consistent sequence of random variables satisfying the equation.
Lemma 5.7 (Consistency). Let $X_t$ be a Lévy process satisfying (5) with some ρ < α/2 and parameter vector $\theta_0$. Let f satisfy Assumptions F1, F2, and I, and let $u_n \to \infty$ be chosen according to U. Then there exists a sequence of random vectors $\hat\theta_n$ solving (8) such that $\hat\theta_n \to \theta_0$ in probability as n → ∞. This sequence is eventually unique, i.e. for any other consistent sequence $\hat\theta_n^*$ solving the estimating equation, it holds that $P(\hat\theta_n \neq \hat\theta_n^*) \to 0$. To obtain a central limit theorem for $\hat\theta_n$, we may apply a Taylor expansion to obtain the representation
$$\hat\theta_n - \theta_0 = (Df)^{-1} F_n(\theta_0),$$
on the event that the matrix $Df = (Df_{j,k})$ is invertible, where $Df_{j,k} = \partial_{\theta_k} E_{\bar\theta_j} f_j(u_n \tilde Z_h)$ for some $\bar\theta_j$ on the line segment between $\theta_0$ and $\hat\theta_n$, for j = 1, …, 3M + 1. This standard approach allows us to establish Theorem 2.1, as detailed in Subsection 5.2.

Proofs
Proof of Lemma 5.1. At the price of changing the term $\mu$, we may assume w.l.o.g. that $\xi(z) = z\mathbf{1}_{|z|\le 1}$. In view of the Lévy-Itô decomposition (4), we write
where $N$ is a Poisson counting measure with intensity $\tilde\nu(dz)\otimes ds$, and $J^u_t$ denotes the corresponding integral term. The explicit form of $\tilde\nu$ allows us to compute $\mu_u$ as
The term $\log(u)$ is added to cover the case $\alpha_1 = 1$. This bound on $\mu_u$ will be used in the sequel.
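The origin of the $\log(u)$ term can be illustrated by a minimal sketch. It rests on assumptions not spelled out in this excerpt:

- a single stable-like component with Lévy density $r^+ z^{-1-\alpha}$ on $(0,\infty)$ and $r^-|z|^{-1-\alpha}$ on $(-\infty,0)$;
- $u > 1$;
- $\mu_u$ read as the drift of $uX_t$ with respect to $\xi(z) = z\mathbf{1}_{|z|\le 1}$.

Under these assumptions, rescaling shifts the drift by $-u\int_{1/u<|z|\le 1} z\,\tilde\nu(dz) = -u(r^+-r^-)\int_{1/u}^1 z^{-\alpha}\,dz$. This shift is of order $u^{\alpha\vee 1}$ for $\alpha \neq 1$ and equals $-u(r^+-r^-)\log u$ at $\alpha = 1$. All function names are hypothetical.

```python
import math


def truncated_first_moment(u: float, alpha: float) -> float:
    """Closed form of the integral of z**(-alpha) over [1/u, 1], for u > 1."""
    if abs(alpha - 1.0) < 1e-12:
        return math.log(u)  # alpha = 1: this is where the log(u) term comes from
    return (1.0 - u ** (alpha - 1.0)) / (1.0 - alpha)


def drift_shift(u: float, alpha: float, r_plus: float, r_minus: float) -> float:
    """Hypothetical mu_u - u * mu = -u * (r_plus - r_minus) * integral above."""
    return -u * (r_plus - r_minus) * truncated_first_moment(u, alpha)


def midpoint_integral(g, lo: float, hi: float, n: int = 200_000) -> float:
    """Plain midpoint rule, used only to cross-check the closed form."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))
```

As $\alpha \to 1$, the closed form for $\alpha \neq 1$ tends continuously to $\log u$. This is why a single bound with a logarithmic factor can cover every $\alpha$. A symmetric measure ($r^+ = r^-$) produces no shift at all.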
To derive the claims of the lemma, we start with a rough bound for the probability
The first term tends to zero identically as $t \to 0$. To study the jump term, choose a bounded, smooth function $g(x) \ge \mathbf{1}_{|x|\ge\lambda\eta}$ such that $g(0) = g'(0) = 0$. Then, by Itô's formula and a substitution in the integral, we obtain
for a constant $\tilde C$ which depends on $\alpha$ and $r$ and is bounded on compacts in these parameters. The function $g$ can be chosen such that the latter term is finite. Thus, $P(|uX_t| > \lambda\eta) = O(u^\alpha t)$, uniformly on compacts in $\alpha$, $r$.
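A first-order heuristic makes the order $O(u^\alpha t)$ plausible. It is not the Itô-formula argument used above. For small $t$, $P(|uJ_t| > \lambda\eta)$ is roughly $t\,\tilde\nu(\{|z| > \lambda\eta/u\})$. For a stable-like density $r^\pm|z|^{-1-\alpha}$ on each half-line (an assumption of this sketch), this tail mass equals $(r^++r^-)(\lambda\eta)^{-\alpha}u^{\alpha}/\alpha$. The sketch below uses hypothetical function names. It computes this leading term, whose ratio to $u^\alpha t$ is independent of both $u$ and $t$.

```python
import math


def stable_tail_mass(x: float, alpha: float, r_plus: float, r_minus: float) -> float:
    """nu({|z| > x}) for the assumed density r_plus z^(-1-alpha) on z > 0, r_minus |z|^(-1-alpha) on z < 0."""
    # On each half-line: integral from x to infinity of z^(-1-alpha) dz = x^(-alpha) / alpha.
    return (r_plus + r_minus) * x ** (-alpha) / alpha


def jump_exceedance_leading_term(t, u, lam_eta, alpha, r_plus, r_minus):
    """Heuristic first-order term: P(|u J_t| > lam_eta) is about t * nu({|z| > lam_eta / u})."""
    return t * stable_tail_mass(lam_eta / u, alpha, r_plus, r_minus)
```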
For the Gaussian term in (21), we employ the tail bound
The latter bound is of order less than $O(u^\alpha t)$, uniformly on compacts. In particular,
Note that the latter inequality does not hold if $u = \tau/\sqrt{-t\log t}$ for a proportionality factor $\tau$ which is too large.
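The remark about a too large $\tau$ can be quantified. This requires one assumption about the tail bound, whose display is missing here: suppose it is the standard Gaussian bound $P(|\sigma W_t| > x) \le 2\exp(-x^2/(2\sigma^2 t))$.

- With $u = \tau/\sqrt{-t\log t}$, we have $tu^2 = \tau^2/(-\log t)$.
- The Gaussian bound is therefore exactly $2t^{c}$ with $c = \lambda^2\eta^2/(2\sigma^2\tau^2)$.
- By comparison, $u^\alpha t = \tau^\alpha t^{1-\alpha/2}(-\log t)^{-\alpha/2}$.

This bound is $o(u^\alpha t)$ precisely when $c > 1 - \alpha/2$, i.e. when $\tau < \lambda\eta/(\sigma\sqrt{2-\alpha})$ for $0 < \alpha < 2$. A minimal sketch, with hypothetical function names:

```python
import math


def gaussian_tail_bound(t: float, u: float, sigma: float, lam_eta: float) -> float:
    """Assumed bound: P(|u * sigma * W_t| > lam_eta) <= 2 exp(-lam_eta^2 / (2 sigma^2 t u^2))."""
    return 2.0 * math.exp(-lam_eta ** 2 / (2.0 * sigma ** 2 * t * u ** 2))


def gaussian_exponent(tau: float, sigma: float, lam_eta: float) -> float:
    """Exponent c such that the bound equals 2 t^c when u = tau / sqrt(-t log t)."""
    return lam_eta ** 2 / (2.0 * sigma ** 2 * tau ** 2)


def ratio_to_jump_order(t, tau, sigma, lam_eta, alpha):
    """Gaussian bound divided by u^alpha * t, with u = tau / sqrt(-t log t)."""
    u = tau / math.sqrt(-t * math.log(t))
    return gaussian_tail_bound(t, u, sigma, lam_eta) / (u ** alpha * t)


def tau_threshold(sigma: float, lam_eta: float, alpha: float) -> float:
    """The ratio tends to 0 as t -> 0 iff tau lies below this value (0 < alpha < 2)."""
    return lam_eta / (sigma * math.sqrt(2.0 - alpha))
```

For example, take $\sigma = \lambda\eta = 1$ and $\alpha = 1.5$, so the threshold is $\sqrt{2}$. With $\tau = 0.5$ the ratio shrinks as $t$ decreases. With $\tau = 3$ it grows.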
To obtain an asymptotically exact value, we plug the former rough bound into Itô's formula. In case (i), we have
We moreover used that $Ef(uX_s) = O(u^\alpha t)$ and $\mu_u u = O(u^2 t)$, as established previously. These upper bounds hold uniformly on compacts in $\Theta$. To proceed, note that $J^\pm_\alpha f$ is a bounded continuous function, since
which is furthermore bounded uniformly on compacts in $\alpha$. By virtue of this boundedness, $uX_s \xrightarrow{P} 0$ implies $EJ^\pm_{\alpha_m} f(uX_s) = J^\pm_{\alpha_m} f(0) + o(1)$. To ensure that this last approximation holds uniformly on compacts in $\Theta$, note that $\|(J^\pm_{\alpha_m} f)'\|_\infty = \|J^\pm_{\alpha_m} f'\|_\infty$ is also bounded. It therefore suffices to control $E(|uX_s| \wedge 1)$ uniformly. But we have already established that $P(|uX_s| > \eta) \to 0$ uniformly on compacts in $\Theta$ for any $\eta > 0$. Hence,
uniformly on compacts in $\sigma^2$, $\alpha$, $r$. This proves the first claim. If, on the other hand, f (0) = 0, f (0) = 0, a different term dominates in (22). We obtain
uniformly on compacts in $\Theta$.
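The display showing that $J^\pm_\alpha f$ is bounded is missing here. As an assumption of this sketch, let $J^+_\alpha$ act like the one-sided stable generator $J^+_\alpha f(x) = \int_0^\infty (f(x+z) - f(x) - f'(x)z\mathbf{1}_{z\le 1})z^{-1-\alpha}dz$. For $f$ with $f(0) = f'(0) = 0$, the value at zero then reduces to $\int_0^\infty f(z)z^{-1-\alpha}dz$. This integral converges for $0 < \alpha < 2$ whenever $f$ is bounded and $f(z) = O(z^2)$ near zero.

The code evaluates this integral numerically. It checks the result against a closed form for the illustrative choice $f(x) = x^2/(1+x^2)$, which is not one of the paper's moment functions. Function names are hypothetical.

```python
import math


def one_sided_jump_integral(f, alpha, s_lo=-40.0, s_hi=40.0, n=20_000):
    """Approximate int_0^inf f(z) z^(-1-alpha) dz by substituting z = exp(s) and using the trapezoid rule.

    With z = e^s we have dz = e^s ds, so the integrand becomes f(e^s) * e^(-alpha s).
    Intended for f with f(0) = f'(0) = 0 and 0 < alpha < 2, so both tails decay exponentially in s.
    """
    h = (s_hi - s_lo) / n
    total = 0.0
    for k in range(n + 1):
        s = s_lo + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(math.exp(s)) * math.exp(-alpha * s)
    return h * total


def rational_bump(x):
    """Illustrative bounded function with f(0) = f'(0) = 0 and f''(0) = 2."""
    return x * x / (1.0 + x * x)


def rational_bump_closed_form(alpha):
    """int_0^inf z^(1-alpha) / (1 + z^2) dz = pi / (2 sin(pi alpha / 2)) for 0 < alpha < 2."""
    return math.pi / (2.0 * math.sin(math.pi * alpha / 2.0))
```

The substitution $z = e^s$ turns both the singular behaviour at zero and the heavy tail at infinity into exponential decay. For that reason, a plain trapezoid rule on a finite interval is already very accurate.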
For the case f (0) = 0, f (4) (0) = 0, we may apply the result of case (ii) to obtain
For the last claim, we use Itô's formula again. Recall that the truncation function satisfies $\xi(z) = z$ for $|z| \le 1$ and $|\xi(z)| \le 2$. Then
We treat all terms in (23) individually.
Part (i). The small jumps can be handled by noting d W (J 1 t . Consider the slightly more general process
be defined analogously based on $\tilde X_t$. These are compound Poisson processes, which can be written as
We compute
Then there exists a constant $\tilde C$, bounded on compacts in $\Theta$ and $L$, such that for $z < b/2$ and $\alpha = \alpha_1$,
In particular, this yields $E|\tilde U_1| \le \tilde C a^{1\wedge\alpha}$ for a potentially different constant $\tilde C$. Here and in the following, the constant $\tilde C$ may vary from line to line and is bounded on compacts in $\theta$, $L$, and $\rho$.
Furthermore, since $\nu$ and $\tilde\nu$ are sufficiently similar,
We now consider the distance $d_W(U_1, \tilde U_1)$ occurring in (25). It can be expressed in terms of the cumulative distribution functions as
Recall that $|\eta((a,b]) - \tilde\eta((a,b])| = O(a^{-\rho})$. Furthermore, the assumed similarity of $\nu$ and $\tilde\nu$ implies that |ν( as $a \to 0$, whenever $b \ge 2a$. In this case, for $-b \le v \le -a$,
The analogous bound holds for (27) for the Wasserstein distance, to obtain
for $a \to 0$ and $a \le b/2$, where we used $\rho < 1$. Using (25), we may hence bound
Now note that
such that, by Fubini's theorem,
where we performed a linear substitution in the second step. Hence,
Using this in (34),
We now study the latter two terms.
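Expressing $d_W(U_1,\tilde U_1)$ through cumulative distribution functions rests on a one-dimensional identity for the 1-Wasserstein distance: $d_W(U,V) = \int_{\mathbb{R}}|F_U(v) - F_V(v)|\,dv$. We assume here that $d_W$ denotes this distance. The sketch below, with hypothetical function names, evaluates the identity exactly for two empirical distributions. It then compares the result with the sorted-sample (quantile) coupling, which gives the same value for samples of equal size.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Right-continuous empirical CDF v -> #{x_i <= v} / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda v: bisect_right(xs, v) / n


def wasserstein_via_cdfs(x, y):
    """1-Wasserstein distance as the integral of |F_x - F_y| over the real line."""
    fx, fy = empirical_cdf(x), empirical_cdf(y)
    grid = sorted(set(x) | set(y))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right), so each piece is exact.
        total += abs(fx(left) - fy(left)) * (right - left)
    return total  # outside [min, max] both CDFs equal 0 or 1, so nothing is missed


def wasserstein_sorted(x, y):
    """Same distance via the quantile coupling; requires samples of equal size."""
    if len(x) != len(y):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)
```

For example, the samples $\{0, 1, 3\}$ and $\{0.5, 2, 2.5\}$ are at distance $2/3$ under both formulas.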
Part (iv). The total variation distance can be bounded by noting that $J \le t\,\nu((-1,1)^c)$.
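Part (iv) appears to use the standard coupling bound. Reading it that way is an assumption, since the inequality above is garbled. Suppose two processes share all jumps inside $(-1,1)$ and differ only through jumps outside it. Then they coincide on the event that no such jump has occurred by time $t$. The total variation distance is therefore at most $P(N_t \ge 1) = 1 - e^{-t\lambda} \le t\lambda$, where $\lambda = \nu((-1,1)^c)$. A small sketch with hypothetical function names:

```python
import math
import random


def prob_some_big_jump(t: float, big_jump_rate: float) -> float:
    """P(N_t >= 1) for a Poisson process with rate big_jump_rate = nu((-1, 1)^c)."""
    return -math.expm1(-big_jump_rate * t)  # equals 1 - exp(-rate * t), accurate for small t


def estimate_some_big_jump(t: float, big_jump_rate: float, n_paths: int, seed: int = 0) -> float:
    """Monte Carlo check: the first big jump arrives after an Exp(rate) waiting time."""
    rng = random.Random(seed)
    hits = sum(rng.expovariate(big_jump_rate) <= t for _ in range(n_paths))
    return hits / n_paths
```

The inequality $1 - e^{-x} \le x$ turns the exact probability into the linear bound $t\,\nu((-1,1)^c)$.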
Integration and differentiation may be exchanged because $f$ is a Schwartz function and $\psi$ has polynomial growth. In particular, via the Lévy-Khintchine formula, the Lévy symbol may be determined as
The second term appears because the Lévy measure $\tilde\nu$ is allowed to be asymmetric. In its expression, we used that $\xi(z) = z$ for $z \in (-1,1)$, and we denote
Hence, by inverting the Fourier transform,
So far, we have assumed $f$ to be a Schwartz function, but the right-hand side of (44) makes sense whenever $f \in C^2$. We can extend equation (44) to this case by approximating $f$ with a sequence of Schwartz functions $f_n$. The approximation must satisfy $\sup_{|x|\le K} |f_n^{(k)}(x) - f^{(k)}(x)| \to 0$ as $n \to \infty$ for each $K > 0$ and $k = 0, 1, 2$, together with $\sup_n \|f_n^{(k)}\|_\infty < \infty$. Standard arguments then allow us to pass to the limit on both sides of (44).
To handle the asymmetry term $\tilde\xi_u$, we exploit (43) to derive
The second integral can be bounded as follows. For any $\epsilon \in (0,1)$ and any $p \neq 1$, there is a $\bar p$ between $p$ and 1 such that
By continuity, the same bound holds for $p = 1$. Thus, we obtain $|\partial_{r_m^\pm}\tilde\xi_u| \le u\|\xi\|_\infty + \alpha_m|\log u|\, u^{\alpha_m\vee 1} + \|\xi\|_\infty u^{\alpha_m} \le \tilde C u^{\alpha_m\vee 1}(1 + |\log u|)$. Similarly,
Note also that $\partial_{\sigma^2}\tilde\xi_u = 0$. For specific partial derivatives, we have thus shown that
For fixed $f$, the functions $f$, $J^\pm_{\alpha_m} f$ and $\partial_{\alpha_m} J^\pm_{\alpha_m} f$ are bounded, uniformly on compacts in $\theta$. Moreover, $P_\theta(|uX_h| > \eta) \to 0$ uniformly on compacts in $\Theta$ for any $\eta > 0$, as established in the proof of Lemma 5.1. Therefore, $E_\theta f(uX_h) \to f(0)$ uniformly on compacts as $h \to 0$, as well as $E_\theta J^\pm_{\alpha_m} f(uX_h) \to J^\pm_{\alpha_m} f(0)$ and $E_\theta \partial_{\alpha_m} J^\pm_{\alpha_m} f(uX_h) \to \partial_{\alpha_m} J^\pm_{\alpha_m} f(0)$. This completes the proof of (17). Result (18) follows analogously by applying a linear transformation to (45). Finally, (19) is a consequence of (45) upon noting that $E_\theta f(uX_h) = O(hu^\alpha)$, see Lemma 5.1.
This corresponds to the entries $A(\theta)_{j,1} = 0$ for $j \ge 2$. In combination with Lemma 5.5, this suffices to establish the convergence (20).
by $C_n$ the event
Since the first set is deterministic, and since $F_n(0)/d_n \xrightarrow{P} 0$, we have $P(C_n) \to 1$. On the set $C_n$, it holds that $0 \in B_{\lambda d_n}(F_n(0))$. Then Lemma 6.2 of Jacod and Sørensen (2018), with $y = 0$, $f = \tilde F_n$ and $r = d_n$, states that there exists a unique point $\tilde T_n \in B_{d_n}(0)$ which solves $\tilde F_n(\tilde T_n) = 0$.