On limit theory for functionals of stationary increments Lévy driven moving averages

In this paper we obtain new limit theorems for variational functionals of high frequency observations of stationary increments Lévy driven moving averages. We will see that the asymptotic behaviour of such functionals heavily depends on the kernel, the driving Lévy process and the properties of the functional under consideration. We show the "law of large numbers" for our class of statistics, which consists of three different cases. For one of the appearing limits, which we refer to as the ergodic type limit, we also prove the associated weak limit theory, which again consists of three different cases. Our work is related to [9,10], which considered power variation functionals of stationary increments Lévy driven moving averages.


Introduction
The last two decades have witnessed great progress in limit theory for high frequency functionals of continuous time stochastic processes. The interest in infill asymptotics has been motivated by the increasing availability of high frequency data in natural and social sciences such as finance, physics, biology or medicine. Limit theorems in the high frequency framework are an important probabilistic tool for the analysis of small scale fluctuations of the underlying stochastic process and have numerous applications in mathematical statistics, e.g. in the field of parametric estimation and testing. Such limit theory has been investigated in various model classes including Itô semimartingales (see e.g. [7,23,24]), (multi)fractional Brownian motion and related processes (see e.g. [3,4,5,6,20,26]), and many others.
In this paper we investigate the asymptotic theory for high frequency functionals of stationary increments Lévy driven moving averages. More specifically, we focus on an infinitely divisible process with stationary increments $(X_t)_{t\ge 0}$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, given as
\[ X_t = \int_{-\infty}^{t} \big\{ g(t-s) - g_0(-s) \big\}\, dL_s, \tag{1.1} \]
where $L = (L_t)_{t\in\mathbb{R}}$ is a two-sided Lévy process with no Gaussian component and $L_0 = 0$, and $g, g_0 : \mathbb{R} \to \mathbb{R}$ are continuous functions vanishing on $(-\infty, 0)$. In particular, this class of stochastic processes contains the linear fractional stable motion, which has the form (1.1) with $g(s) = g_0(s) = s_+^{\alpha}$ and a symmetric stable driving Lévy process $L$. The linear fractional stable motion is the most common heavy-tailed self-similar process, and hence exhibits both the Joseph and Noah effects of Mandelbrot, cf. [33, Chapter 7]. Fractional Lévy processes are other examples of processes of the form (1.1), see e.g. [30, Chapter 2.6.8]. Recent papers address various topics on linear fractional stable motions including analysis of the semimartingale property [11], fine scale behavior [12,19], simulation techniques [17] and statistical inference [1,18,27,28]. We consider the class of variational functionals of the type
\[ V(f;k)_n := a_n \sum_{i=k}^{n} f\big(b_n \Delta^n_{i,k} X\big), \tag{1.2} \]
where $f : \mathbb{R} \to \mathbb{R}$ is a measurable function, $(a_n)_{n\in\mathbb{N}}$, $(b_n)_{n\in\mathbb{N}}$ are suitable normalising sequences, and the operator $\Delta^n_{i,k} X$ denotes the $k$th order increments of $X$ defined as
\[ \Delta^n_{i,k} X := \sum_{j=0}^{k} (-1)^j \binom{k}{j} X_{(i-j)/n}, \qquad i \ge k. \]
The usual first and second order increments take the forms $\Delta^n_{i,1} X = X_{i/n} - X_{(i-1)/n}$ and $\Delta^n_{i,2} X = X_{i/n} - 2X_{(i-1)/n} + X_{(i-2)/n}$. The reason for considering general $k$th order increments lies in statistical applications. Indeed, using higher order increments, with $k \ge 2$, is often desirable since this gives rise to better convergence rates for various estimators (cf. [27]). This fact is also seen in our asymptotic results, Theorems 2.1, 2.5 and 2.6. The choice of the normalising sequences $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$ depends on the interplay between the form of the kernel $g$, the infinitesimal properties of the driving Lévy process $L$ and the growth/smoothness of the function $f$.
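The definitions of the $k$th order increments and of the statistic (1.2) translate directly into a few lines of code. The following is a minimal sketch in Python, assuming the observations $X_{0}, X_{1/n}, \dots, X_{1}$ are stored in a list `x`; the function names `kth_order_increments` and `variation_functional` are illustrative choices and not part of the paper.

```python
from math import comb


def kth_order_increments(x, k):
    """Return [Delta^n_{i,k} X for i = k, ..., n], where x[i] = X_{i/n}."""
    # Alternating binomial weights (-1)^j * C(k, j), j = 0, ..., k.
    weights = [(-1) ** j * comb(k, j) for j in range(k + 1)]
    return [
        sum(w * x[i - j] for j, w in enumerate(weights))
        for i in range(k, len(x))
    ]


def variation_functional(x, f, k, a_n, b_n):
    """V(f; k)_n = a_n * sum_{i=k}^n f(b_n * Delta^n_{i,k} X)."""
    return a_n * sum(f(b_n * d) for d in kth_order_increments(x, k))


# Sanity check on a deterministic quadratic path: second order increments
# are constant and third order increments vanish.
quadratic = [i * i for i in range(6)]
second = kth_order_increments(quadratic, 2)
third = kth_order_increments(quadratic, 3)
```

On the quadratic path the third order increments vanish, which illustrates why higher order differencing removes smooth components of the observed path.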
The asymptotic behaviour of statistics of the form (1.2) in the context of power variation, i.e. $f(x) = |x|^p$ for some $p > 0$, has been characterized in the work [9,10]. Further papers on related topics include [29], which investigates asymptotic normality for functionals of the type (1.2) in the low frequency setting and for bounded functions $f$ (the article [28] extends the results of [29] to certain unbounded functions). Much more is known about weak limit theory for statistics of discrete moving averages driven by heavy tailed i.i.d. noise; we refer to [22,36,37] among others. However, that asymptotic theory is investigated mostly for bounded functions $f$ and under assumptions on the kernel and the noise process which are not comparable to ours. We conclude the discussion of related literature by mentioning the two papers [12, Section 5] and [19], which show "law of large numbers" results of the ergodic type in the context of fractional Lévy processes.
The aim of this work is to investigate the limit theorems for general functionals $V(f;k)_n$. We will start with first order asymptotic results, which consist of three different limits depending on the interplay between $f$, $g$ and $L$. More specifically, the "laws of large numbers" include stable convergence towards a certain random variable, ergodic type convergence to a constant when the driving motion $L$ is assumed to be symmetric $\beta$-stable, and convergence in probability to an integral of some stochastic process. In the second step we will also prove three weak limit theorems associated with the ergodic type convergence, consisting of a central limit theorem and two convergence results towards stable distributions. Motivated by statistical applications, such as parametric estimation of linear fractional stable motion (cf. [1,18,27,28]), we will apply our theory to functions $f_1, \dots, f_5$, namely the power variations $f_1$ and $f_2$, the real and imaginary parts of the characteristic function $f_3(x) = \cos(ux)$ or $f_3(x) = \sin(ux)$, the empirical distribution function $f_4$ and the log-variation $f_5$, among others. One of the major difficulties when showing weak limit theorems lies in the fact that the ideas suggested in e.g. [22,29,36,37] in the setting of bounded functions $f$ do not directly extend to a more general class of functions (also the proofs in [9] for the power variation case use the specific form of the function $f(x) = |x|^p$). As has been noticed in earlier papers on discrete moving averages (see e.g. [36,37] and references therein), the Appell rank of the function $f$ often plays an important role for the weak limit theory. It is defined as
\[ m^\star = \min\{ m \in \mathbb{N} : \Phi^{(m)}_\rho(0) \neq 0 \} \qquad \text{with} \qquad \Phi_\rho(x) := \mathbb{E}[f(x + \rho S)] - \mathbb{E}[f(\rho S)], \]
where $S$ is a symmetric $\beta$-stable random variable with scale parameter 1, $\rho > 0$ and $\Phi^{(m)}_\rho$ denotes the $m$th derivative of $x \mapsto \Phi_\rho(x)$. In this paper we will show that it is much more convenient to impose assumptions on the function $\Phi$, rather than on the function $f$ itself, to obtain weak limit theorems for a wide class of functionals $V(f;k)_n$. This is one of the main results of our work.
The paper is structured as follows. Section 2 presents the required assumptions, the main results and some remarks and examples. We present some preliminaries in Section 3. The proofs of the first order asymptotic results are collected in Section 4. Section 5 is devoted to the proofs of weak limit theorems, with a few more technical results postponed to Section 6.

The setting and main results
We start by introducing various definitions, notations and assumptions that will be important for the presentation of the main results. We recall that the Blumenthal-Getoor index of $L$ is defined as
\[ \beta := \inf\Big\{ r \ge 0 : \int_{-1}^{1} |x|^r \, \nu(dx) < \infty \Big\}, \]
where $\nu$ denotes the Lévy measure of $L$. Furthermore, $\Delta L_s := L_s - L_{s-}$ with $L_{s-} := \lim_{u \uparrow s,\, u < s} L_u$ stands for the jump size of $L$ at point $s$. If $L$ is stable with index of stability $\beta \in (0, 2)$, the index of stability and the Blumenthal-Getoor index coincide, and both will be denoted by $\beta$. Let $\mathbb{F} = (\mathcal{F}_t)_{t\in\mathbb{R}}$ be the filtration generated by the Lévy process $L$ and $(T_m)_{m\ge 1}$ be a sequence of $\mathbb{F}$-stopping times that exhausts the jumps of $(L_t)_{t\ge 0}$. That is, $\{T_m(\omega) : m \ge 1\} = \{t \ge 0 : \Delta L_t(\omega) \neq 0\}$ and $T_m(\omega) \neq T_n(\omega)$ for all $m \neq n$ with $T_m(\omega) < \infty$.
Assumption (A) ensures in particular that the process $X_t$, introduced in (1.1), is well-defined in the sense of [31], see [9, Section 2.4]. When $L$ is a $\beta$-stable Lévy process, we may and do choose $\theta = \beta$. By adjusting the Lévy measure $\nu$, we may also include the case where (2.1) is replaced by $g(t) \sim c_0 t^{\alpha}$ as $t \downarrow 0$ for some $c_0 \neq 0$.
For Theorem 2.1(i) below, we need to slightly strengthen Assumption (A) if $\theta = 1$; we refer to the strengthened version as Assumption (A-log). In order to formulate our main results, we require some more notation. For $p > 0$ we denote by $C^p(\mathbb{R})$ the space of $r := [p]$-times continuously differentiable functions $f : \mathbb{R} \to \mathbb{R}$ such that $f^{(r)}$ is locally $(p - r)$-Hölder continuous if $p \notin \mathbb{N}$. We introduce the function
\[ h_k(x) := \sum_{j=0}^{k} (-1)^j \binom{k}{j} (x - j)_+^{\alpha}, \qquad x \in \mathbb{R}, \tag{2.2} \]
where $y_+ := \max\{y, 0\}$ for all $y \in \mathbb{R}$. We recall that a sequence $(Z_n)_{n\in\mathbb{N}}$ of random variables defined on $(\Omega, \mathcal{F}, \mathbb{P})$ with values in a Polish space $(E, \mathcal{E})$ converges stably in law to $Z$, which is defined on an extension $(\Omega', \mathcal{F}', \mathbb{P}')$ of the original probability space, if for all bounded continuous $g : E \to \mathbb{R}$ and for all bounded $\mathcal{F}$-measurable random variables $Y$ it holds that
\[ \lim_{n\to\infty} \mathbb{E}[g(Z_n) Y] = \mathbb{E}'[g(Z) Y], \]
where $\mathbb{E}'$ denotes the expectation on the extended space. We denote the stable convergence in law by $Z_n \xrightarrow{\mathcal{L}\text{-}s} Z$, and refer to [21,32] for more details. Note, in particular, that stable convergence in law is a stronger property than convergence in law, but a weaker property than convergence in probability. In the framework of stochastic processes we write $Z^n \xrightarrow{\text{u.c.p.}} Z$ for uniform convergence in probability, i.e. when $\sup_{t\in[0,T]} |Z^n_t - Z_t| \xrightarrow{\mathbb{P}} 0$ holds for all $T > 0$. Furthermore, we denote by $Z^n \xrightarrow{\mathcal{L}\text{-f.i.d.i.}} Z$ the stable convergence of finite dimensional distributions.

Law of large numbers
Our first theorem presents the "law of large numbers" for the statistic $V(f;k)_n$ defined at (1.2). The sequence $(U_m)_{m\ge 1}$ below is i.i.d. $\mathcal{U}(0,1)$-distributed, defined on an extension $(\Omega', \mathcal{F}', \mathbb{P}')$ and independent of $\mathcal{F}$. Here and throughout the paper we denote by $S\beta S(\rho)$ the symmetric $\beta$-stable distribution with scale parameter $\rho > 0$.
Theorem 2.1. Suppose Assumption (A) holds and assume that the Blumenthal-Getoor index satisfies $\beta < 2$. We have the following three cases:
(i) Let $k > \alpha$ and suppose that (A-log) holds if $\theta = 1$. For some $p > \beta \vee \frac{1}{k-\alpha}$ assume that $f \in C^p(\mathbb{R})$ and $f^{(j)}(0) = 0$ for $j = 0, \dots, [p]$. With the normalising sequences $a_n = 1$ and $b_n = n^{\alpha}$ we obtain the stable convergence
\[ V(f;k)_n \xrightarrow{\mathcal{L}\text{-}s} \sum_{m:\, T_m \in [0,1]} \sum_{l=0}^{\infty} f\big( \Delta L_{T_m} h_k(l + U_m) \big). \]
(ii) Suppose that $L$ is a symmetric $\beta$-stable Lévy process with scale parameter $\rho_L > 0$. Moreover, assume that $\mathbb{E}[|f(L_1)|] < \infty$, and $H := \alpha + 1/\beta < k$. Then, setting $a_n = 1/n$ and $b_n = n^{H}$, we obtain
\[ V(f;k)_n \xrightarrow{\mathbb{P}} \mathbb{E}[f(\rho_0 S)], \qquad \text{where } \rho_0 = \rho_L \|h_k\|_{L^\beta(\mathbb{R})} \text{ and } S \sim S\beta S(1). \tag{2.3} \]
(iii) Suppose that $(1 \vee \beta)(k - \alpha) < 1$ and that $f$ is continuous and satisfies $|f(x)| \le C(1 \vee |x|^q)$ for all $x \in \mathbb{R}$, for some $q, C > 0$ with $q(k - \alpha) < 1$. With the normalising sequences $a_n = 1/n$ and $b_n = n^{k}$ it holds that
\[ V(f;k)_n \xrightarrow{\mathbb{P}} \int_0^1 f(F_u)\, du, \qquad \text{where } F_u = \int_{-\infty}^{u} g^{(k)}(u - s)\, dL_s. \tag{2.4} \]
Theorem 2.1 may be viewed as a generalization of [9, Theorem 1.1] from power variation to general functionals. The limiting random variable in Theorem 2.1(i) is indeed well-defined, as we show in Lemma 4.1 below. We remark that one of the conditions of Theorem 2.1(i) is the restriction $\alpha < k - 1/p$. This restriction on the parameter $\alpha$ becomes weaker as $p$ increases, but on the other hand the condition $f \in C^p(\mathbb{R})$ becomes stronger. Thus, there is a trade-off between these two conditions.
The three cases of the theorem are closely related to the three limits for the power variation derived in [9, Theorem 1.1]. Let us briefly explain the main intuition behind Theorems 2.1(ii) and (iii).
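The crucial step in the proof of Theorem 2.1(ii) is the approximation $\Delta^n_{i,k} X \approx \Delta^n_{i,k} Y$, where $Y$ is the linear fractional stable motion defined via
\[ Y_t := \int_{\mathbb{R}} \big\{ (t - s)_+^{\alpha} - (-s)_+^{\alpha} \big\}\, dL_s. \]
It is well known that the process $Y$ is $H$-self-similar and its increment process is ergodic (see e.g. [16]). Hence, under the assumptions of Theorem 2.1(ii), we may conclude by Birkhoff's ergodic theorem for e.g. $k = 1$:
\[ V(f;1)_n \approx \frac{1}{n} \sum_{i=1}^{n} f\big( n^{H} \Delta^n_{i,1} Y \big) \overset{d}{=} \frac{1}{n} \sum_{i=1}^{n} f(Y_i - Y_{i-1}) \xrightarrow{\mathbb{P}} \mathbb{E}[f(Y_1 - Y_0)] = \mathbb{E}[f(\rho_0 S)]. \]
This is exactly the statement of (2.3) for the case $k = 1$.
Under the assumptions of Theorem 2.1(iii) it turns out that the stochastic process $F$ defined at (2.4) is a version of the $k$th derivative of $X$. Hence, we conclude by Taylor expansion:
\[ V(f;k)_n = \frac{1}{n} \sum_{i=k}^{n} f\big( n^{k} \Delta^n_{i,k} X \big) \approx \frac{1}{n} \sum_{i=k}^{n} f\big( F_{(i-1)/n} \big) \approx \int_0^1 f(F_u)\, du. \]
This explains the statement of Theorem 2.1(iii).
Remark 2.2. In contrast to the power variation case investigated in [9], the assumptions of Theorems 2.1(i) and (ii), and of Theorems 2.1(i) and (iii), are not mutually exclusive, and hence two limit theorems can hold at the same time. This phenomenon appears already in the simpler setting of Lévy processes. Assume for example that $L$ is a symmetric $\beta$-stable Lévy process and consider the function $f(x) = \sin^2(x)$. If $k = 1$ and we choose $a_n = b_n = 1$ we deduce the convergence
\[ \sum_{i=1}^{n} f\big( \Delta^n_{i,1} L \big) \xrightarrow{\text{a.s.}} \sum_{m:\, T_m \in [0,1]} f(\Delta L_{T_m}), \]
using, in particular, $|f(x)| \le C x^2$. On the other hand, when we choose the normalising sequences $a_n = n^{-1}$ and $b_n = n^{1/\beta}$, we readily deduce by the strong law of large numbers and the self-similarity of $L$ that
\[ \frac{1}{n} \sum_{i=1}^{n} f\big( n^{1/\beta} \Delta^n_{i,1} L \big) \xrightarrow{\mathbb{P}} \mathbb{E}[f(L_1)]. \]
This example shows that we can obtain two different limits for two different scalings.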
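The ergodic scaling in Remark 2.2 can be illustrated numerically. The sketch below rests on assumptions that are not taken from the paper: it uses the Chambers-Mallows-Stuck representation to draw symmetric $\beta$-stable variables with characteristic function $\exp(-|t|^{\beta})$, it exploits that the scaled increments $n^{1/\beta} \Delta^n_{i,1} L$ are i.i.d. copies of $L_1$, and it compares the empirical average of $\sin^2$ with the closed form $\mathbb{E}[\sin^2(L_1)] = \tfrac{1}{2}\big(1 - e^{-(2\rho_L)^{\beta}}\big)$, which follows from $\sin^2(x) = (1 - \cos(2x))/2$. All function names are hypothetical.

```python
import math
import random


def sample_sbs(beta, rng):
    """One draw from SbetaS(1) via the Chambers-Mallows-Stuck representation."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if beta == 1.0:
        return math.tan(v)  # standard Cauchy case
    return (math.sin(beta * v) / math.cos(v) ** (1.0 / beta)
            * (math.cos((1.0 - beta) * v) / w) ** ((1.0 - beta) / beta))


def ergodic_statistic(n, beta, rho=1.0, seed=0):
    """(1/n) * sum_i sin^2(n^{1/beta} * Delta^n_{i,1} L), simulated via i.i.d. copies of L_1."""
    rng = random.Random(seed)
    return sum(math.sin(rho * sample_sbs(beta, rng)) ** 2 for _ in range(n)) / n


def ergodic_limit(beta, rho=1.0):
    """E[sin^2(L_1)] = (1 - exp(-(2 * rho)^beta)) / 2 for L_1 ~ SbetaS(rho)."""
    return 0.5 * (1.0 - math.exp(-(2.0 * rho) ** beta))


empirical = ergodic_statistic(100_000, beta=1.5)
theoretical = ergodic_limit(beta=1.5)
```

Since $\sin^2$ is bounded, the empirical average fluctuates around the limit at the usual $n^{-1/2}$ scale, regardless of the heavy tails of $L_1$.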
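In the next step we present a functional version of Theorem 2.1. For this purpose we introduce the sequence of processes
\[ V(f;k)^n_t := a_n \sum_{i=k}^{[nt]} f\big( b_n \Delta^n_{i,k} X \big), \qquad t \ge 0. \]
In the proposition below we will use the Skorokhod $M_1$-topology, which was introduced in [35]. For a detailed exposition we refer to [40].
Proposition 2.3. Suppose Assumption (A) holds and assume that the Blumenthal-Getoor index satisfies $\beta < 2$. We have the following three cases:
(i) Under the conditions of Theorem 2.1(i) we obtain the stable convergence
\[ V(f;k)^n_t \xrightarrow{\mathcal{L}\text{-f.i.d.i.}} \sum_{m:\, T_m \in [0,t]} \sum_{l=0}^{\infty} f\big( \Delta L_{T_m} h_k(l + U_m) \big). \]
Moreover, the stable convergence also holds with respect to the Skorokhod $M_1$-topology if additionally the following assumption is satisfied:
(FC) Each of the two functions $x \mapsto f(x) \mathbb{1}_{\{x \ge 0\}}$ and $x \mapsto f(x) \mathbb{1}_{\{x < 0\}}$ is either non-negative or non-positive.
(ii) Under the conditions of Theorem 2.1(ii) we deduce that
\[ V(f;k)^n_t \xrightarrow{\text{u.c.p.}} t\, \mathbb{E}[f(\rho_0 S)], \]
where $S$ and $\rho_0$ have been introduced in (2.3).
(iii) Under the conditions of Theorem 2.1(iii) we have
\[ V(f;k)^n_t \xrightarrow{\text{u.c.p.}} \int_0^t f(F_u)\, du, \]
where $(F_u)_{u\in\mathbb{R}}$ has been defined at (2.4).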
We remark that the uniform convergence results of Proposition 2.3(ii) and (iii) are easily obtained from Theorem 2.1(ii) and (iii) by the following argument. Observe the decomposition f = f + − f − , where f + (resp. f − ) denotes the positive (resp. negative) part of f . Then f + , f − satisfy the same assumptions as f in the setting of Theorem 2.1(ii) and (iii). Furthermore, since f + , f − ≥ 0, the statistics V (f + ; k) n t and V (f − ; k) n t are increasing in t and the corresponding limits in Proposition 2.3(ii) and (iii) are continuous in t. Consequently, the uniform convergence is obtained from the pointwise convergence by Dini's theorem.

Weak limit theorems
In this section we present weak limit theorems associated to the ergodic type limit from Theorem 2.1(ii). Throughout this section we assume that $\mathbb{E}[|f(S)|] < \infty$, where $S \sim S\beta S(1)$. As mentioned in the introduction, the crucial quantity in this context is the function $\Phi_\rho$ defined via
\[ \Phi_\rho(x) := \mathbb{E}[f(x + \rho S)] - \mathbb{E}[f(\rho S)], \qquad x \in \mathbb{R},\ \rho > 0. \]
Similarly to limit theory for discrete moving averages, see e.g. [22,36,37], the Appell rank of the function $f$ often plays a key role for the asymptotic behaviour of the statistic $V(f;k)_n$. In our setting, the Appell rank $m^\star_\rho$ is defined as
\[ m^\star_\rho := \min\{ m \in \mathbb{N} : \Phi^{(m)}_\rho(0) \neq 0 \}. \]
Note that we have Appell rank one if and only if $\Phi'_\rho(0) \neq 0$, and Appell rank greater than or equal to two if and only if $\Phi'_\rho(0) = 0$. The Appell rank is an analogue of the Hermite rank used in the context of Gaussian processes. However, the non-Gaussian case is usually much more complicated due to the lack of orthogonal series expansions. While the Appell rank $m^\star_\rho$ usually depends on the parameter $\rho$, we always have that $m^\star_\rho = 1$ for all $\rho > 0$ in the framework of the imaginary part of the characteristic function $f_3(x) = \sin(ux)$ and the empirical distribution function $f_4$ (cf. Remark 6.7). Moreover, $m^\star_\rho > 1$ for all $\rho > 0$ when $f$ is an even function; in fact, in this case we have that $0 = \frac{\partial}{\partial x} \Phi_\rho(0) = \frac{\partial^2}{\partial x \partial \rho} \Phi_\rho(0)$ (cf. Remark 6.7). In particular, $m^\star_\rho > 1$ for all $\rho > 0$ holds in the setting of the power variations $f_1$ and $f_2$, the real part of the characteristic function $f_3(x) = \cos(ux)$ and the log-variation $f_5$.
In the following two theorems we present weak limit results associated with Theorem 2.1(ii) in the case of "short memory" (small α) or "long memory" (large α). The long memory case depends heavily on the Appell rank of the function f , whereas the short memory case does not depend on the Appell rank. In the theorems below we follow the notation of Theorem 2.1, i.e. L is a symmetric β-stable Lévy process with scale parameter ρ L , (X t ) is given by (1.1), (1), a n = 1/n and b n = n H . Theorem 2.5 ("Short memory"). Suppose Assumption (A2) holds, Assumption (B) holds with p < β/2, and E[f (L 1 ) 2 ] < ∞. Then for all α ∈ (0, k − 2/β) we have where the variance is given as η 2 := lim m→∞ η 2 m with η m defined in (5.16).
Remark 2.7. (i) We note that the limiting distribution in Theorem 2.6(i) is only nondegenerate in the Appell rank one case, or more precisely when $\frac{\partial}{\partial x} \Phi_{\rho_0}(0) \neq 0$, which follows from (5.40).
(ii) We also remark that the condition m ⋆ ρ ≥ 2 in Theorem 2.6(ii) is required to hold for all ρ > 0, which is in strong contrast to the discrete framework of e.g. [37] where only assumptions on m ⋆ ρ 0 are made. The reason for our stronger condition on the Appell rank is the fact that the scaled increments n H ∆ n i,k X are only asymptotically SβS(ρ 0 )-distributed.
(iii) Theorems 2.5 and 2.6(ii) give a rather complete picture of possible limits when the Appell rank is strictly larger than one. Indeed, we cover all cases $\alpha \in (0, k - 1/\beta)$ except the critical value $\alpha = k - 2/\beta$. This is not the case for the setting of Appell rank one. Not only do we need to assume that $\beta \in (1, 2)$, but we also have that $k - 2/\beta < k - 1$. Hence, the limit theory in the framework of $\beta \in (0, 1]$, and also $\beta \in (1, 2)$ with $\alpha \in [k - 2/\beta, k - 1]$, is still an open problem.
(iv) Notice that Theorem 2.5, which has the fastest rate of convergence, never holds for k = 1 since β ∈ (0, 2). Hence, for the purpose of statistical estimation, it makes sense to use higher values of k to end up in the setting of Theorem 2.5. We refer to [27] for more details on statistical applications using higher order increments.
Similarly to Proposition 2.3 one might be able to prove the functional versions of Theorems 2.5 and 2.6. However, we dispense with the precise exposition of these results in this paper.

Outline of the proofs of Theorems 2.5 and 2.6
The strategies of the proofs of Theorems 2.5, 2.6(i) and 2.6(ii) are quite different, and are briefly outlined in the following.
• For the proof of Theorem 2.5 we approximate V (f ; k) n by More precisely, the main part of the proof is to show It is then sufficient to establish asymptotic normality of (V n,m ) n∈N for each m ≥ 1, which follows by the central limit theorem for m-dependent sequences of random variables. This general approach to deriving central limit theorems is popular in the literature, see [29] for an example.
• The main idea of the proof of Theorem 2.6(i) is to approximate V (f ; k) n , in a suitable sense, by a linear functional V n of (n H ∆ n i,k X) n i=k given by where c n are certain chosen constants. With such an approximation in hand, the proof boils down to showing that the SβS-stable random variables V n converge in distribution.
• For the proof of Theorem 2.6(ii) we decompose $V(f;k)_n$ as where $\{Z_r\}$ is a suitably defined i.i.d. sequence of random variables, see (5.43) below. We argue that the first sum on the right-hand side of (2.10) is asymptotically negligible and that the random variables $Z_r$ are in the domain of attraction of a $(k - \alpha)\beta$-stable random variable with location parameter 0, scale parameter $\rho_1$ and skewness parameter $\eta_1$ as defined in (5.47) in the proof. Similar decompositions have been applied to derive stable limit theorems for discrete time moving averages, see for example [22].

Preliminaries
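Throughout all our proofs we denote by $C$ a generic positive constant that does not depend on $n$ or $\omega$, but may change from line to line. For a random variable $Y$ and $q > 0$ we denote $\|Y\|_q = \mathbb{E}[|Y|^q]^{1/q}$. Throughout this paper we will repeatedly use the fact that if $L$ is a symmetric $\beta$-stable Lévy process with scale parameter $\rho_L$, then for each function $\psi \in L^{\beta}(ds)$ the integral $\int_{\mathbb{R}} \psi(s)\, dL_s$ is a symmetric $\beta$-stable random variable with scale parameter
\[ \rho_L \|\psi\|_{L^\beta(\mathbb{R})} = \rho_L \Big( \int_{\mathbb{R}} |\psi(s)|^{\beta}\, ds \Big)^{1/\beta}. \tag{3.1} \]
We will also frequently use the notation
\[ g_{i,n}(s) := \sum_{j=0}^{k} (-1)^j \binom{k}{j} g\big( (i - j)/n - s \big), \]
which leads to the expression
\[ \Delta^n_{i,k} X = \int_{\mathbb{R}} g_{i,n}(s)\, dL_s \]
for the $k$th order increments of $X$. For the functions $g_{i,n}$ we obtain the following estimates.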

Lemma 3.1. Suppose that Assumption (A) is satisfied. It holds that
Proof. The first inequality follows directly from Assumption (A). The second inequality is a straightforward consequence of Taylor expansion of order k and the condition |g (k) (t)| ≤ Ct α−k for t ∈ (0, δ). The third inequality follows again through Taylor expansion and the fact that the function g (k) is decreasing on (δ, ∞).
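The scale parameter identity (3.1) makes the constant $\rho_0 = \rho_L \|h_k\|_{L^\beta(\mathbb{R})}$ appearing in the ergodic limit numerically accessible. The following Python sketch assumes that $h_k$ is the $k$th order difference of $x \mapsto x_+^{\alpha}$ with $\alpha > 0$, and approximates the $L^\beta$-norm by a midpoint rule on a truncated interval $[0, x_{\max}]$; the truncation, the step size and the function names are illustrative assumptions, and the neglected tail is only small when $\alpha + 1/\beta < k$.

```python
from math import comb


def h_k(x, k, alpha):
    """h_k(x) = sum_{j=0}^k (-1)^j C(k, j) (x - j)_+^alpha, assuming alpha > 0."""
    return sum((-1) ** j * comb(k, j) * max(x - j, 0.0) ** alpha
               for j in range(k + 1))


def scale_rho0(k, alpha, beta, rho_L=1.0, x_max=50.0, step=1e-3):
    """Approximate rho_0 = rho_L * ||h_k||_{L^beta} by a midpoint rule on [0, x_max]."""
    n_steps = int(round(x_max / step))
    # h_k vanishes on (-inf, 0), so the integral starts at 0.
    total = sum(abs(h_k((i + 0.5) * step, k, alpha)) ** beta
                for i in range(n_steps))
    return rho_L * (total * step) ** (1.0 / beta)


# For alpha = 1 and k = 2 the function h_2 is the triangle function on [0, 2],
# so ||h_2||_beta^beta = 2 / (beta + 1) can be used as a check.
rho0 = scale_rho0(k=2, alpha=1.0, beta=1.5, x_max=10.0)
```

The triangle-function case is convenient for validation because the tail vanishes exactly; for non-integer $\alpha$ the value depends on the chosen truncation point.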
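We briefly recall the definition and some properties of the Skorokhod $M_1$-topology, as it is not as widely used as the $J_1$-topology. It was originally introduced by Skorokhod [35] by defining a metric on the completed graphs of càdlàg functions, where the completed graph of $\varphi$ is defined as
\[ \Gamma_\varphi := \big\{ (x, t) \in \mathbb{R} \times \mathbb{R}_+ : x = \lambda \varphi(t-) + (1 - \lambda) \varphi(t) \text{ for some } \lambda \in [0, 1] \big\}. \]
The $M_1$-topology is weaker than the $J_1$-topology but still strong enough to make many important functionals, such as supremum and infimum, continuous. It can be shown that the stable convergence in Theorem 2.1(i) does not hold with respect to the $J_1$-topology (cf. [8]). Since the $M_1$-topology is metrizable, it is completely characterized through convergence of sequences, which we describe in the following. A sequence $\varphi_n$ of functions in $D(\mathbb{R}_+, \mathbb{R})$ converges to $\varphi \in D(\mathbb{R}_+, \mathbb{R})$ with respect to the Skorokhod $M_1$-topology if and only if $\varphi_n(t) \to \varphi(t)$ for all $t$ in a dense subset of $[0, \infty)$, and for all $t_\infty \in [0, \infty)$ it holds that
\[ \lim_{\delta \downarrow 0} \limsup_{n \to \infty} \sup_{0 \le t \le t_\infty} w(\varphi_n, t, \delta) = 0. \]
Here, the oscillation function $w$ is defined as
\[ w(\varphi, t, \delta) := \sup_{0 \vee (t - \delta) \le t_1 < t_2 < t_3 \le t + \delta} \big| \varphi(t_2) - [\varphi(t_1), \varphi(t_3)] \big|, \]
where $|b - [a, c]| := \inf_{\lambda \in [0,1]} |b - (\lambda a + (1 - \lambda) c)|$ denotes the distance from $b$ to the segment between $a$ and $c$. We refer to [40] for more details on the $M_1$-topology.
Proof of Theorem 2.1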
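To make the role of the oscillation function concrete, the following Python sketch evaluates a grid version of $w(\varphi, t, \delta)$, assuming the usual definition in which $w$ is the largest distance from $\varphi(t_2)$ to the segment between $\varphi(t_1)$ and $\varphi(t_3)$ over $t_1 < t_2 < t_3$ in the window $[0 \vee (t - \delta), t + \delta]$. Restricting the supremum to grid points is a simplifying assumption, and the function names are hypothetical.

```python
def dist_to_segment(b, a, c):
    """Distance from the point b to the segment between a and c in R."""
    lo, hi = min(a, c), max(a, c)
    return max(0.0, lo - b, b - hi)


def m1_oscillation(values, times, t, delta):
    """Grid version of w(phi, t, delta), with values[i] = phi(times[i])."""
    idx = [i for i, s in enumerate(times)
           if max(0.0, t - delta) <= s <= t + delta]
    w = 0.0
    # Brute-force supremum over all ordered triples t1 < t2 < t3 in the window.
    for p in range(len(idx)):
        for q in range(p + 1, len(idx)):
            for r in range(q + 1, len(idx)):
                w = max(w, dist_to_segment(values[idx[q]],
                                           values[idx[p]],
                                           values[idx[r]]))
    return w


times = [i / 10 for i in range(11)]
staircase = [0, 0, 0, 0, 0.5, 1, 1, 1, 1, 1, 1]   # monotone: one jump split in two steps
spike = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]         # up-and-down excursion
w_staircase = m1_oscillation(staircase, times, t=0.5, delta=0.3)
w_spike = m1_oscillation(spike, times, t=0.5, delta=0.3)
```

The monotone staircase has zero oscillation, whereas the spike does not. This is the mechanism that makes monotone approximations of a jump admissible in the $M_1$-topology, while they are not admissible in the $J_1$-topology.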

Proofs of Theorem 2.1(i) and Proposition 2.3(i)
We concentrate on the proof of Proposition 2.3(i), since it is a stronger statement than Theorem 2.1(i). The proof is divided into three parts. First, we assume that L is a compound Poisson process and show the finite dimensional stable convergence for the statistic V (f ; k) n t . Thereafter we argue that the convergence holds in the functional sense with respect to the M 1 -topology, when f satisfies condition (FC). Finally, the results are extended to general Lévy processes by truncation. For this step, an isometry for Lévy integrals, which is due to [31], plays a key role.
Since $C^q(\mathbb{R}) \subset C^p(\mathbb{R})$ for $p < q$ we may and do assume that $p \notin \mathbb{N}$. Note that, if $f \in C^p(\mathbb{R})$ and $f^{(j)}(0) = 0$ for all $j = 0, \dots, [p]$, then for any $N > 0$ there exists a constant $C_N$ such that By the assumption $p > \frac{1}{k-\alpha}$, this implies the following estimate to be used in the proof below. For all $N > 0$ there is a constant $C_N$ such that The following lemma ensures in particular that the limit in Theorem 2.1(i) exists.
where i m denotes the random index such that T m ∈ im−1 n , im n .
Proof. Throughout the proof, K denotes a positive random variable that does not depend on n, but may change from line to line. For the first inequality note that |h k (l + U m )| ≤ C(l − k) α−k for all l > k and |h k (l + U m )| ≤ C for l ∈ {0, ..., k}. This implies in particular Therefore, we find by (4.1) a random variable K such that for all l ≥ 0 and all m. Consequently, the left-hand side of (4.3) is dominated by where we used that (α − k)p < −1, and that m: The inequality (4.4) follows by the same arguments since Lemma 3.1 implies the existence of a constant C > 0 such that for all n ∈ N n α g im+l,n (T m ) ≤ C for l ∈ {0, ..., k}, and

Compound Poisson process as driving process
In this subsection, we show the finite dimensional stable convergence of V (f ; k) n t under the assumption that L is a compound Poisson process. The extension to functional convergence when condition (FC) is satisfied follows in the next subsection, the extension to general L thereafter.
Let 0 ≤ T 1 < T 2 < ... denote the jump times of (L t ) t≥0 . For ε > 0 we define We note that Ω ε ↑ Ω, as ε ↓ 0. Letting It turns out that M i,n,ε is the asymptotically dominating term, whereas R i,n,ε is negligible as n → ∞. We show that, on Ω ε , as n → ∞. Here (U m ) m≥1 are independent identically U ([0, 1])-distributed random variables, defined on an extension (Ω ′ , F ′ , P ′ ) of the original probability space, that are independent of F. For this step, the following expression for the left hand side is instrumental.
On Ω ε it holds that Here, i m denotes the random index such that The following lemma proves (4.5).
Lemma 4.2. For r ≥ 1 and 0 ≤ t 1 < · · · < t r ≤ t we obtain on Ω ε the stable convergence Proof. By arguing as in [9, Section 5.1], we deduce for any d ≥ 1 the F-stable convergence f (n α ∆L Tm g im+l,n (T m )) and we obtain by the continuous mapping theorem for stable convergence  |f (∆L Tm n α g im+l,n (T m ))|.
Recalling the decomposition (4.5) and applying the triangle inequality, the proof can be completed by showing that as n → ∞. We first argue that the random variables {n α M i,n,ε , n α ∆ n i,k X} n∈N,i∈{k,...,[nt]} are uniformly bounded by a constant on Ω ε , which will allow us to apply the estimate (4.1). The random variables M i,n,ε satisfy by construction either Consequently, they are uniformly bounded by Lemma 3.1, where we used that k > α and that the jumps of L are bounded on Ω ε . The uniform boundedness of n α ∆ n i,k X = n α (M i,n,ε + R i,n,ε ) follows by [9, Eqs. (4.8), (4.12)] which implies that for any η > 0 In order to show (4.12) we apply Taylor expansion for f at n α M i,n,ε , and bound the terms in the Taylor expansion using (4.1) and the following lemma.
and v m t∞ is the random index defined in (4.8). By Lemma 3.1 the random variables It follows by comparison with the integral n k+1 (s − k) (α−k)γ ds that the right hand side multiplied with n (k−α)γ−1 is convergent, where we used that (α − k)γ ∈ (−1, 0) and that the number of jumps of L(ω) in [0, t] is uniformly bounded for ω ∈ Ω ε .
Considering the sum J n in (4.12), Taylor expansion up to order r = [p] shows that where T R r denotes the Taylor rest term. Recalling the estimate (4.2), we can now estimate the jth Taylor monomial T j for j = 0, . . . , [p] by applying Lemma 4.
). Using (4.13) and recalling that p > k − α, we obtain that for sufficiently small η > 0 where the second inequality follows from Lemma 4.3 since (k − α)γ j − 1 = −j/p. For the Taylor rest term T R r we obtain by the mean value theorem: with ξ i,n ∈ (n α |M i,n,ε |, n α |X i,n,ε |) where we set (a, b) := (b, a) for a > b. Since n α |M i,n,ε | and n α |X i,n,ε | are bounded and f (r) is locally (p − r)-Hölder continuous, it follows that From (4.13) it follows that T R r → 0 as n → ∞, where we recall that (α − k)p < −1.

Functional convergence
In this subsection we show that if f satisfies (FC) and under the assumption that L is a compound Poisson process, the convergence in This assumption puts us into the comfortable situation that our limiting process is monotonic. Recall the definition of the processes V n,ε and Z introduced in (4.5) and (4.7), respectively. In Lemma 4.2 the stable convergence of the finite dimensional distributions of V n,ε to Z was shown. By Prokhorov's theorem the functional convergence V n,ε L M 1 −s −−−−→ Z on Ω ε follows from the following lemma. Recalling the identity (4.6) and the asymptotic equivalence of [nt] i=k f (n α M i,n,ε ) and V (f ; k) n t shown in (4.12) and thereafter, the functional convergence in Proposition 2.3(i) follows.

Now, for general f satisfying condition (FC) we decompose
. Both functions f + and f − satisfy (FC'), and the functional convergence of V (f + ; k) n and V (f − ; k) n follows, with the corresponding limits denoted by Z + and Z − . Note that Z + jumps exactly at those times, where the Lévy process L jumps up, and Z − at those, where it jumps down. In particular, Z + and Z − do not jump at the same time, which implies that summation is continuous at (Z + , Z − ) with respect to the M 1 -topology (cf. [40,Theorem 12.7.3]). Thus, an application of the continuous mapping theorem yields the convergence of V (f ; k) n = V (f + ; k) n + V (f − ; k) n towards Z = Z + + Z − . Let us stress that indeed the sole reason why the extra condition (FC) is required for functional convergence is that summation is not continuous on the Skorokhod space in general, and the convergence of V (f + ; k) n and V (f − ; k) n does not necessarily imply the convergence of V (f ; k) n .

Extension to infinite activity Lévy processes
In this section we extend the results of Proposition 2.3(i) to moving averages driven by a general Lévy process L, by approximating L by a sequence of compound Poisson processes (L(j)) j≥1 . To this end we introduce the following notation. Let N be the jump measure of L, that is N ( |V (X, f ; k) n s − V (X(j), f ; k) n s | > ε = 0, for all ε > 0. (4.16) Proof. In the following we say that a family {Y n,j } n,j∈N of random variables is asymptotically tight if for any ε > 0 there is an N > 0 such that lim sup n→∞ P(|Y n,j | > N ) < ε, for all j ∈ N.
We deduce first for $p > \beta \vee \frac{1}{k-\alpha}$ the asymptotic tightness of the two families The authors of [9] show the stable convergences in law where $Z_j$ and $Z$ are defined as in [9, Eq. (4.34)]. The asymptotic tightness of the first family of random variables in (4.17) thus follows from the tightness of the family $\{Z_j\}_{j\in\mathbb{N}}$, see [9, Eq. (4.35)]. The asymptotic tightness of the second family of random variables from (4.17) follows from the first by the estimate $\max_{i=1,\dots,n} |a_i| \le \big( \sum_{i=1}^{n} |a_i|^p \big)^{1/p}$ for $a_1, \dots, a_n \in \mathbb{R}$. The second statement of (4.19) implies (4.18) by similar arguments. The (asymptotic) tightness of the two families on the right-hand sides of (4.17) and (4.18) allows us, for the proof of (4.16), to assume that $|\Delta^n_{i,k} X(j)|$ and $|\Delta^n_{i,k} X|$ are uniformly bounded by some $N > 0$.

Proof of Theorem 2.1(ii)
As mentioned earlier the proof relies upon replacing the increments of $X$ by the increments of its tangent process, which is the linear fractional stable motion $Y$, defined as
\[ Y_t := \int_{\mathbb{R}} \big\{ (t - s)_+^{\alpha} - (-s)_+^{\alpha} \big\}\, dL_s. \]
It is well known that the process $Y$ is self-similar with index $H = \alpha + 1/\beta$, i.e. $(Y_{at})_{t\ge 0} \overset{d}{=} (a^{H} Y_t)_{t\ge 0}$ for any $a > 0$, see [38]. Moreover, the discrete time stationary sequence $(Y_r - Y_{r-1})_{r\in\mathbb{Z}}$ is mixing and hence ergodic, see for example [16]. Denoting by $V(f;Y)_n$ the variation functional (1.2) with $a_n = n^{-1}$ and $b_n = n^{H}$ applied to the process $Y$, it follows from self-similarity and Birkhoff's ergodic theorem, see [25, Theorem 10.6], that
\[ V(f;Y)_n \overset{d}{=} \frac{1}{n} \sum_{i=k}^{n} f\big( \Delta^1_{i,k} Y \big) \xrightarrow{\mathbb{P}} \mathbb{E}\big[ f\big( \Delta^1_{k,k} Y \big) \big]. \]
By (3.1), the random variable $\Delta^1_{k,k} Y \sim S\beta S(\rho_0)$ with $\rho_0 = \rho_L \|h_k\|_{L^\beta(\mathbb{R})}$, and the right hand side is the limiting expression in Theorem 2.1(ii). It is therefore sufficient to argue that
\[ \mathbb{E}\big[ |V(f;X)_n - V(f;Y)_n| \big] \to 0, \qquad \text{as } n \to \infty. \tag{4.22} \]
To show (4.22) we use that which follows by the triangle inequality and stationarity of → 0 for all $p < \beta$, which by Lemma 6.5 used on $p = 1$ implies that the right-hand side of (4.23) converges to zero. This completes the proof of Theorem 2.1(ii).

Proof of Theorem 2.1(iii)
Let us first remark that the growth condition |f (x)| ≤ C(1∨|x| q ) for some q with q(k−α) < 1 is weaker for larger q and can therefore be thought of as if k > α, whereas for k ≤ α we require only that f is of polynomial growth. Since by the assumptions of the theorem we have k − α < 1, we may and do assume that q > 1. We recall that a function ξ : R → R is absolutely continuous if there exists a locally integrable function ξ ′ such that This implies that ξ is differentiable almost everywhere and the derivative coincides with ξ ′ almost everywhere. If ξ ′ can be chosen absolutely continuous we say that ξ is two times absolutely continuous, and similarly we define k-times absolute continuity.
By an application of [15, Theorem 5.1] it has been shown in [9, Lemma 4.3] that under the condition $(k - \alpha)(1 \vee \beta) < 1$ the process $X$ admits a $k$-times absolutely continuous version and the $k$-th derivative is a version of the process $(F_u)_{u\in\mathbb{R}}$ defined in (2.4). Moreover, [9, Lemma 4.3] shows that for every $q \ge 1$, $q \neq \theta$ with $q(k - \alpha) < 1$ the process $F$ admits a version with sample paths in $L^q([0, 1])$, almost surely, which implies $\int_0^1 |f(F_u)|\, du < \infty$. With these prerequisites at hand, Theorem 2.1(iii) is a consequence of the following lemma, which despite its intuitive statement requires some work. We denote by $W^{k,q}$ the space of $k$-times absolutely continuous functions $\xi$ on $[0, 1]$ satisfying $\xi^{(k)} \in L^q([0, 1])$.
Lemma 4.6. Let $\xi \in W^{k,q}$, and suppose that $f$ is continuous and $|f(x)| \le C(1 \vee |x|^q)$ for some $q \ge 1$. As $n \to \infty$ it holds that
\[ \frac{1}{n} \sum_{i=k}^{n} f\big( n^{k} \Delta^n_{i,k} \xi \big) \to \int_0^1 f\big( \xi^{(k)}_s \big)\, ds. \]
Proof. Assume first $\xi \in C^{k+1}([0, 1])$. Taylor approximation shows that where $|a_{i,n}| \le C/n$ for all $n \ge 1$, $k \le i \le n$. We can therefore assume without loss of generality that $f$ has compact support and admits a concave modulus of continuity $\omega_f$, i.e. a continuous increasing function for all $x, y$. We have by Jensen's inequality that The result follows by the convergence of Riemann sums In the following we extend the result to general $\xi \in W^{k,q}$ by approximating $\xi$ with a sequence $(\xi^m)_{m\ge 1}$ of functions in $C^{k+1}([0, 1])$. To this end, choose $\xi^m$ such that  | ds ≤ $C/m^{1/q}$, since we assumed $q \ge 1$. Since $\xi^{m,(k)}$ converges in $L^q([0, 1])$, the family $(|\xi^{m,(k)}|^q)_{m\ge 1}$ is uniformly integrable. Hence, by the assumption $|f(x)| \le C(1 \vee |x|^q)$ for $x \in \mathbb{R}$, we obtain uniform integrability of $\{f(\xi^{m,(k)})\}_{m\ge 1}$. By continuity of $f$, we have that $f(\xi^{m,(k)}) \to f(\xi^{(k)})$ in measure, and therefore also in $L^1([0, 1])$: lim sup In order to show (4.26) we split the sum into sums over the following sets of indices, where $N$ and $M$ are positive constants: and estimate the corresponding sums separately.
The following relationship between ∆ n i,k ξ and ξ (k) will be essential. For all ξ ∈ W k,q we have In particular, it follows that (4.27) The A N n term: We show that for given ε > 0 we can find sufficiently large N such that lim sup where C 0,k := N (2k k ) −1 . Therefore, again by (4.27), it follows that Consequently, recalling that q ≥ 1, we have by Jensen's inequality where the first inequality follows from (4.27) and the third from (4.29). This shows that for sufficiently large N it holds that  |f (x)| (4. 33) where |A N n ∩ D m,n | denotes the number of elements of A N n ∩ D m,n . Using (4.27) we have and it follows that The argument for J 1 n,m,N,M is similar to the one used for I 3,m,n,N above. We assume that M > N. For i ∈ B N,M m,n it holds by (4.27) that Consequently, we have for all m ∈ N By letting ε → 0 we obtain (4.26) and the proof of the lemma is complete.
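The Taylor step at the start of the proof of Lemma 4.6 rests on the classical fact that, for smooth $\xi$, the rescaled increment $n^k \Delta^n_{i,k}\xi$ approximates $\xi^{(k)}(i/n)$ with an error of order $1/n$. The following Python sketch illustrates this numerically. It assumes that the conclusion of the lemma has the Riemann-sum form $n^{-1}\sum_{i=k}^n f(n^k \Delta^n_{i,k}\xi) \to \int_0^1 f(\xi^{(k)}(s))\,ds$; the test function $\xi(t) = \sin(2\pi t)$, the choice $f = |\cdot|$ and all function names are illustrative and not taken from the paper.

```python
from math import comb, sin, pi


def kth_increment(xi, i, n, k):
    """k-th order increment: sum_{j=0}^k (-1)^j * C(k, j) * xi((i - j) / n)."""
    return sum((-1) ** j * comb(k, j) * xi((i - j) / n) for j in range(k + 1))


def riemann_functional(f, xi, n, k):
    """(1/n) * sum_{i=k}^n f(n^k * k-th order increment of xi at index i)."""
    return sum(f(n ** k * kth_increment(xi, i, n, k)) for i in range(k, n + 1)) / n


def max_taylor_error(xi, xi_k, n, k):
    """Largest gap between n^k * increment and xi^{(k)}(i/n) over k <= i <= n."""
    return max(
        abs(n ** k * kth_increment(xi, i, n, k) - xi_k(i / n))
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Illustrative choice: xi(t) = sin(2*pi*t), whose second derivative is
    # -4*pi^2*sin(2*pi*t), so that the integral of |xi''| over [0, 1] is 8*pi.
    xi = lambda t: sin(2 * pi * t)
    xi_2 = lambda t: -4 * pi ** 2 * sin(2 * pi * t)
    for n in (100, 1000):
        print(n, riemann_functional(abs, xi, n, 2), 8 * pi,
              max_taylor_error(xi, xi_2, n, 2))
```

For this smooth example the pointwise error shrinks roughly in proportion to $1/n$, in line with the bound $|a_{i,n}| \le C/n$ quoted in the proof; the general case $\xi \in W^{k,q}$ needs the approximation argument given above.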

Proofs of Theorems 2.5 and 2.6
Before carrying out the proofs we will introduce some notation and estimates to be used in the following.  for n ∈ N. By our assumptions on the function g it holds that g n (s) → s α + , and consequently φ n t (s) → h k (t − s) as n → ∞, where h k was defined in (2.2). Therefore, we complement (5.1) by defining We recall that (F t ) t∈R denotes the filtration generated by L and introduce additionally the σ-algebras F 1 s := σ(L r − L u | s ≤ r, u ≤ s + 1), remarking that (F 1 s ) s∈R is not a filtration. We denote $U^n_{j,r} := \int_r^{r+1} \phi^n_j(s)\,dL_s$, where n ∈ N ∪ {∞} and j ≥ k, and introduce the notation Note that Y n r ∼ SβS(ρ n ) for all r ≥ k and n ∈ N, which follows by (3.1). Preliminary estimates: For ξ < β and γ > 0 there is a C > 0 such that for all ρ ∈ (0, 1] and S ∼ SβS(1) we have where the first case follows by [9,Lemma 5.5], and the second case is a standard estimate. The function φ n j introduced above satisfies the estimate for all j ∈ N and all n ∈ N ∪ {∞}, which follows from Taylor expansion and the condition (A2) in Section 2. Moreover, φ n j satisfies the following estimate that has been derived in [9,Eq. (5.92)]. There exists a C > 0 such that for all n ∈ N and j ∈ N We have that g β ∈ C ∞ (R), according to [34,Remark 28.2], and for all r ≥ 1, the rth derivative of g β satisfies |g (r) Indeed, to show the estimate (5.7) we use the dual representation for stable densities given in [41, (2.5.5)], which implies that x > 0, (5.8) where $\tilde g$ is the density of a 1/β-distribution. By r-times differentiation of (5.8), the estimate (5.7) follows. Hence, from the estimate (5.7) with r = 1 and (5.6), it follows that G ∈ C 1 ((0, ∞)). By [9, Lemma 5.3] we have that Hence, for large enough n, we obtain the estimate and by (5.10) and (5.9) it follows that where a n = √ n for Theorem 2.5, a n = n k−α−1/β for Theorem 2.6(i), and a n = n

Proof of Theorem 2.5
We recall the definition of $Y^n_r$ and $S_n$ from (5.1) and (5.12), and define additionally, for $a < b$, $a, b \in [0, \infty]$ and $m \ge 0$, An application of Lemma 6.5 with $p = 2$ yields that the covariances $\theta^{n,m}_j$ converge to $\theta^{\infty,m}_j$ for all $m, j$, as $n \to \infty$. Since the sequence $(Y^{n,m}_r)_{r \ge k}$ is $m$-dependent, (5.14) now follows from the central limit theorem for $m$-dependent sequences, see e.g. [13], with the limiting variance (5.16). Next, we argue that $(\eta_m^2)_{m \ge 1}$ is a Cauchy sequence, which then shows (5.15) with $\eta^2 := \lim_{m \to \infty} \eta_m^2$. This is indeed an immediate consequence of (5.13), since $$|\eta_m| - |\eta_r| = \lim_{n \to \infty} n^{-1/2} \big( \|S_{n,m}\|_{L^2} - \|S_{n,r}\|_{L^2} \big) \le \limsup_{n \to \infty} n^{-1/2} \|S_{n,m} - S_{n,r}\|_{L^2} \le \limsup_{n \to \infty} n^{-1/2} \|S_{n,m} - S_n\|_{L^2} + \limsup_{n \to \infty} n^{-1/2} \|S_n - S_{n,r}\|_{L^2} \to 0$$ as $m, r \to \infty$ by (5.13). The proof of (2.7) can thus be completed by deriving (5.13), which we do in the following.
We can express S n and S n,m as the telescoping sums Indeed, the first telescoping sum coincides with S n almost surely, since by the backwards martingale convergence theorem and Kolmogorov's 0-1 law it holds that $E[f(Y^n_r) \mid \mathcal{F}_{r-j}] \xrightarrow{\text{a.s.}} E[f(Y^n_r)]$ as $j \to \infty$. We denote for n ≥ 1 and m, r, j ≥ 0 we show that each summand on the right hand side converges to 0. Observing that cov(ξ n,m r,j , ξ n,m r ′ ,j ′ ) = 0, unless r − j = r ′ − j ′ , an application of the Cauchy-Schwarz inequality and Fatou's lemma yields Estimation of Q n,1,m : We introduce the notation Using the relation Φ n and ρ n j−1 is the scale parameter of the SβS random variable Y n,j−1 r . It follows from Lemma 6.1 that D n,j satisfies the estimate where p is as in (2.5), provided {ρ n j−1 } j≥2,n∈N is bounded away from 0 and ∞. This is indeed the case, as follows from the estimates where the convergence follows by the dominated convergence theorem, since Assumption (A) implies the existence of a C > 0 such that |φ n r (s)| ≤ C|r − s| α for all s ∈ [r − 1, r] and all n ≥ 1.
Applying (5.20) on the right hand side of (5.19) yields the estimate
Estimation of Q n,2,m : This term is estimated by similar, and in fact easier, arguments as used for the estimation of Q n,1,m which we do not repeat here.
and it is sufficient to argue that lim ) 2 ] → 0 as m → ∞. However, this follows by Lemma 6.5 with p = 2, and completes the proof of (5.13), and thus of Theorem 2.5.
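The central limit theorem for $m$-dependent sequences used above yields a limiting variance of the standard form $\theta_0 + 2\sum_{j=1}^m \theta_j$, where $\theta_j$ is the lag-$j$ covariance. The Python sketch below illustrates this formula on a hypothetical stand-in, a finite moving average $X_r = \sum_{i=0}^m c_i \varepsilon_{r-i}$ with i.i.d. unit-variance noise, which is $m$-dependent with covariances $\sum_i c_i c_{i+j}$. The coefficients and function names are assumptions made for illustration only; they are not the quantities $\theta^{n,m}_j$ of the proof, whose defining display is not reproduced in this text.

```python
def autocovariance(c, j):
    """Lag-j covariance of X_r = sum_i c[i] * eps_{r-i} with unit-variance i.i.d. eps."""
    return sum(c[i] * c[i + j] for i in range(len(c) - j))


def long_run_variance(c):
    """theta_0 + 2 * sum_{j=1}^m theta_j for the m-dependent sequence, m = len(c) - 1."""
    m = len(c) - 1
    return autocovariance(c, 0) + 2 * sum(autocovariance(c, j) for j in range(1, m + 1))


def normalized_variance(c, n):
    """Exact Var(n^{-1/2} * sum_{r=1}^n X_r) = sum_{|j| < n} (1 - |j|/n) * theta_j."""
    m = len(c) - 1
    return autocovariance(c, 0) + 2 * sum(
        (1 - j / n) * autocovariance(c, j) for j in range(1, min(m, n - 1) + 1)
    )


if __name__ == "__main__":
    c = [1.0, 0.5, -0.25]  # hypothetical coefficients, so m = 2
    for n in (10, 100, 10_000):
        print(n, normalized_variance(c, n), long_run_variance(c))
```

For a moving average the long-run variance collapses to $(\sum_i c_i)^2$, which gives a quick sanity check on the formula.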

Proof of Theorem 2.6(i)
In the following section we set for all $n \in \mathbb{N}$ To prove Theorem 2.6(i), it is enough to show that the following (5.21) and (5.22) hold, where and $\sigma$ is given in (5.40).
Estimation of W n : By the substitution s = r − j we obtain the representation Since {D n s : s ∈ Z} is a martingale difference sequence, the von Bahr-Esseen inequality [2, Theorem 1] yields that for any γ ∈ (1, β) where the second inequality follows by Minkowski's inequality. We have that where the first inequality follows by boundedness of ∂ 2 ∂ρ∂x Φ ρ (x), for the second inequality we use that ρ n , ρ n j are bounded away from 0 and ∞, cf. Lemma 6.3, and the last inequality is (5.4). By a calculation similar to (5.18) we obtain the identity and hence for all r ∈ (1, 2) with rγ < β, we have where the estimate |Φ ρ n j (x)| ≤ C|x| r is used in the second inequality (cf. (5.23)), and (5.26) is used in the third inequality. From (5.25) and (5.27) we deduce We may and do choose r and β such that r(α − k) = −1 and −β < rγ(α − k) < −1.
Recall that $-\beta < \beta(\alpha - k) < -1$ by assumption, and that $r, \gamma > 1$ satisfy $r\gamma < \beta$. We start by estimating $A'_n$ as follows where we have used $r\gamma(\alpha - k) < -1$ in the last inequality. By Jensen's inequality we have where we have used $r\gamma(\alpha - k) < -1$ in the second inequality, and $r\gamma(\alpha - k) > -2$ in the last inequality. For $\gamma < \beta$ close enough to $\beta$ we have that $r(\alpha - k) > -1$ for all $r \in (1, \beta/\gamma)$, by the assumption $\alpha > k - 1$. The substitution $v = n - s$ yields that where the last inequality follows from the Minkowski inequality. In the following we define the random variables $\vartheta^n_{r,j,l}$, $l \ge j$, by By a telescoping sum argument similar to (5.17), we obtain the representation $\zeta^n_{r,j} = \sum_{l=j}^{\infty} \vartheta^n_{r,j,l}$.
Since $\{\vartheta^n_{r,j,l} : l = j, j+1, \dots\}$ is a martingale difference sequence, the von Bahr-Esseen inequality [2,Theorem 1] yields that We estimate $B'_n$, $B''_n$ and $B'''_n$ in a similar fashion as in (5.28)-(5.30), but need to divide into several cases depending on the value of $\gamma(\alpha - k)$. We arrive at the following estimates The three terms on the right-hand side of (5.36) converge to zero as $n \to \infty$. Indeed, the first term converges to zero by choosing $\gamma \in (1, \beta)$ close enough to $\beta$ and then choosing $r \in (1, \beta/\gamma)$ close enough to $\beta/\gamma$, which can be done under the above restrictions on $r$ and $\gamma$. The second term converges to zero due to the assumption $\gamma(\alpha - k) < -1$, and the third term converges to 0 for $\gamma$ close enough to $\beta$ by the assumption $\alpha > k - 1$. Hence, (5.36) completes the proof of (5.21).
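The von Bahr-Esseen inequality invoked twice above bounds the $\gamma$-th moment of a sum of martingale differences, for $\gamma \in [1, 2]$, by a constant times the sum of the individual $\gamma$-th moments; in its commonly stated form the constant is 2. The Python sketch below checks this on a simple stand-in that is not taken from the paper: independent symmetric variables $w_j \varepsilon_j$ with Rademacher signs $\varepsilon_j$ and hypothetical power-decaying weights $w_j$, for which the left-hand side can be computed exactly by enumerating all sign patterns.

```python
from itertools import product


def moment_of_sum(weights, gamma):
    """Exact E|sum_j w_j * eps_j|^gamma for independent Rademacher signs eps_j."""
    total = 0.0
    for signs in product((-1, 1), repeat=len(weights)):
        total += abs(sum(s * w for s, w in zip(signs, weights))) ** gamma
    return total / 2 ** len(weights)


def von_bahr_esseen_bound(weights, gamma):
    """2 * sum_j E|w_j * eps_j|^gamma, the bound for exponents gamma in [1, 2]."""
    return 2 * sum(abs(w) ** gamma for w in weights)


if __name__ == "__main__":
    weights = [j ** -0.8 for j in range(1, 11)]  # hypothetical decay, 10 terms
    for gamma in (1.0, 1.5, 2.0):
        print(gamma, moment_of_sum(weights, gamma), von_bahr_esseen_bound(weights, gamma))
```

The point of the inequality in the proof is the same as in this toy case: it reduces a moment of a long sum to a sum of moments of the summands, which can then be bounded term by term.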
Proof of (5.22): In the following we write g i,n,k for g i,n , given in (3.2), to stress the dependence of the order of increments k ≥ 1. We have where the last equality follows by the telescoping sum structure. According to the mean value theorem there exists θ 1 , θ 2 ∈ [−k/n, 0] (depending on n and s) such that where the last inequality follows by Assumption (A2) and the mean value theorem for s < −1, and by the assumption α > k − 1 for the case |s| ≤ 1. The function c in (5.38) is in L β (ds), due to the fact that α < k − 1/β. Hence, by the dominated convergence theorem, we have R n k−1 g n,n,k−1 (s)−g k−1,n,k−1 (s) (5.39) as n → ∞. By [9, Lemma 5.3], ρ n → ρ ∞ which implies that Φ ′ ρ n (0) → Φ ′ ρ ∞ (0) by continuity of ρ → Φ ′ ρ (0) on (0, ∞). Therefore, by (5.37) and (5.39) we conclude that which completes the proof of Theorem 2.6(i).

Proof of Theorem 2.6(ii)
Before we start the proof of Theorem 2.6(ii) we will deduce some estimates on $\Phi_\rho(x)$ relying on the assumption of Appell rank greater than or equal to 2 in this theorem. Let $\varepsilon \in (0, 1)$ be fixed. The mean value theorem, together with assumptions (2.5) and (2.6) and the Appell rank greater than or equal to two condition, $\frac{\partial}{\partial x} \Phi_\rho(0) = 0$ for all $\rho > 0$, implies that for all $x, y \in \mathbb{R}$ and $\rho \in [\varepsilon, \varepsilon^{-1}]$. Specializing (5.41) to $y = 0$ yields that estimate [9,Eq. (5.15)] is in our context replaced by (5.41), where we need to argue that for sufficiently large $N$ the set $\{\rho^n_j : n \in \{N, \dots, \infty\}, j \in \mathbb{N}\}$ is bounded away from 0 and $\infty$, which is done in Lemma 6.3.
It therefore remains to show that Z r is in the domain of attraction of a (k − α)β-stable random variable, which we do in two steps. First we define the random variable and show that it is in the domain of attraction of a (k − α)β-stable random variable S. Thereafter we argue that we can find r > (k − α)β such that By choosing γ > 1/(k − α) it follows that Φ and Q are well-defined. Moreover, an application of the dominated convergence theorem shows that Φ is continuous. In order to show that Q is in the domain of attraction of a (k − α)β-stable random variable we next determine constants c − , c + such that and Here the constant τ γ , γ ∈ (0, 2), is defined as In the following we derive explicit expressions for c + and c − , which are stated in (5.52) and (5.54) below. For x > 0 it holds by substituting t = (x/u) 1/(k−α) that where k α = α(α − 1) . . . (α − k + 1). The convergence as well as the existence of the integral follow from the estimate (5.42) and the dominated convergence theorem, where we use that {ρ ∞ j } is bounded away from 0 and ∞. The convergence of the integrand from (5.49) as x → ∞ follows since by the mean value theorem for all t ∈ R there is a ξ Similarly we obtain for x < 0 that We argue next that where τ β was defined in (5.48) and ρ L denotes the scale parameter of the Lévy process L.
To this end we make the decomposition and analyse the two summands separately. Consider the first summand and assume κ + > 0. By (5.50) it follows that Φ(y) → ∞ as y → ∞ and we have for sufficiently large x that Applying Lemma 6.6 with ξ(x) = Φ(x) and ψ(x) = x 1/(k−α) κ + , we deduce from (5.50) that where the second identity follows from [33, Property 1.2.15]. If κ + < 0, it follows from (5.50) that lim sup x→∞ Φ(x) ≤ 0 and therefore that Φ(x) is bounded for x ≥ 0. We obtain The same identity holds for κ + = 0, as follows from Lemma 6.6, (5.50), and the estimate We conclude that the first summand of (5.53) satisfies By similar arguments, applying Lemma 6.6 on the function ξ(x) = Φ(−x) and using (5.51), we obtain for the second summand of (5.53) the convergence which completes the proof of (5.52). Arguing similarly for P(Q < −x) we derive that This shows that Q is in the domain of attraction of a (k − α)β-stable random variable with location parameter 0, and scale and skewness parameters as given in (5.47).
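The tail index $(k-\alpha)\beta$ obtained above comes from composing a tail of index $\beta$ with a function growing like $\kappa_+ x^{1/(k-\alpha)}$, as in (5.50). The Python sketch below makes this exponent arithmetic explicit on a stand-in: a Pareto variable $U$ with $P(U > x) = x^{-\beta}$ for $x \ge 1$ replaces the stable variable, and the exact power $\kappa U^{1/(k-\alpha)}$ replaces $\Phi$. Both substitutions, the parameter values and the function names are assumptions made for illustration; the constants $c_+$, $c_-$ of the proof are not reproduced.

```python
from math import log


def pareto_tail(x, beta):
    """P(U > x) for a Pareto variable with index beta supported on [1, infinity)."""
    return 1.0 if x <= 1 else x ** (-beta)


def transformed_tail(x, beta, k, alpha, kappa):
    """P(kappa * U**(1/(k - alpha)) > x), i.e. P(U > (x / kappa)**(k - alpha))."""
    return pareto_tail((x / kappa) ** (k - alpha), beta)


def tail_index(tail, x1, x2):
    """Negative log-log slope of a tail function between the points x1 < x2."""
    return -(log(tail(x2)) - log(tail(x1))) / (log(x2) - log(x1))


if __name__ == "__main__":
    beta, k, alpha, kappa = 1.5, 2, 1.2, 2.0  # hypothetical values with (k - alpha) * beta > 1
    tail = lambda x: transformed_tail(x, beta, k, alpha, kappa)
    print(tail_index(tail, 1e3, 1e6), (k - alpha) * beta)
```

In the proof itself the power law holds only asymptotically, which is why Lemma 6.6 is needed to transfer the tail behaviour from $\psi$ to $\Phi$.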
Now the proof of the theorem is completed by showing (5.45). To this end, by Markov's inequality it suffices to show that $E[|Z_k - Q|^r] < \infty$ for some $r > (k-\alpha)\beta$. Since $(k-\alpha)\beta > 1$, an application of Minkowski's inequality yields We remark that by the mean value theorem there exists a constant $C > 0$ such that for all $x \in [0, 1]$ and $j \in \mathbb{N}$ it holds that Since $\{\rho^\infty_j\}_{j \in \mathbb{N}}$ is bounded away from 0, there is a $\delta > 0$ with $\delta < \rho^\infty_j$ for all $j$. Letting $r_\varepsilon = (k-\alpha)\beta + \varepsilon$ with $\varepsilon \in (0, \delta)$, an application of Lemma 6.4 yields For sufficiently small $\varepsilon > 0$, both powers of $j$ on the right-hand side of (5.56) are smaller than $-1$, which together with (5.55) implies $\|Z_k - Q\|_r < \infty$, and thus (5.45). Since $Q$ is in the domain of attraction of a $(k-\alpha)\beta$-stable random variable with scale parameter $\rho_1$ and skewness parameter $\eta_1$, and $r > (k-\alpha)\beta$, so is $Z_k$. This completes the proof of Theorem 2.6(ii).

Auxiliary results
In this section we show some technical results used in the proofs of Theorems 2.5 and 2.6.
Proof. It is sufficient to consider the case $r = 1$, since for fixed $j, l, n$ the sequence $(\vartheta^n_{r,j,l})_{r \in \mathbb{N}}$ is stationary. Without loss of generality we may assume that $l \ge 2 \vee j$, since the case $l = j = 1$ can be covered by choosing a larger constant. To this end we remark that $(E[|\vartheta^n_{1,1,1}|^\gamma])_{n \in \mathbb{N}}$ is bounded, since $Y^n_r \sim S\beta S(\rho^n)$ with $\rho^n$ (which was introduced in (5.2)) bounded away from 0 and $\infty$ by [9,Lemma 5.3]. By definition of $\vartheta$ it holds that Define for $-\infty \le a < b \le 1$ the random variable Let in the following $\tilde L$ be an independent copy of $L$ and define $\tilde U^n_{[a,b]}$ accordingly, and denote by $\tilde E$ the expectation with respect to $\tilde L$ only. Moreover, we denote by ρ n j,l = φ ( W n j,l + u + v) du dv .
We denote $\varphi_p(x) := |x|^p \wedge |x|$. Suppose in the following that $\gamma > \beta$. Using Lemma 6.1, Jensen's inequality, the inequality $\varphi_p(|x - y|) \le 2(\varphi_p(|x|) + \varphi_p(|y|))$ and the independence of $U$ and $\tilde U$, we obtain that In the third inequality we used the estimate (5.3), where we remark that by assumption $\gamma > \beta$ and $p\gamma < \beta$, and the expression (3.1) for the scale parameter of integrals with respect to a stable Lévy process. The last inequality follows from (5.4). For $\gamma < \beta$ we use the same arguments as above; however, since (5.3) gives a different estimate in this case, we obtain the bound $E[|\vartheta^n_{1,j,l}|^\gamma] \le C l^{(\alpha-k)\gamma} j^{(\alpha-k)\gamma}$, which concludes the proof. Lemma 6.3. The set $\{\rho^n_j : n \in \{N, \dots, \infty\}, j \in \mathbb{N}\}$ is bounded away from 0 and $\infty$ for sufficiently large $N \in \mathbb{N}$.
Note that Lemma 6.5 relies heavily on the β-stable assumption, and a similar result (with no continuity assumptions on f ) does not hold for e.g. discrete random variables.