Approximation in law of locally $\alpha$-stable L\'evy-type processes by non-linear regressions

We study a real-valued L\'evy-type process $X$, which is locally $\alpha$-stable in the sense that its jump kernel is a combination of a `principal' (state dependent) $\alpha$-stable part with a `residual' lower order part. We show that under mild conditions on the local characteristics of a process (the jump kernel and the velocity field) the process is uniquely defined, is Markov, and has the strong Feller property. We approximate $X$ in law by a non-linear regression $\widetilde X^x_{t}=\mathfrak{f}_t(x)+t^{1/\alpha}U^{x}_t$ with a deterministic regressor term $\mathfrak{f}_t(x)$ and $\alpha$-stable innovation term $U^{x}_t$, and provide error estimates for such an approximation. A case study is performed, revealing different types of assumptions which lead to various choices of regressor/innovation terms and various types of the estimates. The assumptions are quite general, cover the super-critical case $\alpha<1$, and allow non-symmetry of the L\'evy kernel and unboundedness of the drift coefficient.


Introduction
Heuristically, a Lévy-type process is a Lévy process whose characteristic triplet is allowed to depend on the current value of the process, see [2]. This natural definition is essentially in the spirit of Kolmogorov's classical definition of a diffusion process as a location-dependent Brownian motion with drift. Lévy-type processes are now widely used in a huge variety of models in physics, biology, finance etc., where the random noise, for various reasons, cannot be assumed Gaussian, and thus the entire model does not fit into the diffusion framework. However, when compared to the classical theory of diffusions, the general theory of Lévy-type processes still misses some crucial components. In particular, the following general questions remain unsolved to a large extent:
(I) how does one construct a Lévy-type process with given characteristics?
(II) what kind of local properties of the law of the process can be derived, and under which assumptions on characteristics?
In this paper we provide a detailed study of a class of Lévy-type processes which, on the one hand, is of particular importance for applications and, on the other hand, makes it possible, in a reasonably simple framework, to reveal numerous hidden challenges which one encounters while trying to resolve the general questions (I), (II). The particular class under investigation can be briefly described as a mixture of a real-valued α-stable like process with state dependent drift, intensity, and skewness parameters with a certain (state dependent) lower order jump part; see a detailed definition in Section 2. Note that the α-stable noise, because of its scaling property, has an exceptional importance in applications. The presence of the "residual" lower order part is motivated by the following two reasons. First, this part allows one to introduce a wide spectrum of tempering/damping effects for the tails of the noise, which combines both the α-stable and

Notation and preliminaries
In what follows, $C_\infty$ denotes the class of continuous functions $\mathbb{R}\to\mathbb{R}$ vanishing at $\infty$, and $C_0$ denotes the class of continuous functions with compact support. By $C^2_\infty$, $C^2_0$ we denote the classes of twice differentiable functions $f$ such that $f, f', f''$ belong to $C_\infty$ or $C_0$, respectively. A Lévy-type operator $L$ with the domain $C^2_\infty$ is defined by formula (2.1). Here $b:\mathbb{R}\to\mathbb{R}$, $a:\mathbb{R}\to\mathbb{R}_+$ are given functions, and $\mu(x;du)$ is a Lévy kernel; that is, a measurable function w.r.t. $x$ and a Lévy measure w.r.t. $du$.
There are two natural and closely related ways to associate a Lévy-type process $X$ with a Lévy-type operator $L$. Within the first one, $X$ is a time-homogeneous Markov process which generates a Feller semigroup (that is, a strongly continuous semigroup in $C_\infty$) such that its generator $A$ coincides with $L$ on $C^2_\infty$ (or, which is slightly more general, on $C^2_0$). The second way is based on the notion of the Martingale Problem (MP). Recall that a process $X$ is said to be a solution to the martingale problem $(L, D)$ if for every $f\in D$ the process $f(X_t)-\int_0^t Lf(X_s)\,ds$, $t\ge 0$, is a martingale w.r.t. the natural filtration for $X$. A martingale problem $(L, D)$ is said to be well posed in $\mathbb{D}(\mathbb{R}_+)$ if for any probability measure $\pi$ on $\mathbb{R}$ there exists a solution $X$ to this problem with càdlàg trajectories and $\mathrm{Law}(X_0)=\pi$, and if for any two such solutions their distributions in $\mathbb{D}(\mathbb{R}_+)$ coincide. By the alternative definition, a Lévy-type process associated to $L$ is a solution to the MP $(L, D)$ with $L$ given by (2.1) and $D=C^2_\infty$ or $C^2_0$. An arbitrary Lévy process $X$ satisfies both of the above definitions; the corresponding operator $L$ is defined by (2.1) with $b(x)\equiv b$, $a(x)\equiv a$, $\mu(x,\cdot)\equiv\mu$, where $(b,a,\mu)$ is the characteristic triplet for $X$. This explains the name Lévy-type process, which we use systematically.
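For orientation, the two objects just described can be written out as follows. This is only a sketch: the display (2.1) is not reproduced in the text above, so the cut-off function $1_{|u|\le 1}$ and the factor $1/2$ in front of the second-order term are assumed here as the most common convention, and the symbol $M^f$ is introduced purely for illustration.

```latex
% Assumed (standard) form of a one-dimensional Lévy-type operator;
% the truncation 1_{|u|\le 1} and the factor 1/2 are conventions assumed here.
\[
Lf(x) = b(x) f'(x) + \tfrac12\, a(x) f''(x)
  + \int_{\mathbb{R}} \bigl( f(x+u) - f(x) - u f'(x)\, 1_{|u|\le 1} \bigr)\, \mu(x; du),
  \qquad f \in C^2_\infty .
\]
% Martingale problem (L, D): for every f in D the process M^f below
% (hypothetical notation) is a martingale w.r.t. the natural filtration of X.
\[
M^f_t = f(X_t) - \int_0^t Lf(X_s)\, ds, \qquad t \ge 0 .
\]
```

With constant $b$, $a$, $\mu$ the first display reduces to the generator of a Lévy process with the characteristic triplet $(b,a,\mu)$, which is the consistency check mentioned at the end of the paragraph.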
The principal problem (I) outlined in the Introduction can now be formulated precisely: given a triplet $b(x)$, $a(x)$, $\mu(x,du)$, is there a Lévy-type process associated to $L$ in either or both of the two ways explained above? Is such a process uniquely characterized, e.g. is it the unique Feller process with the prescribed restriction of the generator, and is the MP $(L, D)$ well posed? The problem (II) then consists in a further description, as explicit as possible, of the transition probability $P_t(x,dy)$ of the process $X$.
We will study these two questions in the particular setting of locally α-stable Lévy-type operators/processes. To provide a smooth introduction into the setting, let us first consider a process $X$ which solves the SDE $dX_t = c(X_t)\,dt + \sigma(X_{t-})\,dZ_t$, (2.2) where $Z$ is an α-stable process, $\alpha\in(0,2)$, and (for simplicity of presentation only) $\sigma(x)\ge 0$. The characteristic triplet for $Z$ has the form $(b_Z, 0, \mu_Z)$, where $\mu_Z(du)=\mu^{(\alpha;\lambda,\rho)}(du) := \lambda\,\frac{1+\rho\,\mathrm{sgn}\,u}{|u|^{\alpha+1}}\,du$ is a general one-dimensional α-stable Lévy measure with the intensity parameter $\lambda\ge 0$ and the skewness parameter $\rho\in[-1,1]$. Assuming that (say) the coefficients $c,\sigma$ of (2.2) are bounded, and applying the Itô formula in a standard way, one gets that $X$ is a solution to the MP $(L, C^2_\infty)$, where $L$ is the Lévy-type operator (2.1) with $\mu(x;du)=\sigma(x)^\alpha\mu^{(\alpha;\lambda,\rho)}(du)=\mu^{(\alpha;\lambda\sigma(x)^\alpha,\rho)}(du)$, (2.3) and the solution to SDE (2.2) can be understood as a certain Lévy-type process. In this setting, the general questions (I), (II) posed above can be re-formulated in the following way: is $X$ well defined by (2.2)? If yes, what is the principal behavior of its transition probability as $t\to 0$? We will answer these questions by means of an analytical parametrix method, which does not use the SDE (2.2) and exploits only the Lévy-type operator (2.1). This makes it possible for us to provide the following two substantial extensions of the model. First, we consider a general α-stable Lévy kernel with state dependent parameters of intensity $\lambda:\mathbb{R}\to\mathbb{R}_+$ and skewness $\rho:\mathbb{R}\to[-1,1]$: $\mu^{(\alpha)}(x;du)=\mu^{(\alpha;\lambda(x),\rho(x))}(du)=\lambda(x)\,\frac{1+\rho(x)\,\mathrm{sgn}\,u}{|u|^{\alpha+1}}\,du$.
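The identity (2.3) rests on the scaling property of the stable Lévy measure. A short worked check, for a fixed $\sigma>0$ and a test function $f$ (both introduced here only for illustration), is the change of variables $v=\sigma u$:

```latex
% Image of the stable Lévy measure under the map u -> sigma*u, sigma > 0.
% With v = sigma*u: du = dv/sigma, |u|^{-(alpha+1)} = sigma^{alpha+1} |v|^{-(alpha+1)},
% and sgn u = sgn v, so the total factor is sigma^{alpha}.
\[
\int_{\mathbb{R}} f(\sigma u)\,\lambda\,\frac{1+\rho\,\mathrm{sgn}\,u}{|u|^{\alpha+1}}\,du
 = \int_{\mathbb{R}} f(v)\,\lambda\sigma^{\alpha}\,\frac{1+\rho\,\mathrm{sgn}\,v}{|v|^{\alpha+1}}\,dv .
\]
% Hence a jump u of Z, multiplied by sigma(x), is distributed according to
% sigma(x)^alpha * mu^{(alpha;lambda,rho)} = mu^{(alpha; lambda*sigma(x)^alpha, rho)}.
```

The skewness parameter is unchanged because $\sigma>0$ preserves the sign of the jump; only the intensity is rescaled.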
The main assumption imposed on the residual kernel $\nu$ is that, uniformly in $x$, the Blumenthal-Getoor activity index for $|\nu|$ is strictly smaller than α; that is, (2.5) holds for some $\beta<\alpha$. Since the Blumenthal-Getoor index for an α-stable Lévy measure equals α, this condition actually means that the small jump behavior of $\mu$ is asymptotically the same as for its α-stable part $\mu^{(\alpha)}$, and this is our reason for calling the kernel (2.4) locally α-stable.
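The claim that the Blumenthal-Getoor index of an α-stable Lévy measure equals α can be verified by a direct computation. The sketch below uses the usual definition of the index as the infimum of those $\beta$ for which the small-jump moment of order $\beta$ is finite; the last display is only a plausible reading of the uniform condition (2.5), which is not reproduced above, and should be treated as an assumption.

```latex
% Small-jump moments of the stable Lévy measure (lambda > 0):
% the two half-lines contribute lambda(1+rho) and lambda(1-rho), in total 2*lambda.
\[
\int_{|u|\le 1} |u|^{\beta}\,\mu^{(\alpha;\lambda,\rho)}(du)
 = 2\lambda \int_0^1 u^{\beta-\alpha-1}\,du
 = \begin{cases} \dfrac{2\lambda}{\beta-\alpha}, & \beta>\alpha,\\[2mm] +\infty, & \beta\le\alpha, \end{cases}
\]
% so the infimum of admissible beta equals alpha.
% Assumed form of a uniform "index < alpha" condition on the residual kernel:
\[
\sup_{x\in\mathbb{R}} \int_{|u|\le 1} |u|^{\beta}\,|\nu|(x,du) < \infty
 \quad \text{for some } \beta<\alpha .
\]
```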
A general α-stable law necessarily lacks the diffusion term ($a=0$), but may contain a non-trivial shift ($b\ne 0$). Thus we specify the locally α-stable Lévy-type operator as an operator of the form (2.1) with $\mu(x,du)$ given by (2.4), $a(x)\equiv 0$, and possibly non-trivial $b(x)$; we refer to this operator as (2.6).
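Putting the pieces together, the operator described in this paragraph can be sketched as follows. The displays (2.4) and (2.6) are not reproduced above, so both lines below are assumptions: the kernel is taken to be the sum of the principal α-stable part and the residual part, and the cut-off $1_{|u|\le 1}$ is the same convention as assumed for (2.1).

```latex
% Assumed form of the locally alpha-stable kernel: principal part + residual part.
\[
\mu(x;du) = \mu^{(\alpha)}(x;du) + \nu(x;du),
\qquad
\mu^{(\alpha)}(x;du) = \lambda(x)\,\frac{1+\rho(x)\,\mathrm{sgn}\,u}{|u|^{\alpha+1}}\,du .
\]
% Corresponding operator with a(x) = 0 (no diffusion term):
\[
Lf(x) = b(x) f'(x)
 + \int_{\mathbb{R}} \bigl( f(x+u) - f(x) - u f'(x)\,1_{|u|\le 1} \bigr)\,\mu(x;du).
\]
```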

The main results
In this section we specify the conditions imposed on the model, and formulate the main results.
We also discuss a case distinction between three different types of assumptions which lead to three types of estimates, together with examples, possible extensions, and related references.

Conditions
In what follows, L is the Lévy-type operator defined by (2.6), and (2.5) is assumed. Throughout the paper we denote by C a generic constant whose particular value may vary from place to place. We define the compensated drift coefficient by and assume the following.
(i) $\lambda$, $\rho$ are Hölder continuous with some index $\zeta\in(0,\alpha)$;
(ii) for some $0<\lambda_{\min}<\lambda_{\max}$, one has $\lambda_{\min}\le\lambda(x)\le\lambda_{\max}$, $x\in\mathbb{R}$.
$\mathbf{H}^{\nu}$. (On the residual kernel $\nu$). We deal with two types of upper bounds:
(i) (weak bound) the kernel $\nu(x,du)$ satisfies (2.5) and the following "tail condition":
(ii) (strong bound) the kernel has the density
$\mathbf{H}^{cont}$. (Continuity assumptions). The kernel $\nu(x,du)$ is assumed to have the following weak continuity property: for any $f\in C(\mathbb{R})$ with a compact support in $\mathbb{R}\setminus\{0\}$, the function $x\mapsto\int_{\mathbb{R}}f(u)\,\nu(x,du)$ is continuous. The drift coefficient $b$ is assumed to be continuous.
Remark 3.1. In the super-critical regime $\alpha<1$, the balance condition (3.1) is close to the necessary one for the process to be well defined. This observation dates back to [23], where a natural example is given of an SDE driven by a symmetric additive α-stable noise with η-Hölder continuous $b$ which has two different weak solutions. We emphasise that in the current setting the balance condition involves the compensated drift coefficient $\widetilde b$ instead of the original $b$; this is a new effect, which has not been seen e.g. in [17], and which becomes visible because of the non-symmetry of the noise.
Remark 3.2. The stronger condition $\mathbf{H}^{\nu}$ (ii) applies well to the general framework where $\mu(x,du)$ is a tempering of the α-stable kernel, with some structure of the kernel preserved; namely, a piece-wise power-type upper bound for the density is available. Note that $\gamma>0$ is arbitrary, and if $\gamma<\alpha$, the tails of $\mu(x,du)$ are actually heavier than those of $\mu^{(\alpha)}(x,du)$. In this case, $\mu(x,du)$ is a "boosting" rather than a tempering of the α-stable kernel. The weaker condition $\mathbf{H}^{\nu}$ (i) applies to the microstructural residual noise in the spirit of [1]; in this case, without imposing any structural assumptions on $\nu$, we assume just (integral) growth/tail bounds.
Remark 3.3. A good way to understand the role of the continuity condition $\mathbf{H}^{cont}$ is to observe that, if (say) $\nu\equiv 0$ and $b$ is discontinuous, then for the operator (2.6) it is impossible that $Lf$ is continuous for all $f\in C^2_0$, and thus the first definition of the Lévy-type process becomes inappropriate. This complication is of a technical kind, and it is plausible that condition $\mathbf{H}^{cont}$ can be relaxed or even completely removed. This, however, is not related to our main goal, which is to describe the transition probability of the process as explicitly as possible. Thus, to make the paper easier to read, we adopt $\mathbf{H}^{cont}$ and avoid further technical complications.

The main statements
Our first main result uniquely identifies a locally α-stable Lévy type process with given characteristics.
Theorem 3.1. Let $L$ be given by (2.6) and let conditions $\mathbf{H}^{drift}$, $\mathbf{H}^{(\alpha)}$, $\mathbf{H}^{\nu}$ (i), and $\mathbf{H}^{cont}$ hold true. Then the martingale problem $(L, C^2_0)$ is well posed in $\mathbb{D}(\mathbb{R}_+)$ and, at the same time, the solution $X$ of this martingale problem is the unique Feller process whose generator $A$ restricted to $C^\infty_0$ coincides with $L$. This process is strong Feller and possesses a transition probability density $p_t(x,y)$.
Next, we describe the principal behavior of the transition probability density $p_t(x,y)$ as $t\to 0$. To simplify and unify the exposition, we first introduce the general framework used throughout the rest of the section, and give a short discussion. We aim to specify a family of probability densities $g_{t,x}$ and a regressor $\mathfrak{f}_t(x)$ such that $p_t(x,y)$ admits the representation (3.5), where the residual kernel $R_t(x,y)$ is negligible (in a properly determined sense) as $t\to 0$. This will essentially mean that, conditioned on $X_0=x$, the family $X_t$, $t\ge 0$, admits an approximation in law by the regression-type family (3.6); here $U_{t,x}$ is a random variable with the distribution density $g_{t,x}$. The latter will actually be an α-stable density with parameters depending on $t,x$. In what follows, we call $\mathfrak{f}_t(x)$ a (deterministic) regressor term for $X$, and $U_{t,x}$ an α-stable innovation term. It is natural to call the entire (3.6) a conditionally α-stable approximation to $X$, in the same spirit as the standard conditionally Gaussian approximation for a diffusion. We will see, however, that in general the regressor $\mathfrak{f}_t(x)$ should be taken in a more sophisticated form than just $x+b(x)t$, typical for the diffusion case. We will provide several versions of the representation (3.5), depending on the actual assumptions imposed on the characteristics of the process, with various regressors $\mathfrak{f}_t(x)$ and with different types of bounds on the residual kernel $R_t(x,y)$. In order to give a systematic exposition of all the possibilities available, we first choose the regressor which is well suited to the basic set of conditions $\mathbf{H}^{drift}$, $\mathbf{H}^{(\alpha)}$, $\mathbf{H}^{\nu}$ (i), and then explain how it can be modified and simplified under additional assumptions.
Let us introduce more notation. By $g^{(\lambda,\rho,\upsilon)}(w)$ we denote the density of the α-stable distribution with the intensity $\lambda$, skewness $\rho$, and shift $\upsilon$. Next, we introduce the indices $\delta_\eta$ and $\delta_{\eta,\zeta,\beta}$; note that the positivity of $\delta_\eta$ is just the balance condition (3.1). We fix an (arbitrary) positive $\delta<\delta_{\eta,\zeta,\beta}$.
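The link between the regression-type family and the density representation can be made explicit by a change of variables. The sketch below takes the scaling $t^{1/\alpha}$ of the innovation term from the abstract; the displays (3.5) and (3.6) themselves are not reproduced above, so the identification of the second line with (3.5) is an assumption.

```latex
% Regression-type family with a deterministic regressor and a stable innovation
% (form taken from the abstract):
\[
\widetilde X^{x}_t = \mathfrak{f}_t(x) + t^{1/\alpha}\, U_{t,x},
\qquad U_{t,x} \text{ has the density } g_{t,x}(w).
\]
% Law of the approximation (change of variables y = f_t(x) + t^{1/alpha} w)
% and the assumed "principal part + residual term" form of the true density:
\[
P\bigl(\widetilde X^{x}_t \in dy\bigr)
 = \frac{1}{t^{1/\alpha}}\, g_{t,x}\!\Bigl(\frac{y-\mathfrak{f}_t(x)}{t^{1/\alpha}}\Bigr)\,dy,
\qquad
p_t(x,y) = \frac{1}{t^{1/\alpha}}\, g_{t,x}\!\Bigl(\frac{y-\mathfrak{f}_t(x)}{t^{1/\alpha}}\Bigr) + R_t(x,y).
\]
```

In this form, an integral-in-$y$ bound on $R_t(x,y)$ is a bound on the total variation distance between the laws of $X_t$ (started at $x$) and $\widetilde X^{x}_t$, up to the usual factor $1/2$.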
We also fix an (arbitrary) $T>0$ and from now on consider only $t\le T$. Denote the partial compensator of the kernel (2.4) with the truncation level $t^{1/\alpha}$, and partially compensated drift coefficient, respectively. Define the corresponding mollified coefficient $B_t$. This coefficient is chosen in such a way that (3.11) holds; see Appendix A.1. We define $\chi_s(x)$, $s\ge 0$, $x\in\mathbb{R}$ as the solution to the Cauchy problem (3.12). Note that by (3.11) the family of Lipschitz constants $\mathrm{Lip}(B_t)$, $t>0$, is integrable on any finite segment, thus $\chi_t(x)$ is well defined. We define $\lambda_t(x)=\frac1t\int_0^t\lambda(\chi_s(x))\,ds$ and $\lambda_t(x)\rho_t(x)=\frac1t\int_0^t\lambda(\chi_s(x))\rho(\chi_s(x))\,ds$; that is, $\lambda_t(x)$ and $\lambda_t(x)\rho_t(x)$ are the averages of the functions $\lambda(\cdot)$, $\lambda(\cdot)\rho(\cdot)$ along the trajectory $\chi_\cdot(x)$ on the segment $[0,t]$. We also denote $\upsilon(x)=2\lambda(x)\rho(x)$, and define $\upsilon_t(x)$ accordingly, with the weight $W_\alpha(t,s)$ given in (3.13). Note that $\int_0^t W_\alpha(t,s)\,ds=1$; (3.14) that is, $\upsilon_t(x)$ is also an average of $\upsilon(\cdot)$ along the trajectory $\chi_\cdot(x)$, but with respect to a certain (non-uniform) probability distribution on $[0,t]$. We finally define $g_{t,x}(w)=g^{(\lambda_t(x),\rho_t(x),\upsilon_t(x))}(w)$, the α-stable density with the "χ-averaged" parameters $\lambda_t(x)$, $\rho_t(x)$, $\upsilon_t(x)$ defined above. Now we are ready to state our second main result, which deals with the case of microstructural residual noise; that is, under condition $\mathbf{H}^{\nu}$ (i). Recall that we consider $t\in[0,T]$, where $T$ is arbitrary but fixed; the particular values of the constants $C$ below may depend on $T$ and on the particular choice of $\delta<\delta_{\eta,\zeta,\beta}$.
II. Assume in addition that (3.18) holds for some $\delta_\nu>0$. Then the uniform-in-$(x,y)$ bound (3.19) holds for the residual term $R_t(x,y)$.
Our last main result provides a point-wise kernel-type estimate (3.21) for the residual term $R_t(x,y)$ under the stronger assumption $\mathbf{H}^{\nu}$ (ii). As a direct corollary, we get an upper bound for the entire transition probability density $p_t(x,y)$. Denote $G^{(\alpha)}(x)=|x|^{-\alpha-1}\wedge 1$.
The way to represent the transition probability density in the form of "the principal part + the residual term", adopted in our approach, differs from the one commonly used in the literature, where the main statements are typically given in the form of upper/lower heat kernel estimates.
Our representation makes it possible to treat systematically several classes of models, where the residual terms admit different types of bounds according to substantially different assumptions. These bounds are obtained by means of the parametrix method, which crucially relies on the fact that a (properly chosen) "zero order approximation" to the unknown p t (x, y) and corresponding "differential error term" (see (4.1) below) admit certain prior bounds. In the classical parametrix setting for parabolic PDEs such bounds are provided by Gaussian kernel type estimates, which has a natural extension to the α-stable model (without an additional kernel ν) in the subcritical regime (with either α > 1 or α ≤ 1 and b ≡ 0); see [11], [13]. The super-critical regime with α < 1 and non-trivial b is technically more involved, but still admits certain kernel-type estimates where the stable kernels are combined with deterministic shifts along the solutions to certain ODEs, see [17]. Presence of the kernel ν changes the situation drastically, and we would like to emphasize the substantial difference between the following three principal types of the estimates: (i) integral-in-y; (ii) uniform-in-(x, y); (iii) kernel-type. The first one is essentially the estimate for the residual part of the semigroup operator P t : C ∞ → C ∞ in the corresponding operator norm. Next, it will become clear from the proof of uniform-in-(x, y) estimate in Section 5 below that the main property required for such estimate to hold is the integral-in-x bound, see (6.4), which is actually the similar estimate, but in the dual setting. These two estimates, the "direct" one and the "dual" one, should be treated separately. On one hand, under just the weak intensity-type condition (2.5) one gets integral-in-y bound for the "differential error term" (see (4.2) below), which allows one both to construct the process and to provide integral-in-y estimate for the residual term R t (x, y). 
On the other hand, the following example shows that the "dual" estimate in general may fail; this is the actual source of the additional assumption (3.18) in statement II of Theorem 3.2.
Example 3.1. Let the "nuisance part" of the noise correspond to the possibility of the process X t to jump, at a Poisson time instants, to the point 0; that is, ν(x, du) = δ −x (du). Then t (x, y) denotes the transition probability density for the process with the kernel µ (α) (x, du).
, and thus for α ≤ 1 the function p t (x, y) is unbounded at the vicinity of the point y = 0.
Of course the kernel-type estimate (3.21) yields both the integral-in-y estimate (3.17) and the uniform-in-(x, y) bound (3.19). However, it requires a substantially stronger condition H ν (ii) and in particular smoothness of ν(x, du), which is too restrictive in models with a microstructural residual noise in the spirit of [1]. Our second example shows that the additional assumption (3.18) is actually much weaker than H ν (ii), and can hold true for strongly singular nuisance kernels.
More generally, let ν(x; du) possess a bound where ν ′ is a Lévy measure satisfying (2.5) and c(x, u) satisfies |c(x, u)| ≤ C|u|, for each u the function x + c(x, u) is C 1 and is invertible (in x), and Then we can obtain (3.18) first changing the variables x ′ = x + c(x, u) and then using the Fubini theorem and (2.5) in the same way as above.
Another strong reason for us to emphasize and to study separately the integral-in-y and uniform-in-(x, y) estimates is that, while being less structure demanding than the kernel estimates, they have a considerable field of applications of their own. Let us briefly mention two of them, which appear in theoretical statistics.
First, let the process $X$ be discretely observed at the time instants $kh$, $k=1,\dots,n$, and let the coefficients depend on an external parameter $\theta\in\Theta$. The likelihood function of the model is given by a formula which is only semi-explicit because of the lack of explicit formulae for $p_t(\theta;x,y)$. On the other hand, if $h=h_n\to 0$ (that is, the statistical model is considered in the high frequency setting), this implicit term can be approximated by the conditionally α-stable principal term with a controllable $L_1$-error, which enables further rigorous asymptotic analysis of the model. It is plausible (and this is one of our ongoing projects) that in this way one can establish Local Asymptotic (Mixed) Normality of the model, the latter being the key property for efficiency analysis of the model and construction of statistical estimates; see more discussion in [1]. Second, both the integral-in-y and uniform-in-(x, y) estimates for $R_t(x,y)$ provide the basis required for the asymptotic study of the Least Absolute Deviation (LAD) estimator for a drift parameter, which is the subject of the subsequent paper [18].
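The statistical application above can be sketched as follows. The displays are illustrative and not copied from the source: it is assumed that the initial value $X_0$ is known, so that the Markov property gives a product of transition densities, and the notation $\mathcal{L}_n$, $\widetilde{\mathcal{L}}_n$ and the explicit dependence of $g$ and $\mathfrak{f}$ on $\theta$ are introduced here for illustration only.

```latex
% Likelihood of a discretely observed Markov process (X_0 assumed known):
\[
\mathcal{L}_n(\theta) = \prod_{k=1}^{n} p_h\bigl(\theta; X_{(k-1)h}, X_{kh}\bigr).
\]
% High-frequency surrogate: each transition density is replaced by the
% conditionally alpha-stable principal term (assumed scaling form).
\[
\widetilde{\mathcal{L}}_n(\theta) = \prod_{k=1}^{n}
 \frac{1}{h^{1/\alpha}}\,
 g_{h, X_{(k-1)h}}\!\Bigl(\theta;\, \frac{X_{kh}-\mathfrak{f}_h(\theta; X_{(k-1)h})}{h^{1/\alpha}}\Bigr).
\]
```

The integral-in-$y$ bound on the residual term controls the $L_1$-distance between each factor of $\mathcal{L}_n$ and the corresponding factor of $\widetilde{\mathcal{L}}_n$, which is the type of control referred to in the paragraph above.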

SDEs
For the reader's convenience, we formulate separately the version of our main results in the case where the process $X$ is a solution to an SDE. Consider the SDE (3.26), where $Z$ is an α-stable process, $N(dt,du)$ is a Poisson point measure independent of $Z$ with the compensator $dt\,\nu'(du)$, and $\widetilde N(dt,du)=N(dt,du)-dt\,\nu'(du)$ is the corresponding martingale measure. Assume that $Z$ has the characteristic triplet $(0,0,\mu^{(\alpha;\lambda,\rho)})$ and $|c(x,u)|\le C|u|$.
Proposition 3.1. Let the following assumptions hold:
• $\sigma$ is ζ-Hölder continuous and for some $c_1,c_2>0$ one has $c_1\le\sigma(x)\le c_2$, $x\in\mathbb{R}$;
• the functions $b(x)$ and $x\mapsto c(x,\cdot)\in L_1((u^2\wedge 1)\nu'(du))$ are continuous.
Then the SDE (3.26) has unique weak solution X, and this solution is a strong Feller Markov process. The transition probability of this process has a density p t (x, y) which has representation • the density of the α-stable innovation term has the form g t, • the residual term R t (x, y) satisfies (3.17). In addition, The uniqueness of the weak solution to the SDE is close to the well posedness of the MP (L, C ∞ 0 ); for a (simple) formal argument we refer to [17,Section 4.3]. All the other statements follow from Theorems 3.1 -3.3 by simple re-arrangements.

Possible extensions
Let us briefly discuss several possible modifications and extensions of the main results available under additional conditions. First, the case of state-dependent $\alpha=\alpha(x)$ can be treated similarly, but with more sophisticated and less transparent estimates. We postpone its study to the companion paper [10], where the multidimensional locally α-stable model is considered in the widest possible generality. It is also apparent that the sensitivities (derivatives) of $p_t(x,y)$ w.r.t. $t$, $x$, $y$, and external parameters can be treated with the same method; in particular we refer to [9], [17] for representations and bounds for $\partial_t p_t(x,y)$ and to [5] for an application of such bounds to the accuracy of approximation of integral functionals. In order not to overextend the exposition, in the current paper we do not address the sensitivities, leaving their study to further research.
Next, let us mention that the particular form of the conditionally α-stable approximation (3.16) adopted in Theorem 3.2 is not the only possible one: one can change the regressor f t (x) = χ t (x) and the α-stable innovation term in a consistent way, providing the following alternative representation, which may be more convenient e.g for simulation purposes. Define for a given t > 0 the family and put where R t (x, y) satisfies (3.17). Under the additional condition (3.18) R t (x, y) satisfies (3.19), and under the condition H ν (ii) the term R t (x, y) satisfies (3.21). In the latter case, χ t (x) in the right hand side of (3.21) can be replaced by χ t t (x). Sketch of the proof. It is clear from the definition of the densities g t,x , g t,x that On the other hand, one can show similarly to (A.35) that That is, the required bounds for follow by respective bounds for R t (x, y) and the basic properties of stable densities (e.g. (A.29)).
These representations have the same principal structure: we define the regressor as the solution to the ODE driven by the (mollified) partially compensated drift, and then determine the parameters of the α-stable density of the innovation term by averaging the corresponding space dependent parameters of the model w.r.t. the solution to the ODE on the time interval [0, t]. These principal components can be further simplified at the cost of making the bounds less precise and (possibly) under additional assumptions. First, let us mention briefly that the true solution χ t to (3.12) can be replaced by its k-th Picard iteration χ (k) t . The situation here is similar to the one studied in [17, Section 2.2], thus we omit a detailed discussion and just mention that for such an approximation to be successful one needs In particular, the naive choice of the regressor f t (x) = x + b(x)t mentioned in the Introduction corresponds to the case k = 1. That is, for such a choice to be successful it is required that α > 1 + η, which in particular excludes small values α ≤ 1/2. Next, in the case of bounded b, the innovation term can be further simplified. Namely, in this case it is easy to verify that for α ≠ 1 in the case α = 1 an extra log t −1 term should appear. Since λ is ζ-Hölder continuous, this yields and similar bounds hold true for ρ t , υ t , λ t , ρ t , υ t . Then essentially the same argument as in the proof of (4.32) (see Appendix A.4) makes it possible to deduce representations which just correspond to the values of the parameters "frozen" at the initial point x. The error terms R frozen t (x, y), R frozen t (x, y) under the corresponding conditions satisfy analogues of (3.17), (3.19), and (3.21) with δ changed to δ ∧ ζ. Note that δ < ζ/α; that is, for α ≥ 1 the bounds actually remain unchanged.

Some related results
We do not give a detailed overview of the related results in this very wide and extensively developing domain, referring an interested reader to [9], [19], and a survey paper [12] for such reviews. Here we just discuss several references directly related to the particular issues treated in the current paper.
1. Various types of estimates. We have already mentioned that most attention in the available literature is devoted to kernel-type estimates, see detailed surveys in [9], [12]. The separate study of integral-in-y and uniform-in-(x, y) estimates is apparently new; note however the forthcoming book [14], Sections 5.4, 5.5, where a systematic treatment is given, which leads to the pair of dual $L_1$-$C_\infty$ estimates of the same kind. We note that the additive-in-space bounds, adopted in [14] as the main assumptions (e.g. [14, (5.69)]), may become too restrictive in certain settings. Namely, the additive structure of a bound makes the integral-in-x and the integral-in-y estimates synonymous, which does not allow one to distinguish between the "direct" and the "dual" estimates. On the other hand, we have seen in Example 3.1 that in the general locally α-stable model such a distinction appears quite naturally. We also mention that the $L_1$-theory, based on integral-in-y estimates only, has a deep connection, at least on the level of the principal ideas, with the approach to the well-posedness of the martingale problem for integro-differential operators which dates back to [6] and [15], [16].
2. Non-symmetry of the Lévy noise. The heat kernel estimates for Lévy and Lévy-type processes were mainly studied for symmetric noises; the non-symmetric setting has become a subject of study only in the last few years. The most advanced study in this direction available to the author is given by the recent preprint [21]; we refer there for an overview of a few other recent results in the same direction. In the model from [21], neither the external drift (our b) nor the nuisance kernel ν is included. On the other hand, the class of kernels treated therein is substantially wider than our class of α-stable principal parts.
3. Non-boundedness of the drift coefficient. In the literature exploiting analytical parametrix-type methods, the coefficients are traditionally assumed to be globally bounded. On the other hand, it was specially pointed out to the author by H. Masuda that, e.g. for various applications in statistics, it is highly desirable that the theory cover mean reverting models of Ornstein-Uhlenbeck type. This explains the special attention paid in the paper to the case of unbounded b. The only reference known to the author where such non-boundedness is allowed is an apparently not yet published preprint [7].
4 Preliminaries to the proofs: the parametrix method and an integral representation for $p_t(x,y)$

In this section we prepare for the proofs of the main results. We introduce an integral equation whose unique solution $p_t(x,y)$ will later be proved to be the transition probability density of the target process $X$. Such a construction is motivated by the parametrix method, which is a classical tool for constructing fundamental solutions to parabolic Cauchy problems. We present here only the rigorous step-by-step exposition without additional discussion of the heuristics behind the method; for such a discussion see e.g. [9], [17].

4.1 The parametrix method: an outline, and the choice of the zero order approximation

In this section, we introduce the main objects and explain the method. We will repeatedly use the following notation for space- and time-space convolutions of functions: We will fix a function $p^0_t(x,y)$, a "zero order approximation" to the unknown $p_t(x,y)$, which will belong to $C^1(0,\infty)$ in $t$ and to $C^2_\infty$ in $x$. In particular, the function $\Phi_t(x,y)$ in (4.1) will be well defined point-wise; here and below the lower index of an operator indicates the variable to which the operator is applied. Under the proper choice of $p^0_t(x,y)$, the kernel $\Phi_t(x,y)$ will satisfy (4.2). The cornerstone of the construction is the Fredholm integral equation of the second kind (4.3), which we interpret in the following way. With the time horizon $T>0$ being fixed, consider the Banach space $L^T_{\infty,1,1}$ of the kernels $\Upsilon_t(x,y)$ on $[0,T]\times\mathbb{R}\times\mathbb{R}$ with the norm $\|\Upsilon\|_{\infty,1,1}$. Consider also the Banach space $L^T_{\infty,\infty,1}$ of functions $f_t(x,y)$ with the norm $\|f\|_{\infty,\infty,1}$. Any kernel $\Upsilon\in L^T_{\infty,1,1}$ generates a bounded linear operator $A_\Upsilon$ in $L^T_{\infty,\infty,1}$, with the operator norm of $A_\Upsilon$ bounded by $\|\Upsilon\|_{\infty,1,1}$. By (4.2), the kernel $\Phi_t(x,y)$ belongs to $L^T_{\infty,1,1}$. Then we naturally interpret (4.3) as an equation in the Banach space $L^T_{\infty,\infty,1}$. It is an easy calculation that (4.2) yields (4.5), and therefore the solution to the equation (4.4) in $L^T_{\infty,\infty,1}$ is uniquely specified by the classical von Neumann series representation (4.6), with the series convergent in $L^T_{\infty,\infty,1}$ and $L^T_{\infty,1,1}$, respectively. Now, let us proceed with the specification of the zero-order approximation $p^0_t(x,y)$ for our particular model. We define the function $\kappa_s(y)$, $s\ge 0$, $y\in\mathbb{R}$ as the solution to the Cauchy problem $\frac{d}{ds}\kappa_s(y)=-B_s(\kappa_s(y))$, $s\ge 0$, $\kappa_0(y)=y$, $y\in\mathbb{R}$.
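The convergence of the von Neumann series mentioned above can be illustrated by a standard computation. This is a sketch under assumptions, since the displays (4.2)-(4.6) are not reproduced in the text: the equation is taken in the form $p=p^0+p\circledast\Phi$, the symbol $\circledast$ and the constants $C$, $\delta$ are illustrative, and the integral-in-$y$ bound of order $t^{-1+\delta}$ is the assumed shape of (4.2).

```latex
% Assumed time-space convolution of kernels:
\[
(f \circledast g)_t(x,y) = \int_0^t\!\!\int_{\mathbb{R}} f_{t-s}(x,z)\, g_s(z,y)\, dz\, ds .
\]
% Assumed integral-in-y bound on the differential error term:
\[
\sup_{x}\int_{\mathbb{R}} |\Phi_t(x,y)|\,dy \le C\, t^{-1+\delta}, \qquad t\in(0,T].
\]
% Induction in k, using the Beta integral
%   \int_0^t (t-s)^{-1+(k-1)\delta} s^{-1+\delta} ds
%      = \Gamma((k-1)\delta)\Gamma(\delta)/\Gamma(k\delta) * t^{-1+k\delta},
% gives a bound with a factorially growing denominator:
\[
\sup_{x}\int_{\mathbb{R}} \bigl|\Phi^{\circledast k}_t(x,y)\bigr|\,dy
 \le \frac{\bigl(C\,\Gamma(\delta)\bigr)^{k}}{\Gamma(k\delta)}\; t^{-1+k\delta}, \qquad k\ge 1 .
\]
% Hence the series below converges, and p solves p = p^0 + p \circledast \Phi:
\[
\Psi = \sum_{k\ge 1} \Phi^{\circledast k}, \qquad p = p^0 + p^0 \circledast \Psi .
\]
```

The last identity can be checked directly: $p\circledast\Phi = p^0\circledast\Phi + p^0\circledast\Psi\circledast\Phi = p^0\circledast\Psi$, since $\Psi\circledast\Phi=\sum_{k\ge 2}\Phi^{\circledast k}$.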
Define for $z\in\mathbb{R}$, $t>0$ the function $\Psi_\alpha(t,z;\xi)$, which is a characteristic exponent of a certain α-stable law. We denote by $h_{t,z}(w)$ the corresponding distribution density $h_{t,z}(w)=\frac{1}{2\pi}\int_{\mathbb{R}}e^{-iw\xi+\Psi_\alpha(t,z;\xi)}\,d\xi$, and define $p^0_t(x,y)=h_{t,y}(\kappa_t(y)-x)$. (4.8) Note that the function $p^0_t(x,y)$ admits the following alternative representation. Recall that $W_\alpha(t;s)$ is defined in (3.13). Define $g_{t,z}(w)=g^{(\lambda_t(z),\rho_t(z),\upsilon_t(z))}(w)$. (4.9) Then the identity (4.10) holds true; see Appendix A.4 for the proof. Both the identity (4.8) and the alternative representation (4.10) will be substantially used in the sequel.

Kernel Φ t (x, y): decomposition and estimates
Define an auxiliary operator The following identity is crucial for the entire construction.
This identity can be easily verified using the formula (4.8) and a standard Fourier analysis-based argument; see Appendix A.4. Using this identity with $z=y$, $w=\kappa_t(y)$ and the fact that $\partial_t(\kappa_t(y))=-B_t(\kappa_t(y))$, we get (4.12). On the other hand, for the operator (2.1) we have the decomposition (4.13), (4.14). Now we can represent $\Phi$ in the form (4.15). In what follows, we estimate separately the components of $\Phi$ in the decomposition (4.15) and deduce an integral estimate for the entire $\Phi$, which holds true under $\mathbf{H}^{\nu}$ (i). We will repeatedly use representation (4.10) and the following observation. The functions $\lambda_t(z)$, $\rho_t(z)$, $\upsilon_t(z)$ are bounded since they are obtained by averaging bounded functions w.r.t. probability measures. In addition, $\lambda_t(z)$ is uniformly separated from zero. That is, for the function (4.9) with $z=y$ the bounds (A.22), (A.29)-(A.32) can be used.

We have by (A.22), (A.29)
p 0 That is, the first and the third parts in the above decomposition of Φ ν satisfy a bound similar to the bound (4.16) for Φ drif t . For the second part, we simply write is just a notation. This gives Summary: Proof of (4.2). The above calculation gives and thus We have for any α, β, γ > 0    The proof is completely analogous and is omitted.

Solution to (4.3): specification and further re-arrangement
For any k > 1 we have where we denote s 0 = 0, s k = t, which is just (4.5). That is, the solution p t (x, y) to the integral equation (4.3) is uniquely defined by (4.6). Note that the resolvent kernel Ψ t (x, y) for the integral equation (4.3) inherits from Φ t (x, y) the integral bounds and the tail behavior. Namely, we have Then it is easy to show by induction that, for any k, The solution to (4.3) can be written as (4.31) Note that representation (4.30) differs from the one claimed in Theorem 3.2, in particular, the zero order term p 0 t (x, y) in (4.30) is not equal to the principal term (3.25) in (3.5). The difference between these two terms admits the following bound; the proof is postponed to Appendix A.4: We have sup Now it is easy to prove the following.
Proof. We first note that there exists C > 1 such that, for |x| large enough, Next, g t,x are stable densities with uniformly bounded intensities and shifts, and thus sup x,t |w|>ε Since f ∈ C ∞ is uniformly continuous, this yields which proves the first assertion. The second assertion follows from the first one by (4.34).
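The convergence of the convolution series for the resolvent kernel can be illustrated by a short computation. The sketch below assumes, hypothetically, that the time integral bound of Φ has the form $C t^{-1+\delta}$; the exact form of (4.5) is not reproduced here. Under this assumption the k-fold time convolution is controlled by the Dirichlet integral $\int_{0<s_1<\dots<s_{k-1}<t}\prod_{i=1}^k (s_i-s_{i-1})^{-1+\delta}\,ds = \Gamma(\delta)^k\,\Gamma(k\delta)^{-1}\,t^{-1+k\delta}$, and the sum over k converges because $\Gamma(k\delta)$ grows faster than any geometric sequence. The function names are illustrative.

```python
import math


def convolution_power_constant(k, C, delta):
    """C**k * Gamma(delta)**k / Gamma(k*delta), computed through logarithms."""
    return math.exp(k * (math.log(C) + math.lgamma(delta)) - math.lgamma(k * delta))


def resolvent_bound(t, C, delta, kmax=200):
    """Partial sum over k = 1..kmax of the bounds for the k-fold convolutions.

    Each term equals C**k * Gamma(delta)**k / Gamma(k*delta) * t**(k*delta - 1);
    it is evaluated in log scale to avoid overflow for large k. Requires t > 0.
    """
    log_t = math.log(t)
    total = 0.0
    for k in range(1, kmax + 1):
        log_term = (k * (math.log(C) + math.lgamma(delta))
                    - math.lgamma(k * delta)
                    + (k * delta - 1.0) * log_t)
        total += math.exp(log_term)
    return total
```

In the non-singular case δ = 1 the series sums to $C e^{Ct}$, which provides a closed-form check; for δ < 1 the leading term keeps the singularity $t^{-1+\delta}$ of Φ itself.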

Proof of Theorem 3.1
We have defined the function p t (x, y) as a solution to the integral equation (4.3). In this section we make a further analysis of its representation (4.6) and prove that the function p t (x, y), in a certain approximate sense, provides a fundamental solution to the Cauchy problem for the operator ∂ t − L. This fact will be a cornerstone for the proof of Theorem 3.1.

Continuity properties and approximate fundamental solution
For f ∈ C ∞ , one has P t f ∈ C ∞ , t ≥ 0, and P t , t ≥ 0 is a continuous family of bounded linear operators in C ∞ .
Proof. The proof is fairly standard, thus we just sketch it. We have The function p 0 t (x, y), given by the explicit formula (4.8), is continuous w.r.t. x, t for any y. Then one can deduce continuity of P t f (x) using the bounds (4.24), (4.28) and a standard dominated convergence argument; see e.g. [9, Section 3.3]. Using (4.29), one can show in addition that Proof. The argument here is close to the one from the previous proof, with p 0 t changed to Φ; recall that Ψ t (x, y) satisfies Therefore we omit the details, and only discuss two points which make the difference with the previous proof. First, the bound (4.22), when compared to (4.24), contains an extra term t −1+δ . This is the reason why (5.4) is stated for t ∈ [τ, T ] with positive τ . Next, we still have to verify that Φ t (x, y) is continuous in x, t. Recall the decomposition (4.15), and observe that the term Φ (α) has the required continuity. However, the two other terms in the decomposition (4.15) may fail to be continuous. Namely, since the function 1 |u|>t 1/α is discontinuous, weak continuity of the kernel ν(x, du) does not imply, in general, continuity of the corresponding integral m ν t (x). This trouble is artificial, and can be fixed by a proper re-arrangement of the compensating terms in these two summands. Namely, we take a function θ ∈ C(R) with θ(u) = 0, |u| ≤ 1 2 , θ(u) = 1, |u| ≥ 1, and put , y), and the terms Φ drif t and Φ ν have the required continuity. The latter can be verified via a routine calculation involving the continuity condition H cont ; we omit a detailed discussion.
The parametrix construction described in Section 4.1 originates in the general interpretation of p t (x, y) as a (sort of) fundamental solution to the Cauchy problem for the operator ∂ t − L; in other words, p t (x, y) should satisfy the backward Kolmogorov equation for the (yet unknown) process X. In some cases one can show that p t (x, y) indeed satisfies this equation in a classical way; for instance, this is the mainstream approach in the classical diffusive/parabolic setting, see [4]. A necessary prerequisite for such an approach is to prove that p t (x, y) belongs to C 1 w.r.t. t and to C 2 ∞ (which is just the domain of L) w.r.t. x. In the current setting, the zero order approximation p 0 t (x, y) has the required smoothness properties; however, one can hardly extend these properties to p t (x, y) using (5.1) in the way used in the proof of Lemma 5.1. The main obstacle is that ∂ x p 0 t (x, y), ∂ 2 xx p 0 t (x, y) exhibit strongly singular behavior as t → 0 (see (A.29), (A.30)), which does not allow one to differentiate (5.1). This observation leads to the following auxiliary construction. Define for ε > 0 The following lemma shows that p t,ε (x, y) approximates p t (x, y) and satisfies an approximate analogue of (5.6). This is our reason to call the family {p t,ε (x, y), ε > 0} an approximate fundamental solution.
Lemma 5.3. For every f ∈ C ∞ we have the following.

lim
t,ε→0+ 3. For every ε > 0, P t,ε f (x) belongs to C 1 as a function of t, to C 2 ∞ as a function of x, and ∂ t P t,ε f (x), L x P t,ε f (x) are continuous w.r. t. (t, x).

For every
Proof. Statements 1-3 follow easily by the same continuity/domination argument which was used in Lemma 5.1, and thus we omit the proof; see [9, Section 4.1] for a detailed exposition of a similar group of statements.
To prove statement 4, we apply the argument from the proof of [9, Lemma 5.2]. Since the additional time shift by ε > 0 removes the singularity at the point t = 0 in (5.7), the continuity/domination argument similar to the one used in Lemma 5.1 allows one to interchange the operator ∂ t − L x with the integrals in the definition of P t,ε f . Then, recalling the definition (4.1) of Φ t (x, y) and (5.2), we get see [9, (4.13)] By the continuity of Φ t (x, y) in t, we have On the other hand, since Ψ f t (x) is continuous, we have by , which combined with (5.5) completes the proof of (5.12). On the other hand it follows from (4.24) and (4.28) that Combined with (5.12), this yields (5.13). |h ε (t, x)| → 0, |x| → ∞; Note that, by Lemma 5.3, for any f ∈ C ∞ the function h f (t, x) = P t f (x) is approximate harmonic for ∂ t − L. The corresponding approximating family is given by (5.14)

The Positive Maximum Principle and the semigroup properties
In this section we establish the semigroup properties for the family of the operators {P t , t ≥ 0}. A classical method for this is based on the Positive Maximum Principle (PMP) for the operator L. It is usually applied when p t (x, y) is a (true) fundamental solution for ∂ t − L; e.g. [11]. In our setting p t (x, y) satisfies (5.6) in a weaker approximate sense; however, the classical PMP-based argument admits an extension which is well applicable in such an approximate setting. This extended argument is essentially due to [9,Section 4]. For the reader's convenience and for further reference, we give here a systematic version of this argument, based on the notion of approximate harmonic functions.
Recall that an operator L with a domain D is said to satisfy PMP if for any f ∈ D and x 0 such that Clearly, the operator (2.6) with the domain D = C 2 ∞ satisfies PMP; note that Lf is continuous for any f ∈ C 2 ∞ , but does not necessarily belong to C ∞ . Let {h ε (t, x), ε ∈ (0, 1]} be the approximating family from Definition 5.1, then by assertion (i) there exist υ > 0, θ > 0, ε 1 > 0 such that these functions are continuous in (t, x) (because each h ε is continuous) and satisfy u ε (t, x) → θt > 0, |x| → ∞ uniformly in t ∈ [0, T ] (because of assertion (i)). Then for some R > 0 and ε < ε 1 inf t≤T,x∈R is actually attained at some point in [0, T ] × [−R, R]; we fix one such point for each ε, and denote it by (t ε , x ε ). We observe that t ε is separated from 0 when ε is small enough. Indeed, by assertion (i) and the non-negativity assumption h(0, x) ≥ 0, there exist ε 0 > 0, τ > 0 such that Since u ε (t ε , x ε ) = min this yields t ε > τ for ε < ε 0 . Now we can conclude the proof in a quite standard way. Let ε < ε 0 ∧ ε 1 . Since x ε is the maximal point for −u ε (t ε , ·) and −u ε (t ε , x ε ) > 0, we have by the PMP Since t ε is the maximal point for u ε (·, x ε ) and t ε > τ , we have where the sign "<" may appear only if t ε = T . Then On the other hand, we have by assertion (ii) from Definition 5.1 This gives a contradiction and shows that (5.15) fails.
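The PMP can be checked numerically on a toy example. In its standard formulation, the PMP requires Lf(x 0) ≤ 0 whenever f attains a non-negative global maximum at x 0. The sketch below uses a pure-jump operator with the symmetric kernel $|u|^{-1-\alpha}du$ and the usual compensator on |u| ≤ 1; this operator, the test function `f` and all names are illustrative assumptions and do not reproduce the operator (2.6).

```python
import math


def f(x):
    """Smooth test function with global maximum f(1) = 1."""
    return math.exp(-(x - 1.0) ** 2)


def df(x):
    """Derivative of the test function."""
    return -2.0 * (x - 1.0) * f(x)


def jump_operator(x, alpha=1.5, cutoff=50.0, n=200_000):
    """Midpoint-rule approximation of the integral over (-cutoff, cutoff) of

        (f(x+u) - f(x) - u f'(x) 1_{|u| <= 1}) |u|^(-1-alpha) du.

    With n even, the midpoints never hit the singular point u = 0.
    """
    step = 2.0 * cutoff / n
    total = 0.0
    for k in range(n):
        u = -cutoff + (k + 0.5) * step
        compensator = u * df(x) if abs(u) <= 1.0 else 0.0
        total += (f(x + u) - f(x) - compensator) * abs(u) ** (-1.0 - alpha)
    return total * step
```

At the maximum point x = 1 the derivative vanishes and every increment f(1 + u) − f(1) is non-positive, so the value is negative, in agreement with the PMP. At x = 3, which is not a maximum point, the jumps towards the bump dominate and the value is positive, so the sign restriction is specific to maximum points.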
Now the semigroup properties for the family {P t , t ≥ 0} can be derived in a standard way.
Corollary 5.1. 1. Each operator P t , t ≥ 0 is positivity preserving: for any f ≥ 0 one has P t f ≥ 0.

2. The family {P t } is a semigroup:
3. For any f ∈ C 2 0 (R), Proof. Statement 1 follows from Proposition 5.1 applied to h(t, x) = h f (t, x), which is already known to be approximate harmonic. To prove statement 2, we fix s ≥ 0, f ∈ C ∞ and apply Proposition 5.1 to functions which are approximate harmonic and satisfy h ± (0, ·) = 0. Finally, to prove statement 3 we apply Proposition 5.1 to the function with the approximating family defined by Note that h ε (t, x) satisfies assertion (i) from Definition 5.1 by Lemma 5.3, and Applying (5.12) and (5.13), we get assertion (ii) from Definition 5.1.
It is easy to deduce from (5.18) that for every x, and Lf k ≤ C. Using (4.30), (4.24), and (4.31) we can apply the dominated convergence theorem and prove $\int_0^t P_s Lf_k(x)\,ds \to 0$, k → ∞, which combined with (5.18) gives the required identity. Summarizing all the above, we conclude that P t , t ≥ 0 is a strongly continuous semigroup in C ∞ , which is positivity preserving and conservative; that is, this semigroup is Feller. It follows from (5.18) that C 2 0 belongs to the domain of its generator, and the restriction of this generator to C 2 0 equals L. For any probability measure π on R there exists a Markov process {X t } with the transition semigroup {P t }, càdlàg trajectories, and the initial distribution Law (X 0 ) = π; see [3,Theorem 4.2.7]. Finally, by Lemma 5.1 the process X is strong Feller.
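The semigroup property can be made concrete in the simplest explicit case. The sketch below takes, as an assumption unrelated to the general kernel constructed above, the symmetric 1-stable (Cauchy) process without drift, whose transition density is $p_t(x,y) = t/(\pi(t^2+(y-x)^2))$, and checks the Chapman-Kolmogorov identity $\int p_s(x,z)p_t(z,y)\,dz = p_{s+t}(x,y)$ by numerical integration. The function names and the truncation parameters are illustrative.

```python
import math


def cauchy_kernel(t, x, y):
    """Transition density of the symmetric 1-stable (Cauchy) process."""
    return t / (math.pi * (t * t + (y - x) ** 2))


def compose(s, t, x, y, cutoff=200.0, n=40_000):
    """Midpoint-rule approximation of int p_s(x, z) p_t(z, y) dz over |z| < cutoff.

    The integrand decays like |z|**(-4), so the truncation error is small.
    """
    step = 2.0 * cutoff / n
    total = 0.0
    for k in range(n):
        z = -cutoff + (k + 0.5) * step
        total += cauchy_kernel(s, x, z) * cauchy_kernel(t, z, y)
    return total * step
```

For instance, `compose(0.5, 1.0, 0.3, -0.7)` and `cauchy_kernel(1.5, 0.3, -0.7)` should agree up to the discretization and truncation errors.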

The martingale problem: uniqueness
Note that any Feller process Y , whose generator A restricted to C 2 0 coincides with L, is a D(R + )-solution to the martingale problem (L, C 2 0 ); this is essentially the Dynkin formula combined with [3,Theorem 4.2.7]. In particular, this is the case for the Markov process X, constructed in the previous section. In this section, we prove that the D(R + )-solution to the martingale problem (L, C 2 0 ) with a given initial distribution π is unique; this will complete the proof of Theorem 3.1. The argument here is almost the same as in [17], with one important difference, which appears because the drift term now is not necessarily bounded.
By [3,Corollary 4.4.3], the required uniqueness holds true if for any two D(R + )-solutions to (L, C 2 0 ) with the same initial distribution π the corresponding one-dimensional distributions coincide. In what follows, we fix some solution Y and prove that It is easy to prove that Y t , t ≥ 0 is stochastically continuous; see [9]. Then for any function h(t, x) which is differentiable w.r.t. t, belongs to C 2 0 w.r.t. x, and has continuous and bounded ∂ t h(t, x), L x h(t, x), the process is a martingale, see [3,Lemma 4.3.4 (a)]. We use this fact for a certain family of functions which approximate here and below f ∈ C ∞ , T > 0 are fixed. Consider a family of functions {ϕ R , R > 0} ⊂ C 2 such that ϕ R C 2 ≤ C and Define and is bounded together with its derivatives uniformly for t ∈ [0, T 1 ], |x| ≤ R for any T 1 < T, R > 0. Multiplying this function by ϕ R , we get a function from the class C 2 0 . That is, we have that In addition, we have Thus for |x| ≤ R we can write where Q t,ε f is defined in Lemma 5.3, and Observe that, for |x| ≤ R, |h T,f ε (t, y)| =: F T,f R,ε . Now we can finalize the proof. Without loss of generality, we assume that the initial distribution π has a compact support, and take R large enough, so that supp π ⊂ (−R, R). Denote τ R = inf{t : |Y t | ≥ R} > 0, then for any T 1 < T we have Using Lemma 5.3, we pass to the limit as ε → 0 and get |h T,f (t, y)|.
Taking R → ∞ and using Lemma 5.1, we get by the dominated convergence theorem Taking T 1 → T and using the dominated convergence theorem again, we get which proves (5.19).

Proof of Theorem 3.2
Statement I follows straightforwardly from (4.31) and (4.32). To prove statement II, we further re-arrange decomposition (4.15). Namely, we write Since the kernel G (α,α−ζ,α) t (u, v) is bounded by Ct −1/α , satisfies (4.33), and is symmetric, one has Next, it is straightforward to see that Φ integral t (x, y) satisfies a similar sup-bound: since p 0 t (x, y) is bounded by Ct −1/α , we have by (2.5), (3.3) To obtain an integral bound for Φ integral t (x, y), we recall that Then by (3.18) Combined with (6.2), (6.3), this yields These bounds can be extended to the kernel $\Psi = \sum_{k\geq 1}\Phi^{\circledast k}$: The second bound follows from the second bound in (6.4) in literally the same way as (4.28). To get the first bound, we slightly modify the argument from Section 4.3. In what follows we use the notation of this section. Let k ≥ 1, τ 1 , . . . , τ k ∈ [0, T ] be given, and let j ∈ {1, . . . , k} be such that τ j = max i=1,...,k τ i . We have Now we take 0 ≤ s 1 ≤ · · · ≤ s k−1 ≤ t and put s 0 = 0, s k = t, τ i = s i − s i−1 , i = 1, . . . , k. Then the maximal value τ j is ≥ t/k, and we get Taking the sum in k ≥ 1, we obtain the first bound in (6.5).
We also have Repeating the calculation used in the proof of (6.5), we get the sum of these kernels satisfies and if at least one of the indices i 1 , . . . , i k equals 2. Since δ 1 < δ 2 , in the second case Recall that δ 1 = δ, δ 2 = δ ′ and y). Then, using the sub-convolution properties of H 1 , H 2 and the inequality H 1 ≤ H 2 in the same way we did before, we get In the notation from the proof of Proposition A.9, we have Using (A.16) and (A.38), (A.39), we get Combined with (4.32), this completes the proof.
The proof of the following statement is easy and omitted.
Proof. By (A.3), we have Since Now we are ready to prove (3.10), (3.11). We decompose where we denote m (α) Note that the above calculation also gives

A.2 Auxiliary family χ t s (x) and properties of χ t (x), κ t (y)
Proposition A.3. For any T > 0 there exists C > 1 such that
Proof. The function b satisfies (A.5). Then by (3.11) and (A.7) the coefficient B t in the ODE, which defines χ t , satisfies the following linear growth bound: This in a standard way provides
In order to relate the families χ s (x), κ s (y), we introduce an auxiliary family χ t s (x), the solution to the Cauchy problem
Proposition A.4. For any T > 0 there exists C such that for any
Proof. Denote x s = χ t s (x), y s = κ t−s (y), then with the convention 0 0 = 1. Then which provides the required statement by (3.11) since |q t,r | ≤ Lip (B t−r ).
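The way a linear growth bound on the coefficient controls the flows χ t and κ t can be illustrated by a small computation. The sketch below is written under stated assumptions: the drift `drift` is a hypothetical time-homogeneous function with |B(x)| ≤ C(1 + |x|), C = 1, and the ODE is integrated by the explicit Euler scheme. A Gronwall-type argument then gives 1 + |χ t (x)| ≤ (1 + |x|)e^{Ct}, and the same bound holds for the backward flow with −B; the exact statement of Proposition A.3 is not reproduced here.

```python
import math


def drift(x):
    """Hypothetical drift with linear growth: |B(x)| <= 1 + 0.5|x| <= 1 + |x|."""
    return math.sin(x) + 0.5 * x


def flow(x, t, sign=1.0, n=10_000):
    """Explicit Euler approximation of the solution to x' = sign * B(x), x(0) = x.

    sign = +1 mimics the forward flow chi_t, sign = -1 the backward flow kappa_t.
    """
    h = t / n
    for _ in range(n):
        x += sign * h * drift(x)
    return x


def gronwall_bound(x, t, C=1.0):
    """Upper bound for |x(t)| implied by 1 + |x(t)| <= (1 + |x(0)|) * exp(C*t)."""
    return (1.0 + abs(x)) * math.exp(C * t) - 1.0
```

The Euler iterates satisfy the same bound as the exact solution, since each step multiplies 1 + |x| by at most 1 + Ch ≤ e^{Ch}.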
Assume for a while that r ≤ t − r. By Proposition A.1, That is, in any case we have On the other hand, for r ≤ t we have Summarizing these calculations we get Recall that υ(·) is bounded and W α (t; ·) is a probability density. That is, directly from (A.11), (A.12) we get the bound Combined with Proposition A.4, this gives the following.

A.3 Stable densities
The kernel G (α) (x) (see the definition before Corollary 3.1) possesses the following properties, which can be verified straightforwardly: and for any c > 0 there exists C such that We also have The following two propositions collect the properties of the α-stable densities g (λ,ρ,υ) (x), see (3.7) for the definition.
We have