Short time kernel asymptotics for rough differential equation driven by fractional Brownian motion

We study a stochastic differential equation in the sense of rough path theory driven by fractional Brownian rough path with Hurst parameter H (1/3<H<= 1/2) under the ellipticity assumption at the starting point. In such a case, the law of the solution at a fixed time has a kernel, i.e., a density function with respect to Lebesgue measure. In this paper we prove a short time off-diagonal asymptotic expansion of the kernel under mild additional assumptions. Our main tool is Watanabe's distributional Malliavin calculus.


Introduction
For the usual d-dimensional Brownian motion (w t ) and sufficiently regular vector fields V i (0 ≤ i ≤ d) on R n , consider the following stochastic differential equation (SDE) of Stratonovich type: If the vector fields satisfy the hypoellipticity condition at the starting point a, then the law of y t has a heat kernel i.e., a density function p t (a, a ′ ) with respect to Lebesgue measure da ′ for any t > 0. In probability theory, the short time asymptotic (off-diagonal) problem of p t (a, a ′ ) has extensively been studied and is now a classical topic. See for instance [2,9,10,11,12,13,14,26,38,39,40,41,42,43,44,48,51,52,53,54,55,56] and references therein. (There are also analytic approaches, of course. But, we do not discuss them in this paper.) Among many probabilistic methods, Malliavin calculus is known to be quite powerful. Bismut [14] was first to prove short time kernel asymptotics via Malliavin calculus. Among such proofs, we focus on Watanabe's theory of generalized Wiener functionals and asymptotic theorems for them [56,29,52].
Recently, the theory of "SDE" for fractional Brownian motion (fBm) was developed. As a result, an analogous asymptotic problem is gathering attention. When Hurst parameter H is larger than 1/2, the SDE above is in the sense of Young integration. When 1/4 < H ≤ 1/2, it should be understood as a differential equation in the rough path sense driven by fractional Brownian rough path. In his previous paper [33], the author studied both on-diagonal and off-diagonal short time asymptotic expansion of p t (a, a ′ ) when H > 1/2. The method is Watanabe's asymptotic theory of generalized Wiener functionals (i.e., Watanabe distributions) in [56]. In [33] the coefficient vector fields are assumed to satisfy the ellipticity condition at a and some additional mild conditions are also assumed. Those conditions are almost parallel to the ones in [56]. Simply put, [33] is a "fractional version" of [56] in the framework of Young integration.
The aim of this paper is to prove a similar off-diagonal asymptotic expansion when 1/3 < H ≤ 1/2. Although the basic strategy of proof is similar to the case H > 1/2 in [33], the proof gets much more technically difficult since we work on the rough path space. We will carry it out by combining various recently proven results for Gaussian rough paths. A number of paper have been published on Malliavin calculus for Gaussian rough paths by now. See [1,5,7,8,15,16,17,18,20,27,28,30,34,35] for instance. However, this type of short time kernel asymptotics seems new.
The organization of this paper is as follows: In Section 2 we give a precise formulation of our problem and the statement of our main result (Theorem 2.1). In Section 3 we prove moment estimates for Taylor expansion of Lyons-Itô map. The expansion in the deterministic sense is already known, but we need "L p -version" (or "D ∞ -version") of the expansion in this paper. These estimates play a crucial role in the proof of the main theorem. In Section 4 we presented without proofs two key propositions (Propositions 4.1 and 4.2) on regularity in the sense of Malliavin calculus of the solution of RDE driven by fractional Brownian rough path. Thanks to those propositions, we can use Watanabe's asymptotic theory in the proof of the main theorem in Section 5, following the argument in [56,33]. A difference from [56] is that we can work and, in particular, localize around the energy minimizing path in the domain (not in the range) of Lyons-Itô map since the map is continuous in rough path theory. In section 6 we give proofs of two key propositions given in Section 4.
We do not give a heuristic sketch of our argument for brevity. Since formal computations are basically the same as in the Young case, the reader who wants to know it may consult the corresponding part of the author's previous paper [33].

Setting
In this subsection, we introduce a stochastic process that will play a main role in this paper. From now on we denote by w = (w t ) t≥0 = (w 1 t , . . . , w d t ) t≥0 the d-dimensional fractional Brownian motion (fBm) with Hurst parameter H. Throughout this paper we assume 1/3 < H ≤ 1/2. It is a unique d-dimensional, mean-zero, continuous Gaussian process with covariance Note that, for any c > 0, (w ct ) t≥0 and (c H w t ) t≥0 have the same law. This property is called self-similarity or scale invariance. When H = 1/2, it is the usual Brownian motion. It is well-known that w admits a canonical rough path lift w, which is called fractional Brownian rough path. Let V i : R n → R n be C ∞ b , that is, V i is a bounded smooth function with bounded derivatives of all order (0 ≤ i ≤ d). We consider the following rough differential equation (RDE); This RDE is driven by the Young pairing (w, λ), where λ t = t. The unique solution is denoted by y = (y 1 , y 2 ) and we set y t := a + y 1 0,t as usual. We will sometimes write y t = y t (a) = y t (a, w) etc. to make explicit the dependence on a and w.
A matrix notation is often convenient. So we set b = V 0 and σ = [V 1 , . . . , V d ], which is n × d matrix-valued, and often rewrite RDE (2.1) as follows; dy t = σ(y t )dw t + b(y t )dt with y 0 = a ∈ R n .

Assumptions
In this subsection we introduce assumptions of the main theorems. First, we assume the ellipticity of the coefficients of (2.1) at the starting point a ∈ R n . It is known that, under Assumption (A1), the law of the solution y t has a density p t (a, a ′ ) with respect to the Lebesgue measure on R n for any t > 0 (see [27]). Hence, for any Borel subset U ⊂ R n , P(y t (a) ∈ U) = U p t (a, a ′ )da ′ . Let H = H H be the Cameron-Martin space of fBm (w t ). Note that any γ ∈ H is continuous and of finite q-variation for some q ∈ [1,2). For γ ∈ H, we denote by φ 0 t = φ 0 t (γ) be the solution of the following Young ODE; Set, for a ′ = a, K a ′ a = {γ ∈ H | φ 0 1 (γ) = a ′ }. We only consider the case where K a ′ a is not empty. For example, if we assume (A1) for all a, then this set K a ′ a is not empty. From goodness of the rate function in Schilder-type large deviation for fractional Brownian rough path, it follows that inf{ γ H | γ ∈ K a ′ a } = min{ γ H | γ ∈ K a ′ a }. Now we introduce the following assumption; (A2):γ ∈ K a ′ a which minimizes H-norm exists uniquely.
In what follows,γ denotes the minimizer in Assumption (A2). We also assume that · 2 H /2 is not so degenerate atγ in the following sense.
Later we will give a more analytical condition (A3)', which is equivalent to (A3) under (A2). In [56], Watanabe used (A3)' in his proof of off-diagonal kernel asymptotics. We will also use (A3)'. In order to state (A3)', however, we have to introduce a lot of notations. So, we presented (A3) here for ease of presentation.

Index sets
In this subsection we introduce several index sets for the exponent of the small parameter ε > 0, which will be used in the asymptotic expansion. Unfortunately, index sets in this paper are not the set of (a constant multiple of) natural numbers and are rather complicated. Set where N = {0, 1, 2, . . .}. We denote by 0 = κ 0 < κ 1 < κ 2 < · · · all the elements of Λ 1 in increasing order. For a while, consider the case 1/3 < H < 1/2. Several smallest elements are explicitly given as follows; As usual, using the scale invariance (i.e., self-similarity) of fBm, we will consider the scaled version of (2.1). (See the scaled and shifted RDE (4.2) below). From its explicit form, one can easily guess why Λ 1 appears.
When H = 1/2, all these index sets Λ i , Λ ′ j above are just N.

Statement of the main results
Now we state our main theorem, which are basically analogous to the corresponding one in Watanabe [56]. However, when H = 1/2 and the drift term exists, there are some differences. First, the exponents of t are not (a constant multiple of) natural numbers. Second, cancellation of "odd terms" as in p. 20 and p. 34, [56] does not occur in general.
Remark 2.2 (i) In theory, the constants in the asymptotic expansion in Theorem 2.1 (and in the on-diagonal case in Theorem 4.4 below) are computable. But, actual computation is quite cumbersome and we do not carry it out in this paper. We just mention here that the first constants α 0 in Theorem 2.1 and c 0 in Theorem 4.4 are non-zero.
(ii) It might be interesting to consider the case 1/4 < H ≤ 1/3. In that case, since the third level rough path theory is needed, calculations may become much harder. (iii) Our assumptions (A1)-(A3) are quite similar to the corresponding ones in [56]. Therefore, if we set H = 1/2 in Theorem 2.1 above recovers most of (say, 90 percent of ) the main result in Watanabe [56]. Hence, our result could also be regarded as a rough path proof of [56]. (In this case, however, the index set in Theorem 2.1 is not Λ 4 = N, but is actually 2N, due to cancellation of the odd terms.) Compared to the main theorem in [56], Theorem 2.1 with H = 1/2 does not include the following two cases; (a): In this paper the ellipticity assumption (A1) is assumed. In [56], however, something like "step 2-hypoellipticity" case was also studied. (We simply did not try this case.) (b): In this paper the coefficient vector fields are of C ∞ b . However, the condition on vector fields in [56] is as follows: "For all m = 1, 2, . . . and 0 ≤ i ≤ d, ∇ m V i is bounded." (V i itself is allowed to have linear growth.) Since Bailleul [3] recently solved RDEs with such coefficients, it might be possible to extend our theorem to include such a case by just combining existing methods.
(iv) In a very recent survey [6], many results on various kinds of short time asymptotic problems for RDEs (or Young ODE) driven by fBm are reviewed. For instance, Varadhan's estimate, which is short time asymptotics of log p(t, a, a ′ ), was shown in [7] under the uniform ellipticity condition on the coefficient vector fields when H > 1/4.

Moment estimate for Taylor expansion of Lyons-Itô map
Let p ∈ [2, 3) be the roughness constant and let q ∈ [1, 2) be such that 1/p + 1/q > 1. We denote by GΩ p (R d ) the geometric rough path space with p-variation topology. In this paper, the time interval is always [0, 1]. For the definition and basic properties of geometric rough paths, see Lyons and Qian [47], or Lyons, Caruana, and Lévy [46]. Assume that σ : R n → Mat(n, d) and b : 1], R e ), we consider the following RDE driven by the Young pairing (εx, h) ∈ GΩ p (R d+e ); It was shown in Inahama [31] (or Inahama-Kawabi [36]) that the first level path of the solution admits a Taylor-like expansion in the deterministic sense as ε ց 0. Roughly speaking, the aim of this section is to prove that the expansion holds still true in L r -sense for any r ∈ [1, ∞), when x is the natural lift of fBm with H ∈ (1/3, 1/2] or a similar Gaussian process. We remark that the following RDE is a special case of (3.1) above: Here, σ and x are as above,b : We can easily check this by setting e = d + 1, h = (k, λ), and b = [σ|b] (an n × (d + 1) block matrix). This type of RDE appears when we make a Young translation of a given RDE driven by a scaled Gaussian rough path.

Notations
In this paper we work in Lyons' original framework of rough path theory. We borrow most of notations and terminologies from [47,46]. Before we start detailed discussions, however, we need to set some additional notations.
We denote by x = (x 1 , x 2 ) a generic element in GΩ p (R d ) and we write x t := x 1 0,t as usual. Conversely, for x ∈ C α−var 0 ([0, 1], R d ) with α ∈ [1, 2), we denote the natural lift of x (i.e., the smooth rough path lying above x) by the corresponding boldface letter x.
Note that, for x ∈ C α−var , (x, y) ∈ GΩ p (R d+e ) stands for the natural lift of (x, y), not for the pair (x, y) ∈ GΩ p (R d ) × GΩ p (R e ). In a similar way, for x ∈ GΩ p (R d ) and h ∈ C q−var 0 (R e ) with 1/p + 1/q > 1, (x, h) ∈ GΩ p (R d+e ) stands for the Young pairing. These notations may be somewhat misleading. But, they make many operations intuitively clear and easy to understand when we treat rough paths over a direct sum of many vector spaces.
For a control function ω in the sense of p. 16, [47], we writeω := ω(0, 1). For any defines a control function. Here, the norm on the right hand side denoted the p/j-variation (j = 1, 2) restricted on the subinterval [s, t]. (This control function is equivalent to the one defined by Carnot-Carateodory metric.) Similarly, we set ω λ (s, t) := λ q q−var,[s,t] for λ ∈ C q−var 0 ([0, 1], R e ). For α > 0 and x ∈ GΩ p (R d ), set τ 0 (α) = 0 and Superadditivity of ω x yields αN α (x) ≤ ω x . This quantity (3.4) was first studied by Cass, Litterer, and Lyons [19]. Let x(k) be continuous paths which takes values in R d k for k = 1, . . . , m. Then, we write whenever the iterated integral on the right hand side makes sense. For example, if (x, y) is a smooth rough path lying above (x, y), then its second level path is given by (x 2 , I[x, y], I[y, x], y 2 ) Slightly abusing notations, we denote by I[x, y] the "(x, y)component" of the second level path of any z = "(x, y)" ∈ GΩ p (R d ⊕ R e ), when no confusion may occur. For brevity we will often write V = R d ,V = R e , and W = R n in this section.

ODEs for ordinary Taylor terms
Ordinary terms in the Taylor expansion are known to satisfy a very simple ODE. In this section we recall them, following [31], etc. We will first calculate in 1-variational setting (i.e., the Riemann-Stieltjes sense). After that we will continuously extend these objects to the rough path setting. The ODE that corresponds to (3.1) is the following; Here, (x, h) ∈ C 1−var 0 ([0, 1], R d+e ). By setting ε = 0, we can easily see that the 0th term φ 0 = φ 0 (h) satisfies the following ODE; ODEs for φ 1 and φ 2 are given as follows; and ODEs for φ k = φ k (x, h) (k = 2, 3, 4, . . .) are given as follows. A heuristic explanation for how to derive these ODEs was given in [31]. We write ∂ ε b for the partial derivative in ε and ∇b for the (partial) gradient in y for fixed ε. where and Note that in the definition of A k , the summation is taken over all positive i 1 , . . . , i j such that i 1 + · · · + i j = k − 1. A similar remark goes for the summations in the definition of B k . (As usual we set A k 0 = B k 0 = 0.) Let us recall that we can obtain φ k by variation of constants method since the right hand side of (3.8)-(3.10) is known.
, dh t and consider the following Mat(n, n)-valued ODE; It is easy to see that M −1 t exists and satisfies a similar ODE. Using this, we can easily check that φ k has the following expression; Here, Z k t (with Z k 0 = 0) is a shorthand for the right hand side of (3.8)-(3.10). Finally, we set r k+1 It is obvious that for each ε ∈ [0, 1] and k ∈ N . It is known that this map extends to a continuous map with respect to the rough path topology in the following sense (after the initial values are suitably adjusted, precisely speaking. Note that y ε 0 = a = φ 0 0 .) Proposition 3.1 Let 2 ≤ p < 3 and 1 ≤ q < 2 such that 1/p + 1/q > 1. Then, for each ε ∈ [0, 1] and k ∈ N, the map (3.16) naturally extends to the following locally Lipschitz continuous map; Proof. This was already shown in [31] for arbitrary p ≥ 2. Here, we only give a sketch of proof for later use. First, (x, h) is just Young pairing of x and h. Since (y ε , φ 0 ) is a unique solution of an RDE deriven by (x, h), we obtain (x, h, y ε , φ 0 ). Next, assume that we have (x, h, y ε , φ 0 , . . . , φ k−1 ). Then, A k + B k on the right hand side of (3.10) can be interpreted as a rough path integral, we obtain (x, h, y ε , φ 0 , . . . , φ k−1 , A k + B k ). For M t := Id V⊕V⊕W ⊕k+1 ⊕ M t , we can use a rough path version of variation of constant method to obtain (x, h, y ε , φ 0 , . . . , φ k ). (Observe (3.14) above.) Finally, since r k+1 ε is a linear combination of y ε , φ 0 , . . . , φ k , we obtain (x, h, y ε , φ 0 , . . . , φ k , r k+1 ε ).
By the following proposition, this expansion is worth calling Taylor expansion of Lyons-Itô map.
Proposition 3.2 Keep the same notations and assumptions as in Proposition 3.1 above. Then, the following (i) and (ii) hold.
(i) For any any ρ > 0 and k = 1, 2, . . ., there exists a positive constants C = C(ρ, k) which satisfies that for any x ∈ GΩ p (V) and any h ∈ C q−var For any ρ 1 , ρ 2 > 0 and k = 1, 2, . . ., there exists a positive constantsC =C(ρ 1 , ρ 2 , k), which is independent of ε and satisfies that Proof. This was already shown in [31] for arbitrary p ≥ 2. In that paper, estimates not only for the first level path, but also for the higher level paths are given.

Main results in this section
In this subsection we state the main result of this section, that is, moment estimates for Taylor expansion of Lyons-Itô map. We will prove this theorem rigorously in subsequent subsections. Note that η k may depend on k, x, h, p, q, but not on ε.
In particular, for all k ∈ N and h, (φ k ) 1 p−var ∈ ∩ 1≤r<∞ L r and (r k+1 ε ) 1 p−var = O(ε k+1 ) in L r for any 1 < r < ∞. Remark 3.5 (1) Examples of Gaussian processes whose rough path lifts satisfy the integrability assumptions can be found in Friz and Oberhauser [22] (a Fernique-type theorem) and Cass, Litterer, and Lyons [19] (Integrability of N α ). FBm with Hurst parameter H ∈ (1/4, 1/2] is a typical example. (2) The estimate above is actually uniform in h when it varies in a bounded subset in q-variation space. (But, the uniform version is not needed in this paper.)

Proof of Theorem for k = 0
The rest of this section is devoted for showing Theorem 3.4. Without loss of generality we may assume that the initial value a = 0. In this proof c 1 , c 2 , . . . stands for unimportant positive constants, which is independent of ε ∈ (0, 1] and x, but may depend on p, q, σ, b, h q−var , etc. We say that a geometric p-rough path x is controlled by a control function ω if |x j s,t | ≤ ω(s, t) j/p for any s ≤ t and j = 1, 2. The expansion of the Itô map in the deterministic sense is already given in [31,36] by mathematical induction. We will closely look at it and check the integrability holds or not. In this subsection, we will obtain the moment estimates of r 1 ε . Surprisingly, for those who understand the proof for the deterministic sense, the most difficult part is this initial step of the induction. However, that problem is somewhat similar to the moment estimates of Jacobian process driven by Gaussian rough paths, which was solved by Cass, Litterer, and Lyons [19]. In the sequel we will check that their method also applies to this kind of problem as they conjectured in [19]. Now we prove Theorem 3.
for all i and (s, t).
It is easy to see from (3.6) and (3.7) that r 1 ε,t satisfies the following equation in the 1-variational setting; The first term on the right hand side can be interpreted as a rough path integration of a C [p]+1 b one-form along (x, h, y ε , φ 0 ). Hence, (x, h, y ε , φ 0 , σ(y ε )dx) is controlled by for all i and (s, t). With (3.18) in hand we have only to obtain a nice estimate of q-variation norm of the second term on the right hand side of (3.17). Let us estimate the first level path of r 1 ε , that is the difference of the first level paths of y ε and φ 0 . y ε and φ 0 are the solutions of the RDEs (whose coefficient are the n × (d + e)matrix [σ, b(ε, · )] and [σ, b(0, · )], resp.) driven by (εx, h) and (0, h), resp. A useful estimate of difference of two solutions of RDEs can be found in Theorem 10.26, pp. 233-236, [25]. (0 denotes a rough path such that x j ≡ 0 for j = 1, 2.) There are positive constants c 7 , c 8 such that C Chapter 8,[25] the definition and · p−ω ′ and details) and (εx, h) satisfies the assumption (iii), Theorem 10.26, [25]. Note also that Then, (a trivial modification of) Theorem 10.26, [25] implies that, on any subinterval The total number J of the subintervals is now at most Putting this back into (3.19), we have on each intervalÎ i , Here, the positive constants c i (14 ≤ i ≤ 16) depend on α, too. Since there are J subintervals, we have on the whole interval that for any 0 ≤ s ≤ t ≤ 1. This is the most difficult part in this subsection. For brevity we set a control function ξ 1 by ξ 1 (s, t) 1/p = c 18 exp(c 18 N α (x))ω x,h (s, t) 1/p . Obviously, ξ 1 ∈ ∩ 1<r<∞ L r by assumption. We see from (3.18) and (3.20) that We denote the right hand side by εξ 2 (s, t) 1/p . Then, ξ 2 is a control function such that ξ 2 ∈ ∩ 1<r<∞ L r . From a basic property of Young integration and the above estimate, we have In particular, the Young integral on the left hand side above is of finite q-variation. From (3.17), (3.18), (3.21) and a basic property of Young pairing, we have for some control function ξ 3 such that ξ 3 ∈ ∩ 1<r<∞ L r . This ξ 3 can be written as a simple combination of control functions which appear on the right hand sides on (3.18) and (3.21) and is independent of ε. Thus, we have shown Theorem 3.4 for k = 0 3.5 Proof of Theorem 3.4 for general k ≥ 1 In this subsection we prove Theorem 3.4 for k, assuming that it holds for the cases up to k − 1. In the proof of the deterministic case in [31,36], it is explained how to obtain an estimate of r k+1 ε , which can be expressed as a rough path integral along (x, h, y ε , φ 0 , . . . , φ k−1 ). In this section we write y ε,i , because otherwise notations would become too cumbersome. (Here, i = 1, 2 stands for the level of a path.) Our strategy is quite simple. We carefully look at the proof in [31,36] once again and make sure that every operation is "of at most polynomial order." Therefore, for those who already know the proof for the deterministic case, this subsection is not very difficult.
Before we start, let us first recall the rough path integration theory of level 2 in a general setting. (See e.g. Lyons and Qian [47].) Let X and Y be two Euclidean space and let f : stands for the set of linear maps from X to Y. For z ∈ GΩ p (X ), we will write z s = z 1 0,s . The almost rough pathâ that defines f (z)dz is given as follows (Section 5.2, [47]): It is well-known thatâ satisfies the following relations: for any s ≤ u ≤ t, If z is controlled by ω, then the right hand sides of (3.23) and (3.24) is dominated by a constant multiple of ω(s, t) 3/p . Therefore,â is an almost rough path and its associated rough path a is denoted by f (z)dz ∈ GΩ p (Y). Note that we can and will often takẽ Here, I k+1 and J k+1 stand for sums of the integrals with respect to x and h, respectively. Observe the right hand side of (3.25). There are only x, h, y ε , φ 0 , . . . , φ k−1 and no φ k . See (3.11) and (3.12). Therefore, the right hand side can be regarded as a rough path integral We will prove that the rough path above is controlled by a nice control function with moments of all order. To this end we first calculate ( for all 0 ≤ s ≤ t ≤ 1 and j = 1, 2. (Note that ξ may not depend on ε.) Proof. We construct and estimate (x, h, y ε , φ 0 , . . . , φ k−1 , r k ε , I k+1 ). In fact, we will show that we can take ξ(s, t) = c 1 (1 + η k−1 ) c 2 η k−1 (s, t) for sufficiently large c 1 , c 2 > 0. Here, we wrote and (will write) η k−1 := η k−1,x,h for simplicity. Heuristically, εx, ε j φ j , r k ε are considered as terms of order 1, j, k, respectively. First we consider the first level path. By the assumption of induction, the only unknown component is where the right hand side is defined as follows: where the summation runs over all j ∈ N + and (i 1 , . . . , i j ) ∈ (N + ) j which satisfy that 1 ≤ i 1 + · · · + i j ≤ k − 1.
We use (3.28) with y = φ 0 s and ∆y = r 1 ε,s = y ε s −φ 0 s . Note that |∆y| ≤ εη k−1 1/p by assumption. We can easily see that the remainder term in (3.28) is dominated by c 1 ε k η k−1 k/p . Put r 1 ε = εφ 1 + · · · + ε k−1 φ k−1 + r k ε in the ordinary terms in (3.28). Then, the terms of order ≤ k − 1 are exactly the ones in the right hand side of (3.26). Terms of order ≥ k are dominated by In a similar way we can estimate L 2 . In this case we use and the assumption that |I[r k ε , εx] s,t | ≤ ε k+1 η k−1 (s, t) 2/p . We also use Taylor expansion of the map W ⊕ (W ⊗ V) ∋ (y, ξ) → ∇σ(y) ξ ∈ W and the symmetry of ∇ j σ. Hence, Suppose that there exist c 4 , c 5 > 0 such that for some c 4 , c 5 > 0, which may be different from the ones in (3.29). Then, From (3.29), (3.30) and a basic property of constructing a rough path from an almost rough path (see Section 3.2, [47]), we have for some c 6 , c 7 > 0. Now, let us prove (3.30). In this case the first term on the right hand side of (3.23) reads as follows. (For brevity we will write φ 0 From Taylor expansion of the map the symmetry of ∇ j σ, and the assumption of induction, we can see that the terms of order ≤ k in (3.32) cancel out and W-norm of (3.32) is dominated by In a similar way, the quantity which corresponds to the second term on the right hand side of (3.23) is also dominated by c 4 ε k+1 (1 + η k−1 ) c 5 η k−1 (s, t) 3/p . Thus, we have obtained (3.30) and hence (3.31). We now estimate the second level path of (x, h, y ε , φ 0 , . . . , φ k−1 , r k ε , I k+1 ). Among its unknown components, (I k+1 ) 2 is the most difficult to handle and we will estimate it. (Compared to this, I[I k+1 , v] and I[v, I k+1 ] are easier and we will omit a proof for them. Here, v is one of the "known" components x, h, y ε , φ 0 , . . . , φ k−1 , r k ε ). From (3.22) the second level for the almost rough pathÎ k+1,2 s,t is given bŷ I k+1,2 s,t =L 1 ⊗L 1 ε 2 x 2 s,t , wherẽ This quantity already appeared in (3.26) and was shown to be dominated by a constant multiple of ε k (1 + η k−1 ) (k−1)/p . Hence, we have for some c 8 , c 9 > 0. We can also prove that for some c 10 , c 11 > 0. The left hand side is given by (3.24) and the the last three terms on the right hand side of (3.24) (forÎ k+1 instead of a) were actually already shown to be dominated by the right hand side of (3.34). The other terms can also be estimated in the same way as in (3.32) for all 0 ≤ s ≤ t ≤ 1 and j = 1, 2. (Note that ξ ′ may not depend on ε.) Proof. We will show again that we can take ξ ′ (s, t) = c(1 + η k−1 ) c ′ η k−1 (s, t) for sufficiently large c, c ′ > 0. It suffices to show that, for some c 1 , c 2 > 0, the following estimate of qvariation norm of J k+1 holds; Once (3.36) is obtained, this lemma immediately follows from Lemma 3.6 and Young translation. Set Q t by s Q u dh u holds. By Taylor expansion, |Q 0 | ≤ c 3 ε k+1 . Using (i) Taylor expansion of b, (ii) the relation r 1 ε = y ε − φ 0 = ε 1 φ 1 + · · · + ε k−1 φ k−1 + r k ε , and (iii) the assumption of induction, we can prove as before that • is subtracted in the definition of Q. Otherwise, we might not be able to prove that Q is of order O(ε k+1 ) from the estimate of r k ε .) By the relation 1/p + 1/q > 1 and a basic estimate for Young integral, these estimates for Q imply (3.36).
Proof of Theorem 3.4. Now we prove Theorem 3.4. Let M be as in (3.13). Then, M and M −1 are deterministic, depends only on h, and are of finite q-variation. We see from (3.25) that at least formally Note that the last expression takes the form of Young translation.
To be more precise, setM t := Id V⊕V⊕W ⊕k+2 ⊕ M t and apply (a rough path version of) variation of constant method as in (3.14) to the rough path in (3.35) in Lemma 3.7 above. Then, we obtain x, h, y ε , φ 0 , . . . , We can easily see that this rough path satisfies the same inequality as in (3.35) By applying a simple linear map to the above rough path, we can obtain x, h, y ε , φ 0 , . . . , φ k−1 , φ k , ε −(k+1) r k+1 ε . Since the operator norm of this ε-dependent linear map is bounded in ε, this rough path also satisfies the same inequality as in (3.35) (if ξ ′ is suitably replaced). This proves Theorem 3.4.

Remark for fractional order case
In this subsection we consider the case where the coefficients of RDEs are of fractional order in ε and present analogous results to Proposition 3.1, Proposition 3.2, and Theorem 3.4. The contents of this subsection will be used in later sections.
In this subsection we assume that and h ∈ C q−var 0 (R d ) with 1/p + 1/q > 1 and we set λ t = t. We consider the following RDE driven by the Young pairing (εx, h, λ); This is a variant of RDE (3.2). Strictly speaking, unless H = 1/2 the results in previous subsections cannot be used for RDE (3.37). With minor modifications, however, similar results hold in this case, too. We will explain it below. (Proofs are essentially the same and will be omitted). Let us fix some notations for fractional order expansions. For If H = 1/2, then Λ 1 = N. Instead of (3.15) the Taylor expansion of Lyons-Itô map takes the following form; In this case, φ κ k is the term of "order κ k " and is explicitly given in essentially the same way as in (3.10), (3.11), and (3.12). For the reader's convenience, we will give explicit formal expressions of φ κ k for k = 0, 1, 2, 3 when 1/3 < H < 1/2.
Then, for any x, h and k ∈ N, there exist control functions η k = η k,x,h such that the following (i)-(iii) hold: In particular, for all k ∈ N and h, (φ κ k ) 1 p−var ∈ ∩ 1≤r<∞ L r and (r

Malliavin Calculus for solution of RDE driven by fBM
In this section we study the solution of a (scaled) RDE driven by fractional Brownian motion with H ∈ (1/3, 1/2] via Malliavin calculus. It was already done by Hairer and Pillai [27] (and Cass, Hairer, Litterer, and Tindel [18]). In this section we basically follow their arguments, but in our case we need to check dependency on the small parameter ε ∈ (0, 1]. To keep our argument concise, we do not explain much about Malliavin calculus here. The reader should refer to well-known textbooks such as Nualart [49] and Shigekawa [50]. In this paper we use Watanabe distribution theory and asymptotic theorems for them, which can be found in [56] or Section V-9, [29]. (The results in [56,29] are formulated on the classical Wiener space, but they are still true on an abstract Wiener space.) One thing different from is [56,29] that the index sets of asymptotic expansions may not be N = {0, 1, 2, . . .} in this paper. So, we need to slightly modify these asymptotic theorems. However, we skip details here since a summary was already given in the author's previous work [33].
As before σ : R n → Mat(n, d) and b : R n → R n be C ∞ b . For notational convenience, we will sometimes denote by V i : R n → R n the ith column vector field of σ (1 ≤ i ≤ d), i.e., σ = [V 1 ; · · · ; V d ]. In a similar way we will write V 0 = b.
Proof. This will be shown in Section 6.

Proposition 4.2
In addition to the assumption of Proposition 4.1, we assume the ellipticity assumption (A1). Then, (ỹ ε 1 − a)/ε is uniformly non-degenerate in the sense of Malliavin, that is, Proof. This will be shown in Section 6. Note that the special case "γ = 0 and b ≡ 0 and uniformly elliptic coefficients" was already shown in [5,7], etc.
Consider the asymptotic expansion ofỹ ε as in (3.39). We have already seen this expansion holds true both in the deterministic sense and the L r -sense. Moreover, evaluated at time t = 1, it also holds true in D ∞ -sense.
Proof. By the way it is constructed, φ κ k 1 is an element of inhomogeneous Wiener chaos of order at most [κ k ]. Hence, φ κ k 1 ∈ D ∞ (R n ) and D [κ k ]+1 φ κ k 1 = 0. Next we estimate Sobolev norms of the remainder terms. We see from the stronger form of Meyer's equivalence that, for any integer s ≥ [κ k ] + 1 and any r ∈ (1, ∞), there exists C = C r,s such that holds. By Theorem 3.9 and Proposition 4.1, the right hand side is O(ε κ k+1 ) + O(ε s ) = O(ε κ k+1 ). Thus, we have the desired estimate for such (r, s). Since D r,s -norm is increasing in s, the proof is done.
Now we state and prove on-diagonal short time asymptotics of p t (a, a) = E[δ a (y t )]. Compared to the off-diagonal case, this is not so difficult. From Propositions 4.2, and 4.3, and Watanabe's asymptotic theory for generalized Wiener functionals (i.e., Watanabe distributions), we can obtain the following theorem.
Putting ε = t H , we complete the proof.

Off-diagonal short time asymptotics
In this section, following Watanabe [56], we prove the short time asymptotics of kernel function p t (a, a ′ ) when a = a ′ and 1/3 < H ≤ 1/2. Unlike in [56], we can localize around the energy minimizing path in the geometric rough path space in this paper, since Lyons-Itô map is continuous in this setting. (The case H > 1/2 was done in [33]. The result in this section can be regarded as a rough path version of that in [33].)

Localization around energy minimizing path
Let GΩ B α,m (R d ) be the geometric rough path space with (α, m)-Besov norm for α ∈ (1/3, 1/2] and m > 1 with α − 1/m > 1/3. Explicitly, the norms are given by We have the following continuous embeddings For any β ∈ (1/3, H), fBm (w t ) admits a natural lift a.s. via dyadic piecewise linear approximation and the lift w is a random variable taking values in GΩ H β (R d ). Note that the lift of Cameron-Martin space H is contained in GΩ H β (R d ). Moreover, as ε ց 0, Schilder-type large deviation holds for the laws of εw, which will be denoted by ν ε = ν H ε . (See Friz and Victior [24]). Because of Besov-Hölder embedding mentioned above, these properties also hold with respect to (α ′ , m)-Besov topology if α ′ < H. As usual, the good rate function I is given as follows: Let us clarify the conditions on various indices here. From now on, these will be assumed unless otherwise stated. First, for given H ∈ (1/3, 1/2], we choose p := 1/α ∈ (1/H, 3) and q ∈ ((H +1/2) −1 , 2) so that 1/p+1/q > 1 holds. Then, we choose α ′ ∈ (α, H) and m ∈ N such that (α ′ −α)∨(H−α ′ ) > 1/(4m) and consider GΩ B α ′ ,4m (R d ). (Heuristically, m is a very large integer.) Since m is an integer, w → w i − h i 4m/i iα,4m/i−B is D ∞ in the sense of Malliavin calculus for i = 1, 2, where h is a fixed element. Actually, it is an element of an inhomogeneous Wiener chaos. Due to this fact, the localization is allowed even in the framework of Watanabe distribution theory. This is the reason why we use this Besov-type norm on the geometric rough path space.
The Young translation τ γ works on GΩ B α ′ ,4m (R d ) for any γ ∈ H. The proof is just a slight modification of the Hölder case.
Lemma 5.1 Let H, α ′ , m be as above. Then, for any γ ∈ H, the Young translation τ γ is a continuos map from GΩ B α ′ ,4m (R d ) to itself.
Proof. Generally, we have the following basic result for Young integrals. Let p ′ , q ′ > 0 with 1/p ′ + 1/q ′ > 1. Then, there is a constant C > 0 which depends only on p ′ , q ′ such that for any [s, t] ⊂ [0.1]. Now we prove the lemma. We have Here, the second, the third, and the fourth terms on the right hand side of (5.2) are Young integrals. As usual we set x t = x 1 0,t . By Besov-Hölder embedding theorem, x is α ′ − 1/(4m) Hölder continuous. Moreover, there is a constant c such that (See p. 211, [25]. The constant c > 0 may vary from line to line.) Therefore, γ 2 is of finite 2(1/q − 1/2) Hölder norm. The third and the fourth terms are of finite (1/q − 1/2) + (α ′ − 1/(4m)) Hölder norm. Since H − α ′ < 1/(4m) and we may choose q so that 1/q − 1/2 can be arbitrarily close to H, these three terms are actually of finite (2α ′ + δ) Hölder norm for some δ > 0 and hence are of finite (2α ′ , 2m) Besov norm. Thus, we have shown that τ γ maps GΩ B α ′ ,4m (R d ) to itself. We can show continuity of τ γ by estimating the difference |τ γ (x) i s,t −τ γ (x) i s,t | for i = 1, 2 in essentially the same way. So, we omit details.
For γ ∈ H ⊂ C q−var 0 ([0, 1], R d ), let φ 0 = φ 0 (γ) be a unique solution of (3.40) in the q-variational Young sense, which starts at a ∈ R n . Set, for a = a ′ , This is a closed set in H. We only consider the case that K a ′ a is not empty. For example, if (A1) is satisfied for any a, then K a ′ a is not empty for any a ′ . From the Schilder-type large deviation theory, we see that inf{ γ H | γ ∈ K a ′ a } = min{ γ H | γ ∈ K a ′ a }. We continue to assume (A1). Now we introduce another assumption; (A2):γ ∈ K a ′ a which minimizes H-norm exists uniquely. In the sequel,γ denotes the minimizer in Assumption (A2) and we use the results of the previous section for thisγ.
Note that (i) the mapping γ ∈ H ֒→ C q−var 0 ([0, 1], R d ) → φ 0 1 (γ) ∈ R n is Fréchet differentiable and (ii) its Jacobian is a surjective linear mapping from H to R n at any γ, because there exists a positive constant c = c(γ) such that This can be shown in the same way as in the proof of non-degeneracy of y t under ellipticity assumption. (Actually, it is easier since γ is non-random and fixed here.) Therefore, by the Lagrange multiplier method, there existsν = (ν 1 , . . . ,ν n ) ∈ R n uniquely such that the map attains extremum at (γ,ν). By differentiating in the direction of k ∈ H, we have Here,Ĵ(γ) ±1 are of finite q-variation andĴ (γ) satisfies the following ODE in Young sense; Since the integral on the right hand side is of (5.5) Young integral, γ, · H naturally extends to a continuous linear functional on C p−var 0 ([0, 1], R d ). Next, setν ε = ν ε ⊗ δ ε 1/H λ , where λ is a one-dimensional path defined by λ t = t and ⊗ stands for the product of probability measures. This measure is supported on is continuous. The law ofν ε induced by this map is the law of (εw, ε 1/H λ), the Young pairing of εw and ε 1/H λ.
DefineÎ(x; l) = h 2 H /2 if x is the lift of some h ∈ H and l t ≡ 0 and defineÎ(w, l) = ∞ if otherwise. Here, l is a one-dimensional path. We can easily show that {ν ε } ε>0 also satisfies large deviation principle as ε ց 0 with a good rate functionÎ. We will use this in Lemma 5.2 below to show that we may localize on a neighborhood of the minimizerγ in order to obtain the asymptotic expansion. Now we introduce a cut-off function. Let ψ : R → [0, 1] be a smooth function such that ψ(u) = 1 if |u| ≤ 1/2 and ψ(u) = 0 if |u| ≥ 1. For each η > 0 and ε > 0, we set Here, τ −γ is the Young translation by −γ. It is a continuous map from GΩ H β (R d ) to itself. So, the right hand side is defined for almost all w ∈ W. Shifting byγ/ε, we have This is a D ∞ -functional. Moreover, from Taylor expansion for ψ, the following asymptotics holds; for any η > 0 and any M ∈ N, Since w i 4m/i iα ′ ,4m/i−B is an element of an inhomogeneous Wiener chaos of order 4m, so is its Cameron-Martin shift τ −γ (εw) i 4m/i iα ′ ,4m/i−B . For any r ∈ (0, ∞), L r -norm of this Wiener functional is bounded in ε. Hence, so is its D r,k -norm for any r, k.
The following lemma states that only rough paths sufficiently close to the lift of the minimizerγ contribute to the asymptotics.
Then this forms a fundamental system of open neighborhoods around (x 1 , x 2 ) ≡ (0, 0) with respect to (α ′ , 4m)-Besov topology. By Lemma 5.1, τγ(U η ) is an open neighborhood ofγ in (α ′ , 4m)geometric rough path space. The first set on the right hand side of (5.10) can be written as {εw / ∈ τγ(U 2 −1/(4m) η )}. First taking lim sup εց0 ε 2 log and then letting r ′ ց 1, we obtain lim sup Here, Φ : GΩ p (R d+1 ) → GΩ p (R n ) denotes the Lyons-Itô map that corresponds to the coefficient [σ, b]. In the last inequality we used large deviation upper estimate for a closed set. Notice also that a + Φ(γ, 0) 1 = φ 0 (γ). (In order to keep notations simple, we did not make the Young pairing and the injection (5.1) explicit.) Now let η ′ tend to 0. As η ′ decreases, the right hand side of (5.11) decreases. The proof is finished if the limit is strictly smaller than − γ 2 H /2. Assume otherwise. Then, there exists {γ k } ∞ k=1 ⊂ H such that In particular, {γ k } is bounded in H. Hence, by goodness of the rate function, the lifts By taking a subsequence if necessary, we may assume {γ k } converges to some z in (α ′ , 4m)-Besov topology. By the continuity of Φ, we have a + Φ(z, 0) 1 0,1 = a ′ . Since z ∈ τγ(U 2 −1/(4m) η ) c , z =γ. From the lower semicontinuity of the rate function, we see that z is the lift of some z ∈ H and z 2 H /2 ≤ γ 2 H /2. This clearly contradicts Assumption (A2).

Integrability lemmas
In this subsection, we prove a few lemmas for integrability of Wiener functionals of exponential type which will be used in the proof of the short time asymptotic expansion.
Next we consider Lemma 5.4 Assume (A2). For any M > 0, there exists η > 0 such that Proof. We can prove the lemma in the same way as in Lemma 5.3 above. So we only give a sketch of proof.
In this case we have the following inequality instead of (5.12): The rest is similar. So we omit details.

Proof of off-diagonal short time asymptotics
In this subsection we prove Theorem 2.1, namely, off-diagonal short time asymptotics of the density of the solution (y t ) = (y t (a)) of RDE (2.1) driven by fractional Brownian rough path w with 1/3 < H ≤ 1/2 under Assumptions (A1)-(A3).
First, let us calculate the kernel p(t, a, a ′ ). Take η > 0 as in Lemma 5.6. Then, we see As we have shown in Lemma 5.2, the second term I 2 on the right hand side does not contribute to the asymptotic expansion. So, we have only to calculate the first term I 1 .
Thus, we have shown the lemma.
First we prove the following lemma. Let N δ (x) be as in (3.4) and λ t = t. Note that (i) of the lemma is a deterministic estimate. Lemma 6.1 We assume the same conditions on x, γ, V k , p = α −1 , H as in Propositions 4.1. Then, we have the following (i)-(ii); (i) There exist positive constants δ and C independent of ε, x such that J ε ∞ + K ε ∞ ≤ C exp(CN δ (x)).
Proof. First we prove (i). In this proof, C may change from line to line. Set M = d k=1 ∇V k (y t )dx (k) t + ∇V 0 (y t )dt, where the integral is along a rough path (x, λ). RDE (6.2) can also be written as dJ t = dM t · J t . This is a linear equation for given M and we may use the results in Section 10.7, Friz and Victoir to obtain where P = {t j } m j=0 is any finite partition of [0, 1] such that ω M (t j−1 , t j ) ≤ 1 holds for all j. Since the coefficients of RDE (6.2) are of C 3 b , there exists δ ′ > 0 such that ω M (s, t) ≤ 1 if ω (x,λ) (s, t) ≤ δ ′ . Now we consider the intrinsic control of (τ γ (εx), ε 1/H λ) instead of that of (x, λ). Let where * denotes the transpose of a matrix. Since H > 1/3 and the Hölder roughness α can be chosen sufficiently close to H, α + 2H > 1 holds and the 2D Young integral is well-defined. By slightly modifying this, we can show that the Malliavin covariance matrix Q ε of (ỹ ε 1 − a ′ )/ε is given by By a well-known argument, it is sufficient to show that, for any r > 0, there exists a constant C = C(r) such that P inf v∈R n :|v|=1 v * Q ε v < ρ ≤ Cρ r (0 < ρ ≤ 1). (6.5) Note here that C must not depend on ρ or ε.
For v ∈ R n with |v| = 1, set f i From Young's inequality for 2D integral, Λ i ε ≤ C 1 ( f i ε α−H + |f i ε (0)|) 2 , where C 1 > 0 is a constant independent of ε. Combining this with (6.4), we have where C 2 > 0 is a constant independent of ε, which may vary from line to line.
Proof of Proposition 4.1 In a recent preprint [35], the author proved D ∞ -property of solutions of RDEs driven by Gaussian rough path w including fBm with H > 1/4. The proof is so flexible that we can replace w by τ γ (εw) = εw + γ. If we keep track of εdependency in that argument, then we can easily see that D mỹε is O(ε m ) as ε ց 0 for any m ∈ N. In that proof, the uniform estimate of Jacobian process and its inverse in Lemma 6.1 plays a crucial role. (Although the author did not try, it is probably possible to modify the proof in [27]. Lemma 6.1 is needed, anyway.)