A Fourier analytic approach to pathwise stochastic integration*

We develop a Fourier analytic approach to rough path integration, based on the series decomposition of continuous functions in terms of Schauder functions. Our approach is rather elementary, the main ingredient being a simple commutator estimate, and it leads to recursive algorithms for the calculation of pathwise stochastic integrals, both of Itô and of Stratonovich type. We apply it to solve stochastic differential equations in a pathwise manner.


Introduction
The theory of rough paths [35] has recently been extended to a multiparameter setting independently by Hairer [26] and the authors [23]. While Hairer's approach has a wider range of applicability, both allow us to study many interesting problems that were well out of reach of previously existing methods, for example the continuous parabolic Anderson model in dimension two [26,23,10], the three-dimensional stochastic quantization equation [26,9], the KPZ equation [25,24], or the three-dimensional stochastic Navier-Stokes equation [55,54]. Our methods developed in [23] are based on harmonic analysis, Littlewood-Paley decompositions of tempered distributions, and a simple commutator lemma. This requires a non-negligible knowledge of Littlewood-Paley theory and Besov spaces, while at the same time the application to rough differential equations (the classical problem that motivated Lyons' theory of rough paths) is possible but more technically involved than we would wish. That is why here we develop the approach of [23] in the slightly different language of Haar / Schauder functions, which allows us to communicate our main ideas while requiring only a very basic knowledge of analysis. Moreover, in the Haar-Schauder formulation the application to rough differential equations poses no additional technical challenges and we understand quite well the link between equations of Itô and Stratonovich type.
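It is a classical result of Ciesielski [11] that the space of Hölder continuous functions on [0, 1] is isomorphic to a space of bounded sequences, the isomorphism being given by the expansion of a function in the Schauder basis. Similar isomorphisms have been developed for many Fourier and wavelet bases, showing that the regularity of a function is encoded in the decay of its coefficients in these bases; see for example Triebel [50].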
But until this day, the isomorphism based on Schauder functions plays a special role in stochastic analysis, because the coefficients in the Schauder basis have the pleasant property that they are just rescaled second order increments of f . So if f is a stochastic process with known distribution, then also the distribution of its coefficients in the Schauder basis is known explicitly. A simple application is the Lévy-Ciesielski construction of Brownian motion. An incomplete list with further applications will be given below.
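The Lévy-Ciesielski construction mentioned above can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions that are not spelled out in this excerpt: the Schauder functions are taken in the standard normalisation (tents of height $2^{-p/2-1}$ on the dyadic interval $[(m-1)2^{-p}, m2^{-p}]$, plus the linear function $t$), and the function names `schauder_G` and `levy_ciesielski` are hypothetical.

```python
import numpy as np

def schauder_G(p, m, t):
    """Schauder function G_pm on [0, 1] (assumed standard normalisation).

    (p, m) = (0, 0) is the linear function t; otherwise a tent supported on
    [(m-1)/2^p, m/2^p] with height 2^(-p/2 - 1).
    """
    t = np.asarray(t, dtype=float)
    if (p, m) == (0, 0):
        return t.copy()
    peak_location = (2 * m - 1) / 2 ** (p + 1)
    return 2.0 ** (-p / 2) * np.maximum(0.0, 0.5 - 2.0 ** p * np.abs(t - peak_location))

def levy_ciesielski(depth, rng):
    """Partial sum of the series sum_{p,m} Z_pm G_pm with i.i.d. N(0,1) coefficients.

    Returns the dyadic grid of mesh 2^-(depth+1) and the sampled path on it.
    """
    grid = np.arange(2 ** (depth + 1) + 1) / 2 ** (depth + 1)
    path = rng.standard_normal() * schauder_G(0, 0, grid)
    for p in range(depth + 1):
        for m in range(1, 2 ** p + 1):
            path += rng.standard_normal() * schauder_G(p, m, grid)
    return grid, path
```

Under these assumptions the partial sum has the law of Brownian motion at the dyadic points of the grid and is linearly interpolated in between, so adding further generations only refines the path without changing the values already sampled.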
Another convenient property of Schauder functions is that they are piecewise linear, and therefore their iterated integrals $\int_0^\cdot G_{pm}(s)\,\mathrm{d}G_{qn}(s)$ can be easily calculated. This makes them an ideal tool for our purpose of studying integrals. Indeed, given two continuous functions f and g on [0, 1] with values in $L(\mathbb{R}^d, \mathbb{R}^n)$, the space of linear maps from $\mathbb{R}^d$ to $\mathbb{R}^n$, and $\mathbb{R}^d$ respectively, we can formally define the integral $\int_0^t f(s)\,\mathrm{d}g(s)$ by expanding f and g in the Schauder basis and integrating term by term. In this paper we study under which conditions this formal definition can be made rigorous.
We start by observing that the integral introduces a bounded operator from $C^\alpha \times C^\beta$ to $C^\beta$ if and only if $\alpha + \beta > 1$. Obviously, here we simply recover Young's integral [53]. In our study of this integral, we identify different components: $\int_0^t f(s)\,\mathrm{d}g(s) = S(f, g)(t) + \pi_<(f, g)(t) + L(f, g)(t)$, where S is the symmetric part, $\pi_<$ the paraproduct, and L(f, g) the Lévy area. The operators S and $\pi_<$ are defined for $f \in C^\alpha$ and $g \in C^\beta$ for arbitrary $\alpha, \beta > 0$, and it is only the Lévy area which requires $\alpha + \beta > 1$. Considering the regularity of the three operators, we have $S(f, g) \in C^{\alpha+\beta}$, $\pi_<(f, g) \in C^\beta$, and $L(f, g) \in C^{\alpha+\beta}$ whenever the latter is defined. Therefore, in the Young regime the integral differs from the paraproduct $\pi_<(f, g)$ only by a term in $C^{\alpha+\beta}$. This motivates calling f paracontrolled by g if there exists a function $f^g$ such that $f - \pi_<(f^g, g) \in C^{\alpha+\beta}$. Our aim is then to construct the Lévy area L(f, g) for $\alpha < 1/2$ and f paracontrolled by g. If $\beta > 1/3$, then the term $L(f - \pi_<(f^g, g), g)$ is well defined, and it suffices to make sense of the term $L(\pi_<(f^g, g), g)$. This is achieved with the following commutator estimate: $\| L(\pi_<(f^g, g), g) - \int_0^\cdot f^g(s)\,\mathrm{d}L(g, g)(s) \|_{3\beta} \le \|f^g\|_\beta \|g\|_\beta \|g\|_\beta$.
Therefore, the integral $\int_0^\cdot f(s)\,\mathrm{d}g(s)$ can be constructed for all f that are paracontrolled by g, provided that L(g, g) can be constructed. In other words, we have found an alternative formulation of Lyons' [35] rough path integral, at least for Hölder continuous functions of Hölder exponent larger than 1/3.
Since we approximate f and g by functions of bounded variation, our integral is of Stratonovich type, that is, it satisfies the usual integration by parts rule. We also consider a non-anticipating Itô type integral, which can essentially be reduced to the Stratonovich case with the help of the quadratic variation.
The last remaining problem is then to construct the Lévy area L(g, g) for suitable stochastic processes g. We construct it for certain hypercontractive processes. For continuous martingales that possess sufficiently many moments we give a construction of the Itô iterated integrals that allows us to use them as integrators for our pathwise Itô integral.
Below we give some pointers to the literature, and we introduce some basic notations which we will use throughout. In Section 2 we present Ciesielski's isomorphism, and we give a short overview on rough paths and Young integration. In Section 3 we develop a paradifferential calculus in terms of Schauder functions, and we examine the different components of Young's integral. In Section 4 we construct the rough path integral based on Schauder functions. Section 5 develops the pathwise Itô integral. In Section 6 we construct the Lévy area for suitable stochastic processes. And in Section 7 we apply our integral to solve both Itô type and Stratonovich type SDEs in a pathwise way.
to the distribution of the Brownian bridge. This shows that the law of the Brownian bridge can be reconstructed from a single "typical sample path".
Concerning integrals based on Schauder functions, there are three important references: Roynette [47] constructs a version of Young's integral on Besov spaces and shows that in the one dimensional case the Stratonovich integral $\int_0^\cdot F(W_s)\,\mathrm{d}W_s$, where W is a Brownian motion and $F \in C^2$, can be defined in a deterministic manner with the help of Schauder functions. Roynette also constructs more general Stratonovich integrals with the help of Schauder functions, but in that case only almost sure convergence is established, where the null set depends on the integrand, and the integral is not a deterministic operator. Ciesielski, Kerkyacharian, and Roynette [12] slightly extend the Young integral of [47], and simplify the proof by developing the integrand in the Haar basis and not in the Schauder basis. They also construct pathwise solutions to SDEs driven by fractional Brownian motions with Hurst index H > 1/2. Kamont [29] extends the approach of [12] to define a multiparameter Young integral for functions in anisotropic Besov spaces. Ogawa [40,41] investigates an integral for anticipating integrands, which he calls noncausal, starting from a Parseval type relation in which the integrand and the Brownian motion as integrator are both developed in a given complete orthonormal system in the space of square integrable functions on the underlying time interval. This concept is shown to be strongly related to Stratonovich type integrals (see Ogawa [41], Nualart, Zakai [39]), and used to develop a stochastic calculus on a Brownian basis with noncausal SDE (Ogawa [42]).
Rough paths have been introduced by Lyons [35], see also [34,37,31] for previous results. Lyons observed that solution flows to SDEs (or more generally ordinary differential equations (ODEs) driven by rough signals) can be defined in a pathwise, continuous way if paths are equipped with sufficiently many iterated integrals. More precisely, if a path has finite p-variation for some $p \ge 1$, then one needs to associate $\lfloor p \rfloor - 1$ iterated integrals to it to obtain an object which can be taken as the driving signal in an ODE, such that the solution to the ODE depends continuously on that signal. Gubinelli [21,22] simplified the theory of rough paths by introducing the concept of controlled paths, on which we will strongly rely in what follows. Roughly speaking, a path f is controlled by the reference path g if the small scale fluctuations of f "look like those of g". Good monographs on rough paths are [32,36,18,16].
Finally let us remark that, even if only quite implicitly, paraproducts based on the classical Fourier transform have already been exploited in the rough path context in the work of Unterberger on the renormalization of rough paths [51,52], where it is referred to as "Fourier normal-ordering", and in the related work of Nualart and Tindel [38].
Notation and conventions. Throughout the paper, we use the notation $a \lesssim b$ if there exists a constant c > 0, independent of the variables under consideration, such that $a \le c \cdot b$, and we write $a \simeq b$ if $a \lesssim b$ and $b \lesssim a$. If we want to emphasize the dependence of c on the variable x, then we write $a(x) \lesssim_x b(x)$.
For a multi-index $\mu = (\mu_1, \dots, \mu_d) \in \mathbb{N}^d$ we write $|\mu| = \mu_1 + \dots + \mu_d$ and $\partial^\mu = \partial^{|\mu|}/\partial x_1^{\mu_1} \cdots \partial x_d^{\mu_d}$. DF or F' denote the total derivative of F. For $k \in \mathbb{N}$ we denote by $D^k F$ the k-th order derivative of F. We also write $\partial_x$ for the partial derivative in direction x.
$t^i_{00}$ for $i \neq 1$ is rather arbitrary, but the definition for i = 1 simplifies for example the statement of Lemma 2.1 below.
is the linear interpolation of f between the points $t^1_{-10}$, $t^1_{00}$, $t^1_{pm}$
Ciesielski [11] observed that if f is Hölder-continuous, then the series $(f_k)$ converges absolutely and the speed of convergence can be estimated in terms of the Hölder norm of f. The norm $\|\cdot\|_{C^\alpha}$ is defined as
Before we continue, let us slightly change notation. We want to get rid of the factor $2^{-p/2}$ in (2.1), and therefore we define for $p \in \mathbb{N}$ and $0 \le m \le 2^p$ the rescaled functions as well as $\varphi_{-10} := G_{-10} \equiv 1$. Then we have for $p \in \mathbb{N}$ and $1 \le m \le 2^p$
The expansion of f in terms of $(\varphi_{pm})$ is given by $\sum_{p} \sum_{m=0}^{2^p} f_{pm}\varphi_{pm}$, where $f_{-10} := f(0)$, and $f_{00} := f(1) - f(0)$, and for $p \in \mathbb{N}$ and $m \ge 1$ the coefficient is the second order increment $f_{pm} := 2f(t^1_{pm}) - f(t^0_{pm}) - f(t^2_{pm})$. We write $\langle \chi_{pm}, \mathrm{d}f \rangle := 2^p f_{pm}$ for all values of (p, m), despite not having defined $\chi_{-10}$.
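The Schauder expansion described above can be made concrete with a short sketch. It is only an illustration under stated assumptions: $\varphi_{-10} \equiv 1$, $\varphi_{00}(t) = t$, and for $p \ge 0$, $1 \le m \le 2^p$ the function $\varphi_{pm}$ is a tent of height 1/2 on $[(m-1)2^{-p}, m2^{-p}]$; the coefficients are $f_{-10} = f(0)$, $f_{00} = f(1) - f(0)$ and the second order increments $f_{pm} = 2f(t^1_{pm}) - f(t^0_{pm}) - f(t^2_{pm})$. The function names are hypothetical.

```python
import numpy as np

def phi(p, m, t):
    """Rescaled Schauder function phi_pm evaluated at t (assumed normalisation)."""
    t = np.asarray(t, dtype=float)
    if (p, m) == (-1, 0):
        return np.ones_like(t)
    if (p, m) == (0, 0):
        return t.copy()
    t1 = (2 * m - 1) / 2 ** (p + 1)            # midpoint of the dyadic interval
    return np.maximum(0.0, 0.5 - 2.0 ** p * np.abs(t - t1))

def schauder_coefficients(f, depth):
    """Coefficients f_pm for all generations p <= depth, as a dict keyed by (p, m)."""
    coeffs = {(-1, 0): f(0.0), (0, 0): f(1.0) - f(0.0)}
    for p in range(depth + 1):
        for m in range(1, 2 ** p + 1):
            t0, t1, t2 = (m - 1) / 2 ** p, (2 * m - 1) / 2 ** (p + 1), m / 2 ** p
            # Rescaled second order increment of f over [t0, t2].
            coeffs[(p, m)] = 2 * f(t1) - f(t0) - f(t2)
    return coeffs

def partial_sum(coeffs, t):
    """Evaluate sum_{p,m} f_pm phi_pm(t) for the coefficients provided."""
    return sum(c * phi(p, m, t) for (p, m), c in coeffs.items())
```

With these conventions the partial sum up to generation `depth` is the piecewise linear interpolation of f on the dyadic grid of mesh $2^{-(\mathrm{depth}+1)}$, so it reproduces f exactly at those grid points.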
In case there is no ambiguity about the target set, we also write $C^\alpha$ instead of $C^\alpha(\mathbb{R}^d)$.
h. But for $\alpha > 1$, there is no reasonable identification of $C^\alpha$ with a classical function space. For example if $\alpha \in (1, 2)$, the space $C^\alpha([0, 1], \mathbb{R}^d)$ consists of all continuously differentiable functions f with $(\alpha - 1)$-Hölder continuous derivative Df. Since the tent shaped functions $\varphi_{pm}$ are not continuously differentiable, even an f with a finite Schauder expansion is generally not
The a priori requirement of f being continuous can be relaxed, but not much. Since the coefficients $(f_{pm})$ evaluate the function f only in countably many points, a general f will not be uniquely determined by its expansion. But for example it would suffice to assume that f is càdlàg.
One could of course imagine taking other norms on the sequence spaces than the $\ell^\infty$ norm, and one useful definition is
This leads to Besov spaces with general integrability indices r and s, and in fact Ciesielski's isomorphism extends to this setting, see [12]. One can therefore develop the theory we present here also for general Besov spaces, and for the approach of [23] this was worked out in [45]. But in order to keep the presentation lighter we refrain from considering general Besov spaces here. The only exception is Theorem 6.5, where we use them in the proof.
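Littlewood-Paley notation. We will employ notation inspired by Littlewood-Paley theory. For $p \ge -1$ and $f : [0, 1] \to \mathbb{R}^d$ we define $\Delta_p f := \sum_{m=0}^{2^p} f_{pm}\varphi_{pm}$ and $S_p f := \sum_{q \le p} \Delta_q f$. We will occasionally refer to $(\Delta_p f)$ as the Schauder blocks of f. Note that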

Young integration and rough paths
Here we present the main concepts of Young integration and of rough path theory. The results presented in this section will not be applied in the remainder of the paper, but we feel that it could be useful for the reader to be familiar with the basic concepts of rough paths, since it is the main inspiration for the constructions developed below.
Young's integral [53] allows us to define $\int f\,\mathrm{d}g$ for $f \in C^\alpha$, $g \in C^\beta$, and $\alpha + \beta > 1$. More precisely, let $f \in C^\alpha$ and $g \in C^\beta$ be given, let $t \in [0, 1]$, and let $\pi = \{t_0, \dots, t_N\}$ be a partition of [0, t], i.e. $0 = t_0 < t_1 < \dots < t_N = t$. Then it can be shown that the Riemann sums $\sum_{t_k \in \pi} f(t_k)(g(t_{k+1}) - g(t_k))$ converge as the mesh size $\max_{k=0,\dots,N-1} |t_{k+1} - t_k|$ tends to zero, and that the limit does not depend on the approximating sequence of partitions. We denote the limit by $\int_0^t f(s)\,\mathrm{d}g(s)$. It satisfies $|\int_s^t f(r)\,\mathrm{d}g(r) - f(s)(g(t) - g(s))| \lesssim |t - s|^{\alpha+\beta} \|f\|_\alpha \|g\|_\beta$ for all $s, t \in [0, 1]$. The condition $\alpha + \beta > 1$ is sharp, in the sense that there exist $f, g \in C^{1/2}$, and a sequence of partitions $(\pi_n)_{n \in \mathbb{N}}$ with mesh size going to zero for which the Riemann sums $\sum_{t_k \in \pi_n} f(t_k)(g(t_{k+1}) - g(t_k))$ do not converge as n tends to ∞. The condition $\alpha + \beta > 1$ excludes one of the most important examples: we would like to take g as a sample path of Brownian motion, and f = F(g). Lyons' theory of rough paths [35] overcomes this restriction by stipulating the "existence" of basic integrals and by defining a large class of related integrals as their functionals. Here we present the approach of Gubinelli [21].
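The convergence of the Riemann sums in the Young regime can be observed numerically. The following is a minimal sketch, not part of the original argument: it uses a uniform partition and the hypothetical example $f(s) = s^{0.6}$, $g(s) = s^{0.7}$, which are 0.6- and 0.7-Hölder continuous on [0, 1] (so $\alpha + \beta = 1.3 > 1$) and for which the integral over [0, 1] equals $0.7/1.3$ in closed form.

```python
import numpy as np

def riemann_sum(f, g, t, n):
    """Left-point Riemann sum of f against g over a uniform partition of [0, t].

    The partition has n intervals; f is evaluated at the left endpoint of
    each interval, as in the non-anticipating sums of the text.
    """
    s = np.linspace(0.0, t, n + 1)
    return float(np.sum(f(s[:-1]) * (g(s[1:]) - g(s[:-1]))))

# Hypothetical example in the Young regime: alpha = 0.6, beta = 0.7.
f = lambda s: s ** 0.6
g = lambda s: s ** 0.7
approximations = [riemann_sum(f, g, 1.0, 2 ** k) for k in (4, 8, 12, 16)]
```

The successive entries of `approximations` approach $0.7/1.3 \approx 0.538$ as the mesh is refined. The example only illustrates the statement; it does not probe the sharpness of the condition $\alpha + \beta > 1$.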
Let $\alpha \in (1/3, 1)$ and assume that we are given two functions $v, w \in C^\alpha([0, 1], \mathbb{R})$, as well as an associated "Riemann integral" $I^{v,w}_{s,t} = \int_s^t v(r)\,\mathrm{d}w(r)$ that satisfies the estimate (2.2): $|\Phi^{v,w}_{s,t}| \lesssim |t - s|^{2\alpha}$ for $\Phi^{v,w}_{s,t} := I^{v,w}_{s,t} - v(s)w_{s,t}$. The remainder $\Phi^{v,w}$ is often (incorrectly) called the area of v and w. This name has its origin in the fact that its antisymmetric part $(\Phi^{v,w}_{s,t} - \Phi^{w,v}_{s,t})/2$ corresponds to the algebraic area spanned by the curve $((v(r), w(r)) : r \in [s, t])$ in the plane $\mathbb{R}^2$. If $\alpha \le 1/2$, then the integral $I^{v,w}$ cannot be constructed using Young's theory of integration, and also $I^{v,w}$ is not uniquely characterized by (2.2). But let us assume nonetheless that we are given such an integral $I^{v,w}$ satisfying (2.2).
EJP 21 (2016), paper 2. http://www.imstat.org/ejp/
Proposition 2.5 ([21], Theorem 1). Let $\alpha > 1/3$, let $v, w \in C^\alpha$, and let $\Phi^{v,w}$ satisfy (2.2). Let f and g be controlled by v and w respectively, with derivatives $f^v$ and $g^w$. Then there exists a unique function I(f, g) that satisfies for all $s, t \in [0, 1]$
If $(\pi_n)$ is a sequence of partitions of [0, t], with mesh size going to zero, then
Of course, all of this extends to a multidimensional setting where v, w, f, g take values in $\mathbb{R}^{d_1}$, $\mathbb{R}^{d_2}$, $\mathbb{R}^{d_3}$, $\mathbb{R}^{d_4}$, respectively (in which case we have to replace for example $\int_s^t v(r)\,\mathrm{d}w(r)$ by $\int_s^t v(r) \otimes \mathrm{d}w(r)$).
The integral I(f, g) coincides with the Riemann-Stieltjes integral and with the Young integral, whenever these are defined. Moreover, the integral map is self-consistent, in the sense that if we consider v and w as paracontrolled by themselves, with The only remaining problem is the construction of the integral I v,w . This is usually achieved with probabilistic arguments. If v and w are Brownian motions, then we can for example use Itô or Stratonovich integration to define I v,w . Already in this simple example we see that the integral I v,w is not unique if v and w are outside of the Young regime.
It is possible to go beyond α > 1/3 by stipulating the existence of higher order iterated integrals. For details see [22] or any book on rough paths, such as [32,36,18,16].

Paradifferential calculus and Young integration
In this section we develop the basic tools that will be required for our rough path integral in terms of Schauder functions, and we study Young's integral and its different components.

Paradifferential calculus with Schauder functions
Here we introduce a "paradifferential calculus" in terms of Schauder functions. Paradifferential calculus is usually formulated in terms of Littlewood-Paley blocks and was initiated by Bony [8]. For a gentle introduction see [3].
We will need to study the regularity of $\sum_{p,m} u_{pm}\varphi_{pm}$, where $u_{pm}$ are functions and not constant coefficients. For this purpose we define the following space of sequences of functions. where it is understood that $\|u_{pm}\|_\infty := \max_{t \in [t^0_{pm}, t^2_{pm}]} |u_{pm}(t)|$.
In Appendix A we prove the following regularity estimate:
Let us introduce a paraproduct in terms of Schauder functions.
$(S_{p-1}v)|_{[t^0_{pm}, t^2_{pm}]}$ is the linear interpolation of v between $t^0_{pm}$ and $t^2_{pm}$.
However, we will not use this.
The paraproduct allows us to "paralinearize" nonlinear functions. We allow for a smoother perturbation, which will come in handy when constructing global in time solutions to SDEs.
$(S_{p-1}DF(v + w))|_{[t^0_{pm}, t^2_{pm}]}$ is the linear interpolation of DF(v + w) between $t^0_{pm}$ and $t^2_{pm}$, so according to Lemma 3.2 it suffices to note that
The local Lipschitz continuity is shown in the same way.

Remark 3.6.
Since v has compact support, it actually suffices to have $F \in C^{1+\beta/\alpha}$ without assuming boundedness. Of course, then the estimates in Proposition 3.5 have to be adapted. Similarly we can

Young's integral and its various components
In this section we construct Young's integral using the Schauder expansion.
We show that this definition makes sense provided that α + β > 1, and we identify three components of the integral that behave quite differently. This will be our starting point towards an extension of the integral beyond the Young regime.
In a first step, let us calculate the iterated integrals of Schauder functions.
for all m, n. If p = q, then $\int_0^1 \varphi_{pm}(s)\,\mathrm{d}\varphi_{pn}(s) = 0$, except if p = q = 0, in which case the integral is bounded by 1. If $0 \le p < q$, then for all (m, n) we have
If p = −1, then the integral is bounded by 1.
Proof. The cases p = q and p = −1 are easy, so let $p > q \ge 0$. Since $\chi_{qn} \equiv \chi_{qn}(t^0_{pm})$ on the support of $\varphi_{pm}$, we have
If $0 \le p < q$, then integration by parts and (3.4) imply (3.5).
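Because the Schauder functions are piecewise linear, the integrals $\int_0^1 \varphi_{pm}(s)\,\mathrm{d}\varphi_{qn}(s)$ can be evaluated exactly on a sufficiently fine dyadic grid. The sketch below is illustrative and assumes the normalisation $\varphi_{-10} \equiv 1$, $\varphi_{00}(t) = t$, and tents of height 1/2 for $p \ge 0$, $m \ge 1$; the helper names are hypothetical.

```python
import numpy as np

def phi(p, m, t):
    """Rescaled Schauder function phi_pm evaluated at t (assumed normalisation)."""
    t = np.asarray(t, dtype=float)
    if (p, m) == (-1, 0):
        return np.ones_like(t)
    if (p, m) == (0, 0):
        return t.copy()
    t1 = (2 * m - 1) / 2 ** (p + 1)
    return np.maximum(0.0, 0.5 - 2.0 ** p * np.abs(t - t1))

def iterated_integral(p, m, q, n):
    """Value of int_0^1 phi_pm d(phi_qn).

    Both functions are affine on every cell of the dyadic grid of mesh
    2^-(max(p, q, 0) + 1), so the trapezoidal rule is exact there.
    """
    level = max(p, q, 0) + 1
    grid = np.arange(2 ** level + 1) / 2 ** level
    a, b = phi(p, m, grid), phi(q, n, grid)
    return float(np.sum(0.5 * (a[:-1] + a[1:]) * np.diff(b)))
```

For instance, `iterated_integral(2, 1, 2, 3)` vanishes because the two tents of generation 2 have disjoint supports, and `iterated_integral(2, 1, 1, 1)` equals 1/8: on the support of $\varphi_{21}$ the function $\varphi_{11}$ has constant slope 2, and $\varphi_{21}$ integrates to 1/16.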
Next we estimate the coefficients of iterated integrals in the Schauder basis.
except if p < q = i. In this case we only have the worse estimate
Proof. We have $\langle \chi_{-10}, \mathrm{d}(\int_0^\cdot \varphi_{pm}\chi_{qn}\,\mathrm{d}s) \rangle = 0$ for all (p, m) and (q, n). So let $i \ge 0$. If $i < p \vee q$, then $\chi_{ij}$ is constant on the support of $\varphi_{pm}\chi_{qn}$, and therefore Lemma 3.8 Now let i > q. Then $\chi_{qn}$ is constant on the support of $\chi_{ij}$, and therefore another application of Lemma 3.8 implies that The only remaining case is $i = q \ge p$, in which
For fixed j, there are at most $2^{(i \vee p \vee q) - i}$ non-vanishing terms in the double sum. Hence, we obtain from Lemma 3.9 that
Proof. The case p = −1 or q = −1 is easy. Otherwise we apply integration by parts and note that the estimates (3.8) and (3.9) are symmetric in p and q. If for example p > i,
The estimates (3.8) and (3.9) allow us to identify different components of the integral $\int_0^\cdot v(s)\,\mathrm{d}w(s)$. More precisely, (3.9) indicates that the series over p < q This motivates us to decompose the integral into three components, namely
Here L is defined as the antisymmetric Lévy area (we will justify the name below by showing that L is closely related to the Lévy area of certain dyadic martingales):
The symmetric part S is defined as and $\pi_<$ is the paraproduct defined in (3.1). As we observed in Lemma 3.3, $\pi_<(v, w)$ is always well defined, and it inherits the regularity of w. Let us study S and L.
Proof. We only argue for p with the same arguments. Corollary 3.10 (more precisely (3.8)) implies that where we used 1 − α < 0 and 1 − β < 0 and for the second series we also used that α + β > 1.
Unlike the Lévy area L, the symmetric part S is always well defined. It is also more regular than $\pi_<$.
Lemma 3.13. Let $\alpha, \beta \in (0, 1)$. Then S is a bounded bilinear operator from $C^\alpha \times C^\beta$ to $C^{\alpha+\beta}$.
Proof. This is shown using the same arguments as in the proof of Lemma 3.12.
In conclusion, the integral consists of three components. The Lévy area L(v, w) is only defined if $\alpha + \beta > 1$, but then it is quite regular. The symmetric part S(v, w) is always defined and regular. And the paraproduct $\pi_<(v, w)$ is always defined, but it is rougher than the other components. To summarize:
Theorem 3.14 (Young's integral). Let $\alpha, \beta \in (0, 1)$ be such that $\alpha + \beta > 1$, and let $v \in C^\alpha(L(\mathbb{R}^d, \mathbb{R}^n))$ and $w \in C^\beta(\mathbb{R}^n)$. Then the integral (3.12)

Lévy area and dyadic martingales
Here we show that the Lévy area L(v, w)(1) can be expressed in terms of the Lévy area of suitable dyadic martingales. To simplify notation, we assume that v(0) = w(0) = 0, so that we do not have to bother with the components v −10 and w −10 .
We define a filtration $(\mathcal{F}_n)_{n \ge 0}$ on [0, 1] by setting we set $\mathcal{F} = \bigvee_n \mathcal{F}_n$, and we consider the Lebesgue measure on $([0, 1], \mathcal{F})$. On this space, the process $M_n = \sum_{p=0}^{n}$ $n \in \mathbb{N}$, is a martingale transform of M, and therefore a martingale as well. Since it will be convenient later, we also define $\mathcal{F}_{-1} = \{\emptyset, [0, 1]\}$ and $M^v_{-1} = 0$ for every v.
Assume now that v and w are continuous real-valued functions with v(0) = w(0) = 0, and that the Lévy area L(v, w)(1) exists. Then it is given by where in the last step we used that $\chi_{pm}$ and $\chi_{pm'}$ have disjoint support for $m \neq m'$. The p-th Rademacher function (or square wave) is defined for $p \ge 1$ as

Paracontrolled paths and pathwise integration beyond Young
In this section we construct a rough path integral in terms of Schauder functions.

Paracontrolled paths
We observed in Section 3 that for $w \in C^\alpha$ and $F \in C$ In Section 3.2 we observed that if $v \in C^\alpha$, $w \in C^\beta$ and $\alpha + \beta > 1$, then the Young integral $I(v, \mathrm{d}w)$ satisfies $I(v, \mathrm{d}w) - \pi_<(v, w) \in C^{\alpha+\beta}$. Hence, in both cases the function under consideration can be written as $\pi_<(f^w, w)$ for a suitable $f^w$, plus a smoother remainder. We make this our definition of paracontrolled paths:
which can be shown using similar arguments as for Lemma B.2 in [23]. In other words, for $\alpha \in (0, 1/2)$ the space $\mathcal{D}^\alpha_v$ coincides with the space of controlled paths defined in Section 2.2.
The following associativity result, the analog of Theorem 2.3 of [8] in our setting, will be useful for establishing some stability properties of $\mathcal{D}^\beta_v$.
Lemma 4.5. Let $\alpha, \beta \in (0, 1)$, and let $u \in C([0, 1], L(\mathbb{R}^n; \mathbb{R}^m))$, $v \in C^\alpha(L(\mathbb{R}^d; \mathbb{R}^n))$, and $w \in C^\beta(\mathbb{R}^d)$. Then
The cases (p, m) = (−1, 0) and (p, m) = (0, 0) are easy, so let $p \ge 0$ and $m \ge 1$. For r < q < p we denote by $m_q$ and $m_r$ the unique index in generation q and r respectively for which $\chi_{pm}\varphi_{qm_q} \not\equiv 0$ and similarly for r. We apply Lemma 3.9 to obtain for q < p
If p < q, then $\Delta_q w(t^k_{pm}) = 0$ for all k and m, and therefore $(S_{q-1}v\,\Delta_q w)_{pm} = 0$, so that it only remains to bound $\|[S_{p-1}u\,(S_{p-1}v\,\Delta_p w)_{pm} - S_{p-1}(uv)\,w_{pm}]|_{[t^0_{pm}, t^2_{pm}]}\|_\infty$. We have $\Delta_p w(t^0_{pm}) = \Delta_p w(t^2_{pm}) = 0$ and $\Delta_p w(t^1_{pm}) = w_{pm}/2$. On $[t^0_{pm}, t^2_{pm}]$, the function $S_{p-1}v$ is given by the linear interpolation of $v(t^0_{pm})$ and $v(t^2_{pm})$, and therefore $(S_{p-1}v\,\Delta_p w)_{pm} = \frac{1}{2}(v(t^0_{pm}) + v(t^2_{pm}))w_{pm}$, leading to where the last step follows by rebracketing.
As a consequence, we can show that paracontrolled paths are stable under the application of sufficiently smooth functions. Moreover, there exists a polynomial P which satisfies for all $F \in C$
Proof. The estimate for $\|DF(f)f^v\|_\beta$ is straightforward. For the remainder we apply Proposition 3.5 and Lemma 4.5 to obtain The difference $F(f) - F(\tilde f)$ is treated in the same way.
When solving differential equations it will be crucial to have a bound which is The superlinear dependence on $\|f^v\|_\infty$ will not pose any problem as we will always have $f^v = F(\tilde f)$ for some suitable $\tilde f$, so that for bounded F we get

A basic commutator estimate
Here we prove the commutator estimate which will be the main ingredient in the construction of the integral I(f, dg), where f is paracontrolled by v and g is paracontrolled by w, and where we assume that the integral I(v, dw) exists.
To prove uniform convergence, note that where for the second term it is possible to take the infinite sum over j outside of the integral because $\sum_j \Delta_j(\Delta_i f\,\Delta_N v)$ converges uniformly to $\Delta_i f\,\Delta_N v$ and because $\Delta_q w$ is a finite variation path. We also used that
Note that $\|\partial_t \Delta_q w\|_\infty \lesssim 2^q \|\Delta_q w\|_\infty$. Hence, an application of Corollary 3.11, where we use (3.10) for the first three terms and (3.11) for the fourth term, yields where in the last step we used $\alpha, \beta, \gamma < 1$. Since $\alpha + \beta + \gamma > 1$, this gives us the uniform convergence of $(X_N)$. Next let us show that $\|X_N\|_{\alpha+\beta+\gamma} \lesssim \|f\|_\alpha \|v\|_\beta \|w\|_\gamma$ for all N. Similarly to (4.4) we obtain for $n \in \mathbb{N}$
Now we apply Corollary 3.11, where for the last term we distinguish the cases i < j and i = j. Using that $1 - \gamma > 0$, we get where we used both that $\alpha + \beta + \gamma > 1$ and that $\beta + \gamma < 1$.
Remark 4.8. If $\beta + \gamma = 1$, we can apply Proposition 4.7 with $\beta - \varepsilon$ to obtain that $C(f, v, w) \in C^{\alpha+\beta+\gamma-\varepsilon}$ for every sufficiently small ε > 0. If $\beta + \gamma > 1$, then we are in the Young setting and there is no need to introduce the commutator.
For later reference, we collect the following result from the proof of Proposition 4.7:
Lemma 4.9. Let α, β, γ, f, v, w be as in Proposition 4.7. Then
Proof. Simply sum up (4.5) over N.

Pathwise integration for paracontrolled paths
In this section we apply the commutator estimate to construct the rough path integral under the assumption that the Lévy area exists for a given reference path.
Theorem 4.10. Let $\alpha \in (1/3, 1)$, $\beta \in (0, \alpha]$ and assume that $2\alpha + \beta > 1$ as well as $\alpha + \beta \neq 1$. Let v = (v 1 , . . . , v d ) $\in C^\alpha(\mathbb{R}^d)$ and assume that the Lévy area . Then $I(S_N f, \mathrm{d}S_N v)$ converges in $C^{\alpha-\varepsilon}(\mathbb{R}^n)$ for all ε > 0. Denoting the limit by $I(f, \mathrm{d}v)$, we have Moreover, $I(f, \mathrm{d}v) \in \mathcal{D}^\alpha_v(\mathbb{R}^n)$ with derivative f and
Proof. If $\alpha + \beta > 1$, everything follows from the Young case, Theorem 3.14, so let $\alpha + \beta < 1$. We decompose The convergence then follows from Proposition 4.7 and Theorem 3.14. The limit is given by from where we easily deduce the claimed bounds.
for all $0 \le s < t \le 1$ and whenever $\alpha < 1/2$. This means that for any $\varphi \in C^{2\alpha}$ is an α-rough path, which is weakly geometric if and only if ϕ is antisymmetric (we refer to [16] for the definition of (weakly geometric) α-rough paths). Setting $\varphi \equiv 0$ gives a construction which is quite similar in spirit to the ones of Unterberger [51,52], of course only for $\alpha > 1/3$. Note that by varying ϕ we have a simple construction of all rough paths above v.
Next, note that the paracontrolled approach equips us with a natural approximation theory for (para)controlled paths. Indeed, if $v \in C^\alpha$ and $f = \pi_<(f^v, v) + f^\sharp \in \mathcal{D}^\alpha_v$, and if $(v_N)$ is a sequence converging to v in $C^\alpha$, then we have and $d_{\mathcal{D}^\alpha}(f, f_N) = 0$ for all N. Now (4.7) and Remark 3.7 show that the paracontrolled paths are exactly the controlled paths, so we obtain an approximation of an arbitrary controlled path. In the classical approach it is not obvious how to construct a sequence of smooth functions $f_N$ that are controlled by $v_N$ and that approximate f in controlled path distance (see e.g. Remark 4.9 of [16]).
Let us now combine these two insights to prove that every rough path integral can be obtained as limit of Young integrals. Define for $\varphi \in C^{2\alpha}([0, 1], \mathbb{R}^{d \times d})$ the bounded operator $I_\varphi$: in other words we replace L(v, v) by ϕ in (4.6). Not surprisingly, what we obtain is nothing else than the rough path integral of $f \in \mathcal{D}^\alpha_v$ with respect to $(v, \mathbb{V})$, where $\mathbb{V}$ was defined in (4.8). If ϕ is antisymmetric, this is easy to see: Let $(v_N)$ be a sequence of smooth paths such that $(v_N, \mathbb{V}_N)$ converges to $(v, \mathbb{V})$ in rough path topology (for $\alpha' \in (1/3, \alpha)$). Since S and $\pi_<$ are bounded operators, this is equivalent to $L(v_N, v_N)$ converging to ϕ in $C^{2\alpha}$ topology. Define for $f \in \mathcal{D}^\alpha_v$ the sequence $(f_N)$ as in (4.9). Then the Young integrals respect to $(v, \mathbb{V})$, but also to $I_\varphi(f, \mathrm{d}v)$, which proves our claim in the weakly geometric case. Otherwise, decompose $\varphi = \varphi_{\mathrm{symm}} + \varphi_{\mathrm{anti}}$ into symmetric and antisymmetric part and observe that $I_\varphi(f, \mathrm{d}v) = I_{\varphi_{\mathrm{anti}}}(f, \mathrm{d}v) + I(f^v, \mathrm{d}\varphi_{\mathrm{symm}})$, and the same relation holds for the classical rough path integral. Of course, we should point out that for general ϕ which are not obtained as limit of the piecewise linear dyadic approximations $(S_N v)$ we do not have the nice interpretation $I_\varphi(f, \mathrm{d}v) = \lim_N I(S_N f, \mathrm{d}S_N v)$.
and this is what we will need to solve differential equations driven by v. But we can also estimate the speed of convergence of $I(S_N f, \mathrm{d}S_N v)$ to $I(f, \mathrm{d}v)$, measured in uniform distance:
Corollary 4.13. Let $\alpha \in (1/3, 1/2]$ and let β, v, f be as in Theorem 4.10. Then we have for all $\varepsilon \in (0, 2\alpha + \beta - 1)$
Proof. We decompose $I(S_N f, \mathrm{d}S_N v)$ as described in the proof of Theorem 4.10. This gives us for example the term for all ε > 0. From here it is easy to see that But now $\beta \le \alpha \le 1/2$ and therefore $\alpha \ge 2\alpha + \beta - 1$.
Let us treat one of the critical terms, say L( we can apply Lemma 3.12 to obtain The second term on the right hand side can be estimated using the continuity of the Young integral, and the proof is complete.
It is possible to show that this optimal rate is attained by the other terms as well, so that But this requires a rather lengthy calculation, so we decided not to include the arguments here.
Since we approximate f and g by the piecewise smooth functions $S_N f$ and $S_N g$ when defining the integral $I(f, \mathrm{d}g)$, it is not surprising that we obtain a Stratonovich type integral:
Proof. The function $S_N v$ is Lipschitz continuous, so that integration by parts gives The left hand side converges to F(v(t)) − F(v(0)). It thus suffices to show that $I(S_N DF(v) - DF(S_N v), \mathrm{d}S_N v)$ converges to zero. By the continuity of the Young integral, Theorem 3.14, it suffices to show that $\lim_{N \to \infty} \|S_N DF(v) - DF(S_N v)\|_{\alpha(1+\varepsilon')} = 0$ for all $\varepsilon' < \varepsilon$. Recall that $S_N v$ is the linear interpolation of v between the points $(t^1_{pm})$ for $p \le N$ and $0 \le m \le 2^p$, and therefore $\Delta_p DF(S_N v) = \Delta_p DF(v) = \Delta_p S_N DF(v)$ for all $p \le N$. For p > N and $1 \le m \le 2^p$ we apply a first order Taylor expansion to both terms and use the ε-Hölder continuity of $D^2 F$ to obtain which completes the proof.

Remark 4.16.
Note that here we did not need any assumption on the area L(v, v). The reason is the cancellations that arise due to the symmetric structure of the derivative of DF, the Hessian of F. Proposition 4.15 was previously obtained by Roynette [47], except that there v is assumed to be one dimensional and in the Besov space $B^{1/2}_{1,\infty}$.
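The Stratonovich character of the integral can be illustrated numerically in one dimension. The sketch below is not taken from the paper: it uses a simulated Brownian path on a uniform grid as a stand-in for a rough path v, and the hypothetical helper `interpolated_integral` evaluates the integral of the linear interpolation of DF(v) against the linear interpolation of v, which on each grid cell reduces to a trapezoidal term.

```python
import numpy as np

def interpolated_integral(dF, v):
    """Integral of the piecewise linear interpolation of dF(v) against that of v.

    `v` holds the values of the path on a grid; on each cell both
    interpolations are affine, so the integral is the average of the
    integrand at the endpoints times the increment of v.
    """
    w = dF(v)
    return float(np.sum(0.5 * (w[:-1] + w[1:]) * np.diff(v)))

rng = np.random.default_rng(0)
n = 2 ** 14
# Simulated Brownian path on the grid k/n (an assumption made for illustration).
v = np.concatenate([[0.0], np.cumsum(rng.normal(scale=n ** -0.5, size=n))])

chain_rule_gap = interpolated_integral(np.cos, v) - (np.sin(v[-1]) - np.sin(v[0]))
```

For F = sin the quantity `chain_rule_gap` is small, of the order of the sum of the cubed increments of v, in line with the change of variable formula $F(v(t)) - F(v(0)) = I(DF(v), \mathrm{d}v)(t)$. For a quadratic F the identity holds exactly on every grid, since the trapezoidal sum then telescopes.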

Pathwise Itô integration
In the previous section we saw that our pathwise integral I(f, dv) is of Stratonovich type, i.e. it satisfies the usual integration by parts rule. But in applications it may be interesting to have an Itô integral. Here we show that a slight modification of I(f, dv) allows us to treat non-anticipating Itô-type integrals.
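A natural approximation of a non-anticipating integral is given for $k \in \mathbb{N}$ by the Riemann sums $I^{\text{Itô}}_k(f, \mathrm{d}v)(t) := \sum_{m=1}^{2^k} f(t^0_{km})(v(t^2_{km} \wedge t) - v(t^0_{km} \wedge t))$. Let us assume for the moment that $t = m2^{-k}$ for some $0 \le m \le 2^k$. In that case we obtain for $p \ge k$ or $q \ge k$ that $\varphi_{pm}(t^0_{km})(\varphi_{qn}(t^2_{km} \wedge t) - \varphi_{qn}(t^0_{km} \wedge t)) = 0$. For p, q < k, both $\varphi_{pm}$ and $\varphi_{qn}$ are affine functions on $[t^0_{km} \wedge t, t^2_{km} \wedge t]$, and for affine u and w and s < t we have $u(s)(w(t) - w(s)) = \int_s^t u(r)\,\mathrm{d}w(r) - \frac{1}{2}(u(t) - u(s))(w(t) - w(s))$. Hence, we conclude that for $t = m2^{-k}$ Here we write $[f, v]_k$ for the k-th dyadic approximation of the quadratic covariation [f, v], i.e.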
and similarly for $[S_{k-1}f, S_{k-1}v]_k$, and [f, v] is the uniform limit of the $([f, v]_k)$ whenever it exists. From now on we study the right hand side of (5.1) rather than $I^{\text{Itô}}_k(f, \mathrm{d}v)$, which is justified by the following remark.
Theorem 5.2. Let $\alpha \in (0, 1/2)$ and let $\beta \le \alpha$ be such that $2\alpha + \beta > 1$. Let $v \in C^\alpha(\mathbb{R}^d)$ and $f \in \mathcal{D}^\beta_v(L(\mathbb{R}^d; \mathbb{R}^n))$. Assume that $(L(S_k v, S_k v))$ converges uniformly, with uniformly bounded $C^{2\alpha}$ norm. Also assume that $([v, v]_k)$ converges uniformly. Then $(I^{\text{Itô}}_k(f, \mathrm{d}v))$ converges uniformly to a limit $I^{\text{Itô}}(f, \mathrm{d}v) = I(f, \mathrm{d}v) - [f, v]/2$ which satisfies and where the quadratic variation [f, v] is given by speed of convergence can be estimated by and $\lfloor t \rfloor_k := \lfloor t2^k \rfloor 2^{-k}$ and $\lceil t \rceil_k := \lfloor t \rfloor_k + 2^{-(k+1)}$. In particular, we obtain for t = 1 that
Proof. Equation (5.3) follows from a direct calculation using the fact that $S_{k-1}f$ and holds because the two functions agree in all dyadic points $m2^{-k}$.
Remark 5.6. With the Cesàro mean formula (5.4) it becomes possible to study the existence of the quadratic variation using ergodic theory. This was previously observed by Gantert [20]. See also Gantert's thesis [19], Beispiel 3.29, where it is shown that ergodicity alone (of the distribution of $v$ with respect to suitable transformations on path space) is not sufficient to obtain the convergence of $([v, v]_k(1))$ as $k$ tends to $\infty$.
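The relation $I^{\mathrm{It\hat{o}}}(f, dv) = I(f, dv) - [f, v]/2$ of Theorem 5.2 has an elementary discrete counterpart, which the following minimal Python sketch checks numerically. It is an illustration only, under stated assumptions: the trapezoidal sum is used as a stand-in for a Stratonovich-type approximation (it is not the Schauder-based construction of $I(f, dv)$), the covariation sum is taken to be the sum of products of dyadic increments, and `rough_path` is a hypothetical test path (a finite lacunary sine series), not an object from the text.

```python
import math

def rough_path(t, levels=12, alpha=0.4):
    """Hypothetical test path: finite lacunary sine series (assumption, not from the paper)."""
    return sum(2.0 ** (-alpha * j) * math.sin(2.0 * math.pi * 2 ** j * t + j)
               for j in range(levels))

def dyadic_sums(f, v, k):
    """Left-point sum, trapezoidal sum and covariation sum at t = 1 for generation k."""
    n = 2 ** k
    ito = strat = bracket = 0.0
    for m in range(n):
        a, b = m / n, (m + 1) / n
        df, dv = f(b) - f(a), v(b) - v(a)
        ito += f(a) * dv                    # non-anticipating (left-point) Riemann sum
        strat += 0.5 * (f(a) + f(b)) * dv   # symmetric (trapezoidal) Riemann sum
        bracket += df * dv                  # sum of products of dyadic increments
    return ito, strat, bracket

ito, strat, bracket = dyadic_sums(rough_path, rough_path, 8)
# Term by term, f(a) dv = (f(a) + f(b)) dv / 2 - df dv / 2, so both numbers agree.
print(ito, strat - 0.5 * bracket)
```

For $f = v$ the trapezoidal sum telescopes to $(v(1)^2 - v(0)^2)/2$, which is the discrete form of the integration by parts rule mentioned at the start of this section.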
It would be more natural to assume that for the controlling path $v$ the non-anticipating Riemann sums converge, rather than assuming that $(L(S_k v, S_k v))_k$ and $([v, v]_k)$ converge. This is indeed sufficient, as long as the Riemann sums satisfy a uniform Hölder estimate. We start by showing that the existence of the Itô iterated integrals implies the existence of the quadratic variation.

Lemma 5.7. Let $\alpha \in (0, 1/2)$ and let $v \in C^\alpha(\mathbb{R}^d)$. Assume that the non-anticipating Riemann sums $(I^{\mathrm{It\hat{o}}}_k(v, dv))_k$ converge uniformly to $I^{\mathrm{It\hat{o}}}(v, dv)$. Then also $([v, v]_k)_k$ converges uniformly to a limit $[v, v]$. If moreover

which implies the convergence of $([v, v]_k)_k$ as $k$ tends to $\infty$. For $0 \le s < t \le 1$ this gives

At this point it is easy to estimate $\|[v, v]\|_{2\alpha}$, where we work with the classical Hölder norm and not the $C^{2\alpha}$ norm. Indeed let $0 \le s < t \le 1$. Using the continuity of $[v, v]$, we can find $k$ and $s \le s$

Moreover,

Remark 5.8. The "coarse-grained Hölder condition" (5.5) is from [44] and has recently been discovered independently by [30].
Similarly, the convergence of $(I^{\mathrm{It\hat{o}}}_k(v, dv))$ implies the convergence of $(L(S_k v, S_k v))_k$:

Lemma 5.9. In the setting of Lemma 5.7, assume that (5.5) holds. Then $L(S_k v, S_k v)$ converges uniformly as $k$ tends to $\infty$, and

Proof. Let $k \in \mathbb{N}$ and $0 \le m \le 2^k$, and write $t = m2^{-k}$. Then we obtain from (5.1) that

Let now $s, t \in [0, 1]$. We first assume that there exists $m$ such that $t^0_{km} \le s < t \le t^2_{km}$.

Combining (5.6) and (5.7), we obtain the uniform convergence of $(L(S_{k-1} v, S_{k-1} v))$ from Lemma 5.7 and from the continuity of $\pi_<$ and $S$.
For $s$ and $t$ that do not lie in the same dyadic interval of generation $k$, let $s_k = m_s 2^{-k}$ and $t_k = m_t 2^{-k}$ be such that $s_k - 2^{-k} < s \le s_k$ and $t_k \le t < t_k + 2^{-k}$. In particular, $s_k \le t_k$. Moreover

Using (5.7), the first and third term on the right hand side can be estimated by

For the middle term we apply (5.6) to obtain

where Example 4.4, Lemma 5.7, and Lemma 3.13 have been used.

It follows from the work of Föllmer that our pathwise Itô integral satisfies Itô's formula:

Corollary 5.10. Let $\alpha \in (1/3, 1/2)$ and $v \in C^\alpha(\mathbb{R}^d)$. Assume that the non-anticipating Riemann sums $(I^{\mathrm{It\hat{o}}}_k(v, dv))_k$ converge uniformly to $I^{\mathrm{It\hat{o}}}(v, dv)$ and let $F \in C^2(\mathbb{R}^d, \mathbb{R})$. Then $(I^{\mathrm{It\hat{o}}}_k(DF(v), dv))_k$ converges to a limit $I^{\mathrm{It\hat{o}}}(DF(v), dv)$ that satisfies for all $t \in [0, 1]$

Proof. This is Remarque 1 of Föllmer [15] in combination with Lemma 5.7.
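The mechanism behind a pathwise Itô formula can be made concrete in the simplest nontrivial case. The sketch below is an illustration under assumptions, not the argument of [15]: it takes $d = 1$ and the hypothetical choice $F(x) = x^3$, for which the Taylor expansion $b^3 - a^3 = 3a^2(b-a) + 3a(b-a)^2 + (b-a)^3$ is exact, and it uses a hypothetical test path `sample_path`. Summing over the dyadic grid splits $F(v(1)) - F(v(0))$ into a non-anticipating Riemann sum, a second order term weighted by squared increments, and a cubic remainder. The remainder is bounded by $2^{k(1-3\alpha)}\|v\|_\alpha^3$, which tends to zero precisely when $\alpha > 1/3$.

```python
import math

def sample_path(t):
    """Hypothetical scalar test path (finite lacunary sine series); an assumption."""
    return sum(2.0 ** (-0.4 * j) * math.sin(2.0 * math.pi * 2 ** j * t + j)
               for j in range(12))

def cubic_ito_terms(v, k):
    """Split F(v(1)) - F(v(0)) for F(x) = x**3 along the dyadic grid of generation k."""
    n = 2 ** k
    pts = [v(m / n) for m in range(n + 1)]
    first = second = remainder = 0.0
    for a, b in zip(pts, pts[1:]):
        d = b - a
        first += 3.0 * a * a * d            # F'(v(t_i)) times the increment
        second += 0.5 * 6.0 * a * d * d     # (1/2) F''(v(t_i)) times the squared increment
        remainder += d ** 3                 # exact Taylor remainder for a cubic
    return pts[-1] ** 3 - pts[0] ** 3, first, second, remainder

total, first, second, remainder = cubic_ito_terms(sample_path, 8)
print(total, first + second + remainder)
```

The two printed numbers agree up to rounding, since the decomposition is an algebraic identity on every grid.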

Construction of the Lévy area
To apply our theory, it remains to construct the Lévy area, respectively the pathwise Itô integrals, for suitable stochastic processes. In Section 6.1 we construct the Lévy area for hypercontractive stochastic processes whose covariance function satisfies a certain "finite variation" property. In Section 6.2 we construct the pathwise Itô iterated integrals for some continuous martingales.

Hypercontractive processes
Let $X : [0, 1] \to \mathbb{R}^d$ be a centered continuous stochastic process, such that $X^i$ is independent of $X^j$ for $i \ne j$. We write $R$ for its covariance function, $R$ :

For a given $\rho \in [1, \infty)$ we will work under the following two assumptions:

($\rho$-var) There exists $C > 0$ such that for all $0 \le s < t \le 1$ and for every partition $s = t_0 < t_1 < \dots < t_n = t$ of $[s, t]$ we have $\sum_{i,j=1}^{n}$

(HC) Second order polynomials of the process $X$ satisfy a hypercontractivity condition, i.e. for every $r \ge 1$ there exists $C_r > 0$ such that for every $n$ and every polynomial $P : \mathbb{R}^n \to \mathbb{R}$ of degree 2, for all $i_1, \dots, i_n \in \{1, \dots, d\}$, and for all $p_1, \dots, p_n \ge -1$ and $0 \le m_1 \le 2^{p_1}, \dots, 0 \le m_n \le 2^{p_n}$

These conditions are taken from [17], where under even more general assumptions it is shown that it is possible to construct the iterated integrals $I(X, dX)$, and that $I(X, dX)$ is the limit of $(I(X^n, dX^n))_{n \in \mathbb{N}}$ under a wide range of smooth approximations $(X^n)_n$ that converge to $X$.

Example 6.1. Condition (HC) is satisfied by all Gaussian processes. More generally, it is satisfied by every process "living in a fixed Gaussian chaos"; see [28], Theorem 3.50.
Slightly oversimplifying things, this is the case if X is given by polynomials of fixed degree and iterated integrals of fixed order with respect to a Gaussian reference process. Prototypical examples of processes living in a fixed chaos are Hermite processes.
They are defined for $H \in (1/2, 1)$ and $k \in \mathbb{N}$, $k \ge 1$ as

Proof. The case $p \le 0$ is easy so let $p \ge 1$. It suffices to note that

Proof. Since $p > q$, for every $m$ there exists exactly one $n(m)$, such that $\varphi_{pm} \chi_{qn(m)}$ is not identically zero. Hence, we can apply the independence of $X$ and $Y$ to obtain
Let us write $M_j := \{m : 0 \le m \le 2^p, \langle \chi_{ij}, \varphi_{pm} \chi_{qn(m)} \rangle \ne 0\}$. We also write $\rho'$ for the conjugate exponent of $\rho$, i.e. $1/\rho + 1/\rho' = 1$. Hölder's inequality and Lemma 3.9 imply $\sum_{m_1, m_2 \in M_j}$

Now write $N_j$ for the set of $n$ for which $\chi_{ij} \chi_{qn}$ is not identically zero. For every $n \in N_j$ there are $2^{p-q}$ numbers $m \in M_j$ with $n(m) = n$. Hence

where we used that $\rho \in [1, 2]$ and therefore $\rho' - \rho \ge 0$ (for $\rho' = \infty$ we interpret the right

Similarly we apply Lemma 6.3 to the sum over $n_1, n_2$, and we obtain

where we used that $|N_j| = 2^{(q \vee i) - i}$. Since $|M_j| = 2^{(p \vee i) - i}$, another application of

The result now follows by combining these estimates:

Theorem 6.5. Let $X : [0, 1] \to \mathbb{R}^d$ be a continuous, centered stochastic process with independent components, and assume that $X$ satisfies (HC) and ($\rho$-var) for some $\rho \in [1, 2)$. Then we have for all $\alpha \in (0, 1/(2\rho))$, all $\alpha \le \alpha$ and all $r \ge 1$

as well as

and therefore $L(X, X) = \lim_{N \to \infty} L(S_N X, S_N X)$ is almost surely $2\alpha$-Hölder continuous, where the convergence takes place both in $L^r(\Omega)$ and almost surely.
Proof. The statement about the Hölder norm of $X$ follows from Kolmogorov's continuity criterion because S N X − X r

For the Lévy area note that $L$ is antisymmetric, and in particular the diagonal of the matrix $L(S_N X, S_N X)$ is constantly zero. To treat the off-diagonal terms it will be convenient to introduce general Besov spaces: let $\beta \in \mathbb{R}$, $r, s \ge 1$, and define for

Depending on $r, s, \beta$, the norm might be finite also for non-continuous functions, but we do not need this. All we need is that $\|f\|_{B^\beta_{\infty,\infty}} = \|f\|_\beta$ and the trivial observation that

whenever $r_1 \le r_2$ and $s_1 \le s_2$.
Let now $k, \ell \in \{1, \dots, d\}$ with $k \ne \ell$ and let $\beta \in \mathbb{R}$ and $r \ge 1$. Then

Observe that the term inside the expectation is a second order polynomial of the hypercontractive process $X$, and therefore we can estimate its $L^r$ norm by its $L^2$ norm raised to the $r/2$-th power. In combination with Lemma 6.4, this gives

Plugging this back into (6.3), we get

If $\beta \in (1/2, 2)$, then the right hand side is $2^{Nr(\beta - 1/\rho)}$, and if $\beta < 1/\rho$ (which requires $\rho < 2$), then $\sum_N (2^{Nr(\beta - 1/\rho)})^{1/r} < \infty$.
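To get a feeling for the covariance assumption of this subsection, the following Python sketch evaluates a two-dimensional $\rho$-variation sum for Brownian motion. It rests on an assumption about the exact form of ($\rho$-var): we take it to bound the double sum over partition cells of $|E(X_{t_{i-1},t_i} X_{t_{j-1},t_j})|^\rho$ by $C|t - s|$, as is common for conditions of this type. The helper names `cov_brownian`, `rect_increment`, and `rho_variation_sum` are hypothetical. For Brownian motion the increments over disjoint intervals are uncorrelated, so with $\rho = 1$ only the diagonal cells contribute and the sum equals $t - s$.

```python
def cov_brownian(s, t):
    """Covariance R(s, t) = min(s, t) of standard Brownian motion."""
    return min(s, t)

def rect_increment(R, s1, t1, s2, t2):
    """E[(X_{t1} - X_{s1}) (X_{t2} - X_{s2})] written in terms of the covariance R."""
    return R(t1, t2) - R(t1, s2) - R(s1, t2) + R(s1, s2)

def rho_variation_sum(R, partition, rho):
    """Double sum of |rectangular increment|**rho over all pairs of partition cells."""
    cells = list(zip(partition, partition[1:]))
    return sum(abs(rect_increment(R, a, b, c, d)) ** rho
               for a, b in cells for c, d in cells)

# Partition of [s, t] = [0.25, 0.75] into 16 cells of equal length.
partition = [0.25 + i / 32 for i in range(17)]
total = rho_variation_sum(cov_brownian, partition, rho=1.0)
print(total)  # equals t - s = 0.5 for Brownian motion
```

Replacing `cov_brownian` by another covariance function gives a quick numerical sanity check of whether a candidate process can satisfy a bound of this form for a given $\rho$; it is no substitute for a proof over all partitions.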

Continuous martingales
Here we assume that $(X_t)_{t \in [0,1]}$ is a $d$-dimensional continuous martingale. Of course in that case it is no problem to construct the Itô integral $I^{\mathrm{It\hat{o}}}(X, dX)$. But to apply the results of Section 5, we still need the pathwise convergence of $I^{\mathrm{It\hat{o}}}_k(X, dX)$ to $I^{\mathrm{It\hat{o}}}(X, dX)$ and the uniform Hölder continuity of $I^{\mathrm{It\hat{o}}}_k(X, dX)$ along the dyadics. Recall that for a $d$-dimensional semimartingale $X = (X^1, \dots, X^d)$, the quadratic variation is defined as $[X] = ([X^i, X^j])_{1 \le i,j \le d}$. We also write $X_s X_{s,t} := (X^i_s X^j_{s,t})_{1 \le i,j \le d}$ for $s, t \in [0, 1]$.

Theorem 6.6. Let $X = (X^1, \dots, X^d)$ be a $d$-dimensional continuous martingale. Assume that there exist $p \ge 1$, $\beta > 0$, such that $2p\beta > 1$, and $C > 0$ with

for all $s, t \in [0, 1]$. Then $I^{\mathrm{It\hat{o}}}_k(X, dX)$ converges to $I^{\mathrm{It\hat{o}}}(X, dX)$ in $C^\alpha(\mathbb{R}^{d \otimes d})$, both almost surely and in $L^p(\Omega)$. Furthermore, for all $\alpha \in (0, \beta - 1/(2p))$ we have

Proof. The Hölder continuity of $X$ follows from Kolmogorov's continuity criterion. Indeed, applying the Burkholder-Davis-Gundy inequality and (6.4) we have

so that $E(\|X\|^{2p}_\alpha) \lesssim C^{2p}$ for all $\alpha \in (0, \beta - 1/(2p))$. Since we will need it below, let us also study the regularity of the Itô integral $I^{\mathrm{It\hat{o}}}(X, dX)$: A similar application of the Burkholder-Davis-Gundy inequality gives

The Kolmogorov criterion for rough paths, Theorem 3.1 of [16], now implies that for all $\alpha \in (0, \beta - 1/(2p))$

α , so it only remains to prove the bound for the $p$-th moment of D α . First assume that $s = 2^{-k}$ and $t = 2^{-k}$. As before, we have

From here we use a chaining argument to conclude. Define

and observe that (6.7) yields E(|M k,m , (6.8) and therefore we have for

The sum on the right hand side is finite as long as $\alpha < \beta - 1/(2p)$. For general $0 \le s < t \le 1$ define $\tau_k(r) \in 2^{-k}\mathbb{N}_0$ for $r \in [0, 1]$ such that $\tau_k(r) \le r < \tau_k(r) + 2^{-k}$. If $\tau_k(s) = \tau_k(t)$, then
$|I^{\mathrm{It\hat{o}}}_k(X, dX)_{s,t} - I^{\mathrm{It\hat{o}}}(X, dX)_{s,t}| \le |X_{\tau_k(s),s} X_{s,t}| + |X_s X_{s,t} - I^{\mathrm{It\hat{o}}}(X, dX)_{s,t}| \le 2^{-k\alpha} \|X\|^2_\alpha |t - s|^\alpha + 2^{-k\alpha} \|M^{(1)}\|_\alpha |t - s|^\alpha$.
α and thus the proof is complete.
Remark 6.7. We actually showed that

which is obviously stronger than $E(\|M\|^p_\alpha) \lesssim C^{2p}$.
Example 6.8. The conditions of Theorem 6.6 are satisfied by all Itô martingales of the form $X_t = X_0 + \int_0^t \sigma_s \, dW_s$, as long as $\sigma$ satisfies $E(\sup_{s \in [0,1]} |\sigma_s|^{2p}) < \infty$ for some $p > 1$. Then we can take $\beta = 1/2$ so that for $p > 3$ we have $\beta - 1/(2p) > 1/3$ and $X$ and $I^{\mathrm{It\hat{o}}}(X, dX)$ are sufficiently regular to apply the results of Section 5.
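The threshold $p > 3$ in Example 6.8 comes from a one-line computation, spelled out here for convenience with $\beta = 1/2$ as in the example:

```latex
\beta - \frac{1}{2p} \;=\; \frac{1}{2} - \frac{1}{2p} \;>\; \frac{1}{3}
\quad\Longleftrightarrow\quad
\frac{1}{2p} \;<\; \frac{1}{6}
\quad\Longleftrightarrow\quad
p \;>\; 3 .
```

For instance, $p = 4$ gives $\beta - 1/(2p) = 3/8$, so every Hölder exponent $\alpha \in (1/3, 3/8)$ is admissible in Theorem 6.6.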
For the quadratic covariation we have

from where we get (7.2) also in the Itô case. In other words we can replace $b$ by $\lambda b$, $\sigma$ by $\lambda^\alpha \sigma$, and $v$ by $v^\lambda$. It now suffices to show that $v^\lambda$, $L(v^\lambda, v^\lambda)$, and $[v^\lambda, v^\lambda]$ are uniformly bounded in $\lambda$. Since only increments of $v$ appear in (7.1) we may suppose $v(0) = 0$, in which case it is easy to see that $\|\Lambda_\lambda v\|_\alpha \lesssim \lambda^\alpha \|v\|_\alpha$ and $\|[v^\lambda, v^\lambda]\|_{2\alpha} \lesssim \|[v, v]\|_{2\alpha}$. As for the Lévy area, we

and therefore

From here we obtain the uniform boundedness of $\|v^\lambda\|_{v^\lambda, \alpha}$ for small $\lambda$, depending only on $b$, $\sigma$, $v$, $L(v, v)$ and possibly $[v, v]$, but not on $y_0$. If $\sigma \in C^{2+\varepsilon}_b$, similar arguments give us a contraction for small $\lambda$, and therefore we obtain the existence and uniqueness of solutions to (7.2). Since all operations involved depend on $(v, L(v, v), y_0)$ and possibly $[v, v]$ in a locally Lipschitz continuous way, also $y^\lambda$ depends locally Lipschitz continuously on this extended data.
Then $y = \Lambda_{\lambda^{-1}} y^\lambda$ solves (7.1) on $[0, \lambda]$, and since $\lambda$ can be chosen independently of $y_0$, we obtain the global in time existence and uniqueness of a solution which depends locally Lipschitz continuously on $(v, L(v, v), y_0)$ and possibly $[v, v]$.

Theorem 7.1. Let $\alpha \in (1/3, 1)$ and let $(v, L(v, v))$ satisfy the assumptions of Theorem 4.10. Let $y_0 \in \mathbb{R}^n$ and $\varepsilon > 0$ be such that $\alpha(2 + \varepsilon) > 1$ and let $\sigma \in C^{2+\varepsilon}_b(\mathbb{R}^n, L(\mathbb{R}^d; \mathbb{R}^n))$ and $b : \mathbb{R}^n \to \mathbb{R}^n$ be Lipschitz continuous. Then there exists a unique $y \in D^\alpha_v(\mathbb{R}^n)$ such that $y = y_0 + \int_0^\cdot b(y(t)) \, dt + I(\sigma(y), dv)$.

Remark 7.2. Since our integral is pathwise continuous, we can easily consider anticipating initial conditions and coefficients. Such problems arise naturally in the study of random dynamical systems; see for example [27,2]. There are various approaches, for example filtration enlargements, Skorokhod integrals, or the noncausal Ogawa integral. While filtration enlargements are technically difficult, Skorokhod integrals have the disadvantage that in the anticipating case the integral is not always easy to interpret and can behave pathologically; see [5]. With classical rough path theory these technical problems disappear. But then the integral is given as the limit of compensated Riemann sums (see Proposition 2.5). With our formulation of the integral it is clear that we can indeed consider usual Riemann sums.

An approach to pathwise integration which allows one to define anticipating integrals without many technical difficulties, while retaining a natural interpretation of the integral, is the stochastic calculus via regularization of Russo and Vallois [48,49]. The integral notion studied by Ogawa [40,41] for anticipating stochastic integrals with respect to Brownian motion is based on Fourier expansions of integrand and integrator, and is therefore related to our integral and to the Stratonovich integral (see Nualart, Zakai [39]). Like the classical Itô integral, it is interpreted in an $L^2$ limit sense, not a pathwise one.

A Regularity for Schauder expansions with affine coefficients
Here we study the regularity of series of Schauder functions that have affine functions as coefficients. First let us establish an auxiliary result.