Functional Limit Theorems for Volterra Processes and Applications to Homogenization

We prove an enhanced limit theorem for additive functionals of a multi-dimensional Volterra process $(y_t)_{t\geq 0}$ in the rough path topology. As an application, we establish weak convergence as $\varepsilon\to 0$ of the solution of the random ordinary differential equation (ODE) $\frac{d}{dt}x^\varepsilon_t=\frac{1}{\sqrt \varepsilon} f(x_t^\varepsilon,y_{\frac{t}{\varepsilon}})$ and show that its limit solves a rough differential equation driven by a Gaussian field with a drift coming from the L\'evy area correction of the limiting rough driver. Furthermore, we prove that the stochastic flows of the random ODE converge to those of the Kunita type It\^o SDE $dx_t=G(x_t,dt)$, where $G(x,t)$ is a semi-martingale with spatial parameters.


Introduction
The aim of this article is to obtain an approximate effective dynamics for the evolution of a particle moving in a fast oscillating non-Markovian random vector field $f(\cdot, y_t(\omega))$. More precisely, we study the small-$\varepsilon$ limit of the $\mathbb{R}^d$-valued solutions of the equation $\frac{d}{dt}x^\varepsilon_t=\frac{1}{\sqrt \varepsilon} f(x_t^\varepsilon,y_{\frac{t}{\varepsilon}})$. ( . ) We begin by studying the convergence of stochastic processes of the form $X^\varepsilon_t = \sqrt{\varepsilon}\int_0^{t/\varepsilon} G(y_s)\,ds$, with $G=(G_1,\dots,G_N)$, together with their canonical lifts $\mathbb{X}^\varepsilon_{s,t} \overset{\mathrm{def}}{=} \int_s^t (X^\varepsilon_r - X^\varepsilon_s)\otimes dX^\varepsilon_r$, where $(y_t)_{t\geq 0}$ is a multi-dimensional Volterra process with power law correlation decay $t^{-\beta}$ and a spectral density. We obtain a functional limit theorem together with an enhanced and a rough CLT. The admissible functions $G_i$ for the CLT are $L^2$ functions with Hermite rank bounded from below by $\frac{1}{\beta}$ (or by $\frac{2}{\beta}$ for the rough CLT) and rapidly decaying Hermite coefficients.
We then obtain a rough CLT for the function space valued stochastic process and cast ( . ) as a rough differential equation (RDE) in a Banach space driven by $(X^\varepsilon_t)_{t\in[0,T]}$. The convergence of the solution of ( . ) is then an easy consequence of the continuous dependence of the RDE on its driver. We emphasize that, even when $y$ is a strongly mixing Markov process, so that we are in the classical domain, it is advantageous to consider the ODE as an RDE. In fact, with the approach presented here, we automatically obtain simultaneous convergence of the solutions with any finite number of initial conditions to the solutions of the same limiting SDE of Kunita type, which is not easy to prove with the usual martingale method.
To illustrate the mechanism behind the homogenization problem, we first review the following popular perturbation model: $\dot z^\varepsilon_t = \varepsilon g(z^\varepsilon_t, y_t) + f(z^\varepsilon_t, y_t)$, where $f, g : \mathbb{R}^d \times \mathbb{R}^n \to \mathbb{R}^d$ are sufficiently regular, with suitable initial conditions. On the scale of $[0, \frac{1}{\sqrt{\varepsilon}}]$, $z^\varepsilon_t$ can be identified with the solution of a slow/fast system, in which the parameter $\varepsilon$ is a positive number tuning the relative speed with respect to the fast motion $y$ and is assumed to be small. If the stochastic process $(y_t)_{t\geq 0}$ exhibits oscillatory features and $f$ averages to zero with respect to a measure determined by it, the slow motion feels the averaged velocity and the central limit theorem fluctuations. It is then plausible to deduce a simpler equation, called the effective equation, whose solutions approximate $(z^\varepsilon_t)_{t\geq 0}$ on this scale.
In this article, $(y_t)_{t\geq 0}$ is taken to be an admissible 'moving average' Gaussian process. Let $W_t$ be a Brownian motion in $\mathbb{R}^d$ and $K(r) \in L(\mathbb{R}^d, \mathbb{R}^n)$ an admissible kernel. A Volterra process is of the form $y_t = \int_{\mathbb{R}} K(t-r)\,dW_r$. ( . ) Increments of (fractional) Brownian motions and the fractional Ornstein-Uhlenbeck processes are Volterra processes. In fact, a centered one-dimensional stationary $L^2$-continuous Gaussian process $(y_t)_{t\geq 0}$ has the integral representation ( . ) if and only if its spectral measure has a density [Kal , Prop. . ]; one simply takes $K$ to be the Fourier transform of the square root of the density. We will assume an algebraic correlation decay of order $|t-s|^{-\beta}$ for some $\beta > 0$ and $s, t \geq 0$. For a stationary Gaussian process, the regularity of the spectral measure can be obtained from the decay rate of the correlation function. In comparison, 'strong mixing', which can be characterized by the spectral density, cannot be classified by the decay rate of its correlation function.
Before explaining how to formulate the convergence theorem, we go back to the well-known situation in which $y_t$ is a Markov process or strongly mixing, and take $g = 0$ for simplicity.
Then the ansatz is a diffusion process whose generator can be formally written down, and the road map for the convergence of $x^\varepsilon_t$ is to show that $(x^\varepsilon)_{\varepsilon\in(0,1]}$ is relatively compact, followed by an application of the martingale problem method. The paramount question is then to determine for which $L^2$ functions $G$ a functional central limit theorem holds for $\sqrt{\varepsilon}\int_0^{t/\varepsilon} G(y_s)\,ds$. This is the Kipnis-Varadhan theory [KV ], which has become a cornerstone in studying both continuous and discrete stochastic models, see [KLO ] and the references therein. We would like to highlight the fruitful rough path approach explored in [KM , GL , DOP ].
If $(y_t)_{t\geq 0}$ is a non-strong mixing Volterra Gaussian noise, how do we formulate the condition on the driver? Let us first take $(y_t)_{t\geq 0}$ to be the one-dimensional stationary fractional Ornstein-Uhlenbeck process and consider $\alpha(G, \varepsilon)\int_0^{t/\varepsilon} G(y_s)\,ds$.
Then the scale $\alpha(G, \varepsilon)$ and the scaling limit depend on the Hermite rank of $G$, which is the lowest $m$ with non-vanishing coefficient in the Hermite polynomial expansion of $G$. For $N = 1$, this is straightforward. The case $N > 1$, which models different driving vector fields across different regions, is already more complex and involves CLTs for the iterated integrals in a topology stronger than the weak topology. An enhanced CLT is deduced from and built upon the vast existing literature, cf. [Taq , MT , CNN ], under a fast chaos decay condition. The effective equation describes in some cases a diffusion, in other cases an anomalous super diffusion, or turns out to be a stochastic differential equation with mixed Itô, Lebesgue, and Young integrals. For related work see [ILRMS ].
The result we described applies to the product form $f(x, y) = \sum_{i=1}^N f_i(x)G_i(y)$ and the one-dimensional toy model noise. The problem remains open for the evolution in the random field generated by multidimensional noise and for the non-product form. As a preliminary observation, a CLT applies only if $f$ is sufficiently fast oscillatory, so that the oscillation compensates the insufficient decay in the auto-correlation of $(y_t)_{t\geq 0}$. It is more efficient to take the Hermite expansion of $f$ with respect to the Gaussian process $(y_t)_{t\geq 0}$. Wrapping the Hermite polynomials around a Gaussian process amplifies its correlation decay. We use this to determine the decay rate of the random field. In the case $\mathbb{E}[y_0 \otimes y_0] = \mathrm{id}$, the Hermite rank of $f(x, \cdot)$ is the lowest $|\ell|$ with non-vanishing $c_\ell$ in the Hermite polynomial expansion $f(x,\cdot) = \sum_{\ell\in\mathbb{N}_0^n} c_\ell H_\ell$, where $\ell = (\ell_1, \dots, \ell_n)$ is a multi-index and $c_\ell = (c^1_\ell, \dots, c^n_\ell) \in \mathbb{R}^n$. Otherwise we use a transformed process $(z_t)_{t\geq 0}$ for defining the Hermite rank. The Hermite rank precisely characterizes whether the functional CLT holds. We will specify the regularity conditions later.
Main Results. Our main results are presented in Theorem . and Theorem . . The former is an enhanced and rough functional limit theorem for multi-dimensional Volterra processes: $X^\varepsilon \Rightarrow X$ and, under more stringent decay conditions on the Hermite coefficients, we also show $(X^\varepsilon, \mathbb{X}^\varepsilon) \Rightarrow (X, \mathbb{X})$, where $X$ is a Gaussian field. Theorem . is an application to the homogenization problem for ( . ). We show that, for $f \in C^3_b$, the limiting effective equation is an SDE of Kunita type, in which the driving term is a martingale with spatial parameters whose characteristics are of the form $a^{i,j}(x, z, t) = \sigma^{i,j}(x, z)\,t$. The advantage of the formulation as a Kunita SDE is that for any finite set of initial positions we have the weak convergence of the $N$-point motion, by which we mean $(\varphi^\varepsilon(z_1), \dots, \varphi^\varepsilon(z_N))$, where $\varphi^\varepsilon$ are the solution flows of the SDE and $z_i \in \mathbb{R}^d$ are initial points. This is due to the convergence of the drivers in the compactly supported case and the solution theory of SDEs of Kunita type. These theorems are proved under further conditions on the Hermite rank of $f$.
The proof of Theorem . is the content of Section , which relies on computational techniques from Malliavin calculus, and that of Theorem . is presented in Section . For the proof of the convergence of $(x^\varepsilon_t)_{t\in[0,T]}$, we take the continuity theorem route for solution flows of rough differential equations (RDE). For this we cast the equation ( . ) as a Banach space-valued rough differential equation. To put this in the perspective of diffusion processes, this is analogous to lifting a smooth stochastic differential equation to the diffeomorphism group. The rough differential equation method has previously been employed in the very nice work of Kelly and Melbourne [KM ], in a different context. In Proposition . below we also resolve a question raised in [KM ], which allows us to bypass the martingale problem method used there for identifying the limit equation. We then turn this problem back into a finite state problem, interpret the resulting RDE as a classical stochastic differential equation of Kunita type, and identify the characteristics of the driving semi-martingales with spatial parameters. Note that we have kept the infinite-dimensional noise in the Kunita type SDE, for we would lose the simultaneous convergence if it were converted to an SDE driven by a finite-dimensional noise.

The Normalized Noise Process
Definition . An $\mathbb{R}^n$-valued stochastic process $(y_t)_{t\geq 0}$ is called a (stationary) Volterra process if there is an integer $d \in \mathbb{N}$, a square-integrable kernel $K : \mathbb{R} \to L(\mathbb{R}^d, \mathbb{R}^n)$ with $K(t) = 0$ for $t < 0$, and a $d$-dimensional Wiener process $(W_t)_{t\in\mathbb{R}}$ such that $y_t = \int_{\mathbb{R}} K(t-u)\,dW_u$. ( . ) Let $\beta, \Theta > 0$. We write $\mathcal{V}^n(\beta, \Theta)$ for the space of $n$-dimensional Volterra processes which satisfy the kernel estimate ( . ), where $K_j$ is the $j$th row of $K$. It is also convenient to declare $\mathcal{V}^n \overset{\mathrm{def}}{=} \bigcup_{\beta,\Theta>0} \mathcal{V}^n(\beta, \Theta)$.
It is clear that any $y \in \mathcal{V}^n$ is a centered, stationary Gaussian process, which is in fact ergodic under the canonical time-shift [CFS ]. We set $\Sigma \overset{\mathrm{def}}{=} \mathbb{E}[y_0 \otimes y_0]$ throughout the article. By a simple application of the Cauchy-Schwarz inequality, the estimate ( . ) implies an algebraic decay of the temporal correlations of $(y_t)_{t\geq 0}$. Let us give a few examples covered by Definition . :
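As a small worked illustration of the definition, the Itô isometry expresses the covariance of a Volterra process directly through its kernel, which also makes the stationarity evident. This is only a sketch: the symbol $\rho$ is our notation and does not appear in the original, and we use that $K$ vanishes on the negative half-line.

```latex
\begin{align*}
\rho(t-s) := \mathbb{E}[y_s \otimes y_t]
  &= \int_{\mathbb{R}} K(s-u)\,K(t-u)^{\top}\,du
   = \int_{\mathbb{R}} K(v)\,K(v+t-s)^{\top}\,dv ,\\
\mathbb{E}[y_0 \otimes y_0] = \rho(0)
  &= \int_0^{\infty} K(v)\,K(v)^{\top}\,dv .
\end{align*}
```

In particular, the covariance depends on $s$ and $t$ only through $t-s$, and its decay in $|t-s|$ is governed by the decay of the kernel.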

Example A
. Given a fractional Brownian motion B with Hurst parameter H ∈ (0, 1) \ { 1 2 }, one can show that for functions f satisfying Here, c H > 0 is some explicitly known constant. Hence, the fractional Ornstein-Uhlenbeck process dX t = −X t dt + dB t [CKM ] is a Volterra process with kernel It is not hard to check that ( . ) holds with β = 1 − H. . Another example is which leads (for a suitable choice of the normalization constant c H ∈ R) to a fractional Brownian . It is again easy to see that ( . ) holds with β = 1 − H.
Fact . Let Σ ∈ L(R n , R n ) be a positive semi-definite matrix with rank m. Then there is an isometry O ∈ L(R m , R n ) such that D def = O ⊤ ΣO is diagonal and features precisely the non-zero eigenvalues of Σ. Let G be a real-valued function on R n . Then Let (y t ) t 0 be a centered, stationary Gaussian process and recall that Σ = E[y 0 ⊗ y 0 ]. There is no loss of generality in assuming that rank(Σ) = n for otherwise y lives almost surely in a proper subspace of R n . Without further notice, we shall resort to this case in the sequel. Let D and O denote the matrices furnished by Fact . . For any G ∈ L 2 R n , N (0, Σ) , we have the following L 2 R n , N (0, id) convergent expansion: Here, H ℓ : R n → R denotes the Hermite polynomial of degree ℓ: Note that H 0 (x) = 1, H 1 (x) = x and H m , H n L 2 (R,N (0,1)) = δ m,n m!. We want to remark that in [Nua ] the Hermite polynomials are defined with a different normalization. Let us introduce the normalized process (z t ) t 0 by declaring Then (z t ) t 0 is clearly a centered, stationary Gaussian process with E[z 0 ⊗ z 0 ] = id and Note that, by definition, and z ∈ V n (β,Θ) whereΘ = D − 1 2 O ⊤ 2 Θ.
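For orientation, the first few one-dimensional Hermite polynomials in the normalization stated above, $\langle H_m, H_n\rangle_{L^2(\mathbb{R},N(0,1))} = \delta_{m,n}\,m!$, are listed below. The product form for multi-indices is the usual convention and is assumed here to be the one intended.

```latex
\begin{align*}
H_0(x) &= 1, & H_1(x) &= x, & H_2(x) &= x^2-1, & H_3(x) &= x^3-3x,\\
H_\ell(x) &= \prod_{i=1}^n H_{\ell_i}(x_i), & \ell &= (\ell_1,\dots,\ell_n)\in\mathbb{N}_0^n, & x &\in\mathbb{R}^n .
\end{align*}
```

As a consistency check of the normalization, for a standard Gaussian $Z$ one has $\mathbb{E}[H_2(Z)^2] = \mathbb{E}[Z^4] - 2\,\mathbb{E}[Z^2] + 1 = 3 - 2 + 1 = 2 = 2!$.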
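The Hermite rank of $G$ (with respect to $y$) is defined by $H(G) \overset{\mathrm{def}}{=} \inf\{|\ell| : c_\ell \neq 0\}$. We say that $G$ satisfies the fast chaos decay condition with parameter $p > 1$ if $\sum_{\ell\in\mathbb{N}_0^n} |c_\ell|\sqrt{\ell!}\,(p-1)^{\frac{|\ell|}{2}} < \infty$.
Remark . Denoting the Ornstein-Uhlenbeck operator by $T_\theta$, we have $D(T_\theta) = \{G \in L^2(\mathbb{R}^n, N(0, \mathrm{id})) : \sum_{\ell\in\mathbb{N}_0^n} |c_\ell|^2 e^{-2\theta|\ell|}\,\ell! < \infty\}$. Thus, by an application of the Cauchy-Schwarz inequality, we obtain that a function $G$ satisfies the fast chaos decay condition with parameter $p$ if, for some $\delta > 0$, $G \in D(T_{-\frac{1}{2}\ln(p-1)-\delta})$.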
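The Cauchy-Schwarz step in the remark can be spelled out as follows. This is a sketch with the fast chaos decay condition with parameter $p$ read as $\sum_\ell |c_\ell|\sqrt{\ell!}\,(p-1)^{|\ell|/2} < \infty$, which is consistent with the summability used in the Hölder bounds later on; the abbreviation $\theta = -\frac{1}{2}\ln(p-1)-\delta$ is ours.

```latex
\begin{align*}
\sum_{\ell\in\mathbb{N}_0^n} |c_\ell|\sqrt{\ell!}\,(p-1)^{\frac{|\ell|}{2}}
  &= \sum_{\ell\in\mathbb{N}_0^n} \Big(|c_\ell|\sqrt{\ell!}\,e^{-\theta|\ell|}\Big)\,e^{-\delta|\ell|} \\
  &\leq \Big(\sum_{\ell\in\mathbb{N}_0^n} |c_\ell|^2\,\ell!\,e^{-2\theta|\ell|}\Big)^{\frac12}
        \Big(\sum_{\ell\in\mathbb{N}_0^n} e^{-2\delta|\ell|}\Big)^{\frac12},
\qquad
\sum_{\ell\in\mathbb{N}_0^n} e^{-2\delta|\ell|} = \big(1-e^{-2\delta}\big)^{-n} < \infty .
\end{align*}
```

The first identity uses $e^{-\theta|\ell|} = (p-1)^{|\ell|/2}e^{\delta|\ell|}$, and the first factor on the right-hand side is finite precisely when $G \in D(T_\theta)$.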

Example B The generating functions
satisfy the fast chaos decay condition.

Malliavin Calculus
In this section we recall some concepts from Malliavin calculus. For details we refer to [Nua ].
is a two-sided Wiener process, we construct an isonormal Gaussian process {W (h) : h ∈ H } by Itô-Wiener integrals We denote by f Sym its symmetrization. We define the multiple Wiener integral with respect to W as follows: To keep the verification of the fourth moment theorem simple we introduce contraction operators between functions that are not necessarily symmetric. Let S andS be two index sets, f = ⊗ i∈S f i and g = ⊗ i∈S g i are primitive tensors with f i , g i ∈ H . For any pair p = {a, b} where a ∈ S and b ∈S we define the contraction of f ⊗ g: We define the multi-contraction with multi-pairs of k-elements.
Then the following product formula holds for f ∈ H ⊗m and g ∈ H ⊗n : where P runs through all multi-pairs from S = {1, . . . , m} andS = {1, . . . , n} so P = m∧n k=0 P k , where P k denotes the collection of all k distinct pairs of indices from S ×S. The 0 th contraction is f ⊗ g. In particular, We have the following straight-forward generalization of [Nua , Proposition . . ]: Lemma . Let ℓ ∈ N n 0 and f ∈ H n with f i H = 1 for each i = 1, . . . , n. Then we have Given ℓ ∈ N n 0 we call a graph of complete pairings, without self-loops and with nodes {1, . . . , n} ℓadmissible if the k th node has exactly ℓ k edges. We denote the collection of all such graphs by Γ ℓ . We will use the fact For a graph G ∈ Γ ℓ , we write γ i,j (G) for the number of edges between the nodes i and j.

Proposition . (Diagram Formula [BH , Taq ])
Let ℓ ∈ N n 0 and X = (X 1 , . . . , X n ) be multivariate Gaussian. Then is multivariate Gaussian jointly with X, and both X and Y have pairwise independent components, then we have

Hölder Spaces and Their Tensor Product
The algebraic tensor product X ⊗ a Y of two vector spaces X and Y is defined as the subspace of the dual of the bilinear mappings X × Y → R spanned by the elements x ⊗ y, x ∈ X , y ∈ Y, which act by We can also declare a dual action on X ⊗ a Y by x ⊗ y, x * ⊗ y * def = x, x * y, y * for x * , y * in the dual spaces of X and Y, respectively.
If X and Y are Banach spaces, we call a norm · X ⊗Y on X ⊗ a Y a reasonable crossnorm if It is easy to see that both canonical examples, the injective and the projective tensor norm, define reasonable cross norms. Finally, the tensor product of X and Y is defined as the completion of X ⊗ a Y with respect to the norm · X ⊗Y . Without further notice, we always assume that X ⊗ Y is equipped with a reasonable crossnorm. Further details on the tensor product of Banach spaces can be found in the classical monographs [LC , Rya ]. We now turn to the tensor product of interest in the sequel of this work. Let α > 0 and X be a normed space. The classical Hölder space We . We will use the following result. It was shown in [KM , Cor. . ] for a norm equivalent to ( . ).

Lemma . The canonical embedding
x, y ∈ R d , extending to the whole space by linearity, defines a reasonable cross norm on the former, with respect to which the embedding is continuous.

Rough Path Theory
The theory of rough paths has by now certainly found its way into the mathematical mainstream. This is, of course, also due to the very nice monographs [FV , FH ] to which we refer for further details. We shall work in the framework of controlled rough paths [Gub , FdLP ] popularized in the book of Friz and Hairer.
Let $X, Y : [0, T] \to \mathcal{X}$ be Hölder continuous with exponents $\gamma_1$ and $\gamma_2$, respectively. If $\gamma_1 + \gamma_2 > 1$, Young's integration theory enables us to define $\int_0^T Y\,dX$ as the limit of Riemann sums $\sum_{[u,v]\in\mathcal{P}} Y_u\,(X_v - X_u)$ along partitions $\mathcal{P}$ of $[0,T]$ with mesh tending to zero. For sufficiently regular coefficients, the associated differential equation driven by $X$ is well posed and the solution is continuous in both the driver $X$ and the initial data. In the case $\gamma_1 \leq \frac{1}{2}$, this fails and one can no longer define the integral $\int X\,dX$ by the above Riemann sum. This is partially remedied by rough path theory, which allows one to define an integral with respect to less regular integrators by enhancing the Riemann sum, see ( . ) below.
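For reference, the Young integral and the standard Young-Loève estimate behind the discussion above can be written as follows. This is a sketch in our notation, stated for scalar-valued paths for simplicity: $\mathcal{P}$ ranges over partitions of $[0,T]$, and the constant $C$ depends only on $\gamma_1+\gamma_2$.

```latex
\begin{align*}
\int_0^T Y\,dX &= \lim_{|\mathcal{P}|\to 0}\ \sum_{[u,v]\in\mathcal{P}} Y_u\,(X_v-X_u),\\
\Big|\int_s^t Y\,dX - Y_s\,(X_t-X_s)\Big|
  &\leq C\,|Y|_{\mathcal{C}^{\gamma_2}}\,|X|_{\mathcal{C}^{\gamma_1}}\,|t-s|^{\gamma_1+\gamma_2},
\qquad \gamma_1+\gamma_2>1 .
\end{align*}
```

For $Y = X$ the condition $\gamma_1+\gamma_2>1$ becomes $\gamma_1 > \frac12$, which is exactly the threshold at which the uncompensated Riemann sums stop converging.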
Let T > 0, γ ∈ (0, 1), and X be a Banach space. We write |X| C γ for the γ-Hölder norm of a function where · X ⊗X is the norm on X ⊗ X . Note that any γ-Hölder function Owing to this identity, the values of the two-parameter process X can actually be recovered from the knowledge It is customary to denote the set of γ-rough paths with values in X by C γ [0, T ], X . Albeit this space is certainly not linear, it becomes a complete metric space in the topology inherited from the Banach space We emphasize that-unless X = {0}-the space C γ [0, T ], X is not separable. In order to avoid norm versus seminorm considerations, we shall tacitly assume that X 0 = 0 which is anyways the case in our ultimate application of the theory.
can be defined, provided that Y is controlled by X. This is to say, there is a In this case, it is also customary to write (Y, Under these conditions, the integral ( . ) is well defined as the limit of compensated Riemann sums along an arbitrary sequence of partitions with mesh tending to 0: Here, we introduced the 'product' Employing the algebraic relation ( . ), it is then an easy exercise to check that whence the integral ( . ) is indeed well defined by the sewing lemma [Gub , FdLP ].
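The 'easy exercise' can be sketched as follows, in the standard notation of Friz and Hairer, which we assume matches the formulas referred to above: $\Xi_{u,v} = Y_u X_{u,v} + Y'_u \mathbb{X}_{u,v}$ denotes the compensated summand, $R^Y_{s,u} = Y_{s,u} - Y'_s X_{s,u}$ the remainder of the controlled path, and Chen's relation reads $\mathbb{X}_{s,t} = \mathbb{X}_{s,u} + \mathbb{X}_{u,t} + X_{s,u}\otimes X_{u,t}$.

```latex
\begin{align*}
\delta\Xi_{s,u,t} &:= \Xi_{s,t}-\Xi_{s,u}-\Xi_{u,t}\\
  &= Y_s X_{u,t} + Y'_s\big(\mathbb{X}_{u,t}+X_{s,u}\otimes X_{u,t}\big) - Y_u X_{u,t} - Y'_u\,\mathbb{X}_{u,t}\\
  &= -R^Y_{s,u}\,X_{u,t} - Y'_{s,u}\,\mathbb{X}_{u,t},\\
|\delta\Xi_{s,u,t}| &\lesssim \big(\|R^Y\|_{2\gamma}\,|X|_{\mathcal{C}^\gamma} + |Y'|_{\mathcal{C}^\gamma}\,\|\mathbb{X}\|_{2\gamma}\big)\,|t-s|^{3\gamma},
\qquad 3\gamma>1 .
\end{align*}
```

Since $3\gamma > 1$ for $\gamma > \frac13$, this is the kind of bound the sewing lemma requires in order to produce the integral as the limit of the compensated Riemann sums.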
Remark . We note that the theory of controlled rough paths usually assumes the continuous embedding see [FH , Section . ]. It is well known that this is ensured by working with the projective tensor product on X ⊗ X . This, however, turns out to be rather inconvenient when X is a Hölder space, which is the case of interest in this article. It is in fact beneficial to work with the more explicit reasonable crossnorm introduced in Section . . Thankfully, as already observed by Kelly and Melbourne [KM , Proof of Theorem . ], the proof of Proposition . in [FH ] does not really require the embedding ( . ), but it is enough to equip the tensor product with a reasonable crossnorm.

The Multidimensional Limit Theorem

Statement of the Result
First we need to introduce a bit of notation: W denotes the Stratonovich lift of an n-dimensional standard Wiener process W , that is, Then for any real-valued functions G 1 , . . . , G N ∈ L 2 R n , N (0, Σ) with min k=1,...,N H(G k ) > 1 β and any terminal time T > 0 all of the following hold: ⊲ (Finite dimensional distributions) If each G k is of fast chaos decay with parameterΘ(2n − 1) + 1, then where W = (W t ) t∈[0,T ] is a standard Wiener process and Υ is the unique non-negative square root of the matrix Theorem . is a multi-dimensional version of the result in [GL b], see also [Geh ] for related limit theorems for Hermite processes.
Remark . The limiting rough path ( . ) can be rewritten in Itô form as follows. Let W Itô be the Itô lift of W . Then

Functional Central Limit Theorem
We fix y ∈ V n (β, Θ) and write K for its kernel. Let z be the normalized process ( . ). Recall, z ∈ V n (β,Θ) for the normalized kernel defined in ( . ).
Lemma . For each ℓ ∈ N n 0 , H ℓ (z t ) can be written in terms of the multiple Itô-Wiener integral as follows: Consequently, {H ℓ (z t ), ℓ ∈ N n 0 } is an orthonormal set for any t 0.
Proof. Given a multi-index ℓ = (ℓ 1 , . . . , ℓ n ) ∈ N n 0 , using Lemma . , Let G ∈ L 2 R n , N (0, Σ) . The expansion ( . ) becomes The first step towards the proof of Theorem . is to establish a central limit theorem for the finite-dimensional distributions of the vector-valued process The argument proceeds along a well-established pathway, see e.g. [BT , Geh , NNZ ]: . We first assume G 1 , . . . , G N ∈ L 2 (R n , N (0, Σ)) live in a finite number of chaoses (that is, their Hermite expansion ( . ) is finite) and prove the statement of Theorem . by invoking the fourth moment theorem of Nualart, Peccati, and Tudor [NP , PT ], see also [NOL , NP ]. . A simple truncation argument then shows that the general case can be reduced to .
Proposition . (Fourth Moment Theorem [NP , Theorem . . ]) Let m 2 and ℓ ∈ N m be a multi-index with ordered components, that is, ℓ 1 · · · ℓ m . Let (f ε ) ε>0 ⊂ ⊕ m i=1 H ⊗ℓi . Denote by f i the projection of f to H ⊗ℓi . Assume that for i, j = 1, . . . , m, the limit i.e. all contractions of f ε k with itself vanish, except for the 0 th and the k th . Then we have By Lemma . , we may apply the proposition to stochastic processes of the form { √ ε t i ε 0 H ℓ j (z u ) du}, which is done in the following two key lemmas.
Lemma . Let k, ℓ ∈ N n 0 with |ℓ| ∧ |k| > 1 β and s, t ∈ [0, T ], then Proof. Using H ℓ (z u ) = I τ uK ⊗k and by ( . ), we may assume |ℓ| = |k| and also s t. We have We will use the fact that for v ≤ u, the right-hand side is a sum of products of We first take s = t, and by a change of variables, the claim follows at once.
Next, we show that the contractions vanish as we send $\varepsilon \to 0$. Note that, even though $\{\tau_t \hat{K}_i\}$ are orthonormal, for two different times $s \neq t$, $\tau_s \hat{K}_i$ and $\tau_t \hat{K}_j$ may not be orthogonal for $i \neq j$. If they were, then the proof of the next lemma would be much simpler.

Remark .
This together with the lemmas below immediately leads to parts and of Theorem . when G k are polynomial functions: X ε → ΥW in finite-dimensional distributions. The second claim follows for polynomials from the Hölder bounds proved below.
The following L 2 estimate, which we will actually lift to an L p bound momentarily, plays a key rôle in proving the weak convergence in Hölder topology: Lemma . (L 2 Hölder bound) Let 0 s t T . Let G ∈ L 2 (R n , N (0, Σ)) be a real-valued function satisfying the fast chaos decay assumption with parameter (2n − 1)Θ. If H(G) > β −1 , then Proof. Owing to stationarity of (y t ) t 0 , there is no loss of generality in assuming s = 0. We expand the square as double integral and make use of the expansion ( . ): The series is absolutely summable, thus one can exchange the order of summation and integration. Recall that Combining this with Proposition . , we find Inserting this estimate back into ( . ), we get since H(G) > β −1 and ℓ∈N n 0 |c ℓ |Θ |ℓ| 2 (2n − 1) |ℓ| 2 √ ℓ! < ∞ by assumption. This concludes the proof.
The following estimate will be used in Section . : Lemma . (L p Hölder bound) Let 0 s t T . Let G ∈ L 2 (R n , N (0, Σ)). If H(G) > β −1 and G satisfies the fast chaos decay assumption with parameter (2n − 1)(p − 1)Θ + 1, where p > 2, then Proof. As in the proof of Lemma . , we may assume s = 0 without any loss of generality. Recall that, by Gaussian hypercontractivity, X L p (p − 1) |ℓ| 2 X L 2 for any scalar random variable in the |ℓ| th Wiener chaos. It follows that The statement follows since the sum is finite by the fast chaos decay assumption, c.f. Definition . .
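The reduction from the $L^p$ bound to the $L^2$ bound can be sketched as follows. We assume, as in the proof of the $L^2$ Hölder bound, that $G(y_s) = \sum_\ell c_\ell H_\ell(z_s)$ and that $\sqrt{\varepsilon}\int_0^{t/\varepsilon} H_\ell(z_s)\,ds$ lies in the $|\ell|$th Wiener chaos; the first step is Minkowski's inequality and the second is Gaussian hypercontractivity.

```latex
\begin{align*}
\Big\|\sqrt{\varepsilon}\int_0^{t/\varepsilon} G(y_s)\,ds\Big\|_{L^p}
  &\leq \sum_{\ell\in\mathbb{N}_0^n} |c_\ell|\,\Big\|\sqrt{\varepsilon}\int_0^{t/\varepsilon} H_\ell(z_s)\,ds\Big\|_{L^p}\\
  &\leq \sum_{\ell\in\mathbb{N}_0^n} |c_\ell|\,(p-1)^{\frac{|\ell|}{2}}\,\Big\|\sqrt{\varepsilon}\int_0^{t/\varepsilon} H_\ell(z_s)\,ds\Big\|_{L^2}.
\end{align*}
```

Each $L^2$ norm on the right-hand side is then controlled as in the $L^2$ Hölder bound, and the extra factor $(p-1)^{|\ell|/2}$ is what the stronger fast chaos decay parameter absorbs.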

Lifted Functional Central Limit Theorem
The aim of this section is to prove the convergence of the iterated integrals. We begin with a definition inspired by [JS ]. Let us write $\mathcal{F}_t \overset{\mathrm{def}}{=} \sigma(W_s,\, s \leq t)$ for the filtration generated by the Wiener process driving $y$ through ( . ).

Definition .
We say that a function $G : \mathbb{R}^n \to \mathbb{R}$ with $\mathbb{E}[G(y_0)] = 0$ satisfies the conditional decay condition (with respect to $y$) if $\int_0^\infty \big\|\mathbb{E}[G(y_s)\,|\,\mathcal{F}_0]\big\|_{L^2(\Omega)}\,ds < \infty$.
As we shall see in the sequel, if $y \in \mathcal{V}^n(\beta, \Theta)$, then any centered $G \in L^2(\mathbb{R}^n, N(0, \Sigma))$ with $H(G) > 2\beta^{-1}$ falls in the regime of Definition . .
Next, we recall a stability result for stochastic integrals, which is a weaker version of [KP , Theorem . ]: Proposition . For each k ∈ N, let X k and M k be stochastic processes adapted to a filtration F k . If (X k , M k ) ⇒ (X, M ) weakly in C [0, T ], R 2N and, for each k ∈ N, M k is an F k local martingale with sup k∈N M k T L 2 (Ω) < ∞, then Remark . The condition sup k∈N M k T L 2 (Ω) < ∞ is of course equivalent to Kurtz-Protter's famous uniformly controlled variation (UCV) condition where · denotes the quadratic variation.
Lemma . Let y be any stationary stochastic process which is ergodic under the canonical time-shift. For each k ∈ {1, . . . , N }, fix a function G k : R n → R satisfying the conditional decay condition with respect to y, the fast chaos decay assumption with parameter (2n − 1)Θ, and having Hermite rank H(G k ) > β −1 . Set Then we have the decomposition where the respective j th -components are defined by: In addition, the following hold: ⊲ For each ε > 0, the process M ε is a martingale with respect to the rescaled filtration F t ε t 0 . ⊲ We have sup ε∈(0,1] M ε t L 2 (Ω) < ∞ for each t 0. ⊲ For each t ∈ [0, T ], we have Z ε t L 2 (Ω) → 0.
Proof. Due to the conditional decay condition, the processes M ε and Z ε are well defined and, in particular, (M ε t ) ε∈(0,1] is uniformly bounded in L 2 for each t ∈ [0, T ]. Indeed, splitting the first term and by the shift invariance of y we have By Lemma . and the conditional decay condition, both terms on the right-hand side are uniformly bounded in ε ∈ (0, 1]. The fact that Z ε t L 2 (Ω) → 0 for each t ∈ [0, T ] follows from the stationarity of (y t ) t 0 in combination with the conditional decay condition on G and Minkowski's integral inequality: We also need the following standard result, see e.g. [Ald , Proposition . ]: Proposition . Let (M k ) k∈N be a sequence of martingales. Suppose that: Combining Propositions . and . with Lemma . , we can prove the following central result: Proposition . Let y be a stationary ergodic stochastic process and X ε be as in Lemma . , with the conditional decay condition with respect to y in place. Let W be a standard Wiener process and let with W s,t denoting the Stratonovich integral, Υ the non-negative symmetric square root of Υ 2 , and Proof. Without loss of generality, we may assume s = 0 by stationarity. By Lemma . and an integration by parts, we have the decomposition Since Z ε t L 2 (Ω) → 0 for each t ∈ [0, T ] by Lemma . , we see that M ε → ΥW in finite-dimensional distributions. Since (M ε t ) is L 2 bounded on [0, T ] with uniform bound in ε, Burkholder-Davis-Gundy inequality shows that (X ε , M ε ) ε∈(0,1] is tight in C [0, T ], R 2N . Consequently, Z ε 0 − Z ε is also tight and converges to zero in probability. Since By Proposition . , By Birkhoff's ergodic theorem and stationarity, for each fixed t ∈ [0, 1] these terms converge almost surely as We used the shift invariance and ergocidity of the process. The first term is 0 and the second term is To see that this convergence of Putting everything together, we have Lemma . Let f : R + → R be locally bounded. 
If lim ε→0 εf t ε = 0 for any t ∈ [0, 1], then also We return to the representation y t = R K(t − u) dW u where K : R → L(R d , R n ) and set Note that, for every τ > 0, y t =ȳ τ t +ỹ τ t . Moreover,ȳ τ ∈ F τ , whereasỹ τ is independent of F τ . As the normalized process z t = D − 1 2 O ⊤ y t is nothing but a linear transformation of y, we also have the decomposition z t =z τ t +z τ t , wherez Proof. For j = 1, . . . , n letz j,τ be the j th component ofz τ . We have that as required.
Lemma . Let ℓ ∈ N n 0 be a multi-index. Then, for each t τ , Proof. We use the following well-known expansion formula for Hermite polynomials. For any a, b ∈ R with a 2 + b 2 = 1: Taking a = z τ t L 2 and b = z τ t L 2 , we obtain Sincez k,τ t is independent of F τ , after taking the conditional expectation all terms vanish except those with j k = ℓ k , thus, the sum reduces to one term: The asserted estimate follows by an application of Proposition . and the diagram formula to the term Condition . Let γ ∈ 1 3 , 1 2 and p > 2 1−2γ . We impose the following conditions on the data in ( . ): . The fast process y has almost sure sample paths in C 0+ [0, T ], R n . Moreover, there are β, Θ > 0 such that y ∈ V n (β, Θ).
where c j,k are the coefficients in the Hermite expansion of D k f j .
We have the following limit theorem on the solution to ( . ): Theorem . Consider the ODE ( . ) with Condition . for γ ∈ 1 3 , 1 2 in place. Then, for each ε > 0, there is a unique pathwise solution (x ε t ) t∈ [0,T ] . For i, j = 1, . . . , d, let Then the following hold: . Suppose that f (·, y) have common compact support for every y ∈ R n . Then there is a limiting such that X ε converges weakly to X. Furthermore X = (X, X s,t + (t − s)Λ), where X is a Gaussian field with covariance σ i,j (x, z)(t ∧ s). In particular, for any (random) initial condition x 0 independent of (y t ) t 0 , as ε → 0, the solution x ε of ( . ) converges weakly in C γ [0, T ], R d to the solution of the RDE . Suppose furthermore that x 0 ∈ L ∞ is independent of (y t ) t 0 . Then the solution of ( . ) converges weakly in C [0, T ], R d to the unique solution of the following Kunita type Itô SDE: Here F (x, ·) def = X(x) is a martingale with spatial parameters and with characteristics A i,j (x, z, t) = σ i,j (x, z)t. Furthermore, if ϕ ε denotes the solution flow, the N -point motion (ϕ ε t (x 1 ), . . . , ϕ ε t (x N )) converges to the N -point motion of the limiting equation.
Let us record a few remarks on Theorem . :

Remark .
⊲ The limiting equation ( . ) is equivalent to a classical Itô SDE.
⊲ Theorem . extends the results of [GL a] from product to non-product drifts and from one- to multi-dimensional environmental fast-scale noise.
⊲ We observe that, unlike in the one-dimensional work of the first two authors of this article, the limiting rough path has a non-vanishing Lévy area. The reason for this is the non-reversibility of the Gaussian process $(y_t)_{t\geq 0}$ in dimension $n \geq 2$. Indeed, a one-dimensional, stationary Gaussian process is always reversible in the sense that, for each $T > 0$, $(y_t)_{t\in[0,T]} \overset{d}{=} (y_{T-t})_{t\in[0,T]}$. In higher dimensions, stationarity of a Gaussian process is a genuinely weaker requirement than reversibility. We also note that the presence of the non-trivial area term matches the findings of [DOP ].
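The reversibility claim can be checked at the level of covariances, which determine the law of a centered Gaussian process. The following is a sketch in which $\rho(\tau) = \mathbb{E}[y_0 \otimes y_\tau]$ is our notation; it uses only stationarity and the identity $\rho(-\tau) = \rho(\tau)^\top$.

```latex
\begin{align*}
n=1:&\quad \mathbb{E}[y_{T-s}\,y_{T-t}] = \rho(s-t) = \rho(t-s) = \mathbb{E}[y_s\,y_t],\\
n\geq 2:&\quad \mathbb{E}[y_{T-s}\otimes y_{T-t}] = \rho(s-t) = \rho(t-s)^{\top}.
\end{align*}
```

In the scalar case the transpose is trivial, so the time-reversed process has the same covariance and hence the same law. For $n \geq 2$ the two covariances agree only if $\rho(\tau)$ is a symmetric matrix for every $\tau$, which stationarity alone does not guarantee.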

Weak Convergence in Rough Path Spaces
This method of lifting the RDE on $\mathbb{R}^d$ to a Banach space of functions has previously been successfully employed in the very nice work of Kelly and Melbourne [KM ]. Since there is, however, a minor inaccuracy in that article (see Example C below), we choose to recall an appropriate amount of detail of their approach. Without further notice, we shall assume the regularity assumptions of Condition . in the sequel. Let us first elaborate on the minor inaccuracy in the work of Kelly and Melbourne, then present results leading to tightness of $\mathbf{X}^\varepsilon$ in the rough path spaces over $\mathcal{C}^3_R(\mathbb{R}^d, \mathbb{R}^d)$, the space of smooth functions with compact support in $B_R$. This is Proposition . below, which resolves a question raised in [KM ] and allows us to bypass the martingale problem method as well as to obtain a Kunita type SDE in the limit.
The example below shows that tightness in rough path spaces over infinite-dimensional Banach spaces is a touchy business.
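Example C When proving tightness of the driving rough path $\mathbf{X}^\varepsilon = (X^\varepsilon, \mathbb{X}^\varepsilon)$ in [KM , Corollary . ] (there denoted by $\mathbf{W}^\varepsilon$), the authors assert that the unit ball in the rough path space of higher Hölder regularity is compact in that of lower regularity. This, they claim, should follow by a standard Arzelà-Ascoli argument as in [FV , Chapter ]. In the latter, however, the authors only consider rough paths with values in a finite-dimensional Euclidean space. In fact, for any infinite-dimensional Banach space $\mathcal{X}$ and $\gamma < \gamma'$, the embedding $\mathcal{C}^{\gamma'}([0, T], \mathcal{X}) \hookrightarrow \mathcal{C}^{\gamma}([0, T], \mathcal{X})$ is not compact. To see this, let us take $T = 1$ and a sequence of points $\{x_n\}_{n\in\mathbb{N}} \subset B_{\mathcal{X}}$, given by Riesz's lemma, with $|x_m - x_n| \geq \frac{1}{2}$ for any $m \neq n$. Set $F^n_t \overset{\mathrm{def}}{=} t x_n$. Then $(F^n)_{n\in\mathbb{N}}$ is bounded in $\mathcal{C}^{\gamma'}$, but $|F^m - F^n|_{\mathcal{C}^{\gamma}} \geq \frac{1}{2}$ for all $m \neq n$, so that no subsequence converges in $\mathcal{C}^{\gamma}$.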
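The two Hölder norms in this counterexample can be computed explicitly. The following is a sketch on $[0,1]$ with $F^n_t = t x_n$, $|x_n| \leq 1$, and exponents $\gamma < \gamma' \leq 1$; we use that $\sup_{0\le s<t\le 1}|t-s|^{1-\kappa} = 1$ for every $\kappa \leq 1$.

```latex
\begin{align*}
|F^n|_{\mathcal{C}^{\gamma'}}
  &= \sup_{0\le s<t\le 1}\frac{|(t-s)\,x_n|}{|t-s|^{\gamma'}} = |x_n| \le 1,\\
|F^m-F^n|_{\mathcal{C}^{\gamma}}
  &= \sup_{0\le s<t\le 1}|t-s|^{1-\gamma}\,|x_m-x_n| = |x_m-x_n| \ge \tfrac12 \qquad (m\neq n).
\end{align*}
```

Hence the sequence stays in the unit ball of the more regular space but is uniformly separated in the less regular one, which rules out a convergent subsequence.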
A remedy for this lack of compactness was presented in the recent preprint [CFK + ]. There, the authors worked in the p-variation setting, but the arguments transfer to Hölder rough paths. We recapitulate a streamlined version of the argument in the sequel, mainly for the reader's convenience, but also to fix some notation.
Let $R>0$. We let $C^{\alpha}_R$ denote the Hölder functions supported in the ball of radius $R$,
Definition . Let $\mathbf{X} = (X,\mathbb{X})$ and $\mathbf{Y} = (Y,\mathbb{Y})$ be random variables with values in $C^{\gamma}([0,T],\mathcal{X})$. We say that $\mathbf{X}$ and $\mathbf{Y}$ are equal in finite-dimensional (space-time) distributions if, for each $n\in\mathbb{N}$, $0\leq t_1<\cdots<t_m\leq T$, and $x^i_1,\dots,x^i_n\in\mathbb{R}^d$ ($i=1,2$),
Our main weak convergence result of this section is as follows: and if the finite-dimensional space-time distributions of any weak limit point of $(\mathbf{X}^n)_{n\in\mathbb{N}}$ coincide, then there is a random variable
Remark . Proposition . resolves the question raised in [KM , Remark . ]. In fact, by employing it, one could bypass the invocation of the martingale problem used in [KM , Section ] in order to characterize the limiting equation.
Proof of Proposition . . Fix $\gamma'<\gamma$ and $\alpha'<\alpha$. Let $\varepsilon>0$. Then $(\mathbf{X}^n)_{n\in\mathbb{N}}$ is tight in the space By Lemma . below this set is relatively compact as a subset of . It remains to show that the finite-dimensional space-time distributions uniquely characterize the limit points of $(\mathbf{X}^n)_{n\in\mathbb{N}}$. Let $\mathbf{X}$ and $\tilde{\mathbf{X}}$ be limit points of $(\mathbf{X}^n)_{n\in\mathbb{N}}$. By the Portmanteau theorem, the laws of both $\mathbf{X}$ and $\tilde{\mathbf{X}}$ are Radon measures. A compactification argument (see e.g. [Bog , Exercise . . ]) shows that these two measures coincide, provided we can exhibit a test set $F$ of bounded continuous functions $f$ : We choose $F$ as the family of characteristic functionals furnished by the space-time evaluations of Definition . : One verifies that $F$ satisfies the requirements above, whence $\mathbf{X} \overset{d}{=} \tilde{\mathbf{X}}$, as required.
Proof. The embedding ( . ) is certainly continuous. To see that it is actually compact, it is enough to show that the set $K_1$ is relatively compact. We shall make use of a general version of the Arzelà-Ascoli theorem recalled in Proposition . after the proof. Let $(\mathbf{X}^n)_{n\in\mathbb{N}}$ be a sequence in $K_1$. We need to show that both $(X^n)_{n\in\mathbb{N}}$ and $(\mathbb{X}^n)_{n\in\mathbb{N}}$ are relatively compact in the spaces respectively. Since the arguments are similar, we only detail the relative compactness of $(\mathbb{X}^n)_{n\in\mathbb{N}}$.
By the algebraic constraint ( . ) and a straightforward interpolation estimate (see e.g. [FH , Exercise . ]), it is enough to show that $(\mathbb{X}^n_{0,\cdot})_{n\in\mathbb{N}}$ is relatively compact in . First note that this family is equicontinuous. Indeed, again by Chen's relation, for all $0\leq s\leq t\leq T$. We are thus left to show that, for each $t\in[0,T]$, the set $\{\mathbb{X}^n_{0,t} : n\in\mathbb{N}\}$ is relatively compact, in order to conclude with Proposition . below. For this it is enough to note the compact embedding which in turn again follows from the Arzelà-Ascoli theorem, but this time in the spatial coordinate.
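The interpolation estimate invoked above can be made explicit. The following is a sketch of the standard inequality for a generic path $F\colon[0,T]\to E$ with values in a normed space $E$ and exponents $0<\gamma'<\gamma\leq 1$; the symbols $F$ and $E$ are placeholders introduced for illustration and are not notation from the text.

```latex
% Split the increment into a Hölder part and a sup-norm part.
\[
  \frac{|F_t - F_s|}{|t-s|^{\gamma'}}
  = \Big(\frac{|F_t - F_s|}{|t-s|^{\gamma}}\Big)^{\gamma'/\gamma}
    \,|F_t - F_s|^{\,1-\gamma'/\gamma}
  \le |F|_{C^{\gamma}}^{\gamma'/\gamma}\,\big(2\,|F|_{\infty}\big)^{1-\gamma'/\gamma}.
\]
% Applied to F = F^n - F^m: a sequence that is bounded in C^{\gamma} and
% Cauchy in the uniform norm is Cauchy in C^{\gamma'}.
```

The same manipulation applies to two-parameter increments, with $|t-s|^{2\gamma}$ in place of $|t-s|^{\gamma}$, which is why relative compactness in the uniform topology suffices once a uniform Hölder bound is available.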
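The following version of the Arzelà-Ascoli theorem, employed in the previous proof, can be found in multiple places, see e.g. [Kel , Theorem . ]:
Proposition . (Arzelà-Ascoli) Let $X$ be a compact metric space and $Y$ be a metric space. Let $C(X,Y)$ be the space of continuous mappings $f\colon X\to Y$, equipped with the uniform topology. Then a set $K\subset C(X,Y)$ is relatively compact if and only if (i) for each $x\in X$, the set $\{f(x) : f\in K\}$ is relatively compact in $Y$, and (ii) $K$ is equicontinuous, that is, for each $\varepsilon>0$ there is a $\delta>0$ such that $d_Y(f(x),f(y))\leq\varepsilon$ for all $f\in K$, provided that $d_X(x,y)\leq\delta$.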

. Proof of Theorem .
We can now conclude the proof of our homogenization result.
Proof of Theorem . . We first assume that $f(\cdot,y)$ and $g$ are supported in $B_R$ for each $y\in\mathbb{R}^n$. Owing to Proposition . and Lemma . , it is enough to show that . We wish to employ Proposition . . To see that we argue as in Lemma . : First notice that, for any $i=1,\dots,d$ and any $p\geq 1$, .
By hypercontractivity and our assumption $c_\ell(x)=0$ for all $x\in\mathbb{R}^d$ whenever $|\ell| < \inf_x H(f(x,\cdot))$, we get as in ( . ) The series on the right-hand side is finite by the assumption ( . ). By Lemma . , we know that the derivatives of $f$ have (at least) the same Hermite rank. Hence, we can similarly prove that $|t-s|$, $k=1,2,3$ and ( . ) follows by Kolmogorov's continuity theorem. Arguing as in Lemma . also shows that $\sup_{\varepsilon\in(0,1]}\mathbb{E}\big[|\mathbb{X}^\varepsilon|_{C^{2\gamma}}\big]<\infty$. Then Theorem . and Remark . show that $\mathbf{X}^\varepsilon$ converges to $\mathbf{X}$ with $\mathbb{X}_{s,t}(x,z) = (\Upsilon\otimes\Upsilon)\mathbb{W}^{\text{Itô}}_{s,t}(x,z) + \Lambda(x,z)(t-s)$.
It remains to prove statement . Observe that $X$ is a Gaussian process with covariance $\sigma_{i,j}(x,z)(s\wedge t)$ and $G(x,t) \overset{\mathrm{def}}{=} X_t(x) + \Gamma(x)t$ is a semi-martingale with spatial parameters and characteristics $A_{i,j}(x,z,t) \overset{\mathrm{def}}{=} \sigma_{i,j}(x,z)t$ and $\Gamma(x)t$.
The characteristics are sufficiently regular so that the Kunita type equation is well posed. The regularity of $A_{i,j}$ and $\Gamma$ comes from the uniform correlation decay assumption $\inf_x H(f(x,\cdot)) > \frac{2}{\beta}$. For example, the functions $\mathbb{E}[D(f_i(x,y_s)f_j(z,y_0) + f_i(x,y_0)f_j(z,y_s))]$ are absolutely integrable in $s$ on $[0,\infty)$; consequently $\sigma_{i,j}$ is $C^3_b$ in both variables and jointly continuous in $(x,z)$. The same argument applies to $\Gamma$. Furthermore, by the theory for SDEs driven by semi-martingales with spatial parameters [Kun ], there is a unique Brownian flow $\varphi_{s,t}(x)$ to the equation ( . ) and for each $s$, $x$, $\varphi_{s,t}(x) - x - \int_s^t \Gamma(\varphi_{s,r}(x))\,dr$ is a square integrable martingale with $\lim_{h\to 0}\mathbb{E}[(\varphi_{t,t+h}(x)-x)(\varphi_{t,t+h}(y)-y)^T] = \sigma_{i,j}(x,y)t$.
The RDE is 'equivalent' to the Kunita type SDE, which can be seen from the Riemann sum approximation for integration with respect to the semi-martingale $G$ : where $P$ denotes a partition of $[0,t]$. On the other hand, the rough integral, where the prime on $\delta(x_t)$ denotes its Gubinelli derivative, which is $(D\delta)_{x_u}(\delta(x_u)\cdot)(\cdot)$ when applied to $\Lambda(\cdot,\cdot)$, is: It is then routine to verify that $\varphi_t(x)$ is the solution to the rough differential equation The latter equation is also well posed.
We then fix $R>0$ and let $\eta_R\colon\mathbb{R}^d\to\mathbb{R}_+$ be a non-negative smooth function with $\eta_R=1$ on $B_R$ and $\eta_R=0$ on $B^c_{2R}$. Set $f_R(x,y) \overset{\mathrm{def}}{=} \eta_R(x)f(x,y)$. By the first part of the theorem, we know that there is an As before, the convergence is simultaneous for any finite number of initial conditions. Mimicking the construction of $G$, let $G^R = X^R + \Gamma^R t$ denote the spatial semi-martingale with characteristics $\sigma^R_{i,j}(x,y)$ and $\Gamma^R t$. Then for each initial condition $x_0$, the solutions of the Kunita type SDEs converge weakly to the solution of ( . ) with the same initial distribution. Here we have applied [Kun , Thm. . . ], whose conditions are readily verified: $\sup_x |D^a_x D^b_y \sigma^R(x,y)|_{y=x}$, where $a,b\in\{0,1\}$, and $\sup_x|D_x\Gamma^R|$ are uniformly bounded; moreover, the characteristics are in the required class $\tilde{C}^{\alpha}_b$ for some $\alpha>0$, cf. [Kun ], and converge uniformly in $(x,y)$ on compact sets. The convergence is in the sense that, given any initial data $(x^i_0)$, $i=1,\dots,N$, the $N$-point motion converges.
Finally, we show the weak convergence $x^\varepsilon\Rightarrow x$. By the Portmanteau theorem, it is equivalent to showing that $\limsup_{\varepsilon\to 0}\mathbb{P}(x^\varepsilon\in A)\leq\mathbb{P}(x\in A)$ for any closed set $A\subset C([0,T],\mathbb{R}^d)$. Since $x^{R,\varepsilon}\Rightarrow x^R$ weakly in $C([0,T],\mathbb{R}^d)$, the Portmanteau theorem gives the estimate: Hence, $\limsup_{R\to\infty}\mathbb{P}(x^R\in A)\leq\mathbb{P}(x\in A)$. Note also that $\mathbb{P}(|x^R|_\infty\geq R) = \mathbb{P}(|x|_\infty\geq R)$ for $R>\|x_0\|_{L^\infty}$. Markov's inequality gives that $\mathbb{P}(|x|_\infty\geq R)\to 0$ as $R\to\infty$. Consequently, sending $R\to\infty$ in ( . ), we have proven that $\limsup_{\varepsilon\to 0}\mathbb{P}(x^\varepsilon\in A)\leq\mathbb{P}(x\in A)$, which concludes the proof of the weak convergence $x^\varepsilon\Rightarrow x$. The proof of the convergence of the $N$-point motion is an easy adaptation of the above, as we have the $N$-point convergence both for $x^{\varepsilon,R}$ as $\varepsilon\to 0$ and for $x^R$ as $R\to\infty$.
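The localization step can be summarized in a chain of three inequalities. This is only a sketch of one way to organize the estimate, not necessarily the form used in the original display. It assumes that $x^{R,\varepsilon}$ and $x^\varepsilon$ coincide as long as they stay inside $B_R$ (which is how the cut-off $f_R$ is constructed), that the same holds for $x^R$ and $x$ when they are built on a common probability space, and that $A$ is a closed subset of $C([0,T],\mathbb{R}^d)$.

```latex
% Step 1: x^\varepsilon = x^{R,\varepsilon} on the event {|x^{R,\varepsilon}|_\infty < R}.
\[
  \mathbb{P}(x^\varepsilon \in A)
  \le \mathbb{P}(x^{R,\varepsilon} \in A) + \mathbb{P}\big(|x^{R,\varepsilon}|_\infty \ge R\big).
\]
% Step 2: Portmanteau theorem for the closed sets A and {|.|_\infty >= R}.
\[
  \limsup_{\varepsilon\to 0}\mathbb{P}(x^\varepsilon \in A)
  \le \mathbb{P}(x^{R} \in A) + \mathbb{P}\big(|x^{R}|_\infty \ge R\big).
\]
% Step 3: x^R = x on {|x|_\infty < R}, and P(|x^R|_\infty >= R) = P(|x|_\infty >= R).
\[
  \limsup_{\varepsilon\to 0}\mathbb{P}(x^\varepsilon \in A)
  \le \mathbb{P}(x \in A) + 2\,\mathbb{P}\big(|x|_\infty \ge R\big)
  \xrightarrow[R\to\infty]{} \mathbb{P}(x \in A).
\]
```

The left-hand side of the last line does not depend on $R$, so letting $R\to\infty$ yields the Portmanteau criterion for $x^\varepsilon\Rightarrow x$.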