The Stratonovich heat equation: a continuity result and weak approximations

We consider a Stratonovich heat equation in $(0,1)$ with a nonlinear multiplicative noise driven by a trace-class Wiener process. First, the equation is shown to have a unique mild solution. Secondly, convolutional rough paths techniques are used to provide an almost sure continuity result for the solution with respect to the solution of the 'smooth' equation obtained by replacing the noise with an absolutely continuous process. This continuity result is then exploited to prove weak convergence results based on Donsker and Kac-Stroock type approximations of the noise.


Introduction and main results
The main motivation of the paper comes from [3], where the authors consider, for some fixed $T > 0$, the stochastic heat equation
$$\partial_t Y^n(t,x) - \partial_x^2 Y^n(t,x) = \dot\theta^n(t,x), \qquad (t,x) \in [0,T] \times [0,1], \tag{1}$$
with some initial data and Dirichlet boundary conditions, where the random fields $(\dot\theta^n)_{n\ge 1}$ are such that the family of processes $\theta^n(t,x) := \int_0^t \int_0^x \dot\theta^n(s,y)\,dy\,ds$ converges in law, in the space $C([0,T] \times [0,1])$ of continuous functions, to the Brownian sheet. Then, sufficient conditions on $\theta^n$ are provided such that $Y^n$ converges in law, as $n \to \infty$, to the mild solution $Y$ of
$$\partial_t Y(t,x) - \partial_x^2 Y(t,x) = \dot W(t,x),$$
where $\dot W(t,x)$ stands for the space-time white noise. Applications of this result include the case of a Donsker type approximation, as well as a Kac-Stroock type approximation in the plane.
Such diffusion approximation issues for stochastic PDEs have been extensively studied in the literature; see Walsh [32], Manthey [20,21], Tindel [30], Carmona and Fouque [8], and Florit and Nualart [12], to mention but a few. Following the line of [3], a natural question is whether the same type of weak convergence holds in a non-additive situation, that is, when the term $\dot\theta^n(t,x)$ in (1) is replaced with $f(Y^n(t,x))\,\dot\theta^n(t,x)$, for some sufficiently smooth function $f : \mathbb{R} \to \mathbb{R}$. In this case, one expects the limit equation to be of Stratonovich type, as was the case in [8] and [12] (see also [2,29] for examples of similar behaviour). This phenomenon has recently been illustrated by Bal in [1] as well, for the weak approximation of a linear parabolic equation in $\mathbb{R}^d$ with random potential given by $Y^n(t,x)\,\theta^n(x)$.
Going back to our setting, and focusing first on what we expect to be our limit equation, we should consider the Stratonovich heat equation
$$\partial_t Y(t,x) - \partial_x^2 Y(t,x) = f(Y(t,x)) \circ \dot W(t,x), \qquad (t,x) \in [0,T] \times [0,1]. \tag{2}$$
Unfortunately, a well-known drawback in this situation is that the solution admits only very low regularity (see [31]), a major obstacle for our treatment of the non-linearity of the problem. For this reason, we have chosen to restrict our attention to the case of a trace-class noise. To be more specific, we will assume that $\dot W$ is the formal derivative of an $L^2(0,1)$-valued Wiener process $\{W_t, t \in [0,T]\}$ with covariance operator $Q$ satisfying the following property:

Hypothesis 1. Let $(e_k)_{k\ge 1}$ be the basis of eigenfunctions for the Dirichlet Laplacian $\Delta$ in $L^2(0,1)$ given by $e_k(x) := \sqrt{2}\,\sin(k\pi x)$, $x \in [0,1]$. We assume that there exist a sequence of non-negative real numbers $(\lambda_k)_{k\ge 1}$ and a parameter $\eta > 0$ such that $Q e_k = \lambda_k e_k$ for every $k \ge 1$ and $\sum_{k\ge 1} \lambda_k\, k^{4\eta} < \infty$.

Without loss of generality, we assume that $\eta \in (0, 1/8)$. In particular, for any fixed $t \ge 0$, the process $W_t$ can be expanded in $L^2(\Omega; L^2(0,1))$ as
$$W_t = \sum_{k\ge 1} \sqrt{\lambda_k}\,\beta^k_t\, e_k, \tag{3}$$
where $(\beta^k)_{k\ge 1}$ is a family of independent Brownian motions. Note that the condition $\sum_{k\ge 1} \lambda_k\, k^{4\eta} < \infty$ is only slightly stronger than the usual trace-class hypothesis $\sum_{k\ge 1} \lambda_k < \infty$, insofar as $\eta$ can be chosen as small as one wishes. For instance, it covers the case where $Q = (\mathrm{Id} - \Delta)^{-r}$ with $r > 1/2$.

Another change with respect to [3] lies in our formulation of the study: compared to the random field approach in [3], it has turned out to be more convenient here to use the Hilbert-space-valued setting of Da Prato and Zabczyk [9]. In particular, we are interested in the mild form of equation (2), which is given by
$$Y_t = S_t \psi + \int_0^t S_{t-s}\big(f(Y_s) \circ dW_s\big), \qquad t \in [0,T], \tag{4}$$
where from now on we use the notation $Y_t(\cdot) := Y(t,\cdot)$, $\psi$ is some initial condition and $f : \mathbb{R} \to \mathbb{R}$ is a smooth enough mapping. As usual, $(S_t)_{t>0}$ denotes the strongly continuous semigroup of operators generated by $-\Delta$.
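As a purely illustrative aside, the eigenfunction expansion of $W_t$ can be sampled numerically by truncating the series. The sketch below assumes the example covariance $Q = (\mathrm{Id} - \Delta)^{-r}$, so that $\lambda_k = (1 + k^2\pi^2)^{-r}$; the truncation level, the grids and the function names are hypothetical choices made for this sketch and are not part of the paper.

```python
import numpy as np

def simulate_trace_class_wiener(n_modes=200, n_steps=500, n_space=101,
                                T=1.0, r=0.75, seed=0):
    """Sample a truncation of W_t = sum_k sqrt(lambda_k) beta^k_t e_k on a grid."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_modes + 1)
    # Assumed example: Q = (Id - Delta)^(-r), hence lambda_k = (1 + (k pi)^2)^(-r).
    lam = (1.0 + (k * np.pi) ** 2) ** (-r)
    x = np.linspace(0.0, 1.0, n_space)
    # Dirichlet eigenfunctions e_k(x) = sqrt(2) sin(k pi x), shape (n_modes, n_space).
    e = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, x))
    dt = T / n_steps
    # Independent Brownian motions beta^k built from Gaussian increments.
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_modes))
    beta = np.vstack([np.zeros((1, n_modes)), np.cumsum(increments, axis=0)])
    # W[i, j] approximates W_{t_i}(x_j).
    W = (beta * np.sqrt(lam)) @ e
    return W, lam, k

def weighted_trace(lam, k, eta):
    """Truncated version of sum_k lambda_k k^(4 eta)."""
    return float(np.sum(lam * k ** (4.0 * eta)))
```

For this example family of eigenvalues, the weighted sum behaves like $\sum_k k^{4\eta - 2r}$, so it stays bounded as the truncation grows only when $\eta < (2r-1)/4$; a call such as `weighted_trace(lam, k, 0.1)` evaluates the truncated sum for a given $\eta$.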
A first part of the paper (Section 2) will be devoted to the interpretation of (4) as a Stratonovich equation, and it will allow us to exhibit an existence and uniqueness result for the solution. We should mention here that the stochastic heat equation in the Stratonovich framework has already been studied in various settings, most of them in the case of a linear multiplicative noise (see e.g. [6,7,17]). Once we have given a full sense to (4), our strategy to study weak approximations of the solution can be stated in the following loose form:
(a) We will first establish an almost sure continuity result (in some suitable space-time topology) for the solution of (4) with respect to the solution of the 'smooth' heat equation obtained by replacing $W$ with an absolutely continuous process $\tilde W$ (see Theorem 1.1).
(b) Then, for two particular families of absolutely continuous processes approximating $W$, we will rely on our continuity result to show convergence towards the solution in some possibly different probability space, leading us to the expected weak convergence (see Theorem 1.2).
Our strategy to compare the solution $Y$ of (4) with 'smooth' solutions is based on a genuine rough-paths type expansion of the equation, which follows the ideas of [16,11,10]. Rough-paths techniques have indeed proved to be very efficient as far as approximation of non-linear systems in finite dimension is concerned (see [13, Chapter 17]), and it is therefore natural to address the same question in this infinite-dimensional background. Note that the model given by (4) differs from those studied in [11,10], where only finite-dimensional noises are considered, which forces us to revise most of the technical details behind this procedure (see Section 3).
In order to state the above-mentioned results in more detail, we need to introduce the spaces in which our random variables will take their values. First, as far as spatial regularity is concerned, the fractional Sobolev spaces must come into the picture. Thus, for every $\alpha \in \mathbb{R}$ and $p \ge 2$, we will denote by $B_{\alpha,p}$ the fractional Sobolev space of order $\alpha$ based on $L^p(0,1)$, that is, $\varphi \in B_{\alpha,p} \iff (-\Delta)^{\alpha}\varphi \in L^p(0,1)$, where $\Delta$ stands for the Dirichlet Laplacian in $L^2(0,1)$ (see e.g. [26] for a thorough study of these spaces). For the sake of conciseness, we will write $B_{\alpha}$ for $B_{\alpha,2}$ and $B$ for $B_0 = L^2(0,1)$ throughout the paper. We will also denote by $B_\infty$ the set of continuous functions on $[0,1]$, endowed with the supremum norm.
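For $p = 2$ the definition above becomes explicit in the sine basis, since $(-\Delta)^{\alpha} e_k = (k\pi)^{2\alpha} e_k$ for $e_k(x) = \sqrt{2}\sin(k\pi x)$. The following sketch evaluates the resulting quantity $\|(-\Delta)^{\alpha}\varphi\|_{L^2(0,1)}$ from grid values of $\varphi$. It is only an illustration: the function names, the uniform grid and the trapezoidal quadrature are assumptions of this sketch, and the paper may normalize the norm differently.

```python
import numpy as np

def sine_coefficients(phi_values, x, n_modes):
    """Approximate <phi, e_k> in L^2(0,1) with the trapezoidal rule on a uniform grid."""
    k = np.arange(1, n_modes + 1)
    e = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, x))  # shape (n_modes, len(x))
    integrand = e * phi_values
    dx = x[1] - x[0]
    # Trapezoidal rule: full weight inside, half weight at the two endpoints.
    return dx * (integrand.sum(axis=1)
                 - 0.5 * (integrand[:, 0] + integrand[:, -1]))

def sobolev_norm(coeffs, alpha):
    """L^2 norm of (-Delta)^alpha phi, computed from the sine coefficients of phi."""
    k = np.arange(1, len(coeffs) + 1)
    return float(np.sqrt(np.sum(((k * np.pi) ** (2.0 * alpha) * coeffs) ** 2)))
```

For instance, taking `phi_values` equal to $e_1$ on the grid yields a value close to $\pi^{2\alpha}$, as expected from the eigenvalue relation.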
Of course, we will also have to deal with the time regularity of our processes. So, for any subinterval $I \subset [0,T]$ and any Banach space $V$, we define $C^0(I;V)$ as the space of continuous functions $y : I \to V$. Moreover, for any $\lambda > 0$, we introduce the space $C^{\lambda}(I;V)$ of $\lambda$-Hölder continuous $V$-valued functions endowed with the seminorm
$$\|y\|_{C^{\lambda}(I;V)} := \sup_{s<t \in I} \frac{\|y_t - y_s\|_V}{|t-s|^{\lambda}}. \tag{5}$$
Note that in the case where $I = [0,T]$, we will often write $C^{\lambda}(V)$ for $C^{\lambda}([0,T];V)$.

Now, consider any process $\tilde W$ defined on the same probability space as $W$ and with absolutely continuous paths in $B_{\eta,2p}$, for every integer $p \ge 1$ (recall that $\eta$ has been defined in Hypothesis 1). Then, let $\{\tilde Y_t, t \in [0,T]\}$ be the unique solution of the Riemann-Lebesgue equation (considered in a pathwise sense):
$$\tilde Y_t = S_t \tilde\psi + \int_0^t S_{t-s}\big(f(\tilde Y_s) \cdot d\tilde W_s\big), \qquad t \in [0,T], \tag{6}$$
where $\tilde\psi \in B$. As mentioned earlier, our first main result consists in comparing such a solution $\tilde Y$ with the solution $Y$ of (4). This result can be stated as follows.
The topologies involved in this statement are directly inherited from our rough-paths analysis of the equation, and their relevance should therefore become clear in the course of Section 3 (see in particular the proof of the central Proposition 3.9). Note that this bound certainly remains valid with respect to some Hölder norm (in time) for the left-hand side of (7), as our arguments will suggest. However, due to the technicality of the rough-paths procedure, we have preferred to focus on the behaviour of the supremum norm (see also Remark 3.12).
Our next step will consist in applying the above Theorem 1.1 (on some possibly larger probability space) to two particular families of absolutely continuous processes that approximate $W$, so as to retrieve weak convergence results for the solution. To define these approximation processes, we will make use of the following additional notation. Namely, on a probability space $(\Omega, \mathcal{F}, P)$, given a sequence $(X^k)_{k\ge 1}$ of centered i.i.d. processes admitting moments of any order, we set, for every $t \ge 0$,
$$W(X^{\cdot})_t := \sum_{k\ge 1} \sqrt{\lambda_k}\, X^k_t\, e_k.$$
Thanks to our forthcoming Proposition 4.3, we know that $W(X^{\cdot})$ is indeed well-defined as a process on $(\Omega, \mathcal{F}, P)$ with values in $B_{\eta,2p}$, for all $p \ge 1$. Let us also specify that, given a sequence $(\beta^n)_{n\ge 1}$ of real-valued processes, we will henceforth denote by $(\beta^{n,\cdot})_{n\ge 1} = (\beta^{n,k})_{n,k\ge 1}$ a generic sequence of independent copies of $(\beta^n)_{n\ge 1}$ (defined on a possibly larger probability space). The two families of approximations at the core of our study can now be introduced as follows (we fix $T = 1$ for the sake of clarity):
(i) The Donsker approximation $W^n := W(S^{n,\cdot})$, where $S^n$ is a sequence of appropriately rescaled random walks. To be more specific, let $(Z_j)_{j\ge 1}$ be a family of i.i.d. random variables with mean zero, unit variance and admitting moments of any order. Then, for each $n \in \mathbb{N}$, set
$$S^n(t) := \frac{1}{\sqrt{n}} \Big( \sum_{j=1}^{[nt]} Z_j + (nt - [nt])\, Z_{[nt]+1} \Big), \qquad t \in [0,1]. \tag{8}$$
Recall that, by Donsker's invariance principle (see e.g. [19, Thm. 4.20]), $S^n$ is known to converge in law to the standard Brownian motion in $C^0([0,1]; \mathbb{R})$, as $n \to \infty$.
(ii) The Kac-Stroock approximation $W^n := W(\theta^{n,\cdot})$, where $\theta^n$ stands for the classical Kac-Stroock approximation of the (one-dimensional) Brownian motion. Precisely, introduce a standard Poisson process $N$ and a Bernoulli variable $\zeta$ independent of $N$, with $P(\zeta = 1) = 1/2$. Then, set
$$\theta^n(t) := \sqrt{n} \int_0^t (-1)^{\zeta + N(ns)}\, ds, \qquad t \in [0,1]. \tag{9}$$
Here again, the sequence $(\theta^n)_{n\ge 1}$ thus defined converges in law in $C^0([0,1]; \mathbb{R})$, as $n \to \infty$, to a standard Brownian motion (see e.g. [18,24]).

Of course, the one-dimensional weak convergence of $S^n$ (resp. $\theta^n$) towards the Brownian motion is a priori not sufficient for us to apply Theorem 1.1. Our aim is to turn this one-dimensional weak convergence into an almost sure convergence result for $W(S^{n,\cdot})$ (resp. $W(\theta^{n,\cdot})$) with respect to the topology involved in (7), and this will rely in particular on Skorokhod embedding arguments (see Section 4). Together with Theorem 1.1, the strategy ends up with the following statement.

Theorem 1.2. Under the hypotheses of Theorem 1.1, fix an initial condition $\psi = \tilde\psi \in B_{\gamma}$, and denote by $Y^n$ the (Riemann-Lebesgue) solution of (6) associated with either the Donsker approximation $W^n = W(S^{n,\cdot})$ or the Kac-Stroock approximation $W^n = W(\theta^{n,\cdot})$. Then, as $n \to \infty$, $Y^n$ converges in law to $Y$ in the space $C^0(B_{\gamma})$.
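To make the two scalar building blocks more concrete, here is a minimal simulation sketch. It assumes the classical forms of both approximations, namely the linearly interpolated random walk $S^n(t) = n^{-1/2}\big(\sum_{j \le [nt]} Z_j + (nt-[nt])Z_{[nt]+1}\big)$ and the Kac-Stroock process $\theta^n(t) = \sqrt{n}\int_0^t (-1)^{\zeta + N(ns)}\,ds$; the function names and the sampling strategy are hypothetical and are not taken from the paper.

```python
import numpy as np

def donsker_path(z, t):
    """Evaluate the interpolated rescaled random walk S^n at times t in [0, 1].

    z holds the steps Z_1, ..., Z_n, so n = len(z).
    """
    n = len(z)
    partial = np.concatenate([[0.0], np.cumsum(z)])  # partial sums, with S(0) = 0
    # Linear interpolation between consecutive partial sums gives the
    # (nt - [nt]) Z_{[nt]+1} correction term.
    return np.interp(n * np.asarray(t), np.arange(n + 1), partial) / np.sqrt(n)

def kac_stroock_path(n, t, rng):
    """Evaluate theta^n(t) = sqrt(n) * int_0^t (-1)^(zeta + N(ns)) ds on [0, 1]."""
    zeta = int(rng.integers(0, 2))  # Bernoulli(1/2) initial sign
    # Jump times of s -> N(ns) on [0, 1]: a Poisson process with rate n.
    jumps = []
    s = rng.exponential(1.0 / n)
    while s < 1.0:
        jumps.append(s)
        s += rng.exponential(1.0 / n)
    knots = np.concatenate([[0.0], jumps, [1.0]])
    # The integrand is constant, equal to +1 or -1, between consecutive jumps.
    signs = (-1.0) ** (zeta + np.arange(len(knots) - 1))
    values = np.concatenate([[0.0], np.cumsum(signs * np.diff(knots))])
    # The integral is piecewise linear, so interpolation at the knots is exact.
    return np.sqrt(n) * np.interp(t, knots, values)
```

Both paths are piecewise linear, hence absolutely continuous, which is the property required of the approximating noise. Independent copies indexed by $k$ can be obtained by calling the functions repeatedly with the same generator `rng`.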
The paper is organized as follows. Section 2 is devoted to a few preliminaries on the theoretical study of the Stratonovich heat equation (2). The rough-paths type analysis of this equation is performed in Section 3, and it will lead us to the proof of our continuity result, Theorem 1.1. In Section 4, we will tackle the approximation issue for the above-defined Donsker and Kac-Stroock processes by exhibiting a general convergence criterion (see Proposition 4.5), which will entail Theorem 1.2. Finally, we have added an appendix with material on fractional Sobolev spaces and the proof of a technical result needed in Section 3.4.

Remark 1.3. At first sight, the reader familiar with rough-paths type continuity results may be surprised at the absence of some 'Lévy-area' term in our bound (7). Otherwise stated, the convergence of an approximation $W^n$ towards $W$ (with respect to some appropriate topology) is sufficient to guarantee the convergence of the associated solution. In fact, on this particular point, the situation is very similar to the case of a one-dimensional SDE with so-called commuting vector fields, i.e.,
$$dY_t = \sum_{i=1}^{n} \sigma_i(Y_t) \circ dB^i_t. \tag{10}$$
It is a well-known fact (see for instance [28]) that under this commuting assumption, the solution $Y$ of (10) appears as a continuous functional of the sole noise $B$ (that is, there is no need for any Lévy-area component). In a certain way, Equation (2) fits the above pattern. Indeed, for fixed $x \in (0,1)$, the noisy perturbation can be written as
$$f(Y_t(x)) \circ dW_t(x) = \sum_{i\ge 1} \sqrt{\lambda_i}\, e_i(x)\, f(Y_t(x)) \circ d\beta^i_t,$$
i.e., we (morally) deal with $n = \infty$ and $\sigma_i(\cdot) = \sqrt{\lambda_i}\, e_i(x)\, f(\cdot)$ in (10). So, at least at this heuristic level, our continuity result (7) becomes quite natural. More specifically, we will see that, due to the commuting property, the Lévy-area term arising from the rough-paths analysis of (2) can be easily reduced to some continuous functional of $W$ (Lemma 3.8).
Remark 1.4. As we shall see in Section 3, our proof of Theorem 1.1 heavily relies on the properties of the fractional Sobolev spaces $B_{\alpha,p}$, which we have recalled in the appendix. Unfortunately, many of these properties become much more restrictive as soon as the underlying space dimension is larger than 2, as illustrated by the classical Sobolev embeddings. This accounts for our choice to stick to a one-dimensional heat equation. Note however that our considerations on the theoretical study of (4) (Section 2) could easily be extended to a multidimensional setting.
Remark 1.5. The results in this paper actually remain valid for any operator $A$ of the form $A = -\partial_x(a \cdot \partial_x) + c$, where $c \ge 0$ and $a : [0,1] \to \mathbb{R}$ is a continuously differentiable function. Indeed, as explained in [10, Section 2.1], such an operator $A$ also generates an analytic semigroup of contractions and one can identify the domains $D(A_p^{\alpha})$ of its fractional powers with the spaces $B_{\alpha,p}$, which is sufficient to follow the lines of our reasoning.
Unless otherwise stated, any constant c or C appearing in our computations below is understood as a generic constant which might change from line to line without further mention.

The Stratonovich integral
Recall that we are interested in the following mild equation:
$$Y_t = S_t \psi + \int_0^t S_{t-s}\big(f(Y_s) \circ dW_s\big), \qquad t \in [0,T], \tag{11}$$
where $(S_t)_{t\ge 0}$ denotes the strongly continuous semigroup of operators generated by $-\Delta$ with Dirichlet boundary conditions, and $W$ is assumed to satisfy Hypothesis 1. The integral appearing in (11) is thus understood in some Stratonovich sense, an interpretation to be clarified in a convolutional setting, which is the main purpose of this first section. Once endowed with this interpretation, it turns out that (11) reduces to a common mild Itô equation with an additional drift term, and accordingly the existence and uniqueness of $Y$ can be derived from well-known results (see Section 2.2).
Note that the following regularity assumption on f will prevail throughout the section.
Hypothesis 2. The function $f : \mathbb{R} \to \mathbb{R}$ is bounded, of class $C^2$ and with bounded derivatives.
2.1. The Stratonovich integral. In order to interpret , we restrict our attention to a particular class of processes Y . Namely, we assume that, on some filtered probability space (Ω, F, (F t ) t≥0 , P ), {Y t , t ∈ [0, T ]} is the unique B-valued mild solution of the following equation: for some F t -adapted random fields {V i t , t ∈ [0, T ]}, i = 1, 2, with continuous paths in B (recall that B := L 2 (0, 1)). Moreover, we assume that In fact, such a process Y is explicitly given (see e.g. [9]) by Now, a natural idea to define the integral in (11) in some Stratonovich sense would be the following: introduce the kernel G t−s (x, y) of S t−s and, with the representation where the symbol * denotes the space variable and each integral t 0 G t−s (x, * )·f (Y s ), e j B •dβ j s is interpreted in the (classical) Stratonovich sense. Nevertheless, it is a well-known fact that the process Y defined by (14) is not always a B-valued semimartingale (in other words, Y is not always a strong solution of (14), see e.g. [9, Sec. 5.6]), making the definition of these integrals quite obscure at first sight.
To overcome this difficulty, we consider a standard semimartingale approximation of Y : for every ε > 0, let Y ε be the unique (strong) solution of stands for the Yosida approximation of −∆. In particular, −∆ ε defines a monotone and bounded operator which converges pointwise to −∆ (see e.g. [4]). Then, for every fixed ε > 0, Y ε is a semimartingale, and we have (see e.g. [9, Proposition 7.5]) This extrinsic procedure will lead us to the following interpretation: Proposition 2.1. With the above notations, the family of Stratonovich integrals defined for all t ∈ [0, T ], x ∈ (0, 1) by where the latter limit is considered in L 2 (Ω), converges in L 1 (Ω; C 0 ([0, T ]; B)) as ε tends to 0. Its limit, that we denote by where P (ξ) := 1 2 ∞ k=1 λ k e k (ξ) 2 and the notation t 0 S t−s (f (Y s ) · dW s ) refers to the (usual) Itô integral.
Thus, the Stratonovich integral in (11) will henceforth be understood as in the latter proposition, in the class of processes Y satisfying an equation of the form (12). Note that the relation (17) provides us with a familiar decomposition for the Stratonovich integral as the sum of an Itô integral and a trace term, and it must be compared with the decomposition for the (standard) Stratonovich integral.
As a first step in the proof of Proposition 2.1, observe that the two terms in the right-hand side of (17) are indeed well-defined processes in L 2 (Ω; B). This is a straightforward consequence of the boundedness of f, f ′ , the trace-class assumption on W , and the fact that P defines a uniformly bounded function.
We point out that, in the definition (16), we first restrict the integral to (0, u) with u < t in order to avoid the singularity in the derivative of the kernel G. This will be clarified in the proof of the next lemma.
Proof. For any fixed (u, x) ∈ (0, t) × (0, 1) and j ∈ N, the process s → G t−s (x, * ) · f (Y ε s ), e j B , s ∈ [0, u], defines a (real-valued) semimartingale. Hence, we can use Itô's formula to assert that The hypotheses on f and V i , and the fact that s ≤ u < t, guarantee that all terms on the right-hand side above are well-defined. More precisely, using the spectral decomposition of G given by one proves that, P -a.s., and the latter is finite since s ∈ (0, t). As far as the second pathwise integral on the right-hand side of (19) is concerned, we have, for instance, Here, we have used the fact that 0 . Similarly, one easily proves that the last term in (19) is a well-defined square-integrable random variable.
We can now go back to our main statement.
Proof of Proposition 2.1. First, owing to the previous lemma, we have that the limit on the right-hand side of (16) equals to This can be proved using the bounded convergence theorem. Hence, the proof reduces to the two assertions: and Let us first deal with (20). By the isometry property of the stochastic integral, the boundedness of S t−s and the assumptions on f , we have: upon recalling that ∞ k=1 λ k < ∞. In order to prove (21), we use the Sobolev embedding L 1 (0, 1) ⊂ B − 1 4 −ε and the assumptions on f and V 2 . In fact, we have (recall that · B∞ refers to the supremum norm on [0, 1]). Therefore, by the assumptions on V 2 , the convergence (15) guarantees that (20) and (21) hold, and this lets us conclude the proof.

2.2. Existence and uniqueness of solution.
With the notations of the previous section, consider the following mild (Itô) equation: where we recall that P (ξ) = ∞ k=1 λ k e k (ξ) 2 . Hypotheses 1 and 2 allow us to apply standard methods and guarantee that this equation admits a unique L 2 (Ω; B)-valued solution Y (see [9]). In particular, we observe that Y solves an equation of the form (12) and these random fields fulfill the assumptions specified in the previous section. Thus, for all t ∈ [0, T ], we can define the Stratonovich integral which yields that Y is also a solution of (11).
Conversely, due to (17), it is readily checked that any solution of (11) in the class of processes satisfying an equation of the form (12) is also a solution of (23) (use the uniqueness of V 1 , V 2 in (14)). This provides us with the following existence and uniqueness result.
Moreover, Y has a version with continuous paths and it holds that

A rough-paths type analysis of the equation
Let us now turn to the proof of Theorem 1.1. As announced in the Introduction, our strategy is based on a rough-paths type expansion of the equation. Accordingly, a few ingredients taken from the so-called convolutional rough paths theory, that is, rough paths theory adapted to mild evolution equations, must be introduced in the first place.
3.1. Tools from (convolutional) rough paths theory. We gather here some preliminary material borrowed from [16] (see also [11,10]). As underlined in the latter references, a key point towards a fruitful pathwise analysis of (4) lies in the following elementary observation: due to the semigroup property S t+t ′ = S t · S t ′ , it holds that, for any s < t, (4) can be equivalently written in the convenient form: This should be compared with the behaviour of solutions to standard (stochastic) differential equations: Then, in a rough-paths setting, we are naturally led to extend the definition ofδ to processes with 2 variables, as follows: To make the notations (25)-(26) even more legitimate in this convolutional context, let us point out the following algebraic properties: and defineĈ λ (I; V ) as the set of processes y : As we will see it in the sequel, a proper control for the expansion of t s S t−u (f (Y u ) • dW u ) also requires the extension of both definitions (5) and (27) to processes with 2 or 3 variables. Precisely, if z : and we define C λ 2 (I; V ) (resp. C λ 3 (I; V )) along the same lines asĈ λ (I; V ). Observe for instance that if y ∈ C λ 2 (I; L(V, W )) and z ∈ C β 2 (I; V ), then the process h defined as h tus = y tu z us (s ≤ u ≤ t ∈ I) belongs to C λ+β 3 (I; W ). Note that when I = [0, T ], we will more simply write C λ k (V ) := C λ k (I; V ) for k ∈ {1, 2, 3}. Besides, from now on, we use the following convenient notation for products of processes.
With this convention, it is readily checked that if g : To end up with this toolbox, let us report what may be seen as the cornerstone result of the convolutional rough paths theory, namely the existence of (some kind of) an inverse operator forδ, denoted byΛ, and which will play a prominent role in our forthcoming decomposition (32). In brief, this operator allows us to get both a nice expression and a sharp estimate for the regular terms, i.e., the terms with Hölder regularity strictly larger than 1, that arise from the expansion of The proof of this result can be found in [16, Theorem 3.5].
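The role of the semigroup property in this toolbox can be illustrated numerically. The sketch below assumes that the convolutional increment takes the form $(\hat\delta y)_{ts} = y_t - S_{t-s}\, y_s$, which is the usual definition in this theory and is consistent with the notation $a_{ts} := S_{t-s} - \mathrm{Id}$ used later in the text; since the corresponding display is not reproduced here, this form should be read as an assumption. Functions are represented by their coefficients in the sine basis $(e_k)$, on which $S_t$ acts by multiplication with $e^{-(k\pi)^2 t}$, and all names are hypothetical.

```python
import numpy as np

def heat_semigroup(coeffs, t):
    """Apply S_t in the sine basis: the k-th coefficient is damped by exp(-(k pi)^2 t)."""
    k = np.arange(1, len(coeffs) + 1)
    return np.exp(-(k * np.pi) ** 2 * t) * coeffs

def twisted_increment(y_t, y_s, t, s):
    """Assumed convolutional increment (hat-delta y)_{ts} = y_t - S_{t-s} y_s."""
    return y_t - heat_semigroup(y_s, t - s)

# Semigroup property S_{t+t'} = S_t S_{t'}, checked on random coefficients.
rng = np.random.default_rng(0)
psi = rng.normal(size=10)
lhs = heat_semigroup(heat_semigroup(psi, 0.01), 0.02)
rhs = heat_semigroup(psi, 0.03)
semigroup_gap = float(np.max(np.abs(lhs - rhs)))

# The noise-free mild solution y_t = S_t psi has vanishing twisted increments.
free_gap = float(np.max(np.abs(
    twisted_increment(heat_semigroup(psi, 0.05), heat_semigroup(psi, 0.02), 0.05, 0.02)
)))
```

Under this assumed definition, the twisted increment removes the purely deterministic part of the dynamics, so that only the noise-driven contribution remains to be expanded.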

3.2. A rough-paths type expansion of the solution. We are now ready to settle our reasoning, which applies to a smooth enough vector field $f$:
Hypothesis 3. The function $f$ in (24) is of class $C^3$, bounded and with bounded derivatives.
Our main task will actually consist in establishing the following pathwise decomposition for the solution Y to (24): Theorem 3.5. Assume that both Hypotheses 1 and 3 hold true. Fix γ ∈ ( 1 2 , 1 2 + η) and assume that ψ ∈ B γ . Then theδ-variations of the solution Y to (24) can be expanded as where we have set, for all s < u < t, and The theorem must be read as follows: in the expansion of , we can exhibit a main term, namely , and a residual termΛ ts R Y with Hölder regularity strictly larger than 1, in the sense of Theorem 3.4 (take α = 0 in (31)). Besides, from the decomposition (32), we can somehow conclude that the whole dynamics induced by W is "encoded" through the two (stochastic) operator-valued processes L W and L W W . So, before we turn to the proof of (32), let us elaborate on the properties of these two processes.
3.3. The couple (L W , L W W ). At this point, we consider L W ts and L W W ts as stochastic linear operators acting on the space of smooth functions ϕ. The following (straightforward) relation accounts for the algebraic behaviour of the couple (L W , L W W ): it is the convolutional analog of the classical Chen's relation between a process and its Lévy area (see [15]).
Proposition 3.6. The processes L W and L W W obey the following algebraic rules: For all s < u < t and all smooth function ϕ, Now, it matters to identify the regularity properties of L W and L W W as 2-variables processes. A first clue in this direction is given by the following (a.s.) regularity result for the noise W itself.
Our second ingredient towards the regularity properties of $(L^W, L^{WW})$ relies on two successive observations. First, due to their relative simplicity, the two expressions (33)-(34) can be integrated by parts. Then, owing to some obvious commuting properties, we can turn $L^{WW}$ into an easy-to-handle functional of $\delta W$. This is what we detail in the proof of the following lemma.
Proof. With the expansion (3) in mind, it is easily checked, by setting where the limits are taken in L 2 (Ω, B). The proof then reduces to applications of Itô's formula and we only elaborate on (38). For fixed i, j ∈ {1, . . . , N }, apply Itô's formula to the (random) By taking the sum over i, j, we deduce the formula λ i e 2 i and by passing to the limit (in L 2 (Ω, B)), we get Formula (38) immediately follows.
We are now in a position to extend both L W ts and L W W ts to larger classes of functions ϕ and retrieve the following (a.s.) bounds, which will be at the core of our identification procedure: Proposition 3.9. Under the hypotheses of Theorem 3.5, for any small ε > 0, there exists ε > 0 and p ≥ 1 such that (almost surely) for some constant c ε,ε,p .
Proof. In fact, thanks to the representation formulas (37)-(38) and the pathwise regularity of W (Lemma 3.7), all of these bounds can be derived from the classical properties of the fractional Sobolev spaces (see Appendix A). For instance, owing to (72), one has, for any p ≥ 1 and α ≥ 1 so that for anyε small enough, In the same way, By taking α small enough, i.e., p large enough, we get the expected bound, namely The other estimates for L W can be proved along the same lines. As far as L W W is concerned, observe for instance that if ε > 0 is small enough, then one has The (analogous) proofs for the other bounds are left to the reader.
3.4. Proof of Theorem 3.5. First, we need to justify that the right-hand side of the decomposition (32) is well-defined. This will rely (among others) on the following a priori controls for the solution Y . For the sake of clarity, we have postponed the proof of this statement to Appendix B.
Recall that according to our convention (29), the definition of K Y in (43) must be understood as K Y ts := (δY ) ts − L W ts (f (Y s )) for every s < t ∈ [0, T ]. Lemma 3.11. Under the hypotheses of Theorem 3.5, let Z be the process given by Z 0 = ψ and Proof. First, according to Theorem 3.4, we need to justify that R Y ∈ C µ 3 (B) for some µ > 1. To this end, expand R using the algebraic rules (30) and (42), which gives (40) and (42), it is readily checked where we have used (71) to get the last inequality (recall that a ts := S t−s − Id).
Then, as far as L W N is concerned, let us expand N using standard differential calculus, which provides us with the expression where the additional operator-valued process L aW is defined by B, B)), it is sufficient to prove that N ∈ C We are thus in a position to applyΛ to R Y , and so Z is properly defined through (44). The regularity of Z and the bound (45) are immediate consequences of (39)-(40) and the contraction property (31) ofΛ. The details are left to the reader.
Remark 3.12. Although not optimal, the two regularity results (42) and (43) are thus sufficient for us to prove that the right-hand side of the decomposition (32) is indeed well-defined. We also retrieve an important stability phenomenon here: $Y$ and $Z$ both belong to the same space $\hat C^{2\eta}(B_\infty) \cap C^0(B_{\gamma})$. A posteriori, this accounts for our choice in favor of this particular topology.
We can eventually proceed to prove Theorem 3.5.
Proof of Theorem 3.5. We need to identify the increments of Y with those of the process Z defined in Lemma 3.11. To do so, we naturally rely on some expansion of the right-hand side of (23). Precisely, we have that where the process N Y ts = δ(f (Y )) ts − (δW ) ts · f (Y s ) · f ′ (Y s ) has already been considered in the proof of Lemma 3.11. Therefore, with this notation, it holds that Now, by the contraction property (31), we know thatΛ(R Y ) ∈ C µ 1 2 (B) for some µ 1 > 1. Besides, with the same ingredients as in the proof of Lemma 3.10 (Burkholder-Davis-Gundy inequality plus Lemma 6.1, see Appendix B), we can easily lean on the expansion (47) of N to prove that J Y ∈ C µ 2 2 (B) for some µ 2 > 1 (note thatδJ Y = R Y ). Consequently,δ(Z − Y ) ∈ C µ 2 (B) with µ = inf(µ 1 , µ 2 ) > 1, and this entails thatδ(Z − Y ) = 0. Indeed, for any partition P [s,t] = {s = t 1 < . . . < t n = t} of [s, t], one has, due to the telescopic sum property reported in Proposition 3.2, and we conclude by letting the mesh |P [s,t] | := max i |t i+1 − t i | tend to 0.
As a straightforward consequence of the decomposition (32), we can exhibit an almost sure bound for Y in terms of W . Indeed, by plugging the estimate (45) back into the equation, we deduce that for any subinterval for some constant λ > 0, and similar estimates for At this point, a basic patching argument easily leads us to the following statement: Corollary 3.13. Under the hypotheses of Theorem 3.5, there exist ε > 0 and p ≥ 1 such that for some deterministic function G ε,p : (R + ) 2 → R + bounded on bounded sets.

3.5. Comparison with smooth solutions. The previous considerations will allow us to prove our continuity result (Theorem 1.1). For this purpose, we first go back to the case where the driving noise is an absolutely continuous process $\tilde W$ (with values in $B_{\eta,2p}$), assumed to be defined on the same probability space as $W$. In this situation, our mild equation is naturally understood in a pathwise sense as a classical (Riemann-Lebesgue) mild equation, i.e.,
$$\tilde Y_t = S_t \tilde\psi + \int_0^t S_{t-s}\big(f(\tilde Y_s) \cdot d\tilde W_s\big), \qquad t \in [0,T], \tag{50}$$
and the (pathwise) existence and uniqueness of the solution $\tilde Y$ follow from standard PDE results. The key step towards a comparison between $Y$ and $\tilde Y$ lies in the following result, which points out the similarity between the couple $(L^W, L^{WW})$ at the core of the previous considerations and the couple $(L^{\tilde W}, L^{\tilde W \tilde W})$ constructed from $\tilde W$:

Lemma 3.14. Define the operator-valued processes $L^{\tilde W}$ and $L^{\tilde W \tilde W}$ in the classical Riemann-Lebesgue sense as for every smooth function $\varphi$. Then both formulas (37) and (38) remain valid when substituting $\tilde W$ for $W$, and accordingly the bounds (39) and (40) hold true for $\tilde W$ as well.
Proof. It suffices to replace the use of Itô's formula in the proof of Lemma 3.8 with standard integration by parts. Indeed, as an absolutely continuous process, $\tilde W$ obeys the rules of standard differential calculus and one has for instance Consequently, it holds that which precisely fits the pattern of (38).
Another consequence of the similarity between $(L^W, L^{WW})$ and $(L^{\tilde W}, L^{\tilde W \tilde W})$ through the two formulas (37) and (38) is a set of (readily-checked) Lipschitz-type bounds: with the notations of Proposition 3.9, one has, for some polynomial expression $c_{W,\tilde W}$, and this bound remains valid for all of the other topologies involved in Proposition 3.9.
Then, as far as the solution $\tilde Y$ is concerned, note that and it is obvious in this (absolutely continuous) situation that $J^{\tilde Y} \in C_2^{\mu}(B)$ for some $\mu > 1$. Therefore, we can easily follow the lines of our previous identification procedure (see the proofs of Lemma 3.11 and Theorem 3.5) in order to exhibit a similar formula for the $\hat\delta$-variations of $\tilde Y$:

Lemma 3.15. Under the hypotheses of Theorem 1.1, assume that $\tilde\psi \in B_{\gamma}$. Then the $\hat\delta$-variations of the solution $\tilde Y$ to (50) can be expanded as where $R^{\tilde Y}_{tus}$ : In particular, the bound (49) remains valid for $\tilde Y$ when replacing $\psi$ (resp. $W$) with $\tilde\psi$ (resp. $\tilde W$).
With these identifications in hand, the proof of Theorem 1.1 becomes a matter of a standard rough-paths argument, and we only sketch out the main steps of the procedure (see e.g. the proof of [10,Lemma 5.2] for further details on the computations).
Proof of Theorem 1.1. In order to compare Y with Ỹ , we can now rely on their respective decompositions (32) and (54). By setting g := f f ′ , we get that with a similar splitting for R Y − R Ỹ (based on the expansion (47)). Now, as in Lemma 3.11, we consider the following appropriate topology: By using the decomposition (55) and the bounds (52)-(53), standard differential calculus shows that for any subinterval for some constant λ > 0. As in Corollary 3.13, we can then rely on an elementary patching argument to reach the global bound (7).
Remark 3.16. The above strategy sheds new light on the classical Itô-Stratonovich correction phenomenon arising in the approximation of stochastic heat equations. Indeed, on the one hand, it emphasizes that the convergence of Ỹ towards Y reduces to the convergence of (L W̃ , L W̃ W̃ ) towards (L W , L W W ), and on the other, continuous bounds such as (53) clearly highlight the relevance of the Stratonovich interpretation of L W W in this context. In a way, the correction phenomenon is therefore more directly observed through the decomposition (34) of L W W as the sum of an Itô integral and a trace term.

Approximations in law
We now aim to prove our approximation result, that is Theorem 1.2. Thus, from now on, we assume that the hypotheses in Theorem 1.2 are all satisfied. Recall that the approximation processes involved in this statement, namely the Donsker and the Kac-Stroock approximations, have been specified in the Introduction (see (8) and (9)), as well as the notations W and β n,· . Besides, in this part of the paper we take T = 1 for the sake of simplicity.

4.1. Preliminary results. As a first step towards Theorem 1.2, we need to check that the processes we have constructed via W are indeed well-defined. To do so, we will make use of the following bound.
This inequality can be easily deduced from the following result, which is clear for r = 1 and was proved by Rosenthal for r > 1 (see [25,Thm. 3]).
Theorem 4.2. Let $Y_1, \dots, Y_n$ be independent centered random variables satisfying $E|Y_i|^{2r} < \infty$, where $r \ge 1$. Then, there exists a constant $C_r$ such that

The transition from real-valued to $\mathcal{B}_{\eta,2p}$-valued processes will be ensured by the following result.
Proposition 4.3. Let $(X_k)_{k\ge 1}$ be a sequence of centered i.i.d. random variables on some probability space $(\Omega, \mathcal{F}, P)$. Assume that each $X_k$ has moments of any order, and consider a sequence $(\lambda_k)_{k\ge 1}$ of positive numbers such that $\sum_{k\ge 1} \lambda_k k^{4\eta} < \infty$ for some (fixed) $\eta > 0$. Then, for every $p, q \ge 1$, the random series of functions $\sum_k \sqrt{\lambda_k}\, X_k e_k$ converges in $L^{2pq}(\Omega, \mathcal{B}_{\eta,2p})$ to an element $X$ which satisfies for some constant $C_{p,q,\lambda,\eta}$ which only depends on $p$, $q$ and $\sum_{k\ge 1} \lambda_k k^{4\eta}$.
Proof. Set $X^n := \sum_{k=1}^n \sqrt{\lambda_k}\, X_k e_k$ and observe that $\|X^m - X^n\|_{\mathcal{B}_{\eta,2p}} = \|X^{(m,n),\eta}\|_{L^{2p}(0,1)}$, where we have set $X^{(m,n),\eta}(\xi) := \sum_{k=n+1}^m k^{2\eta} \sqrt{\lambda_k}\, X_k e_k(\xi)$. Then, by Jensen's inequality, and thanks to Lemma 4.1, we get the bound (57) on $E\|X^{(m,n),\eta}\|_{L^{2p}(0,1)}^{2pq}$, due to the uniform bound $\|e_k\|_{\mathcal{B}_\infty} \le \sqrt{2}$. In particular, $E\|X^m - X^n\|_{\mathcal{B}_{\eta,2p}}^{2pq}$ tends to zero as both $m$ and $n$ tend to infinity, so that $X^n$ converges in $L^{2pq}(\Omega, \mathcal{B}_{\eta,2p})$. The bound (56) can of course be derived from (57).
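The identity used at the start of the proof, namely that the $\mathcal{B}_{\eta,2p}$-norm of a finite sum $\sum_k a_k e_k$ is the $L^{2p}(0,1)$-norm of $\sum_k k^{2\eta} a_k e_k$, can be illustrated numerically. The following Python sketch is only an illustration: it assumes the Dirichlet eigenfunctions $e_k(\xi) = \sqrt{2}\sin(k\pi\xi)$ (consistent with the uniform bound $\sqrt{2}$, but not stated in this section), and the function names and the midpoint quadrature are hypothetical choices.

```python
import math

def eigenfunction(k, xi):
    # Assumption: e_k(xi) = sqrt(2) * sin(k * pi * xi), the Dirichlet basis on (0, 1).
    return math.sqrt(2.0) * math.sin(k * math.pi * xi)

def sobolev_norm(coeffs, eta, p, num_points=2000):
    """Approximate the B_{eta,2p} norm of sum_k coeffs[k-1] * e_k.

    Following the proof, this is the L^{2p}(0,1) norm of
    sum_k k^{2 eta} * coeffs[k-1] * e_k, computed by a midpoint rule.
    """
    total = 0.0
    for j in range(num_points):
        xi = (j + 0.5) / num_points
        value = sum(k ** (2.0 * eta) * a * eigenfunction(k, xi)
                    for k, a in enumerate(coeffs, start=1))
        total += abs(value) ** (2 * p)
    return (total / num_points) ** (1.0 / (2 * p))

def partial_sum_coeffs(lambdas, xs, n):
    """Coefficients sqrt(lambda_k) * X_k (k = 1..n) of the partial sum X^n."""
    return [math.sqrt(lam) * x for lam, x in zip(lambdas[:n], xs[:n])]
```

Calling `sobolev_norm(partial_sum_coeffs(lambdas, xs, n), eta, p)` for increasing `n` gives a rough picture of how the partial sums stabilise when $\sum_k \lambda_k k^{4\eta}$ is finite.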
In particular, due to Hypothesis 1, we can conclude that $W^n = W(S^{n,\cdot})$ and $W^n = W(\theta^{n,\cdot})$ are indeed well-defined processes with values in $\mathcal{B}_{\eta,2p}$. Let us now get a little bit closer to the assumptions of Theorem 1.1 by checking that in both cases, $W^n$ admits an absolutely continuous version.

Proof. Since the (deterministic) approximation grid for $S^{n,k}$ does not depend on $k$, it is easily seen that
$$W(S^{n,\cdot})_t = W(S^{n,\cdot})_{\frac{i}{n}} + n\Big(t - \frac{i}{n}\Big)\Big(W(S^{n,\cdot})_{\frac{i+1}{n}} - W(S^{n,\cdot})_{\frac{i}{n}}\Big) \quad \text{if } t \in \Big[\frac{i}{n}, \frac{i+1}{n}\Big].$$
In particular, W(S n,· ) is a piecewise linear process (with values in B η,2p ) and accordingly it is absolutely continuous.
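The piecewise linear structure can be made concrete for a single real-valued component. The sketch below evaluates a Donsker-type interpolation at a time $t \in [0,1]$; it assumes the usual normalisation in which the value at $i/n$ is $n^{-1/2}(Z_1 + \dots + Z_i)$ (the precise definition (8) is given in the Introduction and is not reproduced here), and the function names are hypothetical.

```python
import math
import random

def rademacher_increments(n, rng):
    """n i.i.d. centered, unit-variance increments (here: +1 or -1)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def donsker_path(z, t):
    """Piecewise linear interpolation S^n_t built from z = [Z_1, ..., Z_n].

    Assumption: S^n at i/n equals (Z_1 + ... + Z_i) / sqrt(n), and S^n is
    linear on each interval [i/n, (i+1)/n].
    """
    n = len(z)
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    i = min(int(n * t), n - 1)               # t belongs to [i/n, (i+1)/n]
    left_value = sum(z[:i]) / math.sqrt(n)   # value at the grid point i/n
    # n * (t - i/n) * (S^n_{(i+1)/n} - S^n_{i/n}), as in the formula above
    return left_value + (n * t - i) * z[i] / math.sqrt(n)
```

On each interval the slope is $\sqrt{n}\, Z_{i+1}$, so the path is Lipschitz for fixed $n$, which is the absolute continuity used in the text.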
As far as the Kac-Stroock approximation is concerned, first we can see that it has a continuous version with values in $\mathcal{B}_{\eta,2p}$. Indeed, applying Proposition 4.3, For the sake of clarity, we will also denote by $W(\theta^{n,\cdot})$ this continuous version. To prove the existence of an absolutely continuous version, we will see that with probability 1, where $\dot\theta^n_t := \sqrt{n}\,(-1)^{\zeta + N(nt)}$. Indeed, thanks to Proposition 4.3, $W(\dot\theta^{n,\cdot})_t$ is well-defined for every $t \in [0,1]$ as an element of $L^{2p}(\Omega, \mathcal{B}_{\eta,2p})$ and As a consequence $W(\dot\theta^{n,\cdot})$ is (a.s.) Bochner-integrable. Moreover, for each $t \in [0,1]$, by similar arguments as in the proof of Proposition 4.3. Since the last expression tends to 0 as $N \to \infty$, we obtain that for each $t \in [0,1]$, $W(\theta^{n,\cdot})_t = \int_0^t W(\dot\theta^{n,\cdot})_s \, ds$ a.s.

4.2. A general convergence criterion. One of our key ingredients to prove Theorem 1.2 via Theorem 1.1 lies in the following statement, which puts forward sufficient conditions for an approximation of the noise (defined on the same probability space) to converge with respect to the topology involved in (7).

Proposition 4.5. Let $(\beta^n)_{n\ge 1}$ be a sequence of centered processes and $\beta$ a Brownian motion, all defined on the same probability space $(\Omega, \mathcal{F}, P)$, and such that the following two conditions are satisfied:
(i) For every integer $p \ge 1$, there exists a constant $C_p$ such that for all $s, t \in [0,1]$ and all $n \ge 1$, $E|\beta^n_t - \beta^n_s|^{2p} \le C_p |t-s|^p$.
(ii) For every integer p ≥ 1, there exists a constant C p such that for all n ≥ 1, for some fixed parameter ν > 0. Then if we consider independent copies (β n,k ) k≥1 (resp. (β k ) k≥1 ) of β n (resp. β) on a same probability space, we have that, for any integer p ≥ 1 and any ε > 0, a.s.
Let us first see how to combine the above conditions (i) and (ii) so as to exhibit convergent bounds in Hölder topology.
Lemma 4.6. Under the hypotheses of Proposition 4.5, for all integers n, p ≥ 1, all ε ∈ (0, 1) and s < t ∈ [0, 1], one has for some constant C p which only depends on p.
Proof. If $|t - s| \le n^{-\nu}$, then due to the condition (i), it holds that On the other hand, if $|t - s| > n^{-\nu}$, one has, thanks to the condition (ii),

Proof of Proposition 4.5. By using successively Proposition 4.3 and Lemma 4.6, we get, for any q ≥ 1, We are thus in a position to apply the Garsia-Rodemich-Rumsey Lemma 6.1 (with δ * = δ) and assert that, for q large enough, As a result, it holds that which, thanks to the Borel-Cantelli Lemma, leads us to the conclusion, that is as n tends to infinity.
Example: As an immediate illustration of Proposition 4.5, let us consider here the Wong-Zakai approximation of a given noise W satisfying Hypothesis 1. Precisely, set and denote by Y n the solution of the equation understood in the classical Riemann-Lebesgue sense. Note that W n can be equivalently described as follows: with the expansion (3) of W in mind, i.e. W = W(β · ), we have that W n = W(β n,· ), where, for each k ≥ 1, β n,k stands for the linear interpolation of β k with mesh 1/n. Therefore, it suffices to check that the conditions (i) and (ii) in Proposition 4.5 are satisfied by β n := β n,1 , which is a matter of elementary computations (it can also be seen as a particular case of the forthcoming Proposition 4.8).
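For a single Brownian component, the linear interpolation with mesh $1/n$ can be sketched as follows. This is only an illustration under simplifying assumptions: the Brownian path is sampled on a finer grid that refines the mesh $1/n$, and the helper names are hypothetical.

```python
import math
import random

def brownian_on_grid(num_steps, rng):
    """Sample a Brownian path at the times j / num_steps, j = 0..num_steps."""
    dt = 1.0 / num_steps
    path = [0.0]
    for _ in range(num_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def wong_zakai(path, n):
    """Linear interpolation with mesh 1/n of a path sampled on a finer grid.

    Assumption: len(path) - 1 is a multiple of n, so that every node i/n
    is also a point of the fine grid.
    """
    fine = len(path) - 1
    if fine % n != 0:
        raise ValueError("the fine grid must refine the mesh 1/n")
    m = fine // n                      # fine steps per coarse interval
    out = []
    for j in range(fine + 1):
        i = min(j // m, n - 1)         # j/fine lies in [i/n, (i+1)/n]
        left, right = path[i * m], path[(i + 1) * m]
        out.append(left + (j - i * m) / m * (right - left))
    return out

def sup_distance(a, b):
    """Uniform distance between two paths sampled on the same grid."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Evaluating `sup_distance(wong_zakai(path, n), path)` for increasing `n` on one fixed sample path gives an informal view of the pathwise convergence, in the uniform norm only and not in the Hölder-type topology of Proposition 4.5.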
Together with Theorem 1.1, we retrieve the following almost sure approximation result: Under the hypotheses of Theorem 1.1, let Y n be the Wong-Zakai approximation of (4) with mesh 1/n and initial condition ψ. Then, as n → ∞, one has N [Y − Y n ; C 0 (B γ )] → 0 a.s.

This almost sure result in a non-linear situation is closely related to those of [5] or [2], where Wong-Zakai approximations for some parabolic type equations have been considered. We also note that convergence in law for this type of approximation in the framework of stochastic evolution equations has been studied in [29].

Now, let us turn to the proof of the weak approximation results of Theorem 1.2, which successively involve the Donsker approximation β n = S n and the Kac-Stroock approximation β n = θ n . In both cases, we wish to exploit the criterion of Proposition 4.5, which naturally leads us to the following 2-step procedure:

Step 1: Show that Condition (i) is satisfied, i.e., $\sup_n E|\beta^n_t - \beta^n_s|^{2p} \le C_p |t-s|^p$.
Once these two conditions have been checked, the proof of the weak convergence Y n → Y in C 0 (B γ ) becomes a straightforward consequence of Theorem 1.1 and Proposition 4.5, since W(β̄ n,· ) ∼ W(β n,· ) and accordingly, if Ȳ n denotes the solution of (6) associated with W̄ n := W(β̄ n,· ), it holds that Ȳ n ∼ Y n .
Note that for both approximations S n and θ n , the result in Step 2 will be derived from a Skorokhod embedding argument (see [27]). In the Donsker situation (Proposition 4.9), this relies on a classical strategy towards the celebrated invariance principles (see [22,Section 5.3]). In the Kac-Stroock situation (Proposition 4.11), we will take advantage of an identification result due to Griego, Heath and Ruiz-Moncayo (see [14]).

4.3. Donsker approximation. Here, we proceed to tackle the above 2-step procedure for the Donsker approximation S n .
Step 1 (Donsker case): Proposition 4.8. For every p ≥ 1, there exists a positive constant $C_p$ such that, for all 0 ≤ s < t ≤ 1, $\sup_{n\ge 1} E|S^n_t - S^n_s|^{2p} \le C_p |t-s|^p$.

Proof. First, note that S n can also be expressed as Then, by Lemma 4.1, we have

Step 2 (Donsker case): Proposition 4.9. Let (Z i ) i∈N be a sequence of i.i.d. centered random variables with unit variance. Then, there exists a probability space (Ω̄, F̄, P̄), a Brownian motion β̄ defined on it and, for each n ≥ 1, a family of independent random variables (Z ..,n with the same law as Z i , such that the following is satisfied. Set Then, for every integer p ≥ 1,

Proof. As mentioned earlier, the proof is based on a general Skorokhod embedding theorem (see [27, p. 163]), which, in our particular situation, can be stated as follows: there exists a probability space (Ω̄, F̄, P̄), a Brownian motion β̄ defined on it and, for each n ∈ N, a sequence {τ ..,n of independent and positive random variables such that the random vector , . . . , β̄ τ (n) 1 + · · · + τ (n) n has the same law as Moreover, it holds that E τ Set T (n) 0 := 0 and T (n) i for i ≥ 1. With this notation, we can infer that We now define Z̄ Observe that, if $t \in [\frac{i-1}{n}, \frac{i}{n}]$, and hence, for $t \in [\frac{i-1}{n}, \frac{i}{n}]$, we have that Thus, we only need to bound the first term in the latter expression, and to this end, we will use the following decomposition: On the one hand, the maximal inequality for Brownian motion yields On the other hand, by the Cauchy-Schwarz inequality, we have Note that, by Lemma 4.1, Thus, in order to estimate the term $A^n_2$, we only need to study the probability appearing in (61). To do so, observe first that since $t \in [\frac{i-1}{n}, \frac{i}{n}]$, we have, for n such that $\frac{1}{2} n^{-1/4} > \frac{1}{n}$ (that is, for n ≥ 3), Then, using again Lemma 4.1, we get Therefore, $A^n_2 \le C_p n^{-p/4}$, which concludes the proof.

4.4. Kac-Stroock approximation. Along the same lines as in the Donsker case, we proceed now to analyze the Kac-Stroock approximations based on θ n .
Step 1 (Kac-Stroock case): Proposition 4.10. For every integer p ≥ 1, there exists a positive constant $C_p$ such that, for all 0 ≤ s < t ≤ 1, $\sup_{n\ge 1} E|\theta^n_t - \theta^n_s|^{2p} \le C_p |t-s|^p$.

Proof. We have that where in the latter equality we have used the symmetry of the integrand. Taking into account that the two possible values of the random variable $(-1)^{N(nu_1) + \dots + N(nu_{2p})}$ depend only on whether the exponent is even or odd, we can write the latter expression above as Using that for $u_1 < u_2 < \dots < u_{2p}$, the random variables $N(nu_{2i}) - N(nu_{2i-1})$ are independent with Poisson distribution of parameter $n(u_{2i} - u_{2i-1})$, we have that (65) is equal to This term can be bounded by This concludes the proof.
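To make the object of Proposition 4.10 more tangible, the following sketch simulates one path of a Kac-Stroock-type process by integrating the derivative $\sqrt{n}\,(-1)^{\zeta + N(ns)}$ exactly between the jumps of $s \mapsto N(ns)$. It is a minimal illustration: $N$ is taken to be a standard Poisson process (so the jumps occur at rate $n$), $\zeta$ is assumed to be uniform on $\{0,1\}$ and independent of $N$, and the function names are hypothetical.

```python
import math
import random

def kac_stroock_jumps(n, rng):
    """Jump times in [0, 1) of s -> N(n s) (rate n) and the sign variable zeta.

    Assumption: zeta is uniform on {0, 1} and independent of N.
    """
    zeta = rng.randrange(2)
    jumps = []
    t = rng.expovariate(n)
    while t < 1.0:
        jumps.append(t)
        t += rng.expovariate(n)
    return jumps, zeta

def theta(n, jumps, zeta, t):
    """theta^n(t) = sqrt(n) * integral over [0, t] of (-1)^(zeta + N(n s)) ds."""
    sign = -1.0 if zeta else 1.0
    total, prev = 0.0, 0.0
    for tau in jumps:
        if tau >= t:
            break
        total += sign * (tau - prev)   # the integrand is constant between jumps
        prev, sign = tau, -sign        # each jump of N flips the sign
    total += sign * (t - prev)
    return math.sqrt(n) * total
```

Averaging `abs(theta(n, jumps, zeta, t) - theta(n, jumps, zeta, s)) ** (2 * p)` over many independent samples gives a rough Monte Carlo estimate of the moment controlled in Proposition 4.10, which can be compared informally with $|t-s|^p$.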
Step 2 (Kac-Stroock case): Proposition 4.11. There exists a probability space (Ω̄, F̄, P̄), a Brownian motion β̄ defined on it and, for each n ∈ N, a process θ̄ n with the same law as θ n in (9) such that, for any ν ∈ (0, 1/4) and any p ∈ N, for some constant C p,ν .
Proof. First of all, it is clear that we can suppose p(1/4 − ν) ≥ 1 (otherwise, we can use Jensen's inequality). Then, following the lines of [14, Section 2], we consider a probability space (Ω̄, F̄, P̄) with the following mutually independent objects defined on it: (i) a Brownian motion β̄, (ii) for each n ∈ N, a sequence of independent random variables {ξ has an exponential distribution with parameter 2√n, and, for each m ∈ N, Then, let θ̄ n = {θ̄ n (t), t ≥ 0} be a piecewise linear process given on the grid T and θ̄ n (0) = 0. The τ (n) i 's are independent random variables exponentially distributed with parameter 2n, and it is proved in [14] that the process θ̄ n thus defined has the same law as θ n .

Now, to show (66), we decompose the term E |β̄(t) − θ̄ n (t)| 2p as the sum of the following two terms: for some ℓ = 1, . . . , 8n, we have that where, for the last inequality, we have used the fact that β̄(T (68) The first term in (68) can be bounded with the same argument as in the proof of Proposition 4.9 (see (60)), which gives and since we only have to focus on the probability appearing in (69). To do so, let us notice that P t ∈ A n ℓ , |t − T where we have used the same argument as in (63) to get the last bound. Going back to (67), we deduce that $E^n_1 \le C_p\, n \cdot n^{-p/4} \le C_p\, n^{-\nu p}$, since p is assumed to satisfy p(1/4 − ν) ≥ 1.

Finally, we must deal with $E^n_2$. In fact, we have that E n 2 ≤ E θ̄ n (t) − β̄(t) j 's are independent random variables exponentially distributed with parameter 2n. The latter probability can be bounded by using Stirling's inequality, as follows: This lets us conclude the proof.

Appendix A: fractional Sobolev spaces
We gather here some classical properties of the fractional Sobolev spaces (B α,p ) α∈R,p∈N , which are extensively used throughout the paper. We recall the notations B α for B α,2 and B for B 0 . Let us first label the following well-known regularizing properties of the semigroup (see [23]).
Let us also label here the classical Sobolev embedding which yields in particular: B α ⊂ B ∞ as soon as α > 1/4.

Appendix B: A priori estimates on the solution
It only remains to prove the two a priori controls (42) and (43) for the solution Y of (4) (or equivalently (24)). To do so, we will rely on the following result, taken from [11, Lemma 6.5], which extends the classical Garsia-Rodemich-Rumsey lemma in two directions: 1) it covers the case of δ̂-variations and 2) it applies to more general processes defined on the 2-dimensional simplex $S_2 = \{(t,s) \in [0,T]^2 : t \ge s\}$.

Proof of Lemma 3.10. In both cases, we will resort to the previous lemma, which essentially reduces the problem to moment estimates. Thus, the following Burkholder-Davis-Gundy type inequality (borrowed from [9, Lemma 7.7]) naturally comes into play: for every α ≥ 0, one has, by setting U 0 := Q 1/2 (B), where the notation HS(U 0 , B α ) refers to the space of Hilbert-Schmidt operators defined on U 0 and taking values in B α . Note also that the family (λ k e k ) defines an orthonormal basis of U 0 and accordingly Now, to show that Y ∈ Ĉ 2η (B ∞ ), observe first that for every q ≥ 1 and any small ε > 0, The second summand in (80) is trivially bounded by $c_p |t-s|^{2q(\frac{3}{4} - \varepsilon)}$ since