A Theory of Hypoellipticity and Unique Ergodicity for Semilinear Stochastic PDEs

We present a theory of hypoellipticity and unique ergodicity for semilinear parabolic stochastic PDEs with "polynomial" nonlinearities and additive noise, considered as abstract evolution equations in some Hilbert space. It is shown that if Hörmander's bracket condition holds at every point of this Hilbert space, then a lower bound on the Malliavin covariance operator $M_t$ can be obtained. Informally, this bound can be read as "Fix any finite-dimensional projection $\Pi$ onto a subspace of sufficiently regular functions. Then the eigenfunctions of $M_t$ with small eigenvalues have only a very small component in the image of $\Pi$." We also show how to use a priori bounds on the solutions to the equation to obtain good control on how the bounds on the Malliavin matrix depend on the initial condition. These bounds are sufficient in many cases to obtain the asymptotic strong Feller property introduced in [HairerMattingly06]. One of the main novel technical tools is an almost sure lower bound on the size of "Wiener polynomials" whose coefficients are possibly non-adapted stochastic processes satisfying a Lipschitz condition. By exploiting the polynomial structure of the equations, this result can be used as a replacement for Norris' lemma, which is unavailable in the present context. We conclude by showing that the two-dimensional stochastic Navier-Stokes equations and a large class of reaction-diffusion equations fit the framework of our theory.


Introduction
The overarching goal of this article is to prove the unique ergodicity of a class of nonlinear stochastic partial differential equations (SPDEs) driven by a finite number of Wiener processes. In this section, we give an overview of the setting and the results to come, without descending into all of the technical assumptions required to make everything precise. This imprecision will be rectified starting with Section 3, where the setting and basic assumptions are detailed.
In this article we investigate nonlinear equations of the form
$$du = -Lu\,dt + N(u)\,dt + \sum_{k=1}^{d} g_k\,dW_k(t). \qquad (1.1)$$
Here $L$ is a positive self-adjoint operator with compact resolvent; typical examples arising in applications are $L = -\Delta$ or $L = \Delta^2$. The drift $N$ is assumed to be a "polynomial" nonlinearity in the sense that $N(u) = \sum_{k=1}^{m} N_k(u)$, where $N_k$ is $k$-multilinear. Examples of admissible nonlinearities are the Navier-Stokes nonlinearity $(u \cdot \nabla)u$ or a reaction term such as $u - u^3$. The $g_k$ are a collection of smooth, time-independent functions which dictate the "directions" in which the randomness is injected. The $\{\dot W_k : k = 1, \dots, d\}$ are a collection of mutually independent one-dimensional white noises, understood as the formal derivatives of independent Wiener processes through the Itô calculus. We assume that the possible loss of regularity due to the nonlinearity is controlled by the smoothing properties of the semigroup generated by $L$; see Assumption A.1 below for a precise meaning.
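To fix ideas, here is a minimal numerical sketch, ours rather than the paper's, of an equation of the form (1.1): a spectral Galerkin truncation with $L = -\partial_x^2$ on $[0,\pi]$ (Dirichlet boundary conditions), the reaction nonlinearity $N(u) = u - u^3$, and $d = 2$ forced sine modes. All discretisation choices (mode count, time step, semi-implicit scheme) are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration of (1.1): du = (-Lu + N(u)) dt + sum_k g_k dW_k
# with L = -d^2/dx^2 on [0, pi] (Dirichlet), N(u) = u - u^3, and d = 2
# forced sine modes g_1 = sin x, g_2 = sin 2x.
rng = np.random.default_rng(0)
n_modes, dt, n_steps = 32, 1e-3, 1000
k = np.arange(1, n_modes + 1)
lam = k.astype(float) ** 2                 # eigenvalues of L = -Laplacian
x = np.linspace(0.0, np.pi, 201)
dx = x[1] - x[0]
basis = np.sin(np.outer(k, x))             # sine eigenbasis, one mode per row

g = np.zeros((2, n_modes))                 # forcing directions in mode space
g[0, 0] = 1.0                              # g_1 = sin x
g[1, 1] = 1.0                              # g_2 = sin 2x

def nonlinearity(u_hat):
    """Evaluate N(u) = u - u^3 pointwise, project back onto the sine modes."""
    u = u_hat @ basis
    nu = u - u ** 3
    return (nu @ basis.T) * dx * 2.0 / np.pi   # approximate L^2 projection

u_hat = np.zeros(n_modes)
u_hat[0] = 1.0                             # initial condition u_0 = sin x
for _ in range(n_steps):
    dW = rng.normal(scale=np.sqrt(dt), size=2)
    # semi-implicit Euler: the stiff linear part is treated implicitly
    u_hat = (u_hat + dt * nonlinearity(u_hat) + dW @ g) / (1.0 + dt * lam)

print(np.max(np.abs(u_hat)))               # dissipation keeps the solution bounded
```

Note how the noise enters only through the two fixed directions $g_1, g_2$, while the high modes are damped by the semigroup generated by $L$: exactly the situation described above.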
On one hand, our restriction to a finite number of driving Wiener processes avoids the technical difficulties generated by spatially rough solutions, since $W(x,t) = \sum_{k=1}^{d} g_k(x) W_k(t)$ has the same regularity in $x$ as the $g_k$, which we take to be relatively smooth. On the other hand, the fact that $W$ contains only a finite number of Wiener processes means that our dynamics is very far from being uniformly elliptic in any sense: for fixed $t$, $u(\,\cdot\,, t)$ is an infinite-dimensional random variable, while the noise acts on only a finite number of degrees of freedom. To prove an ergodic theorem, we must understand how the randomness injected by $W$ in the directions $\{g_k : k = 1, \dots, d\}$ spreads through the infinite-dimensional phase space. To do this, we prove the nondegeneracy of the Malliavin covariance matrix under the assumption that the linear span of the successive Lie brackets of the vector fields associated to $N$ and the $g_k$ is dense in the ambient (Hilbert) space at each point. This is very reminiscent of the condition in the "weak" version of Hörmander's "sum of squares" theorem. It ensures that the randomness spreads to a dense set of directions despite being injected in only a finite number of them. This is possible because, although the randomness is injected in finitely many directions, it is injected over the entire time interval from zero to the current time.
In finite dimensions, bounds on the norm of the inverse of the Malliavin matrix are the critical ingredient in proving ergodic theorems for diffusions which are only hypoelliptic rather than uniformly elliptic. Such a bound shows that the system has a smooth density with respect to Lebesgue measure. In infinite dimensions, there is no measure which plays the "universal" role of Lebesgue measure, and one must therefore pass through a different set of ideas. Furthermore, it is not obvious how to generalise the notion of the 'inverse' of the Malliavin matrix. A linear map of a finite-dimensional space to itself has dense range if and only if it admits a bounded inverse. In infinite dimensions, these two notions are very far from equivalent and, while it is possible in some cases to show that the Malliavin matrix has dense range, it is hardly ever possible in a hypoelliptic setting to show that it is invertible, or even to characterise its range in a satisfactory manner (see [MSVE07] for a linear example in which it is possible).
The important fact which must be established is that nearby points act similarly from a "measure-theoretic perspective". One classical way to make this precise is to prove that the Markov process in question has the strong Feller property; for a continuous-time Markov process, this is equivalent to the transition probabilities being continuous in the total variation norm. While this concept is useful in finite dimensions, it is much less useful in infinite dimensions. In particular, there are many natural infinite-dimensional Markov processes whose transition probabilities do not converge in total variation to the system's unique invariant measure (see Examples 3.14 and 3.15 from [HM06] for more discussion of this point). In these settings, this fact also precludes the use of "minorisation" conditions such as $\inf_{x \in C} \mathcal{P}_t(x, \,\cdot\,) \ge c\,\nu(\,\cdot\,)$ for some fixed probability measure $\nu$ and "small set" $C$ (see [MT93, GM06] for more examples where this can be used).

Ergodicity in infinite dimensions and main result
In [HM06], the authors introduced the asymptotic strong Feller property. Loosely speaking, it ensures that transition probabilities are uniformly continuous in a sequence of 1-Wasserstein distances which converge to the total variation distance as time progresses. For the precise definitions, we refer the reader to [HM06]. For our present purpose, it is sufficient to recall the following proposition:

Proposition 1.1 (Proposition 3.12 from [HM06]) Let $t_n$ and $\delta_n$ be two positive sequences with $\{t_n\}$ non-decreasing and $\{\delta_n\}$ converging to zero. A semigroup $\mathcal{P}_t$ on a Hilbert space $\mathcal{H}$ is asymptotically strong Feller if, for all $\varphi \colon \mathcal{H} \to \mathbf{R}$ with $\|\varphi\|_\infty$ and $\|D\varphi\|_\infty$ finite, one has
$$|D \mathcal{P}_{t_n} \varphi(u)| \le C(\|u\|)\bigl(\|\varphi\|_\infty + \delta_n \|D\varphi\|_\infty\bigr) \qquad (1.2)$$
for all $n$ and $u \in \mathcal{H}$, where $C \colon \mathbf{R}_+ \to \mathbf{R}$ is a fixed non-decreasing function.
The importance of the asymptotic strong Feller property stems from the following result, which states that in this case any two distinct ergodic invariant measures must have disjoint topological supports. Recalling that $u$ belongs to the support of a measure $\mu$ (denoted $\operatorname{supp}(\mu)$) if $\mu(B_\delta(u)) > 0$ for every $\delta > 0$, we have:

Theorem 1.2 (Theorem 3.16 from [HM06]) Let $\mathcal{P}_t$ be a Markov semigroup on a Polish space $X$ admitting two distinct ergodic invariant measures $\mu$ and $\nu$. If $\mathcal{P}_t$ has the asymptotic strong Feller property, then $\operatorname{supp}(\mu) \cap \operatorname{supp}(\nu)$ is empty.
To better understand how the asymptotic strong Feller property can be used to connect topological and ergodic properties of $\mathcal{P}_t$, we introduce the following form of topological irreducibility.

Definition 1.3 We say that a Markov semigroup $\mathcal{P}_t$ is weakly topologically irreducible if for all $u_1, u_2 \in \mathcal{H}$ there exists a $v \in \mathcal{H}$ such that for any open set $A$ containing $v$ there exist $t_1, t_2 > 0$ with $\mathcal{P}_{t_i}(u_i, A) > 0$.
Also recall that $\mathcal{P}_t$ is said to be Feller if $\mathcal{P}_t \varphi$ is continuous whenever $\varphi$ is bounded and continuous. We then have the following corollary to Theorem 1.2, whose proof is given in Section 2.

Corollary 1.4 Any Markov semigroup $\mathcal{P}_t$ on a Polish space which is Feller, weakly topologically irreducible, and asymptotically strong Feller admits at most one invariant probability measure.
The discussion of this section shows that unique ergodicity can be obtained for a Markov semigroup by showing that:
1. it satisfies the asymptotic strong Feller property;
2. there exists an "accessible point" which must belong to the topological support of every invariant probability measure.

It turns out that if one furthermore has some control on the speed at which solutions return to bounded regions of phase space, one can prove the existence of a spectral gap in a weighted Wasserstein-1 metric [HM08].
The present article mainly concentrates on the first point. This is because, by analogy with the finite-dimensional case, one can hope to find a clean, easy-to-verify condition along the lines of Hörmander's bracket condition that ensures a regularisation property like the asymptotic strong Feller property. Concerning the accessibility of points, although one can usually use the Stroock-Varadhan support theorem to translate this into a deterministic question of approximate controllability, it can be a very hard problem, even in finite dimensions. While geometric control theory can give qualitative information about the set of reachable points [Jur97, AS04], the verification of the existence of accessible points seems to rely in general on ad hoc considerations, even in apparently simple finite-dimensional problems. We will however verify in Section 8.3 below that for the stochastic Ginzburg-Landau equation there exist accessible points under very weak conditions on the forcing.
With this in mind, the aim of this article is to prove the following result:

Theorem 1.5 Consider the setting of (1.1) on some Hilbert space $\mathcal{H}$ and define a sequence of subsets of $\mathcal{H}$ recursively by $A_0 = \{g_j : j = 1, \dots, d\}$ and
$$A_{n+1} = A_n \cup \{N_m(h_1, \dots, h_m) : h_j \in A_n\},$$
where $N_m$ denotes the leading $m$-linear part of $N$. If the linear span of $A_\infty \stackrel{\mathrm{def}}{=} \bigcup_{n > 0} A_n$ is dense in $\mathcal{H}$, then the Markov semigroup $\mathcal{P}_t$ associated to (1.1) has the asymptotic strong Feller property.
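The recursion in Theorem 1.5 is easy to experiment with in a finite-dimensional stand-in. The sketch below is entirely illustrative: the bilinear map `N2` is a made-up convolution-type mode coupling on $\mathbf{R}^n$ (not one of the paper's examples), chosen so that $N_2(e_i, e_j) = e_{i+j}$, and the single forced direction is $g_1 = e_1$. Iterating the recursion then spans all of $\mathbf{R}^n$:

```python
import numpy as np

# Toy finite-dimensional stand-in for the recursion of Theorem 1.5:
# H = R^n with basis e_1, ..., e_n, a hypothetical symmetric bilinear map
# N_2(e_i, e_j) = e_{i+j} (truncated to zero when i + j > n), and the single
# forced direction A_0 = {e_1}.  Three rounds of
# A_{n+1} = A_n ∪ {N_2(h, h') : h, h' ∈ A_n} already span all of R^8.
n = 8

def N2(u, v):
    """Convolution-type coupling: (e_i, e_j) -> e_{i+j}, truncated at n."""
    w = np.convolve(u, v)[:n]      # (u*v)_k = sum over i+j=k of u_i v_j
    out = np.zeros(n)
    out[1:] = w[:-1]               # index shift for the 1-based mode labels
    return out

A = [np.eye(n)[0]]                 # A_0 = {g_1} = {e_1}
for _ in range(3):                 # three levels of the bracket recursion
    A = A + [N2(h, hp) for h in A for hp in A]
rank = np.linalg.matrix_rank(np.array(A))
print(rank)                        # full rank: the brackets span R^8
```

The same span computation, applied to a Galerkin truncation of a concrete nonlinearity, is how one would check the density hypothesis of Theorem 1.5 in practice on a finite-dimensional projection.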
The precise formulation of Theorem 1.5 is given in Theorems 5.4 and 6.7 below. Note that our general theorem is slightly stronger than Theorem 1.5, since it allows one to consider arbitrary "non-constant" Lie brackets between the driving noises and the drift; see (1.4) below. As further discussed in Sections 1.5 and 3.1, $N_m(h_1, \dots, h_m)$ is proportional to $D_{h_1} \cdots D_{h_m} N(u)$, where $D_h$ denotes the Fréchet derivative in the direction $h$. In turn, this equals the successive Lie bracket of $N$ with the constant vector fields $h_1$ through $h_m$.
Under the same assumptions as Theorem 1.5, the existence of densities for the finite-dimensional projections of $\mathcal{P}_t(x, \,\cdot\,)$ was proven in [BM07]. The smoothness of these densities was also discussed in [BM07], but unfortunately there were two errors in the proof, which require us to give a closely related but modified argument in Sections 6 and 7 to prove the needed results.
The remainder of this section is devoted to a short discussion of the main techniques used in the proof of such a result, and in particular of how to obtain a bound of the type (1.2) for a parabolic stochastic PDE.

A roadmap for the impatient
Readers eager to get to the heart of this article but understandably reluctant to dig into too many technicalities may want to finish reading Section 1, then jump directly to Section 5 and read up to the end of Section 5.3 to get a good idea of how (1.2) is obtained from bounds on the Malliavin matrix. They may then want to go to the beginning of Section 6 and read to the end of Section 6.4 to see how these bounds are obtained.

How to obtain a smoothing estimate
A more technical overview of the technique will be given in Section 5.2 below. In a nutshell, our aim is to generalise the arguments from [HM06] and the type of Malliavin calculus estimates first developed in [MP06] to a large class of semilinear parabolic SPDEs with polynomial nonlinearities. Both previous works relied on the particular structure of the Navier-Stokes equations. The technique of proof can be interpreted as an "infinitesimal" version of techniques developed in [EMS01,KS00] and extended in [BKL01,MY02,Mat02,Hai02,BM05] combined with detailed lower bounds on the Malliavin covariance matrix of the solution.
In [EMS01] the idea was the following: take two distinct initial conditions $u_0$ and $u_0'$ for (1.1) and a realisation $W$ of the driving noise. Then try to find a shift $v$ belonging to the Cameron-Martin space of the driving process such that $\|u(t) - u'(t)\| \to 0$ as $t \to \infty$, where $u'$ is the solution to (1.1) starting from $u_0'$ and driven by the shifted noise $W' = W + v$. Girsanov's theorem then ensures that the two initial conditions induce equivalent measures on the infinite future, which in turn implies the unique ergodicity of the system (see also [Mat08] for more details). The idea advocated in [HM06] is to consider an infinitesimal version of this construction. Fix again an initial condition $u_0$ and a Wiener trajectory $W$, but consider now an infinitesimal perturbation $\xi$ of the initial condition instead of a second initial condition at distance $O(1)$. This produces an infinitesimal variation in the solution $u_t$, given by its Fréchet derivative $D_\xi u_t$ with respect to $u_0$. Similarly to before, one can then consider the "control problem" of finding an infinitesimal variation of the Wiener process in a direction $h$ from the Cameron-Martin space which, for large times $t$, compensates the effect of the variation $\xi$. Since the effect on $u_t$ of an infinitesimal variation of the Wiener process is given by the Malliavin derivative of $u_t$ in the direction $h$, denoted by $\mathscr{D}_h u_t$, the problem in this setting is to find an $h(\xi, W) \in L^2([0,\infty), \mathbf{R}^d)$ with
$$\mathbf{E}\|D_\xi u_t - \mathscr{D}_h u_t\| \to 0 \quad \text{as } t \to \infty, \qquad (1.3)$$
and such that the expected "cost" of $h$ is finite. Here, the Malliavin derivative $\mathscr{D}_h u_t$ is given by the derivative in $\varepsilon$ at $\varepsilon = 0$ of $u_t(W + \varepsilon v)$, with $v(t) = \int_0^t h(s)\,ds$. If $h$ is adapted to the filtration generated by $W$, then the expected cost is simply $\int_0^\infty \mathbf{E}\|h_s\|^2\,ds$. If it is not adapted, one must instead estimate directly $\limsup_{t \to \infty} \mathbf{E}\bigl|\int_0^t h_s\,dW_s\bigr|$, where the integral is a Skorokhod integral.
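The control problem (1.3) can be solved by hand in a toy case. The following sketch, ours and deliberately oversimplified, takes the scalar linear SDE $du = a u\,dt + g\,dW$: there $D_\xi u_T = e^{aT}\xi$ and $\mathscr{D}_h u_T = \int_0^T e^{a(T-s)} g\, h(s)\,ds$, and the (non-unique) choice $h(s) = \xi e^{as}/(gT)$ compensates the initial-condition variation exactly at time $T$:

```python
import numpy as np

# Toy version (ours) of the control problem (1.3) for du = a*u dt + g dW.
# The variation w.r.t. the initial condition is D_xi u_T = exp(a*T) * xi,
# while a Cameron-Martin shift h contributes
# D_h u_T = int_0^T exp(a*(T-s)) * g * h(s) ds.  The choice
# h(s) = xi * exp(a*s) / (g*T) makes the two coincide exactly at time T.
def trapezoid(f, t):
    """Composite trapezoid rule for samples f on the grid t."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t) / 2.0))

a, g, xi, T = 0.5, 1.0, 1.0, 2.0
s = np.linspace(0.0, T, 20001)
h = xi * np.exp(a * s) / (g * T)

D_xi = np.exp(a * T) * xi                          # effect of perturbing u_0
D_h = trapezoid(np.exp(a * (T - s)) * g * h, s)    # effect of the shift h
residual = D_xi - D_h                              # ~ 0: variation compensated
cost = trapezoid(h ** 2, s)                        # L^2 "cost" of the shift

print(residual, cost)
```

In this adapted, deterministic toy case the cost is $\int_0^T h(s)^2\,ds = \xi^2(e^{2aT}-1)/(2 a g^2 T^2)$; the whole difficulty of the paper lies in producing such an $h$, with finite expected cost, when the unstable directions are not directly forced and $h$ is no longer adapted.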
As will be explained in detail in Section 5.2, once one establishes (1.3) with an $h$ of finite expected cost, the crucial estimate (1.2) (used to prove the asymptotic strong Feller property) follows by a fairly general procedure.
As this discussion makes clear, one of our main tasks will be to construct a shift $h$ with the property (1.3). We will distinguish three cases of increasing generality (and technical difficulty). In the first case, referred to as strongly contracting (see Section 5.1.1), the linearised dynamics contracts pathwise without modification (all Lyapunov exponents are negative); hence $h$ can be taken identically zero. The next level of complication arises when the system possesses a number of directions which are unstable on average. The simplest way to deal with this situation is to assume that the complement of the span of the forced directions (the $g_k$'s) is contracting on average; this was the case in [EMS01, KS00, BKL01, MY02, Mat02, Hai02, BM05]. We refer to this as the "essentially elliptic" setting, since the directions essential for determining the system's long-time behaviour, the unstable directions, are directly forced. This reflects the maxim in dynamical systems that the long-time behaviour is determined by the behaviour in the unstable directions. Since the noise affects all of these directions, it is not surprising that the system is uniquely ergodic; see Section 4.5 of [HM06] for more details.
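The strongly contracting case can be seen in a one-line experiment. In this sketch (ours; the drift $-(u + u^3)$ is a standard strongly dissipative example, not taken from the paper) two solutions driven by the same noise satisfy $d(u-v)/dt \le -(u-v)$ pathwise, so they synchronise and the shift $h \equiv 0$ suffices:

```python
import numpy as np

# Sketch of the "strongly contracting" case: for du = -(u + u^3) dt + dW,
# the difference of two solutions driven by the SAME noise realisation obeys
# d(u - v)/dt <= -(u - v) pathwise (the noise cancels exactly), so the
# trajectories synchronise and the Wiener shift h can be taken to be zero.
rng = np.random.default_rng(1)
dt, T = 1e-3, 5.0
u, v = 2.0, -1.0                        # two different initial conditions
for _ in range(int(T / dt)):
    dW = rng.normal(scale=np.sqrt(dt))  # identical increment for both copies
    u += -(u + u ** 3) * dt + dW
    v += -(v + v ** 3) * dt + dW
print(abs(u - v))                       # at most ~ 3 * exp(-T), up to discretisation
```

Running this with any seed gives a final separation of order $e^{-T}$ or smaller: the additive noise drops out of the difference equation, which is what "contracts pathwise without modification" means.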
The last case, namely when the set of forced directions does not contain all of the unstable directions, is the main concern of the present paper. In this setting, we study the interaction between the drift and the forced directions to understand precisely how randomness spreads through the system. The condition ensuring that one can gain sufficient control over the unstable directions requires that the $g_k$, together with a collection of Lie brackets (or commutators) of the form
$$[\dots[[N, g_{j_1}], g_{j_2}], \dots, g_{j_k}], \qquad (1.4)$$
span all of the unstable directions. This condition will be described more precisely in Section 6.2 below. In finite dimensions, when this collection of Lie brackets spans the entire tangent space at every point, the system is said to satisfy the "weak Hörmander" condition. When this assumption holds for the unstable directions (along with some additional technical assumptions), we can ensure that the noise spreads sufficiently to the unstable directions to find an $h$ capable of counteracting the expansion there, allowing one to prove (1.3) with a cost whose expectation is finite.
We will see, however, that the control $h$ used is not adapted to the filtration generated by the increments of the driving Wiener process, which causes a number of technical difficulties. This stems from the seemingly fundamental fact that, because we need some of the "bracketed directions" (1.4) in order to control the dynamics, we must work on a time scale longer than the instantaneous one. In the "essentially elliptic" setting, on the other hand, we were able to work instantaneously and hence obtain an adapted control $h$, avoiding this technicality.

The role of the Malliavin matrix
Since the Malliavin calculus was developed in the 1970s and 1980s mainly to give a probabilistic proof of Hörmander's "sum of squares" theorem under the type of bracket conditions we consider, it is not surprising that the Malliavin matrix $M_t = \mathscr{D}u_t\, (\mathscr{D}u_t)^*$ plays a major role in the construction of the variation $h$ in the "weak Hörmander" setting. A rapid introduction to Malliavin calculus in our setting is given in Section 4. In finite dimensions, the key to the proof of existence and smoothness of densities is the finiteness of moments of the inverse of the Malliavin matrix. This estimate encapsulates the fact that the noise affects all of the directions with a controllable cost. In infinite dimensions, while it is possible to prove that the Malliavin matrix is almost surely nondegenerate, it seems very difficult to characterise its range (with the exception of the linear case [DPZ96]; see also [DPEZ95, FM95, Cer99, EH01] for situations where the Malliavin matrix can be shown to be invertible on the range of the Jacobian). However, in light of the preceding section, it is not surprising that we essentially only need the invertibility of the Malliavin matrix on the space spanned by the unstable directions, which is finite-dimensional in all of our examples. More precisely, we need information about the likelihood of eigenvectors with sizeable projections onto the unstable directions having small eigenvalues. Given a projection $\Pi$ whose range includes the unstable directions, we will show that the Malliavin matrix $M_t$ satisfies an estimate of the form
$$\mathbf{P}\Bigl(\inf_{\|\Pi\varphi\| \ge \alpha \|\varphi\|} \langle \varphi, M_t \varphi \rangle < \varepsilon \|\varphi\|^2\Bigr) \le C(\alpha, p, \|u_0\|)\,\varepsilon^p \qquad (1.5)$$
for all $p \ge 1$. Heuristically, this means that we have control on the probabilistic cost of creating motion in all of the directions in the range of $\Pi$ without causing too large an effect in the complementary directions.
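A worked toy example, ours rather than the paper's, makes the mechanism concrete: in the classical hypoelliptic system $dx = dW$, $dy = x\,dt$, the noise enters only the $x$-direction, but the bracket of the drift $x\,\partial_y$ with the forcing direction $\partial_x$ produces $\partial_y$. A kick in $W$ at time $s$ moves the time-$T$ state by $v(s) = (1, T-s)$, so the Malliavin matrix, deterministic in this additive linear case, is $M_T = \int_0^T v(s)v(s)^\top ds$ and is invertible:

```python
import numpy as np

# Toy hypoelliptic system dx = dW, dy = x dt: a kick in W at time s moves the
# time-T state by v(s) = (1, T - s).  The Malliavin matrix
# M_T = int_0^T v(s) v(s)^T ds = [[T, T^2/2], [T^2/2, T^3/3]] is deterministic
# here and has strictly positive smallest eigenvalue: the randomness has
# spread to the unforced y-direction through the bracket.
T = 1.0
s = np.linspace(0.0, T, 100001)
ds = s[1] - s[0]
v = np.stack([np.ones_like(s), T - s])            # v(s) = (1, T - s)

w = np.full_like(s, ds)
w[0] = w[-1] = ds / 2.0                           # trapezoid weights
M = np.einsum('is,js,s->ij', v, v, w)             # numerical int v v^T ds

closed_form = np.array([[T, T ** 2 / 2], [T ** 2 / 2, T ** 3 / 3]])
print(M, np.linalg.eigvalsh(M).min())             # smallest eigenvalue ~ 0.066
```

In the SPDE setting $M_t$ is random and infinite-dimensional, and (1.5) plays the role of the strict positivity seen here, but only against vectors with a sizeable component in the range of $\Pi$.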
We will pair such an estimate with the assumption that the remaining directions are stable, in the sense that the Jacobian (the linearisation of the SPDE about the trajectory $u_t$) satisfies a contractive estimate in the directions perpendicular to the range of $\Pi$. Together, these assumptions allow us to build an infinitesimal Wiener shift $h$ which approximately compensates for the component in the unstable directions of the infinitesimal variation caused by perturbing the initial condition. Once the variation in the unstable directions has been decreased, the assumed contraction in the stable directions ensures that the variation in the stable directions also decreases, until it is commensurate in size with the remaining variation in the unstable directions. Iterating this argument, we can drive the variation to zero. Note that one feature of the bound (1.5) is that all the norms and scalar products appearing in it are the same. This is a strengthening of the result from [MP06] which fixes an error in [HM06]; see Section 6 for more details.
The basic structure of the sections on Malliavin calculus follows the presentation in [BM07], which built on the ideas and techniques from [MP06, Oco88]. In all three of those works, as well as in the present article, the time-reversed adjoint linearisation is used to develop an alternative representation of the Malliavin covariance matrix. In [Oco88], only the case of linear drift and linear multiplicative noise was considered. In [MP06], a nonlinear equation with a quadratic nonlinearity and additive noise was considered. In [BM07], the structure was abstracted and generalised so that it was amenable to general polynomial nonlinearities. We follow that structure and basic line of argument here, while strengthening the estimates and correcting some important errors.
Most existing bounds on the inverse of the Malliavin matrix in a hypoelliptic situation make use of some version of Norris' lemma [KS84, KS85a, Nor86, MP06, BH07]. In the form taken from [Nor86], it states that if a semimartingale $Z(t)$ is small and one has some control on the roughness of both its bounded variation part $A(t)$ and its quadratic variation process $Q(t)$, then both $A$ and $Q$ taken separately must be small. The versions of Norris' lemma given in [MP06, BM07, BH07] are not precisely of this form (in both cases, one cannot reduce the problem to semimartingales, either because of the infinite-dimensionality of the problem or because one considers SDEs driven by processes that are not Wiener processes), but they have the same flavour: they state that if a process is composed of a "regular" part and an "irregular" part, then the two parts cannot cancel each other. This harkens back to the more explicit estimates based on the modulus of continuity found in [KS85b, Str83]. The replacement for Norris' lemma used in the present work covers the case where one is given a finite collection of Wiener processes $W_j$ and a collection of not necessarily adapted Lipschitz continuous processes $A_\alpha(t)$ (for $\alpha$ a multi-index) and considers the process
$$Z(t) = \sum_{|\alpha| \le M} A_\alpha(t)\, W^\alpha(t), \qquad W^\alpha = W_{\alpha_1} \cdots W_{\alpha_k}.$$
It then states that if $Z$ is small, then all of the $A_\alpha$ with $|\alpha| \le M$ are small. For a precise formulation, see Section 7 below. It is in order to be able to use this result that we restrict ourselves to equations with polynomial nonlinearities. This result on Wiener polynomials is a descendant of the result proven in [MP06] for polynomials of degree one. In [BM07], a result for general Wiener polynomials was also proven: it was shown there that if $Z(t) = 0$ for $t \in [0, T]$, then $A_\alpha(t) = 0$ for $t \in [0, T]$. This was used to prove the existence of a density for the finite-dimensional projections of the transition semigroup.
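A quick numerical illustration of the degree-one case, ours and with deterministic coefficients for simplicity, shows why the regular part cannot hide the irregular one: for $Z(t) = A_0(t) + A_1(t) W(t)$, the quadratic variation of $Z$ recovers $\int_0^1 A_1(t)^2\,dt$, so $Z$ being small forces $A_1$ (and then $A_0 = Z - A_1 W$) to be small:

```python
import numpy as np

# Degree-one Wiener polynomial Z(t) = A_0(t) + A_1(t) W(t) with smooth
# (here deterministic) coefficients.  The smooth parts contribute O(1/n) to
# the quadratic variation, so QV(Z) over [0,1] recovers int_0^1 A_1(t)^2 dt:
# the regular part cannot cancel the irregular part.
rng = np.random.default_rng(7)
n = 200_000
t = np.linspace(0.0, 1.0, n + 1)
dW = rng.normal(scale=np.sqrt(1.0 / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])     # Brownian path with W(0) = 0

A0 = np.cos(t)                                 # smooth coefficient processes
A1 = 1.0 + 0.5 * np.sin(2 * np.pi * t)
Z = A0 + A1 * W

qv = float(np.sum(np.diff(Z) ** 2))            # quadratic variation of Z
target = 1.125                                 # int_0^1 A_1(t)^2 dt, exactly
print(qv, target)                              # qv ~ target
```

Note also that $Z(0) = A_0(0)$ since $W(0) = 0$, so $\sup_t |Z(t)|$ already controls $A_0$ at time zero. The hard part of Section 7 is the quantitative version of this dichotomy for higher-degree monomials with merely Lipschitz, non-adapted coefficients.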
In the same article, the same quantitative version of this result as proven in the present article was claimed. Unfortunately, there was an error in the proof. Nonetheless, the techniques used here build on and refine those developed in [BM07].

Satisfying the Hörmander-like assumption
At first glance, the condition that the collection of functions given in (1.4) be dense in our state space may seem hopelessly strong. However, we will see that it is often not difficult to ensure. Recall that the nonlinearity $N$ is a polynomial of degree $m$; hence it has a leading-order part which is $m$-homogeneous. We can view this leading-order part as a symmetric $m$-linear map, which we denote by $N_m$. Then, at least formally, the Lie bracket of $N$ with $m$ constant vector fields $h_1, \dots, h_m$ is proportional to $N_m(h_1, \dots, h_m)$, which is again a constant vector field. While the vector fields generated by brackets of this form are only a subset of the possible brackets, they are often sufficient to obtain a dense set of vector fields. As observed in [BM07], to obtain a simple sufficient criterion for the brackets to be dense, suppose that $\Lambda \subset C^\infty$ is a finite set of functions that generates, as a multiplicative algebra, a dense subset of the phase space. Then, if the forced modes $A_0 = \{g_1, \dots, g_d\}$ contain the set $\{h, h\bar h : h, \bar h \in \Lambda\}$, the set $A_\infty$ constructed as in Theorem 1.5 spans a dense subset of the phase space.

Probabilistic and dynamical view of smoothing
Implicit in (1.3) is the "transfer of variation" from the initial condition to the Wiener path. This is the heart of "probabilistic smoothing" and the source of ergodicity when it is fundamentally probabilistic in nature. The unique ergodicity of a dynamical system is equivalent to the fact that it "forgets its initial condition" with time. The two terms appearing on the right-hand side of (1.2) represent two different sources of this loss of memory. The first is due to the randomness entering the system. This causes nearby points to end up at the same point at a later time because they are following different noise realisations. The fact that different stochastic trajectories can arrive at the same point and hence lead to a loss of information is the hallmark of diffusions and unique ergodicity due to randomness. From the coupling point of view, since different realizations lead to the same point yet start at different initial conditions, one can couple in finite time.
The second term in (1.2) is due to "dynamical smoothing" and is one of the sources of unique ergodicity in deterministic contractive dynamical systems. If two trajectories converge towards each other over time, then the level of precision needed to determine which initial condition corresponds to which trajectory also increases with time. This is another type of information loss and equally leads to unique ergodicity. However, unlike "probabilistic smoothing", the information loss is never complete at any finite time. Another manifestation of this fact is that such a system never couples in finite time, only at infinity. Section 5.1.1, on the strongly dissipative setting, considers the case of pure dynamical smoothing; there one has (1.2) with only the second term present. When both terms are present, one has a mixture of probabilistic and dynamical smoothing leading to a loss of information about the initial condition. In Section 2.2 of [HM08] it is shown how (1.2) can be used to construct a coupling in which nearby initial conditions converge to each other as time goes to infinity. The current article takes a "forward in time" perspective, while [EMS01, BM05] pull the initial condition back to minus infinity; the two points of view are essentially equivalent. One advantage of moving forward in time is that it makes proving a spectral gap for the dynamics more natural. We provide such an estimate in Section 8.3 for the stochastic Ginzburg-Landau equation.

Structure of the article
The structure of this article is as follows. In Section 2, we give a few abstract ergodic results, both proving the results stated in the introduction and expanding upon them. In Section 3, we introduce the functional analytic setup in which our problem will be formulated. This setup is based on Assumption A.1, which ensures that all the operations made later (differentiation with respect to the initial condition, a representation of the Malliavin derivative, etc.) are well-behaved. Section 4 is a follow-up section in which we define the Malliavin matrix and obtain some simple upper bounds on it. We then introduce some additional assumptions in Section 6.1 which ensure suitable control on the size of the solutions and on the growth rate of the Jacobian.
In Section 5, we obtain the asymptotic strong Feller property under a partial invertibility assumption on the Malliavin matrix and some additional partial contractivity assumptions on the Jacobian. Section 6.3 then contains the proof that the assumptions on the Malliavin matrix made in Section 5 are justified and can be verified for a large class of equations under a Hörmander-type condition. The main ingredient of this proof, a lower bound on Wiener polynomials, is proved in Section 7. Finally, we conclude in Section 8 with two examples for which our conditions can be verified: the Navier-Stokes equations on the two-dimensional sphere and a general reaction-diffusion equation in three or fewer dimensions.

Abstract ergodic results
We now expand upon the abstract ergodic theorems mentioned in the introduction, which build on the asymptotic strong Feller property. We begin by giving the proof of Corollary 1.4 from the introduction and then give a slightly different result (but with the same flavour) which will be useful in the investigation of the Ginzburg-Landau equation in Section 8.3. Throughout this section, $\mathcal{P}_t$ will be a Markov semigroup on a Hilbert space $\mathcal{H}$ with norm $\|\cdot\|$.
Proof of Corollary 1.4. Since $\mathcal{P}_t$ is Feller, for any $u \in \mathcal{H}$ and open set $A$ with $\mathcal{P}_t(u, A) > 0$ there exists an open set $B$ containing $u$ such that
$$\inf_{z \in B} \mathcal{P}_t(z, A) > 0.$$
Combining this fact with weak topological irreducibility, we deduce that for all $u_1, u_2 \in \mathcal{H}$ there exists $v \in \mathcal{H}$ such that for any $\epsilon > 0$ there exist $\delta, t_1, t_2 > 0$ with
$$\inf_{z \in B_\delta(u_i)} \mathcal{P}_{t_i}(z, B_\epsilon(v)) > 0 \qquad (2.1)$$
for $i = 1, 2$. Now assume by contradiction that we can find two distinct invariant probability measures $\mu_1$ and $\mu_2$ for $\mathcal{P}_t$. Since any invariant probability measure can be written as a convex combination of ergodic measures, we may take them to be ergodic without loss of generality. Picking $u_i \in \operatorname{supp}(\mu_i)$, by assumption there exists a $v$ such that for any $\epsilon > 0$ there exist $t_1, t_2$ and $\delta > 0$ so that (2.1) holds. Since $u_i \in \operatorname{supp}(\mu_i)$ we know that $\mu_i(B_\delta(u_i)) > 0$, and hence, by invariance,
$$\mu_i(B_\epsilon(v)) = \int \mathcal{P}_{t_i}(z, B_\epsilon(v))\,\mu_i(dz) \ge \mu_i(B_\delta(u_i)) \inf_{z \in B_\delta(u_i)} \mathcal{P}_{t_i}(z, B_\epsilon(v)) > 0.$$
Since $\epsilon$ was arbitrary, this shows that $v \in \operatorname{supp}(\mu_1) \cap \operatorname{supp}(\mu_2)$, which by Theorem 1.2 gives the required contradiction.
We now give a more quantitative version of Theorem 1.2. It shows that if one has access to the quantitative information embodied in (1.2), as opposed to only the asymptotic strong Feller property, then not only are the supports of any two ergodic invariant measures disjoint but they are actually separated by a distance which is directly related to the function C from (1.2).
Theorem 2.1 Let $\{\mathcal{P}_t\}$ be a Markov semigroup on a separable Hilbert space $\mathcal{H}$ such that (1.2) holds for some non-decreasing function $C$. Let $\mu_1$ and $\mu_2$ be two distinct ergodic invariant probability measures for $\mathcal{P}_t$. Then the bound $\|u_1 - u_2\| \ge 1/C(\|u_1\| \vee \|u_2\|)$ holds for any pair of points $(u_1, u_2)$ with $u_i \in \operatorname{supp}\mu_i$.
Proof. The proof is a variation on that of Theorem 3.16 in [HM06]. We begin by defining, for $u, v \in \mathcal{H}$, the metric
$$d_n(u, v) = 1 \wedge \delta_n^{-1} \|u - v\|,$$
where $\delta_n$ is the sequence of positive numbers from (1.2). As shown in the proof of Theorem 3.12 in [HM06], one has
$$d_n(\mathcal{P}_{t_n}(u, \,\cdot\,), \mathcal{P}_{t_n}(v, \,\cdot\,)) \le \|u - v\|\, C(\|u\| \vee \|v\|),$$
where, by abuse of notation, $d_n$ also denotes the 1-Wasserstein distance [3] on probability measures induced by the metric $d_n$. Observe that $d_n(u, v) \le 1$ for all $u, v \in \mathcal{H}$ and $\lim_{n \to \infty} d_n(u, v) = \mathbf{1}_{\{u \ne v\}}$. Hence, by Lemma 3.4 of [HM06], for any probability measures $\mu$ and $\nu$, $\lim_{n \to \infty} d_n(\mu, \nu) = d_{\mathrm{TV}}(\mu, \nu)$, where $d_{\mathrm{TV}}$ is the total variation distance [4]. Let $\mu_1$ and $\mu_2$ be two ergodic invariant measures with $\mu_1 \ne \mu_2$. By Birkhoff's ergodic theorem, they are mutually singular and thus $d_{\mathrm{TV}}(\mu_1, \mu_2) = 1$. We now proceed by contradiction: assume that there exists a pair of points $(u_1, u_2)$ with $u_i \in \operatorname{supp}(\mu_i)$ such that $\|u_1 - u_2\| < 1/C(\|u_1\| \vee \|u_2\|)$. We will conclude by showing that this implies $d_{\mathrm{TV}}(\mu_1, \mu_2) < 1$, so that $\mu_1$ and $\mu_2$ are not mutually singular, which is a contradiction.
Our assumption on $u_1$ and $u_2$ implies that there exists a set $A$ containing $u_1$ and $u_2$ such that $\alpha \stackrel{\text{def}}{=} \min(\mu_1(A), \mu_2(A)) > 0$ and $\beta \stackrel{\text{def}}{=} \sup\{\|u - v\| : u, v \in A\}\, C(\|u_1\| \vee \|u_2\|) < 1$. As shown in the proof of Theorem 3.16 in [HM06], the required bound on $d_n(\mu_1, \mu_2)$ then holds for any $n$.
³ $d_n(\nu_1, \nu_2) = \sup\left|\int \varphi\, d\nu_1 - \int \varphi\, d\nu_2\right|$, where the supremum runs over functions $\varphi : H \to \mathbb{R}$ which have Lipschitz constant one with respect to the metric $d_n$.
⁴ Different communities normalise the total variation distance differently. Our $d_{TV}$ is half of the total variation distance as typically defined in analysis. The definition we use is common in probability, as it is normalised in such a way that $d_{TV}(\mu, \nu) = 1$ for mutually singular probability measures.
Paired with this stronger version of Theorem 1.2, we have the following version of Corollary 1.4, which uses an even weaker form of irreducibility. This is a general principle: given a stronger form of the asymptotic strong Feller property, one can prove unique ergodicity under a weaker form of topological irreducibility. The form of irreducibility used in Corollary 2.2 allows the point where two trajectories approach each other to move about, depending on the degree of closeness required. The trade-off is that, to prove unique ergodicity, one needs some control of the "smoothing rate" implied by the asymptotic strong Feller property at different points in phase space.

Corollary 2.2
Let $\{P_t\}$ be as in Theorem 2.1. Suppose that, for every $R_0 > 0$, it is possible to find $R > 0$ and $T > 0$ such that, for every $\varepsilon > 0$, there exists a point $v$ with $\|v\| \leq R$ such that $P_T(u, B_\varepsilon(v)) > 0$ for every $\|u\| \leq R_0$. Then, $P_t$ can have at most one invariant probability measure.
Proof. Assume by contradiction that there exist two ergodic invariant probability measures $\mu_1$ and $\mu_2$ for $P_t$. Then, choosing $R_0$ large enough so that the open ball of radius $R_0$ intersects the supports of both $\mu_1$ and $\mu_2$, it follows from the assumption, by reasoning similar to that in the proof of Corollary 1.4, that $\operatorname{supp}\mu_i$ intersects $B_\varepsilon(v)$. Since $\|v\|$ is bounded uniformly in $\varepsilon$, making $\varepsilon$ sufficiently small yields a contradiction with Theorem 2.1 above.

Functional analytic setup
In this section we introduce the basic functional analytic setup for the rest of the paper. We develop the existence and regularity theory needed to place the remainder of the paper on a firm foundation. We consider semilinear stochastic evolution equations with additive noise in a Hilbert space $H$ of the form
$$du(t) = -Lu(t)\,dt + N(u(t))\,dt + \sum_{k=1}^d g_k\, dW_k(t) . \qquad (3.1)$$
Here, the $W_k$ are independent real-valued standard Wiener processes over some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Our main standing assumption throughout this article is that $L$ generates an analytic semigroup and that the nonlinearity $N$ results in a loss of regularity of $a$ powers of $L$ for some $a < 1$. More precisely, we assume: there exist $a \in [0, 1)$ and $\gamma_\star, \beta_\star > -a$ (either of them possibly infinite) with $\gamma_\star + \beta_\star > -1$ such that:

Remark 3.2
The assumption $\langle u, Lu\rangle \geq \|u\|^2$ is made only for convenience, so that $L^\gamma$ is well-defined as a positive selfadjoint operator for every $\gamma \in \mathbb{R}$. It can always be realised by subtracting a suitable constant from $L$ and adding it to $N$. Similarly, non-selfadjoint linear operators are allowed if the antisymmetric part is sufficiently "dominated" by the symmetric part, since one can then consider the antisymmetric part as part of the nonlinearity $N$. It will be convenient in the sequel to define $F$ by $F(u) = -Lu + N(u)$. Note that $F$ belongs to $\operatorname{Poly}^n(H^{\gamma+1}, H^\gamma)$ for every $\gamma \in [-1, \gamma_\star)$. We also define a linear operator $G : \mathbb{R}^d \to H$ by $Gv = \sum_{k=1}^d v_k g_k$. With these notations, we will sometimes rewrite (3.1) as
$$du = F(u)\,dt + G\,dW(t) ,$$
for $W = (W_1, \ldots, W_d)$ a standard $d$-dimensional Wiener process.

Polynomials
We now describe in what sense we mean that $N$ is a "polynomial" vector field. Given a Fréchet space $X$, we denote by $L^n_s(X)$ the space of continuous symmetric $n$-linear maps from $X$ to itself. We also denote by $L(X, Y)$ the space of continuous linear maps from $X$ to $Y$. For the sake of brevity, we will make use of the two equivalent notations $P(u)$ and $P(u^{\otimes n})$ for $P \in L^n_s(X)$.
Given $Q \in L^k_s$, its derivative is the map $DQ : X \to L(X, X)$ given by $DQ(u)v = k\, Q(u^{\otimes(k-1)} \otimes v)$. We will also use the notation $DQ^* : X \to L(X', X')$ for the dual map, given by $\langle DQ^*(u)\xi, v\rangle = \langle \xi, DQ(u)v\rangle$. Given $P \in L^k_s$ and $Q \in L^\ell_s$, we define the derivative of $Q$ in the direction $P$ as the continuous map $u \mapsto DQ(u)P(u)$ from $X$ to $X$. Note that, by polarisation, $u \mapsto DQ(u)P(u)$ uniquely defines an element of $L^{k+\ell-1}_s$. This allows us to define a "Lie bracket" $[P, Q] \in L^{k+\ell-1}_s$ between $P$ and $Q$ by
$$[P, Q](u) = DQ(u)P(u) - DP(u)Q(u) .$$
We also define $\operatorname{Poly}^n(X)$ as the set of continuous maps $P : X \to X$ of the form $P(u) = \sum_{k=0}^n P^{(k)}(u^{\otimes k})$ with $P^{(k)} \in L^k_s(X)$ (here $L^0_s(X)$ is the space of constant maps and can be identified with $X$). We also set $\operatorname{Poly}(X) = \bigcup_{n \geq 0} \operatorname{Poly}^n(X)$. The Lie bracket defined above extends to a map from $\operatorname{Poly}(X) \times \operatorname{Poly}(X) \to \operatorname{Poly}(X)$ by linearity.
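As a concrete low-dimensional illustration (our own example, using the convention $[P, Q](u) = DQ(u)P(u) - DP(u)Q(u)$ for the bracket):

```latex
% For P \in L^2_s and Q \in L^3_s over X = \mathbb{R}, with
% P(u) = u^2 and Q(u) = u^3, the directional derivatives are
DQ(u)P(u) = 3u^2 \cdot u^2 = 3u^4, \qquad
DP(u)Q(u) = 2u \cdot u^3 = 2u^4,
% so the bracket is the element of L^{k+\ell-1}_s = L^{2+3-1}_s = L^4_s
% determined by polarisation from
[P, Q](u) = DQ(u)P(u) - DP(u)Q(u) = u^4 .
```

Note that the degree count matches the general statement: a degree-$k$ and a degree-$\ell$ monomial bracket to a monomial of degree $k + \ell - 1$.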

Polynomials over H
We now specialize to polynomials over $H$. We begin by choosing $X$ equal to the Fréchet space $H^\infty$, the intersection of the spaces $H^a$ over all $a > 0$. Next, we define the space $\operatorname{Poly}(H^a, H^b) \subset \operatorname{Poly}(H^\infty)$ as the set of polynomials $P \in \operatorname{Poly}(H^\infty)$ such that there exists a continuous map $\bar P : H^a \to H^b$ with $\bar P(u) = P(u)$ for all $u \in H^\infty$. Note that in general (unlike for $\operatorname{Poly}(H^\infty)$), $P, Q \in \operatorname{Poly}(H^a, H^b)$ does not necessarily imply $[P, Q] \in \operatorname{Poly}(H^a, H^b)$. We will abuse notation and use the same symbol for both $P$ and $\bar P$ in the sequel.

A priori bounds on the solution
This section is devoted to the proof that Assumption A.1 is sufficient to obtain not only unique solutions to (3.1) (possibly up to some explosion time), but also further regularity properties for both the solution and its derivative with respect to the initial condition. We do not claim that the material presented in this section is new; while similar frameworks can be found in [DPZ92, Fla95], the framework presented here does not seem to appear in this form in the literature. Since the proofs are rather straightforward, we choose to present them for the sake of completeness, although in a rather condensed form.
We start with a local existence and uniqueness result for the solutions to (3.1):

Proposition 3.4 For every initial condition $u_0 \in H$, there exists a stopping time $\tau > 0$ such that (3.1) has a unique mild solution $u$ up to time $\tau$, that is to say, $u$ almost surely satisfies
$$u_t = e^{-Lt} u_0 + \int_0^t e^{-L(t-s)} N(u_s)\, ds + \int_0^t e^{-L(t-s)} G\, dW(s) \qquad (3.6)$$
for all stopping times $t$ with $t \leq \tau$.
For notational convenience, we denote by $W_L(s, t) = \int_s^t e^{-L(t-r)} G\, dW(r)$ the "stochastic convolution". Since we assumed that $g_k \in H^{\gamma_\star+1}$, it is possible to obtain bounds on all exponential moments of $\sup_{0 \leq s < t \leq T} \|W_L(s, t)\|_\gamma$ for every $T > 0$ and every $\gamma \leq \gamma_\star + 1$. Define the map
$$\Phi_{T,\xi}(u_0, u)(t) = e^{-Lt} u_0 + \int_0^t e^{-L(t-s)} N(u(s))\, ds + \xi(t) , \qquad t \in [0, T] . \qquad (3.7)$$
Since $N \in \operatorname{Poly}(H, H^{-a})$ by setting $\gamma = -a$ in Assumption A.1.2, and suppressing the dependence on $u_0$, there exists a positive constant $C$ such that
$$\sup_{t \leq T} \|\Phi_{T,\xi}(u)(t) - \Phi_{T,\xi}(v)(t)\| \leq C\, T^{1-a} \Big(1 + \sup_{t \leq T}\|u(t)\| + \sup_{t \leq T}\|v(t)\|\Big)^{n-1} \sup_{t \leq T}\|u(t) - v(t)\| ;$$
recall that $n$ is the degree of the polynomial nonlinearity $N$. It follows that, for every $\xi$, there exist $T > 0$ and $R > 0$ such that $\Phi_{T,\xi}(u_0, \cdot)$ is a contraction in the ball of radius $R$ around $e^{-Lt} u_0 + \xi(t)$. Setting $\xi(t) = W_L(0, t)$, this yields existence and uniqueness of the solution to (3.6) by the Banach fixed point theorem. The largest such $T$ is a stopping time, since it only depends on the norm of $u_0$ and on $\xi$ up to time $T$, thus concluding the proof of the proposition.
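The contraction estimate above rests on the standard smoothing property of analytic semigroups; a sketch with constants suppressed (our own illustration of where the condition $a < 1$ enters):

```latex
% Since L generates an analytic semigroup with \langle u, Lu\rangle \ge \|u\|^2,
% one has, for \beta \le \gamma and t > 0, the smoothing estimate
\|e^{-Lt} x\|_{\gamma} \le C\, t^{-(\gamma-\beta)} \|x\|_{\beta} .
% Applied with \gamma = 0 and \beta = -a to the Duhamel term, this gives
\Big\| \int_0^t e^{-L(t-s)} N(u(s))\, ds \Big\|
  \le C \int_0^t (t-s)^{-a} \|N(u(s))\|_{-a}\, ds
  \le C'\, t^{1-a} \sup_{s \le t} \|N(u(s))\|_{-a} ,
% which is finite precisely because a < 1; this is where the assumed loss
% of regularity of a powers of L for the nonlinearity N enters.
```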
The remainder of this section is devoted to obtaining further regularity properties of the solutions. In the sequel, we identify a multi-index $\alpha$ with its counting function $\alpha(k)$, counting the number of appearances of the index $k$. With this identification, the union of two multi-indices corresponds to the sum of their counting functions, while $\alpha \subset \beta$ means that $\alpha(k) \leq \beta(k)$ for every $k$.
Proposition 3.5 Fix $T > 0$. For every $\gamma \in [0, \gamma_\star + 1)$, there exist exponents $p_\gamma \geq 1$ and $q_\gamma \geq 0$ and a constant $C$ such that the bound (3.9) holds.

Proof. The proof follows a standard "bootstrapping argument" on $\gamma$, in the following way. The statement is obviously true for $\gamma = 0$ with $p_\gamma = 1$ and $q_\gamma = 0$. Assume that, for some $\alpha = \alpha_0 \in [1/2, 1)$ and for some $\gamma = \gamma_0 \in [0, \gamma_\star + a)$, the bound holds for all $t \in (0, T]$. We will then argue that, for any arbitrary $\delta \in (0, 1 - a)$, the statement (3.9) also holds for $\gamma = \gamma_0 + \delta$ (and therefore also for all intermediate values of $\gamma$) and $\alpha = \alpha_0^2$. Since it is possible to go from $\gamma = 0$ to any value of $\gamma < \gamma_\star + 1$ in a finite number of steps (making sure that $\gamma \leq 1 + a$ in every intermediate step), and since we are allowed to choose $\alpha$ as close to 1 as we wish, the claim follows at once.

Linearization and its adjoint
In this section, we study how the solutions to (3.1) depend on their initial conditions. Since the map from (3.7) used to construct the solutions to (3.1) is Fréchet differentiable (it is actually infinitely differentiable), and since it is a contraction for sufficiently small values of $t$, we can apply the implicit function theorem (see for example [RR04] for a Banach space version) to deduce that, for every realisation of the driving noise, the map $u_s \mapsto u_t$ is Fréchet differentiable, provided that $t > s$ is sufficiently close to $s$.
Iterating this argument, one sees that, for any $s \leq t < \tau$, the map $u_s \mapsto u_t$ given by the solutions to (3.1) is Fréchet differentiable in $H$. Inspecting the expression for the derivative given by the implicit function theorem, we conclude that the derivative $J_{s,t}\varphi$ in the direction $\varphi \in H$ satisfies the following random linear equation in its mild formulation:
$$J_{s,t}\varphi = e^{-L(t-s)}\varphi + \int_s^t e^{-L(t-r)} DN(u_r)\, J_{s,r}\varphi\, dr . \qquad (3.10)$$
Note that, by the properties of monomials, Assumption A.1.2 yields a corresponding bound on $DN$ for every $\gamma \in [-a, \gamma_\star)$. A fixed point argument similar to the one in Proposition 3.4 shows that the solution to (3.10) is unique, but note that it does not allow us to obtain bounds on its moments. We only have that, for any $T$ smaller than the explosion time of the solutions to (3.1), there exists a (random) constant $C$ such that $\sup_{0 \leq s \leq t \leq T} \|J_{s,t}\| \leq C$. The constant $C$ depends exponentially on the size of the solution $u$ on the interval $[0, T]$. However, if we obtain better control on $J_{s,t}$ by some means, we can then use the following bootstrapping argument:

Proposition 3.6 For every $\gamma < \gamma_\star + 1$, there exist exponents $\bar p_\gamma, \bar q_\gamma \geq 0$ and constants $C > 0$ and $\gamma_0 < |\gamma|$ such that we have the bound
$$\|J_{t,t+s}\varphi\|_\gamma \leq C s^{-\bar p_\gamma} \sup_{r \in [0, s]} (1 + \|u_{t+r}\|_{\gamma_0})^{\bar q_\gamma} \|J_{t,t+r}\varphi\| , \qquad (3.12)$$
for every $\varphi \in H$ and every $t, s > 0$. If $\gamma < 1 - a$, then one can choose $\gamma_0 = \bar p_\gamma = 0$ and $\bar q_\gamma = n - 1$.
Since an almost identical argument will be used in the proof of Proposition 3.8 below, we refer the reader there for details. We chose to present that proof instead of this one because the presence of an adjoint causes slight additional complications.
For $s \leq t$, let us define operators $K_{s,t}$ via the solution to the (random) PDE
$$\partial_s K_{s,t}\varphi = L K_{s,t}\varphi - DN^*(u_s)\, K_{s,t}\varphi , \qquad K_{t,t}\varphi = \varphi . \qquad (3.13)$$
Note that this equation runs backwards in time and is random through the solution $u_t$ of (3.1). Here, $DN^*(u)$ denotes the adjoint in $H$ of the operator $DN(u)$ defined earlier. Fixing the terminal time $t$ and setting $\varphi_s = K_{t-s,t}\varphi$, we obtain a more usual representation for $\varphi_s$:
$$\varphi_s = e^{-Ls}\varphi + \int_0^s e^{-L(s-r)} DN^*(u_{t-r})\, \varphi_r\, dr . \qquad (3.14)$$
The remainder of this subsection is devoted to obtaining regularity bounds on the solutions to (3.13) and to the proof that $K_{s,t}$ is actually the adjoint of $J_{s,t}$. We start by showing that, for $\gamma$ sufficiently close to (but less than) $\gamma_\star + 1$, (3.13) has a unique solution for every path $u \in C(\mathbb{R}, H^\gamma)$ and $\varphi \in H$.
Proposition 3.7 There exists $\gamma < \gamma_\star + 1$ such that equation (3.13) has a unique solution for every $s < t$ and every $u \in C(\mathbb{R}, H^\gamma)$. Furthermore, $K_{s,t}$ depends only on $u_r$ for $r \in [s, t]$.
Proof. As in Proposition 3.4, we define a suitable fixed point map $\Phi_{T,\xi}$. It follows from Assumption A.1.3 with $\beta = -a$ that there exists $\gamma < \gamma_\star + 1$ such that $DN^*(u) : H \to H^{-a}$ is a bounded linear operator for every $u \in H^\gamma$. Proceeding as in the proof of Proposition 3.4, we see that $\Phi$ is a contraction for sufficiently small $T$.
Similarly to before, we can use a bootstrapping argument to show that K s,t ϕ actually has more regularity than stated in Proposition 3.7.

Proposition 3.8 For every $\beta < \beta_\star + 1$, a bound of the form (3.15) holds on $\|K_{s,t}\varphi\|_\beta$ for every $\varphi \in H$, every $t, s > 0$, and every $u \in C(\mathbb{R}, H^\gamma)$.
Proof. Fix $\beta < \beta_\star + a$ and $\delta \in (0, 1 - a)$, and assume that the bound (3.15) holds for $\|K_{s,t}\varphi\|_\beta$. Since we run $s$ "backwards in time" from $s = t$, we consider again $t$ as fixed and use the notation $\varphi_s = K_{t-s,t}\varphi$. We then obtain, for arbitrary $\alpha \in (0, 1)$, a corresponding estimate at regularity $\beta + \delta$; iterating these bounds as in Proposition 3.5 concludes the proof.
The following lemma also appears in [MP06, BM07]. It plays a central role in establishing the representation of the Malliavin matrix given in (4.11), on which this article, as well as [MP06, BM07], relies heavily.
Proof. Fixing $0 \leq s < t$ and $\varphi, \psi \in H^\infty$, we claim that the expression
$$\langle K_{r,t}\varphi,\, J_{s,r}\psi\rangle \qquad (3.16)$$
is independent of $r \in [s, t]$. Evaluating (3.16) at both $r = s$ and $r = t$ then concludes the proof, since it shows $\langle K_{s,t}\varphi, \psi\rangle = \langle \varphi, J_{s,t}\psi\rangle$. We now prove that (3.16) is independent of $r$ as claimed. It follows from (3.13) and Proposition 3.5 that, with probability one, the map $r \mapsto K_{r,t}\varphi$ is continuous with values in $H^{\beta+1}$ and differentiable with values in $H^\beta$, provided that $\beta < \beta_\star$. Similarly, the map $r \mapsto J_{s,r}\psi$ is continuous with values in $H^{\gamma+1}$ and differentiable with values in $H^\gamma$, provided that $\gamma < \gamma_\star$. Since $\gamma_\star + \beta_\star > -1$ by assumption, it thus follows that the $r$-derivative of (3.16) is well-defined and vanishes identically on $(s, t)$. Since furthermore both $r \mapsto K_{r,t}\varphi$ and $r \mapsto J_{s,r}\psi$ are continuous in $r$ on the closed interval, the proof is complete. See for example [DL92, p. 477] for more details.
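The vanishing of the $r$-derivative can be checked directly from the two evolution equations; a sketch, assuming differential forms for $K$ and $J$ consistent with the mild formulations (3.10) and (3.13):

```latex
% Substituting
% \partial_r K_{r,t}\varphi = L K_{r,t}\varphi - DN^*(u_r) K_{r,t}\varphi,
% \partial_r J_{s,r}\psi   = -L J_{s,r}\psi + DN(u_r) J_{s,r}\psi,
% into the r-derivative of the pairing gives
\frac{d}{dr}\langle K_{r,t}\varphi, J_{s,r}\psi\rangle
 = \langle L K_{r,t}\varphi, J_{s,r}\psi\rangle
   - \langle DN^*(u_r) K_{r,t}\varphi, J_{s,r}\psi\rangle
   - \langle K_{r,t}\varphi, L J_{s,r}\psi\rangle
   + \langle K_{r,t}\varphi, DN(u_r) J_{s,r}\psi\rangle
 = 0 ,
% since the first and third terms cancel by selfadjointness of L, while the
% second and fourth cancel by the definition of the adjoint DN^*(u_r).
```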

Malliavin calculus
In this section, we show that the solution to the SPDE (3.1) has a Malliavin derivative and we give an expression for it. Actually, since we are dealing with additive noise, we show the stronger result that the solution is Fréchet differentiable with respect to the driving noise. In this section, we will make the standing assumption that the explosion time τ from Proposition 3.4 is infinite.

Malliavin derivative
In light of Proposition 3.4, for a fixed initial condition $u_0 \in H$, there exists an "Itô map" $\Phi^u_t$ assigning to a realisation of the driving Wiener process the solution $u_t$ at time $t$. We have:

Proposition 4.1 For every $t > 0$ and every $u \in H$, the map $\Phi^u_t$ is Fréchet differentiable, and its Fréchet derivative $D_v\Phi^u_t$ in the direction $v$ satisfies (4.1) in the mild sense.
Remark 4.2 Note that (4.1) has a unique $H$-valued mild solution for every continuous function $v$, because it follows from our assumptions that $Gv \in C(\mathbb{R}_+, H^\gamma)$ for some $\gamma > 0$.

Proof of Proposition 4.1. The proof works in exactly the same way as the arguments presented in Section 3.3: it follows from Remark 4.2 that, for any given $u_0 \in H$ and $t > 0$, the solution map is Fréchet differentiable as a function of the driving path, with values in $C([0, t], H)$. Furthermore, for $t$ sufficiently small (depending on $u$ and $W$), it satisfies the assumptions of the implicit function theorem, so that the claim follows in this case. The claim for arbitrary values of $t$ follows by iterating the statement.
As a consequence, it follows from Duhamel's formula and the fact that J s,t is the unique solution to (3.10) that

Corollary 4.3 If $v$ is absolutely continuous and of bounded variation, then
$$D_v\Phi^u_t = \int_0^t J_{s,t}\, G\, dv(s) , \qquad (4.2)$$
where the integral is to be understood as a Riemann-Stieltjes integral and the Jacobian $J_{s,t}$ is as in (3.10).
In particular, (4.2) holds for every $v$ in the Cameron-Martin space of the driving Wiener process, so we will in the sequel identify $v$ with its derivative $h = \dot v$ and use the notation $h \in CM'$. The representation (4.2) is still valid for arbitrary stochastic processes $h$ such that $h \in CM'$ almost surely.
Since $G : \mathbb{R}^d \to H^{\gamma_\star+1}$ is a bounded operator, whose norm we denote by $\|G\|$, we obtain the bound
$$\|D_h\Phi^u_t\| \leq \|G\| \int_0^t \|J_{s,t}\|\, |h(s)|\, ds ,$$
valid for every $h \in CM'$. In particular, by Riesz's representation theorem, this shows that there exists a (random) element $\mathcal{D}\Phi^u_t$ of $CM' \otimes H$ such that $D_h\Phi^u_t = \langle \mathcal{D}\Phi^u_t, h\rangle$ for every $h \in CM'$. This abuse of notation is partially justified by the fact that, at least formally, $\mathcal{D}\Phi^u_t$ is indeed the derivative of $\Phi^u_t$ with respect to the driving path. In our particular case, it follows from (4.2) that one has $\mathcal{D}_s\Phi^u_t = J_{s,t} G$ for $s \leq t$. With this notation, the identity (4.2) can be rewritten as $D_h\Phi^u_t = \int_0^t \mathcal{D}_s\Phi^u_t\, h(s)\, ds$. It follows from the theory of Malliavin calculus, see for example [Mal97, Nua95], that for every $t > 0$ (and for any Hilbert space $H$) the linear map $\Phi \mapsto \mathcal{D}\Phi$ described above yields a closable linear operator from $L^2(\Omega, \mathcal{F}_t, \mathbb{R}) \otimes H$ to $L^2(\Omega, CM') \otimes H$. Here, $\mathcal{F}_t$ is the $\sigma$-algebra generated by the increments of $W$ up to time $t$, and $L^2_{\mathrm{ad}}$ denotes the space of $L^2$ functions adapted to the filtration $\{\mathcal{F}_t\}$. The operator $\mathcal{D}$ simply acts as the identity on the factor $H$, so that we really interpret it as an operator from $L^2(\Omega, \mathbb{R})$ to $L^2(\Omega, CM')$. The operator $\mathcal{D}$ is called the "Malliavin derivative".

We define a family of random linear operators $A_t : CM' \to H$ (depending also on the initial condition $u_0 \in H$ for (3.1)) by $A_t h = \langle \mathcal{D}\Phi^u_t, h\rangle$. It follows from (4.3) that their adjoints $A^*_t : H \to CM'$ are given for $\xi \in H$ by
$$(A^*_t\xi)(s) = G^* J^*_{s,t}\, \xi \quad \text{for } s \leq t , \qquad (A^*_t\xi)(s) = 0 \quad \text{for } s > t . \qquad (4.5)$$
The Skorokhod integral is defined as the adjoint of the Malliavin derivative operator (or rather of the part acting on $L^2(\Omega, \mathcal{F}_t, \mathbb{R})$ and not on $H$); for such non-adapted integrands one has the following modification of the Itô isometry:
$$\mathbb{E}\Big|\int_0^t h(s) \cdot dW(s)\Big|^2 = \mathbb{E}\int_0^t |h(s)|^2\, ds + \mathbb{E}\int_0^t\!\!\int_0^t \operatorname{tr}\big(\mathcal{D}_r h(s)\, \mathcal{D}_s h(r)\big)\, dr\, ds .$$
Note here that, since $h(s) \in \mathbb{R}^d$, we interpret $\mathcal{D}_r h(s)$ as a $d \times d$ matrix.
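The integration by parts formula invoked later as (4.6) is the standard duality between the Malliavin derivative, written here as $\mathcal{D}$, and the Skorokhod integral; a sketch of its shape in our notation, with precise domains as in [Nua95]:

```latex
% Duality between the Malliavin derivative and the Skorokhod integral:
% for suitable random variables \Phi and processes h \in CM',
\mathbb{E}\big( D_h \Phi \big)
  = \mathbb{E}\,\big\langle \mathcal{D}\Phi,\, h \big\rangle_{CM'}
  = \mathbb{E}\Big( \Phi \int_0^t h(s)\cdot dW(s) \Big) ,
% where the last integral is the Skorokhod integral, which coincides with
% the Ito integral whenever h is adapted to the filtration \{\mathcal{F}_t\}.
```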

Malliavin derivative of the Jacobian
By iterating the implicit function theorem, we can see that the map associating a given realisation of the Wiener process $W$ to the Jacobian $J_{s,t}\varphi$ is also Fréchet (and therefore Malliavin) differentiable. Its Malliavin derivative $D_h J_{s,t}\varphi$ in the direction $h \in CM'$ is given by the unique solution to the equation obtained by formally differentiating (3.10), endowed with the initial condition $D_h J_{s,s}\varphi = 0$. Just as the Malliavin derivative of the solution was related to its derivative with respect to the initial condition, the Malliavin derivative of $J_{s,t}$ can be related to the second derivative of the flow with respect to the initial condition in the following way. Denoting by $J^{(2)}_{s,t}(\varphi, \psi)$ the second derivative of $u_t$ with respect to $u_0$ in the directions $\varphi$ and $\psi$, we see that $J^{(2)}_{s,t}$ satisfies a similar equation, endowed with the initial condition $J^{(2)}_{s,s}(\varphi, \psi) = 0$. Assuming that $h$ vanishes outside of the interval $[s, t]$ and using the identities $J_{r,t} J_{s,r} = J_{s,t}$ and $D_h u_t = \int_s^t J_{r,t} G\, h(r)\, dr$, we can rewrite this as
$$D_h J_{s,t}\varphi = \int_s^t J^{(2)}_{r,t}\big(J_{s,r}\varphi,\, G h(r)\big)\, dr . \qquad (4.9)$$
This identity is going to be used in Section 5.

Malliavin covariance matrix
We now define and explore the properties of the Malliavin covariance matrix, whose non-degeneracy is central to our constructions.
Definition 4.4 Assume that the explosion time τ = ∞ for every initial condition in H.
Then, for any $t > 0$, the Malliavin matrix $M_t : H \to H$ is the linear operator defined by
$$M_t \stackrel{\text{def}}{=} A_t A^*_t . \qquad (4.10)$$
Observe that this is equivalent to
$$\langle M_t\varphi, \varphi\rangle = \sum_{k=1}^d \int_0^t \langle g_k, K_{s,t}\varphi\rangle^2\, ds \qquad (4.11)$$
for all $\varphi \in H$. The meaning of the Malliavin covariance matrix defined in (4.10) is rather intuitive, especially for the diagonal elements $\langle M_t\varphi, \varphi\rangle$. If $\langle M_t\varphi, \varphi\rangle > 0$, then there exists some variation in the Wiener process on the time interval $[0, t]$ which creates a variation of $u_t$ in the direction $\varphi$.
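The equivalence of the two formulas for $M_t$ follows from the explicit form of $A^*_t$ in (4.5); a short computation, using $(A^*_t\varphi)(s) = G^* K_{s,t}\varphi$ and $G^*\xi = (\langle g_1, \xi\rangle, \ldots, \langle g_d, \xi\rangle)$:

```latex
\langle M_t \varphi, \varphi \rangle
  = \langle A_t A_t^* \varphi, \varphi \rangle
  = \| A_t^* \varphi \|_{CM'}^2
  = \int_0^t \big| G^* K_{s,t}\varphi \big|_{\mathbb{R}^d}^2 \, ds
  = \sum_{k=1}^d \int_0^t \langle g_k, K_{s,t}\varphi \rangle^2 \, ds .
% In particular \langle M_t\varphi,\varphi\rangle > 0 if and only if some
% forcing direction g_k has a nonzero pairing with K_{s,t}\varphi on a set
% of times s of positive measure.
```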
It is also useful to understand on which spaces the operator norm of $M_t$ is bounded. As a simple consequence of Proposition 3.6, we have:
Proof. From (4.10), we have $\langle M_t\varphi, \varphi\rangle = \sum_{k=1}^d \int_0^t \langle g_k, K_{s,t}\varphi\rangle^2\, ds$. Since the $g_k$ belong to $H$ by assumption, the required bound now follows from Proposition 3.6.

Smoothing in infinite dimensions
We now turn our study of (3.3) to one of the principal goals of this article. As in the preceding section, we assume that all solutions are global in time and that the standing assumptions in Assumption A.1 continue to hold. The aim of this section is to prove "smoothing" estimates for the corresponding Markov semigroup $P_t$, whose action on bounded test functions $\varphi : H \to \mathbb{R}$ is defined by $(P_t\varphi)(u) = \mathbb{E}_u \varphi(u_t)$. Here, the subscript in the expectation refers to the initial condition for the solution $u_t$ to (3.3). We begin with a brief discussion of the type of estimates we will prove and the ideas used in their proofs. A longer discussion can be found in [HM06], in which a number of the tools of this paper were developed, or in [Mat08], which has a longer motivating discussion.
Recall also that the Malliavin covariance matrix $M_t : H \to H$ for the solution to (3.3) was defined in (4.10) as $M_t = A_t A^*_t$, and that it is a random, compact, selfadjoint operator on $H$. Since $H$ is assumed to be infinite-dimensional, $M_t$ will in general not be invertible. However, as discussed in the introduction, we will only need it to be "approximately invertible" on some subspace, paired with an assumption that the dynamics is contractive off this subspace. The approximate invertibility on some subspace is formulated below in Assumption B.1, and the contractivity assumption is formulated in Assumption B.4. These are the two fundamental structural assumptions needed for this theory. In between the statements of these two assumptions, two other assumptions are given. They are more technical in nature and ensure that we can control various quantities.
holds for every $\varepsilon \leq 1$, $p \geq 1$ and $u_0 \in H$. Furthermore, for some $\bar q \geq 2$, there exists a constant $C_U$ such that, for every initial condition $u_0 \in H$, the corresponding bound holds uniformly in $n \geq 0$.
We are also going to assume in this section that the solutions to (3.3) have the following Lyapunov-type structure, which is stronger than Assumption C.1 used in the previous section:
We finally assume that the Jacobian of the solution has some "smoothing properties" in the sense that if we apply it to a function that belongs to the image of the orthogonal complement Π ⊥ = 1 − Π of the projection operator Π then, at least for short times, its norm will on average be reduced:

Assumption B.4 (Smoothing) One has the bound
The constants $\eta$ and $\bar p$ appearing in this bound are the same as the ones appearing in Assumption B.3, the constant $\eta'$ is the same as the one appearing in Assumption B.2, and the projection $\Pi$ is the same as the one appearing in Assumption B.1.

Remark 5.2
The condition C Π − C J > 2κC L may seem particularly unmotivated. In the next section, we try to give some insight into its meaning.

Remark 5.3
Notice that if $\operatorname{Range}(\Pi) \subset \operatorname{span}\{g_1, \ldots, g_d\}$, then in light of the last representation in (4.11) it is reasonable to expect (5.1) to hold, as long as one has some control over the moments of the modulus of continuity of $s \mapsto K_{s,t}$. (This is made more precise in Lemma 6.18.) We refer to such an assumption on the range as the "essentially elliptic" setting, since all of the directions whose (pathwise) dynamics are not controlled by Assumption B.4 are directly forced.
Under these assumptions, we have the following result, which is the fundamental "smoothing" estimate of this paper. It is the linchpin on which all of the ergodic results rest.

Theorem 5.4 Let Assumptions A.1 and B.1-B.4 hold. Then for any
Remark 5.6 If $\|\varphi\|_\infty$ or $\|D\varphi\|_\infty$ are bounded by one, then the corresponding terms under the square root are bounded by one. Furthermore, in light of Assumption B.2, if $\varphi(u)^2 \leq \exp(V(u))$, then $(P_{2n}\varphi^2)(u)$ can be estimated by iterating the Lyapunov bound. Of course, the same bound then holds for $(P_{2n}\|D\varphi\|^2)(u)$, provided that one has an estimate of the type $\|D\varphi\|^2(u) \leq \exp(V(u))$.

Motivating discussion
We now discuss in what sense (5.3) implies smoothing. When the term "smoothing" is used in the mathematics literature to describe a linear operator $T$, it usually means that $T\varphi$ belongs to a smoother function space than $\varphi$; typically, $T\varphi$ is "more differentiable" than $\varphi$. A convenient way to express this fact analytically is an estimate of the form
$$\|D(T\varphi)\|_\infty \leq C \|\varphi\|_\infty . \qquad (5.4)$$
(Of course, the "smoothing" property may improve the smoothness by less than a whole derivative, or one may consider functions $\varphi$ that are not bounded, but let us consider (5.4) just for the sake of the argument.) This shows in a quantitative way that $T\varphi$ is differentiable while $\varphi$ need not be. In light of Remark 5.6, this is in line with the first term on the right hand side of (5.3).
The second term on the right hand side of (5.3) embodies smoothing of a different type. Suppose that $T$ satisfies the estimate
$$\|D(T\varphi)\|_\infty \leq C \|\varphi\|_\infty + \gamma \|D\varphi\|_\infty \qquad (5.5)$$
for some positive $C$ and some $\gamma \in (0, 1)$. (Note that this is a variation of what is usually referred to as the Lasota-Yorke inequality [LY73, Liv03] or the Ionescu-Tulcea-Marinescu inequality [ITM50].) Though (5.5) does not imply that $T\varphi$ belongs to a smoother function space than $\varphi$, it does imply that the gradients of $T\varphi$ are smaller than those of $\varphi$, at least as long as the gradients of $\varphi$ are sufficiently steep. This is in line with a more colloquial idea of smoothing, though not with the traditional mathematical definition.
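Iterating an estimate of the form (5.5) shows why it still forces gradients to contract; a short computation, assuming the reconstructed form $\|D(T\varphi)\|_\infty \leq C\|\varphi\|_\infty + \gamma\|D\varphi\|_\infty$ together with the Markov-type contraction $\|T\varphi\|_\infty \leq \|\varphi\|_\infty$:

```latex
\|D(T^n\varphi)\|_\infty
  \le C\,\|T^{n-1}\varphi\|_\infty + \gamma\, \|D(T^{n-1}\varphi)\|_\infty
  \le C \sum_{k=0}^{n-1}\gamma^k \,\|\varphi\|_\infty
      + \gamma^n \,\|D\varphi\|_\infty
  \le \frac{C}{1-\gamma}\,\|\varphi\|_\infty + \gamma^n \,\|D\varphi\|_\infty .
% The gradient of T^n\varphi is thus asymptotically bounded by
% C\|\varphi\|_\infty/(1-\gamma), uniformly in the steepness of the initial
% gradient \|D\varphi\|_\infty: steep gradients are flattened geometrically.
```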

Strongly dissipative setting
Where does the assumption $C_\Pi > C_J + 2\kappa C_L$ come from? This is easy to understand if we consider the "trivial" case $\Pi = 0$. In this case, Assumption B.1 is empty and the projection $\Pi^\perp$ is the identity. Therefore, the left hand sides of Assumptions B.3 and B.4 coincide, so that one has $C_J = -C_\Pi$ and our restriction becomes $C_J + \kappa C_L < 0$. This turns out to be precisely the right condition to impose if one wishes to show that $\mathbb{E}\|J_{0,n}\| \to 0$ at an exponential rate:
Proof. Using the fact that $\|J_{0,n}\| \leq \|J_{n-1,n}\|\, \|J_{0,n-1}\|$, it is easy to show that the required recursion relation holds. It now suffices to apply it $n$ times and to use the fact that $J_{0,0} = 1$.
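The kind of recursion used here can be sketched as follows; this is a schematic reconstruction under an assumed one-step bound, with the precise constants being those of Assumptions B.2 and B.3:

```latex
% Suppose the one-step estimates combine, after conditioning, into
\mathbb{E}\big( \|J_{n-1,n}\| \, e^{\kappa V(u_n)} \,\big|\, \mathcal{F}_{n-1} \big)
  \le e^{C_J + \kappa C_L} \, e^{\kappa \eta' V(u_{n-1})}
  \le e^{C_J + \kappa C_L} \, e^{\kappa V(u_{n-1})} .
% Setting a_n = \mathbb{E}\big( \|J_{0,n}\|\, e^{\kappa V(u_n)} \big) and using
% \|J_{0,n}\| \le \|J_{n-1,n}\| \|J_{0,n-1}\|, this yields
a_n \le e^{C_J + \kappa C_L}\, a_{n-1}
  \quad\Longrightarrow\quad
a_n \le e^{n (C_J + \kappa C_L)}\, e^{\kappa V(u_0)} ,
% which tends to zero exponentially precisely when C_J + \kappa C_L < 0.
```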
We now use this estimate to prove a version of Theorem 5.4 when the system is strongly dissipative:

Proposition 5.8 Let Assumptions B.2 and B.3 hold and set $C_T = C_J + \kappa C_L$ with $\kappa = \eta/(1 - \eta')$ as before. Then, for any Fréchet differentiable $\varphi : H \to \mathbb{R}$ and any $n \in \mathbb{N}$, one has
$$\|D(P_n\varphi)(u)\| \leq \gamma^n e^{\kappa V(u)} \sqrt{\big(P_n \|D\varphi\|^2\big)(u)} ,$$
with $\gamma = e^{C_T}$. In particular, the semigroup $P_t$ has the asymptotic strong Feller property whenever $C_T < 0$.
Comparing this result to the bound (5.3) stated in Theorem 5.4 shows that the combination of the smoothing Assumption B.4 with Assumption B.1 on the Malliavin matrix allows us to treat the system as if its Jacobian were contracting at an average rate $(C_\Pi - C_J)/2$, instead of expanding at a rate $C_J$. This is precisely the rate that one would obtain by projecting the Jacobian with $\Pi^\perp$ at every second step. The additional term containing $P_{2n}\varphi^2$ appearing on the right hand side of (5.3) should then be interpreted as the probabilistic "cost" of performing that projection. Since this "projection" will be performed by using an approximate inverse of the Malliavin matrix, it makes sense that the larger the lower bound on $M_t$, the lower the corresponding probabilistic cost.

Remark 5.9
It is worth mentioning that nothing in this section required the number of Wiener processes to be finite. Hence one is free to take $d = \infty$, as long as all of the solutions and linearisations are well defined (which places conditions on the $g_k$).

Transfer of variation
Having analyzed the strongly dissipative setting, we now turn to the general setting. We would like to mimic the calculation used in Proposition 5.8, but without requiring the system to be "contractive" in the sense of being strongly dissipative. However, in settings where one can prove (5.4), there is usually no requirement of strong dissipativity, but rather an assumption of hypoellipticity. This is because the variation in the initial condition is transferred to a variation in Wiener space. Mirroring the discussion in [Mat08, HM06] (where more details can be found), we begin by sketching a proof of (5.4) and then show how to modify it to obtain (5.5). The central idea is to compensate, as much as possible, the effect of an infinitesimal perturbation in the initial condition by an infinitesimal variation in the driving Wiener process. In short, to transfer one type of variation to another.
Denoting by $S = \{\xi \in H : \|\xi\| = 1\}$ the set of possible directions in $H$, let there be given a map from $S \times C([0, \infty), \mathbb{R}^d) \to CM'$, denoted by $(\xi, W) \mapsto h^\xi(W)$, mapping variations in the initial condition $u$ to variations in the Wiener path $W$. We will worry about constructing a suitable map in the next sections; for the moment, we just explore which properties of $h^\xi$ might be useful. Fixing $t$, let us begin by assuming that the following identity holds:
$$J_{0,t}\xi = D_\xi u_t(W) = D_{h^\xi} u_t(W) = A_{0,t} h^\xi . \qquad (5.6)$$
(The first and last equalities are just changes in notation.) In words, the middle equality states that the variation in $u_t(W)$ caused by an infinitesimal shift of the initial condition in the direction $\xi$ is equal to the variation in $u_t$ caused by an infinitesimal shift of the Wiener process $W$ in the direction $h^\xi(W)$. This is the basic reasoning behind smoothness estimates proved by Malliavin calculus. We begin as in the proof of Proposition 5.8. For any $\xi \in S$, one has
$$D_\xi(P_t\varphi)(u) = \mathbb{E}\big( D\varphi(u_t)\, J_{0,t}\xi \big) = \mathbb{E}\big( D\varphi(u_t)\, D_{h^\xi} u_t \big) = \mathbb{E}\Big( \varphi(u_t) \int_0^t h^\xi_s \cdot dW(s) \Big) , \qquad (5.7)$$
where we took expectations and used the Malliavin integration by parts formula (4.6) to obtain the last equality. Applying the Cauchy-Schwarz inequality to the last term produces a term of the form of the first term on the right-hand side of (5.3), provided $\mathbb{E}\,|\int_0^t h^\xi_s \cdot dW(s)|^2 < \infty$. Hence, provided one can find a mapping $(\xi, W) \mapsto h^\xi(W)$ satisfying (5.6) with $\mathbb{E}\,|\int_0^t h^\xi_s \cdot dW(s)|^2 < \infty$, we have proven an inequality of the form (5.4). In the infinite-dimensional SPDE setting of this paper, finding a map $(\xi, W) \mapsto h^\xi(W)$ satisfying (5.6) seems hopeless, unless the noise is infinite-dimensional itself and acts in a very non-degenerate way on the equation; see [Mas89, DPEZ95, EH01] or the monograph [DPZ96] for some results in this direction. Instead, we only "approximately compensate" for the variation due to differentiating in the initial direction $\xi$ with a shift in the Wiener process.
As such, given a mapping $(\xi, W) \mapsto h^\xi(W)$, we replace the requirement (5.6) with the definition
$$\rho_t = J_{0,t}\xi - A_{0,t} h^\xi = D_\xi u_t - D_{h^\xi} u_t , \qquad (5.8)$$
and hope that we can choose $h^\xi$ in such a way that $\rho_t \to 0$ as $t \to \infty$. As before, we postpone choosing a mapping $(\xi, W) \mapsto h^\xi(W)$ until the next section. For the moment, we are content to explore the implications of finding such a mapping with desirable properties.
Returning to (5.7), but using (5.8), we now have
$$D_\xi(P_t\varphi)(u) = \mathbb{E}\big( D\varphi(u_t)\, D_{h^\xi} u_t \big) + \mathbb{E}\big( D\varphi(u_t)\, \rho_t \big) . \qquad (5.9)$$
Applying the Malliavin integration by parts formula to the first term on the right-hand side produces $\mathbb{E}\big( \varphi(u_t) \int_0^t h^\xi_s \cdot dW(s) \big)$, which in turn, after applying the Cauchy-Schwarz inequality twice, yields a bound of the form (5.3), provided that (5.11) holds, namely that $\mathbb{E}\,|\int_0^t h^\xi_s \cdot dW(s)|^2$ stays bounded and $\mathbb{E}\,\|\rho_t\|^2$ decays like $\gamma^t$ for some $\gamma \in (0, 1)$; we will then have proved Theorem 5.4. Choosing a mapping $(\xi, W) \mapsto h^\xi(W)$ so that these two conditions hold is the topic of the next four sections.
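Spelling out the two applications of the Cauchy-Schwarz inequality in the step above (our own expansion; the grouping of terms is meant to match the shape of (5.3)):

```latex
|D_\xi (P_t\varphi)(u)|
 \le \Big| \mathbb{E}\Big( \varphi(u_t)\int_0^t h^\xi_s\cdot dW(s) \Big) \Big|
   + \big| \mathbb{E}\big( D\varphi(u_t)\,\rho_t \big) \big|
 \le \sqrt{ \mathbb{E}\,\varphi^2(u_t) }\;
     \sqrt{ \mathbb{E}\Big|\int_0^t h^\xi_s\cdot dW(s)\Big|^2 }
   + \sqrt{ \mathbb{E}\,\|D\varphi(u_t)\|^2 }\;
     \sqrt{ \mathbb{E}\,\|\rho_t\|^2 } .
% The first product is controlled by the moment of the Skorokhod integral
% of h^\xi, the second by the decay of \mathbb{E}\|\rho_t\|^2: exactly the
% two conditions collected in (5.11).
```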

Choosing a variation
As discussed in [HM06] and at length in [Mat08], if one looks for the variation $h^\xi$ such that (5.6) holds and $\int_0^t |h^\xi_s|^2\, ds$ is minimised, then the answer is $h^\xi = A^*_{0,t} M_t^{-1} J_{0,t}\xi$. While this is not quite the correct optimisation problem to solve, since its solution $h^\xi$ is not adapted to $W$ and hence $\mathbb{E}\,|\int_0^t h^\xi_s \cdot dW(s)|^2$ does not reduce to $\int_0^t \mathbb{E}\,|h^\xi_s|^2\, ds$, it is in general a good enough choice. A bigger problem is that the space on which $M_t$ can be inverted is far from evident. If the range of $G$ were dense in $H$ (which requires infinitely many driving Wiener processes), then there is some chance that $\operatorname{Range}(J_t) \subset \operatorname{Range}(M_t)$ and the above formula for $h^\xi$ could be used. This is in fact the setting in which the Bismut-Elworthy-Li formula is often used and which might be referred to as "truly elliptic"; in this case the system is in fact strong Feller. We are precisely interested in the case where only a finite number of directions are forced (or their variances decay so fast that this is effectively true). One of the fundamental ideas used in this article is that we need only affect the system on a finite-dimensional subspace, since the dynamic pathwise control embodied in Assumption B.4 can control the remaining degrees of freedom.
While Theorem 6.7 of the next section gives conditions ensuring that $M_t$ is almost surely non-degenerate, it does not give much insight into the structure of its range, since it only deals with finite-dimensional projections. However, Assumption B.1 ensures that it is unlikely that eigenvectors with a sizable projection onto $\Pi H$ have small eigenvalues. As long as this is true, the "regularised inverse" $(M_t + \beta)^{-1}$, which always exists since $M_t$ is positive definite, will be a "good inverse" for $M_t$, at least on $\Pi H$. This suggests the choice $h^\xi_s = G^* K_{s,t} (M_t + \beta)^{-1} J_t\xi$ for some very small $\beta > 0$. Observe that
$$J_t\xi - A_t h^\xi = \beta\, (M_t + \beta)^{-1} J_t\xi , \qquad (5.12)$$
which is expected to be small as long as $J_t\xi$ has a small projection (relative to the size of $\beta$) in $\Pi^\perp H$. But in any case, the norm of the right hand side of (5.12) never exceeds the norm of $J_t\xi$, so that, for small values of $\beta$, $D_\xi u_t - D_{h^\xi} u_t$ is expected to behave like $\|\Pi^\perp J_t\xi\|$. Assumption B.4 states precisely that, if one projects the Jacobian onto $\Pi^\perp H$, then the system behaves as if it were "strongly dissipative" as in Section 5.1.1. All together, this motivates alternating between choosing $h^\xi = A^*_{n,n+1}(M_{n,n+1} + \beta_n)^{-1} J_{n,n+1}\rho_n$ for even $n$ and $h^\xi \equiv 0$ on $[n, n+1]$ for odd $n$.
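The identity (5.12) is a one-line consequence of the definitions, since the choice $h^\xi_s = G^* K_{s,t}(M_t + \beta)^{-1} J_t\xi$ is exactly $A^*_t (M_t + \beta)^{-1} J_t\xi$ by (4.5), and $M_t = A_t A^*_t$:

```latex
J_t\xi - A_t h^\xi
  = J_t\xi - A_t A_t^* (M_t+\beta)^{-1} J_t\xi
  = \big( (M_t+\beta) - M_t \big)\,(M_t+\beta)^{-1} J_t\xi
  = \beta\,(M_t+\beta)^{-1} J_t\xi .
% Since 0 \le \beta (M_t+\beta)^{-1} \le 1 as a selfadjoint operator, the
% remainder never exceeds \|J_t\xi\|, while it is small on eigenspaces of
% M_t whose eigenvalues are much larger than \beta.
```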
Since we will split time into intervals of length one, we introduce shorthand notations for the Jacobian, the Malliavin derivative, and the Malliavin matrix over the interval $[n, n+1]$, as already used above in $A_{n,n+1}$ and $M_{n,n+1}$. We then define the map $(\xi, W) \mapsto h^\xi(W)$ recursively as just described: on the even intervals we choose the regularised-inverse variation built from $\rho_{2n}$, while $h^\xi_s = 0$ for $s \in [2n - 1, 2n)$ and $n \in \mathbb{N}$. (5.13) Here, as before, $\rho_0 = \xi$, $\rho_t = J_{0,t}\xi - A_{0,t} h^\xi = D_\xi u_t - D_{h^\xi} u_t$, and $\beta_n$ is a sequence of positive random numbers, measurable with respect to $\mathcal{F}_n$, which will be chosen later.
Observe that these definitions are not circular, since the construction of h^ξ_s for s ∈ [n, n+1) only requires the knowledge of ρ_n, which in turn depends only on h^ξ_s for s ∈ [0, n). The remainder of this section is devoted to showing that this particular choice of h^ξ is "good" in the sense that it allows us to satisfy (5.11). We are going to assume throughout this section that Assumptions A.1 and B.1-B.4 hold, so that we are in the setting of Theorem 5.4, and that h^ξ is defined as in (5.13).
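The contraction mechanism behind the recursion (5.13) can be illustrated in a finite-dimensional toy model (ours, not the SPDE itself; the matrices J, A below are illustrative stand-ins for the Jacobian J_n and Malliavin derivative A_n, and A is taken well-conditioned so that M_n = A_n A_n^* is uniformly elliptic): on "even" steps one applies the control h = A^*(M + β)^{-1}Jρ, which leaves the residual ρ' = β(β + M)^{-1}Jρ, while on "odd" steps h = 0.

```python
# Toy sketch of the alternating control (5.13): even steps apply the
# regularised-inverse control, odd steps transport the residual freely.
import numpy as np

rng = np.random.default_rng(0)
d = 6
J = 1.1 * np.linalg.qr(rng.standard_normal((d, d)))[0]  # mildly expanding "Jacobian"
U = np.linalg.qr(rng.standard_normal((d, d)))[0]
V = np.linalg.qr(rng.standard_normal((d, d)))[0]
A = U @ np.diag(np.linspace(0.5, 2.0, d)) @ V           # well-conditioned "A_n"
M = A @ A.T                                             # Malliavin matrix M_n = A_n A_n^*
beta = 1e-6

rho = np.ones(d) / np.sqrt(d)
norms = [np.linalg.norm(rho)]
for n in range(10):
    if n % 2 == 0:  # even interval: use the control h = A^*(M+beta)^{-1} J rho
        h = A.T @ np.linalg.solve(M + beta * np.eye(d), J @ rho)
        rho = J @ rho - A @ h   # equals beta (beta + M)^{-1} J rho
    else:           # odd interval: no control
        rho = J @ rho
    norms.append(np.linalg.norm(rho))

# With M uniformly elliptic, each controlled step contracts by roughly
# beta / lambda_min(M), overwhelming the expansion of J.
assert norms[-1] < 1e-15 * norms[0]
```

In the genuinely infinite-dimensional setting M_n is degenerate, which is why the decay of ρ must instead combine the regularised inverse on ΠH with the dissipativity of Assumption B.4 on Π^⊥H.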

Preliminary bounds and definitions
We start by stating a few straightforward consequences of Assumption B.2: Proposition 5.10 For any α ≤ 1, one has the bound Furthermore, for η > 0 and p > 0 such that ηp ≤ 1, one has Finally, setting κ = η/(1 − η′) as before, one has the bound Proof. The first bound follows immediately from Jensen's inequality. The second and third inequalities are shown by rewriting the estimate from Assumption B.2 as E(exp(ηpV(u_n)) | F_{n−1}) ≤ exp(ηp η′ V(u_{n−1}) + ηp C_L), and iterating it.
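The iteration in the proof can be reconstructed from the displayed conditional estimate as follows (a sketch; each successive application of Assumption B.2 is licit since the relevant exponent ηp(η′)^k stays below 1, and κ = η/(1 − η′)):

```latex
% Conditioning successively on F_{n-1}, F_{n-2}, ..., F_0 and using
%   E( exp(\eta p V(u_n)) | F_{n-1} ) \le \exp(\eta p \eta' V(u_{n-1}) + \eta p C_L):
\begin{aligned}
\mathbf{E}\exp\bigl(\eta p V(u_n)\bigr)
 &\le e^{\eta p C_L}\,\mathbf{E}\exp\bigl(\eta p \eta' V(u_{n-1})\bigr)
  \le e^{\eta p C_L (1+\eta')}\,\mathbf{E}\exp\bigl(\eta p (\eta')^2 V(u_{n-2})\bigr)\\
 &\le \cdots \le
  \exp\Bigl(\eta p (\eta')^{n} V(u_0) + \eta p C_L \sum_{k=0}^{n-1}(\eta')^{k}\Bigr)
  \le \exp\bigl(\eta p (\eta')^{n} V(u_0) + p\,\kappa\, C_L\bigr).
\end{aligned}
```

The final bound is exactly the shape appearing in Proposition 5.11 below.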
Similarly, we obtain a bound on the Jacobian and on the Malliavin derivative A_n of the solution flow between times n and n+1: Proposition 5.11 For any p ∈ [0, p̄], one has

sup_{n ≤ s < t ≤ n+1} E ‖J_{s,t}‖^p ≤ exp(p (η′)^n η V(u_0) + p C_J + p κ C_L)   (5.14)

Furthermore, (5.14) also holds for J^{(2)}_{s,t} with C_J replaced by C^{(2)}_J.
Proof. We only need to show the bound for p = p̄, since lower values follow again from Jensen's inequality. The bound (5.14) is an immediate consequence of Assumption B.2 and Proposition 5.10. The second bound follows by writing In addition to these first Malliavin derivatives, we will need control of the derivatives of various objects involving the Malliavin derivative. The following lemma gives control over two objects related to the second Malliavin derivative: Proof. For this, we note that by (4.9) one has the identities Hence, if p ∈ [0, p̄/2] (which in particular also ensures that 2pκ < 1), it follows from Proposition 5.11 that for r ≤ s, and similarly for s ≤ r. Since, for p ≥ 1, we can write the second estimate then follows from the first one.

Controlling the error term
The purpose of this section is to show that the "error term" ρ_t = D_ξ u_t − D_{h^ξ} u_t goes to zero as t → ∞, provided that the "control" h^ξ is chosen as explained in Section 5.3. We begin by observing that, for even integer times, ρ_n is given recursively by

ρ_{2n+2} = J_{2n+1} R^{β_{2n}}_{2n} J_{2n} ρ_{2n} ,

where R^β_k is the operator

R^β_k = 1 − M_k (β + M_k)^{-1} = β (β + M_k)^{-1} .

Observe that R^β_k measures the error between M_k (β + M_k)^{-1} and the identity, which we will see is small for β very small. This recursion is of the form ρ_{2n+2} = Ξ_{2n+2} ρ_{2n}, with the (random) operator Ξ_{2n+2} : H → H defined by Ξ_{2n+2} = J_{2n+1} R^{β_{2n}}_{2n} J_{2n}. Notice that Ξ_{2n} is F_{2n}-measurable and that Ξ_k is defined only for even integers k. Define the n-fold product of the Ξ_{2k} by Ξ^{(n)} = Ξ_{2n} Ξ_{2n−2} ··· Ξ_2. It is our aim to show that, under the assumptions of Section 5, it is possible to choose the sequence β_n in an adapted way such that, for a sufficiently small constant η̄ and p ∈ [0, p̄/2], one has (5.17) for some κ̄ > 0. This will give the needed control over the last term in (5.10). By Assumption B.1, we have a bound on the Malliavin covariance matrix of the form (5.18). Here, by the Markov property, the quantities ε and α do not necessarily need to be constant, but are allowed to be F_k-measurable random variables. In order to obtain (5.17), the idea is to decompose Ξ_{2n+2} as

Ξ_{2n+2} = J_{2n+1} Π^⊥ R^{β_{2n}}_{2n} J_{2n} + J_{2n+1} Π R^{β_{2n}}_{2n} J_{2n} =: I_{2n+2,1} + I_{2n+2,2} .   (5.19)

The crux of the matter is controlling the term Π R^{β_{2n}}_{2n}, since J_{2n+1} Π^⊥ is controlled by Assumption B.4 and we know that ‖R^{β_{2n}}_{2n}‖ ≤ 1. To understand and control the term I_{2n+2,2}, we explore the properties of a general operator of the form of R^β_{2n}. Lemma 5.13 Let M be a positive self-adjoint operator on H, and let γ, δ > 0 be such that ⟨Mξ, ξ⟩ ≥ γ‖ξ‖² for every ξ ∈ Λ_δ, where Λ_δ = {ξ : ‖Πξ‖ ≥ δ‖ξ‖}. Then, defining R = 1 − M(β + M)^{-1} = β(β + M)^{-1} for some β > 0, one has ‖ΠR‖ ≤ δ ∨ β/γ.
Proof. If Rξ ∉ Λ_δ, then ‖ΠRξ‖ ≤ δ‖Rξ‖ ≤ δ‖ξ‖, since ‖R‖ ≤ 1. If, on the other hand, Rξ ∈ Λ_δ, we have by assumption

γ ‖Rξ‖² ≤ ⟨M Rξ, Rξ⟩ = β ⟨M (β + M)^{-1} ξ, Rξ⟩ ≤ β ‖ξ‖ ‖Rξ‖ ,   (5.20)

so that ‖ΠRξ‖ ≤ ‖Rξ‖ ≤ (β/γ)‖ξ‖. Combining both estimates gives the required bound.
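The bound ‖ΠR‖ ≤ δ ∨ β/γ of Lemma 5.13 can be checked numerically in the simplest case where M and Π commute (a diagonal illustration of ours, not part of the proof; all numerical values are arbitrary):

```python
# Numerical check of Lemma 5.13 in the commuting case: Pi projects onto
# the first k coordinates, M is degenerate off Pi H, and we verify
# ||Pi R|| <= max(delta, beta/gamma) for R = beta (beta + M)^{-1}.
import numpy as np

d, k = 5, 2
Pi = np.diag([1.0] * k + [0.0] * (d - k))
m = np.array([3.0, 2.0, 0.0, 0.0, 0.0])   # eigenvalues of M; zero off Pi H
M = np.diag(m)
delta = 0.3
# For xi in Lambda_delta:  <M xi, xi> >= min(m[:k]) ||Pi xi||^2
#                                     >= min(m[:k]) delta^2 ||xi||^2,
# so the hypothesis of the lemma holds with:
gamma = m[:k].min() * delta**2

for beta in [1e-1, 1e-3, 1e-6]:
    R = beta * np.linalg.inv(beta * np.eye(d) + M)
    norm_PiR = np.linalg.norm(Pi @ R, 2)       # spectral norm of Pi R
    assert norm_PiR <= max(delta, beta / gamma) + 1e-12
```

Note how the smallness of ‖ΠR‖ as β → 0 relies only on M being bounded below on the cone Λ_δ, not on M being invertible: on Π^⊥H the operator R remains of norm one.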
This result can be applied almost directly to our setting in the following way: Corollary 5.14 Let M(ω) be a random operator satisfying the conditions of Lemma 5.13 almost surely for some random variable γ. If we choose β such that, for some (deterministic) δ ∈ (0, 1), p ≥ 1 and C > 0, one has the bound P(β ≥ δ²γ) ≤ Cδ^p, then E‖ΠR‖^p ≤ (1 + C)δ^p.
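The first part of the corollary, whose proof is not spelled out in the text, follows by splitting the expectation according to the event {β ≥ δ²γ} (our reconstruction, a sketch):

```latex
% On the complement of {beta >= delta^2 gamma}, Lemma 5.13 gives
%   \|\Pi R\| \le \delta \vee \beta/\gamma \le \delta \vee \delta^2 = \delta
% (since delta < 1), while \|\Pi R\| \le \|R\| \le 1 always holds, so that
\mathbf{E}\|\Pi R\|^{p}
  \le \delta^{p} + \mathbf{P}\bigl(\beta \ge \delta^{2}\gamma\bigr)
  \le \delta^{p} + C\delta^{p}
  = (1+C)\,\delta^{p}.
```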
To obtain the second statement, it suffices to consider (5.18) with ε = β_{2n}/δ², so that one can take for γ the random variable equal to ε on the set on which the bound (5.18) holds and to 0 on its complement. It then follows from the choice (5.21) for β_{2n} that the assumptions for the first part are satisfied with C = 1 and p = p̄, so that the statement follows.

Lemma 5.15
For any ε > 0 and p ∈ [0, p̄/2], there exists a δ > 0 sufficiently small so that, if one chooses β_n as in Corollary 5.14 and η such that κp̄ ≤ 1, one has Proof. Since for every ε > 0 there exists a constant C_ε such that |x + y|^p ≤ e^{pε/2}|x|^p + C^p_ε |y|^p, recalling the definition of I_{2n+2,1} and I_{2n+2,2} from (5.19), we have that We begin with the first term, since it is the most straightforward one. Using the fact that ‖R^{β_{2n}}_{2n}‖ ≤ 1 and that p̄η < 1 − η′ by the assumption on η, we obtain from Assumptions B.2 and B.3 that Turning to the second term, we obtain for any δ ∈ (0, 1) the bound provided that we choose β_n as in Corollary 5.14. Choosing now δ sufficiently small (it suffices to choose it such that δ^p ≤ ε^p/(2√2) · C^{-p}_ε e^{−pC_J − pC_Π} for every p ≤ p̄/2), we obtain the desired bound.
Combining Lemma 5.15 with (5.23), we obtain the needed result, which ensures that the "error term" ρ_t from (5.10) goes to zero uniformly as n → ∞. We assume throughout this section that h^ξ_t was constructed as in Section 5.3, with β_n as in (5.21).
Since our choice of h^ξ_s is not adapted to W_s, this does not follow from a simple application of Itô's isometry. However, the situation is not as bad as it could be, since the control is "block adapted": by this we mean that h_n is adapted to F_n for every integer value of n. For non-integer values t ∈ (n, n+1], h_t has no reason to be F_t-measurable in general, but it is nevertheless F_{n+1}-measurable. The stochastic integral in (5.24) is accordingly not an Itô integral but a Skorokhod integral. Hence, to estimate (5.24), we must use its generalisation given in (4.7), which produces

E |∫_I h_s · dW(s)|² ≤ E |||h|||²_I + E ∫_I ∫_I ‖D_s h_t‖²_HS ds dt ,

where |||f|||²_I = ∫_I |f(s)|² ds and ‖M‖_HS denotes the Hilbert-Schmidt norm on linear operators from R^d to R^d. We see here the importance of the "block adapted" structure of h_s: were it not for this structure, the integrand appearing in the second term above would need to decay both in s and in t for the right-hand side to be finite.
The main result of this section is Proposition 5.17, which we now prove. Proof of Proposition 5.17. In the interest of brevity, we set M̃_n = M_n + β_n and I_n = [n, n+1]. We also write |||h|||_I for the norm of L²(I, R^d) viewed as a subset of CM′, and we use ‖·‖ and |||·|||_I to denote respectively the induced operator norm on linear maps from H to H and from CM′ to H. Hopefully without causing too much confusion, we also use |||·|||_I to denote the induced operator norm on linear maps from H to CM′. In all cases, we further abbreviate |||h|||_{I_n} to |||h|||_n. Observe now that the definitions of M_n and A_n imply the following almost sure bounds: We start by bounding the first term on the right-hand side of (5.25). Observe that Using the bound on ‖A^*_k M̃^{-1}_k‖ from (5.26), we obtain By our assumption that 10/p̄ + 2/q̄ ≤ 1, we can find exponents with 1/q + 1/r + 1/p = 1, q ≤ q̄, 2r ≤ p̄ and 2p ≤ p̄. By the Hölder inequality we thus have From Proposition 5.11, Assumption B.1 and Lemma 5.16, we obtain the existence of a positive constant C (depending only on the choice made for κ̄ and on the bounds given by our standing assumptions) such that one has the bounds (5.28). Combining these bounds and summing over k yields, uniformly in n ≥ 0,
We now turn to the second term on the right-hand side of (5.25). Since the columns of the matrix representation of the integrand are just the D^i_s, the components of the Malliavin derivative, we have From the definition of h_t, Lemma 5.12, the relation M̃_{2k} = A_{2k} A^*_{2k} + β_{2k}, and the fact that both ρ_{2k} and β_{2k} are F_{2k}-measurable, we have that, for fixed s ∈ I_{2k}, D^i_s h is an element of L²(I_{2k}, R) ⊂ CM′ with: For brevity, we suppress the subscripts k on the operators and norms for a moment. It then follows from (5.26) that one has the almost sure bounds In particular, this yields the bounds Applying all of these estimates to (5.31), we obtain the bound The assumption that 10/p̄ + 2/q̄ ≤ 1 ensures that we can find q ≤ q̄/2, r ≤ p̄/2 and p ≤ p̄/4 with 1/r + 2/p + 1/q = 1. Applying Hölder's inequality to the preceding products yields: We now use the previous estimates to control each term. From Lemma 5.12 and Proposition 5.11, we have the bounds Recall furthermore the bounds on ρ_{2k} and J_{2k} already mentioned in (5.28). Lastly, from Assumption B.1 we have, similarly as before, that there exists a positive constant C such that Combining all of these estimates produces

Σ_{i=1}^m ∫_{2k}^{2k+1} |||D^i_s h|||²_{2k} ds ≤ C exp((8η + 2κ)V(u_0)) U²(u_0) ,

for some different constant C depending only on C_J, C^{(2)}_J, C_L, η, κ, κ̄ and the choice of δ in (5.21). Combining this estimate with (5.29) and (5.25) concludes the proof.

Spectral properties of the Malliavin matrix
The results in this section build on the ideas and techniques from [MP06] and [BM07]. In the first of these, the specific case of the 2D Navier-Stokes equations was studied using similar ideas. The time-reversed representation of the Malliavin matrix used there is also the basis of our analysis here (see also [Oco88]). In the context of the 2D Navier-Stokes equations, a result analogous to Theorem 6.7 was proven there. As here, one of the key results needed is a connection between the typical size of a non-adapted Wiener polynomial and the typical size of its coefficients. In [MP06], since the nonlinearity was quadratic, only Wiener polynomials of degree one were considered, and the calculations and formulations were made in a coordinate-dependent fashion. In [BM07], the calculations were reformulated in a basis-free fashion, which made possible both the extension to more complicated nonlinearities and the inclusion of forcing which is not diagonal in the chosen basis. Furthermore, a result close to Theorem 6.7 was claimed in [BM07]. Unfortunately, the auxiliary Lemma 9.12 of that article contains a mistake which leaves the proof of this result incomplete.
That being said, the techniques and presentation used in this and the next section build on and refine those from [BM07]. One technical but important distinction between Theorem 6.7 and the preceding versions is that Theorem 6.7 allows for rougher test functions. This is accomplished by allowing K_{t,T} to have a singularity in a certain interpolation norm as t → T; see equation (6.3a) for the precise form. This extension is important for correcting an error in [HM06], which requires control of the Malliavin matrix of the type given by Theorem 6.7, that is, with test functions rougher than those allowed in [MP06]. Indeed, the second inequality in equation (4.25) of [HM06] is not justified, since the operator M_0 is only self-adjoint in L² and not in H¹. Theorem 6.7 rectifies the situation by dropping the requirement to work in H¹ altogether.

Bounds on the dynamic
As the previous sections have shown, it suffices to have control on the moments of u and J in H in order to control their moments in many stronger norms. This motivates the next assumption. For the entirety of this section we fix some T_0 > 0. Assumption C.1 There exists a continuous function Ψ_0 : H → [1, ∞) such that, for every T ∈ (0, T_0] and every p ≥ 1, there exists a constant C such that for every u_0 ∈ H. Here, ‖J‖ denotes the operator norm of J from H to H.
Under this assumption, we immediately obtain control over the adjoint K s,t .
Proposition 6.1 Under Assumption C.1 for every T > 0 and every p ≥ 1 there exists a constant C such that for every u 0 ∈ H.
Proof. By Proposition 3.9 we know that K s,t is the adjoint of J s,t in H. Combined with Assumption C.1 this implies the result.
In the remainder of this section, we will study the solutions to (3.1) away from t = 0 and up to some terminal time T, which we fix from now on. We also introduce the interval I_δ = [T/2, T − δ] for some δ ∈ (0, T/4] to be determined later. Given a solution u_t to (3.1), we also define the process v_t = u_t − GW(t), which is more regular in time. Using Assumption C.1 and the a priori estimates from the previous sections, we obtain: Proposition 6.2 Let Assumption C.1 hold and let Ψ_0 be the function introduced there. For any fixed γ < γ⋆ and β < β⋆, there exists a positive q so that, if Ψ = Ψ^q_0, the solutions to (3.1) satisfy the following bounds for every initial condition u_0 ∈ H: Finally, the adjoint K_{s,t} of the linearisation satisfies the bounds where p_β is as in Proposition 3.8. In all these bounds, C_p is a constant depending only on p and on the details of the equation (3.1).

Remark 6.3
One can assume without loss of generality, and we will do so from now on, that the exponent q defining Ψ is greater than or equal to n, the degree of the nonlinearity. This will be useful in the proof of Lemma 6.16 below.
Proof. It follows immediately from Assumption C.1 that Combining this with Proposition 3.5 yields the first of the desired bounds with q = p_γ.
Here, Ψ 0 is as in Assumption C.1 and p γ is as in Proposition 3.5.
Turning to the bound on ∂_t v_t, observe that v satisfies the random PDE It follows at once from Proposition 3.5 and Assumption A.1.2 that the quoted estimate holds with q = p_{γ+1}. More precisely, it follows from Proposition 3.5 that u_t ∈ H_α for every α < γ⋆ + 1. Therefore, Lu_t ∈ H_γ for γ < γ⋆. Furthermore, N ∈ Poly(H_{γ+1}, H_γ) by Assumption A.1.2, so that N(u_t) ∈ H_γ as well. The claim then follows from the a priori bounds obtained in Proposition 3.5. Concerning the bound (6.2a) on the linearisation J_{s,t}, Proposition 3.6 combined with Assumption C.1 proves the result with q = q_γ + 1. The line of reasoning used to bound ‖∂_t v_t‖_γ also controls ‖∂_t J_{s,t}‖_γ for s < t and s, t ∈ I_δ, since ∂_t J_{s,t} = −L J_{s,t} + DN(u_t) J_{s,t}.
Since Proposition 6.1 gives a completely analogous bound for K_{s,t} in H as for J_{s,t}, the results on K follow from the a priori bounds in Proposition 3.8.

A Hörmander-like theorem in infinite dimensions
In this section, we formulate a lower bound on the Malliavin covariance matrix M_t under a condition that is strongly reminiscent of the bracket condition in Hörmander's celebrated "sums of squares" theorem [Hör85, Hör67]. The proof of the result presented in this section is postponed to Section 6.3 and constitutes the main technical result of this work.
Throughout this section and Section 6.3, we make use of the bounds outlined in Proposition 6.2. We therefore now fix once and for all some choice of constants γ and β as in (6.4). From now on, we will only ever use Proposition 6.2 with this fixed choice for γ and β. This is purely a matter of expositional convenience, since we will need these bounds only finitely many times. As a side remark, note that one should think of these constants as being arbitrarily close to γ⋆ and β⋆ respectively. This definition allows us to define a family of increasing subsets A_i ⊂ Poly(γ, β) by the following recursion: Remark 6.4 Recall from (3.5) that Q_α is proportional to the iterated "Lie bracket" of Q with g_{α_1}, g_{α_2}, and so forth. Similarly, [F_σ, Q_α] is the Lie bracket between two different iterated Lie brackets. As such, except for the issue of admissibility, the set of brackets considered here is exactly the same as in the traditional statement of Hörmander's theorem; only the order in which they appear is slightly different.
To each A_N we associate a positive symmetric quadratic form-valued function Q_N by Lastly, for α ∈ (0, 1) and a given orthogonal projection Π : H → H, we define the set

S_α = {ϕ ∈ H : ‖Πϕ‖ ≥ α ‖ϕ‖} .   (6.6)

With this notation, we make the following non-degeneracy assumption: for every u ∈ H_a. Furthermore, for every p ≥ 1, t > 0 and every α ∈ (0, 1), there exists C such that E Λ^{−p}_α(u_t) ≤ C Ψ^p(u_0) for every initial condition u_0 ∈ H.
Remark 6.5 Assumption C.2 is in some sense weaker than the usual non-degeneracy condition of Hörmander's theorem, since it only requires Q N to be sufficiently nondegenerate on the range of Π. In particular, if Π = 0, then Assumption C.2 is void and always holds with Λ α = 1, say. This is the reason why, by choosing for Π a projector onto some finite-dimensional subspace of H, one can expect Assumption C.2 to hold for a finite value of N , even in our situation where A N only contains finitely many elements.
Remark 6.6 As will be seen in Section 8, it is often possible to choose Λ α to be a constant, so that the second part of Assumption C.2 is automatically satisfied.
When Assumption C.2 holds, we have the following result, whose proof is given in Section 6.3. Theorem 6.7 Consider an SPDE of the type (3.1) such that Assumptions A.1 and C.1 hold. Let furthermore the Malliavin matrix M_t be defined as in (4.10) and S_α as in (6.6). Let Π be a finite-rank orthogonal projection satisfying Assumption C.2. Then, there exists θ > 0 such that, for every α ∈ (0, 1), every p ≥ 1 and every t > 0, there exists a constant C such that the bound

P( inf_{ϕ ∈ S_α} ⟨ϕ, M_t ϕ⟩ ≤ ε ‖ϕ‖² ) ≤ C Ψ^{θp}(u_0) ε^p

holds for every u_0 ∈ H and every ε ≤ 1.

Remark 6.8
If Π is a finite-rank orthogonal projection satisfying Assumption C.2, then Theorem 6.7 provides the critical ingredient for proving the smoothness of the density of (P^*_t δ_x) ∘ Π^{-1} with respect to Lebesgue measure. Though [BM07] contains a few unfortunate errors, it still provides the framework needed to deduce the smoothness of these densities from Theorem 6.7. In particular, one needs to prove that Πu_t is infinitely Malliavin differentiable. Section 5.1 of [BM07] shows how to accomplish this in a setting close to ours; see also [MP06].

Proof of Theorem 6.7
While the aim of this section is to prove Theorem 6.7, we begin with some preliminary definitions which will simplify its presentation. Many of the arguments rely on the construction of "exceptional sets" of small probability, outside of which certain intuitive implications hold. This justifies the introduction of the following notational shortcut: Definition 6.9 Given a collection H = {H^ε}_{ε≤1} of subsets of the ambient probability space Ω, we say that "H is a family of negligible events" if, for every p ≥ 1, there exists a constant C_p such that P(H^ε) ≤ C_p ε^p for every ε ≤ 1.
Given such a family H and a statement Φ ε depending on a parameter ε > 0, we will say that "Φ ε holds modulo H" if, for every ε ≤ 1, the statement Φ ε holds on the complement of H ε .
We will say that the family H is "universal" if it does not depend on the problem at hand. Otherwise, we will indicate which parameters it depends on.
Given two families H_1 and H_2 of negligible sets, we write H = H_1 ∪ H_2 as a shorthand for "H^ε = H^ε_1 ∪ H^ε_2 for every ε ≤ 1." Let us state the following useful fact, the proof of which is immediate: Lemma 6.10 Let H^ε_n be a collection of events with n ∈ {1, ..., Cε^{−κ}} for some arbitrary but fixed constants C and κ, and assume that P(H^ε_k) = P(H^ε_ℓ) for every pair (k, ℓ). Then, if the family {H^ε_1} is negligible, the family {H^ε} defined by H^ε = ∪_n H^ε_n is also negligible.
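For completeness, here is the "immediate" computation behind Lemma 6.10 (our sketch): given p ≥ 1, apply the negligibility of {H^ε_1} with exponent p + κ and use the union bound over the at most Cε^{−κ} indices,

```latex
\mathbf{P}(H^{\varepsilon})
  \le \sum_{n \le C\varepsilon^{-\kappa}} \mathbf{P}(H^{\varepsilon}_{n})
  \le C\varepsilon^{-\kappa}\,\mathbf{P}(H^{\varepsilon}_{1})
  \le C\,C_{p+\kappa}\,\varepsilon^{-\kappa}\,\varepsilon^{p+\kappa}
  = C\,C_{p+\kappa}\,\varepsilon^{p},
```

so the constant associated with the exponent p in Definition 6.9 is C C_{p+κ}.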

Remark 6.11
The same statement also holds of course if the equality between probabilities of events is replaced by two-sided bounds with multiplicative constants that do not depend on k, ℓ, and ε.
An important particular case is when the family H depends on the initial condition u_0 of (3.1). We then say that H is "Ψ-controlled" if the constant C_p can be bounded by C̃_p Ψ^p(u_0), where C̃_p is independent of u_0.
In this language, the conclusion of Theorem 6.7 can be restated as saying that there exists θ > 0 such that, for every α > 0, the events {inf_{ϕ∈S_α} ⟨ϕ, M_T ϕ⟩ ≤ ε‖ϕ‖²} form a Ψ^θ-controlled family of negligible events. Recall that the terminal time T was fixed once and for all and that the function Ψ was defined in Proposition 6.2. We further restate this as an implication in the following theorem, which is easily seen to be equivalent to Theorem 6.7: Theorem 6.12 Let Π be a finite-rank orthogonal projection satisfying Assumption C.2. Then, there exists θ > 0 such that, for every α ∈ (0, 1), the implication holds modulo a Ψ^θ-controlled family of negligible events.

Basic structure and idea of proof of Theorem 6.12
We begin with an overly simplified version of the argument, which neglects some technical difficulties. The basic idea of the proof is to argue that if ⟨M_T ϕ, ϕ⟩ is small, then ⟨Q_k(u_T)ϕ, ϕ⟩ must also be small (with high probability) for every k > 0. This is proved inductively, beginning with the directions which are directly forced, namely those belonging to A_1. Assumption C.2 then guarantees in turn that ‖Πϕ‖ must be small with high probability. On the other hand, since ϕ ∈ S_α, we know for a fact that ‖Πϕ‖ ≥ α‖ϕ‖, which is not small. Hence one of the highly improbable events must have occurred.
This sketch of proof is essentially the same as that of Hörmander's theorem in finite dimensions; see [Mal78, KS84, KS85a, Nor86, Nua95]. When trying to adapt this argument to the infinite-dimensional case, one is rapidly faced with two major hurdles. First, processes of the form t ↦ ⟨J_{t,T} g, ϕ⟩ appearing in the definition of M_T are not adapted to the filtration generated by the driving noise. In finite dimensions, this difficulty is overcome by noting that

M_T = J_{0,T} M̃_T J^*_{0,T} ,  where  M̃_T = ∫_0^T J^{-1}_{0,s} G G^* (J^{-1}_{0,s})^* ds ,

and then working with M̃_T instead of M_T. (M̃_T is often called the reduced Malliavin covariance matrix.) The processes t ↦ ⟨J^{-1}_{0,t} g, ϕ⟩ appearing there are perfectly nice semimartingales, and one can use Norris' lemma [Nor86], which is a quantitative version of the Doob-Meyer decomposition theorem, to show inductively that if ⟨ϕ, M̃_T ϕ⟩ is small, then t ↦ ⟨J^{-1}_{0,t} Q(u_t), ϕ⟩ must be small for every vector field Q ∈ A_k. In our setting, unlike in some previous results for infinite-dimensional systems [BT05], the Jacobian J_{0,t} is not invertible. This is a basic feature of dissipative PDEs with a smoothing linear term which dominates the right-hand side. Such dynamical systems only generate semi-flows, as opposed to invertible flows.
Even worse, there appears to be no good theory characterising a large enough subset of vectors belonging to its range. The only other situations known to us where this difficulty has been overcome are the linear case [Oco88], the particular case of the two-dimensional Navier-Stokes equations on the torus [MP06], and the setting of [BM07], which is close to ours. As in those works, we do not attempt to define an analogue of the operator M̃_t mentioned above, but instead work directly with M_t, so that we do not have Norris' lemma at our disposal. It will be replaced by the result from Section 7 on "Wiener polynomials." This result states that if one considers a polynomial where the variables are independent Wiener processes and the coefficients are arbitrary (possibly non-adapted) Lipschitz continuous stochastic processes, then the polynomial being small implies that, with high probability, each individual monomial is small. It will be shown in this section how the polynomial structure of our nonlinearity can be exploited in order to replace Norris' lemma by such a statement.
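The following toy Monte-Carlo computation (ours, not part of the proof; the coefficients A_0, A_1 and all numerical choices are illustrative) gives a feel for why distinct monomials cannot cancel: for the degree-one Wiener polynomial A_0 + A_1 W(t) with |A_1| = 1, uniform smallness over [0, 1] would force the Brownian path itself to stay in a thin band, an event of astronomically small probability.

```python
# Toy illustration of the Wiener-polynomial heuristic: if
# sup_t |A_0 + A_1 W(t)| <= eps with |A_1| = 1, then W stays within
# eps of the constant -A_0 on all of [0, 1].  We count how often a
# discretised Brownian path achieves this.
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, eps = 2000, 1000, 0.1
A0, A1 = 0.0, 1.0  # illustrative coefficients with |A1| = 1

dt = 1.0 / n_steps
increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
W = np.cumsum(increments, axis=1)            # W(t) on a grid of [0, 1]
poly_sup = np.abs(A0 + A1 * W).max(axis=1)   # sup_t |A_0 + A_1 W(t)|

# P(sup_{[0,1]} |W| <= 0.1) is of order exp(-pi^2 / (8 * 0.1^2)), i.e.
# vanishingly small, so no sampled path should stay that small.
assert (poly_sup <= eps).sum() == 0
```

Of course, the actual Theorem 7.1 is much stronger: the coefficients A_α may be non-adapted stochastic processes, and the statement is quantitative in ε, which is what makes it a usable replacement for Norris' lemma here.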
Another, slightly less serious, drawback of working in an infinite-dimensional setting is that we encounter singularities at t = 0 and at t = T (for the operator J_{t,T}). Recall the definition of the time interval I_δ = [T/2, T − δ] introduced earlier. We will work on this interval, which is strictly included in [0, T], to avoid these singularities. There will be a trade-off between larger values of δ, which make it easier to avoid the singularity, and smaller values of δ, which make it easier to infer bounds on ⟨Q_k(u_T)ϕ, ϕ⟩.
When dealing with non-adapted processes, it is typical to replace certain standard arguments that hinge on adaptedness by arguments using local time-regularity properties instead. This was also the approach used in [MP06, BM07]. To this end, we introduce the following Hölder norms. For θ ∈ (0, 1], we define the Hölder norm for functions f : I_δ → H by

|||f|||_θ = sup_{s,t ∈ I_δ, s ≠ t} ‖f(s) − f(t)‖ / |t − s|^θ ,

and similarly if f is real-valued. (Note that even though we use the same notation as for the norm in the Cameron-Martin space in the previous section, the two have nothing to do with each other. Since the Cameron-Martin norm is never used in the present section, we hope that this does not cause too much confusion.) We furthermore set

|||f|||_{θ,γ} = sup_{s,t ∈ I_δ, s ≠ t} ‖f(s) − f(t)‖_γ / |t − s|^θ ,

where ‖·‖_γ denotes the γth interpolation norm defined in Assumption A.1. Finally, we are from now on going to assume that δ is a function of ε through a scaling relation of the type

δ = ε^r   (6.8)

for some (very small) value of r > 0 to be determined later.

Some preliminary calculations
We begin with two preliminary calculations. The first translates a given growth rate of the moments of a family of random variables into a statement saying that the variables are "small" modulo a negligible family of events; as such, it is essentially a translation of Chebyshev's inequality into our language. The second is an interpolation result which controls the supremum of a function's derivative by the supremum of the function and the size of some Hölder coefficient.
Lemma 6.13 Let δ be as in (6.8) with r > 0, let Ψ : H → [1, ∞) be an arbitrary function, and let X_δ be a δ-dependent family of random variables such that there exists b ∈ R (b is allowed to be negative) with, for every p ≥ 1, E|X_δ|^p ≤ C_p Ψ^p(u_0) δ^{−bp}. Then, for any q > br and any c > 0, the family of events {X_δ ≥ c ε^{−q}} is negligible and Ψ^{1/(q−br)}-controlled. Proof. It follows from Chebyshev's inequality that

P(X_δ ≥ c ε^{−q}) ≤ c^{−p} ε^{qp} E|X_δ|^p ≤ C_p c^{−p} Ψ^p(u_0) ε^{(q−br)p} = C̃_ℓ Ψ^p(u_0) ε^ℓ ,

where C̃_ℓ = C_p c^{−p} and ℓ = p(q − br). Provided that q − br > 0, this holds for every ℓ > 0, and the claim follows.
Lemma 6.14 Let f : [0, T] → R be continuously differentiable and let α ∈ (0, 1]. Then, there exists a constant C depending only on α such that

‖∂_t f‖_{L^∞} ≤ C ‖f‖_{L^∞} max{ 1/T , ‖f‖_{L^∞}^{−1/(1+α)} |||∂_t f|||_α^{1/(1+α)} } .

Here, |||f|||_α denotes the best α-Hölder constant for f.
Proof. Set A = ‖∂_t f‖_{L^∞} and let x_0 be such that |∂_t f(x_0)| = A. By the definition of the Hölder constant, one has |∂_t f(x)| ≥ A/2 for every x in an interval I around x_0 of length at least T ∧ (A/(2|||∂_t f|||_α))^{1/α}. The claim then follows from the fact that if |∂_t f(x)| ≥ A/2 over an interval I, then there exists a point x_1 in the interval such that |f(x_1)| ≥ A|I|/4.
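The interpolation bound of Lemma 6.14 can be sanity-checked numerically (an illustration of ours, not part of the proof): we use the reconstructed form ‖∂_t f‖_∞ ≤ C‖f‖_∞ max{1/T, (|||∂_t f|||_α/‖f‖_∞)^{1/(1+α)}} with the illustrative choice C = 4 (the lemma only asserts some constant depending on α), taking α = 1, so that |||∂_t f|||_1 is bounded by ‖∂²_t f‖_∞ for smooth f.

```python
# Numerical check of the interpolation bound for f(t) = sin(omega t) on
# [0, 1] with alpha = 1 and the illustrative constant C = 4.
import numpy as np

T, alpha, C = 1.0, 1.0, 4.0
t = np.linspace(0.0, T, 100001)

for omega in [1.0, 5.0, 25.0, 125.0]:
    f = np.sin(omega * t)
    fp = omega * np.cos(omega * t)        # f'
    fpp = -omega**2 * np.sin(omega * t)   # f''; its sup bounds |||f'|||_1
    sup_f, sup_fp, holder = np.abs(f).max(), np.abs(fp).max(), np.abs(fpp).max()
    rhs = C * sup_f * max(1.0 / T, (holder / sup_f) ** (1.0 / (1.0 + alpha)))
    assert sup_fp <= rhs
```

For large ω the right-hand side behaves like C‖f‖_∞^{1/2}‖f''‖_∞^{1/2} ≈ Cω, so the bound captures the correct growth of ‖f'‖_∞ ≈ ω, illustrating why the exponent 1/(1+α) is the natural one.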

Transferring properties of ϕ back from the terminal time
We now prove a result which shows that if ϕ ∈ S_α, then with high probability both ‖ΠK_{T−δ,T}ϕ‖ and the ratio ‖ΠK_{T−δ,T}ϕ‖ / ‖K_{T−δ,T}ϕ‖ cannot change dramatically for small enough δ. This allows us to step back from the terminal time T to the right end point of the time interval I_δ. As mentioned at the start of this section, this is needed to accommodate the rougher test functions allowed in Theorem 6.7. Lemma 6.15 Let (6.3d) hold and fix any orthogonal projection Π of H onto a finite-dimensional subspace of H spanned by elements of H_1. Recall furthermore the relation (6.8) between δ and ε. There exists a constant c ∈ (0, 1) such that, for every r > 0 and every α > 0, the implication holds modulo a Ψ^{1/r}-controlled family of negligible events.
To prove this lemma, we will need the following auxiliary result, whose proof is given at the end of this section. Lemma 6.16 For any δ ∈ (0, T/2], one has the bound for every p ≥ 1 and every u_0 ∈ H. Here, n is the degree of the nonlinearity N.
Proof of Lemma 6.15. We begin by showing that, modulo some Ψ^{1/r}-dominated family of negligible events, By the assumption on Π, we can find a collection {v_k}_{k=1}^N in H_1 with ‖v_k‖ = 1 such that Πϕ = Σ_k v_k ⟨v_k, ϕ⟩. Therefore, there exists a constant C_1 = sup_k ‖v_k‖_1 such that ‖Πϕ‖ ≤ C_1 ‖ϕ‖_{−1}. Combining Lemma 6.13 with Lemma 6.16, we see that, modulo a Ψ^{1/((1−a)r)}-dominated family of negligible events, Hence, modulo the same family of events, Combining now Lemma 6.13 with (6.3c), we see that, modulo a Ψ^{1/((1−a)r)}-dominated family of negligible events, thus showing that K_{T−δ,T}ϕ belongs to S_{cα} with c = 1/(2C), concluding the proof.
We now give the proof of the auxiliary lemma used in the proof of Lemma 6.15.
Proof of Lemma 6.16. It follows from (3.13) and the variation-of-constants formula that It now follows from Assumption A.1, point 3, that there exists γ_0 ∈ [0, γ⋆ + 1) such that DN^*(u) is a bounded linear map from H to H_{−a} for every u ∈ H_{γ_0}, and that its norm is bounded by C‖u‖^{n−1}_{γ_0} for some constant C. The first bound then follows by combining Proposition 6.2 with the fact that e^{−Lt} is bounded by Ct^{−a} as an operator from H_{−a} to H, as a consequence of standard analytic semigroup theory [Kat80].
In order to obtain the second bound, we write where the last inequality is again a consequence of standard analytic semigroup theory. The claim then follows from (6.3c).

The smallness of M_T implies the smallness of ⟨Q_N(u_t)K_{t,T}ϕ, K_{t,T}ϕ⟩
In this section, we show that if ⟨M_T ϕ, ϕ⟩ is small, then ⟨Q_N(u_t)K_{t,T}ϕ, K_{t,T}ϕ⟩ must also be small with high probability for every t ∈ I_δ. The precise statement is given by the following result: Lemma 6.17 Let the Malliavin matrix M_T be defined as in (4.10) and assume that Assumptions A.1 and C.1 are satisfied. Then, for every N > 0, there exist r_N > 0, p_N > 0 and q_N > 0 such that, provided that r ≤ r_N, the implication holds modulo some Ψ^{q_N}-dominated negligible family of events.
Proof. The proof proceeds by induction on N, and the steps of this induction are the content of the next two subsections. Since A_1 = {g_1, ..., g_d}, the case N = 1 is implied by Lemma 6.18 below, with p_1 = 1/4, q_1 = 8 and r_1 = 1/(8p_β). The inductive step is then given by combining Lemmas 6.21 and 6.24 below. At each step, the values of p_n and r_n decrease while q_n increases, but all remain strictly positive and finite after finitely many steps.

The first step in the iteration
The "priming step" of the inductive proof of Lemma 6.17 follows from the fact that the directions which are directly forced by the Wiener processes are not too small, with high probability. Lemma 6.18 Let the Malliavin matrix M be defined as in (4.10) and assume that Assumptions A.1 and C.1 are satisfied. Then, provided that r ≤ 1/(8p_β), the implication holds modulo some Ψ^8-dominated negligible family of events. Here, p_β is as in (6.3b) and β was fixed in (6.4).
The claim now follows from the a priori bound (6.3b) and Lemma 6.13 with q = 1/4 and b = p_β.

The iteration step
Recall that we consider evolution equations of the type

du_t = F(u_t) dt + Σ_{k=1}^d g_k dW_k(t) ,   (6.12)

where F is a "polynomial" of degree n. The aim of this section is to implement the following recursion: if, for a given polynomial Q, the expression ⟨Q(u_t), K_{t,T}ϕ⟩ is "small" in the supremum norm, then both ⟨[Q, F](u_t), K_{t,T}ϕ⟩ and ⟨[Q, g_k](u_t), K_{t,T}ϕ⟩ must be small in the supremum norm as well.
The main technical tool used in this section is the estimate on "Wiener polynomials" from Section 7. Using the notation W^α(t) = W_{α_1}(t) ··· W_{α_ℓ}(t) for a multi-index α = (α_1, ..., α_ℓ), this estimate states that if an expression of the type Σ_{|α|≤m} A_α(t) W^α(t) is small, then, provided that the processes A_α are sufficiently regular in time, each of the A_α must be small. In other words, two distinct monomials in a Wiener polynomial cannot cancel each other out. Here, the processes A_α do not have to be adapted to the filtration generated by the W_k, so this provides a kind of anticipative replacement of Norris' lemma. The main trick that we use in order to take advantage of such a result is to switch back and forth between the process u_t solving (6.12) and the process v_t = u_t − GW(t), which has more time-regularity than u_t. Recall furthermore that, given a polynomial Q and a multi-index α, we denote by Q_α the corresponding term (3.5) appearing in the (finite) Taylor expansion of Q.
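To make the switch between u and v concrete in the simplest nontrivial case (our illustration; the quadratic case is the one relevant for Navier-Stokes): if Q(u) = B(u, u) for a symmetric bilinear map B, then substituting u_t = v_t + Σ_k g_k W_k(t) gives

```latex
Q(u_t) \;=\; B(v_t, v_t)
  \;+\; 2\sum_{k} B(v_t, g_k)\, W_k(t)
  \;+\; \sum_{k,\ell} B(g_k, g_\ell)\, W_k(t)\, W_\ell(t),
```

so that ⟨Q(u_t), K_{t,T}ϕ⟩ is precisely a Wiener polynomial Σ_α A_α(t) W^α(t) whose coefficients A_α(t) = ⟨Q_α(v_t), K_{t,T}ϕ⟩ inherit the time-regularity of v_t and K_{t,T}ϕ, which is exactly the setting of Theorem 7.1.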
Recall the definition Poly^m(γ, β) = Poly^m(H_γ, H_{−1−β}) ∩ Poly^m(H_{γ+1}, H_{−β}). We first show that if Q ∈ Poly^m(γ, β) and ⟨Q(u_t), K_{t,T}ϕ⟩ is small, then the expression ⟨Q_α(v_t), K_{t,T}ϕ⟩ (note the appearance of v_t rather than u_t) must be small as well for every multi-index α: Lemma 6.19 Let Q ∈ Poly^m(γ, β) for some m ≥ 0 and for γ and β as chosen in (6.4). Let furthermore q > 0 and set q̃ = q 3^{−m}. Then, the implication holds modulo some Ψ^{6(m+1)/q̃}-dominated negligible family of events, provided that r < q̃/(6p_β).
Proof. Note first that both inner products appearing in the implication are well-defined by Proposition 6.2 and the assumptions on Q. By homogeneity, we can assume that ∥ϕ∥ = 1. Since Q is a polynomial, (3.4) implies that ⟨Q(u t ), K t,T ϕ⟩ can be written as a Wiener polynomial with coefficients ⟨Q α (v t ), K t,T ϕ⟩. Applying Theorem 7.1, we see that, modulo some negligible family of events Osc m W , sup t∈I δ |⟨Q(u t ), K t,T ϕ⟩| ≤ ε q implies that either sup α sup t∈I δ |⟨Q α (v t ), K t,T ϕ⟩| ≤ ε q̄ , or there exists some α such that |||⟨Q α (v t ), K ·,T ϕ⟩||| 1 ≥ ε −q̄/3 . (6.14) We begin by arguing that the second event is negligible. Since Q is of degree m, there exists a constant C such that the corresponding bound holds. Here, we used the fact that Q α ∈ Poly m (H γ , H −1−β ) to bound the first term and the fact that Q α ∈ Poly m (H γ+1 , H −β ) to bound the second term. The fact that Q α belongs to these spaces is a consequence of g k ∈ H γ⋆+1 and of the definition (3.4) of Q α . Therefore, (6.14) implies that either (6.15) or (6.16) holds. Combining the Cauchy-Schwarz inequality with Proposition 6.2, we see that X δ and Y δ satisfy the assumptions of Lemma 6.13 with Φ = Ψ m+1 and b = p̄ β , thus showing that the families of events (6.15) and (6.16) are both Ψ 6(m+1)/q -dominated negligible, provided that r < q̄/(6p̄ β ).
In the sequel, we will need the following simple result, which is, in some way, a converse to Theorem 7.1. Lemma 6.20 Given any integer N > 0 and any two exponents 0 < q̄ < q, there exists a universal family of negligible events Sup N W such that the implication holds modulo Sup N W . Proof. Since, for any p > 0, the family of events in question is negligible, the claim follows at once.
As a corollary to Lemmas 6.19 and 6.20, we now obtain the key estimate for Lemma 6.17 in the particular case where the commutator is taken with one of the constant vector fields: Lemma 6.21 Let Q ∈ Poly m (γ, β) be a polynomial of degree m and let q > 0. Then, for q̄ = q3 −(m+1) , the implication holds for all ϕ ∈ H modulo some Ψ 2(m+1)/q -dominated negligible family of events, provided that r < q̄/(2p̄ β ).
Proof. Since it follows from (3.4) that (Q α ) β = Q α∪β , we have the required identity. Combining Lemma 6.19 and Lemma 6.20 with N = m then proves the claim.
In the next step, we show a similar result for the commutators between Q and F . We are going to use the fact that if a function f is differentiable with Hölder continuous derivative, then f being small implies that ∂ t f is small as well, as made precise by Lemma 6.14. As previously, we start by showing a result that involves the process v t instead of u t : Lemma 6.22 Let Q be as in Lemma 6.19 and such that [Q α , F σ ] ∈ Poly(γ, β) for any two multi-indices α, σ. Let furthermore q > 0 and set q̄ = q3 −2m /8. Then the implication holds modulo some Ψ 6(m+1)/q -dominated negligible family of events, provided that r < q̄/(6p̄ β ). (As before, the empty multi-indices are included in the supremum.) Proof. By homogeneity, we can assume that ∥ϕ∥ = 1. Combining Lemma 6.19 with Lemma 6.14 and setting q̃ = q3 −m , we obtain the bound (6.17), modulo some Ψ 6(m+1)/q -dominated negligible family of events, provided that r ≤ q̃/(6p̄ β ). Note that this family is in particular independent of both α and ϕ. Here and in the sequel, we use the letter C to denote a generic constant depending on the details of the problem that may change from one expression to the next. One can see that ⟨Q α (v t ), K t,T ϕ⟩ is differentiable in t by combining Proposition 6.2 with the fact that Q α ∈ Poly(H γ , H −1−β ) ∩ Poly(H γ+1 , H −β ), as in the proof of Lemma 6.19. See [DL92] for a more detailed proof of a similar statement.
Computing the derivative explicitly, we obtain an expression in which the function B α can be further expanded. Notice that, by the assumption that [Q α , F σ ] ∈ Poly(γ, β), the corresponding bound holds. (Here it is understood that if one of the exponents of the norm of v t is negative, the term in question actually vanishes.) It therefore follows from Proposition 6.2 that the resulting estimate holds for every p ≥ 1 and some constants C p .
Since the Hölder norm of f α,ϕ is bounded as stated, we can use the bounds on B α just obtained, the Cauchy-Schwarz inequality, Proposition 6.2, and Lemma 6.13 to obtain the desired estimate modulo some Ψ 12(n+m)/q -dominated negligible family of events, provided that r ≤ min{q̄/12, q̄/(6p̄ β )}. As a consequence, modulo this family, we obtain from (6.17) the bound sup α sup t∈I δ |f α,ϕ (t)| ≤ Cε q̄/8 , which can be rewritten as follows: either (6.20) holds, or there exist some α and σ such that (6.21) holds. Again following the same logic as in Lemma 6.19, we see that the family of events in (6.21) is Ψ 6(m+1)/q -dominated negligible, provided that r < q̄/(6p̄ β ).
In order to turn this result into a result involving the process u t , we need the following expansion. The Jacobi identity for the Lie bracket, iterated, shows that for any multi-index ζ, [Q α , F σ ] ζ is equal to a linear combination of finitely many terms of the form [Q αi , F σi ] for suitable multi-indices α i and σ i .
Proof. It follows from Lemma 6.23 that the corresponding expansion holds. Combining the control of the terms ⟨[Q αi , F σi ](v t ), K t,T ϕ⟩ obtained in Lemma 6.22 with Lemma 6.20 gives the stated result.
6.10 Putting it all together: proof of Theorem 6.12

We now combine all of the results we have accumulated to give the proof of the main theorem of this section.
Proof of Theorem 6.12. We are going to prove the statement by showing that there exist θ > 0 and, for every α > 0, a Ψ θ -dominated family of negligible events such that, modulo this family, the assumption inf ϕ∈S α ⟨ϕ, M T ϕ⟩ ≤ ε∥ϕ∥ 2 leads to a contradiction for all ε sufficiently small. From now on, fix N as in Assumption C.2. By Lemmas 6.15 and 6.17, we see that there exist constants θ, q, r 0 > 0 such that, modulo some Ψ θ -dominated family of negligible events, the first implication holds, provided that we choose r ≤ r 0 in the definition (6.8) of δ. By Assumption C.2, this in turn implies (modulo the same family of negligible events) the corresponding lower bound. On the other hand, it follows from Lemma 6.13 and the assumption on the inverse moments of Λ α that, modulo some Ψ 4/q -dominated family of negligible events, one has the stated bound. Possibly making θ smaller, it follows that, modulo some Ψ θ -dominated family of negligible events, one has an implication which cannot hold for ε small enough, thus concluding the proof of Theorem 6.12.

Bounds on Wiener polynomials
We will use the terminology of "negligible sets" introduced in Definition 6.9. We will always work on the time interval [0, 1], but all the results are independent (modulo change of constants) of the time interval, provided that its length is bounded from above and from below by two positive constants independent of ε. This is seen easily from the scaling properties of the Wiener process.
The results of this section are descendants of similar results obtained in [MP06, BM07] by related techniques. In [BM07] it was proven that if a Wiener polynomial with continuous, bounded variation coefficients is identically zero on an interval, then so are its coefficients. This is enough to prove the almost sure invertibility of projections of the Malliavin matrix, which in turn implies the existence of a density for the projections of the transition probabilities. To prove smoothness of the densities or the ergodic results of this paper, more quantitative control is needed. In [BM07], a result close to (7.1) is claimed. However, an error in Lemma 9.12 of that article leaves the proof incomplete. Arguing along similar, though slightly different, lines, we prove the needed result below. We build upon the presentation in [BM07] but simplify it significantly. (The presentation in [BM07] was already a significant simplification over that in [MP06].) Theorem 7.1 Let {W k } d k=1 be a family of i.i.d. standard Wiener processes and, for every multi-index α = (α 1 , . . . , α ℓ ), define W α = W α1 . . . W α ℓ with the convention that W α = 1 if α = φ. Let furthermore A α be a family of (not necessarily adapted) stochastic processes with the property that there exists m ≥ 0 such that A α = 0 whenever |α| > m, and set Z A (t) = Σ α A α (t)W α (t). Then, there exists a universal family of negligible events Osc m W depending only on m such that the implication (7.1) holds. Remark 7.2 Informally, we can read the statement of Theorem 7.1 as "if Z A is small, then either all of the coefficients A α are small, or at least one of them oscillates very fast." The exponents appearing in the statement of Theorem 7.1 are somewhat arbitrary. By going through the proof more carefully, we can see that for any κ > 2, it is possible to find a constant C κ > 0 such that the exponents in (7.1) can be replaced by κ −m and −C κ κ −m respectively. Here, the constant C κ tends to 0 as κ → 2.
While the precise values of the exponents in (7.1) arising from our proof are unlikely to be sharp, they are not far from it, as can be seen by considering processes built from W θ , the linear interpolation of the Wiener process W over intervals of size ε θ .

Remark 7.3
The reason why the family of negligible sets appearing in this statement is called Osc m W is that it relies on the fact that the Wiener processes typically fluctuate sufficiently fast on every small time interval for their effects to be distinguishable from those of the coefficients A α , which fluctuate over much longer timescales. It is important to note that Osc m W depends on the processes A α only through the value of m.
Before we start with the proof, we show the following result, which is essentially the particular case of Theorem 7.1 where m = 1 and where the coefficients A α do not depend on time. Here, ·, · denotes the scalar product in R d .
Lemma 7.4 The bound holds modulo a universal family of negligible events Osc W for any choice of coefficients A ∈ R d .

Remark 7.5
We would like to stress again the fact that the family of events Osc W is independent of the choice of coefficients A and depends only on the realisation of the W k 's.
Proof. Fix κ > 0 and define a family of events B by B ε = {sup t∈[0,1] |W (t)| ≥ ε −κ }. It follows immediately from the fact that the supremum of a Wiener process has Gaussian tails that the family B is negligible. Consider now the unit sphere S d−1 in R d . For every A ∈ S d−1 , the process W A (t) = ⟨A, W (t)⟩ is a standard Wiener process, and so P(sup t∈[0,1] |W A (t)| ≤ 2ε κ ) ≤ C 1 exp(−C 2 ε −2κ ) for some constants C 1 and C 2 that are independent of A. Denote this event by H ε A . Choose now a collection {A k } of points in S d−1 such that sup A∈S d−1 inf k |A − A k | ≤ ε 2κ and define H ε = ∪ k H ε A k . Since this can be achieved with O(ε −2κ(d−1) ) points, the family H is negligible by Lemma 6.10. We now define Osc W = H ∪ B and we note that, modulo Osc W , one has for every Ā ∈ R d the required bound.
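The covering argument can be mimicked numerically (a Monte Carlo illustration under made-up parameters, not a proof): for every direction A in a finite net on the sphere, ⟨A, W (t)⟩ is a standard Wiener process, so its supremum over [0, 1] stays bounded away from zero even in the worst direction of the net.

```python
import numpy as np

# Monte Carlo illustration (not a proof): <A, W(t)> is a standard Wiener
# process for every unit vector A, and sup_{t<=1} |<A, W(t)>| has Gaussian
# lower tails -- uniformly over a finite net of directions on the sphere.
rng = np.random.default_rng(1)
d, n_steps = 3, 4000
dt = 1.0 / n_steps
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_steps, d)), axis=0)

A = rng.normal(size=(500, d))                 # a crude net on the sphere
A /= np.linalg.norm(A, axis=1, keepdims=True)

sups = np.max(np.abs(W @ A.T), axis=0)        # sup_t |<A_k, W(t)>|, each k
print(f"worst direction over the net: sup_t |<A, W(t)>| = {sups.min():.3f}")
```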
We now turn to the proof of Theorem 7.1. The proof proceeds by induction on the parameter m. For m = 0, the statement is trivial since in this case one has Z A (t) = A φ (t), so that one can take Osc 0 W to be empty. Fix now a value m ≥ 1 and assume that, for some ε, both inequalities hold. Our aim is to find a (universal) family of negligible sets Osc m W such that, modulo Osc m W , these two bounds imply the bound sup α ∥A α ∥ L ∞ ≤ ε 3 −m . Before we proceed, we localise our argument to Wiener processes that do not behave too "wildly". Using the fact that the Hölder norm of a Wiener process has Gaussian tails for every Hölder exponent smaller than 1/2, we see that the two bounds both hold modulo some universal family Wien of negligible events. The reason for these particular choices of exponents will become clearer later on, but any two negative exponents would have been admissible.
Choose an exponent κ to be determined later and define a sequence of times t ℓ = ℓε κ for ℓ = 0, . . . , ε −κ , so that the interval [0, 1] gets divided into ε −κ subintervals of the form [t ℓ , t ℓ+1 ]. We define A ℓ α = A α (t ℓ ) and similarly for W ℓ α . We also define the Wiener increments W̄ ℓ i (t) = W i (t) − W i (t ℓ ) and their products W̄ ℓ α = Π j∈α W̄ ℓ j . With these notations, one has for t ∈ [t ℓ , t ℓ+1 ] the stated equality for some "error term" E ℓ that will be analysed later. Here, the combinatorial factor C α,σ counts the number of ways in which the multi-index σ can appear in the multi-index α ∪ σ (for example, C (i,j),(j) is equal to 2 if i ≠ j and 3 if i = j). Using Brownian scaling and the fact that the supremum of a Wiener process has Gaussian tails, we see that for every κ ′ < κ, the corresponding bound holds modulo some universal family Wien κ ′ ,m of negligible events. Note now that all the terms appearing in E ℓ are (up to combinatorial factors) either of the form A ℓ α∪σ W ℓ α W̄ ℓ σ (t) with |σ| ≥ 2, or of the form (A α (t) − A ℓ α )W α (t). Together with (7.6) and the first bound in (7.4), this shows that there exists a constant C depending only on m such that (7.3b) implies |E ℓ (t)| ≤ C(ε κ ′ −1/27−1/10 + ε κ−1/9−1/10 ) . (7.7) In order to conclude the proof of the theorem, it therefore only remains to obtain a similar bound on ∥A φ ∥ L ∞ . We define a family of negligible events Wien ′′ m such that Wien ′ ⊂ Wien ′′ m and such that the bound (7.12) holds. Since (7.12) is chosen in such a way that 7/20 − 1/70 > 1/3, we obtain ∥A φ ∥ L ∞ ≤ ε 1/3 for sufficiently small ε. Together with the remark following (7.11), this concludes the proof of Theorem 7.1.

Examples
In this section, we apply the abstract framework developed in this article to two more concrete examples: the stochastic Navier-Stokes equations on a sphere and a class of stochastic reaction-diffusion equations. The examples are chosen in order to highlight the techniques that can be used to verify the assumptions of our results and to get some idea of their scope of applicability. In particular, the Navier-Stokes equations provide an example where bounds on the Jacobian are not very uniform, so that an initial condition dependent control is required in Assumption C.1. The stochastic reaction-diffusion system on the other hand satisfies very strong a priori bounds, but Assumption A.1 is not verified with the usual choice H = L 2 , so that one has to work a bit more to fit the equations into the framework presented here.
In both of our examples, the Hörmander-type assumption, Assumption C.2, will be verified by using constant vector fields only. Before we turn to the examples themselves, we therefore present the following useful little lemma (Lemma 8.1), which will be used throughout this section. Proof. Assume by contradiction that the statement does not hold. Then, there exist α > 0 and a sequence h n in H such that ∥Πh n ∥ = 1, ∥h n ∥ ≤ α −1 , and such that lim n→∞ ⟨h n , Q n h n ⟩ = 0. Since the sequence h n is bounded, we can assume (modulo extracting a subsequence) that there exists h ∈ H such that h n → h in the weak topology. Since Π has finite rank, one has ∥Πh∥ = 1. Furthermore, since the maps h ↦ ⟨h, Q n h⟩ are continuous in the weak topology and since n ↦ ⟨h, Q n h⟩ is increasing for every h, one has ⟨h, Q n h⟩ = 0 for every n, so that ⟨h, g i ⟩ = 0 for every i > 0. This contradicts the fact that the span of the g i is dense in H.
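A finite-dimensional caricature of this argument (all dimensions, vectors and the rank-2 projection below are made up for illustration) can be checked directly: once the g i span the whole space, ⟨h, Q h⟩ with Q = Σ i g i g i T is bounded away from zero over the constraint set {∥Πh∥ = 1, ∥h∥ ≤ α −1 }.

```python
import numpy as np

# Finite-dimensional sketch of Lemma 8.1 (dimensions, vectors and the rank-2
# projection are made up).  With Q = sum_i g_i g_i^T and the g_i spanning the
# whole space, <h, Q h> >= lambda_min(Q) > 0 on {|Pi h| = 1, |h| <= 1/alpha}.
rng = np.random.default_rng(2)
dim, alpha = 8, 0.5
g = rng.normal(size=(dim, dim))            # g_1, ..., g_dim, dense span a.s.
Q = g.T @ g                                # Q = sum_i g_i g_i^T

h = rng.normal(size=(20000, dim))
p = np.linalg.norm(h[:, :2], axis=1)       # |Pi h|, Pi = proj. on coords 1, 2
mask = p > 1e-6
h = h[mask] / p[mask, None]                # normalise so that |Pi h| = 1
h = h[np.linalg.norm(h, axis=1) <= 1.0 / alpha]   # keep |h| <= 1/alpha
lb = np.min(np.einsum('ij,jk,ik->i', h, Q, h))
print(f"inf <h, Q h> over sampled constraint set = {lb:.4f}")
```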

The 2D Navier-Stokes equations on a sphere
Consider the stochastically forced Navier-Stokes equations on the two-dimensional sphere S 2 : Here, the velocity field u is an element of H 1 (S 2 , T S 2 ), ∇ u u denotes the covariant differentiation of u along itself with respect to the Levi-Civita connection on S 2 , ∆ = −∇ * ∇ is the (negative of the) Bochner Laplacian on S 2 , and Ric denotes the Ricci operator from T S 2 into itself. In the case of the sphere, the latter is just multiplication by the scalar 1. See also [Tay92, TW93, Nag97] for more details on the Navier-Stokes equations on manifolds.
As in the flat case, it is possible to represent u uniquely by a scalar "vorticity" field w given by where n denotes the unit vector in R 3 normal to the surface of the sphere (so that n ∧ u defines again a vector field on the sphere). With this notation, one can rewrite (8.1) as Here, we denoted by K the operator that reconstructs a velocity field from its vorticity field, that is, and ∆ denotes the Laplace-Beltrami operator on the sphere. See [TW93] for a more detailed derivation of these equations. In order to fit the framework developed in this article, we assume that the operator G is of finite rank and that its image consists of smooth functions, so that the noise term can be written as We choose to work in the space H = L 2 (S 2 , R) for the equation (8.3) in vorticity formulation, so that the interpolation spaces H α coincide with the fractional Sobolev spaces H 2α (S 2 , R), see [Tri86]. In particular, elements w ∈ H α are characterised by the fact that the functions x → ϕ(x)w(ψ(x)) belong to H 2α (R 2 ) for any compactly supported smooth function ϕ and any function ψ : R 2 → S 2 which is smooth on an open set containing the support of ϕ. Since the sphere is compact, this implies that the usual Sobolev embeddings for the torus also hold true in this case. Define now A 0 = {g i : i = 1, . . . , n} and define the sets A k recursively; taking k sufficiently large concludes the verification of the assumptions of Theorem 5.4, and the claim follows.
Remark 8.3 Just as in [HM06], this result is optimal in the following sense. The closure Ā ∞ of the linear span of A ∞ in L 2 is always an invariant subspace for (8.3), and the invariant measure for the Markov process restricted to Ā ∞ is unique. However, if Ā ∞ ≠ L 2 , then one expects in general the presence of more than one invariant probability measure in L 2 at low values of the viscosity ν.

Stochastic reaction-diffusion equations
In this section, we consider a general class of reaction-diffusion equations on a "nice" domain D. The dimension m of the ambient space is chosen smaller than or equal to 3 for technical reasons. However, the number ℓ of components in the reaction is arbitrary. The domain D is assumed to be one of the following: • A compact smooth m-dimensional Riemannian manifold.
• A bounded open domain of R m with smooth boundary.
• A hypercube in R m .
We furthermore denote by ∆ the Laplace (resp. Laplace-Beltrami) operator on D, endowed with either Neumann or Dirichlet boundary conditions. With these notations in place, the equations that we consider are of the form (8.4), with u(t) : D → R ℓ and f : R ℓ → R ℓ a polynomial of arbitrary degree n, where n ≥ 3 is an odd integer. (We exclude the case n = 1 since this gives rise to a linear equation and is trivial to analyse.) The functions g i describing the stochastic component of the equations are assumed to belong to H ∞ , the intersection of the domains of ∆ α in L 2 (D) for all α > 0. It is a straightforward exercise to check that (8.4) has unique local solutions in E = C(D, R ℓ ) for every initial condition in C(D, R ℓ ) (replace C by C 0 in the case of Dirichlet boundary conditions). In order to obtain global solutions, we make the following assumption on the nonlinearity: Assumption RD.1 Writing f = Σ k f k with f k being k-linear maps from R ℓ to itself, we assume that n is odd and that the stated coercivity condition holds. Remark 8.4 Provided that Assumption RD.1 holds, one can check that there exist positive constants c and C such that the corresponding inequality holds for every u, v ∈ R ℓ .
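For the model scalar nonlinearity f (u) = ηu − u 3 (the case ℓ = 1, n = 3; the constant η 2 /2 below comes from Young's inequality and is illustrative, not sharp), the coercivity behind Assumption RD.1 can be verified directly:

```python
import numpy as np

# Sketch of the coercivity behind Assumption RD.1 for the model nonlinearity
# f(u) = eta*u - u^3 (scalar case, degree n = 3).  By Young's inequality,
#   u * f(u) = eta*u^2 - u^4 <= eta^2/2 - u^4/2   for all u,
# since the right minus left side equals (u^2 - eta)^2 / 2 >= 0.
eta = 2.0
u = np.linspace(-50.0, 50.0, 100001)
lhs = u * (eta * u - u ** 3)
rhs = eta ** 2 / 2.0 - u ** 4 / 2.0
gap = np.min(rhs - lhs)
print(f"min over u of (eta^2/2 - u^4/2) - u f(u) = {gap:.6f}")
```

The strictly negative quartic term on the right is exactly what makes u ↦ |u| 2 a Lyapunov function for the reaction part, as discussed next.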
Essentially, Assumption RD.1 ensures that the function u ↦ |u| 2 is a Lyapunov function for the "reaction" part u̇ = f (u) of (8.4). In the interest of brevity, we define Sup t,∞ (v) = 1 + sup s≤t ∥v(s)∥ E for any function v ∈ L ∞ ([0, t], E) and Sup t,r (v) = 1 + sup s≤t ∥v(s)∥ H r for v ∈ L ∞ ([0, t], H r (D)). As a consequence of Assumption RD.1, we obtain the following a priori bound on the solutions to (8.4): Proposition 8.5 Under Assumption RD.1, there exist constants c and C such that the stated bound holds almost surely for every u 0 ∈ E, where E is either C(D, R ℓ ) or C 0 (D, R ℓ ), depending on the boundary conditions of ∆. In particular, for every t 0 > 0 there exists a constant C such that one has the almost sure bound independently of the initial condition, provided that t ≥ t 0 .
Proof. The proof is straightforward; detailed calculations for a variant of it can be found, for example, in [Hai08]. Setting v = u − W ∆ (t), where W ∆ is the "stochastic convolution" solving the linearised equation (8.4) with f ≡ 0, and defining V (v) = ∥v∥ 2 L ∞ , we obtain from (8.5) the stated almost sure bound. In particular, there exist possibly different constants such that the differential inequality holds for all v such that V (v(t)) ≥ CSup 2 t,∞ (W ∆ ). Since we assumed that n ≥ 3, a simple comparison theorem for ODEs then implies the required decay, where we set α = 2/n. The requested bound then follows at once. The second bound is an immediate consequence of the first one.
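The comparison step can be checked on the model ODE v̇ = −v 3 (a toy version with n = 3 and all constants stripped away): the explicit solution gives a bound at positive times that does not depend on the initial condition, which is the mechanism behind the second bound of Proposition 8.5.

```python
import numpy as np

# Toy version (assumed form, constants stripped) of the comparison argument:
# for dv/dt = -v^3 one has v(t) = (v0^{-2} + 2t)^{-1/2} <= (2t)^{-1/2},
# a bound at positive times that does not depend on the initial condition v0.
def v(t, v0):
    return (v0 ** -2 + 2.0 * t) ** -0.5

# the formula indeed solves the ODE: dv/dt = -(v0^{-2} + 2t)^{-3/2} = -v^3
for v0 in [1.0, 1e3, 1e9]:
    print(f"v0 = {v0:9.1e}   v(1) = {v(1.0, v0):.6f}   "
          f"bound (2t)^(-1/2) = {2.0 ** -0.5:.6f}")
```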

Remark 8.6
The function t ↦ V (v(t)) is of course not differentiable in time in general. The left-hand side of (8.6) should therefore be interpreted as the right upper Dini derivative lim sup h↓0 (V (v(t + h)) − V (v(t)))/h.

In order to fit the framework developed in this article, we cannot take L 2 as our base space, since the nonlinearity will not in general map L 2 into any Sobolev space of negative order. However, provided that k > m/2, the Sobolev spaces H k form an algebra, so that the nonlinearity u ↦ N (u) def = f • u is continuous from H k to H k in this case. It is therefore natural to choose H = H k for some k > m/2. In this case, for α > 0, the interpolation spaces H α coincide with the Sobolev spaces H k+2α , so that one has N ∈ Poly(H α , H α ) for every α > 0. This shows that Assumption A.1 is satisfied with a = 0, γ ⋆ = ∞ and β ⋆ = ∞. It turns out that it is relatively easy to obtain bounds in the Sobolev space H 2 . From now on, we therefore assume that the following holds: Assumption RD.2 The space dimension m is smaller than or equal to 3.
This will allow us to work in H = H 2 . Before we state the main theorem of this section, we obtain a number of a priori bounds that will allow us to verify that the assumptions from the previous parts of this article do indeed apply to the problem at hand.
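The algebra property of H k for k > m/2 used above can be illustrated spectrally on the circle (m = 1, k = 2; the test functions are arbitrary and the constant found is empirical, not the sharp algebra constant):

```python
import numpy as np

# Spectral check (on the circle, m = 1, k = 2) that H^2 is an algebra: the
# ratio |uv|_{H^2} / (|u|_{H^2} |v|_{H^2}) stays of order one.  The test
# functions below are arbitrary illustrative choices.
N = 256
x = 2 * np.pi * np.arange(N) / N
freqs = np.fft.fftfreq(N, d=1.0 / N)             # integer frequencies

def h2_norm(w):
    wh = np.fft.fft(w) / N
    return np.sqrt(np.sum((1 + freqs ** 2) ** 2 * np.abs(wh) ** 2))

u = np.exp(np.sin(x))
v = np.cos(3 * x) + 0.5 * np.sin(7 * x)
ratio = h2_norm(u * v) / (h2_norm(u) * h2_norm(v))
print(f"|uv|_H2 / (|u|_H2 |v|_H2) = {ratio:.3f}")
```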
By using a bootstrapping argument similar to Proposition 3.5, we can obtain the following a priori estimate: Proposition 8.7 Assume that Assumptions RD.1 and RD.2 hold. If u is the solution to (8.4) with initial condition u 0 ∈ H 2 , then there exists a constant C such that the bounds ∥u(t)∥ H 2 ≤ CSup 2n t,∞ (u)(∥u 0 ∥ H 2 + Sup t,2 (W ∆ )) hold for all t ≤ 1 almost surely.
Integrating the last term yields the first bound. The second bound can be obtained in exactly the same way, using the smoothing properties of the semigroup generated by the Laplacian.
As a consequence, we obtain the following bound (Proposition 8.8) on the exponential moments in H 2 of the solution starting from any initial condition. Proof. Without loss of generality, we set T = 1. Combining Proposition 8.7 and the Markov property, we see that there exists a constant C > 0 such that the corresponding bound holds in terms of ∥u(s)∥ n L ∞ and Sup 1,2 (W ∆ ).
The requested bound then follows from (8.6) and the fact that Sup 1,2 (W ∆ ) has Gaussian tails by Fernique's theorem.
We now turn to bounds on the Jacobian J for (8.4). Recall from (3.10) that, given any "tangent vector" ξ, the Jacobian J s,t ξ satisfies the random PDE stated there, where Df denotes the derivative of the map f . Our main tool is the fact that, from Assumption RD.1, we obtain the existence of a constant C > 0 such that the corresponding bound holds for every u, v ∈ R ℓ . In particular, we obtain the a priori L 2 estimate d/dt ∥J s,t ξ∥ 2 L 2 = −2∥∇J s,t ξ∥ 2 L 2 + 2⟨J s,t ξ, (Df • u)(t)J s,t ξ⟩ ≤ 2C∥J s,t ξ∥ 2 L 2 , (8.8) so that ∥J s,t ξ∥ L 2 ≤ e C(t−s) ∥ξ∥ L 2 almost surely. We now use similar reasoning to obtain a sequence of similar estimates in smoother spaces.
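The mechanism behind (8.8), namely that the dissipative term is discarded and the potential term is bounded by its supremum, can be checked on a toy one-dimensional simulation (the potential c and all numerical parameters are illustrative); for the splitting scheme below, the bound ∥J s,t ξ∥ L 2 ≤ e C(t−s) ∥ξ∥ L 2 holds exactly at every step.

```python
import numpy as np

# Toy check of the Gronwall mechanism in (8.8): for dJ/dt = Laplacian J + c J
# with c(x) <= C, the L^2 norm grows at most like e^{Ct}.  The potential c
# and the parameters are made up; the splitting scheme treats the diffusion
# exactly in Fourier space (an L^2 contraction), then multiplies pointwise
# by exp(c dt) <= exp(C dt), so the bound holds step by step.
N, dt, T, C = 128, 1e-3, 1.0, 1.5
k = np.fft.fftfreq(N, d=1.0 / N)
x = 2 * np.pi * np.arange(N) / N
c = C * np.sin(x)                      # potential, bounded above by C
xi = np.exp(np.sin(2 * x))             # some initial tangent vector
J = xi.copy()
for _ in range(int(T / dt)):
    J = np.real(np.fft.ifft(np.exp(-k ** 2 * dt) * np.fft.fft(J)))  # heat step
    J = J * np.exp(c * dt)             # potential step
l2 = lambda u: np.sqrt(np.mean(u ** 2))
print(f"|J xi|_2 / (e^(CT) |xi|_2) = {l2(J) / (np.exp(C * T) * l2(xi)):.4f}")
```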
Proof. The first estimate is just a rewriting of the calculation before the Proposition. As in the proof of the a priori bounds for the solution, we are going to use a bootstrapping argument, starting from the bound (8.8). Applying Duhamel's formula and using the notation Sup t,∞ as before, we obtain ∥J s,t ξ∥ H 1 ≤ ∥ξ∥ H 1 + ∫ s t (C/ √ (t − r)) ∥(Df • u)(r)J s,r ξ∥ L 2 dr ≤ ∥ξ∥ H 1 (1 + Sup n−1 t,∞ (u) ∫ s t (C/ √ (t − r)) e C(r−s) dr) ≤ ∥ξ∥ H 1 (1 + CSup n−1 t,∞ (u)e C|t−s| √ (t − s)) ≤ C∥ξ∥ H 1 Sup n−1 t,∞ (u)e C|t−s| .
(And similarly for the second bound.) Regarding the H 2 norm of the Jacobian, we use the fact that there is a constant C such that the stated bound holds. Hence, similarly to before, we get ∥J s,t ξ∥ H 2 ≤ CSup t,∞ (u)(∥u 0 ∥ H 2 + Sup t,2 (W ∆ ))∥ξ∥ H 2 , which is the requested bound. To obtain the last bound, one proceeds identically, except that one uses ∥e L(t−s) ξ∥ H 2 ≤ C∥ξ∥ H 1 / √ (t − s).
We now turn to the second variation.
The claim now follows from the fact that, for any η > 0 and r > 0, there exists a constant C η,r with (4n + 1) log(1 + x) ≤ ηx r + C η,r for all x ≥ 0.
Theorem 8.14 Let P t be the Markov semigroup on H 2 generated by (8.4). If the linear span of A ∞ is dense in H 2 then, for every orthogonal finite rank projection Π : H 2 → H 2 , for every p > 0, and for every α > 0, there exists a constant C(α, p, Π) such that the bound (5.1) on the Malliavin matrix holds with U = 1.
Proof. The result follows from Theorem 6.7. One can check that Assumption A.1 holds with H = H 2 , a = 0, γ ⋆ = β ⋆ = ∞ since H ℓ is a multiplicative algebra for every ℓ ≥ 2 (this is true because we restricted ourselves to dimension m ≤ 3). Since the most involved part is the assumption on the adjoint, part 3, we give the details for that one. One can verify that the adjoint of DN (u) in H acts on elements v in H ∞ as (This is because H is the Sobolev space H 2 and not the space L 2 .) The claim then follows from the fact that the multiplication by a smooth enough function is a bounded operator in every Sobolev space H ℓ with ℓ ∈ R.
Since Assumption C.2 (with Λ α a constant depending on Π) can be verified by using Lemma 8.1, it remains to verify Assumption C.1 with Ψ 0 = 1. This in turn is an immediate consequence of Proposition 8.11. Combining all of these results, we finally obtain the following result on the asymptotic strong Feller property of a general reaction-diffusion equation: Theorem 8.15 Let P t be the Markov semigroup on H 2 generated by (8.4) and let Assumptions RD.1 and RD.2 hold. If the linear span of A ∞ is dense in H 2 , then, for any ζ > 0, there exists a positive constant C such that for every u ∈ H 2 and every ϕ : H 2 → R one has ∥DP t ϕ(u)∥ L 2 →R ≤ C(∥ϕ∥ L ∞ + e −ζt sup v∈H 2 ∥Dϕ(v)∥ H 2 →R ) . (8.10) In particular, P t has the asymptotic strong Feller property in H 2 .

Remark 8.16
As a corollary, we see that the corresponding bound holds for the semigroup on E, where all the derivatives are Fréchet derivatives of functions from E to R.
In particular, in space dimension m = 1, the same bound is obtained in the space H 1 since one then has H 1 ⊂ E.
Proof. The result follows from Theorem 5.4. Fix Π = Π M , the projection onto the eigenfunctions of ∆ with eigenvalues of modulus less than M 2 ; the constant M is going to be determined later on. Assumption B.2 with V (u) = ∥u∥ 1/n H and η ′ = 0 follows immediately from Proposition 8.8. Fix any p > 10 and any positive η < 1/p. Assumption B.3 then follows from Proposition 8.11. It then follows from Proposition 8.13 that we can choose the value of M in the definition of Π sufficiently large so that Assumption B.4 holds and such that (C Π − C J )/2 − ηC L > ζ. Since, in view of Theorem 8.14, Assumption B.1 holds with U = 1, we thus obtain from Theorem 5.4 the bound ∥DP t ϕ(u)∥ H 2 →R ≤ Ce η∥u∥ 1/n (∥ϕ∥ L ∞ + e −ζt sup v∈H 2 ∥Dϕ(v)∥ H 2 →R ) . (8.11) In order to obtain (8.10), we note that one has E∥J 0,2 ∥ 2 L 2 →H 2 ≤ E∥J 0,1 ∥ 2 L 2 →L 2 ∥J 1,2 ∥ 2 L 2 →H 2 ≤ CE∥J 1,2 ∥ 2 L 2 →H 2 ≤ C , (8.12) where C is a universal constant independent of the initial condition. Here, we combined the bounds of Proposition 8.9 with Proposition 8.5 in order to obtain the last bound. We thus have ∥DP t ϕ(u)∥ L 2 →R = ∥D(P 2 P t−2 ϕ)(u)∥ L 2 →R ≤ E(∥DP t−2 ϕ(u 2 )∥ H 2 →R ∥J 0,2 ∥ L 2 →H 2 ) ≤ C(∥ϕ∥ L ∞ + e −ζt sup v ∥Dϕ(v)∥ H 2 →R ) E(e 2η∥u 2 ∥ 1/n ∥J 0,2 ∥ L 2 →H 2 ) , where we made use of (8.11) to obtain the last inequality. The requested bound now follows from (8.12) and Proposition 8.8.

Unique ergodicity of the stochastic Ginzburg-Landau equation
In this section, we show, under very weak conditions on the driving noise, that the stochastic real Ginzburg-Landau equation has a unique invariant measure. Recall that this equation is given by du(x, t) = ν∂ 2 x u(x, t) dt + ηu(x, t) dt − u 3 (x, t) dt + Σ j g j (x) dW j (t) , where the spatial variable x takes values on the circle x ∈ S 1 and the driving functions g j belong to C ∞ (S 1 , R). The two positive parameters ν and η are assumed to be fixed throughout this section. This is a particularly simple case of the type of equation considered above, so that Theorem 8.15 applies. The aim of this section is to show one possible technique for obtaining the uniqueness of the invariant measure for such a parabolic SPDE. It relies on Corollary 2.2. The key step is to show that, given ε > 0, one can choose a control f and a time T such that the solution of the controlled equation satisfies ∥u(T ) − v∥ H 1 ≤ ε. Furthermore, we want to be able to choose v such that ∥v∥ H 1 ≤ K for some constant K independent of ε. The claim on the topological supports of transition probabilities then follows immediately from the fact that the Itô map (u 0 , W ) → u t is continuous in the second argument in our case.
The idea is to choose f of the form f (x, t) = ε −γ g(x) for 1 ≤ t ≤ 2 and 0 otherwise, and to set T = 3. We furthermore set v to be the solution at time 1 of the uncontrolled equation (that is, (8.14) with f = 0) with an initial condition v 0 satisfying (8.15), for some exponent γ > 0 to be determined. Such a v 0 always exists since the coercive "energy functional" has at least one critical point. Even though v 0 is in general very large (see however Lemma 8.21 below), it follows from (8.6) that the target v constructed in this way is bounded independently of ε.
The remaining ingredients of the proof are Lemmas 8.21 and 8.22 below. To show that they are sufficient, note first that (8.6) implies the existence of a constant C such that ∥u(1)∥ L 2 ≤ C independently of u 0 . It then follows from Lemmas 8.21 and 8.22 that (choosing for example β = γ/14) there exists a constant C such that one has the bound ∥u(2) − v 0 ∥ L 2 ≤ Cε γ/6 .
Since the uncontrolled equation expands at rate at most η, this immediately yields ∥u(T ) − v∥ L 2 ≤ Cε γ/6 . On the other hand, we know from Proposition 8.7 that there exists a constant C such that ∥u(T ) − v∥ H 2 ≤ C, so that, by interpolation, ∥u(T ) − v∥ H 1 ≤ (∥u(T ) − v∥ L 2 ∥u(T ) − v∥ H 2 ) 1/2 ≤ Cε γ/12 , and the claim follows by choosing γ > 12.
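As an illustration of the equation under consideration, a spectral Galerkin discretisation is easy to set up (the single forcing function g 1 = sin x, the initial condition and all numerical parameters below are made up; with the cubic damping term, the discrete solution stays bounded, in line with the a priori bounds of the previous subsection):

```python
import numpy as np

# Galerkin sketch of the stochastic Ginzburg-Landau equation on S^1 with a
# single (made-up) forcing function g_1 = sin x; nu, eta, the initial
# condition and the step size are all illustrative choices.
rng = np.random.default_rng(3)
N, dt, nu, eta = 64, 1e-3, 1.0, 1.0
k = np.fft.fftfreq(N, d=1.0 / N)
x = 2 * np.pi * np.arange(N) / N
g1 = np.sin(x)
u = np.cos(x)                                    # initial condition
lin = np.exp(-(nu * k ** 2 - eta) * dt)          # semigroup of nu*d_xx + eta
for _ in range(1000):                            # evolve up to t = 1
    incr = dt * (-u ** 3) + np.sqrt(dt) * rng.normal() * g1
    u = np.real(np.fft.ifft(lin * np.fft.fft(u + incr)))
print(f"sup_x |u(1, x)| = {np.max(np.abs(u)):.3f}")
```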

Lemma 8.21
There exists a constant C v independent of ε < 1 such that the bound ∥v 0 ∥ L ∞ ≤ C v ε −γ/3 holds.
Proof. It follows immediately from (8.15), using the fact that ∂ 2 x v 0 ≤ 0 at the maximum and ∂ 2 x v 0 ≥ 0 at the minimum.
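The scalar inequality behind this maximum-principle argument (with η and the range of K illustrative) can be checked numerically: if x ≥ 0 satisfies x 3 ≤ ηx + K, then x ≤ CK 1/3 with C independent of K, which is exactly the ε −γ/3 scaling of the lemma.

```python
import numpy as np

# Sketch of the scalar bound behind Lemma 8.21 (eta illustrative): the largest
# root x of x^3 - eta*x - K = 0 satisfies x <= C K^{1/3} uniformly in K >= 1.
eta = 1.0

def max_root(K):
    roots = np.roots([1.0, 0.0, -eta, -K])
    return max(r.real for r in roots if abs(r.imag) < 1e-6 * (1.0 + abs(r)))

ratios = [max_root(K) / K ** (1.0 / 3.0) for K in [1.0, 1e3, 1e6, 1e9]]
print("x_max / K^(1/3) for K = 1, 1e3, 1e6, 1e9:",
      [round(r, 4) for r in ratios])
```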

Lemma 8.22
For every exponent β ∈ [0, γ/4] there exists a constant C such that the bound holds for every ε ≤ 1 and every u ∈ L 2 (S 1 ).