Invariant measure selection by noise: An Example

We consider a deterministic system with two conserved quantities and infinitely many invariant measures. However, the system possesses a unique invariant measure once sufficiently strong stochastic forcing and balancing dissipation are added. We then show that, as the forcing and dissipation are removed, a unique limiting invariant measure of the deterministic system is selected. The exact structure of the limiting measure depends on the specifics of the stochastic forcing.


Introduction
There is much interest in the regularizing effects of noise on the longtime dynamics of dynamical systems. One often speaks informally of adding balancing noise and dissipation to a dynamical system with many invariant measures and then studying the zero noise/dissipation limit as a way of selecting the "physically relevant" invariant measure.
There are a number of settings where such a procedure is fairly well understood. In the case of a Hamiltonian or gradient system with sufficiently non-degenerate noise, Wentzell-Freidlin theory gives a rather complete description of the effective limiting dynamics [FW12] in terms of a limiting "slow" system derived through a quasi-potential and deterministic averaging. In the gradient case the stochastic invariant measures concentrate on the attracting structures of the dynamics. In the Hamiltonian setting, Wentzell-Freidlin theory considers the slow dynamics of the conserved quantity (the Hamiltonian) when the system is subject to noise. It is the zero noise limit of these slow dynamics which decides which mixture of the Hamiltonian invariant measures is selected.
In the case of systems with an underlying hyperbolic structure, such as Axiom A systems, it is known that the zero noise limit of random perturbations selects a canonical SRB/"physical measure" [Sin68,Sin72,Rue82,Kif74]. This relies fundamentally on the expansion/contraction properties of the underlying deterministic dynamical system. See [You02] for a nice discussion of these issues. The Axiom A assumption ensures that the deterministic dynamics has a rich attractor which attracts a set of positive Lebesgue measure.
One area where the idea of the relevant invariant measure being selected through a zero noise limit is prevalent is in the study of stochastically forced and damped PDEs. Two important examples are the stochastic Navier-Stokes equations and the stochastic KdV equation. Both of these equations have been studied in a sequence of works by Kuksin and his co-authors [Kuk04,Kuk07a,Kuk07b,KP08,Kuk10]. In all these works, tightness is established by balancing the noise and dissipation as the zero noise limit is taken. Any limiting invariant measure is shown to satisfy an appropriate limiting equation. Typically a number of properties are inherited from the pre-limiting invariant measure.
The hope is that the study of these limiting measures will give some insight into important questions for the original, unperturbed equations. In the case of the Navier-Stokes equations one would be interested in understanding questions such as the existence of energy cascades and turbulence. Setting aside the question of whether the regularity of the solutions in [Kuk04] is appropriate for turbulence, it is interesting to understand if the noise selects a unique limit and what the obstructions to such uniqueness are, as they give information about the structure of the deterministic phase space. In all of the works [Kuk04,Kuk07a,Kuk07b,KP08,Kuk10] the question of uniqueness of the limit is not addressed and seems out of reach. (Though a recent work [Kuk13] makes progress in this direction. See the note at the end of the introduction.) The equation for the evolution of a 2D incompressible fluid's vorticity q(x, t) (a scalar) on the 2-torus subject to stochastic excitation can be written as

q̇(x, t) = ν∆q(x, t) + B(q(x, t), q(x, t)) + √ν Σ_{k∈Z²} σ_k e^{ik·x} Ẇ^(k)_t,

where ν > 0 is the viscosity, ∆ is the Laplacian, the σ_k are constants to be chosen, {W^(k)_t : k ∈ Z²} is a collection of standard one-dimensional Wiener processes and B(q, q) is a quadratic non-linearity such that ⟨B(q, q), q⟩_{L²} = 0. The scaling of ν is chosen to keep the spatial L² norm of order one in the ν → 0 limit and is the only scaling on a fixed torus which will result in a non-trivial sequence of tight processes. On a fixed time interval, the formal ν = 0 limit of this equation is the Euler equation, which conserves its Hamiltonian (the energy or L² norm) but also has an infinite collection of other conserved quantities since the vorticity is simply transported about space. This means that a priori there will be many conserved quantities whose slow evolution must be analyzed.
Inspired by models in [Lor63,MTVE02] and the Euler equation itself, we construct a model problem in the form of an ODE in R³ whose non-linearity is quadratic and conserves the norm of the solution, in analogy with the Euler nonlinearity. We will also see that our model system in fact possesses two conserved quantities (the most it could have without becoming trivial). In many ways our analysis follows the familiar pattern of [FW12] in that we change time to consider the evolution of the conserved quantities of the unforced system on a long time interval which grows as the noise is taken to zero. This produces a limiting system which captures the effect of the noise. However, multiple conserved quantities are not usually treated in Wentzell-Freidlin theory, and the complications of having more than one are non-trivial in our case. In particular, the limiting system does not have a unique solution. Nonetheless, we are able to show that a particular solution is selected by the limiting procedure, which in turn leads to a unique invariant measure for the limiting system being selected.
Since the limiting system does not have unique solutions, there are many possible invariant measures, depending on which of the solutions are chosen. It is interesting to note that identifying the limiting "averaged" solution is not sufficient to identify the likely limit of the invariant measure. Analysis of the limiting solution in isolation reveals domain walls which separate different regions of phase space and along which the diffusion degenerates, giving rise to the possibility of solutions which could spend arbitrary amounts of time on the domain boundaries. Only through the analysis of the pre-limiting systems do we discover that the system selects the solutions which spend zero time on the domain walls.
These domain walls are the planes {x = ±y} in R³ and correspond to heteroclinic cycles of the original deterministic system, made up of heteroclinic orbits connecting the fixed points. Hence it is not surprising that the limiting system supports solutions which could spend arbitrarily long times on these orbits. However, it is interesting that the limiting procedure selects solutions which do not become trapped near the heteroclinic orbits.
After the completion and submission of this work, we became aware of a recent work by Kuksin which proves that the zero noise/damping limit of the stochastic Complex Ginzburg-Landau (CGL) equation selects a unique invariant measure from the many possible measures which are invariant for the formal deterministic limit [Kuk13]. That problem is very much in the spirit of the one discussed here. Our example is finite dimensional. However, the associated limiting martingale problem for the fast variables is more complicated and does not have a unique solution. Like the CGL, our example has a simplified orbit structure which facilitates averaging. More complicated settings such as the stochastic Navier-Stokes equations are still out of reach. We hope our paper helps to clarify some of the issues involved.

Model System
As an exercise in studying the zero noise/dissipation limit of conservative systems, we have chosen to study the following three-dimensional system:

Ẋ_t = Y_t Z_t, Ẏ_t = X_t Z_t, Ż_t = −2 X_t Y_t. (1)

We will write ϕ_t for the flow map induced by (1), i.e. ξ_t = ϕ_t(ξ_0). We will consistently write (X_t, Y_t, Z_t) for ξ_t when we wish to speak of the components of ξ_t. One sees that |ξ_t|² = X_t² + Y_t² + Z_t² is constant along trajectories of (1). Similarly one sees that X_t² − Y_t² is also conserved by the dynamics of (1). Since any linear combination of conserved quantities is also conserved, we are free to consider 2X_t² + Z_t² and 2Y_t² + Z_t², which are more symmetric. Since we will typically use this second pair, we introduce the map Φ(x, y, z) = (2x² + z², 2y² + z²). A moment's reflection shows that the existence of these two conserved quantities implies that all of the orbits of (1) are bounded and most are closed orbits, topologically equivalent to a circle. All orbits live on the surface of a sphere whose radius is dictated by the values of the conserved quantities. More precisely, given the initial condition ξ_0 ∈ R³, the orbit {ξ_t : t ≥ 0} is contained in the set {ξ ∈ R³ : Φ(ξ) = Φ(ξ_0)}. To any initial point ξ_0 = (X_0, Y_0, Z_0) contained in a closed orbit, we can associate a measure defined by the following limit: for continuous φ,

∫ φ dµ_{ξ_0} = lim_{T→∞} (1/T) ∫_0^T φ(ξ_t) dt.

We will show in Section 7.4 that this invariant measure depends only upon Φ(ξ_0) and a choice of sign. Any such measure is an invariant measure for the dynamics given by (1). Hence we see that (1) has infinitely many invariant measures. It is reasonable to expect that the addition of sufficient driving noise and balancing dissipation will result in a system with a unique invariant measure. Our goal is to study its limit as the noise/dissipation are scaled to zero. We are specifically interested in understanding whether this procedure selects a unique convex combination of the measures for the underlying deterministic system (1).
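The two conservation laws can be checked numerically. In the sketch below the right-hand side B(x, y, z) = (yz, xz, -2xy) is an assumption: the display for (1) is not reproduced in this text, so we simply use a quadratic vector field consistent with the stated conserved quantities.

```python
import math

# Sanity check of the two conserved quantities along a numerically
# integrated trajectory.  B(x, y, z) = (yz, xz, -2xy) is an ASSUMED
# reconstruction of the vector field in (1): it is quadratic and preserves
# both x^2 + y^2 + z^2 and x^2 - y^2, as the text requires.

def B(x, y, z):
    return (y * z, x * z, -2.0 * x * y)

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step for xi' = B(xi)."""
    k1 = B(*state)
    k2 = B(*[s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = B(*[s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = B(*[s + h * k for s, k in zip(state, k3)])
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def Phi(x, y, z):
    # The map Phi from the text: the conserved pair (u, v).
    return (2 * x * x + z * z, 2 * y * y + z * z)

xi0 = (0.3, 0.7, 0.5)
u0, v0 = Phi(*xi0)
r0 = math.sqrt(sum(c * c for c in xi0))   # radius of the invariant sphere

xi, h = xi0, 1e-3
for _ in range(20_000):                    # integrate up to time T = 20
    xi = rk4_step(xi, h)
u1, v1 = Phi(*xi)
r1 = math.sqrt(sum(c * c for c in xi))

drift_u, drift_v, drift_r = abs(u1 - u0), abs(v1 - v0), abs(r1 - r0)
```

With a fourth-order method and this step size, the numerical drift in u, v and the radius stays far below any visible scale, which is consistent with both quantities being first integrals.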
More concretely, for ε > 0 we will explore the following stochastic differential system:

dX^ε_t = (Y^ε_t Z^ε_t − εX^ε_t) dt + √ε σ_1 dW^(1)_t,
dY^ε_t = (X^ε_t Z^ε_t − εY^ε_t) dt + √ε σ_2 dW^(2)_t, (7)
dZ^ε_t = (−2X^ε_t Y^ε_t − εZ^ε_t) dt,

where the two components W^(1)_t and W^(2)_t are mutually independent standard Brownian motions. Throughout this paper, we assume that σ_1 > 0 and σ_2 > 0.
As above, we will write (X^ε_t, Y^ε_t, Z^ε_t) when we wish to discuss the coordinates of ξ^ε_t. For each ε > 0, this three-dimensional hypoelliptic diffusion is positive recurrent and ergodic; its unique invariant probability measure µ^ε is absolutely continuous with respect to Lebesgue measure, with a density which charges all open sets.
Our aim is to study the limit of µ ε , as ε → 0. We first note that as ε → 0, the process (X ε t , Y ε t , Z ε t ) converges to the solution of (1) on any finite time interval. The main result of this article is that there exists a probability measure µ which is absolutely continuous with respect to Lebesgue measure and so that µ ε converges weakly to µ as ε → 0. Of course, µ is a mixture of the ergodic invariant measures appearing in (6), which we shall describe.
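A direct Euler-Maruyama simulation illustrates the balance between forcing and dissipation at fixed ε. The drift B(x, y, z) = (yz, xz, -2xy), the damping −εξ on every coordinate, and noise √ε σ_i on the x and y coordinates only are all assumptions consistent with the description in the text, not a transcription of the display (7).

```python
import math
import random

# Euler-Maruyama sketch of the damped/forced system at a fixed eps > 0.
# The drift, the placement of the damping -eps*xi, and the noise
# sqrt(eps)*sigma_i dW^(i) on the x and y coordinates only are ASSUMPTIONS
# consistent with the text, not a transcription of (7).

random.seed(0)

def simulate(eps, sigma1, sigma2, T, dt):
    x, y, z = 0.3, 0.7, 0.5
    sq = math.sqrt(eps * dt)
    n = int(T / dt)
    acc = 0.0                          # running sum of |xi|^2
    for _ in range(n):
        dx = (y * z - eps * x) * dt + sq * sigma1 * random.gauss(0.0, 1.0)
        dy = (x * z - eps * y) * dt + sq * sigma2 * random.gauss(0.0, 1.0)
        dz = (-2.0 * x * y - eps * z) * dt
        x, y, z = x + dx, y + dy, z + dz
        acc += x * x + y * y + z * z
    return acc / n                     # time average of |xi|^2

# The damping balances the noise, so the long-run statistics remain of
# order one rather than growing or collapsing.
mean_sq = simulate(eps=0.5, sigma1=1.0, sigma2=1.0, T=2000.0, dt=0.01)
```

Under these assumptions the time average of |ξ^ε_t|² settles to an order-one value independent of the initial condition, the finite-dimensional analogue of the L² normalization discussed for the vorticity equation in the introduction.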
It is natural to ask what is the effect of adding noise also in the z-direction. Unfortunately this leads to unexpected complications which at present we are not able to handle.

Main Results
For ε ≥ 0, we define the Markov semigroup P^ε_t associated with (7) by (P^ε_t φ)(ξ) = E[φ(ξ^ε_t) | ξ^ε_0 = ξ]. The dynamics obtained by formally setting ε = 0 are deterministic. We will write P_t rather than P^0_t for the corresponding Markov semigroup, which is defined by (P_t φ)(ξ) = φ(ξ_t) with ξ_0 = ξ.
Theorem 3.1. For each ε > 0, P ε t has a unique invariant probability measure µ ε which has a C ∞ density which is everywhere positive.
Theorem 3.2. There exists a probability measure µ such that µ^ε ⇒ µ as ε → 0 and furthermore such that µ is invariant for the dynamics generated by (1), in that µP_t = µ for all t > 0. In addition, µ is absolutely continuous with respect to Lebesgue measure on R³, with a density which is positive on the complement of

The paper is organized as follows. Section 4 studies finite time convergence of ξ^ε as ε → 0. Section 5 studies existence and uniqueness of the invariant measure of ξ^ε; namely, this section proves Theorem 3.1. Section 6 studies the stochastic system on a faster time scale; more precisely, it introduces the process (U^ε_t, V^ε_t) = Φ(ξ^ε_{t/ε}). Assuming some results from Section 8, we uniquely characterize the limit (U_t, V_t) of the process (U^ε_t, V^ε_t) as ε → 0, and show that this limit has a unique invariant probability measure. Section 7 studies very precisely the deterministic dynamics behind the ODE (1), obtained by formally setting ε = 0 in (7). Section 8 establishes crucial results which were assumed to hold in the discussion in Section 6, the most important and delicate one being the convergence of the quadratic variation of (U^ε, V^ε), which builds upon the analysis in Section 7. Finally, Section 9 is devoted to the proof of Theorem 3.2.

Finite Time Convergence on the Original Timescale
In this section we show that the stochastic dynamics given by (7) converges, on any fixed finite time interval, to the deterministic dynamics given by (1) as ε → 0. Hence the limit on this time scale does not help in understanding the selection of any limiting invariant measure as ε → 0.

Existence and Uniqueness of Invariant Measures with Noise
Similarly, if we consider the evolution of the norm, we have the following result, which is useful in establishing the existence of the invariant measure µ^ε and the tightness of various objects.
Proposition 1. For any integer p ≥ 1 there exists C(p) > 0 so that for all t ≥ 0 and ε > 0,

E |ξ^ε_t|^{2p} ≤ C(p)(1 + |ξ_0|^{2p}).

The proof follows from Lemma 5.1 below.
Remark 1. One can in fact easily prove bounds on E exp(κX_t), uniform in time, for κ > 0 sufficiently small. See [HM08] for a proof using the exponential martingale estimate.
The following Lemma provides the key estimate for Proposition 1.
Lemma 5.1. Let X_t be a semimartingale with X_t ≥ 0 and dX_t ≤ (a − bX_t) dt + dM_t, where a > 0, b > 0 and M_t is a continuous local martingale whose quadratic variation satisfies d⟨M⟩_t ≤ c(1 + X_t) dt for some c > 0. Then for any integer p ≥ 1 there exists a constant C(p) (depending, besides p, only on a, b and c) so that for any X_0 ≥ 0 and t ≥ 0, E X_t^p ≤ C(p)(1 + X_0^p).
Proof of Lemma 5.1. Fixing an N > 0 and defining the stopping time τ_N = inf{t : X_t ≥ N}, one obtains

E X_t ≤ e^{−bt} X_0 + a/b, (8)

where the first inequality follows from Fatou's lemma applied in the limit N → ∞.
Using the assumption on the quadratic variation of M_t, we now see that M_t is an L²-martingale. Now applying Itô's formula to X_t^p produces

Using the same stopping time τ and the same argument as before, we have

Hence inductively we have a bound on E X_t^p for all integers p ≥ 1, which implies that M^(p)_t is an L²-martingale for all p ≥ 1. Proceeding inductively, using this estimate and (8) as the base case, produces the stated result.
Corollary 2. For each ε > 0, the Feller diffusion {ξ^ε_t : t ≥ 0} possesses at least one invariant probability measure µ^ε. Furthermore, any invariant probability measure µ^ε satisfies, for any integer p ≥ 1,

∫_{R³} |ξ|^{2p} µ^ε(dξ) ≤ C(p),

where C(p) is the constant from Lemma 5.1 (which is independent of ε). Hence the collection of probability measures which are invariant under the dynamics for some ε > 0 is tight.
Proof of Corollary 2. Since ξ^ε_t is a time-homogeneous Feller diffusion and, by Proposition 1, for fixed ε > 0 the collection of random vectors {ξ^ε_t : t > 0} is tight, the existence of an invariant probability measure µ^ε follows from the Krylov-Bogolyubov theorem.
The next result follows by hypoellipticity and the Stroock-Varadhan support theorem.
Proposition 2. For each ε > 0 and t > 0, P^ε_t admits a transition density which is C^∞ and everywhere positive.
Proof of Proposition 2. Hypoellipticity follows from the fact that taking Lie brackets of the drift with ∂_x and then ∂_y (the two noise directions) produces the third, missing direction ∂_z. This ensures the existence of a smooth density with respect to Lebesgue measure [Str08,Hör94a,Hör94b]. Positivity then follows by showing that the support of the transition density is all of R³. We invoke the support theorem of Stroock and Varadhan [SV72]. Indeed, consider the controlled system associated to the SDE for (X^ε_t, Y^ε_t, Z^ε_t), where {(f_1(t), f_2(t)), t ≥ 0} is the control at our disposal. By choosing the control appropriately, we can drive the two components (x^ε(t), y^ε(t)) in as short a time as we like to any desired position, which permits us to drive the last component z^ε(t) to any prescribed position in any prescribed time. The result follows.
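The bracket computation above can be illustrated numerically. For a constant vector field V, the Lie bracket [V, B] is simply the directional derivative DB(p)V of the drift B in the direction V. The drift used below, B(x, y, z) = (yz, xz, -2xy), is an assumed reconstruction of (1), since the display is not reproduced in this text.

```python
# Numerical illustration of the hypoellipticity bracket computation.
# The drift B(x, y, z) = (yz, xz, -2xy) is an ASSUMED reconstruction of
# (1).  The noise directions are the constant fields d/dx and d/dy; for a
# constant field V the Lie bracket [V, B](p) equals DB(p) V, which we
# approximate by a centered finite difference.

def drift(p):
    x, y, z = p
    return (y * z, x * z, -2.0 * x * y)

def bracket_with_constant(V, p, delta=1e-6):
    """[V, B](p) = DB(p) V for a constant vector field V."""
    plus = drift(tuple(pi + delta * vi for pi, vi in zip(p, V)))
    minus = drift(tuple(pi - delta * vi for pi, vi in zip(p, V)))
    return tuple((a - b) / (2 * delta) for a, b in zip(plus, minus))

p = (0.4, -0.3, 0.2)                          # a generic point
br_x = bracket_with_constant((1, 0, 0), p)    # analytically (0, z, -2y)
br_y = bracket_with_constant((0, 1, 0), p)    # analytically (z, 0, -2x)

# Together with d/dx and d/dy, either bracket supplies the missing d/dz
# direction as soon as its third component is non-zero; here these
# components are -2y = 0.6 and -2x = -0.8.
third_components = (br_x[2], br_y[2])
```

At generic points one of the two brackets always has a non-zero third component, so the Lie algebra generated by the noise directions and the drift spans R³ there, which is the content of the hypoellipticity claim.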
We are now in a position to give the proof of Theorem 3.1.
Proof of Theorem 3.1. Since, by Proposition 2, P^ε_t has a smooth transition density, any invariant probability measure must have a smooth density which charges every ball B ⊂ R³. Recall that in our setting any two distinct ergodic invariant probability measures must have disjoint supports; this is impossible since the measures have densities which charge every open set. Since any invariant measure can be decomposed into ergodic components [CFS82], uniqueness of the invariant probability measure follows immediately.

The fast dynamics
Since by the results in Section 4 ξ ε t = (X ε t , Y ε t , Z ε t ) converges to ξ t = (X t , Y t , Z t ), in order to study the limiting invariant probability measure one needs to consider the system on ever increasing time intervals as ε → 0. One must pick a time scale, depending on ε, so that the amount of randomness injected into the system is sufficient to keep the system from settling onto a deterministic trajectory as ε → 0.
With this in mind, consider the process ξ^ε_t on the fast time scale t/ε. In other words, consider the process ξ̃^ε_t = ξ^ε_{t/ε}, which solves the SDE (10) obtained from (7) by this time change. Here we have used a slight abuse of notation, replacing the ε-dependent standard Brownian motion W̃^ε_t = √ε W_{t/ε} by W_t. In coordinates we will write (X̃^ε_t, Ỹ^ε_t, Z̃^ε_t) for ξ̃^ε_t. Let P̃^ε_t be the Markov semigroup associated to (10), defined for ψ : R³ → R by (P̃^ε_t ψ)(ξ) = E[ψ(ξ̃^ε_t) | ξ̃^ε_0 = ξ]. Associated with this right-action on functions we have a dual action on measures. We will denote this by left action, rather than the often used (P̃^ε_t)* notation. Hence if µ is a measure on R³ and ψ a real-valued function on R³, then (µP̃^ε_t)(ψ) = µ(P̃^ε_t ψ). Of course, this time change does not change the set of invariant measures for the dynamics, since the time change was not state dependent. Hence P̃^ε_t has the same unique invariant probability measure µ^ε as P^ε_t.
6.1. Fast evolution of conserved quantities. One indication that this is the right time scale is that the conserved quantities (u, v) = Φ(x, y, z) continue to evolve randomly as ε → 0. More precisely, we define the processes (U^ε_t, V^ε_t) = Φ(ξ̃^ε_t). The function Λ will be defined in Section 7.3.2. However, for our present discussion it will be sufficient to state a few important facts whose proofs will be given later in Section 7.3.2.
Proposition 3. Λ(r) is a continuous and strictly increasing function on [0, 1] with Λ(0) = 1/2 and Λ(1) = 1. Furthermore as ε → 0+,

In addition, on any closed interval contained in [0, 1), Λ is uniformly Lipschitz.
6.2. Finite time behavior of (U_t, V_t). Before stating and proving the main theorem of this section, let us establish three Lemmata.
Lemma 6.1. Let X_t solve dX_t = (a − bX_t) dt + √(c X_t) dW_t with X_0 = x, where a, b, c > 0, W_t is a standard Brownian motion adapted to F_t, and x > 0. If a ≥ c/2, then with probability one X_t > 0 for all t ≥ 0.
Proof of Lemma 6.1. We consider the SDE

It is not hard to see that there exists a standard Brownian motion, which by an abuse of notation we denote again by W, such that

We will need a slightly better result, which generalizes the preceding Lemma.
Proof of Lemma 6.2. We define

There exists a standard Brownian motion, still denoted by W, such that

Define two sequences of stopping times as follows: S_0 = 0 and, for k ≥ 1,

On each interval [T_k, S_k], since Z_t/Y_{σ_t} ≥ 1, by a standard comparison theorem for SDEs we can bound Z_t from below by the solution of the equation of the previous Lemma, starting from a/2b. Hence Z_t never hits zero.
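The threshold a ≥ c/2 in Lemma 6.1 is the classical Feller boundary condition for a square-root diffusion of the assumed form dX = (a − bX) dt + √(cX) dW (the display is not reproduced in this text). Near x = 0 the scale density behaves like s′(x) ~ x^{−2a/c}, and the boundary 0 is unattainable exactly when ∫_0 s′(x) dx diverges, i.e. when 2a/c ≥ 1. The dichotomy can be seen numerically:

```python
import math

# Near x = 0 the scale density of dX = (a - bX) dt + sqrt(c X) dW behaves
# like s'(x) ~ x^(-2a/c).  Feller's test: 0 is unattainable iff the
# integral of s' diverges at 0, i.e. iff p := 2a/c >= 1, which is the
# condition a >= c/2 of the Lemma.  (The CIR-type form of the SDE is an
# assumption here.)

def scale_integral(p, delta, n=100_000):
    """Midpoint approximation of int_delta^1 x^(-p) dx on a geometric grid."""
    ratio = (1.0 / delta) ** (1.0 / n)
    total, x = 0.0, delta
    for _ in range(n):
        x_next = x * ratio
        mid = math.sqrt(x * x_next)
        total += mid ** (-p) * (x_next - x)
        x = x_next
    return total

# a < c/2  (p = 0.5): the integral stays bounded as delta -> 0,
# so the boundary 0 is reachable.
small = scale_integral(0.5, 1e-6)
# a > c/2  (p = 1.5): the integral blows up as delta -> 0,
# so the boundary 0 is unreachable.
large = scale_integral(1.5, 1e-6)
```

For p = 0.5 the integral converges (to 2 as delta → 0), while for p = 1.5 it grows like 2/√delta, matching the claimed positivity threshold.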
Using Lemma 6.2, we are now in a position to prove the following result.
Theorem 6.3. Assume that the initial condition

Proof. We first prove that any solution (U_t, V_t) never hits the two axes, except possibly at (0, 0). The fact that (U_t, V_t) cannot hit (0, v) with v > 0 follows from the equation for U_t and Lemma 6.2, once we have noted that whenever

There exists a standard Brownian motion W_t such that

The result again follows from Lemma 6.2, since Γ(U_t, V_t) ≥ 0.
We next establish a crucial property shared by all the possible accumulation points of the collection {(U^ε, V^ε), ε > 0}. It follows from Itô's formula, which can be applied here since ϕ_{M,δ} ∈ C²(R), that for each M > 0, n ≥ 1,

We now take the limit in the last inequality as n → ∞, and deduce from Fatou's Lemma that

It follows from Proposition 3 that to any c > 0 we can associate δ > 0 and a > 0 such that, whenever u, v ≥ c > 0,

Consequently the above establishes that

and letting finally δ → 0, we deduce that

Since we know that neither U_t nor V_t ever reaches 0, and c > 0 is arbitrary, this shows that J_t = U_t − V_t a.s. spends zero time at 0, i.e. that the process (U_t, V_t) a.s. spends zero time on the diagonal.
Now that we know that (U, V) spends no time on the diagonal, we can introduce the following time change. Let, for u, v > 0,

Let us define the time change

There exists a two-dimensional Wiener process, which we still denote by (W^(1), W^(2)), such that

where

It is easily verified that the diffusion matrix of this system is locally uniformly elliptic in (0, +∞) × (0, +∞) and continuous. However, the drift is unbounded near the diagonal. We will now prove uniqueness of the weak solution of (17), using methods and results from Portenko [Por90]. [Por90] constructs a weak solution to an equation like (17) from the solution without drift, using Girsanov's theorem, provided the drift is in L^p(R²) with p > 4. His uniqueness theorem is proved under conditions which are difficult to verify. The condition is tailored to make sure that Girsanov's theorem can be used to show that the law of the equation with drift is absolutely continuous with respect to that of the equation without drift. We will do this by verifying the condition of Lemma 1.1 from [Por90], which we now state.
Lemma 6.5. Assume that {Z_t, t ≥ 0} is a non-negative progressively measurable process, adapted to the filtration {F_t, t ≥ 0}. Suppose that there exists a mapping ρ from the set of subintervals of [0, T] into R_+ such that

Then for any λ < κ^{−1},

We intend to apply Lemma 6.5 to a Z_t which will give us sufficient control over the drift in (17) that we can use Novikov's criterion and Girsanov's Theorem to transform the SDE (17) into the same equation without drift, and hence prove uniqueness of the solution using an argument in the vein of Theorem 1.2 from [Por90].
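The Girsanov step in this argument, trading a drift for an exponential likelihood ratio, can be illustrated on the simplest possible toy example: a scalar Brownian motion with constant drift θ, which is unrelated to the specific drift in (17). With Z_T = exp(θ W_T − θ²T/2), one has E_P[f(W_T) Z_T] = E[f(W_T + θT)].

```python
import math
import random

# Monte Carlo illustration of Girsanov reweighting: for a standard
# Brownian motion W under P and the exponential martingale
#   Z_T = exp(theta*W_T - theta^2*T/2),
# one has E_P[ f(W_T) Z_T ] = E[ f(W_T + theta*T) ].
# This scalar toy example is independent of the drift studied in the text.

random.seed(42)
theta, T, n = 0.5, 1.0, 200_000

acc = 0.0
for _ in range(n):
    w = math.sqrt(T) * random.gauss(0.0, 1.0)            # W_T under P
    z = math.exp(theta * w - 0.5 * theta * theta * T)    # likelihood ratio
    acc += w * z                                         # f(x) = x
girsanov_mean = acc / n   # should approximate theta*T = 0.5
```

The reweighted mean of W_T reproduces the mean θT of the drifted process, which is exactly the mechanism used above to identify the law of (17) with that of the driftless equation.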
To better understand how to use Lemma 6.5 to remove the drift from (17) en route to proving uniqueness of the solution, we take a closer look at the drift term. Our uniqueness argument exploits the fact that, since uniqueness is a local property, we can modify the coefficients of the (H, K) equation outside the set {(h, k) : M^{−1} ≤ h, k ≤ M} for some arbitrary M > 0, so that the resulting equation takes the form

where, for some C > 0,

Hence the only possible difficulty arises when F is small. From the definition of F in (15) and the asymptotics in Lemma 3, we note that F(h, k) is small when |h − k|/(h ∨ k) is small. Hence if we wish to use Girsanov's theorem to remove the drift from (18), the danger comes from |h − k| small, if we restrict our attention to the set

The following simple observation is useful to control the drift.
Proof. The inequality follows from the fact that if b > 0 then b > log(b), and if a ∈ (0, 1] then log(1/a) ≥ 0. The equality follows from log(1/(ab)) = log(1/a) + log(1/b). Combining Lemma 6.6 and the asymptotics in Lemma 3, we note that, on the set M^{−1} ≤ h ∨ k ≤ M, controlling F^{−1}(h, k) amounts to estimating |log(|h − k|)| for |h − k| < a, where 0 < a ≤ 1 is arbitrary. Hence, defining N_t := H_t − K_t, if we wish to apply Lemma 6.5 to Z_t = 1/F²(H_t, K_t), it will be sufficient to estimate (19) for 0 ≤ H_s, K_s ≤ M, for some M > 0. Notice that, as we will eventually prove that Lemma 6.5 holds with κ = 0, the conclusion of Lemma 6.5 will be more than sufficient to invoke Novikov's criterion.
Recall that N t satisfies with f 1 , f 2 , g bounded and g bounded away from zero. As a prologue to the needed estimate on (19), we prove the following result.
Proof: It clearly suffices to prove that

We prove this result with {a ≤ |N_r| ≤ b} replaced by {a ≤ N_r ≤ b}; the same proof gives a similar estimate for the other sign. Since ϕ_{a,b} ∈ C², we can apply Itô's formula to obtain

Since there exists

Proof: In this proof, C stands for a constant which may vary from line to line.

Now the first term is bounded by
The claimed result is proved.
We are now in a position to establish the desired uniqueness result.
Theorem 6.9. Equation (13) has a unique solution which spends zero time on the diagonal. Furthermore the whole sequence (U ε , V ε ) converges weakly to this solution as ε → 0.
Proof. Consider any solution of (13) which spends zero time on the diagonal. The time change defined by (16) transforms this process into a solution (H_t, K_t) of (17). Lemma 6.8 combined with Lemma 6.5 shows that, locally in the open positive quadrant, the law of that solution of (17) is absolutely continuous with respect to that of the same SDE without drift. That last equation has a unique weak solution, according to Theorem 7.2.1 in [SV79]. Now any solution to (17) coincides with the one constructed via Girsanov's theorem in Theorem 1.1 of [Por90], whose assumptions clearly hold in our case (alternatively, Girsanov's theorem could also be used thanks to a simplified version of Lemma 6.8, together with Lemma 6.5 again). Now that we have a unique weak solution (H_t, K_t) to (17), we can time-change it back to the original process (U_t, V_t). Indeed, a weaker version (without the power 2 on the logarithm) of Lemma 6.8 shows that A′_t < ∞ for all t < ∞. Clearly A′_t → ∞ as t → ∞. So this defines (U_t, V_t) for all t ≥ 0, and this process coincides with the arbitrary solution which spends zero time on the diagonal. But the law of that process is uniquely characterized as being the time change of the unique weak solution of (17).
Finally, it follows from this conclusion and Theorem 6.4 that all accumulation points of the collection {(U ε , V ε ), ε > 0} have the same law, hence the whole sequence converges. This proves the stated result.
From now on (U t , V t ) will always refer to the process whose law has just been uniquely characterized.
Since the martingale problem associated to (17) is well posed, (H_t, K_t) is a Markov process, and by Theorem 6.3 in [Var07] (this theorem is stated in dimension 1, but exactly the same argument works in our case) so is its time-change (U_t, V_t). We call Q_t the Markov semigroup associated to that process, defined for bounded measurable φ : R_+² → R by (Q_t φ)(u, v) = E[φ(U_t, V_t) | (U_0, V_0) = (u, v)].
Remark 2. We note that in both cases σ_1 = σ_2 and σ_1 ≠ σ_2, the law of the uniquely characterized solution of (13) and that of a non-degenerate SDE in the quadrant R_+ × R_+ are equivalent. This is in sharp contrast with the result one would get if the diffusion coefficient degenerated in a more regular way on the diagonal (e.g. if it were Lipschitz). In that case, if σ_1 = σ_2 the solution would stay on the diagonal once it has hit it, while if σ_1 ≠ σ_2 the solution would stay in the set u ≥ v or v ≥ u, depending upon the sign of σ_1² − σ_2², after having hit the diagonal once. The period around an orbit often diverges like log(1/ρ) while approaching a heteroclinic cycle or a homoclinic orbit, where ρ is the distance from the limiting orbit. This often leads to averaged coefficients which vanish very slowly, which is the situation in our setting. Hence while this type of degeneracy may seem esoteric at first, it is in fact likely to be generic in many settings.
6.3. Longtime behavior of (U, V). Unlike the pair (U^ε_t, V^ε_t), the pair (U_t, V_t) constructed from (H_t, K_t) in the previous section forms a Markov process. Hence we can speak of an invariant probability measure for the Markov semigroup Q_t.
Observe that for some positive c (recall that Γ(u, v) ≤ u ∧ v). Hence the following result follows from Lemma 5.1.
Proposition 4. For any p ≥ 1, there exists a constant C(p) so that

We shall also need an analogous result for the (H, K) process. We could prove stronger results, but the following will be sufficient for our purposes.
, for some positive constants a and b. The result follows readily.
where F was defined in (15) and was used in the time change between (U, V ) and (H, K).
Proof of Lemma 6.11. Lemma 6.10 gives the needed tightness of the averaged transition densities of (H, K) to employ the Krylov-Bogolyubov theorem to show the existence of an invariant measure (just as we did in the proof of Corollary 2). It is not hard to see that whether ∫(1/F) dν < ∞ or = ∞, in both cases, as t → ∞,

The case ∫(1/F) dν = ∞ is handled as follows. For any M > 0,

which implies by monotone convergence that

hence the result. Consequently, in order to prove that 1/F ∈ L¹(ν), as a consequence of Fatou's Lemma, all we have to show is that there exist C > 0 and T > 0 such that for all t ≥ T,

It follows from Lemma 6.10 and Lemma 6.6 that (22) will follow from the fact that, for some constant C and all t ≥ 1,

Consider the function ϕ ∈ C¹(R) ∩ W^{2,∞}(R) defined as

We then have explicit formulas for ϕ′ and ϕ′′.
We now deduce from Itô's formula

The result now follows easily from Lemma 6.10 and the fact that the process F^{−1}(H_s, K_s) ϕ′(N_s) is bounded.
Proof of Theorem 6.12. Let ρ(u, v) be the density of the unique invariant measure ν guaranteed by Lemma 6.11. The same lemma also states that ρ is positive in the open positive quadrant. Now, using the notation from the proof of Theorem 6.9, for any measurable and locally bounded f:

If one assumes that f is bounded, then Lemma 6.11 ensures that f/F ∈ L¹(ν) and that, for any initial distribution, the convergence of (1/t) ∫_0^t (f/F)(H_r, K_r) dr to ∫ (f/F) dν is ensured by (21). The same computation with f ≡ 1 permits one to conclude that

In summary, we have that for any measurable and bounded f, and any initial probability measure,

This is enough to show that our process (U_t, V_t) has the unique invariant probability measure

Indeed, to see uniqueness, let λ′ be any ergodic invariant probability measure for (U_t, V_t). Then for any bounded f, the Birkhoff ergodic theorem implies that

for λ′-almost every initial (U_0, V_0). Combining this with (23) implies that ∫ f dλ′ = ∫ f dλ for all bounded f, which in turn implies λ = λ′. Since any invariant measure can be decomposed into ergodic invariant measures, the uniqueness of λ is proved.
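The last step of this uniqueness argument rests on Birkhoff averages: along almost every trajectory, the time average of a bounded observable converges to its spatial average against the ergodic invariant measure. A toy illustration with an Ornstein-Uhlenbeck process, whose invariant measure is the standard Gaussian (a generic example, unrelated to the specific process (U_t, V_t)):

```python
import math
import random

# Birkhoff/ergodic averaging for dX = -X dt + sqrt(2) dW, whose invariant
# measure is N(0, 1): the time average of X_t^2 along a single trajectory
# converges to the stationary second moment, which equals 1.
# (A generic illustration, not the process studied in the text.)

random.seed(1)
h, n = 0.01, 200_000
decay = math.exp(-h)
noise = math.sqrt(1.0 - decay * decay)   # exact OU transition over step h

x, acc = 3.0, 0.0                        # start far from equilibrium
for _ in range(n):
    x = decay * x + noise * random.gauss(0.0, 1.0)
    acc += x * x
time_average = acc / n                   # close to 1 for long horizons
```

The initial condition is quickly forgotten and the single-trajectory average matches the stationary moment, which is the mechanism behind the identity ∫ f dλ′ = ∫ f dλ above.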

The Deterministic Dynamics
We now investigate more fully the deterministic dynamics given in (1) and obtained by formally setting ε = 0 in (7). As already mentioned, (1) has two conserved quantities (u, v) = Φ(ξ_0) which are constant on any given orbit. If ξ_0 = (X_0, Y_0, Z_0), then u = 2X_0² + Z_0² and v = 2Y_0² + Z_0² give two independent equations. Since we are working in three dimensions, the locus of solutions, which contains the points in the orbits, is a one-dimensional curve. We undertake this study since the (1/ε)B term in (10) implies that on the fast timescale the solution will make increasingly many turns very near a deterministic orbit of (1) before the stochastic or dissipative terms cause appreciable diffusion or drift away from the current deterministic orbit.
7.1. Structure of orbits. If u ≠ v then the orbit is a simple periodic orbit which is topologically equivalent to a circle. In this case, there are two disjoint orbits which are solutions. If u > v, one such orbit is given by

and another by

Similarly if v > u then the corresponding orbits are given by

Whether u > v or v > u is enough information to localize a given orbit to one of the two orbits on the sphere of radius √((u + v)/2). The remaining piece of information is contained in the sign of the function defined by

sn(x, y, z) = sign( 1_{|x|>|y|} x + 1_{|y|>|x|} y ). (24)

The value of sn corresponds to the sign decorating Γ^±_{u,v}. Hence if one starts from an initial condition (x, y, z) whose (u, v) satisfies u ≠ v, then the deterministic dynamics traces out the set Γ^{sn}_{u,v}. The exceptions to being topologically equivalent to a circle are the lines of fixed points given by {(0, 0, z) : z ∈ R}, {(x, 0, 0) : x ∈ R} and {(0, y, 0) : y ∈ R}, and the heteroclinic orbits which connect them, which are contained in the locus of points where u = v.
For a given such choice there are four heteroclinic orbits given by These heteroclinic orbits split each sphere into four regions which contain closed orbits of finite period. The following set limits hold In contrast to the case when u ≠ v, the orbits starting from a given point (x, y, z) do not converge to one of these unions of heteroclinic trajectories, since any given orbit is restricted to a single heteroclinic trajectory. This could be a point of concern, but we will see in the next section that it does not pose a problem, which is an interesting and important feature of this model.
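Under the same hypothetical vector field as in the previous sketch (equation (1) is not shown in this excerpt), one can check numerically that the field vanishes on the three claimed lines of fixed points, and that an orbit started on the locus u = v approaches a fixed point on the z-axis rather than tracing a closed curve:

```python
import math

# Same hypothetical field as in the earlier sketch (an assumption).
def field(p):
    x, y, z = p
    return (y * z, x * z, -2.0 * x * y)

def rk4_step(p, dt):
    k1 = field(p)
    k2 = field(tuple(p[i] + 0.5 * dt * k1[i] for i in range(3)))
    k3 = field(tuple(p[i] + 0.5 * dt * k2[i] for i in range(3)))
    k4 = field(tuple(p[i] + dt * k3[i] for i in range(3)))
    return tuple(p[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                 for i in range(3))

# The field vanishes identically on the three lines of fixed points.
fixed_ok = all(field(p) == (0.0, 0.0, 0.0)
               for p in [(0.0, 0.0, 3.0), (2.0, 0.0, 0.0), (0.0, -1.5, 0.0)])

# Start on the locus u = v (here u = v = 1): the orbit is heteroclinic and
# approaches the fixed point (0, 0, -1) without reaching it in finite time.
p = (1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)
dt = 1e-3
for _ in range(20_000):          # integrate up to t = 20
    p = rk4_step(p, dt)
x, y, z = p
```

For this symmetric initial condition the motion reduces to ż = −(1 − z²), so z(t) = −tanh(t): exponentially fast approach to the fixed point, but never arrival in finite time.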

7.2. Symmetries and their implications.
Defining s_e : R³ → R³ by s_e(x, y, z) = (y, x, z) and s_± : R³ → R³ by s_±(x, y, z) = (−x, −y, z), observe that if ξ_t is a solution to (1) then so are s_e(ξ_t) and s_±(ξ_t). This implies that Γ⁻_{u,v} = s_±(Γ⁺_{u,v}) and Γ⁺_{u,v} = s_e(Γ⁺_{v,u}), and that if µ is an invariant probability measure for P_t then necessarily µs⁻¹_e and µs⁻¹_± are also invariant probability measures for P_t. The situation for the stochastic dynamics given in (7) is the same for s_± but depends on the choice of σ₁ and σ₂ for s_e. In all cases s_±(ξ^ε_t) is a solution (for a different Brownian motion) if ξ^ε_t is a solution. However, s_e(ξ^ε_t) is again a solution whenever ξ^ε_t is one only when σ₁ = σ₂. In any case, we have the following observation, which we formulate as a proposition for future reference.
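The claim that s_e and s_± map solutions to solutions is equivalent to equivariance of the vector field, F ∘ s = s ∘ F (s is linear, so it equals its own differential). A quick numerical confirmation, again using the hypothetical field ẋ = yz, ẏ = xz, ż = −2xy assumed in the earlier sketches:

```python
import random

# Hypothetical field, as in the earlier sketches (an assumption).
def field(p):
    x, y, z = p
    return (y * z, x * z, -2.0 * x * y)

def s_e(p):
    x, y, z = p
    return (y, x, z)

def s_pm(p):
    x, y, z = p
    return (-x, -y, z)

# Equivariance field(s(p)) == s(field(p)) for a linear map s implies that
# s maps solutions of the ODE to solutions.
random.seed(0)
mismatches = 0
for _ in range(100):
    p = tuple(random.uniform(-2.0, 2.0) for _ in range(3))
    if field(s_e(p)) != s_e(field(p)):
        mismatches += 1
    if field(s_pm(p)) != s_pm(field(p)):
        mismatches += 1
```

Both identities hold exactly in floating point here, since the two sides differ only by sign flips and reorderings of the same products.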
Proposition 5. Let s : R 3 → R 3 be a map such that s(ξ ε t ) is a solution (for possibly a different Brownian motion) whenever ξ ε t is a solution, then µ ε = µ ε s −1 where µ ε is the unique invariant probability measure of P ε t guaranteed by Theorem 3.1.
Proof of Proposition 5. As before, it is clear that µ^ε s⁻¹ is again an invariant probability measure; however, we know that µ^ε is the unique invariant probability measure given the assumptions on the σ's. Hence we conclude that µ^ε s⁻¹ = µ^ε.
7.3. Averaging along the deterministic trajectories. Since the separation of time scales between the fast and slow dynamics leads to the averaging of the coefficients of the (U^ε, V^ε) equation around the deterministic orbits, we now discuss averaging along the deterministic orbits in general. After this we will define the function Λ whose asymptotics were described in Proposition 3. Given a function ψ : Notice that Aψ is again a function from R³ to R and that it is constant on the connected components of the level sets of (u, v).
To obtain a more explicit representation of the averaging operation we switch to an angular variable θ. Given any positive u and v, for θ ∈ [0, 2π] we parametrize z by z(θ) = √(u∧v) sin(θ). To define the other coordinates we introduce auxiliary angles φ₁(θ) and φ₂(θ), and set x(θ) = √(u/2) cos(φ₁(θ)) and y(θ) = √(v/2) cos(φ₂(θ)). Putting everything together, the trace of the trajectory starting at (√(u/2), √(v/2), 0) is the image of [0, 2π] under γ_{u,v}, where γ_{u,v}(θ) := (x(θ), y(θ), z(θ)). As already discussed, depending on whether u > v or v > u this represents a closed orbit on the sphere of radius √((u+v)/2) which rotates around, respectively, either the x-axis in the positive x half space or the y-axis in the positive y half space. The orbits in the negative half space are obtained by applying s_±. To define the occupation measure on these orbits we introduce a third auxiliary angle φ_{u,v}(θ). For u ≠ v, we define a probability measure ν⁺_{u,v} on R³ as the image under γ_{u,v} of the normalized measure on [0, 2π] proportional to dθ/|cos(φ_{u,v}(θ))|. We let ν⁻_{u,v} = ν⁺_{u,v} s⁻¹_±. For u = v, we define ν^±_{u,u}(dx dy dz) = δ_{(0,0,±√u)}(dx dy dz). Each of these probability measures is supported on the corresponding set Γ⁺_{u,v} or Γ⁻_{u,v}. It is straightforward to see that for any ψ : R³ → R and (x, y, z) ∈ R³ such that |x| ≠ |y| one has Aψ(ξ) = ∫ ψ dν^s_{u,v}, where ξ = (x, y, z), (u, v) = Φ(ξ) and s = sn(ξ), with sn defined in (24).

7.3.2. Definitions of Γ and Λ. The central quantities which need to be averaged in the (U^ε_t, V^ε_t) dynamics, given in equation (12), are the infinitesimal quadratic variations. They are given respectively by 16σ₁²x² and 16σ₂²y². From (4), we have that x² = ½(u − z²) and y² = ½(v − z²). Since u and v are constant along the deterministic trajectories, averaging x² and y² along an orbit reduces to averaging z². Since z² does not depend on the choice of sign in the definition of ν^±_{u,v}, by defining the single function Γ(u, v) = ∫ z² ν⁺_{u,v}(dx dy dz) we have access to all of the averaged quantities we will require.
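The auxiliary angles φ₁, φ₂ are not reproduced in this excerpt. A natural choice, assumed in the sketch below, is φᵢ(θ) = arcsin(√((u∧v)/uᵢ) sin θ) with u₁ = u, u₂ = v; this keeps γ_{u,v}(θ) on the level set {2x² + z² = u, 2y² + z² = v} for every θ:

```python
import math

# Assumed auxiliary angles (not reproduced in this excerpt): the choice
# phi_i(theta) = arcsin(sqrt((u ^ v)/u_i) * sin(theta)) keeps the curve on
# the level set of the two conserved quantities for every theta.
def gamma(u, v, theta):
    w = min(u, v)
    phi1 = math.asin(math.sqrt(w / u) * math.sin(theta))
    phi2 = math.asin(math.sqrt(w / v) * math.sin(theta))
    x = math.sqrt(u / 2.0) * math.cos(phi1)
    y = math.sqrt(v / 2.0) * math.cos(phi2)
    z = math.sqrt(w) * math.sin(theta)
    return x, y, z

u, v = 2.0, 1.0
err = 0.0
for k in range(1000):
    theta = 2.0 * math.pi * k / 1000
    x, y, z = gamma(u, v, theta)
    err = max(err,
              abs(2 * x * x + z * z - u),      # stays on {2x^2 + z^2 = u}
              abs(2 * y * y + z * z - v),      # stays on {2y^2 + z^2 = v}
              abs(x * x + y * y + z * z - (u + v) / 2.0))  # on the sphere
```

At θ = 0 this recovers the starting point (√(u/2), √(v/2), 0) mentioned above.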
Clearly the function Γ(u, v) is symmetric in (u, v) and can be written as a function of u ∨ v and u ∧ v only. In fact one sees that if one defines Λ(r) = K_r ∫₀^{2π} sin²(θ)/|cos(arcsin(r sin(θ)))| dθ, where K_r⁻¹ = ∫₀^{2π} dθ/|cos(arcsin(r sin(θ)))|, for r ∈ [0, 1], then (14) holds. The properties of Λ given in Proposition 3 follow directly from this definition, from Proposition 6 in the next subsection, and from its proof.

7.3.3. Averaging near the diagonal.
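The formula for Λ can be evaluated directly by quadrature. A minimal sketch: for r < 1 the integrands are smooth and periodic, so a midpoint rule converges quickly, and the K_r normalization cancels in the ratio. One has Λ(0) = 1/2 (uniform weight), and Λ increases towards 1 as r → 1, where the weight concentrates at θ = π/2, 3π/2.

```python
import math

# Direct quadrature of the definition of Lambda(r); the K_r normalization
# cancels in the ratio.  Midpoint rule; integrands are smooth for r < 1.
def lam(r, n=200_000):
    num = den = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        w = 1.0 / abs(math.cos(math.asin(r * math.sin(theta))))
        num += math.sin(theta) ** 2 * w
        den += w
    return num / den

values = [lam(r) for r in (0.0, 0.5, 0.9, 0.99)]
```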
Proposition 6. Let ψ : R³ → R be a continuous function. If δ = 1 − (u∧v)/(u∨v), then as |u − v| → 0 (and hence δ → 0) while (u, v) remains in a compact set, one has If in addition for all u where r = (u∧v)/(u∨v). Remark 3. The asymptotic expansion given in Proposition 3 follows from the fact that C_u(1 − z²) = 4u. The continuity properties follow from the formulas and the fact that the values at the ends of the intervals are finite.
As |u − v| → 0, and hence r → 1, this integral concentrates around the two points θ = π/2 and θ = 3π/2, since around these points |cos(φ(θ))| → 0 as r → 1. At these points (x(θ), y(θ), z(θ)) converges to (0, 0, √(u∨v)) and (0, 0, −√(u∨v)) respectively. Around these points we have one behavior, and away from them another. Consider the following representative portion of the integral, which will converge to ½ψ(0, 0, √(u∨v)). Fixing any sufficiently small ε > 0, we define a = a(ε) so that sin(π/2 − a) = sin(π/2 + a) = 1 − ε. Then The remaining half of the integral in (32) will converge to ½ψ(0, 0, −√(u∨v)) in a completely analogous fashion. The first and third integrals behave the same; we consider the first. If, as before, r = (u∧v)/(u∨v), we make the change of variables α = sin θ followed by β² = √r α² to obtain By the asymptotics on K_r given in (30), this goes to zero as |u − v| → 0 (and hence r → 1), since ε > 0. Now as r → 1 one has The last conclusion follows directly from the assumed finiteness of C_u(ψ) and the asymptotics of K_r, as r → 1, given in (31). We summarize this discussion in the following result.
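The concentration of the weight 1/|cos(arcsin(r sin θ))| around θ = π/2 and 3π/2 as r → 1 can also be seen numerically: the fraction of the total weight carried by small windows around these two angles grows towards 1, though only logarithmically in 1 − r, consistent with the K_r asymptotics invoked above.

```python
import math

# Fraction of the total weight 1/|cos(arcsin(r sin(theta)))| carried by
# windows of half-width 0.2 around theta = pi/2 and 3*pi/2.
def window_mass(r, half_width=0.2, n=400_000):
    inside = total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        w = 1.0 / abs(math.cos(math.asin(r * math.sin(theta))))
        total += w
        if min(abs(theta - math.pi / 2),
               abs(theta - 3 * math.pi / 2)) < half_width:
            inside += w
    return inside / total

masses = [window_mass(r) for r in (0.5, 0.9, 0.9999)]
```

Even at r = 0.9999 only roughly half the mass sits in the two windows, reflecting the logarithmic rate of concentration.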
Proposition 7. The set of ergodic invariant probability measures of (1) consists precisely of Given (u, v) ∈ R²₊, we define the probability measure ν_{u,v} on R³ by where the ν^±_{u,v}(dξ) were defined in (26) and the text below it. The following corollary of Proposition 7 will be central to the proof of the convergence of µ^ε to a unique limiting measure.
Corollary 3. Any invariant probability measure m for (1) which satisfies ms −1 ± = m can be represented as for some probability measure γ on (0, ∞) 2 . Furthermore the measure γ is unique. Conversely, a probability measure which is invariant for (1) and satisfies ms −1 ± = m is uniquely specified by the measure γ = mΦ −1 .
Proof of Corollary 3. The ergodic decomposition theorem [CFS82] implies that there exists a unique pair of measures (γ⁺, γ⁻) so that the total mass of γ⁺ + γ⁻ is one and Since ν⁻_{u,v} and ν⁺_{a,b} are mutually singular for all choices of positive u, v, a, and b, the symmetry assumption ms⁻¹_± = m forces γ⁺ = γ⁻, so the total mass of each is ½. Setting γ = 2γ⁺ = 2γ⁻, we see that γ is a probability measure and that This proves that any invariant m satisfying the symmetry assumption can be represented as claimed. All that remains is to show that γ is unique. Let γ̃ be another probability measure so that which in turn implies that ½γ̃ = γ⁺, since the ergodic decomposition is unique. However, this implies γ̃ = γ as desired.
7.5. The Limiting Fast Semigroup. We begin with a small detour to think about the limiting dynamics. Its action on a test function can be understood as instantly assigning to each point on an orbit the average of the function around that orbit, and to each point on a heteroclinic connection the value of the function at the limiting fixed point on the z-axis.
Recall the definition of ν_{u,v} from (33); for φ : R³ → R we define νφ by Recalling the definition of Φ, which maps ξ to (u, v), from (4), we note that for any ρ : Lastly, recalling the definition of P^ε_t from (11) and of Q_t from (20), let λ be the unique invariant probability measure of Q_t guaranteed by Theorem 6.12. For φ : R³ → R we define Remark 4. If φ is a test function such that φ ∘ s_± = φ, or m is an initial measure on R³ such that m = ms⁻¹_±, then it is not hard to convince oneself that mP^ε_t φ → mP_t φ as ε → 0. If one neither starts with initial data which has this symmetry nor uses a symmetric test function, then things are more complicated. The orbit may average with respect to only one of the two measures ν⁺_{u,v} or ν⁻_{u,v}. For definiteness assume that we are on the ν⁺_{u,v} orbit. We believe that when the (U, V)-dynamics hits the line U = V it essentially spends all of its time at (0, 0, √u) and (0, 0, −√u). (See Proposition 6.) With probability ½ it returns to a ν⁺_{u,v} orbit and with probability ½ it enters onto a ν⁻_{u,v} orbit. Hence, to describe the P_t semigroup in the non-symmetric setting, it seems we need to add a sequence of independent Bernoulli random variables to decide whether one should average with respect to the + or the − orbit. Since we are primarily interested in the structure of the invariant probability measure, we have not tried to make this picture rigorous.
Let λ be the unique invariant probability measure of Q t and define µ = λν. Observe that µ is invariant under P t because for any bounded φ : R 3 → R one has Here the first equality is by definition, the second follows from (34), the third from the invariance of λ under Q t and the last from the definition of µ.
8. Convergence of (U^ε, V^ε) towards (U, V). We now prove the results which were taken for granted in Section 6, namely that where C_u, C_v and C are three positive constants.
Lemma 8.1. Under the condition of Proposition 8, We can now proceed with the proof of tightness.
Proof of Proposition 8. We prove tightness of U^ε only, V^ε being treated completely similarly. We have a decomposition of U^ε_t whose final term is the stochastic integral ∫₀ᵗ e^{2s} dM^ε_s.
Clearly the first two terms on the right are tight in C([0, ∞)), since the collection of R-valued random variables U^ε_0 is tight. We only need check tightness in C([0, ∞)) of the process W^ε_t := ∫₀ᵗ e^{2s} dM^ε_s. Since W^ε_0 = 0, we need only verify condition (ii) from Theorem 7.3 in Billingsley [Bil99], which follows from the condition of the Corollary of Theorem 7.4, again in [Bil99]. In other words, it suffices to check that for any T, η and η′ > 0, there exists δ ∈ (0, 1) such that for all ε > 0 and 0 ≤ t ≤ T − δ, Combining the Chebyshev and Burkholder–Davis–Gundy inequalities and using Lemma 8.1, we deduce that

Since (U^ε_t, V^ε_t) is not a Markov process, it does not have an invariant probability measure. However, the projection λ^ε = µ^ε Φ⁻¹ of µ^ε, the unique invariant probability measure of the Markov process ξ^ε_t, is well defined. We now establish the following tightness result: Lemma 8.2. The sequence of measures {λ^ε : ε > 0} is tight on the space (0, ∞) × (0, ∞).
Remark 5. We emphasize that Lemma 8.2 asserts tightness in the open set (0, ∞) × (0, ∞), which implies the measures accumulate neither at the boundary at "infinity" nor at the boundary at zero. In other words, for any δ > 0 there exists an r > 0 so that The following result, which implies the tightness at infinity, follows immediately from the definition of λ^ε, the definition of Φ, and Corollary 2.
Lemma 8.3. For any p ≥ 1, there exists a C(p) > 0 so that We now handle the boundary at zero.
Lemma 8.4. Let ζ_t be a Markov process and f and g two real-valued functions on the state space of ζ_t satisfying 0 ≤ g(ζ_t) ≤ f(ζ_t) for all t ≥ 0 almost surely, and such that f(ζ_t) is a continuous semimartingale satisfying df(ζ_t) = (a − f(ζ_t))dt + c g(ζ_t)dW_t, where a and c are positive constants and W_t is a standard Wiener process. If µ is any invariant probability measure of ζ_t with µ[f²] = ∫ f²(ζ) µ(dζ) < ∞, then for any δ ∈ (0, 1).
Proof of Lemma 8.4. Defining Observe that H′(x) = I(x) and I′(x) = φ(x), and that φ, I and H are well defined on the intervals (0, ∞), (0, ∞) and [0, ∞) respectively. H and I are everywhere positive, while φ is positive on [1, +∞) and negative on (0, 1). It is plain that the discontinuity of H″ at x = 1 will not prevent us from using Itô's formula. Taking ζ_0 distributed according to µ, noticing that since H(x) < 2 + x² for x ≥ 0, and setting X_t = f(ζ_t) for notational convenience, we have that where M_t is the martingale defined by dM_t = c g(ζ_t)I(X_t)dW_t. We conclude that Now, integrating over the initial conditions ζ_0 (which were distributed according to µ), we see that the H terms are equal by the stationarity embodied in (40) (and hence they cancel) and that The result follows, since xI(x) ≤ 1 + x².
The following corollary is a direct consequence of the last two lemmata. Corollary 4. There exists a constant C > 0 so that for any δ ∈ (0, 1) Proof of Lemma 8.2. The result follows immediately by combining Lemma 8.3 and Corollary 4.

8.3. Convergence of the quadratic variation. Now that we know that the collection {(U^ε_t, V^ε_t), t ≥ 0}_{ε>0} is tight, in view of Theorem 6.3, the weak uniqueness result for (13), and a comparison of (12) and (13), the weak convergence (U^ε, V^ε) ⇒ (U, V) will follow from the convergence of the quadratic variations of U^ε and V^ε to those of U and V, which will be proved in the next lemma.
For each M > 0, let Considering the three different cases of the behavior of (U, V), it is not hard to see that in all cases κ_M, defined exactly as κ^ε_M but with (U^ε, V^ε) replaced by (U, V), is a.s. a continuous function of the (U, V) trajectory, hence In particular, lim inf It will then follow that for any t > 0, the lim inf as ε → 0 of P(κ^ε_M > t) can be made arbitrarily close to 1 by choosing M large enough.
Lemma 8.5. Let ν^ε be any sequence of tight probability measures on R³ and let (X^ε_t, Y^ε_t, Z^ε_t) be the solution to (10) with (X^ε_0, Y^ε_0, Z^ε_0) distributed as ν^ε. Then for any t > 0, as ε → 0, Proof. Since 2(X^ε_s)² = U^ε_s − (Z^ε_s)² and 2(Y^ε_s)² = V^ε_s − (Z^ε_s)², we only need to show that ∫₀ᵗ (Z^ε_s)² ds ⇒ ∫₀ᵗ A(z²)(U_s, V_s) ds. It suffices in fact to show that for all M > 0. The values of t > 0 and M will be fixed throughout this proof. For any δ > 0, we define N_δ = ⌈t/δ⌉, t_n = nδ ∧ κ^ε_M for 0 ≤ n < N_δ and t_{N_δ} = t ∧ κ^ε_M. Let now Z^{(n)}_s be the z component of the solution to the deterministic dynamics (1) at time s which started at time t_n from the point (X^ε_{t_n}, Y^ε_{t_n}, Z^ε_{t_n}). Then clearly To control the error term, observe that The first term in the product on the right-hand side is bounded thanks to the stopping time κ^ε_M. Using Lemma 4.1, we see that E|Ξ^{ε,δ}| is bounded by a constant times the square root of Hence if we choose (42) δ = C⁻¹_M ε log(1/ε), then Ξ^{ε,δ} → 0 in L¹(Ω) as ε → 0. Having made this choice of δ, we now suppress it from the notation designating dependence on parameters.
We now further divide Φ^ε (δ having been suppressed) depending on whether the starting point (X^ε_{t_n}, Y^ε_{t_n}, Z^ε_{t_n}) lies near the diagonal u = v in phase space. The reason for this decomposition is that the mechanism by which the time average of (Z^{(n)})² over the time interval [t_n, t_{n+1}] is close to the function A(z²)(U_{t_n}, V_{t_n}) is different in the two regions. The terms which have |u − v| > ρ have periods uniformly bounded from above, and hence as (t_{n+1} − t_n)/ε = δ/ε → ∞ the number of periods contained in the interval over which we are averaging also goes to infinity. On the other hand, as the points approach the diagonal u = v the period grows to infinity. So for |u − v| small enough the period might be much greater than the length δ/ε of the time interval over which we are averaging. Hence the convergence of A^{ε,ρ} to the appropriate averaged values occurs by a different mechanism. Proposition 6 shows that A(z²)(u, v) → u ∧ v as |u − v| → 0. To understand why (ε/δ) ∫_{t_n/ε}^{t_{n+1}/ε} (Z^{(n)}_s)² ds → U_{t_n} ∧ V_{t_n}, one needs to recall the discussion from Section 7. The deterministic orbits when u = v consist of heteroclinic orbits connecting the fixed points at (0, 0, √u) and (0, 0, −√u). Since the time to reach the fixed points along these orbits is infinite, it is not surprising that for |u − v| small the periodic orbit spends most of its time near (0, 0, ±√u) ∼ (0, 0, ±√v). This can also be seen in the fact that the occupation measures given in (26) concentrate around θ ∼ π/2, 3π/2, which correspond to the fixed points, when |u − v| ∼ 0. Importantly, even when the time is not long enough to traverse the orbit completely, any average will be concentrated near the fixed points, since the time to reach a neighborhood of a fixed point is small relative to the time it takes to leave that neighborhood once the orbit has arrived there. This idea will be made quantitative below.
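The averaging mechanism away from the diagonal can be illustrated numerically with the hypothetical field from the earlier sketches: the time average of z² along a periodic orbit with u > v should match (u ∧ v)Λ(r). With that field one can derive that the argument of the Λ formula quoted in Section 7.3 is r = √(v/u); since conventions for r may differ, this identification is an assumption of the sketch.

```python
import math

# Hypothetical field from the earlier sketches (equation (1) not shown).
def field(p):
    x, y, z = p
    return (y * z, x * z, -2.0 * x * y)

def rk4_step(p, dt):
    k1 = field(p)
    k2 = field(tuple(p[i] + 0.5 * dt * k1[i] for i in range(3)))
    k3 = field(tuple(p[i] + 0.5 * dt * k2[i] for i in range(3)))
    k4 = field(tuple(p[i] + dt * k3[i] for i in range(3)))
    return tuple(p[i] + dt * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                 for i in range(3))

def lam(r, n=100_000):
    # Quadrature of the Lambda formula quoted in Section 7.3.
    num = den = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        w = 1.0 / abs(math.cos(math.asin(r * math.sin(theta))))
        num += math.sin(theta) ** 2 * w
        den += w
    return num / den

u, v = 2.0, 1.0
p = (math.sqrt(u / 2.0), math.sqrt(v / 2.0), 0.0)   # start on the orbit
dt, T = 1e-3, 200.0
acc = 0.0
for _ in range(int(T / dt)):
    p = rk4_step(p, dt)
    acc += p[2] ** 2
time_avg = acc * dt / T
predicted = min(u, v) * lam(math.sqrt(v / u))   # r = sqrt(v/u): assumed
```

Here T covers many periods of the orbit, so the remainder from the last partial period contributes only an O(period/T) discrepancy.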
Hence we define where τ(u, v) is the period of the deterministic orbit. For all β > 0 and α > ρ, we define The utility of Ψ̂ is the following bound, which can be deduced from Section 7.3: sup_{u,v≤M}
Lemma 8.6. Let X_n be a sequence of 𝒳-valued random variables and X be such that X_n ⇒ X, where 𝒳 is a separable Banach space. Let {F_n, n ≥ 1} be a sequence in C(𝒳) such that, as n → ∞, F_n → F uniformly on each compact subset of 𝒳. Then F_n(X_n) ⇒ F(X) as n → ∞.
Proof of Lemma 8.6. Choose ε > 0 arbitrary, and let K be a compact subset of 𝒳 such that P(X_n ∉ K) ≤ ε for all n ≥ 1. Now choose n large enough that |F_n(x) − F(x)| ≤ ε for all x ∈ K. Choose an arbitrary G ∈ C_b(R) such that sup_x |G(x)| ≤ 1. We have The first term of the right-hand side can be made arbitrarily small by choosing ε small, uniformly in n, since G is uniformly continuous on the union of the images of K by the F_n's. The last term clearly goes to zero as n → ∞.
Lemma 8.7. Let {X n , n ≥ 1} and X denote real-valued random variables, defined on a given probability space (Ω, F , P). A sufficient condition for X n ⇒ X is that for any continuous, bounded and increasing function F .
where ρ denotes the density of λ with respect to Lebesgue measure and we have used formula (26). Hence the restriction of µ to O is absolutely continuous with respect to Lebesgue measure on R 3 , with a density which is positive at (x, y, z) since ρ(u, v) > 0.