Invariant measure of the stochastic Allen-Cahn equation: the regime of small noise and large system size

We study the invariant measure of the one-dimensional stochastic Allen-Cahn equation for a small noise strength and a large but finite system. We endow the system with inhomogeneous Dirichlet boundary conditions that enforce at least one transition from -1 to 1. (Our methods can be applied to other boundary conditions as well.) We are interested in the competition between the energy that should be minimized due to the small noise strength and the entropy that is induced by the large system size. Our methods handle system sizes that are exponential with respect to the inverse noise strength, up to the critical exponential size predicted by the heuristics. We capture the competition between energy and entropy through upper and lower bounds on the probability of extra transitions between -1 and 1. These bounds are sharp on the exponential scale and imply in particular that the probability of having one and only one transition from -1 to +1 is exponentially close to one. In addition, we show that the position of the transition layer is uniformly distributed over the system on scales larger than the logarithm of the inverse noise strength. Our arguments rely on local large deviation bounds, the strong Markov property, the symmetry of the potential, and measure-preserving reflections.


INTRODUCTION
In this paper we study the unique invariant measure of the stochastically perturbed Allen-Cahn equation (1.1), where u_ε is a one-dimensional order parameter defined for all non-negative times t ∈ R_+ and x ∈ (−L_ε, L_ε). Here η is a formal expression denoting space-time white noise and V is a symmetric double-well potential. The canonical choice for V is the standard double-well potential V(u) = (1 − u²)²/4, although more general choices are possible (see Assumption 1.1 below). We are interested in the properties of the invariant measure for large system sizes L_ε. It is well-known that for ε ↓ 0 and fixed system size L, the invariant measure of the Allen-Cahn equation concentrates on minimizers of the energy (1.6). This follows from large deviation theory. In fact, the same is true even for system sizes L_ε that grow with ε: in [Web10] the second author proved this fact for L_ε ∼ ε^{−α} for any α < 2/3. Our main goal in the current paper is to go up to interval sizes that are exponential with respect to ε^{−1} and, specifically, to understand the competition between energy and entropy that emerges in this regime. Let us first consider the effect of energy on the measure. The intuition is that the invariant measure can be viewed as a Gibbs measure with the given energy, i.e., that it is in some heuristic sense proportional to exp(−E(u)/ε). The heuristic picture then says that, because of the potential term in the energy, functions u sampled from this measure are most likely to be close to one or the other minimum of V on most of [−L_ε, L_ε]. On the other hand, because of the gradient term in the energy, there is an energetic "cost" c_0 for each transition between these two preferred states, making such transitions unlikely. When the system size is order-one, this is the end of the story. Now let us consider the competing effect of entropy on the measure when the system size is large. 
Namely, the probability of finding a transition between the minima of V is increased by the fact that the transition can occur anywhere in the system. Hence, the folklore is that the probability of finding n transition layers scales like (1.2). This competition between entropy and energy is captured (on the exponential level) in our first theorem, Theorem 1.5 below. Our second result, Theorem 1.9, then shows the uniform distribution of transitions within the domain. As for our methods, the central idea is that one can decompose the measure into conditional measures and the corresponding marginals in order to reduce to order-one intervals on which one can apply large deviation theory. Along the way, it is important for us to use measure-preserving reflection arguments that allow us to transform the underlying Brownian paths. The detailed structure of the (deterministic) energy functional is also critical in our proofs.
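Schematically, the objects just described are as follows (a hedged reconstruction of (1.1), the Gibbs heuristic, and the scaling (1.2); normalizations are inferred from the surrounding text rather than quoted):

```latex
% Hedged reconstruction of (1.1): the stochastic Allen-Cahn equation
\[
  \partial_t u \;=\; \partial_x^2 u \;-\; V'(u) \;+\; \sqrt{2\varepsilon}\,\eta,
  \qquad x \in (-L_\varepsilon, L_\varepsilon),\; t \ge 0.
\]
% The Gibbs heuristic: the invariant measure is formally proportional to
\[
  \exp\Bigl(-\frac{E(u)}{\varepsilon}\Bigr),
  \qquad
  E(u) \;=\; \int \Bigl(\frac12\,|\partial_x u|^2 \;+\; V(u)\Bigr)\,\mathrm{d}x.
\]
% Folklore scaling (1.2): each extra pair of layers costs a factor e^{-2c_0/eps}
% in energy but gains a volume factor of order L_eps^2 in entropy:
\[
  \mu\bigl(\text{$2n{+}1$ transition layers}\bigr)
  \;\sim\; \Bigl(L_\varepsilon\, e^{-c_0/\varepsilon}\Bigr)^{2n},
  \qquad \text{critical scale: } L_\varepsilon \sim e^{c_0/\varepsilon}.
\]
```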
We will state our results in detail in Subsection 1.2 after first explaining our set-up and notation.
1.1. Set-up and notation. For the potential V in (1.1), we need a symmetric double-well potential with at least superlinear growth at infinity. For simplicity, we assume that the two minima of V are normalized to be at ±1 and that the minimum value of the potential is zero. To be precise, our assumptions (collected in Assumption 1.1) include the growth condition V(u) ≥ |u|^{1+β}/C for |u| ≥ C, for some C < ∞ and β > 0. (1.3) Remark 1.2. If we assume superquadratic growth of V at infinity (recall the quartic growth of the standard double-well potential V(u) = (1 − u²)²/4), some of our technical lemmas simplify slightly. In particular, one can remove the dependence of the minimal system size ℓ_* on M in Lemmas 2.3 and 2.5.
Because of the normalization of our potential, the transitions that we are interested in are transitions between ±1. We make the notion precise in the following definition.

Definition 1.3 (Up/down transition layers). We say that u has an up transition layer on (x_−, x_+) if u(x_−) = −1, u(x_+) = 1, and |u(x)| < 1 for all x ∈ (x_−, x_+).
We say that u has a down transition layer on (x − , x + ) if the same condition holds with signs reversed, and that u has a transition layer if it has an up or down transition layer.
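Definition 1.3 concerns continuous profiles; on a discrete grid one can only check a δ-relaxed version (in the spirit of the δ^- layers introduced in Section 2). The following detector is our own illustration, not part of the paper's formalism; the tolerance `delta` and the grid are arbitrary choices:

```python
import math

def up_transition_layers(u, delta=0.05):
    """Return index pairs (i, j), i < j, with u[i] <= -1 + delta, u[j] >= 1 - delta,
    and -1 + delta < u[k] < 1 - delta for all i < k < j: a delta-relaxed,
    discretized version of an 'up transition layer' (Definition 1.3 is delta -> 0)."""
    layers, i = [], None
    for k, v in enumerate(u):
        if v <= -1 + delta:
            i = k                      # most recent visit to the -1 well
        elif v >= 1 - delta:
            if i is not None:
                layers.append((i, k))  # completed an up transition
            i = None                   # must revisit -1 before the next up layer
    return layers

# A tanh-shaped profile has exactly one up transition layer ...
xs = [x / 10 for x in range(-100, 101)]
u_one = [math.tanh(x / math.sqrt(2)) for x in xs]
assert len(up_transition_layers(u_one)) == 1

# ... while gluing up, down, and up segments yields two up layers (plus a down layer,
# detected symmetrically by applying the same function to the negated profile).
u_three = u_one + u_one[::-1] + u_one
assert len(up_transition_layers(u_three)) == 2
```
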
For the boundary conditions on our PDE, we will work with the popular inhomogeneous Dirichlet boundary conditions u(−L_ε) = −1, u(L_ε) = +1. (1.4) Because of the boundary conditions, there is necessarily one up transition layer, and the question is whether there are additional transition layers. Moreover, if there are additional layers, they come as a pair of an up layer and a down layer. Note that our methods can also handle other boundary conditions, for instance periodic boundary conditions or Dirichlet boundary conditions that do not force a transition layer to be present. We will denote the invariant measure of (1.1) subject to the boundary conditions (1.4) by µ^{−1,1}_{ε,(−L_ε,L_ε)} and the corresponding expectation by E^{µ^{−1,1}_{ε,(−L_ε,L_ε)}}(·). We will often use the fact that the measure µ^{−1,1}_{ε,(−L_ε,L_ε)} has an explicit density with respect to a Gaussian measure [Zab89]. Namely, one can express the expectation of any test function Φ as in (1.5). Here E^{W^{−1,1}_{ε,(−L_ε,L_ε)}} denotes the expectation with respect to the measure W^{−1,1}_{ε,(−L_ε,L_ε)}, which is the distribution of a Brownian bridge on (−L_ε, L_ε) from −1 to +1 with variance proportional to ε. Properties of W^{−1,1}_{ε,(−L_ε,L_ε)} will be discussed in detail in Section 3. The deterministic Allen-Cahn equation (set η = 0 in (1.1)) is the L²-gradient flow of the energy (1.6). When we need to refer to the energy on all of R or the localized energy on a subinterval, we will denote this with a subscript, as in (1.7). As mentioned above, the energy functional will be important for understanding the invariant measure of the stochastic equation. In particular, the probability of finding transition layers will depend on the energetic "cost" of a transition layer on R, defined in (1.8). It is well known [MM77] that this cost can be computed explicitly as c_0 = ∫_{−1}^{1} √(2V(s)) ds; (1.9) see the beginning of Section 2 for an explanation. We will often refer to scaling regimes in our results. To this end, we define the following notation.
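In the notation just introduced, the representation (1.5) should read as follows, up to normalization conventions that we infer from the surrounding description rather than quote:

```latex
% Hedged reconstruction of (1.5): density of mu with respect to the Brownian bridge W
\[
  \mathbb{E}^{\mu^{-1,1}_{\varepsilon,(-L_\varepsilon,L_\varepsilon)}}[\Phi]
  \;=\;
  \frac{\mathbb{E}^{W^{-1,1}_{\varepsilon,(-L_\varepsilon,L_\varepsilon)}}
        \Bigl[\Phi(u)\,
          \exp\bigl(-\tfrac{1}{\varepsilon}
            \int_{-L_\varepsilon}^{L_\varepsilon} V(u(x))\,\mathrm{d}x\bigr)\Bigr]}
       {\mathbb{E}^{W^{-1,1}_{\varepsilon,(-L_\varepsilon,L_\varepsilon)}}
        \Bigl[\exp\bigl(-\tfrac{1}{\varepsilon}
            \int_{-L_\varepsilon}^{L_\varepsilon} V(u(x))\,\mathrm{d}x\bigr)\Bigr]} .
\]
```

The Gaussian (gradient) part of the formal Gibbs density is absorbed into the bridge measure W, and only the potential term survives as an explicit density.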
Notation 1.4. The well-established theory of large deviations applies on intervals whose length is order-one with respect to ε. A main point of this paper, however, is to obtain estimates on intervals that are exponentially large with respect to ε^{−1} and for which, consequently, the established theory does not apply. We therefore use a subscript ε in order to distinguish interval lengths that are large with respect to ε from quantities that are order-one with respect to ε.
To specify bounds with respect to ε, we sometimes make use of the shorthand notation ≪, ≲, and ⪅. To explain: We write A_ε ≪ B_ε if for every C < ∞, we have A_ε/B_ε ≤ 1/C for ε sufficiently small.
We write A_ε ≲ B_ε if there exists a universal constant C < ∞ such that A_ε ≤ C B_ε, and similarly for A_ε ≳ B_ε. If both inequalities hold, then we write A_ε ∼ B_ε. We write A_ε ⪅ B_ε if for every α > 0 we have A_ε ≤ B_ε + α for ε sufficiently small, and similarly for A_ε ⪆ B_ε. If both inequalities hold, then we write A_ε ≈ B_ε. We use numbered constants C_1, C_2, et cetera, to denote specific constants that we refer to later in the paper. On the other hand, we use C to denote a generic order-one constant whose value may change from place to place. Throughout the article, C or a numbered constant C_i is a constant that is universal except for a possible dependence on the potential V.
We are now ready to state our results.

1.2. Main results.
Recall that the boundary conditions imply that there must be at least one up layer and that any additional layers come in pairs. We will always consider the regime where the system size L_ε satisfies (1.10). (Recall that c_0 is the energy cost defined in (1.8).) This is the regime in which one expects the probability of extra transitions to go to zero and in particular to obey the energetic and entropic scaling expressed in (1.2). Our first result captures this behavior on the exponential level.
Remark 1.7. Throughout the paper, when we say "u has 2n + 1 layers," we mean that u has at least 2n + 1 layers.
Remark 1.8. As mentioned above, our techniques can also handle different boundary conditions, e.g., periodic boundary conditions or Dirichlet boundary conditions that do not enforce a transition layer. For instance, for periodic boundary conditions or Dirichlet conditions u(±L_ε) = 1, the probability of 2n transition layers is bounded above and below by expressions analogous to those in Theorem 1.5, while for homogeneous Dirichlet boundary conditions, the same holds for the probability of n transition layers.
Our second main result states that, on scales larger than logarithmic in 1/ε, the layer location is uniformly distributed in the following sense. Theorem 1.9. Consider µ^{−1,1}_{ε,(−L_ε,L_ε)} in the regime (1.11). Then (1.12) holds, uniformly over the admissible subintervals. The theorem says that the probability of finding an up transition layer in a subinterval of length 2d_ε given a system size 2L_ε is approximately d_ε/L_ε in the sense expressed in (1.12), independent of the location of the subinterval. (The existence of an up transition layer somewhere in the system is forced by the boundary conditions.) In this sense, the layer locations are approximately uniformly distributed. The theorem is strongest when considering d_ε at the lower range of validity: It shows that the uniform distribution holds not only on macroscopic intervals but also down to the logarithmic scale.
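Based on the discussion above, the regime (1.11) and the estimate (1.12) should take roughly the following form (a hedged reconstruction; the precise error terms belong to the theorem itself):

```latex
% Hedged reconstruction of (1.11)-(1.12): uniform distribution of the layer location
\[
  |\log\varepsilon| \;\ll\; d_\varepsilon \;\le\; L_\varepsilon,
  \qquad\text{and, uniformly in the location } y,
\]
\[
  \mu^{-1,1}_{\varepsilon,(-L_\varepsilon,L_\varepsilon)}
  \bigl(u \text{ has an up transition layer in } (y - d_\varepsilon,\, y + d_\varepsilon)\bigr)
  \;\approx\; \frac{d_\varepsilon}{L_\varepsilon}.
\]
```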
We remark that the uniform distribution of the layer location in our regime is very different from the characterization of the layer distribution in the case L_ε = |log ε|/4 studied in [BBB08b]; see Subsection 1.4 below for more discussion.
1.3. Methods: Markovianity, compact sets, and reflections. Our approach for Theorem 1.5 relies on a simple idea. Namely, while we cannot use large deviation theory directly on (−L_ε, L_ε), we can use the Markovianity of the underlying reference measure to reduce to order-one subintervals on which we can. In particular, by taking large (but order-one) subintervals and conditioning on the boundary values of a larger, surrounding subinterval, we can take advantage of large deviation bounds with a cost that is to leading order independent of the subinterval size. This method is similar in spirit to Freidlin and Wentzell's approach of calculating the expected exit time from a metastable domain for a diffusion process with small noise ([FW98]; see Subsection 1.4 for a more detailed account of the related literature).
To illustrate the idea, suppose that we want to estimate the probability that there is a transition layer contained within [−ℓ, ℓ] for some ℓ large. (Transition layers are introduced in Definition 1.3 above; roughly, they are layers connecting ±1.) The Markov property (Lemma 3.2) implies that this probability can be written as in (1.13). Here ν denotes the marginal distribution of the pair (u(−2ℓ), u(2ℓ)), and µ^{u_−,u_+}_{ε,(−2ℓ,2ℓ)} denotes the distribution of paths on (−2ℓ, 2ℓ) with boundary conditions u_± (see Section 3 for a precise definition of this measure). In Subsection 3.2 we establish large deviation estimates for the measures µ^{u_−,u_+}_{ε,(−2ℓ,2ℓ)} that hold locally uniformly in the boundary values u_±. Hence for u_± in some large compact set, we can integrate over these bounds in (1.13). On the other hand, the probability that the boundary values u_± fall outside of the compact set [−M, M] for M ≫ 1 decays exponentially with M (see Lemma 4.1 below). Here ∆E(transition) denotes the difference between the minimal energy of paths that perform a transition in (−ℓ, ℓ) and the minimal energy of any profile u that satisfies the boundary conditions u(±2ℓ) = u_±. (See Subsection 3.2 for a more complete discussion.) Now we arrive at the second problem, which is more subtle. The issue is that the energy difference ∆E(transition) depends strongly on the boundary conditions. The cost that we are expecting to recover is c_0, defined in (1.8). However, if u_− ≈ −1 and u_+ ≈ 1, for instance, then the energy difference is approximately zero! In this case, the information about the probability of a transition is encoded in the distribution ν.
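The decomposition (1.13) and the shape of the local large deviation input can be sketched as follows (a hedged reconstruction; the ≈ is on the exponential scale, and the exact error terms are developed in Subsection 3.2):

```latex
% Hedged reconstruction of (1.13): conditioning on the boundary values at +/- 2l
\[
  \mu^{-1,1}_{\varepsilon,(-L_\varepsilon,L_\varepsilon)}
  \bigl(\text{transition layer in } [-\ell,\ell]\bigr)
  \;=\; \int \mu^{u_-,u_+}_{\varepsilon,(-2\ell,2\ell)}
        \bigl(\text{transition layer in } [-\ell,\ell]\bigr)\,
        \nu(\mathrm{d}u_-, \mathrm{d}u_+),
\]
% with the local large deviation input, for u_- and u_+ in a compact set,
\[
  \mu^{u_-,u_+}_{\varepsilon,(-2\ell,2\ell)}
  \bigl(\text{transition layer in } [-\ell,\ell]\bigr)
  \;\approx\; \exp\Bigl(-\frac{\Delta E(\text{transition})}{\varepsilon}\Bigr).
\]
```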
Our idea for handling the dependence on the boundary conditions relies on Markovianity and the global symmetries of µ^{−1,1}_{ε,(−L_ε,L_ε)}. What we want to do is to transform a transition event into an event that does not feel the influence of the boundary conditions. Roughly, the new event will be that the path makes an excursion from one well toward the other and back without completing a transition. (See Figure 1.1 for an illustration and Definitions 2.4 and 2.7 for formal definitions of these "wasted excursions.") The expected cost for such an event is also c_0, and a little thought reveals that this should be the energy difference regardless of the boundary conditions at ±2ℓ. (For a result in this direction, see Lemma 2.5.) In order to transform transitions into wasted excursions, we use the strong Markov property (see Lemma 3.3) and the symmetry of V. Specifically, we reflect paths vertically between certain hitting points of zero in a way that leaves µ^{−1,1}_{ε,(−L_ε,L_ε)} invariant. For details, see for instance (4.22) and the subsequent calculations in the proof of Theorem 1.5.
A different reflection operator turns out to be useful when we come to the proof of the uniform distribution of the layer location in Theorem 1.9. Again the Markovianity and the symmetry of µ^{−1,1}_{ε,(−L_ε,L_ε)} are crucial. Here the rough idea is to show that the probability of finding the transition layer in any interval [y − d_ε, y + d_ε] is approximately the same as that of finding the layer in any other interval [z − d_ε, z + d_ε]. In Section 5, we construct a measure-preserving reflection operator that transforms paths with a transition in [y − d_ε, y + d_ε] into paths with a transition in (or near) [z − d_ε, z + d_ε]. We build this reflection operator using certain hitting points of −1 and +1 to the left and right of the transition layer. (This is illustrated in Figure 1.2.) Hence a key point is to prove that, on the set of paths with a transition in [y − d_ε, y + d_ε], such hitting points exist with high probability. This fact is developed in Lemmas 5.1 and 5.2 using an iterated rescaling argument and large deviation bounds.
1.4. Background literature and related results. The study of the effect of a small noise on a physical system has a rich history in the chemistry, physics, and mathematics literature. With roots in the fluctuation theory of Einstein [Ei10] (1910), the path integral formulations of Wiener [W30] (1930) and Feynman [Fe48] (1948) lie at the heart of the large deviation theory for diffusion processes and the characterization of the corresponding invariant measure. One of the aspects to receive the most applied interest and significant mathematical attention is the question of the first exit time from a metastable basin. The exponential dependence of the mean exit time on the energy barrier goes back to Van't Hoff and Arrhenius [VH84, A89] (1884, 1889). Refining this picture, the so-called Kramers formula determines the prefactor in terms of the curvature of the potential at the critical points and was made famous in the 1940 paper by Kramers [Kr40], although the result (for the overdamped dynamics) had been derived as early as 1927 by Farkas [Fa27]. See the review paper by Hänggi, Talkner, and Borkovec [HTB90] for a thorough historical survey. The higher dimensional case was analyzed by Landauer and Swanson [LS61] in 1961 and further pursued by Langer (see for instance [L69], 1969).
In the mathematics literature, metastability for diffusion processes that depend only on time (i.e., constant in space) was explored early on in the paper by Pontryagin, Andronov, and Vitt [PAV33] (1933). The mathematical theory of large deviations was subsequently developed in the 1970s in papers by Wentzell and Freidlin (see for instance [WF70]) and Kifer [Ki74], and a landmark text is the book of Freidlin-Wentzell [FW98] (published in Russian in 1979 and first published in English in 1984). On the level of the mean exit time, the Freidlin-Wentzell theory confirmed the exponential factor in the Kramers formula. The prefactor in Kramers' law for d > 1 was established via formal asymptotic expansions in the famous paper by Matkowsky and Schuss [MS77] in 1977. A rigorous derivation was given by Sugiura in [S95] and independently and with a different method by Bovier, Eckhoff, Gayrard, and Klein [BEGK04,BGK05].
The small noise problem for stochastic partial differential equations appeared more recently in the mathematics community. A seminal paper in extending the Freidlin-Wentzell theory to spatially varying diffusions is that of Faris and Jona-Lasinio [FJ82] from 1982, which specifically established and studied the action functional of the stochastic Allen-Cahn equation on a bounded system [0, L]. The invariant measure of stochastically perturbed reaction-diffusion systems (including the Allen-Cahn equation) on a bounded domain was studied by Freidlin in [Fr88] in 1988. Recently, Barret, Bovier, and Méléard [BBM10, B12] and Berglund and Gentz [BG12] have established the mean exit time estimate including the prefactor for a class of equations including the Allen-Cahn equation.
As we have emphasized in the beginning of the introduction, in this paper we are concerned with the interplay between small noise and large domain size. Specifically, we are interested in system sizes that are exponential with respect to the inverse noise strength. Before turning to the invariant measure for unbounded systems, we remark that there is already an entropic, system-size dependent component of the mean switching time when there is a "flat" or "degenerate" saddle point, e.g., for the Allen-Cahn equation in the periodic case. Specifically, the prefactor picks up a factor that is proportional to the volume of the degenerate set. This fact was observed already by Glasstone, Laidler, and Eyring [GLE41] (1941) in the context of transition state theory, and the estimates in the setting of overdamped diffusions were developed by Langer [L69] (1969) and Matkowsky & Schuss [MS77] (1977). See also [VW08] for an independent, also formal, derivation.
The dynamics of the stochastic Allen-Cahn equation (1.1) have been considered by several authors. In particular, in the groundbreaking works of Funaki [Fu95] and Brassesco, De Masi, and Presutti [BDMP95], the dynamics of very similar equations were studied. In [Fu95], the equation (1.1) is considered on the whole line with boundary conditions that enforce one transition. The noise term √(2ε) η is multiplied by a function with compact support. In terms of our notation, the noise acts on an interval of length L_ε, polynomial in ε^{−1}. In [BDMP95], the equation (1.1) is considered for L_ε = ε^{−1} with Neumann boundary conditions. In both articles, the initial condition is chosen close to the optimal profile of a single transition, and it is shown that the solution stays close to an optimal profile on timescales that are polynomial in ε^{−1}. The evolution of the midpoint of the transition layer is also characterized: In [Fu95], the interface dynamics are given by a stochastic differential equation that reflects the spatially dependent noise strength. In [BDMP95], it is shown that the midpoint performs a Brownian motion. The dynamic behavior observed in both of these articles is consistent with our results on the invariant measure. In particular, the Brownian motion of interfaces is consistent with the uniform distribution of the layer location that we observe in Theorem 1.9. Now let us consider the interplay between small noise and large domain size. The idea of understanding large deviation events on large spatial systems via a decomposition into subintervals (intermediate in size between the logarithmic and exponential scales) is used in the paper [VW08] to heuristically derive the nucleation and propagation dynamics in the setting of an unequal-well potential. 
In rigorous work on the invariant measure for the equal-well case, the second author derived a concentration result for the measures µ^{−1,1}_{ε,(−L_ε,L_ε)} in [Web10] for system sizes that are large but algebraically bounded: specifically, L_ε ≤ ε^{−α} for α < 2/3. The technique used there is completely different from the one employed in the present article, however. In [Web10], the measure is discretized to make rigorous the heuristic intuition that µ^{−1,1}_{ε,(−L_ε,L_ε)} is a Gibbs measure. Explicit bounds on the energy landscape and Gaussian concentration inequalities are then used to derive bounds on this discretized measure. This technique does not appear to be applicable for longer intervals because the discretization errors become too large.
In the articles [BBB08a] and [BBB08b], the special case of intervals growing like L_ε = (1/4)|log ε| is studied. (The prefactor 1/4 depends on a specific choice of double-well potential.) The articles use the fact pointed out in [RVE05] that the measure µ^{−1,1}_{ε,(−L_ε,L_ε)} can be realized as the distribution of a diffusion process (1.14) conditioned on the event u(L_ε) = 1. The drift term a_ε is the logarithmic derivative of the ground state of the Schrödinger operator −ε²∆ + V. (In most cases, the drift a_ε cannot be given explicitly.) This is the extension to bounded intervals of the well-known equivalence for the measure on the real line, cf. [S79]. Building on the connection between the invariant measure of the PDE and the process in (1.14), [BBB08b] derives a concentration result around the one-parameter family of energy minimizers. Furthermore, the authors characterize the asymptotic distribution of the position of the interfacial layer. It is nonuniform due to the energetic repulsion from the boundary of the interval. To see this nonuniformity, the moderate scaling L_ε ≈ |log ε| is necessary. Incidentally, this shows that our lower bound d_ε ≫ |log ε| in Theorem 1.9 is optimal: Below the scale of |log ε|, nonuniformity occurs. Loosely speaking, the results in [BBB08b] and ours are complementary: They obtain finer results on logarithmically large system sizes, while we obtain coarser results on exponentially large system sizes.
Results similar to (but different from) ours were obtained in [COP93] for a one-dimensional Ising model with ferromagnetic Kac potential. This is a spin model whose spins interact not only with their nearest neighbors, but with all spins in a given range. The authors study the limit in which this range diverges. This corresponds to the limit ε ↓ 0 that we investigate. Their main argument relies on a large deviation statement for the whole system in a local topology. This large deviation result implies, for example, that the local spin averages concentrate around ±1 and that the probability of seeing a transition from −1 to +1 in any given compact interval is exponentially small. The exponential rate is given by the energetic cost of a transition (similar to the constant c_0 in this work). The significant difference between their large deviation bounds and ours is the dependence on the boundary condition. Their bounds state that the exponential decay of the probability of observing a certain behavior on an order-one interval is governed by the energy. We only get bounds for the measures conditioned on the boundary values on that interval. The difference is easy to appreciate on the level of the results. As mentioned, the probability of seeing a transition on a given order-one subinterval in their setting is exponentially small, while, because of our boundary conditions, a similar statement cannot possibly hold in our case: Indeed, if it were to hold, we could sum over order-one subintervals and deduce that the probability of seeing a transition in the full system goes to zero with the noise, while in fact it is identically equal to one.
Finally, let us touch on the appearance of measures similar to µ^{−1,1}_{ε,(−L_ε,L_ε)} in the study of Schrödinger operators. The Feynman-Kac formula gives a way to solve the imaginary-time Schrödinger equation (i.e., the heat equation with a potential) in terms of measures that are absolutely continuous with respect to Wiener measure. In this context, our model is often referred to as the φ^4_1 model, and the limit ε ↓ 0 corresponds to the semiclassical limit in which the Planck constant is sent to zero. Lemma 4.1, for instance, is closely related to (but not equivalent to) a statement about the decay of the ground state of the Schrödinger operator −ε²∆ + V as ε ↓ 0.
1.5. Organization. We begin with preliminaries: In Section 2 we collect some properties of the energy functional, and in Section 3 we collect some probabilistic properties of µ^{−1,1}_{ε,(−L_ε,L_ε)} and of the underlying Gaussian measures. With these preliminaries in hand, we turn in Section 4 to the proof of our first result, Theorem 1.5. In Section 5 we prove Theorem 1.9, the uniform distribution of the layer location. Finally, in Section 6 we prove the various technical lemmas that have been used in support of the main theorems.

DETERMINISTIC PRELIMINARIES
In this section we discuss some more details about the energy functional E (cf. (1.6)). Our goal is to familiarize the reader with the common intuition about this energy, as well as to present some facts that will guide our method and appear later in proofs.
As described above, the potential term in the energy favors the states ±1, and the gradient term leads to an energetic cost for transitions between these states. Given our large system and the boundary conditions (1.4), it is natural to consider the minimization problem (1.8). As we mentioned, the minimum cost c_0 can be calculated explicitly (cf. (1.9)). The calculations underlying this fact appear repeatedly in the proofs of our energy lemmas, so we begin by recalling them. The so-called Modica-Mortola trick (cf. [MM77]) uses the elementary inequality a² + b² ≥ 2ab to observe that E(u) ≥ ∫ |∂_x u| √(2V(u)) dx ≥ ∫_{−1}^{1} √(2V(s)) ds for any u connecting −1 to +1, which gives a lower bound on the energetic cost. For the matching upper bound, one observes that the equality a² + b² = 2ab holds if and only if a = b, so that the minimum energetic cost is achieved precisely when |∂_x u| = √(2V(u)). (2.1) For our boundary conditions, it is easy to see that the minimum is achieved for the strictly increasing function that satisfies ∂_x u = √(2V(u)). (2.2) We denote by m the minimizer that is normalized so that m(0) = 0. This function m is then the unique, centered, stationary solution of the Allen-Cahn equation on R subject to the given boundary conditions, i.e., the solution of m'' = V'(m) with m(±∞) = ±1 and m(0) = 0. In the case of the standard double-well potential V(u) = (1 − u²)²/4, one has m(x) = tanh(x/√2). For general potentials satisfying Assumption 1.1, the energy minimizer has qualitative properties similar to those of the hyperbolic tangent. In particular, what will be important for us is that the minimizer converges exponentially to ±1 as x → ±∞.
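For the standard double-well potential, these explicit formulas can be checked numerically; for this V one has c_0 = 2√2/3. A minimal sketch (grid size and tolerances are our choices):

```python
import math

def V(u):                      # standard double-well potential, minima at +/-1, min value 0
    return (1 - u**2)**2 / 4

def sqrt2V(u):
    return math.sqrt(2 * V(u))

# c_0 = integral_{-1}^{1} sqrt(2 V(s)) ds via the trapezoid rule;
# for this V, sqrt(2 V(s)) = (1 - s^2)/sqrt(2), so c_0 = 2*sqrt(2)/3 exactly.
N = 100_000
h = 2 / N
c0 = sum(sqrt2V(-1 + k * h) for k in range(1, N)) * h   # endpoint terms vanish
assert abs(c0 - 2 * math.sqrt(2) / 3) < 1e-8

# The optimal profile m(x) = tanh(x/sqrt(2)) satisfies the equipartition ODE (2.1):
# |m'(x)| = sqrt(2 V(m(x))).
def m(x):
    return math.tanh(x / math.sqrt(2))

def m_prime(x):
    return (1 - m(x)**2) / math.sqrt(2)   # derivative of tanh(x/sqrt(2))

for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(m_prime(x) - sqrt2V(m(x))) < 1e-12
```
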
Lemma 2.1 (Exponential decay of minimizer). Under Assumption 1.1 on the potential V, there exists C < ∞ such that the global energy minimizer m satisfies 1 − |m(x)| ≤ C exp(−|x|/C) for all x ∈ R. The exponential convergence to ±1 follows directly from (2.2) and the quadratic behavior of V near the minima (cf. Assumption 1.1).
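For the standard quartic potential, the explicit profile gives the concrete bound 1 − m(x) ≤ 2 e^{−√2 x} for x ≥ 0, consistent with the claimed exponential convergence; a quick check (our illustration only, with arbitrary sample points):

```python
import math

def m(x):                      # explicit minimizer for V(u) = (1 - u^2)^2 / 4
    return math.tanh(x / math.sqrt(2))

# 1 - tanh(y) = 2 e^{-2y} / (1 + e^{-2y}) <= 2 e^{-2y}; with y = x/sqrt(2) this gives
# 1 - m(x) <= 2 exp(-sqrt(2) x) for x >= 0, and symmetrically 1 + m(-x) for the left tail.
for x in (0.0, 0.5, 1.0, 3.0, 8.0):
    assert 0.0 <= 1 - m(x) <= 2 * math.exp(-math.sqrt(2) * x)
    assert 0.0 <= 1 + m(-x) <= 2 * math.exp(-math.sqrt(2) * x)
```
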
In addition to the exponential convergence to ±1, we see from (2.2) and Assumption 1.1 that outside of a neighborhood of ±1, the slope of m is bounded away from zero. Consequently, there is a characteristic length-scale associated to a transition layer. We will use this length-scale in an essential way. That is, since we cannot apply large deviation theory on the full system scale L_ε, we will decompose into subsystems of bounded size, typically denoted 2ℓ or 4ℓ. We will choose the subsystem size so that (with very large probability) a typical transition layer fits inside, which requires ℓ to be large. In order to make these ideas precise, we begin by introducing the idea of a δ^- transition layer. Simply put, instead of connecting ±1, it connects −1 + δ with 1 − δ.
Definition 2.2 (δ^- transition layer). Fix δ ∈ (0, 1/2) and suppose x_− < x_+. We say that u has a δ^- up transition layer between x_− and x_+ if u(x_−) = −1 + δ, u(x_+) = 1 − δ, and |u(x)| < 1 − δ for all x ∈ (x_−, x_+). We say that u has a δ^- down transition layer on (x_−, x_+) if the same condition holds true with signs reversed, and that u has a δ^- transition layer if it has a δ^- up or a δ^- down transition layer.
Since it is of course true that µ^{−1,1}_{ε,(−L_ε,L_ε)}(u has (2n + 1) transition layers) ≤ µ^{−1,1}_{ε,(−L_ε,L_ε)}(u has (2n + 1) δ^- transition layers), the proof of the upper bound in Theorem 1.5 will be established if we can show that for any γ > 0 and for sufficiently small δ > 0, there is an ε_0 > 0 such that, for all ε ≤ ε_0, the bound (2.3) holds. The main ingredient for establishing (2.3) is the uniform large deviation estimate from Proposition 3.4, below, which essentially reduces the problem to one of energy estimates. We will control the energy of suitable classes of functions up to a small δ-dependence and ultimately absorb this error term into the large deviation error γ from the proposition. One of the first steps will be to understand the length-scale associated to δ^- transition layers. For any δ ∈ (0, 1/2), the optimal transition layer captured by the energy minimizer m goes from −1 + δ to 1 − δ over a finite length-scale, and "typical layers" perform the transition on a similar length-scale. A question that we will have to address is how likely it is for a path to take unusually long to complete a δ^- transition. In the following lemma, we show that the difference of energies expressed in Proposition 3.4 is large for functions that perform unusually long transitions (uniformly with respect to the boundary values).
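Schematically, combining a lower bound of the form ∆E ≥ c_0 + δ²ℓ/C_1 (the flavor of Lemma 2.3; the precise constants belong to that statement) with the large deviation estimate yields a bound of the shape:

```latex
% Schematic only: probability of a delta^- layer stretched over length at least l
\[
  \mu^{u_-,u_+}_{\varepsilon,(-2\ell,2\ell)}
  \bigl(\text{$\delta^-$ layer of length} \ge \ell\bigr)
  \;\lesssim\;
  \exp\Bigl(-\frac{c_0 + \delta^2 \ell / C_1 - \gamma}{\varepsilon}\Bigr),
\]
```

which is negligible relative to the expected cost e^{−c_0/ε} once ℓ is so large that δ²ℓ/C_1 dominates γ.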

Lemma 2.3 (Long transitions).
There exists a C_1 < ∞ (depending only on V) such that, for any M < ∞ and any δ ∈ (0, 1/2), there exists an ℓ_* < ∞ with the following property. For any ℓ ≥ ℓ_* and u_± ∈ [−M, M], the energy difference associated to a δ^- transition layer that takes longer than ℓ to complete is bounded below by c_0 plus a term of order δ²ℓ/C_1. The proof of Lemma 2.3 is given in Subsection 6.1. This lemma together with the large deviation bound from Proposition 3.4 will imply that for γ small with respect to δ²ℓ, the probability of finding such a layer is bounded above by a quantity which we can make negligible by choosing ℓ sufficiently large. Now we would like to show that the exponential factor in the probability of finding a δ^- layer is close to c_0, defined in (1.8); specifically, we expect it to be approximately exp(−c_0/ε). The problem, which we already alluded to at the end of Subsection 1.3, is that the boundary values (for instance u(−2ℓ) ≈ −1, u(2ℓ) ≈ 1) may make it likely to find a layer. Hence, we will employ reflection operators to transform δ^- transition layers into events that are unlikely regardless of the boundary conditions. We will call such events wasted δ^- excursions: Definition 2.4 (Wasted δ^- excursion). For any δ ∈ (0, 1/2), we will say that u has a wasted δ^- excursion on (−ℓ, ℓ) if there exist points in (−ℓ, ℓ) between which u makes a δ^- excursion away from one well and back. As described above for long transitions, we will estimate the probability of such events using the large deviation estimate from Proposition 3.4. We note that the proposition requires minimizing energy over a ball (in the space of continuous functions) around the set of interest. Because of the way we have defined wasted excursions, a ball of radius δ around the set of functions with a δ^- excursion in a given interval is equal to the set of functions with a (2δ)^- excursion in that interval. Hence, our large deviation estimate together with an energetic estimate will bound the probability that we are after. The following lemma contains the necessary energetic estimate: namely, that the difference of energies described in our large deviation estimate is bounded below by c_0 plus a small term. 
(2.5) Then we have

The proof of Lemma 2.5 is given in Subsection 6.1. It gives us the exponential factor in the desired estimate (2.3) above.
For the lower bound in Theorem 1.5, we will work with so-called δ^+ transition layers between −1 − δ and 1 + δ.
Definition 2.6 (δ^+ transition layer). Fix δ ∈ (0, 1/2). We say that u has a δ^+ up transition layer within the interval (−ℓ, ℓ) if there exist points

We say that u has a δ^+ down transition layer on (−ℓ, ℓ) if the same condition holds with the signs reversed, and that u has a δ^+ transition layer if it has a δ^+ up or a δ^+ down transition layer.
In analogy with the δ^- transition layers that we use for the upper bound, δ^+ transition layers will be convenient for the lower bound. Since the probability of having (2n + 1) transition layers is greater than the probability of having (2n + 1) δ^+ transition layers, it will suffice to bound from below

µ^{−1,1}_{ε,(−L_ε,L_ε)}(u has (2n + 1) δ^+ transition layers).

We will establish this bound by reflecting in order to transform the δ^+ transition layers into some kind of "wasted excursions" whose probability we can bound independently of the boundary conditions.

Definition 2.7 (Wasted δ^+ excursion). For any δ ∈ (0, 1/2), we will say that u has a wasted δ^+ excursion on (−ℓ, ℓ) if there exist points

(We will use only the wasted δ^+ excursions that come from below, but of course it would be straightforward to define the analogue with u(x_±) ≥ 1 + δ, and it would obey the same energetic and probabilistic bounds.) As in the case of the upper bound, we need an energetic lemma that will control the contribution to the large deviation estimate for wasted δ^+ excursions. Because of the form of the large deviation estimate that we will develop in Section 3 (see Proposition 3.5 below), it will be convenient for us to introduce the energy bound on the following set of functions:

It is easy to see that a δ ball (with respect to the sup norm) around A^{bc}_{δ,pre} is equal to the set of functions with wasted δ^+ excursions on (−ℓ, ℓ). This fact is what will later be useful for the lower bound. For now, we record the following energetic fact, which plays the role for the lower bound that Lemma 2.5 played for the upper bound.

Then we have
We will need some additional properties of the energy as we prove the main theorems, but we defer their discussion until later, when their motivation and hypotheses will be clearer. With the central facts about the energy in hand, we now turn to the probabilistic background for our paper.

PROBABILISTIC PRELIMINARIES
In this section, we collect some probabilistic facts about the Gaussian measures W^{u_-,u_+}_{ε,(x_-,x_+)} and the measures µ^{u_-,u_+}_{ε,(x_-,x_+)}. After stating a precise definition and some elementary symmetry properties, we will discuss Markov properties satisfied by these measures in Subsection 3.1 and large deviation bounds in Subsection 3.2.
For every x_- < x_+, we denote by W^{0,0}_{ε,(x_-,x_+)} the distribution of a Brownian bridge with homogeneous boundary conditions on [x_-, x_+] whose variance is proportional to ε. To be more precise, W^{0,0}_{ε,(x_-,x_+)} is the unique centered Gaussian measure on the space of continuous functions C([x_-, x_+]) such that, for all

Equivalently, one can say that W^{0,0}_{ε,(x_-,x_+)} is the centered Gaussian measure whose Cameron-Martin space is the Sobolev space H^1_0([x_-, x_+]) with vanishing boundary conditions, equipped with the homogeneous scalar product

Indeed, the right-hand side of (3.1) is the Green's function for (1/ε)∂_x² with Dirichlet boundary conditions.
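The display (3.1) did not survive extraction; for a Brownian bridge with variance proportional to ε, it should be the standard bridge kernel. The following is a reconstruction consistent with the Green's function remark above, not the paper's verbatim formula:

```latex
\operatorname{Cov}\bigl(u(x),u(y)\bigr)
  \;=\; \varepsilon\,\frac{(x\wedge y - x_-)\,(x_+ - x\vee y)}{x_+ - x_-},
  \qquad x,y\in[x_-,x_+].
```

One checks that this kernel vanishes when either argument hits the boundary and that, as a function of x, it satisfies −(1/ε)∂_x² G(·, y) = δ_y with Dirichlet conditions, matching (up to sign convention) the statement that the right-hand side of (3.1) is the Green's function of (1/ε)∂_x².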
In the sequel, we often use the notation

to denote the Gaussian part of the energy of a function u on the interval (x_-, x_+).
It is common to think of W^{0,0}_{ε,(x_-,x_+)} as a Gibbs measure with energy I_{x_-,x_+} and noise strength ∝ ε. Of course, (3.3) does not make rigorous sense because there is no "flat measure" du on path space, and I_{x_-,x_+}(u) is almost surely infinite under W^{0,0}_{ε,(x_-,x_+)}. The heuristic formula (3.3) is motivated by finite-dimensional approximations, and it gives the right intuition for the large deviation bounds.
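The Gaussian energy and the heuristic (3.3) are elided above; given the Cameron-Martin description, they presumably take the following form (a reconstruction, with the quotation marks signalling that the second identity is purely formal):

```latex
I_{x_-,x_+}(u) \;=\; \frac12\int_{x_-}^{x_+} (\partial_x u)^2\,\mathrm{d}x,
\qquad
W^{0,0}_{\varepsilon,(x_-,x_+)}(\mathrm{d}u)
  \;\text{``}=\text{''}\;
  \frac{1}{Z}\,\exp\Bigl(-\frac{1}{\varepsilon}\,I_{x_-,x_+}(u)\Bigr)\,\mathrm{d}u.
```

Here Z is a (formal) normalization constant; the factor 1/ε matches the statement that the variance of the bridge is proportional to ε.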
For more general boundary conditions u_-, u_+ ∈ R, we can define W^{u_-,u_+}_{ε,(x_-,x_+)} as the image measure of W^{0,0}_{ε,(x_-,x_+)} under the shift map

where h is the affine function interpolating the boundary conditions:

Similarly to (1.5), for any choice of boundary conditions u_± and on any interval (x_-, x_+), we denote by µ^{u_-,u_+}_{ε,(x_-,x_+)} the probability measure whose density with respect to W^{u_-,u_+}_{ε,(x_-,x_+)} can be expressed as

Here we have introduced the notation

for the normalization constant that ensures that µ^{u_-,u_+}_{ε,(x_-,x_+)} is indeed a probability measure.
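The density formula itself is elided; by analogy with (1.5), it should be the Gibbs tilting of the Gaussian measure by the potential term. The following is a hedged reconstruction, in which the symbol Z^{u_-,u_+}_{ε,(x_-,x_+)} for the normalization constant is my notation for the constant the text refers to:

```latex
\frac{\mathrm{d}\mu^{u_-,u_+}_{\varepsilon,(x_-,x_+)}}
     {\mathrm{d}W^{u_-,u_+}_{\varepsilon,(x_-,x_+)}}(u)
  \;=\; \frac{1}{Z^{u_-,u_+}_{\varepsilon,(x_-,x_+)}}
        \exp\Bigl(-\frac{1}{\varepsilon}\int_{x_-}^{x_+} V\bigl(u(x)\bigr)\,\mathrm{d}x\Bigr).
```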
As we have indicated in the introduction, there are symmetry properties of the measures W^{u_-,u_+}_{ε,(x_-,x_+)} and µ^{u_-,u_+}_{ε,(x_-,x_+)} that will play an important role in our argument. Observe for example that both W^{0,0}_{ε,(x_-,x_+)} and µ^{0,0}_{ε,(x_-,x_+)} are invariant under the vertical reflection u → Ru and the horizontal reflection u → Su, where

3.1. Markov properties. We first present a two-sided version of the Markov property for the measures W^{u_-,u_+}_{ε,(x_-,x_+)} and µ^{u_-,u_+}_{ε,(x_-,x_+)}, which states that, for any fixed points x_- ≤ x̂_- < x̂_+ ≤ x_+ and for u distributed according to W^{u_-,u_+}_{ε,(x_-,x_+)},

Then in Lemma 3.3, we give the strong Markov property, which states that the same statement holds true when the deterministic points x̂_± are replaced by left and right stopping points χ_±. The proofs of these statements are quite standard; for completeness, we have included them in Subsection 6.2.
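The invariance under the two reflections can be sanity-checked at the level of the covariance. The sketch below is my own illustration (not from the paper): it assumes the standard bridge covariance ε(x∧y − x_-)(x_+ − x∨y)/(x_+ − x_-) and verifies that on a symmetric interval the kernel is unchanged by the spatial flip x ↦ −x; the vertical flip u ↦ −u trivially preserves any centered Gaussian measure.

```python
# The vertical reflection R: u -> -u preserves every centered Gaussian
# measure, since Cov(-u(x), -u(y)) = Cov(u(x), u(y)).  For the
# horizontal reflection S: u(x) -> u(-x) on a symmetric interval
# (-L, L), invariance can be checked on the covariance kernel.

def bridge_cov(x, y, eps=0.1, lo=-3.0, hi=3.0):
    # Assumed covariance of W^{0,0}_{eps,(lo,hi)}: the standard
    # Brownian-bridge kernel scaled by eps (my assumption).
    a, b = min(x, y), max(x, y)
    return eps * (a - lo) * (hi - b) / (hi - lo)

def check_horizontal_symmetry(points):
    # S maps the pair (x, y) to (-x, -y); the kernel must agree.
    for x in points:
        for y in points:
            assert abs(bridge_cov(x, y) - bridge_cov(-x, -y)) < 1e-12
    return True

print(check_horizontal_symmetry([-2.5, -1.0, 0.0, 0.7, 2.9]))  # True
```

The same two lines of reasoning (centeredness for R, symmetry of the interval for S) carry over to µ^{0,0}_{ε,(x_-,x_+)}, since the tilting density is symmetric under both operations when V is even.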
In the case of the measures W^{u_-,u_+}_{ε,(x_-,x_+)}, the Markov property can be stated in the following way. For x̂_- < x̂_+, we define the piecewise linearization u_x̂. Then the following holds.
u − u_x̂ is zero outside of (x̂_-, x̂_+) and is distributed according to W^{0,0}_{ε,(x̂_-,x̂_+)} between the two points.
Due to the lack of spatial homogeneity, the corresponding property for the measures µ^{u_-,u_+}_{ε,(x_-,x_+)}

We also introduce the following notation, which extends the measures to paths on a larger domain by prescribing the values outside of an interval. Suppose that

Then the Markov property takes the following form.

non-random points. Then for any bounded measurable test function
Here

We will typically use (3.7) in the following way: for given points

Here ν_{x_1,…,x_{2n}} denotes the distribution of the random vector (u(x_1), …, u(x_{2n})) under µ^{−1,1}_{ε,(x_-,x_+)}. Formula (3.8) follows directly by applying (3.7) n times.

To state the strong Markov property, we additionally need the notion of left and right stopping points. These are defined analogously to stopping times for Markov processes. A random variable χ_- is called a left stopping point if

In the same way, a random variable χ_+ is called a right stopping point if for all

In all of our applications, the stopping points χ_± will be leftmost or rightmost hitting points of a closed set. It is easy to check that these random points are indeed left and right stopping points as defined above.
For given left and right stopping points χ_±, we define the sigma-algebra F_{[x_-,χ_-]} of events that occur to the left of χ_- and the sigma-algebra F_{[χ_+,x_+]} of events that occur to the right of χ_+ by

The strong Markov property can be stated in a form analogous to (3.7). Then for any u_± ∈ R, we get the following identities:

The strong Markov property is a crucial ingredient in the proofs of both Theorem 1.5 and Theorem 1.9. Let us illustrate how it is used in the proof of Theorem 1.5. Let χ_- be the leftmost hitting point of zero to the right of a given point x_- and χ_+ the rightmost hitting point of zero to the left of a given point x_+. The values u(χ_±) in the formulas (3.9) and (3.10) are then almost surely 0. We can therefore use the invariance of W^{0,0}_{ε,(χ_-,χ_+)} and µ^{0,0}_{ε,(χ_-,χ_+)} under the vertical reflection R to conclude that the whole right-hand side of (3.9) and (3.10) is invariant under vertical reflection on [χ_-, χ_+].
In Section 4, we will use this observation to reduce the problem of calculating the probability of transition layers to computing the probability of wasted excursions (see Definition 2.4).

3.2. Large deviations.
Large deviation estimates for the measures µ^{u_-,u_+}_{ε,(x_-,x_+)} constitute an important ingredient of our argument. Large deviation bounds for Gaussian measures with small variance, e.g., for W^{0,0}_{ε,(x_-,x_+)}, are well known (see [Bog98, Sec. 4.9]); they can be extended to the measures µ^{u_-,u_+}_{ε,(x_-,x_+)}.

The estimates then state that for every closed set A ⊆ A^{bc} and every γ > 0, there exists ε_0 > 0 such that, for ε ≤ ε_0, we have

Similarly, for every open set A ⊆ A^{bc} and every γ > 0, there exists ε_0 > 0 such that, for ε ≤ ε_0, we have

Here the energy difference ∆E(A) is defined as

Here and in the sequel, all topological notions like open and closed refer to the uniform topology, i.e., the topology generated by the norm in (3.14). Although we will not make use of it here, we remark that the bounds (3.11) and (3.12) also hold for different choices of topology: the Gaussian large deviation bounds hold on any separable Banach space that supports the Gaussian measure, and the "exponential tilting" works as soon as the exponential density is continuous.

A priori, the choice of ε_0 depends not only on γ but also on the interval length ℓ := x_+ − x_-, the boundary data u_±, and even the set A itself. As pointed out in Subsection 1.3, however, our argument requires integrating probabilities over different boundary conditions. Therefore, we need to know that we can choose the same ε_0 for these different boundary conditions simultaneously. Moreover, in Lemma 5.1 we will need uniform estimates for measures with different potentials. Hence, we require uniform large deviation estimates, which is the content of the following two propositions. They deliver local uniformity with respect to ℓ, u_±, A, and even with respect to the potential function V. To state the results, it is convenient to introduce the notation

for the minimal Gaussian energy with the given boundary conditions. We will also write

for the δ neighborhood of a set A.
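The displays (3.11)-(3.13) are elided above. Schematically, and as a hedged reconstruction from the surrounding text rather than the paper's exact formulas, they should read:

```latex
\mu^{u_-,u_+}_{\varepsilon,(x_-,x_+)}(A)
  \;\le\; \exp\Bigl(-\frac{\Delta E(A)-\gamma}{\varepsilon}\Bigr)
  \quad (A \text{ closed}),
\qquad
\mu^{u_-,u_+}_{\varepsilon,(x_-,x_+)}(A)
  \;\ge\; \exp\Bigl(-\frac{\Delta E(A)+\gamma}{\varepsilon}\Bigr)
  \quad (A \text{ open}),
```

with the energy difference measuring the excess energy of A over the admissible class, presumably

```latex
\Delta E(A) \;=\; \inf_{u\in A} E(u) \;-\; \inf_{u\in A^{bc}} E(u).
```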

Proposition 3.4 (Large deviation upper bound). Fix constants
Then for any δ, γ > 0, there exists an ε_0 > 0 such that for all ε ≤ ε_0 we have

where ∆E is defined in (3.13). This ε_0 depends on M, R, ℓ_±, δ, and γ but not on the particular choice of x_±, u_±. It depends on the set A only through the choice of R in condition (3.16). Furthermore, ε_0 depends on V only through the local Lipschitz norm sup

In particular, the same bounds hold with the same ε_0 if V varies over a set of potentials with uniformly bounded local C¹-norm. This uniformity of (3.17) with respect to V will be used in Subsection 6.6. There, it will be applied to the family

We also get the corresponding lower bounds, without a condition on the minimal energy E(u) for u ∈ A.

Proposition 3.5 (Large deviation lower bound). Fix constants M and
Assume that there exists an energy minimizer

Then, for any γ > 0 and δ > 0 small enough, there exists ε_0 > 0 such that for all ε ≤ ε_0 we have

where ∆E is defined in (3.13). As above, ε_0 depends on M, ℓ_±, δ, and γ, but not on the particular choice of x_±, u_±, or the set A.
The same remark about the uniform dependence on V holds for the lower bounds.
Remark 3.6. The existence of energy minimizers is not necessary; it can be replaced by an approximation argument. In fact, we will prove the proposition under the slightly weaker assumption that for every

The proofs of these propositions are essentially a careful copy of the classical proofs and can be found in Subsection 6.3. Let us remark here that we do not expect the bounds (3.11) and (3.12) to hold uniformly over all open or closed sets. Indeed, the argument for the classical statements makes use of qualitative properties such as the existence of coverings by finitely many open sets: one sums over this finite number and uses the fact that, for ε small enough, only the largest summand matters. For different open or closed sets, this finite number will in general be different, and the choice of ε_0 would also be different. We resolve this issue by taking the δ neighborhood of A in the bounds (3.17) and (3.18) as a uniform version of the topological assumptions on A.

4. PROOF OF THEOREM 1.5: DOMINATION BY SINGLE TRANSITION LAYER OF MINIMAL ENERGY
In this section we prove Theorem 1.5. This theorem estimates the exponentially small probability of having more than one layer (with the correct entropic effect and exponential factor). Hence, the most likely functions are those with only one transition layer.
As outlined in Subsection 1.3, at the heart of the method is the idea of decomposing the invariant measure into conditional measures and the corresponding marginals, so that we can reduce to estimating the probability of transition layers on order-one subintervals. When the boundary data of the subinterval fall within a compact set [−M, M], large deviation theory will allow us to estimate probabilities in the spirit of

On the other hand, the probability that |u(±2ℓ)| ≥ M is uniformly small. Before turning to the proofs of the main theorems, we record this fact about the decay of the one-point distribution.
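The elided display is presumably an estimate of the following schematic form, with c_0 the transition cost from (1.8) (my reconstruction, intended only to orient the reader):

```latex
\mu^{u_-,u_+}_{\varepsilon,(-2\ell,2\ell)}
  \bigl(u \text{ has a transition layer in } (-\ell,\ell)\bigr)
  \;\approx\; \exp\Bigl(-\frac{c_0}{\varepsilon}\Bigr),
  \qquad u_\pm \in [-M,M].
```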
The proof of the lemma is given in Subsection 6.4. With this preliminary estimate in hand, we turn now to the proof of Theorem 1.5. We consider separately the upper and lower bounds.
Proof of Theorem 1.5. Fix γ > 0 and a correspondingly small δ > 0. Let ℓ and M be large constants to be specified later. To begin with, let ℓ be large enough that (2.4) and (2.6) hold for the given δ. We will divide the system into 2N_ε intervals, labelling the endpoints:

We will work with this grid for the rest of this paper. We then consider the (overlapping) intervals

Notice that x_{±N_ε} is separated from x_{±(N_ε−1)} by a length of up to 2ℓ, while the remaining consecutive points are separated by length ℓ. Since our energetic estimates will all hold uniformly for subsystems that are sufficiently large, and our large deviation bounds will all hold uniformly for subsystems whose lengths vary within a compact set, it will not matter that the boundary points may be up to 2ℓ away from their neighboring points, and we will ignore this issue for the rest of the proof.

Upper bound.
Here we will prove the upper bound, i.e., that

As explained in Section 2, for the upper bound we will work with δ^- transition layers, and it will be sufficient to show that for any sufficiently small γ > 0 and some sufficiently small δ > 0, there is an ε_0 > 0 such that for all ε ≤ ε_0 we have

Since the probability of transition layers is less than the probability of δ^- transition layers, the upper bound then follows immediately. The subtle part of the proof will be estimating the probability of a transition layer on a subsystem. Recall from Subsection 1.3 that we cannot extract the expected cost c_0 by estimating the probability

µ^{u_-,u_+}_{ε,(−2ℓ,2ℓ)}(u has a δ^- transition layer in (−ℓ, ℓ)),

because of the nontrivial dependence of this probability on the boundary conditions u_±. To avoid this problem, we will use reflection operators to transform δ^- transition layers into wasted δ^- excursions (see Definition 2.4 and the accompanying discussion).
With this scheme in mind, let us now begin our estimates.
Step 1. Fix γ > 0. Let δ > 0 be a small constant and M < ∞ a large constant to be chosen below. Our first step is to decompose the set of functions in which we are interested. Namely, we notice that the set of continuous paths u : [−L_ε, L_ε] → R satisfying the boundary conditions u(±L_ε) = ±1 and exhibiting at least (2n + 1) δ^- transition layers is contained in the union of the following three sets:
• The set of paths that exhibit an atypically large value at one of the x_k:
• The complementary set, intersected with the set of paths that are bounded away from ±1 on all of [x_k, x_{k+1}] for some k:
• The complement of A_1, intersected with the set of paths performing (2n + 1) δ^- transitions, each of which is completely contained in (at least) one of the overlapping intervals I_k. We denote this set

Note that there might be more than one layer in a single interval; the 2n-tuple (k_1, …, k_2n) allows for a possible higher multiplicity. There may also be more than 2n layers; the statement is that there are at least 2n layers.
Above, we have made use of the boundary conditions. Indeed, for A_1, we have omitted the points x_{±N_ε}, since u(±L_ε) = ±1. For A_2, we have omitted the boxes at the boundary, since the boundary conditions make it impossible that u(x) ∈ [−1 + δ, 1 − δ] for all x in such a box. For A_3, we have recalled that the boundary conditions force there to be at least one transition: even though u has 2n + 1 layers, we can expect an additional cost only for the 2n "extra" layers, and hence we only keep track of 2n layers.
Because the set of interest is contained in the union of the sets above, it suffices to bound

Step 2. We first bound the probability of A_1. This bound follows directly from Lemma 4.1. In fact, we get (4.9).

In particular, we can choose M large enough that M/C_2 ≥ 2nc_0 and

Hence, the probability of A_1 is of higher order with respect to the right-hand side of (4.5).
We remark that it is here that M (and therefore also ε_0) acquires a dependence on n.
Step 3. To bound the second probability in (4.8), we write (4.10).

Using the Markov property (3.8), we can write, for any k,

where ν_{k−1,k+2} denotes the marginal distribution of the pair (u(x_{k−1}), u(x_{k+2})).
We now want to invoke the large deviation bound (3.17) and the energy bound from Lemma 2.3 for the measures µ^{u_-,u_+}_{ε,(x_{k−1},x_{k+2})}. To this end, we observe that a δ/2 ball around functions contained in [−1 + δ, 1 − δ] consists of functions contained in [−1 + δ/2, 1 − δ/2]. Redefining C_1 by up to a factor of 8 to account for the parameter δ/2 and the interval length (here ℓ rather than 2ℓ), we have that, for any γ > 0 and δ > 0, there exists an ε_0 > 0 such that, for all ε ≤ ε_0 and all

(4.12)

(Here we have used the fact that L_ε ≫ 1, so that we can choose ℓ > ℓ_* to satisfy Lemma 2.3.) Letting γ = 1 and choosing ℓ so that δ²ℓ ≥ C_1, the combination of (4.10), (4.11), and (4.12) gives

where we have trivially bounded the integral of ν by 1. In particular, for ℓ large but order-one (and depending on n and δ), the probability of A_2 is also of higher order with respect to the right-hand side of (4.5).
Step 4. Finally, we arrive at the subtler part, in which we will need the reflection operators. To begin with, let k̄ = (k_1, …, k_2n) and write

where I is the set of nondecreasing 2n-tuples, i.e.,

and

The right-hand side of (4.15) is slightly ambiguous if several indices coincide or if intervals overlap, i.e., if k_{i+1} = k_i + 1 for some i. If j subsequent indices coincide, the right-hand side of (4.15) is to be interpreted as saying that there are at least j δ^- transitions in the corresponding interval. In the case of overlapping intervals, for instance if k_{i+1} = k_i + 1, the right-hand side of (4.15) should be interpreted to mean that there are at least two transitions in the interval

The index set satisfies (4.16).

(Recall our convention for the use of the symbol introduced in Notation 1.4.) Hence, to complete the proof of (4.5), it suffices to show that for fixed k̄ ∈ I, we have (4.17).

As explained above, the main step is to reduce the problem of estimating the probability of δ^- layers to that of estimating the probability of wasted δ^- excursions. This will be achieved through suitable reflections.
Let us at first assume that the I_{k_i} are well-separated in the sense that

Let us also assume that we are away from the boundary, i.e., that

We will consider the possibilities of (a) intervals that overlap or are nearby, (b) intervals that are the same (k_i = k_{i+1}), and (c) boundary intervals at the end of Step 5.

We start by defining n left stopping points χ_1, …, χ_n in the following manner. For i = 1, …, n we set (4.18).

Here we set χ_i = L_ε if the corresponding set is empty. It is easy to see that these random points are all left stopping points. In a similar fashion, for i = n + 1, …, 2n we set

Here we set χ_i = −L_ε if the corresponding set is empty. Then χ_i is a right stopping point for all i = n + 1, …, 2n. For any u in A_3^k̄, all the left and right stopping points χ_i are contained in the corresponding intervals I_{k_i} and, furthermore, we have (4.20).

Finally, note that as soon as

For any left stopping point χ_l ∈ {χ_1, …, χ_n} and any right stopping point χ_r ∈ {χ_{n+1}, …, χ_{2n}}, we now define the reflection operator R^{χ_r}_{χ_l}. If χ_l < χ_r (which is the case for any u ∈ A_3^k̄, as remarked above), we set

If χ_l ≥ χ_r, we set R^{χ_r}_{χ_l}u := u. We clearly have R^{χ_r}_{χ_l}R^{χ_r}_{χ_l} = Id; hence, R^{χ_r}_{χ_l} is injective and onto. In order to show that R^{χ_r}_{χ_l} preserves µ^{−1,1}_{ε,(−L_ε,L_ε)}, we observe that for any measurable and bounded test function Φ:

Now we can use the fact that on the set {χ_l < χ_r} we have almost surely that u(χ_l) = u(χ_r) = 0, together with the invariance of the measure µ^{0,0}_{ε,(χ_l,χ_r)} under the reflection R : u → −u. Note that the latter property relies on the symmetry of the double-well potential V. We get

Now we are finally ready to define the reflection operator R as the composition

We have again that R² = Id.
For any profile u ∈ A_3^k̄, the operator R acts in the following way: in intervals of the form (χ_i, χ_{i+1}) for i odd, u is replaced by −u, and on the rest of the system, u is left invariant. The action of the operator R on a typical path in A_3 is illustrated in Figure 4.1 (caption: "The reflection operator R turns the up and down transitions in the intervals I_{k_i} into wasted excursions in the same intervals."). Finally, define the reflection of a set A as

As a composition of measure-preserving transformations, the operator R preserves µ^{−1,1}_{ε,(−L_ε,L_ε)} as well. Hence, we have in particular that

This is useful because for u ∈ A_3^k̄ the profile Ru has a wasted δ^- excursion on each interval I_{k_i} (as is easy to check). In other words, RA_3^k̄ is a (proper) subset of the set of functions with wasted δ^- excursions in the given intervals.
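As a toy illustration (mine, not the paper's), the reflection R^{χ_r}_{χ_l} can be realized on a discretized path: negate the values strictly between two hitting points of zero, and observe that the operation is an involution and fixes the path outside (χ_l, χ_r). The measure-preservation argument in the text is, of course, probabilistic and not captured by this sketch.

```python
# Toy discrete version of the reflection operator R^{chi_r}_{chi_l}:
# negate a sampled path between a left and a right hitting point of zero.

def reflect(path, chi_l, chi_r):
    """Negate path values on the index range (chi_l, chi_r); identity if chi_l >= chi_r."""
    if chi_l >= chi_r:
        return list(path)
    return [(-v if chi_l < i < chi_r else v) for i, v in enumerate(path)]

# A caricature of a profile with an up and a down transition.
u = [-1.0, -0.9, 0.0, 0.8, 1.0, 0.7, 0.0, -0.8, -1.0]
chi_l = u.index(0.0)                      # leftmost hitting point of 0
chi_r = len(u) - 1 - u[::-1].index(0.0)   # rightmost hitting point of 0

v = reflect(u, chi_l, chi_r)
assert reflect(v, chi_l, chi_r) == u      # involution: R(Ru) = u
assert v[:chi_l + 1] == u[:chi_l + 1]     # unchanged left of chi_l
assert v[chi_r:] == u[chi_r:]             # unchanged right of chi_r
print(v)  # [-1.0, -0.9, 0.0, -0.8, -1.0, -0.7, 0.0, -0.8, -1.0]
```

Note how the up-down pair of transitions in u becomes a path that goes to 0 and returns to −1 without completing a transition, the discrete analogue of a wasted δ^- excursion.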
Step 5. It remains to bound the probability of the sets RA_3^k̄. Again, let us at first assume that the I_{k_i} are well-separated and away from the boundary in the sense described above; we consider the more general cases at the end of this step.
Using the Markov property again, we have

where ν_{k_1−2,k_1+2,k_2−2,…,k_2n+2} denotes the distribution of the 4n-dimensional marginal (u(x_{k_1−2}), u(x_{k_1+2}), u(x_{k_2−2}), …, u(x_{k_2n+2})). Now we would like to apply the large deviation bound (3.17) and the energy bound from Lemma 2.5. We observe that a δ ball around paths with a wasted δ^- excursion is equal to the set of paths with a wasted (2δ)^- excursion. As a result, we get that for any γ > 0 and δ > 0 there exists an ε_0 > 0 such that for all ε ≤ ε_0 and all boundary data contained in [−M, M], the probability of a wasted δ^- excursion is bounded by

Choosing δ sufficiently small with respect to γ and bounding the integral of ν by 1 as usual, we obtain from the combination of (4.23), (4.24), and (4.25) that (4.17) holds (up to a redefinition of γ). Thus, finally, (4.14), (4.16), and (4.17) imply

which concludes the proof of the upper bound in the well-separated case. It remains to consider the three special cases: (a) intervals that overlap or are nearby, (b) intervals that are the same (k_i = k_{i+1}), and (c) intervals at the boundary.

Case (a)
If two or more intervals overlap (i.e., if k_i = k_{i−1} + 1) or are nearby (i.e., if k_{i−1} + 2 ≤ k_i ≤ k_{i−1} + 3), then we lump them together into a single, larger interval and proceed as in case (b) below. The size of the largest possible interval formed in this way is (4 + 3(2n − 1))ℓ. Our energy estimates require only that the interval length be sufficiently large, and our large deviation estimates are uniform as long as the interval length falls within a compact set. (Here we rely on the fact that n is order-one with respect to ε.)

Case (b)
If a multi-index k̄ has repeated indices, so that there is more than one δ^- transition layer in a single interval, then we will use large deviation estimates for the event of having more than one wasted δ^- excursion in a single interval.
Assume that we have k_j = k_{j+1} = … = k_{j+m} for some j < 2n and some 1 ≤ m ≤ 2n. Furthermore, assume that this set of m + 1 indices is maximal in the sense that either j = 1 or k_{j−1} ≤ k_j − 4, and similarly that either j + m = 2n or k_{j+m+1} ≥ k_{j+m} + 4. In this case, we define the m + 1 stopping points χ_j, …, χ_{j+m} in the following way.
Consider any index i ∈ {j, …, j + m} that satisfies i ≤ n. For i = j, we define χ_j as in (4.18). On the other hand, for i > j, we define

As usual, we set χ_i = L_ε if the set above is empty. Now consider any index i ∈ {j, …, j + m} that satisfies i > n. For i = j + m, we define χ_{j+m} as in (4.19). On the other hand, for i < j + m, we define

Again, we set χ_i = −L_ε if the set above is empty.
As above, these random points χ_i are left stopping points for i ≤ n and right stopping points for i ≥ n + 1. Furthermore, (4.20) still holds for all u ∈ A_3^k̄. The measure-preserving reflection operator R can be defined as above in (4.22), and R maps each u ∈ A_3^k̄ to a path that has m + 1 wasted δ^- excursions in I_{k_j}. (Specifically, we mean m + 1 wasted δ^- excursions on intervals, indexed by i ∈ {j, …, j + m}, that are mutually disjoint except possibly for their endpoints.) We leave it to the reader to verify the following generalization of Lemma 2.5.

Lemma 4.2. There exists C < ∞ with the following property. Fix δ > 0. For any system sizes ℓ_1, ℓ_2 < ∞ sufficiently large and boundary conditions u_± ∈ R, set

Define the optimal cost

Then, uniformly with respect to the boundary values u_±, one has

Case (c). Suppose for instance that there is a transition layer in (x_{−N_ε}, x_{−N_ε+1}). Then we know the boundary value u(x_{−N_ε}) = u(−L_ε) = −1, while the boundary value u(x_{−N_ε+2}) at the other end of the subinterval is unknown. This is easily handled by a suitable "one-sided" generalization of Lemma 2.5, which is easy to prove.
Using the facts above, the proof of the upper bound is completed by decomposing A_3^k̄ into the various cases and recovering the correct (and identical) bounds in each case.

Lower bound.
We turn now to the matching lower bound on

µ^{−1,1}_{ε,(−L_ε,L_ε)}(u has (2n + 1) transition layers).

As explained in Section 2, for the lower bound we will work with δ^+ transition layers (cf. Definition 2.6). Because of the boundary conditions and the definition of δ^+ layers, it will be sufficient to show that, for some δ ∈ (0, 1/2), we have (4.26).

Indeed, in analogy with the upper bound, the probability of δ^+ layers is bounded above by the probability of transition layers, and because of the boundary conditions there must be an odd number of transitions.
Step 1. Once again, we will use the gridpoints x_k defined in (4.3). Our first step is to get some control on the values of u at the gridpoints. The following lemma, used below, is established via techniques similar to those used for the upper bound.

Lemma 4.3. For any sufficiently large M < ∞, there exist ℓ_* < ∞ and ε_0 > 0 such that, for ℓ ≥ ℓ_* and ε ≤ ε_0, we have for any L_ε satisfying (1.10) that

Recall the definition of A_1 in (4.6).
The proof is similar to that of the upper bound and is deferred to Subsection 6.5. The main idea is that while the boundary conditions force there to be a transition layer, with high probability there is only one transition layer. Moreover, by symmetry, this layer is as likely to appear in [0, L_ε] as in [−L_ε, 0] (hence neither probability can be more than 1/2). On the other hand, for u to hit zero away from the transition layer is energetically unlikely, by arguments similar to those used for the upper bound.
Step 2. With Lemma 4.3 in hand, we turn to the basic set-up for the lower bound. In this case, we will not want to use overlapping subintervals. We will also not work with the full system, but only with intervals on the left-hand side. Specifically, we will work with

We have assumed without loss of generality that 4 divides N_ε. (If not, then N_ε = 4j + r for some j ∈ N and r ∈ {1, 2, 3}; replace N_ε by N_ε − r throughout.) We remark that, as usual, for an event falling in the interval I_k, we will condition on the boundary values on a larger interval. Specifically, we will use a Markov decomposition in which we condition on the boundary values of the enlarged interval

Notice that for all k ∈ E, the enlarged intervals Ĩ_k are nonintersecting. For future reference, let us denote the set of boundary indices

The rough idea is to consider sets of functions having 2n layers, with a layer in one of the intervals I_k for 2n distinct values of k ∈ E. Unfortunately, because we work with functions u that have at least 2n + 1 transitions rather than exactly 2n + 1 transitions, a given function u may have more than 2n + 1 layers and belong to more than one of the sets just described. Hence we cannot convert the probability of the union into the sum of the probabilities. To work around this, we will use more restrictive sets.
Analogous to the set A_1 defined in (4.6) above, we define the following set. Rather than keeping track of all the boundary values, it will be convenient to track only the boundary values of the extended intervals described above. That is, we consider Ã_1 :=

We now introduce a set that is analogous to the set A_3^k̄ above (but more restrictive, for the reason we have explained). For ease of notation, we do not introduce a new label. Let k̄ = (k_1, …, k_2n) and consider the set

A_3^k̄ := ∁Ã_1 ∩ {in each I_{k_i} with i odd there exists a δ^+ up layer and in each I_{k_i} with i even there exists a δ^+ down layer} and

Clearly, we have the following inclusion of sets of paths: (4.28),

where I is the following set of well-separated indices on the negative x-axis:

Moreover, the sets A_3^k̄ for k̄ ∈ I are disjoint, so that (4.28) implies

The set on the right-hand side of (4.28) is certainly smaller than the set on the left-hand side, but the bound will be good enough on the level of scaling, since (4.30).

Step 3. Given (4.29) and (4.30), we will be done if we can establish that for any γ > 0 and for ε > 0 sufficiently small, we have (4.31).

To this end, fix any multi-index k̄ ∈ I. We will now bound the probability of A_3^k̄ using reflections, as we did for the upper bound. Indeed, let

Then we define the reflection operator R as

By the same argument as in (4.21), it can be seen that this operator preserves the measure µ^{−1,1}_{ε,(−L_ε,L_ε)}. Notice that R creates wasted δ^+ excursions in the intervals I_{k_i} and cannot create layers in any interval

(4.32)

As usual, ν denotes the distribution of boundary values, here at the boundary points of each extended interval Ĩ_k for k ∈ E. Note that the second equality follows from the definition of wasted δ^+ excursions. (The definition of wasted δ^- excursions is different and led to an inequality in the analogous estimate, cf. (4.24).)
We remark that we do not actually need to condition on the boundary values for every k ∈ E (it would be enough to consider the intervals Ĩ_k for k ∈ k̄ and the complementary intervals), but doing it this way keeps the notation simple and, because of (1.10), does not affect our bound by more than an exponentially small amount.
We now turn to the lower large deviation bound (3.18) and the energy bound from Lemma 2.8 (where we use that the boundary values are in [−M, 0]). We recall that the set A^{bc}_{δ,pre} from (2.7) was defined precisely so that B(A^{bc}_{δ,pre}, δ) = {u : u has a wasted δ^+ excursion in [−ℓ, ℓ]}. Therefore, applying the large deviation estimate to (4.32), we conclude that for any γ > 0 and δ > 0 small enough, there exists an ε_0 > 0 such that for any k̄ ∈ I and any ε ≤ ε_0, we have

where

At the same time, for any k̄ ∈ I we have

where A_1 includes all the gridpoints, as defined in (4.6). Hence, by the estimate (4.27) from Lemma 4.3, the lower bound (4.33) improves to

which establishes (4.31) and completes the proof of the lower bound.
5. PROOF OF THEOREM 1.9: THE UNIFORM DISTRIBUTION OF THE LAYER LOCATION

As pointed out in Subsection 1.3, the proof of Theorem 1.9 relies on the construction of a measure-preserving operator R_{y,z}. This operator maps paths that exhibit a transition near y to paths that exhibit a transition near z. It is constructed by performing a point reflection between hitting points of ±1 near y and z.
The main difficulty of the proof is to show that these hitting points exist with very high probability on the set of paths that perform a transition near y. The argument for this is provided in the following two lemmas.
The first lemma states, roughly speaking, that in the "bulk," fluctuations around ±1 are of order ε^{1/2}. The system needs O(|log ε|) space to relax to this scale. For simplicity, we state the lemma for paths that stay close to 1. By symmetry, the analogous statement holds near −1.
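Heuristically, both scales can be read off from a Gaussian approximation around u ≡ 1. The following sketch is not part of the formal argument; κ is our shorthand for √(V″(1)):

```latex
% Linearize around u \equiv 1 and set \kappa := \sqrt{V''(1)} > 0. The formal Gibbs measure
% \propto \exp\bigl( -\tfrac{1}{\varepsilon}\int \tfrac12 (v')^2 + \tfrac{\kappa^2}{2}\, v^2 \,dx \bigr)
% is Gaussian with covariance \varepsilon times the Green's function of -\partial_x^2 + \kappa^2:
\operatorname{Cov}\bigl(v(x), v(y)\bigr) \;\approx\; \frac{\varepsilon}{2\kappa}\, e^{-\kappa |x-y|},
\qquad\text{so that } v(x) = O\bigl(\varepsilon^{1/2}\bigr).
% Boundary effects decay like e^{-\kappa\,\mathrm{dist}(x,\partial)}, so damping an O(1)
% boundary value down to O(\varepsilon^{1/2}) requires a distance of order \kappa^{-1}|\log\varepsilon|.
```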
Lemma 5.1. There exists C < ∞ with the following property. For every ℓ 0 < ∞ sufficiently large, there exists ε ′ 0 > 0 such that the following holds. For every ε and such that for ℓ ε := (2K ε + 1)ℓ 0 and all u ± ∈ [1/2, 3/2], we have We present the proof in Subsection 6.6. Next we need a lemma that says that with positive probability, the path actually hits ±1. Again, we state the result for hitting points of +1. By symmetry, the analogous statement holds for hitting points of −1.
Proof of Theorem 1.9. We will show that for some δ ∈ (0, 1/2) and any α > 0, we have (5.1) for ε sufficiently small. At the end of the proof, it will not be hard to improve from a δ− up layer of length less than or equal to 2ℓ to a full up transition layer.
Notation 5.3. For brevity, we will often say "a transition layer ≤ 2ℓ" as shorthand for "a transition layer of length less than or equal to 2ℓ."

For ε small enough, we consider intervals of type The main step of our argument consists of proving that the probabilities of transitions in these intervals J_{y,ε} are roughly the same for different values of y. Now fix two points y, z such that J_{y,ε}, J_{z,ε} ⊆ [−L_ε, L_ε]. Without loss of generality, assume that y ≤ z.
As above in the proof of Theorem 1.5, let ℓ and M be large constants to be fixed later, and let N_ε and x_{±k} be as defined in (4.2) and (4.3). Moreover, consider the overlapping intervals I_k = [x_{k−1}, x_{k+1}] as in (4.4). Finally, define as in (4.6) the "bad set" A_1 of functions that have boundary values larger than M in magnitude. In (4.9) above, we have already established that there is a universal constant C_2 < ∞ such that Hence, for the system sizes L_ε that we consider, the probability of A_1 can be made arbitrarily small by choosing M large. We now define the set of functions J_{y,ε} := {u ∈ ∁A_1 : u has a δ− up layer ≤ 2ℓ in J_{y,ε}}. (5.2) The set J_{z,ε} is defined analogously. In Steps 1-3 below, we will establish that the probabilities of J_{y,ε} and J_{z,ε} are roughly the same. The bounds that we obtain will be uniform with respect to y and z. Finally, in Step 4 we will show how this implies (5.1), and in Step 5 we will improve to the statement of Theorem 1.9.
Step 1. The first step consists of proving that on the set J_{y,ε}, with high probability, the profile u is close to −1 on a sufficiently large interval J^ε_{y,−} just to the left of J_{y,ε} and close to +1 on a sufficiently large interval J^ε_{z,+} just to the right of J_{z,ε}. The length h_ε of each of these auxiliary intervals J^ε_{y,−} and J^ε_{z,+} will be chosen below.

FIGURE 5.1. The interval J_{y,ε} and the auxiliary interval J^ε_{y,−} to its left.
(See Figure 5.1 for an illustration of J_{y,ε} and J^ε_{y,−}.) We also define the following sets of indices For later use in (6.81) in the proof of Lemma 5.5, we will make the additional growth assumption where c_1 > 0 is defined in (6.86), below. This is not a strong condition; we will typically think of h_ε as being much smaller. Finally, we define another set of "unlikely" paths, namely paths that have extra δ− layers to the left of J_{y,ε} or to the right of J_{z,ε}:
A^−_{y,3} := {u ∈ J_{y,ε} : there exists x ≤ (k^ε_{y,+} + 1)ℓ with u(x) ≥ 1 − δ},
A^+_{y,3} := {u ∈ J_{y,ε} : there exists x ≥ (k^ε_{z,−} − 1)ℓ with u(x) ≤ −1 + δ},
A_{y,3} := A^−_{y,3} ∪ A^+_{y,3}. (5.5)
We now introduce two lemmas, whose proofs are given in Subsection 6.6. The first lemma is an extension of the upper bound in Theorem 1.5 and states, roughly speaking, that, conditioned on having a transition in a given interval, the probability of extra layers somewhere else is small.

Lemma 5.4. Let Y be a subinterval of [−L_ε, L_ε] and let x_− = k_−ℓ and x_+ = k_+ℓ be two gridpoints (cf. (4.3)) to the left and to the right of Y, respectively, each at distance ≥ ℓ from Y. We denote by J_Y and A_{Y,3} the sets Fix any γ > 0 and any M < ∞ sufficiently large. For any δ > 0 sufficiently small and ℓ < ∞ sufficiently large, there exists ε_0 > 0 such that, for all ε ≤ ε_0, we have

We now apply Lemma 5.4 for Y = J_{y,ε} and for x_− = (k^ε_{y,+} + 1)ℓ and x_+ = (k^ε_{z,−} − 1)ℓ. Because of the boundary conditions, the absence of layers to the left of x_− and to the right of x_+ implies in particular that u ≤ 1 − δ to the left of x_− and that u ≥ −1 + δ to the right of x_+. Hence we deduce that The second lemma establishes that, on the other hand, when there are no extra layers, there is only a small probability of making an excursion from −1 at some gridpoint in J^ε_{y,−} (respectively, an excursion from 1 at some gridpoint in J^ε_{z,+}).
The result from the second lemma is exactly the necessary ingredient that we need in Step 2 in order to invoke Lemma 5.2.
Step 2. The second step of the proof consists of showing that paths in the set J_{y,ε} have hitting points of −1 in J^ε_{y,−} and hitting points of +1 in J^ε_{z,+} with large probability. This is captured by the following lemma, which is also proved in Subsection 6.6.

FIGURE 5.2. The reflection operator R_{y,z} performs a point reflection of the path between the left and right hitting points χ_±. In this way the δ− transition in J_{y,ε} is mapped into J^e_{z,ε}.

Lemma 5.6. There exists C < ∞ with the following property. Fix any γ > 0 and any M < ∞ sufficiently large. For any δ > 0 sufficiently small and ℓ < ∞ sufficiently large, there exists ε_0 > 0 and λ > 0 such that, for all ε ≤ ε_0, we have where the error term satisfies and c_1 is defined in (6.86), below.
Step 3. Now we are ready to define the reflection operator R_{y,z}. First, we define the following left and right stopping points: χ_− := inf{x ∈ J^ε_{y,−} : u(x) = −1}, χ_+ := sup{x ∈ J^ε_{z,+} : u(x) = 1}. Here we use the convention that χ_− = L_ε if there is no hitting point of −1 in J^ε_{y,−}, and similarly χ_+ = −L_ε if there is no hitting point of 1 in J^ε_{z,+}. We use these hitting points to define the reflection operator if χ_− ≤ χ_+. We set R_{y,z} to be the identity otherwise. In other words, the operator R_{y,z} performs a point reflection of the graph of u between the left and right stopping points χ_±. As in Step 4 of the proof of the upper bound in Theorem 1.5, one argues that the strong Markov property (3.10) implies that R_{y,z} leaves the measure µ^{−1,1}_{ε,(−L_ε,L_ε)} invariant. The action of the reflection operator is illustrated in Figure 5.2.
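The displayed definition of R_{y,z} is not reproduced above; a natural way to write the point reflection it describes (our reading, not necessarily the authors' exact normalization) is:

```latex
(R_{y,z} u)(x) :=
\begin{cases}
-\,u\bigl(\chi_- + \chi_+ - x\bigr), & x \in [\chi_-, \chi_+],\\[2pt]
u(x), & \text{otherwise},
\end{cases}
\qquad \text{whenever } \chi_- \le \chi_+ .
% Continuity at the matching points: (R_{y,z}u)(\chi_-) = -u(\chi_+) = -1 = u(\chi_-)
% and (R_{y,z}u)(\chi_+) = -u(\chi_-) = 1 = u(\chi_+).
```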
Assume that u ∈ J_{y,ε} is a path that admits a hitting point of −1 in J^ε_{y,−} and a hitting point of +1 in J^ε_{z,+}. Recall that if u ∈ J_{y,ε}, then u has a δ− up transition layer of length ≤ 2ℓ in J_{y,ε}. Under R_{y,z}, the δ− up transition layer is mapped from J_{y,ε} to near J_{z,ε}, and we would like to conclude that the layer of the reflected path is contained in J_{z,ε}.
Unfortunately, the layer does not necessarily fall within J_{z,ε}. What is true is that there is a δ− up layer of length less than 2ℓ in the extended interval (5.14). (Recall that h_ε, the length of the auxiliary intervals, was defined above in (5.3).) Let us denote by J^e_{z,ε} the set of functions with a δ− up transition layer of length less than 2ℓ in J^e_{z,ε}: J^e_{z,ε} := {u : u has a δ− up layer ≤ 2ℓ in J^e_{z,ε}}. In Step 2, we had established that Hence, as R_{y,z} leaves µ^{−1,1}_{ε,(−L_ε,L_ε)} invariant, we can conclude that An analogous construction to turn transitions in J_{z,ε} into transitions near J_{y,ε} can be performed to obtain the same bound with J_{y,ε} and J_{z,ε} interchanged.
Step 4. In this step, we establish the bound (5.1). For notational convenience, we will establish the bound in the case of the center interval [−d_ε, d_ε], but our argument does not depend on this. More precisely, what we show is that for some δ > 0 and any α > 0, there exists an ε_0 > 0 such that, for ε ≤ ε_0, we have
(L_ε/d_ε) µ^{−1,1}_{ε,(−L_ε,L_ε)}(u ∈ ∁A_1 : at least one δ− up layer
The main ingredient will be the estimate (5.15). We will also make use of Lemma 5.4, but apart from that, the argument is completely elementary and consists only of choosing the right intervals and sets of paths.
We first split up the system into smaller blocks. Actually, it will be useful to define two different partitions {J_{k,ε}, k = −M_ε, . . . , M_ε − 1} and {J^e_{m,ε}, m = −M̃_ε, . . . , M̃_ε − 1} of [−L_ε, L_ε]. The lengths of the intervals J_{k,ε} will be chosen small relative to d_ε but still large relative to |log ε|. These intervals will be overlapping and play the role of J_{y,ε} when we apply (5.15). The intervals J^e_{m,ε} will be slightly larger than the intervals J_{k,ε} and will be at distance 2ℓ from each other. They will be used as J^e_{z,ε} when applying (5.15). We fix integers M_ε and k_ε such that Then we set d̃_ε := L_ε/M_ε and define the overlapping intervals The boundary intervals are defined as As above in (5.2), we then define the associated sets of paths as J_{k,ε} := {u ∈ ∁A_1 : u has a δ− up layer of length ≤ 2ℓ in J_{k,ε}}. (5.18) In order to define the slightly longer intervals, in analogy to the parameters h_ε and K̃ from Steps 1-3, we choose parameters h̃_ε and K̃_ε such that These parameters then define the error term E(ε) (see (5.12), above). Then we define the integers M̃_ε and m_ε such that (5.20) As above, we define the intervals and in particular these intervals are long enough to use them as J^e_{z,ε} in (5.15). Actually, when comparing (5.21) to (5.14), one notices a discrepancy of length 10ℓ, but this can easily be treated by making h̃_ε a bit larger.
We define the associated sets of paths J^e_{m,ε} := {u : u has a δ− up layer ≤ 2ℓ in J^e_{m,ε}}, J^{e,*}_{m,ε} := {u ∈ J^e_{m,ε} : u has no δ− up layer in any J^e_{n,ε} with n ≠ m}. After these preliminary definitions, we are now ready to proceed to the proof of (5.16).
As mentioned above, the intervals J_{k,ε} are overlapping. In particular, every δ− layer ≤ 2ℓ in [−d_ε, d_ε] must be contained in at least one of the J_{k,ε}. This implies that In the same way, we see that every possible path on [−L_ε, L_ε] must be either
• in one of the unlikely sets A_1 or A_2 defined above in (4.6) and (4.7), or
• in at least one of the sets J_{k,ε}.

This implies that
and hence we have On the other hand, applying (5.15) gives that for any k and m We now collect ingredients to deduce the upper bound
≤ (1 − E(ε))^{−2} (k_ε + 1) M_ε. (5.28)
The proof of the lower bound now follows along similar lines: Now, from the assumptions (5.17) on k_ε and M_ε as well as the assumptions (5.20) on m_ε and M̃_ε, we have that Moreover, if we choose for instance M ≥ 4 C_2 c_0 in the bound (4.9) on A_1, we recover (5.31). Combining (5.28), (5.29), (4.13), (5.30), and (5.31) establishes (5.16), as desired.
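The covering property invoked at the start of this step (the overlapping blocks catch every short layer) is elementary; here is a quick numerical sketch with hypothetical sizes, where `d_tilde` stands in for d̃_ε and `layer_len` for the layer length 2ℓ:

```python
import random

def overlapping_blocks(d_tilde, n_blocks):
    """Overlapping intervals [k*d_tilde, (k+2)*d_tilde], mimicking the J_{k,eps}."""
    return [(k * d_tilde, (k + 2) * d_tilde) for k in range(n_blocks)]

def is_covered(a, b, blocks):
    """True if the interval [a, b] lies inside at least one block."""
    return any(lo <= a and b <= hi for lo, hi in blocks)

# Hypothetical sizes: block half-length d_tilde and layer length layer_len <= d_tilde.
d_tilde, layer_len, n_blocks = 10.0, 3.0, 50
blocks = overlapping_blocks(d_tilde, n_blocks)

random.seed(0)
samples = [random.uniform(0.0, (n_blocks - 1) * d_tilde) for _ in range(10_000)]
assert all(is_covered(a, a + layer_len, blocks) for a in samples)
```

The point is simply that blocks of length 2d̃_ε that overlap by d̃_ε contain every subinterval of length at most d̃_ε.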
Step 5. It remains to remove the restriction on the length of the layer and improve from a δ − up layer to a full up layer.
The upper bound is immediate, since µ^{−1,1}_{ε,(−L_ε,L_ε)}(u ∈ ∁A_1 and there exists an up layer in for ℓ large enough with respect to 1/δ². For the lower bound, on the other hand, we use Step 2 once more. To this end, we will consider layers falling strictly inside J_{y,ε}, on the subset J^⊙_{y,ε} := [y − d_ε + h_ε + 3ℓ, y + d_ε − h_ε − 3ℓ]. Then, according to Step 2, there is a high probability of hitting ±1 on J_{y,ε} \ J^⊙_{y,ε}. More precisely, notice that we can estimate
µ^{−1,1}_{ε,(−L_ε,L_ε)}(u ∈ ∁A_1 and there exists an up layer in [y − d_ε, y + d_ε])
≥ µ^{−1,1}_{ε,(−L_ε,L_ε)}(u ∈ ∁A_1 and there exists a δ− up layer ≤ 2ℓ in J^⊙_{y,ε})
where in the last line, we have applied Lemma 5.6. On the other hand, the probability in the last line can be estimated as Applying the bound (5.1) to each term in (5.33) and substituting into (5.32) completes the lower bound. Finally, recalling the bound (4.9) on the probability of A_1 completes the proof of Theorem 1.9.
6. PROOFS OF THE LEMMAS

6.1. Proofs of preliminary energy lemmas. The energy lemmas rely on upper and lower bounds for the energy over various sets. The upper bounds are derived from constructions. (The minimum value of the energy is necessarily less than or equal to the value that we can achieve with any given construction.) The lower bound, on the other hand, describes the best possible value for any function and is based on the so-called Modica-Mortola trick discussed in Section 2. Before we begin, we make a remark about our constructions.
Remark 6.1. In addition to giving us an ODE for the energy minimizer on R, equation (2.1) serves as the backbone for the constructions that are used to establish upper bounds for energy minimization problems on finite systems. For instance, suppose we want to minimize the energy on (−ℓ, ℓ) subject to u(±ℓ) = ±1. For ℓ large, we can build a construction that almost achieves the cost c_0. Specifically, consider the centered solution of (2.2) on (−ℓ + a, ℓ − a) for a = 1/ℓ. Linearly interpolate from its value at −ℓ + a to −1 at −ℓ, and symmetrically at the other end. Because of the exponential convergence of the minimizer to ±1 (cf. Lemma 2.1), the energy on (−ℓ, −ℓ + a) and (ℓ − a, ℓ) is o(1) as ℓ ↑ ∞. Similarly, if we minimize the energy over functions satisfying u(±ℓ) = ±M for M large, we can build a piecewise-defined construction that goes from −M at −ℓ to a neighborhood of −1 at −ℓ/2, goes from a neighborhood of −1 at −ℓ/2 + a to a neighborhood of 1 at ℓ/2 − a, and goes from a neighborhood of 1 at ℓ/2 to M at ℓ, with linear interpolation near ±ℓ/2 to make the function continuous. The cost of such a construction is where we write the integrals separately to emphasize the additivity of the energy over the three subintervals described above. Because according to (2.1) we can get a good bound using increasing or decreasing functions, the analogous bounds hold for u(±ℓ) = ∓M, u(±ℓ) = M, et cetera.
If M is very large, the constant ℓ * in the energy lemmas may also need to be very large in order to make the o(1) term small. The idea in all of the following proofs is to make this term small enough so that it can be absorbed into a δ-dependent term, so the ordering of the constants is important: We fix M (large) and δ (small) and then choose ℓ * large enough so that the term(s) that are o(1) with respect to ℓ can be absorbed.
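As a concrete illustration of these constructions: for the canonical potential V(u) = (1 − u²)²/4, the centered solution of the minimizer ODE is u(x) = tanh(x/√2), and the energy of the construction on a long interval approaches the transition cost c_0 = 2√2/3. The following sketch checks this numerically (the grid size and interval length are arbitrary choices, not from the text):

```python
import math

def V(u):
    """Canonical double-well potential V(u) = (1 - u^2)^2 / 4."""
    return (1.0 - u * u) ** 2 / 4.0

def profile(x):
    """Centered heteroclinic solution of u'' = V'(u): u(x) = tanh(x / sqrt(2))."""
    return math.tanh(x / math.sqrt(2.0))

def energy(l, n=200_000):
    """Trapezoid rule for the energy of the tanh profile on (-l, l).

    Along the profile, the first integral 1/2 (u')^2 = V(u) holds, so the
    energy density 1/2 (u')^2 + V(u) simplifies to 2 V(u)."""
    h = 2.0 * l / n
    vals = [2.0 * V(profile(-l + i * h)) for i in range(n + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

c0 = 2.0 * math.sqrt(2.0) / 3.0  # = int_{-1}^{1} sqrt(2 V(s)) ds
# energy(l) increases to c0 as l grows, reflecting the exponential tails.
```

The same identity ½(u′)² = V(u) along the profile is what makes the Modica-Mortola lower bound sharp for this construction.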
In what follows, it will be convenient to introduce the notation:

Proof of Lemma 2.3. We will establish (2.4) via an upper bound on the energy over A^bc and a lower bound on the energy over A^bc_0. Because of the extra condition in A^bc_0, the energy on (−ℓ, ℓ) is large (of order δ²ℓ), and we do not have to be as careful about the boundary conditions as usual. A rough bound will suffice.
Step 1. As explained in Remark 6.1, the upper bound relies on a construction. Given any u_− ∈ [−M, M], we can use the solution of (2.1) to connect to a neighborhood of 1 or −1, and similarly for u_+. If the optimal connection for u_− is to −1 and the optimal connection for u_+ is to +1, then in order to build a continuous construction, we incur the additional cost ϕ_{−1}(1) = c_0, where we have used the notation introduced above and recalled the value of c_0 from (1.9). (If the optimal connection for u_− and u_+ is to the same value, then the construction does not incur this extra cost, but the upper bound is still valid.) Putting together these three pieces of the construction and the small correction terms for continuity (see Remark 6.1), we can express the upper bound derived in this way as (6.1). Note that Assumption 1.1 allows the o(1)_{ℓ↑∞} term to depend on M: if u_− is very large, the (nearly) optimal connection from u_− to 1 requires a lot of space. This explains why ℓ_* in the statement of the lemma depends on M.
Step 2. Now we turn to the lower bound over A^bc_0. On the one hand, the condition in A^bc_0 implies that the integral of V over (−ℓ, ℓ) cannot be too small. Using the quadratic behavior of V near ±1 (see Assumption 1.1), we have for δ small enough To integrate over the rest of the interval, we recall the trick of Modica and Mortola that was explained in Section 2. Consider first (−2ℓ, −ℓ). We distinguish two cases: |u_−| > 1 and |u_−| ≤ 1.
If u ∈ A^bc_0 and |u_−| > 1, then there is a point x_− ∈ (−2ℓ, −ℓ) such that |u(x_−)| = 1. In this case, the Modica-Mortola trick on (−2ℓ, x_−) gives On the other hand, if |u_−| ≤ 1, then for ℓ large enough, we have If |u_−| > 1, then adding the contributions from (6.2) and (6.3) and subtracting the contribution from (6.1) gives On the other hand, if |u_−| ≤ 1, then the contributions from (6.2) and (6.1), together with the bound from (6.4), imply Since this latter bound is weaker, it holds in either case.
Proof of Lemma 2.5. We rewrite the set A^bc_0 as where the A_± are the sets of paths that perform a wasted excursion starting from a neighborhood of ±1. We will prove the bound on the energy difference for A_+. The corresponding bound for A_− follows in the same way. As usual, our task is to produce appropriate upper and lower bounds.
Step 1. The upper bound on inf_{A^bc} E_{(−2ℓ,2ℓ)}(u) is by construction. Consider the function ū that minimizes E_{(−2ℓ,2ℓ)} subject to u(±2ℓ) = u_± and u(0) = 1, and notice that

Step 2. We now turn to the lower bound on inf_{A_+} E_{(−2ℓ,2ℓ)}(u). Recall the points x_± that follow from the definition of A_+ and Definition 2.4. Because of the properties of the potential, we may assume without loss of generality that u(x_±) = 1 − δ and u(x_0) = δ.
Proof of Lemma 2.8.
As in the proof of Lemma 2.5, we observe that ū ∈ A^bc_{δ,pre}, and hence the construction gives an upper bound

Step 2. For the lower bound over A^bc, we observe that for any u ∈ A^bc, either there is a point x_− ∈ (−2ℓ, 0) and a point x_+ ∈ (0, 2ℓ) such that u(x_±) is in a δ-neighborhood of 1 or −1, or else the energy (by the same argument as in the proof of Lemma 2.3) is bounded below by δ²ℓV″(1)/2 for δ small enough. We can choose ℓ so large that this is greater than ϕ_{−1}(u_−) + ϕ_{−1}(u_+) and hence dominates the boundary terms in (6.8). On the other hand, if the points x_± exist, then by the usual trick of Modica and Mortola, we recover where the second line follows by virtue of the boundary conditions u_± ∈ [−M, 0] and the symmetry of the potential. The combination of (6.8) and (6.9) completes the proof of Lemma 2.8.
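The Modica-Mortola step used repeatedly in these proofs is the pointwise Young inequality for the energy density; in the notation ϕ_a(b) := ∫_a^b √(2V(s)) ds (our reading of the shorthand introduced before the proof of Lemma 2.3, consistent with ϕ_{−1}(1) = c_0):

```latex
\frac12 (u')^2 + V(u) \;\ge\; \sqrt{2V(u)}\;|u'|
\quad\Longrightarrow\quad
E_{(a,b)}(u) \;\ge\; \int_a^b \sqrt{2V(u)}\,|u'|\,dx \;\ge\; \bigl|\phi_{u(a)}\bigl(u(b)\bigr)\bigr| ,
% by the change of variables s = u(x); in particular, a full transition from -1 to 1
% costs at least \phi_{-1}(1) = \int_{-1}^{1} \sqrt{2V(s)}\,ds = c_0.
```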

Proof of the strong Markov property.
Proof of Lemma 3.1. By subtracting h^{u_−,u_+}_{(x_−,x_+)}, we can reduce the problem to the case of zero boundary conditions. Under W^{0,0}_{ε,(x_−,x_+)}, the quantities u − u^{x_+}_{x_−} and u^{x_+}_{x_−} are jointly Gaussian and centered, because they are both linear images of u. So it is sufficient to calculate their covariances. Using (3.1), it is easy to see that, for all This shows the claim.
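For orientation (the covariance computation itself sits in the omitted displays), recall the standard Brownian-bridge covariance that underlies it: under W^{0,0}_{ε,(x_−,x_+)} the path is a Brownian bridge with diffusion strength ε, so

```latex
\operatorname{Cov}\bigl(u(x), u(y)\bigr)
= \varepsilon\,\frac{\bigl((x \wedge y) - x_-\bigr)\,\bigl(x_+ - (x \vee y)\bigr)}{x_+ - x_-},
\qquad x, y \in [x_-, x_+].
% Since the quantities in the lemma are centered and jointly Gaussian,
% independence reduces to checking that the relevant cross-covariances vanish.
```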
Proof of Lemma 3.2. We start by observing that the statement of Lemma 3.1 implies that In order to prove the desired statement (3.7), observe that the density of µ can be written as This finishes the proof of Lemma 3.2.
We are now ready to give a proof of the strong Markov property.

Proof of Lemma 3.3. We treat only the Gaussian case (3.9). Equation (3.10) then follows as in the proof of Lemma 3.2. We start by proving (3.9) in the case in which χ_− and χ_+ are left and right stopping points that attain values in a finite set {χ_1, . . . , χ_N}. Then we can write In the second equality, we have used the fact that the χ_± are left and right stopping points.
In order to see the general case, we approximate the stopping points by Then χ^N_− and χ^N_+ are stopping points taking values in a finite set and, in particular, (3.9) holds for them. We have Now, in order to conclude that (3.9) also holds for χ_±, we first observe that for any continuous, bounded Φ : C([x_−, x_+]) → R and every path u, we have In the first step, we have used that, due to the continuity of u, the measures W^u_{ε,(χ^N_−,χ^N_+)} converge weakly to W^u_{ε,(χ_−,χ_+)}, as is easily confirmed. In order to see the last line, it suffices to check that the limit in the third line does indeed satisfy the characteristic properties of a conditional expectation.
This equality can then be extended to arbitrary test functions Φ with a standard monotone class argument (see e.g. [RY99, Ch. 0, Thm 2.2]).
see (1.5). Consequently, the results will follow as soon as we establish upper and lower bounds on these expectations. Throughout this subsection, A will always denote a set of continuous paths u on [x_−, x_+] that satisfy the boundary conditions u(x_±) = u_±, and topological notions like "open" or "closed" will always refer to the topology of uniform convergence. We will frequently use I_{x_−,x_+}(u), the Gaussian energy of a path (defined in (3.2)), and I^{u_±}_{x_±}, the minimal Gaussian energy given the boundary conditions (defined in (3.15)).
The upper bound for the Gaussian expectation can then be stated as follows.
As usual in large deviation theory, the derivation of lower bounds for integrals is reduced to the case of a ball B(u*, δ) := {u : ‖u − u*‖_∞ ≤ δ} around a suitably chosen profile u*.
Now we give the proofs of Lemmas 6.2 and 6.3. The proofs of Propositions 3.4 and 3.5 are given afterwards.
In order to prove the upper bound, we will invoke the known upper bound for Gaussian large deviations. In the current context, this can be stated as follows.
Proposition 6.4 (Gaussian large deviations, see e.g. [Bog98, Cor. 4.9.3]). For every closed set A and for any γ > 0, there exists an ε_0 > 0 such that for every ε ≤ ε_0 we have

The argument for Lemma 6.2 is an adaptation of the proof of [dH00, p. 34].
Proof of Lemma 6.2.
Step 1. We start by reducing the general problem to the case of homogeneous boundary conditions on [0, 1]. To this end, we introduce the following affine transformation. We define the transformation T : u ↦ û, where for a given path u: It is clear that T is a bijection between the set of continuous paths u on [x_−, x_+] with boundary conditions u(x_±) = u_± and C([0, 1]), the space of continuous paths on [0, 1] with homogeneous boundary conditions. Furthermore, if u is distributed according to W^{u_−,u_+}_{ε,(x_−,x_+)}, then û is distributed according to W^{0,0}_{ℓε,(0,1)}. Note that the variance changes due to the rescaling by ℓ.
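The formula for T is not reproduced above; one choice consistent with the stated properties (a sketch, writing ℓ := x_+ − x_−) is:

```latex
(Tu)(\hat x) \;:=\; u\bigl(x_- + \ell\,\hat x\bigr) \;-\; \bigl((1-\hat x)\,u_- + \hat x\,u_+\bigr),
\qquad \hat x \in [0,1], \quad \ell := x_+ - x_- .
% Then (Tu)(0) = (Tu)(1) = 0, and the diffusive scaling of the increments turns the
% variance parameter \varepsilon on (x_-, x_+) into \ell\varepsilon on (0,1).
```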
The expectation that we want to bound can be expressed in terms of û as where Â := {T u : u ∈ A}. On the other hand, the condition (6.12) and the right-hand side of the desired bound (6.13) can also be expressed in terms of û, as we will now do. We have for every u that where, for convenience, we have introduced the notation
(Note that we have not included I^{u_±}_{x_±} in the definition of the rescaled energy E^{u_±}_ℓ, because this way E^{u_±}_ℓ will appear as the natural rate functional.) Condition (6.12) can now be expressed as and for the right-hand side of (6.13), we get Relabelling Â as A and û as u, we conclude that it suffices to show (6.21). This bound will be established in Steps 2-4.
Step 2. The strategy to prove (6.21) consists of decomposing C([0, 1]) into a set of paths with high Gaussian energy and a finite number of small balls with lower Gaussian energy. One can use the Gaussian large deviation bound (6.16) to bound the probability of the high-energy set, which we arrange to be of higher exponential order by choosing the energy threshold high enough. For the balls with lower Gaussian energy, the expectation over a given ball can be estimated by bounding an exponential factor by its supremum on that ball and then bounding the Gaussian probability of the ball using (6.16) again. Finally, one has to sum over all the balls. As the total number of balls is finite and the bounds decay exponentially, the largest summand determines the behavior.
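Schematically, and suppressing the precise rescaling prefactors, the decomposition just described takes the form (with G(u) standing for the potential term in the exponent, B for the high-energy set, and B(u_k, δ̃) for the covering balls):

```latex
\int_A e^{-\frac{1}{\varepsilon} G(u)}\, dW(u)
\;\le\; \sum_{k\,:\,B(u_k,\tilde\delta)\cap A \neq \emptyset}
e^{-\frac{1}{\varepsilon}\inf_{B(u_k,\tilde\delta)\cap A} G}\;
W\bigl(B(u_k,\tilde\delta)\bigr) \;+\; W(B),
\qquad B := \complement \bigcup_k B(u_k,\tilde\delta).
% The Gaussian bound (6.16) controls W(B) and each W(B(u_k,\tilde\delta));
% summing finitely many exponentially small terms costs only the largest exponent.
```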
The main difference with respect to the classical argument in [dH00] is that we choose a partition of C([0, 1]) into sets that do not depend on A. This is necessary to ensure that the number of balls is independent of A. The price we have to pay is that on the right-hand side of (6.21) we take the infimum over the small neighborhood B(A, δ) of A instead of taking it over A only, as in the classical argument.
Let us now give the details. First, fix γ < 1 and let The sublevel set K_{ℓ_+R} := {u : I_{0,1}(u) ≤ ℓ_+R} is compact in C([0, 1]), and we can cover it by a finite number N_{δ̃,ℓ_+R} of open balls B(u_k, δ̃) of radius δ̃, where u_k ∈ K_{ℓ_+R} for each k. Note that A does not enter here, so both the profiles u_k and the number N_{δ̃,ℓ_+R} depend only on γ, δ, ℓ_+R, and sup_{|v| ≤ √(2^{−1}(ℓ_+R+1))+M+1} |V′(v)|, not on the set A or the specific choice of x_±, u_±. Actually, it can be checked using the Hölder continuity of functions with bounded H¹-norm that this number grows like exp(C(√(Rℓ_+) δ̃^{−1})²). Using this covering and the positivity of V, we have for any set A that

Step 3. The last term in (6.23) can now easily be bounded. The set B := ∁∪_k B(u_k, δ̃) is closed, and by definition inf_{u∈B} I_{0,1}(u) ≥ ℓ_+R. Hence, the Gaussian large deviation bound (6.16) implies that there exists an ε̃_0 > 0 such that, for ε ≤ ε̃_0, we have Now we choose ε_0 = ε̃_0 ℓ_+^{−1}. Then, for ε ≤ ε_0, we can conclude that

Step 4. It remains to bound the sum on the right-hand side of (6.23). Since the number of summands N_{δ̃,ℓ_+R} remains constant as ε ↓ 0, the sum is dominated by the largest summand. Specifically, after fixing γ, δ, ℓ_+R, and M, we can choose ε_0 > 0 sufficiently small so that ε ≤ ε_0 implies Hence, up to an extra factor of γ, it is sufficient to obtain a good exponential bound on the largest summand on the right-hand side of (6.23).
If B(u_k, δ̃) ∩ A is empty, the corresponding summand is zero. Otherwise, we have Due to the lower semi-continuity of I_{0,1}, we can choose ũ_k ∈ B(u_k, δ̃) so that Then the first factor in (6.27) can be bounded above by and we need a bound on ‖ũ_k‖_∞. First we recall that u_k ∈ K_{ℓ_+R}, which by definition gives I_{0,1}(u_k) ≤ ℓ_+R. Together with the definition of ũ_k, this gives Recalling the homogeneous boundary conditions, this implies that Hence, the definition (6.22) of δ̃ implies that the bound in (6.29) improves to On the other hand, the Gaussian large deviation bound (6.16) and the definition (6.28) of ũ_k imply that for every k there exists an ε_0 > 0 such that for ℓ_+ε ≤ ε_0 we have
W^{0,0}_{ℓε,(0,1)}(B(u_k, δ̃)) ≤ exp(−(1/(ℓε))(I_{0,1}(ũ_k) − 2γ)). (6.31)
As there are only finitely many u_k (the selection of which does not depend on A), we can find an ε_0 such that this bound holds for all ũ_k simultaneously and such that (6.25) holds as well. Substituting (6.30) and (6.31) into (6.27) gives for each k that After relabelling γ (for instance by a factor of 6), the above bound together with (6.23), (6.25), and (6.26) finishes the proof of (6.21).
The proof of the lower bound (6.15) relies on the classical Cameron-Martin Theorem. In the current context it can be stated as follows.
In that case the Radon-Nikodym derivative is given by Here, as in the case of Brownian motion, the stochastic integral term (1/ε)∫_{x_−}^{x_+} ∂_x f(x) du(x) can be defined as the limit of Riemann sums in L²(W^{0,0}_{ε,(x_−,x_+)}). In particular, it is a linear mapping in u defined for all u in a measurable subspace of C([x_−, x_+]) of full measure (see e.g. [Hai09, Sec. 3]).
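The derivative itself is displayed in (6.32), which is not reproduced above; the classical Cameron-Martin form consistent with the stochastic-integral term just quoted would read (our sketch, with W^f denoting the image of W^{0,0}_{ε,(x_−,x_+)} under the shift u ↦ u + f, for f ∈ H¹ with f(x_±) = 0):

```latex
\frac{dW^{f}}{dW^{0,0}_{\varepsilon,(x_-,x_+)}}(u)
= \exp\Bigl( \frac{1}{\varepsilon}\int_{x_-}^{x_+} \partial_x f(x)\,du(x)
\;-\; \frac{1}{2\varepsilon}\int_{x_-}^{x_+} \bigl(\partial_x f(x)\bigr)^2\,dx \Bigr).
```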
Note that (6.32) can formally be derived by expanding the square in the nonrigorous expression (3.3).
Proof of Lemma 6.3. We can assume that u* ∈ H¹, because otherwise the bound is trivial. As in the proof of the upper bound, (6.15) only gets stronger when we take a smaller δ. Therefore, it is sufficient to show (6.15) with δ replaced by δ̃. We begin by stating the simple bound Due to the assumption (6.14) on u* and the definition (6.33) of δ̃, we get that sup_{u∈B(u*,δ̃)} It only remains to derive a lower bound on W^{u_−,u_+}_{ε,(x_−,x_+)}(B(u*, δ̃)) in terms of the Gaussian energy. To this end, we again transform u* to an interval of length one and shift it so that it satisfies homogeneous boundary conditions, as in the proof of Lemma 6.2. To be more precise, we assume that u is distributed according to W^{u_−,u_+}_{ε,(x_−,x_+)} and apply the affine transformation T defined in (6.17). Then û = T u is distributed according to W^{0,0}_{ℓε,(0,1)}. Therefore, we have to bound the probability W^{0,0}_{ℓε,(0,1)}(B(û*, δ̃)), where û* := T u*. This can be obtained using the Cameron-Martin Theorem 6.5 with f := û*. According to (6.32), we have Now we will use the trick of sneaking in a cosh function. To this end, we remark that the map û ↦ ∫_0^1 ∂_x û*(x) dû(x) is linear in û. Also, the measure W^{0,0}_{ℓε,(0,1)} is invariant under the mapping û ↦ −û, and this mapping leaves the ball B(0, δ̃) invariant. Hence, the last expectation is equal to Therefore, we can write ≥ W^{0,0}_{ℓε,(0,1)}(B(0, δ̃)).
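Spelled out, the cosh step runs as follows (a sketch; X(û) := ∫_0^1 ∂_x û*(x) dû(x) is linear, hence odd under û ↦ −û, while both W^{0,0}_{ℓε,(0,1)} and the ball B(0, δ̃) are invariant under this sign flip):

```latex
\int_{B(0,\tilde\delta)} e^{\frac{1}{\ell\varepsilon} X(\hat u)}\, dW^{0,0}_{\ell\varepsilon,(0,1)}(\hat u)
= \int_{B(0,\tilde\delta)} \frac{e^{\frac{1}{\ell\varepsilon} X(\hat u)} + e^{-\frac{1}{\ell\varepsilon} X(\hat u)}}{2}\, dW^{0,0}_{\ell\varepsilon,(0,1)}(\hat u)
= \int_{B(0,\tilde\delta)} \cosh\Bigl(\tfrac{1}{\ell\varepsilon} X(\hat u)\Bigr)\, dW^{0,0}_{\ell\varepsilon,(0,1)}(\hat u)
\;\ge\; W^{0,0}_{\ell\varepsilon,(0,1)}\bigl(B(0,\tilde\delta)\bigr),
```

since cosh ≥ 1; the precise prefactor in the exponent depends on the scaling conventions in (6.32).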
We claim that there exists an ε_0 > 0 such that for all ℓ ≤ ℓ_+ and all ε ≤ ε_0 this probability is larger than exp(−ε^{−1}γ). Actually, (6.16) even implies that for any γ̃ > 0 there exists ε̃_0 > 0 such that, for ℓε ≤ ε̃_0, we have the stronger bound Note that this ε_0 also depends on sup_{|v|≤M+1} |V′(v)|, as we have potentially decreased δ in the first step. Then, in order to conclude, it is sufficient to observe that

Now the proofs of Propositions 3.4 and 3.5 are straightforward. We begin with the upper bound, Proposition 3.4.

Proof of Proposition 3.4. We want to derive a bound on
(6.35) The assumptions on A in Proposition 3.4 are identical to those in Lemma 6.2, so we can conclude from (6.13) that for ε ≤ ε_0. This ε_0 depends on M, R, ℓ_+, δ, and γ, but not on the particular choice of x_±, u_±. It depends on A only through the condition (6.12) and on V only through the local Lipschitz constant.
To get a lower bound on the denominator in (6.35), we observe that for every set of boundary conditions u_±, there exists at least one minimizer u* of E given these boundary conditions. Furthermore, this minimizer attains values only in [−M, M]. This is clear because replacing u* by (u* ∧ M) ∨ (−M) can only decrease the energy. Therefore, for any δ > 0, we get from (6.15) that for ε ≤ ε_0, where ε_0 satisfies the same uniformity assumptions as above. This finishes the argument.
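The truncation argument can be written out in one line (a sketch; it uses that |∂_x u_M| ≤ |∂_x u| almost everywhere and that, for instance for the canonical potential, V is nondecreasing on [1, ∞), so truncating at a level M ≥ 1 does not increase V pointwise):

```latex
u_M := (u \wedge M) \vee (-M)
\quad\Longrightarrow\quad
E(u_M) = \int \frac12 (u_M')^2 + V(u_M)\,dx
\;\le\; \int \frac12 (u')^2 + V(u)\,dx = E(u).
```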
The proof of the lower bound is similar.
Proof of Proposition 3.5. To derive a lower bound on µ^{u_−,u_+}_{ε,(x_−,x_+)}(A) for a given γ, we choose u_γ as in (3.19). Then, using (6.15), we can write for ε ≤ ε_0, where ε_0 can again be chosen uniformly.
To derive a uniform upper bound on the normalization constant, we only need to observe that for any M < ∞ there exists an R < ∞ such that for all Then (6.13) implies that there exists ε_0 > 0 such that, uniformly for ε ≤ ε_0, This establishes (3.18).

Proof of the one-point distribution lemma.
Proof of Lemma 4.1. First we remark that, heuristically, the "most difficult" point to consider is x_0 = 0. We present the following proof for precisely this case. The same proof carries over to any point x_0 (with only trivial modifications), but we present it for x_0 = 0 since this simplifies the notation slightly and makes the main ideas stand out. Also notice that, by the symmetry of the potential (cf. Assumption 1.1) and the representation (1.5), it suffices to prove In fact, it will be convenient to establish the estimate in the form which is of course equivalent for C̃_2 := 4C_2. Thus, consider the set of functions Define x^{3M}_± as follows: Notice that we may assume without loss of generality that M ≥ 1, and hence, because of the boundary conditions u(−L_ε) = −1 and u(L_ε) = 1, the points x^{3M}_− < 0 < x^{3M}_+ are well-defined for every u ∈ A. The set A can then be divided into the following two sets: To bound the probability of A_1, we will use bounds on the potential and a reflection argument. For A_2, we will use a rescaling argument and the large deviation bound (3.17). The two cases are illustrated in Figure 6.1.
Step 1. We treat A_1 first. For u ∈ A_1 we have The idea is to introduce a reflection across the line u = 2M that preserves the Gaussian measure, and to use the decrease of the energy (1.6) under this reflection. We begin by collecting some facts about the potential V. To begin with, according to the growth estimate in (1.3), V grows superlinearly at infinity. Hence, we may choose C_3 sufficiently large so that the following two properties are satisfied. On the one hand, V grows at least linearly on [C_3, ∞), i.e., there exists C_4 < ∞ such that for u_1 ≥ u_2 ≥ C_3, there holds (6.39) On the other hand, V(C_3) ≥ V(0), so that in particular

FIGURE 6.1. The two different cases. To show that A_1 has small probability, we reflect between the x^{2M}_±. This decreases the potential energy. The probability of A_2 can be bounded using a large deviation argument.
We will use the fact that (6.39) and (6.40) together imply that as long as u_1 ≥ C_3, then (6.41) Now we are ready to reflect. Define x^{2M}_± analogously to x^{3M}_± (noting as above that they are well-defined for paths in the set of interest). Consider the reflection R^{x^{2M}_+}_{x^{2M}_-} defined in (6.42), which for the purposes of this lemma we abbreviate as R. In order to have R well-defined for all continuous paths u, we define it to be the identity for those paths u that never exceed the level 2M.
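For orientation, reflection across the line u = 2M between the two stopping points takes the following form (a sketch of the standard convention; the precise operator is the one defined in (6.42)):

```latex
(Ru)(x) :=
\begin{cases}
4M - u(x), & x \in [x^{2M}_-,\, x^{2M}_+],\\[2pt]
u(x), & \text{otherwise}.
\end{cases}
```

Since u(x^{2M}_±) = 2M = 4M − 2M, the reflected path is again continuous, and R is an involution.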
Notice that x 2M − is a right but not a left stopping point, and similarly x 2M + is a left but not a right stopping point. In particular, the strong Markov property (3.9) does not directly imply that R leaves W −1,1 ε,(−Lε,Lε) invariant. Indeed, it is not true that under W −1,1 ε,(−Lε,Lε) the conditional distribution of u(x) for x ∈ [x 2M − , x 2M + ], given the path outside of this interval, is a Brownian bridge.
Still, it is true that the reflection operator R preserves W^{−1,1}_{ε,(−L_ε,L_ε)}. To see this, introduce the auxiliary stopping points χ^{2M}_- := inf{x ≥ −L_ε : u(x) = 2M} and χ^{2M}_+ := sup{x ≤ L_ε : u(x) = 2M}. As above in (4.18), we use the convention that χ^{2M}_± = ∓L_ε if these sets are empty. On A, these points are well-defined and we automatically have The points χ^{2M}_± are left and right stopping points, respectively. Therefore, (3.9) implies that the reflection operators R_± (defined in the same way as R) preserve W^{−1,1}_{ε,(−L_ε,L_ε)}. Observing that we conclude that R also preserves W^{−1,1}_{ε,(−L_ε,L_ε)}. We now develop a quantitative, pointwise estimate of the effect of R on the "bulk energy" V(u). By the definition of x^{2M}_±, we have u(x) ≥ 2M for all x ∈ [x^{2M}_-, x^{2M}_+], the set where R acts. Hence, it suffices to consider the effect of R when u(x) ≥ 3M and when u(x) ∈ [2M, 3M). We will first establish that on the set where u(x) ≥ 3M, R decreases the bulk energy significantly. Indeed, on this set, |Ru| ≤ u − 2M and which together with (6.39) implies that for u(x) ≥ 3M, (6.45) Combining (6.44) and (6.45) implies that for all u ∈ A_1 we have We are now ready to estimate the probability of A_1. Indeed, we have that where Z = Z^{−1,1}_{ε,(−L_ε,L_ε)} is the normalization constant for µ^{−1,1}_{ε,(−L_ε,L_ε)} and all of the integrals are over [−L_ε, L_ε]. Moving the exponential to the other side of the inequality, we get that which gives (6.36) for A_1 with C̃_2 = C_4/2.
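The mechanism of this last step can be summarized schematically as follows; here ∆ stands for the uniform gain in bulk energy under R on A_1 (an assumption of this sketch, standing in for the quantitative pointwise estimate above), and W abbreviates W^{−1,1}_{ε,(−L_ε,L_ε)}:

```latex
\mu(A_1)
  = \frac{1}{Z}\int \mathbf{1}_{A_1}(u)\, e^{-\frac{1}{\varepsilon}\int V(u)\,dx}\, W(du)
  \le \frac{e^{-\Delta/\varepsilon}}{Z}\int \mathbf{1}_{A_1}(u)\, e^{-\frac{1}{\varepsilon}\int V(Ru)\,dx}\, W(du)
  = \frac{e^{-\Delta/\varepsilon}}{Z}\int \mathbf{1}_{A_1}(Rv)\, e^{-\frac{1}{\varepsilon}\int V(v)\,dx}\, W(dv)
  \le e^{-\Delta/\varepsilon},
```

where the middle equality uses that R is a W-preserving involution, and the final inequality bounds the integral over R(A_1) by the full normalization constant Z.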
Step 2. Now consider the set A_2. Here we will use a rescaling argument and the large deviation bound (3.17). For u ∈ A_2, we can define with the understanding that χ_± = 0 if these sets are empty. These random variables are left and right stopping points. Hence, the strong Markov property (3.10) implies that Therefore, if we can show that for all ε sufficiently small (uniformly for −x_-, x_+ ∈ (0, 1]), then the combination of (6.48) and (6.49) concludes the proof of (6.36). We can see (6.49) by rescaling. Indeed, we transform (x_-, x_+) into [−1, 1] by applying an affine change of variables; in the resulting expression, Ẑ is the normalization constant for µ^{3M,3M}_{ε,(x_-,x_+)} and ε̃ := ½ ε∆x. Now, we observe that the family of potentials is locally uniformly Lipschitz. In particular, applying Proposition 3.4 with γ and δ fixed, say γ = δ = 1, there exists ε_0 > 0 such that, for ε̃ ≤ ε_0 and uniformly in x_±, we have Note that the choice of ε_0 depends on M. Here we use the notation and we will in fact establish the stronger bound We will establish the first inequality by way of a variational argument. Notice that we may assume that the infima are achieved (if not, a simple approximation argument suffices), and so let Observe that automatically û_1((x_- + x_+)/2) ≥ 4M − 1.
We define the auxiliary function û_3 := min{û_1, 3M}. Notice that, according to the growth assumption (1.3) (or see (6.39)): On the other hand, since û_3 ∈ B_bc and û_2 is the minimizer over B_bc, we have This concludes the proof of (6.36) for A_2 and establishes the lemma.
6.5. Proofs of lemmas from the lower bound of Theorem 1.5.
Proof of Lemma 4.3. Since the proof is similar to (and simpler than) the proof of the upper bound in Theorem 1.5, we will be somewhat brief. Our goal is to bound the probability of the complementary event, namely that u(x) > 0 for some x ∈ [−L_ε, −2ℓ] or that |u(x_k)| > M for some k in the index set, above by 2/3. As in the proof of the upper bound, the probability that |u(x_k)| > M can be shown to be exponentially small in M/ε, cf. (4.9). It remains to bound above the probability that u(x) > 0 for some x ∈ [−L_ε, −2ℓ] while |u(x_k)| ≤ M for all k. Now fix δ > 0 sufficiently small so that the estimates from the upper bound of Theorem 1.5 apply. The set {u ∈ ∁A_1 : u(x) > 0 for some x ∈ [−L_ε, −2ℓ]} is contained in the union of: (1) functions with more than one δ-layer (exponentially unlikely by the upper bound of Theorem 1.5), (2) functions with a δ-layer longer than 2ℓ (exponentially unlikely for δ²ℓ large, according to the calculation in Step 3 of the proof of the upper bound, cf. (4.13)), (3) functions with one and only one δ-layer, which has length at most 2ℓ and is contained in [−L_ε, 0], (4) functions with one and only one δ-layer, which has length at most 2ℓ and is contained in [−2ℓ, L_ε], and such that u(x) > 0 for some x ∈ [−L_ε, −2ℓ]. By the symmetry properties of the measure, i.e. the symmetry with respect to point reflection of the graph at x = 0 and u = 0, the probability of a δ-layer contained in [−L_ε, 0] equals the probability of a δ-layer contained in [0, L_ε]; hence neither can exceed 1/2. Therefore, the probability of the event described in point (3) is at most 1/2.
By the calculations referred to above, the sum of the probabilities of the sets described in (1)-(3) is bounded by 1/2 plus exponentially small terms, so we are finished if we can show that the probability of the set described in (4) is also exponentially small, namely, the probability that: u(x) > 0 for some x ∈ [−L_ε, −2ℓ], |u(x_k)| ≤ M for all k, and there is one and only one δ-layer, which has length at most 2ℓ and is contained in [−2ℓ, L_ε]. Note that the latter implies that u ≤ 1 − δ on This bound is easy to obtain by breaking into subintervals (using conditioning) and using the large deviation estimate (3.17). Indeed, we reduce to probabilities of the form where u_{k−2} and u_{k+2} are arbitrary boundary values in [−M, 1 − δ] and k ∈ {−(N_ε − 2), −(N_ε − 3), . . . , −3}. (We also need to consider the boundary interval, where x ∈ (x_{−N_ε}, x_{−(N_ε−2)}). As usual, this is no more difficult than the bound for the interior intervals.) After applying Proposition 3.4 (with δ̃ = δ/2), it remains only to introduce an energetic bound. The bound from Lemma 6.6 below suffices.
Before stating the energy lemma, we explain the idea in words: if we take a δ/2 ball around the set of interest, then on [x_{k−1}, x_{k+1}] there is a point x_0 such that u(x_0) ≥ −δ/2. For ℓ large, the energy minimizer must come very close to ±1 somewhere in [x_{k−2}, x_{k−1}] and in [x_{k+1}, x_{k+2}] (say within δ/4), and since it cannot come this close to +1, it is forced into a small neighborhood of −1. Consequently, the large excursion from −1 at x_0 costs almost c_0 in energy. We give the precise statement below and prove the lemma at the end of the subsection. Proposition 3.4 and Lemma 6.6 together give Finally, we now choose γ and δ sufficiently small and sum over the order N_ε ∼ L_ε intervals. Bearing in mind the bound (1.10) on L_ε, we conclude that the final set we have studied also has exponentially small probability.
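The constant c_0 appearing here is, for the canonical double-well potential V(u) = ¼(1 − u²)² mentioned in the introduction, explicitly computable: c_0 = ∫_{−1}^{1} √(2V(s)) ds = 2√2/3 ≈ 0.943. The following minimal sketch (the choice of potential and the quadrature routine are illustrative, not taken from the paper) checks this numerically.

```python
import math

def V(u):
    # Canonical double-well potential (the illustrative choice from the introduction)
    return 0.25 * (1.0 - u * u) ** 2

def transition_cost(n=100_000):
    # c_0 = \int_{-1}^{1} sqrt(2 V(s)) ds, approximated by the midpoint rule
    a, b = -1.0, 1.0
    h = (b - a) / n
    return h * sum(math.sqrt(2.0 * V(a + (i + 0.5) * h)) for i in range(n))

c0 = transition_cost()
print(c0)                          # numerically ~ 0.9428
print(2.0 * math.sqrt(2.0) / 3.0)  # closed form 2*sqrt(2)/3
```

The midpoint rule converges quadratically here since the integrand (1 − u²)/√2 is smooth on [−1, 1].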
Proof of Lemma 6.6. We will be brief, since the proof is similar to the proof of Lemma 2.5. First of all, fix M large and δ small. The infimum of the energy over A_bc is less than or equal to the minimum of the energy over functions with u(±2ℓ) = u_± and u(0) = −1. By a standard construction, we have In particular, for ℓ_0 large enough and ℓ ≥ ℓ_0, one has (6.52) By iterated rescaling and application of the large deviation bounds, we show that the paths relax to an O(ε^{1/2})-neighbourhood of 1 within a distance of O(|log ε|).
On the other hand, on A bc 0 , either there exist x − ∈ [−2ℓ, −ℓ] and x + ∈ [ℓ, 2ℓ] such that |u(x ± ) + 1| ≤ δ/2 or we have u ∈ [−1 + δ/2, 1 − δ/2] on an interval of length ℓ. In the latter case, we get easily Since this is higher order for ℓ large, we may assume that we are in the former case.
In the former case, we may assume without loss of generality that u(x_±) = −1 + δ and u(x_0) = −δ/2. We then use the Modica-Mortola trick to connect the values (a) u_- and u(x_-), (b) u(x_-) and u(x_0), (c) u(x_0) and u(x_+), and (d) u(x_+) and u_+. We conclude in the usual way that Together with (6.52), this completes the proof of Lemma 6.6.

6.6. Proofs of lemmas related to the uniform distribution.
Proof of Lemma 5.1. Our argument relies on an iterated rescaling, illustrated in Figure 6.2.
We will define K = K ε ≥ 1 below. We begin by enumerating the partition
We will use the elementary facts from probability that for any sets A 1 , A 2 , and A 3 , we have (6.54) We also use the Markov property from Lemma 3.2 to deduce the following property for conditional measures.
Keeping these preliminaries in mind, we now observe that we can make the following decomposition: For the first term, we can now send the smallness condition into the boundary conditions in the following way: We can iterate this argument to reduce the probability to the form: Hence it remains to estimate the individual terms in the sum. The argument involves three steps: a large deviation estimate, concatenation, and an iterated rescaling of the deviation of u from 1.
Step 1: Large deviation estimate. The first step is to derive a uniform large deviation bound for the measures µ u − ,u + ε,(−3ℓ 0 ,3ℓ 0 ) . We show that there exists C < ∞ such that for every ℓ 0 < ∞ sufficiently large, there exists ε ′ 0 > 0 such that for any u ± ∈ [1/2, 3/2] and ε ≤ ε ′ 0 , we get In the next steps, we will always assume ε 0 ≤ ε ′ 0 to be sufficiently small in this sense, and this is the only restriction on ε 0 in the proof of the lemma.
To bound the conditional probability in (6.57), it suffices to establish an upper bound on and |u(±ℓ_0) − 1| ≤ 1/2 (6.58) and a lower bound on (6.59), uniformly with respect to u_± ∈ [1/2, 3/2]. To this end, we turn to the uniform large deviation estimates from Propositions 3.4 and 3.5. In fact, we do not even need the second condition in (6.58), and it suffices to bound the probability of the larger set The estimate (3.17) gives that for any γ, δ > 0, we have for sufficiently small ε that where ∆E is defined in (3.13) and Consider now a small δ > 0, to be fixed below, and a function u ∈ B(A_0, δ).
Because the boundary conditions are in [1/2, 3/2] and ℓ_0 is large, the infimum of the energy must take place over functions such that max min (6.61) (Indeed, u must be close to either 1 or −1 at some point in each of the intervals, and if u were instead close to −1 on either interval, satisfying the boundary conditions would lead to an even greater energetic cost than the one we will arrive at below.) Let us label the minimizing points x_- and x_+. Moreover, let us define x_* to be a point in (−ℓ_0, ℓ_0) such that As above in Subsection 6.1, we now define ϕ(u) := |∫_1^u √(2V(s)) ds| and apply the "Modica-Mortola trick" on (−3ℓ_0, x_-), (x_-, x_*), (x_*, x_+), and (x_+, 3ℓ_0) to obtain where ϕ_{1/4} := min{ϕ(3/4), ϕ(5/4)}.
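For the reader's convenience, the "Modica-Mortola trick" invoked here and below is the elementary chain of inequalities (written assuming the energy density is ½(u′)² + V(u); this normalization is an assumption of the sketch):

```latex
\frac12 (u')^2 + V(u)
  \;\ge\; \sqrt{2V(u)}\,\lvert u' \rvert
  \;=\; \Big|\frac{d}{dx}\,\varphi(u)\Big|,
\qquad
\int_a^b \Big(\tfrac12 (u')^2 + V(u)\Big)\,dx
  \;\ge\; \big|\varphi(u(b)) - \varphi(u(a))\big|,
```

where the first inequality is Young's inequality applied to |u′| and √(2V(u)); the resulting lower bound depends only on the values of u at the endpoints.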
On the other hand, a standard construction gives Now fixing δ > 0 and γ > 0 sufficiently small, the combination of (6.60), (6.62), and (6.63) gives for sufficiently small ε that We now remark that the lower bound on (6.59) follows easily from Proposition 3.5. Indeed, for a fixed 0 < δ < 1/2, the set of interest can be written as the δ ball around the set A_1 defined as We recover for any γ > 0 and for ε > 0 sufficiently small that where ∆E is defined in (3.13). The constraint in A_1 is not active in the optimization for ℓ_0 sufficiently large, and the usual construction together with the usual Modica-Mortola estimate thus gives Plugging back into (6.65) gives which together with (6.64) gives (6.57) with C = 1/ϕ_{1/4}, as long as γ is chosen sufficiently small.
Step 2: Concatenation. The next step is to prove for any K ∈ N that uniformly for u_± ∈ (1/2, 3/2). As usual, the idea is to break the larger interval up by conditioning on the boundary values. Since u ∈ A restricts the boundary values on each subinterval to (1/2, 3/2), we will be able to apply the uniform estimate from Step 1.
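Schematically, the concatenation runs as follows: conditioning on the values of u at the interior gridpoints makes the restrictions to the subintervals conditionally independent, each governed by a one-interval measure (the notation A_j for the local events and (a_j, b_j) for the subintervals is ad hoc for this sketch):

```latex
\mu\Big(\bigcap_{j=1}^{K} A_j\Big)
  = \mathbb{E}\Big[\prod_{j=1}^{K} \mu^{u(a_j),\,u(b_j)}_{\varepsilon,(a_j,b_j)}\big(A_j\big)\Big]
  \le \Big(\sup_{u_\pm \in [1/2,\,3/2]} \mu^{u_-,u_+}_{\varepsilon,(-3\ell_0,3\ell_0)}\big(A_{\mathrm{loc}}\big)\Big)^{K},
```

where the restriction u ∈ A forces each pair of boundary values into [1/2, 3/2], so the uniform bound from Step 1 controls every factor.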
Step 3: Iterated rescaling. First consider how the sets involved in (6.56) behave under the rescaling. Notice that u satisfies Similarly, for the set on which we condition, we have that u satisfies Hence each term in (6.56) can be bounded if we can establish that the bound from Step 2 also holds for the measure governing û.
In order to show that the estimates from Steps 1 and 2 hold uniformly for the measure of the rescaled random variables û, we need to be able to invoke Propositions 3.4 and 3.5 (with uniform constants). This in turn requires uniform control on the boundary values, the minimum energy Ê over the sets of interest, and the Lipschitz constant of V̂. The boundary values are easy: on the sets of interest, the boundary values satisfy u_± ∈ (1/2, 3/2). On the other hand, the minimum of the energy Ê is bounded uniformly with respect to k on the sets of interest. Indeed, consider the set where |u(x) − 1| ≥ 1/2^{k+1} and let Ĉ denote the image of this set under the transformation u → û. By the usual method (the "Modica-Mortola trick" for the lower bound and a construction for the upper bound), one can check that there exists R < ∞ such that, for every k ∈ N, one has Finally, because of Assumption 1.1, we have a uniform bound on the Lipschitz constant of V̂. Indeed, let C := 3/2 + 2ℓ_0 R + 1. Then, uniformly with respect to k ∈ N, the potential V̂ satisfies where the supremum is taken over τ ∈ [1 − 2(C + 1), 1 + 2(C + 1)].
Hence, the potential satisfies the requirements of Propositions 3.4 and 3.5. The remaining requirement in order to invoke large deviation theory is that which is true if Therefore we choose K to be an integer satisfying (6.69) With the restriction (6.69) on K, the arguments used in Steps 1 and 2 carry over to the rescaled measures governing the û.
We are now ready to complete the argument. Indeed, recalling the decomposition from (6.56), we have From the preceding argument, we can now apply the estimate (6.66) for the rescaled measures to bound the k th summand above by Substituting into the right-hand side of (6.70), we deduce for r ∈ (0, 1/4].
Proof of Lemma 5.2. We start by defining some sets. We denote the set of paths that we condition on by For ε, ε 0 > 0 let us also fix the following subset of A Then Lemma 5.1 implies in particular that, for a small but fixed ε 0 > 0 and for ε ≤ ε 0 , we have From now on, we fix an ε 0 such that this identity holds. This will be the only restriction on ε 0 . Let us also introduce a notation for the set of paths that have a hitting point of 1 As a slight abuse of notation we will use the same letter B to denote the set of paths u ∈ B restricted to [−ℓ 0 , ℓ 0 ].
Using the Markov property (3.7), we get for any u_± ∈ [1/2, 3/2] that Our main task is thus to derive a lower bound for the probabilities µ^{u_-,u_+}_{ε,(−ℓ_0,ℓ_0)}(B) (6.73) that holds uniformly in the boundary conditions. In view of the definition of A_ε, it is sufficient to consider boundary conditions u_± that are O(ε^{1/2})-close to 1: As in the proof of Lemma 5.1, we rescale the random profile u around 1, this time by a factor ε^{−1/2}. More precisely, we consider the transformation According to its definition, a path u is in the set B if and only if û is in B. Hence, we can express the probability (6.73) in terms of û. The random variable û is distributed according to a rescaled version of µ^{u_-,u_+}_{ε,(−ℓ_0,ℓ_0)}. The variance of the Gaussian reference measure becomes one and the rescaled boundary values are û_± := ε^{−1/2}(u_± − 1) + 1. Note that condition (6.74) implies that these rescaled boundary conditions take values in an order-one interval around 1. More precisely, the distribution of û is absolutely continuous with respect to W^{û_-,û_+}_{1,(−ℓ_0,ℓ_0)}, and the Radon-Nikodym density of the rescaled measure is proportional to exp(−∫_{−ℓ_0}^{ℓ_0} V̂(û) dx), where V̂(û) := ε^{−1} V(ε^{1/2}(û − 1) + 1). Hence we can rewrite the probability in the form (6.75) The denominator of this expression can trivially be bounded above by 1. To get a lower bound for the numerator, we can write for example Hence it remains to get a lower bound on the second term in (6.76). As above in the proof of Lemma 5.1, Assumption 1.1 on V and Taylor's formula imply that V̂ satisfies sup_{|u−1| ≤ 3ε_0^{−1/2}} sup_{ε∈(0,1)} ε^{−1} V(ε^{1/2}(u − 1) + 1) =: C < ∞.
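The effect of this rescaling on the reference measure can be recorded in one line (a sketch; the Gaussian-bridge interpretation of W is the standard one):

```latex
\hat u(x) := \varepsilon^{-1/2}\big(u(x) - 1\big) + 1
\quad\Longrightarrow\quad
\operatorname{Var}\big(\hat u(x)\big) = \varepsilon^{-1}\operatorname{Var}\big(u(x)\big),
```

so a Gaussian bridge with variance proportional to ε becomes a bridge with unit variance, while the Gibbs weight transforms as exp(−(1/ε)∫V(u) dx) = exp(−∫ ε^{−1}V(ε^{1/2}(û − 1) + 1) dx).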
Proof of Lemma 5.4.
Step 1. We begin by ruling out long layers to the left and to the right of Y . Once we know that layers are bounded in length, we can use a reflection argument as in the proof of Theorem 1.5 to turn them into wasted excursions and estimate their probability. To this end, we define the set A Y,2 of functions that are bounded away from ±1 on a whole subinterval outside of Y : A Y,2 := {u ∈ J Y : there exists a k with k ≤ k − or k ≥ k + − 1 such that As usual, we note that A Y,3 is contained within A Y,2 ∪ (A Y,3 ∩ ∁A Y,2 ). Our first step is to show that (5.6) holds for A Y,2 . In fact, A Y,2 is of higher order for M and δ 2 ℓ sufficiently large. The set A Y,2 can be written in the obvious way as the union of sets A k Y,2 that have bad behavior on a given subinterval [x k , x k+1 ]. Without loss of generality, suppose that k ≤ k − .
Then we introduce the following sets for a Markovian decomposition: We remark that Consequently, the decompositions A k Y,2 = A ⊖ k ∩ A ⊙ δ,k ∩ A ⊕ k and J Y = A ⊖ k ∩ A ⊙ k ∩ A ⊕ k lend themselves to an application of the Markov property from Lemma 3.2. We will often use such decompositions in the proofs below.
In the proof at hand, the Markov property from Lemma 3.2 gives It suffices to bound the ratio of expectations on the right-hand side. For the denominator, we observe that for M sufficiently large. In fact, this bound follows immediately from the large deviation bound (3.17) and a simple energy estimate applied to the complement. Hence, it suffices to bound the numerator. Recalling the bound (4.12), the expectation in the numerator can be estimated by For δ²ℓ sufficiently large, this drops below the threshold expressed in the exponential in (5.7). Hence, summing the probabilities of A^k_{Y,2} over k, the probability of A_{Y,2} is negligible in the sense that, in order to establish (5.7), it suffices to show that it holds for Ã_{Y,3} := A_{Y,3} \ A_{Y,2}. For ease of notation, we drop the tildes for the remainder of the proof of the lemma.
Step 2. We will now show the desired bound for A Y,3 . That is, we will show that for any γ > 0 there exists an ε 0 > 0 such that for all ε ≤ ε 0 we have The proof uses a reflection argument very similar to the argument in the proof of the upper bound in Theorem 1.5.
We will only give the bound for the set A^-_{Y,3}; the proof of the corresponding bound for A^+_{Y,3} follows in the same way. The set A^-_{Y,3} is contained in the union over k from −(N_ε − 1) to k_- of the sets A^{-,k}_{Y,3} := {u ∈ ∁A_1 : u has a δ-up layer contained in [x_{k−1}, x_{k+1}] and a δ-up layer of length ≤ 2ℓ in Y}.
As in the proof of Theorem 1.5, we will transform the additional δ-transition layer into a wasted δ-excursion to control the probability. We need to reflect in such a way as to (a) create a wasted excursion in [x_{k−1}, x_{k+1}] and (b) leave at least one δ-up layer in Y. To this end, we define the left stopping point capturing the additional δ-up layer χ_- := inf{x > x_{k−1} : u(x) = 0 and u(y_1) = −1 + δ for some y_1 ∈ (x_{k−1}, x)} and the right stopping point χ_+ := sup{x ≤ y_+ : u(x) = 0 and there exist y_1 < y_2 both in (x, y_+) with u(y_1) = −1 + δ, u(y_2) = 1 − δ}, where y_+ := sup Y is the right boundary of Y. As before, we will use the convention that χ_± = ∓L_ε if these sets are empty. As in the proof of Theorem 1.5, the reflection operator R = R^{χ_+}_{χ_-} reflects the paths u between the stopping points χ_± while preserving µ^{−1,1}_{ε,(−L_ε,L_ε)}. On the other hand, it maps the set A^{-,k}_{Y,3} into the set Â^{-,k}_{Y,3} := {u ∈ ∁A_1 : there is at least one wasted δ-excursion in [x_{k−1}, x_{k+1}] and at least one δ-up layer of length ≤ 2ℓ in J_Y}.
Hence, the estimate (5.7) will follow if we can establish, uniformly in k, that Then we can decompose Â^{-,k}_{Y,3} = A^⊖_k ∩ A^⊙_{w,k} ∩ A^⊕_k and J_Y = A^⊖_k ∩ A^⊙_k ∩ A^⊕_k, so that applying the Markov property as in Lemma 3.2 gives It now remains to estimate the ratio of expectations. Recalling (6.78), it suffices to bound the numerator. For this purpose, we remark that (4.25) yields that for any γ > 0 and for δ > 0 sufficiently small, we have (where, as usual, we have redefined γ by a factor of two). Substituting these upper and lower bounds, (6.80) improves to (6.79), and the proof of Lemma 5.4 is complete.
Proof of Lemma 5.5. We will show (5.8). The proof of (5.9) is similar. We can assume that the interval J ε y,− is contained in [−L ε , L ε ]; if it is not, the proof becomes even simpler.
Given the bound (5.4) on I^ε_-, it is clearly sufficient to prove that for any fixed k ∈ I^ε_-, we have µ^{−1,1}_{ε,(−L_ε,L_ε)}({u ∈ J_{y,ε} ∩ ∁A^-_{3,y} : |u(x_k) + 1| ≥ 1/2}) ≤ exp(−3c_1/(4ε)) µ^{−1,1}_{ε,(−L_ε,L_ε)}(J_{y,ε}). (6.81) This in turn will follow trivially from µ^{−1,1}_{ε,(−L_ε,L_ε)}({u ∈ J_{y,ε} : |u(x_k) + 1| ≥ 1/2}) ≤ exp(−3c_1/(4ε)) µ^{−1,1}_{ε,(−L_ε,L_ε)}(J_{y,ε}). (6.82) In order to establish (6.82), we again introduce a decomposition. This time we define the sets The set on the left-hand side of (6.82) can be written as A^⊖_k ∩ A^⊙_{1/2,k} ∩ A^⊕_k, and we have the containment To get a lower bound for the denominator, we will as usual use the large deviation lower bound from Proposition 3.5. For this, we note that Therefore, the large deviation bound gives that for any γ > 0 and for ε small enough To get an upper bound for the numerator of (6.83), on the other hand, we will use the large deviation upper bound from Proposition 3.4. For this, we observe that the closed δ/2 ball around A^⊙_{1/2,k} is the set A := {u : |u(x_k)| ≤ M + δ, for j = k − 2, . . . , k + 2}, so that the large deviation bound gives We substitute (6.84) and (6.85) into the ratio on the left-hand side of (6.83) and observe that the second factor in the energy difference (see equation (3.13)) cancels: Hence, the final ingredient that we need is the following energetic fact. This lemma is virtually identical to Lemma 6.6; the principal difference is that here the excursion from −1 has magnitude only 1/2, which changes only the leading-order cost (from c_0 to c_1). We omit the proof of the lemma.
Proof of Lemma 5.6. We will prove only (5.10), the proof of (5.11) being essentially the same. We will always assume that the left endpoint of the interval J ε y,− is greater than or equal to −L ε (since otherwise the boundary condition at −L ε trivially implies the result).
Notice that the set of paths u ∈ J_{y,ε} that do not hit −1 in J^ε_{y,-} is contained in the union of the following two sets: (a) the set of paths in A^-_{y,3} (extra δ-layers: recall (5.5)), and (b) the set of paths without extra layers but more than 1/2 away from −1 at the gridpoint x_k for some k in I^ε_-.
We want to use the Markov property and then apply Lemma 5.2 on these subintervals. Therefore, as usual, we introduce some sets for a decomposition. We now write A^-_{y,4} as the intersection of the sets A^⊙_{−1,j}, (6.88) and apply the Markov property (Lemma 3.2) K_ε times to deduce uniformly over all paths u that satisfy u(x_{k_j}), u(x_{k_j+1}) ∈ [−3/2, −1/2]. We insert this bound into (6.89) and then use the Markov property once more to recover (6.90) The combination of (6.88) and (6.90) then completes the proof of (6.87).