Exponential decay for the near-critical scaling limit of the planar Ising model

We consider the Ising model at its critical temperature with external magnetic field $ha^{15/8}$ on the square lattice with lattice spacing $a$. We show that the truncated two-point function in this model decays exponentially with a rate independent of $a$. As a consequence, we show exponential decay in the near-critical scaling limit Euclidean magnetization field. For the lattice model with $a=1$, the mass (inverse correlation length) is of order $h^{8/15}$ as $h\downarrow 0$; for the Euclidean field, it equals exactly $Ch^{8/15}$ for some $C$. Although there has been much progress in the study of critical scaling limits, results on near-critical models are far fewer due to the lack of conformal invariance away from the critical point. Our arguments combine lattice and continuum FK representations, including coupled conformal loop and measure ensembles, showing that such ensembles can be useful even in the study of near-critical scaling limits. The coupling we use is novel, even in the discrete lattice setting, and our proofs provide the first substantial application of measure ensembles.


Introduction
In this paper we obtain the first proof of exponential decay (or equivalently, a mass gap lower bound) for the important Euclidean field theory that is the near-critical scaling limit of the planar Ising model [59]. Key to our arguments is the use of conformal measure ensembles, introduced in [15], where they were called cluster area measures, and then constructed for percolation and the FK (Fortuin-Kasteleyn)-Ising model in [8]. The FK representation (see [28]) has been an invaluable tool in studies of the Ising model, particularly for the critical two-dimensional scaling limit, where it is closely related to conformal loop ensembles [50,51]. Here we extend that approach to the near-critical case by introducing a new coupling between FK and Ising variables and using coupled measure and loop ensembles. An upper bound for the mass gap is obtained using methods quite different from those of the rest of the paper, namely transfer matrix techniques and reflection positivity.

1.1. Overview. The Ising model [31], suggested by Lenz [37] and cast in its current form by Peierls [46], is one of the most studied models of statistical mechanics. Its two-dimensional version has played a special role since Peierls' proof of a phase transition [46], and Onsager's calculation of the free energy [44]. This phase transition has become a prototype for developing new techniques. Its analysis has helped test a fundamental tenet of critical phenomena, that near-critical physical systems are characterized by a correlation length, which provides the natural length scale for the system, and diverges when the critical point is approached.
This divergence implies that the critical system itself has no characteristic length and is therefore invariant under scale transformations. This in turn suggests that thermodynamic functions at criticality are homogeneous, and predicts the appearance of power laws. For a lattice-based model, it also means that, at or near criticality, it should be possible to rescale the model appropriately and obtain a continuum scaling limit by sending the lattice spacing to zero. This idea is at the heart of the renormalization group philosophy.
Thanks to the work of Polyakov [47] and others [2,3], it was understood that, once an appropriate continuum scaling limit is taken, critical models should acquire conformal invariance. Because the conformal group is in general a finite-dimensional Lie group, the resulting constraints are limited in number; however, in two dimensions, since every analytic function $f$ defines a conformal transformation, provided that $f'$ is nonvanishing, the conformal group is infinite-dimensional.
Following this observation, in two dimensions, conformal methods were applied extensively to Ising and Potts models, Brownian motion, the self-avoiding walk, percolation, and diffusion limited aggregation. The large body of knowledge and techniques that resulted goes under the name of Conformal Field Theory (CFT). The aspect of CFT most related to our work in this paper is a particular near-critical scaling limit of the two-dimensional Ising model believed to be related to the Lie algebra $E_8$ [59,18,6,39], which we discuss in more detail below.
In recent years, significant developments in two-dimensional critical phenomena have emerged in the mathematics literature. A very significant breakthrough was the introduction by Schramm [49] of the Schramm-Loewner Evolution (SLE) and its subsequent analysis and application to the scaling limit problem for several models, most notably by Lawler, Schramm and Werner [34], and by Smirnov [53] (see also [14]). The subsequent introduction of Conformal Loop Ensembles (CLEs) [12,13,50,51,57], which are collections of SLE-type closed curves, provided an additional tool to analyze the scaling limit geometry of critical models. Substantial progress in the rigorous analysis of the two-dimensional Ising model at criticality was made by Smirnov [54] with the introduction and scaling limit analysis of fermionic observables, also known as discrete holomorphic observables or holomorphic fermions. These have proved extremely useful in studying the Ising model in finite geometries with boundary conditions and in establishing conformal invariance of the scaling limit of various quantities, including the energy density [29,30] and spin correlation functions [17]. (An independent derivation of critical Ising correlation functions in the plane was obtained in [19].) In [10] (resp., [11]), it was shown that the critical Ising model (resp., near-critical model with external magnetic field $ha^{15/8}$) on the rescaled lattice $a\mathbb{Z}^2$ has a scaling limit $\Phi^h$ (denoted $\Phi^{\infty,h}$ in [11]) as $a \downarrow 0$. When $h = 0$, $\Phi^0$ satisfies the expected conformal covariance properties [10]. When $h \neq 0$, it was expected (as stated in [10]) that the truncated correlations of the near-critical scaling limit would decay exponentially. In this paper, we give a proof of that statement and we rigorously verify that the critical exponent for how the correlation length diverges as $h \downarrow 0$ is $8/15$, together with the related scaling properties of $\Phi^h$.
$\Phi^h$ is a (generalized) random field on $\mathbb{R}^2$; i.e., for suitable test functions $f$ on $\mathbb{R}^2$, there are random variables $\Phi^h(f)$, formally written as $\int_{\mathbb{R}^2} \Phi^h(x) f(x)\,dx$. Euclidean random fields such as $\Phi^h$ on the Euclidean "space-time" $\mathbb{R}^d := \{x = (x_0, w_1, \dots, w_{d-1})\}$ (in our case $d = 2$) are related to quantum fields on relativistic space-time, $\{(t, w_1, \dots, w_{d-1})\}$, essentially by replacing $x_0$ with a complex variable and analytically continuing from the purely real $x_0$ to a pure imaginary $(-it)$; see [45], Chapter 3 of [24] and [42] for background. One major reason for interest in $\Phi^h$ is that the associated quantum field is predicted [59] to have remarkable properties including relations between the masses of particles described by the quantum field and the Lie algebra $E_8$; see [18,6,39]. A natural first step in analyzing particle masses is to prove a strictly positive lower bound $m(h)$ on all masses (i.e., a mass gap), which exactly corresponds (see [52,55] and Chapters VII and XI of [23]) to the type of exponential decay we prove in this paper, i.e., showing (as a consequence of Theorem 2 below) that for test functions $f, g \geq 0$ of compact support, and some $C = C(f, g) < \infty$,

$\Phi^h$ is the limit, as the lattice spacing $a \downarrow 0$, of the lattice field

$$\Phi^{a,h} := a^{15/8} \sum_{x \in a\mathbb{Z}^2} \sigma_x \delta_x, \tag{1}$$

where $\{\sigma_x\}_{x \in a\mathbb{Z}^2}$ are the $\pm 1$-valued spin variables in the standard planar Ising model (on $a\mathbb{Z}^2$) at the critical (inverse) temperature $\beta = \beta_c$ with magnetic field $H = a^{15/8}h$, and $\delta_x$ is a unit Dirac point measure at $x$. Hence, obtaining an exponential decay result for $\Phi^{a,h}$ is directly related to corresponding results for $\{\sigma_x\}$ on the lattice, which we discuss next. But first we note that the choice of scaling factor $a^{15/8}$ in (1) relies on Wu's celebrated result (see [58] and [40]) that the critical Ising two-point function decays precisely as $C|x - y|^{-1/4}$ for some $C$ (where $|x - y| := \|x - y\|_2$, the Euclidean distance).
We will assume Wu's result in this paper, but note that without that result, one can replace $a^{15/8}$ by an implicitly defined scale factor and all the conclusions remain valid; see [10,11] and [8] for more details.
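As a consistency check on the normalization, the following heuristic sketch (not part of the original argument) counts powers of $a$ in the variance of the critical lattice field tested against the indicator of the unit square. The only input is Wu's asymptotics, written on $a\mathbb{Z}^2$ as $\langle\sigma_x\sigma_y\rangle_{a,0} \approx C a^{1/4}|x-y|^{-1/4}$; replacing the double sum by an integral is an assumption of the sketch.

```latex
% Heuristic power counting; the Riemann-sum step is an assumption of this sketch.
\begin{align*}
\operatorname{Var}\bigl(\Phi^{a,0}(\mathbf{1}_{[0,1]^2})\bigr)
  &= a^{15/4} \sum_{x,y \in a\mathbb{Z}^2 \cap [0,1]^2}
     \langle \sigma_x \sigma_y \rangle_{a,0} \\
  &\approx a^{15/4} \cdot C a^{1/4} \sum_{x \neq y} |x-y|^{-1/4} \\
  &\approx a^{15/4} \cdot C a^{1/4} \cdot a^{-4}
     \iint_{[0,1]^2 \times [0,1]^2} |x-y|^{-1/4} \, dx \, dy .
\end{align*}
% Exponent count: 15/4 + 1/4 - 4 = 0, and the integral is finite because 1/4 < 2.
% The diagonal terms x = y contribute a^{15/4} a^{-2} = a^{7/4}, which vanishes.
% Field term: H \sum_x \sigma_x = h a^{15/8} \sum_x \sigma_x = h \Phi^{a,h}(\mathbf{1}).
```

The exponents cancel, so the variance stays of order one as $a \downarrow 0$; the last comment records why a lattice field of strength $H = a^{15/8}h$ corresponds to a fixed renormalized strength $h$.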
It was first proved in [36] that the lattice truncated two-point function with $H > 0$ decays exponentially. See also [22] for a different and simpler proof, where it was also shown that the decay rate $m(H)$ (or inverse correlation length) on $\mathbb{Z}^2$ is bounded below linearly in $H$. In this paper, we show exponential decay for the near-critical Ising model on $a\mathbb{Z}^2$ with $H = a^{15/8}h$. Roughly speaking, this means (see Theorem 1 below) that there is a lower bound on $m(H)$ behaving like $H^{8/15}$ as $H \downarrow 0$.
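The passage from a decay rate that is uniform in $a$ to the exponent $8/15$ on the unit lattice can be sketched as follows. This is an illustration under stated assumptions: we suppose a bound of the displayed form holds at the fixed value $h = 1$ with hypothetical constants $c, m \in (0,\infty)$ that do not depend on $a \in (0,1]$.

```latex
% Assumption: for h = 1 and all a in (0,1],
%   <sigma_x ; sigma_y>_{a,1} <= c a^{1/4} |x-y|^{-1/4} e^{-m|x-y|},  x, y in aZ^2.
% Relabel x = a u, y = a v with u, v in Z^2; the field on Z^2 is H = a^{15/8}.
\begin{align*}
\langle \sigma_u ; \sigma_v \rangle_{1,H}
  &= \langle \sigma_{au} ; \sigma_{av} \rangle_{a,1}
   \le c \, a^{1/4} \, (a|u-v|)^{-1/4} \, e^{-m a |u-v|} \\
  &= c \, |u-v|^{-1/4} \, e^{-m a |u-v|},
  \qquad a = H^{8/15}, \\
  &= c \, |u-v|^{-1/4} \, e^{-m H^{8/15} |u-v|}
  \qquad \text{for } H \in (0,1].
\end{align*}
```

The factor $a^{1/4}$ is absorbed exactly by the change from Euclidean to lattice distance, and the decay rate per lattice step is $m H^{8/15}$.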
Good lower bounds as $a \downarrow 0$ for fixed $h$ or as $H \downarrow 0$ for fixed $a$ seem essential in order to obtain an exponential decay rate for the continuum field $\Phi^h$ for any particular value, say $h_0$, of the renormalized field strength $h$. It is worth noting that in the earlier work of [36,22] on lattices, exponential decay was first obtained for large $H$ (by expansion techniques) and then shown to apply to all $H > 0$, albeit with a sub-optimal lower bound on $m(H)$ as $H \downarrow 0$. However, in the continuum setting, exponential decay (i.e., $m(h) > 0$) for any single value $h_0 \neq 0$ of $h$ immediately implies exponential decay for all $h \neq 0$ with the correct dependence of $m(h)$ on $h$. This follows from simple scaling properties of $\Phi^h$, as we now explain.
Both the $h = 0$ and $h > 0$ fields $\Phi^0$ and $\Phi^h$ can be defined on a bounded (simply-connected) domain in $\mathbb{R}^2$ (now thought of as the complex plane $\mathbb{C}$) with appropriate boundary conditions (e.g., free or plus) as well as on the full plane. Conformal mapping properties for $\Phi^0$ were given in Theorem 1.8 of [10]. Similar properties for $\Phi^h$ are only implicit in [11], so we state them explicitly below as Theorem 6 in Section 6. In the case of the full plane one can consider (for $h = 0$ and $h > 0$) the conformal mapping $x \mapsto \lambda x$ with $\lambda > 0$. Here one has that $\lambda^{1/8}\Phi^{h_0}(\lambda x)$ is equal in distribution to $\Phi^{\lambda^{15/8}h_0}(x)$ for any $\lambda > 0$ and real $h_0$. Thus a positive exponential decay rate $m(h_0) > 0$, for a single $h_0 > 0$, implies the same for all $h \neq 0$ with $m(h) = m(h_0)\,(|h|/h_0)^{8/15}$.

Exponential upper bounds of the form $Ce^{-m(h)|x-y|}$ for the truncated two-point function $\langle\sigma_x;\sigma_y\rangle_{a,h} := \mathrm{Cov}_{a,h}(\sigma_x,\sigma_y)$ on $a\mathbb{Z}^2$ for small $a$, or for the corresponding continuum function $G^h(x-y)$ on $\mathbb{R}^2$, cannot be valid for small $|x-y|$ since when $h = 0$, $G^0(x-y) = C|x-y|^{-1/4}$. Indeed, one expects exponential decay only for distances larger than the correlation length, and $G^0(x-y)$ behavior otherwise. Since the GHS inequality [26] implies $G^0(x-y) \geq G^h(x-y)$ for all $x, y$, one can paste together exponential upper bounds for large $|x-y|$ with the $h = 0$ upper bounds for small $|x-y|$ to obtain an upper bound of the form $C|x-y|^{-1/4}e^{-m(h)|x-y|}$ for all $|x-y|$, as we do in Theorems 1 and 2 below.
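The scaling step can be written out as a short computation. This is a sketch under stated assumptions: $h_0 > 0$ is fixed, $h = \lambda^{15/8}h_0$, the continuum truncated two-point function is written $G^h$, and a bound $G^{h_0}(z) \leq C|z|^{-1/4}e^{-m(h_0)|z|}$ is taken as already known.

```latex
% Uses only the stated identity in law:
%   \lambda^{1/8} \Phi^{h_0}(\lambda x) = \Phi^{\lambda^{15/8} h_0}(x).
\begin{align*}
G^{h}(x-y)
  &= \lambda^{1/4} \, G^{h_0}\bigl(\lambda (x-y)\bigr) \\
  &\le \lambda^{1/4} \, C \, \lambda^{-1/4} |x-y|^{-1/4}
       e^{-m(h_0) \lambda |x-y|}
   = C \, |x-y|^{-1/4} e^{-\lambda \, m(h_0) |x-y|}, \\
\lambda &= (h/h_0)^{8/15}
  \quad \Longrightarrow \quad
  m(h) \ge m(h_0) \, (h/h_0)^{8/15} .
\end{align*}
```

Exchanging the roles of $h$ and $h_0$ gives the reverse inequality for the optimal rates, so the optimal rate is exactly proportional to $h^{8/15}$, while the prefactor $C|x-y|^{-1/4}$ is left unchanged by the rescaling.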
The analysis of Theorems 1 and 2 is first done for large $h$, in Section 3, after reviewing in Section 2 the FK random cluster representation for the Ising model and discussing two different couplings of FK and Ising variables relevant when $h > 0$. The heart of that analysis consists of the first three lemmas in that section, which concern circuits of vertices in an annulus created by "necklaces" of touching FK-open clusters containing sufficiently many vertices. For large $h$, with high probability, a necklace and its circuit will have all $+1$ spin values. Correlations will then only occur between regions of $a\mathbb{Z}^2$ that are connected within the complement of a strongly supercritical infinite percolation cluster. The proof relies on continuum results concerning coupled conformal loop ($\mathrm{CLE}_{16/3}$) and measure ensembles. Indeed, a main contribution of this paper is a demonstration of the utility of such coupled loop and measure ensembles. Relevant $\mathrm{CLE}_\kappa$ results are in [50], [51], [41]. Conformal measure ensembles and their coupling to $\mathrm{CLE}_\kappa$ were proposed in [15] and carried out in [8] for $\kappa = 6$ and $16/3$. It may be worth noting, as was mentioned in [15], that, in addition to their utility for near-critical models, measure ensembles may be more extendable than loop ensembles to scaling limits in dimensions $d > 2$, but that issue goes well beyond the scope of this paper. In Section 4, the case of small $h$ on the lattice is treated, while in Sections 5 and 6 the continuum field $\Phi^h$ is treated, including conformal mapping properties. Finally, in Section 7, using reflection-positivity methods, we give a proof of Theorem 3, which provides an upper bound for the mass gap (inverse correlation length) matching the lower bound given in Corollary 1.

1.2. Main results. Let $a > 0$. Denote by $P^a_h$ the infinite volume Ising measure at the inverse critical temperature $\beta_c$ on $a\mathbb{Z}^2$ with external field $a^{15/8}h > 0$. The precise value of $\beta_c$, $\log(1 + \sqrt{2})/2$, originates in [33,44]. Let $\langle\cdot\rangle_{a,h}$ be the expectation with respect to $P^a_h$. Let $\langle\sigma_x;\sigma_y\rangle_{a,h}$ be the truncated two-point function, i.e.,

$$\langle\sigma_x;\sigma_y\rangle_{a,h} := \langle\sigma_x\sigma_y\rangle_{a,h} - \langle\sigma_x\rangle_{a,h}\langle\sigma_y\rangle_{a,h}.$$
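Our main result about the truncated two-point function is:

Theorem 1. There exist $B_0, C_0 \in (0, \infty)$ such that for any $a \in (0, 1]$ and $h > 0$,

$$0 \leq \langle\sigma_x;\sigma_y\rangle_{a,h} \leq c(h)\,a^{1/4}\,|x-y|^{-1/4}\,e^{-m(h)|x-y|} \quad \text{for any } x, y \in a\mathbb{Z}^2,$$

with $c(h), m(h) \in (0, \infty)$ such that $c(h) \leq C_0$ and $m(h) \geq B_0 h^{8/15}$ for $ha^{15/8} \leq 1$. In particular, for $a = 1$ and any $H > 0$, we have

$$0 \leq \langle\sigma_x;\sigma_y\rangle_{1,H} \leq c(H)\,|x-y|^{-1/4}\,e^{-m(H)|x-y|} \quad \text{for any } x, y \in \mathbb{Z}^2.$$

For $a = 1$, define the (lattice) mass (or inverse correlation length) $M(H)$ as the supremum of all $m > 0$ such that for some $C_m < \infty$,

$$\langle\sigma_x;\sigma_y\rangle_{1,H} \leq C_m\,e^{-m|x-y|} \quad \text{for any } x, y \in \mathbb{Z}^2.$$

The following immediate corollary of Theorem 1 gives a one-sided bound for the behavior of $M(H)$ as $H \downarrow 0$, with the expected critical exponent $8/15$.

Let $\Phi^{a,h}$ be the near-critical magnetization field in the plane defined by $\Phi^{a,h} := a^{15/8}\sum_{x \in a\mathbb{Z}^2}\sigma_x\delta_x$, where $\{\sigma_x\}_{x \in a\mathbb{Z}^2}$ is a configuration for the measure $P^a_h$. In Theorem 1.4 of [11], it was proved that $\Phi^{a,h}$ converges in law to a continuum (generalized) random field $\Phi^h$. Let $C_0^\infty(\mathbb{R}^2)$ denote the set of infinitely differentiable functions with compact support. $\Phi^h(f)$ denotes the field $\Phi^h$ paired against the test function $f$ (which was denoted $\langle\Phi^h, f\rangle$ in [11]).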
where c(h) and m(h) are as in Theorem 1.

Remark 1. Theorem 2 may be expressed as
For $\Phi^h$, define the mass $M(\Phi^h)$ as the supremum of all $m > 0$ such that for all $f, g \in C_0^\infty(\mathbb{R}^2)$ and some $C_m(f, g) < \infty$,

The following is a consequence of Theorem 2 and the scaling properties of $\Phi^h$ (as discussed in Subsection 1.1 and Section 6).

This result complements that of [9] that the ($H \downarrow 0$ at $\beta_c$) Ising magnetization exponent is $1/15$:

Preliminary definitions and results
2.1. Ising model and FK percolation. In this subsection, our definitions and terminology (especially after the ghost vertex is introduced below) follow those of [1]. With vertex set aZ 2 , we write aE 2 for the set of nearest neighbour edges of aZ 2 . For any finite D ⊆ R 2 , let D a := aZ 2 ∩D be the set of points of aZ 2 in D, and call it the a-approximation of D. For Λ ⊆ aZ 2 , define Λ C := aZ 2 \ Λ, z has a nearest neighbor in Λ}, be the set of all edges {z, w} ∈ aE 2 with z, w ∈ Λ, and B(Λ) be the set of all edges {z, w} with z or w ∈ Λ. We will consider the extended graph G = (V, E) where V = aZ 2 ∪ {g} (g is usually called the ghost vertex [25]) and E is the set of nearestneighbor edges of aE 2 plus {{z, g} : z ∈ aZ 2 }. The edges of aE 2 are called internal edges while {{z, g} : z ∈ aZ 2 } are called external edges. Let E (Λ) be the set of all external edges with an endpoint in Λ, i.e., Let Λ L := [−L, L] 2 and Λ a L be its a-approximation. The classical Ising model at inverse (critical) temperature β c on Λ a L with boundary condition η ∈ {−1, +1} ∂exΛ a L and external field a where the first sum is over all nearest neighbor pairs (i.e., |u − v| = a) in Λ a L , and Z a L,η,h is the partition function (which is the normalization constant needed to make this a probability measure). P a Λ L ,f,h denotes the probability measure with free boundary conditions -i.e., where we omit the second sum in (7).
It is known that $P^a_{\Lambda_L,\eta,h}$ has a unique infinite volume limit as $L \to \infty$, which we denote by $P^a_h$. Note that this limiting measure does not depend on the choice of boundary conditions (see, e.g., Theorem 1 of [35] or the theorem in the appendix of [48]).
The FK (Fortuin and Kasteleyn) percolation model at β c on Λ a L with boundary con- denotes the number of clusters in (ωρ) Λ a L which intersect Λ a L and do not contain g, andZ a L,ρ,h is the partition function. An edge e is said to be open if ω(e) = 1, otherwise it is said to be closed. P a Λ L ,ρ,h is also called the random-cluster measure (with cluster weight q = 2) at β c on Λ a L with boundary condition ρ with external field a 15 8 h ≥ 0. P a Λ L ,f,h (respectively, P a Λ L ,w,h ) denotes the probability measure with free (respectively, wired) boundary conditions, i.e., ρ ≡ 0 (respectively, ρ ≡ 1) in (8). Below we will also consider FK measures P a D,ρ,h for more general domains D ⊆ R 2 , defined in the obvious way.
It is also known that the FK measure $\mathbb{P}^a_{\Lambda_L,\rho,h}$ has a unique infinite volume limit as $L \to \infty$, which we denote by $\mathbb{P}^a_h$. Again, this limiting measure does not depend on the choice of boundary conditions. The reader may refer to [28] for more details in the case $h = 0$; the proof for $h > 0$ is similar.
2.2. Basic properties. The Edwards-Sokal coupling [21], based on the Swendsen-Wang algorithm [56], is a coupling of the Ising model and FK percolation. Let $\hat{\mathbb{P}}^a_h$ be such a coupling measure of the Ising measure $P^a_h$ and the FK measure $\mathbb{P}^a_h$, defined on the product space of spin configurations and FK edge configurations. The conditional distribution of the Ising spin variables given a realization of the FK bonds can be realized by tossing independent fair coins, one for each FK-open cluster not containing $g$, and then setting $\sigma_x$ for all vertices $x$ in the cluster to $+1$ for heads and $-1$ for tails. For $x$ in the ghost cluster, $\sigma_x = +1$ (for $h > 0$). We note that a different coupling for $h \neq 0$ between internal FK edges and spin variables is given in Lemma 4 and Proposition 1 below.
For any u, v ∈ V , we write u ←→ v for the event that there is a path of FK-open edges that connects u and v, i.e., a path u = z 0 , z 1 , . . . , The following identity, immediate from the coupling, is essential.
Let P a := P a h=0 . By standard comparison inequalities for FK percolation (Proposition 4.28 in [28]), one has Lemma 2. For any h ≥ 0, P a h stochastically dominates P a . The following lemma is about the one-arm exponent for FK percolation with h = 0. The proof can be found in Lemma 5.4 of [20]. Lemma 3. There exists a constant C 1 independent of a such that for all a ≤ 1 and for any boundary Let D ⊆ R 2 be bounded, and D a := aZ 2 ∩ D be the a-approximation of D. For any ω ∈ {0, 1} B(D a ) , let C (D a , ω) denote the set of clusters of ω; for a C ∈ C (D a , ω), let |C| denote the number of vertices in C. Then we have Moreover, conditioned on ω, the events {C i ←→ g} are mutually independent.
Proof. This follows from the proof of the next proposition.
where E a D,f,h=0 is the expectation with respect to P a D,f,h=0 . LetP a D,f,h be the Edwards-Sokal coupling of P a D,f,h and its corresponding Ising measure. For any C ∈ C (D a , ω), let σ(C) be the spin value of the cluster assigned by the coupling. Then we have, for ω ∈ {0, 1} B(D a ) , Moreover, conditioned on ω, the events {σ(C i ) = +1} are mutually independent.
Proof. It is not hard to show that (see e.g. pp. 447-448 of [1] where o(ω) and c(ω) denote the number of open and closed edges of ω respectively. So (11) follows from (12), (8) and the factP a D,f,h (ω) = P a D,f,h (ω). (12) also gives for any , with a similar product expression for the intersection of three or more of the events {C i ←→ g}. Hence, conditioned on ω, these events are mutually independent. The rest of the proof follows from the Edwards-Sokal coupling.
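The way a single cluster of internal edges contributes to the weight once the external edges are summed out can be sketched as follows. The conventions here are assumptions for illustration and are not taken from the source: $H := a^{15/8}h$, an external edge $\{z, g\}$ carries weight $1 - e^{-2H}$ when open and $e^{-2H}$ when closed, and each cluster not containing $g$ receives the factor $q = 2$.

```latex
% Hypothetical conventions (see lead-in); C is a cluster of internal edges of \omega.
\begin{align*}
W_{\not\leftrightarrow g}(C) &= 2 \, \bigl(e^{-2H}\bigr)^{|C|} = 2 e^{-2H|C|}
  && \text{(all $|C|$ external edges closed)} \\
W_{\leftrightarrow g}(C) &= 1 - e^{-2H|C|}
  && \text{(at least one external edge open)} \\
W_{\not\leftrightarrow g}(C) + W_{\leftrightarrow g}(C)
  &= 1 + e^{-2H|C|} = 2 e^{-H|C|} \cosh(H|C|) \\
P(C \longleftrightarrow g \mid \omega)
  &= \frac{1 - e^{-2H|C|}}{1 + e^{-2H|C|}} = \tanh(H|C|) \\
P(\sigma(C) = +1 \mid \omega)
  &= \tanh(H|C|) + \tfrac{1}{2}\bigl(1 - \tanh(H|C|)\bigr)
   = \tfrac{1}{2}\bigl(1 + \tanh(H|C|)\bigr)
\end{align*}
```

Under these conventions the total weight factorizes over clusters, which is the source of the conditional independence, and the product of the factors $2e^{-H|C|}\cosh(H|C|)$ equals $2^{k(\omega)}e^{-H|D^a|}\prod_C \cosh(H|C|)$, where $k(\omega)$ is the number of clusters; relative to the $h = 0$ free measure this leaves a density proportional to $\prod_C \cosh(H|C|)$.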
Remark 4. This type of analysis can be extended to the continuum, as is done in [8] for $h = 0$, with the continuum analog of the coupling in Proposition 1 valid also for $h > 0$. Such an extension is planned by the authors for a future paper.

Exponential decay for long FK-open paths not connected to the ghost.
Our goal in this section is to show the following where C 0 (h), m 1 (h) ∈ (0, ∞) only depend on h.
Remark 5. By the GHS inequality [26], σ x ; σ y a,h is decreasing in h for fixed a, x, y. Thus as soon as (13) is valid for some h = h 0 , it is valid for all h > h 0 by taking C 0 (h) = C 0 (h 0 ) and m 1 (h) = m 1 (h 0 ). Of course, this would not give the correct h dependence for the optimal value of m 1 (h).  L) to g. Similarly, let A(L, 2L) be the annulus with inner radius L and outer radius 2L and A a (L, 2L) be its a-approximation. We will consider circuits in the annulus -i.e., nearest neighbor self-avoiding paths of vertices that end up at their starting vertex. Let G(a, L) denote the event that there is a circuit of vertices surrounding Λ L in the annulus A a (L, 2L) with each vertex in the circuit connected to g via A a (L, 2L); see Figure 1. Denote the complement of G(a, L) by G comp (a, L).
In this subsection, we prove Proposition 3. There exist h 0 , 0 ∈ (0, ∞) such that for all h > h 0 and a ≤ 0 , for any boundary condition ρ 1 on A a (1, L) and any boundary condition ρ 2 on A a (L, 2L), we have where C 1 (h) ∈ (0, ∞) only depends on h.
Before proving Proposition 3, we state and prove several lemmas. The first gives a useful property of $\mathrm{CLE}_{16/3}$ and its related conformal measure ensemble; the idea of such coupled loop and measure ensembles originated in [15].

Lemma 5. Let $E(K)$ be the event that there is a sequence of $K$ or fewer loops (say, $L_1, \dots, L_k$ with $k \leq K$) such that the total mass of the limiting counting measure corresponding to $L_i$ is larger than $1/K$ for each $i$ and

Then, the conformal invariance of $\mathrm{CLE}_{16/3}$ implies that a finite sequence satisfying (16) exists in $\Lambda_{3,1}$ with $P_{\Lambda_{3,1}}$-probability 1. Our argument is inspired by the proof of Lemma 9.3 in [51]. Let $L^*$ be the outermost loop containing $0$, and let $D^*$ be the connected component of $D \setminus L^*$ containing $0$. Let $O_1$ (respectively, $\mathcal{O}_1$) be the union (respectively, collection) of all loops that touch $\Gamma_1$; then clearly $O_1 \neq \emptyset$ with probability 1. If $L^* \in \mathcal{O}_1$, then we stop; otherwise we let $D_1$ be the connected component of $D \setminus O_1$ containing $0$. In this case, the conformal radius $\rho_1$ of $D_1$ seen from $0$ has a strictly positive probability to be strictly smaller than 1, and the harmonic measure of $\partial_1 := O_1 \cup \Gamma_1$ from $0$ in $D$ is not smaller than the harmonic measure of $\Gamma_1$ in $D$ from $0$. We now consider the $\mathrm{CLE}_{16/3}$ in $D_1$, and we let $O_2$ (respectively, $\mathcal{O}_2$) be the union (respectively, collection) of all loops that touch $\partial_1$. If $L^* \in \mathcal{O}_2$, then we stop; otherwise we let $D_2$ be the connected component of $D_1 \setminus O_2$ containing $0$, and we iterate the procedure. After $i$ steps, the conformal radius $\rho_i$ of $D_i$ seen from $0$ is stochastically smaller than a product of $i$ i.i.d. copies of $\rho_1$. Since the conformal radius of $D^*$ from $0$ is strictly positive with probability 1, this shows that, with probability 1, $L^*$ is reached in a finite number of steps. Hence, there exists almost surely a finite sequence of loops $L_1, \dots, L_n$ (with $L_i \in \mathcal{O}_i$ for each $i < n$) such that $\mathrm{dist}(L_1, \Gamma_1) = 0$, $\mathrm{dist}(L_i, L_{i+1}) = 0$ for any $1 \leq i \leq n - 1$, and $L_n = L^*$.
By the same argument, one can find a finite sequence of loops (say, $L'_1, \dots, L'_j$) such that $\mathrm{dist}(L'_1, \Gamma_2) = 0$, $\mathrm{dist}(L'_i, L'_{i+1}) = 0$ for any $1 \leq i \leq j - 1$, and $L'_j = L^*$. The sequence of loops $L_1, \dots, L_{n-1}, L^*, L'_{j-1}, \dots, L'_1$ satisfies (16) with $k = n + j - 1$, and the proof is concluded by noting that the mass of each limiting counting measure associated to a loop in that sequence is almost surely strictly positive (see Remark 8.3 of [8]).
The next lemma says that on $a\mathbb{Z}^2$, with high probability, we can find (for $h = 0$) a finite sequence of FK clusters in $\Lambda_{3,1}$ whose concatenation almost forms an open crossing of $\Lambda_{3,1}$ in the horizontal direction.

Lemma 6. Let $E^a(K)$ be the event that there exists a sequence $C_1, \dots,$

Proof. Let $L_2$ and $L_3$ be two distinct loops from a $\mathrm{CLE}_{16/3}$ realization inside $\Lambda_{3,1}$ such that $\mathrm{dist}(L_2, L_3) = 0$. Because of the convergence of the collection of the lattice boundaries of critical FK clusters to $\mathrm{CLE}_{16/3}$, there is a coupling between FK percolation in $\Lambda_{3,1}$ and $\mathrm{CLE}_{16/3}$ such that the pair $(L^a_2, L^a_3)$ of lattice boundaries of two FK-open clusters converges a.s. to $(L_2, L_3)$. Under this coupling, we claim that the probability of $\mathrm{dist}(L^a_2, L^a_3) = a$ tends to 1 as $a \downarrow 0$. Indeed, it is easy to see that if $\mathrm{dist}(L^a_2, L^a_3) > a$, then there is a 6-arm event of type (100100) (see page 4 of [16] for the precise definition of this event). But by Corollary 1.5 of [16], the 5-arm exponent is 2. Together with RSW, this implies that the critical exponent for a 6-arm event of type (100100) is strictly larger than 2 (see Theorem 1.1 of [16]). Using a proof similar to that of Lemma 6.1 of [13], one can conclude that the probability of seeing a 6-arm event anywhere goes to 0 as $a \downarrow 0$. This completes the proof of the claim.
By a similar argument, using the fact that the exponent for a 3-arm event near a boundary is strictly larger than 1 (Corollary 1.5 of [16]) and hence they do not occur as a ↓ 0, one can prove that, if L 1 is a loop such that dist(L 1 , {0}×[0, 1]) = 0, then there is a coupling between FK percolation and CLE 16/3 in Λ 3,1 such that the FK lattice boundary L a 1 converges a.s. to L 1 and also that the probability that dist(L a lim K→∞ lim inf a↓0 P a Λ 3,3 ,f,0 (N a (K)) = 1.
Proof. We use a standard argument in the percolation literature (see, e.g., Figure 3 in [7]), as follows. It is easy to show that $N^a(K)$ contains the intersection of four events which are rotated and/or translated versions of $E^a(K/4)$. Note that $E^a(K/4)$ is an increasing event. So the lemma follows from the FKG inequality and Lemma 6.
Next, we consider FK percolation with external field a 15/8 h. We say A a 1,3 is good if there is a sequence of open clusters in A a 1,3 (say C 1 , C 2 , . . . , C k for some k ∈ N) such that dist(C i , C i+1 ) = a for all 1 ≤ i ≤ k − 1, dist(C k , C 1 ) = a, C i ←→ g for each i and there is a circuit of vertices in ∪ k i=1 C i surrounding [1, 2] 2 . Lemma 8. Given any > 0, there exist h 0 < ∞ and 0 > 0 such that Lemma 4 implies that for each C i from the definition of N a (K), where the second inequality follows from Lemma 4 and (18).
We are ready to prove Proposition 3. Our argument is similar to ones appearing elsewhere in the percolation literature -see, e.g., the proof of Lemma 5.3 in [7].
Proof of Proposition 3. We first consider FK percolation on $a\mathbb{Z}^2$. For each $z = (z_1, z_2) \in \mathbb{Z}^2$, let $A_{1,3}(z) := (z_1 - 1, z_2 - 1) + A_{1,3}$ and $A^a_{1,3}(z)$ be its $a$-approximation. We define whether $A^a_{1,3}(z)$ is good (or not) by the translation of the definition for $A^a_{1,3}$ and then define a family of random variables $\{Y_z, z \in \mathbb{Z}^2\}$ such that $Y_z = 1$ if $A^a_{1,3}(z)$ is good and $Y_z = 0$ otherwise. Note that the worst boundary condition for the event $\{A^a_{1,3} \text{ is good}\}$ is the free boundary condition on the boundary of $\Lambda^a_{3,3}$. Then by Theorem 0.0 of [38] and Lemma 8, $\{Y_z, z \in \mathbb{Z}^2\}$ stochastically dominates a family of i.i.d. random variables $\{Z_z, z \in \mathbb{Z}^2\}$ such that $P(Z_z = 1) = \pi(\epsilon_0, h_0)$ and $P(Z_z = 0) = 1 - \pi(\epsilon_0, h_0)$, where $\pi(\epsilon_0, h_0)$ can be made arbitrarily close to 1 by choosing $\epsilon_0$ small and $h_0$ large.
We note that if $A^a_{1,3}$ is good then there is a circuit of vertices surrounding $[1, 2]^2$ in $A^a_{1,3}$ with each vertex in this circuit connected to $g$ in $A^a_{1,3}$. Such a circuit prevents the existence of an FK-open path from the inner boundary $\partial_1 A^a_{1,3}$ to the outer boundary $\partial_2 A^a_{1,3}$ whose cluster does not contain $g$. This means that, whenever $Y_z = 1$, there is no such FK-open path from $\partial_1 A^a_{1,3}(z)$ to $\partial_2 A^a_{1,3}(z)$ whose cluster does not contain $g$. But whenever $F(a, L)$ occurs, there is a nearest neighbor path (say $\gamma$) on $\mathbb{Z}^2$ starting at $0$ and reaching at least distance $L$ away from $0$ such that $Y_z = 0$ for each $z \in \gamma$. Pick $\epsilon_0$ and $h_0 > 0$ such that $\pi(\epsilon_0, h_0)$ is larger than the critical probability of site percolation on $\mathbb{Z}^2$. Then Theorem 6.75 of [27] (actually that theorem is for bond percolation but the proof also applies to site percolation) implies that there exists a finite constant $C_1(h)$ such that $\mathbb{P}^a_{A(1,L),\rho_1,h}(F(a, L)) \leq e^{-C_1(h)L}$.

If $G^{\mathrm{comp}}(a, L)$ occurs, then there is a $*$-path (i.e., one that can use both nearest neighbor and diagonal edges) from $\partial_1 A(L, 2L)$ to $\partial_2 A(L, 2L)$ such that each vertex in this path is not connected via $A(L, 2L)$ to $g$. We note that if $A^a_{1,3}$ is good then there is no such $*$-path (with the cluster of each vertex on the path not containing $g$) from the inner box to the outer boundary of $A^a_{1,3}$. The rest of the proof of (15) is similar to that of (14).
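When $\pi(\epsilon_0, h_0)$ is close enough to 1, the exponential bound on $F(a, L)$ can also be seen from an elementary path-counting (Peierls-type) estimate. The sketch below is a simpler substitute for the cited percolation theorem, not the argument used in the text; it assumes $L$ is an integer and writes $q := 1 - \pi(\epsilon_0, h_0)$, with the extra (hypothetical) requirement $q < 1/3$.

```latex
% F(a,L) forces a self-avoiding nearest-neighbor path from 0 with L steps
% (an initial segment of \gamma, after loop erasure) on which Y_z = 0.
% "All Y_z = 0 on a fixed path" is a decreasing event, so stochastic domination
% of {Z_z} by {Y_z} bounds its probability by the i.i.d. value q^{L+1}.
\begin{align*}
\#\{\text{self-avoiding paths from } 0 \text{ with } L \text{ steps}\}
  &\le 4 \cdot 3^{L-1}, \\
P\bigl(F(a,L)\bigr)
  &\le 4 \cdot 3^{L-1} \, q^{L+1}
   = \tfrac{4q}{3} \, (3q)^{L}
   = \tfrac{4q}{3} \, e^{-L \log\frac{1}{3q}} .
\end{align*}
```

This requires $\pi(\epsilon_0, h_0) > 2/3$, which is stronger than the condition in the text but is available since $\pi(\epsilon_0, h_0)$ can be made arbitrarily close to 1.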

3.2. Exponential decay of $\langle\sigma_x;\sigma_y\rangle$ for large $h$. Although we do not use it in our current proof, there is a nice BK-type inequality for Ising variables [4] which can at least give partial results on exponential decay; perhaps a more careful use would give complete results. Let $B(z, L) := z + \Lambda_L$ for $z \in \mathbb{R}^2$ and $L > 0$ denote the square centered at $z$ (parallel to the coordinate axes) of side length $2L$. Recall that $P^a_h$ is the infinite volume measure for the Ising model on $a\mathbb{Z}^2$ at critical temperature $1/\beta_c$ with external field $a^{15/8}h$. Let $P^a_{\tilde h}$ be the same infinite volume measure except that the external field is $0$ in $B(x, 1) \cup B(y, 1)$. Let $\langle\cdot\rangle_{a,\tilde h}$ be the expectation with respect to $P^a_{\tilde h}$, and $\mathbb{P}^a_{\tilde h}$ be the corresponding FK percolation measure.
For the rest of this section, for simplicity we assume $x, y \in a\mathbb{Z}^2$ are on the $x$-axis; otherwise one has to slightly modify choices of lengths of some squares by factors of $1/\sqrt{2}$. For ease of notation, we also suppress the superscript $a$ on various events defined below ($A^0$, $A^1_z$, $A^c_z$, $A^f_z$) even though these are all defined in the $a\mathbb{Z}^2$ setting; we keep the superscript $a$ in the various probability measures, such as $P^a_h$. To bound $\langle\sigma_x;\sigma_y\rangle_{a,h}$, we first use the GHS inequality [26] to see that $\langle\sigma_x;\sigma_y\rangle_{a,h} \leq \langle\sigma_x;\sigma_y\rangle_{a,\tilde h}$.
Let A 0 := {x ←→ y ←→ g}, A 1 z := {z ←→ g} for z = x or y. Then the Edwards-Sokal coupling (like in Lemma 1) gives where for u, v ∈ {f, c}, D u,v : . Next, we show that each term on the RHS of (19) decays exponentially with the desired power law factor a 1/4 .

Proposition 4.
There exist h 0 , 0 ∈ (0, ∞) such that for all h > h 0 , a ≤ 0 , and x, y ∈ aZ 2 with |x − y| > 3, Proof. The proofs for P a h (A 0 ), D f f , D f c and D cf are similar to each other. The proof for D cc is harder.
(1) P a h (A 0 ). In order for A 0 to occur there must be one arm events in both B(x, 1) and B(y, 1), and in the complement of B(x, 1) ∪ B(y, 1) there must be a (long) open path from ∂ in B(x, 1) to ∂ in B(y, 1) with the open cluster (within that complement) of the path not connected to the ghost. We will use Lemma 3 twice to get (C 1 a 1/8 ) 2 and Proposition 3 twice to get the exponential factor. More precisely, define A 0,z andÃ 0,z for z = x or y as A 0,z := {z ←→ ∂ in B(z, 1)} andÃ 0,z be the event that there is an open path from ∂ in B(z, 1) to ∂ in B(z, |x − y|/2) with the open cluster of that path in B(z, |x − y|/2) \ B(z, 1) not connected to g. Then and by taking the worst case boundary condition and using translation invariance, we have by using Lemma 3 and Proposition 3 (twice each): (2) D f f . This proof is close to that for part (1) because whereĀ f z denote the event that there exists a (long) open path connecting ∂ in B(z, 1) to ∂ in B(z, |x − y|/3) within the annulus Ann(z) := B(z, |x − y|/3) \ B(z, 1) with the open cluster of that path (within that annulus) not connected to the ghost. This leads to More generally, by considering the worst boundary condition twice in the sense of where the sup is over all (FK) boundary conditions on both parts of the boundary of Ann(z) and doing that both for z = x and z = y, one gets the last inequality in (3) D f c and D cf . Clearly, D f c = D cf , so we only need to prove decay for D f c . Note that A f x is treated as in the proof of part (2) but A c y is handled by noting that A c y ⊆ {y ←→ ∂ in B(y, 1)}. This leads to D f c ≤ P a B(x,1),w,h=0 (x ←→ ∂ in B(x, 1)) · θ x · P a B(y,1),w,h=0 (y ←→ ∂ in B(y, 1)) ,1),w,h=0 (y ←→ ∂ in B(y, 1)) ≤ C 1 a 1/8 . We consider the worst case boundary condition on ∂ ex B(x, 2|x − y|/3) to get where P a 2/3,w, h and P a 2/3,f, h refer to wired and free boundary conditions on B(x, 2|x−y|/3). 
As in Proposition 3, let $G = G(a, |x - y|/3)$ denote the event that there is a circuit of vertices surrounding $B(x, |x - y|/3)$ in the annulus $\mathrm{Ann}(1/3, 2/3) := A(|x - y|/3, 2|x - y|/3)$ with each vertex in the circuit connected to $g$ within the annulus. Then

Here $P^a_{2/3,f,h}(A^c_x \mid G)$ corresponds roughly to a wired boundary condition on some random circuit which is inside the wired boundary condition of $P^a_{2/3,w,h}$. Since $A^c_x$ is an increasing event, one expects that $P^a_{2/3,w,h}(A^c_x) - P^a_{2/3,f,h}(A^c_x \mid G) \le 0$ by some stochastic domination argument. Indeed, this inequality is proved in the next lemma. Then, by Proposition 3,

Lemma 10. Let $C$ be any deterministic circuit of vertices in the annulus $\mathrm{Ann}(1/3, 2/3)$. Let $\tilde{A}_C$ denote the event that each $x \in C$ is connected to $g$ within the annulus and let $A_C$ denote the event that $C$ is the outermost such circuit. Then for any increasing event $A$ in the interior of $C$ (including edges to $g$),

With $G = \cup_C \tilde{A}_C = \cup_C A_C$, it follows that for any increasing event $E$ in $B(x, |x - y|/3)$, $P^a_{2/3,f,h}(E \mid G) \ge P^a_{2/3,w,h}(E)$.

Remark 6. We note that the proof below shows that this lemma extends to quite general annuli (instead of $\mathrm{Ann}(1/3, 2/3)$), boundary conditions (instead of $f$ and $w$) and magnetic field profiles $h(x) \ge 0$ (instead of $h$).
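The passage from the per-circuit inequality of Lemma 10 to the statement conditioned on $G$ can be spelled out as follows. This is only a sketch: it assumes, as the proof of the lemma suggests, that the per-circuit inequality reads $P^a_{2/3,f,h}(A \mid A_C) \ge P^a_{2/3,w,h}(A)$, and it uses that the events $A_C$ are pairwise disjoint with union $G$ and that $B(x, |x - y|/3)$ lies in the interior of every circuit $C$ considered. The sum runs over circuits $C$ with $P^a_{2/3,f,h}(A_C) > 0$.

```latex
\begin{align*}
P^a_{2/3,f,h}(E \mid G)
  &= \sum_{C} P^a_{2/3,f,h}(E \mid A_C)\, P^a_{2/3,f,h}(A_C \mid G) \\
  &\ge P^a_{2/3,w,h}(E) \sum_{C} P^a_{2/3,f,h}(A_C \mid G) \\
  &= P^a_{2/3,w,h}(E).
\end{align*}
```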
Proof. For simplicity, we let $B$ denote the $a$-approximation of $B(0, 2|x - y|/3)$ in this proof. Let $D$ be the interior of $C$. The stochastic domination (20) will follow from the stronger stochastic domination that
$$P^a_{2/3,f,h}(A \mid A_C) \ge P^a_{D,w,h}(A) \quad \text{for any increasing event } A \text{ in } D, \qquad (21)$$
since $P^a_{D,w,h}$ stochastically dominates $P^a_{2/3,w,h}$ on $D$. To prove (21), it is sufficient to prove that the Radon-Nikodym derivative $dP^a_{2/3,f,h}(\cdot \mid A_C)/dP^a_{D,w,h}(\cdot)$ is an increasing function (in the FKG sense). In the following proof, $\omega_{out}$ is always in $\{0, 1\}^{(\mathcal{B}(B) \setminus \mathcal{B}(D)) \cup (\mathcal{E}(B) \setminus \mathcal{E}(D))}$. By the version of (8) in which a constant $h$ is replaced by the magnetic field considered here, for any $\omega_{in} \in \{0, 1\}^{\mathcal{B}(D) \cup \mathcal{E}(D)}$,

where $\rho_0$ is the configuration with every edge closed and $\omega_{in} \oplus \omega_{out}$ denotes the configuration in $\{0, 1\}^{\mathcal{B}(B) \cup \mathcal{E}(B)}$ whose open edges are all those from $\omega_{in}$ or (disjointly) from $\omega_{out}$. Also,

For any fixed $\omega_{out}$, let $\omega = \omega_{in} \oplus \omega_{out}$ and $\tilde{\omega} = \tilde{\omega}_{in} \oplus \omega_{out}$. If $\omega \in A_C$, then it is not hard to see that

A key observation is

Combining (24) and (25) with (22) and (23), we have that

which completes the proof of (21) and thus (20).
We are ready to prove Proposition 2.

Proof of Proposition 2. Proposition 2 follows from Lemma 9 and Proposition 4.

Exponential decay for small h and proof of Theorem 1
For $N \in \mathbb{N}$, let $\Lambda_{3N,N} := [0, 3N] \times [0, N]$ and let $\Lambda^a_{3N,N}$ be its $a$-approximation. By the conformal invariance of CLE$_{16/3}$, the conformal covariance of the limiting counting measures [8], and Lemma 5, we have

Lemma 11. Let $E(K, N)$ be the event that there is a sequence of $K$ or fewer loops (say, $L_1, \ldots, L_k$ with $k \le K$) such that the total mass of the limiting counting measure corresponding to $L_i$ is larger than $N^{15/8}/K$ for each $i$ and

Then for any $\epsilon > 0$, there exists $K(\epsilon) < \infty$ such that

Proof. Using the conformal Markov property of CLE, CLE 16

We also have the following lemmas, whose proofs use Lemma 11 and are otherwise similar to those of Lemmas 6, 7 and 8.
Then for any $\epsilon > 0$, there exists $K(\epsilon) < \infty$ such that

and there is a circuit of vertices in

We say $A^a_{N,3N}$ is good if there is a sequence of open clusters in $A^a_{N,3N}$ (say $C_1, \ldots, C_k$ for some $k \in \mathbb{N}$) such that $\mathrm{dist}(C_i, C_{i+1}) = a$ for each $1 \le i \le k - 1$, $\mathrm{dist}(C_k, C_1) = a$, $C_i \longleftrightarrow g$ for each $i$ and there is a circuit of vertices in

Lemma 14. Given any $h > 0$ and $\epsilon > 0$, there exist $N_0 = N_0(h, \epsilon) \in (0, \infty)$ and $\epsilon_0 \in (0, \infty)$ such that $P^a_h(A^a_{N,3N} \text{ is good}) \ge 1 - \epsilon$ for any $a \le \epsilon_0$, $N \ge N_0$.

Then a proof similar to that of Proposition 2, but using Lemma 14 instead of Lemma 8, gives

Proposition 5. For any $h > 0$, there exists $\epsilon_1 = \epsilon_1(h) \in (0, \infty)$ such that for all $a \le \epsilon_1$
$$\langle \sigma_x; \sigma_y \rangle_{a,h} \le C_4(h)\, a^{1/4} e^{-m_2(h)|x - y|}$$
whenever $|x - y| > K_0(h)$ and $x, y \in a\mathbb{Z}^2$, where $C_4(h), m_2(h), K_0(h) \in (0, \infty)$ only depend on $h$.

Exponential decay in the continuum
In this section, we prove Theorem 2.
Proof of Theorem 2. For any $f, g \in C^\infty_0(\mathbb{R}^2)$, Theorem 1.4 of [11] (plus some moment bounds that follow from arguments like those used for Proposition 3.5 of [10])

The LHS of the above equality before the limit is equal to (with $E^a_h(\cdot) := \langle \cdot \rangle_{a,h}$)

where the last inequality follows from Theorem 1 when $a \in (0, 1]$. Letting $a \downarrow 0$ in (31) and using (30) completes the proof.

Scaling of the magnetization fields
In [10,11], the critical and near-critical magnetization fields were denoted by $\Phi^\infty$ and $\Phi^{\infty,h}$ (where $h$ is the renormalized magnetic field strength). These are generalized random fields on $\mathbb{R}^2$, so for a suitable test function $f$ on $\mathbb{R}^2$ (including $1_{[-L,L]^2}(x)$), one has random variables $\langle \Phi^\infty, f \rangle$ (or $\int_{\mathbb{R}^2} \Phi^\infty(x) f(x)\,dx$) and similarly for $\Phi^{\infty,h}$. We now drop the superscript $\infty$.
Proof. This is a special case of the conformal invariance result (Theorem 1.8) of [10] with the conformal map φ(z) = λz.
Proof. It follows from the results and arguments of [10,11] that the distribution $P_h$ of $\Phi^h$ is obtained from the distribution $P$ of $\Phi$ by multiplying $P$ by the Radon-Nikodym factor $(1/Z_L)\, e^{h \langle \Phi, 1_{[-L,L]^2} \rangle}$ and letting $L \to \infty$. Then one applies Theorem 4 to complete the proof.
The following observation, which expands on the discussion about scaling relations in the introduction, may be useful to interpret Theorem 5. In the zero-field case, $\Phi^0(\lambda x)$ is equal in distribution to $\lambda^{-1/8} \Phi^0(x)$ in the sense that, with the change of variables $z = \lambda x$,

where the equalities are in distribution. In the non-zero-field case, provided that $\tilde{h} = \lambda^{-15/8} h$, using Theorem 5 one obtains an analogous relation as follows:

Note also that $\tilde{h} = \lambda^{-15/8} h$ implies $M(\Phi^{\tilde{h}}) = C \tilde{h}^{8/15} = \lambda^{-1} M(\Phi^h)$, where $M$ is introduced in Corollary 2. This is consistent with the interpretation of $M$ as the inverse of the correlation length.

As noted in Subsection 1.1, a version $\Phi^h_\Omega$ of $\Phi^h$ can be defined in a (simply connected) domain $\Omega$ (with some boundary condition). In that case, one can consider a conformal map $\phi : \Omega \to \tilde{\Omega}$ (with inverse $\psi = \phi^{-1} : \tilde{\Omega} \to \Omega$) and give a generalization of Theorem 5, as we do next. The pushforward by $\phi$ of $\Phi^0_\Omega$ to a generalized field on $\tilde{\Omega}$ was described explicitly in Theorem 1.8 of [10]. The generalization to $\Phi^h$, implicit in [11], is stated explicitly in the next theorem, where we now replace a constant magnetic field $h$ or $\tilde{h}$ on $\Omega$ or $\tilde{\Omega}$ by a suitable magnetic field function $h(z)$ or $\tilde{h}(x)$.
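The exponent bookkeeping behind the relation between $M(\Phi^{\tilde{h}})$ and $M(\Phi^h)$ noted above is short enough to write out. The sketch below uses only the form $M(\Phi^h) = C h^{8/15}$ of the mass and the relation $\tilde{h} = \lambda^{-15/8} h$.

```latex
\begin{align*}
M(\Phi^{\tilde h})
  = C\,\tilde h^{8/15}
  = C\,\bigl(\lambda^{-15/8} h\bigr)^{8/15}
  = \lambda^{-(15/8)(8/15)}\, C\, h^{8/15}
  = \lambda^{-1} M(\Phi^{h}).
\end{align*}
```

In words, rescaling space by $\lambda$ multiplies the correlation length $1/M$ by $\lambda$, as one would expect of a length.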

Upper bound for the mass
In this section we give a proof of Theorem 3. The techniques here are quite different from the FK-based technology used for the proof of Theorem 1. An FK-based approach appears possible and is planned by the authors for a future paper, but it is currently longer and yields a weaker conclusion than the approach we present in this section.
Points $x$ in $\mathbb{Z}^2$ will be denoted $x = (k, w)$ with $k, w \in \mathbb{Z}$.
Proof of Theorem 3. Suppose $\tilde{m} > 0$ is as in (4); then by the results of [36], for any random variables $F$ and $G$ that are finite linear combinations of finite products of $\sigma_{(0,w)}$'s, one has
$$\langle F; T_k G \rangle_{1,H} = \mathrm{Cov}(F, T_k G) \le C_{F,G} \cdot (e^{-\tilde{m}})^k, \qquad (32)$$
where $T_k$ translates $G$ by $k$ units to the right to be a function of the $\sigma_{(k,w)}$'s. Let $\Sigma_j$ (resp., $\Sigma_{\le j}$ or $\Sigma_{\ge j}$) denote the $\sigma$-field generated by $\{\sigma_{(j,w)} : w \in \mathbb{Z}\}$ (resp., $\{\sigma_{(k,w)} : w \in \mathbb{Z}, k \le j \text{ (or } k \ge j)\}$). It follows from the spatial Markov property of our nearest-neighbor Ising model on $\mathbb{Z}^2$ that the random process $X_k = (\sigma_{(k,w)} : w \in \mathbb{Z})$ for $k \in \mathbb{Z}$ is a stationary Markov chain. Let $T$ denote the transition operator (the transfer matrix in statistical physics terminology); then (32) may be rewritten (using $(\cdot, \cdot)$ to denote the standard inner product in $H_0 := L^2(\Omega, P^1_H, \Sigma_0)$ where $\Omega = \{-1, +1\}^{\mathbb{Z}^2}$) as
$$(F, (T^k - P_1) G) = (F, (T - P_1)^k G) \le C_{F,G} \cdot (e^{-\tilde{m}})^k, \qquad (33)$$
where $P_1$ is the orthogonal projection on the eigenspace of constant random variables. Now, by reflection positivity for the Ising model (see, e.g., [24] or [5]), it follows that $T$ and $T - P_1$ are positive semidefinite. By (33), the spectrum of $T - P_1$ is contained in some interval $[0, \lambda]$ with $\lambda \le e^{-\tilde{m}}$. It follows that (33) is valid for $F, G$ any random variables in $H_0$ and that one may replace $C_{F,G}$ in (33) by $\|(I - P_1) F\| \cdot \|(I - P_1) G\|$, where $\|\cdot\|$ denotes the norm in $H_0$, so that $\|(I - P_1) F\|^2 = (F, F) - (P_1 F, P_1 F) = E(F^2) - [E(F)]^2 = \mathrm{Var}(F)$, where $E$ denotes expectation with respect to $P^1_H$.
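The replacement of $C_{F,G}$ by a product of norms can be seen from the following sketch. It assumes, as is standard for the transition operator of a stationary Markov chain but not spelled out above, that $T P_1 = P_1 T = P_1$, so that $(T - P_1)(I - P_1) = (I - P_1)(T - P_1) = T - P_1$. It then uses the Cauchy-Schwarz inequality and the fact that a self-adjoint operator with spectrum in $[0, \lambda]$ has norm at most $\lambda$; here $k \ge 1$.

```latex
\begin{align*}
(T-P_1)^k &= (I-P_1)\,(T-P_1)^k\,(I-P_1), \\
\bigl|(F,(T-P_1)^k G)\bigr|
  &= \bigl|\bigl((I-P_1)F,\;(T-P_1)^k (I-P_1)G\bigr)\bigr| \\
  &\le \|(T-P_1)^k\|\;\|(I-P_1)F\|\;\|(I-P_1)G\| \\
  &\le \lambda^k \sqrt{\mathrm{Var}(F)\,\mathrm{Var}(G)}
   \;\le\; e^{-\tilde m k}\sqrt{\mathrm{Var}(F)\,\mathrm{Var}(G)}.
\end{align*}
```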
If $G \in L^2(\Omega, P^1_H, \Sigma_{\ge k})$ and $F \in L^2(\Omega, P^1_H, \Sigma_{\le 0})$, then by the Markov property

Since $E(\tilde{F}) = E(F)$ while $E(\tilde{F}^2) \le E(F^2)$ and similarly for $\tilde{G}$,

Using (36)

In the limit $a_i \downarrow 0$, the mean and second moment (and hence variance and $S^{a_i}_h$) have a finite limit (for decent test functions $f$ and $g$; see [10]) and so the mass gap $M(\Phi^h)$ in the continuum limit would satisfy $M(\Phi^h) \ge B h^{8/15}$.
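The reduction from variables measurable with respect to $\Sigma_{\le 0}$ and $\Sigma_{\ge k}$ to variables in $H_0$ can be sketched as follows. The sketch assumes that $\tilde{F} := E(F \mid \Sigma_0)$ and $\tilde{G} := E(G \mid \Sigma_k)$, a reading consistent with the stated properties $E(\tilde{F}) = E(F)$ and $E(\tilde{F}^2) \le E(F^2)$; the spectral bound in the second line is the norm form of (33), applied after translating $\tilde{G}$ back to a $\Sigma_0$-measurable variable and using stationarity.

```latex
\begin{align*}
\mathrm{Cov}(F,G)
  &= \mathrm{Cov}(\tilde F,\tilde G)
  && \text{(Markov property)} \\
  &\le e^{-\tilde m k}\sqrt{\mathrm{Var}(\tilde F)\,\mathrm{Var}(\tilde G)}
  && \text{(spectral bound)} \\
  &\le e^{-\tilde m k}\sqrt{\mathrm{Var}(F)\,\mathrm{Var}(G)}
  && \text{(conditional Jensen inequality)}.
\end{align*}
```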
If $\limsup_{H \downarrow 0} \tilde{M}(H)/H^{8/15} = \infty$, then one could take $B$ arbitrarily large in (38), which would make $M(\Phi^h) = \infty$; i.e., $\Phi^h$ would be uncorrelated for spatially separated regions. But (for, say, $f$ and $g$ indicator functions of squares) this would violate known properties of the magnetization variables for $\Phi^0$, as follows. By FKG-based arguments (see [43]), such a spatially uncorrelated $\Phi^h$ would have to be Gaussian white noise. But, for $f$ the indicator function $1_\square$ of a square, by another FKG argument, one can compare the magnetization $M^h := \Phi^h(1_\square)$ on $\mathbb{R}^2$ to the corresponding $M^h_+$ with plus boundary conditions on the square to obtain
$$E(e^{t M^h}) \le E(e^{t M^h_+}) = E(e^{(t+h) M^0_+})/E(e^{h M^0_+}),$$
where the equality follows from the proof of Proposition 1.5 in [11]. But Proposition 2.2 of [11] then implies that $M^h$ is non-Gaussian. This proves that the $C$ of Corollary 2 is finite and it also follows that $\limsup_{H \downarrow 0} \tilde{M}(H)/H^{8/15} \le C < \infty$.