Decay of Correlations for the Hardcore Model on the $d$-regular Random Graph

A key insight from statistical physics about spin systems on random graphs is the central role played by Gibbs measures on trees. We determine the local weak limit of the hardcore model on random regular graphs up to just below its condensation threshold, showing that it converges in probability locally, in a strong sense, to the free boundary condition Gibbs measure on the tree. As a consequence we show that the reconstruction threshold on the random graph, indicative of the onset of point-to-set spatial correlations, is equal to the reconstruction threshold on the $d$-regular tree, for which we determine precise asymptotics. We expect that our methods will generalize to a wide range of spin systems for which the second moment method holds.


INTRODUCTION
In this paper we consider the hardcore model on random d-regular graphs and study its local spatial mixing properties. We determine the location of a phase transition at which the model undergoes a spatial mixing transition, beyond which the spin at a typical vertex becomes dependent on spins at long distances. Theory from statistical physics relates this transition to the clustering or shattering threshold, and both of these transitions appear to be related to the apparent computational difficulty of finding large independent sets. No algorithms are known to find independent sets of size (1+ε)(log d/d)n in a random d-regular graph on n vertices, which coincides with the spatial mixing threshold. In contrast, the maximum independent set is of size (2−o_d(1))(log d/d)n [14].

In this work, we show that the reconstruction or extremality threshold on the infinite d-regular tree determines the onset of long-distance point-to-set spatial correlations in the random d-regular graph. We prove an asymptotic lower bound on the reconstruction threshold which matches the known upper bound in the first two terms of the asymptotic series. Together, these results determine the asymptotic location of the threshold for the onset of long-distance point-to-set correlations in the random d-regular graph.

For a finite graph G = (V, E), an independent set is a subset of the vertices containing no adjacent vertices. Denote the set of independent sets of G by I(G). We will view an independent set as a spin configuration σ taking values in {0,1}^V, with σ_v denoting the spin at the vertex v. The hardcore model (or hardcore measure) is the probability measure over the set of independent sets σ ∈ I(G) given by
$$P(\sigma) = \frac{\lambda^{|\sigma|}}{Z}, \qquad |\sigma| = \sum_{v \in V} \sigma_v.$$
The parameter λ > 0 is known as the fugacity and controls the typical size of an independent set, with larger values of λ putting more of the weight of the distribution on larger independent sets. As usual, Z is a normalizing constant called the partition function.
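To make the definition concrete, here is a minimal brute-force sketch (pure Python; the helper names are ours, not the paper's) that enumerates I(G) for a small graph and computes the hardcore measure together with its partition function Z:

```python
from itertools import combinations

def independent_sets(n, edges):
    """Enumerate all independent sets of the graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset((u, v)) not in edge_set
                   for u, v in combinations(subset, 2)):
                yield subset

def hardcore_measure(n, edges, lam):
    """Return {independent set: probability} under the hardcore model."""
    weights = {s: lam ** len(s) for s in independent_sets(n, edges)}
    Z = sum(weights.values())            # the partition function
    return {s: w / Z for s, w in weights.items()}

# 4-cycle: the independent sets are the empty set, the four singletons,
# and the two pairs of opposite vertices -- 7 sets in total.
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
mu = hardcore_measure(4, C4, lam=1.0)
```

Exhaustive enumeration is of course only feasible for tiny graphs; the point is just to pin down the definition P(σ) = λ^{|σ|}/Z. With λ = 1 the measure is uniform over I(G), and increasing λ tilts it toward larger sets.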
The definition of Gibbs measures, and of the hardcore model in particular, can be extended to infinite graphs by way of the Dobrushin-Lanford-Ruelle condition, which essentially says that for every finite set A, the conditional distribution of the configuration on A is the Gibbs distribution with a random boundary generated by the measure outside of A. Such a measure is called a Gibbs measure, and it may not be unique (see e.g. [15] for more details).
On the infinite d-regular tree T_d, there is a unique Gibbs measure for the hardcore model if and only if $\lambda \le \frac{(d-1)^{d-1}}{(d-2)^{d}}$. However, for every λ there exists a translation invariant Gibbs measure given by a Markov model on the tree, which we denote by P_{T_d} (henceforth, we refer to this as "the translation invariant measure" on T_d). We denote the density of P_{T_d}, that is, the probability that a site is occupied, by α = α(λ, d), which satisfies the relation in (2). Since α = α(λ, d) is a strictly monotone increasing function of λ, we will use both parameters interchangeably to specify the model, depending on the context. The density of the largest independent set of a d-regular random graph is asymptotically (2 log d − (2 + o(1)) ln ln d)/d [14]. The results we present hold for densities very close to this threshold; we take λ_c to be the corresponding value of λ. The bulk of this paper is devoted to establishing that the hardcore measure on the random d-regular graph is well approximated locally by the measure P_{T_d} when λ < λ_c. We prove that the measure converges in a strong notion of local weak convergence described in Section 1.2.
Theorem 1. Let G n be the random d-regular graph on n vertices. Then for large enough d, the hardcore measure on G n with fugacity λ < λ c converges in probability locally to the measure P T d .
Our methods provide a general framework for proving convergence in probability locally which we expect will apply to various other Gibbs measures on random graphs such as colorings or NAE-SAT.
Having established Theorem 1, it is natural to consider properties of the measure P_{T_d}. The set of Gibbs measures is convex, and so we may ask whether P_{T_d} is extremal, that is, whether it cannot be written as a convex combination of other Gibbs measures. Extremality is equivalent to a notion of point-to-set correlation on trees called the reconstruction problem (for a survey, see [26]).
To formalize the definition of the problem, we will make use of a description of P_{T_d} as a Markov model on the tree, generated as follows. First the spin at the root is chosen to be occupied with probability α and unoccupied with probability 1 − α, where α is chosen as in (2). The spins of the remaining vertices are generated from their parents' spins by taking one step of the Markov transition matrix
$$M = \begin{pmatrix} p_{11} & p_{10} \\ p_{01} & p_{00} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \frac{\alpha}{1-\alpha} & \frac{1-2\alpha}{1-\alpha} \end{pmatrix},$$
where p_{ij} denotes the probability of the spin at a vertex being j given that the spin of its parent is i. Since (α, 1 − α) is reversible with respect to M, this gives a translation invariant measure on T_d which corresponds to the measure P_{T_d} with fugacity λ.

Let σ(L) denote the spins of the vertices at distance L from the root as generated by the Markov model described above. The reconstruction problem on the tree asks whether we can recover information on σ_ρ, the spin of the root ρ, from the spins σ(L) as L → ∞. Formally, we say that the model (T_d, M) has non-reconstruction if
$$P_{T_d}(\sigma_\rho = 1 \mid \sigma(L)) \to \alpha(\lambda, d) \tag{3}$$
in probability as L → ∞; otherwise, the model has reconstruction. Non-reconstruction is equivalent to extremality of the Gibbs measure, i.e. to triviality of its tail σ-algebra [26]. Information theoretically, non-reconstruction corresponds to fast decay of correlations between the spin at the root and the spins of far away vertices [26]. Proposition 12 of [25] implies that there exists a critical fugacity λ_R (or, equivalently, a critical density α_R) such that reconstruction holds for the hardcore model with fugacity λ > λ_R and non-reconstruction holds for λ < λ_R.
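The broadcast description above can be simulated directly. The sketch below is our own code, not the paper's; it assumes the reversible parametrization of M in which an occupied parent forces an unoccupied child, and an unoccupied parent produces an occupied child with probability α/(1−α). It generates the spin at the root together with the leaf configuration σ(L) on the d-ary tree:

```python
import random

def broadcast(d, alpha, depth, rng):
    """Broadcast hardcore spins from the root of the depth-`depth` d-ary tree."""
    p01 = alpha / (1 - alpha)   # P(child occupied | parent unoccupied)
    root = 1 if rng.random() < alpha else 0
    level = [root]
    for _ in range(depth):
        nxt = []
        for spin in level:
            for _ in range(d):
                # an occupied parent forces an unoccupied child
                nxt.append(0 if spin == 1 else (1 if rng.random() < p01 else 0))
        level = nxt
    return root, level          # (sigma_rho, sigma(L))

rng = random.Random(0)
root, leaves = broadcast(d=3, alpha=0.2, depth=4, rng=rng)
```

Averaging the root spin over many runs recovers the density α (reversibility makes every level stationary), and asking how much the leaf configuration reveals about the root is exactly the reconstruction problem in (3).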
The reconstruction problem on the tree was originally studied as a problem in statistical physics but has since found many applications including in computational phylogenetic reconstruction [11], the study of the geometry of the space of random constraint satisfaction problems (CSP's) [1,19] and the mixing time of Markov chains [3,8,22,28,33].
Here we establish tight bounds on the reconstruction threshold for the hardcore model on the d-regular tree. The upper bound was shown by Brightwell and Winkler [9], and our contribution is the lower bound.
Theorem 2. For large enough d, the reconstruction threshold for P_{T_d} on the d-regular tree satisfies
$$(\ln 2 - o(1)) \frac{\ln^2 d}{2 \ln\ln d} \;\le\; \lambda_R \;\le\; (e + o(1)) \ln^2 d.$$
Prior to our work, Martin [21] had shown that λ_R > e − 1. Restating Theorem 2 in terms of α, we have that the critical density for reconstruction satisfies (4), leaving only an additive (ln ln ln d)/d gap between the bounds. The form of our bound in equation (4) is strikingly similar to the bound for the q-coloring model [30], which states that reconstruction (resp. non-reconstruction) holds when the degree d is at least (resp. at most) q(ln q + ln ln q + O(1)).

The next theorem, combined with Theorem 2, gives a precise picture of the local spatial mixing properties of the hardcore model on the random d-regular graph. In [16] a natural extension of the reconstruction problem was introduced for graphs. Let {G_n} be a family of random graphs whose size n goes to infinity, and let σ be distributed according to the hardcore model with fugacity λ. We will use σ(S) to denote the configuration on a subset of vertices S and σ_v to denote the spin at a vertex v. The model has non-reconstruction if the analogue of (3) holds for a uniformly chosen u ∈ V(G_n), where B_u(L) denotes the vertices within distance L of u (and, by abuse of notation, the induced subgraph), ∂B_u(L) denotes the boundary of B_u(L), and α(λ, d) is the density given by (2).
Theorem 3. Let λ < λ_c and let α(λ, d) be the density given by (2). Let G_n be the random d-regular graph on n vertices and let u be a uniformly random vertex in V(G_n). Then, for large enough d, the random d-regular graph has non-reconstruction if and only if (T_d, M) has non-reconstruction.
1.1. Related work. A significant body of work has been devoted to the reconstruction problem on the d-regular tree by probabilists, computer scientists and physicists, for a number of different spin models. The earliest such result is the Kesten-Stigum bound [18], which states that for a Markov model on the tree, reconstruction holds whenever θ²(d − 1) > 1, where θ is the second largest eigenvalue of the corresponding Markov matrix. This bound was shown to be tight in the case of the Ising model [6,13], where it was shown that non-reconstruction holds when θ²(d − 1) ≤ 1. Similar results were derived for the Ising model with small external field [8] and the 3-state Potts model [29], which constitute the only models for which exact thresholds are known. On the other hand, for the hardcore model θ²(d − 1) = (1 + o(1)) ln²d / d, and thus, at least when d is large, the Kesten-Stigum bound is known not to be tight [9]. In both the coloring model and the hardcore model the reconstruction threshold is far from the Kesten-Stigum bound for large d. In the coloring model, close to optimal bounds on the reconstruction threshold [5,30] were obtained by first showing that at small depths the information on the root is already sufficiently small; a quantitative version of [17] then establishes that the information on the root converges to 0 exponentially quickly. In this work, we show that the hardcore model behaves similarly.
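To see why the hardcore model sits so far below the Kesten-Stigum bound, note that for a two-state chain the eigenvalues of M are 1 and tr(M) − 1. A short sketch (assuming the reversible parametrization of M with rows (0, 1) and (α/(1−α), (1−2α)/(1−α)) described earlier):

```latex
% second eigenvalue of the two-state chain M
\theta \;=\; \operatorname{tr}(M) - 1
       \;=\; \frac{1-2\alpha}{1-\alpha} - 1
       \;=\; -\frac{\alpha}{1-\alpha},
\qquad\text{so}\qquad
\theta^2 (d-1) \;=\; \frac{\alpha^2}{(1-\alpha)^2}\,(d-1).
```

At the densities α = (1 + o(1)) ln d / d relevant to the reconstruction threshold, this is (1 + o(1)) ln²d / d, far below the Kesten-Stigum value 1.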
1.1.1. Replica Symmetry Breaking and Finding Large Independent Sets. The reconstruction problem plays a deep role in the geometry of the space of solutions of random CSPs. While for problems with few constraints the space of solutions is connected and finding solutions is generally easy, as the number of constraints increases the space may break into exponentially many small clusters. Physicists, using powerful but non-rigorous "replica symmetry breaking" heuristics, predicted that the clustering phase transition exactly coincides with the reconstruction region on the associated tree model [20,19]. This picture was rigorously established (up to first order terms) for the coloring and satisfiability problems [1] and further extended to sparse random graphs by [24]. When solutions are far apart, local search algorithms will in general fail. Indeed, for both the coloring and SAT models, no algorithm is known to find solutions in the clustered phase, and it has been conjectured to be computationally intractable beyond this phase transition [1]. Previous results [16,24] have related the reconstruction problem on the Poisson tree with constant expected degree to reconstruction in sparse random graph ensembles. These results established a "replica" condition saying that the empirical distribution of pairs of spins at a vertex, drawn from two independent configurations, is asymptotically a product measure. This does not apply in the case of the hardcore model, since the degree of a vertex affects its probability of being in the independent set. At the same time, for the d-regular random graph the methods of [24] do not seem to be directly applicable, and we approach the problem instead using the theory of local weak convergence of Gibbs measures. The associated CSP for the hardcore model corresponds to finding large independent sets in random d-regular graphs.
The replica symmetry breaking heuristics again predict that the space of large independent sets should be clustered in the reconstruction regime. Specifically, this refers to independent sets of size αn where α > α_R, the density of 1's in the hardcore model at the reconstruction threshold, which is roughly half the density of the largest independent set [14]. On the other hand, the best known algorithm finds independent sets only of density (1 + o(1)) ln d / d, which is equal to α_R asymptotically as d → ∞ [34]. This is consistent with the physics predictions; it was shown that on Erdős–Rényi random graphs, independent sets exhibit the same clustering phenomena [10] as colorings and SAT [1,19] at the reconstruction threshold, and one would expect this to also be the case for random regular graphs. By determining the reconstruction threshold on such graphs we provide further evidence supporting the computational hardness of finding large independent sets in random graphs.

Sufficiently close to the satisfiability threshold, many CSPs, including the hardcore model, are believed to undergo an additional phase transition called condensation [19]. Beyond this transition the second moment method fails and the distribution places most of its weight on a constant number of clusters [2]. After the condensation transition it is believed that the hardcore measure no longer converges locally to P_{T_d}, explaining the necessity of an upper bound on λ in our theorems.

1.2. Local Weak Convergence.
There are a number of natural notions of local weak convergence of Gibbs measures, which we introduce now, following the notation used in [23]. Let T_d (by abuse of notation) also denote the space of hardcore Gibbs measures on T_d, endowed with the topology of weak convergence, and let M_d denote the space of probability measures over this space. For a sequence of graphs G_n we denote a hardcore measure by μ_n, while ν denotes a hardcore measure on T_d. The notation T_d(L) will denote the restriction of the tree T_d to a ball of radius L around the root (and, by abuse of notation, we also use it to denote the set of vertices of the restriction). The shorthand μ_n^L or ν^L denotes the restriction of the corresponding measure to a ball of radius L. For a measure on Gibbs measures m ∈ M_d, we let m^L denote the measure on the space of measures on {0,1}^{T_d(L)} induced by such projections.
Definition 1.1. Consider a sequence of graph–Gibbs measure pairs {(G_n, μ_n)}, and for v ∈ V(G_n) let P_n^L(v) denote the law of the pair (B_v(L), σ(B_v(L))) when σ is drawn with distribution μ_n. Let U_n denote the uniform measure over a random vertex u ∈ V(G_n). Let P_n^L = E_{U_n}(P_n^L(u)) denote the average of P_n^L(u). Let δ_{T_d(L)} denote the Dirac measure on graphs which is 1 on T_d(L).
A. The first mode of convergence concerns picking a random vertex u and a random local configuration in the neighbourhood of u. Formally, for ν̄ in the space of Gibbs measures on T_d, we say that {μ_n} converges locally on average to ν̄ if, for any L, the averaged law P_n^L converges weakly to the corresponding projection of ν̄.

B. A stronger form of convergence involves picking a random vertex u and the associated random local measure P_n^L(u), and asking whether this distribution of distributions converges. Formally, we say that the local distributions of {μ_n} converge locally to m ∈ M_d if the law of P_n^L(u) converges weakly to δ_{T_d(L)} × m^L for all L.
C. If m is a point mass on ν̄ and the local distributions of {μ_n} converge locally to m, then we say that {μ_n} converges in probability locally to ν̄. Equivalently, convergence in probability locally to ν̄ says that for any L and any ε > 0, the probability that P_n^L(u) is ε-far from the corresponding projection of ν̄ tends to 0.

Remark 1.2. As noted in [23], C ⇒ B ⇒ A, while in [32] it is noted that if the measure ν̄ is an extremal Gibbs measure then the three notions of convergence A, B and C are equivalent.
At a high level, convergence locally on average to ν means that, after averaging the local distribution of configurations over all the vertices, the random configuration converges weakly to ν, while convergence in probability locally to ν means that the local distribution at almost every vertex is eventually close to ν. As noted above, the former is a weaker condition and is in fact much simpler to prove. One can apply the second moment method for the hardcore model on the random d-regular graph, for a large range of λ, to relate the hardcore measure to its planted version, in which one first chooses a random independent set and then constructs a uniformly chosen graph compatible with the set. By exploring the graph in the planted measure, progressively revealing its edges, one can show convergence locally on average to the measure P_{T_d}, and via the second moment method this can be extended to the original hardcore distribution.

This argument does not imply the stronger local convergence in probability; indeed, if one assumes the picture developed in statistical physics, in the condensation phase one expects local convergence of type A but not convergence of type B or C. In order to investigate the reconstruction problem it is necessary to work with local convergence in probability. Much of the work of the paper involves showing how the second moment method can be used to imply this stronger notion of convergence. Thus, our proof shows that for the hardcore model, up to the fugacity for which the second moment method holds, the notions A and C of local convergence of measures are equivalent.

Our methods are quite general and should apply to a broad range of CSPs and Gibbs measures on graphs. Roughly speaking, one would need to show a corresponding bound on the second moment of the partition function and concavity of the log-partition function.
One would also need to show that the partition function changes by a bounded amount when an edge is added and as such, our method should be applicable to non-zero temperature models.
1.3. Outline of the proof. We begin by establishing a lower bound on the reconstruction threshold for the d-regular tree in Section 2, proving Theorem 2. We show that when α is bounded by the lower bound in (4), then even for a tree of depth 3 there is already a significant loss of information about the spin at the root. In particular, we show that if the spin of the root is 1, then the typical posterior probability that the spin of the root is 1 given the spins at level 3 will be less than 1/2. The result is completed by linearizing the tree posterior probability recursion, similarly to [8,29]. In this part of the proof we closely follow the analysis of [8], who analyzed the reconstruction problem for the Ising model with small external field. We do not require the full strength of their analysis, however, as in our case we are far from the Kesten-Stigum bound. We show that a quantity referred to as the magnetization decays exponentially fast to 0. The magnetization provides a bound on the posterior probabilities, and this completes the result. The ln ln d term in our bound on λ_R in Theorem 2 is explained as the first point at which there is significant decay of information at level 3 of the tree. In particular, the analysis in Proposition 2.3 part c) is essentially tight. It may be possible to get improved bounds by considering trees of greater depth, although the description of the posterior distribution necessarily becomes more complex. A sharper analysis of this sort was done in [30] for the coloring model, although the method there made crucial use of the symmetry of the states.

The bulk of the paper concerns proving local weak convergence to P_{T_d} for the hardcore model on the random d-regular graph, which is shown in Theorem 5.4 in Section 5. Our main tool is a new approach to the use of the second moment method.
We select, say, n^{3/5} randomly chosen vertices in the d-regular random graph, and consider a "punctured" graph with the local neighborhoods of these vertices removed. The punctured graph is used to study the partition function of the original graph conditioned on the configuration of the boundaries of these neighborhoods. The second moment method, in combination with Azuma's inequality, implies that the partition function conditioned on a boundary configuration is within a multiplicative factor of exp(O(n^{1/2+ε})) of the expected partition function. We prove convergence in probability locally by showing that it is extremely unlikely that a constant fraction of the n^{3/5} randomly chosen vertices have a local measure which is far from the translation invariant measure on the tree. Indeed, we show that this would entail the existence of a set of configurations on the set of boundary vertices which has constant probability under the hardcore measure but expected probability of only exp(−cn^{3/5}). In Proposition 5.1 we show that this is precluded by the second moment method.

One strength of our approach is that it does not require the detailed calculations of the small graph conditioning method. In many spin systems, including the one studied here, the ratio of the second moment of the partition function to the square of the first moment tends to a value strictly greater than 1, and so the second moment method cannot be used to estimate the partition function with probability tending to 1. In this case, small graph conditioning can be used to give estimates on the partition function [35]. The first and second moments of the hardcore partition function for a d-regular random graph are derived in Section 3, while the calculations for the punctured random graph appear in Section 4. The remaining proof involves establishing the requisite bound on the second moment itself.
This involves determining the maximum of a function which corresponds to the expected number of pairs of independent sets in a random regular graph with a given overlap between them. In Proposition 3.3, which is proved in Section 6, we consider the scaled log-partition function, determine its maximum and show that it decays quadratically near its maximum. This is a key fact used in relating the first and second moments of the partition functions of the random graph in Section 4.

LOWER BOUND ON THE RECONSTRUCTION THRESHOLD ON THE TREE
In this section we present the proof of Theorem 2. We start by noting that for any finite restriction of T_d to its first n levels, we can use the Markov matrix M as before to generate an independent set from the hardcore measure, by setting the spin of the root to be occupied with probability α and then applying the matrix recursively to generate the spins of the children until we reach the leaves of the tree. We define the following quantities, which are related to the transition probabilities of the Markov matrix M. As mentioned in the introduction, θ, the second eigenvalue of M, plays a particularly important role in the reconstruction problem.
For ease of notation, we will establish non-reconstruction for the model (T̂_d, M), where T̂_d is the d-ary tree (in which each vertex has d children), rather than on the d-regular tree. It is not difficult to modify the recursion we obtain for the d-ary tree to a recursion for the (d + 1)-regular tree, showing that non-reconstruction also holds in that case. Finally, we can show that non-reconstruction on the d-regular tree is equivalent to non-reconstruction on the (d + 1)-regular tree once we note that, by equation (4), the difference can be absorbed in the error term.

We will use T to denote a finite tree whose root will be denoted x. Let P_T^1, E_T^1 (and resp. P_T^0, E_T^0 and P_T, E_T) denote the probabilities and expectations with respect to the measure on the leaves of T obtained by conditioning the root x to be 1 (resp. 0, and stationary). Let L = L(n) denote the set of vertices of T at depth n and let σ(L) = σ(L(n)) denote the configuration on level n. We will write P_T(· | σ(L) = A) to denote the measure conditioned on the leaves being in state A ∈ {0,1}^{L(n)}. As in [8], we analyze the weighted magnetization of the root of T, a function X of the random configuration of the vertices at distance n from the root, defined in (8). Notice that since E_T(P_T(σ_x = 1 | σ(L))) = P_T(σ_x = 1) = α, by (8) we have that E_T(X) = 0. Also, from the first line of (8), it can be verified that X ≤ 1, since P_T(σ_x = 1 | σ(L) = A) ≤ 1 for any A. We will also make use of the following second moments of the magnetization.
The following equivalent definition of non-reconstruction is well known and follows from the definition in (3) using (8).
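The statement of Proposition 2.1 is elided in this copy; a hedged sketch of the standard criterion it refers to (our reconstruction from (3) and (8), not a verbatim quote) is:

```latex
% non-reconstruction as vanishing of the magnetization in L^2
\text{non-reconstruction for } (T_d, M)
\quad\Longleftrightarrow\quad
\lim_{n\to\infty} E_T\!\left(X^2\right) = 0 .
```

Since E_T(X) = 0, this says exactly that the magnetization tends to 0 in L², i.e. that the posterior P_T(σ_x = 1 | σ(L)) concentrates on the unconditional density α.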
In the remainder of the proof we derive bounds for X. We begin by showing that already for a 3-level tree, X becomes small. Then we establish a recurrence along the lines of [8] showing that once X is sufficiently small, it must converge to 0. As this part of the derivation follows the calculation in [8], we adopt their notation in places. Non-reconstruction is then a consequence of Proposition 2.1. In the next lemma we determine some basic properties of X.

Lemma 2.2. For any n ≥ 1, the following relations hold:

Note that for any random variable Y = Y(A) which depends only on the states at the leaves, we have E_T(Y) = α E_T^1(Y) + (1 − α) E_T^0(Y). Parts a) and b) therefore follow since X is a random variable that is a function of the states at the leaves. For part c) we proceed as follows. The first and last equalities below follow from (8). The second part of c) follows by combining this with a) and the fact that E_T(X) = 0.
The following proposition estimates typical posterior probabilities, which we will use to bound X. Let T(n) denote the tree which is the restriction of T_d to its first n levels. For a finite tree T, let T_i denote the subtree rooted at u_i, the i-th child of the root.
Proposition 2.3. For a finite d-ary tree T we have that a) for any configuration at the leaves A = (A_1, ..., A_d), the posterior at the root satisfies the tree recursion stated below.

Proof. Part a) is a consequence of standard tree recursions for Markov models, established using Bayes' rule. For part b), first note the identity below; the first and third equations follow by the definition of conditional probabilities, and the second follows from (9) and the definition of G, which establishes b).
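Part a)'s tree recursion is easy to implement. The sketch below is our own code (the vertex bookkeeping and function names are ours); it computes the exact posterior P_T(σ_x = 1 | σ(L) = A) by the upward Bayes recursion, assuming the chain M with p_11 = 0, p_10 = 1, p_01 = α/(1−α):

```python
def root_posterior(tree, leaves, alpha):
    """P(sigma_root = 1 | leaf configuration) for the hardcore broadcast
    chain on a finite tree. `tree` maps each internal vertex (the root is
    named "root") to its list of children; `leaves` maps each leaf to its
    observed spin in {0, 1}."""
    p01 = alpha / (1 - alpha)          # unoccupied parent -> occupied child
    M = [[1 - p01, p01], [1, 0]]       # M[parent][child]
    def lik(v):
        # likelihood of the observed leaves below v, given sigma_v = s
        if v in leaves:
            return [1.0 if leaves[v] == s else 0.0 for s in (0, 1)]
        child_liks = [lik(c) for c in tree[v]]
        return [prod_over(child_liks, s, M) for s in (0, 1)]
    def prod_over(child_liks, s, M):
        out = 1.0
        for cl in child_liks:
            out *= M[s][0] * cl[0] + M[s][1] * cl[1]
        return out
    L0, L1 = lik("root")
    w1 = alpha * L1                    # Bayes: prior alpha times likelihood
    return w1 / ((1 - alpha) * L0 + w1)

tree = {"root": ["a", "b"]}
post = root_posterior(tree, {"a": 0, "b": 0}, alpha=0.25)
```

For instance, on the depth-1 binary tree with both leaves unoccupied and α = 1/4, the recursion gives posterior 3/7, while an occupied child of the root forces the posterior to 0 since p_11 = 0.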
For part c), we start by calculating the probabilities of certain posterior values for trees of small depth. With our assumption on α, we have the two estimates below. Using these two equations, we obtain the displayed expression, in which the first case corresponds to leaf configurations of the tree T(1) where at least one of the leaves is 1, while the second case corresponds to the configurations where all the leaves are 0. Next, applying part a) to a tree of depth 2, we can write down this conditional probability based on the leaf configurations of the depth-1 subtrees of the root.
The first case above corresponds to the situation when each subtree of the root of depth 1 has a leaf configuration where at least one of the leaves is 1. The second case is when one of the d subtrees has a leaf configuration where all leaves are 0, while the remaining subtrees have leaf configurations where at least one leaf is 1. The third case corresponds to the remaining possibilities.
By part b), with G as defined, and (10), after substituting the expressions for λ and ω we obtain the bound below. We can now calculate the values of P^1_{T(3)}(σ_x = 0 | σ(L)) as follows. By part a), p is the probability that, if we started with σ_ρ = 0 in T(2), the configuration at the leaves is in G. If we start with σ_ρ = 1 in T(3), the number of subtrees of the root with leaf configurations in G is distributed binomially and will be about dp. By Chernoff bounds, the bound on p from (11), and the definition of G, we conclude that for β as in the assumptions and large enough d, the desired bound holds.

Proof. The claim follows by part c) of Lemma 2.2 and part c) of Proposition 2.3.

Next, we present a recursion for X and complete the proof of the main result. The development of the recursion follows the steps in [8] closely, so we follow their notation and omit some of the calculations.

Magnetization of a child. Let T be a finite tree with root x as before. Let y be a child of x and let T' be the subtree of T rooted at y (see Figure 1). Let A' be the restriction of A to the leaves of T'. Let Y = Y(A') denote the magnetization of y.
The proof follows from the first part of Lemma 2.2 and the Markov property when we condition on x. Next, we can express the effect on the magnetization of adding an edge to the root and of merging the roots of two trees, as follows. Referring to Figure 2, let T' (resp. T'') be a finite tree rooted at y (resp. z), with the channel on all edges given by M, leaf states A' (resp. A'') and weighted magnetization at the root Y (resp. Z). Now add an edge (ŷ, z) to T'' to obtain a new tree T̂. Then merge T̂ with T' by identifying y = ŷ, obtaining a new tree T. To avoid ambiguity, denote by x the root of T and by X the magnetization of the root of T. We let A = (A', A'') be the leaf state of T. Let Ŷ be the magnetization of the root of T̂.
Note: In the above construction, the vertex y is a vertex "at the same level" as x, and not a child of x as it was in Lemma 2.5.
The proof follows by applying Bayes' rule, the Markov property and Lemma 2.2. These facts also imply the following.

Lemma 2.7. For any tree T̂, the identity below holds.

With these lemmas in hand we can derive a recursive upper bound on the second moments. We will use a series expansion: taking r = π_{01} Y Ŷ, by Lemma 2.7 we obtain the bound below, where the last inequality follows since X ≤ 1 with probability 1.
If A − ∆B ≥ 0, this would already give a sufficiently good recursion to show that X(n) goes to 0, so we will assume A − ∆B is negative and try to get a good (negative) lower bound. First note that, by their definitions, to minimize it, it suffices to consider the extreme cases. When ρ = 0, A is minimized at the upper bound of ρ, and hence we obtain the bound below. Applying this recursively to the tree, we obtain the following recursion for the moments.
We bound the (1 + x)^k − 1 term as follows, and this implies the following recursion.
Proposition 2.8. If for some n, X(n) ≤ α², then the recursion below holds.

When α = (ln d + ln ln d − ln ln ln d − β)/d and β > ln 2 − ln ln 2, by Lemma 2.4, for d large enough the contraction condition holds. Hence, by equation (14), we have that X(n) → 0, and so by Proposition 2.1 we have non-reconstruction. Since reconstruction is monotone in λ, and hence in α, it follows that we have non-reconstruction for α ≤ α_R for large enough d. This completes the proof of Theorem 2.

PARTITION FUNCTION OF THE HARDCORE MODEL FOR THE RANDOM d-REGULAR GRAPH
In this section, we derive expressions for the first and second moments of the hardcore partition function for the d-regular random graph. The calculations are along the lines of those in [27] and we adopt their notation here. We will work with the configuration model for random graphs, described below, in order to simplify the calculations. H (n, d) denote the set of all d-regular (multi)graphs on n vertices and G(n,d) the subset of d-regular simple graphs. The analysis of the properties of a random graph in G(n,d) can often be simplified by making use of the configuration model, introduced by Bollóbas [7]. Fix d and n such that dn is even. Define a d-regular multigraph on n vertices via the configuration model as follows. Begin by replacing each vertex with d distinct copies and then generate a uniformly random pairing of the dn distinct points. Finally, collapse the d copies corresponding to each vertex back into one vertex, obtaining a uniformly random multigraph in H (n, d). Let S be the event that the multigraph obtained is simple. Clearly, on the event S, the graph obtained is uniformly distributed over G(n,d). Moreover, for fixed d,

P(S) = e^{−(d²−1)/4} + o(1),   (15)
where the o(1) term tends to 0 as n → ∞. Since the probability in (15) is uniformly bounded away from 0, any event that holds asymptotically with high probability for H(n, d) also holds asymptotically with high probability when we condition on S, i.e. for G(n, d). In what follows, by "d-regular random graph" we will mean the multigraph generated by the configuration model, unless mentioned otherwise.
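The pairing construction above is straightforward to simulate. The following sketch (illustrative only, not part of the original text) samples a multigraph from H(n, d) and checks the event S; conditioning on `is_simple` yields a uniform sample from G(n, d), and by (15) the acceptance probability stays bounded away from 0 for fixed d.

```python
import random
from collections import Counter

def configuration_model(n, d, seed=None):
    """Sample a d-regular multigraph on n vertices (dn must be even) by
    pairing the dn vertex copies ("points") uniformly at random and then
    collapsing the d copies of each vertex back into one vertex."""
    assert (n * d) % 2 == 0
    rng = random.Random(seed)
    points = [v for v in range(n) for _ in range(d)]  # d copies of each vertex
    rng.shuffle(points)                               # uniformly random pairing
    return [(points[i], points[i + 1]) for i in range(0, n * d, 2)]

def is_simple(edges):
    """The event S: no self-loops and no multiple edges."""
    multiplicities = Counter(tuple(sorted(e)) for e in edges)
    return all(u != v for u, v in edges) and max(multiplicities.values()) == 1

edges = configuration_model(n=100, d=3, seed=1)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
assert all(degree[v] == 3 for v in range(100))  # the multigraph is 3-regular
```

The same data structure also supports the sequential revealing of pairings used repeatedly below: one may shuffle and expose the partners of a vertex's copies one at a time while the remaining points stay uniformly paired.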
One useful property of the configuration model that we will make use of repeatedly is that the pairings of the dn distinct points may be revealed sequentially. That is, given a vertex v, we may reveal the pairings of its d copies one by one so that the distribution of pairings over the remaining unmatched points remains uniform. Notation: In the sequel, we will use f (n) = Θ(g(n)) to mean equality of the functions up to polynomial factors in n. We will assume throughout that quantities of the form an, αn, γn, εn are integers. We use "with high probability" to mean with probability going to 1 as n → ∞. In what follows, we will use σ u to denote the restriction of an independent set σ of the graph to a vertex u. The restriction of σ to a subset of vertices S will be denoted by σ(S).

3.2. The first moment of the partition function. In this section, we calculate the first moment of the partition function for the hardcore model on the d-regular random graph. For an independent set σ ∈ I(G), let |σ| denote the number of occupied vertices of σ. For fugacity λ, the partition function is given by

Z_G(λ) = Σ_{σ ∈ I(G)} λ^{|σ|}.

Let 0 ≤ α ≤ 1/2 and let Z_{G,α} = Z_{G,α}(λ) be the contribution to the partition function from independent sets of size αn, i.e.

Z_{G,α} = Σ_{σ ∈ I(G) : |σ| = αn} λ^{|σ|}.
The following approximation will be useful in simplifying the probabilities obtained in the sequel. Let a > 0 be a constant. Then, by Stirling's approximation,

\binom{n}{an} = exp(nH(a) + O(ln n)),  where H(a) = −a ln a − (1 − a) ln(1 − a).   (16)

Let

Φ(α) = α ln λ − α ln α + (d − 1)(1 − α) ln(1 − α) − (d/2)(1 − 2α) ln(1 − 2α).

Lemma 3.1. Let G ∼ H(n, d). Fix λ > 0 and 0 ≤ α ≤ 1/2. The first moment of Z_{G,α} is given by

E(Z_{G,α}) = λ^{αn} \binom{n}{αn} · [((1 − α)dn)! ((1 − 2α)dn − 1)!!] / [((1 − 2α)dn)! (dn − 1)!!] = exp(nΦ(α) + O(ln n)).

Proof. The first equality follows by calculating the probability in the configuration model that a given subset of αn vertices is an independent set, i.e. that the vertices in the subset are not matched to vertices in the subset itself. The second equality follows by (16).
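As a numerical sanity check (not part of the original argument, and assuming (16) is the standard entropy estimate ln \binom{n}{an} = nH(a) + O(ln n)), the exponential growth rate of the binomial coefficient can be compared with the entropy directly:

```python
from math import lgamma, log

def log_binom(n, k):
    """ln C(n, k) via log-gamma, exact up to floating point."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def H(a):
    """Natural-log entropy -a ln a - (1 - a) ln(1 - a)."""
    return -a * log(a) - (1 - a) * log(1 - a)

a = 0.3
for n in (10**3, 10**4, 10**5):
    # the per-vertex rate approaches H(0.3) ~ 0.6109 as n grows
    print(n, log_binom(n, int(a * n)) / n)
```

The O(ln n)/n discrepancy visible for small n is exactly the polynomial correction absorbed in (16).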
For λ > 0, it can be verified that the maximum of Φ is achieved at α* = α*(λ, d), which is the solution to the equation

λ = α(1 − α)^{d−1} / (1 − 2α)^{d},   (19)

which is obtained by differentiating Φ. To solve, if we were to set α = x/d, this would reduce roughly to solving xe^x = λd, and thus we obtain that α* = (1 + o_d(1)) ln(λd)/d. Notice that the relation (19) between α, λ and d is equivalent to (2).
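Following the reduction above, α* can be approximated numerically. This is a sketch under the stated substitution α = x/d (not from the original text): we solve xe^x = λd by bisection, i.e. compute the Lambert W value W(λd), and divide by d.

```python
from math import exp

def alpha_star(lam, d):
    """Approximate alpha* = x/d where x solves x e^x = lam*d
    (x = W(lam*d)), via bisection on the increasing map x -> x e^x."""
    target = lam * d
    lo, hi = 0.0, 1.0
    while hi * exp(hi) < target:  # grow the bracket until it contains the root
        hi *= 2
    for _ in range(80):
        mid = (lo + hi) / 2
        if mid * exp(mid) < target:
            lo = mid
        else:
            hi = mid
    return lo / d

lam, d = 1.0, 50
a = alpha_star(lam, d)
x = a * d
print(a, x * exp(x))  # x e^x is close to lam*d = 50
```

For large d this agrees with the leading-order behavior α* ≈ ln(λd)/d.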
3.3. The second moment of the partition function. To estimate the second moment E((Z_{G,α})²), we consider the contributions from pairs of independent sets S, T, each of size αn. We divide this according to the size of the overlap |S ∩ T| = γn and according to the number εn of edges of the graph which go from each of S and T to the complement V \ (S ∪ T). Call this contribution Z^{(2)}_{G,α,γ,ε}. That is, for (α, γ, ε) in the region R we define

(Z_{G,α})² = Σ_{γ,ε} Z^{(2)}_{G,α,γ,ε}.
Calculating the probability in the configuration model that a pair of subsets of vertices S and T as above are both independent sets, we have that for G ∼ H(n, d),

E(Z^{(2)}_{G,α,γ,ε}) = λ^{2αn} \binom{n}{αn} \binom{αn}{γn} ⋯   (21)

The following function arises in the approximation of the expression in (21):

f(α, γ, ε) = 2α ln(λ) + ⋯

In particular, in Section 6 we will show that the logarithm of E(Z^{(2)}_{G,α,γ,ε}) scaled by n is well approximated by f, and for λ < λ_c, f decays quadratically around its maximum:

E(Z^{(2)}_{G,α,γ,ε}) = exp(n f(α, γ, ε) + O(ln n)).

PARTITION FUNCTION OF A PUNCTURED RANDOM GRAPH
In this section we study the effect on the hardcore measure of a d-regular random graph of conditioning on the spins of a small number of the vertices. In order to do this, we analyze the partition function of a punctured graph G̃ obtained from a d-regular random graph G by deleting a small fraction of vertices and their neighborhoods. Define the following quantities with respect to a graph G = (V, E). Let d(u, v) = d_G(u, v) denote the distance between two vertices u, v ∈ V. For a vertex u ∈ V and integer r, the r-neighborhood of u, denoted B_r(u), and its (vertex) boundary are defined as

B_r(u) = {v ∈ V : d(u, v) ≤ r},  ∂B_r(u) = {v ∈ V : d(u, v) = r}.

Proof. Suppose a vertex v ∈ B is in ∂B_r(u_1) ∩ ∂B_r(u_2) for some u_1, u_2 ∈ S. We know that with high probability it is not in any third ∂B_r(u_3), as otherwise there would be three centers close together, contradicting Lemma 4.1. Therefore its degree in G̃ is at least d − 2, since there are at most two vertices adjacent to it in ∪_{v∈S} B_{r−1}(v). In the other case, v ∈ ∂B_r(u) for a unique vertex u ∈ S, and hence its degree in G̃ is d − 1. The bounds on the numbers of these vertices follow by Lemma 4.1 and applying the second moment method. The bound on k follows immediately from Lemma 4.1.
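The balls B_r(u) and their boundaries can be computed by breadth-first search. The following sketch (illustrative, with a small cycle as the example graph) implements the definitions above.

```python
from collections import deque

def ball_and_boundary(adj, u, r):
    """Return B_r(u) = {v : d(u,v) <= r} and its vertex boundary
    {v : d(u,v) = r}, via breadth-first search from u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == r:
            continue  # do not expand past radius r
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist), {v for v, dv in dist.items() if dv == r}

# Example: the 6-cycle; vertices within distance 2 of vertex 0.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ball, boundary = ball_and_boundary(adj, 0, 2)
assert ball == {0, 1, 2, 4, 5}
assert boundary == {2, 4}
```

Deleting ∪_{v∈S} B_{r−1}(v) from G and tracking the boundary vertices of the balls is exactly the puncturing operation producing G̃.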
In what follows we will sometimes work in the conditional space of the configuration model where G is such that the conclusions of Lemma 4.1 and Corollary 4.2 hold for G̃. Since the configuration model allows us to expose edges and maintain the uniform distribution over pairings of the unmatched points, under the conditioning, G̃ is a graph chosen according to the configuration model where the degrees of the vertices are modified appropriately, and we denote this set of graphs by Ĥ(n, d). We use P̂ and Ê to denote the corresponding conditional probabilities and expectations. Let B be the boundary set defined in (22) and let σ ∈ {0, 1}^B. Define Z_{G̃,σ} to be the partition function over independent sets of G̃ whose restriction to B is σ. In this section, we will show that in expectation, for the boundary B as defined in (22),

Ê(Z_{G̃,α,σ}) = λ^{L_1 + L_2 + αm} \binom{m}{αm} ⋯

4.1. The first moment of the partition function of G̃. Let B be the subset of vertices defined in (22).
In what follows, let G ∼ H(n, d) and let G̃ be defined as above. Recall that α* is given by the solution to (19). Let σ_0 denote the empty independent set on B.
(1) For all σ ∈ {0, 1}^B and 0 ≤ α ≤ 1/2, the stated comparison holds.
(2) Let α be such that |α − α*| < Cn^{−2/5} for a constant C > 0. Then the stated estimate holds.
Proof. We compare the formula (24) for σ and σ_0. Let Ñ_1, Ñ_T be the corresponding quantities for σ_0 as defined before. Note that for σ_0, L_i = 0 for i = 1, 2 and Ñ_T = N_T. Comparing the numerators and denominators of the fraction in (24), we obtain the claimed bound, where the last line follows since L_2 ≤ O(n^{1/5}). Define Z_{G_m,α} and Z^{(2)}_{G_m,α,γ,ε} respectively as Z_{G,α} and Z^{(2)}_{G,α,γ,ε} were defined, with G = G_m.
Lemma 4.6. For any 0 ≤ α ≤ 1/2, the stated estimate holds.
Proof. Let N_1 and N_T be defined as in (24). Define N*_1 and N*_T analogously for α*. Note that for the configuration σ_0, L_i = 0 for i = 1, 2 and N_T = N*_T. Comparing the expressions in (24) and (17), we obtain the claimed bound for Ê, where the last line follows by the bounds on m and M_i for i = 1, 2.

4.2. The second moment of the partition function of G̃. As before, we divide the second moment Ê((Z_{G̃,α,σ})²) into the contribution from pairs of independent sets S and T of G̃ whose restriction to B is σ, with Σ_{v∈V\B} S_v = Σ_{v∈V\B} T_v = αm and |(S ∩ T) \ B| = γm. We can further divide according to the number εdm of half-edges which are paired from each of S and T to V(G̃) \ (S ∪ T). Denote this contribution by Z^{(2)}_{G̃,α,γ,ε,σ}. Thus, we can write

(Z_{G̃,α,σ})² = Σ_{γ,ε} Z^{(2)}_{G̃,α,γ,ε,σ}.
As before, let L denote |σ|, with L_i denoting the number of vertices of B in the independent set having degree d − i, for i = 1, 2. Define M_i as before and let K_i = M_i − L_i. Calculating the probability in the configuration model that a pair of subsets S and T as above are independent sets, we obtain

Ê(Z^{(2)}_{G̃,α,γ,ε,σ}) = λ^{2αm + 2(L_1 + L_2)} \binom{m}{αm} \binom{αm}{γm} ⋯   (25)

We will show below that the second moment Ê((Z_{G̃,σ})²) is roughly the square of the first moment Ê(Z_{G̃,σ}), by an analysis similar to that in [27, Theorem 6.11] and [31, Lemma 3.5].
To prove Proposition 4.8, we need a few intermediate lemmas.
Proof. For the configuration σ_0, for i = 1, 2, L_i = 0 and K_i = M_i. Comparing the expressions (21) and (25), we obtain the stated identity for G_m ∼ H(m, d). The final equality follows by Proposition 3.2 and the bounds on the sizes of m, M_1 and M_2 from the assumed conditioning. By Proposition 3.3, we have that for some constant C, possibly depending on d and λ, the stated bound holds, and therefore the claim follows.
Proof. Using the expression (25) for each of Ê(Z^{(2)}_{G̃,α,γ,ε,σ_0}) and Ê(Z^{(2)}_{G̃,α,γ,ε,σ}) and taking ratios of the numerators and denominators separately for each of the products, we obtain the stated comparison. Since γ* = (α*)², ε* = α*(1 − 2α*) and |σ| = L_1 + L_2, the last line gives the claimed estimate, and the lemma follows.
Putting these results together we now prove Proposition 4.8.

NAYANTARA BHATNAGAR, ALLAN SLY, AND PRASAD TETALI

Applying Lemmas 4.9 and 4.10 to the terms in the product above, we obtain the bound (26). For a constant C, define the set R_C. Note that for some sufficiently large C, if (α, γ, ε) ∈ R_C, then the right-hand side of (26) can be bounded by exp(O(n^{1/5})). On the other hand, if (α, γ, ε) ∉ R_C for any constant C, then the right-hand side of (26) can be made arbitrarily small, and therefore, for any (α, γ, ε) ∈ R and σ, the corresponding bound on Ê(Z^{(2)}_{G̃,α,γ,ε,σ}) holds.
Proof. Applying the Cauchy–Schwarz inequality, Proposition 4.8 and Lemma 4.10, we obtain the claimed bound. The next step is to show a bound on the second moment of Z_{G̃,σ_0} by the square of the first moment, and we begin with the following intermediate result.
Proof. As before, we note that for the configuration σ_0, for i = 1, 2, L_i = 0 and K_i = M_i. Comparing the expressions (21) and (25), we obtain the first estimate, where the last equality follows by canceling terms and using the fact that γ* = (α*)² and ε* = α*(1 − 2α*). Similarly, comparing (17) and (24), we obtain the second estimate. Combining Proposition 3.2, Lemma 3.1 and Lemma 6.2, we obtain the third. Finally, putting together (27), (28) and (29) proves the lemma.
Proof. Applying the Cauchy–Schwarz inequality, we obtain the bound, where the second inequality is by Proposition 4.8 and the third inequality is by Lemma 4.12.
Define Z_{G,σ} to be the partition function over independent sets of G whose restriction to B is σ. Extending this, define Z_{G,α,σ} to be the partition function over independent sets of G whose restriction to B is σ and for which an α fraction of the vertices in V(G) \ B are in the independent set.
Lemma 4.14. For any σ ∈ {0, 1}^B and 0 ≤ α < 1/2, the partition functions for G and G̃ can be related by Z_{G,σ} = κ(σ)Z_{G̃,σ} and Z_{G,α,σ} = κ(σ)Z_{G̃,α,σ}, where κ(σ) is a constant depending only on the configuration σ, has a product structure κ(σ) = Π_{i=1}^{k} κ_i, and κ_i = κ_j for 1 ≤ i, j ≤ k.
Proof. By the Markov property.
Putting these results together, we obtain the following. Proof.

LOCAL WEAK CONVERGENCE TO THE FREE MEASURE ON THE TREE
The first result in this section shows that there does not exist a "bad set" of boundary configurations with large stationary probability on which the partition function is much larger than the expected partition function; the probability of such a set is at most exp(−n^{3/5}(1 − o(1))).
The choice of 4/7 could be replaced with any constant less than 3/5 and greater than 1/2.
Proof. Suppose that there is a set B of boundary configurations such that E_1(B) and E_2(B) hold.
Define the set of configurations D as follows. Suppose it were the case that the first bound held. Then the displayed estimate would follow. This contradicts (30) (that E_1(B) holds), and thus we may assume the reverse inequality. Therefore, by (31), the next bound holds. By (32) and Markov's inequality, we obtain (33). By the definition of D and Proposition 4.15, we have (34). Putting together (33) and (34), we obtain the claim, which completes the proof of Proposition 5.1.
Recall the definition of the set of vertices S from (23) and recall that for each s_i ∈ S, W_i is the set of vertices on the boundary ∂B_r(s_i), and |S| = k. Fix 1 ≤ i ≤ |S|. Let T_1, . . . , T_{|W_i|} be (d − 1)-ary trees and let T̃ be their union. Let P_T̃ be the product of the free measures on the trees. Let us identify the roots u^{(i)} = {u_1, . . . , u_{|W_i|}} of these trees with the vertices of W_i. Let T be the tree obtained by joining to T̃ a d-ary tree of depth r whose leaves are identified with the u_i. If σ ∼ ν, then for each 1 ≤ i ≤ k, the σ(W_i) are independent and ν(σ(W_i) ∈ ·) = P_T(σ(u^{(i)}) ∈ ·) = P_{T_d}(B_r(ρ) ∈ ·).
Proof. By relating the occupation probability of the root under the free measure on the d-regular tree to the occupation probability of the root under the free measure on the (d − 1)-ary tree, it can be verified that the stated identity holds. By the Markov property, (35) follows.
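The occupation probability of the root under the free measure can be computed from the classical hardcore tree recursion. The following is a numerical sketch (the recursion is the standard one for the b-ary tree; the parameter values λ = 0.5, d = 5 are illustrative, not from the text):

```python
def root_occupation(lam, b, iters=500):
    """Iterate the hardcore recursion p <- lam(1-p)^b / (1 + lam(1-p)^b),
    whose fixed point is the root occupation probability of the free
    measure on the infinite b-ary tree (unique for small lam)."""
    p = 0.0
    for _ in range(iters):
        p = lam * (1 - p) ** b / (1 + lam * (1 - p) ** b)
    return p

lam, d = 0.5, 5
p_hat = root_occupation(lam, d - 1)  # root of the (d - 1)-ary tree
# The root of the d-regular tree aggregates d subtrees instead of d - 1:
p_root = lam * (1 - p_hat) ** d / (1 + lam * (1 - p_hat) ** d)
print(p_hat, p_root)
```

The two displayed quantities are exactly the pair of root occupation probabilities being related in the proof above.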
Let P_{G_n} denote the hardcore measure on the random d-regular graph of size n. Recall that we have local weak convergence to the free measure if, for all r, with high probability over a uniformly chosen vertex u, for every ε > 0 the stated convergence holds as n → ∞. The following lemma will be used in the next result.
Lemma 5.3. Let I be an index set, (X_i)_{i∈I} random variables and (a_i)_{i∈I} constants. Suppose that for each i the stated conditions hold.
Proof. We show first the intermediate claim. The equivalence above is immediate, and the first implication can be seen as follows: applying Markov's inequality, we obtain the required bound. We will show the following result, which in turn implies (37), since with high probability k = (1 − o(1))n^{3/5}.
Theorem 5.4. Let ω be an independent set drawn according to the hardcore measure on G. For any ε > 0, the convergence (38) holds as n → ∞.
Proof. Let E be the event that the left-hand side of (38) is at least δ > 0. If E occurs, then by the definition of total variation distance, there exist a set J of indices of size δk and events A_i ⊆ {0, 1}^{W_i} for i ∈ J such that (39) holds. Using the fact that Σ_{i∈J} 1(ω(W_i) ∈ A_i) ≤ δk, we obtain (40). Combining (39) and (40) and using the fact that k ≥ (1 − o(1))n^{3/5}, we get (41). Define the set of configurations B. By (41), on the event E, (42) holds. In particular, (42) holds when G ∼ Ĥ(n, d). By Proposition 4.15, for any σ ∈ {0, 1}^B and G ∼ Ĥ(n, d), the first moment estimate holds, so that by the Paley–Zygmund inequality, a lower bound on the probability follows. Indeed, by Markov's inequality, it follows that for any κ > 0, (43) holds. On the other hand, by Azuma's inequality, for all κ > 0, (44) holds. Together, (43) and (44) imply that the interval [ln Ê(Z_{G,σ}) − ln 2, n^κ + ln Ê(Z_{G,σ})] ∩ [Ê(ln Z_{G,σ}) − n^{1/2+κ}, Ê(ln Z_{G,σ}) + n^{1/2+κ}] is non-empty, and hence |Ê(ln Z_{G,σ}) − ln Ê(Z_{G,σ})| ≤ 2n^{1/2+κ}. Plugging this into (44), we obtain the concentration bound for each fixed σ, and combining with (42), we have (45). That is, we have shown that the event E_2(B), as defined in Proposition 5.1, holds with high probability on the event E. By Corollary 4.5 and Lemma 4.14, for each σ ∈ B, we obtain the first identity. Summing this over all possible σ, we obtain the second identity, and comparing these two equalities yields the ratio bound. Combining the above bound with (45), we obtain Σ_{σ∈B} Ê(Z_{G,σ}) ≤ exp(−cn^{3/5}) Ê(Z_G).
Theorem 5.4 implies (37), which establishes local weak convergence to the free measure, proving Theorem 1. Given local weak convergence, the equivalence of the reconstruction thresholds, and hence Theorem 3, follows.

TECHNICAL LEMMAS ABOUT THE PARTITION FUNCTION
In this section we show that the second moment E(Z²_G) is close to the square of the first moment E(Z_G) and satisfies a quadratic decay property. Recall that

E(Z^{(2)}_{G,α,γ,ε}) = λ^{2αn} \binom{n}{αn} \binom{αn}{γn} ⋯   (47)

We also recall that the function f(α, γ, ε) arises naturally in the estimation of the second moment. Using (16) to compare terms in (47),

E(Z^{(2)}_{G,α,γ,ε}) = exp(n f(α, γ, ε) + O(ln n)).

Thus the second moment depends on the behavior of the function f. We will show a series of technical lemmas showing that f attains its maximum at (α*, γ*, ε*) and decays quadratically around this point. Define the relevant region, in which the error term divided by ln d tends to 0 as d → ∞ and C > 3.
Lemma 6.1. For each fixed α, γ in the region R, the function f has a local maximum at ε̄ = ε̄(α, γ).
Proof. Differentiating (48), we obtain the derivative of f. Since the hardcore model is a permissive model, we may assume that the local maxima of f are in the interior of R (see e.g. [12, Proposition 3.2]). Solving for ε by setting ∂f/∂ε = 0 gives that the unique solution in the interior of R is ε = ε̄. Further, we check that the second derivative is negative, and hence ε̄ is a maximum.
We will also use the following technical lemma.
Proof. Substituting γ̄ = α² and ε̄ = α(1 − 2α) in the expression for f(α, γ, ε) and simplifying, we obtain the claimed identity. Define the function g(α, γ) ≡ f(α, γ, ε̄(α, γ)) and consider its extremal values for fixed α in the region where α, γ ≥ 0 and α − γ ≥ 0. The derivative of g is given by the first displayed expression, while its second derivative is given by the second. We establish the behavior of the function f near its maximum by showing several facts about g.
Proof of Proposition 3.3. By Lemma 6.1, for fixed α and γ, f has a maximum at ε̄(α, γ). Thus, it remains to find the maximum of g(α, γ). In what follows we will show that for fixed α, g is maximized at (α, α²). Noting that ε̄(α, α²) = α(1 − 2α), it follows that f is maximized at (α*, γ*, ε*), since by Lemma 6.2, f(α, α², α(1 − 2α)) = 2Φ(α), and by (19), Φ is maximized at α*. We can verify that (α, α²) is a stationary point of g by computing the first derivative. In what follows, we establish that for fixed α < α_c, (α, α²) is a global maximizer of g(α, γ) by considering several possible ranges for γ. This will consist of two main steps. The first is to show that (α, α²) is a maximum, and the second is to show that the function g is larger at (α, α²) than at any other possible maximum. Let γ_i := c_i α/ln(α^{−1}) for i = 1, 2, 3, with the constants c_i to be set later. Computing the derivatives using equations (49) and (50) gives the following.
1) Let d be sufficiently large so that, by the assumption that α < α_c, we have d < 3α^{−1} ln(α^{−1}). Let d also be large enough so that, for a small constant c_1 to be chosen later, α² < c_1 α/ln(α^{−1}). We show that there is a constant c_1 so that for γ ∈ [0, γ_1], ∂²g/∂γ² < 0 (see Figure 3), and hence the stationary point (α, α²) of g is a maximum (see Figures 4 and 5). Note that the first term of the second derivative (51) is negative and has magnitude at least 1/γ ≥ α^{−1} ln(α^{−1})/c_1 for this range of γ. The term on the second line of (51) is positive, but we will argue that its magnitude is O(α^{−1} ln(α^{−1})). The claim will follow by taking the constant c_1 to be small enough.
The terms in the second bracket of (51) can be bounded by O(1), as can be seen from the series expansion in Mathematica (note that this calculation does not depend on the size of γ or on c_1). Therefore we obtain (52), where the particular value of C may change in each appearance. Hence, by choosing c_1 to be sufficiently small, the claim follows. We now divide the analysis showing that γ = α² corresponds to a global maximum into two cases based on the size of α.
2) When α is small enough, we show that g has no stationary point for γ ∈ (α², α]. Suppose that α < ε ln(d)/d for ε sufficiently small. Then, for some ε′, d < ε′α^{−1} ln(α^{−1}). Expanding the terms in the second bracket of (51) as in (52), and recalling that ∂g(α, α²)/∂γ = 0, we have the bound (53) for γ > α². We claim that the bound on the right-hand side of (53) is maximized at the end points of the interval [α², α]. Indeed, differentiating the bound with respect to γ, the only stationary point in the interval is at γ = α/(Cε′ ln(α^{−1})). Furthermore, the second derivative of the bound is positive, so that the stationary point can only be a minimum.
3) We show that there is a constant c_2 > c_1 so that for γ ∈ (α², γ_2], ∂g/∂γ < 0. Thus, for this range of γ there are no stationary points of g (see Figure 4). Integrating the second derivative over this range, we obtain a bound on the first derivative. The upper bounds in the first line above follow by arguments similar to those of 1) and 2) above.
In particular, c_1 can be chosen small enough so that the first bound follows. The last inequality follows since α < α_c, which implies d < 3α^{−1} ln(α^{−1}). As d → ∞, α → 0, and therefore for large enough d the first derivative will be negative, as claimed.
4) There are constants c_2, c_3 such that c_2 > c_1 and for γ ∈ [γ_2, α − γ_3], ∂²g/∂γ² > 0 (see Figure 3). This implies that g does not have a maximum in this range.
We conclude by establishing Proposition 3.4, which says that the second moment of the partition function can be bounded by the square of the first moment up to a polynomial term. In the displayed chain of inequalities, the second inequality is by Proposition 3.2, while the penultimate equality is by Lemma 6.2.