Generalized belief propagation for the magnetization of the simple cubic Ising model

A new approximation of the cluster variational method is introduced for the three-dimensional Ising model on the simple cubic lattice. The maximal cluster is, as far as we know, the largest ever used in this method. A message-passing algorithm, generalized belief propagation, is used to minimize the variational free energy. Convergence properties and performance of the algorithm are investigated. The approximation is used to compute the spontaneous magnetization, which is then compared to previous results. Using the present results as the last step in a sequence of three cluster variational approximations, an extrapolation is obtained which captures the leading critical behaviour with a good accuracy.


Introduction
A major approximate tool in the equilibrium statistical physics of lattice models is the mean-field theory, together with its many generalizations. These techniques are known to give quite often reliable qualitative results, which makes them very useful in understanding properties of a model like its phase diagram. Due to the limited quantitative accuracy of simple meanfield theory, many generalizations were developed. Since mean-field neglects correlations, typically the idea is to include local, but progressively longer range correlations in the treatment, for example by means of cluster expansions where clusters of increasing size can be included.
This line of research started with the Bethe-Peierls approximation [1,2], where nearest-neighbour correlations are taken into account, and the Kramers-Wannier approximation [3,4], including correlations up to a square plaquette. Many generalizations were then proposed, and a particularly successful one was the cluster variational method (CVM), introduced by Kikuchi in 1951 [5] and applied to the Ising model. The largest clusters considered by Kikuchi in this work were a cube of 8 sites for the simple cubic lattice and a tetrahedron of 4 sites for the face centered cubic lattice.
Larger clusters were later considered: in 1967 Kikuchi and Brush [9] introduced the B 2L sequence of approximations for the two-dimensional square lattice, whose convergence properties were later studied [10] using maximal clusters up to 13 sites. Kikuchi and Brush also suggested that a similar approach could in principle be carried out in three dimensions, although the computational costs would have been prohibitively large at that time. Their intuition was put on a firmer ground by Schlijper [6,7,8], who showed that, for translation-invariant models in the thermodynamical limit, there exist sequences of CVM approximations whose free energy converges to the exact one. For a d-dimensional model, the largest clusters to consider grow in d−1 dimensions only, as in a transfer matrix approach. In 3 dimensions this idea was used by the present author to develop a CVM approximation for the Ising model on the simple cubic lattice based on an 18-site (3 × 3 × 2) cluster [11].
The main difficulty encountered in trying to enlarge the basic clusters is the computational cost, which grows exponentially with the cluster size. More precisely the problem can be written as the minimization of a free energy whose number of indipendent variables increases exponentially with the cluster size. A significant amount of work was then devoted to develop efficient algorithms. The original iterative algorithm proposed by Kikuchi [12,13,14], the so-called natural iteration method, is not particularly efficient, but in certain cases it is provably convergent [15] to a (maybe local) minimum of the free energy. Faster, provably convergent algorithms were developed more recently [16,17].
A very important step in the direction of speeding up algorithms for the minimization of the CVM free energy has been made in 2001, when it was shown [18] that Belief Propagation (BP) [19], a message-passing algorithm widely used for approximate inference in probabilistic graphical models, is strictly related to the Bethe-Peierls approximation. In particular, it was shown [18] that fixed points of the BP algorithms correspond to stationary points of the Bethe-Peierls variational free energy. This result was later extended by showing that stable fixed points of BP are (possibly local) minima of the Bethe-Peierls variational free energy, though the converse is not necessarily true. This provides us with the fastest, but not always convergent, algorithm for the minimization of the Bethe-Peierls free energy. When convergent, BP outperforms the other algorithms by orders of magnitude (see [10] for a detailed comparison). BP was also extended to an algorithm, named Generalized Belief Propagation (GBP) [18,20], whose fixed points are stationary points of the CVM free energy for any choice of basic clusters. Like BP, GBP is extremely fast but not always convergent.
The purpose of the work described here is twofold: we aim both to test how GBP performs in minimizing a CVM free energy with a very large (32 sites) basic cluster, and to make one more step in the hierarchy of CVM approximations for three-dimensional lattice models. Working on the Ising model on the simple cubic lattice as a paradigmatic example, we follow Schlijper's ideas and enlarge the basic cluster for our CVM approximation in 2 dimensions only, thus choosing a 4 × 4 × 2 cluster (as far as we know, the largest cluster ever considered in CVM). We then use a GBP algorithm to minimize the corresponding free energy in a wide range of temperatures and discuss the accuracy of the result, focussing in particular on the spontaneous magnetization, in comparison with state-of-the-art results.
The paper is organized as follows: in Sec. 2 we describe our CVM approximation and the GBP algorithm we use to minimize it, in Sec. 3 we analyze the performance of the algorithm and the accuracy of the results for the magnetization, and conclusions are drawn in Sec. 4.

Methodology
The CVM is based on the minimization of an approximate variational free energy, which is obtained from a truncation of a cumulant expansion of the entropy [6,10,21,22,23]. A particular CVM approximation is specified by the set R of clusters one wants to keep in the expansion: the typical choice involves a set of maximal clusters and all their subclusters. Introducing the Möbius numbers {a α , α ∈ R}, defined by the variational free energy takes the form Here p α is the probability distribution for cluster α and where s α = {s i , i ∈ α} is the configuration of cluster α, H α its contribution to the Hamiltonian and, as customary, T is the absolute temperature (Boltzmann's constant k B has been set to 1). If H is the Hamiltonian of the model under consideration, then the condition must be satisfied. In the following we shall consider the nearest-neighbour Ising model in zero field, so our Hamiltonian will be (the coupling constant J has also been set to 1, hence the temperature will be expressed in units of J/k B ). The splitting of H into contributions H α from the various clusters appearing in the expansion is not unique, several (equivalent) choices are possible. In the following we distribute H evenly among the maximal clusters only, so that no Hamiltonian terms appear in the subcluster free energies. Our choice for the largest clusters in R is based on Schlijper's result [6,7,8] that for a d-dimensional model one can improve accuracy by increasing the maximal clusters in d − 1 dimensions only. On the simple cubic lattice, the elementary cubic cell of 2 × 2 × 2 sites has been already considered by Kikuchi [5] and a 3 × 3 × 2 basic cluster made of 4 elementary cubic cells has been used in [11]. The next step in this sequence is then a 4 × 4 × 2 basic cluster (32 sites, 9 elementary cubic cells). As far as we know, this is the largest maximal cluster ever used in a CVM approximation. The number of possible Ising configurations of such a cluster is 2 32 ≃ 4·10 9 , which is also the number of independent variables in our variational problem. More precisely, exploiting lattice symmetries, this number can be reduced by a factor close to 16, leaving us with ≃ 2 28 ≃ 2.5 · 10 8 independent variables. In order to deal with such a large number of variables, we shall use the parent-to-child GBP algorithm [20] to find stationary points of the variational free energy. This will also provide a test of convergence and performance of the algorithm in a very large scale problem.
The parent-to-child GBP algorithm is a message-passing algorithm, based on iterative equations written in terms of quantities called messages, which are exchanged between clusters. We shall use the notation m α→β (s β ) for a message going from cluster α to cluster β, which is a function of the configuration s β of the latter. Only clusters α ∈ R with Möbius number a α = 0 are involved in the message-passing scheme. Exploiting the lattice translational invariance in the thermodynamic limit, we shall identify a cluster by the symbol l x l y l z , where l x , l y and l z are the lengths of the cluster in the three spatial directions, in terms of lattice sites. For instance, a cluster made of a single site will be denoted by 111, nearest-neighbour pairs in the three directions by 211, 121 and 112 respectively, the elementary cubic cell by 222, and our maximal cluster by 442. With this notation, it is easy to check that, according to Eq. 1, if R includes 442 and all its subclusters, the only clusters with non-vanishing Möbius numbers are the following: Thanks to lattice isotropy, when l x = l y , the clusters l x l y l z and l y l x l z are equivalent, so we need to consider only six different clusters, that is six probability distributions, related to each other by marginalization conditions. In the parent-to-child GBP algorithm [20], messages go from a (parent) cluster α to a direct subcluster (child) β ⊂ α, where direct means that there exist no other cluster γ with a γ = 0 such that β ⊂ γ ⊂ α. Hence, in the present CVM approximation, we will have to introduce only the messages corresponding to the parent-child pairs listed in Tab. 1.
In the parent-to-child GBP algorithm [20], cluster probability distributions at a stationary point of the CVM variational free energy are written in terms of messages (up to a normalization constant) as  where s β denotes the restriction of s γ to subcluster β. In the above products β is any subcluster of γ (including γ itself) with a β = 0, while α is any parent of β not contained in γ.
Messages are computed by iterating equations derived from the marginalization conditions which must be fulfilled by the probability distributions of a parent cluster α and one of its children clusters β: where s α\β = {s i , i ∈ α \ β}. In the resulting set of equations, messages at (iteration) time t enter the right-hand side and new messages at time t + 1 are obtained on the left-hand side. Writing down the above equations for all parent-child pairs one realizes that two cases can be distinguished. As a first example, consider β = 331 and α = 332 and write the corresponding probability distributions using Eq. 7. It can be easily checked that all messages appearing in the left-hand side, except m α→β (s β ), will appear also in the right-hand side, cancelling each other. The resulting equation will then give (up to a normalization constant) directly m α→β (s β ) as a function of other messages and can be included in an iterative scheme where messages at step t enter the right-hand side and a new value for m α→β (s β ) at time t + 1 is obtained.
As a second example, consider now β = 431 and α = 432. In this case, after cancellations, the left-hand side will contain the product of m α→β (s β ) and two more messages, specifically those which go from the 332 subclusters of α to the corresponding 331 subclusters of β. As a consequence, in order to evaluate new 432 → 431 messages at time t + 1, one must have already computed the new 332 → 331 messages at time t + 1 from the corresponding equations. By working out the details for all messages a partial order emerges among the various computations. At time t + 1 one has to compute: We shall close this section with a few technical details about the implementation of the above scheme.
In the GBP algorithm, messages are defined up to a normalization constant. More precisely, for a given α → β parent-child pair, the messages m α→β (s β ), ∀s β can be rescaled by a common constant. As a consequence, in order to check convergence of the iterative scheme, we need to normalize messages properly. Several (equivalent) choices are possible, we normalize them at each iteration by requiring that s β m α→β (s β ) = 1 (9) for all α → β parent-child pairs. Iteration proceeds until the condition is met, where new and old denote messages at times t + 1 and t respectively. The actual value of ǫ will be specified in the next section, where we discuss the performance of the algorithm. Finally, as it often occurs with message-passing algorithms, in order to achieve convergence it is necessary to damp the iteration. There is not a unique recipe, and a reasonable tradeoff between convergence and speed must be looked for in any given problem. Here we found that convergence was always achieved by replacing, after each iteration, the new messages with the geometric mean of old and new messages.

Results
The parent-to-child GBP algorithm described in the previous section was applied to the simple cubic Ising model in the low-temperature phase. The inverse temperature K = T −1 was varied in the range 0.223 to 0.436, with a step δK = 0.001.
A broken-symmetry initialization was used for the messages: with m 0 = 0.1. The equations for the messages were then iterated until the convergence condition Eq. 10, with α = 442, β = 441 and ǫ = 10 −12 , was met. With this choice of α and β the squared distance ∆ in Eq. 10 contains 2 16 terms, so the average rms variation of individual messages at convergence is not larger than 10 −6 /2 8 ≃ 4 · 10 −9 . Of course, any other choice of the α → β parent-child pair (or a combination of all possible parent-child pairs), with a suitable rescaling of ǫ, produces similar results. The convergence criterion is rather strict: as an example, increasing ǫ to 10 −9 at K = 0.24 affects the spontaneous magnetization in the 7th decimal place. After a short transient, the squared distance ∆ decreases exponentially with the number of iterations, as illustrated in Fig. 1. The algoritm converges in a number of iterations N(K) which stays practically constant at 47-48 for K ≥ 0.29 and exhibits a critical slowing down as the critical temperature is approached. In Fig. 2 we report the number of iterations as a function of K − K c , where we have used the estimate K c ≃ 0.22165 [24,25,26,27,28]. It can be clearly seen that N(K) is very well fitted by the function N 0 (K − K c ) −z , with z ≃ 0.564. In the slowest case, K = 0.223, the algorithm took 354 iterations to converge. The number of messages to be updated at any iteration is mainly determined by the number of 442 → 432 messages, that is 2 24 . Exploiting lattice symmetries this number reduces to ≃ 2 22 ∼ 4 · 10 6 . For each message, a sum of 2 8 terms must be evaluated. The time taken by our code to execute a single iteration on a 2.66 GHz, 64 bits single processor is ≃ 28 minutes, so the algorithm converges in a time which ranges from approximately 1 day for K > 0.29 to approximately 1 week at K = 0.223.  For each K, after convergence, we evaluate the probability distribution of the 331 cluster using Eq. 7, check translational invariance, and compute the spontaneous magnetization m. Any correlation function involving a group of sites contained in the 442 cluster can be computed.
In order to assess the accuracy of the method, we have compared our results for the magnetization with the formula by Talapov and Blöte [28], determined on the basis of high precision simulations and finite size scaling. For comparison, we have also considered two lower-order CVM approximation, the cube (2 × 2 × 2) one [5] and the 18-site (3 × 3 × 2) one [11]. In the following we shall denote by m TB (K) the Talapov-Blöte result and by m L (K) the CVM result from the approximation with the L × L × 2 maximal cluster, with L going from 2 (the cube approximation) to 4 (the present approximation).
A simple plot of the 4 functions in the critical regions is shown in Fig.  3. The largest deviations occur of course close to the critical point. In particular, at K = 0.223, the present approximation m 4 is larger than the Talapov The above result is better appreciated by plotting the deviations m L (K)− m T B (K) from the Talapov-Blöte estimate, reported in Fig. 4. In this figure we also report the deviation m ∞ (K) − m T B (K) for an extrapolation m ∞ . In principle, based on Schlijper's results [6,7,8], for any K > K c one would like to define m ∞ (K) = lim L→∞ m L (K), which should be equal to the exact result. The terms of this sequence for L > 4 are not available, so we need a finite-size ansatz to extrapolate m ∞ from the results for L = 2, 3 and 4.
Since for K > K c the model has a characteristic length ξ(K), the correlation length, a natural assumption is that at least asymptotically. Assuming equality for L = 2, 3 and 4 one finds Notice that m ∞ and ξ are independent of an offset which might be added to (or subtracted from) L. For example, one could measure the size of the maximal clusters in terms of lattice spacings and obtain L ′ = 1, 2 and 3 for the cube, 18-site and the present approximation, but this would affect only the value of the prefactor δm. It is also important here to stress that without m 4 this extrapolation would not have been possible. The correlation length ξ from Eq. 14 is strongly affected by numerical uncertainties, due to the small differences between m 2 , m 3 and m 4 . Indeed, it oscillates, and then becomes not defined, for K > 0.26. On the other hand, the extrapolated spontaneous magnetization m ∞ does not suffer from these numerical problems and it is remarkably accurate. At K = 0.223, it is larger than m T B by 0.05 only, less than 1/3 the corresponding deviation of m 4 . Since m ∞ (K) is our best estimate for the spontaneous magnetization, it is worth investigating its critical behaviour. We have fitted our dataset to the function [28] m where t = 1 − K c /K denotes reduced temperature, with 6 fitting parameters: K c , β, a 0 , a 1 , a 2 , θ (from now on β denotes the critical exponent of the spontaneous magnetization), obtaining: (4), β = 0.332(1), a 0 = 1.70(1), a 1 = 0.73 (7), a 2 = 0.03(8), θ = 0.72 (4).
We see that the leading term is captured reasonable well by our approximation, with a slightly smaller K c , a slightly larger exponent β and a compatible prefactor a 0 , while the same is not true for the correction to scaling terms.

Discussion
The present paper discusses the application of the CVM approximation with a 4 × 4 × 2 maximal cluster to the three-dimensional Ising model on the simple cubic lattice. The maximal cluster is, as far as we know, the largest ever considered (32 sites) and the approximation can be viewed as the third step of a sequence of L × L × 2 approximations, where L = 2 is the original cube approximation [5] and L = 3 was considered in [11].
Due to the large size of the maximal cluster, it is necessary to resort to an instance of the GBP algorithm for the minimization of the variational free energy. As a consequence, this work also tests the GBP algorithm with a very large maximal cluster, showing that convergence can be achieved in reasonable times even with 2 22 messages. The accuracy of the results is assessed by evaluating the spontaneous magnetization and comparing it with previous approximations and with a recent best estimate.
In addition to the expected improvement with respect to L = 2 and 3, we observe that the availability of results for three L values allows an extrapolation, with the only assumption that the approach to the exact value is exponential in L. This assumption is much weaker than those underlying other techniques used to attempt to extrapolate non-classical critical behaviour from generalized mean field theories, like the Cluster Variational -Padé Approximant Method [30,31] or the Coherent Anomaly Method [32].
The extrapolation gives a good estimate of the leading term of the critical behaviour, although it cannot reach the accuracy of the recent best estimates (see e.g. [29] for a review).
It does not seem feasible, at least at the moment, to investigate the next approximation in the sequence, corresponding to L = 5. This would mean to use a 50-site maximal cluster, making the number of variables increase by a factor 2 18 with respect to the present study.
It would instead be interesting to consider so-called improved Ising models [33,34], where third-neighbour interactions are included in order to minimize subleading corrections to scaling.
Finally, it is worth mentioning that while the above analysis was carried out by considering the zero-field magnetization, thus leading to an estimate for the critical exponent β, it could be extended to several other quantities. The magnetization itself could be computed in the presence of an external field, yielding estimates for the critical isotherm and its exponent δ. Moreover, any correlation function involving a group of sites contained in our largest cluster can be computed, in particular short-range correlation functions, and hence the internal energy, even in the above-mentioned improved Ising models with third-neighbour interactions. From the magnetization in non-zero field and the internal energy, taking numerical derivatives, response functions like specific heat and susceptibilities can be obtained, and the respective exponents α and γ could be estimated.