High-temperature Expansions and Message Passing Algorithms

Improved mean-field techniques are a central theme of statistical physics methods applied to inference and learning. We revisit here some of these methods using high-temperature expansions for disordered systems initiated by Plefka, Georges and Yedidia. We derive the Gibbs free entropy and the subsequent self-consistent equations for a generic class of statistical models with correlated matrices and show in particular that many classical approximation schemes, such as adaptive TAP, Expectation-Consistency, or the approximations behind the Vector Approximate Message Passing algorithm, all rely on the same assumptions, which are also at the heart of high-temperature expansions. We focus on the case of rotationally invariant random coupling matrices in the `high-dimensional' limit in which the number of samples and the dimension are both large, but with a fixed ratio. This encapsulates many widely studied models, such as Restricted Boltzmann Machines or Generalized Linear Models with correlated data matrices. In this general setting, we show that all the approximation schemes described before are equivalent, and we conjecture that they are exact in the thermodynamic limit in the replica symmetric phases. We achieve this conclusion by a resummation of the infinite perturbation series, which generalizes a seminal result of Parisi and Potters. A rigorous derivation of this conjecture is an interesting mathematical challenge. On the way to these conclusions, we uncover several diagrammatical results in connection with free probability and random matrix theory, that are interesting independently of the rest of our work.


Background and overview of related works
Many inference and learning tasks can be formulated as a statistical physics problem, where one needs to compute or approximate the marginal distributions of single variables in an interacting model. This is, for instance, the basis behind the popular variational mean-field approach [WJ08]. Going beyond naive mean-field theory has been a constant goal in both physics and machine learning. One approach, for instance, has been very effective on tree-like structures: the Bethe approximation, or Belief Propagation. Its development in the statistical physics of disordered systems can be traced back to Thouless-Anderson-Palmer (TAP) [TAP77] and has seen many developments since then [MPV87,YFW03,MM09,ZK16]. Over the last decades, in particular, there have been many works on densely connected models, leading to a myriad of different approximation schemes. In many disordered problems with i.i.d. couplings, a classical approach has been to write the TAP equations as an iterative scheme. Iterative algorithms based on this scheme are often called Approximate Message Passing (AMP) [DMM09,KMS+12] in this context. AMP, or TAP, is an especially powerful approach when the coupling constants in the underlying statistical model are distributed as i.i.d. variables. This is, of course, a strong limitation, and many inference schemes have been designed to improve on it: the adaptive TAP (adaTAP) method [OW01a,OW01b,HK13], approximation schemes such as Expectation-Consistency (EC) [Min01,OW05a], and recent improvements of AMP such as Vector Approximate Message Passing (VAMP) and its variants [MP17,RSF17,SRF16,OCW16,ÇOFW16]. Given all these approaches, one may wonder how different they are, and when they actually lead to asymptotically exact inference. In this paper, we address this question using two main tools: high-temperature expansions and random matrix theory.
High-temperature expansions at fixed order parameters (denoted in this paper as "Plefka expansions") are an important tool in the study of disordered systems. In the context of spin glass models, they were introduced by Plefka [Ple82] for the Sherrington-Kirkpatrick (SK) model, and have been subsequently generalized, in particular by Georges-Yedidia [GY91]. This latter paper provides a systematic way to compute high-temperature (or high-dimension) expansions of the Gibbs free entropy for a fixed value of the order parameters (that is, Plefka expansions).
One aim of the present paper is to apply this method to a general class of inference problems with pairwise interactions, in which the coupling constants are not i.i.d., but can have strong correlations, while keeping a rotational invariance that will be made explicit below. In particular, we generalize earlier and inspirational work by Parisi and Potters [PP95], who computed the self-consistent equations for the marginals in Ising models with orthogonal couplings via a resummation of the infinite series given by the high-temperature expansion. We shall show that a similar resummation yields the EC, adaTAP and VAMP formalisms.

Structure of the paper, and summary of our contributions
In this paper, we perform Plefka expansions for a generic class of models of pairwise interactions with correlated matrices. We provide a detailed derivation of the method, inspired by the work of Georges-Yedidia [GY91] for Ising models, and we include new results on the diagrammatics of the expansions, leveraging rigorous results of random matrix theory. This yields a general framework that encapsulates many known properties of systems sharing this pairwise structure. The main message of this work is that the three successful approximation schemes developed in the last two decades, Expectation-Consistency, adaTAP and Vector Approximate Message Passing, are equivalent and rely on the same hidden hypothesis. A careful analysis of the Plefka expansion reveals this hypothesis, as it identifies the class of high-temperature expansion diagrams that are effectively kept in these three schemes. A careful diagrammatic analysis leads us to conjecture that all these methods are asymptotically exact for rotationally-invariant models, in the high-temperature phase. It is also worth noting that although all four methods (Expectation-Consistency, adaTAP, Vector Approximate Message Passing, Plefka expansion) lead to the same mean-field equations, the (most recent) VAMP approach presents the advantage of generating a "natural" way to iterate these equations in order to find a fixed point, which turns them into efficient algorithms. We now turn to a more precise description of the content of the paper. Throughout the paper, we will use two random matrix ensembles that we will both refer to as being rotationally invariant. The first one is defined as a measure over the set S_N of symmetric matrices:

Model S (Symmetric rotationally invariant matrix). Let N ≥ 1.
J ∈ S_N is generated as J = O D O^T, in which O ∈ O(N) is drawn uniformly from the (compact) orthogonal group O(N), and D = Diag({d_i}_{i=1}^N) is a random diagonal matrix, such that its empirical spectral distribution converges (weakly, as N → ∞) to a probability distribution ρ_D with compact support.

Examples Examples of such random matrix ensembles include matrices generated via a potential V(x): one can generate J ∈ S_N with a probability density proportional to e^{-(N/2) Tr V(J)}, and this kind of matrix satisfies the hypotheses of Model S. These ensembles also include the following well-known examples:
• The Gaussian Orthogonal Ensemble (GOE), in the case of Model S with a potential V(x) = x²/2.
• The Wishart ensemble with a ratio ψ ≥ 1. This corresponds to a random matrix W = XX^T/m, with X ∈ R^{n×m} an i.i.d. standard Gaussian matrix, and n, m → ∞ with m/n → ψ. This ensemble satisfies Model S, with a potential V(x) = x − (ψ − 1) log x.
• Standard Gaussian i.i.d. rectangular matrices, for Model R. One can also think of them as generated via a potential, as the probability density of such a matrix is P(L) ∝ e^{-(1/2) Tr L^T L}.
• Generically, consider a random matrix L from Model R. Then, both J_1 ≡ L^T L and J_2 ≡ L L^T satisfy the hypotheses of Model S.
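For concreteness, Model S can be sampled directly: draw a Haar-uniform orthogonal matrix (for instance from the QR decomposition of a Gaussian matrix, with the sign convention fixed) and conjugate an arbitrary compactly supported spectrum by it. The following is a minimal sketch (our own illustration; the function name and the choice of spectrum are ours):

```python
import numpy as np

def sample_model_S(d, rng):
    """Sample J = O D O^T with O Haar-uniform on O(N) and D = Diag(d)."""
    N = len(d)
    G = rng.standard_normal((N, N))
    O, R = np.linalg.qr(G)
    # Fixing the signs of R's diagonal makes the distribution of O exactly Haar
    O = O * np.sign(np.diag(R))
    return O @ np.diag(d) @ O.T

rng = np.random.default_rng(0)
d = rng.uniform(-1.0, 1.0, size=500)   # any spectrum with compact support works
J = sample_model_S(d, rng)
# J is symmetric and its spectrum is exactly {d_i}
print(np.allclose(J, J.T), np.allclose(np.sort(np.linalg.eigvalsh(J)), np.sort(d)))
# prints: True True
```

The GOE and Wishart examples above correspond to particular choices of the spectrum {d_i}; only the limiting distribution ρ_D matters in what follows.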
The structure of our work is as follows:
• Spherical models with rotationally invariant couplings In Sec. 2, we focus on spherical models and we generalize the seminal works of [MPR94a,MPR94b,PP95]. While they studied Ising models with orthogonal couplings, we consider spherical models, assuming only that the coupling matrix is rotationally invariant. We consider two types of models: "symmetric" models with an interaction of the type x^T J x, in which J follows Model S, and "bipartite" models with interactions of the type h^T F x, in which F follows Model R. This encapsulates orthogonal couplings, but can also be applied to other random matrix ensembles such as the Gaussian Orthogonal Ensemble (GOE), the Wishart ensemble, and many others.
Using diagrammatic results that we derive with random matrix theory, we conjecture a resummation of the Plefka expansion giving the Gibbs free entropy in these models. Our results are in particular consistent with the findings of classical works for Gaussian couplings [Ple82] and orthogonal couplings [PP95].
• Plefka expansion for statistical models with correlated couplings Sec. 3 is devoted to the description of the Plefka expansion for different statistical models and inference problems which possess a coupling or data matrix with rotation invariance properties. We consider models similar to the spherical models of Sec. 2, but with generic prior distributions on the underlying variables. In Sec. 3.1, we recall the Expectation-Consistency (EC), adaTAP and VAMP approximations and comment briefly on their respective history, before showing that they are equivalent. As a consequence, we will generically refer to these approximations as the Expectation-Consistency approximations (EC). We hope that our paper will help provide a unifying presentation of these works, generalizing them by leveraging random matrix theory. Our main conjecture for this part can be stated as follows: Conjecture 1.
[Informal] For statistical models of symmetric or bipartite interactions with coupling matrices that satisfy respectively Model S or Model R, the three equivalent approximations, Expectation-Consistency, adaTAP and VAMP (generically denoted EC approximations), are exact in the large size limit in the high temperature phase.
We believe that the validity of the above conjecture extends beyond the high temperature phase. In particular, we believe it is correct for inference problems in the Bayes-optimal setting, and more generally any time the system is in a replica symmetric phase as defined in [MPV87].
The approximation behind the EC schemes can be checked order by order using our high-temperature Plefka expansion technique and its resummation. We then derive Plefka expansions for these generic models, and we apply them to different situations, namely: -In Sec. 3.2.1 we perform a Plefka expansion for a generic symmetric rotationally invariant model with pairwise interactions. Using this method and our diagrammatic results, we then show in Sec. 3.2.2 that the EC approximations are exact for these models in the large size limit.
-In Sec. 3.2.3 we apply our general result to the TAP free energy of the Hopfield model [Hop82], an Ising spin model with a correlated matrix of the Wishart ensemble, used as a basic model of a neural network. In particular, we straightforwardly recover the results of [NT97] and [Méz17].
-In Sec. 3.3 we extend our Plefka expansion and the corresponding diagrammatic techniques to the study of a replicated system, in which we constrain the overlap between different replicas. The interest of such systems comes as a consequence of the celebrated replica method of theoretical physics [MPV87].
-Finally, we show in Sec. 3.4 how we can use these results to derive the Plefka-expanded free entropy for a very broad class of bipartite models, which includes the Generalized Linear Models (GLMs) with correlated data matrices, and the Compressed Sensing problem.
We emphasize that we were able to derive the free entropy of all these models using very generic arguments relying only on the rotational invariance of the problem.
• The TAP equations and message passing algorithms Finally, we show in Sec. 4 that the TAP (or EC) equations that we derived by maximizing the Gibbs free entropy of rotationally invariant models can strikingly be understood as the fixed point equations of message passing algorithms. Conversely, many message-passing algorithms can be seen as iteration schemes for the TAP equations. This was known for many models in which the underlying data matrix was assumed to be i.i.d.; our analysis extends it to rotationally invariant ensembles, and shows that the VAMP algorithm is an example of an approximation scheme that follows Conjecture 1.
• Diagrammatics of the expansion and random matrix theory Our results are largely based on better control of the diagrammatics of the Plefka expansions for rotationally invariant random matrices, which is presented in Sec. 5. We leverage mathematically rigorous results on Harish-Chandra-Itzykson-Zuber (HCIZ) integrals [HC57,IZ80,GM05,CŚ07], involving transforms of the asymptotic spectrum of the coupling matrix, to argue that only a very specific class of diagrams contributes to the high-temperature expansion of a system with rotationally invariant couplings. These results are used throughout our study, and are detailed in Sec. 5. Some generalizations are postponed to Appendix D.

Symmetric and bipartite spherical models with rotationally invariant couplings
In this section we consider two spherical models that will serve both as guidelines and building blocks for our subsequent analysis. We show in detail how to perform the Plefka-Georges-Yedidia high-temperature expansion in this context, and the precise diagrammatic results that allow us to resum the Plefka series for rotationally invariant couplings. These results will be useful to clarify our subsequent derivation of the TAP equations in more involved models, and are also interesting in themselves from a random matrix theory point of view.

Symmetric spherical model
In this section N ≥ 1, σ > 0, and we define the following pairwise interaction Hamiltonian on S^{N−1}(σ√N), the (N−1)-dimensional sphere of radius σ√N:

H(x) ≡ −(1/2) x^T J x = −(1/2) Σ_{i,j} J_ij x_i x_j.

The coupling matrix J is an N × N symmetric random matrix drawn from Model S.

Direct free entropy computation
The Gibbs measure for our model at inverse temperature β is defined as:

P_β(dx) = (1/Z_J(β)) e^{(β/2) x^T J x} dx,

in which dx is the usual surface measure on the sphere S^{N−1}(σ√N). We write the partition function of the model introducing a Lagrange multiplier γ to enforce the condition ||x||² = Nσ². We will write A_N ≍ B_N for equality at leading exponential order. Denoting γ(β) the solution to the saddle-point equation in eq. (4), we have effectively defined a new Gibbs measure, in which now dx is the usual Euclidean measure on R^N. Following [KTJ76] we diagonalize the Hamiltonian and integrate over the spins in this new basis, which yields a Gaussian integral indexed by the eigenvalues of J. Taking the N → ∞ limit, the saddle point equation reads:

σ² = (1/N) Σ_{λ ∈ Sp(J)} 1/(γ − βλ),

which we can write as a function of the limiting spectral law ρ_D of the matrix J (defined in Model S):

σ² = ∫ ρ_D(dλ) / (γ − βλ).

We assumed (see Model S) that the support of ρ_D is compact so that we can define its maximum λ_max ∈ R. Under these assumptions, eq. (8) has the solution:

γ(β) = β S_{ρ_D}^{−1}(−βσ²) = β R_{ρ_D}(βσ²) + 1/σ²,

as long as −S_{ρ_D}(λ_max) ≥ βσ², where R_{ρ_D} is the R-transform of ρ_D and S_{ρ_D} its Stieltjes transform (see Appendix C for their definitions). In the opposite case (if βσ² > −S_{ρ_D}(λ_max)), γ 'sticks' to the solution γ = λ_max β. The intensive free entropy is defined as Φ_J(β) ≡ (1/N) log Z_J(β). In the end, we can compute the free entropy in the whole high-temperature phase β ≤ β_c ≡ −σ^{−2} S_{ρ_D}(λ_max). By taking the derivative of this expression with respect to β it is easy to show that it simplifies to:

Φ_J(β) = Φ_J(0) + (1/2) G_{ρ_D}(βσ²),

with G_{ρ_D} the integrated R-transform of ρ_D (see Appendix C). In the low temperature phase (for β ≥ β_c = −σ^{−2} S_{ρ_D}(λ_max)), γ sticks to βλ_max and the free entropy is obtained accordingly. Note that both in the high and low temperature phases the free entropy can formally be expressed as:

Φ_J(β) = extr_{γ ≥ βλ_max} [ γσ²/2 − (1/2N) Σ_{λ ∈ Sp(J)} log(γ − βλ) ] + const,

with a β-independent constant; a formulation which is both more compact and easier to implement algorithmically for generic matrices J.
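This Lagrange-multiplier formulation is indeed straightforward to evaluate numerically: one only needs the eigenvalues of J and a one-dimensional search for the multiplier γ. A small sketch of the saddle-point solve (our own illustration, using the convention g(z) = ∫ ρ(dλ)/(z − λ) for the empirical Stieltjes transform, which may differ by a sign from the convention of Appendix C), checked on a GOE sample for which R(s) = s, so that γ = β R(βσ²) + 1/σ² = β² + 1/σ²:

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, sigma2 = 2000, 0.3, 1.0

# GOE sample normalized so the spectrum converges to the semicircle on [-2, 2]
A = rng.standard_normal((N, N))
lam = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))

def saddle_gamma(beta, lam, sigma2, tol=1e-12):
    """Solve sigma^2 = (1/N) sum_i 1/(gamma - beta*lam_i) by bisection,
    on the branch gamma > beta * lam_max."""
    lo = beta * lam.max() + 1e-9
    hi = beta * lam.max() + 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(1.0 / (mid - beta * lam)) > sigma2:
            lo = mid   # gamma too small, the sum is too large
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma = saddle_gamma(beta, lam, sigma2)
print(gamma, beta**2 + 1.0 / sigma2)   # the two values agree up to finite-N effects
```

Here β = 0.3 is below β_c = 1 for this spectrum, so the saddle point lies on the analytic branch.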
Remark The free entropy is usually defined as an average over the quenched disorder J, but here it is clear that the free entropy is self-averaging as a function of J, so that taking this average is trivial. Moreover, Φ_J(β) only depends on J via ρ_D, its asymptotic eigenvalue distribution.
Remark The derivation of the free entropy in both the high and low temperature phases has been made rigorous in [GM05], and the method of proof also essentially consists in fixing a Lagrange multiplier to enforce the condition Σ_i x_i² = σ²N.

Plefka expansion and the Georges-Yedidia formalism
A more generic way to compute the free entropy is to follow the formalism of [GY91] to perform a high-temperature Plefka expansion [Ple82]. The goal is to expand the free entropy at low β, in the high-temperature phase. In order to do so, we introduce the very useful U operator defined in Appendix A of [GY91]. We will compute the free entropy given the constraints on the means ⟨x_i⟩_β = m_i and on the variances ⟨x_i²⟩_β = v_i + m_i². The notation ⟨·⟩_β indicates an average over the Gibbs measure of our system at inverse temperature β, see eq. (5). A set of parameters {m_i, v_i} will thus determine a free entropy value, and the comparison with the direct calculation of Sec. 2.1.1 will be made by maximizing the free entropy with respect to {m_i, v_i}. We can enforce the spherical constraint ||x||²_2 = σ²N by constraining our choice of parameters {m_i, v_i} to satisfy the identity:

(1/N) Σ_i (v_i + m_i²) = σ².

The Lagrange parameters introduced to fix the magnetizations are denoted {λ_i}, and the ones used to fix the variances are denoted {γ_i}. For clarity we will keep their dependency on β explicit only when needed. For a given β and a given J one defines the operator U of Georges-Yedidia (eq. (16)); its derivation as well as its (many) useful properties are briefly recalled in Appendix A. We are now ready to compute the first orders of the expansion of the free entropy Φ_J(β) in terms of β. In this expansion the Lagrange parameters {λ_i(β), γ_i(β)} are always considered at β = 0, so we drop their β-dependency. We detail the first orders of the expansion, following Appendix A (cf. Appendix A of [GY91]).
Order 0 First of all, taking β = 0 one easily computes the free entropy of the non-interacting constrained model, which yields eq. (17) after extremization over {λ_i, γ_i}.

Order 1 At order 1, one easily derives eq. (18). We can now make use of the Maxwell-type relations, which are valid at any β. These relations, plugged into eq. (18), lead to ∂_β γ_i(β = 0) = J_ii and ∂_β λ_i(β = 0) = Σ_{j(≠i)} J_ij m_j. We then obtain the U operator at β = 0 from eq. (16), as given in eq. (21).

Order 2 Following eq. (190) in Appendix A, we obtain the second-order term, eq. (22).

Order 3 and 4 For the order 3, we obtain eq. (23), in which the sum runs over pairwise distinct i, j, k indices. Applying eq. (192) we reach the fourth-order term, eq. (24), where again i, j, k, l are pairwise distinct indices. For pedagogical purposes (and since it will be useful in the following sections), we detail this calculation in Appendix B.
Larger orders By its very nature, the perturbative expansion of Georges-Yedidia [GY91] cannot (somewhat disappointingly) give an analytic result for an arbitrary perturbation order n. However, the results up to order 4 of eqs. (17), (18), (22), (23), (24) lead to the following natural conjecture for the free entropy at a given realization of the disorder:

N Φ_J(β) = N Φ_J(0) + (β/2) Σ_{i,j} J_ij m_i m_j + Σ_{p=1}^∞ (β^p/2p) Σ_{i_1,…,i_p pairwise distinct} J_{i_1 i_2} J_{i_2 i_3} ⋯ J_{i_p i_1} v_{i_1} ⋯ v_{i_p} + o(N).

Note that in order to obtain this formula, we took the N → ∞ limit at every perturbation order in β, which is part of the implicit assumptions of the Plefka expansion. The terms of this perturbative expansion can be represented diagrammatically as simple cycles of order p, see Fig. 1a.
In general, at any order in the expansion one can construct a diagrammatic representation of the contributing terms, and one expects that only strongly irreducible diagrams contribute to the free entropy. Strongly irreducible diagrams are those that cannot be split into two pieces by removing a vertex [GY91] (examples are given in Fig. 1a and 1b). However, we retain only simple cycles like the one depicted in Fig. 1a, because other diagrams such as the one in Fig. 1b are negligible when N → ∞ for rotationally invariant models, as we argue in Section 5.2. For the case of orthogonal couplings, this dominance of simple cycles was already noted in [PP95]. On the other hand, generic cactus diagrams like the one pictured in Fig. 1c are not negligible, but they cancel out and do not appear in the final form of the expansion (at order 4, this is shown in Appendix B).
We shall now prove the dominance of simple cycles, and the correctness of eq. (25), in the high-temperature phase. In this phase, the solution to the maximization of eq. (25) over {m_i} is the paramagnetic solution m_i = 0. Furthermore, we expect that the {v_i} that maximize the free entropy of eq. (25) are homogeneous, that is ∀i, v_i = v. The constraint of eq. (15) thus gives v = σ².
We can compare the result of the resummation of simple cycles, eq. (25), with the exact result of eq. (12) in the paramagnetic phase. For these two results to agree, we need the generating function of simple cycles to be related to the R-transform of ρ_D by:

R_{ρ_D}(s) = lim_{N→∞} Σ_{p=1}^∞ s^{p−1} E[ (1/N) Σ_{i_1,…,i_p pairwise distinct} J_{i_1 i_2} ⋯ J_{i_p i_1} ],

in which the outer expectation is with respect to the distribution of J. In particular, an order-by-order comparison yields that the free cumulants {c_p(ρ_D)}_{p∈N} (see Appendix C for their definition) must satisfy:

c_p(ρ_D) = lim_{N→∞} E[ (1/N) Σ_{i_1,…,i_p pairwise distinct} J_{i_1 i_2} ⋯ J_{i_p i_1} ].

Using rigorous results of [GM05], we were able to prove a stronger version of eq. (27), namely convergence in L² norm, so we state it as a theorem:

Theorem 1. For a matrix J ∈ S_N generated by Model S, one has for every p ∈ N:

(1/N) Σ_{i_1,…,i_p pairwise distinct} J_{i_1 i_2} ⋯ J_{i_p i_1} → c_p(ρ_D) in L² norm as N → ∞.

We postpone the proof to Sec. 5. We assume that we can invert the summation over p and the N → ∞ limit in eq. (25), so Theorem 1 implies that eq. (26) is true not only in expectation, but that we can write:

R_{ρ_D}(s) = lim_{N→∞} Σ_{p=1}^∞ s^{p−1} (1/N) Σ_{i_1,…,i_p pairwise distinct} J_{i_1 i_2} ⋯ J_{i_p i_1},

in which the limit means convergence in L² norm as N → ∞. This is important, as it allows us to "resum" the free entropy of eq. (25), which is valid for a given instance of J. As a final note, we can use the results of Sec. 2.1.1 to write this result in an alternative form (dropping O_N(1) terms).
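At leading order in N, the sums over pairwise distinct indices in Theorem 1 coincide with the empirical moments (1/N) Tr J^p, so the statement can be checked numerically by converting empirical moments into free cumulants through the standard free moment-cumulant recursion. A minimal sketch (our own illustration, with our own normalizations):

```python
import numpy as np

def free_cumulants(moments):
    """Free cumulants c_1..c_n from moments m_1..m_n, via the recursion
    m_k = sum_{s=1}^{k} c_s * sum_{i_1+...+i_s = k-s} m_{i_1} ... m_{i_s}."""
    m = [1.0] + [float(x) for x in moments]   # m[0] = 1 by convention
    n = len(moments)
    c = [0.0] * (n + 1)
    for k in range(1, n + 1):
        total = 0.0
        for s in range(1, k):
            conv = [1.0]                       # s-fold convolution of the moments
            for _ in range(s):
                conv = np.convolve(conv, m[:k]).tolist()
            total += c[s] * conv[k - s]        # coefficient of degree k - s
        c[k] = m[k] - total
    return c[1:]

# Deterministic check on the semicircle law: its moments are the Catalan
# numbers at even orders, and R(s) = s means c_2 = 1 and all other c_p = 0.
print(free_cumulants([0, 1, 0, 2, 0, 5, 0, 14]))

# Empirical check on a GOE sample (spectrum close to [-2, 2]): the cumulants
# estimated from (1/N) Tr J^p are close to (0, 1, 0, 0).
rng = np.random.default_rng(1)
N = 1500
A = rng.standard_normal((N, N))
lam = np.linalg.eigvalsh((A + A.T) / np.sqrt(2 * N))
emp = free_cumulants([np.mean(lam ** p) for p in range(1, 5)])
print(emp)
```

Feeding in the exact Catalan moments returns c_2 = 1 and all other cumulants zero, as expected from R(s) = s for the semicircle law.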

Stability of the paramagnetic phase
Theorem 1 implies that our Plefka expansion is exact up to β = β_c, and we can check that the paramagnetic solution is stable exactly up to this same critical temperature. Recall that in this model we do not optimize the free entropy simultaneously over v and the {m_i}, because the norm ||x||²_2 = σ²N is fixed, yielding the constraint v = σ² − (1/N) Σ_i m_i². Solely as a function of the {m_i}, the free entropy therefore reads, up to O_N(1) terms:

Φ_J(β, m) = (β/2N) Σ_{i,j} J_ij m_i m_j + (1/2) G_{ρ_D}(βv) + (1/2) log v + const, with v = σ² − (1/N) Σ_i m_i²,

in which G_{ρ_D} is the integrated R-transform of ρ_D (see Appendix C for its definition) and the constant does not depend on {m_i}. The Hessian of the extensive free entropy N Φ_J at the paramagnetic solution m = 0 is:

N ∂²_m Φ_J(β, m = 0) = β [ J − S_{ρ_D}^{−1}(−βσ²) I_N ].

The paramagnetic solution is stable as long as the Hessian N ∂²_m Φ_J(β, m = 0) is a negative matrix. This is true as long as β < β_c = −σ^{−2} S_{ρ_D}(λ_max), because at β_c the upper edge of the spectrum of the Hessian touches zero. For β > β_c the Hessian is again negative, giving the impression of stability of the paramagnetic phase; however, S_{ρ_D}^{−1}(−βσ²) is then evaluated at a non-physical solution, so this solution has to be discarded. Our Plefka expansion thus allows us to compute the free entropy in the whole paramagnetic phase, coherently with the results of Sec. 2.1.1.

Bipartite spherical model
In this section we consider N, M ≥ 1. We let α > 0, and we will take the limit (sometimes referred to as the thermodynamic limit) in which N, M → ∞ with a fixed ratio M/N → α. We let σ_x, σ_h > 0. Let us consider the following Hamiltonian, which is a function of two fields h ∈ R^M and x ∈ R^N:

H(h, x) ≡ −h^T L x = −Σ_{μ,i} h_μ L_{μi} x_i.

The coupling matrix L ∈ R^{M×N} is assumed to be drawn from Model R.

Direct free entropy computation
The calculation for this bipartite case is very similar to the one performed in Sec. 2.1.1, although one cannot always express the result as a well-known transform of the measure ρ_D of Model S. For all values of β, the result can be expressed as in eq. (33), where ρ_D is the asymptotic eigenvalue distribution of L^T L (see the definition of Model R).
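For the simplest instance of Model R, an i.i.d. Gaussian rectangular matrix, ρ_D is the Marchenko-Pastur law, and the convergence of the spectrum of L^T L can be checked directly. A small numerical sanity check (our own normalization: entries of variance 1/N, aspect ratio α), comparing the empirical moments of L^T L with the Marchenko-Pastur (free Poisson) moments:

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 1200, 0.5
M = int(alpha * N)

# i.i.d. Gaussian rectangular matrix, entries of variance 1/N so that
# the spectrum of L^T L stays of order one
L = rng.standard_normal((M, N)) / np.sqrt(N)
lam = np.linalg.eigvalsh(L.T @ L)   # includes the (1 - alpha) N zero modes

# First Marchenko-Pastur moments for this normalization:
# m_p is a sum over non-crossing partitions of [p], weighted by alpha^{#blocks}
mp_moments = [alpha, alpha + alpha**2, alpha + 3 * alpha**2 + alpha**3]
emp = [np.mean(lam ** p) for p in (1, 2, 3)]
print(emp, mp_moments)
```

The agreement is up to O(1/N) finite-size corrections; the atom at zero (the kernel of L^T L for α < 1) contributes nothing to the moments, as it should.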

Plefka expansion
The Plefka expansion for this model is a straightforward generalization of Sec. 2.1.2. We will fix the averages ⟨h_μ⟩ = m^h_μ and ⟨x_i⟩ = m^x_i, and similarly the second moments ⟨h_μ²⟩ and ⟨x_i²⟩, which define the variances {v^h_μ} and {v^x_i}. In this problem, the U operator of [GY91] at β = 0 is computed as in Sec. 2.1.2. Once again, as in Sec. 2.1.2, one can study all the diagrams that appear in the Plefka expansion. We show again the L² concentration of the simple cycles, and the negligibility of the other strongly irreducible diagrams that can be constructed from the rectangular matrix L. We state these results for the bipartite case in more detail in Sec. 5.5.1. We obtain a counterpart to eq. (25) for this bipartite model, in which indices {μ_l} run from 1 to M and indices {i_l} run from 1 to N. We again make an assumption of uniform variances at the maximum: ∀μ, v^h_μ = v_h and ∀i, v^x_i = v_x. Comparing to eq. (33) in the paramagnetic phase (m^h_μ = m^x_i = 0), we obtain a correspondence similar to eq. (28), valid a priori for any given realization of L, in the high temperature phase.

Plefka expansion and Expectation Consistency approximations
In this section we perform Plefka expansions for generic models of pairwise interactions, both symmetric and bipartite. In Sec. 3.1 we recall some known facts about the Expectation Consistency (also called Expectation Propagation [Min01]), adaTAP and VAMP approximations used to compute the free entropy of such models. In Sec. 3.2 and Sec. 3.4 we generalize the results of the Plefka expansions of Sec. 2 to these models and highlight the main differences and assumptions of our method. This yields a very precise and systematic justification of the TAP equations for rotationally invariant models. We apply these results to retrieve the TAP free entropy of the Hopfield model, Compressed Sensing, as well as different variations of high-dimensional inference models called Generalized Linear Models (GLMs). Sec. 3.3 is devoted to the study of a generic replicated system using these approximations and the Plefka expansion. We show that they can be used in the celebrated replica method [MPV87] of theoretical physics, to compute the Gibbs free entropy of a generic pairwise inference model.

Expectation Consistency, adaptive TAP, and Vector Approximate Message Passing approximations
Expectation Consistency (EC) [OW05a,OW05b] is an approximation scheme for a generic class of disordered systems that can also be applied to many inference problems. In this section we show how this scheme is derived and how it is closely related to the adaTAP approximation [OW01a,OW01b] and the VAMP approximation [RSF17]. Let us comment shortly on the history of these methods. The adaTAP scheme was developed and presented in 2001 in [OW01b,OW01a], and was discussed in detail in the review [OS01] for systems close to the SK model. The same year, Thomas Minka's Expectation Propagation (EP) approach was presented [Min01].
Opper and Winther used an alternative view of local-consistency approximations of the EP type, which they call Expectation Consistent (EC) approximations in [OW05a,OW05b], effectively rederiving their adaTAP scheme from this new point of view. The VAMP approach is more recent [SRF16]; it is again another take on EP for a different problem (compressed sensing), but it has the advantage that, compared with other EP-like approaches [ÇOFW16], it leads to a practical converging algorithm and to a rigorous treatment of its time evolution. The connection between these approaches and the Parisi-Potters formulation for inference problems [JR16] has been hinted at several times for SK-like problems, see e.g. [OCW16,ÇO19]. We hope that our paper will help provide a unifying presentation of these works, generalizing them well beyond the SK model alone by leveraging random matrix theory. We briefly recall the main arguments of these papers which are useful for our discussion.

Expectation Consistency approximation
Consider a model in which the density of a vector x ∈ R^N is given by a probability distribution of the form:

P(x) = (1/Z) P_0(x) P_J(x).

Such distributions typically appear in Bayesian approaches to inference problems. We will use the Bayesian language and denote P_0 as a prior distribution on x, which will typically be factorized (all the components of x are assumed to be independent under P_0); the distribution P_J is responsible for the interactions between the {x_i}. In this paper we are interested in pairwise interactions, which means that the log of P_J is a quadratic form in the {x_i} variables. An example of such a model is the infinite-range Ising model of statistical physics at inverse temperature β ≥ 0, with a binary prior and a quadratic interaction governed by a coupling matrix J. In this specific model, we have:

P_0(x) ∝ Π_i [δ(x_i − 1) + δ(x_i + 1)] e^{h_i x_i},    P_J(x) ∝ e^{(β/2) x^T J x},

for some {h_i} ∈ R^N. Our goal is to compute the large N limit of the free entropy log Z in the model of eq. (37). Each of the two distributions P_0 and P_J allows for tractable computations of physical quantities (like averages), but the difficulty arises when considering their product. The idea behind EC is to simultaneously approximate P_0 and P_J by a tractable family of distributions. For the sake of the presentation we will consider the family of Gaussian probability distributions, although this can be generalized to different families, see the general framework of [OW05a]. We define the first approximation as:

µ_0(x) ∝ P_0(x) e^{−(1/2) x^T Γ_0 x + λ_0^T x}.

Here, the parameter Γ_0 is a symmetric positive matrix and λ_0 is a vector. We will denote ⟨·⟩_0 the averages with respect to µ_0. We can write the trivial identity:

Z = Z_0 ⟨P_J(x) e^{(1/2) x^T Γ_0 x − λ_0^T x}⟩_0,

with Z_0 the normalization of µ_0. The idea of EC is to replace, when one computes the average ⟨P_J(x) e^{(1/2) x^T Γ_0 x − λ_0^T x}⟩_0, the distribution µ_0 by an approximate Gaussian distribution, that we can write as:

µ_G(x) ∝ e^{−(1/2) x^T (Γ_0 + Γ_J) x + (λ_0 + λ_J)^T x}.

Performing this replacement yields the expectation-consistency approximation to the free entropy:

log Z_EC = log ∫ dx P_0(x) e^{−(1/2) x^T Γ_0 x + λ_0^T x} + log ∫ dx P_J(x) e^{−(1/2) x^T Γ_J x + λ_J^T x} − log ∫ dx e^{−(1/2) x^T (Γ_0 + Γ_J) x + (λ_0 + λ_J)^T x}.

Note that all three parts of this free entropy are tractable.
In order to symmetrize the result we can define a third measure:

µ_S(x) ∝ e^{−(1/2) x^T Γ_S x + λ_S^T x}, with Γ_S ≡ Γ_0 + Γ_J and λ_S ≡ λ_0 + λ_J,

as well as µ_J(x) ∝ P_J(x) e^{−(1/2) x^T Γ_J x + λ_J^T x}, with averages ⟨·⟩_S and ⟨·⟩_J. The final free entropy should not depend on the values of the parameters, so we expect that the best values of Γ_0, Γ_J, λ_0, λ_J make Z_EC stationary. This is a strong hypothesis, and the reader can refer to [OW05a] for more details and justifications. This yields the Expectation Consistency conditions, giving their name to the procedure:

⟨x⟩_0 = ⟨x⟩_J = ⟨x⟩_S,    ⟨x x^T⟩_0 = ⟨x x^T⟩_J = ⟨x x^T⟩_S.
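The stationarity computation behind these conditions is short enough to spell out. With the convention µ_0(x) ∝ P_0(x) e^{−x^⊤Γ_0 x/2 + λ_0^⊤ x} used here (sign conventions may differ between references), differentiating log Z_EC with respect to the Lagrange parameters gives:

```latex
\frac{\partial \log Z_{\mathrm{EC}}}{\partial \lambda_0}
   = \langle x \rangle_0 - \langle x \rangle_S = 0 ,
\qquad
\frac{\partial \log Z_{\mathrm{EC}}}{\partial \Gamma_0}
   = -\frac{1}{2}\left( \langle x x^\top \rangle_0 - \langle x x^\top \rangle_S \right) = 0 ,
```

and identically with the index 0 replaced by J, so that the first two moments of the three measures must coincide: this is exactly the moment-matching condition familiar from EP-type schemes.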

Adaptive TAP approximation
The adaptive TAP approximation (or adaTAP) [OW01a,OW01b] provides an equivalent way to derive the free entropy of eq. (42) for models with pairwise interactions. Let us briefly sketch its derivation and the main arguments behind it. We follow the formulation of [HK13] and we consider again the infinite-range Ising model of eq. (38). The extensive Gibbs free entropy N Φ = log Z at fixed values of the magnetizations m_i = ⟨x_i⟩ and covariances v_ij = ⟨x_i x_j⟩_c can be written using Lagrange parameters: a vector λ and a symmetric matrix Γ.
The adaTAP approximation consists in writing:

Φ(β, m, v) ≈ Φ_{β=0}(m, v) + [Φ_g(β, m, v) − Φ_{g, β=0}(m, v)],

i.e. the interaction part of the free entropy is evaluated as if the spins were Gaussian.

Figure 2: Duplicated factor graph for the VAMP approximation. Circles represent vector nodes and squares factor nodes. We represent the two messages m_0 and m_J in terms of which we can write the full BP equations.
In this expression, Φ_g(β, m, v) denotes the free entropy of the same system, but where the spins have Gaussian statistics. The idea behind the adaTAP approximation is as follows.
The quantity ⟨Σ_{ij} J_ij x_i x_j⟩ is an expectation of a sum over a large number of terms; it is therefore reasonable to assume that this expectation is the same as if the underlying variables were Gaussian. This assumption of adaTAP, although reasonable, is a priori hard to justify more rigorously and systematically. It is important to notice that the free entropy (46) of adaTAP is equivalent to the one derived using Expectation Consistency in eq. (42). Indeed, using Lagrange parameters we can rewrite the three terms of eq. (46) accordingly. Once written in this form, the extremization over m and v of the free entropy implies that Γ_S = Γ_0 + Γ_J and λ_S = λ_0 + λ_J. It is then clear that we recover log Z_EC of eq. (42).

Vector Approximate Message Passing approximation
The Vector Approximate Message Passing (VAMP) algorithm [RSF17] extends previous message-passing approaches like the GAMP algorithm [Ran11] (which we will describe in more detail in Sec. 4.1) to a class of correlated interaction matrices, namely matrices that satisfy a right-rotation invariance property, similarly to Model S and Model R. The algorithm itself can be derived in several ways (see [RSF17]). Here we briefly recall the use of belief-propagation equations on a "duplicated" factor graph and their Gaussian projection. As we shall see, the Bethe free entropy, given as a function of the BP messages, is then equivalent to the expectation-consistency free entropy. For simplicity, we consider again the problem of eq. (38) with a pairwise interaction involving a matrix J following Model S. The idea behind VAMP is to consider two vector spin variables, x_1 with measure P_0 and x_2 with measure P_J, and to impose that they are equal. The partition function can be written using a trivial decomposition:

Z = ∫ dx_1 dx_2 P_0(x_1) P_J(x_2) δ(x_1 − x_2).

This partition function can be represented as a "duplicated" factor graph involving two vector nodes, see Fig. 2. One then writes the BP equations for this problem using this factor graph representation (the reader who is not familiar with BP equations and factor graph representations can consult [MM09]). The BP equations can be written in terms of two "messages" m_0(x_2) and m_J(x_1). They can be obtained by looking at the stationarity conditions of the "Bethe free entropy" of eq. (49). As the factor graph has a tree structure, these BP equations are an exact representation of the original problem, but they are in general intractable. In order to make the computation tractable one can make a Gaussian approximation, which is at the core of the VAMP algorithm: the messages m_0 and m_J on the factor graph of Fig. 2 are assumed to be Gaussian, and thus are only characterized by their mean and their covariance.
We can thus write: Writing the BP update rule with this assumption yields the VAMP algorithm of [RSF17]. In the present case it reads: where the measures µ 0 and µ J are respectively Plugging the ansatz of eq. (50) into eq. (49) immediately gives back the EC free entropy of eq. (42). And the BP equations of eq. (51) are identical to the Expectation Consistency conditions of eq. (44).
In the end, the three approximation schemes, Expectation Consistency, adaptive TAP and VAMP, give the same expression for the free entropy. However, an important advantage of the VAMP approach is that it "naturally" gives an iterative scheme to solve the fixed point equations, because it was derived via the belief propagation equations. This iterative scheme turns the fixed point equations into an efficient algorithm, as noticed by [RSF17] and as we show later in Sec. 4.2.
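As a concrete illustration of this iterative scheme, here is a minimal sketch of the Gaussian-message updates on the duplicated factor graph, for a toy all-Gaussian instance of the pairwise model: P 0 ∝ e^{−x²/2 + h·x} and P J ∝ e^{(β/2) x·Jx}. All sizes, parameter values and variable names below are our own illustrative choices, not taken from the text. Since both factors are Gaussian, the Gaussian projection is exact, and the iteration should reach the exact mean of the measure:

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 100, 0.2
A = rng.standard_normal((N, N)) / np.sqrt(N)
J = (A + A.T) / np.sqrt(2)                 # symmetric coupling, spectral norm ~ 2
h = rng.standard_normal(N)

# exact mean of p(x) ∝ exp(-||x||²/2 + h·x + (β/2) x·Jx)
m_exact = np.linalg.solve(np.eye(N) - beta * J, h)

# VAMP on the duplicated graph: x1 carries P_0, x2 carries P_J;
# each message is Gaussian, parametrized by a mean r and a scalar precision g
r1, g1 = np.zeros(N), 1.0
for _ in range(20):
    x1 = (h + g1 * r1) / (1.0 + g1)        # prior-side posterior mean
    eta1 = 1.0 / (1.0 + g1)                # its (scalar) variance
    g2 = 1.0 / eta1 - g1                   # moment matching of the outgoing message
    r2 = (x1 / eta1 - g1 * r1) / g2
    C = np.linalg.inv(g2 * np.eye(N) - beta * J)
    x2 = g2 * C @ r2                       # coupling-side (LMMSE) mean
    eta2 = np.trace(C) / N
    g1 = 1.0 / eta2 - g2
    r1 = (x2 / eta2 - g2 * r2) / g1
```

In this all-Gaussian case the fixed point is reached after a single pass, and both estimates x1 and x2 coincide with the exact mean, illustrating why the stationary limit of the iteration reproduces the EC/adaTAP fixed point equations.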

Plefka expansion for models of symmetric pairwise interactions

3.2.1 A symmetric model with generic priors
In this subsection we consider a generic model of N "spin" variables x = {x 1 , · · · , x N } ∈ R N . They interact via a pairwise interaction and are subject to a possible external field, which is modeled by the following Hamiltonian: We will consider random symmetric coupling matrices {J ij }, generated from a rotationally invariant ensemble satisfying Model S. We assume that the variables {x i } have a prior distribution under which they are independent variables, each with a distribution P i . For instance, this includes Ising (binary) spins by choosing P i = 1 2 (δ 1 + δ −1 ). At a given inverse temperature β ≥ 0 and a fixed realization of the coupling matrix J we define the Gibbs-Boltzmann distribution of the spins, and the partition function, as: We will compute the large N limit of the free entropy Φ J (β) ≡ 1 N log Z J (β) at fixed values of the magnetizations m i = ⟨x i ⟩ and variances v i = ⟨(x i − m i ) 2 ⟩ using the Plefka expansion. We will fix these variables using Lagrange multipliers {λ i } for the {m i } variables, and {γ i } for the {v i }. The goal of this section is to show how, and under which assumptions, the calculation of Sec. 2.1.2 can be generalized in this context. Clearly the zeroth order term is different from the spherical case and it is given by: As we underlined in Sec. 2.1.2, in the Georges-Yedidia method the Lagrange parameters are always considered at β = 0. At order 1 in β we obtain at leading order: The operator U of eq. (16), defined in [GY91] and taken at β = 0, is thus exactly the same as the one of eq. (21). Using this remark we can see that many of the results obtained in Sec. 2.1.2 will apply to the present case. For instance the second order term is identical and given in eq. (22). We then conjecture, backed by our diagrammatic results in Sec. 5, that the higher order terms differ from the spherical model only in terms which are sub-leading in N . 
For instance at third order we obtain: In this equation, we denoted κ (p) i the cumulant of order p of the distribution of x i at β = 0. Note that the rotation invariance of Model S implies that for i ≠ j, typically J ij ∼ 1/ √ N . Therefore a term like Σ_{i≠j} J_ij^3 gives a negligible contribution to the free entropy. We shall therefore assume that the second part of the RHS of eq. (58) is negligible as N → ∞. This is correct provided that the possible correlations of the third order cumulants κ (3) i with the matrix J do not change the scaling of this term sufficiently to make it thermodynamically relevant (see Sec. 5.4 for more details on this particular point). The first term corresponds to a simple cycle of order 3 and is the same term that appeared in Sec. 2.1.2.
Higher orders We can carry on the computation of the derivatives ∂ p ∂β p Φ J (β = 0). We explain in more detail in Sec. 5.4 under which precise results and assumptions we can leverage the results of Sec. 2.1.2, summarized in eq. (25), to conjecture the following value of the free entropy at all orders of perturbation and at leading order in N : Homogeneous variances For the remainder of this section we assume that the maximum of the free entropy of eq. (59) is attained for variables {v i } such that v i = v. This hypothesis can be argued as reasonable for many models, but we postpone this argumentation to the applications of eq. (59) to specific models. We obtain a resummation of the Plefka free entropy using the correspondence of eq. (28): Recall finally that Φ J (0) is given by eq. (56). We were able to perform this expansion and its resummation almost entirely by applying our results on the spherical models of Sec. 2. The study of the large-N behavior of diagrams made out of matrix elements of J, performed in Sec. 5, proves to be of crucial importance both in the expansion and its resummation.
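The resummed series is governed by the R-transform, i.e. by the free cumulants of the asymptotic spectral density of J: the diagrammatic statement behind the resummation is that the normalized sums over simple cycles of length p concentrate on the p-th free cumulant. A quick numerical sanity check of this fact, with sizes and ensemble chosen by us for illustration: for a Wishart-type matrix J = ξξᵀ/N with i.i.d. ±1 entries and aspect ratio α, the R-transform is α/(1 − x), so all free cumulants equal α.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 400, 800                      # aspect ratio alpha = P/N = 2
alpha = P / N
xi = rng.choice([-1.0, 1.0], size=(N, P))
J = xi @ xi.T / N                    # Wishart-type coupling matrix (diagonal kept here)

d = np.diag(J)
J2 = J @ J
# simple-cycle sums of lengths 2 and 3, with pairwise distinct indices,
# obtained from traces by subtracting the index-coincidence contributions
c2 = (np.trace(J2) - np.sum(d**2)) / N
c3 = (np.trace(J2 @ J) - 3 * np.sum(d * np.diag(J2)) + 2 * np.sum(d**3)) / N
print(c2, c3)                        # both should be close to alpha
```

For a Wigner matrix the same sums would instead give σ² at length 2 and ≈ 0 at length 3, matching the fact that its R-transform σ²x has a single nonzero free cumulant.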

Connection of the Plefka expansion to EC approximations
Although not obvious at first, the result of the Plefka expansion in eq. (59) provides a systematic and precise analysis of the asymptotic exactness of the Expectation Consistency approximations for rotationally invariant models. The most straightforward way to see how the Plefka expansion relates to these approximations is to start from the adaTAP approximation, see eq. (46). In the language of the Plefka expansion, adaTAP amounts to assuming that at every order p ≥ 1 of perturbation in β, one can perform the calculation as if the statistics of the variables were Gaussian. An equivalent formulation is that all the terms of order p ≥ 1 in the low-β expansion of the free entropy should be the same for the model with a generic prior of Sec. 3.2.1 and for the spherical model of Sec. 2.1. This statement, which generalizes the Parisi-Potters result of [PP95], is exactly what we argued in the calculation of Sec. 3.2.1, using the diagrammatic analysis of Sec. 5. Therefore, the diagrammatic analysis 'à la Plefka' gives a clear meaning to the EC approximations: the class of diagrams that are neglected in these approximations is made explicit in Sec. 5. We believe that these diagrams are actually negligible in the large N limit, so that the EC approximations are asymptotically exact for rotationally invariant models in the high temperature phase, as captured by the resummation of the Plefka expansion, which we summarized in Conjecture 1. We believe that this asymptotic exactness extends beyond the high temperature phase to any model in the replica symmetric phase. The diagrammatic analysis provides a route to proving this statement rigorously.

Application to the Hopfield model
In the Hopfield model [Hop82] we consider binary spins x ∈ {±1} N and the coupling matrix J is constructed out of P patterns, which are spin configurations ξ l ∈ {±1} N , for l ∈ {1, · · · , P }. The coupling constants are defined as: and we assume that the {ξ l i } are i.i.d. variables taking values in {±1} with equal probability, so that EJ ij = 0 and EJ 2 ij = P/N 2 . We study this system in the limit in which both P, N → ∞ with a fixed ratio P/N → α. The derivation of the TAP free energy for these models has been performed in [NT97,Méz17] via the Plefka expansion, and via the cavity method in [MPV87]. While the random matrix ensemble of eq. (61) is a priori not rotationally invariant, one can show that since the variables {ξ l i } are i.i.d., only the first and second moments of their distribution contribute to the thermodynamic limit of the free entropy, so that we can assume that they are actually standard centered Gaussian variables without changing the free entropy. This is strengthened by the classical results of [MP67], who only need to assume i.i.d. variables {ξ l µ } to obtain that the spectral law of the covariance matrix written in eq. (61) converges weakly to the celebrated Marchenko-Pastur distribution.
The ensemble of eq. (61) is thus for our purposes essentially a Wishart matrix model in which the diagonal has been removed. It is then a known result of random matrix theory (see for instance [TV04]) that its asymptotic R-transform reads: The term −α in the first equality accounts for the "removal" of the diagonal of the Wishart matrix. Let us apply eq. (59) and eq. (60) to this model. At β = 0, the free entropy is given by eq. (56). For an Ising prior on the spins, it reads: Substituting the squared means m 2 i by the spin glass order parameter q ≡ (1/N ) Σ i m 2 i in eq. (64) and using the resummed form of eq. (60), we obtain: Starting from eq. (65) and eq. (62), we reach the final form for the free entropy: Maximizing it over the continuous set of magnetizations {m i } yields the TAP equations: This is in agreement with the findings of [MPV87,NT97,Méz17]. However, our framework and results allowed us to treat this kind of model in a very generic way.
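The shift by −α in the R-transform can be checked directly: for ±1 patterns (ξ i l )² = 1, so the diagonal of ξξᵀ/N equals P/N = α exactly, and removing it is exactly a shift of the matrix by −α times the identity. A short numerical check (sizes chosen by us) against the correspondingly shifted Marchenko-Pastur edge (1 + √α)² − α:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 500, 1000
alpha = P / N
xi = rng.choice([-1.0, 1.0], size=(N, P))
W = xi @ xi.T / N                    # full Wishart-type matrix; W_ii = P/N = alpha exactly
J = W - np.diag(np.diag(W))          # Hopfield couplings: diagonal removed

# removing the diagonal is exactly W - alpha*I, so the spectrum is the
# Marchenko-Pastur bulk shifted by -alpha; its right edge is (1+sqrt(alpha))^2 - alpha
ev = np.linalg.eigvalsh(J)
edge_pred = (1 + np.sqrt(alpha))**2 - alpha
print(ev.max(), edge_pred)
```

This exact shift is what allows the second, sub-leading term of the Wishart R-transform to be handled so simply in the resummation.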

The Replica approach
Interestingly, the same approaches can be used to study the partition function and derive the free entropy of rotationally invariant models using the replica approach. The essence of the replica method is well known [MPV87]: one studies the n-th moment of the partition function Z by introducing n 'replicas' of each original variable, and one then studies the average of Z n in the limit n → 0, which allows one to reconstruct the average of log Z, and therefore the free entropy. We shall illustrate in this section how the three approximation schemes (EC, adaTAP and VAMP) can be used to derive the replica free entropy, and how the high-temperature Plefka expansion justifies these approximations. Let us consider again the generic model of eq. (53), with h i = 0 for simplicity. The n-th moment of the partition function reads (a, b indices always denote replica indices running from 1 to n): This "replicated partition function" should then be averaged over the disorder. Here, we deal with a J matrix generated from Model S, and the disorder average means an average over the orthogonal matrix O in the decomposition J = ODO T (keeping the eigenvalues D fixed). Let us see how it can be analyzed using the four (equivalent) approximation schemes. Interestingly, the average over O is in fact not needed in these approaches, as they show that the replicated partition function is "self-averaging", meaning that it gives the same free entropy density for almost all realizations of J. The order parameter we will fix here is the n × n symmetric matrix Q with elements Q ab = (1/N ) Σ i ⟨x a i x b i ⟩ c , called the overlap matrix in the statistical physics language.

Expectation-Consistency approximation
As we have seen in Sec. 3.1.1, in the Expectation-Consistency (EC) approximation, we decompose the replicated free entropy Φ = 1 N log Z n J (β) as a function of three auxiliary free entropies: The Γ 0 , Γ J matrices are symmetric n × n matrices, and x i = (x a i ) n a=1 . Indeed, as we will only fix the Q ab , we expect the first moments to vanish and the second moments to be uniform in space (but not in replica space). The extremization over Γ 0 , Γ J yields the Expectation-Consistency equations, as described in eq. (44). In particular, the average (1/N ) Σ i ⟨x a i x b i ⟩ is constrained to be the same under the three different measures appearing in eq. (69). As we want to impose this average to be Q ab , we can introduce a Lagrange parameter Γ S to fix this average, for instance in the third part of the r.h.s. of eq. (69). This third term becomes: Changing Γ S → Γ S + Γ 0 + Γ J gives: In the end, performing explicitly the Gaussian integration in the basis of eigenvectors of J in eq. (69), we obtain the free entropy at fixed overlap Q (recall that ρ D is the asymptotic eigenvalue density of J): The total free entropy is obtained simply as extr Q Φ(Q). Assuming an ultrametric structure in replica space [MPV87] (for instance in the replica symmetric case the matrix elements of Q take only two values, one on the diagonal and one off the diagonal, and the same holds for Λ and R), one can perform an explicit computation. For random orthogonal models, this expression was first derived using a Plefka expansion in [MPR94a,MPR94b], and written in exactly the same form as ours in [CDL03].

The adaTAP approximation
Let us now describe the adaTAP approach to this problem. We introduce Lagrange parameters Λ ab which act as external fields, fixing the values of the order parameter Q ab . The matrix Λ is an n × n symmetric matrix.
The free entropy at fixed overlap matrix Φ(Q) is expressed as: The total free entropy will thus be obtained as extr Q Φ(Q). We use the adaTAP approximation, which gives: where Φ G is the Gibbs free energy for a model with Gaussian spins, at fixed overlap between spins. The three pieces of Φ can be written as: In Φ G (Q, β) we carry out the integral over s a i in an (orthonormal) basis of eigenvectors of {J ij }. Φ G (Q, β = 0) is easily evaluated: the extremization gives Λ = Q −1 , and therefore Φ G (Q, β = 0) = n(1 + log 2π)/2 + (1/2) Tr log Q (eq. (76)). We change notation and denote by R the matrix Λ that appears in Φ G (Q, β). This finally gives: with We recover exactly the EC free entropy of eq. (70).

The VAMP approach
Following eq. (48), the replicated partition function can be written as a "duplicated" integral: where We now write the Bethe free entropy Φ Bethe as in eq. (49), and project it onto the space of Gaussian messages, which are assumed to be proportional to the identity in space, but with an arbitrary replica structure: We did not include first moments because we expect all the first moments to vanish in the replicated system. We can now write the stationarity equations of Φ Bethe with respect to A 0 and A J : It is easy to derive the replicated free entropy Φ. One finds where This is again exactly the same result as the replicated free entropy found with EC or adaTAP in the previous sections.

Plefka expansion
As before, we fix only the overlap Q ab , via Lagrange parameters γ ab . Let us denote Φ(Q, β) the corresponding free entropy. At β = 0 the replicas are not coupled, so that we obtain One can then compute the order 1 perturbation and the U operator of Georges-Yedidia: Note that at β = 0 we have ⟨x a i ⟩ 0 = 0. We obtain the order 2 correction in the same way as before: Here the trace is taken in replica space. One can continue the Plefka expansion at any order, and one obtains results very similar to the non-replicated free entropy: the product of variances is simply replaced by traces of the overlap matrix Q. Indeed, the diagrams constructed from the matrix elements {J ij } that appear in the replicated free entropy are exactly the same as in the non-replicated case, so that all the results of Sec. 5 stay valid for this replicated calculation. In the end, the resummation yields the single-graph replicated free entropy as a function of the overlap matrix Q: The series in the second part of this equation is nothing but Tr [G ρ D (Q)] (see Appendix C for the definition of this function). Recalling the expression (88) for the first, β = 0, piece, we finally get: and recall that the β = 0 term is given by eq. (88) and that the G ρ D function is the integrated R-transform defined in Appendix C, which verifies: Thus we recognize once again in eq. (93) the expression obtained via EC and adaTAP, see eq. (70).

A bipartite model with generic priors
We consider here another generic model, which is an extension of the bipartite model studied in Sec. 2.2, the difference being that we assume a generic prior on the variables rather than a spherical constraint. As we will see, this model is closely related to the symmetric model of Sec. 3.2.1. Let M, N ≥ 1. We consider two types of variables: a vector x ∈ R N and a vector h ∈ R M . These two fields are assumed to follow prior distributions under which they are independent and the distribution of all their components decouples. For instance, the prior P X on x can be written as P X (dx) = Π i P i (dx i ). The two fields h, x interact via the following Hamiltonian: As in Sec. 2.2, Greek indices µ, ν will always run from 1 to M while Latin indices i, j run from 1 to N . We assume that the coupling matrix F ∈ R M ×N satisfies the rotation invariance property described in Model R. For a fixed β ≥ 0 and a realization of F we define the Gibbs-Boltzmann distribution and the partition function: As in the previous section we will compute the large N limit of the free entropy Φ F (β) ≡ 1 N log Z β,F . We look at this problem in the thermodynamic limit, with N, M → ∞ and a fixed ratio M/N → α > 0. We constrain the first and second moments of {x i } and {h µ } under the Gibbs measure to be The Lagrange multipliers introduced to enforce these conditions will be denoted λ x i , λ h µ for the first moments and γ x i , γ h µ for the second moments. At order 0, we obtain: The calculations at order 1 and 2 are very similar to the ones of the spherical model of Sec. 2.2.2. One can also refer to the symmetric case of Sec. 3.2.1. We obtain the first four orders: Here κ (p) denotes again the cumulants of order p at β = 0. On higher orders The discussion of the higher orders in perturbation is very similar to the one we made in Sec. 3.2.1. In the end, we only retain the simple cycles made of matrix elements {F µi } in our Plefka expansion.
As we explain in more detail in Sec. 5, and in particular in Sec. 5.5 for the bipartite case, we make two crucial statements to obtain this result:
• Let us forget for the moment about the factors of variances or higher-order moments of the distributions of x i , h µ at β = 0. We can then study all the possible terms appearing in the Plefka expansion at every order as diagrams made of matrix elements {F µi }, and show that the only non-vanishing diagrams are the simple cycles. This is shown in more detail in Sec. 5.5.
• We assume that the factors arising from the variances v x i , v h µ or higher-order cumulants of the variables x i and h µ do not significantly change the scaling of the diagrams, that is, we can still retain only the simple cycles in the thermodynamic limit. More details are given in Sec. 5.5.2.
For instance, at order 3 these statements yield ∂ 3 β Φ N,F = O N (1), since one cannot construct a simple cycle at order 3 for bipartite models. This also explains the O N (1) term at order 4, see eq. (102). In the end, we obtain the following value of the free entropy at leading order in N : In the summation all indices µ 1 , · · · , µ p are pairwise distinct, and so are i 1 , · · · , i p .
Homogeneous variances Let us assume that the maximum of the free entropy of eq. (103) is attained for homogeneous variances, v x i = v x and v h µ = v h . As in the symmetric case this hypothesis can be justified for many models that we will analyze later on. Using eq. (36) the free entropy of eq. (103) can be resummed: where ρ D is the spectral distribution of F T F . At β = 0, Φ F (0) is given by eq. (98). As in the symmetric model of Sec. 3.2 we were able to perform this expansion and its resummation by applying our results on the spherical models of Sec. 2. The diagrammatic study performed in Sec. 5.5, which is a generalization of the diagrammatics for symmetric matrices, plays a decisive role in this analysis.
A remark on Restricted Boltzmann machines In the case of an i.i.d. matrix F (see Sec. 5.6 for a remark on how to apply our results to this class of matrices), we recognize in particular in eq. (103) the result obtained in eq. (36) of [TGM + 18] for Restricted Boltzmann Machines (RBMs). For generic rotationally invariant F , the fixed point equations corresponding to the extremization of Φ in eq. (104) also correspond to the fixed point of Algorithm 3 of [TGM + 18] (which was described there as an "adaTAP inference algorithm"). This generic property of the fixed points of the Plefka-expanded free entropy will be investigated in Sec. 4.

Generalized Linear Models with correlated matrices
Generalized Linear Models (GLMs) [NW72,McC18] arise as a generalization of the Compressed Sensing problem (see below). They are of primary importance in a very wide variety of scientific and engineering fields, such as phase retrieval in optics [Fie82] and classification problems in statistics. GLMs can also be thought of as the building blocks of fully connected neural networks [LBH15]. Let us now define more precisely the model we will study. Consider M, N ≥ 1 both going to infinity with a fixed ratio M/N → α > 0. We are given a (random) measurement matrix F ∈ R M ×N which comes from the ensemble of Model R. Given F , data samples {Y µ } are generated as: in which X ∈ R N is the vector we try to recover from the observation of {Y µ } M µ=1 , and P out is a fixed probabilistic channel. Recall that Greek indices µ, ν will always run between 1 and M and Latin indices i, j between 1 and N . The vector X ∈ R N is assumed to be drawn with i.i.d. coordinates {X i } N i=1 according to a prior P X with zero mean and variance ρ > 0.
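A minimal sketch of this generative model, taking an i.i.d. Gaussian F as the simplest instance of Model R and two illustrative channels P out (all parameter values below are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, alpha = 300, 2.0
M = int(alpha * N)
F = rng.standard_normal((M, N)) / np.sqrt(N)   # i.i.d. instance (a special case of Model R)
X = rng.standard_normal(N)                      # prior P_X = N(0, 1), i.e. rho = 1

z = F @ X                                       # the pre-activation F X
Y_gauss = z + np.sqrt(0.1) * rng.standard_normal(M)   # Gaussian P_out (compressed sensing)
Y_sign = np.sign(z)                                    # sign P_out (perceptron-like channel)
```

The same pipeline applies to any channel P out : only the conditional law of Y µ given (F X) µ changes.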
Compressed Sensing, the Gaussian channel case Compressed Sensing [Don06] arises as a particular case of the channel distribution in eq. (105), in which the channel is taken to be a Gaussian distribution with zero mean and variance ∆. Equivalently, it can be formulated as: Our aim is to infer the vector X from these observations. In this equation we modeled the noise by a standard Gaussian variable z µ , the strength of the noise being ∆ > 0. In [KMS + 12] the authors have considered a subclass of matrices F , namely i.i.d. matrices. We follow here the same probabilistic inference approach, as we aim to study this problem by sampling from the following distribution: As the parameters (P X , ∆) of the signal are known, we can use them in our model, a setting which is known as the Bayes-optimal setting. Using the matrix J = −F T F (which follows Model S) we can rewrite the posterior distribution, up to a normalization, as: Defining β ≡ ∆ −1 it becomes clear that this can be written as a Gibbs-Boltzmann distribution of the model we studied in Sec. 3.2.1, with J = −F T F and h i = − Σ µ F µi Y µ . Assuming that the variance variables at the maximum of the free entropy are homogeneous (v i = v) we can use eq. (60) to directly obtain the free entropy of this problem as a function of R J , the R-transform of the asymptotic spectrum of the J matrix: We postpone the analysis of the corresponding fixed point equations to our algorithmic discussion in Sec. 4.1 and Sec. 4.2.
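For a Gaussian prior P X the posterior of this problem is itself Gaussian, so its mean has the closed ridge-regression form (F T F/∆ + I/ρ)⁻¹ F T Y/∆. This provides a convenient exact ground truth against which the approximation schemes discussed here can be checked numerically. A sketch, with sizes and parameters chosen by us:

```python
import numpy as np

rng = np.random.default_rng(4)
N, alpha, Delta, rho = 300, 2.0, 0.05, 1.0
M = int(alpha * N)
F = rng.standard_normal((M, N)) / np.sqrt(N)
X = np.sqrt(rho) * rng.standard_normal(N)
Y = F @ X + np.sqrt(Delta) * rng.standard_normal(M)

# exact Bayes posterior mean for a Gaussian prior: the ridge solution
m = np.linalg.solve(F.T @ F / Delta + np.eye(N) / rho, F.T @ Y / Delta)
mse = np.mean((m - X)**2)
print(mse)    # well below the prior variance rho
```

For non-Gaussian priors no such closed form exists, which is precisely where the TAP/EC/VAMP machinery becomes useful.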
Generic channel distributions We now turn to generic P out distributions. We assume that both P out and P X are known (this is the Bayes-optimal setting, known in statistical physics as the Nishimori line), so that we can use them in the posterior distribution: from which we will sample to obtain an estimate of X. While in the compressed sensing setting β = ∆ −1 naturally played the role of an inverse temperature, in the general setting of eq. (105) there is a priori no way to perform a Plefka expansion. As it turns out, there is a way to introduce an auxiliary parameter in terms of which we will perform the expansion, similarly to what is done in [AFP16,Alt18]. Introducing the usual Lagrange parameters to fix the mean and variance of {x i }, we obtain the free entropy: in which we introduced an action S[x]: As before {λ i , γ i } are Lagrange parameters used to enforce the conditions ⟨x i ⟩ = m i and ⟨x 2 i ⟩ = v i + m 2 i . Introducing an auxiliary field h ≡ Fx ∈ R M , and using the Fourier representation of the Dirac distribution, we reach: with a new effective action S eff : The key idea is to treat x and ih as two independent non-Gaussian fields that interact via the last (quadratic) term of eq. (114), and to perform a Plefka expansion in terms of this effective Hamiltonian, which is exactly the bipartite Hamiltonian of the general model of Sec. 3.4.1. We will call η the "inverse temperature", that is, in eq. (114) we substitute: and at the end of the expansion we will set η = 1. Similarly to the field x, we will fix the first and second moments of the field ih as ⟨ih µ ⟩ η = f µ and ⟨(ih µ ) 2 ⟩ η = −r µ + f 2 µ , conditions that will be enforced by new Lagrange parameters {ω µ , b µ }. Although a bit tedious, this is straightforward, and we obtain a free entropy in which we will perform a low-η expansion: The effective action and Hamiltonian are expressed as follows: From these equations it is clear that: Points (i) and (ii) allow us to use the general results of Sec. 3.4.1. 
We can therefore directly conjecture the final form of the free entropy we were seeking: Homogeneous variances Once again we can assume that the maximum of the free entropy written in eq. (120) will be attained for variance variables {v i , r µ } that are homogeneous: they satisfy v i = v and r µ = r.
Using the resummation of eq. (104) this leads to a simplified expression for eq. (120): We will study the fixed point equations corresponding to the free entropies of eq. (120) and eq. (121) in Sec. 4.1 and Sec. 4.3.

Consequences for iterative algorithms
In the previous section we showed how to use the Plefka expansion to derive the single-graph free entropy of a large class of systems with pairwise interactions. One then needs to maximize this free entropy, which yields fixed point equations. Iterating these fixed point equations is in itself a challenge, since different choices for the iteration scheme can lead to drastically different convergence properties. In the context of the Plefka expansion, or equivalently in the adaTAP or EC approximation, several iteration schemes for the TAP equations have been studied, see for instance [OCW16,ÇOFW16].
From a parallel point of view, message-passing algorithms have been extensively studied in the statistical physics literature. In particular, the belief-propagation equations [MM09] can be shown in the large N limit to reduce to a simpler algorithm called Approximate Message Passing (AMP) [DMM09], derived initially for i.i.d. coupling matrices. It is well understood that for these matrices the stationary limit of the AMP equations is directly related to the fixed point equations of the Plefka free entropy (stopping here at order 2 in the couplings). Extending these algorithms to correlated matrices has been the subject of extensive studies [CWF14,MP17]. The generic (and appealingly simple) iteration scheme called Vector Approximate Message Passing (VAMP) [RSF17] has proven to be very successful both numerically and in its theoretical justification in the case of rotationally-invariant sensing matrices. It has then been generalized to the broader class of generalized linear models with generic output channels [SRF16].
In Sec. 4.1 we describe the connection between AMP equations and the Plefka expansion in the context of generalized linear models with i.i.d. matrices, retrieving the GAMP algorithm [Ran11] and the analysis of [KMS + 12]. In Sec. 4.2 we describe the VAMP algorithm as an iteration scheme of the Expectation-Consistency equations and show that we recover the stationary limit of the algorithm as the TAP equations. We note that in the naïve Plefka expansion performed in terms of ∆ −1 for the compressed sensing problem there is no clear way to iterate the TAP equations to recover an asymptotically exact algorithm. Finally, in Sec. 4.3 we extend this analysis to Generalized Linear Models with correlated matrices and the G-VAMP algorithm. In contrast with the naïve ∆ −1 -expansion in compressed sensing, the mapping to a bipartite model that we performed in Sec. 3.4.2 allows us to retrieve the more general G-VAMP algorithm as an iterated version of our TAP equations.

Generalized Approximate Message Passing for a GLM with i.i.d. matrices
We consider in this subsection the Generalized Linear Model (GLM) of Sec. 3.4.2. We will concentrate on a particular subclass of sensing matrices, namely matrices F ∈ R M ×N that are generated with i.i.d. centered standard Gaussian matrix elements {F µi }. Note that the remarks of Sec. 5.6 (coherently with the analysis of [Ran11, BKM + 19]) show that the particular distribution of the elements does not need to be Gaussian, as long as the {F µi } are distributed independently and identically. Generalized linear estimation with such i.i.d. matrices F and generic prior and channel distributions has received a lot of attention recently. In particular Generalized Approximate Message Passing (GAMP), an algorithm first developed in [Ran11], is conjectured to be optimal among all polynomial-time algorithms for this problem, see [BKM + 19]. A description of the algorithm in our case can be found in eqs. (171)-(177) of [ZK16]. We state here the iterative GAMP equations with our notations: In these equations P X (λ i , γ i ) is the probability measure with density: and g out is defined from the channel distribution P out as: We now turn to the TAP equations that we can derive from the extremization of the Plefka-expanded free entropy of eq. (120). Note that as F is an i.i.d. matrix, all the terms with p ≥ 2 in eq. (120) will be negligible. More discussion on this can be found in Sec. 5.6. We obtain: Recall that Φ F (η = 0) is given by eq. (98). Extremizing Φ F (η = 0) over the Lagrange parameters yields the moment conditions that we wish to enforce: On the other hand, the maximization of eq. (125) with respect to the physical parameters {m i , v i , f µ , r µ } leads to four additional equations: Combining eq. (126) and eq. (127) we easily see that the equations one has to solve are exactly the ones of the GAMP algorithm of eq. (122) (without time indices).
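The claim that the p ≥ 2 terms of eq. (120) are negligible for i.i.d. F can be checked numerically: the normalized order-4 simple-cycle sum vanishes as N grows, while the order-1 term Σ µi F µi ²/N stays of order α. A sketch, computing the restricted sum by inclusion-exclusion over index coincidences (sizes chosen by us):

```python
import numpy as np

rng = np.random.default_rng(7)
N, alpha = 300, 2.0
M = int(alpha * N)
F = rng.standard_normal((M, N)) / np.sqrt(N)
F2 = F**2

# order-4 simple cycle: sum over mu1 != mu2, i1 != i2 of
# F[mu1,i1] F[mu1,i2] F[mu2,i1] F[mu2,i2]
G = F.T @ F
S_all = np.sum(G**2)                  # unrestricted sum = tr((F^T F)^2)
S_mu = np.sum(F2.sum(axis=1)**2)      # mu1 = mu2 coincidences
S_i = np.sum(F2.sum(axis=0)**2)       # i1 = i2 coincidences
S_both = np.sum(F2**2)                # both coincide
cycle4 = (S_all - S_mu - S_i + S_both) / N   # should vanish for i.i.d. F
order1 = np.sum(F2) / N                      # should stay close to alpha
print(cycle4, order1)
```

For a correlated (rotationally invariant but non-i.i.d.) F these higher cycle sums would in general stay of order one, which is why the full resummed series of eq. (120) is needed there.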
The Gaussian channel case In the case of an additive Gaussian channel with variance ∆ the problem reduces to the compressed sensing problem studied in Sec. 3.4.2 and Sec. 4.2, with a Gaussian i.i.d. sensing matrix. In this case, the function g out of eq. (124) is computable explicitly, and leads to f µ = (ω µ − y µ )/(∆ + b µ ) and r µ = (∆ + b µ ) −1 . In this particular case one recovers the AMP algorithm of [KMS + 12], which is also compatible with the VAMP algorithm of Sec. 4.2.
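For this Gaussian channel the whole GAMP loop can be written in a few lines. The sketch below uses a Gaussian prior N(0, ρ) so that the input denoiser is linear, and the standard sign convention ŝ µ = (y µ − ω µ )/(∆ + V µ ), i.e. the opposite sign of the f µ above; all parameter values and variable names are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
N, alpha, Delta, rho = 250, 2.0, 0.01, 1.0
M = int(alpha * N)
F = rng.standard_normal((M, N)) / np.sqrt(N)
X = np.sqrt(rho) * rng.standard_normal(N)
Y = F @ X + np.sqrt(Delta) * rng.standard_normal(M)

F2 = F**2
m, v = np.zeros(N), rho * np.ones(N)   # initialize at the prior
s = np.zeros(M)
for _ in range(50):
    V = F2 @ v                         # plays the role of b_mu
    omega = F @ m - V * s              # Onsager-corrected estimate of (F m)_mu
    s = (Y - omega) / (Delta + V)      # g_out for the Gaussian channel
    r = 1.0 / (Delta + V)              # -d(g_out)/d(omega)
    Sigma = 1.0 / (F2.T @ r)
    R = m + Sigma * (F.T @ s)
    m = R * rho / (rho + Sigma)        # linear input denoiser for a Gaussian prior
    v = Sigma * rho / (rho + Sigma)
mse = np.mean((m - X)**2)
print(mse)
```

Replacing the last two lines with the mean and variance of the tilted prior measure P X (λ, γ) gives the generic-prior version of the same loop.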

The TAP equations in Compressed Sensing
We analyze here the fixed point equations for the Compressed Sensing (CS) problem, see Sec. 3.4.2 for the corresponding free entropy derivation. Our starting point is the Plefka-expanded free entropy written in eq. (109). The extremization over the Lagrange parameters {λ i , γ i } (which are considered at β = 0) yields: where (a) and (b) respectively define the functions F m and F v . Maximizing the free entropy with respect to the physical parameters {m i , v} results in the following equations (recall that β = ∆ −1 ): Eq. (128) and eq. (129) define a set of fixed point equations that one has to solve in order to find the maximum of the free entropy of eq. (109).
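For a Gaussian prior these fixed point equations close into a single scalar equation for the homogeneous variance, via the standard identity linking the R-transform to the resolvent (R(G(z)) + 1/G(z) = z): 1/v = 1/ρ + R F T F/∆ (−v). A numerical sketch for an i.i.d. matrix, where R F T F/∆ (−v) = α/(∆ + v) (parameters are our own choices), compared with the empirical average posterior variance:

```python
import numpy as np

rng = np.random.default_rng(8)
N, alpha, Delta, rho = 300, 2.0, 0.05, 1.0
M = int(alpha * N)
F = rng.standard_normal((M, N)) / np.sqrt(N)

# scalar TAP self-consistency for a Gaussian prior N(0, rho):
# 1/v = 1/rho + R_{F^T F / Delta}(-v), with R(-v) = alpha/(Delta + v) for i.i.d. F
v = rho
for _ in range(200):
    v = 1.0 / (1.0 / rho + alpha / (Delta + v))

# empirical average posterior variance of the same Gaussian model
C = np.linalg.inv(F.T @ F / Delta + np.eye(N) / rho)
v_emp = np.trace(C) / N
print(v, v_emp)
```

The two values should agree up to O(1/N) corrections, illustrating that the homogeneous-variance TAP equation reproduces the exact average posterior variance in the high-dimensional limit.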

TAP equations and the fixed point of the VAMP algorithm
A remark on i.i.d. matrices We start with a remark on the case of an i.i.d. matrix F . Remarkably, eqs. (128) and (129) are compatible with the fixed points of AMP, see eqs. (22) and (23) in [KMTZ14] with R i = −λ i /γ and Σ −2 = γ, since in this case R F T F/∆ (−v) = α/(∆ + v), see for instance [TV04]. We now turn to the VAMP algorithm for a general rotationally invariant matrix F . Applying the VAMP derivation of Sec. 3.1.3 to the Compressed Sensing problem of Sec. 3.4.2, the VAMP algorithm reads: where F m and F v were defined in eq. (128). Note that in Compressed Sensing the matrix γ t 0 + F T F/∆ has only strictly positive eigenvalues since γ t 0 ≥ 0, so the previous iterative equations are always well defined. At the fixed point, we expect m 1 = m 2 = m and v 1 = v 2 = v. In the stationary limit eq. (132) yields: From eq. (133), one has And from eq. (132), we obtain and which gives One now easily recognizes the fixed points obtained with the Plefka expansion in Sec. 4.2.1, namely eq. (128) and eq. (129), with λ J = λ and γ J = γ.
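As a check of this equivalence, the iteration can be run on a Gaussian-prior instance, where the fixed point can be compared with the exact posterior mean (which the TAP equations should also reproduce). A sketch with our own parameter choices, writing the prior-side denoiser and the LMMSE step involving (γ 0 + F T F/∆)⁻¹ explicitly:

```python
import numpy as np

rng = np.random.default_rng(9)
N, alpha, Delta, rho = 200, 2.0, 0.05, 1.0
M = int(alpha * N)
F = rng.standard_normal((M, N)) / np.sqrt(N)
X = np.sqrt(rho) * rng.standard_normal(N)
Y = F @ X + np.sqrt(Delta) * rng.standard_normal(M)

# exact posterior mean for a Gaussian prior (ridge formula)
m_exact = np.linalg.solve(F.T @ F / Delta + np.eye(N) / rho, F.T @ Y / Delta)

r1, g1 = np.zeros(N), 1.0
for _ in range(30):
    # prior-side denoiser for P_X = N(0, rho)
    eta1 = 1.0 / (g1 + 1.0 / rho)
    m1 = eta1 * g1 * r1
    g2 = 1.0 / eta1 - g1
    r2 = (m1 / eta1 - g1 * r1) / g2
    # likelihood-side LMMSE step
    C = np.linalg.inv(g2 * np.eye(N) + F.T @ F / Delta)
    m2 = C @ (g2 * r2 + F.T @ Y / Delta)
    eta2 = np.trace(C) / N
    g1 = 1.0 / eta2 - g2
    r1 = (m2 / eta2 - g2 * r2) / g1
```

Because both factors are Gaussian here, the fixed point coincides with the exact posterior mean; for a non-Gaussian prior only the first step changes, and the fixed point then matches the TAP equations discussed above.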
A remark on iterating the TAP equations in the i.i.d. case Note that in the i.i.d. case (Sec. 4.1), performing the Plefka expansion in terms of the η parameter after having mapped the GLM to a bipartite problem allows us not only to retrieve the fixed point of the GAMP algorithm (and even of G-VAMP for non-i.i.d. matrices, as we will see in Sec. 4.3): there is also a simple iteration scheme of the TAP equations that exactly yields the GAMP algorithm. We insist that this is not true when making the correspondence of the VAMP algorithm with the Plefka expansion in ∆ −1 for compressed sensing with an i.i.d. matrix. This underlines one of the possible limitations of the EC, adaTAP and Plefka methods for these problems, as iterating the TAP equations with an algorithmic scheme that guarantees convergence is a very involved task, while the VAMP derivation provides an iteration scheme for the equations.

Generalized Vector Approximate Message Passing (G-VAMP) for Generalized Linear Models
We focus in this section on Generalized Linear Models with a correlated matrix F that satisfies rotation invariance (Model R). We first derive the TAP equations from the Plefka expansion we performed in Sec. 3.4.2, before stating the G-VAMP algorithm for this problem following [SRF16]. We then analyze how G-VAMP can be interpreted as an iterative scheme for these TAP equations.

The TAP equations from the Plefka expansion
Recall that the Plefka-expanded free entropy was computed in Sec. 3.4.2. Following the assumptions of the VAMP and G-VAMP algorithms [SRF16,RSF17], we assume that the variances {v_i, r_µ} are homogeneous, that is r_µ = r and v_i = v. We can then use the resummed expression of the Plefka free entropy given in eq. (121). We first extremize this expression with respect to the Lagrange parameters {λ_i, γ_i, ω_µ, b_µ} and obtain an expression equivalent to eq. (128). More precisely, we reach: Recall the definitions of P_X(λ, γ) and g_out(y, ω, b) from eq. (123) and eq. (124). The remaining equations are obtained by maximizing eq. (121) with respect to the physical parameters. We make use of the Jacobi formula for a symmetric positive definite matrix J ∈ S_N^{++}: ∂/∂J_{ij} log det J = (J^{−1})_{ij}. We reach: Remark: additive Gaussian channel. In the case of an additive Gaussian channel with variance Δ we find r = (Δ + b)^{−1}, which gives ζ = Δ and γ = R_{F^T F/Δ}(−v). We thus coherently recover the TAP equations for the compressed sensing problem (see Sec. 4.2.1), even though these equations were derived with a "naive" Plefka expansion in powers of β ≡ Δ^{−1}.

The G-VAMP algorithm for Generalized Linear Models
With reasoning similar to that used to derive the VAMP algorithm for symmetric pairwise models, we can write a VAMP algorithm for a bipartite model. We do not describe its full derivation here, and simply report the G-VAMP algorithm for the GLM as stated in [SRF16]. We define a set of functions: The full algorithm then amounts to iterating the following equations:

TAP equations and fixed points of G-VAMP
We want to see if the stationary limit of G-VAMP, that is the G-VAMP equations without time indices, is related to the TAP equations. At the fixed points of the G-VAMP algorithm written in eq. (142), one expects the following equalities to hold: m_1 = m_2 = m, z_1 = z_2 = z, v_1 = v_2 = v and κ_1 = κ_2 = κ. We start from the TAP equations, eq. (139) and eq. (140), and will try to recover every equation in eq. (142).
All these relations show the equivalence between the G-VAMP algorithm of [SRF16] and the (TAP) maximization equations of the free entropy that we derived with our Plefka expansion in Sec. 3.4.2.

The diagrammatics of the Plefka expansion
The goal of this section is to make precise how the different diagrams arising in our Plefka expansions in Sec. 3 can be computed. Recall that for symmetric random matrices J we construct diagrams as described in Fig. 3. For instance, the diagram depicted in Fig. 3a is equal to: The perturbation order of any diagram is equal to its number of edges, since each edge represents a factor J_{ij}. In this whole section we will only consider connected diagrams (unless stated otherwise). The structure of the section is the following:
• In Sec. 5.1 we prove a first rigorous result on the 'simple cycles' arising in the Plefka expansion of Sec. 2, namely we study these diagrams in expectation over J and show a weaker version of Theorem 1.
• In Sec. 5.2 we extend this study to all possible diagrams, in expectation over J.
• In Sec. 5.3 we show how the results of Sec. 5.1 and Sec. 5.2 can be extended to study the second moments of these diagrams, and use this to show concentration results. This will in particular imply the full statement of Theorem 1.
• In Sec. 5.4 we explain how to handle the higher-order moments that can appear as additional factors in these diagrams for the statistical models studied in Sec. 3.
• In Sec. 5.5 we explain how to generalize all these techniques and results to diagrams made of rectangular matrices, which arise in the Plefka expansion for bipartite models.
• Finally, in Sec. 5.6 we show that if one considers an i.i.d. coupling matrix, all the diagrams of order greater than 3 do not contribute in the thermodynamic limit, and that one can effectively consider the distribution of the matrix elements to be Gaussian.
Some technicalities, as well as side results and generalizations of these diagrammatics to Hermitian matrices and diverging-size diagrams, which are not directly useful for our expansions, are detailed in Appendix D. We finally note that some of our results are similar to those of the independent work [BBJ19], which was recently brought to our attention.

A weaker version of Theorem 1
We will consider the random matrix ensemble defined by Model S. In the following, J ∈ S_N is a random matrix from this ensemble. Recall as well the random matrix tools defined in Appendix C, in particular the free cumulants {c_p(ρ_D)}. We first show a weaker version of Theorem 1: Theorem 2 (Expectation of simple cycles and free cumulants). For J following Model S, for any p ≥ 1, and any set of pairwise distinct indices (i_1, · · · , i_p) ∈ N^p, one has: A stronger result actually holds: we only need to average over O to obtain the result: This last equality is true a.s. with respect to the law of D.
Note that in the Plefka expansions we perform in Sec. 2.1.2 and Sec. 3.2 we consider sums of eq. (149) over all pairwise distinct indices. The expectation of these sums over O is an immediate consequence of Theorem 2: We now turn to the proof of Theorem 2.
Proof of Theorem 2. A first pedestrian way to show eq. (150) for small values of p is to use explicit integration of polynomials over the Haar measure of the orthogonal or unitary group, see for instance [CŚ06]. This can be used to check eq. (150) for the first values of p. Since we aim at a generic proof we will choose a different path, leveraging HCIZ-type integrals [HC57,IZ80] in the particular case in which one matrix has finite rank. In our setting, the computation of these integrals has been made rigorous in [GM05]. Let us denote: In order to simplify the following calculation, we assume that (i_1, · · · , i_p) = (1, · · · , p). Since the sought result does not depend on the particular choice of indices (as is clear by rotational invariance), this does not remove any generality. We first note that the cases p = 1 and p = 2 are trivial to show by an explicit computation, so we will assume p ≥ 3 in the following. One can rewrite eq. (151) as: in which we denoted b ≡ (b_1, · · · , b_p) and M(b) ∈ S_N the following symmetric block matrix of rank p: in which M_1(b) ∈ S_p with: Now we can apply Theorem 2 of [GM05]. Recall that G_{ρ_D} is (up to a factor) the integrated R-transform of ρ_D. We obtain: Note that differentiating Z(b) with respect to b_1 yields (by cyclicity of the trace): with elementary symmetric matrices (E_{ab})_{ll'} ≡ δ_{l,a}δ_{l',b} + δ_{l',a}δ_{l,b}. These matrices are such that for each a < b and c < d, E_{ab}E_{cd} = 0 if {c, d} ∩ {a, b} = ∅. The only way to obtain a matrix of non-zero trace with a product of matrices {E_{ab}} is to have a cycle structure in the indices of the matrices. Recall that the indices are symmetric, that is E_{ba} = E_{ab}. For instance: Tr[E_{12}^2 E_{13} E_{23} E_{12}] = Tr[E_{12} E_{21} E_{13} E_{32} E_{21}] ≠ 0, whereas Tr[E_{12}^2 E_{24} E_{23} E_{12}] = 0.
Because of this and the fact that M(b = 0) = 0, it is easy to see that the only terms that survive after taking all the successive derivatives at b = 0 are the derivatives of the right-hand side of eq. (157), and not other derivatives of Z(b). Let us analyze what differentiating this term yields. As we saw, taking the derivative with respect to b_1 yields a matrix E_{12}. Differentiating with respect to b_2 then yields a matrix E_{23}. Note that a priori, one would have: However, the subsequent differentiations with respect to b_3, · · · , b_p will never yield a matrix E_{ab} with one of the indices equal to 2. So in eq. (158) it is clear that only two terms of the sum, k = 0 and k = n − 2, yield a non-zero contribution. In the end, after taking all p successive derivatives, only two terms remain, which correspond to the two possible orientations of the simple cycle: using that M(0) = 0, this finishes the proof.
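Theorem 2 can also be probed numerically: sample J = O D O^T with a Haar orthogonal O and a deterministic spectrum, and compare the normalized sum of 3-cycles over pairwise-distinct indices with the third free cumulant of ρ_D. The spectrum below (atoms at 0, 1, 3, for which c_3(ρ_D) = m_3 − 3c_1c_2 − c_1^3 = 20/27), the sample size, and the tolerance are our own choices; this is a sanity-check sketch, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 501                                   # divisible by 3
d = np.repeat([0.0, 1.0, 3.0], N // 3)    # rho_D: atoms at 0, 1, 3
c3 = 20.0 / 27.0                          # third free cumulant of rho_D

est = []
for _ in range(5):
    # Haar-distributed up to column signs, which cancel in O D O^T
    O = np.linalg.qr(rng.normal(size=(N, N)))[0]
    J = (O * d) @ O.T
    J2 = J @ J
    # sum of J_ij J_jk J_ki over pairwise-distinct (i, j, k), by inclusion-exclusion
    full = np.trace(J2 @ J)                       # all (i, j, k)
    single = np.sum(np.diag(J) * np.diag(J2))     # one coincidence (3 symmetric ways)
    triple = np.sum(np.diag(J) ** 3)              # i = j = k
    est.append((full - 3.0 * single + 2.0 * triple) / N)

print(np.mean(est), c3)   # the two values should be close
```

The inclusion–exclusion step removes the tuples with coinciding indices from Tr J^3, which is exactly the restriction to pairwise-distinct indices required by the theorem.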

The expectation of generic diagrams
Following the remarks of [GY91] and [PP95], we can separate the diagrams constructed as in Fig. 3 into three disjoint categories or types:
T.1 Non-Eulerian diagrams. By definition, a diagram is Eulerian if one can construct a cyclic path in the graph that goes through each edge exactly once. It is a classic result of graph theory (the Euler–Hierholzer theorem) that these graphs are exactly the connected graphs with even degree at each vertex. For instance, the graph depicted in Fig. 4a is not Eulerian, whereas the one of Fig. 4b is Eulerian.
T.2 Eulerian diagrams that are strongly irreducible but not simple cycles. By strongly irreducible, we mean [GY91] that the diagram cannot be made disconnected by removing any single vertex. For instance, the diagram of Fig. 4b is strongly irreducible, whereas the diagram of Fig. 4c is not.
T.3 Cactus diagrams. These diagrams, like the one of Fig. 4c, are trees made of simple cycles joined at their vertices. Among them are of course the simple cycles.
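For concreteness, the Euler–Hierholzer criterion invoked above is easy to implement: a connected multigraph admits a closed Eulerian circuit if and only if every vertex has even degree. The helper below is our own illustration (not from the paper) and takes a diagram as a list of edges.

```python
from collections import defaultdict

def is_eulerian(edges):
    """Euler-Hierholzer: a connected multigraph has a closed Eulerian
    circuit iff every vertex has even degree."""
    deg = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        deg[u] += 1; deg[v] += 1
        adj[u].add(v); adj[v].add(u)
    if any(d % 2 for d in deg.values()):
        return False
    # connectivity over the vertices that carry edges
    verts = list(deg)
    seen, stack = {verts[0]}, [verts[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == len(verts)

print(is_eulerian([(1, 2), (2, 3), (3, 1)]))          # a triangle is Eulerian
print(is_eulerian([(1, 2), (2, 3), (3, 1), (1, 4)]))  # a pendant edge breaks it
```

Note that the bow-tie of two triangles sharing a vertex (a cactus, type T.3) passes the test, while it is not strongly irreducible; so being Eulerian is necessary but not sufficient for type T.2.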
We are not interested in Eulerian diagrams that are not strongly irreducible. Indeed, as argued in [GY91], only strongly irreducible diagrams appear in the Plefka expansions. This is an important hypothesis of the Plefka expansion, somewhat hidden by the formalism. We give precise descriptions of the large-N limit of the expectation of all these diagrams in the following. When we write "expectation" we will always mean expectation over the orthogonal matrix of Model S. More precisely, we will show: (i) All non-Eulerian graphs of type T.1 have a vanishing expectation in the N → ∞ limit.
(ii) All strongly irreducible diagrams of type T.2 also have a vanishing expectation in the N → ∞ limit.
(iii) We already showed in Sec. 5.1 that the expectation of a simple cycle of size p converges to the p-th free cumulant of ρ_D. We show that the expectation of a cactus diagram converges to the product of the expectations of all its constituent simple cycles. For instance, for the diagram C of Fig. 4c we obtain that its expectation converges to: Results (i) and (ii) are justified in Sec. 5.2.1, and are directly useful for our diagrammatic expansions. Result (iii), on the other hand, is a side result that is not used in our expansions, as we argued that only strongly irreducible diagrams come up in our expansions [GY91]. It is justified in Sec. 5.2.2.

Eulerian diagrams, strongly irreducible diagrams and simple cycles
Let us consider a connected diagram G with V vertices and E edges. We will show that: • If G is not Eulerian, its expectation goes to 0 as N → ∞.
• If G is Eulerian and strongly irreducible, but is not a simple cycle, its expectation also goes to 0 as N → ∞.
Once averaged over the orthogonal matrices, the permutation invariance of the indices allows us to write: in which the ℓ_{ll'} are positive integers such that Σ_{l<l'} ℓ_{ll'} = E. We can now use the results of [GM05], as we did in Sec. 5.1, to write this diagram as (in the N → ∞ limit): In this expression, M(b)_{ll'} ≡ b_{ll'} = M(b)_{l'l} for l < l', and the diagonal is zero: M(b)_{ll} = 0. Exactly as in Sec. 5.1, the elementary matrices {E_{ll'}} appear in eq. (162) by successive derivatives of the exponential, using the fact that ∂M(b)/∂b_{ll'} = E_{ll'} and then using M(b = 0) = 0. As we explained in Sec. 5.1, a trace of a product of the {E_{ll'}} matrices is non-zero if and only if the indices in the product form a cycle. Moreover, as is clear in eq. (162), the terms corresponding to the decomposition of E[G] into the maximum number of such cycles dominate in the large-N limit, as each derivative of the exponential term adds a multiplicative factor N/2. These two facts together imply that: • If G is not Eulerian, as in Fig. 4a, its expectation will be 0 in the limit N → ∞, since by definition it is not possible to decompose it into disjoint cycles. • If G is Eulerian, strongly irreducible, but not a simple cycle, the dominant contribution to E[G] in eq. (162) arises from decomposing the graph G into simple cycles, as this decomposition maximizes the number of cycles, and we already showed that each simple cycle has an O_N(1) contribution. For the graph of Fig. 4b, we show two such possible decompositions in Fig. 5.
Given the remarks above, we now assume that G is Eulerian and strongly irreducible. Let us denote by P the maximal number of simple cycles in such a decomposition of the graph G. Then one can see that the scaling of eq. (162) will be: One can easily be convinced that for a strongly irreducible diagram G we have V + P − E − 1 ≤ 0, with equality only if G is a simple cycle. This implies that all the strongly irreducible diagrams that are not simple cycles and that appear in our Plefka expansions in Sec. 2 and Sec. 3 do not contribute in the N → ∞ limit.

Cactus diagrams
As a side result, although it is not directly useful for our Plefka expansions, we show that we can compute the large-N limit of any "cactus" [PP95] diagram (like the one of Fig. 4c) as a function of the free cumulants of ρ_D. The argument is straightforward and uses the same technique as in Sec. 5.2.1. Consider a cactus diagram G with V vertices and E edges. One can write the same equation as eq. (162): Again, the dominant contribution is obtained by decomposing G into as many simple cycles as possible. For a cactus diagram it is easy to see that there is only one such decomposition, which corresponds to its natural decomposition into its constituent simple cycles, and that the number of such cycles is P = E − V + 1. Let us denote by {r_1, · · · , r_P} the number of vertices in each of these P simple cycles. The dominant contribution corresponds to differentiating P times inside the exponential of eq. (163). Using exactly the argument of Sec. 5.1 for each of the P simple cycles we finally obtain: This justifies point (iii) that we gave in the introductory part of the section: the expectation of a cactus diagram decouples into the product of its constituent simple cycles.
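The factorization can be probed numerically for the simplest cactus, two 3-cycles sharing a vertex: writing C(i) for the sum of 3-cycles through i over pairwise-distinct indices, the quantity (1/N) Σ_i C(i)^2 should approach c_3(ρ_D)^2, up to a subleading contamination from tuples where the two triangles share more than one vertex. The spectrum (atoms at 0, 1, 3, so c_3 = 20/27), the sizes, and the tolerance are our own choices; a sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 501
d = np.repeat([0.0, 1.0, 3.0], N // 3)   # spectrum with c3(rho_D) = 20/27
target = (20.0 / 27.0) ** 2

vals = []
for _ in range(3):
    O = np.linalg.qr(rng.normal(size=(N, N)))[0]   # column signs cancel in O D O^T
    J = (O * d) @ O.T
    J2 = J @ J
    diag1, diag2 = np.diag(J), np.diag(J2)
    # C[i] = sum over j != k, both != i, of J_ij J_jk J_ki (inclusion-exclusion)
    C = np.diag(J2 @ J) - 2.0 * diag1 * diag2 - (J ** 2) @ diag1 + 2.0 * diag1 ** 3
    vals.append(np.mean(C ** 2))                   # (1/N) sum_i C(i)^2

print(np.mean(vals), target)   # the two values should be close
```

The contamination terms (the two triangles sharing a second vertex) involve O(N^3) tuples of size O(N^{-3}) each, hence an O(1/N) correction after the 1/N normalization, consistent with the counting above.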

Concentration of the diagrams: a second moment analysis
Using our first-moment results of Sec. 5.1 and Sec. 5.2, we will show the following results: (i) If C_p is the simple cycle of order p, then lim_{N→∞} C_p = c_p(ρ_D) in L², which directly implies Theorem 1 and thus ends its proof. Moreover, if G is a cactus diagram then it converges in L² to the product of the free cumulants corresponding to its constituent simple cycles.
(ii) If G is of type T.1 or T.2, we have: This implies that the diagram G is negligible in the N → ∞ limit.
Note that following the arguments of [GY91], one can convince oneself that only strongly irreducible diagrams contribute in general to the expansion in our models. Together with point (ii), this shows in more detail why only the simple cycles contribute in our Plefka expansions, as in eq. (25) for the spherical model of Sec. 2.1.
Figure 6: Second moment decomposition of the simple cycle of order 3. We detail the combinatorial factors.
In order to show (i) and (ii) we will establish the following fact. Consider a diagram G with V vertices and E edges, of any of the types T.1, T.2, or T.3. Then one has: In this formula, the sum Σ_α C_α represents all the possible diagrams that one can obtain by 'gluing' together two replicas of the diagram G. Indeed, one can write the generic form of a diagram G as: in which the integers ℓ_{ll'} satisfy Σ_{l<l'} ℓ_{ll'} = E. Thus one has: In this expression, one can see that two types of terms have to be taken into account: • A term for which all indices {i_1, · · · , i_V, j_1, · · · , j_V} are pairwise distinct. Diagrammatically, this corresponds to a graph with two disconnected components that are identical and equal to G. One can therefore repeat the arguments of Sec. 5.1 and Sec. 5.2 straightforwardly. Indeed, as all the indices are distinct, the decomposition of this diagram into the maximum number of simple cycles will be two copies of the maximal decomposition of G. This yields that this term is equal in the N → ∞ limit to (E[G])². • Terms for which there is at least one equality of the type i_l = j_{l'} for 1 ≤ l, l' ≤ V. Such a term corresponds to a diagram with a single connected component, constructed by 'gluing' some of the vertices of two identical copies of G. Since these diagrams have a single connected component, they carry a single 1/N factor, which explains the term (1/N) Σ_α E[C_α] in eq. (166), if we denote by C_α each of these possible terms.
We give a schematic representation of eq. (166) for a simple cycle in Fig. 6. It is now possible to see why it implies our results (i) and (ii). Indeed, all the diagrams C_α have an expectation that is O_N(1) by the first-moment analysis we performed in Sec. 5.1 and Sec. 5.2. So very generically, for every kind of diagram we described we have: Given our previous computations of the first moments, this implies results (i) and (ii).

The higher-order moments and their influence on the diagrammatics in the symmetric model
All the results of Sec. 5.1, Sec. 5.2 and Sec. 5.3 that we derived for the diagrammatics of the Plefka expansion were valid for diagrams made solely out of the matrix elements {J_{ij}}, without any additional factors. However, in the Plefka expansions there generically appear additional factors, which are the cumulants (or the moments) of the variables x_i at β = 0, see Sec. 3.2. Recall that we denote by κ_i^{(p)} the cumulant of order p of x_i at β = 0. As an example, consider the diagram of Fig. 4b. Two possible contributions to the free entropy in our Plefka expansion at order 6 would be: Note that both these contributions are represented by the diagram of Fig. 4b. One can now clearly see that in order to apply the diagrammatic results of Sec. 5.1, Sec. 5.2 and Sec. 5.3 to our Plefka expansion, and justify eq. (59), we need to make some additional assumptions, which we detail here:
A.1 From the construction of the diagrams, odd cumulants of order greater than or equal to 3 only appear in non-Eulerian graphs. By the results of Sec. 5.2 and Sec. 5.3 we know that these diagrams, without the moments or cumulants as factors, are negligible. We assume that the possible correlations of the higher-order moments of x_i with the matrix elements are not strong enough to yield thermodynamically relevant corrections to the free entropy.
A.2 Eulerian strongly irreducible diagrams that are not simple cycles are negligible by our previous result. We assume that the higher-order (even) moments that appear as additional factors do not change their scaling, so that they remain negligible in the thermodynamic limit.
For instance, A.2 implies that the contributions of both eq. (168) and eq. (169) are negligible in the N → ∞ limit, as the diagram of Fig. 4b is strongly irreducible but is not a simple cycle. Concerning the simple cycles, we already know that they are not thermodynamically negligible, so we do not need any additional assumption regarding them. Note however that in order to "resum" the free entropy of the Plefka expansion, as we did in Sec. 3.2, we will need to assume that all the variance factors appearing in these simple cycles are the same, that is v_i = v (at the maximum of the free entropy).

Extension to bipartite models
We detail here how we can treat the diagrams that arise in the Plefka expansion of bipartite models with pairwise interactions (such as generalized linear models), which we perform in Sec. 3.4. The structure of this section is the following:
• We show in Sec. 5.5.1 how to generalize all the techniques and results already seen in the rest of Sec. 5 to diagrams constructed from a random rectangular matrix F drawn from the rotationally invariant ensemble given by Model R.
• In Sec. 5.5.2 we transpose the assumptions of Sec. 5.4 to this bipartite case, to deal with the higher-order moments of the fields that can arise in the high-temperature Plefka expansions.

Generalization of the previous results to rectangular matrices
Consider a random matrix F ∈ R^{M×N} drawn from a rotationally invariant ensemble satisfying Model R. We are interested in the limit M, N → ∞ with a finite ratio M/N → α > 0. In the Plefka expansions performed for bipartite models in Sec. 2.2.2 and Sec. 3.4 there appear some quantities that we can represent as diagrams. In this subsection, we construct diagrams as explained in Fig. 7. For instance, the diagram depicted in this figure represents the quantity: The analog of Theorem 2 of [GM05] for this setting can be stated. Let Σ ∈ R^{M×N} be a matrix such that the empirical spectral distribution of D ≡ Σ^T Σ converges (almost surely) as N → ∞ to a probability measure ρ_D. Denote by G_{α,ρ_D} the following function: Note that this is obviously an even function of x, and that G_{α,ρ_D}(0) = 0. The function G_{α,ρ_D} stands as an analog of the integrated R-transform G_{ρ_D} for this problem. Since one can expand the function G_{ρ_D} using the free cumulants of ρ_D, we analogously expand G_{α,ρ_D}(x) formally around x = 0, and define the coefficients Γ_p(α, ρ_D) by: Recall that for any function f(x) and any symmetric matrix J, f(J) is defined via the spectral decomposition of J. If one can expand f(x) = Σ_{k≥0} c_k x^k, this definition is coherent with f(J) = Σ_{k≥0} c_k J^k. Our generalization of Theorem 2 of [GM05] is the following: consider a rectangular matrix Λ ∈ R^{M×N} of finite rank p. In other terms, one can write its SVD as: with Λ_p ∈ R^{p×p} a square diagonal matrix, and U_0, V_0 orthogonal matrices. We can now state: Note first that the right-hand side of this equation can also be written as: since Λ is of finite rank p. Equality (a) is true since G_{α,ρ_D}(0) = 0, and the spectra of Λ^T Λ and ΛΛ^T only differ by eigenvalues equal to 0. Note also that we already derived and used this relation, for p = 1, when computing the free entropy of the model of Sec. 2.2, as stated in eq. (33). Equipped with the definitions of G_{α,ρ_D}, Γ_p(α, ρ_D), and eq. (173), we can state the counterpart of all our previous results in this rectangular setting:
R.1 Consider a simple cycle of size 2p. Then it converges (in L²) to Γ_p(α, ρ_D) as N → ∞. More precisely we have:
R.2 Any diagram G that is not Eulerian has vanishing first and second moments as N → ∞:
R.3 Any diagram G that is strongly irreducible (that is, it cannot be disconnected by removing a single vertex) but not a simple cycle, as in Fig. 7, also has vanishing first and second moments.
R.4 If G is a cactus (a tree made of simple cycles joined at vertices [PP95]) made of r simple cycles of sizes (2p_1, · · · , 2p_r), we have:
Since every argument needed to show points R.1 to R.4 is a slight modification of what we already did in Sec. 5, the rest of Sec. 5.5.1 is devoted to showing point R.1, and we leave the remaining points to the reader.
Justifying R.1. In order to show eq. (175), we proceed as in Sec. 5.1 and begin by showing: By rotation invariance of the indices we can replace the left-hand side of eq. (178) by a term without summation on the indices, and as in Sec. 5.1 we obtain at leading order in N: with M(b, c) a block matrix of rank p defined as: Using eq. (173), we obtain: We define the elementary matrices (T_{ab})_{ll'} ≡ δ_{al}δ_{bl'} and the symmetric elementary matrices E_{ab} = T_{ab} + T_{ba}. One easily derives that ∂²/∂b_1∂c_1 [M(b, c)^T M(b, c)] = E_{12}. In a very similar way to what was done in Sec. 5.1, the dominant terms in eq. (179) are given by the maximal number of differentiations of the exponential term. However, one can see that the exponential can only be differentiated once: since M(0, 0) = 0, one would need to create cycles with the matrices E_{ab}, and such a cycle can only appear if one differentiates the exponential term a single time. As in Sec. 5.1, there are two cycles created by the successive derivatives: E_{12}E_{23} · · · E_{p1} and E_{21}E_{1p} · · · E_{32}. These two cycles yield the dominant contribution: Tr[(E_{12}E_{23} · · · E_{p1} + E_{21}E_{1p} · · · E_{32})(M(0, 0)^T M(0, 0))^{n−p}] + O_N(1). This shows eq. (178). The exact same arguments as the ones used in Sec. 5.3 show that we have L² concentration, which means: and this is point R.1 that we wanted to show.

The higher-order moments and their influence on the diagrammatics
In Sec. 3.4, we deal with diagrams which have additional factors coming from the higher-order moments of the fields x_i and h_µ at β = 0, while all the results R.1 to R.4 that we derived for the diagrammatics of the Plefka expansion were for diagrams made solely out of the matrix elements {F_{µi}}, without any additional factors. We adopt the notation of Sec. 5.4 for the higher-order cumulants. Exactly as in Sec. 5.4, when considering the diagram of Fig. 7, two possible contributions to the free entropy at order 8 would be: The assumptions we need in order to deal with these diagrams are very similar to A.1 and A.2, and we state them here for completeness:
B.1 From the construction of the diagrams, odd moments of order greater than or equal to 3, like κ^{(3,x)}, only appear in non-Eulerian graphs. By R.2 we know that these diagrams (without the moments as factors) are negligible. We assume that the possible correlations of the higher-order moments with the matrix elements {F_{µi}} are not strong enough to yield thermodynamically relevant corrections to the free entropy.
B.2 Eulerian strongly irreducible diagrams that are not simple cycles are negligible by R.3. We assume that the higher-order (even) moments that appear as additional factors do not change their scaling, so that they remain negligible in the thermodynamic limit.

A note on i.i.d. matrices
We make here a side comment on i.i.d. rectangular matrices. We consider a random matrix F ∈ R^{M×N} whose elements {F_{µi}} are i.i.d., such that √N F_{µi} is drawn from a given probability measure ρ. We assume that ρ has zero mean and finite moments of all orders. These matrices appear in our study of the GAMP algorithm in Sec. 4.1. Unless ρ is a Gaussian probability measure, the matrix F is not rotationally invariant, in the sense that it does not satisfy Model R. However, one can still derive strong results on the diagrammatics of F. We still assume B.1 and B.2, that is, we assume that the additional factors in the diagrams do not change the scaling of a negligible diagram enough to make it thermodynamically relevant. It is then easy to see that, because the {F_{µi}} are uncorrelated, all diagrams of order p ≥ 3 are negligible in the N → ∞ limit. The only diagram that remains in the N → ∞ limit is: In particular, we can retain only this diagram and apply the results of our Plefka expansions in this case as well, despite the fact that F is not rotationally invariant.
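A quick numerical illustration of this point, with our own choice of a Rademacher measure for ρ: the order-2 diagram Σ_µ F_{µi}^2 concentrates on α, while the simple cycle of size 4 over pairwise-distinct indices (an order-4 diagram) vanishes as N → ∞, since with i.i.d. zero-mean entries no factor can pair with another.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 400, 800                                          # alpha = M/N = 2
F = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(N)    # sqrt(N) F_mui ~ Rademacher

# surviving order-2 diagram: sum_mu F_mui^2 -> alpha (here exactly M/N)
order2 = np.mean(np.sum(F ** 2, axis=0))

# size-4 simple cycle over pairwise-distinct indices:
# (1/N) sum_{mu1 != mu2, i1 != i2} F_{mu1 i1} F_{mu1 i2} F_{mu2 i2} F_{mu2 i1}
G = F @ F.T                  # overlaps (F F^T)_{mu nu}
S4 = (F ** 2) @ (F ** 2).T   # sum_i F_{mu i}^2 F_{nu i}^2
cycle4 = (np.sum(G ** 2) - np.sum(np.diag(G) ** 2)
          - np.sum(S4) + np.sum(np.diag(S4))) / N

print(order2, cycle4)        # ~ 2.0 and ~ 0.0
```

For a Gaussian ρ the matrix is rotationally invariant as well, and the same cancellation shows that the corresponding coefficients Γ_p vanish for p ≥ 2, which is why the Gaussian case is the analog of the semicircle law in this rectangular setting.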

A The Georges-Yedidia formalism
In this section we recall the formalism of [GY91], which allows one to systematically expand the free entropy around β = 0, at fixed values of the first and second moments of the variables. We consider here a generic Hamiltonian H_J(x), with variables (x_1, · · · , x_N). We fix the first and second moments ⟨x_i⟩_β = m_i and ⟨(x_i − m_i)²⟩_β = v_i, using Lagrange parameters denoted respectively λ_i(β) and γ_i(β). Recall that ⟨·⟩_β stands for the expectation over the Gibbs measure at inverse temperature β, constrained by the Lagrange multipliers λ_i and γ_i. From now on, we will drop the β subscript.
We introduce the operator U from Appendix A of [GY91]: Then the derivative of the thermal average of any observable O is given by: As the Lagrange multipliers λ_i and γ_i have been introduced to fix the average of x_i and its variance, one has the following easy identity, valid at any β: Moreover, given that the magnetizations {m_i} and the variances {v_i} do not depend on β, one has: Considering the previous results, one can compute the derivative of U: Equipped with these relations, one can compute the derivatives of the free entropy up to fourth order. Recall that Φ_J is the intensive free entropy of the system. We obtain its derivatives: These relations are valid at any inverse temperature β! In the main sections we derive the explicit expression of the operator U for our particular choice of Hamiltonian, and we use these relations (and show how to conjecture their higher-order counterparts) to compute the expansion of the free entropy around β = 0.
B Order 4 of the Plefka expansion for Sec. 2.1.
We start from eq. (192) in Appendix A, which we consider at β = 0. For simplicity we will denote x̃_i ≡ (x_i − m_i), so that at β = 0 the {x̃_i} are Gaussian variables with mean ⟨x̃_i⟩ = 0 and covariance ⟨x̃_i x̃_j⟩ = δ_{ij} v_i. In particular, eq. (21) becomes: From the calculation at order 2 we obtain the following relation, which we can represent diagrammatically: The possible contractions arising from Wick's theorem yield several contributions, which we can represent by diagrams. Note that these diagrams are very different from the diagrams that we described for instance in Fig. 1, and are merely a way to visualize the contractions in Wick's theorem. The first column contains the i_α indices and the second contains the j_α. Note that we always have i_α ≠ j_α. The two different types of contractions are represented in Fig. 8a and Fig. 8b. There are 12 possible contractions of the type of Fig. 8a and 48 of the type of Fig. 8b. We also take into account that in the pairings of Fig. 8b the indices are not all necessarily pairwise distinct. Discarding terms that are o_N(N), we finally reach: Finally, combining eq. (195), eq. (197), and eq. (198) and plugging them into eq. (193), we reach: which is what we wanted to show!

C Some definitions and reminders of random matrix theory
For a complete mathematical introduction to random matrix theory, the reader can refer to [Meh04,AGZ10], while a more practical approach is carried out in [TV04]. Let us consider a compactly supported probability measure µ on R. We denote λ_max ≡ max supp(µ) and λ_min ≡ min supp(µ). One can introduce the Stieltjes transform of µ as: On (λ_max, +∞), S_µ induces a strictly increasing C^∞ diffeomorphism S_µ : (λ_max, ∞) → (−∞, 0), and we denote by S_µ^{−1} its inverse. One can then introduce the R-transform of µ as: R_µ(z) is a priori defined for −z ∈ S_µ[(λ_min, λ_max)^c] and admits an analytic expansion around z = 0. We can write this expansion as: The elements of the sequence {c_k(µ)}_{k∈N} are called the free cumulants of µ. In particular, one can show that c_1(µ) = E_µ(X) and c_2(µ) = E_µ(X²) − (E_µ X)². The free cumulants can be recursively computed from the moments of the measure using the so-called free cumulant equation: Finally, for practical purposes, for all x ∈ (−S_µ(λ_min), −S_µ(λ_max)) we can define:
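The free cumulant equation can be implemented exactly with rational arithmetic. The code below is our own sketch of the standard moment–cumulant recursion m_n = Σ_{s=1}^n c_s Σ_{i_1+···+i_s=n−s} m_{i_1}···m_{i_s} (with m_0 = 1); it recovers, for instance, the semicircle law's cumulants (c_2 = 1, all others zero) from its Catalan moments.

```python
from fractions import Fraction

def free_cumulants(moments):
    """Free cumulants c_1..c_n from moments m_1..m_n via the recursion
    m_n = sum_{s=1}^n c_s * sum_{i_1+...+i_s = n-s, i_j >= 0} m_{i_1}...m_{i_s},
    with the convention m_0 = 1."""
    n = len(moments)
    m = [Fraction(1)] + [Fraction(x) for x in moments]

    def comp_sum(s, r):
        # sum over s nonnegative integers summing to r of the product of moments
        if s == 0:
            return Fraction(int(r == 0))
        return sum((m[i] * comp_sum(s - 1, r - i) for i in range(r + 1)), Fraction(0))

    c = [Fraction(0)] * (n + 1)
    for k in range(1, n + 1):
        # the s = k term is c_k * m_0^k = c_k, so we can solve for c_k
        c[k] = m[k] - sum((c[s] * comp_sum(s, k - s) for s in range(1, k)), Fraction(0))
    return c[1:]

# semicircle law: even moments are the Catalan numbers, odd moments vanish
print(free_cumulants([0, 1, 0, 2, 0, 5]))
# -> c_2 = 1 and all other free cumulants vanish
```

As another check, a Marchenko–Pastur-type measure with R-transform α/(1 − z) has all free cumulants equal to α; feeding in its first moments (α, α + α², α + 3α² + α³) returns a constant sequence.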

D Technical derivations and generalizations of the diagrammatics
We detail here some extensions of the results of Sec. 5. In Sec. D.1, we explain how to transpose these results to Hermitian matrix models, and in Sec. D.2 we show how to extend some of them to diagrams of diverging size (as N → ∞).

D.1 Hermitian matrix model
One can generalize the results of Sec. 5 to the following Hermitian matrix model, similar to Model S: Model 3. Let N ≥ 1 and let U(N) be the unitary group. Let J ∈ C^{N×N} be a random matrix generated as J = U D U^†, with U ∈ U(N) drawn uniformly (from the Haar measure) and independently of D. D is a real diagonal matrix such that its empirical spectral distribution ρ^{(N)} ≡ (1/N) Σ_i δ_{d_i} converges almost surely as N → ∞ to a probability distribution ρ_D with compact support. The smallest and largest eigenvalues of D are assumed to converge almost surely to the infimum and supremum of the support of ρ_D.
Note that the diagrams are now directed, as $J_{ij} = \overline{J_{ji}}$. We describe such diagrams in Fig. 9. For instance, the diagram of Fig. 9a is equal to:

Figure 9: Diagrams similar to the ones of Fig. 4, but for Hermitian matrices. Note that the diagrams of Fig. 9a and Fig. 9b are different because of the different directions of the edges, but that both are Eulerian.
while the diagram of Fig. 9b represents the quantity: In the complex case, an Eulerian graph is similarly defined as a graph in which one can construct a cyclic path (following the directions of the edges) that visits each edge exactly once. Note that a simple cycle is defined such that the arrows on its edges themselves form a cycle, like the constituent cycles of Fig. 9c. We describe the main results we get, using the same kind of techniques as used in Sec. 5.1:
(i) Only Eulerian diagrams contribute in the $N \to \infty$ limit.
(ii) Consider a simple cycle $C_p$ with $p$ vertices. Then this diagram converges in the $N \to \infty$ limit to the free cumulant $c_p(\rho_D)$ in $L^2$ norm, as in the real case. More precisely, for pairwise distinct indices $(i_1, \cdots, i_p)$:
$$\lim_{N \to \infty} N^{p-1}\, \mathbb{E}\left[J_{i_1 i_2} J_{i_2 i_3} \cdots J_{i_p i_1}\right] = c_p(\rho_D).$$
(iii) Any Eulerian strongly irreducible diagram that is not a simple cycle is negligible in the $N \to \infty$ limit (in $L^2$ norm).
(iv) Any Eulerian cactus diagram (like in Fig. 9c) converges in $L^2$ to the product of the free cumulants of $\rho_D$ corresponding to each one of its constituent simple cycles.
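The directed Eulerian condition used in (i) can be checked mechanically: a directed multigraph admits a closed walk using every directed edge exactly once if and only if every vertex has equal in- and out-degree and all edges lie in a single connected component. A small sketch (the helper name `has_eulerian_circuit` is ours):

```python
from collections import defaultdict

def has_eulerian_circuit(edges):
    """Directed multigraph given as a list of (u, v) edges.
    An Eulerian circuit (closed walk using each directed edge exactly once,
    following the arrows) exists iff every vertex has equal in- and
    out-degree and all edges lie in one connected component."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
        adj[u].add(v)
        adj[v].add(u)  # undirected adjacency, used only for connectivity
    verts = set(out_deg) | set(in_deg)
    if any(out_deg[v] != in_deg[v] for v in verts):
        return False
    # connectivity of the underlying undirected graph (DFS)
    start = next(iter(verts))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == verts

print(has_eulerian_circuit([(1, 2), (2, 3), (3, 1)]))  # directed 3-cycle -> True
print(has_eulerian_circuit([(1, 2), (3, 2), (1, 3)]))  # unbalanced degrees -> False
```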
These results are straightforward generalizations of the ones obtained for real matrices in Sec. 5. For completeness, we describe how to show a weaker version of (ii), and leave the other statements as easy generalizations of Sec. 5. Let us now show that the limit of the expectation of the term in (ii) is the free cumulant, as in Sec. 5.1. As before, by unitary invariance we can assume that $(i_1, \cdots, i_p) = (1, 2, \cdots, p)$, and we can apply the results of [GM05] to obtain an equation similar to eq. (155): with now the matrix $M(b, c)$ defined as: Now, we have $\left(\frac{\partial}{\partial b_i} + \mathrm{i}\, \frac{\partial}{\partial c_i}\right) M(b, c) = F_{i+1,i}$, in which $(F_{a,b})_{l l'} \equiv \delta_{a l} \delta_{b l'}$ are elementary non-symmetric matrices. In the exact same way as in Sec. 5.1, the dominant contribution in eq. (209) is given by differentiating the exponential term a single time, creating a cycle with the matrices $F_{i+1,i}$. Note that contrary to the symmetric case of Sec. 5.1, here only the directed cycle contributes, whereas both possible directions of the cycle contributed in eq. (159). Indeed, the cycles in terms of the matrices $\{F_{a,b}\}$ have to be directed, with consecutive indices matching, in order to yield a non-zero contribution: for instance, $\mathrm{Tr}[F_{1,3} F_{3,2} F_{2,1}] = 1$, while $\mathrm{Tr}[F_{2,1} F_{3,2} F_{1,3}] = 0$.
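The trace identities above follow from the product rule $F_{a,b} F_{c,d} = \delta_{bc} F_{a,d}$, and can be verified directly in a few lines (the helper `F` is our notation for the elementary matrices):

```python
import numpy as np

def F(a, b, n=3):
    """Elementary matrix (F_{a,b})_{l l'} = delta_{a l} delta_{b l'}, 1-indexed."""
    m = np.zeros((n, n))
    m[a - 1, b - 1] = 1.0
    return m

# A cycle of elementary matrices with consecutive indices matching
# (F_{a,b} F_{b,c} = F_{a,c}) closes up and has trace 1 ...
print(np.trace(F(1, 3) @ F(3, 2) @ F(2, 1)))  # -> 1.0

# ... whereas traversing the same indices in the wrong order kills the
# product, since F_{a,b} F_{c,d} = 0 whenever b != c.
print(np.trace(F(2, 1) @ F(3, 2) @ F(1, 3)))  # -> 0.0
```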
In order to get the $L^2$ concentration of the simple cycle on the free cumulant, one can exactly repeat the arguments of Sec. 5.3.
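The real-case analogue of statement (ii) (for Model S of Sec. 5) can be probed numerically: summing simple 3-cycles of $J$ over pairwise distinct indices, normalized by $N$, should approach $c_3(\rho_D)$. A quick sanity check on a Wishart matrix (Marchenko-Pastur law with ratio 1, all free cumulants equal to 1); the matrix size and the inclusion-exclusion bookkeeping are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400

# Wishart matrix J = X X^T / N: rotationally invariant, with
# Marchenko-Pastur spectrum of ratio 1, whose free cumulants all equal 1.
X = rng.standard_normal((N, N))
J = X @ X.T / N

# (1/N) * sum_{i != j != k != i} J_ij J_jk J_ki, computed by
# inclusion-exclusion from traces and diagonals.
d = np.diagonal(J)
J2d = np.diagonal(J @ J)
cycle3 = (np.trace(J @ J @ J) - 3 * np.sum(d * J2d) + 2 * np.sum(d ** 3)) / N

print(cycle3)  # close to c_3 = 1 for large N
```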

D.2 A note on the expectation of diagrams of diverging size
Although it is not directly useful in our Plefka expansions, a natural side question about these diagrams is: how do diagrams whose number of edges diverges with $N$ behave in the $N \to \infty$ limit? In all of Sec. 5 we only considered diagrams of finite size. The behavior of the HCIZ-type integrals with a matrix of diverging rank (as opposed to the finite-rank case) has been rigorously treated in [GM05] and then generalized in [CŚ07], as soon as the rank of the matrix diverges sublinearly in $N$. We recall the main result of [CŚ07]: Theorem 3 (Collins-Śniady). Let $A_N, B_N$ be diagonal real matrices of size $N$. Assume that the rank $M(N)$ of $A_N$ is such that $M(N) = o(N)$, and denote $a_{1,N} \geq \cdots \geq a_{M(N),N}$ the eigenvalues of $A_N$. Assume that the spectral measure of $B_N$ converges a.s. and in the weak sense to a probability measure $\rho_B$, and that all elements of $A_N$ are bounded by a constant independent of $N$. Then one has: The techniques of Sec. 5 thus generalize to this case. We consider real symmetric matrices under Model S (in the Hermitian case, the results also generalize following the line of Appendix D.1). We say that a sequence $\{p(N)\}$ satisfies the bounded free cumulant property if it satisfies the following: Property 1. There exists $C > 0$ such that for all $N \in \mathbb{N}$, $|c_{p(N)}(\rho_D)| < C$.
We state two of the results of Sec. 5 that can be easily generalized to the diverging-size case without changing any of the arguments: (c) Consider a cactus diagram $G$ composed of $P(N)$ simple cycles of sizes $(r_1(N), \cdots, r_{P(N)}(N))$, joining at vertices. Assume that $P(N)$ grows sufficiently slowly with $N$ and that all the sequences $r_i(N)$ satisfy the bounded free cumulant property. Then the expectation of the diagram is asymptotically given, as in the finite-size case, by the product $\prod_{i=1}^{P(N)} c_{r_i(N)}(\rho_D)$ of the free cumulants associated with its constituent simple cycles. Other results obtained in Sec. 5 for finite-size diagrams might also be applicable to the diverging-size case, and we leave them for future work.