Statistical mechanics of biomolecular condensates via cavity methods

Summary Physical mechanisms of phase separation in living systems play key physiological roles and have recently been the focus of intensive studies. The strongly heterogeneous nature of such phenomena poses difficult modeling challenges that require going beyond mean-field approaches based on postulating a free energy landscape. The pathway we take here is to calculate the partition function starting from microscopic interactions by means of cavity methods, based on a tree approximation for the interaction graph. We illustrate them on the binary case and then apply them successfully to ternary systems, in which simpler one-factor approximations are proved inadequate. We demonstrate the agreement with lattice simulations and contrast our theory with coacervation experiments of associative de-mixing of nucleotides and poly-lysine. Different types of evidence are provided to support cavity methods as ideal tools for modeling biomolecular condensation, giving an optimal balance between the consideration of spatial aspects and fast computational results.

We apply cavity methods to calculate the free energy of biomolecular condensates We reproduce simulations of ternary systems where one-factor approximations fail Cavity methods are fast and apt to inverse model coacervation experiments We reconstruct the phase diagram of polylysinenucleotides associative demixing

INTRODUCTION
The spatial organization of the components of biological cells is a very important aspect of their physiology 1 and its nature is eminently physical. For instance, with regard to metabolism, different processes require in principle different environmental conditions and segregation mechanisms to ensure an efficient orchestration of cellular functionalities through compartmentalization. Classical, well-understood examples include oxidative phosphorylation and photosynthesis (performed in specialized organelles, mitochondria, and chloroplasts, respectively 2 ). It has been recently proposed that, apart from compartmentalization through lipid membranes, living systems could deal with the problem of creating and controlling microenvironments by means of the physical mechanism of phase separation, where liquid mixtures spatially segregate. 3 Examples range from ATP concentration in stress granules 4 to control of gene expression by chromatin condensation 5,6 and formation of protein complexes, 7 while a better-established mechanism is the storage of carbohydrates into starch and/or glycogen, 8 avoiding potential osmotic imbalance. On the flip side, it is well known that the inadequate formation of biomolecular condensates is the physical correlate of many prion-based pathologies, such as mad cow or Alzheimer's disease, for instance. Further evidence is accumulating to support that phase separation could play a key role in gene regulation in diseases, including cancer. [9][10][11][12] Additionally, in the field of origins of life the interest in coacervation has been ''rediscovered'' in recent years 13-16 as a simple and highly plausible compartmentalization mechanism under prebiotic conditions (as it was actually suggested in the early days of the field 17 ). It is becoming increasingly apparent, particularly among researchers in the protocell camp, that both physics and chemistry must come together to open the way toward biological complexity, in a scenario in which it is necessary to combine not only a significant diversity of molecular components but also of interactions and transformation processes. 18 One main difference with respect to classical physical and chemical studies on phase separation is the extremely heterogeneous and complex nature of biological components, 19 with thousands of different species of microscopic units (that can be complex themselves, such as polymers) even in a relatively simple bacterium such as E. coli. 20 Besides, the specific focus of investigations in life sciences is centered on problems of control, design, and inverse modeling. These aspects spurred the wide use of mean-field approximations for theoretical and computational studies, in particular regarding the extension of the regular solution model. [21][22][23][24][25] In the case of polymer solutions, the classical Flory-Huggins (FH) model can be used to describe the segregative de-mixing with the formation of multiple phases, each enriched in one respective polymer. 26,27 This model was later extended by Voorn and Overbeek (VO) for solutions of oppositely charged poly-ions (i.e., charged polymers) which usually display associative de-mixing, with the formation of one phase enriched in multiple poly-ions. 28 Both models are mean-field approximations that build on the interplay between the entropy that drives the mixing of the system and the enthalpy, which results from the interaction energies (derived from van der Waals or ionic forces) between the molecules. As such, they do not explicitly deal with the partition function of the system.
In recent years these mean-field models have been extended successfully to the case of complex coacervation, in particular taking into account the effect of the sequence and focusing on the problem of phase separation of intrinsically disordered proteins, by means of random phase approximation theory. 29,30 In this article we will focus on the multi-canonical model defined in 21-25 and its proposed mean-field approximation, namely the regular solution model. We chose to analyze this model for the sake of simplicity, since it is a generalization of the Potts model, which is a minimal model representing an instance of universality class in phase transitions. 31 One common shortcoming of the aforementioned mean-field approaches is that they tend to neglect spatial correlations by recurring to one-factor approximations, akin to the well-known Curie-Weiss (CW) approximation in magnetic systems. 31 This is known to lead to difficulties in presence of idiosyncratic, repulsive interactions and frustration, yielding multi-equilibrium. To overcome these difficulties more refined mean-field approximations were developed within the framework of magnetic systems, among which we can find the Bethe-Peierls (BP) approximation, 32,33 recently reformulated as cavity methods, 34 or message passing and belief propagation algorithms. 35 The latter are considered important standard methods for the statistical physics of spin glasses and disordered systems, with applications that include inference, information theory and resolution of combinatorial optimization problems. 36 In this work we will apply the BP mean-field technique to describe the self-assembly of biomolecular condensates, focusing more specifically on the problem of reproducing numerical simulations of a general grand-canonical heterogeneous lattice model. The article is organized as follows. First, we will introduce the BP approach and compare it with the regular solution model on a simple standard binary system, providing an analytical formula for the spinodal line that improves quantitatively the match with numerical simulations (in comparison with the classical formula coming from the regular solution model). Then a simple ternary system will be considered, to explore a case where the regular solution model is clearly inadequate (i.e., unable to show even a qualitative agreement with numerical simulations) for the case of mixed repulsive and attractive interactions, known to lead to associative de-mixing. We will demonstrate that the BP approach reproduces much better numerical simulations and provides an immediate method to draw phase diagrams with a semi-quantitative controlled match. This supports the use of BP to classify much more accurately de-mixing phenomena in ternary systems in the interaction coupling space, where we successfully recapitulate the main three modes of phase separation, i.e. i) associative, ii) segregative, and iii) counter-ionic de-mixing. Thus the way is paved for inverse modeling and inference of couplings from experimental results, so our last section will be devoted to the reconstruction of de-mixing phase diagrams from high-throughput data. A relatively simple but well-controlled in vitro system with poly-lysine and nucleotides in buffer solution was chosen because it allowed us to explore parameter space more sistematically, it is relevant in the context of prebiotic chemistry and it fulfills our quantitative modeling purposes. Our findings and the potential implications of our work will be finally summarized in a conclusion section.
Potts model Jðs i ;s j Þ = d si ;sj ) and the number of different kinds of particles is controlled by their chemical potentials mðs i Þ. The Hamiltonian of the system is, therefore: where the first sum runs over all neighboring lattice sites Ci;jD. In contrast with the regular solution model, we will not postulate a form for the free energy, but rather derive it from microscopic interactions by computing the partition function. This is the fundamental quantity that bridges between the molecular microscopic interactions and the collective macroscopic behavior of the system, alongside with its thermodynamic properties. 3 Its computation makes it possible to map between the energy as a function of the diverse microscopic configurations and the free energy as a function of macroscopic variables (e.g. concentrations and/or chemical potentials, temperature). Here its expression is (b = 1=T is the inverse temperature) We remark that the computation of the partition function is in general a very difficult computational task since an evaluation of the sum on the RHS could require to count objects with involved combinatorics. 36 For instance, its calculation for the simple Ising model without an external magnetic field corresponds to the enumeration of all closed loop in the underlying graph. 38 We will compute it by approximating the lattice in terms of a tree-graph branching out from any given site. In this way, the lattice is decomposed into sub-systems that are connected only by the sprouting site. Once the state value of the latter is fixed, the partition function can be factorized in terms of the partition functions of the sub-systems and the procedure can be iterated recursively (see Figure 1).
We end up with the equations where Q stands for the product, v ji are all the sites connected to j except from i, and Z i/j ðs i Þ is the partition function of the sub-system starting from site-j, given that site-i is fixed to the value s i . In general, we have: Z i/j ðs i ÞsZ j/i ðs j Þ

Binary system
For a simple binary phase separation we have q = 2 and s i = 0; 1. Fixing Jðs i ; 0Þ = mð0Þ = 0 the cavity equations will be Considering that the tree-graph is a Caley-Graph with a branching of C = K + 1, and assuming homogeneity u i/j = u ðci; jÞ we get for Equation 5: In addition one can assume that the average site occupation or density (equivalent to the occupation probability of the lattice site by the solute) CsD = 4 will verify the following equation (see STAR Methods S.1 for further details): These equations express implicitly the state equation 4ðmÞ, from which the phase separation curve ðbJÞð4Þ can be obtained, in implicit form, by standard thermodynamic stability analysis upon introducing parameter w = e bu : where the values w 1;2 correspond to the two branches of the spinodal line. The above parametric formula can be compared now with the one obtained from the Curie-Weiss model (see STAR Methods S.3 for a derivation) bJ = 1 ðK + 1Þð1 À 4Þ4 (Equation 9) and contrasted with microscopic numerical simulations of the model on a regular square lattice (for which K = 3) as illustrated in Figure 2, where we show the binodal lines as well (calculated via Maxwell construction, see STAR Methods S.4). As it can be easily observed, even though the model on a nearestneighbor lattice is extremely far from being tree-like, numerical simulations are in better quantitative agreement with the predictions of the cavity method, compared with the regular solution equations. An example of the biological application of the phase separation in a binary model to receptor clustering can be found in 39 .

(Equation 12)
provide the state equations of the system. The phase diagram can be drawn by checking if the matrix is positive definite (by the Routh-Hurwitz criterion det H > 0 and trH > 0). Results from numerical simulations and mean-field calculations are summarized in Figure 3, where depending on the interaction signs phase separation can be classified into three different kinds: i) associative, ii) segregative, and iii) counter-ionic de-mixing. Those three general types of phase behavior are well accounted for if cavity methods are applied, but more naive or direct mean-field models clearly fail to do so. A strong advantage of mean-field approximations is their low computational cost as compared to an explicit lattice model simulation, considering the fact that the system behavior can be assessed by solving a handful of nonlinear equations. In comparative terms, the time to reconstruct the phase diagrams for the ternary system, shown in Figure 3, differs by 6-7 orders of magnitude when we switch from the lattice model simulations to the resolution of the BP equations (hours vs ms in our implementation). This reduction of computational time enables a full inverse modeling approach to experimental data. The black data points are obtained from the peak of the specific heat, whereas the green ones by means of the image processing method described in the text. The green shaded region depicts the transition from a metastable to a phaseseparated state as detected by the image processing algorithm (see Figure S3 in the STAR Methods for more details).

Modeling experiments
We will consider here the experimental phase diagram of a system of poly-lysine and adenosine-diphosphate (ADP) in buffer solution, as obtained by microscopy imaging. This system, given the residual electrostatic charge of its components, is expected to show typical associative de-mixing behavior, which is not accounted for correctly by the regular mean-field solution model. We considered thus the task of inferring the parameters of the aforementioned ternary solution model that reproduce the experimental phase diagram.
This has been formally modeled as a binary classification problem and we implemented an algorithm for parameter inference based on heuristic optimization via differential evolution algorithms. 40 Results are reported in Figure 4, where we show the experimental phase diagram together with the simulations of the Comparison of the phase diagrams in the plane of solutes volume fractions ð4 + ; 4 À Þ for the ternary system obtained from lattice model simulations (top, obtained from image processing), BP mean-field (center, spinodal), and regular solution mean-field model (bottom, spinodal), for the associative (left J À À = J + + = À 1;J + À = 3), segregative (middle J À À = J + + = 1; J + À = À 3) and counter-ionic (right J + + = 2; J À À = 0; J + À = 0:5) de-mixing cases (see text), respectively BP classifies simulation results with 88% accuracy. iScience Article inferred models that are compatible with the associative de-mixing case. Although this provides a quantitative description, a small perturbation in the initial parameter values can lead to an equally well-inferred model, with different parameters. This hints at the presence of many local maxima for the likelihood of model parameters and calls for more refined experiments and/or an inference scheme going beyond simple binary classification. We then performed an inference calculation by constraining the model parameters to be in the region of segregative de-mixing (J À À > 0; J + + > 0; J + À < 0, not shown), obtaining a consistently higher error rate (that is, the fraction of mismatch in binary phase classification, 15%, versus the 3% of the associative case). This shows that our simple setting is able to tell apart the different phase separation mechanisms.
Predicted re-entrance in the experimental phase diagram As an additional piece of evidence, let us report here the confirmation of a successful prediction made by our inferred model. Our inferred binding interaction energies predict a form of the phase diagram that falls in the case of associative de-mixing, implying a re-entrance to the symmetric, well-mixed phase for high enough concentrations/volume fractions. Therefore, we performed experiments in that concentration range where such a phenomenon should occur and we found it, in nice agreement with our theoretical prediction (see Figure 5). We remark that the binding energies were inferred in the first part of the phase diagram, apparently showing no sign of re-entrance, so this can be seen as a bona fide prediction or a successful non-trivial generalization.

Image processing and simulations
Numerical simulations were performed via the Monte Carlo Kawasaki scheme, 41 enforcing fixed volume fraction, and the inverse critical temperature was estimated independently, as the location of the peak of specific heat and the point where the free energy profile changes its concavity (see Figure 6, right). The results of Monte Carlo simulations were analyzed via a heuristic image processing algorithm to identify the occurrence of phase separation. Local particle densities were computed for each component in a lattice snapshot through 2 days convolution with a Gaussian kernel and periodic boundary condition. The logarithm of the distribution of the local density thus obtained was considered via the Gibbs equation as a bona-fide approximation of the free energy of de-mixing. An automated inspection of the number of minima of this reconstructed profile leads to the classification of systems into well-mixed and phaseseparated. The procedure is inspired by statistical tests comparing non-parametric distributions and produces a separation confidence score depicted as the color scale in Figure 3. The method is illustrated in Figure 6, left, for two instances of the ternary system that are representative of the well-mixed and phaseseparated systems, respectively. The method provided very accurate estimates, as can be seen in Figure 6, right where we show the scattering plot of the inverse critical temperature, at varying volume fractions, for the binary system obtained from the calculation of the peak of the specific heat (see STAR Methods S.5 and S.6). The study of how biological cells manage or fail to control the spatial/physical conditions of their internal milieu through mechanisms of phase separation is of paramount importance. This should benefit from the wealth of knowledge acquired in the field of the statistical physics of phase transitions in disordered systems, not only in terms of quantitative modeling, but also in data analysis and experiment design. In this article, we have illustrated an application of the mean-field Bethe-Peierls (BP) approximation in the context of heterogeneous phase separation. We have shown that the BP approach quantitatively reconstructs phase diagrams where the standard regular solution model fails even to give a qualitative description: more precisely, in a minimally heterogeneous ternary lattice microscopic model. In particular, the regular solution model 21-25 can be seen as coming from an underlying hypothesis of a fully connected graph (see STAR Methods S.2) and this procedure is not guaranteed to provide a reliable approximation in presence of idiosyncratic interactions without appropriate ad-hoc gauge transformations (an elementary example being given by the Ising antiferromagnet 37 ).
An explicit derivation of the free energy from the partition function, as we do here with cavity methods, helps in the formulation of an appropriate ansatz for the order parameter form and underlying symmetries. This, apart from being a more adequate theoretical strategy to deal with condensation phenomena, opens a new way for quantitative modeling of experimental data. We actually provided an example reproducing the experimental phase diagram of the associative de-mixing of poly-lysine in the presence of nucleotides. The finding that many models lead to a quantitative description of experimental phase diagrams will deserve further investigation. In this respect, our approach, applied to synthetic data from lattice model simulations could shed light on the right experimental quantities to be measured, leading to well-defined descriptions of the system (i.e., optimal experimental protocols).
In dealing with real data, an interesting ingredient to analyze with our method would be the introduction of inner degeneracy for the basic degrees of freedom, in order to model complex mixtures, since it could also potentially trigger inverse behaviors. 42 Yet, in order to reproduce the experimental phase diagram we did not need to consider this additional parameter. This could be ascribed to the simplicity of the data being analyzed and to the fact that the interpretation of single lattice points as single molecules is really for illustrative purposes and it should be revised in the light of quantitative comparison with experiments. This is well known for the Ising model, where lattice points are usually not identified with spins of single atoms or molecules but rather with aptly coarse-grained magnetic domains of a given scale. 31 This aspect is non-trivial and its clarification would touch upon the establishment of a renormalization group approach for these systems, a task that we leave for further investigations.
Although our case study is relatively simple (in comparison with the biomolecular condensates that one can find in a cell's cytoplasm), it is sufficient for our purposes here, and it is offered also as a relevant example for the origin of life research. Another interesting modification of the model would be to include explicit spatial One of the caveats of the cavity method consists in its mean-field character, thus limiting its predictive capabilities, in particular for the characterization of the phase transition (e.g. critical exponents 31 ). Field theory methods could overcome this aspect and they have been applied successfully in this context recently to model phase separation of intrinsically disordered proteins 45-47 see also the recent review. 48 Apart from that, a promising next step would be to use the BP approach to analyze strongly heterogeneous multicomponent systems via replica methods, as it was originally the aim of the regular solution model. In contrast with the latter, BP will be not restricted to the case of mildly attracting interaction matrices. In fact, it could be used to explore any kind of interaction patterns, with idiosyncratic terms, since this approximation showed to be successful in attacking systems with a much more complex free energy landscape, such as spin glasses.

Limitations of the study
The nature of the proposed cavity method approximation is inherently mean field and thus semi-quantitative. Rigorously, it is known that the free energy calculated in a mean-field approximation (including our approach) is always an upper bound for the true free energy and that mean-field approximations fail to describe quantitatively critical behavior. More refined Monte Carlo numerical simulations or field ftheory-based methods would be needed e.g. for a quantitative assessment of critical exponents. The experimental system we successfully model is arguably rather simple, especially if compared to the complexity of a true cellular environment.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

DECLARATION OF INTERESTS
The authors declare no competing interests.

Materials availability
All reagents were obtained from commercial sources and directly used without any further purification. The polymers, poly-lysine of different lengths, were obtained from Alamanda polymers, Inc. Adenosine 5-diphosphate sodium salt (cat. no.: A2754), Adenosine 5-triphosphate disodium salt (cat. no.: A3377), reduced disodium salt hydrate (cat. no.: N8129) were obtained directly from Sigma Aldrich.

Data and code availability
Source code for the parameter inference as well as the Monte Carlo numerical simulation is available online: https://github.com/ondrejtichacek/cavity-phase-sep.

Formulation of the Bethe-Peierls (BP) model
To derive the BP mean field in order to calculate the phase-diagrams, one starts form the Hamiltonian for the system (Equation 1 in the main text). Furthermore, as depicted in Figure 1 in the main text, the lattice is approximated as a tree-graph that branches out from any fixed lattice-site. This allows for the lattice to be decomposed into sub-systems which are only connected through the lattice-site they emerge from, which furthermore means that the Hamiltonian of the system can be decomposed as well: Hð s ! Þ = À X

OPEN ACCESS
where v 1 is the neighbourhood of the lattice-site-11 (see again Figure 1 in the main text) and H À 1 is the Hamiltonian of the system without site-1. Substituting this expression into the partition function (Equation 2 in the main text) results in: Assuming that there are no interactions between sub-systems, one can write: One can further decompose the sub-partition function Z 1/i ðs 1 Þ = P subi e bJðs1;si Þ e À bHi . As above the Hamiltonian H i of the i-th sub-system can be decomposed into further sub-systems (see again Figure 1 of the main text): where v i\1 is the neighbourhood of the lattice-site-i without lattice-site-1 (and it's connected subs-systems) and H À i is the Hamiltonian of the sub-system emerging from lattice-site-i. As above, one can assume that there no interactions between sub-systems, thus one can write H À i = P j˛v i\1 H j . The sub-partition function can then be further decomposed as: iScience Article In other words, the sub-partition function for the sub-systems emerging from site-1 is: This can be generalized to: which is Equation 3 in the main text. In other words Z i/j ðs i Þ is the sub-partition function of the sub-system starting from lattice-site-j given that lattice-site-j itself starts from lattice-site-i (i.e. lattice-site-i is fixed). In general we have: Z i/j ðs i ÞsZ j/i ðs j Þ. In the following sections this partition function will be solved for the special cases of a binary-and a ternary solution.

Binary system
As described in the main text, for a binary solution, i.e. q = 2 and s i = 0; 1, one can fix Jðs i ;0Þ = mð0Þ = 0, as well as Jð1; 1Þ = J; mð1Þ = m. Putting this into Equation s1 one can simplify the partition function: Therefore the partition function for a the binary solution is: iScience Article Taking the logarithm on both sides of the above equation results in the following set of self-consistent equations for the messages u i/j : which is Equation 5 from the main text. To further simplify this set of equations, one can assume that the tree-graph is a Caley-Graph with a branching of C = K + 1 and that there is homogeneity of the messages, i.e. u i/j = u; ci; j. Thus one has P k˛v j\i u j/k = Ku as v j\i = K. One therefore retrieves for Equation s5: which is Equation 6 from the main text. One can furthermore retrieve the probability Pðs 1 = 1Þ of a lattice site being in state 1 (i.e. being occupied by a solute particle): which is essentially the volume-fraction 4 solute particles2. Applying the same parametrization from above results in: Again the assumption that the tree-graph is a Caley-Graph, therefore jv 1 j = K + 1, as well as the homogeneity u 1/i = u; ci, was used. This way one retrieves 4 is depending on u and m:  vðbmÞ vu = À be bu e bu À 1 + bK À be bu e bJ À e bu = 0 substituting w = e bu one can solve this equation: À bwðe bJ À wÞ + bKðw À 1Þðe bJ À wÞ À bwðw À 1Þ ðw À 1Þðe bJ À wÞ = 0 Kðw À 1Þ À e bJ À 1 Á = wðe bJ À wÞ + wðw À 1Þ Kðw À 1Þ À e bJ À 1 Á = wðe bJ À 1Þ Kwe bJ À Ke bJ À Kw 2 + Kw = we bJ À w À Kw 2 + À Ke bJ + k À e bJ + 1 Á w À Ke bJ = 0 The solution to this quadratic equation yields the followng expressions: which is Equation 8 from the main text. In conclusion, to get a spinodal-curve one can chose values for bJ in order to get values for w 1:2 . One then substitutes them into Equation s10, to get values for 4. This way one thus has a function 4ðbJÞ, in order to get bJð4Þ one simply takes the inverse ð4ðbJÞÞ À 1 = bJð4Þ.

Ternary system
For a ternary solution with s i˛f À 1; 0; + 1g one can assume, like in the binary solution, that Jðs i ; 0Þ = mð0Þ = 0. Furthermore, as described in the main text, one can do the following relabeling: Jð À 1; À 1Þ = J À À ; Jð + 1; + 1Þ = J + + ; Jð À 1; + 1Þ = Jð + 1; À 1Þ = J + À ; mð À 1Þ = m À ; mð + 1Þ = m + . Putting this into Equation s1 for the partition function gives: Thus one retrieves the partition function for ternary solution as: In a similar way we one can modify Equation s2: Which gives the sub-partition function for the sub-systems in the ternary lattice as: which is Equation 10 in the main text. Like in the binary solution one can introduce a parametrization: Z i/j ðs i Þ = A i/j e bðvi/j si + wi/j s 2 i Þ . Applying this to the RHS and the LHS of Equation s12 gives: iScience Article As in the binary solution one can take the fraction Z i/j ð À 1Þ=Z i/j ð0Þ: The same way one can get for Z i/j ð + 1Þ=Z i/j ð0Þ: These are again the messages for the ternary system. Taking the logarithm on both sides of the above equations one can retrieve get the formulas: Again, one can assume a Caley-Graph with a branching of C = K + 1 and u G i/j = u G ; ci; j. Thus one has P k˛v j\i u G j/k = Ku G as v j\i = K. Equation s13 then simplifies to: log e bðJÀÀ À m À + Ku À Þ + 1 + e bðJ + À À m + + Ku + Þ e bðÀ m À + Ku À Þ + 1 + e bðÀ m + + Ku + Þ ! u + = 1 b log e bðJ + À À m À + Ku À Þ + 1 + e bðJ + + À m + + Ku + Þ e bðÀ m À + Ku À Þ + 1 + e bðÀ m + + Ku + Þ

(Equation s15)
which is Equation 12 from the main text.

Calculating the spinodal-curve for a ternary solution
In order to get the spinodal-curve for a ternary system a small ''workaround'' is needed. First, one can define the following variable transformation m G = À m G + Cu G (recall that for our tree-graph one has C = K + 1). One thus gets for Equation s15: Given 4 G one can numerically solve this self-consistent equation to get an estimator for m G . In a similar way one can use the variable transformation from above on Equation s14 (recall K = C À 1): log e bðJ GG + m G À u G Þ + e bðJ + À + m H À u H Þ + 1 e bðm À À u À Þ + e bðm + À u + Þ + 1 ! Together with the estimate for m G one can again numerically solve this self consistent equation to get an estimate for u G . Finally one can get m G from the variable transformation together with the estimates for m G and u G : This way one retrieves an estimate for the external fields m G in every point ð4 + ; 4 À Þ. to get the spinodalcurve one then needs to take the Hessian-matrix: . One can compute the derivatives numerically. With the hessian one can use the property that if trðHÞ > 0 and detðHÞ > 0 the matrix is positive-definite meaning that the free energy F is concave-upward which means that there is no phase-separation. Thus for all points ð4 + ; 4 À Þ where this is is not the case we know that there is phase-separation happening. To derive the mean field equations of the regular solution model one again starts from the Hamiltonian of the system (Equation 1 in the main text). Again the lattice is approximated as a graph, however in contrast to the BP-model one assumes a fully connected graph, rather then a tree. One can furthermore substitute the interaction function Jðs i ; s j Þ and the chemical potential mðs i Þ with a symetric matrix and a vector respectively, i.e.: Jðs i ;s j Þ = J si sj ;mðs i Þ = m si . As the lattice is now fully connected the pairwise sum runs now over all lattice sites. This means one can rewrite the Hamiltonian to: One can further rewrite J si sj and m si to: .; q À 1 being the possible states for s i (q possible states overall4) and d ksi being the Kroneckerdelta. Substituting this back into the Hamiltonian yields: with N k being the number lattice sites that are in state k, i.e. the number of type-k molecules. Thus the Hamiltonian of the regular solution model is: Having transformed the exponent to be dependent on N k rather than s i one would also want the sum to be running running over the state vector N ! = ðN 0 ; .; N k ; .; N q À 1 Þ which is the vector representing the amount of lattice-sites that are in a certain sate k. In general a certain state vector N ! can be represented by multiple vectors s ! 5. This multiplicity is accounted for by further introducing a multinomial coefficient in the sum. One can therefore rewrite the partition function to: One can use Stirling-approximation, i.e. N!ze Nðlog N À 1Þ , to do the following simplification: which is a partition sum over the state space fN k g with the constraint that P N k = N. Since N k basically represents the amount of type-k molecules in the system, one has: 4 k = N k N , with 4 k being the volume fraction of type-k molecules, with P 4 k = 1. Thus the state space f4 k g is essentially defined by a simplex. One can generalize the above partition sum to an integral over this simplex: With Fð 4 ! Þ being the free energy of the system. Comparing this integral with the partition sum one can see that f ðN ! Þ = À bNFð 4 ! Þ. Therefore one can rewrite the exponent in the partition sum from above in order to get Fð 4 ! Þ: = À bN 2 6 6 6 6 6 6 4 Using the relation f ðN ! Þ = À bNFð 4 ! Þ from above one retrieves: which is the free energy of the regular solution model.

! (Equation s25)
Looking at the derivative of log Z with respect to bm for the general expression Z = P si e bfint e À bm P si : In other words one retrieves the following relation: Calculating the binodal-curve The binodal curve has been calculated through the Maxwell construction. We report an example here below in Figure S1 (BP for the binary system with K = 3, b = 3). This is obtained by the standard procedure of finding the point where the chemical potentials of the two phases are equal. Upon looking in Figure S1 (top) this geometrically corresponds to find the horizontal line whose intersection with the mean-field isothermal curve gives regions of equal area. We report here in Figure S1 (bottom) the binodal curves for the binary system with both approximations (CW and BP) alongside with the spinodals.

Finite size scaling
The systems we are considering here have short-range interactions and finite number of components (we are not dealing with a disordered systems) and thus, in regard to the study of the peak of specific heat as a function of the temperature, finite size effects shall follow established results of classical finite size scaling theory. 49 In particular, in our case, the rounding and shifting of the transition point is ruled by terms of the order OðL À d Þ (L being the lattice size and d the dimension), as we show below in Figure S2 Calibration of the image processing algorithm iScience Article with numerical simulations of different systems along a simulated annealing pathway (b slowly increasing). Then, the method can be used for binary classification (mixed/separated). The Figure S3 shows the measure of phase separation x compared to the specific heat, as a part of the calibration procedure.

Further details on numerical simulations
In all simulations, the lattice was initialized randomly with a uniform distribution of all components. In case of the results presented in the Figure 2 (main text), the system was slowly evolved along a simulation pathway with the inverse thermodynamic temperature b increasing linearly from 0 to 8 over 10 4 iterations (simulated annealing). During each iteration, the system was equilibrated for 10 3 steps and the final snapshot was analysed for phase-separation. The total number of steps per system (defined by volume fraction 4) was therefore 10 7 , each step consisting of L 2 Kawasaki swaps, where L is the system size. For the Figure 3 (main text), the following procedure was employed. During an initial annealing period (10 4 steps), b was increased from 0 to 1, which was used for the rest of the simulation. Afterwards, the system was simulated for 10 6 steps in order to reach equilibrium, which has been verified by observing a plateau in the measured total energy of the system. Finally, the simulation was advanced further 10 6 steps during which a total of 10 3 uniformly distributed snapshots were collected for present analysis.

Further details on the comparison between the cavity method results and numerical simulations
For sake of clarity, we report in Figure S4 phase diagrams for the ternary system where we superimpose the results from BP with the ones from numerical simulations: in all cases we are above 88% accuracy.

General experimental conditions
Experiments were carried at room temperature in a 1536 well microplate (Greiner bio-one, item no.: 783096) where the coacervate forming components, i.e. Poly-L-lysine hydrochloride of length 10mer (Alamanda polymers, item: PLKC10) and Adenosine 5 0 -diphosphate sodium salt (Sigma Aldrich, cat. no.: A2754), were distributed, from 10X stock solutions, using Labcyte Echo 550 acoustic liquid dispenser. The phase diagram was obtained in 20mM Tris-HCl buffer at pH 9.0 which was dispensed using Fluidx XRD-384 reagent dispenser. After mixing the solutions, bright-field images of each of the wells were acquired using Yokogawa CellVoyagerä CV7000 high-throughput cytological discovery system at 60X magnification (Olympus objective UPLSAPO60XW, product no.: N1480800). The formation of coacervate was detected by visual inspection of the acquired images and by automated detection of phase separation (through image processing). All the chemicals were prepared in water at 200 mM concentrations. The buffer used here is Tris-HCl, which was prepared by dissolved 25 mM Trizma base (cat. no.: T1503) and was adjusted to pH 9.0 using hydrochloric acid.

Automated liquid handling
Automated liquid handling was performed using two different instruments. The Fluidx XRD-384 reagent dispenser was using to fill 3.2 mL of buffer solution in a 1536 well microplate (Greiner bio-one, item no.: 783096). The positive and the negative chemical species were dispensed using Labcyte Echo 550 acoustic liquid dispenser in a volume of 4 nL each. The remaining volume up to 4 mL was filled up with Milli-Q water.
The .csv file that served as an input for the Labcyte Echo 550 was prepared using KNIME. The KNIME workflow takes an excel sheet with desired concentrations of positive and negative chemical species and performs required transformations in order to produce a .csv file that is Echo 550 readable. Once the dispensing is over the 1536 well microplate is vortexed in order to mix the two components. The microwell plate is then proceeded for microscopy. The Figure S5 provides a comprehensive view of the complete experimental pipeline.