Ensemble equivalence for dense graphs

In this paper we consider a random graph on which topological restrictions are imposed, such as constraints on the total number of edges, wedges, and triangles. We work in the dense regime, in which the number of edges per vertex scales proportionally to the number of vertices $n$. Our goal is to compare the micro-canonical ensemble (in which the constraints are satisfied for every realisation of the graph) with the canonical ensemble (in which the constraints are satisfied on average), both subject to maximal entropy. We compute the relative entropy of the two ensembles in the limit as $n$ grows large, where two ensembles are said to be \emph{equivalent} in the dense regime if this relative entropy divided by $n^2$ tends to zero. Our main result, whose proof relies on large deviation theory for graphons, is that breaking of ensemble equivalence occurs when the constraints are \emph{frustrated}. Examples are provided for three different choices of constraints.

1 Introduction
Section 1.1 gives background and motivation, Section 1.2 describes the relevant literature, and Section 1.3 outlines the remainder of the paper.

Background and motivation
For large networks a detailed description of the architecture of the network is infeasible and must be replaced by a probabilistic description, where the network is assumed to be a random sample drawn from a set of allowed graphs that are consistent with a set of empirically observed features of the network, referred to as constraints. Statistical physics deals with the definition of the appropriate probability distribution over the set of graphs and with the calculation of its relevant properties (Gibbs [14]). The two main choices¹ of probability distribution are:
(1) The microcanonical ensemble, where the constraints are hard (i.e., are satisfied by each individual graph).
(2) The canonical ensemble, where the constraints are soft (i.e., hold as ensemble averages, while individual graphs may violate the constraints).
a Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
b Korteweg-de Vries Institute, University of Amsterdam, P.O. Box 94248, 1090 GE Amsterdam, The Netherlands
¹ The microcanonical ensemble and the canonical ensemble work with a fixed number of vertices. There is a third ensemble, the grandcanonical ensemble, where the size of the graph is also treated as a soft constraint.
For networks that are large but finite, the two ensembles are obviously different and, in fact, represent different empirical situations: they serve as null-models for the network after incorporating what is known about the network a priori via the constraints. Each ensemble represents the unique probability distribution with maximal entropy respecting the constraints. In the limit as the size of the graph diverges, the two ensembles are traditionally assumed to become equivalent as a result of the expected vanishing of the fluctuations of the soft constraints, i.e., the soft constraints are expected to become asymptotically hard. This assumption of ensemble equivalence, which is one of the cornerstones of statistical physics, does not, however, hold in general (we refer to Touchette [28] for more background).
In Squartini et al. [27] the question of the possible breaking of ensemble equivalence was investigated for two types of constraint:
(I) The total number of edges.
(II) The degree sequence.
In the sparse regime, where the empirical degree distribution converges to a limit as the number of vertices $n$ tends to infinity such that the maximal degree is $o(\sqrt{n})$, it was shown that the relative entropy of the microcanonical ensemble with respect to the canonical ensemble divided by $n$ (which can be interpreted as the relative entropy per vertex) tends to $s_\infty$, with $s_\infty = 0$ in case the constraint concerns the total number of edges, and $s_\infty > 0$ in case the constraint concerns the degree sequence. For the latter case, an explicit formula was derived for $s_\infty$, which allows for a quantitative analysis of the breaking of ensemble equivalence.
In the present paper we analyse what happens in the dense regime, where the number of edges per vertex is of order $n$. We consider case (I), yet allow for constraints not only on the total number of edges but also on the total number of wedges, triangles, etc. We show that the relative entropy divided by $n^2$ (which, up to a constant, can be interpreted as the relative entropy per edge) tends to $s_\infty$, with $s_\infty > 0$ when the constraints are frustrated. Our analysis is based on a large deviation principle for graphons.

Relevant literature
In the past few years, several papers have studied the microcanonical ensemble and the canonical ensemble. Most papers focus on dense graphs, but there are some interesting advances for sparse graphs as well. Closely related to the canonical ensemble are the exponential random graph model (Bhamidi et al. [3], Chatterjee and Diaconis [9]) and the constrained exponential random graph model (Aristoff and Zhu [1], Kenyon and Yin [19], Yin [30], Zhu [32]).
In Aristoff and Zhu [1], Kenyon et al. [18], Radin and Sadun [24], the authors study the microcanonical ensemble, focusing on the constrained entropy density. In [1] directed graphs are considered with a hard constraint on the number of directed edges and $j$-stars, while in [18, 24] the focus is on undirected graphs with a hard constraint on the edge density, $j$-star density and triangle density, respectively. Following the work in Bhamidi et al. [3] and in Chatterjee and Diaconis [9], a deeper understanding has developed of how these models behave as the size of the graph tends to infinity. Most results concern the asymptotic behaviour of the partition function (Chatterjee and Diaconis [9], Kenyon, Radin, Ren and Sadun [18]) and the identification of regions where phase transitions occur (Aristoff and Zhu [2], Lubetsky and Zhao [21], Yin [29]). For more details we refer the reader to the recent monograph by Chatterjee [7], and references therein. Significant contributions for sparse graphs were made in Chatterjee and Dembo [8] and in subsequent work of Yin and Zhu [31].
For an overview of random graphs and their role as models of complex networks, we refer the reader to the recent monograph by van der Hofstad [16]. The most important distinction between our paper and the existing literature on exponential random graphs is that in the canonical ensemble we impose a soft constraint.

Outline
The remainder of this paper is organised as follows. Section 2 defines the two ensembles, gives the definition of equivalence of ensembles in the dense regime, recalls some basic facts about graphons, and states the large deviation principle for the Erdős-Rényi random graph. Section 3 states a key theorem in which we give a variational representation of $s_\infty$ when the constraint is on subgraph counts, properly normalised. Section 4 presents our main theorem for ensemble equivalence, which provides three examples for which breaking of ensemble equivalence occurs when the constraints are frustrated. In particular, the constraints considered are on the number of edges, triangles and/or stars. Frustration corresponds to the situation where the canonical ensemble scales like an Erdős-Rényi random graph model with an appropriate edge density but the microcanonical ensemble does not. The proof of the main theorem is given in Sections 5-6, and relies on various papers in the literature dealing with exponential random graph models. Appendix A discusses convergence of Lagrange multipliers associated with the canonical ensemble.

Key notions
In Section 2.1 we introduce the model and give our definition of equivalence of ensembles in the dense regime (Definition 2.1 below). In Section 2.2 we recall some basic facts about graphons (Propositions 2.4-2.6 below). In Section 2.3 we recall the large deviation principle for the Erdős-Rényi random graph (Proposition 2.7 and Theorem 2.8 below), which is the key tool in our paper.

Microcanonical ensemble, canonical ensemble, relative entropy
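For $n \in \mathbb{N}$, let $\mathcal{G}_n$ denote the set of all $2^{\binom{n}{2}}$ simple undirected graphs with $n$ vertices. Any graph $G \in \mathcal{G}_n$ can be represented by a symmetric $n \times n$ matrix with elements
\[
h^G(i,j) := \begin{cases} 1 & \text{if there is an edge between vertex } i \text{ and vertex } j, \\ 0 & \text{otherwise.} \end{cases} \tag{2.1}
\]
Let $\vec{C}$ denote a vector-valued function on $\mathcal{G}_n$. We choose a specific vector $\vec{C}^*$, which we assume to be graphic, i.e., realisable by at least one graph in $\mathcal{G}_n$. For this $\vec{C}^*$ the microcanonical ensemble is the probability distribution $P_{\mathrm{mic}}$ on $\mathcal{G}_n$ with hard constraint $\vec{C}^*$ defined as
\[
P_{\mathrm{mic}}(G) := \begin{cases} 1/\Omega_{\vec{C}^*} & \text{if } \vec{C}(G) = \vec{C}^*, \\ 0 & \text{otherwise,} \end{cases} \tag{2.2}
\]
where
\[
\Omega_{\vec{C}^*} := |\{G \in \mathcal{G}_n\colon \vec{C}(G) = \vec{C}^*\}| \tag{2.3}
\]
is the number of graphs that realise $\vec{C}^*$. The canonical ensemble $P_{\mathrm{can}}$ is the unique probability distribution on $\mathcal{G}_n$ that maximises the entropy
\[
S_n(P) := -\sum_{G \in \mathcal{G}_n} P(G) \log P(G) \tag{2.4}
\]
subject to the soft constraint $\langle \vec{C} \rangle = \vec{C}^*$, where
\[
\langle \vec{C} \rangle := \sum_{G \in \mathcal{G}_n} \vec{C}(G)\, P(G). \tag{2.5}
\]
It takes the form
\[
P_{\mathrm{can}}(G) := \frac{1}{Z(\vec{\theta}^*)}\, e^{H(\vec{\theta}^*, \vec{C}(G))}, \tag{2.6}
\]
with
\[
H(\vec{\theta}^*, \vec{C}(G)) := \vec{\theta}^* \cdot \vec{C}(G), \qquad Z(\vec{\theta}^*) := \sum_{G \in \mathcal{G}_n} e^{\vec{\theta}^* \cdot \vec{C}(G)}, \tag{2.7}
\]
denoting the Hamiltonian and the partition function, respectively. In (2.6)-(2.7) the parameter $\vec{\theta}^*$ (a real-valued vector of the same dimension as the constraint, playing the role of a Lagrange multiplier) must be set to the unique value that realises $\langle \vec{C} \rangle = \vec{C}^*$. The Lagrange multiplier $\vec{\theta}^*$ exists and is unique. Indeed, the gradients of the constraints in (2.5) are linearly independent vectors. Consequently, the Hessian matrix of the entropy of the canonical ensemble in (2.6) is a positive definite matrix, which implies uniqueness of the Lagrange multiplier.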
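The relative entropy of $P_{\mathrm{mic}}$ with respect to $P_{\mathrm{can}}$ is defined as
\[
S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) := \sum_{G \in \mathcal{G}_n} P_{\mathrm{mic}}(G) \log \frac{P_{\mathrm{mic}}(G)}{P_{\mathrm{can}}(G)}. \tag{2.8}
\]
Definition 2.1. In the dense regime, if
\[
s_\infty := \lim_{n\to\infty} \frac{1}{n^2}\, S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) = 0, \tag{2.9}
\]
then $P_{\mathrm{mic}}$ and $P_{\mathrm{can}}$ are said to be equivalent.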
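Before proceeding, we recall an important observation made in Squartini et al. [27]. For any $G_1, G_2 \in \mathcal{G}_n$, $P_{\mathrm{can}}(G_1) = P_{\mathrm{can}}(G_2)$ whenever $\vec{C}(G_1) = \vec{C}(G_2)$, i.e., the canonical probability is the same for all graphs with the same value of the constraint. We may therefore rewrite (2.8) as
\[
S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}}) = \log \frac{P_{\mathrm{mic}}(G^*)}{P_{\mathrm{can}}(G^*)}, \tag{2.10}
\]
where $G^*$ is any graph in $\mathcal{G}_n$ such that $\vec{C}(G^*) = \vec{C}^*$ (recall that we assumed that $\vec{C}^*$ is realisable by at least one graph in $\mathcal{G}_n$). This fact greatly simplifies computations.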
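The reduction of the relative entropy to a single-graph expression can be checked by brute force on a very small graph. The sketch below is purely illustrative: it takes the constraint to be the total number of edges, assumes the canonical weight $e^{\theta E(G)}$ (positive sign in the exponent), and uses hypothetical function names that do not come from the paper.

```python
import itertools
import math

def edge_counts(n):
    """Edge count of every simple graph on n labelled vertices (brute force)."""
    n_pairs = n * (n - 1) // 2
    return [sum(bits) for bits in itertools.product((0, 1), repeat=n_pairs)]

def relative_entropy_edges(n, m_star):
    """Return (full sum, single-graph shortcut, canonical mean edge count).

    Assumes 0 < m_star < n(n-1)/2 so that the multiplier theta is finite.
    """
    n_pairs = n * (n - 1) // 2
    counts = edge_counts(n)
    # Microcanonical ensemble: uniform on graphs with exactly m_star edges.
    omega = sum(1 for e in counts if e == m_star)
    p_mic = 1.0 / omega
    # Canonical ensemble: P_can(G) = exp(theta * E(G)) / Z, with theta tuned
    # so that the average number of edges equals m_star.
    p = m_star / n_pairs
    theta = math.log(p / (1.0 - p))
    z = sum(math.exp(theta * e) for e in counts)
    p_can = [math.exp(theta * e) / z for e in counts]
    mean_edges = sum(e * q for e, q in zip(counts, p_can))
    # Full definition: sum over the support of P_mic.
    full = sum(
        p_mic * math.log(p_mic / q)
        for e, q in zip(counts, p_can)
        if e == m_star
    )
    # Shortcut: evaluate at a single graph G* with E(G*) = m_star.
    q_star = math.exp(theta * m_star) / z
    shortcut = math.log(p_mic / q_star)
    return full, shortcut, mean_edges

full, shortcut, mean_edges = relative_entropy_edges(4, 2)
print(full, shortcut, mean_edges)
```

Both expressions agree because every graph in the support of $P_{\mathrm{mic}}$ carries the same canonical probability; the enumeration is only feasible for a handful of vertices.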
Remark 2.2. All the quantities above depend on $n$. In order not to burden the notation, we exhibit this $n$-dependence only in the symbols $\mathcal{G}_n$ and $S_n(P_{\mathrm{mic}} \mid P_{\mathrm{can}})$. When we pass to the limit $n \to \infty$, we need to specify how $\vec{C}(G)$, $\vec{C}^*$ and $\vec{\theta}^*$ are chosen to depend on $n$. This will be done in Section 3.1.

Graphons
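There is a natural way to embed a simple graph on $n$ vertices in a space of functions called graphons. Let $W$ be the space of functions $h\colon [0,1]^2 \to [0,1]$ such that $h(x,y) = h(y,x)$ for all $(x,y) \in [0,1]^2$. A finite simple graph $G$ on $n$ vertices can be represented as a graphon $h^G \in W$ in a natural way as (see Fig. 1)
\[
h^G(x,y) := \begin{cases} 1 & \text{if there is an edge between vertex } \lceil nx \rceil \text{ and vertex } \lceil ny \rceil, \\ 0 & \text{otherwise.} \end{cases} \tag{2.11}
\]
The space of graphons $W$ is endowed with the cut distance
\[
d_\square(h_1,h_2) := \sup_{S,T \subset [0,1]} \left| \int_{S \times T} dx\,dy\, [h_1(x,y) - h_2(x,y)] \right|, \qquad h_1,h_2 \in W. \tag{2.12}
\]
On $W$ there is a natural equivalence relation $\equiv$. Let $\Sigma$ be the space of measure-preserving bijections $\sigma\colon [0,1] \to [0,1]$. Then $h_1 \equiv h_2$ if $h_1 = h_2^\sigma$ for some $\sigma \in \Sigma$, where $h^\sigma(x,y) := h(\sigma x, \sigma y)$. This equivalence relation yields the quotient space $(\widetilde{W}, \delta_\square)$, where $\delta_\square$ is the metric defined by
\[
\delta_\square(\tilde{h}_1, \tilde{h}_2) := \inf_{\sigma_1,\sigma_2 \in \Sigma} d_\square(h_1^{\sigma_1}, h_2^{\sigma_2}), \qquad \tilde{h}_1, \tilde{h}_2 \in \widetilde{W}. \tag{2.13}
\]
To avoid cumbersome notation, throughout the sequel we suppress the $n$-dependence. Thus, by $G$ we denote any simple graph on $n$ vertices, by $h^G$ its image in the graphon space $W$, and by $\tilde{h}^G$ its image in the quotient space $\widetilde{W}$.

Let $F$ and $G$ denote two simple graphs with vertex sets $V(F)$ and $V(G)$, respectively, and let $\hom(F,G)$ be the number of homomorphisms from $F$ to $G$. The homomorphism density is defined as
\[
t(F,G) := \frac{\hom(F,G)}{|V(G)|^{|V(F)|}}. \tag{2.14}
\]
Two graphs are said to be similar when they have similar homomorphism densities. For a graphon $h \in W$ and a finite simple graph $F$ with vertex set $V(F) = \{1,\dots,k\}$ and edge set $E(F)$, the homomorphism density is defined analogously as
\[
t(F,h) := \int_{[0,1]^k} dx_1 \cdots dx_k \prod_{\{i,j\} \in E(F)} h(x_i,x_j), \tag{2.15}
\]
and for $h = h^G$ this agrees with (2.14):
\[
t(F,G) = t(F,h^G). \tag{2.16}
\]

We conclude this section with three basic facts that will be needed later on. The first gives the relation between left-convergence of sequences of graphs and convergence in the quotient space $(\widetilde{W}, \delta_\square)$, the second is a compactness property, while the third shows that the homomorphism density is Lipschitz continuous with respect to the $\delta_\square$-metric.

Proposition 2.4 (Borgs et al. [4]). For a sequence of labelled simple graphs $(G_n)_{n\in\mathbb{N}}$ the following properties are equivalent:
(i) $(G_n)_{n\in\mathbb{N}}$ is left-convergent.
(ii) $(\tilde{h}^{G_n})_{n\in\mathbb{N}}$ is a Cauchy sequence in the metric $\delta_\square$.
(iii) $(t(F,h^{G_n}))_{n\in\mathbb{N}}$ converges for all finite simple graphs $F$.
(iv) There exists an $h \in W$ such that $\lim_{n\to\infty} t(F,h^{G_n}) = t(F,h)$ for all finite simple graphs $F$.

Proposition 2.5 (Lovász and Szegedy [20]). $(\widetilde{W}, \delta_\square)$ is compact.

Proposition 2.6 (Borgs et al. [4]). Let $G_1, G_2$ be two labelled simple graphs, and let $F$ be a simple graph. Then
\[
|t(F,G_1) - t(F,G_2)| \leq 4\,|E(F)|\, \delta_\square(\tilde{h}^{G_1}, \tilde{h}^{G_2}).
\]
For a more detailed description of the structure of the space $(\widetilde{W}, \delta_\square)$ we refer the reader to Borgs et al. [4, 5] and Diao et al. [12].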
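For small graphs the homomorphism density can be computed by direct enumeration of all vertex maps. The following sketch assumes the standard normalisation $t(F,G) = \hom(F,G)/|V(G)|^{|V(F)|}$; the function name and the encoding of $F$ as an edge list on vertices $0,\dots,k-1$ are choices made here for illustration only.

```python
import itertools

def hom_density(f_edges, k, adj):
    """t(F, G) = hom(F, G) / |V(G)|^k by brute force.

    f_edges: edge list of F on vertices 0..k-1.
    adj:     symmetric 0/1 adjacency matrix of G with zero diagonal.
    """
    n = len(adj)
    # A map phi: V(F) -> V(G) is a homomorphism if every edge of F
    # is sent to an edge of G (phi need not be injective).
    hom = sum(
        1
        for phi in itertools.product(range(n), repeat=k)
        if all(adj[phi[a]][phi[b]] for a, b in f_edges)
    )
    return hom / n ** k

EDGE = [(0, 1)]
WEDGE = [(0, 1), (1, 2)]
TRIANGLE = [(0, 1), (1, 2), (0, 2)]

# Complete graph on three vertices.
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
t_edge = hom_density(EDGE, 2, K3)
t_wedge = hom_density(WEDGE, 3, K3)
t_triangle = hom_density(TRIANGLE, 3, K3)
print(t_edge, t_wedge, t_triangle)
```

The cost grows like $|V(G)|^{|V(F)|}$, so this is only meant as a check of the definition, not as a practical algorithm.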

Large deviation principle for the Erdős-Rényi random graph
In this section we recall a few key facts from the literature about rare events in Erdős-Rényi random graphs, formulated in terms of a large deviation principle. Importantly, the scale that is used is $n^2$, the order of the number of edges in the graph.
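We start by introducing the large deviation rate function. For $p \in (0,1)$ and $u \in [0,1]$, let
\[
I_p(u) := \tfrac12\, u \log \frac{u}{p} + \tfrac12\, (1-u) \log \frac{1-u}{1-p},
\]
with the convention that $0 \log 0 = 0$. For $h \in W$ we write, with a mild abuse of notation,
\[
I_p(h) := \int_{[0,1]^2} dx\,dy\, I_p(h(x,y)).
\]
On the quotient space $(\widetilde{W}, \delta_\square)$ we define $I_p(\tilde{h}) = I_p(h)$, where $h$ is any element of the equivalence class $\tilde{h}$.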
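As a numerical companion to the rate function, the sketch below evaluates it pointwise and on step-function graphons. It assumes the Chatterjee-Varadhan form $I_p(u) = \tfrac12 u \log(u/p) + \tfrac12 (1-u)\log((1-u)/(1-p))$; the representation of a step graphon by block weights and a block matrix, and all function names, are illustrative choices.

```python
import math

def xlogy(x, y):
    """x * log(y) with the convention 0 log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log(y)

def rate_point(u, p):
    """Assumed form: I_p(u) = (1/2) u log(u/p) + (1/2)(1-u) log((1-u)/(1-p))."""
    return 0.5 * xlogy(u, u / p) + 0.5 * xlogy(1.0 - u, (1.0 - u) / (1.0 - p))

def rate_step_graphon(weights, blocks, p):
    """I_p(h) for a step graphon.

    weights[a]   : Lebesgue measure of block a (weights sum to 1).
    blocks[a][b] : constant value of h on block a x block b.
    """
    return sum(
        wa * wb * rate_point(blocks[a][b], p)
        for a, wa in enumerate(weights)
        for b, wb in enumerate(weights)
    )

# The rate vanishes at the typical value u = p and is positive elsewhere.
print(rate_point(0.3, 0.3), rate_point(0.8, 0.3))

# Balanced bipartite graphon, measured against Erdős-Rényi with p = 1/2.
bipartite = rate_step_graphon([0.5, 0.5], [[0, 1], [1, 0]], 0.5)
print(bipartite)
```

For a graphon taking only the values 0 and 1, every point contributes $\tfrac12 \log 2$ when $p = \tfrac12$, which is what the bipartite example returns.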
Proposition 2.7 (Chatterjee and Varadhan [11]). The function $I_p$ is well-defined on $\widetilde{W}$ and is lower semi-continuous under the $\delta_\square$-metric.
Consider the set $\mathcal{G}_n$ of all graphs on $n$ vertices and the Erdős-Rényi probability distribution $P_{n,p}$ on $\mathcal{G}_n$. Through the mappings $G \to h^G \to \tilde{h}^G$ we obtain a probability distribution on $W$ (with a slight abuse of notation again denoted by $P_{n,p}$), and a probability distribution $\widetilde{P}_{n,p}$ on $\widetilde{W}$.
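Theorem 2.8 (Chatterjee and Varadhan [11]). For every $p \in (0,1)$, the sequence of probability distributions $(\widetilde{P}_{n,p})_{n\in\mathbb{N}}$ satisfies the large deviation principle on $(\widetilde{W}, \delta_\square)$ with rate function $I_p$ defined by (2.20), i.e.,
\[
\begin{aligned}
\limsup_{n\to\infty} \frac{1}{n^2} \log \widetilde{P}_{n,p}(\widetilde{C}) &\leq -\inf_{\tilde{h} \in \widetilde{C}} I_p(\tilde{h}) \quad \forall\, \widetilde{C} \subset \widetilde{W} \text{ closed}, \\
\liminf_{n\to\infty} \frac{1}{n^2} \log \widetilde{P}_{n,p}(\widetilde{O}) &\geq -\inf_{\tilde{h} \in \widetilde{O}} I_p(\tilde{h}) \quad \forall\, \widetilde{O} \subset \widetilde{W} \text{ open}.
\end{aligned} \tag{2.21}
\]
Using the large deviation principle we can find asymptotic expressions for the number of simple graphs on $n$ vertices with a given property. In what follows a property of a graph is defined through an operator $\vec{T}\colon \widetilde{W} \to \mathbb{R}^m$ for some $m \in \mathbb{N}$. We assume that the operator $\vec{T}$ is continuous with respect to the $\delta_\square$-metric, and for some $\vec{T}^* \in \mathbb{R}^m$ we consider the sets
\[
\widetilde{W}^* := \{\tilde{h} \in \widetilde{W}\colon \vec{T}(\tilde{h}) = \vec{T}^*\}, \qquad \widetilde{W}^*_n := \{\tilde{h} \in \widetilde{W}^*\colon \tilde{h} = \tilde{h}^G \text{ for some } G \in \mathcal{G}_n\}. \tag{2.22}
\]
By the continuity of the operator $\vec{T}$, the set $\widetilde{W}^*$ is closed. Therefore, using Theorem 2.8, we obtain the following asymptotics for the cardinality of $\widetilde{W}^*_n$.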
Corollary 2.9 (Chatterjee [6]).For any measurable set W * ⊂ W , with W * n as defined in where int( W * ) is the interior of W * .
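The $n^2$ scaling of graph counts can be illustrated numerically in the simplest case, a constraint on the number of edges only. The sketch below assumes that the relevant rate is $I(u) = \tfrac12[u \log u + (1-u)\log(1-u)]$, which is what the large deviation principle at $p = \tfrac12$ gives after multiplying by the total number $2^{\binom{n}{2}}$ of graphs; this identification, the use of the fraction of occupied vertex pairs as edge density, and the function names are assumptions made for the example.

```python
import math

def log_binom(n_items, k):
    """log of the binomial coefficient C(n_items, k) via log-gamma."""
    return (
        math.lgamma(n_items + 1)
        - math.lgamma(k + 1)
        - math.lgamma(n_items - k + 1)
    )

def entropy_rate(u):
    """Assumed rate I(u) = (1/2)[u log u + (1-u) log(1-u)] for 0 < u < 1."""
    return 0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))

def normalised_log_count(n, density):
    """(1/n^2) log #{graphs on n vertices with round(density * C(n,2)) edges}."""
    n_pairs = n * (n - 1) // 2
    m = round(density * n_pairs)
    return log_binom(n_pairs, m) / n ** 2

n, density = 400, 0.3
lhs = normalised_log_count(n, density)
rhs = -entropy_rate(density)
print(lhs, rhs)
```

The two numbers differ by a correction of order $1/n$, coming from the ratio $\binom{n}{2}/n^2 \to \tfrac12$ and from lower-order terms in Stirling's formula.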

Variational characterisation of ensemble equivalence
In this section we present a number of preparatory results we will need in Section 4 to state our theorem on the equivalence between $P_{\mathrm{mic}}$ and $P_{\mathrm{can}}$. Our main result is Theorem 3.4 below, which gives us a variational characterisation of ensemble equivalence. In Section 3.1 we introduce our constraints on the subgraph counts. In Section 3.2 we rephrase the canonical ensemble in terms of graphons. In Section 3.3 we state and prove Theorem 3.4.

Subgraph counts
First we introduce the concept of subgraph counts, and point out how the corresponding canonical distribution is defined.Label the simple graphs in any order, e.g., The term p(F k ) counts the edge-preserving permutations of the vertices of F k , i.e., p(F 1 ) = 2 for an edge, p(F 2 ) = 2 for a wedge, p(F 3 ) = 6 for a triangle, etc.The term C k (G)/n V k represents a subgraph density in the graph G.The additional n 2 guarantees that the full vector scales like n 2 , the scaling of the large deviation principle in Theorem 2.8.For a simple graph F k we define the homomorphism density as which does not distinguish between permutations of the vertices.Hence the Hamiltonian becomes where The canonical ensemble with parameter θ thus takes the form where ψ n replaces the partition function: In the sequel we take θ equal to a specific value θ * , so as to meet the soft constraint, i.e., The canonical probability then becomes In Section 5.1 we will discuss how to find θ * .
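To make the canonical ensemble with subgraph-density constraints concrete, the following sketch enumerates all graphs on a few vertices and computes the normalised log-partition function and the canonical averages of the edge and triangle densities. It assumes the exponential form $P_{\mathrm{can}}(G \mid \vec\theta) = \exp(n^2[\vec\theta \cdot \vec T(G) - \psi_n(\vec\theta)])$ with $T_1(G) = 2E(G)/n^2$ and $T_2(G) = 6\,\#\text{triangles}(G)/n^3$ (homomorphism densities of the edge and the triangle); these normalisations and the function names are assumptions of the example.

```python
import itertools
import math

def densities(n):
    """(edge density, triangle density) of every simple graph on n vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    triples = list(itertools.combinations(range(n), 3))
    out = []
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        present = {pair for pair, b in zip(pairs, bits) if b}
        n_edges = len(present)
        n_tri = sum(
            1
            for a, b, c in triples
            if (a, b) in present and (b, c) in present and (a, c) in present
        )
        # Homomorphism densities: 2E/n^2 for the edge, 6*(#triangles)/n^3.
        out.append((2 * n_edges / n ** 2, 6 * n_tri / n ** 3))
    return out

def psi_and_means(n, theta, dens):
    """psi_n(theta) = n^-2 log sum_G exp(n^2 theta.T(G)), plus <T_1>, <T_2>."""
    weights = [
        math.exp(n ** 2 * (theta[0] * t1 + theta[1] * t2)) for t1, t2 in dens
    ]
    z = sum(weights)
    mean_t1 = sum(w * t1 for w, (t1, _) in zip(weights, dens)) / z
    mean_t2 = sum(w * t2 for w, (_, t2) in zip(weights, dens)) / z
    return math.log(z) / n ** 2, mean_t1, mean_t2

dens4 = densities(4)
psi0, mean1_0, mean2_0 = psi_and_means(4, (0.0, 0.0), dens4)
print(psi0, mean1_0, mean2_0)
```

At $\vec\theta = 0$ the ensemble is uniform, so with this normalisation the averages are $(n-1)/(2n)$ and $(n-1)(n-2)/(8n^2)$, which tend to $\tfrac12$ and $\tfrac18$ as $n \to \infty$. Tuning $\vec\theta$ to meet a soft constraint amounts to solving $\nabla \psi_n(\vec\theta) = \vec T^*$, since the gradient of $\psi_n$ equals the canonical average of $\vec T$.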
Remark 3.1. (i) The constraint $\vec{T}^*$ and the Lagrange multiplier $\vec{\theta}^*$ in general depend on $n$, i.e., $\vec{T}^* = \vec{T}^*_n$ and $\vec{\theta}^* = \vec{\theta}^*_n$ (recall Remark 2.2). We consider constraints that converge when we pass to the limit $n \to \infty$, i.e.,
\[
\lim_{n\to\infty} \vec{T}^*_n = \vec{T}^*_\infty. \tag{3.9}
\]
Consequently, we expect that
\[
\lim_{n\to\infty} \vec{\theta}^*_n = \vec{\theta}^*_\infty. \tag{3.10}
\]
Throughout the sequel we assume that (3.10) holds. If convergence fails, then we may still consider subsequential convergence. The subtleties concerning (3.10) are discussed in Appendix A.
(ii) In what follows, we suppress the dependence on $n$ and write $\vec{T}^*, \vec{\theta}^*$ instead of $\vec{T}^*_n, \vec{\theta}^*_n$, but we keep the notation $\vec{T}^*_\infty, \vec{\theta}^*_\infty$ for the limit. In addition, throughout the sequel we write $\vec{\theta}, \vec{\theta}_\infty$ instead of $\vec{\theta}^*, \vec{\theta}^*_\infty$ when we view these as parameters that do not depend on $n$. This distinction is crucial when we take the limit $n \to \infty$.

From graphs to graphons
In (2.16) we saw that if we map a finite simple graph G to its graphon h G , then for each finite simple graph F the homomorphism densities t(F, G) and for some h ∈ W , as an immediate consequence of Theorem 2.4.We further see that the expression in (3.3) can be written in terms of graphons as With this scaling the hard constraint is denoted by T * , has the interpretation of the density of an observable quantity in G, and defines a subspace of the quotient space W , which we denote by W * , and which consists of all graphons that meet the hard constraint, i.e., The soft constraint in the canonical ensemble becomes T = T * (recall (2.5)).

Variational formula for specific relative entropy
In what follows, the limit as $n \to \infty$ of the partition function $\psi_n(\vec{\theta})$ defined in (3.6) plays an important role. This limit has a variational representation that will be key to our analysis.
For any θ ∈ R m (not depending on n), with I and ψ n as defined in (2.20) and (3.6).
The key result in this section is the following variational formula for s ∞ defined in Definition 2.1.Recall that for n ∈ N we write θ * for θ * n .
where I is defined in (2.19) and Proof.From (2.10) we have where G * is any graph in G n such that T (G * ) = T * .For the microcanonical ensemble we have where . This operator can be extended to an operator (with a slight abuse of notation again denoted by T ) on the quotient space ( W , δ ) by defining T ( h) = T (h) with h ∈ h.Define the following sets From the continuity of the operator T on W , we see that W * is a compact subspace of W , and hence is also closed.From Theorem 2.6 we have that T is a Lipschitz continuous operator on the space ( W , δ ).Since W is a compact space, we have that The large deviation principle applied to (3.18) yields Consider the canonical ensemble and a graph G * n on n vertices such that T (G * n ) = T * .By Definition 2.3, Proposition 2.4, and (3.9) we may suppose that (G * n ) n∈N is left-convergent and converges to the graphon h * .Since T is continuous, we have that There is an additional subtlety in proving (3.24) in our setup because θ * depends on n.This dependence is treated in Appendix A. Combining (3.22) and (3.24), we get By definition all elements h ∈ W * satisfy T ( h) = T * ∞ .Hence the expression in the right-hand side of (3.25) can be written as which settles the claim.

Main theorem
The variational formula for the relative entropy $s_\infty$ in Theorem 3.4 allows us to identify examples where ensemble equivalence holds ($s_\infty = 0$) or is broken ($s_\infty > 0$). We already know that if the constraint is on the edge density alone, i.e., $T(G) = t(F_1,G) = T^*$, then $s_\infty = 0$ (see Garlaschelli et al. [15]). In what follows we will look at three models:
(I) The constraint is on the triangle density, i.e., $T_2(G) = t(F_3,G) = T_2^*$ with $F_3$ the triangle. This will be referred to as the Triangle Model.
(II) The constraint is on the edge density and triangle density, i.e., $\vec{T}(G) = (t(F_1,G), t(F_3,G)) = (T_1^*, T_2^*)$ with $F_1$ the edge and $F_3$ the triangle. This will be referred to as the Edge-Triangle Model.
(III) The constraint is on the $j$-star density, i.e., $T(G) = t(T[j],G) = T[j]^*$ with $T[j]$ the $j$-star graph, consisting of 1 root vertex and $j \in \mathbb{N} \setminus \{1\}$ vertices connected to the root but not connected to each other (see Fig. 2). This will be referred to as the Star Model.
For a graphon h ∈ W (recall (2.15)), the edge density and the triangle density equal ) while the j-star density equals Theorem 4.1.For the above three types of constraint: , ∈ N \ {1}, and T * 2 is such that (T * 1 , T * 2 ) lies on the scallopy curve in Fig. 3, then s ∞ > 0.
(III) For every j Here, ), but in order to keep the notation light we now also suppress the index ∞.
Figure 3. The admissible edge-triangle density region is the region on and between the blue curves (cf. Radin and Sadun [24]).
Theorem 4.1, which states our main results on ensemble equivalence and which is proven in Sections 5-6, is illustrated in Fig. 3. The region on and between the blue curves corresponds to the set of all realisable graphs: if the pair $(e,t)$ lies in this region, then there exists a graph with edge density $e$ and triangle density $t$. The red curves represent ensemble equivalence, the blue curves and the grey region represent breaking of ensemble equivalence, while in the white region between the red curve and the lower blue curve we do not know what happens. Breaking of ensemble equivalence arises from frustration between the edge and the triangle density.
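The edge, triangle and $j$-star densities of a graphon are easy to evaluate for step-function graphons, which helps to locate simple examples in the edge-triangle plane. The sketch below uses the standard integral expressions $T_1(h) = \int\!\!\int h$, $T_2(h) = \int\!\!\int\!\!\int h(x,y)h(y,z)h(z,x)$ and $T[j](h) = \int dx\, (\int dy\, h(x,y))^j$; the block representation and the function names are illustrative assumptions.

```python
def edge_density(w, b):
    """Edge density of a step graphon with block weights w and values b."""
    m = range(len(w))
    return sum(w[i] * w[j] * b[i][j] for i in m for j in m)

def triangle_density(w, b):
    """Triangle density: triple integral of h(x,y) h(y,z) h(z,x)."""
    m = range(len(w))
    return sum(
        w[i] * w[j] * w[k] * b[i][j] * b[j][k] * b[k][i]
        for i in m
        for j in m
        for k in m
    )

def star_density(w, b, j_star):
    """j-star density: integral over x of (degree function at x)^j."""
    m = range(len(w))
    return sum(
        w[i] * sum(w[k] * b[i][k] for k in m) ** j_star for i in m
    )

# Constant (Erdős-Rényi) graphon with p = 0.4: triangle density = p^3.
const_w, const_b = [1.0], [[0.4]]
er_point = (edge_density(const_w, const_b), triangle_density(const_w, const_b))

# Balanced bipartite graphon: edge density 1/2 but no triangles at all.
bip_w, bip_b = [0.5, 0.5], [[0, 1], [1, 0]]
bip_point = (edge_density(bip_w, bip_b), triangle_density(bip_w, bip_b))

print(er_point, bip_point, star_density(bip_w, bip_b, 2))
```

Constant graphons trace out the curve $t = e^3$, whereas the bipartite graphon has the same edge density as the constant graphon $\tfrac12$ but triangle density $0$ instead of $\tfrac18$, a simple instance of an edge-triangle pair that no Erdős-Rényi graphon can match.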
Each of the cases in Theorem 4.1 corresponds to typical behaviour of graphs drawn from the two ensembles:
• In cases (I)(a) and (II)(a), graphs drawn from both ensembles are asymptotically like Erdős-Rényi random graphs with parameter $p = (T_2^*)^{1/3}$.
• In cases (I)(b) and (II)(e), almost all graphs drawn from both ensembles are asymptotically like bipartite graphs.
• In cases (II)(b), (II)(c) and (II)(d), we do not know what graphs drawn from the canonical ensemble look like. Graphs drawn from the microcanonical ensemble do not look like Erdős-Rényi random graphs. The structure of graphs drawn from the microcanonical ensemble when the constraint is as in (II)(d) has been determined in Pikhurko and Razborov [26] and Radin and Sadun [24]. The vertex set of a graph drawn from the microcanonical ensemble can be partitioned into $\ell$ subsets: the first $\ell - 1$ have size $\lfloor cn \rfloor$ and the last has size between $\lfloor cn \rfloor$ and $2\lfloor cn \rfloor$, where $c$ is a known constant depending on $\ell$. The graph has the form of a complete $\ell$-partite graph on these pieces, plus some additional edges in the last piece that create no additional triangles.
• In case (III), graphs drawn from both ensembles are asymptotically like Erdős-Rényi random graphs with parameter $p = (T[j]^*)^{1/j}$.
Remark 4.2. Similar results hold for the Edge-Wedge-Triangle Model and the Edge-Star Model.
• Is $s_\infty = 0$ as soon as the constraint involves a single subgraph count only?
• What happens for subgraphs other than edges, wedges, triangles and stars? Is again $s_\infty > 0$ under appropriate frustration?

Choice of the tuning parameter
The tuning parameter is to be chosen so as to satisfy the soft constraint (3.7), a procedure that in equilibrium statistical physics is referred to as the averaging principle. Depending on the choice of constraint, finding $\vec{\theta}^*$ may not be easy, neither analytically nor numerically. In Section 5.1 we investigate how $\vec{\theta}^*$ behaves as we vary $\vec{T}^*$ for fixed $n$. We focus on the Edge-Triangle Model (a slight adjustment yields the same results for the Triangle Model). In Section 5.2 we investigate how averages under the canonical ensemble, like (3.7), behave when $n \to \infty$. Here we can treat general constraints defined in (3.4).
For the behaviour of our constrained models, the sign of the coordinates of the tuning parameter $\vec{\theta}^*$ is of pivotal importance, both for a fixed $n \in \mathbb{N}$ and asymptotically (see Bhamidi et al. [3], Chatterjee and Diaconis [9], Radin and Yin [25], and references therein). We must therefore carefully keep track of this sign. The key results in this direction are Lemmas 5.1 and 5.2 below.
Proof. Define, for $\theta_1, \theta_2 \in \mathbb{R}$, the function

Tuning parameter for fixed n
We first prove that g attains a unique global minimum at (θ 1 , θ 2 ) = (0, 0).Consider the canonical ensemble P can as defined in (3.5) and (3.8), with T as defined above, and the probability distribution P hom on G n that assigns probability 2 −( n 2 ) to every graph G ∈ G n .Since P hom is absolutely continuous with respect to P can , the relative entropy S n (P hom |P can ) is well defined: Using the form of the canonical ensemble we get, after some straightforward calculations, that, for all θ 1 , θ 2 ∈ R, where the term in the right-hand side comes from the relation Observe that the left-hand side represents the average edge and triangle density, multiplied with θ 1 , θ 2 , in an Erdős-Rényi random graph with parameters (n, 1 2 ).From (5.3) we find that g(θ 1 , θ 2 ) ≥ 2 ( n 2 ) = g(0, 0) for all θ 1 , θ 2 ∈ R, and so g attains a global minimum at (0, 0).In what follows we show that this global minimum is unique.A straightforward computation shows that 2 and T 2 = 1 8 .Furthermore, the Hessian matrix is a covariance matrix and hence is positive semi-definite.For θ = (θ 1 , θ 2 ) = (0, 0) we know that T 1 = 1 2 and T 2 = 1 8 .Hence, by uniqueness of the multiplier θ * for the constraint , we obtain that g has a unique global minimum at (0, 0).Moreover, this shows that g has no other stationary points.Consider the parameter (θ 1 , θ 2 ) = (θ * 1 , θ * 2 ).We have (5.5) Because g has a unique stationary point at (0, 0), which is a global minimum, we get θ * 2 ≥ 0. Similarly, we can show that if (5.6) Arguing in a similar way as before, we conclude that θ * 1 ≥ 0 if and only if Consider the Edge-Triangle Model and suppose that the constraint Then θ * 2 = 0 and θ * 1 matches the constraint on the edge density only.The following lemma shows that in this case the canonical ensemble behaves like the Erdős-Rényi model with parameter T * 1 , a fact that will be needed later to prove equivalence.
Lemma 5.3. Consider the Edge-Triangle Model with the constraint given by the edge-triangle densities
Consider the canonical ensemble as defined in (3.8). Then, for every $n \in \mathbb{N}$,
Proof. From the definition of the canonical ensemble we have that, for $G \in \mathcal{G}_n$,
where $\psi_n(\vec{\theta}^*)$ is the partition function defined in (3.6). For the specific value $\vec{\theta} = \vec{\theta}^*$ we have that (recall (3.7)) (5.9)
We claim that the correct parameter is $\vec{\theta}^* = \big(\tfrac12 \log \frac{T_1^*}{1 - T_1^*},\, 0\big)$. The average fraction of edges is $T_1^*$ (see Park and Newman [22]). The average number of triangles is
where the last equation comes from the fact that we are calculating the average number of triangles in an Erdős-Rényi model with probability $T_1^*$. Since the multiplier $\vec{\theta}^*$ is unique, the proof is complete.
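The factor $\tfrac12$ in the multiplier can be traced in a short computation. The derivation below is a sketch under two assumptions that are not spelled out in the surviving text: the canonical ensemble has the form $P_{\mathrm{can}}(G \mid \vec\theta) \propto e^{n^2 \vec\theta \cdot \vec T(G)}$, and the edge density is the homomorphism density $T_1(G) = 2E(G)/n^2$, with $E(G)$ the number of edges and $N = \binom{n}{2}$ the number of vertex pairs.

```latex
% Assumptions: P_can(G | theta) \propto exp(n^2 theta . T(G)),
%              T_1(G) = 2 E(G) / n^2,  theta = (theta_1, 0),  N = binom(n,2).
\begin{align*}
P_{\mathrm{can}}(G \mid \theta_1, 0)
  &= \frac{e^{n^2 \theta_1 T_1(G)}}{\sum_{G' \in \mathcal{G}_n} e^{n^2 \theta_1 T_1(G')}}
   = \frac{e^{2\theta_1 E(G)}}{\sum_{k=0}^{N} \binom{N}{k} e^{2\theta_1 k}}
   = \frac{e^{2\theta_1 E(G)}}{(1 + e^{2\theta_1})^{N}} \\
  &= p^{E(G)} (1-p)^{N - E(G)},
  \qquad p := \frac{e^{2\theta_1}}{1 + e^{2\theta_1}}
  \;\Longleftrightarrow\;
  \theta_1 = \tfrac12 \log \frac{p}{1-p}.
\end{align*}
```

Under these assumptions, a vanishing triangle multiplier makes the edges independent, so the canonical ensemble is exactly an Erdős-Rényi random graph whose edge probability $p$ is fixed by $\theta_1$.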

Tuning parameter for n → ∞
In Lemma 5.4 below we show how averages under the canonical ensemble behave asymptotically when θ does not depend on n.In Lemma A.2 we will look at what happens when θ is a onedimensional multiplier and depends on n.Lemma 5.4.Suppose that the operator T : W → R m is bounded and continuous with respect to the δ -norm as defined in (2.13).For θ ∈ R m independent of n, consider the variational problem sup where I is defined in (2.19).Suppose that the supremum is attained at a unique point, denoted by h * ( θ).Then Proof.The average of T k under the canonical probability distribution is equal to Pick δ > 0 and consider the δ-ball B δ ( h * ) around the maximiser h * in the quotient space We denote by G δ a graph on n vertices whose graphon is a representative element of the class hG .With a slight abuse of notation, we denote by G δ both the graph and the corresponding graphon, and by hG the corresponding equivalence class in the quotient space ( W , δ ).Since ( W , δ ) is compact space (recall Proposition 2.5), and the graphons associated with finite graphs form a countable family that is dense in ( W , δ ) (see Diao et al. [12], Lovász and Szegedy [20]), there exists a sequence ( hGn ) n∈N such that lim n→∞ δ ( hGn , h * ) = 0.For n large enough the neighbourhood B δ ( h * ) contains elements of the sequence ( hGn ) n∈N and, due to the Lipschitz property (recall Proposition 2.6), δ ( hGn Upper bound for J n .We decompose the sum over G ∈ G n into two parts: the first over G whose graphon lies in B δ ( h * ), the second over G whose graphon lies in B δ ( h * ) c =: W δ,# .We further denote by the set of all graphs whose subgraph densities T k (G) are δ-close to T k ( h * ).A graph from this set is denoted by G δ .We define the set and, for k = 1, . . ., m, obtain the following upper bound: . 
(5.16) Next, we further bound the second term in (5.16).By definition, for every n ∈ N the range of the operator T is a finite set For the set R n we observe that |R n | = o(n m 2 ).In addition, introduce the sets ( The operator T is bounded, and so there exists an M > 0 such that T (G) ≤ M for all G ∈ G n .Hence, the second term in (5.16) can be bounded from above by . ( By the large deviation principle in Theorem 2.8, we have where W g = { h ∈ W : T ( h) = g}.As a consequence, (5.19) is majorised by (5.21) The last equation can be justified as follows.Define the sets Since the graphons associated with finite graphs form a countable set that is dense in ( W , δ ), we have that where cl denotes closure.Using (5.23), and recalling that T is continuous and I is lowersemicontinuous, we get and a similar result can be established for the second supremum in the exponent in (5.21).The exponent in (5.21) is negative for all δ > 0 and is independent of n.Moreover, by the left-continuity of the graph sequence (G δ n ) n∈N , we have that lim n→∞ T k (G δ n ) = T k ( h * ) for every k = 1, . . ., m and every δ > 0. Combined with the inequality in (5.16), we obtain, for k = 1, . . ., m, lim (5.25) Lower bound for J n .We distinguish two cases: T k ( h * ) = 0 and T k ( h * ) > 0. For the first case we trivially get the lower bound (5.26) For the second case we show the equivalent upper bound for the inverse, i.e., Using the fact that T k ( h * ) = 0 is bounded, and using a similar reasoning as for the upper bound on J n , the latter is easily verified.Remark 5.6.The analogue of Lemma 5.4 when the supremum in (5.10) has multiple maximisers in W is considerably more involved.
As observed in Remark 2.2, in general the tuning parameter $\vec{\theta}^*$ depends on $n$. We discuss this dependence in Appendix A.
As to the second term in the right-hand side of ( 6 for the collection of all graphs with triangle density equal to zero.From (2.6) we obtain that Hence P can (G) = P mic (G) when the constraint is given by T * = 0, which yields and hence s ∞ = 0.For the case we have shown in Lemma 5.3 that the canonical ensemble essentially behaves like an Erdős-Rényi model with parameter p = T * 1 .Furthermore, the microcanonical ensemble also has an explicit expression, which is found by using the following lemma.
Using the convexity of I on W and Jensen's inequality, we get Hence I( h) ≥ I(T * 1 2 ) for every h ∈ W * , which proves the claim.
Proof of (II)(a).Consider the relative entropy s ∞ as defined in (2.9) and (2.10).Using Lemma 5.3, we obtain the expression From Lemma 6.1 we have that inf An argument similar as above yields where for θ * 1 ≥ 0 and θ * 2 ≥ 0 the last supremum has a unique solution (see Radin and Yin [25, Proposition 3.2]), while for θ * 1 < 0 and θ * 2 ≥ 0 it either has a unique solution or two solutions.We treat these two cases separately.
Unique solution.Because of the uniqueness of the solution, not all realisable hard constraints can be met in the limit (see Lemma 5.4).We observe that, if T * 2 ≥ 1 8 and T * 2 = T * 3 1 , in the limit as n → ∞ the canonical ensemble becomes Erdős-Rényi with parameter p.This regime is known as the high-temperature regime (see Bhamidi et al. [3] and Chatterjee and Diaconis [9]).In what follows we determine the parameter p of the canonical ensemble in the limit.From Bhamidi et al. [3,Theorem 7] we have that p = u * ( θ * ) 1 3 with u * ( θ * ) 1 3 the unique maximiser of (6.10).The expression in (6.10) thus takes the form (6.11) Consider the second term in the right-hand side of (3.16).From the definition of W * it is straightforward to see that where Two solutions.The regime in which the right-hand side of (6.10) has two solutions is known as the low-temperature regime.In this case the hard constraints , lie on a curve on the (T 1 , T 2 )-plane in such a way such that the tuning parameters (θ * 1 , θ * 2 ) lie on the phase transition curve found in Chatterjee and Diaconis [9] and Radin and Yin [25].Denote the two solutions of (6.10) by u * 1 , u * 2 .Because of the constraint we are considering, we have that neither of them lies in W * .From the compactness of the latter space we see that s ∞ > 0.

Proof of (II)(c) (T
For the case $0 < T_1^* \le \tfrac12$, $T_2^* < \tfrac18$ we know from Lemma 5.2 that $\theta_1^* \le 0$ and $\theta_2^* < 0$ for every $n$. Hence, because of (3.10), we have that $\theta_1^* \le 0$ and $\theta_2^* < 0$. This regime is significantly harder to analyse than the previous regimes. Consider the relative entropy $s_\infty$ and the variational representation given in (3.16). We consider two cases: In this case we have the straightforward inequality Using the convexity of $I$ on $W$ and Jensen's inequality, we obtain that $I(h) \ge I(T_1^*)$ for all $h \in W^*$. Hence which settles (6.14). Hence $s_\infty > 0$.
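The Jensen step used above can be spelled out as follows. This is a sketch that assumes the usual conventions $I(h) = \int_{[0,1]^2} dx\,dy\, I(h(x,y))$ and $T_1(h) = \int_{[0,1]^2} dx\,dy\, h(x,y)$ for the edge density, neither of which is restated in this section.

```latex
I(h) \;=\; \int_{[0,1]^2} dx\,dy\; I\big(h(x,y)\big)
\;\ge\; I\Big(\int_{[0,1]^2} dx\,dy\; h(x,y)\Big)
\;=\; I\big(T_1(h)\big) \;=\; I(T_1^*),
\qquad h \in W^*.
```

The inequality uses only the convexity of $u \mapsto I(u)$ on $[0,1]$ and the edge constraint $T_1(h) = T_1^*$; the triangle constraint plays no role in this step.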
6.6 Proof of (II)(d) ($(T_1^*, T_2^*)$ on the scallopy curve)
We show that if $(T_1^*, T_2^*)$ lies on the lower blue curve in Fig. 3 (referred to as the scallopy curve), then $s_\infty > 0$. The case where $T_2^* \ge \tfrac18$ can be dealt with directly via Theorem (II)(b). The proof below deals with the case $T_2^* < \tfrac18$.
Proof. We give the proof for $\ell = 2$, the extension to $\ell > 2$ being similar. Suppose that $T_1^* = \tfrac12 + \epsilon$ with $\epsilon \in (0, \tfrac16)$, and that $T_2^*$ is chosen as small as possible. It is known that graphs with a relatively high edge density and with a triangle density that is as small as possible have a $d$-partite structure with edges added in a suitable way so that the desired triangle density is obtained (see Radin and Sadun [24] and Pikhurko and Razborov [26]). Consider a graph on $n$ vertices, denoted by $G$, with edge density $T_1 \in (\tfrac12, \tfrac23)$ and triangle density as small as possible. The structure of such graphs has been described before Section 5. The graphon counterpart of such graphs is the optimiser of the second supremum in the right-hand side of the variational formula for $s_\infty$. Using Radin and Sadun [24, Theorem 4.2], we obtain ). This is done by slightly modifying the proof of Chatterjee and Diaconis [9, Theorem 6.4]. Indeed, observe that
$$M(x) = \int_{[0,1]} dy\, h(x,y). \qquad (6.28)$$
Since $I$ is convex we have
$$I(h) \ge \int_{[0,1]} dx\, I(M(x)), \qquad h \in W, \qquad (6.29)$$
with equality if and only if $h(x,y)$ is the same for almost all $y$. Since $h$ is a symmetric function, equality holds if and only if $h$ is constant. For the constant function $h \equiv (T_j)^{1/j} \in W^* := \{h \in W : T_j(h) = T_j\}$, (6.29) is an equality. Hence, for any minimiser of $I$ on $W^*$ the inequality must be an equality, and thus any minimiser must be constant. This shows that $s_\infty = 0$.
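The convexity step behind (6.29) can be checked numerically on a step-function graphon. The following is a minimal sketch under assumptions not stated in this section: the graphon is discretised as a symmetric $n \times n$ matrix with entries in $[0,1]$, $I(h)$ is the average of $I(u) = \tfrac12 u \log u + \tfrac12 (1-u)\log(1-u)$ over all entries, and $M(x)$ is the row average. All function names are hypothetical.

```python
import math
import random


def entropy_term(u: float) -> float:
    """I(u) = (1/2) u log u + (1/2) (1 - u) log(1 - u), with 0 log 0 := 0."""
    def xlogx(x: float) -> float:
        return 0.0 if x <= 0.0 else x * math.log(x)

    return 0.5 * (xlogx(u) + xlogx(1.0 - u))


def random_symmetric_graphon(n: int, seed: int = 0) -> list:
    """Symmetric n x n matrix with uniform entries, read as a step graphon."""
    rng = random.Random(seed)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            h[i][j] = h[j][i] = rng.random()
    return h


def graphon_entropy(h: list) -> float:
    """Discretised I(h): average of I over all matrix entries."""
    n = len(h)
    return sum(entropy_term(v) for row in h for v in row) / n ** 2


def marginals(h: list) -> list:
    """Discretised M(x): the average of each row."""
    return [sum(row) / len(row) for row in h]


def marginal_entropy(h: list) -> float:
    """Discretised integral of I(M(x)) over x."""
    m = marginals(h)
    return sum(entropy_term(v) for v in m) / len(m)


if __name__ == "__main__":
    h = random_symmetric_graphon(50, seed=1)
    # Jensen, row by row: the first value is at least the second.
    print(graphon_entropy(h), marginal_entropy(h))
    constant = [[0.3] * 5 for _ in range(5)]
    # For a constant graphon the two values coincide.
    print(graphon_entropy(constant), marginal_entropy(constant))
```

In this discretised setting the inequality is Jensen's inequality applied to each row, and the gap closes for a constant matrix, in line with the equality case described in the proof.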

A Appendix
In this appendix we elaborate on the assumption made in (3.10), i.e., that the multiplier $\theta_n^*$ converges to a limit $\theta_\infty^*$ as $n \to \infty$. In order to get a meaningful limit, we consider constraints $T_n^*$ such that $\lim_{n\to\infty}$ It is straightforward to deduce from Corollary 2.9 and (3.3)-(3.7) that if $\{T_n^*\}$ is bounded away from 0 and 1 component-wise, then $(\theta_n^*)_{n \in \mathbb{N}}$ is bounded away from $-\infty$ and $+\infty$ component-wise. Such a sequence contains a converging subsequence, say $(\theta^*_{n_k})_{k \in \mathbb{N}}$, which in general need not be unique. Thus, as long as the constraint is component-wise bounded away from 0 and 1, the asymptotic expressions derived in this paper exist, but their values may depend on the subsequence we choose. The value of $s_\infty$ depends on the chosen subsequence, but whether it is positive or zero (i.e., whether there is equivalence) does not. A deeper investigation of the behaviour of $\{\theta_n^*\}_{n \in \mathbb{N}}$ is interesting, but is beyond the scope of this paper. We first extend Theorem 3.4 to the case in which the tuning parameter $\theta^*$ depends on $n$.

Figure 1: An example of a graph G and its graphon representation h G .

Remark 3.5.
Theorem 3.4 and the compactness of $W^*$ give us a variational characterisation of ensemble equivalence: $s_\infty = 0$ if and only if at least one of the maximisers of $\theta_\infty^* \cdot T(h) - I(h)$ in $W$ also lies in $W^* \subset W$. Equivalently, $s_\infty = 0$ if and only if at least one of the maximisers of $\theta_\infty^* \cdot T(h) - I(h)$ satisfies the hard constraint.

Remark 5.5.
The convergence in (5.11) is not necessarily uniform in $\theta$. Our results in Theorem 4.1 (II)(b)-(II)(d) indicate that breaking of ensemble equivalence manifests itself through non-uniform convergence in (5.11). In Lemma A.2 we show that uniform convergence holds when the constraint is on the triangle density only, which explains our result in Theorem 4.1 (I).