Transitions in random graphs of fixed degrees with many short cycles

We analyze maximum entropy random graph ensembles with constrained degrees, drawn from arbitrary degree distributions, and a tuneable number of three-cycles (triangles). We find that such ensembles generally exhibit two transitions, a clustering and a shattering transition, separating three distinct regimes. At the clustering transition, the graphs change from typically having only isolated cycles to forming cycle clusters. At the shattering transition the graphs break up into many small cliques to achieve the desired three-cycle density. The locations of both transitions depend nontrivially on the system size. We derive a general formula for the three-cycle density in the regime of isolated cycles, for graphs with degree distributions that have finite first and second moments. For bounded degree distributions we present further analytical results on cycle densities and phase transition locations, which, while non-rigorous, are all validated via MCMC sampling simulations. We show that the shattering transition is of an entropic nature, occurring for all three-cycle density values, provided the system is large enough.


Introduction
Graph theory was introduced by Euler to solve the problem of the seven bridges of Königsberg [1]. He noted that upon stripping all unnecessary details, to solve this problem, one was left only with a set of 4 nodes and 7 links between them. Since then, networks and graphs have proven to be fundamental in the modelling of many real world phenomena. While with the advent of powerful computers accessible to almost all researchers it is now typical for network scientists to work on a daily basis with networks of nodes ranging from thousands to millions, still the modelling strategy is the same: remove unnecessary details and reduce the problem to nodes and links.
For scientists, and especially those with a statistical training-used to thinking in terms of null models in hypothesis testing [2]-it is natural to ask a very simple question regarding observed networks: which are typical and which are atypical topological features? To answer this question one commonly works with random graph ensembles, designed to mimic real-world networks; see e.g. [3][4][5][6]. In addition to studying properties of graphs, one usually also seeks to understand processes for which these graphs define the interaction infrastructure, and the relation between graph topology and process efficacy. Here one would benefit from exact analytical solutions for processes defined on nontrivial graph ensembles. Unfortunately, this is hard. While there has been an explosion of exact solutions for processes on random graphs, the vast majority of these are locally tree-like graphs. The property of being locally tree-like allows one to write recursive equations, that become exact for large graphs and show very good agreement with simulations on finite ones. Ironically, this property that makes the models solvable is the same property that makes them unrealistic.
In addition to the lack of solvable models on graphs with many short cycles, there is also the fact that there is no easily controllable random graph ensemble that generates graphs with given numbers of links and short cycles in the sparse regime. The natural extension of the Erdös-Rényi ensemble [7] was presented by Strauss in [8], which consisted in simply including an additional bias for the number of triangles in the model. Surprisingly, a very sharp transition was observed in the ensemble. When tuning the control parameters, the ensemble very quickly switched from typically sampling sparse graphs to assigning high probability to very dense graphs as well, losing any resemblance to real networks. There is a long history of attempts at understanding this transition [9][10][11][12][13][14], and many alternative random graph ensembles and algorithms have since then been presented [15][16][17][18][19][20][21][22] that possess a high number of short cycles, yet none generates easily controllable graphs. It appears very hard to access a regime where there is high number of triangles while keeping a 'nice' topology. More recent models conserve the degree sequence to avoid the condensation observed by Strauss, but still show a transition into a clustered regime [21][22][23][24][25][26][27]. The logical way of stopping the appearance of a clustered regime is to restrict the number of triangles in each node via a hard constraint, as proposed in [17,18]. While there have been numerical and theoretical advances with this model [28][29][30][31][32][33], it remains difficult to keep the target degree distribution and the target total number of triangles under control [34].
In this paper we study a random graph ensemble with a tuneable number of short cycles, achieved with a soft global constraint on the number of triangles in combination with a hard constraint on the degrees, each drawn from a fixed degree distribution. This guarantees that our graphs will both be sparse and with a large number of short cycles, which are desired properties to mimic real networks. This model was previously studied in [19]. However, in that previous study the chosen MCMC move acceptance probabilities did not ensure convergence to the target distribution. This was pointed out in [5,35], where it was shown that in edge swap graph dynamics nontrivial acceptance probabilities are needed (see also appendix A). We show in this paper that with the correct MCMC sampling the model again displays a transition into a clustered phase, and that the overall phenomenology presented in [19] coincides with our results. We then proceed to develop an extended theoretical and quantitative understanding of the behaviour of the model, including an analytic characterization of the low triangle density phase, expressions for the locations of the two (clustering and shattering) transitions, and scalar measures to probe the interactions between cycles and their relevance for the phases of the ensemble. Our results and predictions are supported via nontrivial graph sampling simulations involving different degree distributions, using the exact move acceptance probabilities of [5,35].

The model
We study a random graph ensemble defined on the set of N-node graphs with a given degree distribution. A graph is an ordered pair (V, E) of nodes and edges, respectively. We model graphs through their adjacency matrices A, defined by the entries A ij = 1 if (i, j) ∈ E, and 0 otherwise. We will only be concerned with simple undirected graphs, which in terms of the adjacency matrix implies that A ij = A ji and A ii = 0 for all (i, j). The degree of a node is the number of edges connected to it, k i (A) = j A ij . Throughout this paper we will work with graphs that have a prescribed degree sequence {k i } i=1,...,N . Each element of this sequence is drawn randomly and independently from a given distribution p(k).
We expect that in the large N limit the empirical distribution of degrees will converge to the target distribution, In general this property might not be true if the tails of the distribution are fat enough, but since we will focus our discussion on degree distributions with finite first and second moments this will not be a problem, as discussed in [36].
Our main interest will be to tune the number of triangles in the graphs. It will be more convenient to work with three-cycles than with triangles, for reasons that will become evident in further sections. The number of three-cycles in a graph corresponds to 6 times the number of triangles, as the former also account for starting point and directionality. There is a three-cycle around node i if there exist j and k such that (i, j) ∈ E, (j, k) ∈ E, and (k, i) ∈ E. Since our graph is simple, the indices i, j, k are all different. In the language of the adjacency matrix, the indicator function for a given three-cycle takes the simple form The total number of three-cycles is then simply the trace of the third power of the adjacency matrix, We now define an ensemble of random graphs such that the average number of triangles can be controlled, using a parametrized distribution over graphs denoted by p(A). Our choice is a maximum entropy (ME) ensemble. That is, we take p(A) to be such that the average number of three-cycles is fixed, and that the degree sequence k = {k i } i=1,...,N is achieved exactly. Among those distributions p(A) that share these two properties, we choose the one that maximizes the Shannon entropy S[p] = − A p(A)log p(A). This will guarantee that the distribution is statistically unbiased [37,38]. The ME distribution is of an exponential form, with one tuneable parameter α, The product over Kronecker deltas enforces the degree sequence of the graph. For α = 0 the ensemble reduces to the configuration model (CM) [39], a uniform distribution over all graphs with degree sequence k.
Our main observable of the ensemble will be the number of three-cycles per node. We will refer to it as the three-cycle density, Where . This quantity reflects the typical number of cycles in the neighborhood of a node. Each node can have a different maximum number of triangles, depending on its degree. Once a random graph ensemble like (4) is defined it is desirable to have both an algorithm to generate graph samples numerically and an analytic theory of its statistical properties. In order to generate samples form an ensemble such as (4) we use a Markov chain Monte Carlo (MCMC) approach. The algorithm starts with a seed graph satisfying the degree sequence, and evolves it by performing degree preserving edge swaps as shown in figure A1. Edge swaps are either accepted or rejected, with a nontrivial acceptance probability that not only takes into account the specific ensemble (4) but also the availability of possible edge swaps as the graph evolves. The theory of this MCMC algorithm was developed in [35] and presented more extensively in [5]. It is also summarized briefly in appendix A.
To find an analytic expression for the three-cycle density we need to calculate the generating function φ(α): Ideally, knowledge of the functions φ(α) and m(α) would allow us to generate random graphs with any desired three-cycle density. Although it is not possible to calculate φ(α) analytically, in section 3 we will show that a small α approximation will give very good results for a wide range of values. Additionally we will give a description of the general behaviour of this ensemble for the whole range of α values. Another important observable of the ensemble reports on the amount of interaction between the triangles in the graph, i.e. the number of edges and nodes that different triangles share. This varies in a nontrivial way with different values of α and different system sizes N. To measure the degree of interaction between three-cycles, we define where Θ(x) = 1 if x > 0 and zero otherwise. This ratio of nodes in three-cycles to three-cycles is independent of the total number of three-cycles in the graph. If r(A) = 1/2, the three-cycles are all non-interacting in the sense that they do not share any nodes. If r(A) < 1/2, then three-cycles are sharing nodes. Motivated by this intuition, we define r(A) = 1/2 in the case of Tr(A 3 ) = 0, since no three-cycles implies there are no threecycles interacting. Some simple examples are shown on the top row of figure 5. In the particular case where graphs form cliques of q + 1 nodes, we would have r(A) = 1/(q 2 − q); this is a natural lower bound for graphs of maximum degree q.

Main results
We will now outline the main results of our analysis of the ensemble (4). We found the same initial behaviour for all degree distributions as the triangle-inducing control parameter α is increased form α = 0. This behaviour depends only on the first two moments of the degree distribution, c = k and k 2 (where f (k) = N −1 N i=1 f (k i )), and on the maximum degree q = max i=1,...,N {k i } (for bounded degree distribution). We will only consider the case where N is sufficiently large, Np(q) q + 1, so that all degrees are typically represented in the graph with an extensive number of nodes.
In figure 1 we show the results of numerical sampling of graphs from (4), using an appropriate MCMC process (simulation details are given in sections 3 and 4). The triangle density m(α) increases with α, as expected. We observe distinct regimes of α-values, as had already been observed for regular graphs in [27]. Interestingly, to understand properly the nature of the different regimes of the ensemble it is necessary to also look at two other graph observables: the level of interaction between cycles, measured with r(A) as defined in (8), and the number n(A) of connected components of the graph. We define their respective ensemble averages as r(α) = r(A) and n(α) = n(A) . The observed regimes (see figure 1) are the following: The three-cycle density m(α) grows exponentially with α, following This formula is one of the main results of this letter, it is derived in 3. It corresponds to the straight lines observed before α 1 in figure 1. Only the proportionality constant and the transition point α 1 (N) depend on the degree distribution. This formula allows for an explicit calculation of α, given a desired three-cycle density, simply by inversion. For α = 0 it reproduces the rigorous result for the three-cycle density for large graphs in [39]. The degree of interaction between cycles is as low as r(A) ≈ 1/2 for large graphs. The number n(α) of components of the graph is the same as in the α = 0 case. It is relatively easy to obtain samples in this regime with the MCMC edge swap dynamics.
Here the triangle density m(α) grows faster than (9). Depending on the chosen degree distribution, this growth may exhibit sudden jumps or may be more smooth. The main difference with the previous regime is that cycles start sharing edges. This follows from the observed drop of r(α), clearly observed in all cases in figure 1. Nodes start to form clusters of similar degree. We call this the clustered regime, and α 1 (N) the clustering transition point.  There is a drastic topological change associated with a second transition at α 2 (N): the graph breaks down into small disconnected cliques. Cliques of k + 1 nodes maximize the number of cycles around a node of degree k, see figures 2 and 3. Cliques associated with the maximum degree, with q + 1 nodes, will appear first, followed by those of the second largest degree, and so on. If, due to finite size effects, there are insufficient nodes to generate cliques, the graphs break down into small incomplete cliques. We call the transition at α 2 (N) the shattering transition, and this phase α > α 2 (N) the disconnected or shattered phase. The rest of the nodes, those unable due to degree constraints to form cliques, will continue to be connected and follow qualitatively similar regimes, but now for a new degree distribution that excludes the separated nodes. In the previous list we have presented a very general picture regarding the phenomenology of ensemble (4). In particular panel (b) in figure 1 follows closely this qualitative picture. We expect the behaviour in the connected phase, α < α 1 (N), to be general. For the behaviour between α 1 (N) and α 2 (N) we observe a dependence on the the degree distribution, but still a strong similarity in the sharp drop in the value of r(α). The details for the behaviour for α > α 2 (N) have a much stronger dependence on the degree sequence and on the system size, an extensive exploration of this regime is beyond the scope of this paper since the MCMC simulations would require an enormous amount of computation time and power. Before developing the theory in sections 3 and 4, we would like to present the following list of subtleties necessary to understand the further theoretical arguments.
The transitions at α = α 1,2 (N) are not phase transitions in the conventional sense-they depend on N, which is taken to be large but still finite-so they are not marked by non-analyticities in the thermodynamic limit. The definitions given for α 1 (N) and α 2 (N) are instead of a descriptive nature, marking the α-values where r(A) drops for the first time and where n(A) increases for the first time, respectively. As will become apparent in the next section, the system size N affects severely the ensemble. Additionally, hysteresis has been observed in [19] for this ensemble. This implies that certainly the observed values of α 1,2 (N) would be shifted to the left from what is observed in figure 1 if the MCMC simulations were to be performed starting with seed graphs with high three-cycle density. In this letter we will only focus on writing an approximate theory of what happens when going left to right, that is starting at α = 0 and increasing its value gradually. The analytic predictions obtained will be validated by the MCMC sampling performed with this prescription.
We make a distinction between bounded and unbounded degree distributions p(k), since boundedness affects the way in which the ensemble behaves with increasing N, e.g. in the asymptotics of α 1 (N) and α 2 (N). For large graphs with bounded degree distributions both transitions are close, α 1 (N) ≈ α 2 (N). There is a sudden appearance of disconnected cliques of q + 1, nodes giving rise to a sharp jump in m(α). Strong numerical evidence and mathematical arguments support the proposition that α 1 (N) and α 2 (N) both scale as O log N . For graphs with unbounded p(k), the maximum degree present in the graph will diverge slowly with N. Hence there are not many nodes of large degree to create cliques, and the structures created when the graphs shatter are less clear. The asymptotics of α 1 (N) and α 2 (N) should depend heavily on the tail of p(k), as this tail governs the growth of the maximum degree with N. Nevertheless, in both cases the ensemble will end in a set of disconnected cliques, as this is the graph that maximizes the number of cycles around each node. The difference between bounded and unbounded p(k) can be seen clearly when comparing figures 3 and 2. For the graph with a bimodal degree distribution in figure 3 the cliques appear immediately as the graph clusters, while for the one with an exponential distribution in figure 2 one can see clusters appearing before the breaking down of the graph.
Expressions like (6) are hard to evaluate analytically, especially for a finite N. The typical approach of statistical mechanics would be to derive exact results in the limit N → ∞, and then to show they are a good approximation for finite N. In contrast, here it is important not to take the limit N → ∞, but rather to work with asymptotically vanishing expressions for the three-cycle density, m(α) = O N −δ . A clear example is that of the connected non interacting regime; here equation (9) shows correctly that lim N→∞ m(α) = 0, but it is the way in which m(α) approaches 0 that gives us formula (9), which is seen to be very accurate. One would normally rescale α with N to avoid this effect, but it will become clear that in that case m(α 1 (N)) → 0 for any proper scaling of α with N, meaning that the description of the first regime would vanish, which is not something we want.
Regarding the sampling, we note that convergence from a given seed graph towards equilibration requires increasing numbers of edge swaps as α is increased. Only for values in the connected regime α ∈ [0, α 1 (N)) will equilibration be fast enough to sample graphs in a reasonable amount of time on a personal computer. Close to the transitions there is a significant divergence of relaxation times. We conjecture that the main reason for this change is precisely the clustering of triangles: in order to break a clique one has to destroy many triangles, an event that becomes extremely unlikely during the dynamics for large graphs. Therefore we expect there to be an effective breaking of ergodicity when sampling with MCMC for α α 2 (N) and large N. Therefore, even though we performed extensive numeric simulations, we have no guarantee that the MCMC was properly equilibrated close to the transitions. Due to hysteresis, we would require repetition of the MCMC sampling starting from different seeds with very different initial three-cycle densities. Rather than pursuing this path, we will focus only on painting a general picture of what happens when increasing α starting from α = 0.
For the above reasons, from the point of view of applied network science, working with ensemble (4) has to be done carefully. Given a seed network, it is possible to randomize via edge swaps while retaining the value of the three-cycle density, but there will be two problems. First, it could be that it takes a long time to sample correctly. Second, it could be that samples generated with the same three-cycle density have completely different topologies, according to their values of r(A). The first problem is a matter of computing power and speed. The second problem is more tricky, and essentially unsolvable without modifying (4). If the graph one wants to randomize has a value of r(A) that deviates significantly from r(A) , then all samples will be typically very different in structure, even though they share the same three-cycle density.

The connected regime
We will now present an effective approximation for the generating function (6). It is analogous to the one presented in [27], but generalized for an arbitrary degree distribution p(k) with finite first and second moments. We use a small α (or large N) approximation to derive (9), using a known result about the distribution of triangles in the CM [39]. It is found to give very good results, suggesting it could be exact asymptotically, at least for bounded degree distributions. If we denote by T(A) the number of triangles in A, we have (due to overcounting): We can therefore calculate the generating function (6) as follows: Where we have introduced, Note that we have introduced P N (T) by multiplying and dividing by the total number of graphs with a given degree sequence k, denoted by N k . Our approximation now consists in replacing P N (T) by the known asymptotic distribution of isolated triangles, that is triangles that do not share edges or nodes. The latter was computed rigorously in [39]: This then leads us to the the following approximation for (11) and m(α): This formula has a simple interpretation. At α = 0 it correctly predicts the expected number of triangles in a CM, where one pictures these triangles to be very far away from each other. When α > 0 this number of triangles is multiplied by e 6α , giving another finite but larger amount of triangles when N → ∞. In this scenario we would view these triangles to be simply further and further apart as the system size grows. This picture will be revisited in the next section.  The solid lines correspond to the corresponding theoretical prediction (18). Markers are joined with dotted lines and the system size of certain curves is indicated for better readability.
We have tested the above approximation extensively with numerical simulations. We generated samples from (4) for many different degree distributions, shown in table 1. The results are shown in figure 4, where we have plotted the results for systems of multiple sizes N ∼ 100-4000. In order to have a better visualization, we plotted the cycle densities against a rescaled parameterα, defined via α =α + 1 6 log N, For the MCMC we used a different initial graph for each size and each degree distribution, p(k). To generate each initial graph we used a sample from the CM for a given degree sequence sampled from the corresponding p(k). In this regime, α ∈ [0, α 1 (N)], we used waiting times of 2 × 10 4 attempted edge swaps per link (AESPL), and subsequently recorded 20 samples spaced by 2 × 10 3 AESPL. To show the accuracy of the theory with a modest number of samples, we plot the average of the three-cycle density over the full time series of cycle densities between samples. We do this to reduce noise, and because our theory refers to the average (7), not to graph instances, since there is no self averaging at finite sizes. For graphs larger than 500 nodes, error bars are of the order of magnitude of the markers. For smaller graphs the error bars can be appreciated on the right panel of figure 6. In the remaining three-cycle density plots the error bars were omitted, in order to avoid cluttering of figures. Note that the scaling in (18) collapses all curves of the same degree distribution, up to a certain valueα 1 (N). As we will show in the next section, the three-cycle density at the transition vanishes as N → ∞, m(α 1 (N)) → 0. This can be clearly seen in figure 4.
The accuracy of (9) suggests that it could be the exact asymptotic result when N → ∞. This would imply that a bias of the form (4) with α = O (1) only modifies the number of expected triangles in large graphs by an O (1) amount, implying that the three-cycle density will still vanish asymptotically. To achieve a nonvanishing three-cycle density in the asymptotic limit, a different scaling of α should be introduced, as was done in [22] for two-regular graphs, i.e. for p(k) = δ k,2 . However, as will be discussed in the next section, for general degree distributions the effect of scaling α with N is much more complicated than in the two-regular case.

General results
We next investigate the behaviour of the ensemble beyond the clustering transitions, i.e. for α > α 1 (N), where (9) no longer reproduces the correct three-cycle density. For the two-regular case, the only three-cycle that can exist inside a graph is an isolated cycle, therefore it is possible for (18) to be exact asymptotically. For other degree distributions, many other cycle structures can appear in a graph. As we will show, it seems that structures with strongly interacting triangles dominate entropically. Therefore the statistics of different local structures needs to be taken into account, making (9) insufficient to describe the ensemble for all values of α.
In the regime α < α 1 (N), the desired three-cycle density is achieved by creating further triangles that are independent and far from each other, without sharing nodes. For α > α 1 (N), in contrast, the desired threecycle density is achieved by creating triangles that share as many edges as possible. This qualitative change appears to be purely entropic, since the latter regime appears for all cycle densities as long as the system is large enough, that is even for very small values of m. Put differently, the transition at α 1 (N) does not happen because there are too many triangles which need to share nodes due to of lack of space in the graph, as one might guess initially. The transition happens because for a given three-cycle density the number of graphs one can create by 'putting triangles aside' in small clusters is larger than the number of graphs one can create by embedding them in the graph in a non-interacting way. While we cannot prove this assertion rigorously, extensive numerical experiments support this claim.
We measured the interaction between cycles in samples of (4) using the observable r(A) defined in (8). The empirical value r(α) = r(A) was measured in all the numerical experiments listed in table 1. For values α > α 1 (N) we increased the number of AESPL by a factor ten, giving waiting times of 2 × 10 5 AESPL and inter-sample intervals of 2 × 10 4 AESPL. For each degree distribution and each size 20 samples were taken. In all experiments we observed the same behaviour as shown for the two cases in figure 5. An initial phase of r(A) ≈ 1/2, indicating non-interacting cycles, is followed by a sudden drop to r(A) = r min (N) < 1, indicating interacting cycles. At the value of α marking this sudden drop, which we defined to be α 1 (N), the graph has become clustered in order to achieve the desired three-cycle density. This α value coincides precisely with the point where formula (9) stops working, as can be seen in figure 1. When increasing the system size N, it is clear that the initial parts of the curves tend to flatten to plateaux at the level r = 1/2. This is consistent with the fact that equation (9), which accurately describes the three-cycle density in this regime, was derived assuming an underlying Poissonian distribution of triangles; the latter assumes, in turn, that the triangles are non-interacting [39].
The remaining question is how the two values α 1 (N) and r min (N) depend on N. For r min (N) the following possibilities must be considered: We made the distinction between bounded and unbounded distributions precisely because we believe that the behaviour of the ensemble for these distribution families might not be the same. As can already be seen in the bound r(A) 1/(q 2 − q), if q is growing with N, then r can approach the value 0 arbitrarily closely, contrary to the bounded case. This can also be appreciated in figure 5, for the exponentially distributed degree distribution r(A) appears to reach a lower value for larger N. Therefore we expect r min (N) → r * as N → ∞ for bounded degree distributions and r min (N) → 0 for unbounded ones. For the behaviour of α 1 (N), all numerical simulations suggest α 1 (N) → ∞. In the next section we will also give theoretical arguments in favor of this idea for bounded degree distributions. For the case of unbounded degree distributions more extensive numerical simulations on much larger graphs should explored in order to observe a deviation from logarithmic growing. For the case of bounded distributions, the maximum degree q asymptotically provides sufficiently many nodes to create cliques that will achieve the desired three-cycle density, see for example figure 3. If the desired three-cycle density is higher, then this density will be realized via cliques of the next highest degree k < q, in descending order. For unbounded degree distributions, this picture changes. Here one cannot guarantee the abundance of such cliques, therefore the observed topology seems to remain connected for larger values of α, in what we have called the clustered regime.

Results for bounded degree distributions
In this subsection we develop a further theoretical description of our graph ensemble for the case of bounded degree distributions. As mentioned before, numerical simulations suggest the need to include the statistics of the cliques formed by nodes of maximum degree. We denote by K q (A) the number of fully connected cliques of q + 1 nodes, and by T 0 (A) the number of triangles that are not in cliques of degree q, we use the superscript 0 to differentiate from the counting of all triangles T(A). Since these two cases are mutually exclusive, we can then decompose the total number of three-cycles in the following way: With this decomposition we can write the partition function as where we introduced N k = A i N δ k i , j A ij , and the joint distribution of triangles and cliques for the unbiased CM, Our main approximation consists in assuming that asymptotically the random variables T 0 and K become independent, each described by a Poisson distribution. This means that we again assume the main contribution of triangles for T(A) to come from isolated triangles. Since isolated triangles and cliques are almost independent, and are rare events in the CM, one could argue that according to the Poisson paradigm in [40], they should both be Poissonian random variables. For a similar argument regarding cycles of different lengths see [41]. In fact since cliques are so rare, we can use the same distribution as in T for T 0 , thus we put We can then immediately proceed to calculate the partition function, which leads to the following expression for the three-cycle density, Contrary to the regular case discussed in [27], there is for an arbitrary p(k) no established rigorous result for the expected number λ K (N) of cliques. Nevertheless, there is a good idea of what its scaling with N should be [42]. The expected number of isomorphisms of a given strictly balanced graph H (see [42] for definition) is , where e(H) and v(H) are the number of edges and nodes of H respectively. In the case of a clique of q + 1 nodes these numbers are, e(K q ) = 1 2 q(q + 1) and v(K q ) = q + 1. Therefore, We have included the factor q(q 2 − 1) in the denominator for convenience. With this expression we obtain the following result for small values α The first term corresponds to the contribution from isolated triangles at low density, to be denoted by m t (α). The second term represents triangles in the previously described cliques, we denote it as m K (α). The latter is bounded since the number of cliques of q + 1 nodes is bounded. This then gives It is convenient to define the shattering transition as the point where all the cliques of degree q have appeared. This automatically gives an estimate of how α 2 (N) behaves with N. Here we can see that α 2 (N) diverges logarithmically with N: This result depends on the degree distribution p(k) explicitly through q and p(q), but also implicitly through c q . Since we do not generally know c q , we cannot test the accuracy of the above prediction directly. Only for regular graphs c q is available, leading to accurate predictions for α 2 (N) [22]. However, alternative tests are possible. Equations (28) predicts a collapse of the various α 2 (N) curves under the following change of variable, Even though it is hard to sample graphs very precisely in the clustering regime, given that the waiting time of the MCMC algorithm is very large, overall the transition points of the curves do collapse nicely, as can be seen in figure 6. We stress that close the transition waiting times were so long that points on the steep part of the left panel on figure 6 were probably not equilibrated for system sizes N 1000. For this reason we show in the right panel smaller system sizes N = 200, 300, 400. In these cases we did not observe any drift in the empirical value of m(α) in the time scale of our simulations for the values close to the transition (except at the transition). The reader should keep in mind this still might not be the true value for (4) due to the presence  (29). Left: p(k) = 1 2 δ k3 + 1 2 δ k5 (circles), and p(k) = 1 2 δ k3 + 1 2 δ k9 (squares). System sizes were N = 200, 300, 400, 500, 750, from bottom to top. Averages over 20 samples, error bars are omitted to reduce cluttering. Right: close-up in the neighbourhood of the shattering transition, for p(k) = 1 2 δ k3 + 1 2 δ k7 , for system sizes N = 200, 300, 400 (from bottom to top). Here the error bars correspond to average plus/minus one standard deviation. Table 2. Comparison of the slope of α 2 (N ) plotted against log N, as measuerd from data in figure 7, versus the theoretically predicted value [2(q + 1)] −1 of (28). The degree distributions bim(k|a, b), u(k) and v(k) are defined as in table 1. Standard deviations shown in parentheses. of hysteresis, but as discussed in section 2, we will only work with the value obtained when increasing α from α = 0. We do see an almost perfect collapse of the transitions points of the curves. As described before, we used waiting times of 2 × 10 5 AESPL and inter-sample intervals of 2 × 10 4 AESPL. The values reported are the averages over the full time series as described in the previous section for figure 4.
The prefactor slope of 1 2(q+1) for the term proportional to log N in α 2 (N) in (28) was also tested. To do this we estimated α 2 (N) as the first jump of m(α). This was easily detected with a change of sign in the difference with prediction (9), as m(α) was always overestimated before the transition. This property can be seen clearly in figure 8. Results are presented in table 2 and in panel (c) of figure 7. We find a very good agreement for the bimodal distributions. For distributions u(k) and v(k) the prediction is close enough to the predicted value 0.83, but the observed value of 0.10(0.01) in both cases is actually closer to what we would observe with q = 4. This is consistent with the fact that, for these particular distributions, both degrees have a similar density and k = 4 is more abundant in the case of v(k).
With our estimate for α 2 (N) we can also derive an upper bound on the three-cycle density achieved in the connected regime, This value corresponds to the three-cycle density that would be reached if the contribution of cliques were not present. Given that cliques appear before, it becomes impossible to reach this density in the connected phase. Even though c q is unknown, we can conclude that m u vanishes when N → ∞, which is indeed consistent with numerical experiments, as can be seen in panels (a) and (b) of figure 7. The results are very good when looking at the chosen bimodal degree distributions, p(k) = 1 2 δ k3 + 1 2 δ kq . Figure 7 confirms two theoretical predictions. First, we see that the last value of the three-cycle density before the steep jump into the clustered phase scales with N in the manner predicted by (30). Second, the final value of the jump at α 2 (N) coincides with the prediction p(q)q(q − 1), as indicated by the dotted-dashed line in panel (a) of figure 7.
Note that even though we predict a vanishing density m(α) for all values of α in the asymptotic limit N → ∞, the transition at α 2 (N) does behave similar to a phase transition. A detailed exploration is presented in [19] where hysteresis is shown to be present in this model, (4). It is interesting to note that even though we did not explore the effects of hysteresis we still got good results with our theory, and even a very good match for regular graphs as shown in [27]. The reason behind this could be that the Poissonian assumption  (22) implies that we only sum over states with low three-cycle density when calculating (23). As mentioned before, a different scaling of α with N could be chosen in order to make α 2 (N) independent of N, for example the previously shown α = γ + (2(q + 1)) −1 log N. Nevertheless, as it is shown in equation (29), this different scaling would imply a vanishing of the connected phase. Since we want to write down a theory that incorporates all the regimes we chose to keep α = O (1) and N large but finite.
As a final comment, we point out that the Poissonian assumption of (22) implies that the shattering transition is of an entropic nature. To see this, we can study the behaviour of the ratio N (k|T) N (k|K) = #of graphs with degree sequence k and T isolated triangles #of graphs with degree sequence k and K q − regular cliques .
If we fix the three-cycle density to any arbitrary value m * < p(q)q(q − 1), this value can be achieved by the following numbers of triangles or cliques.
Using the Poissonian assumption, we can then prove (see appendix B) that Hence, no matter how small m * is, for a large enough system there will always be infinitely many more graphs that achieve it via cliques than via isolated triangles. Table 3. Comparison of three-cycle density, m(α); interaction, r(α); ASPL; and diameter. G297 (N = 121, c = 2.5, k 2 = 7) is compared with graphs generated according to (4)

Discussion
In this letter we have presented and analyzed a random graph ensemble were samples are both sparse and with a tuneable amount of three-cycles. Even though this ensemble (4) can be regarded as the simplest random graph ensemble with tuneability of short cycles, it is found to exhibit rather nontrivial behaviour. While one would hope for and expect a smooth and easy controllability of the three-cycle density via the control parameter α, we see that in fact there are very special nontrivial regimes, and there is surprisingly a very strong influence of the system size, i.e. the number of nodes in the graphs. Still, with appropriate care this ensemble could be used by practitioners of network science as a null model of networks with a nontrivial amount of three-cycles. If one has a given real network A 0 , that is to be compared with random samples having the same three-cycle density m(A 0 ), we propose the following steps should be taken: • If n(α) ≈ n(A 0 ) and r(α) ≈ r(A 0 ), then (4) is a suitable null model for A 0 .
• If they are different, it means that A 0 is still extremely atypical in (4), and thus it is not a suitable null model.
As an example we compared our model to a particular graph. We chose a graph representing the structure of a protein named G297 from the public network repository [46]. Nodes represent secondary structure elements and the edges their physical proximity, as described in [47]. In table 3 we can see that its value for the interaction parameter is r = 1/3. This value reflects that three-cycles in this graph are interacting, i.e. sharing edges. While this means it will not look as graph from the connected regime with r ≈ 1/2, it is still interesting to compare to graphs generated with (4), as it is still easy to generate many graphs with the desired three-cycle density in this case. In figure 8 it is shown how the target value of the three-cycle density can actually be reached by tuning α in the connected regime, in this case α * = 0.3825 was necessary. Although there are deviations from (9), it does give a good idea of the behaviour. Once we generate samples at the appropriate value of average threecycle density we can compare other observables. We chose two statistics of the shortest path lengths between nodes in the graph: the average shortest path length (ASPL), and the maximum path length or diameter. For simplicity, we simply discarded all infinite values of path lengths that may occur when graphs have multiple connected components, therefore keeping our observables well defined. Increasing the number of three-cycles will decrease the shortest path lengths between certain nodes, and this indeed can be seen in table 3. It is also clear that the actual values of ASPL and the diameter of G297 are several standard deviations away from the values for the ME ensemble (4) with the same average three-cycle density m = 0.45. This allows us to conclude that these increased values for the ASPL and diameter are actually statistically significant and not only due to the increase in three-cycle density. One can speculate that they are due to the geometric nature of the graph, which can already be appreciated in the lower value of r(A).
Even if all observables m(A), r(A) and n(A) of initial and sampled graphs match, it still could be the case that equilibration waiting times of the MCMC are very large. For graphs of more than a thousand nodes it could take days or more to get well-mixed samples. This just shows how the applied network scientist should be cautious when applying tools like edge swapping without a proper theory.
To summarize, we present our conjectured phase diagram in figure 7 (bottom right). With an exact solution for (6) one could find an analytic expression for the phase boundaries shown. The main lessons are that the same cycle densities may have very different topologies for different system sizes, and that sampling anywhere outside the connected regime takes a very long time, potentially days or weeks for large graphs, even on fast multi-core machines. We expect that for any model, any desired three-cycle density eventually falls in the disconnected regime as N grows. For the case of bounded degree distributions with Np(q) q + 1, the clustered region practically vanishes.
There are many directions in which to pursue further research, ranging from practical to theoretical. From a rigorous point of view it would be interesting to see how to prove or disprove any of the assertions made in this work, that is extending rigorous results of CM beyond uniform models. Additionally, longer and more extensive simulations should be carried out to try to determine the exact dependence on N of α 1 (N) and α 2 (N), especially to find out whether there is indeed a transition without scaling parameters for unbounded degree distributions.
The enormous waiting times seem to be due in part to the fact that in the clustered and disconnected phases many cycles have to broken in a predetermined sequence to get rid of certain structures like cliques. Given that this is unlikely, an alternative MCMC with moves that involve more edges rather than only two could be studied, in order to speed up the algorithm and let it explore more quickly the graph space.
Finally, there are many interesting questions about the spectral properties of (4) to discover. First, in [27] an analytic expression for the spectral density was found for the case of regular graphs in the connected regime. We are currently working on a generalization for an arbitrary degree distribution like in (4). The formation of clusters after the clustering transition points to a localization transition for the eigenvectors of A. A similar observation has been made for dense graphs in [26], where its nontrivial spectral properties were found; such spectral analysis has not been done yet for the sparse case like ours.
Overall, there are many open question when it comes to presenting random counterparts of real networks. It is safe to say that they are not defined by the number of three-cycles alone. It seems like real networks occupy a very small area of the abstract graph space. Finding the correct properties that will make a ME ensemble sample from a pool of realistically looking graphs is still very much an open problem. An alternative is to impose a constraint on the full set of eigenvalues of the adjacency matrix, in this way all cycle lengths would be controlled simultaneously. This full spectral constraint has been discussed in [27,43,44].

Acknowledgments
FAL gratefully acknowledges financial support through a scholarship from Conacyt (Mexico). The authors thank Alexander Mozeika and Mansoor Sheikh for valuable discussions.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A. Numerical sampling
In order for this paper to be sufficiently self-contained, we will present a brief recap of the algorithms described in [5,35] for generating samples from nondirected random graph ensembles with hard-constrained degrees. (c) For each F ∈ Φ there exists a unique inverse F −1 that acts on the same set of graphs, Ω F −1 = Ω F .
With these condition we will be able to define a dynamical process that will allow us to sample effectively from ensemble (4). The reason we need nontrivial moves is to be sure we respect the degree constraints; a single edge dynamics cannot achieve this. The simplest elementary move that respects the values of all degrees is called an edge swap. It involves choosing a pair of edges and interchanging them, see figure A1.
We next need to define the transition probabilities W(A|A ) of the Markov chain. They are chosen such as to obey the detailed balance condition, with (4) as invariant measure, i.e. W(A|A )p ∞ (A ) = W(A |A)p ∞ (A) for all (A, A ). Together with the known ergodicity of the edge swap moves [45], detailed balance is a sufficient condition to satisfy (a). We can write the transition probabilities as The interpretation of the above transition probabilities is as follows. At each step a candidate move is chosen uniformly at random from all possible moves, with probability 1/n(A). It is then accepted with probability A(FA|A), and otherwise rejected. The acceptance probabilities must satisfy the detailed balance condition with d = 1 2 q(q − 1). They are simply related to the number of graphs that exist, given the prescribed degree sequence, with the stated number of triangles or cliques, so We want to determine for a given three-cycle density whether asymptotically there are more graphs that realize the joint values (T, K) through triangles or through cliques. For this we need to write the number of triangles and cliques in terms of the desired three-cycle density, which gives We can now inspect the asymptotic limit We note, upon using Stirling's expression for the factorials, that this quantity is dominated by the N log N term, since d = 1 2 q(q − 1). Hence