Analytical maximum-likelihood method to detect patterns in real networks

In order to detect patterns in real networks, randomized graph ensembles that preserve only part of the topology of an observed network are systematically used as fundamental null models. However, the generation of them is still problematic. Existing approaches are either computationally demanding and beyond analytic control or analytically accessible but highly approximate. Here, we propose a solution to this long-standing problem by introducing a fast method that allows one to obtain expectation values and standard deviations of any topological property analytically, for any binary, weighted, directed or undirected network. Remarkably, the time required to obtain the expectation value of any property analytically across the entire graph ensemble is as short as that required to compute the same property using the adjacency matrix of the single original network. Our method reveals that the null behavior of various correlation properties is different from what was believed previously, and is highly sensitive to the particular network considered. Moreover, our approach shows that important structural properties (such as the modularity used in community detection problems) are currently based on incorrect expressions, and provides the exact quantities that should replace them.


I. INTRODUCTION
Detecting relevant patterns in real networks, a fundamental problem for many research fields [1][2][3], relies upon the possibility to distinguish the properties explained by the presence of simple constraints from more complex and nontrivial structural features.For this reason, statistical ensembles of graphs with specified constraints, and otherwise completely random, have been introduced and systematically used as a reference to identify non-random patterns in a real network .Such ensembles serve also as powerful models to study dynamical processes on networks displaying only a set of desired properties, and allow to highlight the dynamical effect of each property separately.The simplest and most important ensembles specify only local constraints.For unweighted networks, this amounts to specify the degree k i (number of incident edges) of each vertex (i = 1 . . .N where N is the total number of vertices), and results in the so-called configuration model [4,5,7].In the weighted case, the corresponding constraint is obtained by fixing the strength s i (sum of incident edge weights) of each vertex [15][16][17].More in general, one could enforce different or additional properties [6,11,13,14,16,[19][20][21][22][23].
Unfortunately, as we discuss in detail in what follows, it turns out that even in the simplest case with local constraints, the correct generation of random ensembles corresponding to a particular real world network is problematic.Both analytical and computational approaches proposed so far have severe limitations.Motivated by this, here we propose a new maximum-entropy method that is entirely analytical and does not require the generation of randomized variants of a real network.Our method provides the exact probabilities of occurrence of random graphs with the same (average) constraints as the real network, from which the expectation values and standard deviations (and in principle the higher moments) of any topological quantity of interest can be calculated mathematically, either exactly or using proper approximations.Due to its analytical character, our method is extremely faster than all the available alternatives.Moreover, it can be applied to undirected, directed, binary and weighted networks in a unified way.We will illustrate the power of our approach on several real-world networks of different nature and type, by studying a range of topological properties of interest.

II. AVAILABLE METHODS AND THEIR LIMITATIONS
We first briefly review the existing problems in the case of binary unweighted networks, which is the most frequently explored situation.A binary unweighted graph with N vertices is completely specified by a N × N adjacency matrix A with entries a ij = 1 if the vertices i and j are connected, and a ij = 0 otherwise.Generally, one is interested in comparing the observed topological properties of a particular real-world network A * against the average properties of a randomized family of networks with the same degree sequence k(A * ) = {k i (A * )}, where k i (A * ) = j a * ij is the degree (number of connec- tions) of vertex i in the network A * .The ensemble of binary undirected networks with specified degree sequence is known as the configuration model (CM) [4,5,7] and is currently treated in two very different ways: computationally, by explicitly generating many random networks with the desired degree sequence and averaging the quantities of interest across the randomized networks [4,5], or analytically, by using approximations that allow to directly estimate the average of topological properties as a function of the enforced degree sequence, without actually measuring them on any network [7,8].Currently, both approaches suffer from severe limitations.A 'bottom-up' computational approach consists in assigning each vertex i a number of 'edge stubs' equal to its observed degree k i (A * ), and randomly matching pairs of stubs (avoiding self-loops and multiple links) until all degrees reach their desired values (edge stub reconnection).However, this procedure is known to get stuck in configurations where vertices requiring additional connections have no more eligible partners [4,5].As a consequence, one must implement a 'top-down' computational approach where the entire real network A * is taken as the initial configuration, and a family of randomized variants is generated by iteratively applying a local rewiring algorithm (LRA) where two edges (A, B) and (C, D) are randomly selected and replaced by the two edges (A, D) and (C, B), if the latter are both not already present [4,5] (see fig. 1 for an illustration).This generates a microcanonical ensemble (see the Appendix for a detailed discussion) where all randomized networks have exactly the same degree sequence as the original network, and are sampled with equal probability.This method has been applied to various networks, including the Internet [5], cellular networks [6] and food webs [12], in order to detect higher-order patterns (such as clustering and motifs) not merely due to local constraints.However, this approach is time-consuming since many (a number R much larger than the observed number of links L [4,24], even if not rigorously specified) iterations of the LRA are required to obtain a single randomized network, and the entire process must be repeated several times to produce a large number M (again unspecified) of randomized networks, on each of which any topological property X of interest must be measured explicitly and averaged at the end to obtain an estimate for X .The computational time required to obtain X is therefore of the order O(M • T R • R) + O(M • T X ), where T R is the average time required to perform a single successful rewiring step and T X is that required to compute X on a single network in the randomized set.Moreover, even if the sufficient statistics of the problem is the degree sequence k(A * ) alone, the above approach requires the entire original network A * (or any other network with the same degree sequence, which is however difficult to obtain from scratch due to the problems discussed above) as the starting configuration, thus making use of much more information than required in principle.
By contrast, analytical approaches seek to provide theoretical expressions to directly obtain the ensemble averages of topological properties, without generating the ensemble computationally.Two main approaches exist.One makes use of generating functions for the relevant probability distributions.In the case we are discussing here, the key quantity is the generating function g(z) = k z k P (k) of the degree distribution [7].Unfortunately, this method assumes that the network is infinite and locally tree-like (even if in some cases this approximation turns out to perform unexpectedly well even beyond its formal range of applicability [27]), and is thus inappropriate if the size of the network is small and if the input degree distribution can only be realized by dense and/or clustered networks.In this approach, clustered or dense networks can only be generated by imposing additional constraints besides the degree sequence, such as the number of triangles attached to vertices [28], thus leading to a different ensemble which is not the one we are seeking to characterize.
A different approach looks for an analytical expression for the probability p ij that the vertices i and j are connected in the randomized ensemble [8].Due to its probabilistic nature, this approach generates a (grand)canonical ensemble where even graphs violating the constraint are present and assigned different probabilities.In such a case, the constraints are realized on average, i.e. the expectation value X of any specified property X is fixed exactly (see Appendix).While this approach is indeed very fast in providing averages of the desired properties, it has been shown [9] that it makes use of a highly approximate expression for p ij , valid only when the original network is sparse and/or the degree distribution is not too broad.This expression is where L * ≡ L(A * ) = i k i (A * )/2 = i<j a * ij is the total number of links.While the expected degree k i = j p ij generated by the above formula coincides with the desired degree k i (A * ), the probability p ij may exceed 1 for pairs of highly connected nodes such that k i (A * )k j (A * ) > 2L(A * ).In general, only if the degree sequence is such that then using eq.( 1) on the real network A * will not lead to the above problem.While the above condition is typically obeyed by networks with narrow degree distribution such as the Erdős-Rényi random graph, it is generally violated by scale-free networks displaying a power-law degree distribution P (k) ∼ k −γ , and this violation becomes stronger and stronger as the density of the network increases.In particular, it is possible to show that in order to ensure eq.( 2) the maximum degree k max in the network should not exceed the so-called structural cut-off k c ∼ N 1/2 [29].This is particularly evident for dense networks where the average degree k = i k i /N = 2L/N remains constant as N increases, so that eq.( 2) remains valid only if k max < √ 2L ∼ N 1/2 .By contrast, extreme value theory shows that in networks with degree distribution P (k) ∼ k −γ the maximum degree scales as k max ∼ N 1/(γ− 1) , so that if γ < 3 (as observed in most real-world scale-free networks) then k max > N 1/2 which exceeds k c .The meaning of p ij being larger than 1 for some pairs of vertices in eq.( 1) is that, in order to actually realize the degree sequence of the real network A * , one must let i and j be connected by more than one undirected edge.Also, since the desired equality k i = k i (A * ) is only ensured if one lets the sum in j p ij = k i run over all vertices including i itself, one must allow the presence of self-loops in the randomized networks.Thus, even if this is not evident at a first glance, the ensemble generated by eq.( 1) does not only contain binary and loop-less undirected graphs and is thus not a proper null model for an empirical binary loop-less network A * with degree sequence k(A * ) violating eq.( 2), as is typically the case for real-world networks with broad degree distributions.
An elegant proof that the correct ensemble probability p ij for loop-less graphs with no multiple connections differs from eq.( 1) has been proposed [9] and re-derived within the framework of maximum-entropy graph ensembles [14].We shall exploit this result to obtain an exact method later on.We will also show that in real networks the deviation is stronger than expected, and affects sparse networks as well.An independent proof of the inadequacy of eq.( 1) is that it does not generate the graph A * with maximum likelihood [30].This can be confirmed by treating L as a free parameter and look for its value L M L that maximizes the probability to obtain A * .One finds that L M L = L(A * ), which implies that under the maximum likelihood choice k i = k i (A * ) and L = L(A * ), violating the desired constraint on the degree sequence and the implied one on the number of links [30].This shows that the functional form of p ij in eq.( 1) is intrinsically problematic and does not give highest likelihood to A * and to all other graphs with the same degree sequence as A * .Therefore, while the available analytical methods are useful to characterise artificially generated networks with special properties, they cannot be used to correctly randomise any real-world network which is either small, clustered, or dense.Unfortunately, the above limitations are generally ignored, and eq.( 1) is frequently used beyond its limits of applicability to estimate connection probabilities.Moreover, as we note later on, it is also used as a key ingredient in order to define important structural properties which implicitly rely on a comparison against the CM.Analogous problems exist in the analysis of directed and/or weighted networks.We will consider each of these cases separately in what follows.

III. A FAST AND ANALYTICAL METHOD
The above discussion highlights that no method developed so far succeeds in obtaining randomized properties of a particular real-world network such that two requests are met simultaneously: i) the method is general and works for any network, even if displaying small size, high link density, and large clustering; ii) expected values across the ensemble can be computed analytically, without sampling the configuration space explicitly.The need to resort to the LRA as the only statistically correct method available, which however requires the artificial generation of many randomized networks, makes the general problem very complicated and all its applications time consuming.
In this paper we propose a solution to this longstanding problem.We develop an approach that combines exact expressions for the occurrence probabilities of graphs in maximum-entropy ensembles with given constraints [9,11,14,[21][22][23] with more recent results about the application of the Maximum Likelihood principle to graph ensembles [30].In the Appendix we describe our method in great detail.We start with a general discussion which is formally valid for any constraint, and then consider explicitly the application to real networks where a set of local constraints is enforced.We consider the cases of binary, weighted, directed and undirected networks separately.We show that in all these cases the enforcement of local constraints always leads to exact probabilities that can be easily obtained analytically.Then we also consider an extension to non-local constraints which can still be dealt with analytically.Finally, we compare our (grand)canonical method with the corresponding microcanonical ensemble generated computationally as in the LRA.
As we show, in all the cases of interest a choice of constraints leads to a specific set of coupled nonlinear equations to be solved.In such equations, the observed values of the enforced topological properties (e.g. the degree sequence) determine the values of an equal number of 'hidden' parameters in such a way that the real network, or any other network with the same constraints as the real one, is generated with maximum likelihood.
Since only the enforced constraints enter the equations, our method only requires the knowledge of the sufficient statistics of the problem and not of the whole topology, restoring a desirable feature of randomization algorithms.Solving the maximum-likelihood equations only takes a computational time T E which is negligible if compared to the time required to measure any nontrivial topological property, and entirely replaces the artificial generation of many randomized variants of the original network.
Once the parameters solving the equations are found, they can be directly used to obtain the expectation value X and standard deviation σ[X] of any topological property X of interest analytically.When useful, this also allows to obtain a z-score representing the number of standard deviations by which the randomized value X differs form the observed value X(A * ).The possibility to obtain the standard deviations and/or z-scores is very important, because it allows to assess which topological properties X are consistent with their randomized value X within a statistical error, and which deviate significantly from the null expectation.In the former case, one can conclude that the enforced constraints completely explain the higher-order property X.In the latter case, the observed property cannot be traced back to the constraints, and therefore requires additional explanations or generating mechanisms besides those required in order to explain the constraints themselves (it should be noted, however, that z-scores can be unambiguously interpreted only if the property X is normally distributed, and this is generally not the case; nonetheless they still carry information about the discrepancy between observations and the null model).
Importantly, the time required to compute the expectation value X of a given property X analytically (formally corresponding to an average over a huge number of randomized configurations) is the same as the time T X required to compute the same property on the single original network.Therefore our method takes only a total time O(T E + T X ) to obtain X exactly, which is incredibly shorter than the aforementioned time required by the LRA to obtain X approximately.Importantly, T E is independent of the complexity of the topological property X to measure, which means that for complicated properties O(T E + T X ) = O(T X ).Therefore for any topological property X which can be measured in a large but still reasonable time O(T X ) on the real network, the computation of its expectation value X will require the same time O(T X ).If the time required in order to obtain X is too large, it is because the time required to measure X is too large as well.In other words, the property X is too complicated to be computed on the real network itself.In such a case, the problem is not due to the method, but to a demanding choice of X for that particular network.Note that we are assuming that the topological properties of the real network are computed using the full adjacency matrix.This is the worst-case scenario, since in many cases (especially for sparse net-works) it is enough to use reduced information such as the list of existing links.For instance, the time to measure the clustering coefficient can be significantly shorter, using optimized algorithms, on a sparse network than on a generic network of the same size (an in this case it will also be shorter than the time required to compute its randomized value across our ensemble).However, our interest is precisely to focus on the (worst) general case (e.g.dense and very dense networks), because it is in this case that other approaches fail (such as eq.1), or become extremely time consuming (such as the LRA, which takes longer for denser graphs).

IV. RESULTS
We now show the application of our method to real networks of various type, by considering several topological properties and their randomized counterparts.

A. Binary undirected networks
We start with the simplest case of binary undirected networks.One of the most important topological properties of a binary network is the correlation between the degrees of adjacent nodes, which has been shown to dramatically affect various structural and dynamical features [2].These correlations can be measured by the average nearest neighbour degree (ANND), which on the real network A * is defined as While the degree is a first-order property which only depends on the number of links (topological paths of length one) entering a vertex, the ANND is a second-order property contributed by paths of length 2 (i.e. the terms a * ij a * jk ).Similarly, a third-order (i.e.involving paths of length 3) property is the clustering coefficient c i , which represents the fraction of pairs of neighbours of vertex i which are mutually connected: As we mentioned, it is always important to assess whether in a particular real network higher-order properties arise merely as a consequence of low-level constraints or whether they signal additional structural patterns.In particular, comparing the real network A * with the CM (which provides an ensemble of random networks having, on average, the same degree sequence k(A * ) as A * ) allows to assess whether longer topological paths and the structural properties involving them are simply a random concatenation of the individual links enforced by the degree sequence, or whether they are irreducible to 2. Application of our method to binary undirected networks.The red points are the empirical data, the black solid curves are averages over the configuration model obtained using the local rewiring algorithm [4,5], and the blue dashed curves are the analytical expectations (± one standard deviation) obtained using our method.The green curves are the flat expectations under the Erdős-Rényi random graph model, and highlight the average level of correlation in the random case.The panels report k nn i versus ki (left) and ci versus ki (right) for: a) and b) the network of the largest US airports (N = 500) [31], c) and d) the synaptic network of Caenorhabditis elegans (N = 264) [32], e) and f) the proteinprotein interaction network of Helicobacter pylori (N = 732) [33], g) and h) the network of liquidity reserves exchanges between Italian banks in 1999 [34] (N = 215), i) the Internet at the AS level (N = 11.174)[35] and j) the protein-protein interaction network of Saccharomices cerevisiae (N = 4.142) [33].The last two networks are randomized using only our method, as the local rewiring algorithm would require much more time given the large number of edges.
first-order constraints.As we discuss in detail in the Appendix, our method can solve this problem by making use of an auxiliary N -dimensional vector x = {x 1 . . .x N } of parameters.In particular, one must look for the particular values x * that solve the following set of N coupled nonlinear equations: where k i (A * ) is the observed degree of vertex i in the real network A * .Once the parameter values are found, they allow to obtain analytically the expectation value X * of any topological property X across the desired ensemble.This simply amounts to replace the adjacency matrix entry a * ij appearing in the definition of X(A * ) with its expectation value which represents the correct expression that should be used in place of eq.( 1).Similarly, it is possible to obtain the standard deviation σ * [X] analytically in terms of x * (see Appendix).
In fig. 2 we show an application of our method on the network of the 500 largest US airports [31], a synaptic network [32], two protein interaction networks [33], an interbank network [34] and the Internet at the Autonomous Systems level [35].These are among the most studied networks of this type.We compare the correlation structure of the original networks, as measured by the dependence of k nn i (A * ) and c i (A * ) on k i (A * ), with the expected values k nn i * and c i * obtained analytically using our method.Note that we are averaging the values of k nn i (A * ) and c i (A * ) over all vertices with the same degree: this makes our comparison with the values k nn i * and c i * consistent, since both real and randomized quantities can be plotted using the same values k i * = k i (A * ) on the abscissa (we use the same strategy in what follows).We also highlight the region within one standard deviation around the average by plotting the curves For the sake of comparison, we also report the average values obtained sampling the microcanonical ensemble with the standard local rewiring algorithm [4,5], and the expected values over the ensemble of random graphs with the same number of links (random graph model, RG) As we mentioned, the microcanonical method requires the generation of many randomized variants, many rewirings per variant, and the measurement of k nn i and c i on each variant separately, plus a final averaging.By contrast, our method only requires the preliminary estimation of the {x * i }.Then the calculation of k nn i and c i takes exactly the same time as that of the empirical values.As can be seen, the two approaches yield very similar results (in the Appendix we provide a detailed comparison of the two methods).For the two largest networks (the protein interactions in S. cerevisiae and the Internet), we only report the expectations obtained using our method, as the microcanonical approach would require too much computing time.
The above results allow to interpret the effect of the degree sequence on higher-order properties.Firstly, the trends displayed by the CM are not flat as those expected in the random graph case.This confirms that residual structural correlations, simply due to the enforced constraint, are still present after the rewiring has taken place.The presence of these correlations does not require any additional explanation besides the existence of the constraints themselves.This is very different from the picture one would get by using the (wrong) expectation of eq.( 1) which would yield flat trends as well, naively suggesting that correlations can never be traced back to the degree sequence alone.Secondly, while the trends observed in all the networks considered are always decreasing, they unveil different correlation patterns when compared to the randomized trends.The real interbank data are almost indistinguishable from the randomized curves, meaning that structural constraints can fully explain the observed behaviour of higher-order network properties.Instead, in the airport network the randomized curves lie below the real data (except for an opposite trend of k nn i for low degrees).This means that the real network is more correlated than the baseline randomized expectation, and indicates that additional mechanisms producing positive correlations must be present on top of structural effects.By contrast, in the H. pylori 's protein network the expected curves lie above the real data, suggesting the presence of mechanisms producing negative correlations.The same is true for the correlation structure of the Internet, confirming previous results [5], while S. cerevisiae's protein network is completely different from its randomized variants.Therefore seemingly similar trends can actually reveal very different types of structural organization.This means that measuring the topological properties alone is uninformative, and makes the comparison between real data and randomized ensembles essential.Thus the possibility to analytically and quickly characterize the latter, which was previously unavailable, is a remarkable advantage of our approach.

B. Directed networks
We now consider binary directed networks, which are specified by an asymmetric adjacency matrix A. The local constraints are now represented by the joint sequence of out-degrees and in-degrees {k out i , k in i } = { j =i a ij , j =i a ij }.Given a particular real network A * and a measured topological property X(A * ), our method allows to analytically obtain the expectation value X * and standard deviation σ * [X] across the ensemble of binary directed graphs with, on average, the same directed degree sequences k out (A * ) and k in (A * ) as A * (directed configuration model, DCM).As shown in the Appendix, in this case our method makes use of two N -dimensional  [32], c) and d) the metabolic network of Escherichia coli (N = 1078) [36], e) and f) the Little Rock Lake food web (N = 183) [37].For the C. elegans network, we also show the microcanonical standard deviations obtained using the LRA (black dotted curves), which are indistinguishable from the grandcanonical ones.vectors x, y of auxiliary variables, and requires that these parameters are set to the particular values x * , y * that solve the following set of 2N coupled nonlinear equations: The quantities x * , y * allow to obtain X * and σ * [X] analytically and quickly, outperforming the directed version of the LRA [? ].Note that, as in the undirected case, the method only makes use of the sufficient statistics of the problem.We apply our method to various directed networks, by studying the second-order topological properties measured by the outward ANND and the inward ANND, which are defined as two natural generalizations of eq.( 3): In fig. 3 we plot the observed values k nn,in i ] obtained using our model (see Appendix), for three real directed networks: the neural network of C. elegans [32] (now in its directed version), the metabolic network of E. coli [36], and the Little Rock Lake food web [37].As before, we also show the microcanonical average obtained using the LRA and the expectation under the directed random graph model (DRG) with the same number of links.Again, we find a very good agreement between the two approaches, confirming that our method yields the correct prediction in incredibly shorter time (see Appendix for a discussion about the convergence time of the LRA to our exact results).For the C. elegans network (fig.3a-b), we also show the microcanonical standard deviations, which turn out to be indistinguishable from the grandcanonical ones.We also confirm that while some networks (C.elegans and E. coli ) are almost consistent with the null model, others (Little Rock ) deviate significantly.
However, the most interesting point for the present analysis is that, while for the undirected networks considered above all randomized trends were decreasing, in this case we find that the three randomized trends behave in totally different ways.In the neural network, both k nn,in i * and k nn,out i * are approximately constant.This means that the baseline behavior for both quantities is flat and uncorrelated (as in the directed random graph, but at a different level).By contrast, in the metabolic network the expected curves are decreasing, and thus the ensemble of randomized networks is disassortative as for the undirected graphs considered above.Finally, in the food web the constraints enforce unusual positive correlations, and the randomized ensemble is even assortative.Interestingly, while it is expected that random networks with specified degrees display a disassortative behavior [5,9], the assortative trend is totally surprising.This is because our method extracts the hidden variables directly from the specific real world network, rather than drawing them from ad hoc distributions.The resulting values can be distributed in a very complicated fashion, invalidating the results obtained under other hypotheses.To further highlight this important point, we selected three more food webs characterized by a particularly small size (see fig. 4).Small networks cannot be described by approximating the mass probability function of their topological properties (such as the degree) with a continuous probability density.Therefore in this case the difference between the expectations obtained by drawing the x and y values from analytically tractable continuous distributions and those obtained by solving eqs.(8) using the empirical degrees is particularly evident.As we show in fig. 4 (where for simplicity we omit the comparison with the LRA), we confirm that the (directed) CM can display not only flat or decreasing trends, but also increasing ones.Importantly, in this case all three webs do not deviate dramatically from the null model.This means that while one would be tempted to interpret the three observed trends as signatures of dif-ferent patterns (zero, negative and positive correlation), actually in all three cases the observed behavior can be roughly replicated by the same mechanism and almost entirely traced back to the degree sequence only.This unexpected result highlights once again that the measured values of any topological property are per se entirely uninformative, and can only be interpreted in relation to a null model.

C. Reciprocity and motifs
So far, in our analysis of directed networks we have considered second-order topological properties.In principle, third-order properties can be studied by introducing directed generalizations of the clustering coefficient [39,40].However, there is a proliferation of possible third-order patterns due to the directionality of links.For this reason, a more complete analysis consists in counting (across the entire network) all the possible directed motifs [6] involving three vertices, and comparing the empirical abundances with the expected ones under the null model.As we show in a moment, our method lends itself admirably in such a case.Before presenting our results, we note however that directionality makes the possible specifications of the null model proliferate as well.In particular, besides the DCM considered above, a more refined way to randomize directed networks includes the possibility to enforce additional constraints on the reciprocity structure [6,11].In other words, it is possible (and important in many applications [6,12]) to preserve not only the total numbers k in i and k out i of incoming and outgoing links of each vertex, but also the number k ↔ i ≡ j a ij a ji of reciprocated links (pairs of links in both directions) [41,42].This specification is equivalent to enforce, for each vertex i, the three quantities [11,41] )} as the reciprocal configuration model (RCM).This is an example of model with nonlocal (second-order) constraints which can still be treated analytically using our method.As we show in the Appendix, in this case one must solve the following 3N coupled equations: The expectation value of any topological property, as well as its standard deviation, can now be calculated analytically in terms of the three N -dimensional vectors x * , y * , z * .For instance, in fig.4g-h we repeat the analysis of the directed ANND of the St. Marks River food web, now comparing the observed trend against the RCM.In this case, we find no significant difference with respect to the DCM considered above (fig.4e-f).However, as we now show, the analysis of motifs reveals a dramatic difference between the predictions of the two null models.
If N m denotes the number of occurrences of a particular motif m, our method allows to calculate the expected number N m * and standard deviation σ * [N m ] exactly (see Appendix), and thus to obtain the z-score analytically.This can be done for both the DCM and the RCM.The value of z[N m ] indicates by how many standard deviations the observed and expected numbers of occurrences of motif m differ.Large values of z[N m ] indicate motifs that are either over-or under-represented under the particular null model considered, and that are therefore not explained by the lower-order constraints enforced.In fig. 5 we show the z-scores for all the possible 13 non-isomorphic connected motifs with three vertices in 8 real food webs, for both null models.We also show the two lines z = ±2 to highlight the region within 2 standard deviations from the model's expectations.This analysis is similar to that of ref. [12], but is made much simpler by our method which does not require to randomize the webs through a computational algorithm preserving the (reciprocal) degree sequences.The food webs considered here are from different ecosystems (lagoons, marshes, lakes, bays, estuaries, grasses), with a prevalence of aquatic habitats.The presence of (intrinsically directed) predator-prey relationships implies that reciprocity is a very important quantity in food webs [12].Thus the RCM should fluctuate less than the DCM.Indeed, this is confirmed by our analysis.The z-scores for the motifs m = 2, 3, 13 are significantly reduced from the DCM to the RCM.Also, while the motifs m = 1, 6, 10, 11 display large values of z with opposite signs across different webs under the DCM, the signs of all statistically surprising motifs (i.e. when |z| 2) become consistent with each other under the RCM (except for m = 13).As a consequence, under the RCM all networks display a very similar pattern, and the most striking features of real webs become the over-representation of motifs m = 2, 10 (plus m = 6, 11, 13 for the Little Rock Lake web) and the under-representation of motifs m = 5, 9, 13 (plus m = 3, 7, 8 for Little Rock Lake).In particular, the under-representation of motif m = 9 (the 3-loop) is the most common pattern across all webs, and becomes stronger as the reciprocity of the web increases.Also note that in a network with no reciprocated links, the number of motifs with at least a pair of reciprocated links is zero.Under the RCM, the expected number of these motifs remains zero.By contrast, their expected number under the DCM is always positive.Thus we confirm that the upgrade to the RCM is necessary, as its stricter constraints allow to analyze 3-vertices motifs once 2-vertices motifs (i.e.all possible dyadic patterns) are correctly accounted for.The possibility to treat the RCM analytically using our method is therefore an important step forward.

D. Weighted networks
Remarkably, our method works equally well for weighted graphs (where the binary adjacency matrix A is replaced by a non-negative weight matrix W), thanks to recent analytical results that allow to characterize maximally random weighted networks with spec-ified properties in a way that is completely analogous to their binary counterparts [22,23].In a particular weighted network W * , the local constraints are the strength sequence {s i (W * )} = { j w * ij } (undirected case) or the joint out-strength and in-strength sequence {s out i (W * ), s in i (W * )} = { j w * ij , j w * ji } (directed case).We will only consider undirected weighted networks.The extension to the directed case is straightforward.The family of randomized weighted graphs with the same strength sequence as a real weighted network is sometimes denoted as the weighted configuration model (WCM) [15].The available microcanonical algorithms regard each link weight as an integer multiple w of a fundamental unit of weight, transform each edge of weight w into w edges of unit weight, and rewire the latter as in the unweighted case, now ensuring that the strength (total number of incoming edges of unit weight) of each vertex is preserved.This means replacing a list of L * ≤ N (N − 1)/2 weighted links, summing up to a total weight W * = i<j w * ij , with W * N (N − 1)/2 unweighed links.As real networks have broadly distributed weights summing up to a large W * , this procedure becomes very time consuming as incredibly many rewiring steps per randomized variant must be performed.As for the binary case, an alternative procedure makes use of a naive theoretical expectation [15,43] for the expected weight of a link in the WCM, in analogy with eq.( 1): However, the above expression has been shown to have as many limitations as its binary counterpart, and to be incorrect [22].
By contrast, as we show in the Appendix, our method allows to treat the WCM analytically as in the unweighted case.Note that choosing the unit of weight in the WCM (before performing the randomization) is in principle arbitrary, but the resulting ensemble will be different for different choices.This issue of granularity is an open problem that deserves future investigations.Our grandcanonical alternative to the WCM is not aimed at fixing the problem, but at providing, for a given choice of the weight unit in the microcanonical ensemble, the corresponding grandcanonical expectation.
Given a real weighted undirected network W * , our method proceeds by finding the particular values {x * i } solving the N coupled equations Note the difference of sign with respect to eq.( 5).As in the binary case, the knowledge of x * allows to obtain the expectation value X * and standard deviation σ * [X] of any weighted topological property X analytically across the ensemble of weighted graphs with, on average, the same strength sequence s(W * ) as the real network W * .
Again, the time required to obtain X * is as short as that required to measure the empirical value X(W * ), as X * can be obtained by replacing w * ij with the expectation value into the definition of X(W * ).Equation ( 17) corrects the naive expectation (15).In order to apply our method, we need to choose the weighted topological properties to investigate.Generalizing binary properties to weighted graphs is arbitrary, as no unique choice exist [18,[43][44][45].To better highlight the generality of our approach, here we follow ref.[44] since it introduces a way to always systematically define a weighted counterpart X for every binary property X.The idea is to define X as an average of X over the ensemble of binary graphs generated by a convenient connection probability p ij = f (w ij ) ∈ [0, 1] which is a function of the observed weights {w ij }.The functional form of p ij can in principle be chosen depending on the empirical properties one wants to detect; however, our purpose here is using our method to compare the empirical properties with the expected ones, rather than comparing alternative definitions of the empirical properties themselves.Therefore we make the simplest choice and, given a real weighted network W * , we set is the total weight.This choice yields the following definition for the weighted degree [44]: which is simply proportional to the strength.Similarly, the weighted ANND and clustering are defined as the counterparts of eqs.( 3) and (4) [44]: In analogy with the binary case, knn i and ci can be plotted against ki (or equivalently s i ) in order to investigate the correlation structure of the weighted network.
In fig.6 we analyze the weighted and undirected (symmetrized) versions of four networks we have already considered in the previous binary study: the the Florida Bay food web, the Italian interbank network, the C. Elegans neural network and the US airport network.We compare the empirical results with the expected trends (± one standard deviation) under the WCM obtained using our method.For simplicity, we only show the results obtained using our method, and omit the timeconsuming microcanonical comparison.Note that, since  6.Application of our method to weighted undirected networks.Red points are the empirical data and the blue dashed curves are the exact expectations obtained using our method (± one standard deviation).Green dashed curves are the flat expectations under the weighted random graph model, WRG [23].The panels report knn i versus si (left) and ci versus si (right) for: a) and b) the Florida Bay food web (N=128) [38], c) and d) the Italian interbank network (N=215) [34], e) and f) the C. elegans neural network (N=265) [32], g) and h) a snapshot of the US airport network (N=332) [31].
the strengths are preserved in the WCM, i.e. s i * = s i (W * ) ∀i, the total weight is preserved as well: W * = W * .We find that the empirical trends are quite scattered and variable: some are weakly increasing (Florida Bay), some are approximately constant (interbank web), others first increase and then decrease (airport network).These diverse trends must be compared with a null model which, unlike naively expected from eq.( 15), is not flat and displays a not easily characterizable increasing behavior.A common feature is that, with respect to the null behavior, real weighted networks are more assortative and clustered for low values of the strength, while they are less assortative and clustered for high values for the strength.These considerations confirm that, even in the weighted case, the empirical trends are uninformative by themselves, and always require a comparison with a null model.Our method allows to treat the otherwise problematic WCM in a simple way, in straightforward analogy with the binary case.
Although we do not consider this possibility here explicitly, for weighted networks one could also enforce additional constraints on the degree sequence.This amounts to specifying not only the strength of each vertex, but also its purely topological degree [16,19,20].In this case, sampling the randomized ensemble by means of computational algorithms becomes even more difficult.By contrast, our method can still be used efficiently, as the analytical expressions characterizing the corresponding maximum-entropy ensemble have been derived recently [22].Those results easily allow to obtain the equations implied by the ML principle, as well as the expectation values of network properties over the ensemble, in a straightforward fashion.
For completeness, in fig.7 we show the ratios of the contraints standard deviations, σ C , to the constraints expected values, µ C (a quantity known in statistics as coefficient of variation), plotted versus the expected values.For small values of the constraints σ C /µ C ∼ (µ C ) −1/2 (an approximation valid both for binary and weighted networks); the higher the constraints expected values, the more important becomes a correction factor whose entity (and sign) depends on the particular type of network considered (see Appendix for the details of the calculations): in the food webs (panel b) the presence of in-degree hubs implies the correction to be important even for small outdegree vertices.

V. DISCUSSION
Our method make use of the correct expressions (6) and (17) for the connection probability and expected weight respectively, in place of the incorrect naive expressions (1) and (15).While the latter depend only on the properties (k i or s i ) of the end-point vertices i and j, the former depend on the entire degree or strength sequence through eqs.( 5) and (16).We have shown that this has a dramatic effect on the properties of the randomized ensemble.In particular, we have found that enforcing the same set of constraints in different networks can yield very different trends for the randomized properties, whose behavior is therefore highly unpredictable a priori.The general expectation that randomized higherorder properties (such as k nn i and c i in unweighted networks or ki and ci in weighted networks) are independent of the local ones (k i or s i ) turns out to be only a very infrequent possibility among the possible scenarios.Indeed, we have also found increasing and decreasing trends for the randomized quantities, and shown that the particular behavior displayed by the null model highly depends on the particular values of the constraints in the  original real-world network.This makes the comparison with the particular null model even more important than previously expected, and underlines the importance of a tractable description enabled by our analytical method.
The incorrectness of eqs.( 1) and (15), as well as of their directed counterparts, has another series of undesired effects, as those expressions have been explicitly used to define important structural quantities involved in network analysis.Indeed, even when not explicitly used to randomize a network, null models unavoidably enter into the analytical expressions defining many properties of interest.For instance, many popular community detection algorithms make use of the concept of modularity to evaluate the quality of a partition of the network against a null case [46].A partition into communities can be represented by the matrix {δ ij }, where δ ij = 1 if vertices i and j are assigned the same community and δ ij = 0 otherwise.For a binary undirected network A * , the modularity Q of the partition {δ ij } has been defined as where p ij is the probability that i and j are connected in a suitable null model, and the most frequent choice is the CM.Similarly, for a weighted undirected network where w ij is the expected weight of the link joining i and j in the WCM.Unfortunately, the expressions for p ij and w ij are always taken to be eqs.( 1) and ( 15) respectively.To the best of our knowledge, no rigorous assessment of the consequences of using these approximations has been provided.Therefore the problems described in the present paper affect any modularity-based community detection problem in an uncontrolled way.
Our methods provides the previously unavailable exact expressions ( 6) and ( 17), whose values can be inserted into eqs.(21) and ( 22) to have the correct modularity.
A straightforward analysis of how the correct expressions change the detected community structure of real networks is an important open point to address in the future.

VI. CONCLUSIONS
We have presented a fast and exact method to obtain analytical results about the grandcanonical ensemble of randomized variants of a particular real-world network that preserve its average local properties.The method works for both weighted and unweighted networks, and for both directed and undirected graphs.In any case, it requires as the input only the strength or degree sequence(s), which represent the sufficient statistics of the problem.Our approach can be extended to enforce different or additional constraints, such as the reciprocity structure in directed networks or the simultaneous specification of strengths and degrees in weighted networks.Notably, our results show that maximally random networks exhibit a diverse range of behaviors which is sensitive to the particular values of the constraints displayed by the real network, making a case-by-case comparison of the observed properties with the randomized ones necessary.This diversity of outcomes is in any case not captured by widely used but incorrect expressions for the expected properties.Unfortunately, important network properties such as the modularity completely rely on such expressions, a problem that may have therefore biased previous analyses of community structure in networks.We believe that our contribution represents a promising step towards the identification of relevant information in real networks.

Appendix A: GENERAL MAXIMUM-LIKELIHOOD METHOD
Here we describe our maximum-likelihood method in its general formulation.Our approach combines previous analytical results (obtained by one of us [11,22,23] and other authors [9,14,21]) about the properties of maximum-entropy graph ensembles with previous results (by one of us [30]) about the maximum-likelihood estimation of free parameters in such ensembles, and adds to them a new technique to obtain analytical expressions for the expectation value and standard deviation of any topological property of interest across the ensemble.After describing the method in general terms, we derive the explicit expressions that apply in the particular cases of local constraints (for undirected, directed and weighted networks).We then consider an extension to nonlocal constraints, and finally compare our analytical method with alternative computational techniques.

Maximum-entropy probability distribution
Our method aims at characterizing analytically the properties of families of randomized variants of a particular real network.In more rigorous terms, a family of randomized network variants is a statistical ensemble of graphs where a set of structural constraints has been specified, and the rest of the topology is completely random.Let us denote by G a generic network in the ensemble, and by G * the particular real-world network that we need to randomize.The ensemble will consist of all possible networks {G} of the same type of G * (binary/weighted, directed/undirected), and will include G * itself.For binary (either directed or undirected) networks, each graph G is completely specified by its adjacency matrix A, i.e.G ≡ A. Similarly, for weighed (either directed or undirected) networks, each graph G is completely specified by its weight matrix W, i.e.G ≡ W. We will keep our discussion completely general and use G to indicate a graph of any type (directed/undirected, binary/weighted).Thus G can always be thought of as a matrix with entries {g ij }, where g ij represents the (either binary or non-negative) weight of the edge (i, j).Any topological property X evaluates to X(G) when measured on the particular network G, i.e. it is an (arbitrarily complicated) function of the entries {g ij }.
Each graph G in the ensemble has an occurrence probability P (G) whose form is determined by the particular constrains enforced.This probability must always be such that where the sum runs over all graphs in the ensemble.The expectation value of a topological property X across the ensemble is the mean value (ensemble average) Let us denote the set of constraints {C a } by the vector C, where each C a is a topological property that, unlike any other generic property X, we need to tune to the particular value displayed by the real network G * .Enforcing the constraints exactly, i.e. allowing only the graphs G such that C(G) = C(G * ), results in a so-called microcanonical ensemble characterized by the uniform probability where N [ C(G * )] denotes the number of graphs in the ensemble for which the value of each constraint C a equals the value C a (G * ).Microcanonical graph ensembles are hard to deal with analytically, and they are most often sampled computationally by generating many randomized networks explicitly, using probabilistic rules that ensure that the constraints are matched exactly.Currently, such computational techniques are the only available method to randomize a real network.Unfortunately, the need to sample the ensemble explicitly and generating a large number of randomized graphs makes this approach computationally demanding, time consuming and beyond analytic control.
In order to develop a randomization method which is fast and analytically tractable, we exploit the results in ref. [14] and consider the alternative possibility to enforce the constraints on average, i.e. by only specifying their expectation values C .The resulting ensemble is a (grand)canonical one where each graph G is assigned a probability P (G) that maximizes the Shannon-Gibbs entropy subject to the constraints G P (G) = 1 and C = C. Maximizing the entropy subject to constraints is widely used in statistical mechanics, and in general for problems with incomplete information [47,48].The desired maximum-entropy graph probability can be found by introducing a set of Lagrange multipliers θ = {θ a } enforcing the constraints C = {C a }.The resulting conditional (on the value of θ) probability reads [14] P where H(G, θ) is the graph Hamiltonian defined as the linear combination and the normalizing quantity Z( θ) is the partition function, defined as The above results show that the graph probability P (G| θ) always depends on the value θ, which in turn depends on the constraints considered.As a consequence, we can rewrite eq.(A2) more explicitly as a function of θ: where • θ denotes that the ensemble average is evaluated at the particular parameter choice θ.The above expression clarifies that the expectation value of any topological property X depends on the specific enforced constraints through θ.Different choices of the constraints imply different values of θ, P (G| θ), and X θ .

Maximum-likelihood parameter estimation
As we mentioned, maximum-entropy graph ensembles generated by eq.(A5) have been used extensively to characterize mathematically networks with specified properties [9,11,14,21,22].However, previous studies did not focus on the randomization of a particular real network (which is our main interest here), but rather on the effects that the specification of certain structural properties has on other aspects of network topology.As a consequence, the Lagrange multipliers {θ a } have been considered as free parameters, generally drawn from carefully chosen probability densities [14,21,22] that allow analytical results, in terms of which the properties of the network model have been investigated.In most cases, the aim has been to explore the topological properties in the thermodynamic limit N → ∞, where N is the number of vertices of the network.This means that only generic statistical properties of real networks, such as a power-law degree distribution, were used to generate the ensemble.However, this implies that the specific properties of a particular real network (such as deviations of individual vertices from the fitted degree distribution, the intrinsic finiteness of the system, etc.) are ignored and, more importantly, that there is no correspondence between the vertices of the real network and those of the model.Thus this approach allows to inspect the properties of maximum-entropy graph ensembles, but does not allow the latter to be considered as null models of a particular real network.As a consequence, it cannot be used to detect empirical topological patterns consisting of statistically significant deviations from a null network model.
Here we make one step forward and construct, for a given choice of the constraints, the particular maximumentropy graph ensemble representing the family of correctly randomized counterparts of a given real network G * .Explicitly, we consider a grandcanonical ensemble of graphs with the same number N of vertices as the real network, and for a given choice of the constraints we fit the model defined by eq.(A5) to the empirical network G * .To this end, we exploit previous results [30] showing that maximum-entropy graph ensembles defined by eq.(A5) are a particular class of models for which the maximum-likelihood principle provides an excellent method of parameter estimation, since they are free from problems of bias afflicting other network models.In particular, it can be easily shown [30] that the log-likelihood to obtain the real network G * is maximized by the particular parameter choice θ * such that the ensemble average C a θ * of each constraint C a equals the empirical value C a (G * ) measured on the real network: where we have used • * as a shorthand notation to indicate the ensemble average • θ * evaluated at the particular value θ * .The above results means that the maximum likelihood principle indicates, for maximumentropy graph ensembles, precisely the parameter choice that ensures that the desired constraints are met.This is not true in general: in other network models, tuning the average values of the topological properties of interest to their empirical values requires a parameter choice which in general does not maximize the likelihood to obtain the real network [30], thus introducing a bias in the analysis.The idea to take the observed constraints C(A * ) as the input and find the 'hidden' values θ * that generate those constraints as the most probable ones was already proposed in ref. [30] with the purpose of checking whether θ * correlates with some external set of empirical non-topological quantities, thus unveiling possible mechanisms shaping the network topology.Here we make progress, noting that finding the values θ * represents a preliminary step in order to generate the randomized ensemble we are looking for, and have complete analytic control over it.This is completely independent of whether there are external empirical quantities correlating with θ * .
Note that in eqs.(A8) and (A10) the expectation values and the model parameters play inverted roles: while in eq.(A8) the expectation values are obtained as a function of the parameters θ which can be varied arbitrarily, in eq.(A10) the observed constraints, which are measured on the particular real network and are therefore given as an input, are used to fix the model parameters to the values θ * .Interestingly, this opposite line of research has been used quite extensively in traditional social network analysis (where maximum-entropy ensembles of networks are widely used under the names of p * , logit or exponential random graph models [49][50][51]) but has not yet been transferred to the randomization problem frequently occurring in complex networks theory.As we show below, the maximum-likelihood parameter choice is exactly what we need in order to obtain statistically correct expectations over ensembles of randomized variants of any particular real-world network.This allows to understand which properties of a real-world network can be simply traced back to the enforced constraints, and which require more complicated explanations.Another important difference with respect to the main approach followed in social network analysis is that our method allows to analyze weighted networks in exactly the same way as binary graphs, which are instead usually not studied within the p * framework.As a consequence, some of the analytical results we derive and use represent previously unavailable tools to study weighted networks (and maximum-entropy ensembles of them) through a straightforward analogy with binary networks.Finally, in all the applications we consider it is always possible to find the maximum-likelihood parameter values θ * exactly even for large networks, without resorting to the approximate techniques traditionally used in social network analysis [51].Therefore our approach extends in many directions the connection between exponential random graphs and maximum-entropy network ensembles, and strengthens considerably the existing relation between social science and network theory.

Expectation values of topological properties
Equation (A10) provides an implicit expression for the value θ * , and solving it is equivalent to maximizing eq.(A9).The numerical value of the solution θ * is the key ingredient we are looking for in order to detect topological patterns in the real network G * analytically, without performing any time-consuming computational randomization.Indeed, if we insert the value θ * into eq.(A8)we obtain which provides the exact expected value of any topological property X across the maximum-entropy graph ensemble where the expected values C of the topological properties C chosen as constraints are set equal to the empirical values C(G * ) measured on the real network G * , as ensured by eq.(A10).For simplicity, given a real network G * and a set of constraints C, we shall sometimes call X * the randomized value of the topological property X.The comparison of X * with the empirical value X(G * ) allows to assess whether, in the real network G * , the topological property X requires additional information besides that provided by the properties C. If X(G * ) is sufficiently close to X * (within a statistical error that we determine in A 4), one can conclude that the enforced constraints C fully explain the property X.By contrast, if X(G * ) is significantly different from X * , then the properties C do not explain the property X, which means that the structure of G * is determined by other factors besides those determining C.This allows to assess which topological properties can be traced back to (i.e.explained by) the chosen constraints in any real network, and which can not.Trivially, if X is one of the properties among the enforced constraints (i.e. if X = C a for some a), then eq.(A10) implies X(G * ) = X * by construction.
Note that any other parameter choice θ = θ * would not enforce the chosen constraints and would yield an expectation value X θ different from the desired one, i.e. not corresponding to the correct randomized value X * for that particular network and for that particular choice of the constraints.This clarifies why previous results [9,11,14,21,22] about the properties of maximum-entropy ensembles, that were obtained using θ as a free parameter unrelated to the empirical values C(G * ) and to the real network G * itself, cannot be used in order to solve the pattern detection problem considered here.Also note that C(G * ) is the sufficient statistics of our problem, which completely determines θ * through eq.(A10) and consequently any randomized property X * .The knowledge of the other topological properties of the real network G * is useless.This means that two real networks G * 1 and G * 2 with exactly the same values C(G * 1 ) = C(G * 2 ) of the constraints generate the same maximum-entropy ensemble, and give rise to the same value of θ * and X * , as should be.
Clearly, the possibility to solve eq.(A10) and to obtain the randomized properties through eq.(A11) both depend on whether one manages to rewrite the formal expression for X θ in eq.(A8) in a simplified form that avoids the unfeasible actual enumeration of all graphs {G} in the ensemble.In practical terms, this means that not all specifications of the constraints C allow to solve eq.(A10) and obtain θ * , and not all topological properties X allow to be averaged exactly through eq.(A11).However, as we describe in B, the first step can always be carried out successfully whenever one considers local constraints as the ones of interest for us.Similarly, as we now show in general and then restate more explicitly in each particular case, the expectation value X * of any higher-order topological property X of interest can be rewritten, either exactly or approximately, in a way that is only as complicated as measuring X(G * ) on a single network, rather than on all graphs {G} in the ensemble.This represents a major advantage of our method: the computation of an expectation value across the entire ensemble of graphs is only as time-consuming as the computation of the corresponding observed value on the empirical network G * .Thus, if the observed value can be computed in reasonable time, the same is true for the expectation value.To see this, we write down an approximated expression for X * as a Taylor expansion.Note that any property X(G) depends in general on all the entries {g ij } of the matrix G, which are the fundamental degrees of freedom of the problem.The ensemble average of g ij reads If we define the gradient matrix of any topological property X(G) as and if we denote by G the matrix whose entries G ij are the expectation values g ij , it is possible to expand X around G and write the multidimensional firstorder Taylor expansion In the above expression, (•) G= G means that we are evaluating the quantity in brackets by replacing each g ij with g ij , and denotes the scalar product of two matrices A and B, and the double sum runs over all N (N − 1) ordered pairs of vertices (with i = j).Note that for an undirected network, where g ij = g ji by construction, half of the terms in the sum in eq.(A14) will be equal to zero, since one has either ∂X/∂g ji = 0 or ∂X/∂g ij = 0, depending on whether g ij or g ji appears in the formal definition of X.With the above approximation, the expectation value of X reads since the first-order terms vanish.The above formula shows that, if one evaluates X by simply replacing G with G into X(G) (linear approximation), the difference with respect to the exact expectation value is only in the second-and higher-order terms.This is true for any value of θ, on which all expectation values depend.As already explained, our method consists in choosing the particular value θ * solving eq.(A10), which yields an expectation value Among all possible parameter values θ, the choice of θ * ensures that the deviation of the approximate value X( G ) from the exact one X(G) in eq.( A14) is minimal, since G * is as close as possible to G, if the chosen constraints C are chosen as a reference to measure the difference between G and G.In particular, when X coincides with one of the enforced constraints C a , eq.( A17) becomes an exact expression, as we mentioned.Moreover, as we show later in B, most topological properties of interest in our analysis are either multilinear functions of statistically independent matrix elements {g ij } or ratios of multilinear functions.In the former case, the expectation value X * is exactly X( G * ).In the latter case, the numerator and denominator will be separately evaluated exactly, and the approximation will only affect the ratio.In general, ratios of averages can be very different from averages of ratios.However we confirmed (see figs.2 and 3) that our estimates for the ratios are in very good accordance with what is obtained in the microcanonical case using the LRA, where averages of ratios are evaluated exactly.Moreover, recall that we are interested in determining an interval of statistically significant values around X * , rather than X * alone.Our results (figs.2 and 3) also show that the difference between the microcanonical and (approximate) grandcanonical value of X * is typically much smaller than the standard deviation of X (that we obtain below), so using eq.(A17) is in any case a very good way to proceed.The above discussion clarifies that a good approximation to the randomized value X * of any topological property of interest is given by simply replacing each g ij with g ij * in the definition of the property X(G), in the same way as the empirical value X(G * ) is obtained by replacing each g ij with the observed entry g * ij of G * in the definition of X(G).This means that the empirical value X(G * ) (if the full adjacency matrix is used, see main text) and the approximate randomized value X( G * ) require exactly the same computational time, which makes our method faster than any other available alternative approach (and in general as fast as possible).Clearly, in order to evaluate eq.(A17) the complete knowledge of the values is required.While for generic choices of C it may be impossible to obtain the formal expression for g ij θ and/or the particular parameter value θ * , in B we show that local constraints always allow to obtain g ij * exactly.This makes the problem analytically solvable, and implies that our method becomes very simple in all the applications of interest.

Variances of topological properties
As we mentioned, another important advantage of our method is the possibility to obtain, besides the expectation value, the analytical expression for the standard deviation of any topological property of interest.This provides a statistical error allowing to detect significant deviations of any empirically observed topological quantity X(G * ) from its randomized value X * .To this end, we employ the fundamental expression relating the variance of a function of many random variables to the variances of the latter, whose most popular consequence is the general formula for the propagation of errors in experimental measurements.In our notation, the variance of a topological property X across the ensemble is defined as (which depends on θ).Using the linear approximation in eq.(A14) we can write where is the covariance of g ij and g ts , and For the 'diagonal' terms given by i = t and j = s, the covariance σ[g ij , g ts ] equals the variance (again, both σ[g ij , g ts ] and σ 2 [g ij ] depend on θ).In a different context where X is a function of many experimental quantities {g ij }, eq.(A21) provides the general formula for the propagation of errors (from {g ij } to X), if the measured value of g ij is used as the best estimate for g ij , and if its experimental error is used in place of σ[g ij ].Here, we do not need approximate estimates for g ij and σ[g ij ], since both quantities can be completely specified: even if there is always a single observation, i.e. the real network G * , the latter generates the entire ensemble of graphs which is described by the probability P (G| θ * ), as we discussed in detail in A 2.
As for the expectation value X * , our approach proceeds by evaluating the standard deviation σ[X] * at the particular parameter value θ * solving eq.(A10): Note that, as for the expected values, the above standard deviation makes use of the linear approximation and is therefore not exact.However, when we measured also the microcanonical standard deviations, we found an excellent agreement with our grandcanonical ones (see fig. 3a-b), showing that the errors on the estimates of our standard deviations are small.Equations (A24) and (A25) show that the values are the fundamental quantities, besides the averages g ij * given by eq.(A18), required in order to obtain the standard deviation σ * [X] of any topological property X.For generic choices of the constraints C, obtaining the value of g ij g ts * can be very complicated or even impossible, as we already discussed for g ij * .However, as we will show, local constraints always allow to evaluate analytically all the covariances, and hence the standard deviation σ * [X] of any property X.
Equation (A24) is the key expression providing the statistical error associated with X * .For any topological quantity X, our method allows to assess whether the empirical value X(G * ) is consistent with the randomized value X * within z standard deviations (where z is a conveniently chosen threshold value), i.e. whether As long as the above inequality holds, it is legitimate to say that the particular property X evidences no significant deviation of the real network G * from a null model where the constraints C are specified.This means that the observed value X(G * ) requires no explanation besides those accounting for the observed values C(G * ) of the constraints.By contrast, if the above inequality is violated, then one has a signature that the observed network G * is not completely a result of the specification of the constraints C. Additional mechanisms, besides those determining the values of the constraints, are at work.In other words, higher-order patterns which cannot be traced to low-level constraints are present, and our method is able to detect them.In practice, in order to discriminate between the two possibilities, it is useful to compute the two values which delimit the region within which an observed value X(G * ) would imply the acceptance of the null model from the one where an observed value X(G * ) would imply the rejection of the null model.As an alternative, rather than fixing a threshold value for z, one can directly compute the number of standard deviations by which the expected and the empirical value of X differ, i.e. the zscore ) is substantially larger (smaller) than expected, while small values signal no significant deviation from the null model (note however, as mentioned in the main text, that z-scores are easily interpretable only for normally distributed properties).This concludes the description of our method in its general form.In what follows, we consider the particular case of interest for the present analysis, i.e. when the constraints C are (either binary or weighted) local topological properties, or when they are nonlocal but simple enough to preserve the analytical character of the method.

Appendix B: LOCAL CONSTRAINTS
The most important case is when the constraints C are local (or first-order) topological properties, i.e. properties determined by moving only one step away from a vertex, thus reaching only its first neighbours.In binary undirected networks the fundamental local property is the degree k i = j =i a ij , while in weighted undirected networks the corresponding quantity is the strength s i = j =i w ij .In directed networks, a pair of inward and outward variants of the same quantities (i.e. the in-degree k in i and out-degree k out i , or the instrength s in i and out-strength s out i ) characterizes the local properties of each vertex.Choosing local constraints is the natural option when one is interested in understanding the effects that the specification of low-order properties, involving only direct interactions, has on higherorder properties involving longer chains of interactions.In what follows, we therefore discuss our method in detail in the particular case of local constraints.We will consider both binary and weighted networks, and both undirected and directed links.Importantly, we will show that in all these cases the graph probability P (G| θ) factorizes as where the product runs over all unordered pairs of vertices (i, j) (with i < j) and D ij (g, g | θ) is the dyadic probability that the pair (g ij , g ji ) takes the particular value (g, g ), i.e. the joint probability that g ij = g and simultaneously g ji = g .Clearly, where g and g run over all the allowed values for g ij and g ji (g = 0, 1 for binary networks, while g = 0, 1 • • • + ∞ for weighted networks; the same for g ).The marginal probability that g ij takes the particular value g, independently of the value of g ji , is and, consistently with eq.(B3), is normalized such that Note that for undirected networks, where g ij = g ji by construction, we have where δ g,g = 1 if g = g and δ g,g = 0 if g = g .
The factorization of P (G| θ) according to eq.(B1) implies that eq.(A12) can be rewritten as which can always be obtained analytically.Using the latter, eq.(A10) can be simply rewritten exactly as which allows the maximum-likelihood parameter values θ * appearing in G * to be easily calculated numerically.
Alternatively (e.g.depending on the software used) one can calculate θ * by directly maximizing the log-likelihood defined in eq.(A9), which in this case takes the simpler form (we always adopted the maximization of the loglikelihood).In both cases, even for very large networks this preliminary parameter estimation takes a negligible time with respect to the calculation of any nontrivial topological property.This implies that eq.(B7) can always be evaluated exactly at the particular parameter choice θ * , providing the correct value g ij * in terms of which the ensemble average X * of any topological property X can be obtained analytically through eq.(A17).Thus, as we discussed, the time required to obtain X * (which formally is an average over all possible graphs in the ensemble) is just the same as that required in order to measure X(G * ) on the real network G * .This makes our method incredibly faster than other randomization procedures that require the actual computational generation of many randomized variants (necessarily sampling only a part of the ensemble) of the real network, on each of which X must be computed explicitly before performing a final average approximating X * .
The standard deviation σ * [X] of any property X can be evaluated very easily as well.Equation eq.(B1) implies that if (i, j) and (t, s) are two distinct pairs of vertices then By contrast, if i = t and j = s then Finally, if i = s and j = t we have Again, all the above quantities can be obtained analytically and evaluated exactly at the particular value θ * solving eq.(B8).As a consequence, if eqs.(B11), (B13) and (B15) are inserted into eq.(A21),we find that the expression for the variance σ 2 [X] of any topological property X reduces from eq.(A24) to the simpler formula involving only a single sum over pairs of vertices.In the above expression, we have kept our convention to let the sum run always over all possible ordered pairs of vertices, thus considering the pairs (i, j) and (j, i) as distinct terms in the summation.For ensembles of directed networks, g ij and g ji are different random variables which may or may not be dependent on each other (depending on the enforced constraints, as we show in detail below).Equation (B16) takes care of both possibilities by including the covariance σ * [g ij , g ji ].For ensembles of undirected networks, g ij and g ji are actually the same random variable and are thus perfectly correlated, which Again, eq.(B16) takes care of this by compensating the summation over a doubled number of terms with the presence of the covariances which exactly restore the correct expression.In such a way, one does not have to care whether the network is undirected when using eq.(B16), which therefore applies without modifications to all the cases we will consider below.Different cases only differ by the specific expression of σ * [g ij , g ji ].This is very convenient when implementing the formula computationally.Another desirable consequence of formally treating g ij and g ji as different variables even in undirected networks is that in eq.(B16) the derivative ∂X/∂g ts of any function X(G) of (a subset of) the entries {g ij } can always be computed by repeatedly applying the elementary rule (where δ ij = 1 if i = j and δ ij = 0 if i = j) for both directed and undirected graphs.
Summarizing the results discussed so far, we showed that for local constraints our method allows g ij * , g 2 ij * and g ij g ji * to be computed exactly, and to use them in order to obtain the expected randomized value X * and standard deviation σ * [X] of any topological property X through eqs.(A17) and (B16) respectively.Unlike alternative computational methods, our approach is completely analytical and allows to evaluate the randomized value X * in just the same time as that required to measure X on the original real network G * , plus a negligible preliminary time required to find the parameter values θ * numerically through eq.(B8).The simple steps through which our method proceeds in the case of local constraints can be summarized as follows: 1. choose the desired representation for the real network G * (directed/undirected, binary/weighted) and the corresponding grandcanonical ensemble of graphs {G}; 7. assess whether the empirical value X(G * ) is consistent with the randomized one X * using either the interval in eq.(A28) or the z-score in eq.(A29).
For completeness, in the above list we have included all the logical steps involving also the initial derivation of the required analytical expressions.However, since those expressions have already been derived in the literature for all the constraints we will consider in what follows, in practice our method reduces to a straightforward application of the last three steps.For clarity, in what follows we illustrate the method explicitly for a range of useful specific cases, i.e. for various choices of the constraints C and of the topological properties X.We will also highlight in more detail the advantages with respect to alternative methods.

Undirected configuration model
For unweighted undirected networks, each graph G in the ensemble is uniquely specified by its binary symmetric adjacency matrix A with entries a ij = a ji = 1 if vertices i and j are connected, and a ij = a ji = 0 otherwise.Generally, one considers loop-less graphs with a ii = 0 unless otherwise specified.This fixes the first step of our method according to the list shown above.Thus we can replace G → A and g ij → a ij in our general notation used so far.
Given a real binary undirected network A * with entries {a * ij } and degree sequence k(A * ), our method allows to compare the properties of A * with those displayed by a randomized ensemble of binary undirected graphs having, on average, the same degree sequence as A * .As we mentioned in sec.II, the available methods have severe limitations.In particular, as noted in refs.[9] and [30], the incorrectness of eq.( 1) is a consequence of the fact that it is not a proper maximum-entropy probability over the ensemble of binary graphs, i.e. it cannot be traced back to a Hamiltonian model as the ones described in A 1. By contrast, our method provides the correct solution.The appropriate choice is to include the constraint C = k into eq.(A6) and obtain the corresponding correct probability [14].This is precisely what the steps 2-4 of our method prescribe.For the sake of completeness, we briefly sketch the main results.If C(A) = k(A), the Hamiltonian reads The partition function can be calculated exactly [14] as Therefore the graph probability can be written in the factorized form (B1) as follows where is the mass probability function of a Bernoulli-distributed binary random variable a ij , with success probability representing the probability that a link between i and j is present.Introducing the new variable x i ≡ e −θi , not to be confused with the symbol X used so far, and changing notation from θ to x, the expectation value of a ij is simply given by Also, since a 2 ij = a ij , the second moment is Finally, if (i, j) and (t, s) are two distinct pairs of vertices, then a ij and a ts are independent random variables and This completes the fourth step in our method.The fifth step consists in finding the particular parameter values x * that maximize eq.(B9), that in this case reads Equivalently [30], the parameters x * can be found solving the following N coupled equations enforcing the desired constraints as in eq.(B8): Importantly, since x i ≡ e −θi and θ i is a real number, the solution we are looking for is the one where x * i > 0 ∀i.This solution is unique.Even for large networks, the above parameter estimation ranges from seconds to tens of seconds even on an ordinary laptop.
Once the parameters x * are found, we can proceed to the sixth step and exploit eq.(A17) to obtain the expectation values of the properties X of interest: In particular, the expectation value of the ANND defined in eq.( 3) is and the expectation value of the clustering coefficient defined in eq.( 4) is where Similarly, the standard deviation σ * [X] can be evaluated using eq.(B16), which here reads It is straightforward to obtain σ * [X] in terms of x * alone, by using the derivation rule (B17): This can also be implemented symbolically in adequate softwares.Let us calculate explicitly the standard deviations of the constraints: which in turn imply that (which also represents an upper-bound for the ratio).The more this condition is violated (the vertex i has an high degree, there are hubs in the network, etc.), the more important becomes the correction, lowering the ratio to eventually reach zero.

Directed configuration model
Binary directed networks have an asymmetric adjacency matrix A with entries a ij = 1 if a directed link from i to j is there, and a ij = 0 otherwise.Given a real binary directed network A * with out-degree sequence k out (A * ) and in-degree sequence k in (A * ), our method provides analytical expressions for the expectation values and standard deviations of topological properties across the maximum-entropy ensemble of binary directed graphs with out-degree sequence k out (A * ) and in-degree sequence k in (A * ).The Hamiltonian is now The partition function can be calculated exactly [14] as The graph probability is now and Setting x i ≡ e −αi and y i ≡ e −βi , the expectation value of a ij is The second moment is Finally, if (i, j) and (t, s) are two distinct pairs of vertices, now including the case (t, s) = (j, i), then a ij a ts x, y = a ij x, y a ts x, y (B42) The log-likelihood (B9) to maximize is and the values x * , y * that realize the maximum can alternatively be found by solving the 2N coupled equations corresponding to eq.(B8).Again, we are looking for the solution where x * i > 0 and y * i > 0 ∀i.Expectation values can still be obtained using eq.(B28).In particular, the directed ANNDs defined in eqs.( 9) and ( 10) have expectation values where a ij * = x * i y * j /(1 + x * i y * j ).Similarly, the standard deviation σ * [X] can still be evaluated through eqs.(B31) and (B32), now using σ * [a ij ] = x * i y * j /(1 + x * i y * j ).Let us calculate explicitly the standard deviations of the constraints: Given the vertex i, if p * ij 1, j = 1 . . .N and j = i, the trend decreases as (k out i ) −1/2 (which also represents an upper-bound for the ratio).The more this condition is violated (the vertex i has an high out-degree, there are in-degree hubs in the network, etc.), the more important becomes the correction, lowering the ratio to eventually reach zero.Similar observations hold for the in-degrees.

Weighted configuration model
When weighted undirected networks are considered, each graph G in the ensemble is specified by its nonnegative symmetric matrix W whose integer entry w ij represents the weight of the link between vertices i and j (including w ij = 0 if no link is there).Thus we can replace G → W and g ij → w ij in the general notation.As we mentioned in the main text, in the weighted configuration model a real weighted undirected network W * with entries {w * ij } is compared with a maximumentropy ensemble of weighted undirected graphs having the same strength sequence s(W * ).In our method, by setting C = s into eq.(A6)we obtain the Hamiltonian The partition function is [22] Z and is only defined if θ i > 0 ∀i.The graph probability is [22] P where is the mass probability function of a geometricallydistributed [23] integer random variable w ij , with success probability representing the probability that a link between i and j is present.Introducing x i ≡ e −θi ∈ [0, 1), the expectation value of w ij is Now in general w 2 ij = w ij , and the second moment is Finally, if (i, j) and (t, s) are two distinct pairs of vertices, then The log-likelihood (B9) reads and the parameters x * maximizing it solve the following N coupled equations enforcing the desired constraints as in eq.(B8).Now the solution must be looked for in the region 0 ≤ x i < 1 ∀i.
Through the parameters x * we obtain the expectation values of the properties X of interest: For instance, the expectation value of the weighted ANND defined in eq.( 19) is where we have used W * = W * (see main text).Similarly, the weighted clustering coefficient defined in eq.( 20) has expectation value where The rule (B17) here reads and allows to obtain σ * [X] in terms of x * alone.Let us calculate explicitly the standard deviations of the constraints: Given the vertex i, if w ij * 1, j = 1 . . .N and j = i, the trend decreases as s −1/2 i .The more this condition is violated (the vertex i has an high strength, there are 'strength-hubs' in the network, etc.), the more important becomes the correction.Note that for weighted networks the second term has a positive sign.This means that the correction 'increases' the s −1/2 i trend which now represents a lower-bound for the coefficient of variation.

Appendix C: NONLOCAL CONSTRAINTS
Our model can also be applied to more complicated cases where the constraints are no longer local.However, a necessary condition for our method to work with nonlocal constraints is that eq.(A8) can still be expressed exactly in a form which does not require the enumeration of all possible graphs (in other words, the partition function can be calculated analytically).In such a case, eq.(A10) can still be used to calculate the parameters θ * exactly as in the local case, and at the same time those parameters can be used to obtain the analytical expressions for the expected value and standard deviation of the topological properties of interest.Therefore, only a limited number of nonlocal constraints lend themselves to an analytical treatment.However, since the philosophy of randomization algorithms is always to enforce the simplest constraints in order to detect higher-order patterns, it turns out that the mathematically tractable constraints are also the ones of major interest.We now provide an explicit example of a choice of nonlocal constraints that is often used in empirical studies, and at the same time preserves the analytical character of our method and yields exact results.

Reciprocal configuration model
As discussed in the main text, a more constrained null model for a binary directed network A * is one where the three reciprocal degree sequences k → (A * ), k ← (A * ) and k ↔ (A * ) are specified, where The Hamiltonian for this model is The nonlocality is manifest in the fact that, unlike the previous examples, now the (second-order) constraints involve products of two adjacency matrix entries.Despite this complication, the partition function can still be calculated exactly [11] as The graph probability can still be expressed in the form (B1), i.e.
In the above expression, is the dyadic probability defined in terms of where we have set x i ≡ e −αi , y i ≡ e −βi and z i ≡ e −γi [22].The above expressions represent the dyadic expectation values.
A little algebra leads to the log-likelihood and the values x * , y * , z * that realize the maximum can alternatively [30] found by solving the 3N coupled equations corresponding to an example when eq.(A10) can be written explicitly even if the constraints are nonlocal.We are looking for the solution where x * i > 0, y * i > 0 and z * i > 0 ∀i.
The expectation values of topological properties involving products of dyadic terms can be obtained exactly without resorting to the linear approximation in eq.(A17).For instance, the number of occurrences of a particular motif m, where m labels one of the possible 13 non-isomorphic connected motifs with three vertices, is where now which are both known exactly in terms of eqs.(C10)-(C13).The calculations for the standard deviations of the constraints are similar to the directed configuration model case: which in turn imply that (where a =→, ←, ↔) and similar observations hold.

Appendix D: COMPARISON WITH COMPUTATIONAL MICROCANONICAL ALGORITHMS
The LRA-based microcanonical approach [4,5] and our likelihood-based grandcanonical approach are in general not equivalent for finite networks.Let D( C) be the set of all graphs G that realize the enforced constraints C = {C α } exactly.Both approaches assign equal probabilities to all graphs that realize the constraints, i.e.P (G 1 ) = P (G 2 ) if G 1 ∈ D( C) and G 2 ∈ D( C).Also, in both approaches these graphs are the most likely to occur, i.e.P (G 1 ) > P (G 2 ) for any G 1 ∈ D( C) and G 2 / ∈ D( C).However the two approaches are different, the microcanonical one being very severe in assigning FIG. 8. Difference between the LRA-based microcanonical approach and our likelihood-based grandcanonical approach.Top: the microcanonical approach assigns non-zero probability only to the subset D( C) of graphs that realize the enforced constraints C (in the example shown, a given value of the number of links L) exactly.Bottom: by contrast, our grandcanonical approach assigns non-zero probability to all graphs, but this probability reaches its maximum value for the graphs belonging to D( C).In so doing, it is more robust to potential errors in the original network data (such as missing links).
zero probability to any graph where the degrees are not matched exactly, i.e.P (G) = 0 if G / ∈ D( C).By contrast, in the grandcanonical approach all possible graphs can occur, even if with very different probabilities, in such a way that the ensemble average of the desired constraints over all graphs coincides with the observed values (see fig. 8 for an illustration of this difference).
The above key and elegant property places grandcanonical ensembles at the basis of information theory.Notably, they are more robust to errors in the original data such as missing or overrepresented links.In presence of even a small percentage of such errors, the 'true' graph (the unobserved one affected by errors) will never appear in the microcanonical ensemble, while it will appear with nonzero probability in the grandcanonical ensemble.As desirable, for small deviations from the observed graph the true graph will have a slightly decreased probability with respect to the one assigned by our method to the observed graph, while for larger errors the probabilities will differ by a larger amount.
Therefore, while for infinite systems the microcanonical and grandcanonical ensembles become equivalent since fluctuations about the average values become negligible, in finite systems the use of grandcanonical ensembles is preferable.What is of interest for us here is the impact of the two methods on the topological properties induced on the randomized networks.To this end, we now show explicitly the relation between the two ap-proaches when applied to particular networks.We shall only consider unweighted networks for simplicity.
In the unweighted (either directed or undirected) case, our method directly provides 'from the beginning' the explicit values of the probabilities p G ij that a link from i to j is there.The superscript G stands for 'grandcanonical', and the probability is evaluated at the parameter values that maximize the likelihood, as described above.By contrast, the microcanonical approach samples the configuration space iteratively, and the microcanonical probability p M ij that a link from i to j is there can only be evaluated as the frequency of occurrence of the link over many randomizations.As the number of randomized networks increases, this frequency will converge to p M ij .However this asymptotic value will also depend on the number R of elementary rewiring steps used to obtain a single randomized network.To see this, consider the trivial case R = 0.As no rewiring takes place, all the 'randomized' networks will in fact coincide with the original network.If the adjacency matrix of the latter has elements {a ij }, this means that p M ij = a ij .If R is nonzero but still very small, p M ij will not change substantially.Only if R is large enough then p M ij will approach p G ij .This is shown explicitly in fig.9, where we plot p G ij as a function of p M ij for all directed pairs of vertices (i, j) by taking the Little Rock Lake food web as the starting network.As R increases from R = 0 to R = 10000, the double-peaked shape (corresponding to p M ij = a ij independently of p G ij ) evolves towards the identity p M ij = p G ij .Similar evolution patterns are observed for all the networks we analyzed.This clearly shows that in our method we obtain 'from the beginning' the values p G ij to which the microcanonical p M ij will converge only after several iterations.Notably, the number R of rewiring steps required for p M ij to converge to p G ij acceptably is not known a priori and without the knowledge of p G ij itself.This problematic aspect of the microcanonical approach highlights another advantage of the grandcanonical one.
The two approaches are in general not equivalent for finite networks.We can now state this more rigorously, and indicate at least two ways in which they may differ.
First of all, p M ij represent marginal probabilities, where the information about the correlations between the presence of a link between different pairs of vertices has been lost.While in the grandcanonical approach these correlations are absent, and different pairs of vertices are always statistically independent, in the microcanonical approach some weak correlations will be preserved even after many rewiring steps.These correlations arise from the microcanonical constraint of matching the degree sequence (or other contraints) exactly.Thus, while our grandcanonical method enables to compute the expected topological properties exactly, in the microcanonical approach this is not possible.
Secondly, the final 'convergence' of p M ij to p G ij for R → ∞ will in general not hold exactly.This means that the asymptotic plot of p G ij versus p M ij will not be a strict identity, but a narrow scatter of points close to the identity.In other words, increasing R beyond a certain value will not make the quantities converge further.For some networks (such as the Little Rock Lake food web shown above) one may attain a better convergence than for others.
It is interesting to understand whether the degree of convergence between the two approaches depends on some property of the network.To this end, we first define two measures of discrepancy between {p G ij } and {p M ij }, and then study how they behave on well-controlled, artificially generated networks.As measures of discrepancy, we consider the l 2 distance and the Kullback-Leibler information distance (note that we have normalized the above distances in such a way that both lie in the range [0, 1]).It is instructive to use these distances to compare the two methods on a family of artificially generated networks.We considered N = 100 vertices, assigned each vertex a real value x i drawn randomly in the interval [0, 1], and established an edge between each pair of vertices i and j with probability p ij = zx i x j /(1 + zx i x j ).
This choice generates maximally random networks with degree distribution controlled by {x i } as in eq.( B23), but has an additional parameter z that tunes the overall link density d ≡ 2L/N (N − 1), representing the fraction of realized links.With {x i } kept constant, we considered various choices of z and, for each of them, adopted both the microcanonical randomization and our grandcanonical method.
In fig. 10 we show the resulting difference between the marginal probabilities {p G ij } and {p M ij }, as a function of link density.The two methods yield very similar results for both small and large link density, whereas for intermediate density values they display a greater difference.Even in this case, however, the distances between them are ∆ l 2 ≈ 0.05 and ∆ KL ≈ 0.12, both small considering their possible range of variation.

FIG. 1 .
FIG. 1.An illustration of the local rewiring algorithm whose iteration allows to computationally explore the microcanonical configuration model.
FIG.2.Application of our method to binary undirected networks.The red points are the empirical data, the black solid curves are averages over the configuration model obtained using the local rewiring algorithm[4,5], and the blue dashed curves are the analytical expectations (± one standard deviation) obtained using our method.The green curves are the flat expectations under the Erdős-Rényi random graph model, and highlight the average level of correlation in the random case.The panels report k nn

FIG. 3 .
FIG.3.Application of our method to directed networks.Red points are the empirical data, the black solid curves are expectations under the directed configuration model using the local rewiring algorithm, and the blue dashed curves are the exact expectations obtained using our method (± one standard deviation).The green curves are the flat expectations under the directed version of the Erdős-Rényi random graph model.The panels report k nn,in i

FIG. 4 .
FIG.4.Application of our method to small-size directed food webs.Red points are the empirical data and the blue dashed curves are the exact expectations (± one standard deviation) under the directed configuration model obtained using our method.The green curves are the flat expectations under the directed version of the Erdős-Rényi random graph model.The panels report k nn,in i

FIG. 5 .
FIG.5.Legend: • -Chesapeake Bay, -Little Rock Lake, -Maspalomas Lagoon, -Florida Bay, * -St Marks Seagrass, -Everglades Marshes, • -Grassland, -Ythan Estuary.Application of our method to the analysis of directed motifs involving three vertices in 8 real food webs.Top panel: zscores obtained enforcing only the in-degree and out-degree sequences (directed configuration model).Bottom panel: zscores obtained enforcing also the reciprocal degree sequence (reciprocal configuration model).
FIG.6.Application of our method to weighted undirected networks.Red points are the empirical data and the blue dashed curves are the exact expectations obtained using our method (± one standard deviation).Green dashed curves are the flat expectations under the weighted random graph model, WRG[23].The panels report knn

FIG. 7 .
FIG. 7. The panels report a) the ratios σ * [ki]/ki plotted versus the degrees ki for the binary undirected networks of fig.2, b) the ratios σ * [k out i ]/k out i and σ * [k in i ]/k in i plotted versus the degrees k out i and k in i , respectively, for the binary directed networks of fig.3 and fig.4 and c) the ratios σ * [si]/si plotted versus the strengths si for the weighted undirected networks of fig.6.The food webs are indicated by means of symbols.The black dashed line is the function f (x) = x −1/2 which is expected to well reproduce the coefficients of variation for small values of the constraints.

)
Given the vertex i, if p * ij 1, j = 1 . . .N and j = i, the trend decreases as k −1/2 i

a m, 1 ij
a m,1 ij a m,2 jk a m,3 ki (C14)where a m,l ij is one of the four possible dyadic relations a → ij , a ← ij , a ↔ ij , a ij , and {a m,1 ij , a m,2 jk , a m,3 ki } indicates the specific triplet of dyadic relations defining motif m.The exact expectation value of N m is * is given by evaluating eqs.(C10)-(C13) at the particular values x * , y * , z * .The standard deviation of N m , and in general of a topological property X, can still be obtained using eq.(B16), i.e.(σ * [X]) 2 = i,j σ * [a ij ] σ * [a ij , a ji ] ∂X ∂a ij ∂X ∂a ji A= A * + . . .