Geometric randomization of real networks with prescribed degree sequence

We introduce a model for the randomization of complex networks with geometric structure. The geometric randomization (GR) model assumes a homogeneous distribution of the nodes in an underlying similarity space and uses rewirings of the links to find configurations that maximize a connection probability akin to that of the $\mathbb{S}^1$ or $\mathbb{H}^2$ geometric network models. However, GR preserves the original degree sequence, as in the configuration model, thus eliminating the fluctuations of the degree cutoff. Moreover, the model does not require the explicit estimation of hidden degree variables, which restricts the number of free parameters to one, controlling the level of clustering in the rewired network. We illustrate the potential of GR as a null model by investigating the effects on modularity that derive from the flattening of geometric communities in both real and synthetic networks. As a result, we find that for real networks the geometric and topological communities are consistent, while for the randomized counterparts, the topological communities detected are attributable to structural constraints induced by the underlying geometric architecture.


INTRODUCTION
Null models play a central role in network science and statistics to discern regularities and patterns in the fabric of systems that are not attributable to specific constraints. Typically, null models of complex networks are fit with one or several particular structural properties, depending on the question at hand, to predict the organization of a network as the outcome of a random process where other features are allowed to vary. Hence, null models are said to produce maximally random ensembles given some specific features [1]. Successful applications of null models in complex networks include the detection of rich-club ordering [2,3], the characterization of structural correlations in weighted networks [4], and the quantification of communities using modularity [5].
Intriguingly, the frontier separating models and null models is not so neat, especially when the models remain simple and the null models fix more than one property. In fact, some famed network models, originally born to explain some peculiarity of the structure of networks on the basis of first principles, are often used as null models. One example is the growing Barabási-Albert model [6], which explains the generation of scale-free degree distributions through a preferential attachment mechanism. Recently, a class of network models in hidden metric spaces [7,8] has been shown to explain many pivotal features of real networks simultaneously (like the small-world property, heterogeneous degree distributions, high levels of clustering, and self-similarity) based on only three parameters, controlling the average degree, the exponent of the power-law degree distribution, and the clustering coefficient.
The key ingredient of the geometric network models is the fact that the probability to connect two nodes of the network is determined by their effective distance, as measured in a hidden metric space in which nodes are embedded. The underlying space is defined along two dimensions representing popularity and similarity features of the nodes, such that more popular and similar nodes have more chance to interact. In the $\mathbb{S}^1$ model [7], the hidden degree of a node is a proxy for its popularity, and its angular position in the one-dimensional sphere (or circle) provides the similarity measure. The two coordinates contribute explicitly to the connection probability between two nodes, which increases with the product of their hidden degrees and decreases with their angular distance along the circle. The hidden degree can be estimated by the observed degree and reinterpreted as a radial coordinate in a hyperbolic plane [9], which leads to the formulation of an isomorphic version of the model which is purely geometric. In the $\mathbb{H}^2$ model, popularity takes the form of a radial coordinate in the hyperbolic disk, such that higher degree nodes are placed closer to the center, while the angular coordinate remains as in the $\mathbb{S}^1$ similarity space, and the probability of connection decreases with the hyperbolic distance.
In both the $\mathbb{S}^1$ and $\mathbb{H}^2$ models, the angular coordinate of nodes, representing the similarity dimension, is drawn from a homogeneous distribution, at odds with hyperbolic maps of real networks [10]. In fact, geometric communities of nodes lying nearby in the similarity space (referred to as soft communities or latent communities) are typically detected in real networks [10-12] and can be modeled [13,14]. This observation opens the door to the use of geometric models with homogeneous similarity distribution as null models for the investigation of the community organization and other structural properties of geometric networks.
In this paper, we introduce a variant of the popularity-similarity geometric model, which we name the geometric randomization (GR) model, and illustrate its use as a null model for the analysis of the topological properties of real networks, including community structure. The GR model assumes the same form of the connection probability as in the $\mathbb{S}^1$ or $\mathbb{H}^2$ models, and a homogeneous distribution for the similarity coordinate as well. In contrast, it is fit with a given degree sequence, like the configuration model [15]. The use of prescribed degrees makes it possible to skip the step of estimating the hidden degrees from real data. It could also help, for instance, in the analysis of features which are especially sensitive to fluctuations of the degree cutoff, like the behavior of dynamical processes such as epidemic spreading or synchronization, or for high-fidelity reproduction of real network topologies. Based on the premises mentioned above, we propose an algorithm that homogenizes the similarity distribution and rewires the links in a network, preserving the given degrees, to maximize the likelihood that the new topology is generated by the geometric model. We analyze the effects of the GR model on the topological properties of real and synthetic geometric networks, and use it as a null model to explore the effects on modularity of the flattening of geometric communities in the similarity space.

THE GEOMETRIC RANDOMIZATION MODEL
The GR model operates on networks where nodes have an observed degree and exist in a similarity space. The similarity space is taken to be a circle, as in the $\mathbb{S}^1$ or $\mathbb{H}^2$ models. In those models every node $i$ is characterized by a popularity-similarity pair $(\kappa_i, \theta_i)$, where $\kappa_i$ is the node's hidden degree (expected to be proportional to the observed degree $k_i$) and $\theta_i$ its angular or similarity coordinate. In the GR model, instead, only angular coordinates are assigned to the nodes, chosen uniformly at random from $[0, 2\pi]$. The network is then rewired in order to maximize the likelihood that the new topology is generated by the $\mathbb{S}^1$ model while preserving the observed degrees, and thus the total number of edges $E$. The rewiring procedure is conducted by executing the Metropolis-Hastings algorithm, aimed at finding the network connectivity (i.e. the adjacency matrix $a_{ij}$) that maximizes the likelihood function
$$\mathcal{L} = \prod_{i<j} \left[ p(\kappa_i, \kappa_j, \Delta\theta_{ij}) \right]^{a_{ij}} \left[ 1 - p(\kappa_i, \kappa_j, \Delta\theta_{ij}) \right]^{1 - a_{ij}}, \qquad (1)$$
where $\Delta\theta_{ij}$ stands for the angular distance between nodes $i$ and $j$, and the $\mathbb{S}^1$ connection probability $p(\kappa_i, \kappa_j, \Delta\theta_{ij})$ reads
$$p(\kappa_i, \kappa_j, \Delta\theta_{ij}) = \frac{1}{1 + \left( \frac{R \Delta\theta_{ij}}{\mu \kappa_i \kappa_j} \right)^{\beta}}. \qquad (2)$$
Parameter $\mu$ depends on the observed average degree $\langle k \rangle$ of the network, and $R$ is the radius of the circle (adjusted to have a density of nodes equal to 1, see Appendix A).
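As an illustration of Eqs. (1) and (2), the following minimal Python sketch evaluates the connection probability and the logarithm of the likelihood for a small network. The function names and the node labeling (integers 0 to N-1) are assumptions made for this example only, and the parameters mu and radius are taken as inputs rather than computed; with unit node density, the radius would satisfy 2πR = N.

```python
import math

def angular_distance(theta_i, theta_j):
    """Shortest angular separation on the circle, in [0, pi]."""
    d = abs(theta_i - theta_j) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def connection_probability(kappa_i, kappa_j, dtheta, beta, mu, radius):
    """Eq. (2): 1 / (1 + (R * dtheta / (mu * kappa_i * kappa_j)) ** beta)."""
    x = radius * dtheta / (mu * kappa_i * kappa_j)
    return 1.0 / (1.0 + x ** beta)

def log_likelihood(edges, kappa, theta, beta, mu, radius):
    """Logarithm of Eq. (1), summed over all pairs i < j.

    Assumes distinct angular coordinates: two unconnected nodes at zero
    angular distance would give log(0).
    """
    edge_set = {frozenset(e) for e in edges}
    n = len(theta)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dtheta = angular_distance(theta[i], theta[j])
            p = connection_probability(kappa[i], kappa[j], dtheta, beta, mu, radius)
            if frozenset((i, j)) in edge_set:
                total += math.log(p)
            else:
                total += math.log(1.0 - p)
    return total
```

Working with the logarithm avoids numerical underflow of the product over all node pairs; the double loop costs O(N²) and is only meant for small examples.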
The algorithm proceeds by repeating the following steps:
• Compute the current likelihood $\mathcal{L}_c$.
• Choose two links at random, one between nodes $i$ and $j$ and one between nodes $l$ and $m$, and swap them: the new links connect nodes $i$ and $m$, and nodes $l$ and $j$.
• Compute the new likelihood $\mathcal{L}_n$.
• If $\mathcal{L}_n > \mathcal{L}_c$, accept the link swap.
• Otherwise, if $\mathcal{L}_n < \mathcal{L}_c$, accept the link swap with probability $\mathcal{L}_n / \mathcal{L}_c$.
The rewiring algorithm is terminated after a number $E^2$ of edges have been chosen to be swapped, ensuring that the likelihood has reached a plateau. Notice that at the end of the rewiring procedure the degrees of the nodes have not changed, but the resulting network might not be connected. Since the hidden degrees are kept constant (independently of their values), the product $\kappa_i \kappa_j \kappa_l \kappa_m$ appears in both the current and the proposed configuration and cancels in the ratio $\mathcal{L}_n / \mathcal{L}_c$, so the probability of swapping links between nodes $i$ and $j$ and between nodes $l$ and $m$ simply reads
$$p_s = \min\left[ 1, \left( \frac{\Delta\theta_{ij} \Delta\theta_{lm}}{\Delta\theta_{im} \Delta\theta_{lj}} \right)^{\beta} \right]. \qquad (3)$$
Therefore, the GR model does not actually require estimating the hidden degrees of the nodes, because they do not enter in any step of the algorithm. Instead, the GR model simply needs to assign uniformly distributed angular coordinates and to fix a value for the clustering parameter β; see the next Section for details on this part.
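The rewiring steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the acceptance rule uses the fact that the likelihood ratio of a swap depends only on the four angular distances involved and on β, since the hidden degrees cancel. The function name, the random choice of link orientation, and the rejection of swaps that would create self-loops or multiple links are assumptions added for this example.

```python
import math
import random

def angular_distance(a, b):
    """Shortest angular separation on the circle, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def geometric_randomization(edges, theta, beta, n_attempts, seed=None):
    """Degree-preserving Metropolis-Hastings rewiring (illustrative sketch).

    edges: list of (u, v) pairs of a simple undirected graph
    theta: angular coordinate of each node, indexed by node label
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    for _ in range(n_attempts):
        a, b = rng.sample(range(len(edge_list)), 2)
        i, j = edge_list[a]
        l, m = edge_list[b]
        # Assumption: pick the orientation of the second link at random,
        # so that both possible pairings of the four end points can occur.
        if rng.random() < 0.5:
            l, m = m, l
        # Assumption: discard swaps creating self-loops or multiple links.
        if len({i, j, l, m}) < 4:
            continue
        if frozenset((i, m)) in edge_set or frozenset((l, j)) in edge_set:
            continue
        old = angular_distance(theta[i], theta[j]) * angular_distance(theta[l], theta[m])
        new = angular_distance(theta[i], theta[m]) * angular_distance(theta[l], theta[j])
        # Likelihood ratio of the swap: L_n / L_c = (old / new) ** beta.
        if new <= old or rng.random() < (old / new) ** beta:
            edge_set.discard(frozenset((i, j)))
            edge_set.discard(frozenset((l, m)))
            edge_set.add(frozenset((i, m)))
            edge_set.add(frozenset((l, j)))
            edge_list[a] = (i, m)
            edge_list[b] = (l, j)
    return edge_list
```

A typical call would first draw the angular coordinates uniformly in [0, 2π] for all nodes and then pass the original edge list together with the chosen β; the returned edge list has the same degree sequence as the input.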
Geometric randomizations of networks can also be obtained using the $\mathbb{S}^1$ model with parameters γ, β and µ (controlling the exponent of the power-law hidden degree distribution, the clustering coefficient, and the average degree, respectively) estimated from the empirical network. This alternative, however, requires the explicit estimation of the hidden degree distribution $P(\kappa)$ or of its exponent, and thus it may introduce undesired fluctuations in the degree cutoff, which can induce relevant differences between the topological properties of real and $\mathbb{S}^1$-generated networks.

TUNING CLUSTERING THROUGH PARAMETER β
In order to apply the GR model to a real or synthetic network one simply needs to fix parameter β, which controls the level of clustering in the network [7]. Clustering is a signature of the metricity of geometric networks [16] and gives the connection between the observed topology and the underlying metric space, as a reflection of the triangle inequality.
Note that the value of β affects the probability to accept a link swap (see Eq. (3)) so it determines the final network's structure. We address the role of β by applying the GR model to synthetic networks generated by the Geometric Preferential Attachment (GPA) model [13] and the soft communities in similarity space (SCSS) model [14]. Both models are intended to produce synthetic networks with tunable community structure.
The GPA model generates geometric networks with soft communities using a growing mechanism in the hyperbolic plane. The probability of connection depends on parameter Λ controlling the initial attractiveness of the different angular regions, such that the heterogeneity of the angular coordinate is a decreasing function of Λ, with Λ → ∞ recovering the homogeneous distribution. Notice that the degree distribution and the clustering coefficient in networks generated by the GPA model are independent of Λ. However, β → ∞ by construction and, thus, the level of clustering is always the maximum possible. The SCSS model is an $\mathbb{S}^1$ version for the generation of soft communities that makes it possible to change the generated level of clustering as a function of β. Fig. 1a shows the average clustering coefficient $\bar{c}$ of a GPA network compared with the randomizations obtained by applying the GR model using different values of β. As expected, the average clustering of the rewired networks strongly depends on the value of β: the lower β, the lower $\bar{c}$ in the resulting network. A level of clustering similar to GPA values can be obtained in GR networks by using large values of β, such as β = 10.
In Fig. 1b-c, we report the average clustering coefficient obtained by applying the GR model to synthetic networks generated with the SCSS model. The SCSS networks are produced using two different generating values, referred to as $\beta_0$. Fig. 1b-c show that it is possible to fine tune the value of β used by the GR networks so that they reproduce the same average clustering $\bar{c}$ as the original networks. If the generation value $\beta_0$ is used for the rewiring, the level of clustering in the GR instances does not reach that in the original networks and remains smaller. This observation can be understood by noticing the following two points. First, for SCSS networks $\bar{c}$ is independent of how strongly nodes concentrate in the angular coordinate, so any two SCSS networks with equal $\beta_0$ and the same distribution of hidden degrees, $P(\kappa)$, will have equal $\bar{c}$. Second, a GR instance of a SCSS network obtained using $\beta_0$ would be one with homogeneous $P(\theta)$ and the same observed degree distribution $P(k)$ as in the SCSS network. That is, if $P(k) = P(\kappa)$ exactly, then the average clustering $\bar{c}$ reached by the GR instance with $\beta_0$ would need to match that of the SCSS network. Since we do not observe this matching in Fig. 1b-c, we conclude that the mismatch is due to differences between the distributions of observed and hidden degrees of the SCSS network.

EFFECTS OF GEOMETRIC RANDOMIZATION IN EMPIRICAL NETWORKS
In the following, we apply the GR model to real networks. We consider six empirical networks from different domains: the network of chord transitions in western popular music (Music) [17], the one-mode projection onto metabolites of the human metabolic network at the cell level (Metabolic) [11], the word adjacency network in Darwin's book On the Origin of Species (Words) [18], the email communication network within the Enron company (Enron) [19], and the Internet at the autonomous system level (Internet) [10,20], see Table I and Appendix B for details.
As described in the previous Section, β is the only free parameter of the model, and can be used to tune the clustering coefficient. In the following, we will show results by using a value of β ensuring that the average clustering of the rewired network is equal to that of the real one. Another possible choice for β is the value estimated when embedding the real network into the underlying metric space [10], which we indicate as $\beta_0$ in Table I. The embedding method estimates the coordinates of the nodes in the underlying geometry by maximizing the likelihood that the observed topology has been produced by the model. In the process, $\beta_0$ is estimated such that the expected clustering coefficient of the embedded network matches the observed clustering coefficient of the network topology. As explained in the previous section for synthetic networks, using $\beta_0$ as the input in GR does not in general produce rewired networks with the same average clustering $\bar{c}$ as in the original networks. For real networks, the two values of β are very similar but not always identical, see Table I. The small difference is related to the fact that, for some real networks, the GR model cannot adjust simultaneously the empirical connection probability and the observed clustering using a single value of β, see Fig. 2.
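The text does not specify how the value of β matching the empirical clustering is found. One simple possibility, sketched below purely as an assumption, is a bisection search that relies on the observation that the clustering of GR networks grows with β. The callable clustering_at is hypothetical: it stands for any routine that rewires the network with a given β and returns the resulting average clustering, ideally averaged over several realizations since the rewiring is stochastic.

```python
def tune_beta(clustering_at, target, lo=1.0, hi=20.0, tol=1e-3, max_iter=60):
    """Bisection search for the beta whose clustering matches `target`.

    clustering_at: hypothetical callable beta -> average clustering of the
                   rewired network, assumed to increase with beta
    lo, hi:        search interval for beta (illustrative defaults)
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if clustering_at(mid) < target:
            lo = mid   # clustering too low: a larger beta is needed
        else:
            hi = mid   # clustering high enough: try a smaller beta
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

If the target clustering lies outside the range reachable within the search interval, the search simply converges to one of its end points, which would signal that no admissible β reproduces the empirical value.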
Clustering and degree correlations
Fig. 3 shows the average clustering $\bar{c}$ of the empirical networks under consideration as compared to the randomized versions obtained by the GR model. We consider both values β and $\beta_0$ (the corresponding networks are indicated by GR and GR$_0$, respectively), and we also include a comparison with real network replicas generated by the $\mathbb{S}^1$ model [7], see Appendix A. As expected, GR networks show an average clustering practically identical to that of the original data, while GR$_0$ networks present mild deviations, and differences are usually more important for $\mathbb{S}^1$ networks due to deviations in the obtained degrees. One exception to the preservation of clustering in GR instances is the Words data set. This empirical network has a $\beta_0$ extremely close to the minimal threshold of $\beta_0 = 1$ defined in hidden metric space network models. The β value necessary to ensure that the GR network has the same level of clustering as the empirical one cannot be achieved, since it would need to be lower than 1. In general, an embedding value of $\beta_0 \simeq 1$ suggests that clustering is due to finite size effects, since $\beta_0 = 1$ corresponds to absence of clustering in the thermodynamic limit of the geometric network models.
Graphs on the top row of Fig. 4 show the clustering spectrum $c(k)$ for empirical networks and networks obtained by the GR and $\mathbb{S}^1$ models. In all cases, the functional form of $c(k)$ is similar, a decreasing function of $k$ with a broad tail. The clustering spectrum of the GR networks is always very close to the original data, while the $\mathbb{S}^1$ networks present important departures in some systems, as a result of the lack of preservation of the empirical degrees. This is especially evident for the $\mathbb{S}^1$ versions of the Music and Words networks, with the clustering spectrum much lower than that of the original data.
On the other hand, the real networks under consideration are generally disassortative, as revealed by the decreasing form of the average degree of nearest neighbors, $\bar{k}_{nn}(k)$, Fig. 4 (bottom). Internet, Music and Words show a decay with power-law form, while other data sets show milder degree correlations. In all cases, GR networks have $\bar{k}_{nn}(k)$ functions very similar to the original data, while $\mathbb{S}^1$ networks exhibit strong deviations, with the exception of the Internet.
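For concreteness, the two degree-dependent quantities compared in Fig. 4, the clustering spectrum c(k) and the average nearest-neighbor degree as a function of k, can be computed from an edge list as in the following sketch. The function name is an assumption, the graph is assumed to be simple and undirected, and nodes of degree one are assigned zero clustering by convention.

```python
from collections import defaultdict

def degree_spectra(edges):
    """Return (c_k, knn_k): clustering and neighbor degree averaged per degree class."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = {u: len(nb) for u, nb in neighbors.items()}
    c_sum = defaultdict(float)
    knn_sum = defaultdict(float)
    count = defaultdict(int)
    for u, nb in neighbors.items():
        k = degree[u]
        nb_list = list(nb)
        # Number of links among the neighbors of u (triangles through u).
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nb_list[b] in neighbors[nb_list[a]]
        )
        c_sum[k] += 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        knn_sum[k] += sum(degree[v] for v in nb) / k
        count[k] += 1
    ks = sorted(count)
    c_k = {k: c_sum[k] / count[k] for k in ks}
    knn_k = {k: knn_sum[k] / count[k] for k in ks}
    return c_k, knn_k
```

A decreasing knn_k indicates disassortative mixing, which is the behavior described above for the empirical networks.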

Community structure
So far, GR randomized versions of real and synthetic geometric networks seem to be able to preserve topological features beyond the degree distribution, including clustering and the average nearest neighbors degree. However, the GR randomization homogenizes the distribution of nodes in similarity space, while nodes in real networks are typically heterogeneously distributed, as they are more concentrated in some specific regions [11,12]. This denotes the presence of communities of similar nodes, named soft communities [13]. Top row of Fig. 5 shows the representations of the empirical networks embedded in the hyperbolic plane, with coordinates (r, θ) (see Appendix A for the relationship between r and the degree, and Appendix B for references to the sources of the empirical maps). One can clearly see that the angular coordinates θ are heterogeneously distributed in [0, 2π]. A different perspective is shown in the bottom row in Fig. 5, displaying the probability density function P (θ) of the similarity coordinate of the nodes for the six empirical networks.
The heterogeneity of the angular coordinate can be quantified by performing a Kolmogorov-Smirnov (KS) test between the distributions $P(\theta)$ and $P_{GR}(\theta)$. The KS statistic measures the difference between two probability distributions, and it is defined as the maximum difference between the values of the distributions $P(\theta)$ and $P_{GR}(\theta)$. The larger the KS score, the more heterogeneous the angular distribution. Thus, it can be used to discard the null hypothesis that the empirical $P(\theta)$ and synthetic $P_{GR}(\theta)$ samples (with uniform distribution by construction) present the same angular distribution. The KS distance $D_{KS}$ for the empirical networks under consideration is reported in Table I. One can see that the null hypothesis is strongly rejected for all real networks.
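As a rough illustration of the KS distance used here, the sketch below computes the two-sample statistic as the largest absolute difference between the empirical cumulative distributions of two samples of angular coordinates. The function name is an assumption, and the sketch returns only the distance, not a p-value.

```python
import bisect

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance between two lists of values."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs are step functions that only change at data points,
    # so it is enough to compare them at every observed value.
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d
```

In the setting of the text, sample_a would hold the inferred angular coordinates of a real network and sample_b the uniformly distributed coordinates assigned by the GR model.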
Soft communities in the geometric domain can then be detected using geometric methods. We use the definition of soft communities given in [13], where they are defined as groups of nodes in similarity space separated from the rest by two angular gaps that exceed a certain critical value, $\Delta\theta_c$. The critical gap $\Delta\theta_c$ is calculated as the expected value of the largest gap between two nodes when the angular coordinates are distributed uniformly at random: $\Delta\theta_c \simeq 2\pi \ln(N)/N$. In the top row of Fig. 5, we highlight the soft community deterministic partition detected by the critical gap method in the real networks using different colors.
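The critical gap definition can be turned into a short procedure: sort the nodes by angle, mark every gap between consecutive nodes that exceeds the critical value, and assign a new label each time such a gap is crossed. The following Python sketch follows that reading of the definition; the function name and the handling of the wrap-around gap are assumptions of this example.

```python
import math

def critical_gap_communities(theta):
    """Label each node with a soft community index using the critical gap."""
    n = len(theta)
    labels = [0] * n
    if n < 2:
        return labels
    two_pi = 2.0 * math.pi
    gap_c = two_pi * math.log(n) / n  # expected largest gap for uniform angles
    order = sorted(range(n), key=lambda i: theta[i] % two_pi)
    cuts = set()
    for pos in range(n):
        here = theta[order[pos]] % two_pi
        nxt = theta[order[(pos + 1) % n]] % two_pi
        # The modulo also handles the gap between the last and first node.
        if (nxt - here) % two_pi > gap_c:
            cuts.add(pos)
    if not cuts:
        return labels  # no gap exceeds the threshold: a single community
    start = (min(cuts) + 1) % n
    label = 0
    for step in range(n):
        pos = (start + step) % n
        labels[order[pos]] = label
        if pos in cuts:
            label += 1
    return labels
```

With uniformly distributed angles, gaps larger than the threshold are rare by construction, so the procedure typically returns very few communities.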
Next, we compare the community structure of the real networks with their randomized counterparts. To quantify their topological community structure, we apply the widely used Louvain method (LM) [21], aimed at maximizing the modularity $Q \in [-1, 1]$, which compares the fraction of links inside communities with the expected fraction for a random distribution of edges with the same node degree distribution as the given network. Interestingly, Fig. 6a shows that in real networks, although the Louvain method identifies topological communities with higher modularity, the soft communities discovered by the critical gap (CG) method display large Q values, in some cases (e.g. Metabolic or Music data sets) comparable to the modularities given by the purely topological LM.
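To make the comparison concrete, the modularity of any given partition, whether it comes from a topological or a geometric method, can be evaluated directly from the edge list. The sketch below uses the standard expression in which each community contributes its fraction of internal links minus the squared fraction of link ends it holds; the function name and the input format (a label per node, indexed by node) are assumptions of this example.

```python
from collections import defaultdict

def modularity(edges, labels):
    """Modularity Q of a partition of a simple undirected graph."""
    m = len(edges)
    internal = defaultdict(int)     # links with both ends in the community
    degree_sum = defaultdict(int)   # total degree of the community
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2.0 * m)) ** 2
        for c in degree_sum
    )
```

For example, two triangles joined by a single link and split into one community per triangle give Q = 5/14, while placing all nodes in a single community gives Q = 0.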
This picture is completely different for GR networks, reported in Fig. 6b. GR networks show strong community organization at the topological level, resulting in large values of Q as measured by the Louvain method, which is induced by structural constraints imposed by the geometric models [22]. However, as expected, the critical gap does not detect soft communities, as demonstrated by the non-significant values of the modularity, compatible with zero, over different realizations of the randomization process.
We study in more detail the relationship between soft communities and topological ones by comparing the partition obtained by the Louvain method with the partition generated by the critical gap. The overlap between the two partitions can be quantified by the normalized mutual information [23]. Fig. 6c shows that the overlap between geometric and topological communities is quite large for real networks, especially for the Metabolic and Internet data sets, meaning that communities identified by purely (deterministic) geometric methods are meaningful, though subject to the degree of congruency of the real network with the hidden metric space. In contrast, Fig. 6c shows that the overlap between soft and topological communities in GR networks is very low, due to the complete randomization of the angular coordinate operated by GR.
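A minimal sketch of the normalized mutual information between two partitions is given below. Several normalizations exist; this example assumes the common choice of dividing twice the mutual information by the sum of the two partition entropies, which may differ from the variant used in [23]. The function name and the convention of returning 1 when both partitions are trivial are also assumptions.

```python
import math
from collections import Counter

def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 I(A; B) / (H(A) + H(B)) for two label lists of equal length."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mutual = sum(
        c / n * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0.0:
        return 1.0  # both partitions place all nodes in a single group
    return 2.0 * mutual / (h_a + h_b)
```

Identical partitions give a value of 1, and statistically independent partitions give 0; a partition with a single community, as may occur for the critical gap on GR networks, also yields 0 against any non-trivial partition.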

CONCLUSIONS
The rewiring process preserving degrees in the geometric randomization of real networks gives an alternative to their replication using directly the popularity-similarity model as a topology generator. The GR offers the advantage of avoiding the delicate task of estimating the hidden degree distribution, and it can be especially useful in problems responsive to fluctuations of the degree cutoff, like the behavior of some dynamical processes including epidemic spreading processes.
As a model, GR depends on a single parameter controlling the level of clustering in the resulting networks, so that the clustering coefficient of real networks can be chosen to be replicated or not. Interestingly, the discrepancies between hidden and observed degrees in embedded networks have an effect on the clustering level achieved by the GR. In particular, the parameter value suggested by the embedding of the original data is, in general, close to but not exactly equal to the value needed to replicate the clustering coefficient of the original network. Our results also indicate that, in some networks, degree-degree correlations can only be replicated by the geometric network models if the observed degrees are preserved.
As a null model, GR can be used to investigate the relevance of geometric communities in real networks. Taken together, our results indicate that geometric communities are meaningful in the real networks analyzed here. At the same time, topological communities, like those detected in GR networks, are not always reliable and can be a result of constraints induced by the underlying geometric architecture. The fact that an underlying geometric organization imposes structural constraints on complex networks, which are strong enough for recreating detectable topological communities even in the absence of geometric ones, is an interesting subject by itself and will be investigated in future work.
Fig. 6: a-b) Modularity Q as detected by the Louvain method (purple) and the critical gap (yellow), for real (plot a) and GR (plot b) networks. Error bars in plot b) are obtained from 10 realizations of the GR model. c) Normalized mutual information between the partitions detected by the Louvain and the critical gap methods, for empirical (blue) and GR (red) networks. Error bars are obtained from 10 realizations of the GR model.