Beyond the clustering coefficient: A topological analysis of node neighbourhoods in complex networks

In Network Science, node neighbourhoods, also called ego-centered networks, have attracted considerable attention. In particular, the clustering coefficient has been extensively used to measure their local cohesiveness. In this paper, we show that two nodes with the same clustering coefficient can have neighbourhoods with significantly different topology, which demonstrates the need to go beyond this simple characterization. We perform a large scale statistical analysis of the topology of node neighbourhoods of real networks by first constructing their clique complexes, and then computing their Betti numbers. We show significant differences between the topology of node neighbourhoods of real networks and the stochastic topology of null models of random simplicial complexes, revealing local organisation principles of the node neighbourhoods. Moreover, we observe that a large scale statistical analysis of the topological properties of node neighbourhoods clearly discriminates between power-law networks and planar road networks.


Introduction
There has been significant recent interest in the topology and the geometry of networks [1,2,3,4]. Network topology and geometry allow us to tackle fundamental theoretical questions concerning the principles determining the emergence of network geometry [5,6] or the proper definition of curvature in the discrete setting of networks and simplicial complexes [7,8,9]. Moreover, network topology has been shown to be an important tool for network inference, and has been extensively used to explore network structure [10,11,12,13,14,15] and dynamics [16,17,18]. In this paper, we propose a new network topology framework to investigate the local structure of real networks determined by the neighbourhoods of their nodes, which we often refer to as node neighbourhoods.
This approach will allow us to detect non-random statistical properties (with respect to a specific random model) in the local organization of these structures.
Node neighbourhoods, also called ego-centered networks, have been studied extensively in Network Science [19,20]. In particular, the clustering coefficient of a node is the best-known measure of the local density of triangles, or the so-called transitivity, of connections [19]. It has been used extensively to quantify to what extent a network satisfies the principle of triadic closure. This principle was originally formulated in the context of social networks, where it is observed that two friends of a common person are more likely to be friends of each other than in a set of random relations. However, this tendency of real networks to display a large number of triangles has also been observed in other complex networks. Interestingly, triadic closure is a basic mechanism for generating self-organised communities, as demonstrated by models enforcing this mechanism of network evolution [21].
Recently, higher-order clustering coefficients have been formulated in order to measure the density of cliques larger than triangles in a given node neighbourhood. The closure of higher dimensional cliques has also been used for link prediction [22,23]. However, the clustering coefficient and its generalisations are insufficient to fully characterise the topology of a node neighbourhood, and therefore it is important to develop new Topological Data Analysis tools [24,25] that allow us to go beyond these simple metrics.
To perform a topological analysis of real networks, the first step is to construct a simplicial complex starting from the network. A simplicial complex is a higher-order network structure that is formed not just by nodes (0-dimensional simplices) and links (1-dimensional simplices), like a network, but also by higher dimensional simplices such as triangles (2-dimensional simplices), tetrahedra (3-dimensional simplices), and so on. Starting from a real network, we can extract a simplicial complex (called the clique complex) by associating to each c-clique of the network a (c − 1)-dimensional simplex. The topology of the clique complex can be investigated by calculating the Betti numbers, thereby measuring the number of connected components (Betti number β_0), the number of cycles that form 1-dimensional holes (Betti number β_1), the number of 2-dimensional cavities (Betti number β_2), and their higher dimensional generalisations.
Here we propose a novel topological approach for analysing node neighbourhoods which goes beyond the traditional measure of the clustering coefficient, performing a large scale statistical analysis of the topology of the node neighbourhoods of real network datasets, with size ranging from 82,168 to 12,394,385 nodes. The results we obtain are compared to random simplicial complexes [26] or random Vietoris-Rips complexes [27], which take on the role of null models.
We show that the topology of the node neighbourhoods of real datasets differs significantly from these null models, indicating that the neighbourhoods obey organisation principles. Moreover, we show that the proposed topological analysis of node neighbourhoods reveals significant statistical differences between scale-free networks and planar road networks.
The paper is organised as follows. In Sec. 2 we define networks, simplicial complexes and clique complexes, and discuss how the clique complexes can be extracted from a network dataset. In Sec. 3 we define node neighbourhoods, and we describe their topology in terms of the total number of nodes, link density, and Betti numbers. In Sec. 4 we show evidence of the relevant diversity found in the topology of real network neighbourhoods with comparable node and link density. In Sec. 5 we study the topology of null models of node neighbourhoods, and we discuss general differences observed between real network neighbourhoods and the null models. In Sec. 6 we provide the results of a large scale statistical comparison between the topology of neighbourhoods of scale-free hierarchical networks and neighbourhoods of road networks, and we compare the statistical properties of these neighbourhoods with the statistical properties of the considered null models. Finally, in Sec. 7 we provide the conclusions.

Networks, simplicial complexes and clique complexes
A network is a graph G = (V, E) formed by a set of nodes V and a set of links E that represent the elements of a complex system and their interactions, respectively. Networks are ubiquitous and include systems as different as the WWW (web graphs), infrastructures (such as airport networks or road networks) and biological networks (such as the brain or the protein interaction network in the cell).
Simplicial complexes represent higher-order networks, which include interactions between two or more nodes, described by simplices. A simplex µ of dimension c − 1 is formed by a subset of c nodes. For instance, a node is a 0-dimensional simplex, a link is a 1-dimensional simplex, a triangle is a 2-dimensional simplex, a tetrahedron is a 3-dimensional simplex, and so on. A simplicial complex K is formed by a set of simplices that satisfy the following two conditions: (a) if a simplex µ belongs to the simplicial complex, i.e. µ ∈ K, then any simplex µ' formed by a subset of its nodes is also included in the simplicial complex, i.e. if µ' ⊂ µ then µ' ∈ K; (b) given two simplices of the simplicial complex, µ ∈ K and µ' ∈ K, either their intersection belongs to the simplicial complex, i.e. µ ∩ µ' ∈ K, or their intersection is the null set, i.e. µ ∩ µ' = ∅.
Given a simplicial complex, it is always possible to extract a network, known as the 1-skeleton of the simplicial complex, by considering exclusively the nodes and links belonging to the simplicial complex. Conversely, given a network, it is possible to derive deterministically a simplicial complex, its clique complex, obtained by taking a (c − 1)-dimensional simplex for every c-clique in the network. The clique complex is a simplicial complex. In fact, if a simplex is included in a clique complex, then all its sub-simplices are also included.
Moreover, any two simplices of the clique complex have an intersection that is either the null set or a simplex of the clique complex.
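The construction of the clique complex can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used for the analysis: the function name `clique_complex`, the `max_dim` cut-off and the requirement that node labels be sortable are assumptions made here.

```python
def clique_complex(nodes, edges, max_dim=2):
    """Return {dimension: list of simplices} for the clique complex of a graph.

    A (c - 1)-dimensional simplex is recorded for every c-clique, up to max_dim.
    Simplices are stored as tuples of node labels in increasing order.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    complex_ = {0: [(v,) for v in sorted(nodes)]}
    for dim in range(1, max_dim + 1):
        simplices = []
        for simplex in complex_[dim - 1]:
            # Nodes adjacent to every member of the simplex extend it to a larger clique.
            common = set.intersection(*(adj[v] for v in simplex))
            for w in sorted(common):
                if w > simplex[-1]:  # larger labels only, so each clique is listed once
                    simplices.append(simplex + (w,))
        complex_[dim] = simplices
    return complex_

# A triangle (0, 1, 2) with a pendant link (2, 3).
K = clique_complex(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
```

Because every simplex is grown from a lower dimensional one, closure under taking subsets holds by construction.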

Definition
A complex network can be described locally in terms of d-hop neighbourhoods, or ego-centered networks. Starting from a given node i, we consider the subgraph induced by the set of the nodes at hopping distance δ (with 0 < δ ≤ d).
The neighbourhood of node i is the clique complex of this induced subgraph. For example, if node i has degree 12 and all its neighbours apart from one pair are disconnected from each other, the corresponding clique complex contains 11 disconnected components, of which 10 are 0-simplices and the other is a 1-simplex.
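The definition above can be sketched with a breadth-first search. The code below is a minimal illustration under stated assumptions: the network is given as a dictionary `adj` mapping each node to the set of its neighbours, and the helper names `neighbourhood` and `count_components` are hypothetical. It reproduces the degree-12 example from the text.

```python
from collections import deque

def neighbourhood(adj, i, d=1):
    """Nodes at hopping distance 0 < delta <= d from i, and the links they induce."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not explore beyond distance d
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = {v for v in dist if v != i}  # the central node itself is excluded
    links = {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}
    return nodes, links

def count_components(nodes, links):
    """Number of connected components, computed with a union-find structure."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in links:
        parent[find(u)] = find(v)
    return len({find(v) for v in nodes})

# Node 0 has degree 12; its neighbours are mutually disconnected apart from the pair (1, 2).
adj = {0: set(range(1, 13))}
for v in range(1, 13):
    adj[v] = {0}
adj[1].add(2)
adj[2].add(1)
nodes, links = neighbourhood(adj, 0, d=1)
print(len(nodes), count_components(nodes, links))  # 12 nodes, 11 components
```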

Number of nodes and link density of the neighbourhoods
The neighbourhood of node i is characterised by its number of nodes n_i and by the density of links ρ_i, given by

$$\rho_i = \frac{2 m_i}{n_i (n_i - 1)}, \qquad (1)$$

where m_i denotes the number of links among the n_i nodes of the neighbourhood. When the neighbourhoods are formed by nodes at distance δ = 1 (i.e. when we consider d = 1 and 1-hop neighbourhoods), the number of nodes n_i is given by the degree k_i of node i in the original network, i.e.

$$n_i = k_i. \qquad (2)$$

Additionally, in this case the density of links ρ_i in the neighbourhood is given by the local clustering coefficient C_i of node i [19], i.e.

$$\rho_i = C_i. \qquad (3)$$
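As a small illustration of why the link density of a 1-hop neighbourhood coincides with the local clustering coefficient, the sketch below computes both from the same quantities. It assumes the standard definition of density (links present divided by the n(n − 1)/2 possible links); the function names and the convention of returning 0 for neighbourhoods with fewer than two nodes are assumptions made here.

```python
def link_density(n_nodes, n_links):
    """Fraction of the n(n - 1)/2 possible links that are present."""
    if n_nodes < 2:
        return 0.0  # convention assumed here for degenerate neighbourhoods
    return 2.0 * n_links / (n_nodes * (n_nodes - 1))

def local_clustering(adj, i):
    """Local clustering coefficient of i: link density among the neighbours of i."""
    neigh = adj[i]
    links = sum(1 for u in neigh for v in adj[u] if v in neigh and u < v)
    return link_density(len(neigh), links)

# Node 0 has three neighbours, and only the pair (1, 2) is linked.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
C0 = local_clustering(adj, 0)  # one link out of three possible
```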

Betti numbers
The neighbourhood topology can be studied by computing the Betti numbers of the resulting simplicial complex. These are topological invariants derived from the simplicial complex, and correspond, for each i ≥ 0, to the number of linearly independent i-dimensional holes in the space. Specifically, the Betti number β_0 provides the number of connected components of the neighbourhood, the Betti number β_1 measures the number of 1-dimensional holes, i.e. cycles that are not boundaries of 2-dimensional subsets of the simplicial complex, and so on for spheres, hyperspheres, etc.
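For small neighbourhoods, β_0 and β_1 can be obtained directly from the ranks of the boundary matrices. The following is a sketch under stated assumptions, not the software used for the analysis: ranks are taken over the field Z/2 (which can differ from integer Betti numbers when torsion is present), triangles are found by brute force, and the function names are hypothetical.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z/2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # position of the leading bit -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit and keep reducing
    return len(pivots)

def betti_0_1(nodes, edges):
    """Betti numbers (beta_0, beta_1) of the clique complex of a graph, over Z/2."""
    nodes = sorted(nodes)
    edges = sorted({tuple(sorted(e)) for e in edges})
    edge_set = set(edges)
    node_index = {v: k for k, v in enumerate(nodes)}
    edge_index = {e: k for k, e in enumerate(edges)}
    # Brute-force triangle search: fine for small neighbourhoods only.
    triangles = [t for t in combinations(nodes, 3)
                 if all(pair in edge_set for pair in combinations(t, 2))]
    # Boundary of a link: its two end nodes. Boundary of a triangle: its three links.
    d1 = [(1 << node_index[u]) | (1 << node_index[v]) for u, v in edges]
    d2 = [sum(1 << edge_index[pair] for pair in combinations(t, 2)) for t in triangles]
    rank1, rank2 = rank_gf2(d1), rank_gf2(d2)
    beta0 = len(nodes) - rank1
    beta1 = (len(edges) - rank1) - rank2  # dim ker(d1) - rank(d2)
    return beta0, beta1

cycle = betti_0_1(range(4), [(0, 1), (1, 2), (2, 3), (3, 0)])
clique = betti_0_1(range(4), list(combinations(range(4), 2)))
```

Only simplices up to dimension two enter the computation, since β_0 and β_1 depend only on nodes, links and triangles.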

Topology of neighbourhoods in real complex networks
We have considered a number of large real complex networks, including WWW graphs, social networks and road networks, with the total number of nodes N, total number of links L, average local clustering coefficient C, and diameter D indicated in Table 1. All these networks are small-world [19], hierarchical [28] and scale-free [29], with the exception of the road networks, which are both spatial [30] and planar. For the road networks alone, we have considered neighbourhoods formed by nodes at distance less than or equal to d = 5 from the central nodes. In fact, due to their planarity, the 1-hop neighbourhoods (d = 1) are typically formed by isolated nodes. However, for all other small-world networks, considering 5-hop neighbourhoods would capture non-local properties, since these neighbourhoods would include a large fraction of the nodes of the network. Therefore, for the scale-free, small-world networks (i.e. all the networks considered, with the exception of road networks) we have considered only neighbourhoods formed by nodes at hopping distance d = 1.
For each studied dataset we have performed a statistical analysis of the topology of its neighbourhoods by computing the Betti numbers, using the computational homology software CHomP [31]. Specifically, in order to define an efficient computational framework, we have restricted our attention to neighbourhoods formed by clique complexes of dimension equal to 3, which yields accurate values of the Betti numbers β_0 and β_1. In fact, this procedure guarantees that the values of the measured Betti numbers remain unchanged if simplices of higher dimensions are also included.
The Betti number β_0 indicates the number of components of the local neighbourhood. Therefore, a high Betti number β_0 of a neighbourhood around node i indicates that the node i determining the neighbourhood acts as a broker between different otherwise disconnected components of its neighbourhood. The Betti number β_1 indicates the number of cycles forming 1-dimensional holes.
Therefore a high ratio β_1/β_0 indicates that the neighbourhood is not maximally dense.
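In Figure 2, we plot several examples of node neighbourhoods found in the analysed datasets. The first observation that we can draw from the statistical analysis is that, if we compare the homology of neighbourhoods with comparable number of nodes n and density of links ρ but coming from different network datasets, there are significant fluctuations.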

Why null models?
In order to characterize real datasets it is always important to refer to suitable null models. A null model allows us to compare the results obtained in real datasets with the results obtained when minimal assumptions are made on the underlying network topology. To this end, here we consider two popular models of simplicial complexes used in the Applied Topology literature, the random clique complex [26] and the random Vietoris-Rips complex [27]. This is not the only possible choice of null models. Indeed, several more complex models of stochastic random simplicial complexes [32,33] and evolving simplicial complexes [34,35] have been proposed in the literature. However, here the choice of the random clique complex and the random Vietoris-Rips complex is driven by the need to make minimal assumptions on the topology of the node neighbourhoods.

Random Clique Complex
The random clique complex [26], also known as the random flag complex, is the most fundamental null model for simplicial complexes. The random clique complex is the clique complex of the Erdős-Rényi random graph G(n, p) of n nodes and density of links p.
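Sampling from this null model is straightforward. The sketch below is illustrative only: the function name, the `seed` argument and the brute-force triangle search are assumptions made here, and only simplices up to dimension two are listed.

```python
import random
from itertools import combinations

def random_clique_complex(n, p, seed=None):
    """Sample G(n, p) and return its links and triangles (2-simplices)."""
    rng = random.Random(seed)
    # Each of the n(n - 1)/2 possible links is present independently with probability p.
    edges = [pair for pair in combinations(range(n), 2) if rng.random() < p]
    edge_set = set(edges)
    # Every 3-clique of the sampled graph becomes a 2-simplex of the clique complex.
    triangles = [t for t in combinations(range(n), 3)
                 if all(pair in edge_set for pair in combinations(t, 2))]
    return edges, triangles

edges, triangles = random_clique_complex(n=20, p=0.2, seed=1)
```

To compare with a real neighbourhood of n nodes and link density ρ, one natural choice is to set p = ρ, so that the expected density of the sample matches the observed one.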

Random Vietoris-Rips Complex
The random Vietoris-Rips complex is the clique complex derived from the random geometric graph G(n, πr_0^2) [27]. This ensemble is formed by n nodes distributed in the unit square [0, 1]^2 with periodic boundary conditions (i.e. R^2/Z^2) according to a Poisson point process and connected when at distance less than or equal to r_0. Therefore the expected average degree of the random Vietoris-Rips 1-skeleton is given by nπr_0^2.
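The 1-skeleton of this null model can be sampled as follows. This is a sketch under simplifying assumptions: exactly n points are placed uniformly at random (rather than drawing their number from a Poisson distribution), the function names are hypothetical, and the helper `radius_for_mean_degree` simply inverts the expression nπr_0^2 for the expected average degree, which holds for r_0 ≤ 1/2.

```python
import math
import random
from itertools import combinations

def torus_distance(a, b):
    """Euclidean distance on the unit square with periodic boundary conditions."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    dx = min(dx, 1.0 - dx)  # take the shorter way around the torus
    dy = min(dy, 1.0 - dy)
    return math.hypot(dx, dy)

def random_geometric_graph(n, r0, seed=None):
    """Place n uniform points on the torus and link pairs at distance <= r0."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(u, v) for u, v in combinations(range(n), 2)
             if torus_distance(points[u], points[v]) <= r0]
    return points, edges

def radius_for_mean_degree(n, mean_degree):
    """Connection range r0 such that n * pi * r0**2 equals the given mean degree."""
    return math.sqrt(mean_degree / (n * math.pi))

r0 = radius_for_mean_degree(50, 5.0)
points, edges = random_geometric_graph(50, r0, seed=1)
```

The Vietoris-Rips complex is then the clique complex of the sampled graph, obtained exactly as for any other network.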

Betti numbers of random clique complexes and random Vietoris-Rips complexes
For random clique complexes and random Vietoris-Rips complexes, there are analytical results indicating the expected value of the Betti numbers and sometimes also their distribution in the limit of large n, i.e. for n → ∞ [26,27].
In particular, since the Betti number β_0 of a simplicial complex is equal to the number of connected components, its value is predicted by the theory of percolation. For the random clique complex G(n, p) it is well known that in the limit n → ∞ the Betti number β_0 is a monotone graph property and is always decreasing with p. In particular, for n → ∞ and np = a log n, as long as the constant a is greater than one, the network has almost surely a Betti number β_0 = 1, i.e. it is formed by a single connected component.
The higher Betti numbers β_j, with j > 0, however, display a non-monotonic behaviour with the density of the network p. In a random clique complex this implies, for instance, that the Betti number β_1, which is zero for p ≪ 1, initially grows with increasing values of p. As the density of links p increases further, β_1 decreases. This result has an intuitive interpretation. Take for example a 4-cycle, which contributes one to the Betti number β_1. As the density of links increases, the four nodes on the cycle connect into a 4-clique, which then contributes zero to the Betti number β_1.
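The 4-cycle example can be checked by hand. Writing E for the number of links and ∂_1, ∂_2 for the boundary maps of links and triangles (notation assumed here), the first Betti number is the number of independent cycles minus the number of independent cycles that are filled by triangles:

```latex
\beta_1 = \dim\ker\partial_1 - \operatorname{rank}\partial_2
        = \left(E - \operatorname{rank}\partial_1\right) - \operatorname{rank}\partial_2 .

% 4-cycle: 4 nodes, E = 4 links, no triangles
\operatorname{rank}\partial_1 = 3,\quad \operatorname{rank}\partial_2 = 0
\;\Longrightarrow\; \beta_1 = (4 - 3) - 0 = 1 .

% 4-clique: 4 nodes, E = 6 links, 4 triangles
\operatorname{rank}\partial_1 = 3,\quad \operatorname{rank}\partial_2 = 3
\;\Longrightarrow\; \beta_1 = (6 - 3) - 3 = 0 .
```

In the 4-clique the four triangle boundaries sum to zero, so only three of them are independent, and they are exactly enough to fill the three independent cycles.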
In Ref. [26], Kahle shows how this uni-modal transition of the Betti number β_j, j > 0, which goes from vanishing, to non-vanishing, and back to vanishing again, occurs in the limit n → ∞ in an almost sequential way. For α, j > 0, if p = 1/n^α and α > 1/j or α < 1/(2j + 1), then in the limit n → ∞ almost surely we have that the Betti number β_j is vanishing, i.e.

$$\beta_j = 0. \qquad (4)$$
For example, for j = 1 we have the threshold for percolation, and the theorem states that all components of G(n, p) with p = 1/n^α and α > 1 are trees. This is a classic topological property of the component structure in the sub-critical phase of the random graph. The above result implies that for α < 1/3, all the cycles belong to the boundary of a higher dimensional clique.
Similarly, it has been shown [27] that for the random Vietoris-Rips complex, the Betti number β_0 is monotonically decreasing with increasing connection range, while the Betti number β_j with j > 0 displays a uni-modal transition with r_0. While these results are clearly fundamental to shed light on the topology of random simplicial complexes, since in this work we consider node neighbourhoods, we need to study how these asymptotic results are reflected in the small or middle sized networks which are available to us. In Figure 4, we show evidence that the discussed asymptotic behaviour of the two chosen null models is also reflected in random clique complexes and random Vietoris-Rips complexes of relatively small size (n = 20 and n = 50). From this figure, it is apparent that, as a consequence of the uni-modality of the Betti number β_1, typically in a random clique complex with a given density of links p only one of the two Betti numbers β_0 and β_1 is significant. A similar behaviour is also observed for the random Vietoris-Rips complex with a given connection range r_0.

Comparison between Betti numbers of null models and Betti number of real complex network neighbourhoods
The neighbourhoods of real complex network datasets can be compared with the two considered null models (random clique complexes and random Vietoris-Rips complexes) whose underlying network skeleton has the same average degree. When comparing the neighbourhood topology with an ensemble of random complexes (whose members act as a hypothesis for the neighbourhood graphs), the random neighbourhoods are not obtained by searching large random graphs, as is done for the real datasets; rather, they are sampled directly from the corresponding ensemble of the null models.

[36,37]. This is a model that enforces the formation of many triangles, leading to a separation between many disconnected isolated nodes and a large, highly clustered component in its condensed phase. Additionally, in these networks it is not rare to observe neighbourhoods displaying at the same time a large Betti number β_0 and a large Betti number β_1, leading to a topology of network neighbourhoods significantly different from that of the considered null models. In Figure 10 we show typical instances of neighbourhoods of the California road network, which instead have a rather different topology. In fact, denser neighbourhoods are typically formed by a single component but display a large Betti number β_1, reflecting the planar nature of the underlying network.

Statistical topological analysis of complex networks neighbourhoods

Homology of hierarchical scale-free networks versus homology of road networks
The datasets that we have considered include several hierarchical scale-free networks [28] (the Notre Dame and Google Web Graphs, and the Pokec, Slashdot and WikiTalk social networks), characterized by an average clustering coefficient of nodes of degree k scaling like

$$C(k) \sim k^{-\theta}, \qquad (5)$$

and several road networks (the Texas, Pennsylvania and California road networks).
Clearly we expect to see significant differences in the topology of the neighbourhoods of hierarchical networks and road networks. The planarity of road networks clearly constrains all the Betti numbers β_j with j > 1 to be zero. However, as we have seen in the typical neighbourhoods of road networks shown in Figure 2 and in Figure 10, road network neighbourhoods also tend to have a lower β_0 and a higher β_1 than neighbourhoods in the other datasets having the same number of nodes n and density of links ρ. To investigate further the statistical differences between road network neighbourhoods and scale-free hierarchical network neighbourhoods, we have evaluated the average value of the Betti numbers β_0 and β_1 over neighbourhoods with fixed number of nodes n or density of links ρ. In the hierarchical scale-free networks a node with degree k gives rise to a node neighbourhood of n = k nodes, as long as all the nodes of the neighbourhood are at distance δ = 1 from the original node. Correspondingly, the density of links ρ of the node neighbourhood can be approximated by the average clustering coefficient C(k) of nodes of degree k = n, i.e. ρ ≃ C(k), which obeys Eq. (5), providing the following power-law scaling of ρ as a function of n (see Figure 11):

$$\rho \sim n^{-\theta}. \qquad (6)$$

From the statistical topological analysis of the average Betti number β_0 of the network neighbourhoods we find that the average Betti number β_0 increases as a power law with n, i.e.

$$\langle \beta_0 \rangle \sim n^{\nu}, \qquad (7)$$
with ν > 0 (see Figure 12). This scaling indicates a monotonic power-law increase of the number of components of the local neighbourhoods as a function of the number of nodes n of the neighbourhoods. Therefore hubs tend to be brokers between different otherwise unconnected communities. In Figure 13 we report the average Betti numbers β_0 and β_1 for node neighbourhoods with given density of links ρ. For hierarchical scale-free networks, using the scaling indicated in Eq. (6), it is easy to predict that the average Betti number β_0 should decay as an inverse power of the density of links ρ, i.e.

$$\langle \beta_0 \rangle \sim \rho^{-\alpha}, \qquad (8)$$
with α = ν/θ. These scaling relations imply that more densely connected neighbourhoods are typically smaller, and characterised by a smaller Betti number β_0. The road networks, which are not hierarchical, are characterised by a significantly different trend of the average Betti number β_0 as a function of ρ. In fact, for the road networks the Betti number β_0 is a non-decreasing function of ρ. This result reveals that the number of connected components of the road neighbourhoods decreases for the neighbourhoods of larger road junctions.
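The relation α = ν/θ follows from eliminating n between the two power laws. A short sketch of the step, assuming the scalings ρ ∼ n^{-θ} and ⟨β_0⟩ ∼ n^{ν} for the hierarchical scale-free networks, with ⟨β_0⟩ denoting the average Betti number (notation assumed here):

```latex
\rho \sim n^{-\theta}
\;\Longrightarrow\;
n \sim \rho^{-1/\theta},
\qquad
\langle \beta_0 \rangle \sim n^{\nu}
\sim \left(\rho^{-1/\theta}\right)^{\nu}
= \rho^{-\nu/\theta}
\equiv \rho^{-\alpha},
\qquad
\alpha = \frac{\nu}{\theta}.
```

Since ν > 0 and θ > 0, the exponent α is positive, so the average β_0 decreases with the density of links.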
Therefore the Betti number β_0 of road network neighbourhoods displays very relevant statistical differences with respect to the Betti number β_0 of scale-free hierarchical networks. In order to emphasise the quantitative differences that we observe in the real datasets with respect to the null models, in Figure 16 we plot the average Betti number β_0 of neighbourhoods of hub nodes (n > 70), peripheral nodes (n < 20) and bridge nodes (n ∈ [20, 70]) as a function of ρ. All the considered real scale-free networks display a power-law dependence of the Betti number β_0 on the density of links ρ that persists throughout all instances, independently of n. However, this behaviour is not captured by the null models. This is the clearest distinction between the stochastic topology of the null models and the neighbourhoods in the real datasets.
This apparently follows from the scale-free, hierarchical structure of the datasets, which acts to ensure that large neighbourhoods display multiple disconnected components, unlike random clique complexes or random Vietoris-Rips complexes. The Texas road network dataset also displays a topology that is significantly different from the null models: its Betti number β_0 does not decrease with the density of links ρ, while β_0 decreases with ρ for the random clique complex and for the random Vietoris-Rips complex.

Conclusions
In this paper, we have analysed the topology of node neighbourhoods in large network datasets. The node neighbourhood of a generic node i is the clique complex of the network induced by the nodes up to distance d from i.
The topology of node neighbourhoods is then investigated by calculating their Betti numbers β_0 and β_1. A node neighbourhood with a high β_0 indicates that the central node is connected to many nodes that are not directly connected to any other node in the neighbourhood. A high β_1 indicates instead that among connected nodes in the neighbourhood there are several open cycles, implying that the corresponding cluster is not densely connected.
Our large scale statistical analysis reveals that the topology of these neighbourhoods is not only determined by their size and their link density. In fact, for a given size and link density of a neighbourhood, different topologies can be observed. Specifically, we show that the topological study of node neighbourhoods is able to distinguish between the neighbourhoods of scale-free hierarchical networks and the neighbourhoods of spatial road networks. Moreover, both types of real datasets obey significant organisation principles that impose a local topology of the node neighbourhoods that is significantly different from the random clique complex and the random Vietoris-Rips complex. In the future, an interesting question that we would like to investigate is to what extent the recently proposed local curvatures of discrete networks and simplicial complexes [7,8,9] capture the local topology of node neighbourhoods.

Fig. 1 depicts an example of a 1-hop neighbourhood. In panel (a), the induced subgraph on the neighbours is shown, with the corresponding clique complex on the right in panel (b). The simplices, shown up to dimension two, are randomly coloured.

Figure 1: The construction of the neighbourhood of a node includes two steps. First, the subgraph induced by the neighbours of the node found at distance 0 < δ ≤ d is considered (panel (a)). Subsequently, the clique complex of this subgraph is constructed by adding a (c − 1)-dimensional simplex for every c-clique (panel (b)).
The large variability of the network topology of neighbourhoods with the same density of links ρ_i indicates that the density of links in the neighbourhood (equivalent to the local clustering coefficient C_i for neighbourhoods with d = 1) cannot fully capture the variability observed in the topology of the node neighbourhoods. In fact, for a fixed value of the density of links ρ, neighbourhoods can have very different Betti numbers β_0 and β_1 across different network datasets. In particular, while the small-world and scale-free datasets are often characterised by neighbourhoods with a high value of the Betti number β_0, the planar nature of the road network is revealed by the high value of the Betti number β_1 of its neighbourhoods with respect to the neighbourhoods of the other, non-planar networks.

Figure 2: We show several examples of neighbourhoods with number of nodes n ≃ 20 and different density of links ρ for different real network datasets and for the random clique complexes. The topology of network neighbourhoods evaluated by the Betti numbers β_0 and β_1 can have significant fluctuations even for neighbourhoods with comparable values of the local parameters n and ρ.

Figure 3: We show several examples of neighbourhoods with number of nodes n ≃ 100 and different density of links ρ for different real network datasets and for the random clique complexes. The topology of network neighbourhoods evaluated by the Betti numbers β_0 and β_1 can have significant fluctuations even for neighbourhoods with comparable values of the local parameters n and ρ.

Figure 4: In the left panel the average Betti numbers β_0 and β_1 of the random clique complex of n = 20, 50 nodes are plotted as a function of the density of links p. In the right panel the average Betti numbers β_0 and β_1 of the random Vietoris-Rips complex of n = 20, 50 nodes are plotted as a function of the connection range r_0.

Figure 8: A set of typical neighbourhoods of the Pokec social network is shown for ρ ≃ 0.05 and increasing values of the number of nodes n = 10, 20, 30, 50, 75, 125.

Figure 10: A set of typical neighbourhoods of the California road network is shown for ρ ≃ 0.05 and increasing values of the number of nodes n = 10, 20, 30, 50, 75, 125.

Topological data analysis as a function of the number of nodes n and the density of links ρ
The non-random behaviour in the datasets becomes apparent when we characterise the average topology of the neighbourhoods of real datasets with given number of nodes n and density of links ρ. To this end we plot the average Betti numbers β_0 and β_1 as a function of n and ρ and we compare the results with the average Betti numbers of the random clique complexes and the random Vietoris-Rips complexes with the same number of nodes and density of links. Figures 14 and 15 display the numerical results for the neighbourhoods of the Slashdot social network and of the Pokec social network, respectively.

Figure 14: The average Betti numbers β_0 and β_1 of the neighbourhoods of the Slashdot social network are plotted as a function of the number of nodes n and the density of links ρ of the neighbourhoods. The real data are then compared with the results obtained on the random clique complex and the random Vietoris-Rips complex; significant differences are observed, highlighting the non-random character of the real dataset.

Figure 15: The average Betti numbers β_0 and β_1 of the neighbourhoods of the Pokec social network are plotted as a function of the number of nodes n and the density of links ρ of the neighbourhoods. The real data are then compared with the results obtained on the random clique complex and the random Vietoris-Rips complex; significant differences are observed, highlighting the non-random character of the real dataset.

Table 1: Network datasets used, including number of nodes N, number of links L, average clustering coefficient C, and diameter D.
Figure 7: A set of typical neighbourhoods of the Slashdot social network is shown for ρ ≃ 0.05 and increasing values of the number of nodes n = 10, 20, 30, 50, 75, 125.