Statistics of remote regions of networks

We delve into the statistical properties of regions within complex networks that are distant from vertices with high centralities, such as hubs or highly connected clusters. These remote regions play a pivotal role in shaping the asymptotic behaviours of various spreading processes and the features of associated spectra. We investigate the probability distribution $P_{\geq m}(s)$ of the number $s$ of vertices located at distance $m$ or beyond from a randomly chosen vertex in an undirected network. Earlier, this distribution and its large $m$ asymptotics $1/s^2$ were obtained theoretically for undirected uncorrelated networks [S. N. Dorogovtsev, J. F. F. Mendes, A. N. Samukhin, Nucl. Phys. B 653 (2003) 307]. Employing numerical simulations and analysing empirical data, we explore a wide range of real undirected networks and their models, including trees and loopy networks, and reveal that the inverse square law is valid even for networks with strong correlations. We observe this law in the networks demonstrating the small-world effect and containing vertices with degree $1$ (so-called leaves or dead ends). We find the specific classes of networks for which this law is not valid. Such networks include the finite-dimensional networks and the networks embedded in finite-dimensional spaces. We notice that long chains of nodes in networks reduce the range of $m$ for which the inverse square law can be spotted. Interestingly, we detect such long chains in the remote regions of the undirected projection of a large Web domain.


Introduction
The research field of the statistical physics of complex networks focuses particularly on the exploration and comprehension of the central regions of a network, which house vertices with high centralities, such as hubs or highly connected clusters. A large number of centrality measures are used for the discovery and indexing of this important central part [2,3,4,5,6,7,8,9,10,11]. The statistical properties of the remote regions of networks, distant from vertices with high centralities, are far less studied despite their significant role in the asymptotic behaviour of various spreading processes, random walks, and the features of associated spectra [12,13,14]. One of the simplest statistical characteristics of the remote regions of networks is the shape of the tail of the distribution of shortest-path lengths [15,16,17,1,18,19,20,21,22,23,24,25]. Notably, in networks demonstrating the small-world effect, this distribution approaches the delta-function shape as the network size tends to infinity [26], in contrast to finite-dimensional networks ("large worlds"). Hence, the exploration of remote network regions suggests a focus on networks that are large yet finite in size.
In this paper we consider another statistical characteristic of the remote network regions in the giant connected component, namely, the probability distribution $P_{\geq m}(s)$ of the number $s$ of vertices located at distance $m$ or beyond from a randomly chosen vertex. This distribution was obtained theoretically in Ref. [1] for the configuration model of undirected uncorrelated networks with an arbitrary degree distribution. It was shown that the large $s$ asymptotics of this distribution, for sufficiently large $m$, follows the inverse square law,
$$P_{\geq m}(s) \sim s^{-2}, \qquad (1)$$
if an uncorrelated network contains leaves (dead-end vertices), that is, vertices of degree 1, while for uncorrelated networks with the lowest non-zero vertex degree equal to 3, the asymptotics does not follow this law. In the intermediate case of uncorrelated networks with the lowest non-zero vertex degree equal to 2, this asymptotics was obtained theoretically, but the range of its validity turned out to be narrow in reasonably sized networks, and so it is difficult to observe.
One should emphasize that uncorrelated networks are rather special in the sense that they account only for complex degree distributions and are devoid of the various correlations and short cycles that are prevalent in the majority of real-world networks. Furthermore, these compact networks, despite their locally tree-like organization, contain cycles, and hence they cannot be proper trees. This is why the theoretical asymptotics, Eq. (1), was obtained only for a narrow class of networks. In this work we reveal that this inverse square asymptotics is actually observed in diverse real-world and synthetic undirected networks demonstrating the small-world effect, including strongly correlated networks, trees and loopy networks. These networks belong to a class that is much wider than the uncorrelated networks. On the other hand, we indicate a set of networks for which this law is not valid. In particular, this set includes finite-dimensional networks and networks embedded in finite-dimensional metric spaces.
Each distribution, plotted in each figure of this paper, was measured for one network realization through numerical computation of the number s of vertices located at distance m or beyond from each (and every) vertex in that specific realization of the network.
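The measurement described above can be sketched with a breadth-first search from every vertex. This is only an illustrative sketch, not the code used for the figures: the function names, the adjacency-dictionary input format, and the convention of counting vertices with $s = 0$ in the normalization are assumptions, and the graph is assumed to be connected (for instance, the giant component). The cost is quadratic in the network size, so a large network would need a more economical implementation.

```python
from collections import Counter, deque


def distances_from(adj, source):
    """Shortest-path distances from `source` by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def remote_counts(adj, source):
    """Return a list c where c[m] is the number of vertices at distance >= m."""
    dist = distances_from(adj, source)
    ecc = max(dist.values())
    shell = [0] * (ecc + 1)          # shell[d] = vertices at distance exactly d
    for d in dist.values():
        shell[d] += 1
    counts = shell[:]
    for m in range(ecc - 1, -1, -1):  # suffix sums turn shells into ">= m" counts
        counts[m] += counts[m + 1]
    return counts


def distribution_geq_m(adj, m):
    """Empirical P_{>=m}(s): fraction of vertices that see exactly s vertices
    at distance m or beyond (s = 0 is kept; this is an assumed convention)."""
    hist = Counter()
    for v in adj:
        counts = remote_counts(adj, v)
        hist[counts[m] if m < len(counts) else 0] += 1
    n = len(adj)
    return {s: c / n for s, c in sorted(hist.items())}
```

For a chain of four vertices, `distribution_geq_m(adj, 2)` assigns probability 1/2 to $s = 1$ (the two inner vertices) and 1/2 to $s = 2$ (the two end vertices).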
The paper is structured as follows. In Section 2 we generate a number of synthetic networks, including an Erdős-Rényi random graph, a random uniform tree, growing trees and loopy networks with various degree distributions and correlations, and measure in these networks the distribution $P_{\geq m}(s)$ and its asymptotics. In Section 3 we analyze the structure of the remote regions of a set of real-world networks, including social networks, the Internet and the WWW, power grids, and road networks. We classify the networks in which the inverse square law is observed and indicate the networks in which it is not valid. In Section 4 we discuss our results.

Inverse square law in synthetic networks
It is natural to start our study with an Erdős-Rényi random graph, the classical paradigm for random networks: an uncorrelated network with a Poisson degree distribution. Figure 1(a) shows the distributions $P_{\geq m}(s)$ for different $m$ observed in an Erdős-Rényi random graph of $10^6$ vertices, each pair of which is connected with probability $p$, where the average vertex degree is $\langle q \rangle \cong pN = 5$. For the sake of comparison, for each $m$ in the plot we indicate the corresponding theoretical asymptotics from Ref. [1], Eq. (2). In this asymptotics, $X_\infty = 1 - S_G$ ($S_G$ is the relative size of the giant connected component in the network) is the solution of a self-consistency equation, and the constant $B$ in Eq. (2) is provided by the $k \to \infty$ limit of a recursion with the initial value $X_0 = 1 - \delta$, $\delta \to 0$. The function $\Gamma(x)$ in Eq. (2) is the gamma function. Similar formulas describe the asymptotics of $P_{\geq m}(s)$ for the uncorrelated networks containing vertices of degree 1. Notice the excellent agreement between the measured distribution and the theoretical one. It is more convenient to observe the cumulative distribution $P^{(\mathrm{cum})}_{\geq m}(s) = \sum_{u \geq s} P_{\geq m}(u)$, for which this law, Eq. (1), corresponds to the $1/s$ asymptotics, see Fig. 1(b). Furthermore, Fig. 1(c) shows the distribution $P_m(s)$ of the number $s$ of vertices located at distance exactly $m$ from a randomly chosen vertex for different $m$. One can see that for sufficiently large $m$, the distribution $P_m(s)$ is close to $P_{\geq m}(s)$.
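The correspondence between the $s^{-2}$ law and the $1/s$ decay of the cumulative distribution can be checked numerically. The sketch below is an illustration only: the function name `cumulative`, the dictionary representation of a distribution, and the cutoff `s_max` of the synthetic example are assumptions.

```python
def cumulative(dist):
    """P_cum(s) = sum of P(u) over u >= s, for a distribution given as {s: P(s)}."""
    total = 0.0
    cum = {}
    for s in sorted(dist, reverse=True):  # accumulate from the tail
        total += dist[s]
        cum[s] = total
    return dict(sorted(cum.items()))


# Synthetic check: a truncated inverse square law P(s) = s^{-2} / norm.
s_max = 10**4
norm = sum(u**-2 for u in range(1, s_max + 1))
p = {u: u**-2 / norm for u in range(1, s_max + 1)}
p_cum = cumulative(p)

# For 1 << s << s_max the product s * norm * P_cum(s) should be close to 1,
# which is the 1/s asymptotics of the cumulative distribution.
product_at_100 = 100 * norm * p_cum[100]
```

On a log-log plot this means that a slope of −2 for the distribution becomes a slope of −1 for the cumulative distribution, which is also less noisy in the tail.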
Let us now consider synthetic correlated networks. First we explore three recursive trees: the growth of two of them is driven by the linear preferential attachment algorithm, and hence they are scale-free, with the degree distribution exponents $\gamma = 2.2$ and $\gamma = 3$ (the latter is the Barabási-Albert model, that is, proportional preferential attachment), and the third is the random recursive tree, for which the degree distribution is exponential ($\gamma = \infty$). The first random tree has disassortative correlations between the degrees of neighbouring vertices, the second has weak correlations, and the third has assortative correlations. All these growing random trees are small worlds. Figure 2 demonstrates that the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ at sufficiently large $m$ decay as $1/s$.
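The three growth rules can be sketched with a single generator in which a new vertex attaches to an existing vertex $i$ with probability proportional to $q_i + A$. This is a minimal illustration, not the code used for Fig. 2. The function name, the two-vertex seed, and the use of `A=None` to denote uniform attachment (the random recursive tree) are assumptions. The weighted choice costs $O(N)$ per step, so the sketch suits small trees only.

```python
import random


def recursive_tree(n, A=0.0, rng=None):
    """Grow a recursive tree of n >= 2 vertices.

    Each new vertex attaches to one existing vertex i chosen with probability
    proportional to q_i + A (requires A > -1). A=None selects the target
    uniformly at random, which gives the random recursive tree.
    Returns (parent, degree) lists indexed by vertex.
    """
    rng = rng or random.Random()
    parent = [None, 0]  # seed: vertices 0 and 1 joined by an edge
    degree = [1, 1]
    for new in range(2, n):
        if A is None:
            target = rng.randrange(new)
        else:
            weights = [q + A for q in degree]
            target = rng.choices(range(new), weights=weights)[0]
        parent.append(target)
        degree[target] += 1
        degree.append(1)  # the new vertex starts as a leaf
    return parent, degree
```

With these conventions, `A = -0.8` corresponds to $\gamma = 3 + A = 2.2$, `A = 0` to the Barabási-Albert tree with $\gamma = 3$, and `A = None` to the random recursive tree.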
Figure 3 shows the cumulative distribution $P^{(\mathrm{cum})}_{\geq m}(s)$ for a quite different tree, namely, a connected uniform random tree, whose Hausdorff dimension equals 2; that is, this random tree is a "large world". The figure demonstrates that the cumulative distribution does not have a power-law asymptotics.
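According to the caption of Fig. 3, the uniform random tree was generated by the Aldous-Broder algorithm run on the complete graph. A minimal sketch of that procedure follows; the function name and the edge-list output are assumptions, and this is not the authors' implementation. A random walk moves over the complete graph, and each edge through which the walk first enters a vertex is added to the tree.

```python
import random


def aldous_broder_complete(n, rng=None):
    """Uniform random spanning tree of the complete graph on n vertices,
    returned as a list of n - 1 edges (u, v)."""
    rng = rng or random.Random()
    current = rng.randrange(n)
    visited = [False] * n
    visited[current] = True
    remaining = n - 1
    edges = []
    while remaining:
        # One random-walk step: a uniformly chosen vertex other than `current`.
        nxt = rng.randrange(n - 1)
        if nxt >= current:
            nxt += 1
        if not visited[nxt]:
            # First entrance to `nxt`: keep the edge just traversed.
            visited[nxt] = True
            remaining -= 1
            edges.append((current, nxt))
        current = nxt
    return edges
```

On the complete graph the walk needs of the order of $n \ln n$ steps on average to visit every vertex, so generating a tree of $10^4$ vertices in this way is inexpensive.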
Figure 4 shows the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for loopy recursive networks whose growth is similar to that of the recursive trees in Fig. 2, with one difference. In contrast to the recursive trees, each new vertex in the networks in Fig. 4(a,c,e) attaches, with equal probability, to one or two existing vertices, and each new vertex in the networks in Fig. 4(b,d,f) attaches to two existing vertices. The existing vertices for attachment are chosen by the rules implemented for the recursive trees in Fig. 2(a,b,c). The degree distributions and correlations of the trees in Fig. 2(a,b,c) and of the loopy networks in, respectively, Fig. 4(a,c,e) and Fig. 4(b,d,f) are similar. One can see the $1/s$ asymptotics of the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for the loopy growing networks in Fig. 4(a,c,e), which have vertices of degree 1, while this asymptotics is not observed in the loopy growing networks in Fig. 4(b,d,f), which have no vertices of degree 1, despite their large sizes.
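The difference between the two families of loopy networks can be illustrated with a short sketch that uses uniform attachment only, the rule of the random recursive tree. The function name, the two-vertex seed, and the choice of distinct targets for a new vertex are assumptions that the text does not specify.

```python
import random


def loopy_recursive_network(n, always_two=False, rng=None):
    """Grow a network of n >= 3 vertices by uniform attachment.

    always_two=False: each new vertex attaches to one or two existing
    vertices with equal probability. always_two=True: each new vertex
    attaches to two distinct existing vertices.
    Returns an adjacency dictionary {vertex: set of neighbours}.
    """
    rng = rng or random.Random()
    adj = {0: {1}, 1: {0}}  # seed: a single edge
    for new in range(2, n):
        k = 2 if always_two or rng.random() < 0.5 else 1
        targets = rng.sample(range(new), k)  # distinct targets (assumption)
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
    return adj
```

In this sketch the networks grown with `always_two=True` have minimum degree 2, so they contain no leaves, whereas the mixed rule leaves a finite fraction of vertices with degree 1.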
In Table 1 we list the basic structural characteristics of the synthetic networks considered in this paper.

The statistics of remote regions in real-world networks
Real-world networks typically have more complicated architectures than synthetic ones, and so one could expect that the observation of the inverse square law in real networks is more difficult. Surprisingly, this is not the case. In Table 2 we list the basic structural characteristics of the real-world networks considered in this paper. Figure 5 shows the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for the maps of large regions of four collaboration and social networks, namely, the FP5 net, CiteSeer, the Youtube friends network, and Facebook. For all four sets of cumulative distributions we observe the $1/s$ asymptotics.
We also observe this asymptotics when inspecting the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for the Internet networks [36]: the maps of the routers and of the autonomous systems, see Fig. 6. On the other hand, as is natural, the US power grid and the road network of Pennsylvania, which are two-dimensional networks, do not demonstrate the power-law asymptotics of $P^{(\mathrm{cum})}_{\geq m}(s)$, see Fig. 7.
Figure 8(a) shows the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for a real-world network with a very large hub. This is the undirected projection of a network of 171,206 hyperlinks between 15,763 pages within Google's sites. The largest hub in this network has the huge degree 11,401, which shapes the architecture of this specific network. The steps in the empirical cumulative distributions in Fig. 8(a) can be reproduced in a tree-like model network mimicking the structure of the Google net. Imagine a tree-like network in which the hub has the same numbers of first-, second-, third-, etc. nearest neighbours as the hub in the Google net. For this model network one can easily estimate $P^{(\mathrm{cum})}_{\geq m}(s)$, see Fig. 8(b), and get a quantitative agreement with the empirical distribution for $m = 2$, 3, 4, and 5. The small size of this network does not allow us to check whether this special network architecture still provides the inverse square law or not.
Figure 9 shows an interesting set of the cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for the undirected projection of a large Stanford Web domain (notice a similar set of cumulative distributions in Fig. 6(b)). These empirical cumulative distributions have the $1/s$ asymptotics for $m$ in the range between 10 and 15, but for larger $m$ the cumulative distributions become step-like. This step-like shape suggests a specific structure of the remote regions of this network. To understand the organization of the connections between the vertices within the remote regions, we extract the vertices at distance $m = 25$ or beyond from the largest hub in the undirected projection of the giant weakly connected component of the network, together with the edges between them, and visualize the resulting clusters, indicating, for the sake of completeness, the directed edges of the original directed network. In total, there are 714 vertices and 1681 directed edges in these clusters. Figure 10 demonstrates this visualization (see also the Ancillary file). Notice that almost all these directed edges are reciprocal: only 3 directed edges are not. This gives the remarkably high fraction $(1681 - 3)/1681 = 0.998$ of reciprocal edges in these clusters. The same computation performed for the network including its main part gives a total of 2,234,572 directed edges, of which 1,649,280 are not reciprocal, resulting in a much lower value, 0.262, for the fraction of reciprocal edges. We see that in this region the network is a set of long chains. Notably, only 3 of these chains have one of their ends free, and the remaining 7 chains are parts of long cycles. Loosely speaking, the Web in this remote region is one-dimensional.
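The fraction of reciprocal edges quoted above can be computed from a list of directed edges as in the following sketch. The function name and the edge-list input are assumptions, and self-loops are assumed to be absent. The two numerical values in the text are reproduced from the edge counts given there.

```python
def reciprocal_fraction(edges):
    """Fraction of distinct directed edges (u, v) whose reverse (v, u) is present."""
    edge_set = set(edges)
    reciprocal = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return reciprocal / len(edge_set)


# Arithmetic check of the two fractions, using the edge counts from the text.
remote_region = (1681 - 3) / 1681                 # remote clusters
whole_domain = (2234572 - 1649280) / 2234572      # entire Stanford domain
print(round(remote_region, 3), round(whole_domain, 3))  # prints 0.998 0.262
```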

Discussion and conclusions
We have explored one of the basic structural statistical characteristics of the remote regions of complex networks, which previously was known only for uncorrelated networks. We have observed the $s^{-2}$ asymptotics of the distribution $P_{\geq m}(s)$ of the number $s$ of vertices located at distance $m$ or beyond from a randomly chosen vertex in a large set of real and synthetic undirected networks (small worlds) with surprisingly diverse architectures. Such networks include trees and loopy networks, networks with strong and weak correlations, the one-partite projections of bipartite networks (FP5 net), the undirected projections of directed networks (Stanford Web), collaboration and social networks, the Internet and Web networks. This inverse square law is not observed in networks having no dead ends (vertices of degree 1) and in finite-dimensional networks (power grids, road networks).
For each of these networks we inspected the product of the cumulative distribution and $N$, $N P^{(\mathrm{cum})}_{\geq m}(s)$, which turned out to be approximately symmetric for all tested cases in the sense that the x- and y-axes of the plots can be interchanged.
Moreover, we have revealed that the organization of connections between vertices within the remote regions of networks differs dramatically from the main part of the network, see Fig. 10 and Ancillary file.In particular, we have observed a surprisingly high reciprocity 0.998 of directed edges in the remote region of the Stanford domain of the Web, while the reciprocity equals only 0.262 in the entire domain.
One should emphasize that the theoretical results of Ref. [1] for uncorrelated networks still do not offer a compelling explanation for the consistent observation of the inverse square law across such a wide spectrum of networks. The explanation of this law is a challenge for future work. Note that if we assume that the distribution $P_{\geq m}(s)$ has a power-law asymptotics, then, for the first moment of this distribution (the average number of vertices at distance $m$ or beyond from a randomly chosen vertex) to diverge, the exponent of this power law must be not greater than 2. Hence the observed exponent 2 of the asymptotics is the maximum possible value.
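The bound on the exponent follows from a standard estimate, sketched here under the assumption that $P_{\geq m}(s) \sim s^{-\alpha}$ holds up to a cutoff $s_{\max}$ that grows with the network size (the symbols $\alpha$ and $s_{\max}$ are introduced only for this illustration):

```latex
\langle s \rangle = \sum_{s} s\, P_{\geq m}(s)
  \sim \int^{s_{\max}} s^{1-\alpha}\, ds
  \sim
  \begin{cases}
    s_{\max}^{\,2-\alpha}, & \alpha < 2, \\
    \ln s_{\max},          & \alpha = 2, \\
    \text{const},          & \alpha > 2 .
  \end{cases}
```

The first moment therefore grows without bound as $s_{\max} \to \infty$ only for $\alpha \leq 2$, and $\alpha = 2$ is the marginal case with logarithmic divergence.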
Other challenging directions for the future work are the exploration of remote regions of directed networks and examining the role of the chain structures observed in this work in network processes.
CRediT authorship contribution statement
J.G. Oliveira: Planning and revision of the manuscript, Designed the study, Carried out numerical simulations, Writing original draft. S.N. Dorogovtsev: Planning and revision of the manuscript, Designed the study, Carried out analytical and numerical calculations, Writing original draft. J.F.F. Mendes: Planning and revision of the manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figure 2: Cumulative distribution $P^{(\mathrm{cum})}_{\geq m}(s)$ of three random growing trees. Dashed lines have slope −1. (a) A scale-free recursive tree of $10^6$ vertices whose growth is driven by linear preferential attachment, $\mathrm{Prob}(q_i) \propto q_i + A$, $A = -0.8$, where $q_i$ is the degree of vertex $i$. The degree distribution decays as $q^{-\gamma}$, $\gamma = 3 + A$. (b) A scale-free recursive tree of $10^5$ vertices generated by the Barabási-Albert model (proportional preference). The degree distribution decays as $q^{-\gamma}$, $\gamma = 3$. (c) A random recursive tree of $10^5$ vertices generated by progressive attachment of new vertices to randomly chosen vertices. Its degree distribution is exponential.
Figure 8: (a) The cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ of the undirected projection of a network (15,763 vertices and 171,206 edges) of hyperlinks between pages within Google's sites [39]. (b) The theoretical cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ of the model tree-like network mimicking the Google net: it has a hub with the same numbers of first-, second-, third-, and fourth-nearest neighbours, $z_1 = 11{,}401$, $z_2 = 4228$, $z_3 = 132$, and $z_4 = 1$, as the Google net.

Figure 1: The statistics of the remote region of the Erdős-Rényi random graph of $10^6$ vertices with the average vertex degree $\langle q \rangle = 5$. (a) Distribution $P_{\geq m}(s)$ for different $m$. The dotted lines show the theoretical asymptotics provided by Eq. (2). Dashed line has slope −2. (b) Cumulative distribution $P^{(\mathrm{cum})}_{\geq m}(s)$ for different $m$. Dashed line has slope −1. (c) Distribution $P_m(s)$ for different $m$. Dashed line has slope −2.

Figure 3: Cumulative distribution $P^{(\mathrm{cum})}_{\geq m}(s)$ of the connected uniform random tree of $10^4$ vertices. For clarity, only the highest 30 values of $m$ are plotted. This tree was generated by the Aldous-Broder algorithm [27,28], which we ran on the complete graph of $10^4$ vertices.

Figure 4: Cumulative distribution $P^{(\mathrm{cum})}_{\geq m}(s)$ of six random growing networks. Dashed lines have slope −1. (a,c,e) Each new vertex in a recursive network attaches, with equal probability, to one or two existing vertices selected by the same attachment rules as for the recursive trees in Fig. 2(a,b,c), respectively. (b,d,f) Each new vertex in a recursive network attaches to two existing vertices selected by the same attachment rules as for the recursive trees in Fig. 2(a,b,c), respectively. The networks in (a)-(e) contain $10^6$ vertices, the network in (f) has $2 \times 10^7$ vertices.

Figure 7: The cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for (a) the US power grid of 4,941 vertices [37] and (b) the road network of Pennsylvania with 1,087,562 vertices [38]. For clarity, in (b), only the highest 50 values of $m$ are plotted.

Figure 9: The cumulative distributions $P^{(\mathrm{cum})}_{\geq m}(s)$ for the undirected projection of a large Stanford Web domain containing 255,265 vertices [38]. Dashed line has slope −1.

Figure 10: Visualization of the remote clusters in the Stanford Web network. The Pajek program package is used [40]. The vertices are labeled according to their distances from the largest hub in the network. Three of these chains have one of their ends free, and the remaining 7 chains are parts of long cycles. This figure is provided as an Ancillary file.

Table 1: Basic structural characteristics of the synthetic networks considered: each line has information specifying a network realization, the figure where numerical results for the distributions are plotted, the size $N$ of the largest component, the maximum degree $k_{\max}$, the average path length $\ell$, and the maximum path length $\ell_{\max}$.

Table 2: Basic structural characteristics of the real-world networks considered: each line has information specifying a network, the figure where numerical results for the distributions are plotted, the size $N$ of the largest component, the maximum degree $k_{\max}$, the average path length $\ell$, and the maximum path length $\ell_{\max}$.