Distances in a geographical attachment network model

Abstract: Distances between nodes are among the most fundamental objects of study in complex networks. In this paper, we investigate the asymptotic behavior of two types of distances in a model of geographical attachment networks: the typical distance and the flooding time. By constructing an auxiliary tree and using a continuous-time branching process, we show that in this model the typical distance is asymptotically normal and that the flooding time, suitably normalized, converges in probability to a constant.


Introduction
A significant amount of research has focused on networks as a result of the recent increase in interest in social networks [1], communication networks [2], scientific collaboration networks [3], biological networks [4], and many other types of networks. To analyze these networks, researchers have developed a number of models, many of which are widely used, such as the Watts and Strogatz model and the preferential attachment model. Due to various constraints, different models have different topologies. In this paper, we concentrate on a model shaped by geographical restrictions.
In social networks, people who relocate are most likely to make acquaintances with people in the area, such as their neighbors. Motivated by this idea, a geographical attachment network (GAN) model was first proposed in Ref. [5]. In this network model, the size (the number of nodes) increases over time, and each newly added node is connected only to the nodes that are closest to it.
The GAN model studied in this work is generated as follows. We start with an initial state (at time $n = 0$) of three nodes distributed on a ring, all of which are connected to one another. That is, the initial graph GAN(0) is a triangle; see Fig. 1a. At time $n \ge 1$, the network GAN($n$) is obtained from GAN($n-1$) in the following manner: a new node is placed in an interval between two adjacent nodes on the ring, chosen uniformly at random among all such intervals, and is connected to its two nearest neighbors (one on either side). This GAN model is the simplest case in Ref. [5]; see also Ref. [6].
For later convenience, we define potential nodes and active intervals. An interval is said to be active if its two endpoints are adjacent on the ring. Each active interval corresponds to a potential node, that is, a node that may be added as a new node in the future. In contrast, we refer to nodes that have been added to the network as actual nodes. Each potential node has two potential edges that would connect it to its neighbors. Fig. 1 illustrates this network at times $n = 0$, 1, and 2. For variants of the GAN model, we refer to Refs. [7][8][9].
A path in a network $G$ is a subnetwork consisting of a sequence of successive edges that joins a sequence of distinct nodes (except, possibly, the first and last node). The length of a path is the number of edges in it. Although the geographical distance along the ring is used in generating the GAN model, in this paper we consider the graph distance, i.e., the distance between a pair of nodes is defined as the number of edges along the shortest path connecting them. The diameter of a network is the largest distance among all pairs of nodes.
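The growth rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from Refs. [5] or [6]; the function name `grow_gan`, the integer node labels, and the list-based ring are assumptions made for the example.

```python
import random

def grow_gan(steps, seed=None):
    """Grow a GAN on a ring for `steps` time steps (illustrative sketch).

    Returns (ring, edges): `ring` lists the node ids in ring order and
    `edges` is a set of frozensets {a, b}, one per edge.
    """
    rng = random.Random(seed)
    ring = [0, 1, 2]                                  # GAN(0): three nodes on a ring
    edges = {frozenset(p) for p in [(0, 1), (1, 2), (0, 2)]}  # the initial triangle
    for _ in range(steps):
        new = len(ring)                               # id of the node being added
        k = rng.randrange(len(ring))                  # interval (ring[k], ring[k+1]), uniform
        left, right = ring[k], ring[(k + 1) % len(ring)]
        edges.add(frozenset((new, left)))             # connect to the neighbor on one side
        edges.add(frozenset((new, right)))            # and to the neighbor on the other side
        ring.insert(k + 1, new)                       # place the new node inside the interval
    return ring, edges

if __name__ == "__main__":
    ring, edges = grow_gan(10, seed=1)
    print(len(ring), "nodes,", len(edges), "edges")
```

Each step adds one node and two edges, so after `steps` steps the sketch holds `steps + 3` nodes and `2 * steps + 3` edges, and any two nodes that are adjacent on the ring are joined by an edge.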
Several properties of the GAN model are obtained in Ref. [5] with heuristic arguments and computational simulations, and in Ref. [6] using rigorous probabilistic methods. Consider the proportion of nodes with degree $k$ in the network GAN($n$). Ref. [6] derives the limit of this proportion as $n \to \infty$, and the limiting degree distribution is not a power law. Hence the GAN model is not a scale-free network [10], in which the limiting degree distribution is a power law. This indicates that the network model considered here is essentially different from the random Apollonian network model (see, for example, Refs. [11][12][13][14]), which looks quite similar to the GAN model. For the diameter of the GAN model, the following result is also derived in Ref. [6]. We say that a sequence of events $A_n$ occurs with high probability (w.h.p.) when $P(A_n) \to 1$ as $n \to \infty$.
Theorem 1 [6]. As $n \to \infty$, the diameter of the network GAN($n$) is with high probability asymptotic to an explicit deterministic quantity given in Ref. [6], whose constant is the unique solution in the interval $(0,1)$ to an equation stated there.
In this paper, we further investigate two other types of distances in the GAN model: the typical distance and the flooding time. The typical distance of a network is the distance between a pair of nodes picked in it uniformly at random (u.a.r.). For the typical distance of other random graph models, we refer to Refs. [15][16][17][18]. The flooding time of a network is the greatest distance from a randomly chosen node to the other nodes. For more background and results on the flooding time, see Refs. [19][20][21][22].
The rest of the paper is organized as follows. In Section 2, we analyze the structure of the subnetworks and obtain some results by building an auxiliary tree and using a continuous-time branching process. Based on these, we derive the results on the typical distance and flooding time of GAN models in Section 3.

Subnetworks and auxiliary tree
When the GAN model is in its initial state, i.e., GAN(0), the ring is divided into three initial intervals (see Fig. 2). We refer to the subnetwork made up of the nodes and edges in the $i$-th initial interval, including the endpoints of the interval, as GAN$_i(n)$, where $i = 1, 2, 3$. Clearly, a shortest path between any pair of nodes involves at most two subnetworks. There are only two scenarios to take into account for the typical distance of GAN($n$): (A) the two nodes originate from the same initial interval; (B) the two nodes originate from distinct initial intervals. Indeed, we only need to consider the typical distance of one of the subnetworks and the distance between any initial node and a node selected u.a.r. in the subnetwork. Therefore, we concentrate on the subnetworks first.
Suppose that the three open intervals contain $N_1(n)$, $N_2(n)$, and $N_3(n)$ nodes at time $n$, so that $N_1(n) + N_2(n) + N_3(n) = n$. As $n \to \infty$, the proportion of nodes in each interval converges in distribution to a beta law, where Beta($a$, $b$) denotes the beta distribution with parameters $a$ and $b$. With high probability, the random variables $N_1(n)$, $N_2(n)$, and $N_3(n)$ have the same order, as proven in Ref. [6]. The resulting fact, labeled (1), is used in what follows. The three random variables are clearly identically distributed. Without loss of generality, we restrict our attention to the subnetwork GAN$_1(n)$. We suppose that $N_1(n) = m$, where $m$ is an integer, and that a new node is added at each step in this subnetwork. As a result, the size of the subnetwork GAN$_1(n)$ can be set to $m$, where $m$ varies with $n$. GAN$_1(n)$ has two initial nodes that are linked together; hence the distances from any given node in the network to the two initial nodes differ by at most one.
If we take the first node added to the subnetwork as the root of a binary tree and, each time a new node is added, take the two potential nodes it produces as its two children, we obtain a binary tree. We can then use this binary tree to derive some of the subnetwork's characteristics. Moreover, in each step, only one potential node is transformed into an actual node, and two new potential nodes are generated. The growth of this network therefore resembles a particular continuous-time branching process in which each individual splits just once to produce two offspring.

Auxiliary tree structure
In this section, we construct an auxiliary binary tree and embed a branching process in the subnetwork GAN$_1(n)$. In its initial state, the subnetwork GAN$_1(n)$ has only one initial interval and one potential node. When a new node is added to an interval, the interval is replaced by two new active intervals, which correspond to two new potential nodes. The potential node that lies to the left (or right) of node $u$ is called the left (or right) child of $u$, and $u$ is the parent of these two nodes. The ancestral line of node $u$ is the path that leads from the first node added in GAN$_1(n)$ to $u$, in which each node is the parent of the next node. The network in Fig. 3a can be redrawn in the shape of a binary tree to obtain Fig. 3b. The auxiliary tree in this paper is constructed in a manner similar to that in Ref. [6], but in the latter the nodes of the auxiliary tree correspond to the edges of the network. As shown in Fig. 3b, we can embed a continuous-time branching process (CTBP) (see, for example, Refs. [23] and [24]) into GAN$_1(n)$.
Consider a CTBP as follows. At the beginning, there is a single individual, who serves as the root of the process. Each individual has an i.i.d. exponential lifespan with mean 1. After birth, each individual is active for the whole of its lifespan; it then splits into two offspring and becomes inactive, and the two offspring are active for their own i.i.d. Exp(1) lifespans. After $n$ individuals have split, there are $n + 1$ active individuals. Let $T_n$ denote the time at which the $n$-th split occurs.
There is a correspondence between the subnetwork GAN$_1(n)$ and the CTBP observed at its split times, as follows. The first node in GAN$_1(n)$ (i.e., the first node added to the subnetwork) corresponds to the root of the CTBP. When the $k$-th individual in the CTBP splits, the individuals that have already split are the actual nodes of the subnetwork, while the individuals that are still active in the CTBP are its potential nodes. This holds because, in each step, two new potential nodes are created in place of the potential node that becomes an actual node. Furthermore, a potential node is chosen u.a.r. in each step, which matches the CTBP: by the memoryless property of exponential variables, the next individual to split is uniformly distributed among the active individuals. Note also that the distance from any node to the first node and its distance to either of the initial nodes differ by at most one.
The generation of an individual in the CTBP equals the length of the ancestral line leading to the corresponding node in GAN$_1(n)$. For active individuals, we can directly apply Corollary 1.1 in Ref. [24]. Consider the generation of the individual in the CTBP corresponding to a potential node selected u.a.r. in GAN$_1(n)$. Then, as the size of the subnetwork tends to infinity, this generation is asymptotically normal after suitable centering and scaling; we refer to this statement as (2).
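Because only the order of the splits matters for the correspondence above, the CTBP can be illustrated through its sequence of splits alone. The sketch below is an assumption-laden illustration, not part of the original argument: the function name `split_process` is hypothetical, the root is given generation 0, and the exponential lifespans are replaced by a uniform choice of the next splitter, which is what the memoryless property yields.

```python
import random

def split_process(n_splits, seed=None):
    """Simulate the order of splits of the binary CTBP (illustrative sketch).

    Returns the list of generations of the individuals that are active
    after `n_splits` splits. The root has generation 0 (an assumption).
    """
    rng = random.Random(seed)
    active = [0]                          # one active individual: the root
    for _ in range(n_splits):
        k = rng.randrange(len(active))    # next splitter is uniform among active ones
        g = active[k]
        active[k] = g + 1                 # first offspring takes the parent's slot
        active.append(g + 1)              # second offspring
    return active

if __name__ == "__main__":
    gens = split_process(1000, seed=0)
    print(len(gens), "active individuals; mean generation", sum(gens) / len(gens))
```

After `n_splits` splits the list has `n_splits + 1` entries, in line with the count of active individuals stated above; the entries play the role of the ancestral-line lengths of the potential nodes.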

Shortcuts
Obviously, for each node, there is a shortcut that connects it to other ancestors in addition to the connection to its parent.
To calculate the distance between each pair of potential nodes, it is necessary to identify the shortcuts between the first node and each node.
The ancestral line shows one of the paths from the node to the first node in GAN$_1(n)$, and we can use the ancestral line to identify the shortcuts. We also define a sequence for each node's ancestral line to discover the rule governing the existence of shortcuts. First, the left and right child nodes are denoted by the symbols $l$ and $r$, respectively. The ancestral line of each node provides a sequence of $l$ and $r$, where each symbol of the sequence records whether the corresponding node on the ancestral line is the left or right child of the node preceding it; Fig. 3b gives an example. The first node has the empty sequence. Each node's sequence is therefore unique, with the exception of the initial nodes. The sequence is sometimes used to represent the node. Furthermore, instead of distinguishing between the two initial nodes, we use the symbol $\Lambda$ to represent all initial nodes.
Fig. 3. (a) The subnetwork after adding several nodes, without marking potential nodes, where the nodes labeled 0 and 1 are the initial nodes. (b) The network obtained by redrawing (a). Black circles are actual nodes and blue squares are potential nodes. Except for the blue dotted line, the solid lines represent the ancestral line of each node and the dotted lines represent shortcuts. Black lines are existing edges and red lines are potential edges. The labeled node is one of the potential nodes of this subnetwork.
Consider two nodes $u$ and $v$ whose sequences are $u_1 u_2 \cdots u_a$ and $v_1 v_2 \cdots v_b$, respectively, where $u_i, v_j \in \{l, r\}$ and $a$ and $b$ are the lengths of the sequences. For $i \in \{l, r\}$, we use the last position of symbol $i$ in a sequence to define a truncation operator: $T_i(u)$ is the node whose sequence consists of the symbols of $u$ before the last occurrence of $i$, and $T_i(u) = \Lambda$ if symbol $i$ does not appear in the sequence of the node.
The parent of $u$ is obviously either $T_l(u)$ or $T_r(u)$. In particular, the first node has two parents, both of which are initial nodes. Along the ancestral line of $u$, the node following $T_l(u)$ is its left child, so $u$ is a left offspring of $T_l(u)$, i.e., $T_l(u)$ is to the right of $u$. Similarly, $T_r(u)$ is to the left of $u$. Based on the definition of the sequence of nodes and the above assumptions, we obtain the following facts:
(a) The prefixes of the sequence of node $u$ refer to the ancestors of $u$, i.e., the nodes whose sequences are proper prefixes of the sequence of $u$ are ancestors of $u$.
(b) The longest common prefix of the sequences of $u$ and $v$ represents the latest common ancestor of $u$ and $v$, which can be written as $u \wedge v$.
(c) The shortest path between $u$ and $v$ must pass through some of their common ancestors. We must ascend to their common ancestors to determine the shortest path between them.
Claim 1. $T_l(u)$ and $T_r(u)$ are the two ancestors of $u$ connected to $u$, i.e., the neighbors of $u$ when node $u$ is added to the network as a new node.
Proof. We prove this claim by induction. The sequence of the first node is empty, its children are $l$ and $r$, and both of its truncations are $\Lambda$. Hence the result is valid for the first node. Assume that the claim is valid for a node $u$, that is, $u$ links to $T_l(u)$ and $T_r(u)$. The endpoints of the interval to the left of $u$ are $T_r(u)$ and $u$. When the left child of $u$, with sequence $ul$, is added to the network, $T_l(ul) = u$ and $T_r(ul) = T_r(u)$. These two nodes are exactly the endpoints of the interval of the left child of $u$. Therefore, the claim holds for $ul$. Similarly, the claim also holds for $ur$.
As a result, the shortcut connects any given node $u$ to either $T_l(u)$ or $T_r(u)$. For any node $u$, observe the difference between the sequence of the node connected by the shortcut and the sequence of $u$. From back to front, this difference collects exactly two distinct symbols. Fig. 2 illustrates such a shortcut. For instance, for a node with the sequence $lrr$, the shortcut connects to the first node, whose sequence is empty, and their difference is $lrr$. Consequently, we can partition the sequence. From back to front, as soon as both symbols have been collected, i.e., at the occurrence of the sequence $lr$ or $rl$, we cut off a block and obtain a shortcut. The division is then repeated in the same way to obtain the number of blocks. The last block may not contain both symbols, but this block represents a node that is at distance 1 from one of the initial nodes.
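The induction in the proof of the claim can also be checked numerically. The sketch below grows one subnetwork, records the two neighbors of each node at the moment it is added, and compares them with the prefixes obtained by cutting the sequence before its last `r` and before its last `l`. This reading of the truncation operator, the labels `LEFT_INIT` and `RIGHT_INIT` for the two initial nodes, and all function names are assumptions made for the illustration.

```python
import random

LEFT_INIT, RIGHT_INIT = "INIT_LEFT", "INIT_RIGHT"   # hypothetical labels for the initial nodes

def truncate_before_last(seq, symbol):
    """Prefix of `seq` before the last occurrence of `symbol`; None if absent."""
    pos = seq.rfind(symbol)
    return None if pos < 0 else seq[:pos]

def grow_subnetwork(steps, seed=None):
    """Grow one subnetwork; return {sequence: (left neighbor, right neighbor)}."""
    rng = random.Random(seed)
    order = [LEFT_INIT, RIGHT_INIT]   # nodes in ring order within the initial interval
    pending = [""]                    # pending[k]: sequence of the potential node
                                      # in the interval (order[k], order[k + 1])
    neighbors = {}
    for _ in range(steps):
        k = rng.randrange(len(pending))              # potential node chosen u.a.r.
        seq = pending[k]
        neighbors[seq] = (order[k], order[k + 1])    # endpoints of its interval
        order.insert(k + 1, seq)
        pending[k:k + 1] = [seq + "l", seq + "r"]    # two new potential nodes
    return neighbors

def claim_holds(neighbors):
    """Check that each node's neighbors at insertion are its two truncations."""
    for seq, (left, right) in neighbors.items():
        exp_left = truncate_before_last(seq, "r")    # None stands for an initial node
        exp_right = truncate_before_last(seq, "l")
        if left != (LEFT_INIT if exp_left is None else exp_left):
            return False
        if right != (RIGHT_INIT if exp_right is None else exp_right):
            return False
    return True

if __name__ == "__main__":
    print(claim_holds(grow_subnetwork(1000, seed=3)))
```

The check only covers the neighbors at insertion time, which is exactly what the claim concerns; edges acquired later lead to descendants.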

Reverse the sequence and define an event $E$: the occurrence of the sequence $lr$ or $rl$. Let $Y$ be the number of symbols needed for event $E$ to occur for the first time in the reversed sequence. The mean and variance of $Y$ are easy to calculate. When event $E$ occurs for the first time, we remove the symbols in the reversed sequence that precede it and those that represent it, and then we look for the next occurrence of event $E$ in the remaining sequence. This operation is repeated until event $E$ no longer occurs in the remaining sequence. The number of occurrences of event $E$ in the reversed sequence of a node is equal to the distance between this node and the initial node minus 1. Suppose there is a sequence of length $k$ and let $N_k$ be the number of occurrences of event $E$ in that sequence. By (6.8) in Chapter XIII of Ref. [25], $E[N_k] \sim k / E[Y]$ and $\mathrm{Var}(N_k) \sim k \, \mathrm{Var}(Y) / (E[Y])^3$, where $\sim$ indicates that the ratio of the two sides tends to 1, and $N_k$ has an asymptotically normal distribution as $k \to \infty$; we refer to this statement as (3).
Furthermore, we use $L_u$ to represent the length of the ancestral line of node $u$. If node $w$ is an ancestor of node $u$, then the shortest distance between $u$ and $w$ depends only on the difference between the two nodes' sequences, and the length of this difference is $L_u - L_w$.
Lemma 1. Suppose that $u$ is a potential node picked u.a.r. in GAN$_1(n)$ with size $m$, and let $D_u$ be the shortest distance from $u$ to the first node. If $m \to \infty$, then $D_u$ is asymptotically normal after suitable centering and scaling; we refer to this statement as (4).
Proof. Starting from (2), the normalized distance can be written as a sum of two terms. According to formula (2), the term that involves only $L_u$ converges in distribution as $m \to \infty$. Conditional on $L_u$, the term that involves the number of blocks converges in distribution by (3). Since the symbols of the sequence are independent of its length, Slutsky's lemma yields conclusion (4).
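The block-counting rule and its link to distances can be illustrated as follows. The sketch counts the occurrences of the event (both symbols collected, scanning from back to front and restarting after each occurrence) and compares the count plus one with the graph distance to the nearest initial node, computed by breadth-first search on a simulated subnetwork. The function names, the labels `"A"` and `"B"` for the initial nodes, and the reading "distance minus 1" are assumptions of this example.

```python
import random
from collections import deque

def count_blocks(seq):
    """Occurrences of the event when scanning `seq` from back to front."""
    blocks, seen = 0, set()
    for s in reversed(seq):
        seen.add(s)
        if len(seen) == 2:        # both 'l' and 'r' collected: one block is complete
            blocks += 1
            seen.clear()          # restart on the remaining symbols
    return blocks

def subnetwork_distances(steps, seed=None):
    """Grow one subnetwork; return {sequence: distance to the nearest initial node}."""
    rng = random.Random(seed)
    order = ["A", "B"]                        # the two initial nodes (hypothetical labels)
    pending = [""]                            # sequences of the potential nodes
    adj = {"A": {"B"}, "B": {"A"}}
    for _ in range(steps):
        k = rng.randrange(len(pending))
        seq = pending[k]
        left, right = order[k], order[k + 1]
        adj[seq] = {left, right}              # new node joins both interval endpoints
        adj[left].add(seq)
        adj[right].add(seq)
        order.insert(k + 1, seq)
        pending[k:k + 1] = [seq + "l", seq + "r"]
    dist = {"A": 0, "B": 0}                   # multi-source BFS from both initial nodes
    queue = deque(["A", "B"])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return {s: d for s, d in dist.items() if s not in ("A", "B")}

if __name__ == "__main__":
    dists = subnetwork_distances(2000, seed=7)
    print(all(d == count_blocks(s) + 1 for s, d in dists.items()))
```

For long sequences of independent, equally likely symbols (an assumption made here by symmetry), the same `count_blocks` function can be used to observe how the number of blocks grows with the sequence length.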

Common ancestor
According to fact (c) in Section 2.2, the shortest path between any two potential nodes must pass through some of their common ancestors. To find the distance between any two potential nodes, we need the relationship between the distance of this pair of nodes and their latest common ancestor. First, we want to divide the shortest path into two parts.
Claim 2. For any pair of nodes $u$ and $v$, the distance between $u$ and $v$ differs by at most one from the length of the shortest path between them that passes through their latest common ancestor $u \wedge v$.
Proof. For convenience, we use the sequence of a node to represent the node in the following.
If $u$ is an ancestor of $v$ or vice versa, the conclusion is immediate. If not, based on the facts in Section 2.2, we can assume that the shortest path between them leads from $u$ to a node $w$ and then from $w$ to $v$, where $w$ is either $u \wedge v$ or one of its ancestors. On the part of the path from $u$ to $w$, each node is the parent of the previous node; on the part from $w$ to $v$, each node is the parent of the next node. If $w$ is $u \wedge v$, the conclusion is obvious. If $w$ is one of the ancestors of $u \wedge v$, consider the two nodes on the path that are offspring of $u \wedge v$ and closest to $w$; their sequences extend that of $u \wedge v$ by runs of identical symbols. By symmetry, we can fix which of the two is a left offspring and which is a right offspring.
Since a left offspring and a right offspring of $u \wedge v$ cannot both have a shortcut to the same common ancestor, apart from one exceptional case, there must be another common ancestor on this path. Without loss of generality, we can write the path in the form labeled (7). On this path there is a shortcut from each of the two offspring of $u \wedge v$ to a common ancestor, and at most one shortcut between the two common ancestors. There is also a path through $u \wedge v$ whose parts outside these common ancestors are the same as in path (7). Since path (7) is the shortest path and the two common ancestors may coincide, the part of path (7) that is replaced has length at least 1. Therefore, if we look for the shortest path through $u \wedge v$, the length of this path differs from the length of path (7) by at most one.
Next, we focus on the length of the sequence difference between two nodes and their latest common ancestor. Consider any pair of nodes $u$ and $v$ picked u.a.r. in GAN$_1(n)$ and the individuals in the CTBP corresponding to them. According to Section 2 in Ref. [24], the length $L_u$ can be written as a sum of indicators $I_i$ (Eq. (8)), where the $I_i$ are conditionally independent and equal 1 or 0 depending on whether or not the individual in the ancestral line of $u$ is newborn at the $i$-th split. Since each split generates 2 active individuals and the total number of active individuals after the $i$-th split is $i + 1$, the indicators are independent with $P(I_i = 1) = 2/(i+1)$. In the same way, we define indicators $J_i$ for $v$. Note that the event $\{I_i = J_i = 1\}$ means that the ancestral lines of the two nodes merge at the $i$-th split. The joint conditional distribution of $I_i$ and $J_i$ is given by Eqs. (9) and (10). By Eqs. (8)-(10), the split at which the two ancestral lines merge has a limiting distribution. This also means that $L_{u \wedge v}$ has a limiting distribution, independent of $n$. Considering the length of the sequence difference between a potential node and its ancestor, we can obtain the following result.
Using the Lindeberg central limit theorem and Eqs. (8)-(10) for linear combinations of the two variables in (11) with two arbitrary constants as coefficients, the two variables in (11) converge jointly in distribution to a two-dimensional standard normal variable.
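The indicator decomposition above gives a direct way to compute the expected generation of a uniformly chosen active individual. The sketch below assumes, as the counting argument suggests, that the indicator for the $i$-th split equals 1 with probability $2/(i+1)$; the function names are hypothetical and the snippet is only a numerical illustration, not part of the proof.

```python
import math
from fractions import Fraction

def expected_generation_exact(n):
    """Exact expected generation after n splits: sum of 2 / (i + 1), i = 1..n."""
    return sum(Fraction(2, i + 1) for i in range(1, n + 1))

def expected_generation(n):
    """Floating-point version of the same sum, suitable for large n."""
    return sum(2.0 / (i + 1) for i in range(1, n + 1))

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        # The sum grows like 2 * log(n), up to a bounded correction.
        print(n, expected_generation(n), 2 * math.log(n))
```

As a sanity check on the assumed probabilities, after one split both active individuals have generation 1, and after two splits the three active individuals have generations 1, 2, and 2, whose mean is 5/3; both values agree with the exact sum.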

Typical distance and flooding time
With the preparation of the previous section, we now consider the typical distance and flooding time in the GAN model. We first state our main results.
Theorem 2. (Typical distance) Consider the typical distance of GAN($n$). As $n \to \infty$, it is asymptotically normal after suitable centering and scaling; we refer to this statement as (12).
To prove the theorems on the typical distance and the flooding time, we need several lemmas. In fact, we can define the radius of a subnetwork as the largest distance from one of its initial nodes, since the longest shortest path of the subnetwork must pass through one of the initial nodes.

Lemma 2 [6]. The radius of the subnetwork GAN$_i(n)$ is w.h.p. asymptotic to $c \log n$, where the constant $c$ is defined in Theorem 1.
Lemma 3. Consider the distance between a node picked u.a.r. in GAN$_1(n)$ and either of the initial nodes. If the size of GAN$_1(n)$ tends to infinity, then this distance is asymptotically normal after suitable centering and scaling; we refer to this statement as (13).
Proof. For any existing node picked u.a.r. in GAN$_1(n)$, there is a potential node at distance 1 from it. By Lemma 1, conclusion (13) follows.
Proof. Pick a pair of potential nodes $u$ and $v$ u.a.r. from GAN$_1(n)$, and let $u \wedge v$ be their latest common ancestor. In the following, we directly represent nodes by their sequences for convenience. Define the two distinct postfixes of the sequences of $u$ and $v$ after $u \wedge v$. By Lemma 1 and Claim 2 (the constant 1 can be ignored), the length of the shortest path between $u$ and $v$ is determined, up to a bounded error, by the distances from $u$ and $v$ to $u \wedge v$, which depend only on these two postfixes.

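Similarly to the proof of Lemma 1, and using the conclusion that $L_{u \wedge v}$ has a limiting distribution, we can compute the limit of the normalized lengths of the two postfixes. Since $L_{u \wedge v}$ has a limiting distribution independent of $n$ and the lengths of the two postfixes are asymptotically independent, we can obtain the following conclusion in the same way as in the proof of Lemma 1. By conditioning first on the lengths of the postfixes and using the fact that the symbols in the two postfixes are i.i.d., it can be shown that the normalized distances from $u$ and $v$ to $u \wedge v$ converge jointly in distribution to two independent copies of the same normal limit.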

By (14), the distance between two potential nodes picked u.a.r. in GAN$_1(n)$ has the following property.
As shown in Fig. 2, each node in GAN$_1(n)$ is connected to two potential nodes (except the initial nodes). For randomly selected nodes $u$ and $v$ in GAN$_1(n)$, suppose that the nearest potential nodes are $u'$ and $v'$, respectively. Obviously, node $u$ is one of the ancestors of potential node $u'$. Then the distance between $u$ and $v$ differs from the distance between $u'$ and $v'$ by a bounded amount. Therefore, the distance between any pair of nodes selected u.a.r. from GAN$_1(n)$ satisfies the asymptotic normality (15). The proof of this lemma is now complete.
By the symmetry of GAN($n$), this means that the distance between a randomly chosen pair of nodes in a randomly chosen initial interval has the above property. For case (B), the path between any two nodes that originate from two different initial intervals must pass through one of the initial nodes. Consider these two nodes to have originated from two different subnetworks, say GAN$_1(n)$ and GAN$_2(n)$. Because the nodes in GAN$_1(n)$ and GAN$_2(n)$ are independent, the distance between the two nodes has the same distribution as the sum of two independent copies of the distance considered in Lemma 3. By (13) and (1), as $n \to \infty$, this distance is again asymptotically normal. Therefore, the distance between two nodes selected u.a.r. from two randomly selected distinct initial intervals satisfies the above property.
After combining the conclusions of these two cases, it is not difficult to conclude that the distance between pairs of nodes in GAN($n$) satisfies (12).
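Proof of Theorem 3. The flooding time in GAN($n$) can be expressed as the largest distance from $u$ to the other nodes (Eq. (18)), where $u$ is a node picked u.a.r. in GAN($n$). With high probability, the node most distant from $u$ is in another initial interval. Therefore, the flooding time is governed by the distance from $u$ to an initial node together with the radius of another subnetwork (Eq. (19)). The radius of the subnetwork is given in Lemma 2. Therefore, by (18), (19) and Lemma 2, the claimed limit follows as $n \to \infty$.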
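The two quantities studied in this section can be sampled directly on a simulated network. The sketch below grows GAN($n$), picks a node $u$ u.a.r., and returns the distance from $u$ to a second, distinct node picked u.a.r. together with the largest distance from $u$ to any node. Sampling distinct nodes, the function names, and the adjacency-list layout are assumptions of this illustration, and no limiting constants are asserted.

```python
import random
from collections import deque

def grow_gan_adj(steps, rng):
    """Grow GAN(steps) and return its adjacency list (list of sets)."""
    ring = [0, 1, 2]
    adj = [{1, 2}, {0, 2}, {0, 1}]                 # GAN(0) is a triangle
    for _ in range(steps):
        new = len(adj)
        k = rng.randrange(len(ring))               # interval chosen u.a.r.
        left, right = ring[k], ring[(k + 1) % len(ring)]
        adj.append({left, right})
        adj[left].add(new)
        adj[right].add(new)
        ring.insert(k + 1, new)
    return adj

def bfs_distances(adj, source):
    """Graph distances from `source` to every node, by breadth-first search."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if dist[y] < 0:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def sample_distances(steps, seed=None):
    """Return one sample (typical distance, flooding time) for GAN(steps)."""
    rng = random.Random(seed)
    adj = grow_gan_adj(steps, rng)
    u, v = rng.sample(range(len(adj)), 2)          # two distinct nodes, u.a.r.
    dist = bfs_distances(adj, u)
    return dist[v], max(dist)                      # d(u, v) and eccentricity of u

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, sample_distances(n, seed=n))
```

Repeating the call over many seeds gives empirical distributions that can be compared with the asymptotic statements of the theorems.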

Fig. 1. Illustration of the growing GAN model with potential nodes for times $n = 0$, 1, and 2. Nodes in the network and potential nodes are drawn as points in two different styles, and red dashed lines are potential edges.

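Proposition 1. Suppose $u$ and $v$ are two potential nodes picked u.a.r. in GAN$_1(n)$ with size $m$. Then, as $m \to \infty$, the two normalized variables in (11) converge jointly in distribution to a two-dimensional standard normal variable.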

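Theorem 3. (Flooding time) Consider the flooding time of GAN($n$). As $n \to \infty$, the suitably normalized flooding time converges in probability to a constant that is expressed through the constant defined in Theorem 1.
Here the two distinct postfixes after $u \wedge v$ represent the difference between the two sequences.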

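Proof of Theorem 2. For case (A), as $n \to \infty$, since the quantity in question is the typical distance of the subnetwork GAN$_1(n)$, we can use fact (1) and Lemma 4 to obtain its asymptotic normality.
For the flooding time, without loss of generality, we consider the properties of the distance in GAN$_1(n)$. By Lemma 3, the mean and variance of the distance from a node picked u.a.r. to an initial node are known asymptotically. Hence, by Chebyshev's inequality, the normalized distance converges in probability to a constant, which is combined with the constant defined in Theorem 1.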