Random walk attachment graphs

We consider the random walk attachment graph introduced by Saramäki and Kaski and proposed as a mechanism to explain how behaviour similar to preferential attachment may arise when only local knowledge is available. We show that if the length of the random walk is fixed, then the resulting graphs can have properties significantly different from those of preferential attachment graphs. In particular, in the case where the random walks have length 1 and each new vertex attaches to a single existing vertex, the proportion of vertices of degree 1 tends to 1, in contrast to preferential attachment models. AMS 2010 Subject Classification: Primary 05C82. Key words and phrases: random graphs; preferential attachment; random walk.


Introduction
There is currently great interest in the preferential attachment model of network growth, usually called the Barabási-Albert [2,1] model, though it dates back at least to Yule [11] and was also discussed by Simon [10]. In the simplest version, an existing graph is extended at each stage by adding a single new vertex, which then attaches to a single pre-existing vertex; the latter is chosen from the vertices of the pre-existing graph with probability proportional to its degree. In the Barabási-Albert model the new vertex connects to m vertices, where m is a fixed parameter of the model, but here we only consider the case m = 1. One of the best known properties of the model is that it produces a power law degree distribution, as shown rigorously by Bollobás et al. [3].
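For comparison with the random walk rule introduced later, the m = 1 preferential attachment step can be sketched in Python as follows. This is a minimal illustration rather than code from the paper: the starting graph (a single edge) and the function names are our own assumptions. It uses the common trick of keeping a list that contains both endpoints of every edge, so that a uniform draw from that list picks a vertex with probability proportional to its degree.

```python
import random

def preferential_attachment_tree(n_steps, seed=None):
    """Grow a tree by m = 1 preferential attachment, starting from one edge.

    Returns the list of vertex degrees after n_steps new vertices are added.
    """
    rng = random.Random(seed)
    degree = [1, 1]          # assumed starting graph: the single edge 0-1
    # Both endpoints of every edge are stored here, so a uniform draw from
    # this list picks a vertex with probability proportional to its degree.
    endpoints = [0, 1]
    for _ in range(n_steps):
        target = rng.choice(endpoints)
        new = len(degree)
        degree.append(1)
        degree[target] += 1
        endpoints.extend((new, target))
    return degree

def leaf_proportion(degree):
    """Proportion of vertices of degree 1."""
    return sum(1 for d in degree if d == 1) / len(degree)

degrees = preferential_attachment_tree(20000, seed=1)
print(round(leaf_proportion(degrees), 3))
```

For large trees grown this way the leaf proportion should be close to 2/3, the d = 1 value of the well-known limiting degree distribution 4/(d(d+1)(d+2)) for m = 1; this is the benchmark against which the random walk model is compared below.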
One weakness of this model and its generalisations is that they implicitly require a calculation across all the existing vertices, or at least knowledge of the total degree (the sum of the vertex degrees) of the graph. This requirement undermines the potential for the model to explain emergent properties arising from local behaviour.
A possible solution was proposed by Saramäki and Kaski [9]. In their model the new vertex chooses a single vertex from the graph and then executes a random walk of ℓ steps starting from that vertex. Saramäki and Kaski [9] and Evans and Saramäki [6] claim that this reproduces the Barabási-Albert degree distribution, even when ℓ = 1. This is clearly the case if the random walk is run for long enough to have converged to its stationary distribution, since the stationary distribution of a simple random walk on a graph assigns each vertex a probability proportional to its degree. However, we will prove that in the particular case ℓ = 1 the degree sequence does not converge to a power law distribution, but rather to a degenerate limiting distribution in which almost every vertex has degree 1.

The Model
Let $G_0$ be an arbitrary (perhaps connected) graph, with $v_0$ vertices and $e_0$ edges. Form $G_{n+1}$ from $G_n$ by adding a single vertex. This vertex chooses a single vertex to connect to (this corresponds to m = 1 in the Barabási-Albert model) by picking a vertex uniformly at random in $G_n$, then, conditional on the vertex chosen, performing a simple random walk of length ℓ on $G_n$ starting from that vertex, and finally connecting to the vertex at which the walk ends. Most of the time we will assume that ℓ is deterministic, but we will also consider a particular case where ℓ is replaced by a random variable.
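A minimal Python sketch of this growth rule follows. It assumes that $G_0$ is given as an edge list on vertices $0, \dots, v_0 - 1$ in which every vertex has at least one neighbour (otherwise the walk would be undefined); the function name and interface are illustrative only. Note that the walk is run on $G_n$ before the new vertex is added.

```python
import random

def random_walk_attachment(initial_edges, n_steps, ell, seed=None):
    """Grow G_n from G_0 by random walk attachment and return adjacency lists.

    initial_edges: edge list of G_0 on vertices 0, ..., v_0 - 1, where every
    vertex has at least one neighbour (so the walk is always defined).
    ell: walk length (ell = 0 reduces to uniform attachment).
    """
    rng = random.Random(seed)
    v0 = 1 + max(max(u, v) for u, v in initial_edges)
    adj = [[] for _ in range(v0)]
    for u, v in initial_edges:
        adj[u].append(v)
        adj[v].append(u)
    for _ in range(n_steps):
        w = rng.randrange(len(adj))      # V_n: uniform over vertices of G_n
        for _ in range(ell):             # simple random walk on G_n ...
            w = rng.choice(adj[w])       # ... ending at W_n
        new = len(adj)                   # the new vertex joins W_n
        adj.append([w])
        adj[w].append(new)
    return adj

adj = random_walk_attachment([(0, 1), (1, 2), (2, 3)], 1000, ell=1, seed=0)
leaves = sum(1 for nbrs in adj if len(nbrs) == 1)
print(len(adj), leaves / len(adj))
```

After $n$ steps the result has $n + v_0$ vertices and $n + e_0$ edges, matching the counts used in the next section.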

Number of leaves
We first consider the number of leaves in the graph. Let $p^{(n)}_d$ be the proportion of vertices in $G_n$ with degree $d$, and let $L_n = p^{(n)}_1$, i.e. the proportion of leaves. The number of edges in $G_n$ is $n + e_0$, so the total degree is $2(n + e_0)$, and the number of vertices is $n + v_0$. Let $V_n$ be the vertex initially chosen at random at step $n$, and let $W_n$ be the vertex selected by the random walk, so that the new vertex connects to $W_n$. We now prove the main result, which applies to the case ℓ = 1.

Theorem 1.
If ℓ = 1, then $L_n \to 1$ almost surely as $n \to \infty$.
Proof. We assume that $G_0$ is not a star. If $G_0$ is a star, then it is clear that, with probability 1, $G_n$ will eventually not be a star, so we can just wait until this happens and relabel the first non-star graph as $G_0$. If $G_n$ is not a star, each vertex has at least one neighbour which is not a leaf; in particular, no leaf has a leaf as its neighbour. The new vertex is always a leaf, and $W_n$ ceases to be a leaf only if it was one, so the number of leaves increases by 1 if $W_n$ is not a leaf and is otherwise unchanged. If $V_n$ is a leaf, which has probability $L_n$, then $W_n$ is its unique neighbour, which is not a leaf, so in this case the number of leaves increases by 1. Hence, considering the conditional expectation of the number of leaves in $G_{n+1}$,
$$(n+v_0+1)\,E(L_{n+1} \mid G_n) = (n+v_0)L_n + P(W_n \text{ is not a leaf} \mid G_n) \ge (n+v_0)L_n + L_n, \tag{1}$$
and so $E(L_{n+1} \mid G_n) \ge L_n$. Thus $(L_n)_{n\in\mathbb{N}}$ is a submartingale taking values in $[0,1]$, and so it converges almost surely and in $L^2$ to a limit, which we call $L_\infty$.
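The identity in (1) can be checked exactly on small graphs. The sketch below uses a hypothetical helper name and exact rational arithmetic from Python's `fractions` module; it computes $E(L_{n+1} \mid G_n)$ for ℓ = 1 by averaging, over each possible $V_n$, the fraction of its neighbours that are not leaves.

```python
from fractions import Fraction

def expected_leaf_proportion_next(adj):
    """Exact E(L_{n+1} | G_n) for walk length 1.

    adj: adjacency lists of G_n, with no isolated vertices.
    The new vertex is always a leaf, and W_n stops being a leaf only if it was
    one, so the leaf count rises by 1 exactly when W_n is not a leaf.
    """
    n_vertices = len(adj)                       # n + v_0
    is_leaf = [len(nbrs) == 1 for nbrs in adj]
    leaves = sum(is_leaf)                       # L_n (n + v_0)
    # P(W_n is not a leaf | G_n): average over V_n of the fraction of its
    # neighbours that are not leaves.
    p_nonleaf = sum(Fraction(sum(not is_leaf[w] for w in nbrs), len(nbrs))
                    for nbrs in adj) / n_vertices
    return (leaves + p_nonleaf) / (n_vertices + 1)

path4 = [[1], [0, 2], [1, 3], [2]]          # L_n = 1/2
print(expected_leaf_proportion_next(path4))  # 11/20
```

On a star the function returns exactly $L_n$, because the centre has no non-leaf neighbour; this is one way of seeing why the proof works with graphs that are not stars.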
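To show that $L_\infty = 1$ almost surely, note that, conditional on $V_n$ having degree $d$, the probability that $W_n$ is not a leaf is at least $1/d$, since $V_n$ has at least one neighbour which is not a leaf. Writing $d_v$ for the degree of $v$ in $G_n$, we can therefore sharpen (1) to
$$(n+v_0+1)\,E(L_{n+1} \mid G_n) \ge (n+v_0)L_n + L_n + \frac{1}{n+v_0}\sum_{v:\,d_v \ge 2}\frac{1}{d_v}. \tag{2}$$
The total degree of non-leaves in $G_n$ is $2(n+e_0) - L_n(n+v_0) = (2-L_n)(n+v_0) + 2(e_0-v_0)$, and the number of non-leaves is $(1-L_n)(n+v_0)$, so the average degree of non-leaves is
$$\frac{2-L_n}{1-L_n} + \frac{2(e_0-v_0)}{(n+v_0)(1-L_n)}.$$
Hence, by Markov's inequality, at least half the non-leaves have degree at most twice this average, and each of them contributes at least the reciprocal of twice the average to the sum in (2). Therefore
$$\frac{1}{n+v_0}\sum_{v:\,d_v \ge 2}\frac{1}{d_v} \ge \frac{(1-L_n)^2}{4(2-L_n) + 8(e_0-v_0)/(n+v_0)}, \tag{3}$$
and so, taking expectations in (2),
$$E(L_{n+1}) - E(L_n) \ge \frac{1}{n+v_0+1}\, E\!\left(\frac{(1-L_n)^2}{4(2-L_n) + 8(e_0-v_0)/(n+v_0)}\right). \tag{4}$$
If $E(L_\infty) = \lim_{n\to\infty} E(L_n) < 1$, then for some fixed $c < 1$ we must have $L_n \le c$ with probability bounded away from zero for all large $n$. The expectation on the right of (4) is then bounded away from zero for large $n$, and since $\sum_n 1/(n+v_0+1)$ diverges, this would force $E(L_n) \to \infty$, contradicting $L_n \le 1$. Hence $E(L_\infty) = 1$, and thus $L_\infty = 1$ almost surely.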
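The Markov-inequality step can also be checked numerically. The sketch below is our own illustration (the starting path, the seed and the function names are arbitrary assumptions): it grows a graph with ℓ = 1 and, at every step, compares the sum of $1/d_v$ over non-leaves with $m^2/(4D)$, where $m = (1-L_n)(n+v_0)$ is the number of non-leaves and $D = (2-L_n)(n+v_0) + 2(e_0-v_0)$ is their total degree. Dividing $m^2/(4D)$ by $n+v_0$ gives the lower bound used in the proof.

```python
import random

def nonleaf_bound(adj):
    """Return (lhs, rhs): lhs = sum of 1/d_v over non-leaves, rhs = m**2 / (4 D).

    m is the number of non-leaves and D their total degree. At least m/2
    non-leaves have degree <= 2D/m (Markov's inequality), and each of those
    contributes at least m/(2D), which gives rhs.
    """
    degrees = [len(nbrs) for nbrs in adj]
    nonleaf = [d for d in degrees if d >= 2]
    m, D = len(nonleaf), sum(nonleaf)
    lhs = sum(1.0 / d for d in nonleaf)
    rhs = m * m / (4.0 * D) if m else 0.0
    return lhs, rhs

def grow_and_check(n_steps, seed=0):
    """Grow from the path 0-1-2-3 with walk length 1; report whether the
    bound held at every step."""
    rng = random.Random(seed)
    adj = [[1], [0, 2], [1, 3], [2]]
    always_holds = True
    for _ in range(n_steps):
        lhs, rhs = nonleaf_bound(adj)
        always_holds = always_holds and lhs >= rhs
        v = rng.randrange(len(adj))   # V_n, chosen uniformly
        w = rng.choice(adj[v])        # W_n, after one random walk step
        adj.append([w])
        adj[w].append(len(adj) - 1)
    return adj, always_holds

adj, ok = grow_and_check(500, seed=3)
print(ok)
```

By the inequality between arithmetic and harmonic means the sum is in fact at least $m^2/D$, so the factor 4 given up to Markov's inequality costs nothing essential in the argument.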
It should be noted that the argument for Theorem 1 depends on the walk length being fixed at 1. For example, let $(X_n)_{n\in\mathbb{N}}$ be independent and identically distributed random variables with $P(X_n = 0) = p$ and $P(X_n = 1) = 1 - p$, and let the length of the walk from $V_n$ to $W_n$ be $X_n$, rather than a fixed ℓ as previously.
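Then, by the same argument as before,
$$(n+v_0+1)\,E(L_{n+1} \mid G_n, X_n = 1) \ge (n+v_0)L_n + L_n.$$
As there can be at most one more leaf in $G_{n+1}$ than in $G_n$, we also have
$$(n+v_0+1)\,E(L_{n+1} \mid G_n, X_n = 1) \le (n+v_0)L_n + 1.$$
Also, if there are no random walk steps from the initially chosen vertex, the probability that the new vertex connects to a leaf is simply $L_n$, so
$$(n+v_0+1)\,E(L_{n+1} \mid G_n, X_n = 0) = (n+v_0)L_n + 1 - L_n.$$
So, if $X_n = 0$ with probability $p$ and $X_n = 1$ with probability $1 - p$ for all $n$, independently of each other and of $G_n$, combining these three relations gives
$$E(L_n - L_{n+1} \mid G_n) \le \frac{p(2L_n - 1)}{n+v_0+1}. \tag{5}$$
Similarly,
$$E(L_{n+1} - L_n \mid G_n) \le \frac{1 - (1+p)L_n}{n+v_0+1}. \tag{6}$$
The right hand side of (5) is negative if $L_n < \frac{1}{2}$, and the right hand side of (6) is negative if $L_n > \frac{1}{1+p}$. Thus the expected change in $L_n$ is positive whenever $L_n < \frac{1}{2}$ and negative whenever $L_n > \frac{1}{1+p}$, so the argument of Theorem 1 no longer forces $L_n$ towards 1.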
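For a concrete check of (5) and (6), the following sketch computes the exact one-step drift $E(L_{n+1} \mid G_n) - L_n$ in the mixed model with $P(X_n = 0) = p$, using rational arithmetic. The function names and the example graphs are hypothetical choices for illustration.

```python
from fractions import Fraction

def mixed_length_drift(adj, p):
    """Exact E(L_{n+1} | G_n) - L_n when the walk length is 0 with
    probability p and 1 with probability 1 - p.

    adj: adjacency lists of G_n (no isolated vertices); p: a Fraction.
    """
    N = len(adj)                                   # n + v_0
    is_leaf = [len(nbrs) == 1 for nbrs in adj]
    L = Fraction(sum(is_leaf), N)
    # Length 0: the new vertex joins V_n, which is a leaf with probability L_n.
    gain_len0 = 1 - L
    # Length 1: it joins a uniform neighbour of V_n.
    gain_len1 = sum(Fraction(sum(not is_leaf[w] for w in nbrs), len(nbrs))
                    for nbrs in adj) / N
    expected_leaves = L * N + p * gain_len0 + (1 - p) * gain_len1
    return expected_leaves / (N + 1) - L

def drift_bounds(L, N, p):
    """Right-hand sides of (5) and (6) for N = n + v_0."""
    bound5 = p * (2 * L - 1) / (N + 1)          # bounds E(L_n - L_{n+1} | G_n)
    bound6 = (1 - (1 + p) * L) / (N + 1)        # bounds E(L_{n+1} - L_n | G_n)
    return bound5, bound6

half = Fraction(1, 2)
star = [[1, 2, 3, 4, 5], [0], [0], [0], [0], [0]]   # L_n = 5/6 > 1/(1 + p)
print(mixed_length_drift(star, half))               # -1/21
```

In the star example $L_n$ exceeds $1/(1+p) = 2/3$, and the drift is indeed negative, as (6) predicts.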

G 0 Bipartite
We now consider a special case which demonstrates that, for every odd ℓ, the random walk model of [9] differs fundamentally from the Barabási-Albert model.
Assume that $G_0$ is a bipartite graph, with the two parts coloured red and blue. Then, in both models, the graph $G_n$ is bipartite for all $n$, since each new vertex is a leaf and can be given the colour opposite to that of its unique neighbour; so the parts can be coloured red and blue consistently for each $n$. Let the proportion of red vertices in $G_n$ be $R_n$. We begin with the random walk model.

Theorem 2.
There is a random variable $R_\infty$ such that $R_n \to R_\infty$ almost surely. If ℓ is even, then $R_\infty = \frac{1}{2}$ almost surely, while if ℓ is odd, $R_\infty$ has a Beta distribution.
Proof. Conditional on $G_n$, $V_n$ is red with probability $R_n$. If ℓ is odd, $W_n$ is of the opposite colour to $V_n$, so the new vertex (which connects to $W_n$) is of the same colour as $V_n$; thus, conditional on $G_n$, it is red with probability $R_n$ and blue with probability $1 - R_n$. Hence in this case the colours of the vertices evolve exactly as the colours of the balls in a standard Pólya urn (where, when a ball is drawn, it is returned together with another ball of the same colour), and so by classical results on the Pólya urn (see, for example, Theorem 2.1 in [8]) $R_n$ converges almost surely to $R_\infty$, where $R_\infty$ has a Beta distribution whose parameters are the numbers of red and blue vertices in $G_0$.
If ℓ is even, then $W_n$ is of the same colour as $V_n$, and so the new vertex is of the opposite colour to $V_n$. Hence this case corresponds to a two-colour generalised Pólya urn in which a ball is drawn and a ball of the opposite colour is added, namely a Friedman urn with α = 0 and β = 1. In this case $R_n \to \frac{1}{2}$ almost surely; see, for example, Freedman [7] and Theorem 2.2 in [8].
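Theorem 2 can be illustrated by simulation. The sketch below uses hypothetical helper names and assumes $G_0$ is a single red-blue edge; it grows the graph with walk length ℓ and gives each new vertex the colour opposite to $W_n$. With this $G_0$ the Pólya limit for odd ℓ is Beta(1, 1), i.e. uniform, so repeated runs with odd ℓ should give widely spread values of $R_n$, while runs with even ℓ should cluster near $\frac{1}{2}$.

```python
import random

def grow_bipartite(n_steps, ell, seed=None):
    """Grow from a single red-blue edge with walk length ell.

    Returns (R_n, adjacency lists, colours), with colour 0 = red, 1 = blue.
    """
    rng = random.Random(seed)
    adj = [[1], [0]]
    colour = [0, 1]
    for _ in range(n_steps):
        w = rng.randrange(len(adj))       # V_n, uniform
        for _ in range(ell):              # random walk of length ell to W_n
            w = rng.choice(adj[w])
        new = len(adj)
        adj.append([w])
        adj[w].append(new)
        colour.append(1 - colour[w])      # opposite colour to its neighbour W_n
    return colour.count(0) / len(colour), adj, colour

def spread_of_runs(n_runs, n_steps, ell, seed=0):
    """Sample mean and standard deviation of R_n over independent runs."""
    rng = random.Random(seed)
    values = [grow_bipartite(n_steps, ell, rng.randrange(2**32))[0]
              for _ in range(n_runs)]
    mean = sum(values) / n_runs
    sd = (sum((x - mean) ** 2 for x in values) / n_runs) ** 0.5
    return mean, sd

print(spread_of_runs(300, 200, ell=1))   # odd ell: values spread out
print(spread_of_runs(300, 200, ell=2))   # even ell: concentrated near 1/2
```

Only the colour of $W_n$ matters for the colour process, which is why the urn description in the proof applies whatever the shape of $G_n$.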
We now turn to the Barabási-Albert model, started from the same bipartite $G_0$, for which $R_n \to \frac{1}{2}$ almost surely.
Proof. In this model the selection of a vertex can be associated with an urn model by considering half-edges: each edge is split into two halves, and each half-edge is given the colour of its associated vertex, so that every edge consists of a red half and a blue half. Selecting a vertex with probability proportional to its degree is then equivalent to selecting a half-edge uniformly at random and taking its associated vertex. Since every edge of $G_n$ consists of one red half and one blue half, the proportion of red half-edges is exactly $\frac{1}{2}$ for every $n$. A red vertex is added if and only if a blue vertex is selected, so at each step a red vertex is added with probability $\frac{1}{2}$, independently of the past, and by the strong law of large numbers the proportion of red vertices converges to $\frac{1}{2}$ almost surely.
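The half-edge argument translates directly into a simulation. In this sketch (hypothetical function name, with $G_0$ assumed to be a single red-blue edge) a list with one entry per half-edge, labelled by its vertex, plays the role of the urn, so a uniform draw from it selects a vertex with probability proportional to its degree.

```python
import random

def ba_bipartite(n_steps, seed=None):
    """m = 1 preferential attachment from one red-blue edge.

    Returns (R_n, proportion of red half-edges); colour 0 = red, 1 = blue.
    """
    rng = random.Random(seed)
    colour = [0, 1]
    # One entry per half-edge, labelled by its vertex: a uniform draw from
    # this list selects a vertex with probability proportional to its degree.
    half_edges = [0, 1]
    for _ in range(n_steps):
        target = rng.choice(half_edges)
        new = len(colour)
        colour.append(1 - colour[target])   # red added iff a blue vertex is chosen
        half_edges.extend((new, target))
    red_half = sum(1 for v in half_edges if colour[v] == 0) / len(half_edges)
    return colour.count(0) / len(colour), red_half

print(ba_bipartite(20000, seed=2))
```

The second returned value is exactly 0.5 at every stage, while the first fluctuates around $\frac{1}{2}$ and settles there as $n$ grows.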
So in this respect the behaviour of the random walk model differs from that of the Barabási-Albert model whenever ℓ is odd, however large ℓ is.

Discussion
We have demonstrated that the model of Saramäki and Kaski is fundamentally different from that of Barabási and Albert, unless the random walk is allowed to run long enough to approach its stationary distribution. It does have the advantage of not requiring a global calculation, retaining the local character that is desirable in models of emergent behaviour. An alternative approach might be to let the addition of edges be driven by the vertices in $G_n$, rather than by the new vertex. Thus each vertex in $G_n$ could link to a new vertex as it arises with probability proportional to its degree, independently of all other vertices, as in the variant of preferential attachment studied by Dereich and Mörters [4,5]. This, of course, abandons one of the usual assumptions of the preferential attachment model, namely that the number of new links is some fixed value m, though we could substitute the condition that the average number of links added is fixed.
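The vertex-driven rule described above can be sketched as follows. This follows the description in the text rather than the exact formulation of Dereich and Mörters [4,5]: each vertex $v$ of $G_n$ links to the new vertex independently with probability $\min(1, c\,d_v/\sum_u d_u)$, where the constant $c$ is our own hypothetical parameter and plays the role of the fixed average number of links. The starting graph (a single edge) is also an assumption.

```python
import random

def vertex_driven_growth(n_steps, c=1.0, seed=None):
    """Each vertex v of G_n links to the new vertex independently with
    probability min(1, c * d_v / total degree), so about c links per step.

    c is a hypothetical tuning constant. Starts from a single edge; a new
    vertex that receives no links keeps degree 0 and never links later.
    Returns (degrees, number of links added at each step).
    """
    rng = random.Random(seed)
    degree = [1, 1]
    links_added = []
    for _ in range(n_steps):
        total = sum(degree)                       # total degree of G_n
        new_links = 0
        for v in range(len(degree)):              # decisions made by old vertices
            if rng.random() < min(1.0, c * degree[v] / total):
                degree[v] += 1
                new_links += 1
        degree.append(new_links)
        links_added.append(new_links)
    return degree, links_added

degree, links = vertex_driven_growth(2000, c=1.0, seed=4)
print(sum(links) / len(links))   # average number of links per new vertex
```

Since no vertex of a simple graph has more than half of the total degree, the cap at 1 never binds when $c \le 2$, and the expected number of links per step is then exactly $c$.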
The urn model approach is of particular interest since much is known about urn models (see, for example, the survey by Pemantle [8]). We might generalise the model to directed graphs with k colours $c_i$, $i = 0, \dots, k-1$, in which directed edges only join a vertex of colour $c_i$ to one of colour $c_{(i+1) \bmod k}$. When a new vertex is added it links to a vertex chosen at random and then takes ℓ random steps along directed edges, which determines its colour. The case $\ell \equiv 0 \pmod{k}$ will have the proportion of each colour converging to $1/k$, whereas for $\ell \not\equiv 0 \pmod{k}$ there will be a Dirichlet limiting distribution with parameters depending on $G_0$.