Vertices of high degree in the preferential attachment tree

We study the basic preferential attachment process, which generates a sequence of random trees, each obtained from the previous one by introducing a new vertex and joining it to one existing vertex, chosen with probability proportional to its degree. We investigate the number $D_t(\ell)$ of vertices of each degree $\ell$ at each time $t$, focussing particularly on the case where $\ell$ is a growing function of $t$. We show that $D_t(\ell)$ is concentrated around its mean, which is approximately $4t/\ell^3$, for all $\ell \le (t/\log t)^{1/3}$; this is best possible up to a logarithmic factor.


Introduction
In this paper, we study the basic preferential attachment process, which is defined as follows. We start with a (small) tree on $\tau_0 \ge 1$ vertices. At each integer time $t > \tau_0$, a new vertex arrives and is joined to one existing vertex, chosen as the other endpoint of the new edge with probability proportional to its current degree. Thus, at each time $t$, we have a tree with $t$ vertices. The random tree obtained at any time $t$ is called the preferential attachment tree.
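The process is easy to simulate, which can be useful for sanity-checking the formulae below: keeping a list in which each vertex appears once per incident edge makes a uniform draw from that list a degree-proportional choice. The following sketch, with names of our own choosing, is an illustration rather than part of the analysis.

```python
import random

def preferential_attachment_tree(t_max, seed=None):
    """Simulate the preferential attachment tree until it has t_max vertices.

    We start from a single edge {0, 1}.  Each vertex appears in `endpoints`
    once per incident edge, so a uniform draw from `endpoints` chooses a
    vertex with probability proportional to its degree.
    """
    rng = random.Random(seed)
    endpoints = [0, 1]                # endpoint list of the initial edge
    degree = [1, 1]
    for v in range(2, t_max):
        w = rng.choice(endpoints)     # degree-biased choice of neighbour
        endpoints.extend((v, w))      # record the new edge {v, w}
        degree.append(1)
        degree[w] += 1
    return degree
```

Since the tree at time $t$ has $t$ vertices and $t-1$ edges, the returned degrees always sum to $2(t-1)$, which gives a quick sanity check on the implementation.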
The first appearance of this process can be traced back at least to Yule [25] in 1925, and in probability theory the model is sometimes referred to as a Yule process. Subsequently, Szymański [22] studied the preferential attachment process in the guise of plane-oriented recursive trees. He gave a formula for the expected number $d_t(\ell)$ of vertices of degree $\ell$ at time $t$, namely $d_t(\ell) = \frac{4t}{\ell(\ell+1)(\ell+2)} + O(1)$.
The structure of such trees was further analysed by Mahmoud, Smythe and Szymański [17], and by Mahmoud and Smythe [16]. Lu and Feng [12] proved a concentration result for the random number $D_t(\ell)$ of vertices of degree $\ell$, for fixed $\ell$. Interest in the model surged after a paper of Barabási and Albert [2] in 1999, who proposed preferential attachment as a model of the growth of "web graphs", i.e., graphs possessing many of the same properties as "real-world networks" such as the worldwide web. Barabási and Albert studied not just the preferential attachment process as defined above, but also the variant where each new vertex chooses some fixed number $m \ge 1$ of neighbours. For $m > 1$, the preferential attachment graphs produced are of course not trees, but for properties such as the degree sequence, the overall pattern of behaviour is the same for any fixed $m$.
Preferential attachment graphs were studied formally by Bollobás, Riordan, Spencer and Tusnády [4], who proved that the degree sequence follows a power law with exponent 3, i.e., the expected number $d^m_t(\ell)$ of vertices of degree $\ell$ at time $t$ is of order $t/\ell^3$, more precisely $\frac{2m(m+1)}{(\ell+m)(\ell+m+1)(\ell+m+2)}\, t$, for all $\ell \le t^{1/15}$. They also showed that the random number $D^m_t(\ell)$ of vertices of degree $\ell$ at time $t$ is concentrated within $O(\sqrt{t}\log t)$ of its expectation $d^m_t(\ell)$. They further indicated how the results could be extended to somewhat larger values of $\ell$.
A much more general model was introduced and studied by Cooper and Frieze [6], and Cooper [5]: in the latter paper, Cooper proved a general result that implies (weak) concentration for $D_t(\ell)$ whenever $\ell \le t^{1/6}/\log^2 t$.
The maximum degree $\Delta_t$ of the preferential attachment tree is known to behave as $t^{1/2}$ as $t \to \infty$. Móri [19] proved a law of large numbers and a central limit theorem for $\Delta_t$: in particular, he showed that $\Delta_t t^{-1/2}$ converges almost surely to some positive (non-constant) random variable as $t \to \infty$. Further, he showed that the fluctuations of $\Delta_t t^{-1/2}$ around the limit, scaled by $t^{-1/4}$, converge in distribution to a normal law.
Many variants of the preferential attachment process have been studied. The limiting proportion of vertices of each fixed degree $\ell$ has been investigated in many different models extending and generalising that of preferential attachment trees. See for instance Rudas, Tóth and Valkó [21], Athreya, Ghosh and Sethuraman [1], Deijfen, van den Esker, van der Hofstad and Hooghiemstra [8], and Dereich and Mörters [9]. See also the survey of Bollobás and Riordan [3] for a number of other results on related models.
Our principal aim in this paper is to prove concentration of measure results for $D_t(\ell)$ for all values of $\ell$ up to the expected maximum degree. For values of $\ell$ above $\varepsilon t^{1/3}$, the expectation of $D_t(\ell)$ is of order at most 1, and all we show is that $D_t(\ell)$ is, with high probability, at most about $\log t$. For values of $\ell$ at most $(t/\log t)^{1/3}$, we shall prove that $D_t(\ell)$ is concentrated within about $\sqrt{t\log t/\ell^3}$ of its mean.
We can write $D_t(\ell) = \sum_{s=1}^t I(s,t,\ell)$, where $I(s,t,\ell)$ is the indicator of the event that the vertex arriving at time $s$ (or, for $s \le \tau_0$, the initial vertex labelled $s$) has degree exactly $\ell$ at time $t$. One would expect that, for large $t$ and $\ell$ in an appropriate range, most of the variables $I(s,t,\ell)$ are approximately independent of each other, each with mean bounded away from 1. This would suggest that the variance of $D_t(\ell)$ is of the same order as its mean, and that $D_t(\ell)$ should be concentrated within about $\sqrt{\mathbb{E}\, D_t(\ell)} \simeq \sqrt{t/\ell^3}$ of its mean. This is indeed the case for constant $\ell$: see Janson [11] for asymptotic formulae for the covariances $\mathrm{Cov}(D_t(\ell)/t, D_t(j)/t)$. So our concentration result is likely to be best possible up to logarithmic factors.
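This heuristic can be probed by a seeded Monte Carlo experiment; the code below is our own, with parameters chosen purely for illustration. It estimates the mean and variance of $D_t(2)$ and compares the mean with $4t/(2\cdot 3\cdot 4) = t/6$.

```python
import random

def count_degree(t_max, ell, rng):
    """One run of the preferential attachment tree; return D_t(ell)."""
    endpoints = [0, 1]
    degree = [1, 1]
    for v in range(2, t_max):
        w = rng.choice(endpoints)
        endpoints.extend((v, w))
        degree.append(1)
        degree[w] += 1
    return sum(1 for d in degree if d == ell)

rng = random.Random(2024)
t, ell, runs = 2000, 2, 200
samples = [count_degree(t, ell, rng) for _ in range(runs)]
mean = sum(samples) / runs
var = sum((x - mean) ** 2 for x in samples) / (runs - 1)
# The sample mean should be close to t/6, and the sample variance of the
# same order as the mean, in line with the heuristic above.
```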
Our methods can also be used to prove similar results for the random variable $U_t(\ell)$, the number of vertices of degree at least $\ell$ at time $t$. The expectation of $U_t(\ell)$ is approximately $2t/\ell^2$ for large $t$: at the end of this paper, we indicate briefly how to adapt our proof to show that $U_t(\ell)$ is concentrated within about $\sqrt{t\log t/\ell^2}$ of its mean as long as $\ell \le t^{1/2}/\log^{13/2} t$.
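The approximation $\mathbb{E}\, U_t(\ell) \approx 2t/\ell^2$ can be recovered from the formula for $d_t(\ell)$ by a telescoping sum; the short derivation below is our own check rather than part of the paper's argument.

```latex
\mathbb{E}\, U_t(\ell) \approx \sum_{k \ge \ell} \frac{4t}{k(k+1)(k+2)}
  = 2t \sum_{k \ge \ell} \left( \frac{1}{k(k+1)} - \frac{1}{(k+1)(k+2)} \right)
  = \frac{2t}{\ell(\ell+1)} \approx \frac{2t}{\ell^2}.
```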
Before stating our results, we specify our model precisely. We start at some time $\tau_0 \ge 1$, with an initial graph $G(\tau_0) = (V(\tau_0), E(\tau_0))$, with $|V(\tau_0)| = \tau_0$ and $|E(\tau_0)| = \tau_0 - 1$; we think of $G(\tau_0)$ as a tree, although it need not be. At each step $t > \tau_0$, a new vertex $v_t$ is created and joined by one new edge to an existing vertex, chosen by preferential attachment: that is, a vertex $v$ is chosen as the other endpoint with probability proportional to its degree at time $t-1$. Note that, if $G(\tau_0)$ is a tree, then the graph at all later stages is also a tree.
The parameter $\psi$ is a constant which may be chosen arbitrarily large in order to make the probability of failure arbitrarily small; the results are only of interest when $t$ is larger than some $t_0(\psi)$. All our results are stated in terms of such a parameter (denoted $\psi$ or $\omega$).
We do not give a detailed proof of Theorem 1.2 in this paper, but we do give an indication of how to adapt our proof of Theorem 1.1 to give this result. The term $\log^7(\psi t)$ appearing above could certainly be improved with more work. Theorem 1.2 seems to be the first explicit result concerning concentration of measure for $U_t(\ell)$, although some weak concentration can be deduced from concentration results for $D_t(\ell)$. Moreover, Talagrand's inequality [24] can be applied readily: to demonstrate that $U_t(\ell) \ge x$, a certificate of length at most $O(x\ell)$ suffices (see for instance [18] for details of the method). This method gives concentration for $U_t(\ell)$ up to about $\ell = t^{1/3}$, and indeed concentration for $D_t(\ell)$ up to about $\ell = t^{1/5}$.
For constant values of $\ell$, Bollobás, Riordan, Spencer and Tusnády [4] showed that $D_t(\ell)$ is concentrated within about $t^{1/2}$ of its mean, which is best possible; a similar result for $U_t(\ell)$ follows. For larger values of $\ell$, in particular where $\ell$ grows as a small power of $t$, earlier methods (including the method based on Talagrand's inequality mentioned above) do not give the "optimal" concentration of $D_t(\ell)$ or $U_t(\ell)$ about their respective means. Our results above do give what should be optimal concentration, up to logarithmic factors, for $D_t(\ell)$ and $U_t(\ell)$, whenever the expectations of these random variables tend to infinity, again up to logarithmic factors.
In Section 2, we give an exposition of a method based on exponential supermartingales that is widely used in the analysis of continuous-time Markov processes. We transfer the method to the discrete-time setting, and state two theorems that we shall use, and that can be applied in other similar contexts.
In Section 3, we apply our method to describe the evolution of the degree of a fixed vertex in the preferential attachment model. We do this partly to illustrate the method, but mostly so that we can use the results in later sections. We prove a result on the maximum degree $\Delta_t$ that is weaker than Móri's [19], but simple to prove, in the interests of keeping the paper self-contained.
Sections 4 to 6 are devoted to the proof of Theorem 1.1. Section 4 contains the main thread of the proof, and we defer some calculations to Sections 5 and 6. One difficulty we face is that we cannot get sharp results by working directly with the natural martingale associated to the Markov chain $D = (D_t(\ell) : t \ge \tau_0, \ell \in \mathbb{N})$, so we work instead with a suitable transform of that martingale. Proving concentration of measure for the transform is not straightforward, so we introduce another Markov process derived from $D$, and apply our methods from Section 2 to that process. Section 7 contains a brief sketch of the proof of Theorem 1.2.
In this paper, we deal only with the preferential attachment tree. However, our methods will extend to more general settings, and indeed we believe we can prove results similar to those above for the general Cooper-Frieze model. We intend to address this elsewhere; in a very brief final section, we make a few remarks on the difficulties involved in extending our proof to other preferential attachment models.

Our method: exponential supermartingales
The following technique is adapted from a fairly standard method used in the analysis of continuous-time random processes; see for instance [7], [13] and [15]. We have not been able to find a suitable account in the literature of a discrete-time, and time-dependent, version of the method for us to quote, so we develop the theory here. We provide results that we hope may prove useful in other settings.
Let $X = (X_t : t \in \mathbb{Z}^+)$ be a discrete-time Markov chain, possibly time non-homogeneous, with countable state space $E$ and transition matrix $P_t = (P_t(x, x') : x, x' \in E)$ at time $t$. (Here and in what follows, our matrices, which will normally be infinite, have rows and columns indexed by the countable set $E$.) Let $(\mathcal F_t)$ be a filtration, and suppose that $(X_t)$ is adapted.
Let $I$ denote the identity matrix. Further, for a matrix $A$ and a function $f : E \to \mathbb{R}$, regarded as a column vector indexed by $E$, let us write $(Af)(x) = \sum_{x' \in E} A(x, x') f(x')$. Then we see that
$$\big((P_t - I)f\big)(x) = \sum_{x' \in E} P_t(x, x')\big(f(x') - f(x)\big) = \mathbb{E}\big[f(X_{t+1}) - f(X_t) \mid X_t = x\big] \qquad (2.1)$$
is the expected change in $f$ at the $t$-th step given that $X_t = x$.
Lemma 2.1. For any bounded function $f : E \to \mathbb{R}$,
$$M^f_t = f(X_t) - f(X_0) - \sum_{s=0}^{t-1} \big((P_s - I)f\big)(X_s)$$
is an $(\mathcal F_t)$-martingale.
Proof. The proof for the time homogeneous case can be found in Norris [20]. Checking that $M^f_t$ is a martingale in the time non-homogeneous case is just as easy: consider
$$\mathbb{E}\big[M^f_{t+1} - M^f_t \mid \mathcal F_t\big] = \mathbb{E}\big[f(X_{t+1}) - f(X_t) \mid \mathcal F_t\big] - \big((P_t - I)f\big)(X_t) = 0,$$
where we used (2.1).

Lemma 2.2. Let $f : E \to \mathbb{R}$ be a strictly positive function. Also, for each $t \ge 0$, set
$$Z^f_t = \frac{f(X_t)}{f(X_0)} \exp\left(-\sum_{s=0}^{t-1} \frac{\big((P_s - I)f\big)(X_s)}{f(X_s)}\right).$$
Then $Z^f_t$ is an $\mathcal F_t$-supermartingale, as long as $\mathbb{E}\, Z^f_t < \infty$ for all $t$.

Proof. Consider
$$\mathbb{E}\big[Z^f_{t+1} \mid \mathcal F_t\big] = Z^f_t \cdot \frac{(P_t f)(X_t)}{f(X_t)} \exp\left(-\frac{\big((P_t - I)f\big)(X_t)}{f(X_t)}\right) \le Z^f_t,$$
where we have used the fact that $\mathbb{E}[f(X_{t+1}) \mid \mathcal F_t] = (P_t f)(X_t)$, and the fact that $1 + u \le e^u$ for all real $u$.

Note that, for a continuous-time Markov chain, the analogue of $Z^f_t$ in Lemma 2.2 is in fact a martingale; see for example Lemma 3.2 in Chapter 4 of [10]. In the continuous-time case, the matrix $(P_t - I)$ is replaced by the generator matrix $A_t$ of the Markov chain, which is the derivative at time $t$ of its transition semigroup $P_t$.
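Both lemmas can be checked exactly on a toy example by enumerating all paths of a small chain. The two-state time-inhomogeneous chain, the function $f$, and all names below are our own; the formula used for $Z^f_t$ is the discrete-time exponential supermartingale $Z^f_t = \big(f(X_t)/f(X_0)\big)\exp\big(-\sum_{s<t}((P_s-I)f)(X_s)/f(X_s)\big)$, which is our reading of Lemma 2.2.

```python
import itertools
import math

# Toy time-inhomogeneous chain on states {0, 1}: P_t(x, x') below.
def P(t, x, x1):
    p = 0.3 + 0.1 * (t % 3)          # arbitrary time-dependent flip probability
    return p if x1 != x else 1 - p

f = {0: 1.0, 1: 3.0}                 # a strictly positive test function

def paths(T, x0=0):
    """All paths of length T+1 from x0, with their probabilities."""
    for tail in itertools.product((0, 1), repeat=T):
        path = (x0,) + tail
        pr = math.prod(P(t, path[t], path[t + 1]) for t in range(T))
        yield path, pr

def drift(t, x):
    """((P_t - I) f)(x): the expected one-step change of f from state x."""
    return sum(P(t, x, y) * (f[y] - f[x]) for y in (0, 1))

T = 4
# M^f_T = f(X_T) - f(X_0) - sum_s ((P_s - I)f)(X_s) has mean 0 (Lemma 2.1).
EM = sum(pr * (f[p[T]] - f[p[0]] - sum(drift(s, p[s]) for s in range(T)))
         for p, pr in paths(T))
# Z^f_T = (f(X_T)/f(X_0)) exp(-sum_s drift(s, X_s)/f(X_s)) has mean <= 1 (Lemma 2.2).
EZ = sum(pr * (f[p[T]] / f[p[0]]) *
         math.exp(-sum(drift(s, p[s]) / f[p[s]] for s in range(T)))
         for p, pr in paths(T))
```

Exact enumeration of the $2^T$ paths gives $\mathbb{E}\, M^f_T = 0$ and $\mathbb{E}\, Z^f_T \le 1$, as the lemmas assert.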
We shall show how, under certain conditions, Lemma 2.1 and Lemma 2.2 can be used to prove a law of large numbers for a Markov chain.

Lemma 2.3. Let $g : E \to \mathbb{R}$ be a function, and suppose that $X_0 = x_0$ a.s., for some $x_0 \in E$. For $\theta \in \mathbb{R}$, let
$$Z^g_t(\theta) = \exp\left(\theta\big(g(X_t) - g(x_0)\big) - \sum_{s=0}^{t-1} \sum_{x' \in E} P_s(X_s, x')\Big(e^{\theta(g(x') - g(X_s))} - 1\Big)\right).$$
Then $Z^g_t(\theta)$ is an $\mathcal F_t$-supermartingale, as long as $\mathbb{E}\, Z^g_t(\theta) < \infty$ for each $t$.
Proof. The result is a consequence of Lemma 2.2, with $f(x) = e^{\theta(g(x) - g(x_0))}$. That lemma tells us that $Z^f_t$ is a supermartingale, and we need only verify that $Z^f_t = Z^g_t(\theta)$ for this choice of $f$. The calculation goes as follows: since $f(X_0) = 1$,
$$\frac{\big((P_s - I)f\big)(X_s)}{f(X_s)} = \sum_{x' \in E} P_s(X_s, x')\left(\frac{f(x')}{f(X_s)} - 1\right) = \sum_{x' \in E} P_s(X_s, x')\Big(e^{\theta(g(x') - g(X_s))} - 1\Big),$$
and so $Z^f_t = Z^g_t(\theta)$.

Note that, while $X_t$ remains in a 'good' set $S_t$ of states $x$ where $e^{\theta(g(x') - g(x))}$ is bounded by some constant (possibly depending on $t$) over all $x \in S_t$ and all $x'$ such that $P_t(x, x') > 0$ (i.e., the size of changes in $g$ stays uniformly bounded), the finiteness assumption of Lemma 2.3 holds. Furthermore, we can approximate $e^{\theta(g(x') - g(X_t))}$ using a Taylor expansion.
In many applications, in particular those in this paper, $|g(x') - g(x)|$ will be uniformly bounded over the entire state space $E$ and over all transition matrices $P_t$; if we work up to some fixed time $\tau$, then it suffices to have the bound valid for $t < \tau$. We assume from now on that, for every $\tau \ge 0$, there is some real number $J = J(\tau)$ such that $g$ satisfies:
$$|g(x') - g(x)| \le J \quad \text{whenever } P_t(x, x') > 0 \text{ and } t < \tau.$$
Now we fix some real number $\alpha > 0$, and restrict attention to values of $\theta$ such that $|\theta| \le \alpha$. We use the fact that $|e^u - 1 - u| \le \frac{u^2}{2} e^{|u|}$ for all real $u$. Suppose that $X_0 = x_0$ a.s., for some $x_0 \in E$, and that we study the chain up to some time $\tau > 0$. Our aim is to show that $M^g_t$ remains small over the period $0 \le t \le \tau$. For a precise statement, we need a few more definitions.
By optional stopping, the supermartingale bound applies to the stopped process; hence, using the Markov inequality, we obtain a bound on the probability that $M^g_t$ is large. Optimising in $\theta$, we find that $\theta = \delta/e^{\alpha J} R$ is the best choice, and note that $|\theta| \le \alpha$. An almost identical calculation gives the corresponding bound for $-M^g_t$, and the first part of the result follows. The two special cases are obtained by choosing the given values of $\alpha$ and $\delta$, and verifying that $\delta \le e^{\alpha J} \alpha R$ in each case.

We summarise what we have proved in a theorem.

Theorem 2.5. Let $X = (X_t)_{t \in \mathbb{Z}^+}$ be a discrete-time Markov chain, with countable state space $E$ and transition matrix $P_t$ at time $t$, and suppose that $(X_t)$ is adapted to a filtration $(\mathcal F_t)$. Let $g : E \to \mathbb{R}$ be any function, $\tau$ any natural number, and $J$ any real number satisfying the boundedness condition above. Let $R > 0$ be a real number, and define the associated stopped process; then it is an $(\mathcal F_t)$-martingale and, for any $\omega > 0$, the corresponding deviation bound holds.

We have demanded that the state space be countable, so that we can express our results in terms of sums over the state space. It suffices to assume instead that, for any state $x \in E$ and any time $s$, there is a countable set of states $x'$ with $P_s(x, x') > 0$. We also remark that, in the statement above, we begin our consideration of the chain at time 0. When applying Theorem 2.5 in the analysis of the preferential attachment tree, we shall instead start at some fixed time $\tau_0$: of course this makes no substantive difference.
In some instances, for example in Section 3, we will want to show that, with high probability, $|M^g_t| \le \delta(t)$ for all $t \le \tau$, where $\delta(t)$ is a suitable function growing with $t$. One easy way to do this is to apply the above theorem for each value $t \le \tau$, choosing an appropriate value $R(t)$ of $R$ at each time. This approach has the drawback that it is necessary to sum the probabilities of failure over $t \le \tau$. Better bounds may be obtained by applying the theorem only for a sparse sequence of values $t$, as we illustrate in the proof of the following result.
The notation here is essentially as for Theorem 2.5. We again have a real-valued function $g$ defined on the state space $E$ of a Markov chain $X$, and the change in $g$ is uniformly bounded by $J$ over all possible transitions of the chain. The function $\Phi^g_t(X)$ is as in Theorem 2.5. Now we have a non-decreasing function $R : \mathbb{Z}^+ \to \mathbb{R}^+$, and we define the stopping time $T_R$ accordingly. Also, for any non-decreasing function $\delta : \mathbb{Z}^+ \to \mathbb{R}^+$, we define an associated stopping time $T^g(\delta) = \inf\{t \ge 0 : |M^g_t| > \delta(t)\}$. With the notation as above, we have the following result.
(a) Fix $\omega > 4$, and let $\delta(t) = \max\big(\omega J,\, 2\sqrt{\omega R(t-1)}\big)$ for $t \ge 1$. Then, for any $\tau > 0$ such that $R(\tau - 1) \ge \omega J^2$, with probability at least $1 - 2Ne^{-\omega/4}$ we have $|M^g_t| \le \delta(t)$ for all $t \le T_R \wedge \tau$, where $N \le 2 + \log_4\big(R(\tau-1)/\omega J^2\big)$.

Proof. For (a), we define a finite sequence of times $\tau_1, \tau_2, \ldots$ as follows. Let $\tau_1$ be the first $t$ for which $R(t) > \omega J^2$: by assumption, $\tau_1 \le \tau$. Given $\tau_j < \tau$, we let $\tau_{j+1}$ be the minimum of $\tau$ and the first $t$ with $R(t) > 4R(\tau_j)$. The final term $\tau_N$ in the sequence is the first $\tau_j$ with $\tau_j = \tau$, and the number $N$ of terms in the sequence is then no greater than $2 + \log_4\big(R(\tau-1)/\omega J^2\big)$.

We first apply Lemma 2.4(ii) with $\tau = \tau_1$ and $R = R(\tau_1 - 1)$, noting that $\omega \ge R(\tau_1 - 1)/J^2$ by definition of $\tau_1$. As $\delta(t) \ge \omega J$ for all $t \le \tau_1$, this means that, with probability at least $1 - 2e^{-\omega/4}$, we have $|M^g_t| \le \delta(t)$ for all $t \le T_R \wedge \tau_1$. We conclude that, for each $j \ge 2$, with probability at least $1 - 2e^{-\omega/4}$, we have $|M^g_t| \le \delta(t)$ for all $\tau_{j-1} < t \le T_R \wedge \tau_j$. It now follows that, with probability at least $1 - 2Ne^{-\omega/4}$, we have $|M^g_t| \le \delta(t)$ for all times $t \le T_R \wedge \tau$, and part (a) follows.
The proof of part (b) is very similar in style. This time we let $\tau_1$ be the first $t$ for which $R(t) > 2\psi J^2 \log(\psi J^2)$. Given $\tau_j$, we let $\tau_{j+1}$ be the minimum $t$ such that $R(t) > 4R(\tau_j)$. The assumption that $R(t)$ tends to infinity ensures that we obtain an infinite sequence $(\tau_j)_{j \ge 1}$ of times.
We apply Lemma 2.4(ii) with $\omega = 2\psi\log(\psi J^2)$, $\tau = \tau_1$, and $R = R(\tau_1 - 1)$, noting that $\omega \ge R(\tau_1 - 1)/J^2$ by choice of $\tau_1$. Since $\delta(t) \ge 2\psi J \log(\psi J^2)$ for all $t$, this implies that, with probability at least $1 - 2e^{-\omega/4}$, we have $|M^g_t| \le \delta(t)$ for all $t \le T_R \wedge \tau_1$, as required; here we used the facts that (i) $R(\tau_{j+1} - 1) \le 4R(\tau_j)$, by the definition of the $\tau_j$, and $R(\cdot)$ is increasing, and (ii) $\psi J^2 \ge 4$ and $x/\log x$ is an increasing function, with minimum value $e$, for $x \ge e$ (so in particular $2\log(\psi J^2) < \psi J^2$). We obtain a corresponding bound for each interval $(\tau_j, \tau_{j+1}]$, and the result follows on summing the failure probabilities over $j$.

Evolution of the degree of a vertex
For the remainder of the paper, we will use the results of the previous section to analyse various aspects of the preferential attachment process.
Our first, relatively simple, application is to the evolving degree of a single vertex; loosely, we prove that, if a vertex has degree $k$ at time $s$, its degree at a later time $t$ is unlikely to be far from $k\sqrt{t/s}$. Results of a similar flavour are in the literature already (see for instance Cooper [5], Athreya, Ghosh and Sethuraman [1] and Dereich and Mörters [9]); we give them here partly to illustrate our methods and partly because we shall have need of the results from this section later on.
We assume as always that our process starts at some time $\tau_0$, with an initial graph on $\tau_0$ vertices and $\tau_0 - 1$ edges. For a vertex $v$, we write $X_t(v)$ for its degree at time $t$. In what follows, we shall assume that $v \le \tau_0$, so that vertex $v$ is present in the graph at time $\tau_0$, and we let $m_0 = X_{\tau_0}(v)$ be its degree at the initial time.
We now want to calculate the corresponding martingale from Lemma 2.1. First we note that
$$\mathbb{E}\big[X_{t+1}(v) \mid X_t\big] = X_t(v)\left(1 + \frac{1}{2(t-1)}\right).$$
This is because the sum of all vertex degrees at time $t$ is $2(t-1)$, and the probability that a vertex $w$ is chosen as the endpoint of the new edge arriving at time $t+1$, conditional on $X_t = x$, is proportional to its degree $x(w)$; therefore the conditional probability that vertex $v$ is chosen is $x(v)/2(t-1)$. Hence
$$M_t = X_t(v) \prod_{s=\tau_0}^{t-1} \left(1 + \frac{1}{2(s-1)}\right)^{-1}$$
is a martingale.
Let $x_t$ solve the recurrence relation $x_{t+1} = x_t\big(1 + \frac{1}{2(t-1)}\big)$, with $x_{\tau_0} = m_0$, so that $x_t = m_0 \prod_{s=\tau_0}^{t-1}\big(1 + \frac{1}{2(s-1)}\big)$. Now fix any $\omega \ge 4$. For a vertex $v$, we define the time $T_v$ given in (3.1), using the recurrence above. The increment $X_{s+1}(v) - X_s(v)$ is either 0 or 1, and the probability that it is equal to 1, conditional on $X_s$, is $X_s(v)/2(s-1)$.
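The deterministic recurrence $x_{t+1} = x_t\big(1 + \frac{1}{2(t-1)}\big)$ is easy to iterate numerically, and doing so confirms the $\sqrt{t}$ growth of the expected degree; the code and parameter choices below are ours.

```python
import math

def expected_degree(m0, tau0, t):
    """Iterate x_{s+1} = x_s * (1 + 1/(2(s-1))) from x_{tau0} = m0 up to time t."""
    x = float(m0)
    for s in range(tau0, t):
        x *= 1 + 1 / (2 * (s - 1))
    return x

m0, tau0, t = 3, 100, 100000
x = expected_degree(m0, tau0, t)
# The product behaves like sqrt(t / tau0), so this ratio should be close to 1.
ratio = x / (m0 * math.sqrt(t / tau0))
```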
We now apply Theorem 2.6(b) to the function $f_v$, with $R(s) = 60\omega^3 m_0\, s/(\tau_0 - 1)$, $J = 1$, and with $\psi = \omega$; we thus obtain a bound that holds with high probability. Now $\log(xy) - y^{1/3}\log x$ is decreasing in $y$ for $y \ge 1$ when $x \ge 20$, and is zero at $y = 1$: we apply this with $x = 60\omega^3 m_0$ and $y = (t-1)/(\tau_0 - 1)$, to obtain a bound valid for any $m_0 \ge 1$. We analyse the resulting recurrence using the following simple lemma.
Of course, the conclusion that we shall use is that $e_t < 6A\sqrt{(t-1)/(\tau_0-1)}$, but the bound above is easier to establish by induction.
Proof. The proof is by induction on $t$, the result being true with something to spare for $t = \tau_0$. Suppose the result is true for all $s$ with $\tau_0 \le s < t$. Then, by the induction hypothesis and the recursive bound, the function $g(s)$ appearing there is decreasing for all $s > \tau_0$, so $g(s) \le 1/(3\tau_0)$ for all $s \ge \tau_0$. This gives the desired inequality for $e_t$.
We can now deduce that, with probability at least $1 - 5e^{-\omega/4}$, the bound (3.3) holds, and this bound is at most $48\omega x_t$; on this event $T_v = \infty$, since otherwise this would contradict the definition of $T_v$. This means that, with probability at least $1 - 5e^{-\omega/4}$, the bound (3.3) is valid for all times $t \ge \tau_0$. We thus have the following theorem.
We note two consequences of the result above that we shall use later.
The result will then follow from Theorem 3.2, provided a certain inequality between the parameters holds; a short calculation verifies this inequality.
We shall also use the following result, stating that the maximum degree at time $t$ is unlikely to be larger than $\psi\sqrt{t-1}$, where $\psi$ is a large constant.
Theorem 3.4. Let $\tau_0 \ge 4$ and $\psi \ge 10^5\sqrt{\tau_0 - 1}\,\log^3\tau_0$ be constants. For the preferential attachment model, with any initial graph on $\tau_0$ vertices and $\tau_0 - 1$ edges,
$$\mathbb{P}\big(X_t(v) \ge \psi\sqrt{t-1} \text{ for some } t \ge \tau_0 \text{ and some vertex } v\big) \le \frac{1}{\psi}.$$

Proof. Let $P_1$ be the probability that $X_t(v) \ge \psi\sqrt{t-1}$ for some $t \ge \tau_0$ and some vertex $v$ already present at time $\tau_0$, and $P_2$ be the probability that $X_t(v) \ge \psi\sqrt{t-1}$ for some $t \ge \tau_0$ and some vertex $v$ arriving at a time later than $\tau_0$.
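A seeded simulation illustrates the $\sqrt{t}$ scale of the maximum degree; this is a plausibility check of our own rather than a proof of the tail bound, and the parameters are arbitrary.

```python
import math
import random

def max_degree(t_max, rng):
    """Simulate the preferential attachment tree and return its maximum degree."""
    endpoints = [0, 1]
    degree = [1, 1]
    for v in range(2, t_max):
        w = rng.choice(endpoints)
        endpoints.extend((v, w))
        degree.append(1)
        degree[w] += 1
    return max(degree)

rng = random.Random(7)
t = 5000
# Across repeated runs, the maximum degree stays on the scale sqrt(t - 1).
ratios = [max_degree(t, rng) / math.sqrt(t - 1) for _ in range(20)]
```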

Concentration for D t (ℓ)
In this section, we again consider the basic preferential attachment model, but now we are concerned with the number of vertices of degree exactly ℓ at time t.
Recall that, for $t \ge \tau_0$ and $\ell \in \mathbb{N}$, $D_t(\ell)$ denotes the number of vertices of degree exactly $\ell$ at time $t$. It is easy to see that $D = (D_t(\ell) : t \ge \tau_0, \ell \in \mathbb{N})$ is a Markov chain.
We recall our main theorem.
As is well known (and as we shall show shortly), the expectation of $D_t(\ell)$ is very close to $4t/\ell(\ell+1)(\ell+2)$, for all $\ell \ge 1$ and all $t \ge \tau_0$, so the theorem shows concentration of measure of these random variables about their means.
For $\ell \le (t/\log(\psi t))^{1/3}/\psi^2$, the bound on the deviation of $D_t(\ell)$ is at most $125\sqrt{t\log(\psi t)/\ell^3}$, which is, up to the log factor, of the order of $\sqrt{\mathbb{E}\, D_t(\ell)}$.
We get concentration within a factor $(1 + o(1))$ of the mean as long as $\ell$ is well below this threshold. For all values of $\ell$ larger than $(t/\log t)^{1/3}$, the bound on the deviation that we obtain is of order $\log t$. This result might conceivably be of interest for values of $\ell$ between about $(t/\log t)^{1/3}$ and $t^{1/2}$, but for larger values of $\ell$ we already have a stronger result: Theorem 3.4 tells us that $D_t(\ell) = 0$ when $\ell$ is larger than $\psi\sqrt{t}$, with probability at least $1 - 1/\psi$. The proof of Theorem 1.1 takes up the rest of this section, although we defer the bulk of the calculations until later sections.
Proof. It will shortly turn out to be convenient to truncate the range of $\ell$, so that we consider only values of $\ell$ with $1 \le \ell \le \ell_0$, for some fixed $\ell_0$. We remark now that we may freely do this, as we are proving an explicit bound on the probability of failure that is independent of $\ell_0$.

For the moment though, we consider all values $\ell \in \mathbb{N}$ simultaneously, and consider the evolution of the entire process $D = (D_t(\ell))$ for $t \ge \tau_0$.
We want to show that, for $\ell \ge 1$, $D_t(\ell)$ is close to $d_t(\ell)$, where the $d_t(\ell)$ satisfy $d_{\tau_0}(\ell) = D_{\tau_0}(\ell)$ for all $\ell$, and, for $t \ge \tau_0$:
$$d_{t+1}(1) = d_t(1) + 1 - \frac{d_t(1)}{2(t-1)}, \qquad d_{t+1}(\ell) = d_t(\ell) + \frac{(\ell-1)\,d_t(\ell-1) - \ell\,d_t(\ell)}{2(t-1)} \quad (\ell \ge 2).$$
Given the initial values $d_{\tau_0}(\ell) = D_{\tau_0}(\ell)$, for $\ell \ge 1$, these equations determine the $d_t(\ell)$ for all $t$. They are known to admit the explicit solution $d_t(\ell) = \frac{4(t-1)}{\ell(\ell+1)(\ell+2)}$ if the initial conditions correspond (which of course cannot happen for a concrete graph at time $\tau_0$, since then all the $D_{\tau_0}(\ell)$ are natural numbers). More generally, we have the following result, which is very similar to results of Szymański [22,23] and Bollobás, Riordan, Spencer and Tusnády [4].
For each $t > \tau_0$, it is straightforward to verify the claimed bound for $\ell = 1$ and, for $\ell > 1$, the corresponding bound for larger degrees. By induction, it now follows that the bound holds for all $t \ge \tau_0$ and every $\ell \ge 1$, as claimed.
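The recurrence defining the $d_t(\ell)$ can be iterated numerically, and doing so exhibits the approach to $4t/\ell(\ell+1)(\ell+2)$. The code below is our own sketch; for concreteness it starts from a star on $\tau_0$ vertices, and the truncation at `ell_max` is harmless because mass only flows upwards through the degrees.

```python
def expected_degree_counts(tau0, t_end, ell_max):
    """Iterate d_{t+1}(l) = d_t(l) + ((l-1) d_t(l-1) - l d_t(l)) / (2(t-1)),
    with an extra +1 into d(1) for each newly created leaf.
    The initial condition is a star on tau0 vertices."""
    d = [0.0] * (ell_max + 2)
    d[1] = tau0 - 1                      # the leaves of the initial star
    if tau0 - 1 <= ell_max:
        d[tau0 - 1] += 1                 # its centre
    for t in range(tau0, t_end):
        delta = [0.0] * (ell_max + 2)
        for l in range(1, ell_max + 1):
            delta[l] = ((l - 1) * d[l - 1] - l * d[l]) / (2 * (t - 1))
        for l in range(1, ell_max + 1):
            d[l] += delta[l]
        d[1] += 1                        # the new vertex has degree 1
    return d

t_end = 20000
d = expected_degree_counts(4, t_end, 6)
approx = [4 * t_end / (l * (l + 1) * (l + 2)) for l in range(1, 7)]
```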
Define $E_t(\ell) = D_t(\ell) - d_t(\ell)$, where, as above, we set $d_{\tau_0}(\ell) = D_{\tau_0}(\ell)$ for all $\ell \ge 1$. Note that $D_{\tau_0}(\ell)$ is an integer-valued random variable, determined by the graph at the initial time $\tau_0$. Note also that $E_{\tau_0}(\ell) = 0$ for all $\ell \ge 1$. For the moment, we shall keep the term $E_{\tau_0}$ in our expressions, to show how the calculation would be affected in a setting where $E_{\tau_0}$ is not necessarily zero.
Once we have fixed $\ell_0$, we may restate the previous system of equations as a matrix equation, giving a recurrence for the vector $E_t = (E_t(1), \ldots, E_t(\ell_0))$, valid for $t \ge \tau_0 + 1$; iterating it expresses $E_t$ in terms of the matrices $A_s$. Here and subsequently, the notation $\prod_{s=\tau_0}^{t-1} A_s$ indicates the matrix product $A_{t-1} \cdots A_{\tau_0}$, with the indices taken in decreasing order. At this point, we recall that $E_{\tau_0} = 0$. We shall control the deviations of $E_t$, although this process is not itself a martingale, and so we cannot directly apply our martingale deviation inequalities. The process $(E_t)$ is a transform of the martingale $(M_t)$, in that it is a sum of the differences of $(M_t)$, multiplied by the appropriate products $\prod_{u=s}^{t-1} A_u$, which depend on $t$. In order to get around this difficulty, we now introduce, for each $\tau > \tau_0$, a martingale $\widetilde M^\tau$ stopped at $\tau$, whose value at $\tau$ is the quantity $E_\tau$ of interest.
We fix $\tau > \tau_0$ and define the process $\widetilde M^\tau_t$ for $t \le \tau$, with $\widetilde M^\tau_t = \widetilde M^\tau_\tau$ for $t > \tau$; it is easily checked that $\widetilde M^\tau = (\widetilde M^\tau_t)$ is a martingale, and that $\widetilde M^\tau_\tau = E_\tau$. Thus we can obtain bounds on $E_\tau$ by studying the martingale $\widetilde M^\tau$.
From Lemma 4.1, we have a bound valid for every $\tau \ge \tau_0$ and every $\ell = 1, \ldots, \ell_0$. We now consider the transitions of the truncated process $D^{\ell_0}$, with state space $(\mathbb{Z}^+)^{\ell_0}$. Recall that each transition involves the creation of one new vertex of degree 1, and the increase of the degree of one existing vertex by 1. This means that a transition of the truncated process involves an increase of 1 in $D^{\ell_0}(1)$, and either: (i) a decrease of 1 in $D^{\ell_0}(k)$ and an increase of 1 in $D^{\ell_0}(k+1)$, for some $k \in \{1, \ldots, \ell_0 - 1\}$; (ii) a decrease of 1 in $D^{\ell_0}(\ell_0)$; or (iii) no further change. In other words, the vector $D^{\ell_0}_{s+1}$ is obtained from $D^{\ell_0}_s$ by adding one of the following vectors: (i) $y_k = e_1 - e_k + e_{k+1}$, for some $k \in \{1, \ldots, \ell_0 - 1\}$; (ii) $y_{\ell_0} = e_1 - e_{\ell_0}$; (iii) $y_0 = e_1$. Here $e_j$ denotes the standard basis vector in $\mathbb{Z}^{\ell_0}$ with a 1 in the $j$th coordinate and 0s elsewhere: here and in what follows, we abuse notation by suppressing the dependence on $\ell_0$. The transition probabilities are given by $P_s(D_s, D_s + y_k) = kD_s(k)/2(s-1)$ for $k = 1, \ldots, \ell_0$, with the remaining probability assigned to $y_0$. Here too we have removed the superscripts $\ell_0$ for clarity. We consider running the process up to some fixed $\tau > \tau_0$: all our notation should specify the dependence on $\tau$, but again where possible we shall suppress this.
We then have a corresponding expression for $\tau_0 \le s < \tau$, and hence an expression for $\widetilde M^\tau_t$ for $t \le \tau$. The transformed process $\widetilde D$ is not in general a Markov process. However, we may define a process $Y = Y^\tau$ by setting $Y_t = (D_t, \widetilde D_t)$ for $t \le \tau$, and $Y_t = Y_\tau$ for $t \ge \tau$. This extended process $Y$ is Markovian. Our plan is to apply Theorem 2.5 to the Markov process $Y$ and, for $\ell = 1, \ldots, \ell_0$, to the projection function $g = g_\ell$ taking $(x, \tilde x) \in E$ to $\tilde x(\ell)$. For each $k = 0, \ldots, \ell_0$, if $D_{t+1} - D_t = y_k$, then $g(Y_{t+1}) - g(Y_t) = [B_t y_k](\ell)$, the $\ell$-th entry of the vector $B_t y_k$. Since $B_t$ is a product of non-negative substochastic matrices, it too is non-negative and substochastic. The vector $y_k$ has all its entries in $\{0, +1, -1\}$, with at most two positive entries and one negative entry, so each coordinate of the vector $B_t y_k$ is a sum of at most two entries of $B_t$, minus at most one other entry. Therefore $|[B_t y_k](\ell)| \le 1$ for all $t$, $k$ and $\ell$. So, in applying Theorem 2.5, we may take $J = 1$.

Bounds for $\Phi^\ell_{\tau-1}(Y)$

Our aim in this section is to prove Lemma 4.2, which states that $\Phi^\ell_{\tau-1}(Y)$ is at most the bound given there. Recall that, for $s < \tau \le T \wedge T_\Delta$ and $1 \le k \le \ell_0$, we have bounds on the values of $D_s(k)$. For this section, we may and shall assume that we do indeed have these bounds on the values of $D_s(k)$.
Recall also that, for $s = \tau_0, \ldots, \tau - 1$, $B_s$ is the matrix product $A_{\tau-1} \cdots A_{s+1}$, with $B_{\tau-1} = I$. We may now write, for $1 \le \ell \le \ell_0$ and $\tau_0 \le s < \tau$,
$$[B_s y_k](\ell) = B_s(\ell, 1) - B_s(\ell, k) + B_s(\ell, k+1),$$
where $B_s(i, j)$ denotes the $(i,j)$-entry of the matrix $B_s$. Provided we interpret $B_s(\ell_0, \ell_0 + 1)$ as equal to zero, we can now bound the sum over $k$, for any $s$ and any $\ell \le \ell_0$. For $1 \le \ell < \ell_0$, all terms in the sum with $k > \ell$ are zero, since the matrix $B_s$ is lower-triangular. The key task is thus to estimate the entries $B_s(\ell, k)$ of the matrix product $B_s = A_{\tau-1} \cdots A_{s+1}$, and in particular the differences $|B_s(\ell, k) - B_s(\ell, k+1)|$. The recurrence satisfied by these matrix entries is that, for $0 \le j < \ell$,
$$B_s(\ell, \ell - j) = B_{s+1}(\ell, \ell - j)\, A_{s+1}(\ell - j, \ell - j) + B_{s+1}(\ell, \ell - j + 1)\, A_{s+1}(\ell - j + 1, \ell - j),$$
since the only non-zero entries of $A_s$ in column $(\ell - j)$ are those in rows $(\ell - j)$ and $(\ell - j + 1)$. For notational convenience, we fix $\ell \ge 1$ and write $a_j(s) = B_s(\ell, \ell - j)$, suppressing the dependence on $\ell$. Rewriting in terms of the $a_j(s)$ gives (5.2), an expression for $\Phi^\ell_{\tau-1}(Y)$ in terms of the $a_j(s)$. The transition probabilities $P_s(D_s, D_s + y_k)$ can be expressed explicitly as $kD_s(k)/2(s-1)$ for each $s$ and $k$.
The recurrence satisfied by the $a_j(s)$ is then
$$a_j(s) = \Big(1 - \frac{\ell - j}{2s}\Big)\, a_j(s+1) + \frac{\ell - j}{2s}\, a_{j-1}(s+1),$$
for $0 \le j \le \ell - 1$ and $\tau_0 \le s \le \tau - 1$. We also have $B_{\tau-1} = I$, the identity matrix, so that $a_0(\tau - 1) = 1$, and $a_j(\tau - 1) = 0$ for $j > 0$. Note also that $a_{-1}(s) = 0$ for all $s$, since the matrix $B_s$ is lower-triangular. These boundary conditions, together with the recurrence relation, suffice to determine all the values $a_j(s)$. There is a natural interpretation of the term $a_j(s)$: it is the probability that a fixed vertex $v$ with degree $\ell - j$ at time $s$ will have degree $\ell$ at time $\tau - 1$. This can most easily be seen by checking that this system of probabilities satisfies the boundary conditions and the recurrence relation. In the notation of Section 3, $a_j(s) = \mathbb{P}\big(X_{\tau-1}(v) = \ell \mid X_s(v) = \ell - j\big)$. One immediate consequence is that $0 \le a_j(s) \le 1$ for all $j$ and $s$.
It may be of interest to note that there is a formula for the $a_j(s)$ as an alternating sum; one may verify that this formula satisfies the recurrence. It can also be obtained by observing that the matrices $A_s$ can be simultaneously diagonalised, leading to a formula for the matrix $B_s$. Although these formulae are quite appealing, we have been unable to extract useful bounds from them. At this point, we break into three cases. The main case of interest is when $8 \le \ell \le 2\psi\sqrt{\tau - 1}$, but we also need to deal with values of $\ell$ outside this range, and we do this first.
Although our exact expression for the $a_j(s)$ proved difficult to work with, we now give a function $f_j(s)$ which has a simple form, and which satisfies the boundary conditions and an approximate version of the recurrence; our plan is to show that $a_j(s)$ is close to $f_j(s)$ for all values of $j$ and $s$.
For $0 \le j \le \ell - 1$ and $0 \le s \le \tau - 1$, set
$$f_j(s) = \binom{\ell - 1}{j}\, v^{\ell - j}(1 - v)^j, \qquad \text{where } v = v_s = \sqrt{s/(\tau - 1)}.$$
We note that $v_{\tau-1} = 1$, and so $f_j(\tau - 1) = 0$ for $j \ne 0$, while $f_0(\tau - 1) = 1$. We could formally define the function $f_{-1}$ to be identically 0: the key identity we use for the binomial coefficients is
$$j\binom{\ell - 1}{j} = (\ell - j)\binom{\ell - 1}{j - 1},$$
which indeed entails $\binom{\ell - 1}{-1} = 0$. However, we find ourselves having to deal with the case $j = 0$ as a boundary case separately anyway, and so we need make no (further) explicit mention of the case $j = -1$.
We claim that, for all $j \ge 1$ and $1 \le s \le \tau - 1$, the $f_j(s)$ satisfy the recurrence for the $a_j(s)$ up to a correction term in square brackets. Our aim will then be to show that this term is usually small, and that this is thus a good approximation to the recurrence satisfied by the $a_j(s)$; rearranging the claimed identity and verifying it directly establishes the claim. Equation (5.3) demonstrates that the $f_j(s)$ are the analogues of the $a_j(s)$ for a continuous-time version of the preferential attachment process. In this continuous-time version, at time $s$, each vertex of degree $k$ attracts a new edge (whose other endpoint is a new vertex of degree 1) at rate $k/2s$, independently of the degrees of other vertices. The degree of a given vertex is then a pure birth process with this transition rate. The probability that a vertex with degree $\ell - j$ at time $s$ has degree $\ell$ at time $\tau - 1$ satisfies the differential equation (5.3), namely
$$\frac{d}{ds} f_j(s) = \frac{\ell - j}{2s}\big(f_j(s) - f_{j-1}(s)\big),$$
as well as the boundary condition $f_j(\tau - 1) = \delta_{j0}$.
It seems intuitively plausible that the difference $e_j(s) = f_j(s) - a_j(s)$ between the continuous and the discrete "solutions" will always be small. Indeed we shall prove the following lemma, which is very crude in most ranges.
Lemma 5.1. For all $\ell \ge 8$ and $0 \le j \le \ell - 1$, we have the following bounds. We shall defer the proof of Lemma 5.1 to the next section.
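Taking the backward recurrence for the $a_j(s)$ in the form $a_j(s) = \big(1 - \frac{\ell-j}{2s}\big)a_j(s+1) + \frac{\ell-j}{2s}\,a_{j-1}(s+1)$, and $f_j(s) = \binom{\ell-1}{j}v^{\ell-j}(1-v)^j$ with $v = \sqrt{s/(\tau-1)}$ (our reading of the definitions above), a direct computation illustrates the closeness asserted by Lemma 5.1. Code and parameter choices are ours.

```python
from math import comb, sqrt

def a_values(ell, tau, s_min):
    """Backward recurrence for a_j(s), started from a_j(tau - 1) = delta_{j0},
    evaluated at s = s_min."""
    a = [1.0] + [0.0] * (ell - 1)
    for s in range(tau - 2, s_min - 1, -1):
        new = a[:]
        for j in range(ell):
            p = (ell - j) / (2 * s)          # probability the degree steps up
            prev = a[j - 1] if j >= 1 else 0.0
            new[j] = (1 - p) * a[j] + p * prev
        a = new
    return a

def f_values(ell, tau, s):
    """Continuous-time (Yule process) analogue f_j(s)."""
    v = sqrt(s / (tau - 1))
    return [comb(ell - 1, j) * v ** (ell - j) * (1 - v) ** j for j in range(ell)]

ell, tau, s0 = 4, 5001, 500
a = a_values(ell, tau, s0)
f = f_values(ell, tau, s0)
```

For moderately large $\tau$ and $s_0$ the two vectors agree to within a few hundredths, consistent with the error terms $e_j(s)$ being small in this range.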
We set $\Psi^\ell_{\tau-1}(Y)$ to be the analogue of $\Phi^\ell_{\tau-1}(Y)$ with each $a_j(s)$ replaced by $f_j(s)$. We now show that the bound in Lemma 5.1 suffices to show that $\Phi^\ell_{\tau-1}(Y)$ is not much larger than $\Psi^\ell_{\tau-1}(Y)$.
Lemma 5.2. For any $\ell$ and $\tau$, $\Phi^\ell_{\tau-1}(Y)$ exceeds $\Psi^\ell_{\tau-1}(Y)$ by at most the stated amount.

Proof. Equation (5.2) tells us that $\Phi^\ell_{\tau-1}(Y)$ is at most a weighted sum of the $a_j(s)^2$. Using the inequality $a_j(s)^2 \le 2f_j(s)^2 + 2e_j(s)^2$, and then applying the bounds from Lemma 5.1 to the terms involving the $e_j(s)$, we obtain the bound as claimed.
For k = 1, …, ℓ, we have that P_s(D_s, D_s + y_k) = kD_s(k)/2(s − 1), since each of the D_s(k) vertices of degree k has probability k/2(s − 1) of receiving an extra edge at time s + 1. Therefore

The double sum is the main term here, and we concentrate mainly on this; we will obtain adequate bounds on Σ_s f_{ℓ−1}(s)² as a byproduct of our estimates.
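These transition probabilities can be sanity-checked computationally. The sketch below (our own implementation; the degree-weighted urn and all names are ours) grows a small preferential attachment tree, computes the exact one-step expectation of each degree count by enumerating all s possible attachment points, and compares it with the consequence of the formula above, namely E[D_{s+1}(k)] = D_s(k) + ((k−1)D_s(k−1) − kD_s(k))/2(s−1) + [k = 1].

```python
import random

def pa_tree(t, seed=1):
    # Grow a preferential attachment tree on t vertices. 'reps' holds each
    # vertex v exactly deg[v] times, so a uniform pick from it is a
    # degree-proportional pick.
    rng = random.Random(seed)
    deg = [1, 1]
    reps = [0, 1]
    while len(deg) < t:
        w = rng.choice(reps)
        deg.append(1)            # the arriving vertex has degree 1
        deg[w] += 1
        reps.extend([w, len(deg) - 1])
    return deg

s = 500
deg = pa_tree(s)
D = {}
for d in deg:
    D[d] = D.get(d, 0) + 1
tot = 2 * (s - 1)                # total degree of a tree on s vertices

# Exact expectation of D_{s+1}(k): enumerate every possible attachment point w,
# chosen with probability deg(w) / 2(s-1).
ks = sorted(set(D) | {d + 1 for d in D} | {1})
exp_next = {k: 0.0 for k in ks}
for w, dw in enumerate(deg):
    for k in ks:
        nk = D.get(k, 0) - (k == dw) + (k == dw + 1) + (k == 1)
        exp_next[k] += (dw / tot) * nk

# Compare with E[D_{s+1}(k)] = D_s(k) + ((k-1)D_s(k-1) - k D_s(k))/2(s-1) + [k=1].
for k in ks:
    formula = D.get(k, 0) + ((k - 1) * D.get(k - 1, 0) - k * D.get(k, 0)) / tot + (k == 1)
    assert abs(exp_next[k] - formula) < 1e-9
print("one-step transition formula verified at s =", s)
```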

From (5.3) and (5.4), we have
where v = v_s = √(s/(τ − 1)), as before. We estimate the sum over s by approximating it by the integral

The integrand here is bounded above by 1, since each f_j(s) is at most 1.
has derivative which is a positive multiple of a quadratic function of v, so the function has just two stationary points, one on either side of the zero v = (ℓ − j)/ℓ. Therefore the integrand, which is a positive multiple of the square of this function, has two local maxima. The sum is then at most the value of the integral plus the values of the integrand at the two local maxima, and so

In the last line, we changed variable: recall that s = v²(τ − 1). We write

for positive integers ℓ and j and integer α, where ℓ > j and α ≥ −1.
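To see where the zero v = (ℓ − j)/ℓ comes from, note that with the Beta-binomial form f_j(s) = C(ℓ−1, j)(1−v)^j v^{ℓ−j} (our reconstruction, consistent with the boundary condition f_j(τ−1) = δ_{j0}; the paper's displayed formulas are not reproduced here), the difference f_j − f_{j−1} factorises explicitly:

```latex
f_j(s) - f_{j-1}(s)
  = \binom{\ell-1}{j}(1-v)^j v^{\ell-j}
    - \binom{\ell-1}{j-1}(1-v)^{j-1} v^{\ell-j+1}
  = \frac{1}{\ell-j}\binom{\ell-1}{j}\,(1-v)^{j-1} v^{\ell-j}
    \bigl(\ell - j - \ell v\bigr),
% using j\binom{\ell-1}{j} = (\ell-j)\binom{\ell-1}{j-1}; the linear
% factor (\ell - j - \ell v) vanishes exactly at v = (\ell-j)/\ell.
```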
The integral above can be evaluated as a sum of Beta functions. We will be confronted by a very similar integral when estimating Q₂, and it is convenient to prove a lemma covering both cases (here we need α = 1, and later we shall take α = −1).

Lemma 5.3. For integers ℓ and j with ℓ > j ≥ 0, and integer α ≥ −1,

and

Proof. For non-negative integers a and b, we have the identity

∫₀¹ xᵃ(1 − x)ᵇ dx = B(a + 1, b + 1) = a! b!/(a + b + 1)!,

where B(·, ·) denotes the Beta function.
For j ≥ 1, the required integral can be written as a sum of three integrals of the form above, and we obtain I(ℓ, j, α) as claimed.
For j = 0, we have for all α ≥ −1, also as claimed.
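The Beta-function identity used in the proof above is elementary, and can be checked exactly in a few lines via the binomial theorem, using exact rational arithmetic (the helper name is ours):

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    # Exact value of the integral of x^a (1-x)^b over [0, 1], obtained by
    # expanding (1-x)^b with the binomial theorem and integrating term by term.
    return sum(Fraction((-1) ** i * comb(b, i), a + i + 1) for i in range(b + 1))

# Compare with the closed form a! b! / (a + b + 1)! for small a and b.
for a in range(6):
    for b in range(6):
        assert beta_int(a, b) == Fraction(factorial(a) * factorial(b), factorial(a + b + 1))
print("Beta identity verified for 0 <= a, b <= 5")
```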
This is the first of several occasions in the paper where we use the inequalities

the first of which is valid for all integers x ≥ 1, and the second for all integers x ≥ 0.
Sometimes, as below, we use simply that 2x

We obtain

To estimate the sum appearing above, we use the numerical value Σ_{j=1}^∞ j^{−3/2} ≤ 2.61238, and the crude bound

The next step is to estimate

As before, we shall start by fixing j, and estimate the sum over s by the integral

Here we used the expression for f_j(s) − f_{j−1}(s) derived earlier, and made the substitution s = v²(τ − 1).
Bounding the difference between the sum Σ_{s=τ₀}^{τ−1} (1/s)(f_j(s) − f_{j−1}(s))² and the corresponding integral is not completely straightforward. The integrand (1/s)(f_j(s) − f_{j−1}(s))² can be written as

The function h_j(v) has stationary points at

Now we use the bounds for I(ℓ, j, −1) from Lemma 5.3. We also use that ℓ ≤ 2ψ√(τ − 1) and ψ ≥ 3, to obtain:

The next task is to bound the sum

where, as before, v = √(s/(τ − 1)). This sum is bounded above by the integral plus the maximum value of the integrand. The integral is equal to

which is more than small enough for our purposes, and the integrand is certainly at most 1, so

Proof. Using the expression for f′_j(s) in (5.4), as well as the identities s = v²(τ − 1) and (ℓ − j)(

we then have:

Let us first verify the result for j = 1, when we can write

The right-hand side is increasing from v = 0 to v = (ℓ − 1)(ℓ − 3)²/ℓ(ℓ − 2)², and decreasing thereafter. It is thus always at least its value at v = 1, which is −(ℓ − 1)(2ℓ − 3), and at most its value at the stationary point, which is at most

and thus

which is as required. We now embark on the calculation for 2 ≤ j ≤ ℓ − 2. We define a parameter ϕ by

The point is that the "main term" (1 − v)^{j−2} v^{ℓ−j−2} in our expression for the second derivative of f is maximised at ϕ = 0, whereas the other term

for all |x| < 1, and obtain:
There is a final case where ϕ = −(j − 2), i.e., v = 1, and we may dispose of this immediately, since the second derivative is zero unless j = 2. In this case,

and k₂ ≤ (2/e)^{|ϕ|} otherwise.
This completes the proof.

Thus f_0(s)/a_0(s) can be written as a product over w up to τ − 1.

Each term in the product is clearly at least 1, so f_0(s) ≥ a_0(s) for all s. If s ≤ ℓ, then we certainly have |e_0(s)| ≤ f_0(s) ≤ s/(τ − 1) ≤ ℓ/(τ − 1), so we may assume that s ≥ ℓ. Now we have, for all w ≥ s ≥ 2ℓ,

This means that

Concentration for U_t(ℓ)
In this section, we give a very brief sketch of the proof of Theorem 1.2. The proof proceeds on very similar lines to that of Theorem 1.1.
Recall that U_t(ℓ) is the number of vertices of degree at least ℓ at time t. It is easy to show that the expected value of U_t(ℓ) is close to u_t(ℓ) = 2t/ℓ(ℓ + 1), uniformly over t and ℓ.
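These approximations, u_t(ℓ) = 2t/ℓ(ℓ + 1) for the tail counts and 4t/ℓ(ℓ + 1)(ℓ + 2) for the counts of single degrees, are easy to probe by simulation. The sketch below (our own implementation of the process, using a degree-weighted urn; names are ours) grows one tree and compares the empirical counts with these targets for a few small ℓ:

```python
import random

def pa_degrees(t, seed=7):
    # Degree sequence of a preferential attachment tree on t vertices;
    # 'reps' holds vertex v exactly deg[v] times, so a uniform pick from it
    # is a degree-proportional pick.
    rng = random.Random(seed)
    deg = [1, 1]
    reps = [0, 1]
    while len(deg) < t:
        w = rng.choice(reps)
        deg.append(1)
        deg[w] += 1
        reps.extend([w, len(deg) - 1])
    return deg

t = 20000
deg = pa_degrees(t)
D = {}
for d in deg:
    D[d] = D.get(d, 0) + 1

for ell in (2, 3, 4, 5):
    U = sum(c for d, c in D.items() if d >= ell)   # observed U_t(ell)
    u = 2 * t / (ell * (ell + 1))                  # target 2t / ell(ell+1)
    dt = 4 * t / (ell * (ell + 1) * (ell + 2))     # target 4t / ell(ell+1)(ell+2)
    print(ell, U, round(u), D.get(ell, 0), round(dt))
```

With t of this size, the observed counts typically land within a few percent of the targets for small ℓ.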
We have

In particular, together with Lemma 7.1 and the fact that M^τ_τ = F_τ for each τ ≥ τ₀, this implies that

We next use this inequality to show that, with probability at least 1 − 3/ψ, for all τ ≥ τ₀ and ℓ = 2, …, ℓ₀, either |F_τ(ℓ)| ≤ δ_τ(ℓ) or τ ≥ T. Similarly to the proof of Theorem 1.1, this leads to the conclusion that P(T < ∞) ≤ 4/ψ, which is the desired result.

More complex preferential attachment models
In this section, we discuss some of the issues we confront when extending this proof to other models of preferential attachment.
A first extension would cover the model which again generates a random tree, where now an arriving vertex chooses an existing vertex v as a neighbour with probability proportional to X(v) + β, where X(v) is the degree of vertex v and β is a fixed constant. For such a model, the expected degree of a vertex at time t grows as Ct^{1/(2+β)}, and the expected number of vertices of degree ℓ at time t behaves as Ct/ℓ^{3+β}. When attempting to follow the proof in this paper to establish concentration results, the main difficulty is in finding a suitable analogue of Lemma 5.1, giving bounds on the error function playing the role of e_j(s).
Another well-studied variant is to have each arriving vertex select some fixed number m of neighbours (with replacement), instead of just one. The main difficulty introduced by this variation is that we have to account for the possibility that some existing vertex has its degree increased by more than one at each step, and that the recurrence relations do not have such clean forms.
In the full Cooper-Frieze model (see [6], [5]), the number of new edges added at each step is a random variable. Indeed, with some probability, no new vertex is added, and some edges are added between existing vertices, chosen either uniformly or via preferential attachment. This means that the numbers of vertices and edges present at time t are no longer determined, causing further complications in the application of our method.
We do believe that all of these problems can be overcome, and that our method can be used to analyse general Cooper-Frieze models. We also hope that the method will find further applications in the analysis of other random processes.