Extensive Condensation in a model of Preferential Attachment with Fitnesses

We introduce a new model of preferential attachment with fitness, and establish a time reversed duality between the model and a system of branching-coalescing particles. Using this duality, we give a clear and concise explanation for the condensation phenomenon, in which unusually fit vertices may obtain abnormally high degree: it arises from a growth-extinction dichotomy within the branching part of the dual. We show further that the condensation is extensive. As the graph grows, unusually fit vertices each become, if only for a limited time, adjacent to a non-vanishing proportion of the current graph.


Introduction
The classical model of preferential attachment is an increasing sequence of random graphs (G n ), beginning from a finite graph G 0 . To construct G n+1 from G n , a vertex p n is randomly sampled from G n , with the probability of picking each vertex v weighted according to its degree deg n (v). Then, a single new node is attached to p n via a single new edge. More generally, the new node may be joined via m new edges to m existing nodes, each sampled independently from G n , weighted by degree and with replacement.
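The degree-weighted sampling above can be implemented by keeping a flat list of half-edges, since drawing a half-edge uniformly at random is equivalent to drawing a vertex weighted by its degree. A minimal simulation sketch (with m = 1; the function name and setup are ours, chosen to match the initial graph used later in the paper):

```python
import random

def preferential_attachment(n_steps, seed=None):
    """Grow a graph by classical preferential attachment (m = 1).

    G_0 is a single vertex (vertex 0) with a self-loop, so it starts
    with two half-edges. Returns the list deg, where deg[v] is the
    degree of vertex v in G_{n_steps}.
    """
    rng = random.Random(seed)
    half_edges = [0, 0]   # each entry names the vertex owning that half-edge
    deg = [2]             # the self-loop contributes 2 to the degree
    for _ in range(n_steps):
        target = rng.choice(half_edges)  # uniform half-edge = degree-weighted vertex
        v_new = len(deg)
        deg.append(1)
        deg[target] += 1
        half_edges.extend([target, v_new])
    return deg

deg = preferential_attachment(1000, seed=1)
assert sum(deg) == 2 * 1001  # one edge per step, plus the initial self-loop
```

Each step adds exactly one edge (two half-edges), so the total degree after n steps is 2(n + 1), which the final assertion checks.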
This model is perhaps the simplest example of a stochastic model in which earlier gains (in the form of higher degree) confer an advantage towards future growth. It has been studied extensively and the structure of G n as n → ∞ is well understood; see for example Chapter 8 of van der Hofstad (2016) and the references therein.
The classical model was generalized by Bianconi and Barabási (2001), with the addition of fitness values for the vertices. A higher fitness value confers a better chance of attaching to the new incoming vertices. More precisely, nodes are assigned i.i.d. fitness values F v ∈ [0, 1], and a node v with fitness F v now carries weight F v deg n (v) (instead of deg n (v)). Cases in which F v has support [0, 1] but P[F v = 1] = 0 are of particular interest.
In such cases, as the graph grows large, it is possible that the vertices with fitnesses approaching 1 will capture a macroscopic fraction of the edges, a phenomenon known as condensation.
Using evidence from numerical simulations, Bianconi and Barabási predicted that once their graph became large 'a single node captures a positive proportion of the links'; this is known as 'extensive' condensation. Dereich et al. (2017) showed recently that extensive condensation did not, in fact, occur.
A second extension of the classical model, known as preferential attachment with choice, was studied by Malyshkin and Paquette (2014). Their model does not include fitnesses; rather, to obtain G n+1 from G n a set {p 1 , . . . , p R } of vertices is sampled from G n , each using the same degree-weighted mechanism as in classic preferential attachment (independently, and with replacement). A single new vertex then attaches via a single new edge to whichever p i has the highest degree.
Malyshkin and Paquette showed that in their model a so-called persistent hub emerges: a single vertex v which, at some random time N , has maximal degree (within G N ) and which then remains the vertex of maximal degree for all time. When R > 2, they establish extensive condensation by showing that the degree of the persistent hub grows linearly.
In the present article we introduce a new model, which modifies the model of Malyshkin and Paquette (2014) to include fitnesses. Like Bianconi and Barabási (2001), we take the vertex fitnesses to be i.i.d. values in [0,1]. In our model, to obtain G n+1 from G n , we sample vertices {p 1 , . . . , p R } from G n , each using the same degree-weighted mechanism as in classic preferential attachment (independently, and with replacement).
Then, attach a single new vertex v n to whichever p i has the highest fitness.
We will show that, in contrast to the Bianconi-Barabási model, in our model extensive condensation does occur. However, it occurs without the emergence of a persistent hub. This results in a delicate situation in which a succession of ever fitter vertices grows to eventually topple the previously dominant positions of older (and less fit) vertices. Our model provides the first rigorous example of a preferential attachment graph with extensive condensation via such behaviour. From here on, let us refer to the model as PAC, for 'Preferential Attachment with Choice by fitness'.
We analyse PAC using techniques which, to our knowledge, are novel to preferential attachment; we exhibit a time-reversed duality between PAC and a system of branching-coalescing particles. This type of duality is perhaps best known in the context of population genetics where genealogical trees, described by branching-coalescing particles, are used to represent historical transfers of genetic information.
Note that sampling the vertex v weighted according to deg n (v) is equivalent to sampling a half-edge (in G n ) uniformly at random, and then picking the associated vertex v. For this reason it is advantageous to consider half-edges. For convenience, we assign to each half-edge the same fitness as its associated vertex. We will use genealogies to track new half-edges inheriting fitness values from pre-existing half-edges. These genealogies will be closely connected to the duality used, in a spatial model of population genetics, by Etheridge et al. (2017).
In the genealogical dual process of PAC, if we suppress coalescence and consider the behaviour when the graph is large, we obtain that the branching part of the dual approximates a Galton-Watson process, at least when restricted to only finitely many generations. Using this fact we will be able to give a clear and concise explanation of why (and under what condition) condensation occurs: precisely, when this Galton-Watson process has positive probability of non-extinction. Non-extinction corresponds to the genealogy of a new half-edge extending far backwards in time, far enough that it is likely to contain an exceptionally fit ancestor.
It is natural to compare condensation in random graphs to Bose-Einstein condensation; indeed, Bianconi and Barabási (2001) described a correspondence between the two in which a vertex of fitness 1 corresponds to the zero energy state.
Phase transitions, such as that characterising the emergence of a Bose-Einstein condensate, only become sharp when the number of particles tends to infinity. However, in this limit, there are two natural ways in which one might define what is meant by the emergence of a Bose-Einstein condensate. Firstly, we might ask that a macroscopic fraction of particles remain in the lowest energy state; alternatively, we might ask that a macroscopic fraction of particles become arbitrarily close to the lowest energy state. The former definition corresponds to extensive condensation, the latter to non-extensive condensation.
More generally, condensation refers to the formation of an atom in the limit of a sequence of measures. We refer the reader to van den Berg et al. (1986) for further discussion of Bose-Einstein condensation. Let us now, in the same spirit, offer a precise definition of condensation in the context of random graphs.
Consider an increasing sequence of finite graphs (G_n), with vertex and edge sets G_n = (V_n, E_n), in which each vertex v has a fitness value F_v ∈ [0, 1]. We define the quantities

µ_n(A) = (1 / 2|E_n|) ∑_{v ∈ V_n} deg_n(v) 1{F_v ∈ A},   (1.1)

ℓ_n(A) = (1 / 2|E_n|) max_{v ∈ V_n : F_v ∈ A} deg_n(v).   (1.2)

Thus, µ_n is a random probability measure on [0, 1] which measures the fitnesses present in G_n, weighted according to degree. The quantity ℓ_n(A) is not a measure; it is the proportion of half-edges in G_n that are attached to the highest degree vertex with fitness in A.

Definition 1.1. Let a ∈ [0, 1].
1. We say that condensation at a occurs if, for all ε > 0, lim inf_{n→∞} E[µ_n((a − ε, a + ε))] > 0.
2. We say that extensive condensation at a occurs if, for all ε > 0, lim inf_{n→∞} E[ℓ_n((a − ε, a + ε))] > 0.
3. We say that condensation at a occurs around the persistent hub v, if v is a fixed vertex with fitness a such that lim inf_{n→∞} E[(1/|E_n|) deg_n(v)] > 0.
For many models with fitness, including PAC, the weak limit µ n → µ exists almost surely and the limit µ is deterministic. In such cases condensation at a is equivalent to µ possessing an atom at a. Extensive condensation occurs only when the degrees of individual vertices make non-negligible contributions to the formation of this atom.
These three definitions provide qualitative measures of how strongly the structure of G n becomes dominated by a small fraction of high degree nodes, as n → ∞. Clearly, 3 ⇒ 2 ⇒ 1.
As we have mentioned, we are interested in models for which condensation occurs either at a = 1 or not at all. In such cases, condensation occurs only through a positive fraction of the half-edges appearing on ever fitter vertices. Extensive condensation captures the more specific event that, in the limit, individual vertices become (each perhaps only for a limited time) adjacent to a positive fraction of the graph.
Remark 1.2. From now on, we use the term condensation to mean condensation at 1.
Let us now summarise the various techniques which have been used to rigorously analyse condensation in models of preferential attachment, with particular attention given to models incorporating fitness and/or choice. Readers familiar with this literature may wish to move directly on to Section 1.2, and will not miss out on any notation by doing so.
We first recall a natural coupling between the classic preferential attachment model and an urn process. Fix a vertex v_0 ∈ G_0. Colour v_0 white and all other vertices black; pass these colours on to the associated half-edges. Now, regard each half-edge of G_n as a coloured ball within an urn U_n. The dynamics of classic preferential attachment induce the following one-step dynamics for (U_n). To obtain U_{n+1} from U_n, we: 1. Draw a ball uniformly at random from U_n and note its colour. Return this ball to the urn.
2. Add a new black ball to the urn, and also add a new ball of the same colour as was drawn in step 1.
Then, at all times, the number of white balls in U n is equal to deg n (v 0 ). The new black ball corresponds to the half-edge associated to a new vertex v; the drawn ball corresponds to sampling the (colour of the new half-edge attached to the) vertex to which v connects.
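Since only ball counts matter, this urn can be simulated without tracking the graph at all; a minimal sketch of the one-step dynamics (the function name and generic initial counts are ours):

```python
import random

def run_urn(n_steps, white=1, black=1, seed=None):
    """Urn coupled to classical preferential attachment: on each step,
    draw a ball uniformly, return it, add a copy of the drawn colour,
    and add one further black ball (the new vertex's own half-edge)."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        if rng.random() < white / (white + black):
            white += 1      # drawn ball was white: duplicate it
        else:
            black += 1      # drawn ball was black: duplicate it
        black += 1          # the new vertex always contributes a black ball
    return white, black

white, black = run_urn(10_000, seed=2)
assert white + black == 2 + 2 * 10_000   # two balls added per step
```

Exactly two balls enter the urn per step, mirroring the two half-edges of each new edge in the graph.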
It is straightforward to extend the coupling to track the joint degree of multiple balls, using multiple colours. The first rigorous analysis of the Bianconi-Barabási model was provided by Borgs et al. (2007), who extended the idea described above to couple the model to a generalized Pólya urn process. In a generalized Pólya urn each colour is assigned a different activity value (in this case, given by a function of the fitness). Crucially, these activity values weight how balls are drawn from the urn, in a way that exactly matches the fitness-dependent sampling used in the Bianconi-Barabási model. With this coupling in hand, Borgs et al. invoked the limit theory of urns provided by Janson (2004), and showed rigorously that condensation occurred. However, this limit theory applies only when the urn has finitely many colours, meaning that discretization of the fitness values was a necessary step within the proof.
As we have mentioned, Bianconi and Barabási (2001) predicted extensive condensation within their model. This prediction was shown to be false by Dereich et al. (2017), who embedded the Bianconi-Barabási model in continuous time (a technique advocated by Janson) and, having done so, viewed it as a multi-type branching process with reinforcement. In this formulation, half-edges correspond to individuals within the branching process, and having greater fitness corresponds to being a type of individual that branches at a faster rate. Individuals with the same fitness are referred to as a family. (In fact Dereich et al. considered a more general case than Bianconi and Barabási, by including an extra parameter controlling the rate at which new edges appear between existing vertices.) The argument given by Dereich et al. for non-extensive condensation proceeds via computations based on the growth rates and birth times of families, utilising the independence inherent within branching processes. Their result requires regular variation of the fitness distribution near 1, which covers the range of parameters of interest to Bianconi and Barabási. For non-regularly varying fitness distributions the behaviour is not known, but see Section 8 of Dereich et al. (2017) for a discussion.
The analysis of Malyshkin and Paquette (2014) relies heavily on the appearance of a persistent hub within their model. It proceeds by first showing that the number of possible persistent hubs is almost surely finite, followed by showing that, for any two vertices, which one has the higher degree may switch only finitely many times. These arguments rely on comparisons to classical preferential attachment (which is also known to have a persistent hub). With this information in hand, Malyshkin and Paquette used stochastic approximation to analyse the growth of the persistent hub, which they show to have degree of asymptotic order n when R > 2 and order n/log n when R = 2.
More generally, stochastic approximation is well established as a method of studying urn processes and preferential attachment models. We refer the reader to the survey article of Pemantle (2007) for details. A rather general application of stochastic approximation to an extension of the Bianconi-Barabási model can be found in Dereich and Ortgiese (2014). We will discuss the applicability of stochastic approximation to PAC in Remark 2.9.
Some authors have considered variants of preferential attachment with choice in which the chosen vertex is not (or is not always) the fittest or the most valent of the R samples. Examples of such models, which have typically been studied through stochastic approximation, appear in Malyshkin and Paquette (2015) and Haslegrave et al. (2020).
The latter includes a particular example with R = 3 and attachment to the vertex with middle fitness, in which condensation occurs at a random location within (0, 1).
For models with choice, the coupling described earlier results in an urn process for which multiple balls must be drawn and reacted to on each time step. Janson (2004) comments that such urns are often intractable; however, we will be able to analyse the urn process arising from PAC using the aforementioned duality.

Multiple waves of natural selection
In population genetics, models that feature multiple waves of natural selection towards ever fitter individuals are rare. To our knowledge, at the present time all known tractable examples are close relatives of the model introduced by Desai and Fisher (2007), who described an extension of the Moran model in which mutations produce ever fitter individuals and selection brings the descendants of some of these individuals to dominance. A detailed rigorous analysis, in the limit of large population size, was given recently by Schweinsberg (2017); see also the references therein for variants and special cases that were treated in earlier articles.
In loose terms, we may compare a wave of natural selection, in which a fit subpopulation emerges and grows to dominance (this is known as a selective sweep) before its later demise in a subsequent even fitter wave, to the growth and eventual decline of (1/n) deg_n(v), where v is a fit vertex within PAC. Schweinsberg (2017) showed that within the Desai-Fisher model, and under suitable assumptions, the initial growth of each new wave could be approximated by a branching process. However, this approximation breaks down once the new wave becomes a positive fraction of the total population, after which point a fluid limit is used. The same paradigm can be found within the infectious disease literature, for example in Ball and Sirl (2017) (for a single wave of infection), and also within the heuristics described for our own proofs in Section 2.2. However, in our case time will be reversed and we will be tracking the growth of the genealogies of half-edges.
There are substantial differences between the Desai-Fisher model and PAC. In the Desai-Fisher model individuals die and are replaced, whereas in PAC once a vertex has appeared it remains present forever. Moreover, in the corresponding regime of the Desai-Fisher model, the individuals that cause the (j+1)th wave first appear during the jth wave, whereas within PAC we will see that a new fittest vertex born at time ≈ n^β, where β ∈ (0, 1), will survive through several waves of dominance by less fit (but older) vertices, before it has its own chance at time ≈ n.

Results on preferential attachment with choice by fitness
Let us now define the notation which, from now on, we use (only) for PAC. From here on we refer to PAC as 'the' model. The model is parametrized by (the distributions of) a pair of random variables, F taking values in [0, 1] and R taking values in N. For clarity, we use the convention that N = {1, 2, . . .}, so R does not take the value 0. Let (F_n) be a sequence of i.i.d. samples of F, and let (R_n) be a sequence of i.i.d. samples of R.
We describe an increasing sequence of random graphs (G n ) n≥0 with vertex and edge sets G n = (V n , E n ). We begin from a graph G 0 , which we will take to be a single vertex v 0 with a self-loop. In fact, our results hold for an arbitrary finite initial graph G 0 , but we follow a common convention and make this choice for simplicity.
Definition 2.1. The dynamics of (G n ) are as follows. At each time step we will add a single new vertex v n to the graph, so that V n = {v 0 , v 1 , . . . , v n }. The new vertex v n is assigned the fitness value F n . Given G n−1 and the fitnesses of its vertices, we attach v n according to the following rule.
1. First, we sample an ordered list of R_n existing vertices, which we label as (p_{n,l})_{l=1}^{R_n}.
Each p_{n,l} is sampled independently, with replacement, from V_{n−1}, according to preferential attachment. That is, for each index l = 1, . . . , R_n, the probability of picking the vertex v ∈ V_{n−1} is proportional to deg_{n−1}(v).
2. We define the (unordered) set P_n = {p_{n,1}, p_{n,2}, . . . , p_{n,R_n}}. (2.1) The new vertex v_n joins the graph by attaching via a single new edge to the fittest vertex in P_n.
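The dynamics of Definition 2.1 can be simulated directly, again via a flat list of half-edges. A minimal sketch (the function names are ours; we take F to be Uniform[0, 1] purely for illustration):

```python
import random

def pac_graph(n_steps, R_dist, seed=None):
    """Grow a PAC graph as in Definition 2.1, starting from a single
    vertex v_0 with a self-loop. R_dist(rng) returns one sample of R.
    Fitnesses are i.i.d. Uniform[0,1] (an illustrative choice of F).
    Returns (deg, fitness) for the final graph."""
    rng = random.Random(seed)
    fitness = [rng.random()]   # v_0
    deg = [2]                  # the self-loop contributes 2
    half_edges = [0, 0]
    for n in range(1, n_steps + 1):
        Rn = R_dist(rng)
        # R_n degree-weighted samples, independent, with replacement
        P = [rng.choice(half_edges) for _ in range(Rn)]
        target = max(P, key=lambda v: fitness[v])  # attach to the fittest sample
        fitness.append(rng.random())
        deg.append(1)
        deg[target] += 1
        half_edges.extend([target, n])
    return deg, fitness

deg, fit = pac_graph(2000, lambda rng: 3, seed=0)
assert len(deg) == 2001 and sum(deg) == 2 * 2001
```

Here `lambda rng: 3` corresponds to the deterministic choice R ≡ 3; any distribution of R on N can be substituted.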
We assume that the distribution of F is absolutely continuous, with essential supremum 1. Consequently, distinct vertices have unique fitness values and step 2 is well defined.
Remark 2.2. Within P_n, which vertex is fittest depends on the order of the fitness values, but not on their specific values. Thus, whilst µ_n, defined by (1.1), does depend on the distribution of F, in PAC the distribution of the graph G_n does not.
The key parameter in PAC is the distribution of R, which affects both µ_n and G_n. Heuristically, when R tends to take larger values, we should expect that fit vertices will become more successful at capturing edges, thus making condensation more prone to occur. We will assume, throughout, that

We now state our results rigorously. Our first result sets the scene, and shows that as n → ∞ each vertex grows towards infinite degree but, whilst doing so, does not become a persistent hub.

Theorem 2.3. Let v be a (deterministic) vertex. Then deg_{G_n}(v) → ∞ almost surely, and (1/n) deg_{G_n}(v) → 0 in probability.
The next result describes the precise limiting distribution of the degree-weighted fitness distribution µ_n, as n → ∞. Of course, this results in a characterization of when condensation occurs. The statement involves a particular Galton-Watson process which, as we have already mentioned, will play a key role in the proof. Its offspring distribution is that of

R 1{B = 1}, where B is an independent Bernoulli(1/2) random variable;   (2.2)

that is, an individual has no children with probability 1/2, and otherwise has a number of children distributed as R.

Theorem 2.4. Let (C_n)_{n∈N} be a sequence of i.i.d. copies of F, independent of L. Then, almost surely, as n → ∞, µ_n converges weakly to the probability measure µ on [0, 1] given by (2.3), where a ∈ [0, 1].

Combining Theorems 2.3 and 2.6, we have that extensive condensation occurs without the formation of a persistent hub.

Our proofs of the above theorems rely on a time-reversed duality between (G_n) and the genealogy of an urn process (U_n), which is naturally coupled to (G_n) in the same style as described (for classical preferential attachment) in Section 1.1. The genealogy of (U_n) can in turn be coupled, but only for a limited time, to a Galton-Watson tree T_n with offspring distribution (2.2). We introduce these couplings in Sections 2.1.1 and 2.1.3, to be followed by a heuristic outline of the proofs in Section 2.2. The proofs of Theorems 2.3, 2.4 and 2.6 themselves are given in Sections 3, 4 and 5 respectively.
In Section 6 we discuss a natural extension to our results; we consider the effects of incorporating a mechanism commonly used to control the strength of preference that incoming vertices have for making connections to high degree vertices. In PAC this mechanism is closely related to attaching new vertices onto the existing graph via multiple new edges.

Coupling to an urn process
We define an urn process (U_n) which will be coupled to (G_n). In the urn, each ball will have a colour, represented as a number in [0, 1], and this colour corresponds to a fitness value (of a vertex) in the graph model. The balls themselves correspond to the half-edges of the graph. We write balls in boldface, e.g. u, and we write the colour of u as col(u). From now on, we will use the terms fitness and colour interchangeably. Formally, let U_n be the set of half-edges in the graph G_n, where n ∈ N_0. For each u ∈ U_n, we set col(u) to be the fitness of the vertex to which u is attached.
Definition 2.7. The dynamics of the process (U_n) are as follows. Label the two initial half-edges in G_0 as c_0 and s_0. To construct U_n, given U_{n−1}, do the following: 1. Draw R_n balls, independently and uniformly at random, from U_{n−1}. Label these balls P_n = {p_{n,1}, . . . , p_{n,R_n}}. (2.4) 2. Add two new balls: a source ball s_n with col(s_n) = F_n, and a cue ball c_n with col(c_n) = max{col(p) : p ∈ P_n}. 3. Define U_n = U_{n−1} ∪ {c_n, s_n}.
Using the notation above, we divide the balls within U n into two distinct types: the cue balls c n and source balls s n . Recall that we take the initial graph G 0 to be a single vertex with a self-loop. We extend the terminology of 'cue' and 'source' to U 0 , by writing U 0 = {s 0 , c 0 } and specifying that s 0 is a source ball and c 0 is a cue ball. We write S n = {s 0 , s 1 , . . . , s n } and set S = ∪ n S n . We define C n and C analogously for cue balls. Thus, U n = S n ∪ C n and we set U = S ∪ C.
The process (U_n) is a projection of (G_n), in the sense that U_n forgets the graph structure and remembers only how many half-edges of each colour were present in G_n. Nonetheless, (U_n) is a Markov process with respect to the filtration generated by (R_n, F_n, P_n). Note that the random measure µ_n satisfies

µ_n(A) = (1/|U_n|) ∑_{b ∈ U_n} 1{col(b) ∈ A}.   (2.5)

Thus, µ_n(A) is the proportion of balls with colour in A at time n. We can therefore understand (2.3) as expressing the limiting distribution of the colour of a ball drawn (uniformly) from the urn at large time.

Representation as a genealogy
We equip the balls in the urn U with a genealogy that records the way in which each new cue ball c n inherits its colour from a single pre-existing ball. We will use terminology from population genetics to describe this genealogy. The fitness values (i.e. colours) play precisely the role of fitnesses in population models.
We say that P n from (2.4) are the potential parents of c n . We refer to the unique ball in P n with colour col(c n ) as the parent of c n . We say that c n is a child of its parent ball. To handle time n = 0, we say that s 0 is the parent of c 0 , and we give c 0 precisely R 0 potential parents all of which are equal to s 0 , where R 0 is an independent copy of R.
Lastly, source balls do not have any parents or any potential parents.
A finite sequence (b^(k))_{k=1}^K of balls in which, for all k, the ball b^(k+1) is the parent of b^(k) (resp. a potential parent of b^(k)), and in which b^(K) is a source ball, is said to be the ancestral line (resp. a potential ancestral line) of b^(1). We stress that each ball has a unique ancestral line, but multiple potential ancestral lines. Each potential ancestral line ends in a source ball, which necessarily has no potential parents. Given any ball b ∈ U, we write b↓ for the set of balls that appear in one or more of the potential ancestral lines of b, including b itself. The set b↓ is known as the set of potential ancestors of b.

Figure 1: A graphical representation of the genealogy of the urn (U_n), in a case with only two different colours of balls. Columns correspond to balls, with source balls on the left and cue balls on the right. Rows correspond to time-steps, with one new source ball and one new cue ball introduced on each row. The relative fitness of the two colours is shown by an inequality. Smaller black dots correspond to potential parents chosen by cue balls, and the lines connecting them to balls represent the genealogy. Looking backwards in time, branching is visible at each step of time when the new cue ball samples its potential parents, as black dots on the same row. Coalescences occur when the same potential parent is sampled more than once (at possibly different times); one such event is visible at n = 1, 2 within the column of the initial cue ball.

If we couldn't see the fitness values of the balls, but could see which balls made up the sets P_n, then b↓ represents the full set of balls which might have been lucky enough to have their own fitness value passed on to b. Thus

col(b) = max{col(u) : u ∈ b↓} = max{col(s) : s ∈ b↓ ∩ S}.   (2.6)

In words: the colour of b is the colour of the fittest source ball within its potential ancestors. This fact is a natural consequence of point 2 of Definition 2.7. See Figure 1 for a graphical explanation. Equation (2.6) is in the same spirit as the duality used (for a version of the spatial Λ-Fleming-Viot process) by Etheridge et al. (2017). More generally, dualities of this kind are instances of ancestral selection graphs, introduced by Krone and Neuhauser (1997). The selective mechanism of always choosing the potential parent with maximal fitness simplifies their structure considerably, whereas in general they can lead to quite intractable dual processes.
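Equation (2.6) can be checked directly in simulation. The sketch below implements the urn with an explicit genealogy; the even/odd indexing of source and cue balls is our own bookkeeping, and we take the colour of each c_n to be the maximal colour among its potential parents, in line with Definition 2.7:

```python
import random

def run_genealogy(n_steps, R_dist, seed=None):
    """Urn of Definition 2.7 with explicit potential parents.
    Ball 2n is the source ball s_n; ball 2n+1 is the cue ball c_n
    (a bookkeeping convention of this sketch, not of the text)."""
    rng = random.Random(seed)
    col = {0: rng.random()}
    col[1] = col[0]                   # c_0 inherits its colour from s_0
    parents = {1: [0] * R_dist(rng)}  # c_0's potential parents all equal s_0
    balls = [0, 1]
    for n in range(1, n_steps + 1):
        P = [rng.choice(balls) for _ in range(R_dist(rng))]
        s, c = 2 * n, 2 * n + 1
        col[s] = rng.random()               # source ball: fresh i.i.d. colour
        col[c] = max(col[p] for p in P)     # cue ball: fittest sampled colour
        parents[c] = P
        balls.extend([s, c])
    return col, parents

def potential_ancestors(b, parents):
    """b's set of potential ancestors b-down, via depth-first search."""
    seen, stack = {b}, [b]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

col, parents = run_genealogy(500, lambda rng: 2, seed=7)
c_n = 2 * 500 + 1
anc = potential_ancestors(c_n, parents)
src = [b for b in anc if b % 2 == 0]         # source balls have even index
assert col[c_n] == max(col[b] for b in anc)  # first equality in (2.6)
assert col[c_n] == max(col[b] for b in src)  # second equality in (2.6)
```

Colours only originate at source balls and are propagated by taking maxima, so both equalities in (2.6) hold exactly.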
We will be particularly interested in the structure of c↓_n, when n is large. It is natural to view c↓_n as a branching-coalescing structure: coalescence of (potential) ancestral lines occurs when a given ball is a (potential) parent to more than one cue ball. Similarly, when a cue ball has more than one potential parent we say that this is a branching of potential ancestral lines.
We write b↑ for the set of balls which contain b within their ancestral line. The set b↑ is known as the family, or the descendants, of b. When b is a source ball we refer to b as the founder of the family b↑, and the members of this family at time n are U_n ∩ b↑. Note that all elements of b↑ have the same colour as b, and that if v is the vertex to which the source ball s_k is attached, then

deg_{G_n}(v) = |s_k↑ ∩ U_n|.   (2.7)

We stress that b↑ is based on ancestral lines, whereas b↓ is based on potential ancestral lines. Hence, b↑ depends on the sequence of fitnesses (F_n), but b↓ does not. Theorem 2.3 states that P[|b↑| = ∞] = 1 and (1/n)|b↑ ∩ U_n| → 0 in probability, for any fixed ball b ∈ U. Theorem 2.6 is more complex to translate, but note that it is implied if, as n → ∞, we see a non-vanishing probability that sup_{k≤n} |s_k↑ ∩ U_n| has asymptotic order n. That is, the size of the largest family at time n should be of order n.

Coupling to a Galton-Watson process
There is a natural coupling between the urn process (U_n) and a Galton-Watson process, which we will now describe. This coupling is only valid for a limited time; a Galton-Watson tree can only accurately represent the genealogy of c_n as far backwards in time as that genealogy remains tree-like. Let W^n_0 = {c_n}. Then, iteratively, define

W^n_k = {p : p was a potential parent of some b ∈ W^n_{k−1}}.

Note that W^n_k is an (unordered) set, so that even if p is a potential parent to more than one b ∈ W^n_{k−1}, only one instance of p appears in W^n_k. Note also that W^n_k ⊆ c↓_n contains precisely the kth generation of potential ancestors of c_n. Since source balls have no potential parents, it is possible for W^n_k to become empty. Let

K_n = inf{k ∈ N : some ball is sampled more than once as a potential parent by the balls of W^n_0 ∪ · · · ∪ W^n_{k−1}}.

In words, K_n is the first generation of W^n_k in which we encounter a coalescence. We use the usual convention that inf ∅ = ∞, covering the case where no coalescences are encountered (which may occur if W^n_k becomes empty).
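The generations W^n_k and the first coalescence can be computed mechanically. Since the potential ancestry is independent of the fitness values (as noted in the proof of Lemma 2.8), the sketch below omits colours entirely; the indexing convention and function names are ours:

```python
import random

def urn_ancestry(n_steps, R_dist, seed=None):
    """Potential-parent structure of (U_n) only. Even indices are
    source balls, odd indices cue balls; ball 2n+1 is c_n
    (a bookkeeping convention of this sketch)."""
    rng = random.Random(seed)
    parents = {1: [0] * R_dist(rng)}   # c_0: potential parents all equal s_0
    balls = [0, 1]
    for n in range(1, n_steps + 1):
        parents[2 * n + 1] = [rng.choice(balls) for _ in range(R_dist(rng))]
        balls.extend([2 * n, 2 * n + 1])
    return parents

def generations(parents, c):
    """W^n_0 = {c}; W^n_k = potential parents of W^n_{k-1}. Returns the
    list of generations and the first generation containing a
    coalescence (a repeated draw), or None if the generations die out
    before any coalescence is met."""
    W, seen, K, k = [{c}], {c}, None, 0
    while W[-1]:
        drawn = [p for b in W[-1] for p in parents.get(b, [])]
        nxt = set(drawn)
        k += 1
        if K is None and (len(nxt) < len(drawn) or nxt & seen):
            K = k
        seen |= nxt
        W.append(nxt)
    return W, K

parents = urn_ancestry(200, lambda rng: 2, seed=3)
W, K = generations(parents, 2 * 200 + 1)
assert len(W[0]) == 1 and not W[-1]   # starts from c_n, eventually dies out
```

Because every potential parent is strictly older than its child, the generations always reach source balls and terminate.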
Lemma 2.8. Let W^n_k = |W^n_k| (the size of the kth generation), and let (Z_k) be a Galton-Watson process, with Z_0 = 1, whose offspring distribution is given by (2.2). Then there exists a coupling under which (W^n_k)_{k=0}^{K_n−1} = (Z_k)_{k=0}^{K_n−1}.

Proof. Let us first note that, by Definition 2.7, the potential ancestry of any given ball is independent of the fitness values of (all) balls. Thus fitnesses play no role in this proof; all quantities considered are independent of them.
Given the process (W^n_k), we define Z_k = W^n_k for k < K_n, and for k ≥ K_n we allow Z_k to evolve independently of (W^n_k) as a Galton-Watson process with offspring distribution (2.2). It remains to show that Z_k has the desired distribution for k < K_n. To this end, let us consider k < K_n with W^n_k = m. Thus Z_k = W^n_k, and we look to calculate the distribution of Z_{k+1}. We must consider two cases.
• If k + 1 = K_n then, by definition of Z_k, the transition Z_k → Z_{k+1} will be independent of (W^n_k) and will be that of a Galton-Watson process with offspring distribution (2.2).
• If k + 1 < K_n then Z_{k+1} = W^n_{k+1}. We have W^n_k = m, so W^n_k contains m (distinct) balls, but we do not know the identity of these balls. Moreover, because k < K_n, each such ball is not an element of W^n_0 ∪ . . . ∪ W^n_{k−1}. Thus, each such ball is, independently of the others, and independently of W^n_0 ∪ . . . ∪ W^n_k, a cue ball with probability 1/2 and a source ball with probability 1/2. By point 1 of Definition 2.7, these cue balls each, independently, have i.i.d. numbers of potential parents with common distribution matching that of R, whereas the source balls have none. Because k + 1 < K_n, the identities of these potential parents are distinct. Thus, W^n_{k+1} is the sum of m i.i.d. random variables with distribution (2.2), and the transition W^n_k → W^n_{k+1} is that of a Galton-Watson process with offspring distribution (2.2).
In both cases, the transition Z k → Z k+1 has the desired distribution.
We will show in Lemma 3.2 that P[K n ≤ k] → 0 as n → ∞, for all k ∈ N. In words, as n → ∞ the coupling of c ↓ n to a Galton-Watson tree remains valid for an arbitrarily large O(1) number of generations of this tree.

Outline of proofs
All of our proofs rely on the couplings detailed above. The proof of Theorem 2.6 relies on analysing the genealogy of (U n ) directly, whereas Theorem 2.4 uses only the Galton-Watson coupling, and Theorem 2.3 uses both. We outline all three proofs in this section.
Let us discuss Theorem 2.3 first. In terms of the urn process, the first part of Theorem 2.3 asserts that P[|u↑| = ∞] = 1. The proof rests on the observation that, when P_n is sampled, then for any fixed ball u the probability that u ∈ P_n is of order 1/n as n → ∞. If we could apply the Borel-Cantelli lemma then, with a little extra work, we could deduce that (almost surely) u was a parent infinitely often, and thus that |u↑| = ∞. Unfortunately, the lack of independence means the Borel-Cantelli lemma does not apply; instead we will use the Kochen-Stone lemma.
The second part of Theorem 2.3 asserts that |u↑ ∩ U_n|/|U_n| → 0 in probability. Because this quantity has expectation (1 + ∑_{k≤n} P[c_k ∈ u↑])/|U_n|, it suffices to show that P[c_n ∈ u↑] → 0 as n → ∞. To prove the latter, we use that the genealogy of c↓_n is that of a Galton-Watson tree, at least for a large O(1) number of generations. If this Galton-Watson tree dies out (i.e. in O(1) generations) then it has bounded size and is unlikely to include any fixed ball, in particular u. If it does not die out, then c↓_n will include many source balls, at least one of which is likely to be fitter than u. In both cases, c_n ∉ u↑.

Theorem 2.4 establishes the limiting distribution of colours present in U_n. Our proof first establishes the result in the case where only a two element set {0, 1} of colours is permitted. It is straightforward to upgrade this case into Theorem 2.4. The argument for the two colour case relies on establishing the distribution of col(c_n) as n → ∞. Heuristically, as n → ∞, we again compare c↓_n to a Galton-Watson tree, and again the extinction/non-extinction dichotomy is key. If the Galton-Watson tree dies out, then the colour of c_n is the maximal colour of the source balls at its leaves. If it does not die out, then c↓_n contains many generations, which means that with high probability there will be a source ball of maximum colour (i.e. colour 1, in the two colour case) within c↓_n, in which case col(c_n) = 1. Recalling that half of all balls are cue balls, and the other half sources, along with (2.2), these considerations lead directly to the formula (2.3) given in Theorem 2.4. The first term on the right of (2.3) represents the i.i.d. colours of source balls; the latter term represents the cue balls.
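The extinction/non-extinction dichotomy can be explored numerically. Assuming the offspring law (2.2) is that of R·B with B an independent Bernoulli(1/2), as suggested by the coupling of Section 2.1.3, the probability generating function of the offspring law is f(s) = 1/2 + E[s^R]/2, and the extinction probability is the smallest fixed point of f in [0, 1]. A sketch under that assumption (the function names are ours):

```python
def extinction_prob(pgf_R, tol=1e-12):
    """Smallest fixed point in [0, 1] of f(s) = 1/2 + pgf_R(s)/2,
    found by iterating f from s = 0; pgf_R(s) = E[s^R]."""
    s = 0.0
    while True:
        t = 0.5 + 0.5 * pgf_R(s)
        if abs(t - s) < tol:
            return t
        s = t

# Supercritical example: R ≡ 3 gives mean offspring 3/2 > 1, so
# extinction probability < 1; it solves q = 1/2 + q^3/2, whose
# smallest root in [0, 1] is (sqrt(5) - 1)/2.
q = extinction_prob(lambda s: s ** 3)
assert abs(q - (5 ** 0.5 - 1) / 2) < 1e-9

# Subcritical example: R ≡ 1 gives mean offspring 1/2 < 1, so
# extinction is certain.
assert abs(extinction_prob(lambda s: s) - 1.0) < 1e-6
```

Under this reading of (2.2) the mean offspring is E[R]/2, and positive survival probability (the condensation regime of the heuristic above) occurs exactly when E[R]/2 > 1.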
The proof of Theorem 2.6, given in Section 5, takes up the majority of the present article. The outline is as follows. Finding the largest family at time n is essentially the same as identifying which source s k , for k ≤ n, was most likely to have founded the family to which c n belongs. This, in turn, relies on understanding the behaviour of the genealogy of c ↓ n during the stage at which it stops being tree-like, and coalescences start to have a significant effect. Thus, we are trying to examine a property of the model that the Galton-Watson coupling will not capture. For this reason, the Galton-Watson coupling is not used in the proof of Theorem 2.6, and we do not think in terms of the associated parent-child generations.
Consider the urn process (U_k)_{k=0}^n. Looking backwards in time, as k decreases from n to 0, we will see that at around time k ≈ n^β there starts to be a positive probability that a potential parent sampled for c_k will already have been sampled as a potential parent of some c_j ∈ c^↓_n with k < j ≤ n. More precisely, we will follow backwards in time (i.e. for k = n, n−1, ..., 0) the set H^n_k of balls in U_k that were sampled as potential parents of some c_j ∈ c^↓_n with j ≥ k+1; this set is defined formally in (5.4). Note that H^n_k may both grow and shrink in size as k decreases. During the transition k+1 → k, the set H^n_k will lose s_k and c_k if they were present but, if c_k was present, then any potential parents of c_k that were not already present will be added in.
We denote the number of elements of the set H^n_k by |H^n_k|. We will see that when k ≈ n^β, |H^n_k| is of order k. Consequently, at this point a potential parent of c_{k+1} has a non-negligible chance of being in H^n_k; thus coalescence becomes non-negligible. In fact, the force of coalescence very quickly becomes strong, with the consequence that for k ≪ n^β essentially the entire urn U_k will be included in c^↓_n, and in particular essentially all sources s_j with j ≪ n^β will be included. However, the fittest source ball s_j ∈ c^↓_n will, with high probability, be born during a 'critical window' of time, j ∈ [cn^β, Cn^β], where c > 0 is small and C < ∞ is large.
We now summarise the techniques within the proof. When k ≫ n^β we will use iterative arguments, backwards in time, to construct bounds on the N ∪ {0} valued process |H^n_k|. The resulting bounds on |H^n_k| will eventually break down, because in order to stay tractable they partially ignore coalescences. However, they will stretch just far enough to see that, when k ≈ Cn^β for suitably large C, the set H^n_k comprises a small but non-negligible fraction of U_k, with positive probability. We then switch techniques and, for k ∈ [cn^β, Cn^β], we aim to establish a fluid limit for the [0, 1] valued process k ↦ |H^n_k|/|U_k|, in reverse time as k decreases. After a suitable rescaling of time, this limit turns out to be an ordinary differential equation with a stable fixed point at 1 and an unstable fixed point at 0; so starting just above zero results in attraction towards 1. Having established the ODE limit, the key question becomes whether the critical window is actually long enough for the ODE to escape from 0. Using an artificially longer critical window, e.g. by taking a larger value of C, does not help, because this results in an initial condition closer to zero. However, on escaping 0, we obtain non-vanishing behaviour for |H^n_k|/|U_k| during k ∈ [cn^β, Cn^β], which results in a positive probability that s_k ∈ c^↓_n. The final step of the proof combines the above results with the records process of col(s_k): we show that an unusually fit source ball born at around time k ≈ n^β may start a family that grows to include a non-vanishing proportion of U_n, as n → ∞.
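The escape-from-zero behaviour described above can be pictured with a toy equation. The paper's actual fluid limit is derived in Section 5; here, purely as an illustration, we use the logistic equation ẋ = x(1 − x), which shares the relevant structure: an unstable fixed point at 0 and a stable fixed point at 1, so that an initial condition just above zero is eventually driven towards 1, provided the time horizon is long enough.

```python
# Euler scheme for the logistic ODE x' = x(1 - x): unstable fixed point at 0,
# stable fixed point at 1 (illustrative only; not the fluid limit of Section 5).
def euler_logistic(x0, t_max, dt=1e-3):
    x = x0
    for _ in range(int(t_max / dt)):
        x += dt * x * (1.0 - x)
    return x

x_short = euler_logistic(0.01, 2.0)    # over a short horizon, still close to 0
x_long = euler_logistic(0.01, 20.0)    # over a long horizon, attracted towards 1
print(x_short, x_long)
```

The contrast between the two horizons mirrors the key question in the proof: whether the critical window is long enough for the rescaled process to escape from 0.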
Remark 2.9. Let us briefly survey what we might achieve via alternative methods. For PAC, the techniques used by Malyshkin and Paquette (2014) are unavailable because a persistent hub does not emerge. The techniques used by Dereich et al. (2017) to analyse the Bianconi-Barabási model are not available either, because we do not have independently growing families.
It is possible to use stochastic approximation to recover Theorem 2.4, but doing so results in a description of µ through the fixed points of a family of differential equations. This is much less appealing than the intuitive formula (2.3) provided by the Galton-Watson coupling. By contrast, it does not seem feasible to prove Theorem 2.6 via stochastic approximation. The vertex with greatest degree switches identity infinitely often and this greatly increases the amount of information which must be tracked. Our attempts to find an alternative proof along such lines resulted in requiring more detailed information about the sensitivity of rather general families of ODEs to small perturbations than we were able to extract. We discuss this issue a little further, after the key proof, in Remark 5.4.

Proof of Theorem 2.3
In this section we prove Theorem 2.3 which, re-phrased in terms of the urn process (U n ), is split across two lemmas: we prove that P[|u ↑ | = ∞] = 1 in Lemma 3.1, and that |u ↑ ∩ U n |/|U n | → 0 in probability in Lemma 3.4.
Proof. We consider the case u = s_0 and suppose that col(s_0) = α > 0. It is easily seen that the argument for this case can be adapted to a general ball u. Let

A_n = {p_{n,1}, ..., p_{n,R_n} are all source balls} ∩ {p_{n,1} is the fittest of the p_{n,j}} ∩ {p_{n,1} = s_0}.
Note that, for any n, the probability that a (given) potential parent is both a source ball and less fit than s_0 is α/2. Note also that P[p_{n,1} = s_0] = 1/(2(n+1)), from which it is easily seen that P[A_n] has order 1/n.
We will prove the present lemma by showing that A_n occurs infinitely often. Since the A_n are correlated, we will use a version of the Kochen-Stone lemma: if (E_n) is a sequence of events with Σ_n P[E_n] = ∞, then

P[E_n occurs infinitely often] ≥ lim sup_{N→∞} ( Σ_{n=1}^N P[E_n] )² / Σ_{n,m=1}^N P[E_n ∩ E_m].   (3.1)

This result can be found as Theorem 1 of Yan (2006). We will take E_n = A_{i_n}, where i_n is defined as follows. Let r = inf{r ∈ N : P[R = r] > 0} and set q = P[R = r]. Define i_0 = 0 and

i_{n+1} = inf{l ∈ N : l > i_n, R_l = r, and the (p_{l,j})_{j=1}^r are distinct source balls}.
The events {R_n = r and (p_{n,j})_{j=1}^r are distinct} are mutually independent for different values of n. Moreover, for any ε > 0, for large enough n the chance of the (p_{n,j})_{j=1}^r being distinct is at least 1 − ε, and the chance of them being distinct source balls is at least (1/2)^r − ε. Therefore, it follows from the strong law of large numbers that

((1/2)^r − ε) q ≤ lim inf_n n/i_n ≤ lim sup_n n/i_n ≤ (1/2)^r q   a.s.

and thus, since ε > 0 was arbitrary,

P[ n/i_n → q 2^{−r} ] = 1.   (3.2)

Until further notice, we condition on the sequence (i_n) and work with the conditional measure P′[·] = P[· | σ(i_0, i_1, ...)]. Note that, under P′, the (p_{i_n,j})_{j=1}^r are conditioned to be distinct source balls, and are thus distributed as a uniformly random subset of the source balls present at time i_n − 1. Hence P′[A_{i_n}] is of order (1/i_n) α^{r−1}; here, the term 1/i_n is the probability of p_{i_n,1} = s_0 (given that p_{i_n,1} is a source ball), and α^{r−1} is the probability that the other potential parents all have fitness less than α (given that they are distinct source balls). We then estimate Σ_{n≤N} P′[A_{i_n}], which diverges by (3.2), together with Σ_{n,m≤N} P′[A_{i_n} ∩ A_{i_m}]. Here, as usual, f_n = O(g_n) means that lim sup_n |f_n/g_n| < ∞. Putting these estimates together and cancelling factors of α, in view of (3.1) we are interested in the limit as N → ∞ of the ratio I_N appearing on the right hand side of (3.1), which we compare with a simplified ratio J_N in which the lower order error terms are discarded. It is easily seen that J_N → 1 as N → ∞ and, since ε > 0 was arbitrary, we conclude that also I_N → 1. We thus see that the right hand side of (3.1) (with E_n = A_{i_n}) equals 1, and hence P′[A_{i_n} infinitely often] = 1. Hence also P[A_n infinitely often] = 1.
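To illustrate the second-moment condition behind the Kochen-Stone lemma, the following sketch (a toy example, not the events A_n above) computes the ratio (Σ_{n≤N} P[E_n])² / Σ_{n,m≤N} P[E_n ∩ E_m] for independent events E_n with P[E_n] = 1/n. Here Σ_n P[E_n] = ∞, the ratio tends to 1 (slowly, at rate of order 1/log N), and the Kochen-Stone lemma then yields P[E_n infinitely often] = 1.

```python
# Kochen-Stone ratio for a toy sequence: independent events E_n with P[E_n] = 1/n.
# Then P[E_n ∩ E_m] = 1/(n*m) for n != m, and = 1/n on the diagonal, so the
# double sum equals H^2 - sum(1/n^2) + H, where H is the harmonic sum up to N.
def kochen_stone_ratio(N):
    H = sum(1.0 / n for n in range(1, N + 1))
    H2 = sum(1.0 / n ** 2 for n in range(1, N + 1))
    return H * H / (H * H - H2 + H)

r3 = kochen_stone_ratio(10 ** 3)
r6 = kochen_stone_ratio(10 ** 6)
print(r3, r6)   # increases towards 1 as N grows
```

The slow (logarithmic) approach to 1 reflects the fact that the diagonal terms Σ P[E_n] only become negligible relative to (Σ P[E_n])² when the harmonic sum is large.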
We write T^n_k = ∪_{j=0}^k W^n_j and T^n = ∪_{k=0}^∞ T^n_k. Note that T^n = c^↓_n, which we accept as a small piece of redundancy in our notation. We write L^n_k = T^n_k ∩ S for the set of source balls in T^n_k. Note that this is similar to, but not quite the same as, the set of leaves of T^n_k: because T^n_k is curtailed at generation k, it may also have a number of cue balls amongst its k-th generation leaves. However, all leaves of T^n are source balls.
Proof. We remark that if no coalescences occurred in c^↓_n, then K_n = ∞ and the statement of the lemma holds trivially. Let us refer to the single element of W^n_0 as the 'root'. Fix k ∈ N. Since P[R < ∞] = 1, it is easily seen that by choosing suitably large A ∈ N we obtain sup_n P[|T^n_k| ≥ A] ≤ ε. For each b ∈ T^n_k there is a potential ancestral line, containing at most k+1 balls, between b and the root. Following this ancestral line backwards in time, the potential parents were chosen uniformly at random from the current urn. By choosing δ > 0 small, we may control the chance that any of the (at most k+1) such potential parents along this line were sampled from within U_{⌊δn⌋}. Thus, we may choose δ ∈ (0, 1) and N ∈ N such that sup_{n≥N} P[T^n_k ∩ U_{⌊δn⌋} ≠ ∅] ≤ ε. Conditional on the event {T^n_k ∩ U_{⌊δn⌋} = ∅ and |T^n_k| ≤ A}, each potential parent of each element of T^n_k was sampled uniformly from a set of balls with at least δn elements. The expected number of such potential parents is O(A E[R]) = O(1), and the chance of choosing any particular ball as a potential parent is O(1/(δn)). Hence, the probability of seeing the same parent twice tends to zero as n → ∞, and consequently P[K_n ≤ k] → 0 as n → ∞. The result follows.

Lemma 3.3. For all k, n ∈ N, it holds that P[|L^n_k| < k/2 and W^n_k ≠ ∅] ≤ (1/2)^{k/2}.

Proof. Let A^n_k denote the event that there is a potential ancestral line of c_n containing at least k cue balls, and that at least k/2 of these cue balls had no source balls amongst their potential parents. If W^n_k is non-empty then, by definition of W^n_k, there must be a potential ancestral line of c_n that intersects W^n_k. Note that this potential ancestral line contains k cue balls, corresponding to k generations of c^↓_n. If, additionally, |L^n_k| < k/2, then the event A^n_k must occur. In summary, {|L^n_k| < k/2 and W^n_k ≠ ∅} ⊆ A^n_k. Each potential parent has probability 1/2 of being a source ball. Hence, for all j, P[P_j ∩ {s_0, s_1, ...} = ∅] ≤ 1/2. Since a potential ancestral line cannot include the same cue ball twice, P[A^n_k] ≤ (1/2)^{k/2}. The stated result follows.
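The 'same parent twice' step in the proof of Lemma 3.2 is a birthday-problem estimate: if m potential parents are each sampled uniformly from a pool of at least δn balls, the probability of any repeat is at most m(m−1)/(2δn) by a union bound, which vanishes as n → ∞ for fixed m. A quick numerical check (with illustrative constants only):

```python
# Exact probability that m uniform samples from a pool contain a repeat,
# together with the union bound m*(m-1)/(2*pool) used in the argument above.
def collision_prob(m, pool):
    p_distinct = 1.0
    for i in range(m):
        p_distinct *= 1.0 - i / pool
    return 1.0 - p_distinct

m = 10                                     # a fixed O(1) number of samples
for pool in (10 ** 3, 10 ** 5, 10 ** 7):   # pool size of order delta * n
    print(pool, collision_prob(m, pool), m * (m - 1) / (2 * pool))
```

The exact collision probability and the union bound both decay like 1/pool, which is why coalescence is negligible while the genealogy is still of bounded size.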
Proof. We will show L¹ convergence to zero, which is equivalent to convergence in probability because |u^↑ ∩ U_n|/|U_n| ≤ 1. Note that the expectation of |u^↑ ∩ U_n|/|U_n| may be expressed as an average of the probabilities that individual balls belong to u^↑. Since |U_n| = 2n + 2, it suffices to show that P[c_n ∈ u^↑] → 0 as n → ∞. Note that P[F_1 < F_2] = 1/2 is the probability that one source ball has fitness strictly less than that of another. Since the fitnesses are independent, we have

P[c_n ∈ u^↑ and |L^n_k| ≥ k/2] ≤ P[F_1 < F_2]^{k/2} = (1/2)^{k/2},   (3.3)

because, on the event that c_n ∈ u^↑ and |L^n_k| ≥ k/2, at least k/2 source balls in T^n_k must have fitness strictly less than that of u. Let ε > 0. Let δ > 0, k ∈ N, N ∈ N, to be chosen shortly (dependent upon ε). For all n ≥ N we have

P[c_n ∈ u^↑] ≤ ε + P[c_n ∈ u^↑, K_n ≥ k and T^n_k ∩ U_{⌊δn⌋} = ∅]
≤ ε + P[c_n ∈ u^↑, K_n ≥ k and W^n_k ≠ ∅]
≤ ε + (1/2)^{k/2} + P[c_n ∈ u^↑, K_n ≥ k, W^n_k ≠ ∅ and |L^n_k| ≥ k/2]
≤ ε + (1/2)^{k/2} + (1/2)^{k/2}.

The first line of the above follows from Lemma 3.2, from which we obtain N and δ. By increasing N, if necessary, we may assume that u ∈ U_{⌊δn⌋}. The second line then follows because, given that c_n ∈ u^↑ and T^n_k ∩ U_{⌊δn⌋} = ∅, the ancestral line linking c_n to u must extend beyond T^n_k, and in particular W^n_k must be non-empty. The third line then follows by Lemma 3.3. The final line follows from (3.3). Choosing k large enough that 2(1/2)^{k/2} < ε, we obtain that for all n ≥ N, P[c_n ∈ u^↑] ≤ 3ε. This completes the proof.

Proof of Theorem 2.4
In this section we prove Theorem 2.4. Throughout Section 4 we will adopt the conditions and notation used in the statement of Theorem 2.4. In particular, let L be the number of leaves on a Galton-Watson tree with offspring distribution (2.2) and let µ be the measure on [0, 1] defined by (2.3). Let (C_i) be a sequence of i.i.d. copies of F. Our proof proceeds by first establishing Theorem 2.4 for a fitness distribution F with only two possible values, 0 and 1. Note that, as defined in Section 2, the model does not currently allow for such a case, because we had specified that the fitness distribution F must be continuous on [0, 1]. For general F, the extra difficulty is that we must handle the possibility that there may not be a unique fittest vertex (resp. ball) within P_n (resp. P_n), defined by (2.1) (resp. (2.4)). This extra difficulty is no more than an irritation, which is why we excluded it in Section 2. It is convenient to first describe the case of non-absolutely continuous F at the level of the urn process (U_n), which, we recall, is a projection of the graph G_n that records degrees via (2.7) but forgets the rest of the graph structure. We then show how to reconstruct (G_n).
To define U n , Definition 2.7 still applies exactly as written, but to define the associated genealogy we must specify how parent-child relationships are defined in the (additional) case that the potential parents P n do not contain a unique fittest ball. If there is not a unique fittest element of P n , then the parent of c n is chosen uniformly at random from the fittest balls within P n . Subject to this extra rule, the genealogical structure in Section 2.1.2 remains well defined. Let us denote the parent of c n as simply q n ∈ P n .
To define G_n, as before we will take the balls of U_n to be the set of half-edges of G_n. We will specify how to form these half-edges into a graph, conditionally given the various processes (U_k, (p_{k,l}), q_k, R_k, F_k)_{k=0}^∞. We will proceed inductively. As before, we take G_0 to be a single vertex with a self-loop, and U_0 = {c_0, s_0} contains two balls of the same colour, corresponding to two half-edges of the same vertex. Given G_{n−1}, we already know the vertex set V_{n−1} and we know which half-edges within U_{n−1} are attached to which vertices. We attach a single new vertex v_n via a single new edge, as follows.
1. For l = 1, ..., R_n we define p_{n,l} ∈ V_{n−1} to be the vertex attached to the half-edge p_{n,l} ∈ U_{n−1}. We define P_n = {p_{n,1}, ..., p_{n,R_n}}. Thus the (p_{n,l})_{l=1}^{R_n} are i.i.d. degree-weighted samples from V_{n−1}.
2. We attach the half-edge c_n to the same vertex as its parent half-edge q_n ∈ U_{n−1} is already attached to (within G_{n−1}). We specify that (c_n, s_n) will together comprise a new edge, and we attach s_n to a new vertex v_n. This new vertex is assigned fitness F_n.
It is immediate that, when F is continuous, the above mechanism precisely matches Definition 2.1. Moreover, it preserves the connections (2.5) and (2.7) between G n and U n . In Section 4.1 we will apply Lemmas 2.8, 3.2 and 3.3 in this extended context.
Their proofs go through exactly as before; in fact this is immediate, because they were concerned only with potential ancestors, the identities of which are unaffected by fitness values. Restricting to only two colours, the equivalent statement to Theorem 2.4 is as follows.

Proposition 4.1. Suppose that the fitness space is the two point set {0, 1}, with each fitness occurring with positive probability. Then, almost surely, for a ∈ {0, 1},

µ_n([0, a]) → (1/2) P[F ≤ a] + (1/2) P[L < ∞ and max_{i=1,...,L} C_i ≤ a].   (4.1)

With Proposition 4.1 in hand, it is straightforward to deduce Theorem 2.4. We give this argument first, to be followed by the proof of Proposition 4.1.
Proof of Theorem 2.4, subject to Proposition 4.1. Recall that Theorem 2.4 assumes a uniform fitness distribution on [0, 1]. Fix a ∈ [0, 1). Define f(x) = 1{x > a}, and define a new, two colour, urn process Ū_n, with the same set of balls as U_n and the same distribution for R_n, by considering balls with fitness x to have the new fitness f(x). Thus, our new urn process has fitness space {0, 1} and fitness distribution F̄ satisfying P[F̄ = 1] = P[F ∈ (a, 1]] = 1 − a. Let us write µ̄_n for the empirical measure of colours within Ū_n, analogous to (2.5).
Proposition 4.1 applies to our new urn process Ū_n. Hence, almost surely, µ_n([0, a]) = µ̄_n({0}) converges to (1/2) P[F ≤ a] + (1/2) P[L < ∞ and max_{i=1,...,L} C_i ≤ a] = µ([0, a]). Since a ∈ [0, 1) was arbitrary, the conditions of Berti et al. (2006) hold, with the conclusion that, almost surely, µ_n converges weakly to µ.

Proof of Proposition 4.1
Recall that the conditions of Proposition 4.1 specify that the fitness space is a two point set {0, 1}, and that each fitness occurs with positive probability. We assume these conditions for the duration of Section 4.1. The first step of the proof is to show that, as n → ∞,

P[col(c_n) = 0] → P[L < ∞ and max_{i=1,...,L} C_i = 0].   (4.2)

To see this, recall that Lemma 2.8 states that W^n_k = |W^n_k| has the same distribution as a Galton-Watson process, with offspring distribution (2.2), for generations k ≤ K_n. Let (Ŵ^n_k)_{k≥0} be a Galton-Watson process with this same offspring distribution, and couple Ŵ^n_k and W^n_k such that Ŵ^n_k = W^n_k for all n and k ≤ K_n. Let L̂ be the number of leaves of (Ŵ^n_k), and let (C_i) be a sequence of i.i.d. random variables, each with distribution F. We note that the offspring distribution M, given by (2.2), of Ŵ^n_k does not depend on n. Since P[M = 0] ∈ (0, 1), it is easily seen that

P[Ŵ^n_k ≠ 0 and L̂ < ∞]   (4.3)

does not depend on n and, moreover, tends to zero as k → ∞. We note also that for all k, n ∈ N,

P[col(c_n) = 0 and W^n_k ≠ ∅] ≤ (1/2)^{k/2} + P[col(c_n) = 0 and |L^n_k| ≥ k/2] ≤ (1/2)^{k/2} + P[F = 0]^{k/2}.   (4.4)

In the above, the first line follows by Lemma 3.3, and the second line follows because col(c_n) = 0 when, and only when, every source ball in c^↓_n has colour 0. Let ε > 0, let k ∈ N be such that (4.3) and (4.4) are both at most ε, and let N ∈ N be chosen as in Lemma 3.2. Then, for n ≥ N, combining these estimates establishes (4.2). We now upgrade (4.2) into the full statement of Proposition 4.1. Note that the case a = 1 of (4.1) claims that 1 → 1, which is trivially true, so it remains only to prove the case a = 0.
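The fact, used around (4.3), that the probability of surviving to generation k while eventually dying out tends to zero, can be computed exactly by iterating the probability generating function of the offspring law. The sketch below uses a placeholder offspring distribution (the paper's law (2.2) is not reproduced here): q_k = f(q_{k−1}) gives the probability of extinction by generation k, which increases to the extinction probability q, so q − q_k → 0 as k → ∞.

```python
# Probability generating function of a placeholder offspring law:
# P[M=0] = 0.4, P[M=2] = 0.6, so f(s) = 0.4 + 0.6*s**2 and extinction prob q = 2/3.
def f(s):
    return 0.4 + 0.6 * s * s

q = 2.0 / 3.0             # smallest root of s = f(s)
q_k, gaps = 0.0, []
for _ in range(60):
    q_k = f(q_k)          # q_k = P[tree extinct by generation k]
    gaps.append(q - q_k)  # = P[alive at generation k, but extinct eventually]

print([round(g, 6) for g in gaps[::10]])   # decreases geometrically to 0
```

The gap q − q_k decays geometrically (at rate f′(q) < 1 for a supercritical law), which is the quantitative version of the statement that (4.3) tends to zero as k → ∞.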
We have

µ_n({0}) = |{s ∈ S_n : col(s) = 0}| / |U_n| + (|C_n| / |U_n|) ν_n,   where   ν_n = |{c ∈ C_n : col(c) = 0}| / |C_n|

and S_n, C_n denote respectively the sets of source and cue balls within U_n. Noting that |S_n|/|U_n| and |C_n|/|U_n| are both equal to 1/2, we obtain from the strong law of large numbers that the first term of the above tends (almost surely) to (1/2) P[F = 0], and it remains to consider the term labelled ν_n. Thus, to prove (4.1) we must show that

ν_n → P[L < ∞ and max_{i=1,...,L} C_i = 0]   almost surely.   (4.5)

From (4.2) we already know that E[ν_n] converges to the right hand side of (4.5) so, by dominated convergence, equation (4.5) follows if we can show that the random sequence (ν_n) converges almost surely to a deterministic limit ν. To establish this fact we will use the 'usual' machinery of stochastic approximation (c.f. Remark 2.9).
Let (F_n) be the filtration generated by (ν_n). Let A_n be the event that the potential parents (p_{n,l})_{l=1}^{R_n} of c_n are all distinct, and let A^c_n denote its complement. Since R_n is an independent copy of R, with a distribution that does not depend on n, an elementary estimate shows that

P[A^c_{n+1} | F_n] → 0 as n → ∞.   (4.6)

Let M_R denote the moment generating function of R_n, which does not depend on n. Let us write λ = P[F = 0] for the probability that a given source ball has colour 0. We then compute E[ν_{n+1} | F_n] as follows. Condition on the number R_{n+1} = r of potential parents of c_{n+1}, and also on the number s of these potential parents which are source balls; then, if all these potential parents are distinct, (ν_n)^{r−s} λ^s is the probability that all potential parents of c_{n+1} have colour 0. Together with (4.6) and elementary calculations, this expresses E[ν_{n+1} | F_n] as a function of ν_n, up to a vanishing error term, in a form suitable for stochastic approximation.
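A direct simulation of the two colour urn is consistent with the convergence of ν_n. The sketch below uses placeholder choices for λ = P[F = 0] and for the distribution of R, and implements the rule, described above, that col(c_n) = 0 if and only if every potential parent of c_n has colour 0.

```python
import random

random.seed(7)

LAM = 0.5                 # placeholder for lambda = P[F = 0]
def sample_R():           # placeholder distribution for R
    return random.choice([1, 2, 3])

# The urn stores colours only; at step n we add one cue ball and one source ball.
urn = [0, 0]              # colours of c_0 and s_0 (arbitrary initial colours)
cue_zero, cue_total = 1, 1
for n in range(1, 200_000):
    parents = [random.choice(urn) for _ in range(sample_R())]
    col_c = max(parents)  # col(c_n) = 0 iff every potential parent has colour 0
    col_s = 0 if random.random() < LAM else 1
    urn += [col_c, col_s]
    cue_total += 1
    cue_zero += (col_c == 0)

nu = cue_zero / cue_total
print(nu)   # the fraction of cue balls with colour 0 settles down
```

With these placeholder parameters the simulated ν_n stabilises near the fixed point of the associated mean-field recursion ν = E[((λ + ν)/2)^R], illustrating the stochastic approximation behaviour used in the proof.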

Proof of Theorem 2.6
In this section we prove Theorem 2.6, which asserts that extensive condensation occurs in the model. We assume the conditions of this theorem for the duration of Section 5; in particular that E[R] > 2 with E[R²] < ∞. From now on, we will write

ζ = E[R]/2,   β = (ζ − 1)/ζ.

Note that ζ ∈ (1, ∞) and β ∈ (0, 1). We will introduce a third variable ξ ∈ (ζ, ∞) that also depends only on the distribution of R, in Lemma 5.9. We use the following extensions of Landau notation. If a_{k,n} and b_{k,n} are a pair of doubly indexed strictly positive (real-valued) sequences, defined for all k, n ∈ N such that k ≤ n, then

a_{k,n} ≲ b_{k,n} means that lim sup_{k,n→∞} a_{k,n}/b_{k,n} ≤ 1,
a_{k,n} ≳ b_{k,n} means that lim inf_{k,n→∞} a_{k,n}/b_{k,n} ≥ 1,
a_{k,n} ∼ b_{k,n} means that lim_{k,n→∞} a_{k,n}/b_{k,n} = 1.
Note that ≲, ≳ and ∼ do not explicitly specify which pair of variables (k, n above) are to be used in the limit, but this should be clear from the context in all cases. Our requirement for this notation comes from Lemma 5.6, which provides a key two-variable asymptotic that will be used within Section 5.2. We use the same notation for sequences a_n, b_n of a single variable, with the same meaning, including when we take k = k_n dependent on n.

Proof of Theorem 2.6
The proof of Theorem 2.6 relies on behaviour within the critical window [cn^β, Cn^β], where c is suitably small and C is suitably large. We will show that the fittest source ball born during this window has a non-negligible expected family size at time n, as n → ∞. This, in turn, will be proved by showing that c_n has non-vanishing probability to be descended from the fittest source ball within {s_{cn^β}, ..., s_{n^β}}. Note that such a ball has a non-vanishing probability to be the fittest ball within {s_0, ..., s_{Cn^β}}.
Remark 5.1. We assume without loss of generality that cn^β and Cn^β are integers. This can be achieved by adding a small quantity, at most n^{−β}, to c and C. The difference is sufficiently small that it does not change our arguments, so we continue to regard c, C as fixed constants, independent of n.
Let s_{k(n)} denote the fittest source ball in {s_0, ..., s_{Cn^β}}. The proof of Theorem 2.6 has two key ingredients. The first, Proposition 5.2, will be used to show that P[s_{k(n)} ∈ c^↓_n] is bounded away from 0 as n → ∞. The second, Proposition 5.3, shows that during time [Cn^β, n] not many source balls are included in the genealogy of c^↓_n; few enough that none of them are likely to be fitter than s_{k(n)}. Let us now state these two results rigorously, for which we require some notation.
Consider balls that are potential ancestors of c_n. Each such ball has a natural multiplicity associated to it: the number of times it was chosen as a potential parent of some (other) potential ancestor of c_n. Thus, counting with multiplicity means we are ignoring coalescences within the genealogy of c^↓_n. For i ≤ i′ ≤ n, define N^n_{i,i′} to be the total number of times that the source balls s_i, ..., s_{i′} are chosen as potential parents of potential ancestors of c_n. In words, N^n_{i,i′} is the number of source balls in {s_i, ..., s_{i′}}, counted with multiplicity, that are potential ancestors of c_n. Thus |c^↓_n ∩ {s_i, ..., s_{i′}}| ≤ N^n_{i,i′}. Define also

H^n_k = {u ∈ U_k : u ∈ P_j for some c_j ∈ c^↓_n with j ≥ k+1},   (5.4)

and write |H^n_k| for its cardinality. Note that H^n_k is the set of balls that were born (non-strictly) before time k, and were a potential parent of some c_j ∈ c^↓_n where j > k. The quantity |H^n_k| counts such balls without multiplicity. We note that the urn contains 2l + 2 balls at time l, so |H^n_l|/(2l+2) represents the fraction of the urn included in H^n_l at time l. Proposition 5.2 provides a lower bound, holding with probability bounded away from zero, on the fraction of U_{cn^β} included in H^n_{cn^β}; Proposition 5.3 states that

E[N^n_{Cn^β,n}] ≲ n^β / (βC^{ζ−1}).
In both propositions, the asymptotic inequality is understood to apply as n → ∞. We will give the proof of Theorem 2.6 now, subject to these two propositions. We will then prove Proposition 5.3 in Section 5.2, and Proposition 5.2 in Section 5.3.
Proof of Theorem 2.6, subject to Propositions 5.2 and 5.3. Let c, C satisfy 0 < c < C < ∞ with c < 2 and C > ξ^{1/ζ}, with precise values to be chosen later. Recall that s_{k(n)} denotes the (almost surely unique) fittest source ball within {s_0, ..., s_{Cn^β}}, and let S_n = |s^↑_{k(n)} ∩ U_n| denote the size of the family of s_{k(n)} at time n. Let Q_{j,n} be the event that all sources in {s_l : l = Cn^β + 1, ..., j−1} ∩ c^↓_j are less fit than s_{k(n)}. This leads to a lower bound (5.5) for E[S_n] in which, on the final line, the summation includes j ∈ N such that 2^{−1/β} n ≤ j ≤ n. Consider n large enough that Cn^β < 2^{−1/β} n, and take such a j. Each summand then splits into two terms, as in (5.6).

We now handle the first term on the right of (5.6). Let P_n be the event that cn^β ≤ k(n) ≤ n^β. Note that k(n) is uniform on {0, 1, ..., Cn^β} and measurable with respect to the fitness values (F_i). The event P_n is also measurable with respect to (F_i). These fitness values are independent of the sampling of potential parents, hence H^j_l and |H^j_l| are independent of k(n) and P_n. Moreover, for all j ≥ i > l, the potential parents of c_i ∈ c^↓_j were sampled uniformly and i.i.d. from U_i; conditional on any such potential parent p being within U_l, the distribution of p is uniform on U_l. Thus, the conditional distribution of H^j_l given |H^j_l| is also uniform (on the subsets of U_l that have size |H^j_l|). This gives the bound (5.7), in which we use that cj^β ≤ cn^β and n^β ≤ 2j^β. We now apply Proposition 5.2, which gives that there exists N ∈ N such that for all n ≥ N the relevant probability is bounded below, uniformly in n. Increasing N if necessary, we may also assume that P[P_n] ≥ (1−c)/(2C) for n ≥ N. Thus, continuing from (5.7), we arrive at the lower bound (5.8).

We now look to control the second term on the right of (5.6). The statement of Theorem 2.6 relates to a property of the graph (G_n) and consequently, in view of Remark 2.2, we may assume that the fitness values are sampled according to the uniform distribution on [0, 1].
By definition of k(n), col(s_{k(n)}) has the same distribution as max{U_0, ..., U_{Cn^β}}, where the (U_i) are i.i.d. uniform random variables on [0, 1]. It follows that for any a > 0,

P[ col(s_{k(n)}) ≤ 1 − 1/(a n^β) ] ≤ (1 − 1/(a n^β))^{Cn^β} ≤ e^{−C/a}.   (5.9)

Recall that N^j_{Cj^β,j} is the number of source balls (counted with multiplicity) within c^↓_j that were born between Cj^β and j, and that each such source ball samples its fitness independently and uniformly on [0, 1]. Combining this with (5.9) yields

P[Ω \ Q_{j,n}] ≤ (1/(a n^β)) E[N^j_{Cj^β,j}] + e^{−C/a}.   (5.10)
We are now in a position to apply Proposition 5.3 by which, increasing N again if necessary, for all j ≥ 2^{−1/β} N we may assume E[N^j_{Cj^β,j}] ≤ 2 j^β/(βC^{ζ−1}). Recalling that 2^{−1/β} n ≤ j ≤ n, and choosing a = C/(ζ log C), we thus obtain that when n ≥ N,

P[Ω \ Q_{j,n}] ≤ (2/(aβC^{ζ−1})) (j/n)^β + e^{−C/a} ≤ 2ζ log C/(βC^ζ) + 1/C^ζ.   (5.11)

Putting (5.8) and (5.11) into (5.6), and then putting (5.6) into (5.5), we obtain a lower bound (5.12) on E[S_n]. Noting that ζ > 1 and β ∈ (0, 1), we may choose c = 1/2 and C sufficiently large that the term in square brackets, in (5.12), is strictly positive. We thus obtain that E[S_n]/n ≳ γ, where γ > 0 is equal to the right hand side of (5.12) divided by n. Let ε > 0 and recall ℓ_n from (1.2). Since s_{k(n)} is the initial half-edge of the fittest of the first Cn^β + 1 vertices, it is clear that F_{k(n)} = col(s_{k(n)}) → 1 almost surely as n → ∞. Hence we may choose N ∈ N such that for all n ≥ N, P[F_{k(n)} ≥ 1 − ε] ≥ 1 − ε; when this event occurs, the family of s_{k(n)} contributes to ℓ_n([1−ε, 1]) as n → ∞, which implies that extensive condensation occurs.

Remark 5.4. In the above proof, it is crucial that, within (5.12), the first term inside the square brackets (which comes from Proposition 5.2) has order C^{−1} and the latter terms (which come from Proposition 5.3) have the lower order C^{−ζ}. Let us briefly attempt to explain why this occurs. Looking backwards in time, as k decreases, through the genealogy of c^↓_n, there are two key transitions that take place:

1. The point at which H^n_k grows large enough to include source vertices that were unusually fit for their time of birth.
2. The point at which H^n_k grows large enough to include individual source vertices with non-vanishing probability.
A priori, these two transitions might not occur simultaneously, although it is clear that 1 must come non-strictly before 2. The fact that these two transitions do occur simultaneously is what leads to extensive condensation in PAC: Propositions 5.2 and 5.3 (the latter via (5.10)) tell us that both happen during the critical window. If these two transitions did not occur simultaneously, one should expect non-extensive condensation.
We believe that, as a rule of thumb, the standard stochastic approximation theorems (see e.g. Section 2.4 of Pemantle (2007)) are not suitable to prove extensive condensation, in the absence of a persistent hub or other simplifying factor. The reason is that stochastic approximation determines only whether convergence (of some well chosen quantity to a suitable limit) occurs; it does not identify the rate of convergence. Thus stochastic approximation alone will not identify the asymptotic times at which the two key transitions above take place, and thus does not distinguish between extensive and non-extensive condensation.
Remark 5.5. The reader may ask why we focus on lim inf_n E[S_n/n] in preference to E[lim inf_n S_n/n]. Firstly, as discussed in Section 1.1, it is lim inf_n E[·] which relates to the existing meaning of extensive condensation within the literature. Secondly, for PAC, we expect that E[lim inf_n S_n/n] = 0. We expect this because the critical window [cn^β, Cn^β] will, infinitely often as n → ∞, contain a large number M of vertices that each became a newly fittest vertex at their time of birth. Let us call this event E_n. When E_n happens and M is large, we believe that S_n is close to zero, essentially because the many unusually fit vertices born during the critical window will then compete with each other as they grow, resulting in a situation where the largest vertex at time n has degree ε_M n, where ε_M is close to zero when M is large. Note that E[S_n] can remain bounded away from zero only because P[E_n] → 0. We do not attempt a rigorous statement or proof of these claims within the present article.

The branching phase of the genealogy
We now turn our attention to proving Propositions 5.2 and 5.3. The latter will appear at the end of Section 5.2, and the former in Section 5.3. In both cases we investigate the growth of the genealogy c^↓_n, backwards in time. In this section we fix a cue ball c_n, and look backwards in time at the period during which its genealogy c^↓_n is dominated by branching. We analyse this phase of the genealogy using iterative methods, with each iteration moving one step further backwards in time. The following lemma will play a key role in the calculations.
Lemma 5.6. Let α ≥ 0 and let (γ_j) be real numbers such that Σ_j |γ_j| < ∞. Then, as k, n → ∞ with k ≤ n,

∏_{j=k+1}^{n} (1 + α/j + γ_j) ∼ (n/k)^α.

A proof of this lemma is given in Appendix A. We will also make regular use of the following elementary inequality: for j ∈ N and x ∈ [0, 1],

1 − jx ≤ (1 − x)^j ≤ 1 − jx + (jx)².   (5.13)

We now define the notation that we will use to explore c^↓_n backwards in time. Recall that the potential parents P_j = {p_{j,1}, ..., p_{j,R_j}} of c_j are i.i.d. samples from U_{j−1}. For k = 0, 1, ..., n we define

G^n_k = Σ_{j=k}^{n} 1(c_j ∈ c^↓_n) Σ_{l=1}^{R_j} 1(p_{j,l} ∈ U_{k−1}).   (5.14)

In words, G^n_k counts, with multiplicity, potential parents p of {c_k, ..., c_n} ∩ c^↓_n that were born strictly before time k. Note that, as usual, ·^n denotes a superscript n and not an exponent.
Our first goal in this section is to find upper and lower bounds for E[G^n_k]. In order to establish the lower bound we will also require an upper bound on E[(G^n_k)²]. We end with two applications of these bounds: in Lemma 5.10 we show that when k ≈ n^β we have E[G^n_k] ≈ k and, with this choice of k, we give the proof of Proposition 5.3. Let

A^n_k = Σ_{j=k+1}^{n} 1(c_j ∈ c^↓_n) Σ_{l=1}^{R_j} 1(p_{j,l} ∈ {c_k, s_k}).   (5.15)

In words, A^n_k is the number of times (counted with multiplicity) that either c_k or s_k is chosen as a potential parent of some c ∈ {c_{k+1}, ..., c_n} ∩ c^↓_n. Similarly, let

B^n_k = 1(c_k ∈ c^↓_n) Σ_{l=1}^{R_k} 1(p_{k,l} ∈ U_{k−1}).   (5.16)

In words, B^n_k is the number of potential parents (counted with multiplicity) of c_k when c_k is itself in c^↓_n, and is zero otherwise. Note that all such potential parents are automatically elements of U_{k−1}, justifying

B^n_k = R_k 1(c_k ∈ c^↓_n).   (5.17)

It is immediate from (5.14), (5.15) and (5.17) that

G^n_k = G^n_{k+1} − A^n_k + B^n_k.   (5.18)

For k = 0, 1, 2, ..., we define the sequence of decreasing σ-fields (G_k), where G_k is generated by the random variables R_j, for j ≥ k, of each of the balls {c_k, c_{k+1}, ...} ∪ {s_k, s_{k+1}, ...}, together with the identities of their potential parents in cases where these potential parents are also elements of {c_k, c_{k+1}, ...} ∪ {s_k, s_{k+1}, ...}. In words, G_k contains the information of: the number R_j of potential parents of each of the balls {c_k, c_{k+1}, ...} ∪ {s_k, s_{k+1}, ...}, plus the identities of these potential parents in cases where they are also elements of {c_k, c_{k+1}, ...} ∪ {s_k, s_{k+1}, ...}. We will take conditional expectation of (5.18) with respect to G_{k+1} in Lemma 5.7, and the same for (G^n_k)² in Lemma 5.9. To this end, we note that:

(⋆) If k+1 ≤ j ≤ n, and 1 ≤ l ≤ R_j, then 1(c_j ∈ c^↓_n) and 1(p_{j,l} ∈ U_k) are both G_{k+1} measurable.
The first claim holds because, if there is a potential ancestral line connecting $c_n$ to $c_j$, then $\mathcal{G}_{k+1}$ can see the identities of these ancestors. The second holds because $p_{j,l} \in U_k$ if and only if $p_{j,l}$ was born strictly before time $k+1$.
We record one further observation for future use:
( †) Consider $k + 1 \le j \le n$ and $1 \le l \le R_j$. On the event that $p_{j,l} \in U_k$, the ball $p_{j,l}$ is uniformly distributed on $U_k$, with distribution independent of $\mathcal{G}_{k+1}$.
This observation is an immediate consequence of the fact that each potential parent p j,l of c j is sampled, independently of all else, uniformly from the set U j−1 .
Lemma 5.7. For $k < n$, we have
$$E[A^n_k \mid \mathcal{G}_{k+1}] = \frac{G^n_{k+1}}{k+1} \quad\text{and}\quad E[G^n_k \mid \mathcal{G}_{k+1}] = G^n_{k+1}\Big(1 - \frac{1}{k+1}\Big) + 2\zeta\Big(1 - \Big(1 - \frac{1}{2k+2}\Big)^{G^n_{k+1}}\Big).$$
Proof. We address the two claims in turn. In short, the first claim holds because, by ( †), each ball within the subset of $U_k$ counted by $G^n_{k+1}$ has chance $\frac{2}{|U_k|} = \frac{1}{k+1}$ of being an element of $\{c_k, s_k\}$. Formally, from (5.15), since $c_k, s_k \in U_k$ we have that
$$A^n_k = \sum_{j=k+1}^{n} \mathbf{1}_{(c_j \in c^\downarrow_n)} \sum_{l=1}^{R_j} \mathbf{1}_{(p_{j,l} \in \{c_k, s_k\})}\, \mathbf{1}_{(p_{j,l} \in U_k)}. \tag{5.19}$$
Taking conditional expectation with respect to $\mathcal{G}_{k+1}$, and using (⋆) to deduce the first line and ( †) to deduce the second, the first claim now follows from (5.14). We now address the second claim. We will take conditional expectation of (5.18) with respect to $\mathcal{G}_{k+1}$. By (⋆), $G^n_{k+1}$ is $\mathcal{G}_{k+1}$ measurable. We have already calculated $E[A^n_k \mid \mathcal{G}_{k+1}]$ above, and it remains to calculate $E[B^n_k \mid \mathcal{G}_{k+1}]$. From (5.16) we have
$$E[B^n_k \mid \mathcal{G}_{k+1}] = 2\zeta\, P\big[c_k \in c^\downarrow_n \mid \mathcal{G}_{k+1}\big]. \tag{5.20}$$
Therefore,
$$E[B^n_k \mid \mathcal{G}_{k+1}] = 2\zeta\Big(1 - \Big(1 - \frac{1}{2k+2}\Big)^{G^n_{k+1}}\Big). \tag{5.21}$$
Here, in the first line we use (⋆) and the fact that $R_k$ is independent of $\mathcal{G}_{k+1}$, with mean $E[R] = 2\zeta$. We use ( †) to deduce the second line, which follows from (5.14) and $|U_k| = 2k + 2$. The stated result follows.
Lemma 5.8. As $k, n \to \infty$ with $k \le n$ we have $E[G^n_k] \le (1 + o(1))\, 2\zeta \big(\frac{n}{k}\big)^{\zeta - 1}$.
Proof. From Lemma 5.7 and the left hand side of (5.13),
$$E[G^n_k] \le \Big(1 + \frac{\zeta - 1}{k + 1}\Big)\, E[G^n_{k+1}].$$
By iterating the above inequality we obtain that $E[G^n_k] \le E[G^n_n] \prod_{j=k+1}^{n} \big(1 + \frac{\zeta - 1}{j}\big)$. The result follows by applying Lemma 5.6 and noting that $G^n_n = R_n$, with expectation $2\zeta$.
Lemma 5.9. As $k, n \to \infty$ with $k \le n$ we have $E[(G^n_k)^2] \le (1 + o(1))\, \xi \big(\frac{n}{k}\big)^{2(\zeta - 1)}$, where $\xi \in (\zeta^2, \infty)$ is a constant that depends only on the distribution of $R$.
Proof. We will first show a bound, (5.22), on $E[(A^n_k)^2 \mid \mathcal{G}_{k+1}]$, valid for all $k \le n$. To this end, let $C^k_j = \sum_{l=1}^{R_j} \mathbf{1}_{(p_{j,l} \in \{c_k, s_k\})}$. In similar style to (5.19), note that (5.23) holds when $k < j$; there, the first equality follows by (⋆) and the second by ( †).
We now aim to deduce (5.22). From (5.15) we have $A^n_k = \sum_{j=k+1}^{n} \mathbf{1}_{(c_j \in c^\downarrow_n)}\, C^k_j$. For $k + 1 \le j' < j \le n$ we have in particular that $j' \ne j$, hence $p_{j,l}$ and $p_{j',l'}$ are independent of each other. Hence $C^k_j$ and $C^k_{j'}$ are also independent, and in particular $E[C^k_j C^k_{j'} \mid \mathcal{G}_{k+1}] = E[C^k_j \mid \mathcal{G}_{k+1}]\, E[C^k_{j'} \mid \mathcal{G}_{k+1}]$. Here, the second line follows by Lemma 5.7 and (5.23). The final line then follows from (5.14). Thus we have established (5.22).
We now turn to $(G^n_k)^2$. To keep our notation manageable, during the remainder of this proof we will write $\mathbf{1}_c = \mathbf{1}_{(c_k \in c^\downarrow_n)}$ and $\mathbf{1}_s = \mathbf{1}_{(s_k \in c^\downarrow_n)}$. We define also $\mathbf{1}_{!c} = 1 - \mathbf{1}_c$ and $\mathbf{1}_{!s} = 1 - \mathbf{1}_s$, and also $\mathbf{1}_{c \cup s} = \mathbf{1}_{(c_k \in c^\downarrow_n \text{ or } s_k \in c^\downarrow_n)}$. From (5.18), for $k < n$ we may decompose $G^n_k$ over the events given by $\mathbf{1}_c\mathbf{1}_s$, $\mathbf{1}_c\mathbf{1}_{!s}$, $\mathbf{1}_{!c}\mathbf{1}_s$ and $\mathbf{1}_{!c}\mathbf{1}_{!s}$, since this bracket of indicators sums to $1$. Note that $B^n_k = \mathbf{1}_c R_k$. Note also that if $\mathbf{1}_{!c}\mathbf{1}_{!s} = 1$ then $A^n_k = 0$. Thus, squaring both sides, we obtain (5.24). To deduce the third line of (5.24) from the second, we recall that $A^n_k \ge 0$, and to deduce the final line we use also that $\mathbf{1}_{c \cup s} = 1$ if and only if $A^n_k > 0$.
We now prepare to take conditional expectation of both sides of (5.24) with respect to $\mathcal{G}_{k+1}$. With this goal in mind we note that
$$E[\mathbf{1}_c \mid \mathcal{G}_{k+1}] = 1 - \Big(1 - \frac{1}{2k+2}\Big)^{G^n_{k+1}} \le \frac{G^n_{k+1}}{2k+2}.$$
The first equality follows from the same calculation as in (5.20) and (5.21) (but without the $R_k$ term present), and the inequality then follows from (5.13). Recall that $G^n_{k+1}$ is $\mathcal{G}_{k+1}$ measurable, but that $R_k$ is independent of $\mathcal{G}_{k+1}$ and of $\mathbf{1}_c$. Lastly, recall that we have Lemma 5.7 and (5.22) to control $E[A^n_k \mid \mathcal{G}_{k+1}]$ and $E[(A^n_k)^2 \mid \mathcal{G}_{k+1}]$ respectively.
As we can see from (5.28), the second term on the right hand side of (5.25) turns out, after iteration, to be of the same order as the first. Roughly speaking, the first term of (5.28) corresponds to branching, and the second to (an over-estimate of) coalescing. We will see the same pattern in the calculations following (5.29), below.
We will now set $k$ to depend on $n$, in such a way that $k_n \sim Cn^\beta$ with a suitable $C \in (0, \infty)$. This corresponds to the furthest backwards in time (from $n$) that the iterative methods used in this section are capable of seeing. In the first part of the next lemma we can see that even reducing the value of $C$ towards $0$ results in the lower bound for $\frac{1}{k} E[G^n_k]$ drifting away from the upper bound. The root cause is as follows: in order to remain tractable, the proof of Lemma 5.9 has used a slight underestimate of the coalescence part, in (5.24).
Once coalescences become non-negligible, which occurs around k ≈ n β when 1 k E[G n k ] becomes non-trivial, that estimate breaks down.
Lemma 5.10. Let $C \in (0, \infty)$. Suppose that $k = k_n$ is such that $k \le n$ and $k \sim Cn^\beta$. Then as $n \to \infty$ we have
Proof. Let us first prove the two asymptotic upper bounds. Recall that $\beta \in (0,1)$ was defined in (5.1) and note that $(1 - \beta)\zeta = 1$. Thus, since $k \sim Cn^\beta$, we have
$$\Big(\frac{n}{k}\Big)^{\zeta - 1} \sim \Big(\frac{n}{Cn^\beta}\Big)^{\zeta - 1} = \frac{1}{C^{\zeta - 1}}\, n^{\beta} \sim \frac{1}{C^{\zeta}}\, k.$$
The first upper bound now follows from Lemma 5.8 and the second upper bound follows from Lemma 5.9. It remains to prove the lower bound from the first claim of the lemma. We have (5.29), where the first line follows from Lemma 5.7 and the right hand side of (5.13), and the second and third lines are elementary computations. Iterating (5.29), we obtain (5.30). We now seek to apply Lemma 5.6, but again we must take care to handle the summation over $j$. Let $\epsilon > 0$. By Lemma 5.6 there exists $K = K_\epsilon \in \mathbb{N}$ such that for all $j > k \ge K$ the product appearing in (5.30) lies within a factor $(1 \pm \epsilon)$ of $\big(\frac{j}{k}\big)^{\zeta - 1}$. Similarly, by Lemma 5.9, increasing $K$ if necessary, we may assume that for all $n \ge j \ge K$ we have $E[(G^n_j)^2] \le \xi \big(\frac{n}{j}\big)^{2(\zeta - 1)}(1 + \epsilon)$. We thus obtain from (5.30) a corresponding lower bound, valid for $n \ge j > k \ge K$. Recall that $k \sim Cn^\beta$ and $(1 - \beta)\zeta = 1$. Therefore, increasing $K$ again if necessary, we may assume that for all $n \ge k \ge K$ we have $\big(\frac{n}{k}\big)^{\zeta - 1} \ge (1 - \epsilon)\frac{k}{C^\zeta}$. The asymptotic lower bound claimed in the first part of the lemma now follows because $\epsilon > 0$ was arbitrary. This completes the proof.
Proof of Proposition 5.3. Recall the definition of $N^n_{i,i'}$ in (5.3). Let $C > 0$. We have the bound (5.32). In the above, the first line follows from (5.2), (5.3) and from noting that $s_k \in \{s_k, c_k\}$.
To deduce the second line we use ( †), which implies that, for each $j$, $P[p_{j,l} = s_k \mid p_{j,l} \in \{s_k, c_k\},\ c_j \in c^\downarrow_n] = \frac{1}{2}$. By Lemmas 5.7 and 5.8, for any $\epsilon > 0$ there exists $K$ such that for all $n \ge k \ge K$, $E[A^n_k] \le 2\zeta \frac{1}{k}\big(\frac{n}{k}\big)^{\zeta - 1}(1 + \epsilon)$. Combining this with (5.32) gives
$$E\big[N^n_{Cn^\beta, n}\big] \le (1 + \epsilon)\, \zeta\, n^{\zeta - 1} \sum_{k = Cn^\beta}^{n} \frac{1}{k^{\zeta}}.$$
Evaluating this sum, using $\beta\zeta = \zeta - 1$, the proposition follows because $\epsilon > 0$ was arbitrary.
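The scaling used in the final estimate, which rests on $\beta\zeta = \zeta - 1$, can be sanity-checked numerically. The snippet below is a rough illustration only, with arbitrary illustrative parameters $\zeta = 2$, $\beta = 1/2$ (so that $(1 - \beta)\zeta = 1$) and $C = 1$; it confirms that $\zeta n^{\zeta-1} \sum_{k \ge Cn^\beta} k^{-\zeta}$, normalised by $n^\beta$, stays bounded as $n$ grows.

```python
def tail_sum(n, zeta=2.0, beta=0.5, C=1.0):
    # zeta * n^{zeta-1} * sum_{k = ceil(C n^beta)}^{n} k^{-zeta}, normalised
    # by n^beta; for zeta = 2, C = 1 this tends to zeta/(zeta-1) = 2.
    k0 = int(C * n ** beta) + 1
    s = sum(k ** (-zeta) for k in range(k0, n + 1))
    return zeta * n ** (zeta - 1) * s / n ** beta
```

Running this for increasing $n$ shows the normalised sum settling near a constant, consistent with the $O(n^\beta)$ bound claimed for $E[N^n_{Cn^\beta, n}]$.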

The branching-coalescing phase of the genealogy
We now turn our attention to look further backwards into the genealogy of $c^\downarrow_n$, in particular at the full range of times of order $n^\beta$. It is during this window of time that coalescences become frequent. With this in mind, we will now need to count potential parents of ancestors of $c^\downarrow_n$ without multiplicity. In fact, it will also become useful to record their identities. To this end, for convenience we reproduce (5.4) here:
$$H^n_k = \big\{ b \in U_k \;:\; b \in P_j \text{ for some } j > k \text{ with } c_j \in c^\downarrow_n \big\}. \tag{5.4}$$
Note that $H^n_k$ is the set of balls that were born (non-strictly) before time $k$, and were a potential parent of some $c_j \in c^\downarrow_n$ where $j > k$. Thus, comparing to (5.14), $|H^n_k|$ is '$G^n_{k+1}$ counted without multiplicity'. The apparent incongruity between $k$ and $k+1$ will not bother us, because we will shortly shift our emphasis entirely from $G^n_k$ to $H^n_k$. Moreover, in this section it is advantageous that $H^n_k$ relates to $U_k$ and not to $U_{k-1}$. Let us now upgrade ( †) and Lemma 5.10 to handle $H^n_k$.
( † †) The conditional distribution of $H^n_k$ given $|H^n_k|$ is uniform on the set of subsets of $U_k$ that have size $|H^n_k|$.
Proof of ( † †). Recall the definition of $G^n_{k+1}$ from (5.14): it counts the number of times that a potential parent of some $c_j \in c^\downarrow_n$ (with $k + 1 \le j \le n$) is an element of $U_k$. By ( †), each such parent is a uniformly sampled element of $U_k$, independently of all else.
For the remainder of Section 5.3, we fix a pair of constants $c, C$ such that $0 < c < C < \infty$. It is understood that they will be chosen depending upon the common distribution of the $R_k$. We assume, without loss of generality, that both $cn^\beta$ and $Cn^\beta$ are integers; cf. Remark 5.1.
Lemma 5.11. Suppose that $k = k_n \sim Cn^\beta$. Then, as $n \to \infty$ we have
Proof. It is immediate that $|H^n_k| \le G^n_{k+1}$. Thus both asymptotic upper bounds follow from their counterparts in Lemma 5.10. The lower bound follows by symmetry. It will be helpful to work with proportions of cue balls rather than with their absolute number. We set
$$Z^n_j = \frac{1}{2(Cn^\beta - j) + 2}, \qquad Y^n_j = Z^n_j\, \big|H^n_{Cn^\beta - j}\big|, \tag{5.33}$$
defined for $j = 0, 1, \ldots, (C - c)n^\beta$. In words, $Z^n_j$ is one over the number of balls born (non-strictly) before time $Cn^\beta - j$, and $Y^n_j$ is the proportion of such balls that constitute $H^n_{Cn^\beta - j}$. The indexing in (5.33) is in preparation for finding a fluid limit, as $n \to \infty$, of the process $k \mapsto \frac{1}{k}|H^n_k|$ considered backwards in time, that is, as $j$ increases and $k$ decreases, during the critical window $k \in [cn^\beta, Cn^\beta]$. This is somewhat tricky because the process $H^n_k$ is non-Markovian with respect to its generated filtration, and also time-inhomogeneous. We will see that these technical difficulties may be overcome by taking the limit of $(Y^n_j, Z^n_j)$ under a suitable rescaling of time.
We are now ready to state the major result of this section, Proposition 5.12, which will come as a consequence of the aforementioned fluid limit. We write $Y^n_u = Y^n_j$ for any $u \in [j, j+1)$.
Proposition 5.12. If $0 < c < C < \infty$ and $C > \xi^{1/\zeta}$ then (5.34) holds.
Let us comment a little further on the strategy we adopt to prove Proposition 5.12, and outline where the formula (5.34) comes from. We look to obtain a fluid limit for the $[0,1]^2$ valued process $(Y^n_j, Z^n_j)$, during time $j = 0, 1, \ldots, (C - c)n^\beta$. We will use the framework of weak convergence. To this end, we parametrize time using $s \in [0,1]$, resulting in times $s(C - c)n^\beta$, but we will see that it is also helpful to make the substitution $t = \log\big(\frac{C}{C - s(C - c)}\big)$, after which, loosely speaking, the limit of $Y^n_j$ will turn out to be given by the ordinary differential equation
$$\frac{dy}{dt} = 2\zeta\, y(1 - y), \tag{5.35}$$
run for time $t \in [0, \log(C/c)]$, starting from the initial condition $y(0) \approx Y^n_0$. Equation (5.35) is well known: it is logistic growth at rate $2\zeta$. The precise formulation of (5.34) comes from the explicit solution to (5.35), which is $y(t) = \frac{A}{A - (A - 1)e^{-2\zeta t}}$, where $y(0) = A$. The limit of $Z^n_j$ will be zero; its presence is solely because $(Y^n_j, Z^n_j)$ is a time-homogeneous Markov process, whereas $(Y^n_j)$ alone is not. The ODE (5.35) has a stable fixed point at $1$ and an unstable fixed point at $0$. Our initial condition $y(0)$ is positive, resulting in attraction towards $1$ as $t$ increases. However, the value of $y(0) \approx \frac{2\zeta}{C^\zeta}$ tends to zero as $C \to \infty$, as a consequence of Lemma 5.11. We need to keep enough freedom to choose a large value for $C$ (as we did in Section 5.1). Heuristically, as $C \to \infty$ we observe (5.35) started with a vanishing initial displacement, of order $C^{-\zeta}$, away from its unstable fixed point at $y = 0$. It is not a priori clear whether the time interval $t \in [0, \log(C/c)]$ is long enough to actually escape from $0$; fortunately, we will see that it is. Proposition 5.12 comes from knowing that $Y^n_j$ behaves similarly to $y(t)$, for large $n$.
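The closed form quoted for (5.35) is easy to verify numerically. The sketch below (with an arbitrary illustrative value of $\zeta$; the helper names are hypothetical, not from the paper) integrates the logistic ODE with a standard fourth-order Runge–Kutta scheme and compares against the explicit solution $y(t) = \frac{A}{A - (A-1)e^{-2\zeta t}}$.

```python
import math

def logistic_closed_form(t, A, zeta):
    # Explicit solution y(t) of y' = 2*zeta*y*(1 - y) with y(0) = A.
    return A / (A - (A - 1) * math.exp(-2 * zeta * t))

def logistic_rk4(T, A, zeta, steps=10000):
    # Independent check: classical RK4 integration of the same ODE up to time T.
    f = lambda y: 2 * zeta * y * (1 - y)
    y, h = A, T / steps
    for _ in range(steps):
        k1 = f(y); k2 = f(y + h * k1 / 2); k3 = f(y + h * k2 / 2); k4 = f(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y
```

The two agree to high precision, and the closed form exhibits the fixed points at $0$ and $1$ and the monotone attraction towards $1$ described above.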
With Proposition 5.12 in hand, the proof of Proposition 5.2 is straightforward, so we will give it now; then we will turn our attention to proving Proposition 5.12.
The remainder of this section will focus on the proof of Proposition 5.12. As we have mentioned, $j \mapsto Y^n_j$ is not time-homogeneous, which is due to its dependence on $Z^n$.

by $\epsilon$ itself. Thus, by conditioning on the event $\{T_n \le \theta\}$ we obtain that for all $s \in [0, \theta]$ and $n \ge N$,
$$P\big[\, |W^n_{\tau_n + s} - W^n_{\tau_n}| \ge \epsilon \,\big] \le \epsilon.$$
This establishes (5.39) and thus completes the proof.
Lemma 5.16. Suppose that $X^n_0$ converges in law to $X_0$. Then $X^n$ converges weakly to $X$ in $D([0,1]^2)$.
Proof. First, we establish an elementary inequality relating to $B_{r,a,b}$. Recall the definition of $B_{r,a,b}$ (just above Lemma 5.13) in terms of placing balls into marked and unmarked boxes. Let $0 < r \le a - b$. We claim that
$$\mathrm{Bin}\Big(r,\ 1 - \frac{b}{a} - \frac{r}{a}\Big) \;\preceq\; B_{r,a,b} \;\preceq\; \mathrm{Bin}\Big(r,\ 1 - \frac{b}{a}\Big), \tag{5.44}$$
where $\preceq$ denotes stochastic domination. To see (5.44), note that we can bound $B_{r,a,b}$ from above by counting the total number of balls placed into unmarked boxes; this count is binomial with $r$ trials and success probability $\frac{a-b}{a} = 1 - \frac{b}{a}$. We can bound $B_{r,a,b}$ below by noting that at most $r$ unmarked boxes will be chosen in total, so if we place the $r$ balls in turn, then each time we place a ball the chance of it being placed into an (as yet) unoccupied unmarked box is at least $\frac{a-b-r}{a} = 1 - \frac{b}{a} - \frac{r}{a}$; hence $B_{r,a,b}$ is stochastically bounded below by a binomial with $r$ trials and success probability $1 - \frac{b}{a} - \frac{r}{a}$. Thus (5.44) holds.
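The sandwich argument above can be checked directly on small cases. The sketch below is illustrative only and assumes, as in the argument above, that the $r$ balls are dropped independently and uniformly into the $a$ boxes, of which $b$ are marked; it compares the exact mean number of distinct occupied unmarked boxes against brute-force enumeration, and against the means of the two bounding binomials.

```python
from itertools import product

def exact_mean_B(r, a, b):
    # E[number of distinct unmarked boxes occupied] when r balls are dropped
    # independently and uniformly into a boxes, b of which are marked.
    return (a - b) * (1 - (1 - 1 / a) ** r)

def brute_force_mean_B(r, a, b):
    # Enumerate all a**r equally likely placements (feasible for small r, a).
    # Boxes 0, ..., b-1 are the marked ones.
    total = sum(len({box for box in placement if box >= b})
                for placement in product(range(a), repeat=r))
    return total / a ** r
```

On small cases the two means agree, and the exact mean sits between $r(1 - \frac{b}{a} - \frac{r}{a})$ and $r(1 - \frac{b}{a})$, as the stochastic bounds in (the claim above) require of the expectations.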
Now we will show weak convergence of $X^n$ to $X$. The argument rests on applying Theorem 4.8.10 of Ethier and Kurtz (1986), which requires us to establish that the Markov generators $Q_n$, of $X^n$, and $Q$, of $X$, are close in a suitable sense. We will denote the partial derivatives of $f$ with respect to its first and second coordinates by $\frac{\partial f}{\partial 1}$ and $\frac{\partial f}{\partial 2}$ respectively. We take the domain of $Q_n$ to be the set of real valued continuously differentiable functions on $[0,1]^2$, and note that the generator of $X_t$, with this same domain, is
$$Qf(y,z) = 2\zeta\, y(1 - y)\, \frac{\partial f}{\partial 1}(y,z). \tag{5.45}$$
Using (5.36) from Lemma 5.13, and recalling the definition of $X^n_t$, one may compute the generator $Q_n f(y,z)$ of $X^n_t$. Here, the $O(z^2)$ term has subsumed an $O(z)$ term coming from $P[I_0 = I_1 = 1] = \frac{y^2 - yz}{1 - z} = y^2 + O(z)$; noting that, by Taylor's theorem, $f\big(y\frac{z'}{z} + O(z')\big) - f(y,z) = O(z)$, so that after multiplication in the above only the $y^2$ part is non-negligible.
We will show that
$$\sup_{y,z \in [0,1]} |Q_n f(y,z) - Qf(y,z)| \to 0 \tag{5.50}$$
as $n \to \infty$. With this equation and Lemma 5.15 in hand, it is straightforward to see that Theorem 4.8.10 of Ethier and Kurtz (1986) applies, with the desired conclusion. We now give the proof of (5.50). We begin by examining the two terms in (5.48). Take $y \in [0,1]$. Finally, using much the same calculations as in (5.55), we obtain the analogous estimate for the remaining term. This comes from the presence of the indicators in (5.46) and (5.47), which ensure that whenever $Q_n(y,z)$ and $Q(y,z)$ differ we must have $z \in \big[0, \frac{1}{2cn^\beta + 2}\big)$, and the fact that our calculations above contain only non-negative powers of $y \in [0,1]$. Thus, equation (5.50) follows immediately, which completes the proof.

Let $t \mapsto y(t; A)$ denote the (unique) solution to (5.35) subject to the condition $y(0) = A \in [0,1]$. That is,
$$y(t; A) = \frac{A}{A - (A - 1)e^{-2\zeta t}}. \tag{5.58}$$
Note that $y(t; \cdot)$ has fixed points at $A = 0$ and $A = 1$; the former is unstable and the latter is stable. Given $A \in (0,1)$, the map $t \mapsto y(t; A)$ is a strictly increasing function of $t$, with $y(t; A) \to 1$ as $t \to \infty$ and $y(t; A) \to 0$ as $t \to -\infty$. Moreover, noting that $0 \le y'(t) \le 4\zeta$, it is easily seen that for $A, B \in (0,1)$,
$$\lim_{B \to A}\, \sup_{t \in \mathbb{R}} |y(t; B) - y(t; A)| = 0.$$
Similarly, from Lemma 5.11 we have the corresponding asymptotic bound on $E[Y^n_0]$, from which (5.61) follows. Equation (5.61) is all that we know about the $Y^n_0$, so we now look to gain some 'artificial' control over the initial conditions used in the limit. To this end, independently of all else, let $(I_n)$ be a sequence of independent $\{0,1\}$ valued random variables such that $P[I_n = 1] = \frac{\zeta^2}{4\xi}$. Define a set $M \subseteq U_{Cn^\beta}$ as follows:
• If $\frac{1}{Cn^\beta}\big|H^n_{Cn^\beta}\big| \ge \frac{\zeta}{C^\zeta}$ and $I_n = 1$, then let $M$ be a uniformly random subset of $H^n_{Cn^\beta}$ of size $\frac{\zeta}{C^{\zeta - 1}} n^\beta$ (which we will assume to be an integer, cf. Remark 5.1).
• Otherwise, let $M$ be the empty set.
For $j = 0, \ldots, (C - c)n^\beta$ we define
$$\hat{H}^n_{Cn^\beta - j} = H^n_{Cn^\beta - j} \cap \bigcup_{b \in M} b^\downarrow. \tag{5.62}$$
In words, to define $\hat{H}^n_{Cn^\beta - j}$ we keep only those balls of $H^n_{Cn^\beta - j}$ whose potential ancestral lines pass through $M$. Since we have $X^n_0 = \hat{X}^n_0$, it follows from Lemma 5.14 that we can couple $X^n_\cdot$ and $\hat{X}^n_\cdot$ in such a way that for all $j = 0, 1, \ldots, (C - c)n^\beta$, $\hat{X}^n_j = X^n_{T^n_j}$. Hence, in particular, $\hat{Y}^n_j = Y^n_{T^n_j}$.

Affine preferential attachment and addition of multiple edges
Several authors, dating at least as far back as Dorogovtsev et al. (2000), allow an extra parameter α, which controls the extent to which new vertices prefer to attach to existing high degree vertices. In the classical model, the effect of α is that when a new edge samples which vertex to attach to, the existing vertices are weighted according to α + deg n (v), instead of just deg n (v). This mechanism is sometimes known as 'affine' preferential attachment. In PAC, we may apply the same mechanism to the sampling of potential parents.
The corresponding modification of the urn process in Section 2.1.1 is that each source ball is assigned activity $1 + \alpha$, whilst cue balls have activity $1$. Here, activity is meant in the sense of (drawing balls from) generalized Pólya urns; a ball with activity $a > 0$ is drawn with probability proportional to $a$. If $\alpha$ is an integer, then at the level of the urn process this mechanism is equivalent to adding $\alpha$ new source balls, all of colour $F_n$, on the $n$th step of the process. For the Galton–Watson process of Section 2.1.3, the effect is that the probability of $\frac{1}{2}$ for $p$ to be a source ball is replaced by $\frac{1+\alpha}{1+(1+\alpha)}$, corresponding to the idea that source balls have activity $1 + \alpha$ and cue balls have activity $1$.

An alternative, and equally natural, extension is to allow new vertices to connect to more than one existing vertex. Models of this type are considered by, for example, Bianconi and Barabási (2001) and Dereich and Ortgiese (2014). In PAC, we may permit each new vertex $v_n$ to connect to a random number $V_n$ of existing vertices, with each such vertex sampled independently according to the PAC mechanism. We may also allow the sequence $(V_n)$ to be random; for simplicity we will assume that it is an i.i.d. sequence. For the urn process associated to PAC, this means that on the $n$th step we would add $V_n$ new source balls, all of the same colour, plus $V_n$ new cue balls whose colours would be inherited independently of each other using the usual mechanism. In this case, the balance of source balls versus cue balls remains exactly even, with the result that Corollary 2.5 requires no modification. Note that, in order to obtain this result (in particular, to carry over Lemma 3.2) we must assume that the i.i.d. random variables $V_n$ have finite expectation.
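The activity mechanism can be sketched in a few lines. The helper names below are hypothetical, and the sketch is only a minimal illustration of the rule that a ball with activity $a$ is drawn with probability proportional to $a$; with one source slot and one cue slot it reproduces the modified parent probability $\frac{1+\alpha}{2+\alpha}$ described above.

```python
import random

def draw_weighted(balls, alpha, rng):
    """Draw one ball from the urn, where each source ball has activity 1 + alpha
    and each cue ball has activity 1. `balls` is a list of (kind, colour) pairs
    with kind either 'source' or 'cue'."""
    weights = [1 + alpha if kind == 'source' else 1 for kind, _ in balls]
    return rng.choices(balls, weights=weights, k=1)[0]

def prob_source(num_source, num_cue, alpha):
    # Probability that a single activity-weighted draw returns a source ball.
    total = (1 + alpha) * num_source + num_cue
    return (1 + alpha) * num_source / total
```

Setting $\alpha = 0$ recovers the even balance of the original urn, while `prob_source(1, 1, alpha)` equals $\frac{1+\alpha}{2+\alpha}$.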

A Appendix
We give a proof of Lemma 5.6. The case $\gamma_j = 0$ can be found within Exercise 8.3 of van der Hofstad (2016), and may be established using the Gamma function and Stirling's inequality. We give an elementary argument which also covers $\gamma_j \ne 0$.
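For the case $\gamma_j = 0$, the content of Lemma 5.6 is easy to check numerically. The snippet below (with arbitrary illustrative values of $\alpha$, $k$ and $n$) compares the product with $(n/k)^\alpha$ and confirms that the ratio approaches $1$ as $k$ grows.

```python
def product_vs_power(alpha, k, n):
    # Ratio of prod_{j=k+1}^{n} (1 + alpha/j) to (n/k)^alpha; Lemma 5.6 with
    # gamma_j = 0 says this ratio tends to 1 as k, n -> infinity with k <= n.
    prod = 1.0
    for j in range(k + 1, n + 1):
        prod *= 1 + alpha / j
    return prod / (n / k) ** alpha
```

The discrepancy shrinks as $k$ increases, in line with the asymptotic statement of the lemma.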
Let us first consider the case in which $\gamma_j = 0$ for all $j$. In this case, Lemma 5.6 is implied by the following inequality: for all $\alpha > 0$ and all $k, n \in \mathbb{N}$ such that $2\alpha + 1 < k \le n < \infty$, it holds that (A.1). The proof of (A.1) proceeds as follows. We first note that

EJP 25 (2020), paper 68.