A phase transition for preferential attachment models with additive fitness

Preferential attachment models form a popular class of growing networks, where incoming vertices are preferentially connected to vertices with high degree. We consider a variant of this process, where vertices are equipped with a random initial fitness representing initial inhomogeneities among vertices and where the fitness influences the attractiveness of a vertex in an additive way. We consider a heavy-tailed fitness distribution and show that the model exhibits a phase transition depending on the tail exponent of the fitness distribution. In the weak disorder regime, one of the old vertices has maximal degree irrespective of fitness, while for strong disorder the vertex with maximal degree has to strike the right balance between fitness and age. Our proofs use martingale techniques to show concentration of the degree evolutions, as well as extreme value theory to control the fitness landscape.


Introduction
A distinctive feature of real-world networks is their inhomogeneity, characterized in particular through the presence of hubs. These are nodes with a number of connections that greatly exceeds the average and thus have a great impact on the overall network topology. The existence of hubs in a network is closely linked to the scale-free property, that is, the proportion of nodes in the network with degree (number of connections) k scales as a power law k^{−τ} for some τ > 1.
Preferential attachment models, as popularized by Barabási and Albert [2], form a class of random graphs that shows this behaviour 'naturally', that is, as a result of the dynamics and not because it is imposed otherwise, see also [6] for a first mathematical derivation of this fact. In these evolving random graph models new vertices are introduced to the network over time and they connect to earlier introduced vertices with a probability proportional to their degree. This leads to the so-called rich-get-richer effect, which means that vertices with a high degree are more likely to increase their degree. It is exactly this effect that yields the power-law degree distributions and the existence of hubs in the graph.
The study of the emergence of hubs in random graph models such as the preferential attachment model often focuses on the behaviour of the maximum degree in the graph. Móri [20] first showed that for the Barabási-Albert model the maximum degree is of the same order as the degree of the first vertex; this was later generalised to affine preferential attachment models (with random out-degree) by Athreya et al. [1] and to a larger class of preferential attachment models by Bhamidi [3]. A consequence of the way in which preferential attachment graphs evolve is that the rich-get-richer effect should really be interpreted as an old-get-richer effect: it is the old vertices, introduced at the beginning of the evolution of the graph, that are able to attract the most connections [15].
However, when compared to real-life networks, it is clearly desirable to have a model where younger vertices can compete with the old ones. One way to achieve this is to assign to each vertex a random fitness representing its intrinsic attractiveness, and then to let the connection probability of a newly incoming vertex be proportional to either the product or the sum of fitness and degree. These two models were introduced by Bianconi and Barabási [4] and by Ergün and Rodgers [13], respectively.
Most previous results on preferential attachment models with fitness deal with the multiplicative case for bounded fitness. One of the reasons is that, under certain conditions on the fitness distribution, these models exhibit the phenomenon of condensation, where a positive proportion of incoming vertices connects to vertices with fitness closer and closer to the maximal fitness in the system. This phenomenon was first shown in the mathematical literature in [7] and later extended in [12] to a wide range of models, by looking at the empirical fitness and degree distributions. A full dynamic description of the condensation is a challenging problem; see, however, [9] for a very detailed analysis of a slightly modified model. [10] considers a continuous-time embedding of the process into a reinforced branching process, which allows the authors to control the maximal degree (in the continuous-time setting); in the non-condensation case this can be translated back to the random graph model. Also, under certain assumptions on the fitness distribution, they show that condensation is non-extensive in the sense that there is no single vertex that acquires a positive fraction of the incoming edges. These results were extended by [19] to a larger class of (bounded) fitness distributions (as a special case of a more general set-up).
Here, we consider the model with additive fitness, where a vertex is chosen with probability proportional to the sum of its degree and its intrinsic fitness. To the best of our knowledge, the only mathematical results are [3] and [23], which confirmed the non-rigorous predictions of [13]. [3] showed that when the fitness is bounded, the degree distribution follows a power law with the same exponent as for the model with an additive constant equal to the expected value of the fitness. Moreover, [3] gives the asymptotics for the maximum degree and shows that they again agree with the asymptotics for the model with an additive constant. [23] considers the case of a deterministic additive sequence and shows that there is an equivalence between the preferential attachment (tree) model and a weighted recursive tree. From this, the author deduces ℓ^p-convergence of the renormalized degree sequence under a growth condition on the additive sequence. Furthermore, he considers geometric properties of the weighted recursive trees. Somewhat related is a model of preferential attachment with random (possibly heavy-tailed) initial degree, for which [8] show convergence of empirical fitness distributions; however, the structure of these networks is very different from the additive fitness case due to the large out-degrees.
In our work we consider the case of unbounded fitness and show that when the fitness distribution follows a power law, a more complex phase diagram arises. Our first result shows convergence of the empirical degree and fitness distributions. From this we can in particular deduce that if the fitness distribution is sufficiently light-tailed, then we are in a weak disorder regime, where the same results as in [3] still hold, both for the tail exponent of the degree distribution and for the asymptotics of the maximum degree. However, if the tail exponent of the fitness distribution is sufficiently small (but such that the fitness still has a finite first moment), then we are in a strong disorder regime, where the tail exponent of the degree distribution is the same as that of the fitness distribution. Moreover, the maximal degree grows at the same order as the largest fitness in the system. However, the vertex that maximizes the degree has to satisfy a delicate balance between arriving early and having a large fitness. In the limit this competition is expressed as an optimization of a functional of a Poisson point process.
Finally, we can also consider the extreme disorder regime, when the fitness does not have a finite first moment. In that case, we show that a uniformly selected vertex does not connect to any incoming vertices with high probability. Moreover, the maximal degree now scales as order n, and the maximising vertex again has to satisfy the right balance between arriving early and having large fitness. We note that our results for the degree distribution improve on those by Ergün and Rodgers [13], where these different regimes are overlooked and only the weak disorder regime is covered.
Our proof for the empirical degree/fitness distributions uses a stochastic approximation argument, which was also used in [12] for the multiplicative case. The analysis of the maximal degree is split into two steps: first, we show concentration of the degrees around their conditional expectations (given the fitness values), adapting the martingale arguments of Móri [20] (see also [15] for an exposition with more general attachment rules). For the weak disorder case, arguments similar to those in [15] are sufficient to control the maximal degree. However, in the strong and extreme disorder cases, we have to control the conditional expectations of the degrees, which are functions of the fitness only. We then show that these functionals simplify and converge to a functional of a Poisson point process, so that together with the concentration we can deduce convergence of the maximal degree. Finally, our analysis is robust and covers essentially three variants of preferential attachment models: a model with possibly random out-degree as in [11] (and at most one edge between vertices) and two variations where the out-degree of each new vertex is fixed and the connection probabilities are either updated after each edge is drawn or kept fixed.
Notation. Throughout we will use the following notation. We let N = {1, 2, 3, . . .} be the natural numbers, write N_0 = {0, 1, 2, . . .} when we want to include 0, and let [n] := {1, . . . , n}. Moreover, for sequences a_n and b_n of positive real numbers, we say a_n = Θ(b_n) if there exists a constant C > 0 such that a_n ≤ C b_n and b_n ≤ C a_n, and we say a_n ∼ b_n if lim_{n→∞} a_n/b_n = 1. Also, we use the conditional probability measure P_F(·) := P(· | (F_i)_{i∈N}) and expectation E_F[·] := E[· | (F_i)_{i∈N}].

Definitions and main results
The preferential attachment model is an evolving random graph model, where vertices are added to the graph consecutively and then connected to older vertices. We denote by G_n the resulting directed graph at the stage when the vertex set is [n]. Moreover, we take edges to be directed from the vertex with the higher index to the one with the lower index. Throughout, we will use the notation Z_n(i) := in-degree of vertex i in G_n.
We now introduce three different preferential attachment with fitness (PAF) models: the first allows for a random out-degree in the spirit of Dereich and Mörters [11]; in the second, the out-degree of a new vertex is fixed and edges are connected while keeping the degrees fixed; the last has a fixed out-degree, but the degrees are updated in between connections (the latter being the fitness modification of a model closer to [6]). Let n_0, m_0 ∈ N. We say that a sequence of random graphs (G_n)_{n≥n_0} is a preferential attachment model with (additive) fitness if G_n is a directed and weighted graph on the vertex set [n] with edges directed from larger to smaller indices. Moreover, we assume that G_{n_0} has m_0 edges and we assign fitness values F_1, F_2, . . . , F_{n_0} to the vertices 1, 2, . . . , n_0, respectively.
To obtain G_{n+1} from G_n for some n ≥ n_0, add vertex n+1 to the vertex set and attach fitness F_{n+1} to n+1. In the following, S_n := ∑_{i∈[n]} F_i denotes the total fitness of the first n vertices. Furthermore, we assume that the updating rule satisfies one of the following three assumptions for some fixed m ∈ N: (PAFRO) Preferential attachment with fitness and random out-degree. Here m = 1 and, conditionally on G_n, vertex n+1 is connected to each vertex in [n] by at most one edge and the probability to connect to a given i ∈ [n] is (Z_n(i) + F_i)/(m_0 + (n − n_0) + S_n). (2.1) Furthermore, conditionally on G_n, the degree increments (∆Z_n(i) := Z_{n+1}(i) − Z_n(i), i ∈ [n]) are pairwise non-positively correlated.
(PAFFD) Preferential attachment with fitness and fixed degree. To vertex n+1 we assign m half-edges. Conditionally on G_n, connect each half-edge independently to some i ∈ [n] with probability (Z_n(i) + F_i)/(m_0 + m(n − n_0) + S_n). (PAFUD) Preferential attachment with fitness and updating degree. To vertex n+1 we assign m half-edges. Let Z_{n,j}(i) denote the in-degree of vertex i when n+1 has attached j of its half-edges, j = 1, . . . , m, and set Z_{n,0}(i) := Z_n(i). For j = 1, . . . , m, conditionally on the graph of size n including the first j−1 half-edges from n+1, connect the j-th half-edge to i ∈ [n] with probability (Z_{n,j−1}(i) + F_i)/(m_0 + m(n − n_0) + (j − 1) + S_n). Remark 2.2. The quantity in (2.1) is always less than 1, since ∑_{i=1}^{n_0} Z_{n_0}(i) = m_0 and at each step Z_n(i) increases by at most one. Note also that for the PAFRO assumption, the exact distribution of (∆Z_n(i), i ∈ [n]) is not specified. For example, for m = 1, the PAFFD and the PAFUD model are identical and both satisfy PAFRO. Another possibility is to consider a model with a random out-degree, where (∆Z_n(i), i ∈ [n]) is a vector of independent Bernoulli variables with success probabilities as given in (2.1).
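As an illustration of the dynamics, the following is a minimal simulation sketch of the PAFUD rule. The function name `pafud`, the choice of a path as seed graph G_{n_0}, and the 1-based indexing are our own illustrative assumptions, not part of the definition:

```python
import random

def pafud(n, m, fitness, n0=2, seed=None):
    """Minimal sketch of the PAFUD dynamics (not the authors' code).

    Starts from an assumed seed graph G_{n0}: a path on [n0], so
    m0 = n0 - 1 edges.  Each new vertex sends m edges one at a time,
    choosing target i with probability proportional to
    (current in-degree of i) + F_i, with degrees updated between draws.
    `fitness` is 1-based: fitness[i] = F_i for i in [n].
    """
    rng = random.Random(seed)
    indeg = [0] * (n + 1)          # indeg[i] = in-degree of vertex i
    for v in range(2, n0 + 1):     # seed path: edge v -> v-1
        indeg[v - 1] += 1
    S = sum(fitness[1:n0 + 1])     # total fitness of present vertices
    total_deg = n0 - 1             # total in-degree = number of edges so far
    for new in range(n0 + 1, n + 1):
        for _ in range(m):
            # total weight = sum over i in [new-1] of (indeg[i] + F_i)
            r = rng.uniform(0.0, total_deg + S)
            acc, target = 0.0, new - 1
            for i in range(1, new):
                acc += indeg[i] + fitness[i]
                if r <= acc:
                    target = i
                    break
            indeg[target] += 1
            total_deg += 1         # updating rule: weights change per edge
        S += fitness[new]
    return indeg
```

For m = 1 the same routine also simulates PAFFD and satisfies PAFRO, as noted in Remark 2.2; a PAFFD sketch for general m would instead draw all m targets from the unchanged weights before updating any degrees.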
We have defined our random graph model for an arbitrary fitness distribution. However, for the analysis the most interesting case occurs when we are dealing with heavy-tailed distributions. In this case the fitness can have a significant effect on the behaviour of the system as a whole, whereas the 'fitness effect' is smoothed out when its tail behaviour is too light. In the latter case, one sees no difference in the mean-field behaviour when changing from a deterministic, fixed fitness to random i.i.d. fitness values. Therefore, in the following, we will frequently consider the following assumption: Assumption 2.3. The fitness distribution is a power law with exponent α > 1, i.e. P(F ≥ x) = ℓ(x) x^{−(α−1)} for x ≥ 1, where ℓ is a slowly varying function.
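For simulations one needs concrete samples from such a distribution. A minimal inverse-transform sketch, assuming the illustrative pure Pareto tail P(F ≥ x) = x^{−(α−1)} for x ≥ 1 (one particular instance of a power law with exponent α; the function name is ours):

```python
import random

def sample_fitness(n, alpha, seed=None):
    """Draw n i.i.d. fitness values with pure Pareto tail
    P(F >= x) = x^{-(alpha-1)} for x >= 1, alpha > 1 (an illustrative
    instance of a power-law fitness distribution, not the paper's
    general assumption).

    Inverse transform: if U ~ Unif(0,1], then U^{-1/(alpha-1)} has
    exactly the stated tail.
    """
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], avoiding a zero to a negative power
    return [(1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]
```

For this choice E[F] = (α−1)/(α−2) < ∞ precisely when α > 2, which is the boundary between the finite-mean regimes and the extreme disorder regime discussed below.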
We continue by stating our first main result. We define the following measures, which correspond to the empirical fitness distribution of a vertex sampled with weight given by its in-degree, the joint empirical fitness-in-degree distribution and, finally, the empirical degree distribution.
where the first two statements hold with respect to the weak * topology and the limits are given as

Remark 2.5. Throughout this article we work with Definition 2.1. However, Theorem 2.4 also holds under the following slightly weaker conditions. Define the degree increment at step n+1 of vertex i by ∆Z_n(i) := Z_{n+1}(i) − Z_n(i). We assume the graph G_{n_0} is given deterministically such that m_0 := ∑_{i∈[n_0]} Z_{n_0}(i) ≥ 1. Furthermore, we assume that for n ≥ n_0 the vector (∆Z_n(i), i ∈ [n]) is negatively quadrant dependent in the sense that for any i ≠ j and k, l ∈ Z_+, P(∆Z_n(i) ≤ k, ∆Z_n(j) ≤ l | G_n) ≤ P(∆Z_n(i) ≤ k | G_n) P(∆Z_n(j) ≤ l | G_n). (2.6) As can be seen from the proof, Theorem 2.4 holds for any evolving random graph model that satisfies these assumptions. See also Lemma 4.3 below, where we show that the PAFFD and the PAFUD model satisfy the negative quadrant dependency as in (A4).
By comparing with the case where the fitness is constant, we can interpret Theorem 2.4 as saying that the degree of a typical vertex can be found via a two-step process: first the fitness is chosen according to µ, and then the degree evolves as in the case with an additive constant equal to the fitness.
However, while at first our result looks similar to the constant fitness case, by looking at the tail exponent of the degree distribution we can see that this is only the case when the fitness is not too heavy-tailed. Indeed, suppose that the fitness distribution follows a power law; then we can distinguish three different regimes. As the next theorem shows, if the fitness distribution has finite moments of order θ_m := 1 + E[F]/m, then the degree distribution has power law exponent 1 + θ_m, which is the same as in the model with constant fitness equal to E[F]. Using the terminology from the field of random media, we refer to this situation as the weak disorder regime. However, if the fitness distribution is more heavy-tailed, but still with a finite first moment, then the degree distribution follows the same power law as the fitness distribution, a situation which we will refer to as the strong disorder regime. Finally, we can also consider the extreme disorder case, when the fitness distribution does not have a finite first moment. In this case we show that, with high probability, a uniformly chosen vertex has not received any incoming edges (since most connections are made to vertices with very high fitness).
and where Γ is the Gamma function. (ii) Strong disorder. Suppose F has a power law distribution as in Assumption 2.3. Then, if (iii) Extreme disorder. Suppose F has a power law distribution as in Assumption 2.3 with α ∈ (1, 2) and consider the three PAF models as in Definition 2.1. Let U_n be a uniformly chosen vertex in G_n, let ε > 0 and let E_n := {Z_n(U_n) = Z_{n_0}(U_n)} be the event that U_n has not increased its degree with respect to the initialisation G_{n_0}. Then, for n sufficiently large, P(E_n) ≥ 1 − Cn^{−((2−α)∧(α−1))/α+ε} for some constant C > 0.
Our next main result provides a more detailed analysis of the dynamic behaviour of the system by describing the asymptotics of the maximal degree. As might be expected from the different phases observed for the tail of the degree distribution, there are also three distinct phases for the maximal degree. Again under the assumption that the fitness has a power law, we observe that in the weak disorder regime, where the fitness has relatively light tails, the vertex with maximal degree is one of the old vertices, similar to the system with constant fitness. This first result (parts (i) and (iii) in the theorem below) in the special case of the PAFUD/PAFFD model with m = 1 is also contained in [23].
However, if the fitness is more heavy-tailed (but still with a finite first moment), i.e. in the strong disorder regime, the maximal degree grows at the same rate as the maximal fitness in the system (i.e. approximately like n^{1/(α−1)}). In this case, the vertex attaining the maximal degree satisfies a delicate balance between arriving early enough and having large fitness. Finally, in the extreme disorder regime, where the fitness does not have a first moment, the maximal degree grows of order n, again satisfying a nontrivial optimisation between a large fitness value and arriving early. The main difference compared to the strong disorder regime is that now the sum of the fitness values in the normalization, e.g. in (2.1), is random to first order and depends on the extreme values of the fitness landscape. As is common in extreme value theory, the limiting variables in the strong and extreme disorder regimes are described in terms of a functional of a Poisson point process capturing the extremes of the fitness (in competition with the advantage of arriving early).
Theorem 2.7 ((Maximum) degree behaviour in PAFs). Consider the three PAF models as in Definition 2.1. First, the following results hold for the degrees of fixed vertices: where ξ_i is an almost surely finite random variable with no atom at 0 and θ_m := 1 + E[F]/m. (ii) When the fitness distribution satisfies Assumption 2.3 with α ∈ (1, 2), for all fixed i ∈ N, for some almost surely finite random variable Z_∞(i).
In the following let I n := arg max i∈[n] Z n (i) (resolving any ties by taking the smaller index).
for some almost surely finite random variable I.

Overview of the proofs
In this section, we give a short overview of the proofs of the main theorems and the structure of the remaining paper.
In Section 4 we prove Theorems 2.4 and 2.6. In order to prove Theorem 2.4, we use the theory of stochastic approximation in a similar setup as in [12], where it was used for models with multiplicative fitness.
The main idea is to consider, for 0 ≤ f < f′ < ∞, the quantities Then, by considering the conditional increment and using the preferential attachment dynamics, we show that and also a similar lower bound with slightly different sequences A_n, B_n. This should be interpreted as a time-discretisation of a differential inequality. Then, a basic stochastic approximation argument (see also Lemma 4.1 below) shows that if A_n, B_n and R_n converge almost surely, then we obtain an upper bound on the lim sup of Γ_n((f, f′]) (and similarly a lower bound). By an approximation argument this yields convergence of Γ_n. We obtain similar bounds for Γ^{(k)}_n((f, f′]), so that with an induction argument we can also deduce convergence of Γ^{(k)}_n. In the last part of Section 4 we prove Theorem 2.6 using standard arguments.
The remainder of the paper deals with the asymptotics of the degree of a fixed vertex, as well as the maximal degree, as stated in Theorem 2.7. In the following we only discuss the proof for the PAFUD model, but the proofs for the PAFRO model and PAFFD model follow with minor modifications.
A central tool in the analysis of the degree evolutions is the following martingale, introduced by [20] in the context of classical preferential attachment (see also [15]). For k ≥ −min{F_i, 1}, define a sequence where c^k_n is a carefully chosen normalisation sequence and \binom{a}{b} = Γ(a + 1)/(Γ(b + 1)Γ(a − b + 1)) is the generalized binomial coefficient defined in terms of the Gamma function Γ. Next, we write P_F and E_F for the (regular) conditional probability measure (and its expectation, respectively) when conditioning on the fitness values F_1, F_2, . . .. Then, as for the standard preferential attachment model, one can show that (M^k_n(i), n ≥ i) is a martingale under the conditional measure P_F.
Note also that with k = 1, Z_n(i) = (c^1_n)^{−1} M^1_n(i) − F_i, and M^1_n(i) converges, being a non-negative martingale. So, for fixed i, the asymptotics are determined by c^1_n. Indeed, we will see that where S_j = ∑_{ℓ=1}^j F_ℓ. In Lemma 6.4, we will prove that if E[F] < ∞, then by the law of large numbers the sequence c^k_n rescaled by n^{k/θ_m} converges almost surely. Moreover, if α ∈ (1, 2) (for a power law fitness distribution), then c^k_n converges almost surely without rescaling. This proves the first two statements (2.7) and (2.8) of Theorem 2.7.
To prove the statements about the maximal degree, we first consider the conditional expectation of Z_n(i), which, using the martingale M^1_n(i), can be written as at least for i > n_0; otherwise a small correction is necessary. From this point, the proofs in the three different regimes deviate from each other.
First, if we assume that E[F] < ∞, then by (3.2) and the asymptotics of c^1_n from above we can deduce that Now, suppose that E[F^{θ_m+ε}] < ∞ for some ε > 0. Then, in Lemma 6.6, we show that Intuitively, this follows from (3.3), since under this assumption the maximum of the fitness values satisfies max_{i∈[n]} F_i = o(n^{1/θ_m}) (with high probability), so that the term (n/i)^{1/θ_m} dominates for small i. Thus, together with a concentration argument, we obtain the weak disorder result (2.9).
Next, we consider the strong disorder regime, where the fitness distribution is a power law with parameter α ∈ (2, 1 + θ_m). Extreme value theory tells us that in this case max_{i∈[n]} F_i ≈ n^{1/(α−1)}, so that (3.3) suggests that in this regime vertices with high fitness have a chance to compete with the old vertices. To capture the asymptotics of the peaks of the fitness landscape more precisely, we consider the point process where u_n := inf{t ≥ 0 : P(F ≥ t) ≤ 1/n}. Then, classical extreme value theory (see e.g. the exposition in [21]) tells us that where Π is a Poisson point process on (0, 1) × (0, ∞) with intensity measure ν(dt, dx) := dt × (α − 1)x^{−α} dx (see also Section 5 below for more details). From this convergence, we can then deduce The concentration argument relies on the martingale M^k_n(i) for carefully chosen k (which corresponds approximately to the k-th moment of Z_n(i)); see the first part of Proposition 6.2.
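To illustrate the convergence of the rescaled fitness configuration, the following sketch builds the points {(i/n, F_i/u_n) : i ∈ [n]} under the illustrative pure Pareto tail P(F ≥ x) = x^{−(α−1)}, x ≥ 1, for which u_n = n^{1/(α−1)} exactly (the function name and the Pareto choice are our own assumptions):

```python
import random

def fitness_point_process(n, alpha, seed=None):
    """Return the point configuration Pi_n = {(i/n, F_i/u_n) : i in [n]}
    for i.i.d. fitness with tail P(F >= x) = x^{-(alpha-1)}, x >= 1
    (illustrative choice), so that
    u_n = inf{t >= 0 : P(F >= t) <= 1/n} = n^{1/(alpha-1)} exactly.
    """
    rng = random.Random(seed)
    u_n = n ** (1.0 / (alpha - 1))
    pts = []
    for i in range(1, n + 1):
        # inverse-transform Pareto sample, as in the sketch above
        f = (1.0 - rng.random()) ** (-1.0 / (alpha - 1))
        pts.append((i / n, f / u_n))
    return pts
```

In this setting the number of points with second coordinate above c is Binomial(n, c^{−(α−1)}/n), which converges to a Poisson variable with mean c^{−(α−1)} = ∫_0^1 ∫_c^∞ (α − 1)x^{−α} dx dt, matching the intensity measure ν.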
Finally, we consider the extreme disorder regime, where α ∈ (1, 2), so that the fitness does not have a finite first moment. In particular, the law of large numbers no longer applies to the sum S_n = ∑_{i=1}^n F_i appearing in the normalizing constant in the attachment probabilities. In this case, we obtain from (3.1) that for i of order n Then, it follows from (3.2), with the same Π_n as in (3.4), that where E := (0, 1) × (0, ∞). From this we can eventually deduce that
Unfortunately, the corresponding functionals are not directly continuous in Π n , so that the arguments involve careful cut-off arguments (see Section 5).
Then, the final step is to show concentration of max_{i∈[n]} Z_n(i), which again uses the martingale M^1_n(i), but in this case is slightly easier than for α > 2. Overall, the proof of Theorem 2.7 is structured in the following way. In Section 5, we will first show convergence of the functional T_{i/n}(Π_n) introduced in (3.5). Here, we take the opportunity to recap some of the basics of point process convergence and we will also carry out the technical cut-off arguments. Then, in Section 6 we introduce the martingales M^k_n(i) more formally and prove some of their properties. In particular, we then use these to show concentration in all three cases, and we also show the point process convergence in the strong disorder case, where we can then refer back to the technical details dealt with in Section 5 for the extreme disorder case. Finally, in Section 7 we prove Theorem 2.7 by gathering together all the necessary results from the previous two sections.

Degree and fitness distributions
This section is devoted to first proving Theorem 2.4, using the ideas of stochastic approximation, and then, at the end, proving Theorem 2.6. Before we prove Theorem 2.4, we introduce several lemmas that are required for the proof. The first lemma comes from [12, Lemma 3.1] and is the main ingredient in the proof of Theorem 2.4: Lemma 4.1. Let (X_n)_{n≥0} be a non-negative stochastic process. We suppose that the following estimate holds: (ii) (R_n)_{n≥0} is an almost surely convergent stochastic process.
Then, almost surely, Similarly, if instead then, under the same conditions (i) and (ii), almost surely, In the next lemma, we discuss two specific examples of the stochastic process R_n as introduced in Lemma 4.1, which are used in the proof of Theorem 2.4: Recall Γ_n and Γ^{(k)}_n from (2.2), let 0 ≤ f < f′ < ∞ and k ∈ N_0, and assume the fitness distribution has a finite mean. We then have the two following results: and R_n := ∑_{j=n_0}^n ∆R_j. Then R_n converges almost surely.
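To see why recursions of this shape pin down a limit, here is a deterministic toy instance (the concrete sequences a_n, b_n, r_n and the function name are our own; the precise hypotheses are those of [12, Lemma 3.1]): iterating X_{n+1} = X_n + (a_n − b_n X_n)/(n+1) + r_n with a_n → a, b_n → b > 0 and summable perturbations r_n drives X_n to a/b.

```python
def toy_stochastic_approximation(steps=100000):
    """Deterministic toy of a stochastic-approximation recursion:
    X_{n+1} = X_n + (a_n - b_n X_n)/(n+1) + r_n,
    with a_n -> 2, b_n -> 4 and summable perturbations r_n = (-1)^n/n^2.
    The iterates settle at a/b = 0.5.  (Illustrative shape only.)
    """
    x = 0.0
    for n in range(1, steps + 1):
        a_n = 2.0 + 1.0 / n          # convergent coefficient sequences
        b_n = 4.0 - 1.0 / n
        r_n = (-1.0) ** n / n ** 2   # summable 'martingale-like' term
        x = x + (a_n - b_n * x) / (n + 1) + r_n
    return x
```

The step sizes b_n/(n+1) are non-summable, so the drift (a_n − b_n X_n) keeps pulling the iterates toward a_n/b_n, while the summable perturbations cannot shift the limit.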
Before proving Lemma 4.2, we recall the concept of negative quadrant dependence (NQD) as introduced in (2.6). We note that the PAFRO model has been defined with the additional assumption of non-positively correlated degree increments. Since the degree increments in this model are Bernoulli random variables, NQD is equivalent to non-positive correlation. For the PAFFD and PAFUD models, NQD follows directly from the definition of the model, as we show in the following lemma: Proof. The NQD of the PAFFD model follows directly from [17], as (∆Z_n(i))_{i∈[n]} forms a multinomial vector, for which NQD is known. For the PAFUD model, (∆Z_n(i))_{i∈[n]} is a convolution of distinct multinomial distributions (the probabilities of the multinomial distribution change at each step/sampling), for which NQD is proved in [17] as well. However, since the changes in the probabilities depend on the previous samplings (where previous edges were attached), we require a slightly more careful argument. Let us write ∆Z_n(i) := X_1 + . . . + X_m, ∆Z_n(j) := Z_1 + . . . + Z_m, where X_k, Z_k are Bernoulli random variables which take the value 1 if the k-th edge of vertex n+1 connects to i, j, respectively, k ∈ [m]. Since X_1, Z_1 are part of a multinomial vector with one trial, (2.6) holds for these random variables. Then, we investigate X_1 + X_2, Z_1 + Z_2, for which we prove (2.6) with ≥ rather than ≤ in the events, which is an equivalent definition of NQD. We write, for k, ℓ ≥ 0, Since, conditionally on G_n and (X_1, Z_1), the random variables (X_2, Z_2) are part of a multinomial vector with a single trial, by the same argument we used for X_1, Z_1 we obtain the upper bound (4.1) It follows from the definition of the PAFUD model that X_2, conditionally on X_1, is independent of Z_1, and that Z_2, conditionally on Z_1, is independent of X_1.
Then, as the probabilities in (4.1) are increasing functions of X_1 and Z_1, respectively, it follows from the definition of negative association in [17], which is equivalent to NQD, that We can continue the same argument to obtain the same inequality for the m terms in ∆Z_n(i) = X_1 + . . . + X_m and ∆Z_n(j) = Z_1 + . . . + Z_m. We then recall that this result is equivalent to (2.6), as required.
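The multinomial NQD property invoked from [17] can be verified exactly on small cases by enumeration. The following illustrative check (function names and tolerance are ours) tests P(X ≤ k, Y ≤ l) ≤ P(X ≤ k)P(Y ≤ l) for two coordinates X, Y of a Multinomial(n; p_1, p_2, 1 − p_1 − p_2) vector:

```python
from math import comb

def trinomial_pmf(n, p1, p2, x, y):
    """P(X = x, Y = y) for (X, Y, n-X-Y) ~ Multinomial(n; p1, p2, 1-p1-p2)."""
    p3 = 1.0 - p1 - p2
    return comb(n, x) * comb(n - x, y) * p1**x * p2**y * p3**(n - x - y)

def check_multinomial_nqd(n=6, p1=0.3, p2=0.5):
    """Exhaustively verify the NQD inequality (2.6) for all k, l;
    a small finite check of the property proved in [17]."""
    for k in range(n + 1):
        for l in range(n + 1):
            joint = sum(trinomial_pmf(n, p1, p2, x, y)
                        for x in range(k + 1)
                        for y in range(min(l, n - x) + 1))
            px = sum(trinomial_pmf(n, p1, p2, x, y)
                     for x in range(k + 1) for y in range(n - x + 1))
            py = sum(trinomial_pmf(n, p1, p2, x, y)
                     for y in range(l + 1) for x in range(n - y + 1))
            if joint > px * py + 1e-12:   # tolerance for float round-off
                return False
    return True
```

Competition for a fixed number of trials is exactly what makes the coordinates negatively dependent: a large X forces a small Y.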
We now prove Lemma 4.2.
Proof of Lemma 4.2. We first note that, in both cases, R_n is a zero-mean martingale with respect to G_n. The convergence of R_n can be proved by showing that its martingale increments ∆R_n = R_{n+1} − R_n have summable conditional second moments, or summable second moments. We first deal with case (i). We write ∆R_n as the difference of two martingales. For k ≥ 1, Here, we use that We note that, as the indicators in M_n only differ in one index k, it is sufficient to prove the summability of the conditional second moment of ∆M^{(2)}_n for all fixed k ≥ 1. So, we write Using the non-positive correlation of the degree increments for the PAFRO model and Lemma 4.3 for the PAFFD and PAFUD models, we can bound this from above by where we use Markov's inequality in the final step and use that the sum of the in-degree increments is exactly m by the definition of the PAFFD and PAFUD models. Hence, the final expression is summable almost surely, which proves the almost sure convergence of R_n. For the PAFRO model, we use the same steps as in (4.3) and (4.4), but take the expected value on the left- and right-hand side. Then, using the definition of the PAFRO model, we arrive at By using the tower rule and conditioning on G_{n−1}, we find

Continuing this recursion yields
Using this upper bound in (4.5), we obtain for some constants C 1 , C 2 > 0, which is indeed summable.
For k = 0, we can write ∆R_n as where ∆M^{(1)}_n = 0 and ∆M^{(2)}_n is as in (4.2) with k = 0. We have already proved the summability of the conditional second moment of ∆M^{(2)}_n, which carries over to k = 0 as well, and the last term has a conditional second moment bounded by µ((f, f′])/(n + 1)^2, which is summable too. This proves the almost sure convergence of R_n.
For (ii), we have as Z_{n+1}(i) = Z_n(i) + ∆Z_n(i). We now bound the conditional second moments of ∆R_n by The second line follows from Lemma 4.3 for the PAFFD and PAFUD models and from the conditional non-positive correlation of the ∆Z_n(i) for the PAFRO model. Then, for the PAFUD and PAFFD models, we use that ∆Z_n(i) is a sum of m indicator random variables and hence that its variance can be bounded by m times its mean. Also noting that the sum of all the in-degree increments equals m, we obtain the upper bound (m/(n + 1))^2, which is summable almost surely. For the PAFRO model, we again take the expected value on both sides of (4.7) to remove the conditioning. Then, as the variance of ∆Z_n(i) is bounded by its mean for the PAFRO model, the same approach as used in (4.5) through (4.6) works here as well to arrive at a summable upper bound.
With these lemmas at hand, we can prove Theorem 2.4: Proof of Theorem 2.4. We provide a proof for the PAFFD and PAFUD models; the proof for the PAFRO model follows by setting m = 1, as the additional required adjustments are all included in the proof of Lemma 4.2.
First, we show that Γ_n converges in the weak* topology to Γ, defined in (2.4). To this end, we let 0 ≤ f < f′ < ∞ and set We develop a recursion for X_{n+1} − X_n. By writing Z_{n+1}(i) = Z_n(i) + ∆Z_n(i) and F̄_n := (m_0 + m(n − n_0) + S_n)/n, we find where we note that this holds for the PAFFD as well as the PAFUD model. Then, It is now possible to write the following two bounds: We note that, by the strong law of large numbers, |I_n|/n converges almost surely to µ((f, f′]) and F̄_n converges almost surely to mθ_m, where we recall that θ_m = 1 + E[F]/m. From Lemma 4.2 it follows that R_n := ∑_{k=n_0}^n ∆R_k converges almost surely, so it follows from Lemma 4.1 that almost surely We now take a countable dense subset F ⊂ [0, ∞) such that µ({f}) = 0 for each f ∈ F. As F is countable, there exists an almost sure event Ω_0 on which both statements in (4.9) hold for any pair f, f′ ∈ F with f < f′. Take an arbitrary open set U and approximate U from below by a sequence of sets (U_m)_{m∈N}, where each U_m is a finite union of small disjoint intervals (f, f′] with f, f′ ∈ F. Then, for any m ∈ N, applying a Riemann approximation to (4.9), Hence, by the monotone convergence theorem, it follows that lim inf_{n→∞} Γ_n(U) ≥ Γ(U). Likewise, for any closed set C, a similar argument shows that lim sup_{n→∞} Γ_n(C) ≤ Γ(C). It hence follows from the Portmanteau lemma [18, Theorem 13.16] that Γ_n converges to Γ almost surely in the weak* topology.
The approach to prove the other two parts in (2.3) is to apply induction on k to the convergence of the measures Γ (k) n (and thus p n (k)). We prove the statements in (2.3) hold for k = 0, the initialisation of the induction, below, and show the induction step first. Let us assume that the last two statements in (2.3) hold for all 0 ≤ i < k, for some k ≥ 1. We now advance the induction hypothesis.
Let us take 0 ≤ f < f′ < ∞, and define X_n := Γ_n^{(k)}((f, f′]). Then, with I_n as in (4.8), we can write a recurrence relation, where in the second step we note that Z_{n+1}(n + 1) = 0 < k by definition, and where we isolate the Z_n(i) = k case in the last step. We do this because this will prove to be the only part that does not converge to zero almost surely. We can then write (4.12) and therefore, using that f′ − F_i ≥ 0 holds almost surely for all i ∈ I_n, we arrive at (4.14). We now prove the convergence of all three terms. First, we prove the convergence of A_n to A, as in (4.15). We note that, by the induction hypothesis, (4.16) holds almost surely. We now deal with the two terms in A_n separately, starting with the second. By the definition of the PAFFD and PAFUD models in Definition 2.1, it follows that (4.17) holds for both models. Using this in the second term of A_n in (4.14), we obtain an upper bound with a constant C_m > 0. We note that this expression tends to zero almost surely as n tends to infinity, and that a similar lower bound that tends to zero almost surely can be constructed as well. For the first term, we write (4.18). The first line converges to zero almost surely by the strong law of large numbers. By the induction hypothesis as used in (4.16), the second line converges to zero almost surely, and by a similar argument as in (4.17) the last line converges to zero almost surely. For the third line, we use the definition of Γ, so that, again using similar steps as in (4.17), the third line in (4.18) converges to zero almost surely, which finishes the proof of the almost sure convergence of A_n to A, as in (4.15). For B_n we immediately conclude almost sure convergence. Finally, the almost sure convergence of R_n again follows from Lemma 4.2. We thus obtain the lim inf bound from Lemma 4.1. Likewise, the corresponding lim sup bound can be established from (4.12) as well, when we replace f′ by f in (4.12) and note that f − F_i ≤ 0 holds almost surely for all i ∈ I_n.
We now again take a countable dense subset F ⊂ [0, ∞) such that µ({f}) = 0 for each f ∈ F. As F is countable, there exists an almost sure event Ω_0 on which both (4.19) and (4.20) hold for any pair f, f′ ∈ F with f < f′. A similar argument as in (4.9) and (4.10) can be made, using Riemann approximations and the Portmanteau lemma, which yields the corresponding bounds for any open set U ⊆ [0, ∞) and any closed set C ⊆ [0, ∞), and thus Γ_n^{(k)} converges in the weak* topology to Γ^{(k)}. What remains is to perform the initialisation of the induction, regarding Γ_n^{(0)}. Analogous to the steps in (4.11), we now set X_n := Γ_n^{(0)}((f, f′]). Similar to (4.12), (4.13) and (4.14), we find a recursion, and, as before, the almost sure convergence of R_n follows from Lemma 4.2. Analogously to (4.22), and with a similar reasoning as in (4.21), almost surely Γ_n^{(0)} converges in the weak* topology to Γ^{(0)}, which proves (2.3) and concludes the proof.
We now prove Theorem 2.6. Proof of Theorem 2.6. We start by proving (i). The integrand of the integral in (2.5) can be rewritten, and, as t → ∞ with a fixed, the dominated convergence theorem yields the claim. We now prove (ii), so the fitness distribution satisfies Assumption 2.3. First, let α ∈ (2, 1 + θ_m). We write the integral in (2.5) as two separate integrals by splitting the domain into (0, k) and (k, ∞). We first concentrate on an upper bound; we note that, by symmetry, (4.23) follows as well, and that there exists a constant c > 1 such that the stated inequality holds. Hence, using Assumption 2.3, we can bound (4.23) from above, where the first term follows from the fact that α < 1 + θ_m and that the integral from 0 to 1 is finite. Hence, by [5, Proposition 1.5.8], this is asymptotically of the stated form as k tends to infinity. For a lower bound, we bound the second integral in (4.23) from below by zero, and bound the first integral, using similar steps as before, from below by an expression that is asymptotically, as k tends to infinity, (θ_m^2/(θ_m − (α − 1)))ℓ(k)k^{−α}. Finally, for α = 1 + θ_m, we note that the first term of (4.24) is no longer o(k^{−α}), but of the same order as the other terms. Furthermore, since the integrand in the last line of (4.24) (as well as in (4.25)) now equals ℓ(x)/x, the integral equals ℓ^⋆(k), and it follows from [5, Proposition 1.5.9a] that either ℓ^⋆ converges, in which case this falls under case (i) as the θ_m-th moment exists, or ℓ^⋆ is slowly varying itself. Thus, in the latter case, we obtain upper and lower bounds with the respective stated asymptotics. We also have from [5, Proposition 1.5.9a] that ℓ(k) = o(ℓ^⋆(k)) in the case that ℓ^⋆ diverges as k tends to infinity, which finishes the proof of (ii).
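The Karamata-type asymptotics invoked above ([5, Proposition 1.5.8]) can be sanity-checked numerically in the simplest slowly-varying case ℓ ≡ 1, where the tail integral is exactly k^{1−α}/(α − 1). The following sketch (with a hypothetical exponent α = 2.5) compares a numerical quadrature of the tail integral against the closed form.

```python
import math

# Sanity check of the Karamata-type asymptotics used above, with l(x) = 1:
#   int_k^inf x^(-alpha) dx = k^(1 - alpha) / (alpha - 1).
alpha = 2.5   # hypothetical tail exponent, alpha > 1
k = 10.0

# Trapezoidal rule on a geometric grid from k to k * 1e6; the remaining
# tail beyond the grid is negligible for this alpha.
N = 4000
grid = [k * math.exp(j * math.log(1e6) / N) for j in range(N + 1)]
integral = sum(
    (grid[j + 1] - grid[j]) * (grid[j] ** -alpha + grid[j + 1] ** -alpha) / 2
    for j in range(N)
)
exact = k ** (1 - alpha) / (alpha - 1)
assert abs(integral - exact) / exact < 1e-3
```

For a genuinely slowly-varying ℓ the same computation illustrates the asymptotic equivalence with ℓ(k)k^{1−α}/(α − 1) rather than an exact identity.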
Finally, we turn to (iii). We provide a proof for the PAFFD and PAFUD models with m ≥ 1 first, and then show how the result follows for the PAFRO model as well.
Recall that U_n is a uniformly chosen vertex from [n]. We first condition on the size of the fitness of U_n. Let 0 < β < ((2 − α)/(α − 1) ∧ 1), and let E_n denote the event that Z_n(U_n) = 0 when U_n > n_0. Then, for ε > 0 fixed and n large, the first bound clearly holds, where we use Potter's theorem [5, Theorem 1.5.6], which states that for any fixed ε > 0 and any function ℓ that is slowly varying at infinity, the ratio ℓ(y)/ℓ(x) is bounded by a constant multiple of max{(y/x)^ε, (x/y)^ε} for x, y sufficiently large. For the second probability on the right-hand side of (4.26), using Markov's inequality, applying the tower rule and switching the summations yields an upper bound, writing F̄_n = (m_0 + m(n − n_0) + S_n)/n and M_j := max_{k≤j} F_k, where we bound jF̄_j from below by m_0 + M_j and bound the indicator variables from above by 1. We now bound the first moment from above. Note that, for the PAFFD and PAFUD models, the equality in (4.30) holds, since every vertex i > n_0 has out-degree m. Hence, combining (4.29) and (4.30), by using the tower rule and conditioning on the fitness, we obtain an upper bound for n sufficiently large, with some constant C > 0. We now bound E[1/(m_0 + M_j)] from above.
Here we bound M_j from below by zero and by j^{1/(α−1)−ε} in the first and second expectation, respectively. Then, using 1 − x ≤ e^{−x}, for j large, we obtain a bound in which we use Potter's theorem, as in (4.28), in the last step. By combining (4.32) and (4.33), it follows for j sufficiently large (say j > j_0 for some j_0 ∈ N) that (4.34) holds, which, by the definition of β and the fact that ε is arbitrarily small, tends to zero as n tends to infinity. Finally, we combine (4.34) and (4.27) in (4.26). We now finish the proof of Theorem 2.6(iii) by choosing the optimal value of β ∈ (0, ((2 − α)/(α − 1) ∧ 1)). For the PAFRO model, set m = 1. Then, one adjustment is required: the equality in (4.30) does not hold. Rather, using (4.6) yields an upper bound with some large constant C > 0. This adds at most an extra ε in the exponent of the final expression in (4.35), and since ε is arbitrarily small, the result still holds, which concludes the proof.
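Potter's bound, used twice above, can be illustrated concretely. The sketch below is our own numeric check (not part of the proof): for the slowly-varying function ℓ(x) = log x, one can take ε = 0.1 and a modest constant C so that the Potter inequality holds on a grid of points x, y ≥ e.

```python
import math

# Illustration of Potter's theorem for the slowly-varying l(x) = log x:
# there is a constant C (here C = 5 suffices for eps = 0.1, x, y >= e) with
#     l(y) / l(x) <= C * max((y / x) ** eps, (x / y) ** eps).
eps, C = 0.1, 5.0
xs = [math.e * 1.5 ** j for j in range(60)]
for x in xs:
    for y in xs:
        ratio = math.log(y) / math.log(x)
        bound = C * max((y / x) ** eps, (x / y) ** eps)
        assert ratio <= bound
```

The point of the theorem is that the polynomial factor (y/x)^{±ε} eventually dominates any slowly-varying ratio, however slowly ℓ grows or decays.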

Convergence of point process functionals
As mentioned in the proof overview in Section 3, in this section we complete an important step in the proof of Theorem 2.7 and show convergence of a functional of a point process, as defined in (3.5), in the extreme disorder case (α ∈ (1, 2)). At the same time, we take the opportunity to discuss some of the required theory of point process convergence, which will also be useful in the next section, where we consider the strong disorder case. A good reference for this theory is the book [21].
Recall u_n from Theorem 2.7 and let M_p(E) be the space of point measures (point processes) on E := (0, 1) × (0, ∞). Let us define the point process Π_n as in (5.1), with δ a Dirac measure. It follows from [21, Corollary 4.19] that, when the fitness distribution satisfies Assumption 2.3 for any α > 1, Π_n has a weak limit Π, which is a Poisson point process (PPP) on E with intensity measure ν(dt, dx) := dt × (α − 1)x^{−α} dx. Moreover, [21, Proposition 4.20] shows that an almost surely continuous functional T_1 applied to Π_n converges in distribution to T_1 applied to Π, by the continuous mapping theorem. In this section, we prove a similar result, though a slightly different approach is required.
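The Poisson limit of Π_n can be illustrated by simulation. The sketch below makes simplifying assumptions of our own: a pure Pareto fitness P(F > x) = x^{−(α−1)} for x ≥ 1, so that u_n = n^{1/(α−1)} gives nP(F > u_n) = 1; the number of points of Π_n with rescaled fitness above a level x should then be approximately Poisson with mean ν((0, 1) × (x, ∞)) = x^{−(α−1)}.

```python
import random

# Monte Carlo sketch (pure Pareto fitness assumed, not the general
# Assumption 2.3): count points of Pi_n with F_i / u_n > x and compare the
# average count to the limiting PPP mean x ** -(alpha - 1).
random.seed(1)
alpha, n, runs, x = 1.5, 10_000, 500, 0.5
u_n = n ** (1 / (alpha - 1))
target = x ** (-(alpha - 1))        # limiting mean number of exceedances
total = 0
for _ in range(runs):
    for _ in range(n):
        # inverse-transform sampling: (1 - U) ** (-1/(alpha-1)) is Pareto
        if (1.0 - random.random()) ** (-1 / (alpha - 1)) > x * u_n:
            total += 1
empirical = total / runs
assert abs(empirical - target) < 0.2
```

The same experiment with the counts collected per run also exhibits the Poisson distribution of the number of exceedances, which is the content of [21, Corollary 4.19] in this special case.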
Let ε, δ > 0 and E_δ := (0, 1) × (δ, ∞). For a point measure Π ∈ M_p(E), define T^ε(Π) and T^ε_δ(Π) as in (5.2), whenever these are well-defined; that is, when Π((0, s) × (0, ∞)) > 0 for all s ∈ (ε, 1) and when Π((0, s) × (δ, ∞)) > 0 for all s ∈ (ε, 1), respectively. The main goal of this section is to prove the following proposition. Proposition 5.1. Let the fitness values be i.i.d. copies of a random variable F, which follows a power-law distribution as in Assumption 2.3 with α ∈ (1, 2). Consider the point measure Π_n in (5.1), its weak limit Π and the functional T^ε in (5.2). Then, the stated convergence holds. In order to prove Proposition 5.1, one would normally prove the continuity of the functional T^ε and combine the weak convergence of Π_n with the continuous mapping theorem to yield the required result, as Resnick does in his proof of [21, Proposition 4.20]. This does not, however, work in this case: due to the specific form of the functional, proving its continuity is not directly possible. Therefore, we investigate T^ε_δ as defined in (5.2) and show that this functional is indeed continuous and 'sufficiently close' to T^ε. This is worked out in the following two lemmas. Lemma 5.2. Consider, for ε ∈ (0, 1) and δ > 0 fixed, the operator T^ε_δ as in (5.2). Then, the mapping Π ↦ Σ_{(t,f)∈Π : t>ε, f>δ} δ_{f T^t_δ(Π)} is continuous in the vague topology for measures Π ∈ M_p(E) satisfying the conditions in (5.3). Remark 5.3. We note that for a PPP Π with intensity measure ν as introduced above, all the conditions in (5.3) are satisfied almost surely, except for Π((0, ε) × (δ, ∞)) > 0, which occurs with positive probability only.
Proof of Lemma 5.2. We first prove that, for fixed ε ∈ (0, 1) and δ > 0, the mapping Π ↦ Σ_{(t,f)∈Π : t>ε, f>δ} δ_{T^ε_δ(Π)} is continuous in the vague topology for measures Π ∈ M_p(E). We obtain this by taking Π_n, Π ∈ M_p(E) such that Π_n converges vaguely to Π, and showing that the image of Π_n under the mapping introduced above also converges vaguely to the image of Π. Since the image is a point measure with only finitely many points, due to the last condition in (5.3), we can label the points (t, f) in Π such that t > ε, f > δ by (t_i, f_i), 1 ≤ i ≤ p, for some p ∈ N, where we order the points such that t_i is increasing in i. We can do the same for the points of Π_n, where we add a superscript n. Vague convergence is then equivalent to the convergence of (t^n_i, f^n_i) ∈ Π_n to (t_i, f_i) ∈ Π for all 1 ≤ i ≤ p, since there are only finitely many points.
By [21, Proposition 3.13], we can fix η > 0 and take n large enough such that the balls B_i := B((t_i, f_i), η), centred around (t_i, f_i) with radius η, contain the points (t^n_i, f^n_i) and satisfy B_i ∩ B_j = ∅ for i ≠ j. Thus, let us set q := Π((0, ε) × (δ, ∞)) > 0 and take n large enough such that Π_n((0, ε) × (δ, ∞)) = q as well. That is, the points (t_i, f_i), (t^n_i, f^n_i), i = 1, . . . , q, satisfy t^n_i < ε and the points (t_i, f_i), (t^n_i, f^n_i), i = q + 1, . . . , p, satisfy t^n_i > ε (due to the first condition in (5.3) there are almost surely no points (t, f) with t = ε). We can now express T^ε_δ(Π) as a sum, where we set t_{p+1} := 1. A similar expression follows for Π_n, with t^n_{p+1} := 1. Since the sum contains a finite number of terms, the convergence T^ε_δ(Π_n) → T^ε_δ(Π) immediately follows from the convergence of the individual points. As Π_n converges vaguely to Π, f^n_i → f_i as n tends to infinity for all i = 1, . . . , p as well. What remains to prove is that the terms T^{t^n_i}_δ(Π_n) converge as n → ∞. Using the triangle inequality, we first consider 2 ≤ i ≤ p. The second term on the right-hand side tends to zero by the above, as for i ≥ 2 we have Π_n((0, t_i) × (δ, ∞)) > 0, so that the conditions in (5.3) are satisfied with ε = t_i. The first term can be rewritten using the definition of T^ε_δ in (5.2), where we bound the integrand of the outer integral from above by replacing the integration variable s by t^n_i ∧ t_i in the integral's argument. In the integral that remains, we can bound f from below by δ, and therefore, for n sufficiently large, we can bound the integral from below by δ, as there is always at least one point (t, f) such that t ≤ t^n_i ∨ t_i, since i ≥ 2 and the balls B_i introduced above are disjoint. We thus obtain the upper bound |t^n_i − t_i|/δ, which tends to zero as n → ∞.
For i = 1, we adapt our approach. When t_1 < t^n_1, the first term of the resulting minimum is infinite and we use the second term, while the second term is infinite when t_1 > t^n_1, in which case we use the first term. When the first term is finite (t_1 > t^n_1), its first part is bounded from above by (t_1 − t^n_1)δ^{−1} < η/δ and its second part can be bounded by a constant times η, using (5.4). Similarly, when the second term of the minimum is finite (t_1 ≤ t^n_1), its second part is bounded from above by (t^n_1 − t_1)δ^{−1} < η/δ and its first part can be bounded by a constant times η. As η is arbitrary, the required result holds.
We are also interested in how 'close' T^ε(Π) and T^ε_δ(Π) (resp. T^ε(Π_n) and T^ε_δ(Π_n)) are when δ is small (resp. when δ is small and n is large). We formalise this in the following lemma. Lemma 5.4. Consider the operator T^ε_δ as in (5.2) and the point process Π_n as in (5.1), let Π be its weak limit and let Assumption 2.3 hold with α ∈ (1, 2). Then, for ε ∈ (0, 1) and η > 0 fixed, the convergences in (5.5) hold. Proof. We start by proving the first statement. We fix η > 0 and define the set E^ξ_δ. We condition on {Π(E^ξ_δ) = 0} to ensure that T^ε_δ(Π) is finite, and show that on {Π(E^ξ_δ) = 0} the difference between T^ε_δ(Π) and T^ε(Π) tends to zero in probability as δ ↓ 0. We first compute the second probability on the right-hand side.
Note that, by the choice of ξ, this probability tends to zero with δ. Now, we bound the conditional probability in (5.6).
In the last line, we replaced the integration variable s with 1 in the integral in the numerator and with ε in the integral in the denominator. We now bound the integral over E_δ on the right-hand side from below using Π(E^ξ_δ) ≥ 1, and use Markov's inequality to find an upper bound which tends to zero as δ ↓ 0. Note that we can omit the conditioning in the second line, as the integral is independent of Π(E^ξ_δ). Combining (5.7) and the upper bound of (5.9) in (5.6) implies that T^ε_δ(Π) converges in probability to T^ε(Π) as δ ↓ 0. We now prove the second statement in (5.5), which uses a similar approach. Namely, using analogous steps as in (5.6), (5.8) and (5.9), we obtain a bound in which the second probability on the right-hand side converges to P(Π(E^ξ_δ) = 0) as n tends to infinity, and then to zero as δ tends to zero by (5.7). Using Markov's inequality, we obtain an upper bound for the first probability on the right-hand side. Thus, as n → ∞, since ℓ is slowly varying, and using [21, Corollary 4.19 and Proposition 3.21], we conclude that nℓ(u_n)u_n^{−(α−1)} converges to 1, so that the right-hand side tends to zero with δ. This finishes the proof.
We now prove Proposition 5.1.
Proof of Proposition 5.1. For a closed set C ⊆ R_+ and η > 0, let C^η := {x ∈ R : inf_{y∈C} |x − y| ≤ η} be the η-enlargement of C, and let us define the events E_{n,ε,δ}(η) and F_{n,ε,δ} as in (5.12). Then, on E_{n,ε,δ}(η) and using C^η, we can bound the first probability on the right-hand side from above by the probability that the maximum over εn ≤ i ≤ n with F_i ≥ δu_n lies in C^η. We note that every term in the maximum is bounded from above by 1. Then, since for n large Π_n((ε, 1) × (δ, ∞)) = Π((ε, 1) × (δ, ∞)) < ∞ and on F_{n,ε,δ}, it follows from the continuous mapping theorem, Lemma 5.2 and Remark 5.3 that the limit in (5.14) holds, where F_{ε,δ} := {Π((0, ε) × (δ, ∞)) ≥ 1}. We now claim that it is possible to remove the δ in T^ε_δ(Π) and the δ and ε constraints in the supremum in (5.14), and that the two terms in the last line of (5.13) tend to zero when letting n tend to infinity, and then δ and ε to zero. These two tasks require a very similar approach, as they are essentially the same, one with Π_n and the other with its weak limit Π. We start with the latter claim: we want to show (5.15). We first prove that D_1 tends to zero in probability as δ ↓ 0. Namely, using the triangle inequality and the definitions of T^ε_δ and T^ε in (5.2), the final inequality follows. Since α > 1, sup_{(t,f)∈Π} f < ∞ almost surely. Furthermore, for any ε > 0 fixed, T^ε(Π) < ∞ almost surely as well. We now bound the inner supremum by controlling the size of the maximum fitness value in these sub-intervals. That is, we define, for ξ > 0 and k ∈ Z_+, suitable events whose failure probabilities are both summable. Therefore, by the Borel-Cantelli lemma, it follows that almost surely there exists a random index L such that (5.20) holds for all k ≥ L. Now, on the event {t ≤ 2^{−L}}, applying (5.20) to both integrals, we find an upper bound. Using the definition of ℓ_j in (5.18), for j large and some ζ ∈ (0, α − 1), we obtain a bound with some constant C > 0. Again using (5.20), and on {k > L} (similarly to {t ≤ 2^{−L}}), we find a bound for some γ > (1 + ξ)/(α − 1).
We finish the argument by noting that L < ∞ almost surely, and hence the bound tends to zero as ε ↓ 0. Together with the convergence of D_1 to zero in probability, we obtain (5.15). Recall F_{n,ε,δ} from (5.12) and F_{ε,δ} = lim_{n→∞} F_{n,ε,δ} under (5.14). Evidently, by a similar argument as in (5.7), lim_{δ↓0} P(F_{ε,δ}) = 1 for all ε ∈ (0, 1), which also shows that the third probability in (5.13) tends to zero as n → ∞ and then δ ↓ 0. Combining this with (5.15) and (5.14) yields (5.24). Recall E_{n,ε,δ}(η) from (5.12). What remains to prove is that, for all η > 0 fixed, the analogue of (5.15) holds, where we now deal with Π_n rather than Π. Again, we use the triangle inequality to bound P(E_{n,ε,δ}(η)^c) by two probabilities, the first being P(max_{εn≤i≤n: F_i≥δu_n} (F_i/u_n) T^{i/n}(Π_n) ≥ η/2); we denote the two terms by P_1 and P_2.

(5.25)
We first deal with P 1 . As in (5.16), we split this into two terms, namely

(5.26)
To show that the first probability tends to zero, we rewrite the maximum over εn ≤ i ≤ n with F_i ≥ δu_n. Then, on F_{n,ε,δ}, T^ε_δ(Π_n) converges in distribution to δT^ε_δ(Π) by the continuous mapping theorem and the fact that T^ε_δ is continuous in Π_n, as follows from the proof of Lemma 5.2 and Remark 5.3. So, as δ ↓ 0, T^ε_δ(Π) converges in probability to T^ε(Π), as follows from the proof of Lemma 5.4, which implies that δT^ε_δ(Π) converges in probability to 0 as δ ↓ 0. As before, P(F_{n,ε,δ}) → 1 as n → ∞ and then δ ↓ 0, so intersecting the first probability on the right-hand side of (5.26) with F_{n,ε,δ} and F^c_{n,ε,δ}, as in (5.13), yields that it tends to zero as n → ∞ and then δ ↓ 0. What remains is to show that the second probability on the right-hand side of (5.26) tends to zero as n tends to infinity, then δ ↓ 0 and finally ε ↓ 0. We again use a similar argument as in (5.16): we show that the product of the maximum and T^ε_δ(Π_n) − T^ε(Π_n) converges to zero in probability as first n → ∞ and then δ ↓ 0. We can use the fact that T^ε_δ(Π_n) − T^ε(Π_n) tends to zero in probability as n → ∞ and then δ ↓ 0, as is shown in the proof of Lemma 5.4. In order to extend this result to the product of these two random processes, we introduce the events A_{n,δ} := {max_{i∈[n]} F_i/u_n ≤ δ^{−ξ}}, for some ξ ∈ (0, (2 − α)/2). Then, splitting the second probability on the right-hand side of (5.26) into two parts by using (5.27) and intersecting with the events A_{n,δ} and A^c_{n,δ}, we obtain an upper bound. P(A^c_{n,δ}) converges to P(A^c_δ), where A_δ := {Y ≤ δ^{−ξ}} and Y is the distributional limit of max_{i∈[n]} F_i/u_n. Then, as δ ↓ 0, P(A^c_δ) → 0, as Y is almost surely finite. Following the steps of the argument in (5.10) through (5.11) with ηδ^ξ/4 instead of η, we find a bound which tends to zero as δ ↓ 0. It thus follows that P_1 → 0 as n → ∞ and then δ ↓ 0.
What remains is to show that P_2 tends to zero as n → ∞ and ε ↓ 0. This follows from a similar approach as in (5.17) through (5.23). Recall ℓ_k, h_k from (5.18). We divide the set of indices i ∈ [n] into subsets A_{k,n} := {i ∈ [n] : i ∈ (2^{−(k+1)}n, 2^{−k}n]}, 0 ≤ k ≤ ⌊log n/log 2⌋, and define the events A^F_{k,n} := {max_{i∈A_{k,n}} F_i/u_n ∈ (ℓ_k, h_k)}. Using (5.19), it readily follows that the corresponding probabilities are summable. Hence, letting k_n := ⌊log n/log 2⌋, we arrive at (5.29). By (5.28), it follows that the double limit of the second probability equals zero, so we focus on the first probability. Following the approach in (5.21) and (5.22) and using a Markov bound, we bound the first probability on the right-hand side of (5.29) from above, where we recall that M_j := max_{m≤j} F_m. We then bound the maximum in the second sum from below by considering only the indices m ≤ 2^{−√K}n and using the events in the indicator to further bound the maximum from below by ℓ_{√K}. The terms of the second sum are then independent of j, which yields the upper bound n(ℓ_{√K})^{−1}. We rewrite the first sum, where we note that i ≥ 2^{−(k+1)}n for i ∈ A_{k,n}, and as before bound the maximum from below. Since, for large j, we can bound (ℓ_j)^{−1} from above by 2^{j(1/(α−1)+ζ)} for some small ζ, we obtain the upper bound Cn·2^{(k+1)((2−α)/(α−1)+ζ)} for some constant C > 0. Note that this upper bound, as well as the upper bound stated above for the second sum in (5.30), is deterministic. Hence, using both upper bounds and bounding the indicator in the expectation in (5.30) from above by 1 yields an upper bound, for some γ > 0 and with C_η = (4/η) max{C, 1}, that no longer depends on n; as we let K tend to infinity the bound tends to zero. This proves that P_2 tends to zero as n → ∞ and then ε ↓ 0.
Combining this result with the convergence of P_1 to zero as n → ∞ and then δ ↓ 0, it follows that the upper bound in (5.25) tends to zero, and therefore the two probabilities on the second line of the right-hand side of (5.13) tend to zero as n → ∞, then δ ↓ 0 and finally ε ↓ 0. Together with (5.24), this yields the desired lim sup bound. Letting η ↓ 0 finally yields, by the continuity of the probability measure, the corresponding limit, and applying the Portmanteau lemma [18, Theorem 13.16] finishes the proof.

Martingales and concentration
In this section we state and prove several important results required for the proof of Theorem 2.7. As discussed in the overview of the proof of Theorem 2.7 in Section 3, for α ∈ (1, 2) ∪ (2, 1 + θ_m) the approach to proving Theorem 2.7 is to show that the maximum degree is concentrated around the maximum of the conditional moments of the degrees, and that the latter converges to the right-hand side of (2.10) and (2.12) when α ∈ (1, 2) and α ∈ (2, 1 + θ_m), respectively. To this end, we formulate two propositions; the second states that, similarly, when α ∈ (1, 2), the corresponding concentration holds for any η > 0. Before we prove these two propositions, we introduce a family of useful martingales and derive some of their properties. We define the sequences c^k_n(m) and c̄^k_n(m), for k ∈ R, n, n_0, m, m_0 ∈ N and a, b > −1 such that a − b > −1.
For ease of writing, we omit the (m) in c^k_n(m), c̄^k_n(m) whenever there is no ambiguity. We can then formulate the following lemma. For the PAFRO model (m = 1) and the PAFUD model with out-degree m ∈ N, the random variable M^k_n(i) is a supermartingale (resp. submartingale) with respect to G_{n−1} for n ≥ i ∨ n_0, under the conditional probability measure P_F(·), when k ≥ 0 (resp. k ∈ (−min(F_i, 1), 0)). Finally, for the PAFFD model, M̄^1_n(i) is a martingale with respect to G_{n−1} for n ≥ i ∨ n_0 under the conditional probability measure P_F(·).
Proof. For ease of writing, let us define X_n(i) := Z_n(i) + F_i and ∆X_n(i) := X_{n+1}(i) − X_n(i) = ∆Z_n(i). For the PAFRO model, we use (6.5), as ∆X_n(i) is either 0 or 1. Then, taking the expected value of ∆X_n(i) yields the claim, as c^k_{n+1}(1 + k/(m_0 + (n − n_0) + S_n)) = c^k_n. Note that the conditional mean of M^k_n(i) is finite almost surely as well. For the PAFFD model with out-degree m ∈ N, we can follow the same steps, where we use properties of the Gamma function in the second line and note, in the last step, that x ↦ (x + k)/x is decreasing in x for k ≥ 0. For k ∈ (−min(F_i, 1), 0) the upper bound becomes a lower bound, as x ↦ (x + k)/x is increasing in x in that case. Conditional on G_n, the number of edges that vertex n + 1 connects to i is a binomial random variable with m trials and the appropriate success probability, where we use that a random variable X ∼ Bin(m, p) has probability generating function E[z^X] = (pz + (1 − p))^m, z ∈ R. Then, recalling that for the PAFFD model Σ_{i=1}^n X_n(i) = m_0 + m(n − n_0) + S_n yields the result. For the PAFUD model, we require a few more steps. As the connection of the i-th edge of vertex n + 1 depends on the connections of edges 1, . . . , i − 1, we iteratively condition on G_{n,j}, j = m − 1, m − 2, . . . , 0, the graph with n vertices in which the (n + 1)-st vertex has connected j of its half-edges to the vertices 1, . . . , n. More precisely, letting X_{n,j}(i) := Z_{n,j}(i) + F_i, we write an expression involving 𝟙_{n+1,m,i}, the indicator of the event that the m-th half-edge of vertex n + 1 connects to vertex i. Now, as in (6.6), by the definition of the PAFUD model, the mean of this indicator equals X_{n,m−1}(i)/Σ_{j=1}^n X_{n,m−1}(j) = X_{n,m−1}(i)/(m_0 + m(n − n_0) + (m − 1) + S_n). Hence, we obtain a bound which, when iteratively following the same steps by conditioning on G_{n,j} for j = m − 2, . . . , 0, yields the required result. Finally, we prove that M̄^1_n(i) is a martingale in the PAFFD model.
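The one-step computation for the PAFRO case can be checked by exact enumeration. The sketch below is a simplified stand-in: we take k = 1 and use the normaliser c_{n+1} = c_n(1 + 1/T_n), with T_n the total attractiveness, in place of the paper's c^1_n; exact rational arithmetic makes the martingale identity hold with no rounding.

```python
from fractions import Fraction

# Exact one-step check of the martingale property for a PAFRO-style step
# with k = 1 (simplified normaliser assumed, see lead-in). A new vertex
# attaches one edge to vertex i with probability X_n(i)/T_n, where
# X_n(i) = Z_n(i) + F_i. Then M_n(i) = X_n(i)/c_n satisfies
# E[M_{n+1}(i) | G_n] = M_n(i) exactly.
fitness = [Fraction(1, 2), Fraction(3), Fraction(1)]
degrees = [Fraction(4), Fraction(1), Fraction(2)]
X = [z + f for z, f in zip(degrees, fitness)]
T = sum(X)
c_n = Fraction(1)                       # current normaliser (any value works)
c_next = c_n * (1 + Fraction(1) / T)
for i in range(len(X)):
    # E[X_{n+1}(i) | G_n]: vertex i gains one edge with probability X_n(i)/T
    expected_next = (X[i] / T) * (X[i] + 1) + (1 - X[i] / T) * X[i]
    assert expected_next / c_next == X[i] / c_n   # exact martingale identity
```

The identity E[X_{n+1}(i) | G_n] = X_n(i)(1 + 1/T_n) is exactly the cancellation exploited in the proof above.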
We repeat the steps in (6.7), but note that, as k = 1, we can omit the inequality. As before, we note that ∆X_n(i) is a binomial random variable with mean mX_n(i)/Σ_{j=1}^n X_n(j). Thus the martingale property follows, which finishes the proof.
From Lemma 6.3, we immediately conclude that the (super)martingales M^k_n(i), M̄^k_n(i) converge almost surely, as they are non-negative, to some random variables ξ^k_i, ξ̄^k_i, respectively. To use this in a meaningful way, we look more closely at these almost sure limits, showing that they do not have an atom at zero, and we study the growth rate of the sequences c^k_n, c̄^k_n. We first dedicate a lemma to the latter: the limits in (6.8) and (6.9) hold for some almost surely finite random variables c^k(m), c̄^k(m) (again omitting the (m) whenever there is no ambiguity). Furthermore, the upper and lower bounds in (6.10) hold almost surely for c^k_n(m) when α > 2 (they hold for c̄^k_n(m) as well).
Proof. We only prove the results for c^k_n(1), as the proofs for m > 1 and for c̄^k_n(m) follow similarly. For ease of writing, let θ := θ_1. We start by proving (6.8). We can write out log c^k_n, where we apply a Taylor expansion to the logarithmic terms in the sum for j ≥ n_0 + ⌈2|k|⌉ + 1. The second sum and the last term balance: their sum converges to some finite value depending on k and γ, where γ is the Euler-Mascheroni constant. We now show the almost sure absolute convergence of the third sum in the second line of (6.11), which is implied by the almost sure convergence of a simpler sum. We prove this by showing that the mean of this sum converges. Let ε > 0 be such that the (1 + ε)-th moment of the F_i exists. Using Hölder's inequality, we obtain a bound which converges, as ε > 0. Finally, taking the absolute value of the double sum in (6.11) yields an upper bound, where in the first step we bound m_0 + j − n_0 + S_j from below by j − n_0, and then take all terms with ik < j ≤ (i + 1)k, i ≥ 2, and bound them from below by ik, which yields the same upper bound k times in the third step. The right-hand side equals |k| and thus proves the almost sure convergence of the double sum. This proves (6.8). For (6.9) we use a different approach: we prove that −log c^k_n converges almost surely, which yields the desired result as well. To that end, let M_j := max_{i≤j} F_i. Then, we write a bound in which we use (4.33) in the last step to conclude, by the Borel-Cantelli lemma, that there exists an almost surely finite random index J such that for all j ≥ J, M_j ≥ j^{1/(α−1)−ε}, for some small ε ∈ (0, (2 − α)/(α − 1)). It therefore follows that the upper bound on the right-hand side of (6.13) converges almost surely as n tends to infinity, and therefore so does c^k_n, since −log c^k_n is non-negative and increasing. We now turn to the bounds in (6.10).
Rather than using a Taylor expansion as in (6.11), we simply use that log(1 + x) ≤ x to obtain (6.14). We rewrite E(n) and note that the first and second term are decreasing and the final term is increasing in n. Hence, we obtain an upper bound for all n_0 + 1 ≤ i ≤ n. Using this inequality and taking the absolute value of the terms in the sum in (6.14) yields the required upper bound. Similarly, we find a lower bound of the same form, as in (6.16). Using (6.15) and the fact that Σ_{j=1}^{n−1} 1/j − log n is non-decreasing, we obtain the lower bound, which finishes the proof.
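The growth rate c^k_n ≈ c^k n^{k/θ} asserted by the lemma can be observed in simulation. The sketch below makes assumptions of its own: m = 1, n_0 = m_0 = 1, Exp(1) fitness (so θ = 1 + E[F] = 2), and it reads c^k_n as the product Π_j(1 + k/(m_0 + (j − n_0) + S_j)), which is our rendering of the (elided) displayed definition.

```python
import math
import random

# Monte Carlo sketch (assumptions flagged in the lead-in): the normalised
# sequence c_n^k * n^(-k/theta) should stabilise as n grows, with
# theta = 1 + E[F] for m = 1.
random.seed(7)
k, mean_F = 2.0, 1.0          # Exp(1) fitness, so theta = 2
theta = 1 + mean_F
n0, m0 = 1, 1
S, log_c, ratios = 0.0, 0.0, {}
for j in range(n0, 400_001):
    S += random.expovariate(1 / mean_F)        # S_j = F_1 + ... + F_j
    log_c += math.log1p(k / (m0 + (j - n0) + S))
    if j in (100_000, 400_000):
        ratios[j] = math.exp(log_c - (k / theta) * math.log(j))
# the normalised sequence should have essentially stabilised
assert abs(ratios[100_000] - ratios[400_000]) / ratios[400_000] < 0.02
```

The limiting value is random, since it depends on the whole fitness sequence, exactly as the lemma's almost surely finite random variable c^k(m).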
We now show that the almost sure limits of certain (super)martingales in Lemma 6.3 do not have an atom at zero. Lemma 6.5. For k ≥ 1, consider the martingales M^k_n(i) for the PAFRO and PAFUD models and M̄^k_n(i) for the PAFFD model, as in Lemma 6.3, and their almost sure limits ξ^k_i, ξ̄^k_i, respectively. Then, ξ^k_i and ξ̄^k_i do not have an atom at zero.
Proof. We first focus on the martingales M^k_n for the PAFRO and PAFUD models. Let ε > 0. We can write a bound using [16, Theorem 1]. Now take p ∈ (−min(F_i, 1)/k, 0). The goal is to raise both sides to the power p and use a Markov bound; first, however, we need some other inequalities to obtain useful expressions. Using the concavity of log x and noting that x + pk is a weighted average of x and x + k when p ∈ (0, 1), while x + k is a weighted average of x and x + pk when p ≥ 1, we obtain the inequalities in (6.18) for all x, k ≥ 0. From the first inequality, we also immediately obtain a bound for p ∈ (−1, 0), k ≥ 0, x ≥ k|p|. It thus follows that, when p ∈ (−min(F_i, 1)/k, 0), (c^k_n)^p ≤ c^{kp}_n, as F_i > k|p|. Also, from [25] it follows that (6.20) holds for all x ≥ 0, s ∈ (0, 1). Hence, since Γ(x)/Γ(x + s) is decreasing in x for s ≥ 0, a corresponding bound holds when p ∈ (−1, 0), x ≥ |p|, so that combining (6.19) and (6.20) in (6.17) with p ∈ (−min(F_i/k, 1/k), 0) yields an expression which is finite almost surely and tends to zero with ε almost surely. Hence, almost surely lim_{ε↓0} P_F(ξ^1_i < ε) = 0, and thus P(ξ^1_i = 0) = 0 by the dominated convergence theorem. For the PAFFD model, an altered argument is required, since M̄^k_n(i) is a submartingale for negative k, as follows from Lemma 6.3, so that the final steps in (6.21) no longer work. Rather, we only follow the same steps for ξ̄^k_i in (6.17). Then, for a large constant C > 0, η ∈ (0, E[F]/(E[F] + m)) and a large integer N ≥ i ∨ n_0, we define the stopping time T_N := inf{n ≥ N : Z_n(i) ≥ Cn^{1−η}}. We aim to show that we can construct a sequence ĉ^k_n, to be defined below, such that M̂^k_{T_N∧n}(i) is a supermartingale for k ∈ (−min(F_i, 1), 0) for the PAFFD model. First, recall the computations in (6.7). We notice that the product in the second line contains terms which are positive but less than 1 when k ∈ (−min(F_i, 1), 0).
Therefore, the product decreases as the number of terms increases, so that we can bound the expected value from above by 1 + kP(∆Z_n(i) ≥ 1 | G_n)/(Z_n(i) + F_i). If we define ĉ^k_n := Π_{j=n_0}^{n−1} (1 − kma_j/(m_0 + m(j − n_0) + S_j + kma_j)), with a_n := 1 − (m − 1)/2, we obtain the corresponding bound. We now bound P(∆Z_n(i) ≥ 1 | G_n) from below, using an inequality that holds for all x ∈ (0, 1), m ∈ N. Then, on {T_N ≥ n + 1}, we can bound Z_n(i) from above by Cn^{1−η}, which yields the desired upper bound. Finally, as the event {T_N ≤ n} is G_n-measurable, together with the computations above this indeed shows that M̂^k_{T_N∧n}(i) is a supermartingale for k ∈ (−min(F_i, 1), 0). It also follows relatively easily, following similar steps as in the proof of Lemma 6.4, that ĉ^k_n n^{−k/θ_m} converges almost surely to some random variable ĉ^k as n tends to infinity. We can then write, for k ≥ 1 and p ∈ (−min(F_i/k, 1/k), 0), continuing the steps in (6.17) and using (6.19) and (6.20) as in (6.21), a bound which we intersect with the event {T_N ≥ n + 1} and its complement. Using the Markov inequality for the first probability, and because M̂^{kp}_{T_N∧n}(i) is a supermartingale since kp ∈ (−min(F_i, 1), 0), we find an upper bound whose first term tends to zero with ε. For the second probability we write an upper bound for some s; using the bound on c^s_{n_0}/c^s_n = 1/c^s_n in (6.10), we find an upper bound in which A equals the upper bound in (6.10) with i = n_0. This upper bound is independent of n, so, combining it with (6.22), the right-hand side tends to zero almost surely as N tends to infinity, by the choice of s. Thus, it follows that lim_{ε↓0} P_F(ξ̄^k_i < ε) = 0 for all k ≥ 1. Again using the dominated convergence theorem finally yields the required result.
In order to show that the maximum degree converges almost surely when α > 1 + θ_m, we require a little more control over the (super)martingales M_n^k(i) and M̂_n^k(i) than just their convergence, as we need to be able to bound their suprema. For this we introduce the following lemma.

Proof. We note that the first result is implied if, for any ε > 0, P(sup_{n≥i∨n_0} M_n^k(i) ≥ ε for infinitely many i) = 0, and similarly for M̂_n^k(i). We now use the 'good' event E_ℓ(δ) := {|S_j/j − E[F]| ≤ δ for all j ≥ ℓ}, where we take δ > 0 sufficiently small such that k ∈ (θ_m(1 + δ), I). That is, we intersect with E_ℓ(δ) and E_ℓ(δ)^c. Writing i.o. for 'infinitely often', we find (6.24), where A_i := {sup_{n≥i∨n_0} M_n^k(i) ≥ ε}. We now show that the first probability on the right-hand side equals 0 for every ℓ ∈ N, by showing that the sum of indicators has a finite mean. We write (6.25) and first deal with the conditional expectation. We apply Doob's martingale inequality [22, Theorem II 1.7] to the events A_i to find (6.26), where the first step holds by the monotonicity of the events {sup_{i∨n_0≤n≤N} M_n^k(i) ≥ ε}. Doob's martingale inequality holds for submartingales only, though. However, we can still prove the same upper bound for M̂_n^k(i), but a different technique is required. We define the stopping time τ_ε := inf{n ≥ i ∨ n_0 : M̂_n^k(i) ≥ ε}. Then, for any N ∈ N (see also [22, Exercise 1.25, Chapter II]), we use the optional sampling theorem [26, Theorem 10.10], which yields the required upper bound. Again, by monotonicity and taking N to infinity, we obtain the same result. Now, using (6.26) in (6.25) and using Markov's inequality on E_ℓ(δ), we obtain a bound which is finite by the choice of k and δ. For the second sum we cannot use the event E_ℓ(δ), and therefore bound c^k_{i∨n_0} from above by 1. We note that we can indeed bound the mean of the binomial coefficient (F + (k − 1) + m_0 choose k) by a constant times 1 plus the k-th moment of F.
Namely, using the asymptotics of the Gamma function, this holds with C_2 := max{C_1, ∫_0^{x*} (x + (k − 1) + m_0 choose k) µ(dx)}, where x* is such that (x + (k − 1) + m_0 choose k) ≤ C_1 x^k for x ≥ x*. It follows that the mean in (6.25) is finite and thus that the first probability on the right-hand side of (6.24) equals 0. Hence, the remaining probability tends to 0 as ℓ → ∞ by the strong law of large numbers, and so we obtain (6.23).
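The optional-sampling substitute for Doob's inequality used in the proof can be sketched as follows; this is the standard maximal-inequality argument for a non-negative supermartingale, written out under the assumption that M̂^k(i) is non-negative and started at time i ∨ n_0:

```latex
\varepsilon\,\mathbb{P}\big(\tau_\varepsilon \le N\big)
 \;\le\; \mathbb{E}\Big[\hat M^{k}_{\tau_\varepsilon \wedge N}(i)\,
          \mathbf{1}_{\{\tau_\varepsilon \le N\}}\Big]
 \;\le\; \mathbb{E}\Big[\hat M^{k}_{\tau_\varepsilon \wedge N}(i)\Big]
 \;\le\; \mathbb{E}\Big[\hat M^{k}_{i \vee n_0}(i)\Big],
```

where the first step uses that M̂^k(i) ≥ ε at time τ_ε, the second uses non-negativity, and the third is the optional sampling theorem; letting N → ∞ and using monotonicity gives the Doob-type bound on the supremum.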
The final result we need comes from [1] and provides conditions under which the maximum of a double array converges to a certain limit.

Proposition 6.7 ([1, Proposition 3.1]). Let {a_{n,i} : i ∈ [n]}_{n≥1} be a double array of non-negative numbers such that

(a) for all i ≥ 1, lim_{n→∞} a_{n,i} = a_i < ∞.

Then:

• max_{i∈[n]} a_{n,i} → max_{i≥1} a_i, as n → ∞.
• In addition, there exist I_0 and N_0 such that max_{i∈[n]} a_{n,i} = a_{n,I_0} for all n ≥ N_0.
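The conclusion of Proposition 6.7 can be seen numerically on a toy double array. The example below is purely illustrative and not from [1]: the array a_{n,i} = (1/i)(1 + i/n) converges pointwise to a_i = 1/i, its row maxima converge to max_i a_i = 1, and the argmax freezes at a fixed index I_0 = 1, exactly the behaviour the proposition describes.

```python
# Toy illustration (not from [1]): a_{n,i} = (1/i) * (1 + i/n) satisfies
# lim_n a_{n,i} = a_i = 1/i, so max_i a_i = 1. The row maxima converge to 1
# and are eventually attained at the fixed index I_0 = 1.

def a(n, i):
    """Entry a_{n,i} of the double array."""
    return (1.0 / i) * (1.0 + i / n)

def row_max(n):
    """Return (max_{i in [n]} a_{n,i}, argmax index)."""
    return max((a(n, i), i) for i in range(1, n + 1))

for n in (10, 100, 10_000):
    print(n, row_max(n))
```

Running this shows the row maximum 1 + 1/n shrinking towards 1 while the maximizing index stays at 1 for every n.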
We now prove Proposition 6.1.

Proof of Proposition 6.1. The focus of the proof is on the PAFUD model. The proof for the PAFRO model follows by setting m = 1, and the proof for the PAFFD model follows in the same way, as we only look at the mean of M_n^1(i), which by Lemma 6.3 is a martingale for both the PAFUD and PAFFD models.
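As a purely illustrative aside, the degree dynamics being analysed can be simulated. The sketch below grows a single-edge (m = 1) attachment tree in which a new vertex connects to an old vertex i with probability proportional to deg(i) + F_i, with Pareto-type fitness of tail exponent α − 1; the attachment rule and the fitness law are assumptions for illustration, not the paper's exact PAFRO/PAFUD/PAFFD definitions.

```python
import random

def pareto_fitness(alpha, rng):
    """Sample F with tail P(F > x) = x^{-(alpha-1)}, x >= 1 (inverse transform)."""
    return (1.0 - rng.random()) ** (-1.0 / (alpha - 1))

def grow(n, alpha, seed=0):
    """Grow an additive-fitness preferential attachment tree to n vertices.

    Each new vertex attaches one edge (m = 1) to vertex i with probability
    proportional to deg(i) + F_i. Returns the list of degrees."""
    rng = random.Random(seed)
    fitness = [pareto_fitness(alpha, rng) for _ in range(2)]
    degree = [1, 1]  # start from a single edge between vertices 0 and 1
    for _ in range(n - 2):
        weights = [d + f for d, f in zip(degree, fitness)]
        total = sum(weights)
        r, acc, target = rng.random() * total, 0.0, 0
        for i, w in enumerate(weights):  # roulette-wheel selection
            acc += w
            if r <= acc:
                target = i
                break
        degree[target] += 1
        degree.append(1)
        fitness.append(pareto_fitness(alpha, rng))
    return degree

degrees = grow(500, alpha=3.0)
print(max(degrees), degrees.index(max(degrees)))
```

With this toy rule one can already observe, over repeated seeds, whether the maximal degree sits at an early index (weak disorder) or at an index balancing age and fitness (strong disorder).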
We start by proving (6.1). Take α ∈ (2, 1 + θ_m). Using Lemma 6.3, an expression for the conditional mean directly follows. Note that for i ≥ n_0 the first term on the right-hand side equals zero. We can then construct inequalities for max_{i∈[n]}. By Lemma 6.4, the last term on the right-hand side tends to zero almost surely, as α − 1 < θ_m. By the reverse triangle inequality, it follows that, for x, y ∈ R_+^n, |max_{i∈[n]} x_i − max_{i∈[n]} y_i| ≤ max_{i∈[n]} |x_i − y_i|, which again tends to zero almost surely by Lemma 6.4, as it is a maximum over a finite number of terms. Therefore, assuming the limits exist, it follows that the two maxima have the same limit. Then, let η ∈ (0, (α − 2) ∧ 2) and let (ε_n)_{n∈N} be a sequence such that ε_n := n^{−β}, with β ∈ (0, θ_m η/(1 + (1 + θ_m)η)). We split the maximum into two parts, over indices i which are at most ε_n n and at least ε_n n, and deal with these separately. (Note that β < 1 and thus ε_n n → ∞.) We first define, for A ⊆ [n] and δ > 0,

This yields
We first investigate the latter probability. We write a bound in which the (n/i)^{1/θ_m} term is bounded from above by ε_n^{−1/θ_m}, and take the maximum over the fitness variables and the absolute value separately. It is clear that the first maximum on the right-hand side converges in distribution, as the number of terms in the maximum is of order n, and so the scaling is of the correct order. When i ≥ ε_n n, the indices i tend to infinity with n, which indicates that the terms in the absolute value should be small by Lemma 6.4. We show that the second maximum tends to zero almost surely even when multiplied by ε_n^{−1/θ_m}. In order to prove this, we use the bounds in (6.10). The upper bound, when i ≥ ε_n n, is largest for i = ε_n n. Thus, we have a uniform upper bound for all ε_n n ≤ i ≤ n. For n large, the denominator in the sum can be bounded from below by mj/2 and the term in the logarithm can be bounded from above by 1 + 2(n_0 + 1)/(ε_n n); hence we obtain the corresponding upper bound. Similarly, the lower bound in (6.16) is largest when i = n − 1 (note that the second maximum in (6.32) is never attained at i = n, so we can ignore this case), from which we obtain a lower bound with some constant C > 0. It then follows, as ε_n^{−1/θ_m} = n^{β/θ_m} ≥ 1 and a(e^x − 1) ≤ e^{ax} − 1 for all x ∈ R when a ≥ 1, that we arrive at (6.33). Clearly, the first argument tends to zero, as β < θ_m. What remains to prove is that the second argument of the maximum on the right-hand side of (6.33) converges to zero in probability. The first term in the exponent tends to zero, as 1 − β > β/θ_m by the choice of β. For the second term, we use Markov's inequality for any δ > 0, where we note that η ∈ (0, (α − 2) ∧ 2), so that we can apply the Marcinkiewicz–Zygmund inequality as in (6.12). This yields, for some constant C > 0, an upper bound which tends to zero by the choice of β. It now follows that the right-hand side of (6.33) tends to zero in probability.
This implies, using Slutsky's theorem [24], that for any δ > 0 the bound (6.34) holds. For the first probability on the right-hand side of (6.31), we show that max_{i≤ε_n n} (F_i/u_n)(n/i)^{1/θ_m} tends to zero in probability when n tends to infinity, and that max_{i≤ε_n n} |(c_i^1/c_n^1)(i/n)^{1/θ_m} − 1| converges almost surely. We focus on the former first. The claim is proved by using the Poisson point process (PPP) weak limit. Recall Π_n in (5.1) and its weak limit Π. We write (6.35), where δ denotes a Dirac measure and Π is a PPP on (0, 1) × (0, ∞) with intensity measure ν(dt, dx) := dt × (α − 1)x^{−α} dx [21, Corollary 4.19]. We now define Π′ to be the PPP on R_+ obtained by mapping points (t, f) ∈ Π to f t^{−1/θ_m}, and let Π′_ε be the restriction of Π′ to points (t, f) such that t ≤ ε. Now, we fix arbitrary δ, η > 0. Then, we can find an ε > 0 sufficiently small such that P(max_{(t,f)∈Π: t≤ε} f t^{−1/θ_m} ≥ δ) ≤ η (6.36) is satisfied. Due to (6.35) and the continuous mapping theorem, any continuous functional T of Π_n converges in distribution to T(Π). We use this to compare the laws of max_{i≤εn} (F_i/u_n)(i/n)^{−1/θ_m} and max_{(t,f)∈Π: t≤ε} f t^{−1/θ_m} by defining, for ε ∈ (0, 1], the functional T_ε such that T_ε(Π) := max_{(t,f)∈Π: t≤ε} f t^{−1/θ_m}. Let M_k := {Π ∈ M_p(E) : T_ε(Π) < k}, k ∈ N. Then, on M_k, T_ε is continuous, and thus T_ε is continuous on ∪_{k∈N} M_k. Since the point processes Π with intensity measure ν as described above are such that T_ε(Π) is finite almost surely, as follows from (6.36), Π ∈ M_k for some k ∈ N and thus T_ε is continuous with respect to Π almost surely for any ε ∈ (0, 1]. It follows that, for δ, η fixed, ε chosen such that (6.36) holds and n sufficiently large, the two maxima are close in distribution. As ε_n decreases monotonically, ε_n < ε for n sufficiently large. Hence, it follows that (6.37) holds for n large. We can therefore conclude that max_{i∈[ε_n n]} (F_i/u_n)(i/n)^{−1/θ_m} →P 0 as n → ∞, as η is arbitrary. We now show that max_{i≤ε_n n} |(c_i^1/c_n^1)(i/n)^{1/θ_m} − 1| converges almost surely.
Because of Lemma 6.4, for each fixed i ∈ N, |(c_i^1/c_n^1)(i/n)^{1/θ_m} − 1| converges almost surely to some limit random variable A_i, and A_i ≠ A_j almost surely for all i ≠ j. Using the lower and upper bounds in (6.10), we obtain, for every fixed i ≥ n_0 + 1 and n ≥ i, bounds in terms of a random variable B_i. As the sums in the maximum are almost surely finite for all i ∈ N, as follows from the proof of Lemma 6.4 and the strong law of large numbers, lim_{i→∞} B_i = 0 almost surely. Thus, combining the above steps with Proposition 6.7, we conclude that the maximum converges as n → ∞, and there exist almost surely finite random variables I, N such that the maximum is almost surely attained at index i = I for all n ≥ N. It thus follows that the maximum converges almost surely to an almost surely finite limit A_I. We can now conclude the corresponding convergence as ε_n n → ∞, which, together with (6.37), yields the claim for the first probability. Combining this with (6.31) and (6.34), we obtain (6.30). By a similar argument as before, we find (6.38). Thus, combining (6.29), (6.30) and (6.38) and applying Slutsky's theorem [24], we arrive at the desired result.
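The smallness required in (6.36) can be verified directly from the stated intensity ν(dt, dx) = dt × (α − 1)x^{−α} dx; the following routine computation is included here for the reader:

```latex
\nu\big(\{(t,f) : t \le \varepsilon,\ f\,t^{-1/\theta_m} > x\}\big)
 \;=\; \int_0^{\varepsilon}\!\int_{x t^{1/\theta_m}}^{\infty}
        (\alpha-1)\,f^{-\alpha}\,\mathrm{d}f\,\mathrm{d}t
 \;=\; x^{1-\alpha}\int_0^{\varepsilon} t^{-(\alpha-1)/\theta_m}\,\mathrm{d}t
 \;=\; \frac{x^{1-\alpha}\,\varepsilon^{\,1-(\alpha-1)/\theta_m}}
            {1-(\alpha-1)/\theta_m},
```

which is finite since α − 1 < θ_m in the regime considered, and tends to zero as ε ↓ 0. Hence P(T_ε(Π) > x) = 1 − exp(−ν(·)) ≤ ν(·) can be made smaller than any η by choosing ε small.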
We now prove (6.2), and so we let α ∈ (1, 2). An important result is stated in Proposition 5.1. By the construction of Π_n in (5.1) and the definition of T_ε in (5.2), the sum and the integral agree, as for s ∈ [j/n, (j + 1)/n) the integrand is constant. Recall the result in (6.29) regarding the limit of the maximum conditional mean. The above is therefore implied by the two statements in (6.40). We start by proving the first line of (6.40). Let us write Z_j := m_0 + m(j − n_0) + S_j. By (6.28), an upper bound follows, as the terms within the brackets on the right-hand side are a.s. positive. Then, we further bound the expression on the right-hand side from above by splitting the maximum into two parts, as in (6.41), where i_n is strictly increasing and tends to infinity with n. We first investigate the second maximum, by bounding the terms within the brackets. Now, fix ε > 0. By (4.33) there exists an almost surely finite random variable J such that a bound holds for all j ≥ J with some constant C > 0, as we can bound an exponentially decaying sum by a constant times its first term. It follows, on {i_n ≥ J}, which holds with high probability, and by (6.42), that the second maximum tends to zero in probability when i_n^{−2((2−α)/(α−1)−ε)} u_n/n = o(1), that is, when i_n = n^ρ with ρ ∈ (1/2, 1). On the other hand, when considering the first maximum in (6.41), we find (6.44), where we bound the terms inside the brackets on the left-hand side by omitting all negative terms and by noting that c_i^1 ≤ 1 for all i. The right-hand side of (6.44) converges to zero in probability when u_{i_n}/n = o(1), that is, when i_n = n^ρ with ρ < α − 1, since c_n^1 converges almost surely for α ∈ (1, 2) by Lemma 6.4. We conclude that for α ∈ (3/2, 2) we can find a ρ ∈ (1/2, α − 1) such that both maxima tend to zero in probability. When α ∈ (1, 3/2], however, such a ρ cannot be found and more work is required to prove the desired result.
In this case, we split the maximum into K = K(α) < ∞ maxima, as follows. Let A_{i,n} := F_i/n and B_{i,n} := (c_i^1/c_n^1 − 1) − Σ_{j=i}^n m/Z_j. Then, we define i_n^k := n^{ρ_k}, k = 0, 1, . . . , K, with ρ_0 = 0, ρ_K = 1, where c := 2(2 − α) − 2ε(α − 1) ≠ 1. Note that ρ_k is strictly increasing in k, regardless of whether c < 1 or c > 1. We now write the maximum accordingly, as in (6.45). We first deal with the k = 0 term. As in (6.44), since ρ_1 < α − 1, max_{i_n^0≤i≤i_n^1} A_{i,n} max_{i_n^0≤i≤i_n^1} B_{i,n} tends to zero in probability. For k = 1, . . . , K − 2, following the same steps that lead to the bound in (6.43), we obtain an upper bound with some constant C_k > 0. This upper bound tends to zero in probability when (6.46) is satisfied, which, by the definition of ρ_k, is indeed the case. Finally, for k = K − 1, again using a bound similar to that in (6.43), we find that the final term of the sum in (6.45) converges to zero in probability when ρ_{K−1} ∈ (1/2, 1). What remains to show is that for all α ∈ (1, 3/2] there exists a finite K such that ρ_{K−1} ∈ (1/2, 1). We distinguish two cases: α = 3/2 and α ∈ (1, 3/2). For the first case, c < 1 for any choice of ε. This implies that ρ_k → 1/(2ε) as k tends to infinity, so taking ε < 1 suffices. For α ∈ (1, 3/2), we can choose ε sufficiently small such that c > 1, so that ρ_k diverges. In both cases there therefore exists a K such that ρ_k > 1/2 for all k ≥ K − 1. Thus, in both cases, we can define K := inf{k ∈ N : ρ_k > 1/2} + 1. The only issue left to address regarding K is that it is possible that ρ_{K−1} > 1. However, in that case we can simply choose ρ_{K−1} = a for any a ∈ (1/2, 1), since ρ_{K−2} ≤ 1/2 < a by the definition of K, and decreasing ρ_{K−1} does not violate the constraint in (6.46) for k = K − 2. We hence obtain the first line in (6.40).
The proof for the second line in (6.40) follows similarly. First, letting i = i(n) tend to infinity with n, we establish, conditional on {i ≥ J}, a bound with some constant C ≥ m + m_0. We note that this bound is similar to the upper bound for (c_i^1/c_n^1 − 1) − Σ_{j=i}^n 1/(j + S_j/m) in (6.42). Also, both sums on the left-hand side of (6.47) converge almost surely, as α ∈ (1, 2). Thus, a similar approach, with the same indices i_n^0, . . . , i_n^K, can be used to obtain the desired result. Combining both statements in (6.40) and using the triangle inequality and the continuous mapping theorem proves (6.39), which together with Proposition 5.1 finishes the proof.
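The elementary bound a(e^x − 1) ≤ e^{ax} − 1 for a ≥ 1 and all x ∈ R, invoked around (6.33) in the proof above, follows from a one-line monotonicity check, spelled out here for completeness:

```latex
f(x) := e^{ax} - 1 - a\,(e^{x}-1), \qquad f(0)=0,\\[2pt]
f'(x) = a\,e^{ax} - a\,e^{x} = a\,e^{x}\big(e^{(a-1)x}-1\big)
\begin{cases} \le 0, & x \le 0,\\ \ge 0, & x \ge 0,\end{cases}
```

so f attains its minimum at x = 0 with f(0) = 0, i.e. f ≥ 0 on all of R.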
We now prove Proposition 6.2.

Proof of Proposition 6.2. The focus of the proof is on the PAFUD model, for which we use the martingales M_n^k(i). The proof for the PAFRO model follows by setting m = 1, and for the PAFFD model it follows in a similar fashion, where all upper bounds still hold when the supermartingales M̂_n^k(i) are used instead. We prove (6.3) first. Applying (6.28), a p-th moment bound for some p > 1 to be determined later, Markov's inequality and Hölder's inequality yields a bound in which k > p/2 is an integer. As Z_n(i) − E_F[Z_n(i)] = (Z_n(i) + F_i) − E_F[Z_n(i) + F_i] and 2k is even, we find, using Hölder's and Jensen's inequalities and setting X_n(i) := Z_n(i) + F_i, the expansion in (6.48). Both sums in the last line of (6.48) equal 2^{2k−1}. We can thus bound (6.48) from above by (6.49). We now aim to bound the 2k-th moment of Z_n(i) + F_i. Since, for x ≥ 0 and k ∈ N, x^{2k} ≤ ∏_{j=1}^{2k} (x + (j − 1)) = (x + 2k − 1 choose 2k)(2k)!, it follows from Lemma 6.3 that a corresponding moment bound holds. We note that this inequality would still hold for the PAFFD model, when using the supermartingales M̂_n^k(i) and the sequences ĉ_n^k. We thus obtain the upper bound (6.50), where P_{2k−1}(x) = (2k)! (x + 2k − 1 choose 2k) − x^{2k} is a polynomial of degree 2k − 1. Using (6.27), we find a further bound; using this in (6.50), we obtain an upper bound that contains powers of F_i of order at most 2k − 1. This is the essential step in proving concentration. Namely, in (6.49), this upper bound yields an expression with powers of F_i of order at most p(1 − 1/(2k)), which is just slightly less than p. The aim is, for every value of α > 2, to find values p, k such that the p(1 − 1/(2k))-th moment of F exists and such that the entire expression in (6.49) still tends to zero.
Hence, taking the mean, we obtain

(n^{p/θ_m}/u_n^p) Σ_{i=1}^n i^{−p/θ_m} E[F_i^{p(1−1/(2k))}] ≤ C (n^{p/θ_m}/u_n^p) n^{(1−p/θ_m)∨0},

with C > 0 a constant. This tends to zero with n, as u_n = n^{1/(α−1)} ℓ(n) for some slowly varying function ℓ(n), and both p > α − 1 and θ_m > α − 1 hold. So, the last expression in (6.53) consists of an almost surely finite random variable (the exponential term) and a term that converges to zero in mean, which implies that the entire expression converges to zero in probability. The same argument also holds for all other values of ℓ in (6.52). Thus, as n tends to infinity,

P_F(max_{i∈[n]} |Z_n(i) − E_F[Z_n(i)]| > ηu_n) →P 0. (6.54)

As this conditional probability is bounded from above by one, it follows from the dominated convergence theorem and (6.54) that (6.3) holds.
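The rising-factorial bound used above to control the 2k-th moment amounts to the following chain, where each factor satisfies x + j ≥ x for x ≥ 0:

```latex
x^{2k} \;\le\; \prod_{j=0}^{2k-1}(x+j) \;=\; \frac{\Gamma(x+2k)}{\Gamma(x)}
\;=\; \binom{x+2k-1}{2k}\,(2k)!,
```

and subtracting x^{2k} leaves exactly the polynomial P_{2k−1}(x) of degree 2k − 1 appearing in (6.50): the leading x^{2k} terms cancel, which is what keeps the powers of F_i strictly below 2k.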
We now prove (6.4), so let α ∈ (1, 2). A different approach is required, so we write, using (6.28), a union bound and Chebyshev's inequality, the bound (6.55) in terms of Var_F(M_n^1(i)). We now use the martingale property to split the variance into the variances of the martingale increments.