A central limit theorem for the gossip process

The Aldous gossip process represents the dissemination of information in geographical space as a process of locally deterministic spread, augmented by random long range transmissions. Starting from a single initially informed individual, the proportion of individuals informed follows an almost deterministic path, but for a random time shift, caused by the stochastic behaviour in the very early stages of development. In this paper, it is shown that, even with the extra information available after a substantial development time, this broad description remains accurate to first order. However, the precision of the prediction is now much greater, and the random time shift is shown to have an approximately normal distribution, with mean and variance that can be computed from the current state of the process.


Introduction
A model for the spread of information in space, in which random long-range contacts facilitate spread, was introduced in Aldous (2012). Individuals are represented as a continuum, evenly spread over a two-dimensional torus of large area L. Information spreads locally at constant rate from an individual to his neighbours, so that a disc of informed individuals, centred on an initial informant, grows steadily in the torus. However, information is also spread by long range transmissions to other, randomly chosen points of the torus, according to a Poisson process, whose rate is proportional to the area of currently informed individuals. Any such transmission initiates a new disc of informed individuals. Chatterjee & Durrett (2011) showed that, after some randomness in the initial stages of the process, the proportion of informed individuals settles into an almost deterministic development. This result was generalized to gossip processes on rather general homogeneous Riemannian manifolds by Barbour & Reinert (2013), and to related 'small world' processes, and a uniform bound on the approximation error was also derived. In addition, the equation describing the deterministic development was interpreted in terms of the Laplace transform of the limiting random variable corresponding to an associated Crump-Mode-Jagers (CMJ) branching process.
In this paper, we consider the gossip process $(\mathcal{L}_t,\, t \ge 0)$ evolving on a smooth closed homogeneous Riemannian manifold $C$ of dimension $d$, such as a sphere or a torus, having large finite volume $|C| =: L$ with respect to its intrinsic metric. An individual at $P \in C$ informed at time 0 gives rise to deterministic local spread that informs the set $K(P, s)$ by time $s > 0$; in addition, random 'long range transmissions' to independent and uniformly distributed points of $C$ occur at rate $\rho$ times the intrinsic volume of the set currently informed. Thus the process can be constructed from knowledge of the points $0 = \tau_0 < \tau_1 < \cdots$ of a point process $\Pi$ on $\mathbb{R}_+$, together with an independent sequence of independent points $P_1, P_2, \ldots$, uniformly distributed in $C$, and an initial point $P = P_0$. The informed set is denoted by
$$\mathcal{L}_t := \bigcup_{j \colon \tau_j \le t} K(P_j,\, t - \tau_j), \qquad (1.1)$$
and has intrinsic volume denoted by $L_t$. The point process $\Pi$ is simple, and has conditional intensity $\rho L_t$ at time $t$ with respect to the filtration $(\mathcal{F}_t,\, t \ge 0)$, where $\mathcal{F}_t := \sigma((\tau_j, P_j),\, j \ge 0,\, \tau_j \le t)$.
The sets $K(P, s)$ are assumed to be closed balls, centred at $P$ and of radius $s$, with respect to a metric that makes $C$ a geodesic space: $P' \in K(P, 2t)$ exactly when $K(P, t) \cap K(P', t) \ne \emptyset$. Since $C$ is assumed to be homogeneous, the volume of $K(P, s)$ is independent of $P$, and we will therefore denote it by $\nu_s(K)$. The sets $K(P, s)$ are also assumed to be locally almost Euclidean in the sense that $\nu_s(K) \approx s^d \nu(K)$ for some constant $\nu(K) > 0$. More precisely, we will assume that a quantitative bound of this kind, referred to as (1.2), holds for constants $c_g, \gamma_g > 0$. The quantity $\nu(K) > 0$ has physical dimensions (length/time)$^d$, so that $\nu(K)^{1/d}$ can be interpreted as a local velocity of spread of information in any particular direction. Assumption (1.2) is satisfied, for instance, for balls with respect to geodesic distance on a sphere, with $\gamma_g = 2$ in all dimensions $d \ge 2$.
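The construction just described can be illustrated in the simplest setting. The following is a minimal sketch, assuming a flat two-dimensional torus of side `side` and unit local speed, so that K(P, s) is the disc of radius s in the torus metric; the function names are hypothetical and are not taken from the paper. It uses a standard thinning device: candidate transmissions arrive at the dominating rate ρL, and a candidate is kept only if a uniformly chosen source point lies in the informed set, which happens with probability L_t/L. This gives conditional intensity ρL_t without computing any volumes.

```python
import math
import random

def torus_dist(p, q, side):
    """Geodesic distance between p and q on a flat torus of the given side."""
    dx = abs(p[0] - q[0])
    dx = min(dx, side - dx)
    dy = abs(p[1] - q[1])
    dy = min(dy, side - dy)
    return math.hypot(dx, dy)

def is_informed(point, t, discs, side):
    """True if point lies in some disc K(P_j, t - tau_j) with tau_j <= t."""
    return any(tau <= t and torus_dist(point, centre, side) <= t - tau
               for tau, centre in discs)

def simulate_gossip(rho, side, t_max, rng):
    """Return the pairs (tau_j, P_j) generated up to time t_max (assumed setting)."""
    area = side * side                   # total volume L of the torus
    discs = [(0.0, (0.0, 0.0))]          # initial informant at the origin
    t = 0.0
    while True:
        t += rng.expovariate(rho * area)  # candidate events at rate rho * L
        if t > t_max:
            return discs
        source = (rng.uniform(0, side), rng.uniform(0, side))
        if is_informed(source, t, discs, side):   # accepted w.p. L_t / L
            target = (rng.uniform(0, side), rng.uniform(0, side))
            discs.append((t, target))

def informed_fraction(discs, t, probes, side):
    """Monte Carlo estimate of L_t / L from a fixed set of probe points."""
    return sum(is_informed(p, t, discs, side) for p in probes) / len(probes)
```

For example, with `rng = random.Random(1)`, `discs = simulate_gossip(50.0, 1.0, 0.75, rng)` and a few hundred uniform probe points, `informed_fraction` evaluated at increasing times traces a non-decreasing curve. Transmissions that land in an already informed region are kept in the list, although they are redundant for the informed set.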
Define λ := {ρd!ν(K)} 1/(d+1) , set a dimensionless quantity, and suppose that Λ is large. Then, to start with, the points of Π closely match the birth events of a CMJ process X, whose birth intensity as a function of age s is given by ρν s (K). In fact, the approximation L t of L t , constructed by using the CMJ process X to approximate Π and with the same sequence of points (P j , j ≥ 1), is excellent for times t ≤ αλ −1 log Λ if α < 1/2, and still gives an approximation to the volume L t of L t at time t that is accurate to the first order if α < 1. This approximation takes the form for a constant K, where W is a limiting random variable associated with the CMJ process X. Taking t = t Λ (u) := λ −1 (log Λ + u) (so that u ≤ (α − 1) log Λ is large and negative in the range in which this approximation holds), this implies that L t Λ (u) /L closely follows the curve u → ℓ(u), where ℓ(u) = Ke u , but with a random time shift of log W . Theorem 3.2 of Barbour & Reinert (2013) shows that such an approximation holds, with uniformly small error, for all values of u, provided that the function ℓ is appropriately defined; clearly, for u large and negative, ℓ(u) ∼ Ke u . For any fixed u, the distribution of L t Λ (u) /L is close to that of ℓ(u + log W ), which is a bounded non-degenerate random variable, and is hence not normally distributed. Thus, at first sight, a central limit theorem does not seem natural. However, it may be of interest to predict the size of the informed set at t Λ (u), based on information at a time v = αλ −1 log Λ with α < 1, when the size of the informed set is still relatively small, but there is much more information available than there was at time 0. Here, there is an approximation W (v) to the limit random variable W that is already reasonably accurate (in fact, E{(W (v) − W ) 2 } = O(Λ −α )). 
It is then reasonable to ask whether the difference $\Delta(v) := W(v) - W$, suitably normalized, is approximately normally distributed, and whether the conditional distribution of $L_{t_\Lambda(u)}/L$, given what is known at time $v$, is close enough to that of the expression in (1.5), where $D\ell$ denotes the derivative of $\ell$. Note that, since $\Delta(v)$ has to be multiplied by $\Lambda^{\alpha/2}$ to get a non-trivial normal approximation, the error in approximating $L_{t_\Lambda(u)}/L$ by $\ell(u + \log W)$ has to be shown to be of smaller order than $\Lambda^{-\alpha/2}$ if (1.5) is to be useful. This is a stronger statement than that of Theorem 3.2 of Barbour & Reinert (2013), in which the error is shown only to be of order $O(\Lambda^{-\gamma})$, for some possibly very small $\gamma > 0$.
We carry through the above programme in the next two sections. In the first, a normal approximation is established for $\Delta(v)$. For this, it is easier to work with a 'flattened' CMJ process $\hat X$, rather than with the original CMJ process $X$. The process $\hat X$ has birth rate at age $s$ given by $\rho s^d \nu(K)$, and is thus the same process for all $L$, whereas $X$ depends implicitly on $L$ through the function $\nu_s(K)$. The quantity $\lambda$ then turns out to be the Malthusian parameter of $\hat X$. In a CMJ process with Malthusian parameter $\mu$, at large times, a randomly sampled individual has average age approximately $1/\mu$. For $\hat X$, $\mu = \lambda$, and replacing $s$ by $1/\lambda$ in (1.2) confirms that the two CMJ processes $X$ and $\hat X$ have birth rates that are close to each other if $\Lambda$ is large. The proof of the normal approximation to $\Delta(v)$ is now accomplished by defining a collection of martingales $(W_j(\cdot),\, 0 \le j \le d)$ associated with $\hat X$, with $W(t) := W_0(t)$, defined in (2.12) below, being non-negative and square integrable, having limit $W(\infty) =: W$. It is then shown that $W(v) - W$, suitably normalized, is close enough to the integral of a function $f(W(v), u)$ with respect to an independent standard Brownian motion $B(u)$, giving the normal approximation.
In Section 3, it is shown that (1.5) can be justified with sufficient accuracy, using $\Delta(v) := W(v) - W$ as derived from $\hat X$ above. This involves comparing $X$ and $\hat X$, introducing further flattened CMJ processes $\hat X^+$ and $\hat X^-$ as upper and lower bounds to do so, and then using a forwards-backwards argument analogous to that in Barbour & Reinert (2013), in which more details are to be found. The key observation is that, if the process $\mathcal{L}_t$ starts with a single informed point $P$ at time 0, then a point $P'$ has been informed by time $2t$ exactly when $\mathcal{L}_t \cap \bar{\mathcal{L}}_t \ne \emptyset$, where $\bar{\mathcal{L}}_t$ is the process run backwards from $P'$. The intersection probabilities are computed using flattened CMJ approximations to both forward and backward processes.
To state our theorem, we take as an approximation to $W$, where the set $J_v$ denotes all non-intersecting neighbourhoods of $\mathcal{L}_v$. For each of these, the radii $(v - \tau_j)$ can be determined, and so $W(v)$ can be derived from $\mathcal{L}_v$. Then let $\hat c_d := d!/(d + 1)$, and define $\ell(u)$ : Let $d_{BW}$ denote the bounded Wasserstein distance between probability measures on $\mathbb{R}$:
$$d_{BW}(P, Q) := \sup_{f \in F_{BW}} \Bigl| \int f \, dP - \int f \, dQ \Bigr|,$$
where $F_{BW}$ consists of all Lipschitz functions $f \colon \mathbb{R} \to [-1, 1]$ whose Lipschitz constant is at most 1. The theorem is as follows.
In fact, the proof shows a little more: that we could realize the normal random variables N (0, σ 2 (u, W (v))), u 1 ≤ u ≤ u 0 as σ(u, W (v))N for the same standard normal random variable N, giving a functional version of the theorem. The interpretation of the result is that the fluctuations after v are dominated by the randomness in the period immediately following v, even when v is as large as αλ −1 log Λ, and its effect is the same for all t Λ (u). This at first sight surprising result reflects the phenomenon common to branching processes, that the randomness determining the growth of a super-critical branching process occurs at the very beginning of its development.
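The bounded Wasserstein distance used to state the theorem is a supremum over bounded 1-Lipschitz test functions, so any finite family of such functions yields a lower bound on it. The sketch below illustrates this for two empirical samples, assuming the hypothetical family f_c(x) = max(-1, min(1, x - c)); it does not compute the supremum itself, and the function names are illustrative only.

```python
def clipped_shift(c):
    """Test function x -> clip(x - c, -1, 1): 1-Lipschitz with values in [-1, 1]."""
    return lambda x: max(-1.0, min(1.0, x - c))

def bw_lower_bound(xs, ys, centres):
    """Lower bound on d_BW between the empirical measures of xs and ys."""
    best = 0.0
    for c in centres:
        f = clipped_shift(c)
        mean_x = sum(f(x) for x in xs) / len(xs)
        mean_y = sum(f(y) for y in ys) / len(ys)
        best = max(best, abs(mean_x - mean_y))
    return best
```

For two samples that differ by a shift of 0.5, the bound returned cannot exceed 0.5, because every admissible test function is 1-Lipschitz.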

The branching process
In this section, we define the random variable $W$ as the limit of a martingale $W(t)$ as $t \to \infty$, and then show that $(W(t) - W)$ is approximately normally distributed. We define $W$ by way of a 'flattened' version $\hat X$ of the CMJ branching process $X$. The process $\hat X$ is the counting process associated with a point process $(\hat\tau_j,\, j \ge 0)$ on $\mathbb{R}_+$, with $\hat\tau_0 = 0$ a.s., whose compensator is given by $A(t) := \int_0^t \hat a(u)\,du$, where $\hat a(u) := \rho\nu(K) \sum_{j \colon \hat\tau_j \le u} (u - \hat\tau_j)^d$, and where $\rho$, as before, denotes the intensity per unit volume. At time $t$, $\hat X(t)$ can be thought of as consisting of $M_0(t) := 1 + \max\{r \ge 0 : \hat\tau_r \le t\}$ neighbourhoods, whose volumes at time $t$ are given by $(t - \hat\tau_r)^d \nu(K)$, asymptotically close to, but not the same as, the volume $\nu_{t - \hat\tau_r}(K)$. The intensity $\hat a$ is then precisely that of a CMJ process, in which neighbourhoods play the part of individuals, and an individual of age $s$ has offspring at rate $\rho\nu(K)s^d$. The mean number of offspring of an individual is thus infinite, but the Malthusian parameter $\lambda$, chosen so that the equation
$$\int_0^\infty e^{-\lambda s} \rho\nu(K) s^d \, ds = 1$$
is satisfied, is finite and given by $\lambda := (d!\,\rho\nu(K))^{1/(d+1)}$.
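The flattened process can be simulated exactly by a time change, since its compensator between births is a polynomial in time. The following is a minimal sketch with hypothetical function names. It tracks only the age moments M_i(t) = Σ_j (t - τ_j)^i for 0 ≤ i ≤ d+1, and it assumes that the martingale takes the form W(t) = e^{-λt} Σ_{l=0}^{d} H_l(t) with H_l = M_l λ^l / l!. That form is an assumption of the sketch, chosen to be consistent with E W(t) = 1 and with the intensity being λH_d; it is not quoted from a display in the text.

```python
import math
import random

def malthusian(d, rho_nu):
    """lambda = (d! * rho * nu(K)) ** (1 / (d + 1)), as stated in the text."""
    return (math.factorial(d) * rho_nu) ** (1.0 / (d + 1))

def shift_moments(M, h):
    """Age moments M[i] = sum_j age_j ** i after every age grows by h."""
    return [sum(math.comb(i, k) * M[k] * h ** (i - k) for k in range(i + 1))
            for i in range(len(M))]

def compensator_increment(M, h, d, rho_nu):
    """A(t + h) - A(t) when there are no births in (t, t + h]."""
    n = d + 1
    growth = sum(math.comb(n, k) * M[k] * h ** (n - k) for k in range(n))
    return rho_nu * growth / n

def simulate_W(d, rho_nu, t_end, rng):
    """Simulate the flattened process up to t_end and return the assumed W(t_end)."""
    lam = malthusian(d, rho_nu)
    M = [1.0] + [0.0] * (d + 1)        # a single neighbourhood of age 0
    t = 0.0
    while True:
        e = rng.expovariate(1.0)       # next gap of a unit rate Poisson clock
        remaining = t_end - t
        if compensator_increment(M, remaining, d, rho_nu) < e:
            M = shift_moments(M, remaining)   # no further birth before t_end
            break
        lo, hi = 0.0, remaining        # bisection for the next birth time
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if compensator_increment(M, mid, d, rho_nu) < e:
                lo = mid
            else:
                hi = mid
        M = shift_moments(M, hi)
        M[0] += 1.0                    # a newborn of age 0 changes M_0 only
        t += hi
    H = [M[i] * lam ** i / math.factorial(i) for i in range(d + 1)]
    return math.exp(-lam * t_end) * sum(H)
```

Averaging `simulate_W(2, 0.5, 5.0, rng)` over a few hundred runs should give a value close to 1, and each individual value is strictly positive, in line with the properties claimed for W(t).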
We can immediately deduce some useful general properties of the process X. To start with, it follows from Ganuza & Durham (1974, Theorem 1) that there exist finite constants c 1 and c 2 such that, for all u > 0, Then the intensityâ(u) can be expressed as ρν(K)M d (u), where This in turn implies from (2.1) that using Cauchy-Schwarz for the second inequality. However, X also has special structure that will prove useful in what follows, relating to the sums of the l-th powers of the ages of the neighbourhoods. Note that M d (t) is as defined previously, and that d dt (2.5) Since M 0 has intensityâ = ρν(K)M d , letting Z denote a unit rate Poisson process, we can write Defining H i (t) := M i (t)λ i /i!, for any λ > 0, the equations (2.5) reduce to (2.7) with the particular choice λ := (d!ρν(K)) 1/(d+1) , equation (2.6) becomes where H 1 denotes the process with λ = 1. Note that, since ρ may depend on L, so also may λ.
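The relations between the moment processes referred to in this paragraph can be motivated by a short calculation. The following is a sketch, assuming that $M_i(t)$ denotes the sum of the $i$-th powers of the ages of the neighbourhoods present at time $t$; the lines below are not numbered and need not coincide term by term with displays (2.5) to (2.8).

```latex
\begin{align*}
M_i(t) &:= \sum_{j \colon \hat\tau_j \le t} (t - \hat\tau_j)^i ,
  && 0 \le i \le d+1, \\
\frac{d}{dt} M_i(t) &= i \, M_{i-1}(t),
  && 1 \le i \le d+1, \\
H_i(t) &:= \frac{\lambda^i}{i!} \, M_i(t)
  \quad\text{gives}\quad \frac{d}{dt} H_i(t) = \lambda H_{i-1}(t),
  && 1 \le i \le d+1, \\
\hat a(t) &= \rho\nu(K) M_d(t) = \frac{d! \, \rho\nu(K)}{\lambda^{d}} \, H_d(t)
  = \lambda H_d(t)
  && \text{when } \lambda^{d+1} = d! \, \rho\nu(K).
\end{align*}
```

The second line holds at birth times as well, because a newborn neighbourhood contributes $(t - \hat\tau_j)^i$, which vanishes continuously at $t = \hat\tau_j$ for $i \ge 1$. Only $M_0 = H_0$ jumps, and with the particular choice of $\lambda$ it does so with intensity $\lambda H_d$.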
In order to describe the properties of the process $\hat X$ in more detail, we introduce the (complex valued) processes where $x_j := \exp\{2\pi i j/(d + 1)\} \in \mathbb{C}$, $j \in \{0, 1, \ldots, d\}$, which are martingales with respect to the natural filtration $(\hat{\mathcal{F}}_t,\, t \ge 0)$ of $\hat X$. In particular, for $j = 0$, we have $x_0 = 1$, and is a real valued, càdlàg martingale, and plays a key part in our arguments. It is shown in the next lemma that it is also non-negative, and the rest of the section is then devoted to proving a normal approximation to $e^{\lambda t/2}(W(t) - W(\infty))$, which is the basis for the central limit theorem for the gossip process itself. Note that the distribution of $W(\cdot)$ can be derived from the corresponding martingale $W^1(\cdot)$ for the process with $\lambda = 1$, since, from (2.10), from this, it also follows that the distribution of $W(\infty)$ is the same for all $\lambda$. The remaining martingales $W_j$ are useful, because they enable the quantities $H_j(\cdot)$ to be expressed in a tractable form, as in the next lemma.
Lemma 2.1 W (t) > 0 for all t ≥ 0, and, for 0 ≤ j ≤ d, we have Proof: It follows from (2.7) that, for any x ∈ C, and, by partial integration, that (2.14) Taking x = x j for any j ∈ {0, 1, . . . , d}, we have x d+1 = 1, making the right hand side equal to W j (t), because λH d (u) du = H d+1 (du) = A(du), by (2.7) and (2.8); hence The first statement of the lemma follows by taking j = 0, and the second by using the orthogonality relation d l=0 x l j x r l = (d + 1)δ jr . Now, writing r j := ℜx j and noting thatâ(u) = λH d (u) ≤ λe λu W (u), it follows that, for 0 ≤ j ≤ d, We shall exploit more detailed versions of these asymptotics in Section 3. The distribution of W , through its Laplace transform φ ∞ as in (1.8), already appears in the statement of Theorem 1.1, and is the same for all λ, as remarked following (2.13). Using (2.16), we can now establish some of the key properties of φ 1 s (θ) := E{e −θW 1 (s) }.
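The orthogonality relation for the (d+1)-th roots of unity used in the proof of Lemma 2.1, and the real parts r_j = ℜx_j that govern the asymptotics, are easy to check numerically. The sketch below uses hypothetical function names and writes the relation in the form Σ_l x_j^l x_r^{-l} = (d+1)δ_{jr}, which is an assumption about how the garbled display should read. It also evaluates min{1 - r_1, 1/2}, the quantity that the text identifies with ζ(d).

```python
import cmath
import math

def roots_of_unity(d):
    """x_j = exp(2*pi*i*j / (d + 1)) for j = 0, ..., d."""
    return [cmath.exp(2j * math.pi * j / (d + 1)) for j in range(d + 1)]

def gram(d):
    """Matrix with entries sum_l x_j**l * x_r**(-l); should equal (d + 1) * I."""
    x = roots_of_unity(d)
    return [[sum(x[j] ** l * x[r] ** (-l) for l in range(d + 1))
             for r in range(d + 1)]
            for j in range(d + 1)]

def zeta(d):
    """min(1 - r_1, 1/2) with r_1 = Re(x_1), for d >= 1."""
    r1 = roots_of_unity(d)[1].real
    return min(1.0 - r1, 0.5)
```

Evaluating `zeta` for small d shows that the minimum is 1/2 up to d = 5, and drops below 1/2 from d = 6 onwards, where r_1 = cos(2π/7) exceeds 1/2.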

Lemma 2.2
With φ 1 s defined as above, and for any s, h, θ > 0, we have Proof: We note that W 1 (s) ≥ 0 and that EW 1 (s) = 1 for all s. Then, writing X s (h) := W 1 (s + h) − W 1 (s) and using (2.16), we have This implies that For the second, the argument is similar, based on writing After first taking expectations conditional on F s , this yields In order to use Lemma 2.1 to describe further the behaviour of the H j (t), we need good control of the fluctuations of the processes (W l , 0 ≤ l ≤ d). As indicated by (2.16), their asymptotic behaviour depends substantially on whether or not r l > 1/2. Note, for future reference, that min{(1 − r 1 ), 1/2} = ζ(d), where ζ(d) is as in (1.7).

Lemma 2.3
For any 1 ≤ l ≤ d and 0 < η < min{(1 − r l ), 1/2}, and for any K > 0, define the events Then there exist constants C(l, η), 0 ≤ l ≤ d, such that, for all K > 0, Proof: Combining (2.15) with (2.9), it follows that L (W 0 (s), . . . , W d (s)), s ≥ v | F v depends on F v only through the value of H(v). Then, noting that, for r + η ≤ 1, 1 ≤ l ≤ d and for any t ≥ v, and using Kolmogorov's inequality on the real and imaginary parts of W l , it follows that where the final bounds follow from (2.16). Adding over t ∈ {v + jλ −1 , j ∈ Z + }, and taking gives the result for 1 ≤ l ≤ d. For l = 0, the result is proved in analogous fashion, starting from sup As a result of this lemma, we can sharpen (2.17) by giving an explicit bound on the error made when approximating e −λt H j (t) by W (v)/(d + 1) for any t ≥ v. To state the bound, we define The aim of this section is to prove an approximation theorem, when v is large, for the process X We recall (2.6) and (2.8), and use the representation (2.11), writing Once again, the process X Since the expression (2.22) is too complicated to use directly, we simplify it in a series of stages.
We start by approximating ; the precise result is as follows. Note that, for our purposes, γ η (v) can be thought of as small.
Writing w = H d+1 (u + v) and inverting, it then follows immediately that establishing the lemma.
This now allows (2.22) to be rewritten in the form where Z (2) is a unit rate Poisson process, with respect to which both upper limit and integrand are predictable, the latter being decreasing in w and bounded between for all w ≥ 0, on the event E η 1 (v). In order to show that we can replace both the integrand and the upper limit of integration in (2.26) with simpler expressions, without making too great an error, we use the following standard lemma.
where Z is a Poisson process and the process F is predictable and a.s. bounded in modulus by the deterministic function G.
Proof: For any θ, the process is a supermartingale (van de Geer (1995, p. 1795)), and stopping at a easily yields P sup giving the first conclusion of the lemma. The second follows by choosing We first apply Lemma 2.5 to replace the integrand in (2.26), showing that X Lemma 2.6 With the above definitions, for any η < ζ(d) and any v ≥ v − (η), we have is an integral of the form considered in Lemma 2.5, albeit with a random upper limit, and its corresponding function F satisfies , in view of (2.27). We can thus apply Lemma 2.5 to the process X with F (t) := F (t)1{|F (u)| ≤ G(u), 0 ≤ u < t} and with G(u) := G(u) as in (2.29), noting that then, recalling (2.21), , and the result follows.
The next step is to simplify the upper limit in (2.28), using Lemma 2.5 to show that, with t v (s) as defined in (2.24), (X given by For this, we need to control sup s≥0, |z|<hv(s) |X In the second range of s, we define We need 4ε η (v) here as the bound on the supremum difference, rather than the usual 3ε η (v), because it is possible to have s(1 −g(v)) < s j−1 for some s j < s < s j+1 ; however, it then has to be the case that, for such s, In view of Lemma 2.7 and (2.25), we immediately have the following corollary.
We now show that X (2) v is close in distribution to the process X where, for the integrator, the compensated Poisson process Z (2) (w) − w from X (2) v has been replaced by a standard Brownian motion B(w). Note that e λv/2 X (3) v is itself just a time-changed Brownian motion: Proof: For any r ≥ 1, there are constants C r , K r with the property that, for any n ≥ 1, a standard Poisson process Z and a standard Brownian motion B can be constructed on the same probability space in such a way that P[A c r (n)] ≤ K r n −(r+1) , where v , which we express, by partial integration, in the form Taking the difference, it is immediate that, for 0 ≤ s ≤ e 3λv and on A r (e 3λv ), This shows that, on A r (e 3λv ), The same bound is satisfied also for sup e 3λv ≤s<∞ |X v (e 3λv )|, as can be deduced from the representation (2.35). Now choose v 3 ≥ λ −1 so that 8 exp{−(e/2)(C r λv 3 ) 2 e λv 3 } ≤ e −3rλv , and set v 0 := max{v 1 , v 2 , v 3 }.
Summarizing the conclusions of Lemmas 2.6 and 2.9 and of Corollary 2.8, we have the following theorem. v on the same probability space, in such a way that, for all v ≥ λ −1 c 1 , ) is as defined in (2.31), and the constants c 1 and c 2 can be deduced from Lemma 2.9 with r = 1.
The statement of the theorem involves the σ( H(v))-measurable random variables W (v), Q(v), K(v) and θ i (v), 1 ≤ i ≤ 4, and it is useful to have some idea of their magnitude. To derive appropriate statements, we begin with the random elements W (v) and W l (v), 1 ≤ l ≤ d.
for a suitably chosen w 0 > 0.
Proof: The first part follows from (2.16) and Chebyshev's inequality, and, for W (v), the bound on the upper tail holds because Var W (v) ≤ Var W (∞) ≤ 1 and EW (v) = 1. For the lower tail, note that W (∞) > 0 a.s., so that, because W (·) is càdlàg and positive on R + , we have W * := inf t>0 W (t) > 0 a.s. also. Suppose that w 0 > 0 is such that P[W * ≥ w 0 ] ≥ 1/2. Then W (t) > x if any of the offspring of the initial individual that are born before time t x generate families with W * > w 0 , where e −λtx = x/w 0 . The probability that there are no such offspring is just exp{−ρν( In view of (2.19), if 0 < η < ζ(d), then Q(v) ≤ 3(d + 1) on the event if v is such that e 2ληv/3 ≥ (d + 1) 3 , and hence, for such v, for a suitable constant C 21 ; in addition, and θ 3 (v) are super-exponentially small in λv, and, by the last inequality in Lemma 2.11, an upper bound for the times to be considered in proving the central limit theorem.

The central limit theorem
In order to prove a central limit theorem for the size L t of the informed set L t , we make comparisons between a number of processes by realizing them on the same probability spaces. The process L itself can be realized by starting with the times (τ j , j ≥ 0) of the branching process X, paired with a sequence of independent uniform points (P j , j ≥ 0) of C. This yields a process in terms of which we define We can then define the set valued process obtained by taking the unions of the neighbourhoods generated by Y (t). The process Y can be augmented to a process Y of quadruples, by including a set of pairs ((K(j), Q j ), j ≥ 0), where 0 ≤ K(j) < j and Q j ∈ C, denoting the subsets from which the long range contacts were made and the positions of the individuals within them: given Y (τ j −), and Q j is then chosen uniformly from the set K(P K(j) ,τ j −τ K(j) ). The process L is derived from Y sequentially, by thinning. The pair (τ j , P j ) is not included in L unless K(j) = min{l ≥ 0 : Q j ∈ K(P l ,τ j −τ l )}. This thinning process ensures that, when neighbourhoods overlap in C, only contacts from the neighbourhood that was informed earliest are allowed, ensuring that the rate of long range transmissions from L t remains equal to ρL t . Note that, if P j ∈ Lτ j − , the pair (τ j , P j ) is included in defining L; however, it is redundant in (1.1), the newly informed individual having previously been informed, and it never contributes to further transmission, because of the definition of the thinning step. The resulting set of times and positions we denote by ((τ j , P j ), j ≥ 0), with and L is as given by (1.1); it satisfies L t ⊂ L t , with strict inclusion for all large enough times. The process L acts as a tractable upper bound for L, and it is useful also to have tractable lower bounds. 
In particular, when calculating the probability that a neighbourhood K(P, s) intersects L t , where s is fixed and P is a uniform random point of C, the way in which the neighbourhoods of L t intersect one another enters in a complicated way. However, if L t happened to consist of a union of non-intersecting neighbourhoods, which were also separated from one another by distance at least 2s, then the probability could be deduced by simply adding the intersection probabilities for the individual neighbourhoods. Then, because the neighbourhoods K are balls in a geodesic metric space, the probability of two neighbourhoods K(P, s) and K(Q, t) intersecting, if one or both of P and Q are chosen uniformly and independently in C, is given by q L (s, t) = L −1 ν s+t (K), (3.5) where ν s+t (K) can be estimated in terms of ν(K)(s + t) d , in view of (1.2). Of course, as t grows, intersections occur in L t , but, at least for a while, their effect may not be too large. So the next step is to construct subsets of L t with the necessary separation properties, and which are amenable to analysis. Fix any s, t > 0, and thin the process Y to obtain a set valued process L s,t as follows. Start with τ s,t 0 = 0 and P s,t 0 = P 0 , defining L s,t u := K(P 0 , u) for 0 ≤ u <τ 1 ; let R s,t 0 := ∅ denote the initial set of indices of censored points of Y . Then proceed sequentially. Suppose that the quadruples ((τ l , P l , K(l), Q l ), 0 ≤ l ≤ j − 1) ⊂ Y have already been considered. If K(j) ∈ R s,t j−1 , set R s,t j := R s,t j−1 ∪ {j} and proceed to the next quadruple; descendants of censored points are also censored. If not, thin much as in the construction of L, except that a point P j is also thinned if it belongs to N 2s+t−τ j (L s,t τ j − ), where, for V ⊂ C and u > 0, The extra thinning in (3.6) ensures that the neighbourhoods in L s,t t are at distance at least 2s from one another. 
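The intersection probability q_L(s, t) = L^{-1} ν_{s+t}(K) in (3.5) can be checked by simulation in a simple case. The sketch below assumes a flat two-dimensional torus of side `side` with s + t smaller than half the side, so that ν_{s+t}(K) = π(s + t)^2; the function names are hypothetical. Two discs of radii s and t intersect exactly when their centres are within distance s + t of each other.

```python
import math
import random

def torus_dist(p, q, side):
    """Geodesic distance between p and q on a flat torus of the given side."""
    dx = abs(p[0] - q[0])
    dx = min(dx, side - dx)
    dy = abs(p[1] - q[1])
    dy = min(dy, side - dy)
    return math.hypot(dx, dy)

def intersection_probability(s, t, side, n, rng):
    """Monte Carlo estimate of P[K(P, s) meets K(Q, t)] with Q uniform on the torus."""
    p = (0.0, 0.0)                     # by homogeneity, P can be fixed
    hits = 0
    for _ in range(n):
        q = (rng.uniform(0, side), rng.uniform(0, side))
        if torus_dist(p, q, side) <= s + t:
            hits += 1
    return hits / n
```

With s = 0.1, t = 0.15 and side 1, the estimate should be close to π · 0.25^2, which is approximately 0.196.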
If J s,t u denotes the set of indices of the points of Y that enter L s,t up to time u, then L s,t consists of disjoint neighbourhoods (K(P j , u −τ j ), j ∈ J s,t u ), and new points are generated at rate ρ j∈J s,t u v u−τ j (K)(1 − π s,t u ), where the censoring probability π s,t u is given by In our applications, we can find suitably small bounds for π s,t u , so that the growth of the numbers of neighbourhoods in L s,t is still reasonably close to that of the CMJ process X. In view of the 'hard core' censoring, the points (P j , j ∈ J s,t u ) are no longer independent of one another, but their marginal distribution is still uniform on C if P 0 is chosen at random. Note also that L s,t u ⊂ L u for each s, t ≥ 0 and 0 < u ≤ t. We shall also use comparisons between the CMJ process X and 'flattened' versions X − , X 0 and X + that are of the form discussed in the previous section. We start by noting that, from the inequality (1.2), (3.8) where t max (Λ) is defined in (2.43), and (3.9) Hence, up to time t max (Λ), the process X is stochastically dominated by the flattened process X + , defined as in the previous section, having intensity ρ + := ρ(1 + η Λ ) per unit volume, and hence growth rate λ + := λ{1 + η Λ } 1/d ; similarly, it stochastically dominates the flattened process X − with ρ − := ρ(1 − η Λ ) and λ − := λ{1 − η Λ } 1/d . We also define the flattened process X 0 with intensity ρ per unit volume, and with growth rate λ. The quantities M + j , M 0 j and M − j , and their standardized versions H + j , H 0 j and H − j , correspond to these processes. We make the relationships between the processes precise with the following construction.
Lemma 3.1 Let the successive birth times in the branching processes X, X − , X 0 and X + be denoted by (τ j ,τ − j ,τ 0 j ,τ + j , j ≥ 0), respectively, and let (T t , T − t , T 0 t , T + t ) denote the sets of birth times up to time t in each of the processes. If, for some 0 ≤ s < t max (Λ), T − s ⊂ T s ⊂ T + s and T − s ⊂ T 0 s ⊂ T + s , then the processes X, X − , X 0 and X + can be defined on the same probability space, in such a way that, for all s ≤ t ≤ t max (Λ), Proof: The birth rate of X at time t is given by and of X 0 by , by realizing X + on [s, t max (Λ)] together with an independent sequence of independent random variables (U j , j ≥ 1) uniformly distributed on [0, 1], and then thinning in the following way. At each successive pointτ + j > s, include it as a point of X if U j r( X + , t) ≤ r(X, t); similarly, if U j r( X + , t) ≤ r( X − , t), includeτ + j as a point of X − , and if U j r( X + , t) ≤ r( X 0 , t), includeτ + j as a point of X 0 . This construction preserves the inclusions (3.10) for all times up to t max (Λ), and, because independently thinned Poisson processes are again Poisson processes, also yields the right distributions for the processes X, X 0 and X − .
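The thinning step in the proof of Lemma 3.1 can be sketched as follows. This is a simplified illustration with hypothetical names: it assumes deterministic rate functions of time, whereas in the lemma the rates depend on the histories of the processes themselves. The essential point, that a single uniform variable U_j per point of the dominating process preserves the inclusions, is the same.

```python
import random

def coupled_thinning(times, r_plus, rates, rng):
    """Thin the points of a dominating process, sharing one uniform per point.

    times  : points of the dominating process, which has rate r_plus(t)
    rates  : dict mapping a label to a rate function with rate(t) <= r_plus(t)
    """
    kept = {name: [] for name in rates}
    for t in times:
        u = rng.random()               # the same U_j is used for every sub-process
        for name, rate in rates.items():
            if u * r_plus(t) <= rate(t):
                kept[name].append(t)
    return kept

def poisson_times(rate, t_max, rng):
    """Points of a homogeneous Poisson process of the given rate on [0, t_max]."""
    times, t = [], rng.expovariate(rate)
    while t <= t_max:
        times.append(t)
        t += rng.expovariate(rate)
    return times
```

If the rate functions are ordered pointwise, the retained point sets are nested in the same order, because a point accepted for the smaller rate is automatically accepted for the larger one.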
In what follows, we shall use F ++ t to denote the filtration for the combined construction in Lemma 3.1. We shall henceforth only consider times in [0, t max (Λ)], and will take Λ large enough that exp{2η Λ t max (Λ)} ≤ 2.
The first step in our detailed calculations is to replace L t /L with E{L t /L | F s }, where F s := σ( Y u , 0 ≤ u ≤ s), for suitable s < t; this conditional expectation is easier to handle. We start by bounding the conditional variance Var {L t /L | F s }, for suitable values of s < t.
The basis for our argument is given by the observations that (3.11) where K and K ′ are chosen independently and uniformly in C, implying that On the other hand, where L K t,s denotes the set of all points at time s that, if informed, would inform K by time t. Now, for the gossip process, L K t,s is independent of F s , and has the same distribution as L t−s . In view of (3.13), we thus have where L s is F s -measurable and L K t,s is independent of F s , and with L K t,s and L K ′ t,s independent of F s , but not of each other. Indeed, in view of (3.12), it is the extent of their dependence that measures Var {L t /L | F s }.
Writing t s := t − s, our argument now involves bounding the differences between the probabilities (3.14) and (3.15) and the smaller ones obtained by replacing L K t,s and L K ′ t,s by their related (independent) branching and growth processes L K and L K ′ . These, as observed in the joint construction at the beginning of the section, give rise to stochastically larger sets than L K t,s and L K ′ t,s . If both of the differences (3.16) and (3.17) are smaller than some ε, then the independence of L K and L K ′ immediately implies that Var {L t /L | F s } ≤ 4ε. Using this strategy, we prove the following lemma.
Lemma 3.2 Under the above assumptions, there is a constant C such that Proof: To control the differences (3.16) and (3.17), we begin by running a process Y K , defined following (3.1), until time t s , and thin to obtain L K t,s . As in (3.2), let J We then thin Y K further to construct the process (L 0,ts,K (u), 0 ≤ u ≤ t s ), by the method used to construct L s,t in (3.7).
We now consider the difference which is an upper bound for the real quantity (3.16) of interest to us. The quantity ∆ s,t is no larger than the conditional expectation given F s of the number Z K t,s of intersections between censored islands of L K ts and the islands of L s . If an island born in X K at u is censored, the expected number of censored islands that result at t s is at most c 1 e λ + (ts−u) , by (2.1) and because X K is stochastically dominated by X + . These islands each have radius at most (t s − u). Hence, given F s , and using (C j , j ≥ 1) to denote suitably chosen constants, the expected number of intersections resulting is at most in view of (3.5) and (1.2); N and M are as in (3.2). Similarly, the conditional probability π 0,ts,K u of an island born in X K at u being censored for L 0,ts,K , given the history up to u, is bounded above by Hence, again using N K as an upper bound for the number of uncensored islands, and noting that the birth intensity in X K at time u is at most ρ j∈Ju ν u−τ K j (K), we have Now, by (2.3) and Cauchy-Schwarz, and because X K is stochastically dominated by X + , Using this in (3.18), and noting that λ + ≤ λ(1 + η Λ ), gives the following bound for (3.16): We now need to bound (3.17). This can be done by introducing a process L 0,ts,K,K ′ , constructed in the same way as L 0,ts,K , but starting from two initial points K, K ′ and using a CMJ process X K,K ′ , which is the same as using two independent CMJ processes X K and X K ′ , by the branching property. Now L 0,ts,K,K ′ (t s ) ⊂ ( L K t,s ∪ L K ′ t,s ), and the conditional expectation given F s of the number Z K,K ′ t,s of intersections between censored islands of X K,K ′ ts and the islands of L s satisfies by an argument exactly as before, but for a larger constant C 12 than C 10 appearing in (3.19). Since E{Z K,K ′ t,s | F s } is a bound for the difference in (3.17), we have enough to prove the lemma.
Remark. With s = α 1 λ −1 log Λ and t = α 2 λ −1 log Λ, where α 1 < α 2 ≤ 1, and since Our main interest is in approximating the distribution of L t /L when for u fixed. This is because the times (t Λ (u), u ∈ R) asymptotically represent the period in which L t /L increases from 0 to 1. As observed in the remark, for s := αλ −1 log Λ, . So pick v := α 1 λ −1 log Λ and s := α 2 λ −1 log Λ, with α 1 < α 2 < 1. Then in which the latter term, again by the remark, is typically of Supposing that Var {L t /L | F v } is actually of magnitude Λ −α 1 , this indicates that the conditional distribution of L t /L given F v is essentially that of the conditional distribution of E(L t /L | F s ) given F v . So the next step is to examine E{(1 − L t /L) | F s } in detail, for t = t Λ (u), and to express it in more amenable form. The next lemma once again uses the backward branching process L K from a randomly contains the information about when the islands of L K were formed, up to time v, but not where they are centred. We then write Z s for the number of islands of L K ts that intersect L s . Lemma 3.3 With the definitions above, there is a constant C such that Proof: We start by using (3.11), (3.13) and (3.19) to show that, for t > s, We now use Poisson approximation to approximate the probability P[L K ts ∩ L s = ∅ | F s ], using the conditional independence between the locations of the islands of L K ts , given F K s,t , as the basis of the approximation.
We first observe that the conditional probability that an island of L K ts with radius v intersects L s , given F K s,t , is at most in view of (3.5), and by (1.2) and (3.9). This, using Z s to denote the number of islands of L K ts that intersect L s , implies that and combining this with (3.22) gives the lemma.
We now define (3.25) as an approximation to M K s,t . The following lemma bounds the accuracy of the approximation for t = t Λ (u).
where, for any fixed u 0 , c = c(u 0 ) can be chosen to be uniform in u ≤ u 0 .
This theorem is not quite the same as Theorem 1.1, because both mean and variance are expressed in terms of W * (v), which is not necessarily determined by knowledge of L v alone, since all the birth times of X come into its definition. Instead, one can observe W (v) as in (1.6). We now show that this is enough.
We construct a lower bound $W^-(v)$ for $W(v)$ by considering only the set of birth times of $X$ defined by $\tilde J_v := \{ j \ge 0 : P_j \notin \bigcup_{l \in J_v,\, l \ne j} K(P_l, 2v) \}$, which give rise to neighbourhoods at time $v$ that intersect no other neighbourhood, but not necessarily to all such. Then it is immediate from (1.2) that, for all Λ sufficiently large, the final inequality following from (2.1). Hence, for $v \le t_{\max}(\Lambda)$, and, for $v = \alpha_1 \lambda^{-1} \log \Lambda$, this is of order $O(\Lambda^{-1+\alpha_1} (\log \Lambda)^{2d})$. The most sensitive place where this enters is into $K_1(u, v)$, when the difference has to be small relative to $\Lambda^{-\alpha_1/2}$, because of the factor $e^{\lambda v/2}$; but this is the case if $\alpha_1 < 2/3$, as in the statement of the theorem. The conversion of $E^*(v)$ into an event that can be determined from $\mathcal{L}_v$ can be accomplished in similar fashion, by modifying the definitions of its constituent events in terms of $W_j(v)$, $0 \le j \le d$.