An ergodic theorem for the frontier of branching Brownian motion

We prove a conjecture of Lalley and Sellke [Ann. Probab. 15 (1987)] asserting that the empirical (time-averaged) distribution function of the maximum of branching Brownian motion converges almost surely to a double exponential, or Gumbel, distribution with a random shift. The method of proof is based on the decorrelation of the maximal displacements for appropriate time scales. A crucial input is the localization of the paths of particles close to the maximum that was previously established by the authors [Comm. Pure Appl. Math. 64 (2011)].


Introduction
Branching Brownian Motion (BBM) is a continuous-time Markov branching process which plays an important role in the theory of partial differential equations [5,6,27], in particle physics [28], in the theory of disordered systems [9,17], and in mathematical biology [20,23]. It is constructed as follows. Consider a standard Brownian motion x(t), starting at 0 at time 0. We consider x(t) to be the position of a particle at time t. After an exponential random time T of mean one, independent of x, the particle splits into k particles with probability p_k, where Σ_{k≥1} p_k = 1, Σ_{k≥1} k p_k = 2, and Σ_k k(k−1) p_k < ∞. The positions of the k particles are independent Brownian motions starting at x(T), and each of these processes has the same law as that of the first Brownian particle. Thus, after a time t > 0, there will be n(t) particles located at x_1(t), …, x_{n(t)}(t), with n(t) the random number of offspring generated up to that time (note that E n(t) = e^t).
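The construction above is easy to simulate. The following sketch is ours, not from the paper: it assumes binary branching, i.e. p_2 = 1 (which satisfies the stated moment conditions), and the function name and parameters are our own choices.

```python
import math
import random

def simulate_bbm(t_max, rng, branching_rate=1.0):
    """Simulate binary branching Brownian motion up to time t_max.

    Each particle diffuses as a standard Brownian motion and, after an
    independent Exp(1) lifetime, splits into two particles (p_2 = 1, so
    the mean number of offspring per branching event is 2, as in the text).
    Returns the list of particle positions at time t_max.
    """
    alive = [(0.0, 0.0)]  # stack of (current time, current position)
    final_positions = []
    while alive:
        t, pos = alive.pop()
        lifetime = rng.expovariate(branching_rate)
        if t + lifetime >= t_max:
            # The particle survives to the horizon: diffuse for t_max - t.
            final_positions.append(pos + math.sqrt(t_max - t) * rng.gauss(0.0, 1.0))
        else:
            # Diffuse until the branching time, then split into two.
            branch_pos = pos + math.sqrt(lifetime) * rng.gauss(0.0, 1.0)
            alive.append((t + lifetime, branch_pos))
            alive.append((t + lifetime, branch_pos))
    return final_positions

positions = simulate_bbm(3.0, random.Random(0))
```

Averaging the number of particles over many independent runs recovers E n(t) = e^t approximately.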
An interesting link between BBM and partial differential equations was observed by McKean [27]. If one denotes by

u(t, x) = P[ max_{k≤n(t)} x_k(t) ≤ x ] (1.1)

the distribution function of the maximal displacement, then u solves the Kolmogorov–Petrovsky–Piscounov, or F-KPP, equation

u_t = (1/2) u_xx + Σ_{k≥1} p_k u^k − u. (1.2)

This equation has raised a lot of interest, in part because it admits traveling wave solutions: there exists a unique solution satisfying

u(t, m(t) + x) → ω(x) uniformly in x as t → ∞, (1.3)

with the centering term, the front of the wave, given by

m(t) = √2 t − (3 / (2√2)) ln t, (1.4)

and ω(x) is the unique solution (up to translation) of the o.d.e.

(1/2) ω_xx + √2 ω_x + Σ_{k≥1} p_k ω^k − ω = 0. (1.5)

The leading order of the front has been established by Kolmogorov, Petrovsky, and Piscounov [24]. The logarithmic corrections have been obtained by Bramson [10], using the probabilistic representation given above. Equations (1.1) and (1.3) show the weak convergence of the distribution of the recentered maximum of BBM. Let

M(t) ≡ max_{k≤n(t)} x_k(t) − m(t), (1.6)

and define, for k = 1, …, n(t),

z_k(t) ≡ √2 t − x_k(t). (1.7)

With this notation, we consider the quantities

Y(t) ≡ Σ_{k≤n(t)} e^{−√2 z_k(t)} and Z(t) ≡ Σ_{k≤n(t)} z_k(t) e^{−√2 z_k(t)}. (1.8)

In 1987, Lalley and Sellke [25] proved that

lim_{t↑∞} Y(t) = 0 a.s. and lim_{t↑∞} Z(t) = Z a.s., (1.9)

where Z is a strictly positive random variable with infinite mean. This paper is concerned with the large time limit of the empirical (time-averaged) distribution of the maximal displacement,

F_T(x) ≡ (1/T) ∫_0^T 1{M(t) ≤ x} dt. (1.10)

The main result is that F_T converges almost surely as T → ∞ to a random distribution function. The limit is the double exponential (Gumbel) distribution shifted by the random variable Z:

Theorem 1 (Ergodic Theorem). For any x ∈ R, almost surely,

lim_{T↑∞} F_T(x) = exp( −C Z e^{−√2 x} ), (1.11)

where C > 0 is a positive constant.
The derivative martingale Z encodes the dependence on the early evolution of the system. The mechanism behind this is subtle, and we shall first provide some intuition in the next section.
The limit (1.11) was first conjectured by Lalley and Sellke in [25]. They showed that, despite the weak convergence (1.3), the empirical distribution F_T(x) cannot converge to ω(x) in the limit of large times (for any x ∈ R), and proved that the latter is recovered when Z is integrated, i.e.

E[ exp( −C Z e^{−√2 x} ) ] = ω(x). (1.12)

The issue of ergodicity of BBM has also been discussed by Brunet and Derrida in [14].
Ergodic results similar to Theorem 1 can be proved for statistics of extremal particles of BBM other than the distribution of the maximum. (This will be detailed in a separate work). Throughout the paper, we use the term extremal to denote particles at distance of order one from the maximum. We also refer to the level of the maximum of the positions as the edge, or frontier.
A description of the law of the statistics of extremal particles has been obtained in a series of papers of the authors [2,3,4] and in the work of Aïdékon, Berestycki, Brunet, and Shi [1]. It is now known that the joint distribution of extremal particles recentered by m(t) converges weakly to a randomly shifted Poisson cluster process. The positions of the clusters form a randomly shifted Poisson point process with exponential density. The law of the individual clusters is characterized in terms of a branching Brownian motion conditioned to perform unusually large displacements. A description of such conditioned BBMs has been given by Chauvin and Rouault [15].
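For illustration only (this is ours, not part of the paper's argument): a Poisson point process with exponential density, as in the description of the cluster positions, is straightforward to sample. The sketch below draws all atoms above a cutoff of a Poisson point process with intensity const · e^{−√2 x} dx; the helper names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def poisson_sample(lam, rng):
    """Poisson(lam) sample via Knuth's product-of-uniforms method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_exponential_ppp(const, cutoff, rng):
    """Sample all atoms above `cutoff` of a Poisson point process on R with
    intensity const * exp(-sqrt(2) x) dx.  The intensity integrates to
    const * exp(-sqrt(2) cutoff) / sqrt(2) above the cutoff, so only
    finitely many atoms appear there (while infinitely many lie below)."""
    mass_above = const * math.exp(-SQRT2 * cutoff) / SQRT2
    n = poisson_sample(mass_above, rng)
    # Given n atoms above the cutoff, they are i.i.d. with density
    # sqrt(2) * exp(-sqrt(2) (x - cutoff)) on [cutoff, infinity).
    return sorted(cutoff + rng.expovariate(SQRT2) for _ in range(n))

atoms = sample_exponential_ppp(1.0, 0.0, random.Random(0))
```

The finiteness above any level, together with accumulation of atoms at −∞, is the characteristic picture of the extremal process.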
We point out that the interest in the properties of BBM stems also from its alleged universality: it is conjectured, and in some instances also proved, that different models of probability and of statistical mechanics share many structural features with the extreme values of BBM. A partial list includes the two-dimensional Gaussian free field [7,8,12], the cover times of graphs by random walks [18,19], and in general, log-correlated Gaussian fields, see e.g. [16,21].

Outline of the proof
It will be convenient to work with compact intervals D = [d, D], −∞ < d < D < ∞, for the localization procedure introduced in Section 4. Convergence of the empirical distribution on these sets implies convergence of the distribution function F_T(x). The proof of Theorem 1 goes as follows. First, we introduce a "cutoff" ε > 0 and split the integration over the sets (Tε, T] and [0, Tε]. Precisely, with the above notations, we write

F_T(x) = (1/T) ∫_{εT}^{T} 1{M(t) ≤ x} dt + (1/T) ∫_{0}^{εT} 1{M(t) ≤ x} dt. (2.1)

The second term on the r.h.s. above is at most ε, and hence does not contribute in the limit T ↑ ∞ first and ε ↓ 0 next. It thus suffices to compute the double limit for the first term. To this aim, we introduce the time R_T > 0, which will play the role of the early evolution. Its precise form is not particularly important and we will specify a choice only later. For the moment we only require that R_T → ∞ as T ↑ ∞, but moderately, i.e. R_T = o(√T) in the considered limit of large times. With F_{R_T} the σ-algebra generated by the BBM up to time R_T, we rewrite the empirical distribution as

(1/T) ∫_{εT}^{T} 1{M(t) ≤ x} dt = (1/T) ∫_{εT}^{T} P[ M(t) ≤ x | F_{R_T} ] dt + (1/T) ∫_{εT}^{T} ( 1{M(t) ≤ x} − P[ M(t) ≤ x | F_{R_T} ] ) dt. (2.2)

We now state two theorems which immediately imply Theorem 1: Theorem 2 below addresses the first term on the r.h.s. of (2.2), while Theorem 3 addresses the second term.
Theorem 2 (Convergence of the conditional law). For any x ∈ R and any s ∈ (ε, 1], almost surely,

lim_{T↑∞} P[ M(sT) ≤ x | F_{R_T} ] = exp( −C Z e^{−√2 x} ). (2.3)

The above statement is an improvement of [25, Theorem 1], where the probability was conditioned on a fixed time that only subsequently was let to infinity. The proof closely follows this case and relies on precise estimates of the law of the maximal displacement obtained by Bramson [11].
Theorem 2, together with a change of variables and bounded convergence, implies

lim_{T↑∞} (1/T) ∫_{εT}^{T} P[ M(t) ≤ x | F_{R_T} ] dt = (1 − ε) exp( −C Z e^{−√2 x} ) a.s., (2.4)

which, in the subsequent limit ε ↓ 0, is the r.h.s. of (1.11). The integrand of the second term on the r.h.s. of (2.2) has mean zero. Therefore, Theorem 1 would immediately follow from the above considerations if a strong law of large numbers held. This turns out to be correct.

Theorem 3 (Strong Law of Large Numbers). Almost surely,

lim_{T↑∞} Rest_{ε,D}(T) = 0,

where Rest_{ε,D}(T) denotes the second term on the r.h.s. of (2.2), with x ∈ D = [d, D].

Contrary to the case of Theorem 2, whose short proof is given in Section 3, the Strong Law of Large Numbers (SLLN) turns out to be quite delicate. Due to the possibly strong correlations among the Brownian particles, it is perhaps surprising that a law of large numbers holds at all. Let T be large and consider two times s, s′ ∈ [0, T]. It is clear that if the distance between s and s′ is of order one, say, then the extremal particles at s are strongly correlated with those at s′, since the children of extremal particles are very likely to remain extremal for some time. Therefore, s and s′ need to be well separated for the correlations to be weak. On the other hand, and this is the crucial point, it is generally not true that the correlations between the extremal particles at times s and s′ decay as the distance between s and s′ increases. As shown by Lalley and Sellke [25, Theorem 2 and corollary], "every particle born in a branching Brownian motion has a descendant particle in the lead at some future time". Hence, if s and s′ are too far from each other (for example, if s is of order one with respect to T and s′ is of order T), correlations build up again and mixing fails. Therefore, weak correlations between the frontiers at two different times only set in at precise time scales. It turns out that if s and s′ are both of order T, s, s′ ∈ [εT, T], and well separated, i.e. |s − s′| > T^ξ for some 0 < ξ < 1, then the correlations between the frontiers are weak enough to provide a law of large numbers. By weak enough, we understand a summability condition on the correlations that leads to a SLLN by a theorem of Lyons, see Theorem 8 below.
See Figure 1 for a graphical representation. A precise control of the correlations is achieved by controlling the paths of extremal particles in the spirit of [2] (see Section 4 below for precise statements).

Almost sure convergence of the conditional maximum
We start with some elementary facts that will be of importance. First, observe that for t, s > 0 with s = o(t) as t ↑ ∞, the level of the maximum (1.4) satisfies

m(t) = √2 s + m(t − s) + o(1). (3.1)

Let {x_k^{(j)}(t − s), k ≤ n^{(j)}(t − s)}, j = 1, …, n(s), be independent, identically distributed BBMs. The Markov property of BBM implies that, in distribution,

max_{k≤n(t)} x_k(t) = max_{j≤n(s)} { x_j(s) + max_{k≤n^{(j)}(t−s)} x_k^{(j)}(t − s) }. (3.2)

In particular, if F_s denotes the σ-algebra generated by the process up to time s, the combination of (3.1) and (3.2) yields, for X ∈ R,

P[ ∀ k ≤ n(t) : x_k(t) ≤ X | F_s ] = Π_{j≤n(s)} P[ max_{k≤n(t−s)} x_k(t − s) ≤ X − x_j(s) ]. (3.3)

We will typically deal with situations where only a subset of {k : k = 1, …, n(t)} appears. In all such cases, the generalization of (3.3) is straightforward.
A key ingredient to the proof of Theorem 2 is a precise estimate on the right-tail of the distribution of the maximal displacement. It is related to [4,Proposition 3.3], which heavily relies on the work by Bramson [11].
Lemma 4. Consider t ≥ 0 and X(t) ≥ 0 such that lim_{t↑∞} X(t) = +∞ and X(t) = o(√t) in the considered limit. Then, for X(t) and t both greater than 8r,

γ(r)^{−1} C X(t) e^{−√2 X(t)} ≤ P[ max_{k≤n(t)} x_k(t) ≥ m(t) + X(t) ] ≤ γ(r) C X(t) e^{−√2 X(t)}, (3.7)

for some γ(r) ↓ 1 as r → ∞ and C as in (1.12).
We lighten notation by setting (3.8), and rewrite (3.7) accordingly as (3.9). By a dominated convergence argument [11, Prop. 8.3 and its proof], one can prove that the limit defining C(r) exists, uniformly for x in compacts. In fact, Bramson's argument easily extends to the case where x = o(√t) (to see this, one simply expands the quadratic term in the Gaussian density appearing in the definition of the function G). Moreover, C(r) → C as r → ∞, with C as in (1.12); see [11, pp. 145-146]. By Taylor expansion, (3.11) holds for some function f(t, r; x, y′) which is integrable with respect to G(t, r; x, y′) dy′.
Plugging (3.11) into (3.9), we get the bounds (3.12). The claim of the Lemma then follows by taking x ≡ X(t) in (3.12) and using (3.10).
Proof of Theorem 2. This is a straightforward application of Lemma 4 and of the convergence of the derivative martingale. We first split the quantity of interest into two terms; we will prove almost sure convergence of the first, the proof for the second being identical. Since s is in (ε, 1), we have R_T = o(T·s) as T ↑ ∞. Therefore, by (3.1) and (3.2), and writing P_M for integration with respect to the maximum, the conditional probability factorizes over the particles alive at time R_T. It immediately follows from the almost sure convergence of the derivative martingale that (3.15) holds. We may therefore use Lemma 4 to establish upper and lower bounds for the probability of the maximum being larger than D + y_k(R_T); precisely (3.16). The main contribution to both bounds comes from the z_k-terms defined in (1.7). Precisely, we write (3.16) as (3.17). By (3.17), using that −a − a² ≤ ln(1 − a) ≤ −a (valid for 0 < a < 1/2), and with the above notations, we obtain (3.19). To see that the ω-terms in the lower bound do not contribute in the limit T ↑ ∞ (recall that R_T ↑ ∞ as well), we observe that they are controlled, for some κ > 0 large enough, in terms of Y(R_T) as in (1.8), which vanishes by (1.9). Therefore the ω-term in the lower bound does not contribute in the limit T ↑ ∞.
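As a quick numerical sanity check (ours, not the paper's), the elementary logarithm bound used above, in the form −a − a² ≤ ln(1 − a) ≤ −a for 0 < a < 1/2, can be verified on a grid:

```python
import math

def check_log_bound(n_grid=1000):
    """Check -a - a**2 <= log(1 - a) <= -a for a on a grid in (0, 1/2)."""
    for i in range(1, n_grid):
        a = 0.5 * i / n_grid
        v = math.log(1.0 - a)
        assert -a - a * a <= v <= -a, (a, v)
    return True

check_log_bound()
```

The quadratic gap between the two sides is exactly what produces the square terms Ω_k(R_T)² controlled below.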
Concerning the upper bound, the same argument as for the ω-term, together with the fact that Z(R_T) → Z as T → ∞ by (1.9), implies that the corresponding contribution vanishes. The same is thus also true for Σ_{k≤n(R_T)} Ω_k(R_T)². It remains to show that Z^{(2)}(R_T) → 0 almost surely, but this is evident, since this sum is bounded from above by a sum of two terms, and both terms tend to zero, a.s., as T ↑ ∞ by (3.15) and (1.9). Therefore, by (3.19), the claim (3.23) holds almost surely. This concludes the proof of Theorem 2.

The strong law of large numbers
This section is organized as follows. We introduce in subsection 4.1 a procedure concerning properties of the paths of extremal particles which we will refer to as localization. It is based on the description of the genealogies of extremal particles established in [2]. The details of the proof are given in subsection 4.2.
4.1. Preliminaries and localization of the paths. The following fundamental result by Bramson provides bounds on the right tail of the maximal displacement. These bounds are not optimal (they are surpassed by those of Lemma 4, which are tight), but they are sufficient and simpler.
We also recall an important property of the paths of extremal particles established by the authors in [2]. We introduce some notation. With t ∈ R₊ and γ > 0, we now choose values and introduce the time-t entropic envelope and the time-t lower envelope, respectively (recall that m(t) is the level of the maximum of a BBM of length t). The space/time region between the entropic and the lower envelope will be denoted throughout as the time-t tube, or simply the tube. By a slight abuse of notation, given a particle k ≤ n(t) which is at position x_k(t) at time t, we refer to its path as x_k(s), 0 ≤ s ≤ t. Moreover, we will say that a particle k is localized in the time-t tube during the interval (r, t − r) if and only if its path lies in the tube for all s ∈ (r, t − r); we say that it is not localized if this requirement fails for some s in (r, t − r). The following proposition gives strong bounds on the probability of finding particles that are close to the level of the maximum at given times but not localized. It follows directly from the bounds derived in the course of the proof of [2, Corollary 2.6]. What lies behind the Proposition is a phenomenon of "energy vs. entropy" which is absolutely fundamental for the whole picture. This is explained in detail in [2], but, for the reader's convenience, we briefly sketch the argument.
As it turns out, at any given time s ∈ (r, t − r) well inside the lifespan of a BBM, there are simply not enough particles lying above the entropic envelope for their offspring to make the jumps which eventually bring them to the edge at time t. On the other hand, although there are plenty of ancestors lying below the lower envelope, their position is so low that again none of their offspring will make it to the edge at time t. A delicate balance between number and positions of ancestors has to be met, and this feature is fully captured by the tubes.
With δ = δ(α, β, D) as in Proposition 6, we define r_T through (4.9). We now consider the maximum of the particles at time s that are also localized during the interval (r_T, s − r_T); see Figure 2 for a graphical representation. We denote this maximum by M^loc(s). With this notation, Proposition 6 and the choice (4.9) yield the bound we shall use below. We pick R_T ≡ 40 · r_T, with r_T as in (4.9). This choice clearly satisfies R_T = o(√T), as required in Theorem 2. We emphasize that the prefactor is a matter of choice; only the condition R_T > r_T is needed.
We assume henceforth without loss of generality that both T and εT are integers.

4.2. Implementing the strategy. Recall that Theorem 3 asserts that Rest_{ε,D}(T) tends to zero as T goes to ∞. In order to prove the claim, we consider Rest^loc_{ε,D}(T), defined as Rest_{ε,D}(T) but with the requirement that all particles in D are localized, cf. (4.12). We now claim that the large-T limit of Rest^loc_{ε,D}(T) and that of Rest_{ε,D}(T) coincide (provided one of the two exists, but this will become apparent below).

Lemma 7. With the above notation, Rest_{ε,D}(T) − Rest^loc_{ε,D}(T) tends to zero almost surely as T ↑ ∞.

Proof of Lemma 7. We have the decomposition (4.14). The proofs that the two terms on the r.h.s. of (4.14) vanish almost surely in the limit T ↑ ∞ are identical and rely on an application of the Borel–Cantelli lemma. We thus prove only the first limit. Let η > 0. By the Chebyshev inequality, the probability that the first term exceeds η is bounded by a quantity which is summable in T (recalling that we assume T ∈ N). Therefore, by Borel–Cantelli, the first term eventually lies below η, almost surely. As the above holds for all η > 0, the first term converges to 0 as T ↑ ∞ almost surely, which concludes the proof of Lemma 7.
The following result is the major tool to establish the SLLN for the term Rest^loc_{ε,D}(T). (By Lemma 7, this will then imply that the same is true for Rest_{ε,D}(T).) The result is a small extension of a theorem of Lyons [26, Theorem 1], where the statement is given for sums of random variables.

Theorem 8. Consider a process {X_s}_{s∈R₊} such that E[X_s] = 0 for all s. Assume furthermore that the random variables are uniformly bounded, say sup_s |X_s| ≤ 2 almost surely. If

Σ_{T∈N} (1/T) E[ ( (1/T) ∫_0^T X_s ds )² ] < ∞,

then

lim_{T↑∞} (1/T) ∫_0^T X_s ds = 0 almost surely.

Proof. The extension to integrals is straightforward. In fact, by the summability assumption, we can find a subsequence T_k ∈ N of times such that

Σ_k E[ ( (1/T_k) ∫_0^{T_k} X_s ds )² ] < ∞,

where T_k → ∞ and T_{k+1}/T_k → 1 (see [26, Lemma 2]). Therefore, by Fubini, the sum without the expectation is almost surely finite, and we must have

lim_{k↑∞} (1/T_k) ∫_0^{T_k} X_s ds = 0 almost surely.

It remains to show that this is true along the full sequence T ∈ N. This is easy since the variables are bounded. For any T, there exists k such that T_k ≤ T ≤ T_{k+1}. Thus

| (1/T) ∫_0^T X_s ds | ≤ (T_k/T) | (1/T_k) ∫_0^{T_k} X_s ds | + (1/T) ∫_{T_k}^{T} |X_s| ds. (4.21)

The first term goes to zero by the previous argument. The second term goes to zero since |X_s| ≤ 2 and T_{k+1}/T_k → 1.
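The mechanism behind Theorem 8 can be illustrated numerically. The following sketch is ours, not from the paper: it time-averages a bounded, mean-zero surrogate process with geometrically decaying correlations (an AR(1) process passed through tanh), for which the summability condition clearly holds, so the time average should tend to zero.

```python
import math
import random

def time_average(n, phi=0.9, seed=0):
    """Time average of a bounded, mean-zero, weakly correlated sequence.

    X_k = tanh(Z_k), where Z is a stationary AR(1) process with parameter
    phi; the correlations of X decay geometrically, so the summability
    condition of the theorem holds and the average should tend to zero.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 - phi * phi)  # keeps Z stationary with variance 1
    z = rng.gauss(0.0, 1.0)
    total = 0.0
    for _ in range(n):
        z = phi * z + sigma * rng.gauss(0.0, 1.0)
        total += math.tanh(z)  # odd function of a symmetric variable: mean zero
    return total / n

avg = time_average(50_000)
```

The delicate point in the BBM setting is precisely to verify such a summability condition, which holds only for the time scales singled out at the end of Section 2.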
Note that, with obvious notations, Rest^loc_{ε,D}(T) splits into two integrals. The goal is thus to prove that both integrals satisfy the assumptions of Theorem 8. We address the first integral, the proof for the second being identical. By construction, the integrand is mean zero and uniformly bounded. It therefore suffices to check the assumption concerning the summability of correlations.
Note that, by the properties of conditional expectation, (4.26) holds. We claim that (4.27) is satisfied. In order to see this, and proceeding with the program outlined at the end of Section 2, we now specify the concept of times that are well separated from each other. Choose 0 < ξ < 1 and split the integration according to the distance between s and s′, as in (4.28). The contribution of the first term on the r.h.s. of (4.28) is negligible, due to the uniform boundedness of the integrand and to the choice 0 < ξ < 1. We are thus left to prove that the contribution to (4.27) of the second term in (4.28) is finite. The following is the key estimate.
Theorem 9. There exists a finite T₀ such that the following holds for T ≥ T₀: for some ε₁, ε₂ > 0 not depending on T (but on the other underlying parameters), the stated bound holds uniformly for all s, s′ such that εT ≤ s < s′ ≤ T and s′ − s > T^ξ.
The estimate directly implies the desired summability of the second term in (4.28). This concludes the proof of Theorem 3. The proof of the estimate is somewhat lengthy and is done in the next section.

Uniform bounds for the correlations.
We use here I and J to denote the two times s, s′ from the statement of Theorem 9.

C_T(I, J) is the expectation of the random variable ĉ_T(I, J) defined as follows.
We rewrite these conditional probabilities using the Markov property of BBM, considering independent BBMs starting at their respective positions at time R_T and shifting time by R_T. This requires some additional notation. Take I_T ≡ I − R_T and J_T ≡ J − R_T, and note that m(I) = m(I_T) + √2 R_T + o(1) as T ↑ ∞. We consider, for 0 ≤ s ≤ I_T − r_T, the collection of conditions (5.2) (the "shifted" I-tube). Note that the localization depends on k (in fact, on y_k(R_T)); we drop this dependence in the notation M̂^loc for simplicity. By the Markov property, the first conditional probability in ĉ_T(I, J) can be written in terms of the shifted process just defined, where the product runs over all particles k at time R_T whose paths are localized in the intersection of the I- and J-tubes during the interval (r_T, R_T). The restriction to localized positions at time R_T is weaker and sufficient for our purpose. Here and henceforth, we will use Ω_T to denote a negligible term, which is not necessarily the same at different occurrences; in the above case, Ω_T = O(ln ln T) by definition of the tubes. We thus get that (5.4) is at most the corresponding localized expression (analogously for J_T).

Proof. In the notation introduced above, the first inequality holds by dropping the localization condition. Therefore, ℘(I_T; y_k(R_T)) can be made arbitrarily small (uniformly in k) by choosing T large enough. The same obviously holds for ℘(J_T; y_k(R_T)) and ℘(I_T, J_T; y_k(R_T)). Choose T large enough so that

sup_k max{ ℘(I_T; y_k(R_T)), ℘(J_T; y_k(R_T)), ℘(I_T, J_T; y_k(R_T)) } ≤ 1/6. (5.14)

Coming back to (5.12), and using the second inequality in (5.15), we obtain an upper bound for the first conditional probability in the definition of ĉ_T(I, J). A similar reasoning, using this time the first inequality in (5.15), yields a lower bound for the second term in ĉ_T(I, J), i.e. the product of the conditional probabilities. The upshot is the bound on ĉ_T(I, J) claimed in Proposition 10, almost surely and for large enough T.
We now use that e^a − 1 ≤ a · e^a (which holds for a ≥ 0) for the term in the brackets, to get that (5.17) is at most (5.18). By construction, Z(I_T, J_T; R_T) ≤ min{ Z(I_T; R_T), Z(J_T; R_T) }, and therefore (5.18) can be bounded further. This is not far from the claim of Proposition 10; it remains to get rid of the exponential on the r.h.s. above. Using the bound (5.26), together with the definition of the Z-terms, and rearranging, we arrive at (5.21). In view of (5.13), we may find T large enough such that, uniformly in k, all terms appearing in (5.21) become negative; this implies the claim and concludes the proof of Proposition 10.
5.1. Proof of Theorem 9. We first observe that the expectation of R T appearing in Proposition 10 gives the right bound in Theorem 9. Indeed, using that (a + b + c) 2 ≤ 4a 2 + 4b 2 + 4c 2 , we get the upper bound
Recall that

Z(I_T, J_T; R_T) = Σ_k ℘(I_T, J_T; y_k(R_T)), (5.29)

where ℘(I_T, J_T; y_k(R_T)) is defined in (5.30). By definition, (5.30) is the probability of finding a particle of the BBM which has two extremal descendants: particle (1), say, whose position is above m(I_T) + D + y_k(R_T) at time I_T, and particle (2), which lies above m(J_T) + D + y_k(R_T) at time J_T. These two particles also satisfy localization conditions on their paths. In other words, this is the probability that the same ancestor k, with (relative) position y_k(R_T), produces children (1) and (2) which are extremal at times I and J. As these times are well separated, that is J − I > T^ξ (and thus also J_T − I_T > T^ξ), we may expect this probability to be very small. In order to see that this is indeed the case, we split the probability according to whether the most recent common ancestor of particles (1) and (2) has branched before time I_T − r_T (with r_T as in (4.9)), or after. We write this as ℘(I_T, J_T; y_k(R_T)) = ℘(I_T, J_T; y_k(R_T); split before I_T − r_T) + ℘(I_T, J_T; y_k(R_T); split after I_T − r_T).
(Figure 3 illustrates the first case.) The second probability is in fact zero. Indeed, the condition (5.2) implies that the ancestor of (2) at time I − r_T lies at heights which are at most the level of the entropic envelope associated with J. Since J − I > T^ξ, this is easily seen to be well below the lower envelope of particle (1) associated with time I. In other words, the localization tubes of particles (1) and (2) are incompatible at that time.

Figure 3. Time of branching before I.

Proposition 11. For some ε₃, ε₄ > 0 and T large enough, the following bounds hold uniformly in k and for I_T, J_T as considered, almost surely.
The proof of this proposition is technical and postponed to Section 5.2. We first show how it provides the last piece for the proof of Theorem 9. This is straightforward: by computations similar to those in (5.27), the required bound holds uniformly in k. In order to prove the proposition, we use a formula by Sawyer [29] concerning the expected number of pairs of particles whose common ancestor branched in the interval (0, I_T − r_T) and whose paths satisfy certain localization conditions, say T^(1) and T^(2) respectively. The expected number of such pairs is given by (5.38). Here the probability P is the law of a Brownian motion x, and K = Σ_j p_j j(j − 1) (with {p_j} the offspring distribution). The time s is the branching time of the common ancestor, and µ_s is the Gaussian measure with variance s. T^(·)_(a,b) denotes the condition on the path during the time interval (a, b).
A proof of this formula is given in [29, pp. 664 and 686]. Sawyer counts the pairs of particles at the same time, whereas our case concerns particles at two different times: particle (1) at time I_T, and particle (2) at time J_T. The generalization of Sawyer's formula is, however, straightforward. The reader is referred to the intuitive construction of the formula provided by Bramson [10, p. 564].
Dropping the condition T^(2) in the first probability of (5.38) yields a simpler bound, with T^(1) and T^(2) the shifted tubes defined in (5.2) and (5.3). The idea is now to bound the second probability appearing in (5.40) uniformly in y. This procedure was introduced by Bramson [10, Lemma 11], and proved useful also in [2, Theorem 2.1].
Lemma 12. The second probability appearing in (5.40) admits a bound that is uniform in y. For the proof of Lemma 12, some facts concerning the Brownian bridge are needed. Denoting a standard Brownian motion by x, the Brownian bridge of length t, starting and ending at zero, is the Gaussian process

z_t(s) ≡ x(s) − (s/t) x(t), 0 ≤ s ≤ t.

The Brownian bridge is a Markov process, and it has the property that {z_t(s), 0 ≤ s ≤ t} is independent of x(t). This construction generalizes to the case where the endpoints of the bridge are a, b ≠ 0; we denote by z^{a,b}(s) such a process. The following is also well known:

z^{a,b}(s) = z_t(s) + (1 − s/t) a + (s/t) b,

with equality holding in distribution.
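The independence of the bridge from its endpoint can be checked by a small Monte Carlo experiment (ours, for illustration only): since (z_t(s), x(t)) is jointly Gaussian, vanishing covariance is equivalent to independence.

```python
import math
import random

def bridge_endpoint_cov(s=1.0, t=2.0, n=40_000, seed=0):
    """Sample covariance of the bridge value z_t(s) = x(s) - (s/t) x(t)
    with the endpoint x(t).  The exact covariance is s - (s/t) t = 0,
    and for jointly Gaussian variables zero covariance is independence."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        xs = math.sqrt(s) * rng.gauss(0.0, 1.0)            # x(s)
        xt = xs + math.sqrt(t - s) * rng.gauss(0.0, 1.0)   # x(t) via increment
        acc += (xs - (s / t) * xt) * xt                    # z_t(s) * x(t)
    return acc / n

cov = bridge_endpoint_cov()
```

Both variables here are centered, so the plain product average estimates the covariance.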
We now recall [2, Lemma 3.4] which deals with probabilities that a Brownian bridge stays below linear functions; the proof is elementary and will not be given here.
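As an illustration (ours, not part of the proof), one can estimate by Monte Carlo the probability that a standard bridge stays below a line with positive endpoints and compare it with the classical reflection-principle value 1 − exp(−2 z₁ z₂ / t); the linearization 2 z₁ z₂ / t of this value is the type of bound the lemma below provides (in the case r₁ = r₂ = 0).

```python
import math
import random

def bridge_below_line_prob(z1, z2, t=1.0, steps=400, n_paths=3000, seed=0):
    """Monte Carlo estimate of the probability that a Brownian bridge of
    length t from 0 to 0 stays below the line from z1 to z2 (z1, z2 > 0).
    Classical reflection-principle value: 1 - exp(-2 z1 z2 / t)."""
    rng = random.Random(seed)
    dt = t / steps
    hits = 0
    for _ in range(n_paths):
        # Sample a Brownian path on the grid, then pin it to obtain the bridge.
        x = [0.0]
        for _ in range(steps):
            x.append(x[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
        below = True
        for i, xi in enumerate(x):
            s = i * dt
            if xi - (s / t) * x[-1] >= z1 + (z2 - z1) * s / t:
                below = False
                break
        if below:
            hits += 1
    return hits / n_paths

est = bridge_below_line_prob(1.0, 1.0)
```

The discretization only checks the constraint at grid points, so the estimate is biased slightly upward; a fine grid keeps this bias small.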
Lemma 13. Let z₁, z₂ ≥ 0 and r₁, r₂ ≥ 0. Then, for t > r₁ + r₂,

P[ z_t(s) ≤ (1 − s/t) z₁ + (s/t) z₂ for all s ∈ (r₁, t − r₂) ] ≤ 2 z(r₁) z(r₂) / t,

where z(r₁) ≡ (1 − r₁/t) z₁ + (r₁/t) z₂ and z(r₂) ≡ (r₂/t) z₁ + (1 − r₂/t) z₂.

Proof of Lemma 12. We begin by writing explicitly the underlying conditions on the paths. For a generic function f : R₊ → R, t ↦ f(t), we denote by f_S(·) ≡ f(S + ·) its time-shift by S > 0. We also shorten y(s) ≡ √2 s − x(s), where x(s) = y as in (5.40), and J_{T,s} ≡ J_T − s. We also set Ω_T ≡ O(ln ln T). By elementary manipulations, one easily sees that the probability can be recast in terms of an event (E) on the shifted path, where F₁, F₂ are the entropic (resp. lower) envelopes of (5.2) shifted by s, with Ω_T = O(ln ln T). By the very same localizations, we also have a condition on x(s).
This reads as (5.48). For later use, we reformulate (5.48) into a condition on y_k(R_T) + y(s). We now construct an event (E′) ⊇ (E). First, we drop the condition that the Brownian path is required to stay above F₂. Second, we replace the condition on F₁ by the condition that the x-path remains, on the interval (0, J_{T,s} − r_T), below the line segment interpolating between (0, F₁(0)) and (J_{T,s}, F₁(J_{T,s})); see Figure 4 for a graphical representation. Precisely, we consider the event (5.52), and we put (5.53). We now make some observations concerning the Gaussian density and the conditional probability appearing in (5.53).

For the Gaussian density, we recall that J_{T,s} = J_T − s for 0 ≤ s ≤ I_T − r_T ≤ I_T. Moreover, since J_T − I_T > T^ξ and J_T ≥ εT, we see that J_{T,s} > T^ξ. In particular, the probability of the event is bounded by the probability that a Brownian motion stays below the linear interpolation of the points (0, F₁(0)) and (I_T − r_T, F₁(I_T − r_T)) during the time interval (0, I_T − r_T), intersected with the event x(I_T − r_T) ≥ F₂(I_T − r_T). Subtracting (s / (I_T − r_T)) x(I_T − r_T) and using the fact that x(I_T − r_T) ≥ F₂(I_T − r_T), the above can be bounded from above by P[ x(I_T − r_T) ≥ F₂(I_T − r_T) ] times the Brownian bridge probability (5.70). Now F₁(I_T − r_T) − F₂(I_T − r_T) ≤ κ R_T^β for some κ > 0. Therefore the probability of the Brownian bridge can be bounded, using Lemma 13, by

(2κ / (I_T − r_T)) R_T^β F₁(0) = (2κ / (I_T − r_T)) R_T^β ( y_k(R_T) + D − R_T^α + Ω_T ).

Hence, up to an irrelevant numerical constant, the contribution of the first integral is at most as claimed, for some ε₅ > 0 small enough. The contribution of the second integral is sub-exponentially small (in T). To see this, recall that J_T − I_T > T^ξ and s ∈ [I_T/2, I_T − r_T]; thus, for some κ₁ < 0 < κ₂, the Gaussian weight is exponentially small, implying that the second integral is, for some κ > 0 and some ε₆, ε₇ > 0, negligible. This is obviously much smaller than the first contribution (5.76). Summing up, we obtain the claimed bound on ℘(I_T, J_T; y_k(R_T); split before I_T − r_T). This concludes the proof of Proposition 11, putting ε₃ ≡ ε₆ and ε₄ ≡ ε₅.
For the Gaussian density, we recall that J T,s = J T − s for 0 ≤ s ≤ I T − r T ≤ I T . Moreover, since J T − I T > T ξ and J T ≥ εT , we see that  T )). In particular, the probability of the event is bounded by the probability that a Brownian motion stays below the linear interpolation of the points (0, F 1 (0)) and (I T − r T , F 1 (I T − r T )) during the interval of time (0, I T − r T ) intersected with the event x(I t − r T ) ≥ F 2 (I T − r T ), that is: Subtracting t I T −r T x(I T −r T ) and using the fact that x(I t −r T ) ≥ F 2 (I T −r T ), the above can be bounded above by P x(I t − r T ) ≥ F 2 (I T − r T ) times the Brownian bridge probability: (5.70) Now F 1 (I T − r T ) − F 2 (I T − r T ) ≤ κR β T , for some κ > 0. Therefore the probability of the Brownian bridge can be bounded using Lemma 13 by 2κ I T − r t R β T F 1 (0) = 2κ I T − r t R β T (y k (R T ) + D − R α T + Ω T ) .  hence, up to irrelevant numerical constant, the contribution of the first integral is at most for some (5) > 0 small enough. The contribution of the second integral is sub-exponentially small (in T ). To see this, recall that J T − I T > T ξ and s ∈ [I T /2, I T − r T ], thus for some κ 1 < 0 < κ 2 , implying that the second integral is, for some κ > 0, at most for some (6) , (7) > 0. This is obviously much smaller than the first contribution (5.76). Therefore, summing thus up, ℘(I T , J T ; y k (R T ); split before I T − r T ) This concludes the proof of Proposition 11 by putting (3) ≡ (6) and (4) ≡ (5) .