On the scaling limits of Galton-Watson processes in varying environment

We establish a general sufficient condition for a sequence of Galton-Watson branching processes in varying environment to converge weakly. This condition extends previous results by allowing offspring distributions to have infinite variance, which leads to new and subtle phenomena when the process goes through a bottleneck, as well as in terms of time scales. Our assumptions are stated in terms of pointwise convergence of a triplet of two real-valued functions and a measure. The limiting process is characterized by a backward ordinary differential equation satisfied by its Laplace exponent, which generalizes the branching equation satisfied by continuous-state branching processes. Several examples are discussed, namely branching processes in random environment, Feller diffusions in varying environment and branching processes with catastrophes.


INTRODUCTION
Since the pioneering work of Lamperti [27], it has been known that continuous-state branching processes (CSBP) are the only possible scaling limits of Galton-Watson (GW) branching processes and that every CSBP can be realized in this way (see the following section for definitions). Another characterization of CSBPs was given by Lamperti [26], who showed that they are in one-to-one correspondence with spectrally positive Lévy processes killed upon reaching 0, via a random time-change called the Lamperti transformation; see Caballero et al. [11] for a discussion of various proofs of this fundamental result. Grimvall [19] established general necessary and sufficient conditions for a sequence of renormalized GW processes to converge. These conditions involve the asymptotic behavior of triangular arrays, for which explicit necessary and sufficient conditions have been known for a long time, see for instance Gnedenko and Kolmogorov [16]. Finally, Ethier and Kurtz [14, Chapter 9] gave another proof of these results via time-change arguments. Hence, to a large extent, the asymptotic behavior of GW processes and the structure of their limiting processes are well understood.
The present paper aims at extending this understanding to the case of GW processes in varying (and random) environment. Recently, there has been considerable interest in GW processes in random environment, in particular in problems related to the survival behavior in the critical and subcritical regimes (see, e.g., [1,7,15,20]) or large deviations (see, e.g., [5,9]). Similarly, branching diffusions in varying environment have attracted attention, partly for biological motivations; see among others [6,8,17] and references therein. Nonetheless, little seems to be known about the asymptotic behavior of GW processes in random or varying environment, except for the important special case of finite variance, see [8,10,24,25].
The main result of the present paper establishes a sufficient condition for a sequence of GW processes in varying environment to converge in the sense of finite-dimensional distributions. The assumptions are stated in terms of the convergence of a characteristic triplet and exhibit an interesting class of continuous-time processes: CSBP in varying (and then random) environment, which can be characterized by a triplet of measures. The drift term, equivalently described by a real-valued function, is here assumed to have locally finite variation, so as to obtain a general convergence result under tractable assumptions. Our result is then applied to various examples, in particular to GW processes, Feller branching processes in varying environment and branching processes in random environment. Moreover, our approach, which relies on the convergence of Laplace transforms, provides qualitative properties of the limiting processes.
Scaling limits of GW processes. GW processes are classical Markov chains for population dynamics where individuals reproduce independently of each other and with the same reproduction law, see for instance Athreya and Ney [2]. Thus if $Z_i$ denotes the size of the population at time $i \ge 0$, the process $(Z_i, i \ge 0)$ obeys the recursion $Z_{i+1} = N_{i,1} + \cdots + N_{i,Z_i}$, where the $(N_{i,k}, i, k \ge 0)$ are i.i.d. with common distribution the reproduction law. An equivalent characterization of GW processes is through the branching property. Namely, GW processes are the only discrete-time, $\mathbb{N}$-valued Markov chains $(Z^j, j \ge 0)$, with $Z^j$ the law of the Markov chain started at $Z^j_0 = j$, such that $Z^{j+k}$ for $j, k \ge 0$ is equal in distribution to $Z^j + \tilde Z^k$ with $\tilde Z^k$ a copy of $Z^k$ independent of $Z^j$.
Scaling limits of GW processes have been studied since Lamperti [27]. In general there may be centering terms, but we are only interested here in scaling limits obtained by starting a process from a large initial state, speeding up time and scaling space. Typically, we consider a sequence $(Z^{(n)}, n \ge 1)$ of GW processes, where $Z^{(n)}$ has reproduction law $\mu_n$ and starts with $Z^{(n)}_0 = n$ individuals, a sequence $(\vartheta_n, n \ge 1)$ of positive real numbers going to infinity, which will be called the speed of the GW process, and we consider the sequence $(X_n, n \ge 1)$ of rescaled processes defined by
$$X_n(t) = \frac{1}{n} Z^{(n)}_{\lfloor \vartheta_n t \rfloor}, \qquad t \ge 0.$$
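The recursion and the rescaling above can be sketched numerically. The following minimal simulation is ours, not from the paper: `offspring` stands for a sampler of the reproduction law $\mu_n$, and the deterministic "always two children" law is used only as a sanity check of the recursion.

```python
import random

def gw_trajectory(z0, offspring, steps, rng):
    # Z_{i+1} = N_{i,1} + ... + N_{i,Z_i} with independent offspring numbers
    z, path = z0, [z0]
    for _ in range(steps):
        z = sum(offspring(rng) for _ in range(z))
        path.append(z)
    return path

def rescale(path, n, theta, t):
    # X_n(t) = Z^{(n)}_{floor(theta_n * t)} / n
    return path[int(theta * t)] / n

rng = random.Random(0)
# Sanity check with the deterministic two-children law: Z_i = 2^i.
path = gw_trajectory(1, lambda r: 2, 5, rng)
# path == [1, 2, 4, 8, 16, 32], and rescale(path, n=4, theta=5, t=1.0) == 8.0
```

Replacing the deterministic sampler by a near-critical random one (mean close to 1) gives trajectories whose rescaled versions approximate the continuous limits discussed below.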
The asymptotic behavior of $(X_n)$ yields relevant approximations for phenomena such as the evolution of species, with both large initial populations and long time scales. The most interesting case is when the sequence $(Z^{(n)}, n \ge 1)$ is near-critical, meaning that the mean of $\mu_n$ is closer and closer to one. Indeed, in the strictly super- and subcritical cases, the processes evolve rapidly (one does not need to speed up time) and, for our purposes, essentially deterministically. Grimvall [19] proved that the finite-dimensional convergence of $(X_n)$ is equivalent to the convergence in distribution of the sequence of rescaled sums $\frac{1}{n} \sum_{i=1}^{\lfloor n \vartheta_n \rfloor} (N_{i,n} - 1)$, where for each $n \ge 1$, $(N_{i,n}, i \ge 1)$ are i.i.d. with common distribution $\mu_n$. Necessary and sufficient conditions for this convergence to hold are well known, see the book by Gnedenko and Kolmogorov [16] on the convergence of infinitesimal triangular arrays. The infinitesimal assumption for triangular arrays corresponds precisely to the near-critical assumption for branching processes.
CSBPs are the continuous counterparts of GW processes; they are defined via a generalization of the branching property. Indeed, CSBPs are the only continuous-time, $[0,\infty]$-valued Markov processes $(X^x, x \ge 0)$, with $X^x$ the law of the Markov process started at $X^x(0) = x$, such that $X^{x+y}$ for $x, y \ge 0$ is equal in distribution to $X^x + \tilde X^y$ with $\tilde X^y$ a copy of $X^y$ independent of $X^x$. As mentioned earlier, Lamperti [27] proved that if the above sequence $(X_n)$ converges then the limit must be a CSBP, and that any CSBP can be approximated in this way.
Silverstein [28] gives a useful characterization of CSBPs in terms of their Laplace transform. The branching property ensures that if $X$ is a CSBP, then it satisfies $\mathbb{E}\bigl(\exp(-\lambda X(t)) \mid X(0) = x\bigr) = \exp(-x\, u(t,\lambda))$ for some function $u(t,\lambda)$ called the Laplace exponent. Silverstein [28] proved that for each $\lambda > 0$, the function $u(\cdot,\lambda)$ is characterized by the differential equation
$$\frac{\partial u}{\partial t}(t,\lambda) = -\psi(u(t,\lambda)), \quad u(0,\lambda) = \lambda, \qquad \text{with } \psi(u) = \alpha u + \beta u^2 + \int_0^\infty \Bigl(e^{-ux} - 1 + \frac{ux}{1+x^2}\Bigr) \nu(dx),$$
for some $\alpha \in \mathbb{R}$, $\beta \ge 0$ and $\nu$ a measure on $(0,\infty)$ such that $\int_0^\infty \frac{x^2}{1+x^2}\,\nu(dx)$ is finite. The function $\psi$ is called the branching mechanism of the CSBP, and we see in particular that a CSBP is characterized by a triplet $(\alpha, \beta, \nu)$. This fact can also be seen from the Lamperti transformation, which provides a one-to-one correspondence between CSBPs and spectrally positive Lévy processes killed upon reaching 0, see Lamperti [26] and Caballero et al. [11].
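For mechanisms without jump part ($\nu = 0$), this differential equation can be checked numerically. The sketch below is ours and covers only the quadratic mechanism $\psi(u) = au + \beta u^2$, for which the classical closed form of the Laplace exponent is available: it compares a crude Euler integration of $\partial_t u = -\psi(u)$ with that closed form.

```python
import math

def psi(u, a, beta):
    # Quadratic branching mechanism psi(u) = a*u + beta*u^2 (no jump part)
    return a * u + beta * u * u

def u_euler(t, lam, a, beta, steps=100000):
    # Explicit Euler scheme for du/dt = -psi(u), u(0, lam) = lam
    u, h = lam, t / steps
    for _ in range(steps):
        u -= h * psi(u, a, beta)
    return u

def u_exact(t, lam, a, beta):
    # Classical Laplace exponent of the CSBP with psi(u) = a*u + beta*u^2
    if a == 0.0:
        return lam / (1.0 + beta * lam * t)
    e = math.exp(-a * t)
    return a * lam * e / (a + beta * lam * (1.0 - e))
```

The semigroup property $u(t+s,\lambda) = u(t, u(s,\lambda))$, a special case of the composition rule for Laplace exponents used later in the paper, holds exactly for `u_exact`.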
Renormalization of GW processes in varying environment. In this paper, we want to extend some of the above results to the case of GW processes in varying environment. In the case of random environment this corresponds to adopting a quenched approach. In terms of evolution or population dynamics, it can be motivated both by slowly fluctuating conditions for the population and by major catastrophes; the latter will correspond to non-critical environments. So now, for each $n \ge 1$ the process $Z^{(n)}$ is a GW process in varying environment $(\mu_{i,n}, i \ge 1)$, with $\mu_{i,n}$ the reproduction law in the $i$th generation. Hence for $i \ge 0$ we have the recursion
$$Z^{(n)}_{i+1} = N^{(n)}_{i,1} + \cdots + N^{(n)}_{i, Z^{(n)}_i},$$
where the random variables $(N^{(n)}_{i,k}, i, k \ge 0)$ are independent and $N^{(n)}_{i,k}$ has distribution $\mu_{i,n}$. By analogy with the renormalization of GW processes, a natural way to renormalize the sequence $(Z^{(n)}, n \ge 1)$ is by considering $Z^{(n)}_0 = n$, an onto and non-decreasing function $\gamma_n : [0,\infty) \to \mathbb{N}$, and by defining the process $X_n$ by
$$X_n(t) = \frac{1}{n} Z^{(n)}_{\gamma_n(t)}, \qquad t \ge 0.$$
In the GW case we had $\gamma_n(t) = \lfloor \vartheta_n t \rfloor$ for some sequence $(\vartheta_n)$, but in general we need to consider more general functions $\gamma_n$. Indeed, in varying environment consider the case where, for each $n \ge 1$, the environments are first all equal to some reproduction law $\mu^1_n$ and then take the constant value $\mu^2_n$. If for each $i = 1, 2$ the sequence $(\mu^i_n, n \ge 1)$ corresponds to a sequence of GW processes with speed $(\vartheta^i_n)$, then it is natural to take the function $\gamma_n$ equal to the integer part of a piecewise linear function, which first has slope $\vartheta^1_n$ and then $\vartheta^2_n$. Then $(X_n)$ would converge to a process $X$ which can informally be described as a "piecewise CSBP", i.e., $X$ would be a CSBP with some branching mechanism $\psi^1$ for some time, after which it would behave like a CSBP with some other branching mechanism $\psi^2$.
Related results. The asymptotic behavior of GW processes in varying environment has been thoroughly studied in the finite variance case, see for instance Keiding [24], Kurtz [25] or Borovkov [10]. One of the simplifications in the finite variance case is that the speeds of finite variance GW processes are all the same and equal to $n$ (to be more precise, the time and space scales need to be the same, equal to $1/(1-\rho_n)$ with $\rho_n$ the mean of the offspring distribution). In particular, there is a natural way to speed up these GW processes, namely the natural choice $\gamma_n(t) = \lfloor nt \rfloor$, which turns out to be a good candidate.
In the finite variance case, $X_n$ converges to a branching diffusion (also called Feller diffusion) in varying environment, which may have positive or negative jumps at fixed times. Obtaining a general extinction criterion in this setting is a challenging problem. In contrast with the case of constant environment, the average behavior of the process, given by the drift part, does not yield the right criterion, because of possibly important variations. We refer to Section 3.2 for the explicit criterion in the Feller case.
Moreover, in random environment we can observe different speeds of extinction in the subcritical case. This phenomenon is well known in the discrete case, see, e.g., [15,20]. We refer to [8] for first results in the continuous framework, more precisely for branching Feller diffusions in random environment.
Besides time-inhomogeneous branching diffusions, more general time-inhomogeneous branching processes appear in the related literature on superprocesses. Dynkin [12] built superprocesses whose mass $X$, which satisfies the branching property, obeys through its Laplace exponent $u(s,t,\lambda) = -\log \mathbb{E}(e^{-\lambda X(t)} \mid X(s) = 1)$ the equation
$$u(s,t,\lambda) + \int_{(s,t]} \psi\bigl(r, u(r,t,\lambda)\bigr) K(dr) = \lambda,$$
where $K$ is some $\sigma$-finite measure and $\psi(t,\lambda)$ is a time-varying branching mechanism, i.e., for each $t \ge 0$ the function $\psi(t,\cdot)$ is a branching mechanism with characteristics $(\alpha_t, \beta_t, \nu_t)$. These processes do not allow for explosion, but explosion was allowed by El Karoui and Roelly [13] via martingale methods when $K(dt) = dt$. We can say that these processes are characterized by a triplet of measures $(\alpha_t K(dt), \beta_t K(dt), \nu_t K(dt))$ which are in some sense all absolutely continuous with respect to one another, since they are all absolutely continuous with respect to $K$. The processes we consider are slightly more general: we will indeed characterize our limiting objects by a triplet of measures, but these measures need not be absolutely continuous with respect to one another.
In the time-homogeneous setting, $K$ is Lebesgue measure. The absolutely continuous part of $K$ represents the infinitesimal evolution, while its singular part represents times of catastrophes, corresponding to non-critical environments: the mass makes a sudden jump. Jumps at a fixed time may occur when the measure $K$ has an atom. Note that Dynkin [12] builds superprocesses starting from continuous-time, discrete state-space branching systems. Starting from continuous-time processes allows one to avoid many of the technical difficulties that we have to deal with. But our focus is different, since we want to understand the asymptotic behavior of GW processes in varying environment.
Organization of the paper. In Section 2 we set up the framework and notation and state the main results. Theorem 2.1 gives a sufficient condition for convergence in the sense of finite-dimensional distributions, while Corollaries 2.4 and 2.5 give criteria for almost sure absorption or explosion. Before proving these results, we examine their relevance in Section 3. We first compare the criterion obtained to the known optimal conditions of Grimvall [19] in the time-homogeneous case. We then specify the convergence of GW processes in varying environment with bounded variance. We finally look at the case of branching processes in random environment. In Section 4.1 we give an outline of the proof of Theorem 2.1 and an intuitive explanation of the dynamics satisfied by our limiting processes. The rest of Section 4 is then devoted to the proof of Theorem 2.1, while Corollaries 2.4 and 2.5 are proved in Section 5.

NOTATION AND MAIN RESULTS
2.1. General notation. For each $n \ge 1$, we consider a Galton-Watson process in varying environment $(Z^{(n)}_i, i \ge 0)$. We fix the space scale equal to $n$ while the time scale is allowed to vary over time. For $n \ge 1$, we consider a non-decreasing, càdlàg and onto function $\gamma_n : [0,\infty) \to \mathbb{N}$ (here and elsewhere, $\mathbb{N} = \{0, 1, \ldots\}$ denotes the set of non-negative integers). We then define the renormalized process $(X_n(t), t \ge 0)$ via the formula
$$X_n(t) = \frac{1}{n} Z^{(n)}_{\gamma_n(t)}, \qquad t \ge 0.$$
Since $Z^{(n)}$ is a branching process, for each $\lambda \ge 0$ and $z, i, j \ge 0$ with $i \le j$, one can write
$$\mathbb{E}\bigl(e^{-\lambda Z^{(n)}_j} \mid Z^{(n)}_i = z\bigr) = e^{-z\, v_n(i,j,\lambda)}$$
for some function $v_n$. Then one can check that for any $\lambda, x, s, t \ge 0$ with $s \le t$,
$$\mathbb{E}\bigl(e^{-\lambda X_n(t)} \mid X_n(s) = x\bigr) = e^{-x\, u_n(s,t,\lambda)},$$
where $u_n(s,t,\lambda) := n\, v_n(\gamma_n(s), \gamma_n(t), \lambda/n)$. The Markov property implies the following composition rule: for any $0 \le t_1 \le t_2 \le t_3$ and $\lambda \ge 0$,
$$(1) \qquad u_n(t_1, t_3, \lambda) = u_n\bigl(t_1, t_2, u_n(t_2, t_3, \lambda)\bigr).$$
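In discrete time, $v_n$ can be computed by composing the probability generating functions of the successive offspring distributions, since $\mathbb{E}\bigl(e^{-\lambda Z^{(n)}_j} \mid Z^{(n)}_i = z\bigr) = \bigl(f_i \circ \cdots \circ f_{j-1}(e^{-\lambda})\bigr)^z$. The toy environment below is ours, not from the paper; the check illustrates that the composition rule (1) then holds at the discrete level by construction.

```python
import math

def v(i, j, lam, pgfs):
    # v(i, j, lam) defined by E[exp(-lam * Z_j) | Z_i = z] = exp(-z * v(i, j, lam)):
    # compose the generation pgfs backwards, from generation j-1 down to i.
    s = math.exp(-lam)
    for k in range(j - 1, i - 1, -1):
        s = pgfs[k](s)
    return -math.log(s)

# Toy varying environment alternating between two critical offspring laws.
f_a = lambda s: 0.25 + 0.5 * s + 0.25 * s * s   # offspring 0,1,2 w.p. 1/4,1/2,1/4
f_b = lambda s: 0.5 + 0.5 * s * s               # offspring 0 or 2 w.p. 1/2 each
pgfs = [f_a if k % 2 == 0 else f_b for k in range(10)]

lam = 1.3
lhs = v(0, 10, lam, pgfs)
rhs = v(0, 4, v(4, 10, lam, pgfs), pgfs)  # composition rule (1), discrete analogue
# lhs == rhs up to floating-point error
```

The rescaled exponent $u_n$ of the paper is obtained from `v` by the change of variables $u_n(s,t,\lambda) = n\,v(\gamma_n(s), \gamma_n(t), \lambda/n)$.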
For $i \ge 0$ and $n \ge 1$ we write $t^n_i = \inf\{t \ge 0 : \gamma_n(t) = i\}$, so that $\gamma_n(t^n_i) = i$ by right-continuity of $\gamma_n$, and $\mu_{i,n}$ for the offspring distribution of generation $i$ in $Z^{(n)}$. Let $\nu_{i,n}$ be the measure on $\mathbb{R}$ with support included in $[-1/n, \infty)$ given by the law of $(N^{(n)}_{i,1} - 1)/n$, and let $\alpha_{i,n}$, $\beta_{i,n}$ be the two following (finite) real numbers:
$$\alpha_{i,n} = \int \frac{x}{1+x^2}\, \nu_{i,n}(dx), \qquad \beta_{i,n} = \int \frac{x^2}{1+x^2}\, \nu_{i,n}(dx),$$
which can be rewritten in terms of the $(\mu_{i,n})$ as follows:
$$\alpha_{i,n} = \sum_{k \ge 0} \frac{(k-1)/n}{1 + ((k-1)/n)^2}\, \mu_{i,n}(k), \qquad \beta_{i,n} = \sum_{k \ge 0} \frac{((k-1)/n)^2}{1 + ((k-1)/n)^2}\, \mu_{i,n}(k).$$
From now on let $\mathcal{B}$ denote the Borel subsets of $\mathbb{R}$. For $n \ge 1$, let $\alpha_n$ and $\beta_n$ be the measures on $\mathbb{R}$ with support included in $(0,\infty)$ defined by
$$\alpha_n(A) = \sum_{i \ge 1} \mathbb{1}_{\{t^n_i \in A\}}\, \alpha_{i-1,n}, \qquad \beta_n(A) = \sum_{i \ge 1} \mathbb{1}_{\{t^n_i \in A\}}\, \beta_{i-1,n}, \qquad A \in \mathcal{B},$$
and let $\nu_n$ be the measure on $\mathbb{R}^2$ with support included in $[-1/n, \infty) \times (0, \infty)$ defined by
$$\nu_n(A \times B) = \sum_{i \ge 1} \mathbb{1}_{\{t^n_i \in B\}}\, \nu_{i-1,n}(A), \qquad A, B \in \mathcal{B}.$$
Then the integral $\nu_n(f)$ of a positive function $f$ is given by $\nu_n(f) = \sum_{i \ge 1} \int f(x, t^n_i)\, \nu_{i-1,n}(dx)$.
From now on we identify any signed measure $\alpha$ with its corresponding càdlàg function of locally finite variation, see for instance Chapter 3 in Kallenberg [23], so we write indifferently $\alpha((s,t])$, $\alpha(s,t]$ or $\alpha(t) - \alpha(s)$ for $0 \le s \le t$. In particular, since $\alpha_n\{0\} = \beta_n\{0\} = 0$ and $t^n_i \le t \Leftrightarrow i \le \gamma_n(t)$, one can rewrite
$$\alpha_n(t) = \sum_{i=1}^{\gamma_n(t)} \alpha_{i-1,n}, \qquad \beta_n(t) = \sum_{i=1}^{\gamma_n(t)} \beta_{i-1,n},$$
where from now on we adopt the convention $\sum_{i=a}^{b} = 0$ if $b < a$. We write $|\alpha|$ for the total variation of $\alpha$, and in particular it holds that $\bigl|\int f \, d\alpha\bigr| \le \int |f| \, d|\alpha|$ for any measurable function $f$. Note that one has $|\alpha_n|(A) = \sum_{i \ge 1} \mathbb{1}_{\{t^n_i \in A\}} |\alpha_{i-1,n}|$.

2.2. Main result.
The main result of the paper, Theorem 2.1 below, relates the asymptotic behavior of the sequence $(X_n)$ in the sense of finite-dimensional distributions to the asymptotic behavior of the triplet $(\alpha_n, \beta_n, \nu_n)$. Typically, we aim at controlling Laplace transforms of the form $\mathbb{E}\bigl(\exp(-\lambda_1 X_n(t_1) - \cdots - \lambda_k X_n(t_k)) \mid X_n(0) = 1\bigr)$ with $\lambda_i, t_i \ge 0$. Because of the Markov and branching properties, this boils down to studying the convergence of $\mathbb{E}(e^{-\lambda X_n(t)} \mid X_n(s) = x)$ with $s \le t$ (see the proof of Corollary 2.3). The latter is equivalent to the convergence of the Laplace exponent $u_n$.
So we need to study the process $X_n$ between times $s$ and $t$. In general, we may run into complications if in this time interval there is a bottleneck which sends the process to 0. Indeed, remember that we are considering GW processes in varying environment, so even if most offspring distributions are well behaved (near-critical), nothing prevents a catastrophic environment from occurring from time to time. This is in sharp contrast with standard GW processes, where all offspring distributions are near-critical.
Such a bottleneck can potentially create a problem of indeterminacy, because CSBPs may not be conservative, i.e., they may explode in finite time. An indeterminacy of the kind $\infty \times 0$ can then arise if our time-inhomogeneous process first explodes and then goes through a bottleneck. This indeterminacy is especially delicate to interpret since the pre-limit GW processes cannot explode in finite time. In Theorem 2.1, we first focus on the case where between times $s$ and $t$ the process does not go through any such bottleneck. The remaining cases are analyzed in Corollaries 2.4 and 2.5.
To formalize the above idea, we introduce for $t \ge 0$ the time $\wp(t)$, with the convention $\sup \emptyset = 0$. Intuitively, $\wp(t)$ is the time of the last bottleneck before time $t$. Hence by definition, for $s \in (\wp(t), t]$ there is no bottleneck between times $s$ and $t$. This prevents the process from being absorbed almost surely and enables us to study the asymptotic behavior of $u_n(s,t,\lambda)$.
The assumption (A1) on the finiteness and convergence of $|\alpha_n|$ is used several times in the proof, in particular to make the solution of the backward differential equation converge via Lipschitz properties. Another approach [8,25] makes it possible to deal with drift functions $\alpha$ of infinite variation, but as far as we know it is restricted to the finite variance framework. Theorem 2.1 can be extended to yield the convergence of the finite-dimensional distributions.

2.3. Behavior on $[0, \wp(t)]$. Theorem 2.1 describes the asymptotic behavior of $X_n$ on $[\wp(t), t]$. As discussed before this theorem, if $s < \wp(t)$ then between times $s$ and $t$ the process goes through at least one bottleneck that potentially sends it to 0, which may cause an indeterminacy of the kind $\infty \times 0$. To avoid this problem, we treat two special cases of interest.
Non-absorbing case: there is no bottleneck, so that $\wp(t) = 0$ and Theorem 2.1 provides a picture on $[0, t]$. Non-explosive case: the process cannot explode, so that it is absorbed at 0 if it goes through a bottleneck, and $u(s,t,\lambda) = 0$ if $s \le \wp(t)$.
Corollary 2.4 provides a sufficient condition to be in the non-absorbing case; intuitively, it should be enough that the mean of each offspring distribution is bounded away from 0. On the other hand, Corollary 2.5 provides two sufficient conditions to be in the non-explosive case: one comes easily in terms of tightness of a suitable family of random variables, and the other, more demanding but more explicit, is stated in terms of boundedness of first moments.
We emphasize that the following result holds with significantly weaker conditions than the conditions (A1) and (A2) needed for Theorem 2.1.

Corollary 2.4 (Non-absorbing case). Let t > 0. If
then $\wp(t) = 0$. Moreover, for (4) to hold it is enough that the two following conditions hold:

Roughly speaking, the second assumption ensures that $\mu_{i,n}$ is not too close to $\delta_0$; it prevents almost sure absorption in one generation. Note that the condition on the sequence $\bigl(n^{-2} \sum_{i=0}^{\gamma_n(t)} \mu_{i,n}\{0\}\bigr)$ is satisfied as soon as $(n^{-2} \gamma_n(t))$ is bounded. This is always the case in the constant environment case, where the fastest speed $\gamma_n(t) = \lfloor nt \rfloor$ is attained in the finite variance case. This property seems to hold more generally, and it holds in the examples we study in Section 3.
We now turn to the problem of explosion. We know from the GW case that explosion may occur at a random time, and we refer to Grey [18] for necessary and sufficient conditions. We give a sufficient condition guaranteeing that explosion almost surely does not occur; it is related to a first moment condition, which is also common in the GW case. Then Theorem 2.1 can be extended to the time interval $[0, t]$.
Corollary 2.5 (Non-explosive case). Fix $\lambda, t > 0$ and assume that the assumptions (A1) and (A2) of Theorem 2.1 hold. If (5) holds for all $s \le t$, then $\liminf_{n\to\infty} u_n(s,t,\lambda) = 0$ for every $s < \wp(t)$. Moreover, for (5) to hold it is enough that

2.4. Assumptions (A1) and (A2), triangular arrays and processes with independent increments. The assumptions (A1) and (A2) are reminiscent of conditions for the convergence of non-infinitesimal triangular arrays, see for instance Theorem VII.4.4 in Jacod and Shiryaev [22]. Relationships between the convergence of GW processes, triangular arrays and Lévy processes are well known. Grimvall [19] established general necessary and sufficient conditions for the convergence of GW processes in terms of triangular arrays of rowwise i.i.d. random variables, see the introduction. Moreover, Jacod and Shiryaev [22] investigated the relationship between the convergence of triangular arrays and the convergence of processes with independent increments; to a large extent, the two are equivalent. Thus, combining these results in the time-homogeneous case, we see that the convergence of a sequence of rescaled GW processes is equivalent to the convergence of corresponding Lévy processes. This result can actually be obtained directly via time-change arguments, see, e.g., Helland [21] or Ethier and Kurtz [14, Chapter 9].
Our conditions (A1) and (A2) suggest that triangular arrays could play a role in the convergence of GW processes in varying environment; in view of Jacod and Shiryaev [22], this suggests in turn that processes with independent increments could also be interesting objects to consider. If this intuition turns out to be true, the time-homogeneous case suggests that the most efficient way to link GW processes in varying environment to processes with independent increments would be via time-change arguments. Nonetheless, it does not seem straightforward to extend the Lamperti transformation to the time-inhomogeneous case.

EXAMPLES AND APPLICATIONS
The goal of this section is to explore the assumptions of Theorem 2.1. We apply this result to several motivating situations, namely GW processes (Section 3.1), GW processes in varying environment with bounded variance, leading to Feller diffusions in varying environment with possible jumps (Section 3.2), and finally GW processes in random, i.i.d. environment (Section 3.3).
Some of our limits will be CSBPs. To identify CSBPs within the framework of Theorem 2.1 we will use the following lemma. For $a \in \mathbb{R}$, $b \ge 0$ and $\theta$ a measure on $(0,\infty)$, we call branching mechanism with characteristics $(a, b, \theta)$ the function $\Xi$ satisfying
$$\Xi(u) = au + bu^2 + \int_0^\infty \Bigl(e^{-ux} - 1 + \frac{ux}{1+x^2}\Bigr) \theta(dx), \qquad u \ge 0.$$
We say that a CSBP has characteristic triplet $(a, b, \theta)$ if $\Xi$ is its branching mechanism.
Proof. We prove that $u_{t,\lambda}$ satisfies (11): we have

This proves the result.
Intuitively, in the homogeneous case it is natural to consider $\gamma_n(t)$ linear in $t$ because the dynamics stays constant over time (this could be rigorously justified). So we write $\gamma_n(t) = \lfloor \vartheta_n t \rfloor$ for some real-valued sequence $(\vartheta_n, n \ge 1)$ going to infinity.
In this case, the assumption (A1) is equivalent to assuming that the functions $\alpha$, $\beta$ and $\nu([x,\infty) \times (0,\cdot])$ are linear in $t$ and that

as $n$ goes to infinity. In particular, the assumption (A2) is automatically satisfied. We can summarize this as follows.

Corollary 3.2. In the GW case, if the assumption (A1') holds then the sequence $(X_n)$ converges in the sense of finite-dimensional distributions to a CSBP with characteristic triplet $(\alpha, \beta, \nu)$.
The question is whether this is optimal, i.e., if $(X_n)$ converges in the sense of finite-dimensional distributions, does (A1') necessarily hold? Grimvall [19, Theorem 3.4] proved that $(X_n)$ converges in the sense of finite-dimensional distributions if and only if some triangular array converges; combining this with Theorem 1 of §25 in Gnedenko and Kolmogorov [16], we obtain the following result (Grimvall [19] and Theorem 1 of §25 in [16]): if $(X_n(1))$ converges in distribution to a random variable $X(1)$ with $\mathbb{P}(X(1) > 0) > 0$, then there exist $\sigma \ge 0$ and a measure $\nu_\infty$ on $(0,\infty)$ such that

3.2. Feller diffusion in varying environment.
We prove here that GW processes in varying environment converge to a Feller diffusion in varying environment, with possible jumps at fixed times, provided the reproduction laws have bounded variance. This result is closely related to Kurtz [25]. In contrast to [25], we assume here that $\alpha$ has finite variation, but we require weaker moment assumptions and no regularity of $\beta$. This generalizes to varying environment the convergence of GW processes to the Feller diffusion recalled in the section dedicated to the GW case. We note that the limit process may jump at fixed times; these jumps are multiplicative and may be negative. We refer to [6] for Feller diffusions with multiplicative jumps arising from biological motivations: the jumps correspond to cell division events where only a fraction of the parasites is inherited by each daughter cell. The results given here allow one to extend the large population approximations of [6] for the parasite population dynamics.
We denote by $m_{i,n}$ the mean of the offspring distribution $\mu_{i,n}$. We give here conditions ensuring that $(X_n)$ converges in the sense of finite-dimensional distributions on $[\wp(t), t]$ to a process with Laplace transform described by Theorem 2.1 with $\nu = 0$.

Proposition 3.5. Assume that there exist a càdlàg function $\alpha$ with locally bounded variation and a non-decreasing càdlàg function $\beta$ such that for every $s \ge 0$, $a > 0$, as $n \to \infty$,

We assume also that for every $t \ge 0$ such that $\alpha\{t\} \ne 0$,

where $u$ is the unique solution of the backward differential equation (3) associated with the triplet $(\alpha, \beta, 0)$. More explicitly, we then have

Under the assumptions of the proposition above, $(X_n)$ converges in the sense of finite-dimensional distributions to a process denoted by $X$, which is a Feller diffusion in varying environment whose Laplace exponent is $u$. Letting $\lambda$ go to zero and $t \to \infty$ in the explicit expression of $u$ obtained above directly yields the following asymptotic result.
Let us comment on these results. The explicit expression of $u$ given above can be guessed in several ways. It can be seen from the discrete expression $u_n$ and obtained explicitly by computing compositions of linear-fractional probability generating functions, see the proof of Lemma 5.2. Alternatively, considering the Laplace exponent of a Feller diffusion whose coefficients are constant on successive time intervals and passing to the limit gives another intuitive derivation of this expression.
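The second heuristic can be made concrete: for piecewise-constant coefficients, the Laplace exponent is obtained by composing, backwards in time, the classical Feller exponents of each phase. The phase values below are ours, chosen for illustration; the check compares this composition with a direct Euler integration of the backward equation $\partial_\tau u = -\psi(u)$ within each phase.

```python
import math

def feller_exact(tau, lam, a, beta):
    # Laplace exponent after time tau for the mechanism psi(u) = a*u + beta*u^2
    if a == 0.0:
        return lam / (1.0 + beta * lam * tau)
    e = math.exp(-a * tau)
    return a * lam * e / (a + beta * lam * (1.0 - e))

# Toy piecewise-constant environment: (phase length, a, beta) for each phase.
phases = [(0.5, 0.3, 1.0), (1.0, -0.2, 0.5), (0.5, 0.0, 2.0)]

def u_piecewise(lam):
    # Backward composition: start from the terminal condition, sweep phases in reverse.
    u = lam
    for length, a, beta in reversed(phases):
        u = feller_exact(length, u, a, beta)
    return u

def u_numerical(lam, steps=50000):
    # Euler integration of du/dtau = -(a*u + beta*u^2) through each phase, backwards.
    u = lam
    for length, a, beta in reversed(phases):
        h = length / steps
        for _ in range(steps):
            u -= h * (a * u + beta * u * u)
    return u
```

Refining the phases toward genuinely time-varying coefficients $(\alpha, \beta)$ is exactly the limiting procedure alluded to above.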
Moreover, we note from the proof below that (8) is equivalent to

We have seen in Section 3.1 that this is the optimal condition for convergence of GW processes towards the Feller diffusion. It is satisfied as soon as the second moments of the $\mu_{i,n}$ are uniformly bounded.
Proof of Proposition 3.5. First, we use Theorem 2.1 with $\gamma_n(t) = \lfloor nt \rfloor$ to prove that for any $s \in [\wp(t), t]$ we have $u_n(s,t,\lambda) \to u(s,t,\lambda)$, where $u$ is the unique solution of the backward differential equation (3).

Hence the two sequences $(\alpha_{\gamma_n(t),n})$ and $(m_{\gamma_n(t),n})$ must have the same limit, and since $m_{\gamma_n(t),n} \to \alpha\{t\}$ by assumption, we obtain $\alpha_{\gamma_n(t),n} \to \alpha\{t\}$. Moreover, summing over $i$, the assumptions yield $\alpha_n(t) \to \alpha(t)$ and $|\alpha_n|(t) \to |\alpha|(t)$. Similarly, summing over $i$, we obtain as before $\beta_n(t) \to \beta(t)$ and

for every $t \ge 0$ and $a \ge 0$. It remains to prove that $\beta$ is continuous in order to obtain the assumptions (A1) and (A2) of Theorem 2.1 and complete the proof of the convergence of $u_n$ to the unique solution of the backward differential equation (3). To see this, we observe that

Using that $m_{i,n}$ is bounded for $i \le \gamma_n(t)$ and $n \ge 1$, and that $\mu_{i,n}[an+1, \infty)$ goes to zero uniformly in $i \le \gamma_n(t)$ by assumption, we deduce, letting $a$ go to zero, that $\beta_{i,n}$ goes to zero uniformly as $n$ goes to infinity.
Let us now specify the value of the limit $u(s,t,\lambda)$ of $u_n(s,t,\lambda)$ on $[\wp(t), t]$. We use that $\alpha$ and $\beta$ have locally finite variation, so the same holds for $s \mapsto \bar\alpha(s,t]$ and $s \mapsto I(s)$, and we get the same jumps as in (9). Finally, we derive $dG$ outside the jumps to get (9) and conclude the proof.
3.3. Scaling of GW processes in random environment. The most popular model for GW processes in random environment is the one where the environments are i.i.d. It was introduced in [29] and extended to stationary ergodic environments by Athreya and Karlin [3,4]. We want to study the case where we mix $J$ sequences of GW processes with speeds $(\vartheta^j_n)$. Our main goal is to gain insight into the correct speed $\gamma_n$ of the resulting time-varying GW process.
We recall that we work here in the case where drift functions have finite variation; a more general setting has been studied when the offspring distributions have finite variance, see, e.g., Kurtz [25]. In that case, the GW processes which are mixed all have the same speed. To the best of our knowledge, the following results on mixing GW processes with different speeds, or mixing GW processes with the same speed but with infinite variance, are new. For the sake of simplicity of the statements and proofs, we restrict ourselves to a finite number of environments occurring in an i.i.d. manner, but our approach could be extended to more general cases.

3.3.1. Notation and assumptions.
In the rest of this section we fix an integer $J \ge 2$. For each $j = 1, \ldots, J$, we consider a sequence $(Z^{(n,j)}, n \ge 1)$ of GW processes with corresponding sequence of offspring distributions $(\mu^j_n, n \ge 1)$ and speed $(\vartheta^j_n, n \ge 1)$. We write $\alpha^j_{0,n}$, $\beta^j_{0,n}$ and $\nu^j_{0,n}$ for the numbers and measure defined as $\alpha_{i,n}$, $\beta_{i,n}$ and $\nu_{i,n}$ but with $\mu^j_n$ instead of $\mu_{i,n}$, and similarly for the functions and measures $\alpha^j_n$, $\beta^j_n$ and $\nu^j_n$ (with in addition $\lfloor \vartheta^j_n t \rfloor$ instead of $\gamma_n(t)$). We assume that there exist $\alpha^j \in \mathbb{R}$, $\beta^j \ge 0$ and a measure $\nu^j$ such that as $n \to +\infty$, for any $t, x \ge 0$,

In particular, the assumptions (A1) and (A2) are satisfied for the $j$th sequence of GW processes $(Z^{(n,j)}, n \ge 1)$ with speed $(\vartheta^j_n)$, which converges to a CSBP with characteristics $(\alpha^j, \beta^j, \nu^j)$.
We now consider the case where we mix these $J$ GW processes in the simplest way. To do so, we assume that for each $n \ge 1$, the offspring distributions $(\mu_{i,n}, i \ge 0)$ defining $Z^{(n)}$ are i.i.d. with $\mu_{i,n} = \mu^j_n$ with probability $p_j > 0$. Let $N^j_{k,n}$ be the number of times the $j$th environment has been chosen among the $k$ first generations of the $n$th branching process; then, conditionally on the environment, we have

where $\bar N^j_{k,n} = N^j_{k,n}/(k p_j)$. Note that the law of large numbers suggests that $\bar N^j_{\gamma_n(t),n} \approx 1$, an approximation to which we will come back shortly. We get similarly

To satisfy the assumptions (A1) and (A2), we need the almost sure convergence of $\alpha_n$, $\beta_n$ and $\nu_n$, and so we need to build the processes $Z^{(n)}$ on the same probability space. A way to do so is to consider $\pi_0 = 0$, $\pi_j = p_1 + \cdots + p_j$ for $1 \le j \le J$, a sequence $(U_i, i \ge 1)$ of i.i.d. random variables uniformly distributed on $[0,1]$, and to let $\mu_{i,n} = \mu^j_n$ on the event $\{\pi_{j-1} < U_i \le \pi_j\}$. Then $N^j_{k,n}$ does not depend on $n$, and the strong law of large numbers immediately implies that for every $t > 0$, $\bar N^j_{\gamma_n(t),n} = \bar N^j_{\gamma_n(t)} \to 1$ almost surely as $n$ goes to infinity.
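The coupling through uniform random variables can be sketched as follows (our code, using the half-open convention $U_i \in [\pi_{j-1}, \pi_j)$, which changes nothing almost surely): the empirical frequencies $N^j_k/k$ then approach $p_j$ by the strong law of large numbers.

```python
import random

def sample_environments(k, p, rng):
    # Generation i receives environment j iff U_i falls in [pi_{j-1}, pi_j)
    pis = [sum(p[:j]) for j in range(len(p) + 1)]  # pi_0 = 0, ..., pi_J = 1
    pis[-1] = 1.0  # guard against floating-point rounding of the cumulative sums
    env = []
    for _ in range(k):
        u = rng.random()  # uniform on [0, 1)
        j = next(j for j in range(1, len(pis)) if pis[j - 1] <= u < pis[j])
        env.append(j)
    return env

rng = random.Random(1)
p = [0.2, 0.5, 0.3]
env = sample_environments(10000, p, rng)
freqs = [env.count(j) / len(env) for j in (1, 2, 3)]  # N^j_k / k, close to p_j
```

As in the text, the environment sequence does not depend on $n$: for each $n$, one simply reads off $\mu_{i,n} = \mu^{\mathrm{env}[i]}_n$ from the same draw.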

3.3.2. Around $\gamma_n$. When we observe $X_n$ on $[0,t]$ we see on average $p_j \gamma_n(t)$ times the $j$th environment. For the $j$th GW process to evolve significantly, we need to observe it over at least on the order of $\vartheta^j_n$ generations. Hence if $\gamma_n(t) \ll \vartheta^j_n$ for each $j$, then we expect $X_n$ not to have evolved at all. On the other hand, if $\gamma_n(t) \gg \vartheta^j_n$ for some $j$, then we expect the $j$th GW process to have already reached its terminal value. This latter case can be subtle, but it shows that when mixing environments, the speed that dominates is the speed of the "fastest" GW process, i.e., the GW process with speed $\vartheta^*_n = \min_j \vartheta^j_n$ (we call it the fastest because it is the one that needs to be sped up the least). The following simple result captures this intuition.
Proof. The result is obvious for t = 0, so consider t > 0. The quantity to control goes to 0 as n goes to infinity, since the first term of the last upper bound is finite by assumption, α_n^j(t)/t → α^j also by assumption, and N̄_{γ_n(t)}^j → 1 by construction. This proves the result.
In particular, the assumption (A1) holds with all limits α, β and ν degenerate (null) when γ_n(t) ≪ ϑ_n^* for each t ≥ 0. In the following subsection we investigate more interesting cases, where the limit is not degenerate; but before that, let us discuss a natural choice for γ_n.
When the environment in the ith generation is equal to µ_n^j … By definition, … In other words, we have γ_n^loc(t) ≈ Γ_n t for large n, where

Two extreme cases.
We use this result to show convergence of X_n in two extreme cases: when all the speeds are equal, or when, in contrast, one speed dominates the others.
Proposition 3.8 (Mixing of GW processes with the same speed). Assume that ϑ_n^j = ϑ_n^1 for all j = 1, . . . , J and let either γ_n(t) = ⌊ϑ_n^1 t⌋ or γ_n = γ_n^loc. Then (X_n) converges in the sense of finite-dimensional distributions to the CSBP with characteristics (Σ_j p_j α^j, Σ_j p_j β^j, Σ_j p_j ν^j).
Proof. Under the assumptions of the proposition, we have ϑ_n^* = ϑ_n^1 and γ_n(t) ∼ ϑ_n^1 t (resp. γ_n(t) ∼ ϑ_n^1 t/p_1) as n goes to infinity when γ_n(t) = ⌊ϑ_n^1 t⌋ (resp. γ_n = γ_n^loc). Thus in both cases we have sup_n (γ_n(t)/ϑ_n^*) < ∞, and so Lemma 3.7 gives … when γ_n(t) = ⌊ϑ_n^1 t⌋, and α_n(t) → t α^1 when γ_n = γ_n^loc. Similar computations hold for |α_n|, β_n and ν_n, and we conclude as in the proof of the previous lemma.
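At the level of means, Proposition 3.8 can be sanity-checked numerically. The sketch below assumes (our own choice, not from the text) offspring means of the form m_n^j = 1 + α^j/n with common speed ϑ_n^1 = n and γ_n(t) = ⌊nt⌋, so that conditionally on the environment E[X_n(t)] is the product of the offspring means; by the law of large numbers its logarithm approaches t Σ_j p_j α^j, the drift of the limiting CSBP:

```python
import math
import random

# Mean of X_n(t) under randomly mixed near-critical environments; the numeric
# values below are toy choices of ours.
random.seed(1)
n = 10_000            # common speed ϑ_n^1 = n
t = 1.0
p = [0.6, 0.4]        # environment probabilities p_1, p_2
alpha = [0.5, -1.0]   # drift α^j of each environment

# log E[X_n(t) | environment]: sum of log offspring means over ⌊n t⌋
# i.i.d. environment choices, each mean being 1 + α^j / n.
log_mean = 0.0
for _ in range(int(n * t)):
    j = 0 if random.random() < p[0] else 1
    log_mean += math.log1p(alpha[j] / n)

limit = t * sum(pj * aj for pj, aj in zip(p, alpha))   # t Σ_j p_j α^j
print(log_mean, limit)   # close: the drift of the limiting CSBP
```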
Remark 3.10. The above results can be extended to the more general case where the probabilities p_j = p_n^j are also allowed to depend on n. If they do not vanish, i.e., p_n^j → p_j ∈ (0, 1) for each j = 1, . . . , J, then the above results remain true. If p_n^j → 0 for some j, then what matters is not the speed ϑ_n^j but the ratio ϑ_n^j/p_n^j. Indeed, on [0, t] we see on average p_n^j γ_n(t) times the jth environment, which as before needs to be compared to ϑ_n^j.
3.4. Remarks on CSBP with catastrophes. Theorem 2.1 makes it possible to study GW processes where only a few offspring distributions are not near-critical. The simplest example is obtained by taking all the µ_{i,n}'s equal to a critical offspring distribution µ_n, in such a way that the corresponding GW processes converge to a CSBP. One can then change µ_{γ_n(t),n} and take its mean equal to 1 + α({t}). Then (X_n) converges to a process X which is a CSBP on [0, t) and on [t, ∞) and satisfies X(t) = (1 + α({t}))X(t−). Another way to create a discontinuity at a fixed time is to take µ_{γ_n(t),n} = (1 − 1/n)δ_0 + (1/n)δ_n, with δ_a the Dirac mass at a. Again, (X_n) converges to a process X which is a CSBP on [0, t) and on [t, ∞), now with X(t) = S(X(t−)) where (S(t), t ≥ 0) is a Poisson process. Theorem 2.1 allows such fixed jumps to accumulate; note that in both cases these jumps may be negative, whereas time-homogeneous CSBP only have positive jumps.
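The second example can be illustrated by direct simulation. The sketch below (all sizes are our own toy choices) applies the offspring law (1 − 1/n)δ_0 + (1/n)δ_n to a population of rescaled size x: the number of individuals leaving offspring is Binomial(xn, 1/n) ≈ Poisson(x), and each of them contributes n children, so the rescaled population after this generation is approximately Poisson(X(t−)):

```python
import random

# One "catastrophe" generation with offspring law (1 - 1/n) δ_0 + (1/n) δ_n,
# applied to a population of rescaled size x (numbers are toy choices).
random.seed(2)
n = 1000
x = 3.0
Z = int(x * n)                       # population just before the catastrophe

samples = []
for _ in range(2000):
    # each individual leaves n children with probability 1/n, none otherwise
    parents = sum(1 for _ in range(Z) if random.random() < 1 / n)
    samples.append(parents)          # new population is parents*n; /n = parents

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)                     # both ≈ x, as for a Poisson(x) variable
```

The Binomial(Z, 1/n) → Poisson(x) limit is what produces the Poisson-subordinator jump X(t) = S(X(t−)) in this example.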
Building on these two simple examples, we expect in general that if X is a Markov process, possibly time-inhomogeneous, satisfying the branching property and having a fixed discontinuity at time t ≥ 0, then there should exist a subordinator S_t such that X(t) = S_t(X(t−)). Indeed, preliminary results suggest that the Markov property should imply the existence of such a process S_t, while the branching property of X would force S_t to be a subordinator.

PROOF OF THEOREM 2.1 AND COROLLARY 2.3
The following functions g and h will be used repeatedly in the sequel: …

Theorem 2.1 and Corollary 2.3 are proved in Sections 4.4 and 4.5, respectively. Before that, we give an overview of the proof in Section 4.1, where we also introduce additional notation. In Section 4.2 we establish preliminary results, which are used in Section 4.3 to obtain uniform controls on u_n: Lemmas 4.3 and 4.5 show that u_n is bounded away from 0 and from infinity, and Lemma 4.4 controls the variations of u_n(·, t, λ). These controls are used in Section 4.4 to prove Theorem 2.1 via Gronwall-type arguments, and in Section 4.5 to prove Corollary 2.3.

4.1. Overview of the proof and additional notation.
Recalling the measures α, β and ν defined in the statement of Theorem 2.1 and the function h defined in (10), one sees that (3) can be rewritten in the following form:

(11) u(s, t, λ) = λ + ∫_(s,t] u(y, t, λ) α(dy) − ∫_(s,t] (u(y, t, λ))² β(dy) + ∫_(0,∞)×(s,t] h(x, u(y, t, λ)) ν(dx dy).
This expression will turn out to be technically convenient because of the behavior of h(x, λ) as x → 0. Roughly speaking, h(x, λ) goes to zero fast enough as x → 0 to get rid of the indeterminacy as n → ∞ and to make the third term converge, see Lemmas 4.8 and A.2. We now derive a similar dynamics for u_n. Define from now on ψ_{i,n} as the function

(12) ψ_{i,n}(λ) = −n log(1 − (1/n) ∫ (1 − e^{−λx}) ν_{i,n}(dx)).

The functions ψ_{i,n} define the dynamics of u_n via the following recursion:

(13) u_n(t_{i−1}^n, t, λ) = u_n(t_i^n, t, λ) + ψ_{i−1,n}(u_n(t_i^n, t, λ)).
Since nv_n(γ_n(t), γ_n(t), λ/n) = λ, this gives

nv_n(i, γ_n(t), λ/n) = nv_n(i + 1, γ_n(t), λ/n) + ψ_{i,n}(nv_n(i + 1, γ_n(t), λ/n)),

which proves the result, plugging in the relation u_n(s, t, λ) = nv_n(γ_n(s), γ_n(t), λ/n) and recalling γ_n(t_i^n) = i. Let us now explain how to go from (13) to (11). Because of the factor 1/n, under reasonable assumptions the term (1/n) ∫ (1 − e^{−λx}) ν_{i,n}(dx) appearing in the definition (12) of ψ_{i,n} should be small for large n. Then the approximation −log(1 − x) ≈ x for small x suggests that ψ_{i,n}(λ) ≈ ∫ (1 − e^{−λx}) ν_{i,n}(dx), which can be expressed in terms of α_{i,n}, β_{i,n} and h by the definitions of these quantities. In combination with (13), this approximation suggests that, remembering the definitions of the measures α_n, β_n and ν_n,

u_n(s, t, λ) ≈ λ + ∫_(s,t] u_n(y, t, λ) α_n(dy) − ∫_(s,t] (u_n(y, t, λ))² β_n(dy) + ∫_(0,∞)×(s,t] h(x, u_n(y, t, λ)) ν_n(dx dy).
This last approximation, combined with the convergence of the triplet (α_n, β_n, ν_n) to (α, β, ν) in the sense of assumption (A1), suggests that any limit u(s, t, λ) of the sequence (u_n(s, t, λ)) should satisfy the dynamics (11). Let us conclude this section by commenting on the branching mechanism in the time-inhomogeneous case and by explaining how the finite-dimensional convergence of Corollary 2.3 is obtained; this will be the opportunity to introduce additional notation used in the following sections.
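The quality of the approximation −log(1 − x) ≈ x driving the passage from (13) to (11) is easy to quantify: with x = a/n, replacing −n log(1 − a/n) by a makes an error of order a²/(2n), which is why the correction terms vanish in the limit. A quick numeric check (a is an arbitrary toy value of ours):

```python
import math

# -n log(1 - a/n) = a + a^2/(2n) + O(1/n^2): the error of the first-order
# approximation is of order a^2/(2n).
a = 0.7
for n in [10, 100, 1000, 10_000]:
    exact = -n * math.log(1 - a / n)
    print(n, exact - a, a * a / (2 * n))   # the two columns nearly agree
```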
In the time-homogeneous case, u is characterized by the branching mechanism ψ via the branching equation recalled in the introduction. In the time-inhomogeneous case, the new dynamics (3) suggests that the branching mechanism becomes, in some sense, a measure-valued mapping. Indeed, defining for every measurable, positive function f

Ψ(f)(A) = ∫_A f(y) α(dy) − ∫_A (f(y))² β(dy) + ∫_(0,∞)×A h(x, f(y)) ν(dx dy),

we see that (3) can be rewritten as u(s, t, λ) = λ + Ψ(u(·, t, λ))((s, t]). In analogy with Ψ, we also define for each n ≥ 1 the measure Ψ_n(f) by replacing (α, β, ν) with (α_n, β_n, ν_n), so that (13) becomes equivalent to

(17) u_n(s, t, λ) = λ + Ψ_n(u_n(·, t, λ))((s, t]).
Hence to prove convergence of the finite-dimensional distributions, we need a stronger result than the convergence u_n(s, t, λ) → u(s, t, λ) for fixed λ, which is the content of Theorem 2.1. Namely, we need to show that u_n(s, t, ℓ_n) → u(s, t, λ) whenever (ℓ_n) is a sequence converging to λ. This explains why in Section 4.4 we derive such convergence results, which are stronger than what is needed for Theorem 2.1. However, because u_n(s, t, λ) is increasing in λ, these stronger results will come almost for free from the results for fixed λ derived in Sections 4.2 and 4.3.
Then for any C ≥ 0, we have c^ǫ_{n,t}(C) → 0 as n goes to infinity.
Proof. Fix t and C ≥ 0 and note … Then (20) entails … Since µ_n(t) = |α_n|(t) + β_n(t), the sequence (µ_n(t)) is bounded by assumption, showing that I_{t,C} is finite. It follows from the definition of ψ_{i,n} and ǫ_{i,n} that for any i ≥ 0 …

Proof. In the rest of the proof fix t and λ ≥ 0, note B_t = 2 sup_{n≥1} µ_n(t), which is finite by assumption, and C_{t,λ} = (λ + 2)(1 + B_t)e^{B_t}. Following Lemma 4.2, choose n_{t,λ} ≥ 1 such that c^ǫ_{n,t}(C_{t,λ}) ≤ 1 for all n ≥ n_{t,λ}. Since Z_i^(n) is finite for each i ≥ 0 and n ≥ 1, it follows that sup_{0≤s≤t} u_n(s, t, λ) is finite for each n ≥ 1, and so to prove the result it is enough to prove that

sup{u_n(s, t, λ) : n ≥ n_{t,λ}, 0 ≤ s ≤ t} = sup{u_n(t_i^n, t_{γ_n(t)}^n, λ) : n ≥ n_{t,λ}, 0 ≤ i ≤ γ_n(t)}

is finite. In the rest of the proof fix n ≥ n_{t,λ} and note a_i = u_n(t_i^n, t_{γ_n(t)}^n, λ). We prove by backwards induction that a_i ≤ C_{t,λ} for all 0 ≤ i ≤ γ_n(t); since the bound does not depend on n or i, this will show the result. We have a_{γ_n(t)} = λ ≤ C_{t,λ}, so the initialization is satisfied. Now consider some 1 ≤ i < γ_n(t) and assume that a_k ≤ C_{t,λ} for all i ≤ k ≤ γ_n(t): we prove that a_{i−1} ≤ C_{t,λ}.
Fix some i < k ≤ γ_n(t). By definition, we have … By the induction hypothesis, a_k ≤ C_{t,λ}. Combined with c^ǫ_{n,t}(C_{t,λ}) ≤ 1 (since n ≥ n_{t,λ}), this gives 0 ≤ 1 + ǫ_{k−1,n}(a_k) ≤ 2. Together with the inequality g(x, y) ≤ x²/(1 + x²), which holds for all x, y ∈ R because Φ_1 ≥ 0 by convexity, see (19), we finally get … Hence for any i − 1 ≤ j ≤ γ_n(t), this gives, together with Lemma 4.1 for the first equality, … This can be rewritten … Then by induction one gets … Since a′_{γ_n(t)} = A = λ + 2 and d_{γ_n(t)} ≤ d_1 + · · · + d_{γ_n(t)} = 2µ_n(t) ≤ B_t, this shows that a_{i−1} ≤ C_{t,λ}, which completes the induction and shows that c^u_{t,λ} ≤ C_{t,λ}. This gives the finiteness of c^u_{t,λ}. And since C_{t,λ} is clearly increasing in both t and λ, for any s ≤ t and λ ≥ 1 we obtain c^u_{s,t} ≤ C_{t,1}, which gives the second part of the lemma.
In the sequel, for t, λ ≥ 0 we define ∆^u_{t,λ} by … Note that ∆^u_{t,λ} is finite, in view of Lemmas 4.2 and 4.3, when the two sequences (|α_n|(t)) and (β_n(t)) are bounded.
Proof. Let 0 ≤ s ≤ s′ ≤ t and λ > 0: Lemma 4.1 and the definition of ǫ_{i,n} give … Recall the definitions (21) and (22) of c^ǫ_{n,t}(C) and c^u_{n,t}. Since 0 ≤ t_i^n ≤ t for any 0 ≤ i ≤ γ_n(t), we have u_n(t_i^n, t, λ) ≤ c^u_{t,λ} and in particular |ǫ_{i−1,n}(u_n(t_i^n, t, λ))| ≤ c^ǫ_{n,t}(c^u_{t,λ}) for all γ_n(s) < i ≤ γ_n(s′). Using in addition (20) with C = c^u_{t,λ}, we obtain …, which proves the result.
For t, λ > 0, s ≤ t and N ≥ 1, define the constants

(24) c^u_{s,t,λ}(N) = inf{u_n(y, t, λ) : s ≤ y ≤ t, n ≥ N} and N_{s,t,λ} = inf{N ≥ 1 : c^u_{s,t,λ}(N) > 0}.

The following result shows that u_n is uniformly bounded away from 0 for large enough n.
In particular, the function t ↦ ℘(t) is increasing and N_{s,t,λ} is finite for every s ∈ (℘(t), t].
In the sequel, for t, λ > 0 and ℘(t) < s ≤ t we note for simplicity c^u_{s,t,λ} = c^u_{s,t,λ}(N_{s,t,λ}), which satisfies c^u_{s,t,λ} > 0.
Proof of Lemma 4.5. For t, λ > 0 define the two sets … (iv) there exist sequences (n(k)) and (v_k) such that v_k ∈ [s, t] for each k ≥ 1 and, for any ε > 0, lim_{k→+∞} n(k) = +∞ and lim … The equivalence between (iii) and (iv) relies on the fact that both conditions are equivalent to the following one: the sequence of random variables (X_{n(k)}(t), k ≥ 1) under P(· | X_{n(k)}(v_k) = 1) converges in distribution to 0. Let us also explain the last equivalence. The condition (iv) implies that liminf_{n→∞} inf_{v∈[s,t]} P(X_n(t) > ε | X_n(v) = 1) = 0 for every ε > 0, which is stronger than (v). Now, assuming that (v) holds, one can find sequences (n(k)), (ε_k) and (v_k) such that v_k ∈ [s, t] and lim_{k→+∞} n(k) = +∞, lim_{k→+∞} ε_k = 0 and lim … Then the sequences (n(k)) and (v_k) satisfy (iv), since for any ε > 0, … We now prove that ℘(·) is an increasing function. Let t′ > t: we will show that … Then s ≤ t′, and the composition rule (1) together with the monotonicity of u_n in λ gives … Since this last quantity is equal to 0, this proves that s ∈ S(t′, λ) and gives the result. Finally, the fact that N_{s,t,λ} is finite when ℘(t) < s ≤ t follows readily from the fact that … The result is proved.

4.4. Proof of Theorem 2.1.
We now use the results of Sections 4.2 and 4.3 to prove Theorem 2.1. The main idea is to use a Gronwall-type argument: we will use Gronwall's lemma in the backwards form of Lemma 4.6, while Lemma 4.7 establishes a Lipschitz-type property of Ψ needed to apply it. In a similar vein to the following lemma, we refer to Lemma 3.2 in Dynkin [12], which states and proves a particular case of this result in order to construct superprocesses. … R(x)π(x).
Proof. We follow the proof of Dynkin. By induction, … where ǫ_n(s, t) is the remainder of the Taylor expansion of the exponential function: … This completes the proof.
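For intuition about the backwards form of Gronwall's lemma invoked above, the discrete mechanism can be checked on the extremal sequence, i.e., the one that satisfies the Gronwall inequality with equality (this is our paraphrase of the mechanism, not the statement of Lemma 4.6):

```python
import math
import random

# Extremal sequence for a discrete backwards Gronwall inequality (our toy
# version): w_I = C and w_i = C + Σ_{k>i} µ_k w_k, which gives
# w_i = C Π_{k>i} (1 + µ_k) <= C exp(Σ_{k>i} µ_k).
random.seed(3)
C = 1.5
I = 50
mu = [0.0] + [random.uniform(0.0, 0.05) for _ in range(I)]   # µ_1, ..., µ_I

w = [0.0] * (I + 1)
w[I] = C
for i in range(I - 1, -1, -1):
    w[i] = C + sum(mu[k] * w[k] for k in range(i + 1, I + 1))

bound = C * math.exp(sum(mu))      # exponential Gronwall bound for w_0
print(w[0], bound)                 # w_0 stays below the exponential bound
```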
Note that these constants are monotone in η and T. … and in particular the measure µ is σ-finite. Finally, for any measurable, positive functions f_1 and f_2 and any A ∈ B, we have …

Proof. Let 0 < η < T and η ≤ y, y′ ≤ T, and fix x ≥ 0: the constant c_2(η, T) is finite because … with H(y) = h(x, y)(1 + x²)/x². One can compute H′(y) = xe^{−yx} + yΦ_1(yx), and so … This upper bound being independent of x, we get the finiteness of c_2(η, T), and hence of c_3(η, T). As for (27), we have … using Fubini's theorem for (i) and (iv), the assumption (A1) for (ii) (using also that the set {x : ν({x} × A) > 0} has zero Lebesgue measure), Fatou's lemma for (iii), and finally the definitions of ν_n and β_n for (v). Since µ = |α| + β, this implies the σ-finiteness of µ. Consider finally two measurable, positive functions f_1 and f_2: …
and plugging in the constant c_2, we obtain …, which was to be proved.
Before finally turning to the proof of Theorem 2.1, we state an intermediate result whose long and tedious proof is postponed to the appendix.

Lemma 4.8. Fix t, λ > 0 and consider any sequence (ℓ_n) with ℓ_n → λ. For n ≥ 1, let R_n be the function … Then R_n(s) → 0 for any ℘(t) < s ≤ t, and sup{R_n(s) : 0 ≤ s ≤ t, n ≥ 1} is finite.

Lemma 4.9. … is the unique function satisfying the following properties: … (2) u is càdlàg; … From this lemma, one sees in particular that for any s ∈ [℘(t), t], the limit of the sequence (u_n(s, t, ℓ_n), n ≥ 1) depends on (ℓ_n) only through its limit: if (ℓ′_n, n ≥ 1) is another sequence with ℓ′_n → λ, then lim_{n→+∞} u_n(s, t, ℓ_n) = lim_{n→+∞} u_n(s, t, ℓ′_n).
Proof of Lemma 4.9. In the rest of the proof fix t, λ > 0 and (ℓ_n) a sequence converging to λ. Let ℓ = inf_{n≥1} ℓ_n and L = sup_{n≥1} ℓ_n, and assume without loss of generality, since ℓ_n → λ > 0, that ℓ > 0. To ease notation, in the rest of the proof write ℘ = ℘(t) and u_n(s) = u_n(s, t, ℓ_n) for 0 ≤ s ≤ t. We decompose the proof into four steps: first we prove that the sequence (u_n(s), n ≥ 1) is Cauchy for any s ∈ (℘, t], then that it is Cauchy for s = ℘, then that u satisfies the claimed properties, and finally that it is the only such function. Before beginning, note that everything is trivial if ℘ = t, because then u_n(s) = ℓ_n and Ψ(u)((s, t]) = 0 for any s ∈ [℘, t]. Hence in the sequel we assume ℘ < t.
First step: (u_n(s)) is Cauchy for s ∈ (℘, t]. In the rest of this step fix s ∈ (℘, t] and for s ≤ y ≤ t define R_n(y) = |Ψ_n(u_n)((y, t]) − Ψ(u_n)((y, t])|. Then (17) gives for any s ≤ y ≤ t and any m, n ≥ 1 … Since the function u_n(s, t, λ) is increasing in λ, we have for any y ∈ [s, t] and n ≥ N_{s,t,ℓ} (recall that N_{s,t,ℓ} is defined in (24) and is finite by Lemma 4.5) … Similar monotonicity arguments lead to u_n(y) ≤ c^u_{t,L} for any y ≤ t and n ≥ 1, so that the monotonicity properties of c_3(η, T) in η and T give for n, m ≥ N_{s,t,λ} … We finally get the bound … Lemma 4.8 combined with the dominated convergence theorem shows that the right-hand side of the above inequality goes to 0 as n_0 → +∞, which proves that the sequence (u_n(s), n ≥ 1) is Cauchy and completes the proof of this first step.
Fourth step: uniqueness. Now let us prove uniqueness: let ū be a function with the same properties. Then Lemma 4.7 gives … ∫_(s,t] |u − ū| dµ, and we conclude that u = ū using Lemma 4.6 (note that sup_{[s,t]} ū is finite because ū is càdlàg).
Lemma 5.1. Fix t, λ > 0 and assume that the two sequences (|α_n|(t)) and (β_n(t)) are bounded. Fix some 0 < a ≤ min(1/c^u_{t,λ}, 1) and for n ≥ 1 and 0 ≤ i < γ_n(t) define the two following quantities: … and β̄_{i,n} := 1 + ǫ_{i,n}(u_n(t_{i+1}^n, t, λ)) … Then there exists n_0 = n_0(t, λ, a) such that for all n ≥ n_0 and all 0 ≤ i < γ_n(t), … and also …

Proof. By Lemma 4.2, let n_{t,a} ≥ 1 be such that 1 + ǫ_{i,n}(y) ∈ [0, 2] for all n ≥ n_{t,a}, y ≤ 1/a and 0 ≤ i < γ_n(t). By definition we have ψ_{i,n}(y) = (1 + ǫ_{i,n}(y)) ∫ (1 − e^{−yx}) ν_{i,n}(dx), and since 1 − e^{−x} ≥ x − x² for all x ≥ −1 we obtain for every n ≥ n_{t,a}, y ≤ 1/a and 0 ≤ i < γ_n(t) … This yields the first inequality of the lemma using the equality u_n(t_i^n, t, λ) = u_n(t_{i+1}^n, t, λ) + ψ_{i,n}(u_n(t_{i+1}^n, t, λ)), which stems from Lemma 4.1, and the fact that u_n(t_{i+1}^n, t, λ) ≤ c^u_{t,λ} ≤ 1/a for all n ≥ 1 and i < γ_n(t). Moreover, … This gives the second inequality of the lemma, whereas the last one comes from 1 + x² ≤ 2 if −1/n ≤ x ≤ a, which gives for n large enough and every i ≤ γ_n(t) … Together with ǫ_{i,n}(u_n(t_{i+1}^n, t, λ)) ∈ [0, 2], this completes the proof.
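The elementary bound 1 − e^{−x} ≥ x − x² for x ≥ −1 used in the proof can be verified numerically on a grid:

```python
import math

# Grid check of 1 - exp(-x) >= x - x^2 on [-1, 4] (it holds for all x >= -1).
xs = [-1.0 + 0.001 * k for k in range(5001)]
worst = min((1.0 - math.exp(-x)) - (x - x * x) for x in xs)
print(worst)   # never (meaningfully) below 0; equality holds at x = 0
```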
be two sequences of respectively positive and non-negative real numbers such that there exist ǫ, … c_i and w_i ≤ M for every 0 ≤ i ≤ I; then for every 0 ≤ i ≤ I, …

Proof. In the rest of the proof, write ρ_k = c_k M²/ǫ and for x ≤ M let … The left-hand side corresponds to a Taylor expansion with remainder r_i(x). We can compose it recursively thanks to the stability of homographies; this kind of technique is used in the study of branching processes in random environment with linear-fractional offspring distributions. Since r_i(x) ≤ ρ_i, we obtain from the previous equation for any

V. BANSAYE AND F. SIMATOS
In particular, since by assumption w_{i+1} ≤ M and w_i ≥ w_{i+1} b_i − w_i² c_i, we obtain … By backwards induction one immediately sees that w_i ≥ v_{I,i} for all 0 ≤ i ≤ I, where v_{I,I} = w_I and v_{I,·} … The definition of v_{I,i} gives rise to the backwards recursion v̄_{I,i} = γ_i v̄_{I,i+1} + δ_i with v̄_{I,i} = 1/v_{I,i}, from which one easily deduces that v_{I,i} = v_{I,I} Γ_{I−1} … This concludes the proof.
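The "stability of homographies" exploited above is the fact that maps of the form v ↦ bv/(1 + cv) compose into maps of the same form, which becomes an affine recursion for the reciprocals. A toy check on a simplified recursion (the coefficients are our own choices):

```python
# Homographies v -> b v / (1 + c v) are stable under composition: the
# reciprocals satisfy the affine recursion 1/v_i = 1/(b_i v_{i+1}) + c_i/b_i.
I = 20
b = [1.0 - 0.01 * i for i in range(I)]   # arbitrary positive coefficients
c = [0.02] * I

v = 3.0                                  # v_I, iterated backwards directly
for i in range(I - 1, -1, -1):
    v = b[i] * v / (1.0 + c[i] * v)

r = 1.0 / 3.0                            # the same computation via reciprocals
for i in range(I - 1, -1, -1):
    r = r / b[i] + c[i] / b[i]

print(v, 1.0 / r)                        # identical up to rounding
```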
Note that B^ν_n depends on d but, as for t or λ, we do not reflect this in the notation because d will be fixed once and for all shortly. Bounding the two last terms thanks to (33), we have … Hence to prove B^ν_n → 0 we only have to show that B̄^ν_n → 0 for every d > 0. So in the rest of this step we fix an arbitrary d > 0 and show that B̄^ν_n → 0. Fix ε > 0 and consider the partitions ((a_j, b_j], 1 ≤ j ≤ J) and ((a′_k, b′_k], 1 ≤ k ≤ K) of (s, t] given by Lemma A.1, which do not depend on n. Then we can write B̄^ν_n ≤ Σ_{j=1}^J B^{ν,1}_{n,j} + Σ_{k=1}^K (B^{ν,2}_{n,k} + B^{ν,3}_{n,k}) with … |h(x, u_n(y))| ν_n(dx dy) + … |h(x, u_n(y))| ν(dx dy) and … Further we write B^{ν,1}_{n,j} ≤ B^{ν,4}_{n,j} + B^{ν,5}_{n,j} with … (h(x, u_n(y)) − h(x, u_n(b_j))) ν_n(dx dy) … (h(x, u_n(y)) − h(x, u_n(b_j))) ν(dx dy) and … h(x, u_n(b_j)) ν(dx dy).
using µ((a_j, b_j]) ≤ ε to get the second inequality. The arguments to control B^{ν,3}_{n,k} and B^{ν,5}_{n,j} are very similar: we treat the case of B^{ν,5}_{n,j} in detail and mention the changes needed for B^{ν,3}_{n,k}. We need the constant c_6:

(37) c_6(T) = sup …

which is finite because ∂h/∂x(x, y) = ye^{−xy} + y(x² + xy − 1)/(1 + x²)², and so for x, x′ ≥ 0 and 0 ≤ y ≤ T, … Let π_{n,j} be the signed measures defined for A ∈ B by π_{n,j}(A) = ν_n(A × (a_j, b_j]) − ν(A × (a_j, b_j]).
For B^{ν,3}_{n,k} one needs to consider the measure π̄_{n,k} defined similarly but with A × {b′_k} in place of A × (a_j, b_j]. With this notation we have … h(x, y) π_{n,j}(dx).