A compact containment result for nonlinear historical superprocess approximations for population models with trait-dependence

We consider an approximating sequence of interacting population models with branching, mutation and competition. Each individual is characterized by its trait and the traits of its ancestors. Birth- and death-events happen at exponential times. Traits are hereditarily transmitted unless mutation occurs. The present model is an extension of the model used in [M\'el\'eard and Tran, EJP, 2012], where, for large populations with small individual biomasses and under additional assumptions, the approximating populations are shown to converge in the diffusive limit to a nonlinear historical superprocess. The main goal of the present article is to verify a compact containment condition in the more general setup of Polish trait spaces and general mutation kernels that allow for a dependence on the parent's trait. As a by-product, a result on the paths of individuals is obtained. An application to evolving genealogies on marked metric measure spaces is mentioned, where genealogical distance, counted in terms of the number of births without mutation, can be regarded as a trait. Because exponential times are used to model birth- and death-events, the analysis of the modulus of continuity of the trait-history of a particle plays a major role in obtaining appropriate bounds.


Introduction
The main goal of the present article is to verify a compact containment condition for "Nonlinear historical superprocess approximations for population models with past dependence" as stated in [9, Lemma 3.5(i)] in a more general setup. As a by-product, two errors from [9] are fixed and a result on the paths of individuals is obtained in a broader context. The obtained result extends the results of [9] as far as compact containment of the approximating processes goes.
Compact containment is one of the two properties needed to establish tightness of a sequence of laws on $D_Y([0,T])$, where $D_Y([0,T])$ denotes the space of càdlàg functions from $[0,T]$ to $Y$ endowed with the Skorohod topology and $Y$ is a given Polish space (cf. Jakubowski's criterion for tightness as stated in [1, Theorem 3.6.4]). Compact containment means that for any $T > 0$ and any fixed $\varepsilon > 0$, one can find a compact set in $Y$ such that, with probability at least $1 - \varepsilon$, the $n$-th approximating population stays inside this set for all times $t \in [0,T]$, uniformly in $n \in \mathbb{N}$. The result is stated in Theorem 3.4 (here $Y = M_F(D_E)$). In addition, compact containment results provide some control on the paths of particles (cf. Lemma 3.9).
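For reference, compact containment in the usual form of Jakubowski's criterion (cf. [1, Theorem 3.6.4]) can be written as follows for the approximating populations $X^n$ with $Y = M_F(D_E)$; the name $\Gamma_{T,\varepsilon}$ for the compact set is introduced here only for illustration.

$$
\forall\, T > 0,\ \forall\, \varepsilon > 0\ \ \exists\, \Gamma_{T,\varepsilon} \subset M_F(D_E) \text{ compact}: \qquad
\inf_{n \in \mathbb{N}} P\big(X^n_t \in \Gamma_{T,\varepsilon} \text{ for all } t \in [0,T]\big) \ \ge\ 1 - \varepsilon .
$$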
In [9], interacting population models are considered in which each individual is assigned a trait. The models involve branching, mutation and competition. Birth- and death-events happen at exponential rates. The rates depend on the trait of the individual and on the history of its trait through its ancestry. We therefore identify each individual by the history of its traits up to the present time rather than by its current trait alone, that is, we consider historical particles. Competition is modeled by means of an additional term in the death rate that also takes into account the trait-histories of the other individuals.
As a consequence, historical processes are particularly well suited to record the evolution of the traits of individuals in a population over time. For each $n \in \mathbb{N}$, an approximating population is given. It is then shown in [9] that, for large populations with small individual biomasses and under additional assumptions, the approximating populations converge in the diffusive limit $n \to \infty$ to a nonlinear historical superprocess. Existence of the limiting process is established by proving tightness of the sequence of laws of the approximating populations. One of the major strengths of historical processes is that they allow for a control of the traits of the historical particles present in the population at time $t$, uniformly in $t \in [0,T]$. That is, we also obtain a control on the history of the trait of each particle through its ancestry. For instance, Lemma 3.9 yields a control on the modulus of continuity of the paths of the particles in the population.
The present article extends the result on compact containment from [9] from $\mathbb{R}^d$ to Polish trait spaces, and from translation-invariant Gaussian mutation densities to a class of mutation kernels that may also depend on the parent's trait (cf. Hypothesis 2.1). Additionally, a lower bound on the interactive killing rate is dropped (cf. (2.10)).
Historical particles can be modeled as càdlàg paths on the trait space $E$. Here it is important to recall that relative compactness in $D_E$ involves controlling both the range and the modulus of continuity of the traits of the particle along its path. Showing compact containment of the sequence of approximating particle systems therefore requires controlling not only the jump sizes in trait space at times of birth but also the impact of an accumulation of jumps within a period of time.
As a result, the use of exponential rates in modeling birth- and death-events is a challenge compared to a setup with equidistant time-steps. Indeed, one of the main steps in proving compact containment is to obtain a bound on the expected fraction of historical particles at a fixed time $T$ outside a compact set $K \subset D_E$ (cf. Proposition 3.5 to come). In the equidistant case, this bound can be obtained directly by induction over the time-steps, that is, over the birth/death-events (see [11, Lemma II.3.3(a)]), and reduces to a bound on the evolution in trait space of a single particle. In the non-equidistant setup, the number of trait-changes (that is, births with mutation) up to time $T$ plays a major role in the derivation of an appropriate bound.
Coupling techniques are an important tool in this article. For the populations in question, we construct couplings with "dominating" or "minorizing" populations, in the sense that one population is a super- or sub-population of the other. This is done by choosing birth- and death-rates appropriately, with the aim of removing certain dependencies of the rates on the paths of the particles. Moreover, for two paths with different numbers of trait-changes up to time $T$, the moduli of continuity are compared by means of coupling techniques.

EJP 19 (2014), paper 97. Page 2/13 ejp.ejpecp.org
In [8] the results of the present paper will be applied in the context of evolving genealogies where the metric is the mutational distance, with mutations happening at birth (see also the remark at the end of Section 3).
We next briefly introduce the model of [9] and its extensions in the present context. For a biological motivation and a discussion of the literature, the interested reader is referred to [9]. We start with some basic notation, taken in part from [9].

Notation 1.1 (cf. Notation of [9, Section 1]). For a given metric space $E$, we denote by $C(E)$, $C_b(E)$ and $B(E)$ the continuous, the bounded continuous and the bounded functions on $E$, respectively.
We further denote by $D_E = D(\mathbb{R}_+, E)$ the space of càdlàg functions from $\mathbb{R}_+$ to $E$ endowed with the Skorohod topology. For a function $x \in D_E$ and $t > 0$, we denote by $x^t$ the stopped function defined by $x^t(s) = x(s \wedge t)$ and by $x^{t-}$ the function defined by $x^{t-}(s) = \lim_{r \uparrow t} x^r(s)$. We will also often write $x_t = x(t)$ for the value of the function at time $t$. For $y, w \in D_E$ and $t \in \mathbb{R}_+$, we denote by $(y|t|w) \in D_E$ the following path: For a constant path $w$ with $w_u = x$ for all $u \in \mathbb{R}_+$, we will write $(y|t|x)$ with a slight abuse of notation.
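The stopping operations above can be made concrete for the paths that actually occur as lineages, namely right-continuous step functions with finitely many jumps. The following is a minimal sketch under the assumption $E = \mathbb{R}$; the list-of-pairs representation and the function names (`value_at`, `stopped`, `stopped_left`, `concat_constant`) are hypothetical illustrations, and for the constant case we assume that $(y|t|x)$ follows $y$ on $[0,t)$ and equals $x$ from time $t$ on.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical representation of a right-continuous step path on [0, infinity):
# sorted (time, value) pairs, the value being taken from that time on; first time is 0.0.
Path = List[Tuple[float, float]]


def value_at(path: Path, s: float) -> float:
    """Value x(s) of the step path for s >= 0."""
    times = [u for u, _ in path]
    return path[bisect_right(times, s) - 1][1]


def stopped(path: Path, t: float) -> Path:
    """x^t, i.e. s -> x(s ∧ t): keep only the jumps at times <= t."""
    return [(u, v) for u, v in path if u <= t]


def stopped_left(path: Path, t: float) -> Path:
    """x^{t-}, i.e. s -> lim_{r ↑ t} x(s ∧ r): keep only the jumps at times < t (t > 0)."""
    return [(u, v) for u, v in path if u < t]


def concat_constant(path: Path, t: float, x: float) -> Path:
    """(y|t|x) under the stated assumption: follow y on [0, t), then stay at x from time t on."""
    return [(u, v) for u, v in path if u < t] + [(t, x)]
```

For instance, with `y = [(0.0, 1.0), (0.5, 2.0), (1.5, 3.0)]`, the path `stopped(y, 1.0)` stays at 2.0 forever, while `stopped(y, 1.5)` ends at 3.0 and `stopped_left(y, 1.5)` still ends at 2.0, which is exactly the difference between $x^t$ and $x^{t-}$ at a jump time.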
Denote by $M_F(E)$ the set of finite measures on $E$ endowed with the topology of weak convergence.

The historical particle system
We briefly introduce the population model from [9] and remark on each extension we make. Note in particular that [9] prove existence of and convergence to a nonlinear historical superprocess. As we are only concerned with proving a compact containment condition for the approximating populations, certain assumptions made in [9] therefore become unnecessary.
In the $n$-th approximation step, $n \in \mathbb{N}$, [9] consider a discrete population in continuous time in which individuals reproduce asexually and die. Each individual is assigned a trait. The first extension is that the trait space $E$ is assumed to be Polish. The lineage or past history of an individual is defined as follows: to an individual of trait $x$ born at time $S_m$, having $m-1$ ancestors born at times $0 = S_1 < S_2 < \dots < S_{m-1}$, with $S_{m-1} < S_m$, and of traits $(x_1, x_2, \dots, x_{m-1})$, we associate the path $y \in D_E$ given by

$$
y_t = \begin{cases} x_i, & t \in [S_i, S_{i+1}),\ i = 1, \dots, m-1,\\ x, & t \ge S_m. \end{cases}
$$

This path is called the lineage of the individual. For $n \in \mathbb{N}$, we consider an individual characterized by its lineage $y \in D_E$ in a population $X^n \in D_{M_F(D_E)}$: the population at time $t$ is represented by the finite point measure

$$
X^n_t = \frac{1}{n} \sum_{i=1}^{N^n_t} \delta_{y^i}, \qquad (2.3)
$$

where $y^1, \dots, y^{N^n_t}$ are the lineages of the individuals alive at time $t$ and $N^n_t = n \langle X^n_t, 1 \rangle$ is the number of individuals alive at time $t$. Note in particular that in this scaling each individual is attributed the weight $1/n$.
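The lineage and the point measure $X^n_t$ can be illustrated with a minimal sketch, assuming real-valued traits and representing a lineage as the list of (birth time, trait) pairs of its ancestors and of the individual itself. The helper names `lineage` and `integrate` are hypothetical; `integrate` evaluates $\langle X^n_t, f\rangle = \frac1n \sum_i f(y^i)$, so that $f \equiv 1$ returns the total mass $N^n_t / n$.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (time, trait from that time on); the first time is 0.0.
Path = List[Tuple[float, float]]


def lineage(birth_times: Sequence[float], traits: Sequence[float]) -> Path:
    """Step path equal to traits[i] on [birth_times[i], birth_times[i+1]).

    birth_times = (S_1, ..., S_m) with S_1 = 0 and traits = (x_1, ..., x_{m-1}, x);
    the last entries belong to the individual itself.
    """
    if len(birth_times) != len(traits) or birth_times[0] != 0.0:
        raise ValueError("need S_1 = 0 and one trait per birth time")
    if any(a >= b for a, b in zip(birth_times, birth_times[1:])):
        raise ValueError("birth times must be strictly increasing")
    return list(zip(birth_times, traits))


def integrate(population: List[Path], n: int, f: Callable[[Path], float]) -> float:
    """<X^n_t, f> for the point measure X^n_t = (1/n) * sum_i delta_{y^i}."""
    return sum(f(y) for y in population) / n
```

With three individuals and $n = 10$, `integrate(population, 10, lambda y: 1.0)` gives the total mass $0.3$, and multiplying by $n$ recovers $N^n_t = 3$.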
Initial conditions: To ensure existence, uniqueness and compact containment of the approximating particle systems, we assume that

$$
\sup_{n \in \mathbb{N}} E\big[\langle X^n_0, 1 \rangle^2\big] < \infty \quad \text{and the sequence of laws of } (X^n_0)_{n \in \mathbb{N}} \text{ is tight on } M_F(D_E). \qquad (2.4)
$$

These initial conditions coincide with those used in [9] in the parts of the proofs that are relevant to this article (cf. [9, Proposition 2.6 and Proposition 3.4]). The first part of the assumption corresponds to [9, (2.14)]; an exponent of 3 instead of 2 only becomes necessary when applying a Girsanov argument along the lines of the proof of [5, Theorem 5.6]. Note in particular that this bound yields a uniform bound on the first and second moments of the overall mass over time, see Lemma 3.1 below. This bound is not only a crucial ingredient in the proof of existence and uniqueness of the approximating systems but also important in verifying the compact containment condition, as we deal with finite measures rather than probability measures (see (2.3)). The second part of the above assumption is included in [9, (3.5)].
Let us now recall the population dynamics.

Reproduction:
The birth rate at time $t$ is $b_n(t, y) = nr(t, y) + b(t, y)$. When an individual with trait $y_{t-}$ gives birth at time $t$, the new offspring is either a mutant or a clone:
• With probability $1 - p \in [0, 1]$, the new individual is a clone of its parent, with the same trait $y_{t-}$ and the same lineage $y$.
is a stochastic kernel, the so-called mutation kernel on E. To this mutant is associated the lineage (y|t|h).
In [9], for the sake of simplicity, the mutation density $k_n(h)$ is assumed to be a Gaussian density with mean 0 and covariance $\sigma^2 \mathrm{Id}/n$.
Here we generalize from mutation densities to mutation kernels and also allow for a dependence on the parent's trait. Note that [9] often speak of "jump sizes" to signify the change of trait at time $t$ from $y_{t-}$ to $y_t = y_{t-} + h$. In the present context we continue to use this wording to signify the change of trait at time $t$ from $y_{t-}$ to $y_t = h$.

Hypothesis 2.1 (Assumption on the mutation kernel). Let $\alpha_n(x, dh)$, $n \in \mathbb{N}$, be a stochastic kernel on $E$. For fixed $y_0 \in E$, let $Y^n \in D_E$ be a process that starts in $y_0$ and jumps according to the kernel $\alpha_n(x, dh)$ at rate $n$. Denote by $P^n_{y_0}$ its distribution starting from $y_0$. We now assume that the sequence of laws of where $X^n_0$ is the initial condition of the $n$-th approximating population.
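To get a feeling for the process $Y^n$ in Hypothesis 2.1, here is a minimal simulation sketch. It assumes $E = \mathbb{R}$ and uses the Gaussian setup of [9], rewritten as the kernel $\alpha_n(x, dh) = \mathcal{N}(x, \sigma^2/n)(dh)$ (the text states that one of the sufficient conditions of Lemma 3.2 covers this setup). The function name and parameters are hypothetical. Over $[0,T]$ the path makes about $nT$ jumps, each of order $n^{-1/2}$, and the displacement at time $T$ has variance $\sigma^2 T$.

```python
import math
import random
from typing import List, Tuple


def simulate_mutation_walk(y0: float, n: int, T: float, sigma: float,
                           rng: random.Random) -> List[Tuple[float, float]]:
    """Path of Y^n on [0, T] as (jump time, new trait) pairs, starting with (0.0, y0).

    Jumps occur at rate n; the new trait is drawn from N(old trait, sigma^2 / n),
    i.e. the (assumed) Gaussian kernel alpha_n(x, dh).
    """
    path = [(0.0, y0)]
    t, x = 0.0, y0
    while True:
        t += rng.expovariate(n)  # exponential waiting time with rate n
        if t > T:
            return path
        x = rng.gauss(x, sigma / math.sqrt(n))  # new trait h ~ alpha_n(x, dh)
        path.append((t, x))
```

Running it with `n = 1000`, `T = 1.0`, `sigma = 1.0` yields roughly a thousand small jumps whose accumulated path looks like a Brownian path, which is the regime in which the range and the modulus of continuity of the trait paths have to be controlled.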
In Lemma 3.2 below we give sufficient conditions on the kernel to satisfy this hypothesis. One of the conditions includes the Gaussian setup from [9].
The proof of existence and uniqueness of the approximating particle systems is a direct adaptation of [5, Sections 2,3 and 5].

Results
In this section we provide results and short proofs, as well as an outlook at the end. We start with a uniform bound on the first and second moments of the overall mass over time, resulting from the assumptions made above.
Proof. The proof is a direct adaptation of the proof of [5, Theorem 5.6].
We continue by providing two sufficient conditions on the mutation kernel to satisfy Hypothesis 2.1. The conditions are inspired by [10, Assumption 2.3].

Lemma 3.2.
Suppose that assumption (2.4) on the initial conditions $(X^n_0)_{n \in \mathbb{N}}$ holds. Then each of the following conditions on the mutation kernel $\alpha_n(x, dh)$ is sufficient for Hypothesis 2.1 to hold.
(2) $E$ is a closed subset of $\mathbb{R}^d$ and there exists a generator $A$ of a Feller semi-group …, $\varepsilon_n$ is a sequence tending to 0 as $n$ tends to infinity and $C$ is a constant.

Recall the weakening of the assumption on the interaction kernel (see (2.10) and the paragraph following it). When we decrease the death rate, we can introduce a coupling of the original historical process with a historical process with $D \equiv 0$, $U \equiv 0$ in such a way that the population of the former is a sub-population of the latter, uniformly over time.
In what follows we will loosely call such a coupling "dominating", and a coupling in which the coupled process yields sub-populations of the populations of the original process "minorizing". Once we prove (3.9) for the case $D \equiv 0$ and $U \equiv 0$, we therefore obtain (3.9) for $D$, $U$ satisfying (2.10). Note in particular that the scaling by $1/n$ in (2.3) is crucial for such a conclusion.
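The idea of a dominating coupling can be seen on population sizes alone. The sketch below is a toy version under simplifying assumptions: traits are ignored, per-capita rates are constant (birth rate `b` for both populations, death rate `d` for the original one and a smaller death rate `d0 <= d` for the dominating one, playing the role of dropping $D$ and $U$), and the function name is hypothetical. The event rates are split so that each marginal has the correct birth and death rates while $N_t \le M_t$ holds at all times; this is a standard monotone coupling, not the individual-level construction of the paper.

```python
import random
from typing import List, Tuple


def coupled_sizes(n0: int, b: float, d: float, d0: float, T: float,
                  rng: random.Random) -> List[Tuple[float, int, int]]:
    """Jointly simulate (N_t, M_t) on [0, T], both started from n0.

    N has per-capita birth rate b and death rate d; M has birth rate b and death rate d0 <= d.
    Returns (time, N, M) after each event; N_t <= M_t along the whole path.
    """
    if not 0.0 <= d0 <= d:
        raise ValueError("need 0 <= d0 <= d")
    t, N, M = 0.0, n0, n0
    history = [(t, N, M)]
    while True:
        rates = [
            b * N,          # common birth: both populations gain the same newborn
            b * (M - N),    # birth among the extra individuals of M only
            d0 * N,         # common death: an individual present in both dies
            (d - d0) * N,   # additional death in N only (its larger death rate)
            d0 * (M - N),   # death among the extra individuals of M only
        ]
        total = sum(rates)
        if total == 0.0:
            return history
        t += rng.expovariate(total)
        if t > T:
            return history
        k = rng.choices(range(5), weights=rates)[0]
        if k == 0:
            N, M = N + 1, M + 1
        elif k == 1:
            M += 1
        elif k == 2:
            N, M = N - 1, M - 1
        elif k == 3:
            N -= 1
        else:
            M -= 1
        history.append((t, N, M))
```

Marginally, $N$ gains individuals at rate $bN$ and loses them at rate $d_0N + (d-d_0)N = dN$, while $M$ has rates $bM$ and $d_0M$. Since every individual carries the same weight $1/n$, mass comparisons between the two populations follow directly from $N_t \le M_t$.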
One of the main steps in the proof of Proposition 3.5 is to establish the following result. Its generalization to Polish spaces and more general mutation kernels is the main challenge in comparison to [9]. The proof of Proposition 3.5 can be found in Section 4. The proof of Proposition 3.7 is postponed to Section 5.
Denote by $K^T := \{y^T : y \in K\} \subset D_E$ the set of the paths of $K$ stopped at time $T$. Note that in Proposition 3.7 it is enough to show the existence of a relatively compact $K \subset D_E$. The sets $K$ constructed in the proof of Proposition 3.7 are of a particular form: for $T > 0$ we prove the existence of $K \in D_T$, with $D_T$ as defined below.
Before proceeding, the reader may want to look ahead at Definition 5.3 and Theorem 5.4, where the notations $w'(y, \delta, T)$ and $w'(A, \delta, T)$ for the modulus of continuity of a path $y$ and of a set $A$, respectively, as well as a criterion for relative compactness in $D_E$, are recalled from [4]. We finish this section with a lemma that yields a control on the modulus of continuity of the paths of the particles in the population. It is a direct consequence of Proposition 3.5.
(3.14)
Choose $t_0$ such that $w'(K^T, t_0, T) < \tau$ to conclude the claim.
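Since $w'(x, \delta, T)$ is an infimum over partitions $0 = t_0 < t_1 < \dots < t_{n-1} < T \le t_n$ with all gaps $t_i - t_{i-1} > \delta$ (as recalled from [4]), every admissible partition gives an upper bound. The following minimal sketch assumes real-valued step paths (metric $|\cdot|$) and evaluates the maximal oscillation over the intervals $[t_{i-1}, t_i)$ of one given partition; the names are hypothetical, and the code does not search for the optimal partition.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical representation of a right-continuous step path: (time, value from then on).
Path = List[Tuple[float, float]]


def value_at(path: Path, s: float) -> float:
    times = [u for u, _ in path]
    return path[bisect_right(times, s) - 1][1]


def oscillation(path: Path, a: float, b: float) -> float:
    """sup_{s, t in [a, b)} |x(s) - x(t)| for a real-valued step path (requires a < b)."""
    values = [value_at(path, a)] + [v for u, v in path if a < u < b]
    return max(values) - min(values)


def w_prime_bound(path: Path, partition: Sequence[float], delta: float, T: float) -> float:
    """max_i oscillation on [t_{i-1}, t_i) for one admissible partition.

    Admissible means 0 = t_0 < ... < t_{n-1} < T <= t_n with all gaps > delta;
    since w'(x, delta, T) is the infimum over such partitions, this bounds it from above.
    """
    t = list(partition)
    if len(t) < 2 or t[0] != 0.0 or not t[-2] < T <= t[-1]:
        raise ValueError("partition must satisfy 0 = t_0 < ... < t_{n-1} < T <= t_n")
    if any(b - a <= delta for a, b in zip(t, t[1:])):
        raise ValueError("all gaps t_i - t_{i-1} must exceed delta")
    return max(oscillation(path, a, b) for a, b in zip(t, t[1:]))
```

A single jump can be isolated by a partition point at the jump time, so `w_prime_bound([(0.0, 0.0), (0.5, 2.0)], [0.0, 0.5, 1.0], 0.1, 1.0)` is 0.0. Two jumps less than $\delta$ apart cannot be separated: for the spike `[(0.0, 0.0), (0.4, 1.0), (0.45, 0.0)]` with $\delta = 0.1$ every admissible partition has oscillation 1. This is why an accumulation of jumps in a short time interval, not a single jump, is what compact containment in $D_E$ has to rule out.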

Remark 3.10 (Application to evolving genealogies on marked metric measure spaces).
In [8], the compact containment result of Theorem 3.4 as well as the control on the modulus of continuity stated in Lemma 3.9 are applied in the context of evolving genealogies, modeled by means of marked metric measure spaces (mmm-spaces). Establishing relative compactness here requires, for example, a control on the number of balls of (genetic) radius $\varepsilon$ necessary to cover the population. For an introduction to mmm-spaces the interested reader is referred to [3]; for relative compactness see [6, Proposition 7.1] in the unmarked setup and [3, Theorem 3 and Remark 2.5] in the marked one. In [7, Theorem 2], convergence of tree-valued Moran dynamics to tree-valued Fleming-Viot dynamics is proven. Exponential rates are used to model the dynamics in the approximating population models. [7] work in an ultrametric setup where the genetic distance between two individuals alive at time $t$ equals twice the time back to their most recent common ancestor (cf. [7, (2.20)]). Hence, to obtain an $\varepsilon$-covering it remains to derive a bound on the number of most recent common ancestors (mrca) at time $t - \varepsilon$. In [8], by contrast, genetic distance is measured differently: in the $n$-th approximating population, genetic distance increases by $1/n$ at each birth with mutation. Hence, the genetic distance of two individuals is counted in terms of births with mutation backwards in time to their mrca. In this non-ultrametric setup, the control over the whole path provided by historical particle systems is particularly suitable. By interpreting the genetic age of a particle as a trait, the control on the modulus of continuity of the historical path immediately translates into a control on genetic distance backwards in time.
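The distance of [8] described above (each birth with mutation adds $1/n$, counted on both branches back to the mrca) can be sketched as follows. This is an illustration under stated assumptions, not the construction of [8]: individuals carry labels in the style of the Ulam-Harris-Neveu notation (tuples of 0/1 entries, an ancestor's label being a prefix), and a hypothetical set `mutant_births` records the labels whose birth was a birth with mutation.

```python
from typing import Set, Tuple

Label = Tuple[int, ...]  # assumed labelling: tuples of 0/1 entries, ancestors are prefixes


def mrca(a: Label, b: Label) -> Label:
    """Label of the most recent common ancestor: the longest common prefix."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return a[:k]


def genetic_distance(a: Label, b: Label, mutant_births: Set[Label], n: int) -> float:
    """(Number of births with mutation on both branches back to the mrca) / n."""
    c = mrca(a, b)

    def mutations_since_mrca(x: Label) -> int:
        # labels strictly below the mrca on the branch leading to x, including x itself
        return sum(1 for k in range(len(c) + 1, len(x) + 1) if x[:k] in mutant_births)

    return (mutations_since_mrca(a) + mutations_since_mrca(b)) / n
```

For `a = (0, 1, 0)` and `b = (0, 0, 1)` with `mutant_births = {(0, 1), (0, 0, 1)}` and $n = 10$, the mrca is `(0,)` and each branch carries one birth with mutation, giving distance $0.2$. Unlike the ultrametric setting of [7], two individuals whose mrca lies far back in time can still be at distance 0 if no mutation occurred on either branch.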

Proof of Proposition 3.5
Proof of Proposition 3.5. For $T, \varepsilon > 0$ and $K \subset D_E$ compact let $S^n := \inf\{t \in \mathbb{R}_+ : X^n_t(K_T^c) > \varepsilon\}$. Denote by $K^t := \{y^t : y \in K\} \subset D_E$ the set of the paths of $K$ stopped at time $t$. To bound $P(S^n < T)$ by $\varepsilon$, uniformly in $n \in \mathbb{N}$, we have to control $X^n_t(K_T^c)$, that is, the mass of the population outside of $K_T$, uniformly over the whole time interval $t \in [0,T]$. The first step consists in introducing a more tractable quantity: instead of $K_T$ we follow [9] and focus on $K^T \subset K_T$ (note that if a path leaves $K_T$ it leaves $K^T$ as well), and we decompose $\{S^n < T\}$ into disjoint sets according to the behaviour of the population at the fixed final time $t = T$. We get The probability of the first event can be bounded using Markov's inequality. The resulting expectation $E[X^n_T((K^T)^c)]$ (at the fixed time $T$) can be made arbitrarily small by choosing $K$ large enough, as we will see later.
The bound on the second probability is more involved. Reasoning as in [9, Step 2], one sees that to prove (3.9) it suffices to show that there exist $\eta \in (0,1)$ and $n_0 \in \mathbb{N}$, both independent of $K \subset D_E$, such that for all $n \ge n_0$,

Outline of the remainder of the proof of Proposition 3.5. Steps 2-5 of the proof of [9, Proposition 3.4] establish the claim we are interested in, that is, the extension of the statement of [9, Lemma 3.5(i)] (compare to Proposition 3.5). We already recalled Step 2 above, leading up to inequalities (4.4)-(4.5), which remain to be shown. The validity of the first inequality is formulated in Lemma 4.2 below; its proof is an adaptation of Steps 3-5. The modification of the remaining Step 6, that is, the proof of (4.5) or equivalently of Proposition 3.7, is the most involved one, since we allow for a more general mutation kernel. Its proof is therefore postponed to Section 5 below. In the alternative proof below we follow the ideas of [9] but avoid this recursive argument. As an additional result, the stronger assumption $U > 0$ on the interaction kernel in [9] can be dropped, as this is the only instance where it is used in [9].

Lemma 4.2. For $T, \varepsilon > 0$ and $K \subset D_E$ compact, there exist $\eta \in (0,1)$ and $n_0 \in \mathbb{N}$, both independent of $K \subset D_E$, such that for all $n \ge n_0$,

Proof. Following the abstract reasoning of Step 3 of the proof of [9, Proposition 3.4] up to and including [9, (3.25)], we conclude that it is enough to show that there exist $\eta \in (0,1)$ and $n_0 \in \mathbb{N}$ large enough such that for $n \ge n_0$ Z n S n (dy) = 1 y S n ∈K S n X n S n (dy). Choose the birth rate $nr(t, y)$ as in [9] but change the death rate to $nr(t, y) + D_0$, with $D_0 = D_0(T) > 0$ a small enough constant to be chosen later on.
We now obtain, instead of [9, (3.28)], as an upper bound for the left-hand side of (4.7), It now remains to show that there exist $\eta \in (0,1)$, (4.10) Follow the reasoning of Step 4 in [9], the only difference being that we replace D + U N by the constant $D_0$ and $2\eta$ by $\eta$ throughout. Note in particular that $\eta$ is finally defined as in [9, (3.42)], but with the factor 2 replaced by 1 on the right-hand side. This leads directly to Step 5, where it remains to show that for $(z, r) \in D_{\mathbb{R}} \times [0,T]$ arbitrarily fixed and where $Z_\cdot$ is the diffusive limit of $\langle Z^n_{S^n+\cdot}, 1 \rangle$ as introduced above [9, (3.34)] in Step 4. Also note the characterization of $Z$ in [9, (3.37)].
Conclusion of the proof of Proposition 3.5. Taking Lemma 4.2 and Proposition 3.7 together yields the claim.

Proof of Proposition 3.7
Proof of Proposition 3.7. Coupling with a dominating historical particle system allows us to assume $b_n(t, y) = nr(t, y) + B$ as birth rate (cf. (2.6)) and $d_n(t, y) := nr(t, y)$ as death rate (cf. (2.9)) at time $t$. Next, construct the tree underlying $X^n$ analogously to [9, Step 6] by pruning a Yule tree with traits in $E$.
A particle with lineage $y$ at time $t$ produces two offspring (one is the parent, one the child) at rate $b_n(t, y) + d_n(t, y)$. One has lineage $y$ and the other has lineage $(y|t|h)$ (recall (1.1)), where $h$ is distributed following [9, (2.5)]. Using the Harris-Ulam-Neveu notation to label the particles (see e.g. [1]), we denote by $Y^{n,\alpha}$, for $\alpha \in I = \bigcup_{m=0}^{\infty} \{0,1\}^{m+1}$, the lineage of the particle with label $\alpha$.

Remark 5.1 (Clarification of notation). The lineage of the particle with label $\alpha$ only records the lineage of the particle until the random time $S_{|\alpha|+1}$ (cf. (2.2)). To regard particles as individuals alive indefinitely, we identify the lineage of the particle with label $\alpha$ with the lineage of the particle $(\alpha, \beta)$ with $\beta = (0, \dots, 0)$ and $|\beta| \to \infty$.
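The labelling can be sketched with plain tuples: a label in $\{0,1\}^{m+1}$ is a tuple of length $m+1$, descendants of $i$ are exactly the labels having $i$ as a prefix, and, following Remark 5.1, appending zeros keeps following a particle's own lineage. The function names are hypothetical, and counting a label as its own descendant is an assumption made here for convenience.

```python
from itertools import product
from typing import Iterator, Tuple

Label = Tuple[int, ...]


def labels(max_generation: int) -> Iterator[Label]:
    """All labels in {0,1}^{m+1} for m = 0, ..., max_generation."""
    for m in range(max_generation + 1):
        yield from product((0, 1), repeat=m + 1)


def is_descendant(alpha: Label, i: Label) -> bool:
    """True if i is a prefix of alpha (alpha counts as its own descendant, by assumption)."""
    return len(alpha) >= len(i) and alpha[:len(i)] == i


def continuation(alpha: Label, extra: int) -> Label:
    """(alpha, 0, ..., 0): the label that keeps following alpha's own lineage (cf. Remark 5.1)."""
    return alpha + (0,) * extra
```

Generations $m = 0, 1, 2$ contain $2 + 4 + 8 = 14$ labels, and `continuation((1, 0), 3)` equals `(1, 0, 0, 0, 0)`, which is still a descendant of `(1, 0)`.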
Particles descending from the same individual at time 0 are exchangeable, and the common distribution of the process $Y^{n,\alpha}$ (in the new notation) is that of a pure jump process on $E$, where the jumps occur at rate $b_n(t, y) + d_n(t, y) = 2nr(t, y) + B$ and the new traits are distributed according to the probability measure

$$
\tfrac{1}{2}\, \delta_{y_{t-}}(dh) + \tfrac{1}{2}\, K^n(y_{t-}, dh) \qquad (5.2)
$$

(at the time of birth of an offspring, we follow the parent with probability 1/2 and the child with probability 1/2). We denote by $P^n_x$ its distribution starting from $x \in E$.
At each node of the Yule tree, an independent pruning is made: the offspring are kept with probability $p(n) := b_n(t, y)/(b_n(t, y) + d_n(t, y))$ and are erased otherwise.
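Why pruning a Yule tree reproduces the birth and death dynamics can be checked on population sizes. The sketch below assumes constant per-capita rates `b` and `d` (hypothetical stand-ins for $b_n(t,y)$ and $d_n(t,y)$): each particle splits at rate $b + d$, and with probability $p = b/(b+d)$ both offspring are kept (one particle becomes two), otherwise they are erased (the particle disappears). The resulting size process is a linear birth-death process with rates $b$ and $d$, whose mean at time $T$ is $n_0 e^{(b-d)T}$.

```python
import math
import random


def pruned_yule_size(n0: int, b: float, d: float, T: float, rng: random.Random) -> int:
    """Population size at time T of a Yule tree (split rate b + d per particle),
    pruned independently at each node: offspring kept with probability b / (b + d)."""
    p = b / (b + d)
    t, size = 0.0, n0
    while size > 0:
        t += rng.expovariate((b + d) * size)  # next split anywhere among `size` particles
        if t > T:
            break
        size += 1 if rng.random() < p else -1  # kept: one particle becomes two; erased: it vanishes
    return size


def expected_size(n0: int, b: float, d: float, T: float) -> float:
    """Mean n0 * exp((b - d) T) of the linear birth-death process with rates b and d."""
    return n0 * math.exp((b - d) * T)
```

Averaging `pruned_yule_size(1, 1.0, 0.5, 1.0, rng)` over many runs should come close to `expected_size(1, 1.0, 0.5, 1.0)`, i.e. $e^{0.5} \approx 1.65$.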
Following [9], let us denote by $V^n_t$ the set of individuals alive at time $t$ and write $\alpha \succ i$ to say that the individual $\alpha$ is a descendant of the individual $i$. Recall that $N^n_0$ is the number of individuals present at time 0. For $x \in D_E$, $\delta > 0$ and $T > 0$, let

$$
w'(x, \delta, T) := \inf_{\{t_i\}} \max_i \sup_{s, t \in [t_{i-1}, t_i)} r(x(s), x(t)), \qquad (5.5)
$$

where $r$ is the metric on $E$ and $\{t_i\}$ ranges over all partitions of the form $0 = t_0 < t_1 < \dots < t_{n-1} < T \le t_n$ with $\min_{1 \le i \le n} (t_i - t_{i-1}) > \delta$ and $n \ge 1$. Note that $w'(x, \delta, T)$ is nondecreasing in $\delta$ and in $T$.

Let $Y^n$ be a process that starts in $X^i_0 \in E$, the initial position of individual $i \in \{1, \dots, N^n_0\}$, and is distributed according to $P^n_{X^i_0}$. Denote by $N_T(Y^n)$ the number of jumps of $Y^n$ up to time $T$. Then, for any $A > 0$,

Let $\tilde Y^n$ be a coupled jump process which has the same sequence of jumps as $Y^n$ but jumps at the dominating rate $2nR + B$. Then the coupling can be constructed such that the inter-jump times of $\tilde Y^n$ minorize those of $Y^n$. The fact that these times are equal or smaller implies, by definition of $K$, that $P\big((\tilde Y^n)^T \in K^T\big) \le P\big((Y^n)^T \in K^T\big)$ and $N_T(Y^n) \le N_T(\tilde Y^n)$, the latter being $\mathrm{Pois}(\lambda_n)$-distributed with $\lambda_n := T(2nR + B)$. Then there exist constants $C_1, C_2 > 0$ such that for any $\varepsilon_0 > 0$ we may choose $A$ large enough so that

$$
\begin{aligned}
E\Big[e^{cN_T(Y^n)/n}\, \mathbf{1}_{\{N_T(Y^n) > An\}}\Big]
&\le E\Big[e^{cN_T(\tilde Y^n)/n}\, \mathbf{1}_{\{N_T(\tilde Y^n) > An\}}\Big] \\
&\le e^{\lambda_n (e^{c/n} - 1)}\, P\big(\mathrm{Pois}(\lambda_n e^{c/n}) \ge An\big)
\le C_1\, P\big(\mathrm{Pois}(C_2 n) \ge An\big) < \varepsilon_0 .
\end{aligned} \qquad (5.11)
$$

Put this back into (5.10) and (5.4) to obtain

$$
E\big[X^n_T((K^T)^c)\big] \le e^{cA}\, E\Big[\int_{D_E} X^n_0(dy)\, \tilde P^n_{y_0}\{y : y^T \notin K^T\}\Big] + \varepsilon_0\, E\big[\langle X^n_0, 1 \rangle\big], \qquad (5.12)
$$

where $\tilde P^n_{y_0}$ denotes the distribution of $\tilde Y^n$ starting in $y_0$.

Choose $A$ large enough that the second term in (5.12) is at most $\varepsilon/2$, uniformly in $n \in \mathbb{N}$; this is possible since $\sup_n E[\langle X^n_0, 1 \rangle] < \infty$ by (2.4). Keep $A$ fixed and use (2.4) and Hypothesis 2.1 to obtain the required bound in Proposition 3.7. Here we note that the process $Y^n$ of Hypothesis 2.1 jumps according to the kernel $\alpha_n(x, dh)$ at rate $n$, whereas the process $\tilde Y^n$ jumps under $\tilde P^n_{y_0}$ at rate $2nR + B$ according to the jump kernel in (5.2). The change in the rate amounts to a time change only. Replacing jumps by jumps of size zero increases the chances of staying inside the relatively compact set $K$ (cf. Theorem 5.4).
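The middle inequality in (5.11) rests on an elementary Poisson identity: for $N \sim \mathrm{Pois}(\lambda)$ and $\theta \ge 0$, one has $E[e^{\theta N} \mathbf{1}_{\{N \ge m\}}] = e^{\lambda(e^\theta - 1)}\, P(\mathrm{Pois}(\lambda e^\theta) \ge m)$, applied with $\theta = c/n$ and combined with $\{N > An\} \subset \{N \ge An\}$. The prefactor $\lambda_n(e^{c/n} - 1)$ tends to $2TRc$, and Markov's inequality gives $P(\mathrm{Pois}(C_2 n) \ge An) \le C_2/A$ uniformly in $n$, which is one way to see why $A$ can be chosen independently of $n$. The following minimal numerical check uses hypothetical parameter values and truncated sums computed in log-space; it is an illustration, not part of the proof.

```python
import math


def log_pois_pmf(lam: float, k: int) -> float:
    """log P(Pois(lam) = k), computed in log-space to avoid underflow for large lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def pois_tail(lam: float, m: int, extra: int = 5000) -> float:
    """P(Pois(lam) >= m), truncating the sum at k = m + extra (accurate when m + extra >> lam)."""
    return sum(math.exp(log_pois_pmf(lam, k)) for k in range(m, m + extra + 1))


def truncated_exp_moment(lam: float, theta: float, m: int, extra: int = 5000) -> float:
    """E[exp(theta * N) 1{N >= m}] for N ~ Pois(lam), with the same truncation."""
    return sum(math.exp(theta * k + log_pois_pmf(lam, k)) for k in range(m, m + extra + 1))


def rate_factor(n: int, T: float, R: float, B: float, c: float) -> float:
    """lambda_n (e^{c/n} - 1) with lambda_n = T (2 n R + B); tends to 2 T R c as n grows."""
    return T * (2 * n * R + B) * math.expm1(c / n)
```

With $\lambda = 20$, $\theta = 0.1$ and $m = 25$, `truncated_exp_moment(20.0, 0.1, 25)` agrees with `math.exp(20.0 * math.expm1(0.1)) * pois_tail(20.0 * math.exp(0.1), 25)` up to rounding, since the identity holds term by term.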