Coalescent results for diploid exchangeable population models

We consider diploid bi-parental analogues of Cannings models: in a population of fixed size $N$ the next generation is composed of $V_{i,j}$ offspring from parents $i$ and $j$, where $V=(V_{i,j})_{1\le i\neq j \le N}$ is a (jointly) exchangeable (symmetric) array. Every individual carries two chromosome copies, each of which is inherited from one of its parents. We obtain general conditions, formulated in terms of the vector of the total number of offspring to each individual, for the convergence of the properly scaled ancestral process for an $n$-sample of genes towards a ($\Xi$-)coalescent. This complements M\"ohle and Sagitov's (2001) result for the haploid case and sharpens the profile of M\"ohle and Sagitov's (2003) study of the diploid case, which focused on fixed couples, where each row of $V$ has at most one non-zero entry. We apply the convergence result to several examples, in particular to two diploid variations of Schweinsberg's (2003) model, leading to Beta-coalescents with two-fold and with four-fold mergers, respectively.


Introduction and main results
For haploid population models, in which every individual (gene) has one parent (gene), coalescent processes have been used widely to describe the ancestral structure of a sample of $n$ genes when the total population size $N$ is sufficiently large. The purpose of this work is to extend coalescent theory to general diploid population models, in which individuals carry two copies of each gene, which they inherit from two distinct parental individuals. In this context, we derive the diploid analogue of Möhle and Sagitov's classification of the ancestral processes in exchangeable haploid population models [22]. This gives a unified framework for studying genealogies in an exchangeable diploid setting, which until now has only been available in special cases.
We consider a general diploid exchangeable population model with fixed constant population size $N \in \mathbb{N} := \{1, 2, \dots\}$ and non-overlapping generations $m \in \mathbb{N}_0 := \{0, 1, 2, \dots\}$ without explicit sexes (see however our remarks in Section 2.3.3). The generations are labeled backwards in time, that is, $m = 0$ is the current generation, $m = 1$ is one generation back in time, and so on. Each individual possesses two chromosome copies, each inherited from one of its two parents. Which of a parent's two chromosomes is inherited is a uniform random pick, independently for each child. For $m \in \mathbb{N}$, let $V^{(m)}_{i,j}$ be the number of children of individuals $i$ and $j$ (for $i < j$) in the $m$-th generation. We call these quantities pairwise offspring numbers and implicitly define $V_{i,i} = 0$ and $V_{j,i} := V_{i,j}$ (i.e. the array is symmetric). We assume that the reproduction law is independent and identically distributed from generation to generation, i.e., the matrices $\big(V^{(m)}_{i,j}\big)_{1 \le i < j \le N}$, $m \in \mathbb{N}$, are i.i.d. We will often write $V_{i,j} = V^{(1)}_{i,j}$ for simplicity. Since the population size is fixed, we have $\sum_{1 \le i < j \le N} V_{i,j} = N$. The law of $(V_{i,j})_{1 \le i < j \le N}$ thus depends on $N$; we will nevertheless suppress this dependence in the notation. Our fundamental assumption is the following exchangeability condition:
$$(V_{i,j})_{1 \le i < j \le N} \overset{d}{=} \big(V_{\sigma(i),\sigma(j)}\big)_{1 \le i < j \le N} \quad \text{for any permutation } \sigma \text{ of } \{1, \dots, N\}, \tag{1}$$
i.e., $(V_{i,j})_{1 \le i < j \le N}$ is a finite jointly exchangeable array (see also Remark 4 in Section 1.1).
Figure 1 (rows labelled "children" and "parents"): An example of the assignment of parental genes in a population of size $N = 7$; each individual has two gene copies (the filled circles).
Generally, we are interested in tracking the genealogy of a sample of $n \in \mathbb{V}_N := \{1, 2, \dots, N\}$ genes from the present population of size $N$. We will follow the customary approach of describing ancestral relations among $n$ sampled genes by partitions characterising which genes are descended from the same parental gene. Unless specified otherwise, asymptotic relations refer to letting $N \to \infty$ throughout the paper.
Let $E_n$ be the collection of partitions of $\mathbb{V}_n$ and $E_\infty$ the collection of partitions of $\mathbb{N}$. Any element of $E_n$ can be written as $\xi = \{C_1, C_2, \dots, C_b\}$, where $C_i \cap C_j = \emptyset$ for $i \neq j$ and $\bigcup_{i=1}^{b} C_i = \mathbb{V}_n$, with $b = |\xi|$ the number of partition elements of $\xi$. When it is necessary to make the representation unique, we order the $C_i$, $i = 1, \dots, b$, by their smallest elements in ascending order. In the following we will also refer to the partition elements as blocks. For any $\xi, \eta \in E_n$, write $\xi \subseteq \eta$ if and only if every block of $\eta$ is a union of (one or more) blocks of $\xi$.
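For readers who want to experiment with the partition notation, here is a small Python sketch with blocks represented as Python sets. The helper names `is_finer` and `canonical` are our own and not from the paper; `is_finer(xi, eta)` encodes $\xi \subseteq \eta$ and `canonical` implements the ordering of blocks by their smallest element.

```python
def is_finer(xi, eta):
    """Return True iff xi ⊆ eta in the notation of the text, i.e. every block
    of eta is a union of blocks of xi. For two partitions of the same set this
    is equivalent to every block of xi lying inside a single block of eta."""
    return all(any(block <= big for big in eta) for block in xi)


def canonical(xi):
    """Unique representation: blocks as sorted lists, ordered by smallest element."""
    return sorted((sorted(block) for block in xi), key=lambda block: block[0])
```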
In order to also specify which ancestral genes belong to the same ancestral individuals, we use notation introduced by Möhle and Sagitov in [23] and consider the state space
$$S_n := \Big\{ \big\{\{C_1, C_2\}, \dots, \{C_{2x-1}, C_{2x}\}, C_{2x+1}, \dots, C_b\big\} : b \in \mathbb{V}_n,\ x \in \{0, 1, \dots, \lfloor b/2 \rfloor\},\ \{C_1, \dots, C_b\} \in E_n \Big\},$$
where $\lfloor y \rfloor$ denotes the largest integer less than or equal to $y$. We equip $E_n$ as well as $S_n$ with the discrete topology. For later use we define a map $\mathrm{cd}: S_n \to E_n$ such that for any $\xi = \{\{C_1, C_2\}, \dots, \{C_{2x-1}, C_{2x}\}, C_{2x+1}, \dots, C_b\} \in S_n$,
$$\mathrm{cd}(\xi) := \{C_1, C_2, \dots, C_{2x-1}, C_{2x}, C_{2x+1}, \dots, C_b\} \in E_n.$$
Following [4], we call $\mathrm{cd}(\xi)$ the complete dispersion of $\xi$. Now sample $n$ genes randomly from the current generation. We can think of sampling $n/2$ individuals and looking at both of their genes, or of sampling $n$ individuals and inspecting only one randomly chosen gene in each, or of something in between; this will not matter in the limit we are interested in. For $m \in \mathbb{N}_0$, let $\xi^{n,N}(m)$ be the configuration of the genealogical structure of the sampled genes when looking $m$ generations backwards in time: $i$ and $j$ are in the same block of $\xi^{n,N}(m)$ if and only if the $i$-th and the $j$-th sampled genes have the same ancestral gene $m$ generations ago. We also keep track of the grouping of these ancestral genes into ancestral (diploid) individuals (this is necessary so that the dynamics of $\xi^{n,N}$ is Markovian). For example, in Figure 1, if we sample all genes of the children, then the leftmost parent carries two ancestral genes while the third parent from the left carries only one. We are interested in the convergence of the (suitably time-scaled) ancestral process $(\xi^{n,N}(m))_{m \in \mathbb{N}_0}$, which is a Markov chain with state space $S_n$.
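The map $\mathrm{cd}$ is simple enough to state in code. In this sketch (the representation and the helper names `is_state` and `complete_dispersion` are our own, hypothetical choices) an element of $S_n$ is stored as a list of pairs of blocks, one pair per ancestral individual carrying two ancestral genes, plus a list of unpaired blocks; blocks are frozensets of sample labels $1, \dots, n$.

```python
from typing import FrozenSet, List, Tuple

Block = FrozenSet[int]


def is_state(pairs: List[Tuple[Block, Block]], singles: List[Block], n: int) -> bool:
    """Check that the blocks of a candidate element of S_n partition {1, ..., n}."""
    blocks = [b for pair in pairs for b in pair] + list(singles)
    union = frozenset().union(*blocks)
    return (all(blocks)                                   # no empty blocks
            and sum(len(b) for b in blocks) == n          # disjointness (with the union check)
            and union == frozenset(range(1, n + 1)))


def complete_dispersion(pairs: List[Tuple[Block, Block]], singles: List[Block]) -> List[Block]:
    """cd: forget which ancestral genes sit in the same ancestral individual;
    return the blocks ordered by their smallest element."""
    flat = [b for pair in pairs for b in pair] + list(singles)
    return sorted(flat, key=min)
```

For instance, the state in which genes 1 and 3 share an ancestral gene that sits in the same individual as the ancestral gene of gene 2, while gene 4 is alone, is dispersed to the partition $\{\{1,3\},\{2\},\{4\}\}$.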
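For the description of the possible limit processes as $N \to \infty$, the total offspring numbers
$$V_i := \sum_{j=1}^{N} V_{i,j}, \qquad 1 \le i \le N,$$
giving the total number of offspring of individual $i$, will play a crucial role (recall $V_{j,i} = V_{i,j}$ and $V_{i,i} = 0$). Note that these $V_i$ children may be full or half siblings. We have $\sum_{i=1}^{N} V_i = 2N$, since every child is counted once for each of its two parents, and the vector $(V_i)_{1 \le i \le N}$ inherits exchangeability from the array $(V_{i,j})$. Indeed, for any permutation $\sigma$ of $\mathbb{V}_N$, we have
$$(V_{\sigma(i)})_{1 \le i \le N} = \Big(\sum_{j=1}^{N} V_{\sigma(i),\sigma(j)}\Big)_{1 \le i \le N} \overset{d}{=} \Big(\sum_{j=1}^{N} V_{i,j}\Big)_{1 \le i \le N} = (V_i)_{1 \le i \le N}.$$
Thus, $(V_i)_{1 \le i \le N}$ can essentially be viewed as an offspring distribution for a Cannings model with population size $2N$ (in which only $N$ individuals are ever parents to offspring in the following generation).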
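A minimal Python sketch of the passage from pairwise to total offspring numbers, and of the ranked frequencies $V_{(i)}/(2N)$ introduced in the next paragraph. The conventions are ours: individuals are labelled $0, \dots, N-1$, and the pairwise counts are stored in a dictionary keyed by $(i, j)$ with $i < j$ (missing keys mean $V_{i,j} = 0$).

```python
def total_offspring(v, n):
    """Total offspring numbers V_i = sum_j V_{i,j} from pairwise counts
    v[(i, j)] = V_{i,j}: every child of couple (i, j) is counted once for i
    and once for j."""
    tot = [0] * n
    for (i, j), count in v.items():
        if i == j:
            raise ValueError("no selfing in this model: V_{i,i} = 0")
        tot[i] += count
        tot[j] += count
    return tot


def ranked_frequencies(tot):
    """Ranked frequencies V_(1)/(2N) >= ... >= V_(N)/(2N); padded with zeros,
    these give one realisation of a point of the simplex Delta."""
    n = len(tot)
    return sorted((x / (2 * n) for x in tot), reverse=True)
```

With $N = 4$ and $V_{0,1} = 2$, $V_{0,2} = 1$, $V_{2,3} = 1$ (four children in total), the total offspring vector is $(3, 2, 2, 1)$, which sums to $2N = 8$.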
In order to find a suitable scaling for the large population limit, a key quantity is the probability that two genes (picked at random) from two distinct individuals, which are chosen randomly without replacement from the same generation, have a common ancestor (gene) in the previous generation. In our model this quantity is given by
$$c_N := \frac{\mathbb{E}[(V_1)_2]}{8(N-1)}, \tag{4}$$
where $(v)_k := v(v-1)\cdots(v-k+1)$ denotes the $k$-th falling factorial (see Lemma 3.1 below, where alternative expressions for $c_N$ are also given). If $c_N \to 0$ as $N \to \infty$, the correct time scaling is $1/c_N$ and any limiting genealogical process will be a continuous-time Markov chain. We will assume that $c_N \to 0$ as $N \to \infty$ throughout this paper. We write $V_{(1)} \ge V_{(2)} \ge \dots \ge V_{(N)}$ for the ranked version of $(V_1, \dots, V_N)$ and
$$\Phi_N := \mathcal{L}\Big(\frac{V_{(1)}}{2N}, \dots, \frac{V_{(N)}}{2N}, 0, 0, \dots\Big)$$
for the law of their ranked (total) offspring frequencies, viewed as a probability measure on the infinite dimensional simplex
$$\Delta := \Big\{(x_1, x_2, \dots) : x_1 \ge x_2 \ge \dots \ge 0,\ \sum_{i=1}^{\infty} x_i \le 1\Big\}.$$
For all $x = (x_1, x_2, \dots) \in \Delta$, denote by $|x| := \sum_{i=1}^{\infty} x_i$ and $(x, x) := \sum_{i=1}^{\infty} x_i^2$, and put $\mathbf{0} := (0, 0, \dots) \in \Delta$. We equip $\Delta$ with the topology of coordinate-wise convergence, metrised e.g. via $d_\Delta(x, y) = \sum_{i=1}^{\infty} 2^{-i} |x_i - y_i|$ for any $x, y \in \Delta$.
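One way to see where the expression $\mathbb{E}[(V_1)_2]/(8(N-1))$ for $c_N$ comes from, sketched here using the sibling probabilities $1/4$ (full siblings) and $1/8$ (half siblings) derived in the proof of Lemma 3.1: for two distinct children chosen uniformly at random, count ordered sibling pairs. The sum $\sum_i (V_i)_2$ counts every ordered pair of half siblings once and every ordered pair of full siblings twice (once for each shared parent), so the full-sibling terms cancel:

```latex
\begin{aligned}
\mathbb{P}(\text{full sibs}) &= \frac{\mathbb{E}\big[\sum_{1 \le i < j \le N} (V_{i,j})_2\big]}{N(N-1)}, \qquad
\mathbb{P}(\text{half sibs}) = \frac{\mathbb{E}\big[\sum_{i=1}^{N} (V_i)_2 - 2\sum_{1 \le i < j \le N} (V_{i,j})_2\big]}{N(N-1)}, \\
c_N &= \tfrac14\,\mathbb{P}(\text{full sibs}) + \tfrac18\,\mathbb{P}(\text{half sibs})
     = \frac{\mathbb{E}\big[\sum_{i=1}^{N} (V_i)_2\big]}{8N(N-1)}
     = \frac{\mathbb{E}[(V_1)_2]}{8(N-1)}.
\end{aligned}
```

The last step uses the exchangeability of $(V_i)_{1 \le i \le N}$.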
Note that (9) shows that $\Xi(\{\mathbf{0}\})$ corresponds to the rate of binary mergers of two blocks, which is the dynamics of Kingman's coalescent. We remark that since $\varphi(\mathbf{0}) = \mathbf{0}$, the measures $\Xi'$ and $\Xi$ give the same mass to $\mathbf{0}$ and so have the same Kingman coalescent component.
We also point out that in Theorem 1.1 we only state f.d.d. convergence, since one cannot expect weak convergence on $D([0, \infty), S_n)$, the set of $S_n$-valued càdlàg paths equipped with Skorohod's $J_1$-topology (see, e.g., [10, Ch. 3.4]). This is because whenever two ancestral genes descend from the same parental individual, the probability that they descend from different genes carried by that parental individual is $1/2$, as is the probability that they descend from the same gene (resulting in a coalescence event). We have chosen our scaling such that the latter event happens at a finite rate in the limit. Thus the former event, which creates some partition $\xi \in S_n \setminus E_n$, also happens at a positive rate in the limit. The reason we nevertheless obtain f.d.d. convergence in $E_n$ in Theorem 1.1 is that in the limit any $\xi \in S_n \setminus E_n$ transitions instantaneously to $\mathrm{cd}(\xi) \in E_n$. Because of the discrete topology on $S_n$, there is thus always a non-vanishing probability of an accumulation of jumps of finite size, which precludes weak convergence in Skorohod's $J_1$-topology. However, if we instead consider the process which tracks the succession of complete dispersion states, then weak convergence on $D([0, \infty), E_n)$ holds:
Corollary 1.2. Let $\tilde\xi^{n,N}(m) := \mathrm{cd}\big(\xi^{n,N}(m)\big) \in E_n$ be the ancestral partition of the $n$ sampled genes $m$ generations in the past, irrespective of the grouping into diploid individuals. Under the assumptions of Theorem 1.1, we have
$$\big(\tilde\xi^{n,N}(\lfloor t/c_N \rfloor)\big)_{t \ge 0} \Rightarrow \big(\xi^n(t)\big)_{t \ge 0}$$
weakly on $D([0, \infty), E_n)$, and the limit process is the $n$-$\Xi$-coalescent from Theorem 1.1.
Before continuing, we briefly outline the structure of the rest of the paper. After discussing our main result and its relation to the literature in Section 1.1, we consider various examples and discuss their biological motivation in Section 2. In particular, we study two diploid variations of a model by Schweinsberg [29] in Sections 2.1 and 2.2. We also discuss the relation to previous results on coalescents for diploid population models and possible extensions in more detail in Section 2.3.
The final Section 3 is dedicated to the proofs of our main results. In Section 3.1 we prove Theorem 1.1 and Corollary 1.2. In Sections 3.2 and 3.3 we prove the more technical convergence results for the models considered in Section 2: Proposition 2.3 of Section 2.1 is proven in Section 3.2 and Proposition 2.5 of Section 2.2 in Section 3.3.
1.1. Discussion. In this section we first give a brief overview of existing coalescent theory in the haploid and diploid settings. Subsequently, we make several remarks regarding our main results.
Classical large population approximation results in the haploid setting can be found in Kingman [15, 16], where a convergence theorem to the classical coalescent (nowadays known as Kingman's coalescent) is established for a class of exchangeable populations. In recent years, coalescent theory has developed tremendously. We refer to Pitman [25], Sagitov [26] and Donnelly & Kurtz [8] for coalescents with multiple mergers and to Schweinsberg [28] and Möhle & Sagitov [22] for coalescents with simultaneous multiple mergers. At the same time, coalescent theory has been applied to more complex population models. Sagitov [26] deduced a necessary condition for the convergence of the haploid ancestral process to coalescents with multiple mergers. Möhle and Sagitov [22] then fully classified haploid exchangeable population models, so-called Cannings models, in terms of the convergence of their ancestral lines to coalescents with simultaneous multiple mergers. They characterised the coalescent generators in terms of the joint moments of offspring sizes as well as in terms of a sequence of measures defined on the infinite dimensional simplex. Subsequently, Sagitov [27] presented a criterion for convergence to the coalescent with simultaneous multiple mergers in terms of the weak convergence of the scaled vector of ranked offspring sizes in a given generation.
For diploid population models, the available theory has been more limited. Möhle [17] introduced a diploid population model with selfing and studied the ancestral process in the Wright-Fisher case. He proved that in this case the limit is Kingman's coalescent. We recover Möhle's result without selfing as a special case of our general result, see Section 2. In Möhle [18] it was proved that the scaled ancestral process of $n$ sampled genes in the two-sex Wright-Fisher model behaves like Kingman's coalescent. In this context, Möhle also derived coalescence estimates for general offspring mechanisms when only two genes are sampled. Subsequently, Möhle and Sagitov [23] completely classified the coalescent patterns in two-sex diploid exchangeable population models and established conditions for the limiting scaled ancestral process to be either Kingman's coalescent or a coalescent with (simultaneous) multiple mergers. In contrast to our set-up, individuals are either male or female ($N$ individuals each), and in each generation $N$ couples are formed that have children according to a general exchangeable offspring distribution. Sexes are then again assigned randomly, conditioned on there being again $N$ males and $N$ females. This is a special case of our result, see Section 2.3.1. In fact, Theorem 1.1 is in a sense an explicitly worked-out version of the remarks in [23, Section 7].
Birkner et al. [4] studied a diploid Moran type population model in which two individuals drawn uniformly at random contribute a (potentially) large number of offspring relative to the total population size. They proved that due to this property and the diploid inheritance the scaled ancestral process admits in the limit simultaneous multiple mergers in up to four groups. In Section 2.3.2 we give more details on the relationship to our main results. In particular, the single-locus analogues of Theorems 1.2 and 1.3 in [4] can be recovered as a special case of Theorem 1.1.
We would like to emphasize a number of points regarding our main results:
1. Broadly speaking, Theorem 1.1 says that (for large $N$) we can use "equivalent" sampling on the gene level and ignore the grouping of genes into diploid individuals. This phenomenon has been observed many times before (e.g. in [23] and [4]); it is explained by an asymptotic separation of time-scales: the "breaking up" of the grouping into diploids is much faster than nontrivial coalescence on the gene level (see the proof of Theorem 1.1). For finite $N$, the process $\tilde\xi^{n,N}$ is in general not a Markov chain; this is one of the reasons why we consider $\xi^{n,N}$ in Theorem 1.1. However, the limit process is Markovian.
2. $(\xi^n(t))_{t \ge 0}$ can in a natural way be interpreted as a tree describing the genealogy of $n$ sampled genes. In population genetics applications, functionals of this tree are of interest, in particular the total length and the length of all branches subtending $i$ leaves for $i \in \{1, 2, \dots, n-1\}$. By Corollary 1.2, the distribution of such functionals of $\big(\tilde\xi^{n,N}(\lfloor t/c_N \rfloor)\big)_{t \ge 0}$ converges as well.
3. Note the normalisation with $2c_N$ in (5). The expression in (7) is the limit object related to sampling according to the ("haploid") offspring vectors $(V_1, \dots, V_N)$, and the asymptotically correct scaling of $\Phi_N$ (so that the corresponding limit object $\Xi'$ is a probability measure on $\Delta$) is given by $1/c'_N$ with
$$c'_N := \frac{1}{(2N)_2} \sum_{i=1}^{N} \mathbb{E}[(V_i)_2] = \frac{\mathbb{E}[(V_1)_2]}{2(2N-1)},$$
see e.g. [27, Eq. (1.5)]. We can interpret $c'_N$ as referring to sampling directly on the level of chromosomes, whereas $c_N$ as defined in (4) refers to sampling on the level of diploid individuals. We have $c'_N \sim 2c_N$ as $N \to \infty$ (see also Lemma 3.1), and our normalisation in (7) entails $\varphi_1(2) = 2$.
4. (1) says that (V i,j ) is a finite jointly exchangeable array. A related notion is that of "separately exchangeable arrays" where rows and columns may use different permutations. See also the discussion in Section 2.3 and see e.g. [14], [1] for general background on exchangeable arrays.
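The relation $c'_N \sim 2c_N$ in point 3 can be checked by a one-line computation, assuming $c_N = \mathbb{E}[(V_1)_2]/(8(N-1))$ and reading $c'_N$ as the Cannings pair coalescence probability $\frac{1}{(2N)_2}\sum_{i=1}^{N}\mathbb{E}[(V_i)_2] = \mathbb{E}[(V_1)_2]/(2(2N-1))$ of a haploid population of $2N$ genes with offspring vector $(V_1, \dots, V_N, 0, \dots, 0)$:

```latex
\frac{c'_N}{c_N} = \frac{8(N-1)}{2(2N-1)} = \frac{4(N-1)}{2N-1} \longrightarrow 2 \qquad (N \to \infty).
```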

Examples
We will now apply Theorem 1.1 to various examples. Some of these have been considered in the literature before, and we recover the known limiting results in an efficient way; others are new or are analysed here in a more general setting.
Arguably the simplest diploid population model is the diploid Wright-Fisher model, where each individual in the children's generation is independently assigned two distinct parents by drawing twice without replacement from the parents' generation; the joint distribution of $(V_{i,j})_{1 \le i < j \le N}$ is then a $\binom{N}{2}$-dimensional multinomial distribution with $N$ trials and uniform weights. This model was considered e.g. by Möhle [17], and to set the stage we briefly discuss how we recover his result for the case without selfing ($s = 0$ in Möhle's [17] notation) from Theorem 1.1. Proof. By choosing $W_i \equiv 1$ in Section 2.1, this model becomes a special case of the class considered there, and the result follows from Proposition 2.3, case 1. Alternatively, one can easily check that the (sharp) criterion on the third factorial moment of $V_1$ from Möhle [20, Eq. (14) in Sect. 4] for convergence to Kingman's coalescent is satisfied.
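As a small worked check (our own computation, not taken from [17]): in this model each child has individual 1 among its two parents with probability $(N-1)/\binom{N}{2} = 2/N$, independently of the other children, so with the sibling-counting expression $c_N = \mathbb{E}[(V_1)_2]/(8(N-1))$ one obtains

```latex
V_1 \sim \mathrm{Bin}\big(N, \tfrac{2}{N}\big), \qquad
\mathbb{E}[(V_1)_2] = N(N-1)\Big(\frac{2}{N}\Big)^2 = \frac{4(N-1)}{N}, \qquad
c_N = \frac{\mathbb{E}[(V_1)_2]}{8(N-1)} = \frac{1}{2N}.
```

In particular $c_N \to 0$, and the time scale $1/c_N = 2N$ is the familiar one for a population of $2N$ gene copies.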
It is well known that the class of possible coalescent processes arising as limiting genealogies in population models is much richer than just Kingman's coalescent. An important family of examples for the haploid case is given by the Beta$(2-\alpha, \alpha)$-coalescents, where $0 < \alpha < 2$. For the sub-case $1 \le \alpha < 2$, these are well motivated by a class of models proposed and analysed by Schweinsberg [29], which work as follows: Let each generation consist of $N$ haploid (adult) individuals. Individual $i$ produces $X_i$ juveniles, where $X_1, \dots, X_N$ are independent copies of $X$ with $\mathbb{E}[X] > 1$ and $X$ has a (strictly) regularly varying tail,
$$\mathbb{P}(X \ge k) \sim c\,k^{-\alpha} \quad \text{as } k \to \infty, \tag{10}$$
with $c \in (0, \infty)$. Then $N$ of the $S_N = X_1 + \dots + X_N$ (typically $> N$) juveniles are drawn at random without replacement to form the next (adult) generation. It turns out (see [29, Thm. 4]) that in the limit $N \to \infty$, the suitably scaled genealogies of samples from such a population model converge to a Beta$(2-\alpha, \alpha)$-coalescent. This is a particular so-called $\Lambda$-coalescent. In the notation of (9), it is given by taking $\Xi$ to be the image of Beta$(2-\alpha, \alpha)$ under the map $x \mapsto (x, 0, 0, \dots)$, where Beta$(2-\alpha, \alpha)$ is the probability law on $[0, 1]$ with density
$$\frac{1}{B(2-\alpha, \alpha)}\, x^{1-\alpha} (1-x)^{\alpha-1}, \qquad 0 < x < 1. \tag{11}$$
Here, for $a, b > 0$, $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ denotes the Beta function. There is also a rich mathematical structure linking these particular $\Lambda$-coalescents to stable branching processes, see e.g. [2, 3]. This model captures situations where occasionally some individuals can, for example due to environmental fluctuations, produce many more offspring than others (note that (10) with $\alpha < 2$ implies $\mathrm{Var}[X_i] = \infty$). It is thus a possible mathematical formalisation of the concept of "sweepstakes reproduction" that appears in the biological literature, see e.g. Eldon and Wakeley [9] and the discussion and references therein.
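For the Beta$(2-\alpha, \alpha)$-coalescent viewed as a $\Lambda$-coalescent, the rate at which any particular $k$ of $b$ current blocks merge is $\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,\Lambda(dx)$, which for the density (11) equals $B(k-\alpha, b-k+\alpha)/B(2-\alpha, \alpha)$. The Python sketch below (function names are ours) evaluates this via `math.lgamma` for numerical stability.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b: int, k: int, alpha: float) -> float:
    """Rate lambda_{b,k} at which one given set of k out of b blocks merges in
    the Beta(2 - alpha, alpha)-coalescent (0 < alpha < 2, 2 <= k <= b):
    int_0^1 x^(k-2) (1-x)^(b-k) Beta(2-alpha, alpha)(dx)
    = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha)."""
    if not (0.0 < alpha < 2.0 and 2 <= k <= b):
        raise ValueError("need 0 < alpha < 2 and 2 <= k <= b")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


def total_rate(b: int, alpha: float) -> float:
    """Total rate of leaving a state with b blocks: sum over k of C(b, k) lambda_{b,k}."""
    return sum(comb(b, k) * beta_coalescent_rate(b, k, alpha) for k in range(2, b + 1))
```

For $\alpha = 1$ this reduces to $(k-2)!\,(b-k)!/(b-1)!$, the Bolthausen-Sznitman rates, and for every $\alpha$ the rates satisfy the consistency relation $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$, which is a convenient check.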
There are various ways in which one can extend this model (literally or in spirit) to a diploid scenario. We explore two such possibilities in more detail below: In Section 2.1, we assign each individual $i$ in a given generation independently a random "fitness value" $W_i \ge 0$ and decree that each child has a chance $\propto W_i W_j$ to descend from couple $(i, j)$, i.e. the joint law of offspring numbers is given by (13). In Section 2.2, each couple $(i, j)$ independently produces $X_{i,j}$ juveniles, and then $N$ out of the $\sum_{1 \le i < j \le N} X_{i,j}$ juveniles are drawn at random to form the next generation, analogous to [29].
It turns out that two distinct forms of "diploid Beta$(2-\alpha, \alpha)$-coalescents" arise from these two set-ups: the limiting $\Xi$ in Section 2.1 arises from a Beta$(2-\alpha, \alpha)$-distributed $x$ by replacing it with two equal weights $x/4$ (see Equation (19) in Proposition 2.3), whereas in Section 2.2 it is split into four equal weights (see Equation (29) in Proposition 2.5).
Intuitively, this can be understood as follows: In Section 2.1, the dominant contribution comes from situations where one $W_i$ is exceptionally large (of order $N$) whereas all others are much smaller; then there is a large family of half-siblings, and the two weights correspond to the two chromosome copies of the exceptional individual $i$. All the other parents will typically have a total number of offspring which is negligible in comparison to $N$, and none of their genes will be involved in a multiple merging event. On the other hand, in Section 2.2 the dominant contribution comes from cases where $X_{i,j}$ is of order $N$ for exactly one couple $(i, j)$ and all other $X_{k,\ell}$ ($\{k, \ell\} \neq \{i, j\}$) are much smaller; then there is a large family of full siblings, and the four weights correspond to the four chromosome copies of the two individuals in the successful couple.
In particular, we see that the answer to the question which diploid coalescent is appropriate for a given biological population with potentially highly skewed individual reproductive success can depend on the typical mating behaviour.
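The difference between the two set-ups only concerns how a weight is distributed over chromosome copies at a merger event. The Python sketch below (our illustration; the helper names `split_weight` and `paintbox_merge` are hypothetical) combines the map $y \mapsto (y/4, y/4, 0, \dots)$ or $(y/4, y/4, y/4, y/4, 0, \dots)$ described above with the usual paintbox description of a $\Xi$-coalescent merger: given a point $x \in \Delta$, each current block independently picks colour $i$ with probability $x_i$ or stays uncoloured with probability $1 - |x|$, and blocks of the same colour merge. It does not model the rate at which such events occur.

```python
import random


def split_weight(y, copies):
    """Point of Delta obtained by splitting a weight y in [0, 1] into `copies`
    equal parts y/4 (copies=2: Section 2.1, copies=4: Section 2.2)."""
    if not 0.0 <= y <= 1.0 or copies not in (2, 4):
        raise ValueError("need 0 <= y <= 1 and copies in (2, 4)")
    return [y / 4.0] * copies


def paintbox_merge(blocks, x, rng):
    """One merger event of a Xi-coalescent at the point x = (x_1, x_2, ...):
    each block independently receives colour i with probability x_i, or no
    colour with probability 1 - |x|; blocks sharing a colour are merged."""
    if any(xi < 0 for xi in x) or sum(x) > 1.0 + 1e-12:
        raise ValueError("x must lie in the simplex Delta")
    groups, untouched = {}, []
    for block in blocks:
        u, acc, colour = rng.random(), 0.0, None
        for i, xi in enumerate(x):
            acc += xi
            if u < acc:
                colour = i
                break
        if colour is None:
            untouched.append(block)
        else:
            groups.setdefault(colour, []).append(block)
    merged = [frozenset().union(*g) for g in groups.values()]
    return untouched + merged
```

With two weights, at most two new blocks can form in one event; with four weights, up to four, which reflects the two-fold and four-fold mergers of the two diploid Beta-coalescents.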
We describe and analyse these diploid variations of Schweinsberg's [29] model in more detail in Sections 2.1 and 2.2 below. In Section 2.3, we briefly discuss how results of previous studies of diploid population models, especially from [4] and [23], fit into our framework. We also mention possible extensions and additional examples there.
The perspicacious reader will observe that in Propositions 2.3 and 2.5 below (compare Assumptions (14) and (26), respectively), we have excluded the boundary cases $\alpha = 2$ and $\alpha = 1$. In view of Schweinsberg's [29] results for the haploid case, we expect the following: for $\alpha = 2$, $c_N \sim c(\log N)/N$ in Lemmas 2.2 and 2.4, and convergence to Kingman's coalescent; for $\alpha = 1$, $c_N \sim c/\log N$, and in part 2 of both Proposition 2.3 and Proposition 2.5 the uniform distribution on $[0, 1]$ will appear, i.e. the limiting coalescent will then be a variation of the Bolthausen-Sznitman coalescent where jumps are broken into two groups and into four groups, respectively.
We leave the details to future work.
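Given the $W_i$'s, let
$$Z_N := \sum_{1 \le i < j \le N} W_i W_j \tag{12}$$
and let $(V_{i,j})_{1 \le i < j \le N}$ be multinomially distributed with $N$ trials and success probabilities $W_i W_j / Z_N$, that is,
$$\mathbb{P}\big(V_{i,j} = v_{i,j},\ 1 \le i < j \le N \,\big|\, W_1, \dots, W_N\big) = \frac{N!}{\prod_{i<j} v_{i,j}!} \prod_{1 \le i < j \le N} \Big(\frac{W_i W_j}{Z_N}\Big)^{v_{i,j}} \tag{13}$$
for $v_{i,j} \in \mathbb{N}_0$ with $\sum_{i<j} v_{i,j} = N$ (we will see in Section 3.2 that the event $\{Z_N = 0\}$ has negligible probability in the limit $N \to \infty$). Note that when the $W_i$'s are identical, (13) coincides literally with our version of the diploid Wright-Fisher model, see Proposition 2.1.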
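A minimal sketch of one generation of this model given the fitness values, assuming (as described at the beginning of Section 2) that each of the $N$ children independently picks its parental couple $(i, j)$ with probability $W_i W_j / Z_N$. The helper name `fitness_generation` is ours, and the Pareto fitness values drawn with `random.paretovariate` in the check are only one illustrative choice of a heavy-tailed $W$.

```python
import random
from itertools import combinations


def fitness_generation(w, rng):
    """One generation of the random-fitness model given fitness values w[0..N-1]:
    each of the N children independently picks its parent couple (i, j), i < j,
    with probability w[i] * w[j] / Z_N, where Z_N = sum_{i<j} w[i] * w[j].
    Returns the nonzero pairwise offspring numbers V_{i,j} as a dict."""
    n = len(w)
    pairs = list(combinations(range(n), 2))
    weights = [w[i] * w[j] for i, j in pairs]
    if sum(weights) <= 0:
        raise ValueError("Z_N = 0: fewer than two individuals have positive fitness")
    v = {}
    for pair in rng.choices(pairs, weights=weights, k=n):
        v[pair] = v.get(pair, 0) + 1
    return v
```

With all $w_i$ equal, every couple is equally likely, which is the diploid Wright-Fisher model of Proposition 2.1.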
The "fitness" in this section's title is not based on an explicitly modelled genetic type and is not passed on to offspring as the values are drawn afresh in each generation. The offspring distribution in (13) may be appropriate for a population with high individual reproductive potential (thinking e.g. of plants or marine species that can in principle produce large numbers of seeds or eggs) in an environment that fluctuates rapidly both in space and time. In reality, there may be a very complex and highly variable interplay between ecological and genetic factors that determine the reproductive success of a given individual at a given time. All this would be subsumed in this model into a random "effective fitness parameter" W .
For the fitness parameter we will consider separately the finite variance case as well as the case of (strictly) regularly varying tails such that Before stating the convergence result we specify the asymptotic behavior of the scaling parameter c N as specified in (4). A key quantity is (15) the probability that a randomly chosen child is an offspring of parent 1.
Lemma 2.2. The pair coalescence probability over one generation for V i,j 's as in (13) is given by By scaling with the appropriate c N we obtain the following convergence results. (17)) converges in the f.d.d. sense to Kingman's coalescent.
2. If the tails of W vary (strictly) regularly as specified in (14) with the density of the Beta(2 − α, α) distribution given in (11).
The proofs of Lemma 2.2 and Proposition 2.3 can be found in Section 3.2.

2.2. Diploid population model related to supercritical Galton-Watson processes. In this section, we consider another diploid version of the model introduced and studied by Schweinsberg in [29], in which an abundance of offspring is produced in each generation, of which only a limited number survives. The model is similar to that of Section 2.1 in that large families may be produced. However, in contrast to the random individual fitness model of the previous section, in which individual parents may have many offspring due to an unusually high fitness, here parent couples may produce a large family.
More concretely, let $X_{i,j} = X^{(N)}_{i,j}$, $1 \le i < j \le N$, be the "potential offspring" of parents $i$ and $j$ in any given generation, with a distribution that may depend on $N$. For notational convenience, we set $X_{j,i} := X_{i,j}$ for $i < j$. We also denote the number of potential offspring of parent $i$ by $X_i := \sum_{j \neq i} X_{i,j}$, and let $S_N := \sum_{1 \le i < j \le N} X_{i,j}$ be the total number of potential offspring. The actual total population size is always fixed at $N$. If $S_N \ge N$, we obtain the next generation by sampling $N$ of these offspring at random without replacement. We use $V_{i,j}$ to denote the number of offspring sampled from the $X_{i,j}$ potential offspring of couple $(i, j)$, for any $1 \le i < j \le N$. Our scaling will be such that there are enough offspring for resampling with sufficiently high probability, so that the details of any exchangeable offspring assignment in the case $S_N < N$ will not be relevant. We assume in the following that the $X_{i,j}$, $1 \le i < j \le N$, are independent and identically distributed, so that the potential offspring of parent pairs are generated as in a Galton-Watson process. In addition, we assume that where the law of X does not depend on N and satisfies (23) µ Note that (23) implies that (We will see in Lemma 3.13 below that this implies that the event $\{S_N < N\}$ has asymptotically negligible probability in the scaling regimes we consider.) Finally, we require one of the following assumptions: These assumptions might appear at first sight somewhat artificial, see however the discussion in Remark 2.6 below.
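The two-step mechanism (independent potential offspring per couple, then sampling $N$ juveniles without replacement) can be sketched in Python as follows. This is an illustration only: the law of $X^{(N)}_{i,j}$ is supplied by a user function, and the example sampler `make_example_sampler` is a hypothetical choice, not the specific assumptions (22) to (26) of this section.

```python
import random
from itertools import combinations


def potential_offspring_generation(n, sample_x, rng):
    """One generation of the Section 2.2 mechanism (sketch): every couple (i, j),
    i < j, independently produces X_{i,j} = sample_x(rng) potential offspring,
    and N of the S_N juveniles are then drawn without replacement.
    Returns the nonzero V_{i,j} as a dict, or None if S_N < N."""
    juveniles = []                       # one entry (i, j) per potential offspring
    for pair in combinations(range(n), 2):
        juveniles.extend([pair] * sample_x(rng))
    if len(juveniles) < n:               # S_N < N: too few juveniles
        return None
    v = {}
    for pair in rng.sample(juveniles, n):
        v[pair] = v.get(pair, 0) + 1
    return v


def make_example_sampler(n, activity=3.0):
    """Hypothetical law for illustration only: a couple is 'active' with
    probability activity / n and then has 1, 2, 3 or 4 potential offspring."""
    def sample_x(rng):
        return rng.choice([1, 2, 3, 4]) if rng.random() < activity / n else 0
    return sample_x
```

Storing one list entry per juvenile keeps the sampling step a plain `random.sample` call; for very large $S_N$ a sequential multivariate hypergeometric draw would be more memory-efficient.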
Before stating the convergence result we again specify the asymptotic behavior of the scaling parameter c N given in (4).

If (26) holds then
By scaling with c N we obtain the following convergence result. (27)) converges in the f.d.d. sense to Kingman's coalescent.
The proofs of Lemma 2.4 and Proposition 2.5 can be found in Section 3.3.
Remark 2.6. 1. The model considered in this section is appropriate for a large, unstructured population of $N$ diploid individuals with promiscuous reproductive behaviour. It is intended to capture situations where there is potentially great variability between the numbers of juveniles produced by different mating couples; in analogy to Schweinsberg's model [29], this is achieved in the mathematically simplest way by assuming that the $X^{(N)}_{i,j}$ are independent with the same distribution.
At first sight, it may then seem that the structural assumptions (22) and (25) or (26) are artificial choices just to "make the mathematical theory work". However, if we stipulate, as seems biologically reasonable, that the law L(X
From the point of view of a biological model, we suggest reading (22) as follows: Given a large population of size $N \gg 1$, let $c_{X,1} = N\,\mathbb{P}(\text{two randomly drawn individuals produce potential offspring together})$ and let $\mathcal{L}(X)$ be the law of the number of potential offspring produced by two randomly drawn individuals, given that they produce some. Then, if $X$ satisfies (25) or (26), the genealogy of an $n$-sample over time-scales $\propto 1/c_N$ is approximately described by Proposition 2.5. See Section 2.3.3 for possible extensions.
2. We do not strive here to answer in full generality the mathematical question "if one only assumes that X what are sharp conditions on the family ν N of probability measures on Z + so that the genealogical processes of finite samples in such population models converge?" Obviously, then necessarily sup N N P(X (N ) i,j > 0) < ∞ and we see from the proofs of Lemma 2.4 and Proposition 2.5 that in the case of infinite variance E (X i,j > 0) uniformly in N is required for the limit in (82) to exist. In fact, one can cook up examples where N P(X (N ) i,j > 0) and N α−1 c N oscillates as a function of N or where even though lim x→∞ x α P(X i,j > 0) =: c X,2 exists for all N one has convergence to different coalescents along different subsequences. (22) enforces P(X = 0) = 0. If one prefers to allow 0 < P(X = 0) < 1, one can replace p N by p N P(X > 0) and X by X ′ where P(X ′ ∈ ·) = P(X ∈ · | X > 0).

2.3. Relation to previous work and possible further extensions.
2.3.1. Diploid population model with randomly chosen pairs as couples. We here recover the convergence result of [23, Theorem 4.2 and Corollary 4.3] concerning a diploid two-sex population model. In order to make comparisons with [23] easier, we assume here that the population size is $2N$. Now let $\{J_1, \dots, J_N\}$ be a random and randomly ordered partition of $\{1, \dots, 2N\}$ into $N$ subsets of size 2. This partition describes the grouping of individuals into $N$ distinct couples which give birth to the individuals of the next generation. Let $\tilde V_1, \dots, \tilde V_N$ be exchangeable non-negative random variables with $\tilde V_1 + \dots + \tilde V_N = 2N$, representing the number of children of each couple, and write $\tilde V_{(1)} \ge \dots \ge \tilde V_{(N)}$ for their ranked version. The offspring distribution $(V_{i,j})_{1 \le i < j \le 2N}$ is then given by
$$V_{i,j} = \sum_{k=1}^{N} \tilde V_k\, \mathbb{1}_{\{J_k = \{i, j\}\}}, \qquad 1 \le i < j \le 2N.$$
It is clear that $(V_{i,j})_{1 \le i < j \le 2N}$ is exchangeable as in (1). Note that the corresponding total offspring vector $(V_1, \dots, V_{2N})$ is a random permutation of $(\tilde V_1, \tilde V_1, \dots, \tilde V_N, \tilde V_N)$. In particular, $V_1$ and $\tilde V_1$ have the same distribution. Thus we obtain from (4) (remember that we use population size $2N$ here)
$$c_{2N} = \frac{\mathbb{E}[(\tilde V_1)_2]}{8(2N-1)}.$$
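The construction of $(V_{i,j})$ from a random pairing can be sketched as follows (our illustration; the helper name is hypothetical, individuals are labelled $0, \dots, 2N-1$, and the couple sizes $\tilde V_1, \dots, \tilde V_N$ are passed in as a list). Shuffling the individuals and pairing consecutive entries gives a uniformly random, randomly ordered partition into couples.

```python
import random


def fixed_couples_generation(couple_sizes, rng):
    """Section 2.3.1 (sketch): the 2N individuals 0..2N-1 are split uniformly at
    random into N couples J_1, ..., J_N, and couple J_k has couple_sizes[k]
    children. Returns the pairwise offspring numbers V_{i,j} (keys i < j)."""
    n_couples = len(couple_sizes)
    if sum(couple_sizes) != 2 * n_couples:
        raise ValueError("the children's generation must again have size 2N")
    individuals = list(range(2 * n_couples))
    rng.shuffle(individuals)            # uniform, randomly ordered pairing
    v = {}
    for k, size in enumerate(couple_sizes):
        a, b = sorted(individuals[2 * k:2 * k + 2])
        v[(a, b)] = size
    return v
```

Each couple's size then appears twice in the total offspring vector, once for each partner, as noted above.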
Let us remark that this model can also be interpreted as a two-sex model, which is the formulation used in [23]. Here, in each generation, we randomly assign sexes to the offspring such that there are N male and N female offspring. Subsequently, the couples are formed at random between the males and the females. We have the following proposition: [23]). Assume that c 2N → 0 as N → ∞ and that Remark 2.8. We note that (30) can be equivalently formulated in terms of moment conditions as in (6) or as in Appendix A.
Note that the form $\Xi = \Xi'' \circ \varphi^{-1}$ clearly shows that the mass of each large family is split into four equal parts, representing the four chromosomes of a particular couple.

2.3.2. A diploid population model with occasional large families. Here, we briefly discuss how the class of continuous-time diploid population models from [4], which involve suitably rare but "large" reproduction events with a single large family, can be formulated in our present discrete-time context, so that the "single-locus" analogues of Theorems 1.2 and 1.3 there can be obtained from our main result.
The children's generation arises as follows: Randomly choose two distinct individuals {I 1 , I 2 } from V N = {1, 2, . . . , N }. Individuals I 1 and I 2 form a couple (as in [23]) and have a random number Ψ N of children together but with no-one else, i.e. V I 1 , The limiting behaviour depends on the sequence of laws L (Ψ N ), N ∈ N: 1. A simple choice, inspired by [9], is to assume that the "large" family constitutes always a fixed fraction of the total population: Given ψ ∈ (0, 1), we assume . Thus Theorem 1.1 yields that the scaled ancestral process converges to a Ξ-coalescent process with Thus, Theorem 1.1 shows that the scaled ancestral process converges to a Ξ-coalescent process with Ξ = ψ 2 III. If γ > 1 the limit process is Kingman's coalescent. We note that alternatively, we could assume that with probability 1 − N −γ there is just a Wright-Fisher reproduction step and with probability N −γ a reproduction step with one exceptionally fertile couple as above occurs. This yields the same limit process.
2. More generally, we can assume that the sequence of laws $\mathcal{L}(\Psi_N)$, $N \in \mathbb{N}$, satisfies $c_N \to 0$ as $N \to \infty$ with $c_N$ from (32) and there exists a probability measure $F$ on $[0, 1]$ such that
The matrix of offspring numbers $(V_{i,j})_{1 \le i < j \le N}$ can equivalently be viewed as an (exchangeable) random multigraph on $N$ nodes, by drawing $V_{i,j}$ undirected edges (one for each child) between nodes $i$ and $j$. For example, we can interpret the model from Section 2.2 as a variation on Erdős-Rényi graphs (remembering (22)
From the point of view of biological modelling, one might feel that the set-up in Section 2.2 is somewhat restrictive, since in particular it enforces that the number of reproductive partners of a typical individual is essentially Poisson distributed (and hence the number of potential offspring has a compound Poisson distribution). A natural generalisation of (22) would be the following: Let $D_i$ be independent copies of an $\mathbb{N}_0$-valued random variable $D$ with $\mathbb{E}[e^{\lambda D}] < \infty$ for $\lambda$ in a neighbourhood of 0; we think of $D_i$ as the number of reproductive partners of individual $i$. Use the configuration model to assign reproductive partners, i.e. attach $D_i$ "half-edges" to node $i$ and then randomly match all half-edges, conditional on producing no self-loops (if $D_1 + \dots + D_N$ is odd, throw away the last half-edge, say). Finally, replace each resulting edge $e$ by a random number $X_e$ of edges, where the $X_e$ are independent copies of $X$. If $\mathbb{E}[X]\,\mathbb{E}[D] > 2$ and $X$ satisfies (25) or (26), then a suitable analogue of Proposition 2.5 will hold. We do not go into detail here, but note that by Theorem 1.1, asymptotically only the joint law of $(V_i)_{1 \le i \le N}$, which in the language of random graphs corresponds to the empirical degree distribution, is important for our study of genealogies. In the extension of the model from Section 2.2 just sketched, this will again, on the relevant time-scales, be dominated by one exceptionally large value of $X_e$ if (26) holds, and be negligible compared to $N$ if (25) holds.
See e.g. [30] for background on random graphs, which is currently a very active research topic.
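As a rough illustration of the configuration-model construction sketched above, the following Python snippet builds the multigraph for given degrees $D_i$. It is a minimal sketch under stated assumptions, not a construction used for any result here: the function name is hypothetical, the conditioning on "no self-loops" is implemented by plain rejection sampling, and the law of $X$ is left to the caller.

```python
import random
from collections import Counter


def configuration_multigraph(degrees, sample_x, rng, max_tries=10_000):
    """Hypothetical sketch of the configuration-model variant.

    degrees[i] plays the role of D_i (number of half-edges at node i);
    sample_x() returns one independent copy of the edge multiplicity X.
    Returns a Counter mapping pairs (i, j) with i < j to their number of edges.
    """
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2 == 1:
        stubs.pop()  # odd total degree: discard the last half-edge
    for _ in range(max_tries):
        rng.shuffle(stubs)  # uniform random matching of the half-edges
        pairs = list(zip(stubs[0::2], stubs[1::2]))
        if all(a != b for a, b in pairs):  # condition on "no self-loops" by rejection
            break
    else:
        raise RuntimeError("no loop-free matching found")
    edges = Counter()
    for a, b in pairs:
        x = sample_x()  # replace the matched edge by X_e parallel edges
        if x > 0:
            edges[(min(a, b), max(a, b))] += x
    return edges
```

Summing the edge counts at node $i$ gives the (unpruned) number of potential offspring of individual $i$. Rejection sampling is simple but can become slow when a few nodes carry most of the half-edges.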
Obviously, these models allow various generalisations where the "degree of promiscuity" can be chosen as a parameter: One could for example assign a% of the children to fixed couples as in the model from Section 2.3.1 and the remaining (100 − a)% of the children by using a "configuration model" as just discussed.
In most of the models discussed so far we did not include individuals of different sexes. However, as described in Section 2.3.1, two-sex models, possibly with unequal sex ratio $r : 1 - r$, can in principle easily be embedded into our set-up: take a random bipartite "exchangeable" multigraph on $\lfloor rN \rfloor$ and $N - \lfloor rN \rfloor$ nodes with $N$ edges (equivalently: a separately exchangeable $\lfloor rN \rfloor \times (N - \lfloor rN \rfloor)$ matrix with values in $\mathbb{N}_0$ summing to $N$), and assign the individuals $i = 1, \dots, N$ randomly to the two "sex groups". One can combine this with small variations of all the models discussed in Section 2 and, for example, also incorporate differences in the variance of reproductive success between the two sexes in this class of models.
One can also allow the possibility of selfing, i.e. $P(V_{i,i} > 0) > 0$. Then complete dispersion will not happen (asymptotically) immediately, but only after a random number of sampled genes have merged due to selfing, analogous to [17]. We leave the details to future work.

3.1.1. The pair coalescence probability.
We start by analyzing the pair coalescence probability $c_N$. Recall that this is the probability that two genes, picked at random from two distinct individuals which are chosen at random without replacement from the same generation (in the population of size $N$), have a common ancestor gene in the previous generation.
Proof. Pick two distinct individuals at random from the current population and pick from each of them independently one of its two gene copies by a fair coin flip. These two genes can descend from the same ancestral gene in the previous generation only if the two individuals have both parents or exactly one parent in common (full siblings or half siblings, respectively). Each shared parent contributes probability $\frac12 \cdot \frac12 \cdot \frac12 = \frac18$ (both sampled genes must come from that parent, and then from the same one of its two copies), so the probability that the two genes descend from the same ancestral gene of one of the parental individuals is $1/4$ and $1/8$, respectively. Thus, by the exchangeability assumption (1), … Alternatively, write … to obtain the first equality in (33).
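To make the counting in this proof concrete: for one fixed realisation of the offspring matrix, averaging the contribution $\frac18 \times (\text{number of shared parents})$ over all pairs of children gives $\sum_i (V_i)_2 / (8N(N-1))$, where $V_i = \sum_j V_{i,j}$ is the total offspring number of individual $i$. Taking expectations and using exchangeability then gives the expression $c_N = E[(V_1)_2]/[8(N-1)]$ quoted in the appendix. The sketch below (helper names are hypothetical) checks both computations against each other with exact fractions for a small hand-picked example with $N = 4$.

```python
from fractions import Fraction
from itertools import combinations


def children_parents(V):
    """List the parent pair (i, j) of every child, given a symmetric V with zero diagonal."""
    n = len(V)
    return [(i, j) for i in range(n) for j in range(i + 1, n) for _ in range(V[i][j])]


def pair_coalescence_bruteforce(V):
    """Average over unordered pairs of distinct children of (number of shared parents) / 8."""
    kids = children_parents(V)
    pairs = list(combinations(kids, 2))
    total = sum(Fraction(len(set(p) & set(q)), 8) for p, q in pairs)
    return total / len(pairs)


def pair_coalescence_formula(V):
    """sum_i (V_i)_2 / (8 N (N - 1)), with V_i the total offspring number of individual i."""
    n = len(V)
    totals = [sum(row) for row in V]
    return Fraction(sum(v * (v - 1) for v in totals), 8 * n * (n - 1))


# N = 4 individuals and N = 4 children: V_{0,1} = 2, V_{0,2} = 1, V_{2,3} = 1
V = [[0, 2, 1, 0],
     [2, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
print(pair_coalescence_bruteforce(V), pair_coalescence_formula(V))  # 5/48 5/48
```

The example respects $\sum_{i<j} V_{i,j} = N$; without that constraint the number of children and the population size would not coincide and the two functions would no longer compute the same quantity.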
Note that one can also express $c_N$ in terms of the variances and covariances of the $V_{i,j}$ as follows: …

3.1.2. Transition probabilities.
Next, we analyze the transition probabilities of the ancestral process. Let $\Pi_{n,N} = (\pi_{n,N}(\xi, \eta))_{\xi,\eta \in S_n}$ be the transition matrix of the Markov chain $(\xi_{n,N}(m))_{m \in \mathbb{N}_0}$. ($\Pi_{n,N}$ can be viewed as an $|S_n| \times |S_n|$ matrix if we fix an order for $S_n$, which we will do later.) In particular, we see from the argument in Lemma 3.1 that $c_N = \pi_{2,N}(\{\{1\}, \{2\}\}, \{\{1, 2\}\})$. It turns out that for our purposes it suffices to describe $\pi_{n,N}(\xi, \eta)$ in the case $\xi \in E_n$. For some $b \le n$ we thus consider the transition probability from states $\xi \in E_n$ to $\eta \in S_n$ of the form … for some $a \le b$ and $2d \le a$ such that $\xi \subseteq \mathrm{cd}(\eta)$. Assume that $D_i$ is a union of $k_i \ge 1$ classes from $\xi$ with $k_1 + \dots + k_a = b$.
Denote by $E_{a,d}$ the collection of elements of $E_a$ with $a - d$ blocks, $d$ of which have cardinality 2 while the other $a - 2d$ have cardinality 1. Then we can describe the grouping into diploid individuals in $\eta$ via $\zeta = \{\zeta_1, \dots, \zeta_{a-d}\} \in E_{a,d}$: $D_i$ and $D_j$ belong to the same diploid ancestral individual in $\eta$ if and only if $\{i, j\} \in \zeta$. We put $\ell_j := \sum_{i \in \zeta_j} k_i$, which is the number of offspring classes in $\xi$ that belong to the $j$-th ancestral individual described by $\eta$, and therefore $\sum_{j=1}^{a-d} \ell_j = b$.
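The sets $E_{a,d}$ are simply the ways of grouping $a$ ancestral chromosomes into $a - d$ individuals, $d$ of which carry two of them. The following sketch (function names are hypothetical) enumerates these groupings recursively, checks them against the standard count $|E_{a,d}| = a!/(2^d\, d!\, (a-2d)!)$ for small $a$, and computes the $\ell_j$ from a grouping $\zeta$ and class counts $k_i$.

```python
from math import factorial


def groupings(items, d):
    """Yield every partition of `items` into d blocks of size 2 and len(items) - 2d singletons."""
    if d == 0:
        yield [(x,) for x in items]
        return
    if len(items) < 2 * d:
        return
    first, rest = items[0], items[1:]
    for g in groupings(rest, d):  # `first` forms a singleton block
        yield [(first,)] + g
    for pos, partner in enumerate(rest):  # `first` shares an individual with `partner`
        remaining = rest[:pos] + rest[pos + 1:]
        for g in groupings(remaining, d - 1):
            yield [(first, partner)] + g


def size_of_E(a, d):
    """Closed form |E_{a,d}| = a! / (2^d d! (a - 2d)!)."""
    return factorial(a) // (2 ** d * factorial(d) * factorial(a - 2 * d))


def ell(zeta, k):
    """ell_j = sum of k_i over the ancestral chromosomes i grouped into individual j."""
    return [sum(k[i] for i in block) for block in zeta]


print(size_of_E(4, 2), ell([(0, 2), (1,)], [2, 1, 3]))  # 3 [5, 1]
```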
In order to calculate and describe the transition probabilities we introduce another useful concept and corresponding notation. Recall that $V_{i,j}\ (= V_{j,i})$ represents the random number of offspring of the parental individuals $i$ and $j$. By definition, each offspring inherits one chromosome copy from each parent. Let us assume that we randomly and uniformly "mark" one of these chromosome copies as "relevant" (in the sense that it is this copy that we possibly examine later in the child), and let $\tilde V_{i,j}$ be the number of offspring of the parents $i$ and $j$ that inherited their relevant chromosome copy from parent $i$. Mathematically, this means that conditional on $V$, $\tilde V_{i,j} \sim \mathrm{Bin}(V_{i,j}, 1/2)$ and $\tilde V_{j,i} = V_{i,j} - \tilde V_{i,j}$, and that $\tilde V_{i,j}$ and $\tilde V_{k,\ell}$ are independent when $\{i, j\} \neq \{k, \ell\}$. We write $\tilde V_i := \sum_{j \ne i} \tilde V_{i,j}$ for the total number of "relevant" offspring of individual $i$. Note that $\sum_{i=1}^N \tilde V_i = N$ by definition.

Proof. Note that … For any permutation $\sigma$ of $V_N$, we have … It follows from (1) that (37) equals (38), i.e. $(\tilde V_{\sigma(i),\sigma(j)})_{1 \le i \ne j \le N} \overset{d}{=} (\tilde V_{i,j})_{1 \le i \ne j \le N}$.
Exchangeability of $(\tilde V_i)_{1 \le i \le N}$ follows from this as in (3).
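The marking of relevant copies is easy to simulate. In this sketch (the function name and the small offspring matrix with $N = 4$ are hypothetical), each child of the couple $(i, j)$ flips an independent fair coin to decide whether its relevant copy comes from $i$ or from $j$, so that $\tilde V_{i,j} + \tilde V_{j,i} = V_{i,j}$ and $\sum_i \tilde V_i = N$.

```python
import random


def mark_relevant(V, rng):
    """Return tV with tV[i][j] = number of children of the couple (i, j) whose
    "relevant" chromosome copy comes from parent i. Each child decides by an
    independent fair coin, so tV[i][j] ~ Bin(V[i][j], 1/2) given V."""
    n = len(V)
    tV = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            from_i = sum(rng.random() < 0.5 for _ in range(V[i][j]))
            tV[i][j] = from_i
            tV[j][i] = V[i][j] - from_i
    return tV


V = [[0, 2, 1, 0],
     [2, 0, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]
tV = mark_relevant(V, random.Random(42))
relevant_totals = [sum(row) for row in tV]  # the tilde V_i
print(sum(relevant_totals))  # 4, the number of children N
```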
The following lemma states that if the limits in (6) exist, then they can also be expressed in terms of the quantities $\tilde V_i$: …

Proof. Recall the combinatorial identity for choosing (without replacement) $\ell$ objects out of $\sum_{i=1}^n a_i$ objects, which results from choosing exactly $k_i$ out of $a_i$ objects for each $i$ and considering all possible choices with $k_1 + \dots + k_n = \ell$. Now set … Then, expanding the definition of $\tilde V_i$ and using (39) yields … Thus … where we have used in the first equation the fact that … (a special case of an identity for mixed factorial moments of a multinomial vector, see also Lemma 3.11) and the conditional independence properties of the $\tilde V_{i,j}$. Note that if we replace in (41) the term … then we obtain (as in (40)) … The difference of the two terms $(V_{i,j})_{k_{i,j}+k_{j,i}}$ and $(V_{i,j})_{k_{i,j}} (V_{i,j})_{k_{j,i}}$, which get replaced inside the product in (42), vanishes whenever $k_{i,j} + k_{j,i} \le 1$ and is otherwise … (with a combinatorial constant that depends on $c$ and $\ell_1, \dots, \ell_c$ but not on $N$). Thus, because of Condition (6), …, which is the claim.
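Two exact identities of the kind used in this proof can be checked numerically. The sketch assumes that (39) is the falling-factorial form of Vandermonde's identity, $(a_1 + \dots + a_n)_\ell = \sum_{k_1 + \dots + k_n = \ell} \frac{\ell!}{k_1! \cdots k_n!} \prod_i (a_i)_{k_i}$, and that the multinomial fact is its binomial case $E[(\tilde V_{i,j})_k (\tilde V_{j,i})_m \mid V_{i,j}] = 2^{-(k+m)} (V_{i,j})_{k+m}$. Both identities are standard, but the exact displays of the source are not reproduced here, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def ff(x, k):
    """Falling factorial (x)_k = x (x - 1) ... (x - k + 1), with (x)_0 = 1."""
    out = 1
    for t in range(k):
        out *= x - t
    return out


def vandermonde_ff(a, ell):
    """Sum over k_1 + ... + k_n = ell of ell!/(k_1! ... k_n!) * prod_i (a_i)_{k_i}."""
    total = 0
    for ks in product(range(ell + 1), repeat=len(a)):
        if sum(ks) != ell:
            continue
        term = factorial(ell)
        for k in ks:
            term //= factorial(k)  # partial multinomial coefficients stay integers
        for ai, k in zip(a, ks):
            term *= ff(ai, k)
        total += term
    return total


def binomial_mixed_ff_moment(v, k, m):
    """E[(T)_k (v - T)_m] for T ~ Bin(v, 1/2), summed exactly over the pmf."""
    return sum(Fraction(comb(v, t), 2 ** v) * ff(t, k) * ff(v - t, m) for t in range(v + 1))


print(vandermonde_ff((3, 1, 4), 3), ff(8, 3))  # 336 336
```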
With the help of this lemma we can now prove the following: … If $\ell_1, \dots, \ell_{a-d} \ge 2$, we see from (6) that the limit in (44) equals … When $s \ge 1$ of the $\ell_i$ are equal to 1, say $\ell_1, \dots, \ell_{a-d-s} \ge 2$ and $\ell_{a-d-s+1} = \dots = \ell_{a-d} = 1$, we see from (8) that the limit in (44) equals …

Proof of Lemma 3.4. Since $\xi \in E_n \subset S_n$ is a completely dispersed state, its $b$ classes belong to $b$ distinct individuals in the offspring generation, and we can think of the chromosome copies which belong to $\xi$ as the relevant ones (in the corresponding individuals). That is, the transition from $\xi$ to $\eta$ corresponds to drawing $b$ times without replacement from an urn which contains $\tilde V_i$ balls of colour $i$ for $i = 1, \dots, N$. Thus … Note that the factor $\prod_{r=1}^{a-d} 2^{-\ell_r+1}$ accounts for the fact that, for each $r = 1, \dots, a-d$, we still have to assign which of the $\ell_r$ classes in $\cup_{i \in \zeta_r} D_i$ descends from which of the two chromosome copies in the $i_r$-th ancestral individual. This amounts to $\ell_r$ fair binary choices if we decree which classes descend from the "first" and which from the "second" chromosome in individual $i_r$, but we gain a factor of 2 because the roles of the "first" and the "second" chromosome are arbitrary and can be swapped. The second equation is a consequence of the exchangeability of the $\tilde V_i$. Finally, (44) follows from (45) and Lemma 3.3.
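The urn description in the proof of Lemma 3.4 rests on the elementary fact that $b$ draws without replacement from an urn with $\tilde V_i$ balls of colour $i$ ($N$ balls in total) produce a prescribed ordered colour sequence, in which colour $i$ appears $\ell_i$ times, with probability $\prod_i (\tilde V_i)_{\ell_i} / (N)_b$. The sketch below (hypothetical names and urn contents) compares this formula with brute-force enumeration; the extra factor $\prod_r 2^{-\ell_r+1}$ for the choice of chromosome copy is not modelled.

```python
from fractions import Fraction
from itertools import permutations


def ff(x, k):
    """Falling factorial (x)_k."""
    out = 1
    for t in range(k):
        out *= x - t
    return out


def colour_sequence_prob(counts, seq):
    """P(the first len(seq) draws without replacement show the colours in seq)
    = prod_i (counts_i)_{l_i} / (N)_b, where l_i = number of occurrences of colour i."""
    N, b = sum(counts), len(seq)
    num = 1
    for colour in set(seq):
        num *= ff(counts[colour], seq.count(colour))
    return Fraction(num, ff(N, b))


def colour_sequence_prob_bruteforce(counts, seq):
    """Same probability, by enumerating all ordered draws of distinct balls."""
    balls = [c for c, v in enumerate(counts) for _ in range(v)]
    draws = list(permutations(range(len(balls)), len(seq)))
    hits = sum(all(balls[x] == c for x, c in zip(d, seq)) for d in draws)
    return Fraction(hits, len(draws))


counts = [3, 2, 1, 0]  # hypothetical values of tilde V_i, N = 6
print(colour_sequence_prob(counts, (0, 0, 1)))  # 1/10
```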
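Note that the matrix obtained from $\pi_{n,N}$ by restricting to initial states in $E_n$ and summing over $\mathrm{cd}^{-1}(\eta)$ is a Markov transition matrix on $E_n$: a step according to it means first taking a step according to $\pi_{n,N}$ and then applying the complete dispersion operator, i.e., ignoring the grouping into diploid individuals.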
In particular, there is sampling consistency: for $\xi \in E_n$ and $\eta = \{D_1, \dots, D_a\} \in E_n$ we have … Furthermore, the transition probabilities depend on $n$ only implicitly, through the merger structure that the transition from $\xi$ to $\eta$ induces.

Lemma 3.5. Let $n \in \mathbb{N}$, $\xi \in E_n$, $\eta \in S_n$, where $\xi$ has $b$ classes and $\mathrm{cd}(\eta)$ arises from $\xi$ by merging $j \ge 1$ groups of classes with sizes $k_1, k_2, \dots, k_j \ge 2$ from $\xi$ and leaving $s \ge 0$ singleton classes (in particular, $\eta$ has $a = j + s$ classes and $b = k_1 + \dots + k_j + s$). Then … where $\lambda_{b;k_1,\dots,k_j;s}$ are the transition rates of the Ξ-coalescent defined in Theorem 1.1 (recalled in (9)).
Proof. Assume $\xi$ and $\eta$ are given by (34) and (35), with classes denoted by $C_i$ and $D_i$, respectively. Recall that $D_i$ is a union of $k_i \ge 1$ classes from $\xi$ with $k_1 + \dots + k_a = b$ and that $\zeta$ describes the grouping into diploid individuals. Now note that $\mathrm{cd}^{-1}(\mathrm{cd}(\eta)) := \{\eta' \in S_n : \mathrm{cd}(\eta') = \mathrm{cd}(\eta)\}$ can be parametrised by choosing any $d \in \{0, 1, \dots, \lfloor a/2 \rfloor\}$ and $\zeta \in E_{a,d}$ ($d$ describes the number of ancestral individuals in $\eta$ carrying two ancestral genes, and $\zeta$ describes the grouping of the $a$ ancestral chromosomes in $\eta$ into diploid individuals). For $\zeta \in E_{a,d}$ and $i = 1, 2, \dots, a$, we define … First consider the case $s = 0$ and hence $a = j > 1$: from (46), Lemma 3.4, as well as (6) and (7), … where we used that, by definition of $\varphi$, for any function $F$ … and that $(\varphi(x), \varphi(x)) = \frac{1}{2}(x, x)$. In the case $j = 1$, we additionally have the Kingman term in (49) of the form … For the general case $s > 0$, we can employ the consistency relations (47) in order to use induction on $s$: Assume that (48) holds whenever the number of "singleton classes" involved is at most $s$, and $b = k_1 + \dots + k_j + s$ with $k_1, \dots, k_j \ge 2$. Let $\mathrm{cd}(\eta) \in E_{n+1}$ arise from $\xi \in E_{n+1}$ by a merger in $j$ groups of sizes $k_1, \dots, k_j \ge 2$, leaving $s + 1$ singleton classes. By the symmetries of the model, we may (without changing the transition probability) assume that one of the relevant singleton classes in $\eta$ is $\{n+1\}$. Then, rearranging (47) and using the induction hypothesis, we see that … $\lambda_{b+1;k_1,\dots,k_i+1,\dots,k_j;s} - s\,\lambda_{b+1;k_1,\dots,k_j,2;s-1}$.
The term on the right-hand side equals $\lambda_{b+1;k_1,\dots,k_j;s+1}$ by the consistency relation for transition probabilities of Ξ-coalescents (implicit in [22, Eq. (11)]) …

… If the sequence of initial probability measures $P_{X_N(0)}$ converges weakly to some probability measure $\mu$, then the finite-dimensional distributions of the process $(X_N(\lfloor t/c_N \rfloor))_{t \ge 0}$ converge to those of a continuous-time Markov process $(X_t)_{t \ge 0}$ with initial distribution …, transition matrix $\Pi(t) := P - I + e^{tG} = P e^{tG}$, $t > 0$, and infinitesimal generator $G$.
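The separation of time scales in Lemma 3.6 can be seen in a toy example. The three-state chain below is hypothetical and is not derived from the population model; it only assumes a transition matrix of the form $\Pi_N = A + c_N B$. State 1 (two lineages in one individual) is left instantly under $A$, so $P = \lim_m A^m = A$, and $G = PBP$ gives coalescence at rate 1 from both transient states. Raising $\Pi_N$ to the power $\lfloor t/c_N \rfloor$ should then reproduce the entry $1 - e^{-t}$ of $P e^{tG}$, also for the start in state 1, where $\Pi(t) = P e^{tG}$ does not tend to the identity as $t \downarrow 0$.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matpow(X, m):
    """X**m by repeated squaring (m >= 0)."""
    R = [[float(i == j) for j in range(len(X))] for i in range(len(X))]
    while m:
        if m & 1:
            R = matmul(R, X)
        X = matmul(X, X)
        m >>= 1
    return R


# Hypothetical toy chain. States: 0 = lineages in distinct individuals (dispersed),
# 1 = both lineages in one individual, 2 = lineages have coalesced.
A = [[1.0, 0.0, 0.0],   # fast part: state 1 is left immediately ...
     [1.0, 0.0, 0.0],   # ... towards the dispersed state 0
     [0.0, 0.0, 1.0]]
B = [[-1.5, 0.5, 1.0],  # slow part: from state 0, coalesce at rate 1
     [0.0, 0.0, 0.0],   # or re-enter state 1 at rate 0.5
     [0.0, 0.0, 0.0]]


def coalesced_by(t, c, start):
    """Entry (start, 2) of (A + c B)^floor(t / c)."""
    pi = [[A[i][j] + c * B[i][j] for j in range(3)] for i in range(3)]
    return matpow(pi, math.floor(t / c))[start][2]


# Here P = A (A is idempotent) and G = P B P has rows 0 and 1 equal to (-1, 0, 1),
# so P e^{tG} predicts coalescence probability 1 - e^{-t} from either start.
for start in (0, 1):
    print(start, coalesced_by(1.0, 1e-4, start), 1 - math.exp(-1.0))
```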
Proof of Theorem 1.1: Applying Lemma 3.6, our strategy is based on a decomposition of the transition matrix $\Pi_{n,N} = (\pi_{n,N}(\xi', \eta'))_{\xi',\eta' \in S_n}$. In order for these transitions to be well defined as matrices we choose a specific order of $S_n$: consider the standard order of $E_n \subset S_n$, and insert all remaining elements of $\mathrm{cd}^{-1}(\xi) \subset S_n$ directly after each $\xi \in E_n$ (the order within these groups is fixed arbitrarily). This way, the matrix $\Pi_{n,N}$ decomposes into sub-matrices $(\Pi_{n,N}(\xi, \eta))_{\xi,\eta \in E_n}$, with $\Pi_{n,N}(\xi, \eta)$ a $|\mathrm{cd}^{-1}(\xi)| \times |\mathrm{cd}^{-1}(\eta)|$ matrix.
Thus, we have $P := \lim_{m \to \infty} A^m$ with sub-matrix structure $P(\xi, \eta) = 0$ for $\xi \neq \eta$ and $P(\xi, \xi) = P_\xi$. It is easy to show that … is the sum of all the entries in the first row of the matrix $B_{n,N}(\xi, \eta)$. Consequently, … Assume $\eta$ arises from $\xi$ by merging $j \ge 1$ groups of classes with sizes $k_1, k_2, \dots, k_j \ge 2$ and leaving $s \ge 0$ singleton classes ($b = k_1 + \dots + k_j + s$). Applying Lemma 3.5, we have … Note that a transition from any given state $\eta' \in S_n$ with $a \le n$ classes (as in (35)) to its complete dispersion state $\mathrm{cd}(\eta')$ happens whenever no two ancestral genes carried by distinct ancestral individuals in configuration $\eta'$ have a common parental ancestral gene in the previous generation. (Ancestral genes of the same individual necessarily have distinct parental ancestral genes, as we have excluded selfing.) For any pair of such ancestral genes the probability of having a common parental ancestral gene is $c_N$, and there are at most $\binom{a}{2}$ such pairs to consider. Thus, by a union bound, the transition probability satisfies $\pi_{n,N}(\eta', \mathrm{cd}(\eta')) \ge 1 - \binom{a}{2} c_N$. Hence $\pi_{n,N}(\eta', \mathrm{cd}(\eta')) \to 1$ as $N \to \infty$, or in other words $A = \lim_{N \to \infty} \Pi_{n,N}$, and complete dispersion happens instantaneously in the limit of the genealogical process for the diploid population model. By eliminating all these instantaneous states, we obtain an $E_n$-valued marginal process $(R_n(t))_{t \ge 0}$ whose generator is given by $\big(\lim_{N \to \infty} g^{(N)}_{\xi,\eta}\big)_{\xi,\eta \in E_n}$. The process $(R_n(t))_{t \ge 0}$ is exactly the $n$-Ξ-coalescent process.
Proof of Corollary 1.2: Our argument is essentially borrowed from the proof of Theorem 3.1 in Möhle [19]. Theorem 1.1 yields that the finite-dimensional distributions of $\big(\mathrm{cd}(\xi_{n,N}(\lfloor t/c_N \rfloor))\big)_{t \ge 0}$ converge to those of $\xi_n$. Thus, in order to strengthen this to weak convergence on the path space $D([0, \infty), E_n)$, we only have to verify tightness there. Since $E_n$ is finite (and so in particular compact) and $\mathrm{cd}(\xi_{n,N})$ can by construction only move by mergers, it suffices to check that, in the limit $N \to \infty$, the jump times of $\mathrm{cd}(\xi_{n,N}(\lfloor \cdot / c_N \rfloor))$ do not accumulate (see e.g. [10, Thm. 6.2 in Ch. 3] or [5]). Noting that for any $\xi \in S_n$
$$P\big(\mathrm{cd}(\xi_{n,N}(m+1)) \neq \mathrm{cd}(\xi) \,\big|\, \xi_{n,N}(m) = \xi\big) = P\big(\text{at least one pair of genes (necessarily from distinct individuals) merges in the next step} \,\big|\, \xi_{n,N}(m) = \xi\big) \le \binom{n}{2} c_N,$$
we see that the times between jumps of $(\mathrm{cd}(\xi_{n,N}(m)))_{m \in \mathbb{N}_0}$ are stochastically larger than independent geometric random variables with success parameter $c_N n(n-1)/2$, which after time scaling converge in distribution to independent exponentials with rate $n(n-1)/2$.

… [29], the adaptation poses some additional technical difficulties. In particular, observe that $Z_N$, the normalising constant in the representation (13) of the law of $(V_{i,j})$ as a mixture of multinomials, is not literally an i.i.d. sum as in [29].
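The last step of the proof of Corollary 1.2 uses that if $G$ is geometric on $\{1, 2, \dots\}$ with success parameter $p = c_N\, n(n-1)/2$, then $P(c_N G > t) = (1-p)^{\lfloor t/c_N \rfloor} \to e^{-n(n-1)t/2}$ as $c_N \to 0$. A short numerical check (the helper name and the values $n = 4$, $t = 1/2$ are arbitrary choices for illustration):

```python
import math


def scaled_geometric_tail(t, c, n):
    """P(c * G > t) for G ~ Geometric(p) on {1, 2, ...} with p = c n (n - 1) / 2."""
    p = c * n * (n - 1) / 2
    return (1 - p) ** math.floor(t / c)


n, t = 4, 0.5
for c in (1e-2, 1e-4, 1e-6):
    print(c, scaled_geometric_tail(t, c, n), math.exp(-n * (n - 1) / 2 * t))
```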
Recall from (12) and (13) that in the random individual fitness model the $V_{i,j}$, $1 \le i < j \le N$, are, conditionally on the fitness parameters, multinomial with $N$ trials and with success probability for the pair of individuals $i$ and $j$ proportional to $W_i W_j$, where the fitness parameters $W_i$ are independent copies of $W$ with mean $\mu_W$.
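For experimentation, a small sampler for the random individual fitness model might look as follows. It is a sketch under assumptions: the fitness law (a Pareto variable with a hypothetical tail index $\alpha = 1.5$, rounded down to an integer) is only one possible choice of $W$ with a regularly varying tail, and the function name is hypothetical. Conditionally on the $W_i$, the $N$ children are placed on the pairs $i < j$ by a multinomial draw with weights $W_i W_j$.

```python
import random


def sample_fitness_model(N, sample_w, rng):
    """Draw fitnesses W_1..W_N, then place N children on pairs i < j with
    probabilities proportional to W_i W_j (a multinomial draw given the W_i)."""
    W = [sample_w() for _ in range(N)]
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    weights = [W[i] * W[j] for i, j in pairs]
    V = [[0] * N for _ in range(N)]
    for i, j in rng.choices(pairs, weights=weights, k=N):
        V[i][j] += 1
        V[j][i] += 1
    return W, V


rng = random.Random(7)
alpha = 1.5  # hypothetical tail index in (1, 2)
W, V = sample_fitness_model(50, lambda: int(rng.paretovariate(alpha)), rng)
totals = [sum(row) for row in V]  # total offspring numbers V_i
print(sum(totals), max(totals))  # the first number is always 2N = 100
```

The total offspring numbers $V_i$ always sum to $2N$, as recalled in the appendix; with heavy-tailed $W$ one occasionally sees a single individual with a macroscopic share of the offspring.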
Proof of Proposition 2.3. 1. The scaling of the pair coalescence probability $c_N$ is given by (17) in Lemma 2.2.
2. In this case, the scaling of $c_N$ is given by (18) in Lemma 2.2. To verify the claimed form of the limiting coalescent, we have to check that the probability measure $\Xi'$ on $\Delta$ appearing in (5) and (7) is given by … i.e. $\Xi'$ is the image measure of $\mathrm{Beta}(2-\alpha, \alpha)$ under the mapping $[0, 1] \ni x \mapsto (x/2, 0, 0, \dots) \in \Delta$. It is known that (51) from Lemma 3.7 implies that the vague limit measure of $\Phi_N$ from (5) is concentrated on the subset $\{(x_1, x_2, \dots) \in \Delta : x_2 = 0\}$, see [27, Cor. 2.1] (one can view this subset as the canonical embedding of $[0, 1]$ into $\Delta$), so it suffices to observe that for any $x \in (0, 1/2)$, by Lemma 3.8, …

This is a small variation on Lemma 12 from [29], addressing the case $k = 2$ and integer-valued $W$. For a rough idea of why the asymptotic decay rate of $E[W^k/(M+W)^k]$ is $M^{-\alpha}$, note that on the event $\{W \ge M\}$, which has probability $\sim c_W M^{-\alpha}$, the integrand is almost constant.
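The heuristic for the decay rate $M^{-\alpha}$ can be checked numerically. Assuming, only for this sketch, that $W$ is continuous with $P(W > x) = x^{-\alpha}$ for $x \ge 1$ (so $c_W = 1$), the substitution $x = Mu$ gives $M^{\alpha} E[W^k/(M+W)^k] \to \alpha B(k-\alpha, \alpha)$ for $1 < \alpha < 2$ and $k \ge 2$. For $k = 2$ this is the constant $\alpha B(2-\alpha, \alpha)$ that also appears in the asymptotics of $c_N$ in the proof of Lemma 2.4. The integer-valued case treated in the text is not modelled, and the function names are hypothetical.

```python
import math


def scaled_moment(M, alpha, k=2, steps=100_000, s_max=60.0):
    """M**alpha * E[(W / (M + W))**k] for Pareto W with P(W > x) = x**(-alpha), x >= 1.

    With x = M u and u = e^s the expectation becomes
    alpha * M**(-alpha) * integral_{-log M}^{inf} u**(k - alpha) / (1 + u)**k ds,
    evaluated here by the midpoint rule on [-log M, s_max].
    """
    lo = -math.log(M)
    h = (s_max - lo) / steps
    total = 0.0
    for i in range(steps):
        u = math.exp(lo + (i + 0.5) * h)
        total += u ** (k - alpha) / (1 + u) ** k
    return alpha * total * h


def limit_constant(alpha, k=2):
    """alpha * B(k - alpha, alpha), the M -> infinity limit of scaled_moment."""
    return alpha * math.gamma(k - alpha) * math.gamma(alpha) / math.gamma(k)


for M in (1e2, 1e4, 1e6):
    print(M, scaled_moment(M, 1.5), limit_constant(1.5))
```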
Proof. For any bounded monotone $g \in C^1([0, \infty))$ with $g(0) = 0$ we have … Applying this with $g(\dots)$ … For every $L > 0$ we have … using that $\alpha < 2$ and that $k \ge 2$. Furthermore, … and similarly for the lim inf.

… (for $2 \le k < N$) we can re-express $Z_N$ from (12) as … and also as … (60)

Lemma 3.10. Assume that $W$ satisfies (14) or that $\mu^{(2)}_W < \infty$. For $0 < \delta < 1$ and $k \in \{2, 3\}$ let … Furthermore, for all $r = r(\delta) > 0$ with $\delta \in (0, 1)$ we have …

For (62) (under Assumption (14)) note that while the typical size of $S_{k,N}$ is $\approx \mu_W N$ by the law of large numbers, conditioned on $S_{k,N} \gg \mu_W N$ there will typically be just one exceptionally large summand (by the tail assumption (14), this is much more likely than having many moderately large summands, cf. [24]). Then the order of magnitude of $(S_{k,N})^2 - S^{(2)}_{k,N}$ will in fact be $\approx N S_{k,N}$ up to constants.
We now assume that (14) holds. In order to prove (18) we first verify that … Note that, using (59), we can rewrite … We have … noting that $S_{2,N}/N \to \mu_W$ and $S^{(2)}_{2,N}/N^2 \to 0$ in probability as $N \to \infty$ (for the latter use that the $W_i^2$ have regularly varying tails of index $\alpha/2 \in (1/2, 1)$, so in particular $S^{(2)}_{2,N}/N^{2/\alpha}$ is tight; this follows e.g. from [11, Thm. 2 in Section XVII.5]). Furthermore, consider … and note that $P(B_N) \le e^{-rN}$ for $N$ large enough with $r > 0$, due to (62) from Lemma 3.10 for $k = 2$. Since for small enough $\delta > 0$ we have $B_N \subset \tilde A^c_{N,\delta}$, it then follows for those $\delta$ that … and Lemma 3.9 together with (67) implies … follows by taking $\delta \downarrow 0$.
Analogous, in fact slightly easier, arguments can be used to show that … Again using (66) we get …; now combine (67) with Lemma 3.9 as above and then let $\delta \downarrow 0$ to conclude (69). Then (65) and (69) combined with (16) yield (18).
We now assume $\mu^{(2)}_W < \infty$. The proof of (17) is similar, in fact simpler: instead of using Lemma 3.9, we simply observe that in this case … Taking $\delta \downarrow 0$, this combined with (16) yields (17).

Write
Instead of spelling out the details of this computation, note that there is a combinatorial interpretation: consider $V_{1,2} + V' + V''$ numbered balls, of which $V_{1,2}$ are white, $V'$ are red and $V''$ are blue. Then $(V_1)_2 (V_2)_2$ counts the number of ways to choose an ordered pair of distinct balls which are white or red together with an ordered pair of distinct balls which are white or blue (the same ball(s) may appear in both pairs). The right-hand side decomposes this number: there are $(V_{1,2})_4$ configurations in which all balls are white and all are distinct, $4(V_{1,2})_3$ in which all balls are white and exactly one ball appears twice, etc. Since the law of … Lemma 3.11 in the first equality, that … from (60), and that we can bound … except on an event with exponentially small probability, cf. (61) and (62) from Lemma 3.10.
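The ball-counting argument can be written out as an explicit identity. With $w = V_{1,2}$, $a = V'$ and $b = V''$ (so that, in this interpretation, $V_1 = w + a$ and $V_2 = w + b$), one way of grouping the terms is shown below. It is a sketch: the display (71) in the source may arrange the terms differently, and the function name is hypothetical. The grouping is checked against the direct product over a grid of small values.

```python
def ff(x, k):
    """Falling factorial (x)_k."""
    out = 1
    for t in range(k):
        out *= x - t
    return out


def expansion(w, a, b):
    """(w + a)_2 (w + b)_2 expanded in falling factorials of w = V_{1,2},
    with a = V' (red balls) and b = V'' (blue balls)."""
    return ((ff(w, 4) + 4 * ff(w, 3) + 2 * ff(w, 2))   # all four chosen balls white
            + 2 * (a + b) * (ff(w, 3) + 2 * ff(w, 2))  # exactly one coloured ball
            + (ff(a, 2) + ff(b, 2)) * ff(w, 2)         # one pair fully coloured, other white
            + 4 * a * b * (ff(w, 2) + w)               # one red and one blue ball
            + 2 * w * (a * ff(b, 2) + b * ff(a, 2))    # three coloured balls
            + ff(a, 2) * ff(b, 2))                     # no white balls


print(ff(5 + 1, 2) * ff(5 + 2, 2), expansion(5, 1, 2))  # 1260 1260
```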
We will not treat all the terms on the right-hand side of (71) in detail (but see Remark 3.12 below) since the computations are long but otherwise relatively straightforward. Consider for example the term for N large enough where we used Lemma 3.9 in the last inequality.
Proof of Lemma 3.8. We start by arguing that in order to show (52) it suffices to check that for $x \in (0, 1)$ the limit … exists and is given by the right-hand side of (52) (note that this expression is continuous in $x$). Indeed, since $\mathcal{L}(V_1 \mid (W_i)) = \mathrm{Bin}(N, Q_N)$ and
$$\mathrm{Bin}(N, p)\big(\{0, 1, \dots, \lceil (1-\varepsilon)pN \rceil\} \cup \{\lfloor (1+\varepsilon)pN \rfloor, \dots, N\}\big) \le e^{-rN}$$
for any $p, \varepsilon \in (0, 1)$ with $r = r(p, \varepsilon) > 0$, by classical large deviation estimates for the binomial distribution (e.g. [7], Thm. 2.2.3 and Ex. 2.2.23 (b)), we may write for $\varepsilon > 0$ … Due to (73) and the continuity of the limit, we can choose $\varepsilon > 0$ small and then $N$ large to make the second probability in the second term, and thus the second term itself, multiplied by $N/c_N$ arbitrarily small. By choosing $N$ potentially larger and using the above large deviations result, the conditional probabilities in the first and third line are arbitrarily close to 1 and to 0, respectively, even when multiplied by $N/c_N \sim N^{\alpha}/C^{(\mathrm{Beta})}_{\mathrm{pair}}$ by (18). Thus, the claim (52) now follows from (73).
by the tail assumption (14), analogous to the proof of Lemma 14 in [29], and similarly for the lim inf. Finally note that from (18) in Lemma 2.2 we have
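The binomial large deviation estimate in the proof of Lemma 3.8 says that $\mathrm{Bin}(N, p)$ puts mass at most $e^{-rN}$ outside the window $((1-\varepsilon)pN, (1+\varepsilon)pN)$. The source cites a Cramér-type theorem; the sketch below instead uses the standard multiplicative Chernoff bound $2\exp(-Np\delta^2/3)$ at $\delta = \varepsilon/2$ (halving $\varepsilon$ absorbs the integer rounding of the window endpoints once $Np\varepsilon/2 \ge 1$) and compares it with the exact tail mass. The parameter values and function names are arbitrary illustrations.

```python
import math


def log_binom_pmf(N, p, k):
    return (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(p) + (N - k) * math.log1p(-p))


def outside_window_prob(N, p, eps):
    """Bin(N, p) mass of {0, ..., ceil((1-eps)pN)} U {floor((1+eps)pN), ..., N}."""
    lo = math.ceil((1 - eps) * p * N)
    hi = math.floor((1 + eps) * p * N)
    ks = set(range(0, lo + 1)) | set(range(hi, N + 1))
    return sum(math.exp(log_binom_pmf(N, p, k)) for k in ks)


def chernoff_bound(N, p, eps):
    """2 exp(-N p (eps/2)^2 / 3): multiplicative Chernoff bound at eps/2, which
    covers the rounded window endpoints as soon as N p eps / 2 >= 1."""
    return 2 * math.exp(-N * p * (eps / 2) ** 2 / 3)


for N in (100, 400, 1600):
    print(N, outside_window_prob(N, 0.3, 0.2), chernoff_bound(N, 0.3, 0.2))
```

The Chernoff bound is far from sharp for moderate $N$, but both quantities decay exponentially in $N$, which is all the argument needs.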

3.3. Proof of Proposition 2.5.
In this section we prove Lemma 2.4 and Proposition 2.5, the main convergence result for the diploid population model of Section 2.2, where potential offspring of parental couples are generated as in a supercritical Galton-Watson process and then pruned. In order to prove Proposition 2.5 we need two lemmas in addition to Lemma 2.4. The proofs of these auxiliary lemmas are postponed to Section 3.3.1.
Our proofs in this section are to some extent parallel to those in [29], especially those of Lemmas 2.4 and 3.13. We note, however, that the arguments are somewhat more involved, in particular because the $X^{(N)}_i$, $1 \le i \le N$, are not independent. Our proof of part 2 of Proposition 2.5 follows a slightly different route than that of its analogue [29, Thm. 4 (c)], in that we verify condition (5) on the law of the ranked offspring frequencies directly, without recourse to the moment criterion (6). One can alternatively prove Proposition 2.5 by verifying (6), but this route appears more cumbersome here because the $X_i$ are not independent in our set-up.
Recall that the offspring $V^{(N)}_{i,j}$ of individuals $i$ and $j$ are sampled from the "potential offspring" $X^{(N)}_{i,j}$ as before. Our first observation concerns large deviations of the total number of potential offspring $S_N = \sum_{1 \le i < j \le N} X^{(N)}_{i,j}$ from (20), which by (24) has $E[S_N] \sim N\mu$ with $\mu > 1$. We now consider $A_{i,j}$, $i, j \in \mathbb{N}$, to be independent copies of $X$ from (22), independent of the $A^{(N)}_{i,j}$'s. A convenient parametrisation of the set-up from (21) and (22) is to set … the total number of "potential offspring-generating" pairs, is $\mathrm{Bin}(N(N-1)/2, p_N)$-distributed.
for j > i as defined above. Then for all N and 0 < ε ≤ 1/2.
In the following we will drop the superscript (N ) often for notational simplicity.
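A toy simulation of the "potential offspring, then prune" mechanism may help to fix ideas. It rests on assumptions not taken from (21) and (22): the pair-activation probability is taken as $p_N = c_{X,1}/(N-1)$ (consistent with $A^{(N)} \approx c_{X,1}N/2$ active pairs), the law of $X$ is supplied by the caller, and when $S_N < N$ the sketch simply redraws. Given the $X_{i,j}$, keeping $N$ of the $S_N$ potential offspring uniformly without replacement makes the $V_{i,j}$ multivariate hypergeometric, as used in the proof. The function name is hypothetical.

```python
import random


def sample_pruned_model(N, c1, sample_x, rng, max_tries=1000):
    """Hypothetical sketch of "generate potential offspring, then prune".

    Each pair i < j is independently active with probability p_N = c1 / (N - 1)
    (an assumed parametrisation), so the number of active pairs is
    Bin(N(N-1)/2, p_N); an active pair gets X_{i,j} = sample_x() potential offspring.
    Then N of the S_N potential offspring are kept uniformly without replacement.
    """
    p = c1 / (N - 1)
    pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]
    for _ in range(max_tries):
        X = {pr: sample_x() for pr in pairs if rng.random() < p}
        labels = [pr for pr, x in X.items() for _ in range(x)]  # one label per potential child
        if len(labels) >= N:
            break  # assumption of this sketch: redraw whenever S_N < N
    else:
        raise RuntimeError("S_N < N in every attempt")
    V = dict.fromkeys(X, 0)
    for pr in rng.sample(labels, N):
        V[pr] += 1
    return X, V


rng = random.Random(3)
X, V = sample_pruned_model(200, 2.0, lambda: rng.randint(1, 3), rng)
print(sum(X.values()), sum(V.values()))  # S_N, then N = 200
```

With $c_{X,1} = 2$ and $X$ uniform on $\{1, 2, 3\}$, the mean number of potential offspring is about $c_{X,1} E[X] N/2 = 2N$, i.e. $\mu \approx 2 > 1$, so redraws are rare.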
Proof of Proposition 2.5. 1. In view of Lemma 2.4 and Condition (7), it suffices to verify that (5) and (7) hold with $\Xi' = \delta_0$ (see e.g. [20, Thm. 4 (b)]). This can be checked along the lines of the proof of [29, Proposition 7]. We simply need that for some $0 < a < 1$ … To see that this is true, we use (22) and (25) to estimate … as $N \to \infty$. This shows (81) and thus the claim.
2. Let $V_{(1)} \ge V_{(2)} \ge \dots \ge V_{(N)}$ be the ranked $V_i$'s. We will verify that (5) holds with $\Xi' = \int_{[0,1]} \delta_{(x/2, x/2, 0, 0, \dots)}\, \mathrm{Beta}(2-\alpha, \alpha)(dx)$ (and then part 2 of Proposition 2.5 follows from our main result, Theorem 1.1, via Condition (5)). This in turn will follow if we show that for every $k \in \mathbb{N}$, … Here we have used in the last line that … with the substitution $z = (1-y)/y$. The intuition behind this result is the following: $V_{i,j} > Ny$ with $y > 0$ is possible (only) if $X_{i,j}$ is of order $N$; then, with overwhelming probability, the sum of all other $X_{k,\ell}$ with $\{k, \ell\} \neq \{i, j\}$ is $\approx N\mu$, and both $V_i$ and $V_j$ are $\approx V_{i,j}$ up to terms which are $o(N)$. We then have (using fluctuation bounds for hypergeometric distributions) …, thus … Recall $A^{(N)}$ from (77). By Lemma 3.14, $A^{(N)} \approx c_{X,1}N/2$, and then … Combining this with Lemma 2.4 suggests (82) for $k = 2$, namely … For $\varepsilon > 0$, … (both events essentially require that there are at least two distinct pairs $\{i, j\} \neq \{k, \ell\}$ with $X_{i,j}, X_{k,\ell} \ge (\varepsilon/2)N$, say; observe that $V_{i,j} \le X_{i,j}$ by definition).
A more detailed argument runs as follows: Let … for $y \in (0, 1/2)$ and $\varepsilon > 0$. In the following we again drop the index $N$ and note that, by Assumption (26) and Lemma 3.13, … using the fact that $\alpha > 1$.
Since the $V_{k,\ell}$'s are hypergeometric conditional on the $X_{k,\ell}$'s with …, we have by fluctuation bounds for hypergeometric laws (e.g. as recalled in [29, Lemma 19]; see [6, 12]) and by conditioning on the $X_{k,\ell}$'s that … with $c(\varepsilon) > 0$. Using (85) as well as the fact that … $\{V_{(2)} \ge m\}$ for every $m$ and all $i < j \le N$, we obtain that $\limsup$ … This gives … Combining (84), (86), (87) with the calculation in (83) and Lemma 2.4 yields … For the matching upper bound we let … Note that if we choose $\varepsilon$ small such that $(1-\varepsilon)\mu > 1$, we in particular have $S_N \ge N$ on $D^{(N)}$. Also, by Lemmas 3.13 and 3.14, … where $c > 0$ and the constant $b > 0$ can be chosen (much) larger than $\alpha - 1$. Choose $\delta \in \big((\alpha-1)/2, \alpha-1\big)$. Then, for
$$E^{(N)} := \big\{\exists\, i < j \le N,\ k < \ell \le N \text{ with } \{i, j\} \neq \{k, \ell\},\ X_{i,j} \ge N^{(1+\delta)/\alpha},\ X_{k,\ell} \ge N^{(1+\delta)/\alpha}\big\},$$
we have due to (22) and (26) that … because $\delta > (\alpha-1)/2$. We also have … for $N$ large enough, since on $D^{(N)} \cap (E^{(N)})^c$ every $V_i$ is the sum of at most $\log N$ many non-zero $V_{i,j}$ (since this is true for $X_i$ and the $X_{i,j}$), of which only one can be larger than $N^{(1+\delta)/\alpha}$ (note that $(1+\delta)/\alpha < 1$, so that the largest $V_{i,j}$ needs to be of the same order as $V_{(1)}$).
Finally, for $\varepsilon > 0$ and $N$ sufficiently large, we have
$$P\big(V_{(3)} \ge \varepsilon N\big) \le P\big(\exists \text{ distinct } i, j, k \le N : X_i, X_j, X_k \ge \varepsilon N\big) \le P\big((D^{(N)})^c\big) + P\big(E^{(N)}\big) = o(N^{1-\alpha})$$
as above. Combining again with Lemma 2.4, this completes the proof of (82) for $k \ge 3$.

3.3.1. Proofs of auxiliary results.
In this section we prove Lemmas 2.4, 3.13 and 3.14. To start with, we need one more lemma.
Fix ε > 0, choose x * so that the bound in (94) holds for some c > µ and potentially enlarge x * further such that for all x ≥ x * we have x ≥ cm > µm for all m ≤ log x. Let x * * > e c p be so large that The right-hand side is smaller than 8ε when N ≥ N 0 where N 0 is chosen so that The lower bound can be proved analogously: it suffices to keep in the computation above the sum over m up to ⌊log x⌋ ∧ N where x ≥ max{x * , x * * , x † } with x † chosen so large that for any b > 0. The probability on the left-hand side of (79) is bounded by N k=1 P(A The probability bound in (80) is a classical large deviation bound for sums of Bernoulli random variables (the claimed constant is not sharp).
(For completeness, here are some details: for any β ∈ R, hence (we parametrise λ ≥ 0 in the formulas below) i,j 's in (76) that S N is indeed a sum of ≈ c X,1 N/2 i.i.d. summands (up to an exponentially small error term) and then we can literally apply [29,Lemma 5].
Under Assumption (25) … With the help of Lemma 3.13 (and Lemma 3.15) we can now prove Lemma 2.4. For this, recall that the factorial moments of a hypergeometric random variable $H \sim \mathrm{hypergeom}(n, m, x)$ are given by (see e.g. Formula (39.6) in [13]) …

Proof of Lemma 2.4. Recall that $X_1 = X^{(N)}_1$. We have by (96) and (74) of Lemma 3.13 that … In case 1, we now observe that for every $\varepsilon > 0$ … by (74) and (75) of Lemma 3.13.
For showing part 2, we first use (97), write $S_N = X_1 + (S_N - X_1)$, and use that by Lemma 3.13, for $\varepsilon > 0$, … We may then formally follow the argument in the proof of [29, Lemma 13] in order to obtain … Now note that $P(X_1 > x) = P(X^{(N)}_1 > x) \sim c_{X,1} c_{X,2} x^{-\alpha}$ as $x \to \infty$, uniformly in $N$ by Lemma 3.15, and so the statement of Lemma 3.9 also holds in the current setting, implying that $c_N \sim \frac{N}{8} (N\mu)^{-\alpha} c_{X,1} c_{X,2}\, \alpha B(2-\alpha, \alpha)$.
Appendix A. The weak convergence criterion for the total offspring numbers
Here, for ease of reference, we briefly recall some of the main results from Möhle and Sagitov [22], Sagitov [27] and Schweinsberg [28] in the notation of our Theorem 1.1 and its assumptions. Note that our offspring numbers $(V_1, \dots, V_N)$ from (2) are analogous to the family sizes in each generation of haploid Cannings models. The only difference is that the $V_i$ here sum to $2N$, whereas theirs sum to $N$.
Proof. We refer to the analog proof for Lemma 3.1 in [22].
Proof. Note that the $\phi_j$'s are exactly the particular case of the $\psi_{j,s}$'s with $s = 0$. By the recursive formula (101), monotonicity holds for the $\phi_j$'s. This result has also been stated in the Remark on p. 39 of Möhle [21]. Uniform boundedness follows from monotonicity as … where we have used $c_N = E[(V_1)_2] / [8(N-1)]$.
Proof. The proof is analogous to Lemma 3.5 and Lemma 3.14 in [22].
Let Ξ r be a symmetric measure on the r-dimensional simplex ∆ r . Condition III: The weak convergence condition 1 2c N N r Ξ r,N → Ξ r as N → ∞ over ∆ r,ǫ , where the connection between Ξ r and Ξ ′ is given by for all S ⊆ ∆ r . It is clear that the probability measure Ξ ′ is uniquely determined by the sequence of measures (Ξ r ) r∈N .
Proof. The equivalences (6) ⇔ Condition I and (6) ⇔ Condition II can be proved similarly to p. 1557 of [22]. The equivalences Condition II ⇔ Condition III and (5) ⇔ Condition III can be proved following pp. 849-852 of [27]. We omit the details here.