On multi-type Cannings models and multi-type exchangeable coalescents

A multi-type neutral Cannings population model with mutation and fixed subpopulation sizes is analyzed. Under appropriate conditions, as all subpopulation sizes tend to infinity, the ancestral process, properly time-scaled, converges to a multi-type exchangeable coalescent with mutation sharing the exchangeability and consistency property. The proof gains from coalescent theory for single-type Cannings models and from decompositions into reproductive and mutational parts. The second part deals with a different but closely related multi-type Cannings model with mutation and fixed total population size but stochastically varying subpopulation sizes. The latter model is analyzed forward and backward in time with an emphasis on its behaviour as the total population size tends to infinity. Forward in time, multitype limiting branching processes arise for large population size. Its backward structure and related open problems are briefly discussed.


Introduction
Multi-type models play an important role in mathematical population genetics and evolutionary game theory. Many of the early multi-type population models belong to the class of multi-type branching processes (Athreya and Ney [2], Harris [23]), but other multi-type population models have also been studied in the early literature, for example models based on Poissonian or renewal inputs (Port [38]). In all these models, each individual has a certain type which determines (the distribution of) the number of offspring of that individual of any type. In applications, individuals might be objects like genes in biology, particles in physics or balls in combinatorial urn models. More recent and more advanced models even allow for genetic forces like selection and recombination. The reader is referred to the book of Ewens [15] for an overview of models in mathematical population genetics. We also refer, by way of example, to the works of Etheridge [11], Etheridge and Griffiths [12], Etheridge, Griffiths and Taylor [13] and Griffiths [21] for (multi-type) models including biological forces such as selection and recombination. We are, however, interested in the neutral case where neither selection nor recombination acts in the population. To avoid technical difficulties it is assumed that the space E of possible types is finite or countably infinite. Our intention is to study some neutral multi-type population models with non-overlapping generations and mutation in the spirit of Cannings [8,9,10]. In the first model, studied in Section 2, all subpopulation sizes are constant, whereas in the second model, studied in Section 3, the total population size is constant.
1 Mathematisches Institut, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany, E-mail address: martin.moehle@uni-tuebingen.de
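Before we turn to these multi-type models let us mention some fundamental and well-known results concerning single-type Cannings models. Cannings studied a population model with non-overlapping generations and a fixed number $N \in \mathbb{N} := \{1, 2, \ldots\}$ of individuals in each generation $r \in \mathbb{Z} := \{\ldots, -1, 0, 1, \ldots\}$. Each individual $i \in [N] := \{1, \ldots, N\}$ alive in generation $r \in \mathbb{Z}$ produces a random number $\nu^{(r)}_{i,N}$ of offspring and these offspring form the next generation. It is assumed that, for each generation $r$, the offspring sizes $\nu^{(r)}_{1,N}, \ldots, \nu^{(r)}_{N,N}$ are exchangeable and that the offspring vectors $(\nu^{(r)}_{1,N}, \ldots, \nu^{(r)}_{N,N})$, $r \in \mathbb{Z}$, are independent and identically distributed (iid) over the generations. Since the total population size is assumed to be constant equal to $N$, the relation $\nu^{(r)}_{1,N} + \cdots + \nu^{(r)}_{N,N} = N$ holds for all $r \in \mathbb{Z}$. We write $\nu_{i,N} := \nu^{(0)}_{i,N}$ for convenience.
We are mainly interested in the ancestral structure of these models. For $n \in \mathbb{N}$ let $\mathcal{P}_n$ denote the set of partitions of $[n] := \{1, \ldots, n\}$. Take a sample of $n \in [N]$ individuals from generation 0. For each $r \in \mathbb{N}_0 := \{0, 1, \ldots\}$ define a random partition $\Pi_r := \Pi^{(n,N)}_r$ by the property that $i, j \in [n]$ belong to the same block of $\Pi_r$ if and only if the individuals $i$ and $j$ share a common ancestor $r$ generations backward in time. It is well known that $(\Pi_r)_{r \in \mathbb{N}_0}$ is a time-homogeneous Markov chain with state space $\mathcal{P}_n$ and initial state $\{\{1\}, \ldots, \{n\}\}$. For $x \in \mathbb{R}$ define $(x)_0 := 1$ and $(x)_j := x(x-1) \cdots (x-j+1)$ for all $j \in \mathbb{N}$. Transitions from $\pi \in \mathcal{P}_n$ to $\pi' \in \mathcal{P}_n$ are only possible with positive probability if each block of $\pi'$ is a union of some blocks of $\pi$. In this case the transition probability $p_{\pi\pi'} := P(\Pi_r = \pi' \mid \Pi_{r-1} = \pi)$ is given by (see [35, Eq. (3)])
$$p_{\pi\pi'} = \Phi^{(N)}_j(i_1, \ldots, i_j), \tag{1}$$
where $i$ and $j$ denote the number of blocks of $\pi$ and $\pi'$ respectively and $i_1, \ldots, i_j$ are the group sizes of merging blocks of $\pi$. Note that $i_1 + \cdots + i_j = i$. The functions $\Phi^{(N)}_j : \{(k_1, \ldots, k_j) \in \mathbb{N}^j : k_1 + \cdots + k_j \le N\} \to [0, 1]$, $j \in [N]$, defined via
$$\Phi^{(N)}_j(k_1, \ldots, k_j) := \frac{(N)_j}{(N)_k}\, E\big((\nu_{1,N})_{k_1} \cdots (\nu_{j,N})_{k_j}\big) \tag{2}$$
for all $k_1, \ldots, k_j \in \mathbb{N}$ with $k := k_1 + \cdots + k_j \le N$, therefore play a key role in the analysis of the ancestral structure of Cannings population models. In particular, for $N \in \mathbb{N} \setminus \{1\}$,
$$c_N := \Phi^{(N)}_1(2) = \frac{E((\nu_{1,N})_2)}{N-1} \tag{3}$$
is the probability that two individuals, randomly sampled from some generation, share a common ancestor one generation backward in time. This probability, called the coalescence probability, is of fundamental interest in coalescent theory, since $1/c_N$ is the appropriate time scale for obtaining convergence of the ancestral process as the total population size $N$ tends to infinity. The quantity $N_e := 1/c_N$ is called the effective population size of the Cannings model. The transition functions $\Phi^{(N)}_j$, $j \in \mathbb{N}$, satisfy the consistency relation (see [33, Eqs. (3) and (4)])
$$\Phi^{(N)}_j(k_1, \ldots, k_j) = \sum_{i=1}^{j} \Phi^{(N)}_j(k_1, \ldots, k_{i-1}, k_i + 1, k_{i+1}, \ldots, k_j) + \Phi^{(N)}_{j+1}(k_1, \ldots, k_j, 1) \tag{4}$$
for all $k_1, \ldots, k_j \in \mathbb{N}$ with $k_1 + \cdots + k_j < N$, and these functions are monotone in the sense that
$$\Phi^{(N)}_j(k_1, \ldots, k_j) \le \Phi^{(N)}_\ell(m_1, \ldots, m_\ell) \tag{5}$$
for all $\ell \le j$ and $m_1 \le k_1, \ldots, m_\ell \le k_\ell$. The monotonicity (5) follows from (4) by induction on the difference $j - \ell$.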
The consistency (4) and the monotonicity (5) play a key role (see, for example, [35]) in the analysis of the ancestral structure of Cannings models as N → ∞, which is the main reason why we mention these two properties already in the introduction.
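The consistency relation can be checked numerically in a concrete case. The following is a minimal sketch, not part of the original text, assuming Wright-Fisher reproduction (symmetric multinomial offspring sizes), for which the factorial moments are $E((\nu_{1,N})_{k_1} \cdots (\nu_{j,N})_{k_j}) = (N)_k/N^k$ and hence $\Phi^{(N)}_j(k_1, \ldots, k_j) = (N)_j/N^k$. The function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction

def falling(x, j):
    """Descending factorial (x)_j = x (x-1) ... (x-j+1), with (x)_0 = 1."""
    out = 1
    for m in range(j):
        out *= x - m
    return out

def phi_wright_fisher(N, ks):
    """Phi_j^{(N)}(k_1,...,k_j) under the assumed Wright-Fisher reproduction.

    Uses E[(nu_1)_{k_1} ... (nu_j)_{k_j}] = (N)_k / N^k, hence Phi = (N)_j / N^k.
    """
    j, k = len(ks), sum(ks)
    if k > N:
        raise ValueError("need k_1 + ... + k_j <= N")
    return Fraction(falling(N, j), N ** k)

def consistency_rhs(N, ks):
    """Right-hand side of the consistency relation for the tuple ks."""
    total = Fraction(0)
    for i in range(len(ks)):
        bumped = list(ks)
        bumped[i] += 1  # the additional lineage joins the i-th merging group
        total += phi_wright_fisher(N, bumped)
    # ... or the additional lineage forms a group of size 1 on its own
    return total + phi_wright_fisher(N, list(ks) + [1])

N = 10
print(phi_wright_fisher(N, [2]))  # coalescence probability c_N = 1/N
print(phi_wright_fisher(N, [2, 3, 1]) == consistency_rhs(N, [2, 3, 1]))
```

Exact rational arithmetic is used so that the two sides of the relation can be compared with `==` rather than up to rounding error.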
The paper is organized as follows. In Section 2, a multi-type Cannings model with fixed subpopulation sizes is introduced and its ancestral structure is analyzed in detail. The main convergence result (Theorem 1) is provided and verified in Subsection 2.1. Multi-type exchangeable coalescents with mutation and their block counting processes are considered in Subsections 2.2 and 2.3 respectively.
A closely related multi-type Cannings model with fixed total population size but variable subpopulation sizes is studied in Section 3. Subsection 3.1 studies its structure forward in time. A limiting multi-type branching process is provided in Subsection 3.2. Some backward results on this model are discussed in Subsection 3.3. Details on the particular multi-type Kimura model are deferred to Subsection 3.4. The paper finishes with a summary and discussion of some open problems in Section 4.

A multi-type Cannings population model with fixed subpopulation sizes
It is assumed that the type space $E$ is finite or countably infinite and that the number of individuals of type $k \in E$ does not change over the generations and is hence equal to some given constant $N_k \in \mathbb{N}$. Reproduction within each subpopulation of type $k \in E$ is assumed to take place according to a neutral Cannings population model (as described in the introduction) with population size $N_k$ and offspring sizes $\nu_{i,N_k,k}$, $i \in \{1, \ldots, N_k\}$. The additional index $k$ indicates that (the distribution of) the number $\nu_{i,N_k,k}$ of offspring of the $i$-th individual in subpopulation $k$ is allowed to depend explicitly on $k$. Offspring sizes in different subpopulations or in different generations are assumed to be independent.
In each generation, a mutation step follows the reproduction step. A given number $N_{k\ell}$ of the $N_k$ children born in subpopulation $k \in E$ mutate to type $\ell \in E$ with $\ell \neq k$. Note that $\sum_{\ell \neq k} N_{k\ell} \in \{0, \ldots, N_k\}$ and that $N_{kk} := N_k - \sum_{\ell \neq k} N_{k\ell}$ of the children born in subpopulation $k$ do not mutate and, hence, keep their type $k$. Since the size of each subpopulation is assumed to be constant, the conservation equalities
$$\sum_{\ell \in E} N_{\ell k} = N_k, \qquad k \in E, \tag{6}$$
are required. Particular models of this form with subpopulation sizes $N_k = 2c_k N$ for some $c_k, N \in \mathbb{N}$, and Wright-Fisher reproduction in each subpopulation have been studied by Notohara [36] and (Wilkinson-)Herbots [24,25,41].
In these works types are interpreted as colonies and mutation as migration between these colonies. Each child born in subpopulation $k \in E$ has probability $N_{k\ell}/N_k$ to mutate to type $\ell \in E$ with $\ell \neq k$. However, since the subpopulation sizes are constant, the individuals do not mutate independently. For any given sample of $n_k$ $(\in [N_k])$ children taken from the $N_k$ children born in subpopulation $k \in E$, and for given integers $n_{k\ell} \in \mathbb{N}_0$, $\ell \neq k$, with $\sum_{\ell \neq k} n_{k\ell} \le n_k$, the probability that, for all $\ell \in E$ with $\ell \neq k$, $n_{k\ell}$ of these $n_k$ children mutate to type $\ell$, is given by the multi-hypergeometric expression
$$\frac{\prod_{\ell \in E} \binom{N_{k\ell}}{n_{k\ell}}}{\binom{N_k}{n_k}}, \tag{7}$$
where $N_{kk} := N_k - \sum_{\ell \neq k} N_{k\ell}$ and $n_{kk} := n_k - \sum_{\ell \neq k} n_{k\ell}$ is the number of children in the sample which do not mutate.
Before the ancestral structure of this model is described, the notion of typed (or labelled) partitions is introduced. Let $n \in \mathbb{N}$. Each partition of $[n]$ can be written as $\{B_1, \ldots, B_j\}$, where $B_1, \ldots, B_j$ are the (non-empty) blocks of the partition. Note that $\bigcup_{i=1}^{j} B_i = [n]$. We additionally equip each block $B_i$, $i \in [j]$, with a type $k_i \in E$ and call the set $\mathcal{P}_{n,E}$ consisting of all $\pi := \{(B_1, k_1), \ldots, (B_j, k_j)\}$ satisfying $\{B_1, \ldots, B_j\} \in \mathcal{P}_n$ and $k_1, \ldots, k_j \in E$ the space of typed (or labelled) partitions of $[n]$. Each typed partition of $[n]$ can be viewed as a usual partition of $[n]$ with the additional property that each block of the partition is painted with some 'color' taken from the space $E$ of possible 'colors'. We also call a block of type $k \in E$ a $k$-block.
Take a sample of $n \in [N]$ individuals from generation 0, label them (in some arbitrary order) from 1 to $n$, and let $k_1, \ldots, k_n \in E$ denote the types of these individuals. The ancestry of the individuals in the sample can be traced back as follows. For $r \in \mathbb{N}_0$ define a random typed partition $\Pi_r = \Pi^{(n,(N_k)_{k \in E})}_r$ of $[n]$ by the property that $i, j \in [n]$ belong to the same $k$-block of $\Pi_r$ if and only if the individuals $i$ and $j$ share a common ancestor $r$ generations backward in time and this ancestor has type $k$. The process $(\Pi_r)_{r \in \mathbb{N}_0}$ is called a multi-type ancestral process, sometimes also a multi-type backward process or a multi-type discrete coalescent process. It is readily checked that $(\Pi_r)_{r \in \mathbb{N}_0}$ is a time-homogeneous Markov chain with state space $\mathcal{P}_{n,E}$ and initial state $\{(\{1\}, k_1), \ldots, (\{n\}, k_n)\}$, where $k_i \in E$ denotes the type of individual $i \in [n]$. Let $p_{\pi\pi'} := P(\Pi_r = \pi' \mid \Pi_{r-1} = \pi)$, $\pi, \pi' \in \mathcal{P}_{n,E}$, denote the transition probabilities. From the two-step definition of the model it follows that the transition matrix $P := (p_{\pi\pi'})_{\pi,\pi' \in \mathcal{P}_{n,E}}$ has the product form
$$P = P_{mut} P_{rep}, \tag{8}$$
where $P_{mut}$ denotes the transition matrix of the ancestral process for the model without reproduction, i.e. for the model with $\nu_{i,N_k,k} = 1$ almost surely for all $k \in E$ and all $i \in \{1, \ldots, N_k\}$, and $P_{rep}$ denotes the transition matrix of the ancestral process for the model without mutation, i.e. for the model with $N_{k\ell} = 0$ for all $k, \ell \in E$ with $k \neq \ell$.
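For the model without mutation, transitions from $\pi \in \mathcal{P}_{n,E}$ to $\pi' \in \mathcal{P}_{n,E}$ are only possible with positive probability if each $k$-block of $\pi'$ is a union of some $k$-blocks of $\pi$. In this case, since offspring numbers in different subpopulations are independent, it follows that
$$p^{rep}_{\pi\pi'} = \prod_{k \in E} \Phi^{(N_k)}_{j_k,k}(i_{k,1}, \ldots, i_{k,j_k}), \tag{9}$$
where $i_k$ and $j_k$ are the number of $k$-blocks of $\pi$ and $\pi'$ respectively, $i_{k,1}, \ldots, i_{k,j_k}$ are the group sizes of merging $k$-blocks of $\pi$, and $\Phi^{(N_k)}_{j,k}$ denotes the transition function (2) of the Cannings model with offspring sizes $\nu_{i,N_k,k}$ acting in subpopulation $k$.
For $k, \ell \in E$, the backward mutation probability $m_{k\ell}$, which is by definition the proportion of the individuals in subpopulation $k$ after the mutation step who were born in subpopulation $\ell$, is
$$m_{k\ell} := \frac{N_{\ell k}}{N_k}, \tag{11}$$
where $N_{kk} = N_k - \sum_{\ell \neq k} N_{k\ell}$ as before. Note that $m_{kk}$ is the proportion of the individuals in subpopulation $k$ who did not undergo a mutation during the mutation step.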
For the model without reproduction, the entries of the matrix $P_{mut}$ are obtained as follows. Let $\pi, \pi' \in \mathcal{P}_{n,E}$. A mutational transition from $\pi$ to $\pi'$ backward in time is only possible with positive probability if $\pi'$ has the same blocks as $\pi$. For $k, \ell \in E$ let $n_{k\ell}$ denote the number of blocks being a $k$-block of $\pi$ and an $\ell$-block of $\pi'$. Then,
$$p^{mut}_{\pi\pi'} = \prod_{k \in E} \frac{\prod_{\ell \in E} (N_{\ell k})_{n_{k\ell}}}{(N_k)_{n_k}}, \tag{12}$$
where $n_k := \sum_{\ell \in E} n_{k\ell}$ denotes the number of $k$-blocks of $\pi$.
For example, if $\pi = \{([n], k)\}$ and $\pi' = \{([n], \ell)\}$ for some $k, \ell \in E$, then (12) reduces to the backward mutation rate $m_{k\ell}$ defined in (11). If all the $N_{k\ell}$ are sufficiently large, then there is essentially no difference between sampling without replacement and sampling with replacement, leading to the approximation
$$p^{mut}_{\pi\pi'} \approx \prod_{k,\ell \in E} m_{k\ell}^{n_{k\ell}}. \tag{13}$$
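The effect of replacing sampling without replacement by sampling with replacement can be illustrated numerically. The sketch below is an illustration only, not taken from the original text: for one subpopulation it compares the multi-hypergeometric probability of a given mutation pattern in a sample with the corresponding multinomial probability as the subpopulation grows. The function names and the numerical values are hypothetical.

```python
from math import comb, factorial, prod

def multi_hypergeom(N_row, n_row):
    """Probability that n_row[l] of the sampled children belong to the group of
    N_row[l] children destined for type l (sampling without replacement)."""
    return prod(comb(Nl, nl) for Nl, nl in zip(N_row, n_row)) / comb(sum(N_row), sum(n_row))

def multinomial(N_row, n_row):
    """Probability of the same pattern when children are sampled with replacement."""
    n, N = sum(n_row), sum(N_row)
    coeff = factorial(n) // prod(factorial(nl) for nl in n_row)
    return coeff * prod((Nl / N) ** nl for Nl, nl in zip(N_row, n_row))

n_row = (3, 1, 1)  # sample of 5 children: 3 keep their type, 1 + 1 mutate
for scale in (1, 10, 1000):
    # group sizes (non-mutants, mutants to a first type, mutants to a second type)
    N_row = tuple(scale * x for x in (6, 2, 2))
    print(scale, multi_hypergeom(N_row, n_row), multinomial(N_row, n_row))
```

The multinomial value does not depend on `scale`, since only the proportions enter, whereas the multi-hypergeometric value approaches it as the group sizes grow.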

A limiting multi-type coalescent
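To avoid technical difficulties it is mainly assumed in this section that the number of types is finite, i.e. $|E| < \infty$. The case of a countably infinite type space $E$ is briefly discussed in Remark 3 at the end of this section.
We are interested in the behavior of the ancestral process $(\Pi_r)_{r \in \mathbb{N}_0}$ as all subpopulation sizes $N_k$, $k \in E$, become large, that is, as
$$N := \min_{k \in E} N_k \to \infty. \tag{14}$$
In order to state a convergence result, a couple of assumptions are imposed, which are described in the following. Let us start with the assumptions concerning the Cannings reproduction models acting in each subpopulation. For $k \in E$ let
$$c_k(N_k) := \Phi^{(N_k)}_{1,k}(2) = \frac{E((\nu_{1,N_k,k})_2)}{N_k - 1} \tag{15}$$
denote the coalescence probability of the Cannings model acting in subpopulation $k$ with population size $N_k > 1$. It is assumed that $c_k(N_k) > 0$ for all sufficiently large $N_k$. Note that $c_k(N_k) = 0$ if and only if $\nu_{1,N_k,k} = 1$ almost surely. For every subpopulation $k \in E$, it is assumed that all the limits
$$\phi^{(k)}_j(i_1, \ldots, i_j) := \lim_{N_k \to \infty} \frac{\Phi^{(N_k)}_{j,k}(i_1, \ldots, i_j)}{c_k(N_k)}, \tag{16}$$
$j, i_1, \ldots, i_j \in \mathbb{N}$ with $i_1, \ldots, i_j \ge 2$, exist, where
$$\Phi^{(N_k)}_{j,k}(i_1, \ldots, i_j) := \frac{(N_k)_j}{(N_k)_i}\, E\big((\nu_{1,N_k,k})_{i_1} \cdots (\nu_{j,N_k,k})_{i_j}\big)$$
for all $j \in [N_k]$ and $i_1, \ldots, i_j \in \mathbb{N}$ with $i := i_1 + \cdots + i_j \le N_k$. The existence of the limits (16) is a relatively mild condition, since, by the monotonicity property, $\Phi^{(N_k)}_{j,k}(i_1, \ldots, i_j) \le \Phi^{(N_k)}_{1,k}(2) = c_k(N_k)$ for all $i_1, \ldots, i_j \ge 2$, which shows that the fraction on the right-hand side of (16) is bounded between 0 and 1. Moreover, if the limits (16) exist for all $j, i_1, \ldots, i_j \in \mathbb{N}$ with $i_1, \ldots, i_j \ge 2$, then these limits exist for the wider range of parameters $j, i_1, \ldots, i_j \in \mathbb{N}$ satisfying $i_1 + \cdots + i_j > j$, which follows readily from the consistency relation of the functions $\Phi^{(N_k)}_{j,k}$, by induction on the number of 1's among the $i_1, \ldots, i_j$. In this case also the limits
$$\phi^{(k)}_j(1, \ldots, 1) := \lim_{N_k \to \infty} \frac{\Phi^{(N_k)}_{j,k}(1, \ldots, 1) - 1}{c_k(N_k)}, \tag{17}$$
$j \in \mathbb{N}$, exist. For each $k \in E$, the consistency relations and the monotonicity property of the functions $\Phi^{(N_k)}_{j,k}$ carry over to the limits $\phi^{(k)}_j$. A more delicate calibration assumption on the coalescence probabilities is that there exists a sequence $(c_N)$ of positive real numbers such that, for every $k \in E$,
$$\lim_{N \to \infty} \frac{c_k(N_k)}{c_N} = d_k \tag{18}$$
for some constant $d_k \ge 0$. The assumptions on the mutation parameters are standard. It is assumed that, for all types $k, \ell \in E$ with $k \neq \ell$, the backward mutation rate $m_{k\ell} = m_{k\ell}(N)$ depends on $N$ in such a way that
$$\lim_{N \to \infty} \frac{m_{k\ell}}{c_N} = \rho_{k\ell} \tag{19}$$
for some constant $\rho_{k\ell} \ge 0$.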
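Under these assumptions, for a given sample size $n \in \mathbb{N}$, two generator matrices $Q_{rep}$ and $Q_{mut}$ can be defined as follows. Let
$$Q_{rep} := (q^{rep}_{\pi\pi'})_{\pi,\pi' \in \mathcal{P}_{n,E}} \tag{20}$$
denote the generator matrix with the following entries. Let $\pi, \pi' \in \mathcal{P}_{n,E}$ with $\pi \neq \pi'$ and such that each $k$-block of $\pi'$ is a union of some $k$-blocks of $\pi$. Let $i_k$ and $j_k$ denote the number of $k$-blocks of $\pi$ and $\pi'$ respectively, and let $i_{k,1}, \ldots, i_{k,j_k}$ denote the group sizes of merging $k$-blocks of $\pi$. If there exists exactly one $k \in E$ with $i_k > j_k$, then
$$q^{rep}_{\pi\pi'} := d_k\, \phi^{(k)}_{j_k}(i_{k,1}, \ldots, i_{k,j_k}). \tag{21}$$
All other non-diagonal entries of $Q_{rep}$ are (by definition) equal to 0.
The second generator matrix $Q_{mut} := (q^{mut}_{\pi\pi'})_{\pi,\pi' \in \mathcal{P}_{n,E}}$ is defined as follows. Let $\pi = \{(B_1, k_1), \ldots, (B_j, k_j)\} \in \mathcal{P}_{n,E}$. If $\pi'$ is identical to $\pi$, except for the fact that only one single block of $\pi'$, say $B_i$, has a type $\ell_i$ different from $k_i$, then
$$q^{mut}_{\pi\pi'} := \rho_{k_i \ell_i}. \tag{22}$$
All other non-diagonal entries of $Q_{mut}$ are (by definition) equal to 0. Note that $q^{mut}_{\pi\pi} = -\sum_{i=1}^{j} \sum_{\ell_i \in E,\, \ell_i \neq k_i} \rho_{k_i \ell_i}$.
We now state the main convergence result. Recall that the type space $E$ is assumed to be finite and that $N := \min_{k \in E} N_k$.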

Theorem 1 (Convergence to multi-type coalescents).
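Assume that the following three assumptions hold.
(i) Reproduction assumption: For every type $k \in E$, the limits (16) exist for all $j, i_1, \ldots, i_j \in \mathbb{N}$ with $i_1, \ldots, i_j \ge 2$.
(ii) Calibration assumption: There exists a sequence $(c_N)$ of positive real numbers such that, for every type $k \in E$, (18) holds for some constant $d_k \ge 0$ as $N \to \infty$.
(iii) Mutation assumption: For any two types $k, \ell \in E$ with $k \neq \ell$, the backward mutation rate $m_{k\ell} = m_{k\ell}(N)$ depends on $N := \min_{k \in E} N_k$ in such a way that (19) holds for some constant $\rho_{k\ell} \ge 0$ as $N \to \infty$.
Sample $n \in \mathbb{N}$ individuals from generation 0, label them randomly from 1 to $n$, and let $k_1, \ldots, k_n \in E$ denote the types of these individuals. Then, the following statement holds. If $c_N \to 0$, then the time-scaled multi-type ancestral process $(\Pi^{(n,(N_k)_{k \in E})}_{\lfloor t/c_N \rfloor})_{t \ge 0}$ converges in $D_{\mathcal{P}_{n,E}}([0, \infty))$ as $N \to \infty$ to a continuous-time Markov process $\Pi^{(n)} = (\Pi^{(n)}_t)_{t \ge 0}$ with state space $\mathcal{P}_{n,E}$, initial state $\{(\{1\}, k_1), \ldots, (\{n\}, k_n)\}$ and generator matrix $Q := Q_{rep} + Q_{mut}$, where $Q_{rep}$ and $Q_{mut}$ are the matrices (20) and (22) respectively.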
Remark 1. Due to the structure of the entries (21) of the generator $Q_{rep}$, the continuous-time limiting process $\Pi^{(n)} := (\Pi^{(n)}_t)_{t \ge 0}$ arising in Theorem 1 allows for simultaneous multiple mergers of ancestral lineages in one arbitrary subpopulation but not at the same time in more than one subpopulation. During each mutational transition, the process $\Pi^{(n)}$ only allows for a change of the type of one single block. For $|E| = 1$, Theorem 1 essentially reduces to Theorem 2.1 of [35]. A convergence result for the situation when $c_N \to c > 0$ is provided in Theorem 2 below.
Proof. Let $\pi, \pi' \in \mathcal{P}_{n,E}$ with $\pi \neq \pi'$ and such that each $k$-block of $\pi'$ is a union of some $k$-blocks of $\pi$. Then, by (9),
$$p^{rep}_{\pi\pi'} = \prod_{k \in E} \Phi^{(N_k)}_{j_k,k}(i_{k,1}, \ldots, i_{k,j_k}).$$
If there exists exactly one $k \in E$ with $i_k > j_k$, then
$$\lim_{N \to \infty} \frac{p^{rep}_{\pi\pi'}}{c_N} = d_k\, \phi^{(k)}_{j_k}(i_{k,1}, \ldots, i_{k,j_k}) = q^{rep}_{\pi\pi'}$$
by (16) and the comments thereafter. If $\pi'$ is such that in at least two different subpopulations a true merger event takes place, then $p^{rep}_{\pi\pi'} = O(c_N^2) = o(c_N)$ as $N \to \infty$. Thus, $P_{rep} = I + c_N Q_{rep} + o(c_N)$.
Let us now turn to $P_{mut}$. Let $\pi = \{(B_1, k_1), \ldots, (B_j, k_j)\} \in \mathcal{P}_{n,E}$. If $\pi'$ is identical to $\pi$, except for the fact that there exists exactly one single block of $\pi'$, say $B_i$, which has a type $\ell_i$ different from $k_i$, then it follows from (12) and the assumption that all the limits (19) exist, that
$$\lim_{N \to \infty} \frac{p^{mut}_{\pi\pi'}}{c_N} = \rho_{k_i \ell_i} = q^{mut}_{\pi\pi'}.$$
Thus, $P_{mut} = I + c_N Q_{mut} + o(c_N)$. The transition matrix (8) of the ancestral process therefore has the asymptotic expansion $P = I + c_N Q + o(c_N)$, where $Q := Q_{rep} + Q_{mut}$. Thus, the convergence of the one-dimensional distributions is established.
The convergence of the finite-dimensional distributions follows by exploiting the Markov property of the involved processes.
It remains to verify the convergence in $D_{\mathcal{P}_{n,E}}([0, \infty))$.
Example. (Multi-type Kingman n-coalescent) Assume that Wright-Fisher reproduction acts in each subpopulation.
Then, the coalescence probability in subpopulation $k \in E$ is given by $c_k(N_k) = 1/N_k$. For simplicity it is assumed that the population sizes are all equal to $N_k = N \in \mathbb{N}$. Then the calibration assumption (ii) of Theorem 1 obviously holds with $c_N := 1/N > 0$ and $d_k = 1$ for all $k \in E$. Moreover, $\phi^{(k)}_1(2) = 1$ for all $k \in E$. All other limits in (16) are equal to 0. Thus, Theorem 1 is applicable. The limiting process in Theorem 1 is a multi-type Kingman n-coalescent in the sense that single binary mergers of two ancestral lineages of the same type occur with rate 1. Note that binary mergers at the same time in more than one subpopulation are impossible.
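The limits claimed in this example can be verified by a short computation. The following is a sketch, assuming the standard factorial moment formula for a symmetric multinomial offspring vector of a Wright-Fisher model with population size $N$ and writing $i := i_1 + \cdots + i_j$; the subpopulation index $k$ is suppressed.

```latex
\begin{align*}
E\big((\nu_{1,N})_{i_1} \cdots (\nu_{j,N})_{i_j}\big) &= \frac{(N)_i}{N^i}, \\
\Phi^{(N)}_j(i_1, \ldots, i_j) &= \frac{(N)_j}{(N)_i} \cdot \frac{(N)_i}{N^i} = \frac{(N)_j}{N^i}, \\
\frac{\Phi^{(N)}_1(2)}{c_N} &= \frac{N / N^2}{1/N} = 1, \\
\frac{\Phi^{(N)}_j(i_1, \ldots, i_j)}{c_N} &= \frac{(N)_j}{N^{i-1}} \le N^{j-i+1} \to 0
\quad \text{if } i_1, \ldots, i_j \ge 2 \text{ and } (j, i_1) \neq (1, 2).
\end{align*}
```

In the last line the exponent satisfies $j - i + 1 \le -1$, since $i \ge 2j$ with $i \ge 3$ for $j = 1$, so only binary mergers of a single pair survive in the limit.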
In some cases the sequence $(c_N)_{N \in \mathbb{N}}$ does not converge to 0 (as assumed in Theorem 1) but to a positive constant $c > 0$. In this situation the following time-discrete variant of Theorem 1 holds.
Theorem 2. Suppose that the assumptions (i), (ii) and (iii) of Theorem 1 hold. Take a sample of $n \in \mathbb{N}$ individuals from generation 0 and let $k_1, \ldots, k_n \in E$ denote their types. Then the following statement holds.
If c N → c > 0 as N → ∞, then the multi-type ancestral process (Π with state space P n,E , initial state {({1}, k 1 ), . . ., ({n}, k n )} and transition matrix where the two stochastic matrices A mut = (a mut ππ ′ ) π,π ′ ∈Pn,E and A rep = (a rep ππ ′ ) π,π ′ ∈Pn,E are defined as follows.If π ′ has the same blocks as π, then where n kℓ denotes the number of blocks being a k-block of π and an ℓ-block of π ′ .All other entries of A mut are (by definition) equal to 0. Let π, π ′ ∈ P n,E with π = π ′ and such that each k-block of π ′ is a union of some k-blocks of π.Let i k and j k denote the number of k-blocks of π and π ′ respectively, and let i k,1 , . . ., i k,j k denote the group sizes of merging k-blocks of π.Then, All other non-diagonal entries of A rep are (by definition) equal to 0.
Remark 2. In contrast to the continuous-time limiting process in Theorem 1, the discrete-time limiting process $\Pi^{(n)} = (\Pi^{(n)}_r)_{r \in \mathbb{N}_0}$ arising in Theorem 2 allows for simultaneous multiple mergers of ancestral lineages at the same time in more than one subpopulation. During each transition, the process $\Pi^{(n)}$ also allows for a change of the type of more than one single block.
Proof.If π ′ has the same blocks as π, then, by (12), as N → ∞.This shows that P mut → A mut as N → ∞.
Similarly, if π = π ′ are such that each k-block of π ′ is a union of some k-blocks of π, then, by (9), Due to the Markov property of the involved processes, the convergence of the finite-dimensional distributions follows immediately.For processes with discrete time set N 0 , the convergence of the finite-dimensional distributions is equivalent (see, for example, Billingsley [3, p. 19]) to the convergence in D Pn,E (N 0 ).
Remark 3 (Countable type space). Assume that the type space $E$ is countably infinite. Then we conjecture that Theorem 1 remains valid under the additional assumption that $\rho_k := \sum_{\ell \in E, \ell \neq k} \rho_{k\ell} < \infty$ for all $k \in E$. Since the state space $\mathcal{P}_{n,E}$ is no longer finite, one cannot simply follow the proof for the finite type space case. Instead one may verify the relative compactness of the family of time-scaled processes $(\Pi^{(n,(N_k)_{k \in E})}_{\lfloor t/c_N \rfloor})_{t \ge 0}$, indexed by $(N_k)_{k \in E}$, via similar techniques as in the proof of Theorem 2.1 of Herbots [24]. The convergence in $D_{\mathcal{P}_{n,E}}([0, \infty))$ then follows from Ethier and Kurtz [14, p. 131, Theorem 7.8].

Multi-type coalescent with mutation
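It is not hard to check that the limiting process $\Pi^{(n)} = (\Pi^{(n)}_t)_{t \ge 0}$ arising in Theorem 1 is exchangeable in the sense that the distribution of $\Pi^{(n)}$ is invariant under relabelling of the $n$ individuals. We call $\Pi^{(n)}$ a continuous-time multi-type exchangeable n-coalescent with mutation.
From the consistency relation it follows that the family of processes $(\Pi^{(n)})_{n \in \mathbb{N}}$ is consistent in the sense that, for all $m, n \in \mathbb{N}$ with $m \le n$, the projected process $(\varrho_{nm} \circ \Pi^{(n)}_t)_{t \ge 0}$ has the same distribution as $(\Pi^{(m)}_t)_{t \ge 0}$, where $\varrho_{nm} : \mathcal{P}_{n,E} \to \mathcal{P}_{m,E}$ denotes the natural projection from $\mathcal{P}_{n,E}$ to $\mathcal{P}_{m,E}$ defined via $\varrho_{nm}(\pi) := \{(B \cap [m], k) : (B, k) \in \pi, B \cap [m] \neq \emptyset\}$ for all $\pi \in \mathcal{P}_{n,E}$. Exploiting Kolmogorov's extension theorem it can be shown that there exists a process $\Pi = (\Pi_t)_{t \ge 0}$ (the projective limit of the sequence $(\Pi^{(n)})_{n \in \mathbb{N}}$) with state space $\mathcal{P}_{\infty,E}$, the space of labelled partitions of $\mathbb{N}$, such that for every $n \in \mathbb{N}$, $(\varrho_n \circ \Pi_t)_{t \ge 0}$ is a multi-type exchangeable n-coalescent with mutation and the same infinitesimal rates, where $\varrho_n : \mathcal{P}_{\infty,E} \to \mathcal{P}_{n,E}$ denotes the natural projection from $\mathcal{P}_{\infty,E}$ to $\mathcal{P}_{n,E}$ defined via $\varrho_n(\pi) := \{(B \cap [n], k) : (B, k) \in \pi, B \cap [n] \neq \emptyset\}$ for all $\pi \in \mathcal{P}_{\infty,E}$. We call the process $\Pi$ a multi-type exchangeable coalescent with mutation.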
From the work of Schweinsberg [39] it follows that for every k ∈ E there exists a unique finite measure Ξ k on the infinite simplex ∆ := {x = (x i ) i∈N : that the infinitesimal rate limits in (16) have the integral representation , where a k := Ξ k ({0}) denotes the mass of Ξ k at 0 ∈ ∆ and (x, x) := i∈N x 2 i for x = (x i ) i∈N ∈ ∆.The distribution of Π is hence fully described by the sequence of measures Ξ = (Ξ k ) k∈E , the calibration constants d k ≥ 0, k ∈ E, and the mutation parameters ρ kℓ , k = ℓ.We therefore call the process Π also a (multi-type) Ξ-coalescent (with mutation).The process Π is the natural generalization of (single-type) exchangeable (and consistent) coalescents to the multi-type case.If all the measures Ξ k , k ∈ E, are concentrated on the subset [0, 1] × {0} × {0} × • • • of ∆, then we speak of a multi-type Λ-coalescent with mutation, where Λ := (Λ k ) k∈E and the finite measure For a recent paper dealing with multi-type Λ-coalescents of this form (and as well non-consistent multi-type Λcoalescents) we refer the reader to Johnston, Kyprianou and Rogers [27].If d k Ξ k = 0 is the zero measure, then there is no reproductive activity in subpopulation k, which may be interpreted as a sleeping seed bank.The ancestral structure of seed bank models has gained some interest in the literature (see, for example, Blath et al. [5,6] or González Casanova et al. 
[17]).At this point we also would like to refer the reader to the work of Griffiths [22], where the notion of a 'multi-type Λ-coalescent' seems to appear for the first time.In different mathematical context, the notion of a 'multi-type coalescent point process' also appears in the work of Popovic and Rivas [37], which is mentioned for completeness here.We would also like to draw the attention of the reader to the recent preprint of Allen and McAvoy [1] for a coalescent with a general spatial and genetic structure and to the work of Liu and Zhou [30] for a forward stepping stone model with Ξ-resampling mechanism.
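Everything said in this subsection also applies to the discrete-time limiting processes $(\Pi^{(n)}_r)_{r \in \mathbb{N}_0}$, $n \in \mathbb{N}$, arising in Theorem 2. Thus there exists a process $(\Pi_r)_{r \in \mathbb{N}_0}$ with state space $\mathcal{P}_{\infty,E}$ such that for every $n \in \mathbb{N}$ the projected process $(\varrho_n \circ \Pi_r)_{r \in \mathbb{N}_0}$ is a discrete-time multi-type exchangeable n-coalescent with mutation and the same infinitesimal rates. Again, this process can be fully described by a sequence $\Xi = (\Xi_k)_{k \in E}$ of finite measures $\Xi_k$ on the infinite simplex $\Delta$, the calibration constants $d_k \ge 0$, $k \in E$, and the mutation parameters $\rho_{k\ell} \ge 0$, $k \neq \ell$.

Multi-type block counting process
For $t \ge 0$ let $N_t := (N_{t,k})_{k \in E}$, where $N_{t,k}$ denotes the number of $k$-blocks of $\Pi_t$. The process $(N_t)_{t \ge 0}$, called the block counting process of $\Pi$, is Markovian with state space $\mathbb{N}^E$ and generator $G = G_{rep} + G_{mut}$, where $G_{rep} = (g^{rep}_{ij})_{i,j \in \mathbb{N}^E}$ and $G_{mut} = (g^{mut}_{ij})_{i,j \in \mathbb{N}^E}$ have the following entries. Let $i = (i_k)_{k \in E}$, $j = (j_k)_{k \in E} \in \mathbb{N}^E$ with $i \ge j$. If there exists exactly one $k \in E$ with $i_k > j_k$, then
$$g^{rep}_{ij} = d_k\, \frac{i_k!}{j_k!} \sum \frac{\phi^{(k)}_{j_k}(i_{k,1}, \ldots, i_{k,j_k})}{i_{k,1}! \cdots i_{k,j_k}!},$$
where the sum extends over all $i_{k,1}, \ldots, i_{k,j_k} \in \mathbb{N}$ with $i_{k,1} + \cdots + i_{k,j_k} = i_k$. All other non-diagonal entries of $G_{rep}$ are equal to 0. The diagonal entries are given by $g^{rep}_{ii} = \sum_{k \in E} d_k\, \phi^{(k)}_{i_k}(1, \ldots, 1)$, where $\phi^{(k)}_{i_k}(1, \ldots, 1)$ $(\le 0)$ is defined via (17) for all $k \in E$ and $i_k \in \mathbb{N}$. The matrix $G_{mut}$ has entries $g^{mut}_{ij} = i_k \rho_{k\ell}$, if $j = i - e_k + e_\ell$ for some $k, \ell \in E$ with $k \neq \ell$, where $e_k$ denotes the $k$-th unit vector in $\mathbb{R}^E$. For example, if each $\Xi_k$ is the Dirac measure at $0 \in \Delta$, then the generator $G$ has entries
$$g_{ij} = \begin{cases} d_k \binom{i_k}{2} & \text{if } j = i - e_k, \\ i_k \rho_{k\ell} & \text{if } j = i - e_k + e_\ell,\ k \neq \ell, \\ -\sum_{k \in E} \big( d_k \binom{i_k}{2} + i_k \rho_k \big) & \text{if } j = i, \\ 0 & \text{otherwise.} \end{cases}$$
In this case $(N_t)_{t \ge 0}$ coincides with the structured coalescent studied by Notohara [36] and Wilkinson-Herbots [41, Eq. (3)], where $\rho_{k\ell} \ge 0$ is the mutation rate from type $k$ to type $\ell$ (backward in time), $\rho_k := \sum_{\ell \neq k} \rho_{k\ell}$ and $d_k \ge 0$ is the coalescence rate of any pair of lineages in subpopulation $k \in E$. In the notation of Johnston, Kyprianou and Rogers [27, Example 2.2], this structured coalescent is the block counting process of a multi-type Kingman coalescent with binary merging rate $\rho_{kk \to k} := d_k$ and type changing rate $\rho_{k \to \ell} := \rho_{k\ell}$.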
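For the structured coalescent just described, in which any pair of lineages in subpopulation $k$ merges at rate $d_k$ and each lineage changes its type from $k$ to $\ell$ at rate $\rho_{k\ell}$, the jump rates and a path of the block counting process can be sketched as follows. This is a minimal illustration under these stated rates only; types are indexed $0, \ldots, K-1$, and the function names and the Gillespie-type simulation scheme are assumptions made here, not part of the original text.

```python
import random
from math import comb

def jump_rates(i, d, rho):
    """Off-diagonal generator entries of the block counting process from state i.

    i[k]      : number of lineages of type k
    d[k]      : pair coalescence rate in subpopulation k
    rho[k][l] : backward mutation rate from type k to type l (k != l)
    Returns a dict {target_state: rate}.
    """
    rates = {}
    K = len(i)
    for k in range(K):
        if i[k] >= 2 and d[k] > 0:  # binary merger inside subpopulation k
            j = list(i); j[k] -= 1
            rates[tuple(j)] = d[k] * comb(i[k], 2)
        for l in range(K):
            if l != k and i[k] >= 1 and rho[k][l] > 0:  # one lineage changes type k -> l
                j = list(i); j[k] -= 1; j[l] += 1
                rates[tuple(j)] = i[k] * rho[k][l]
    return rates

def simulate(i, d, rho, rng, t_max=float("inf")):
    """Simulate jumps until one lineage is left, no jump is possible, or t_max is passed."""
    t, state = 0.0, tuple(i)
    path = [(t, state)]
    while sum(state) > 1:
        rates = jump_rates(state, d, rho)
        total = sum(rates.values())
        if total == 0:
            break
        t += rng.expovariate(total)  # exponential holding time
        if t > t_max:
            break
        targets = list(rates)
        state = rng.choices(targets, weights=[rates[s] for s in targets])[0]
        path.append((t, state))
    return path

d = (1.0, 2.0)
rho = [[0.0, 0.5], [0.25, 0.0]]
for time, state in simulate((3, 2), d, rho, random.Random(1)):
    print(round(time, 3), state)
```

Passing an explicit `random.Random` instance keeps runs reproducible. If all mutation rates vanish and lineages of different types remain, the loop stops because no further jump is possible.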

A multi-type Cannings population model with variable subpopulation sizes
We now study a different but closely related multi-type population model with constant total population size. We consider a population with a fixed number $N \in \mathbb{N}$ of individuals in each generation $r \in \mathbb{Z}$. As for the model described in the introduction, each individual $i \in [N]$ alive in generation $r \in \mathbb{Z}$ produces a random number $\nu^{(r)}_{i,N}$ of offspring and each offspring a priori inherits the type of its parent.
In each generation, independently of the offspring sizes, a mutation step follows the reproduction step. Each offspring of type $k \in E$ mutates to type $\ell \in E$ with a given probability $u_{k\ell} \ge 0$. These offspring form the next generation, so our model has non-overlapping generations.
As for the standard (single-type) Cannings model, it is assumed that the offspring sizes are exchangeable within each generation and independent and identically distributed (iid) over different generations. Since the population size is assumed to be constant equal to $N$, the relation $\sum_{i \in [N]} \nu^{(r)}_{i,N} = N$ holds for all $r \in \mathbb{Z}$. As before, the type space $E$ is assumed to be finite or countably infinite. Without loss of generality $E = \{1, \ldots, K\}$ for some $K \in \mathbb{N}$ or $E = \mathbb{N}$. Clearly, $U := (u_{k\ell})_{k,\ell \in E}$ is a stochastic matrix, called the mutation matrix. The model is neutral with no selection, since the number of offspring produced by each individual does not depend on the type of this individual. We call this model the neutral multi-type Cannings model with mutation. For $|E| = 1$ the model reduces to the classical neutral exchangeable population model of Cannings [8,9,10] as described in the introduction. We use the notation $\nu_{i,N} := \nu^{(0)}_{i,N}$.
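The two-step mechanism (reproduction followed by mutation) can be sketched for one particular exchangeable offspring law. The code below is an illustration only: it assumes Wright-Fisher reproduction, where each child chooses its parent uniformly at random, which is just one admissible instance of the Cannings model. Types are indexed $0, \ldots, K-1$ and the function names are hypothetical.

```python
import random

def next_generation(types, U, rng):
    """One generation of the neutral multi-type model with mutation.

    types : list with the type (an index 0..K-1) of each of the N individuals
    U     : mutation matrix as a list of rows, U[k][l] = P(type k mutates to l)
    Assumes Wright-Fisher reproduction (an illustrative choice).
    """
    N = len(types)
    children = []
    for _ in range(N):
        parent = rng.randrange(N)  # reproduction: child picks a parent uniformly
        k = types[parent]          # the child a priori inherits the parental type
        l = rng.choices(range(len(U[k])), weights=U[k])[0]  # mutation step
        children.append(l)
    return children

def type_counts(types, K):
    """Vector of subpopulation sizes (number of individuals of each type)."""
    return [types.count(k) for k in range(K)]

rng = random.Random(0)
U = [[0.9, 0.1], [0.2, 0.8]]
population = [0] * 5 + [1] * 3
for r in range(4):
    print(r, type_counts(population, 2))
    population = next_generation(population, U, rng)
```

The total population size stays equal to $N$ in every generation while the subpopulation sizes fluctuate, which is the defining feature of this second model.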

Forward structure
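Let $X_k(r)$ denote the number of individuals of type $k \in E$ in generation $r \in \mathbb{N}_0$, and set $X(r) := (X_k(r))_{k \in E}$. It is readily seen that $X := (X(r))_{r \in \mathbb{N}_0}$ is a time-homogeneous Markov chain with state space $\Delta_N(E) := \{i = (i_k)_{k \in E} \in \mathbb{N}_0^E : \sum_{k \in E} i_k = N\}$. Let $\Pi := (\pi_{ij})_{i,j \in \Delta_N(E)}$ denote the transition matrix of $X$ having entries $\pi_{ij} := P(X(r+1) = j \mid X(r) = i)$. Clearly $\Pi = \Pi(U)$ depends on the mutation matrix $U = (u_{k\ell})_{k,\ell \in E}$.
For the multi-allelic Cannings model without mutation (when $U = I$ is the identity matrix) it is known (see, for example, Gladstien [16] or [34, Eq. (1)]) that $\Pi_{rep} := \Pi(I)$ has entries
$$\pi_{ij}(I) = P\big(D_k(i) = j_k \text{ for all } k \in E\big), \qquad i, j \in \Delta_N(E), \tag{24}$$
where $D_k(i) := \sum_{m = s_{k-1}+1}^{s_k} \nu_{m,N}$, $k \in E$, with $s_0 := 0$ and $s_k := i_1 + \cdots + i_k$. In particular, $\pi_{ii}(I) = 1$ for all states $i$ of the form $i = N e_k$ with $k \in E$, where $e_k$ denotes the $k$-th unit vector in $\mathbb{R}^E$. Thus all the states $N e_k$, $k \in E$, are absorbing.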
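In order to describe the structure of the transition matrix $\Pi = \Pi(U)$ for general mutation matrix $U$ it turns out to be convenient to introduce a matrix $\Pi_{mut}$ as follows. Fix $i = (i_k)_{k \in E} \in \Delta_N(E)$ and let $M_k$, $k \in E$, be independent random variables, where $M_k = (M_{k\ell})_{\ell \in E}$ has a multinomial distribution with parameters $i_k$ and $(u_{k\ell})_{\ell \in E}$, i.e. $P(M_k = j) = i_k! \prod_{\ell \in E} u_{k\ell}^{j_\ell}/j_\ell!$ for all $j = (j_\ell)_{\ell \in E} \in \mathbb{N}_0^E$ with $\sum_{\ell \in E} j_\ell = i_k$. Let $\Pi_{mut} = (\pi^{mut}_{ij})_{i,j \in \Delta_N(E)}$ denote the matrix with entries
$$\pi^{mut}_{ij} := P\big(M_{\bullet\ell} = j_\ell \text{ for all } \ell \in E\big), \tag{25}$$
where we use the dot subscript summation notation $M_{\bullet\ell} := \sum_{k \in E} M_{k\ell}$. Equivalently,
$$\pi^{mut}_{ij} = \sum_{M} \prod_{k \in E} i_k! \prod_{\ell \in E} \frac{u_{k\ell}^{m_{k\ell}}}{m_{k\ell}!} \tag{26}$$
for all $i, j \in \Delta_N(E)$, where the sum extends over all matrices $M = (m_{k\ell})_{k,\ell \in E} \in \mathbb{N}_0^{E \times E}$ having row sums $m_{k\bullet} := \sum_{\ell \in E} m_{k\ell} = i_k$, $k \in E$, and column sums $m_{\bullet\ell} := \sum_{k \in E} m_{k\ell} = j_\ell$, $\ell \in E$. Clearly, the matrix $\Pi_{mut}$ depends on the mutation matrix $U = (u_{k\ell})_{k,\ell \in E}$. For example, for $E = \{1, 2\}$ (two types) it follows that $\pi^{mut}_{ij} = P(M_{11} + M_{21} = j_1)$ for all $i = (i_1, i_2) \in \mathbb{N}_0^2$ and $j = (j_1, j_2) \in \mathbb{N}_0^2$ with $i_1 + i_2 = N$ and $j_1 + j_2 = N$, where $M_{11}$ and $M_{21}$ are independent, $M_{11}$ has a binomial distribution with parameters $i_1$ and $u_{11}$ and $M_{21}$ has a binomial distribution with parameters $i_2 = N - i_1$ and $u_{21}$.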
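For two types the mutational matrix is a convolution of two binomial distributions, which is easy to tabulate. The following sketch, with hypothetical function names, indexes states by the number of type-1 individuals and computes $P(M_{11} + M_{21} = j_1)$ for independent $M_{11} \sim \mathrm{Bin}(i_1, u_{11})$ and $M_{21} \sim \mathrm{Bin}(N - i_1, u_{21})$.

```python
from math import comb

def binom_pmf(n, p, m):
    """P(Bin(n, p) = m)."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

def pi_mut_two_types(N, u11, u21):
    """Matrix of the mutation step for E = {1, 2}.

    Entry [i1][j1] is the probability that j1 individuals have type 1 after the
    mutation step, given that i1 individuals have type 1 before it.
    """
    P = []
    for i1 in range(N + 1):
        row = []
        for j1 in range(N + 1):
            lo = max(0, j1 - (N - i1))  # at most N - i1 type-2 individuals can mutate
            hi = min(i1, j1)            # at most i1 type-1 individuals can stay
            row.append(sum(binom_pmf(i1, u11, m) * binom_pmf(N - i1, u21, j1 - m)
                           for m in range(lo, hi + 1)))
        P.append(row)
    return P

P = pi_mut_two_types(4, 0.9, 0.2)
for row in P:
    print([round(x, 4) for x in row])
```

Each row sums to 1, and for $u_{11} = 1$, $u_{21} = 0$ (no mutation) the matrix is the identity, in line with the case $U = I$.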
The following result clarifies the structure of the transition matrix $\Pi = \Pi(U)$ for general mutation matrix $U$.
Lemma 1. The chain $X$ (of the model with general mutation matrix $U$) has transition matrix
$$\Pi = \Pi_{rep} \Pi_{mut}, \tag{27}$$
where $\Pi_{rep} := \Pi(I)$ is the transition matrix of the chain $X$ for the multi-type Cannings model without mutation having entries (24) and $\Pi_{mut}$ is the matrix with entries (26).
Remark 4. The transition matrix Π thus factorizes into the reproductive part Π rep (involving the offspring sizes ν 1,N , . . ., ν N,N ) and the mutational part Π mut (involving the mutation matrix U ).
Proof of Lemma 1.
By the independence of ν where the sum extends over all matrices M = (m kℓ ) k,ℓ∈E ∈ N E×E 0 having column sums k∈E m kℓ = j ℓ , ℓ ∈ E, and row sums ℓ∈E m kℓ = D k (i), k ∈ E. Alternatively, where the sum extends over all matrices M = (m kℓ ) k,ℓ∈E ∈ N E×E 0 having column sums k∈E m kℓ = j ℓ , ℓ ∈ E, and It remains to note that the right-hand side of Eq. ( 29) is equal to The transition probability π ij simplifies considerably in particular situations, as the following examples demonstrate.
Example.(parent independent mutation) If the mutation probabilities u kℓ = u ℓ do not depend on the type k ∈ E of the parent, then (29) reduces to the multinomial expression i = (i k ) k∈E , j = (j k ) k∈E ∈ ∆ N (E).Eq. ( 30) is obvious from the model and can be also derived from (27) as follows.By the convolution property for multinomial distributions Mn(i k , u), k ∈ E, with the same probability vector u := (u ℓ ) ℓ∈E , M • = k∈E M k has a multinomial distribution with parameters k∈E i k = N and u.Thus, by (25), does not depend on i and (30) follows from (27).Note that the transition probability π ij in (30) neither depends on i nor on the offspring sizes ν 1,N , . . ., ν N,N .If K := |E| ∈ N and Example.(absence of mutation) If U = I (identity matrix), which corresponds to the multi-allelic Cannings model without mutation, then the matrix Π mut with entries (25) is the identity matrix and the transition matrix Π = Π rep Π mut = Π rep has entries (24).
Example.(multi-allelic Wright-Fisher model with mutation) If the family size vector ν N := (ν 1,N , . . ., ν N,N ) has a symmetric multinomial distribution Mn(N, ( (29), the factorials d k !cancel, and from l∈E m kl = d k it follows that where the sum extends over all matrices M = (m kℓ ) k,ℓ∈E with π ℓ := N −1 k∈E u kℓ i k for ℓ ∈ E. In particular, conditional on X(r) = i, for every ℓ ∈ E the random variable X ℓ (r+1) has a binomial distribution with parameters N and π ℓ , converging as N → ∞ in distribution to a Poisson distribution with parameter N π ℓ = k∈E u kℓ i k .For |E| < ∞ the chain X and its diffusion limit as N → ∞ of this multi-allele Wright-Fisher model with mutation has been studied extensively in the (classical) literature on mathematical population genetics.We refer the reader exemplary to Etheridge [11], Ewens [15] and Griffiths [18,19,20].further example, the multitype Kimura model, is deferred to Subsection 3.4.
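For the Wright-Fisher model with mutation, each child independently has type $\ell$ with probability $\pi_\ell = N^{-1} \sum_{k \in E} u_{k\ell} i_k$, so the next state is multinomially distributed with parameters $N$ and $(\pi_\ell)_{\ell \in E}$, and each component is binomial as stated above. A minimal sketch of this transition probability for finitely many types, with hypothetical function names and types indexed $0, \ldots, K-1$:

```python
from math import factorial, prod

def pi_vector(i, U):
    """Type distribution of a single child given the current state i."""
    N, K = sum(i), len(i)
    return [sum(U[k][l] * i[k] for k in range(K)) / N for l in range(K)]

def wf_transition(i, j, U):
    """P(X(r+1) = j | X(r) = i) for Wright-Fisher reproduction with mutation."""
    N = sum(i)
    if sum(j) != N:
        raise ValueError("states must have the same total population size")
    p = pi_vector(i, U)
    coeff = factorial(N) // prod(factorial(x) for x in j)  # multinomial coefficient
    return coeff * prod(pl ** jl for pl, jl in zip(p, j))

U = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.0, 0.3, 0.7]]
i = (2, 1, 1)
print(pi_vector(i, U))
print(wf_transition(i, (2, 1, 1), U))
```

Summing `wf_transition(i, j, U)` over all states `j` with total size $N$ gives 1, which is a convenient sanity check of the formula.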
The forward structure is often alternatively described by the process $\tilde X := (\tilde X(r))_{r\in\mathbb{N}_0}$ having state space $E^N$, where $\tilde X(r) := (\tilde X_i(r))_{i\in[N]}$ and $\tilde X_i(r)$ denotes the type of the $i$-th individual alive in generation $r \in \mathbb{N}_0$. Processes of this form are extensively used, for example in Birkner et al. [4], even for more general type spaces $E$. Let f :

Strictly speaking, the process $\tilde X$ contains (pathwise) slightly more information than $X$, since the type of each individual is known. However, due to the exchangeability of the model arising from the random assignment condition, from the distributional point of view the processes $\tilde X$ and $X$ contain the same information. More precisely, the transition probabilities $\pi_{xy} := P(\tilde X(r+1) = y \mid \tilde X(r) = x)$, $x, y \in E^N$, of the process $\tilde X$ are related to those of $X$ via where $i := (i_k)_{k\in E} := f(x)$ and $j := (j_k)_{k\in E} := f(y)$. Alternatively, where $S_N$ denotes the set of all permutations of $[N]$, $C_0 := 0$ and For $N = 1$ we recover the mutation probabilities $\pi_{xy} = u_{xy}$, $x, y \in E$.
In the absence of mutation ($u_{kk} = 1$ for all $k \in E$), where $i := (i_k)_{k\in E} := f(x)$ and $j := (j_k)_{k\in E} := f(y)$ and

For the multi-type Wright-Fisher model with mutation, each child $j \in [N]$ chooses its parent randomly and independently. Thus, Fix $x \in E^N$ and $j \in \Delta_N(E)$. Define $i := f(x)$. Then, This expression depends on $x$ only via $i = f(x)$. Thus, by Rosenblatt's criterion for functions of Markov processes, with $\tilde X$ also the process $X = (X(r))_{r\in\mathbb{N}_0} = (f(\tilde X(r)))_{r\in\mathbb{N}_0}$ is Markovian and (35) is the transition probability $\pi_{ij}$ of the chain $X$ to move from $i$ to $j$, in agreement with (31).
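The relation between the individual-level description and the type-count chain can be illustrated by simulation. The sketch below assumes that $f$ maps a vector of individual types to the vector of type counts (the usual reading, since $f(x)$ is used as a state of $X$), and it simulates one Wright-Fisher generation exactly as described: each child picks a parent uniformly at random and then mutates according to the parent's row of $U$. The names `type_counts` and `wf_generation` are hypothetical.

```python
import random
from collections import Counter


def type_counts(x, types):
    """Assumed form of f: map individual types x to counts per type."""
    c = Counter(x)
    return tuple(c[k] for k in types)


def wf_generation(x, U, types, rng):
    """Simulate the types of the N children of a parent generation x.

    x     : list of parent types (length N)
    U     : mutation matrix, U[a][b] = P(child type types[b] | parent type types[a])
    types : list of all types, fixing the row/column order of U
    rng   : a random.Random instance
    """
    N = len(x)
    children = []
    for _ in range(N):
        parent_type = x[rng.randrange(N)]      # uniform choice of parent
        row = U[types.index(parent_type)]      # mutation law of that parent
        children.append(rng.choices(types, weights=row, k=1)[0])
    return children


if __name__ == "__main__":
    rng = random.Random(0)
    types = [0, 1]
    U = [[0.9, 0.1], [0.2, 0.8]]
    x = [0, 0, 0, 1, 1]
    y = wf_generation(x, U, types, rng)
    print(type_counts(x, types), "->", type_counts(y, types))
```

Because the child types depend on the parent generation only through the proportions of each type, the law of `type_counts(y, types)` depends on `x` only via `type_counts(x, types)`, which is the lumpability property used in the text.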

A limiting multi-type branching process
Assume that the number of types is finite but not equal to 1; without loss of generality, So far, the model was described forward in time by the process $X = (X(r))_{r\in\mathbb{N}_0}$ having state space is determined by the other components $X_1(r), \ldots, X_L(r)$, we can disregard the last component $X_K(r)$ and view the forward process $X = (X(r))_{r\in\mathbb{N}_0}$ as a process with state space $S_{N,L}$ by writing $X(r)$ equivalently in the form

The following definition is useful to understand the behavior of $X_r$ as the total population size $N$ tends to infinity.
We shall see soon that the asymptotic independence of the offspring sizes alone is not sufficient to ensure convergence of the forward process to a limiting process as $N \to \infty$. An additional assumption on the mutation probabilities is needed, which turns out to be of the form
$$u_{KK} = 1, \qquad (38)$$
that is, mutations from the 'last' type $K$ back to any other type $k < K$ are impossible. Clearly, (38) excludes several important models. The remark at the end of this section shows that the following results fail if (38) does not hold.
The convergence results presented later (Theorem 3) are based on the following lemma.
Lemma 2. Suppose that the offspring sizes $\nu_{1,N}, \ldots, \nu_{N,N}$ are asymptotically independent with limiting variable $\xi$. Let $K \in \mathbb{N} \setminus \{1\}$ and set $L := K - 1 \in \mathbb{N}$. If $u_{KK} = 1$, that is, mutations from type $K$ to any type $\ell \in [L]$ are not possible, then where $|j| := \sum_{\ell\in[L]} j_\ell$.

This convolution property turns out to be crucial for the proof of Lemma 2 and ensures the branching property of the limiting process arising in Theorem 3 below.
Remark 6. For $K = 2$, (39) reduces to P(X

Proof of Lemma 2. Fix $(i_\ell)_{\ell\in[L]}, (j_\ell)_{\ell\in[L]} \in \mathbb{N}_0^L$ and let $N$ be sufficiently large such that $i_K := N - \sum_{\ell\in[L]} i_\ell \in \mathbb{N}_0$ and $j_K := N - \sum_{\ell\in[L]} j_\ell \in \mathbb{N}_0$. Define $i := (i_k)_{k\in E}$ and $j := (j_k)_{k\in E}$. We have to verify that $\pi_{ij}$ converges to the right-hand side of (39) as $N \to \infty$. Since $u_{KK} = 1$ and, hence, $u_{K\ell} = 0$ for all $\ell \in [L]$, it follows that only matrices $M = (m_{k\ell})_{k,\ell\in E}$ with last row $(m_{K\ell})_{\ell\in[K]}$ equal to $(0, \ldots, 0, D_k)$ contribute to the sum in (28). Thus, (28) reduces to where $m_k := \sum_{\ell\in[L]} m_{k\ell}$ and the sum extends over all 'reduced

The convergence of the finite-dimensional distributions is established. For processes with discrete time set $\mathbb{N}_0$, the convergence of the finite-dimensional distributions is equivalent (see, for example, Billingsley [3, p. 19]) to the convergence in $D_{\mathbb{N}_0^L}(\mathbb{N}_0)$, i.e. to the weak convergence $Q_N \to Q$ as $N \to \infty$, where $Q_N$ and $Q$ denote the image measures of $X$ and $Z$ respectively. Note that $Q_N$ and $Q$ are probability measures on $(\mathbb{R}^L)^{\mathbb{N}_0}$ equipped with the Borel σ-field generated by the product topology (of pointwise convergence).
Example. For the Wright-Fisher model, the offspring sizes $\nu_{1,N}, \ldots, \nu_{N,N}$ are asymptotically independent, where the limiting variables $\xi_1, \xi_2, \ldots$ are iid copies of a random variable $\xi$ having a Poisson distribution with parameter 1. From (40) and u kℓ it follows that the random variables $Y_{k\ell}$ are independent with $Y_{k\ell}$ Poisson distributed with parameter $u_{k\ell}$, $k, \ell \in [L]$. Theorem 3 is applicable provided that $u_{KK} = 1$. In this case, for the limiting $L$-type branching process the probability that a parent of type $k \in [L]$ produces for each $\ell \in [L]$ exactly $j_\ell$ children of type $\ell$ is given by ℓ∈

Remark 9. Without the condition $u_{KK} = 1$, Lemma 2 and, hence, Theorem 3 fail in general. Consider for example the Wright-Fisher model with $K = 2$ types. Fix $i \in \mathbb{N}_0$. For all $N \ge i$, conditional on $X_1(0) = i$, the random variable $X_1(1)$ has a binomial distribution with parameters $N$ and $\pi_1 := (u_{11} i + u_{21}(N - i))/N$. This binomial distribution does not converge weakly as $N \to \infty$ as long as $u_{21} > 0$, since $N\pi_1 = u_{11} i + u_{21}(N - i) \to \infty$ in this case. Thus, (39) does not hold if $u_{22} < 1$, so Lemma 2 and Theorem 3 are not applicable.
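The dichotomy in Remark 9 can be seen numerically. The sketch below evaluates, for the two-type Wright-Fisher model, the mean $N\pi_1$ and the probability $P(X_1(1) = 0 \mid X_1(0) = i) = (1 - \pi_1)^N$ of the binomial law stated in the remark. The function names are hypothetical and chosen only for this illustration.

```python
from math import exp


def mean_type1(i, N, u11, u21):
    """N * pi_1 = u11 * i + u21 * (N - i), the mean of X_1(1) given X_1(0) = i."""
    return u11 * i + u21 * (N - i)


def prob_no_type1(i, N, u11, u21):
    """P(X_1(1) = 0 | X_1(0) = i) = (1 - pi_1)^N for the Bin(N, pi_1) law."""
    pi1 = mean_type1(i, N, u11, u21) / N
    return (1.0 - pi1) ** N


if __name__ == "__main__":
    i, u11 = 3, 0.9
    for N in (10, 100, 10_000, 1_000_000):
        # u21 = 0 (i.e. u22 = 1): mean stays at u11 * i, Poisson-type limit
        # u21 > 0: mean grows linearly in N, no weak limit
        print(N,
              mean_type1(i, N, u11, 0.0), prob_no_type1(i, N, u11, 0.0),
              mean_type1(i, N, u11, 0.1), prob_no_type1(i, N, u11, 0.1))
    print("Poisson limit value exp(-u11 * i):", exp(-u11 * i))
```

With $u_{21} = 0$ the zero-probability stabilizes near $e^{-u_{11} i}$, consistent with a Poisson limit, whereas with $u_{21} > 0$ it collapses to 0 because the mean diverges.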

Some backward results
For Suppose that one has taken in some generation $r \in \mathbb{Z}$ a sample of $|i| := \sum_{k\in E} i_k$ individuals, where $i_k$ of these individuals are of type $k$, $k \in E$. For $j = (j_k)_{k\in E} \in S_N(E)$ let $A_{ij}$ denote the event that, for each $k \in E$, the $i_k$ individuals of type $k$ have exactly $j_k$ parents. Define $p_{ij} := P(A_{ij})$ and $P := (p_{ij})_{i,j\in S_N(E)}$. Clearly, all quantities $A_{ij} = A_{ij}(U)$, $p_{ij} = p_{ij}(U)$ and $P = P(U)$ depend on the mutation matrix $U$. For the model without mutation, that is, for $U = I$ (identity matrix), it is known (see, for example, [34, Proposition 1]) that $P_{\mathrm{rep}} := P(I)$ has entries where $|i| := \sum_{k\in E} i_k$, $|j| := \sum_{k\in E} j_k$ and the sum extends over all m = (m 1 , . .

For $k \in E$ let $e_k$ denote the $k$-th unit vector in $\mathbb{R}^E$. From (43) it follows that, for all $N \in \mathbb{N} \setminus \{1\}$, $p_{2e_k,e_k}(I) = \Phi_1(2) = c_N$ does not depend on $k \in E$ and coincides with the coalescence probability (3).
In order to obtain formulas for $p_{ij} = p_{ij}(U)$ for a general mutation matrix $U$ we proceed as follows. Let $P_{\mathrm{mut}} := (p^{\mathrm{mut}}_{ij})_{i,j\in S_N(E)}$ denote the matrix with entries
$$p^{\mathrm{mut}}_{ij} := \sum_M \prod_{k\in E} \Big( i_k! \prod_{\ell\in E} \frac{u_{\ell k}^{m_{k\ell}}}{m_{k\ell}!} \Big), \qquad (44)$$
where the sum extends over all matrices $M = (m_{k\ell})_{k,\ell\in E} \in \mathbb{N}_0^{E\times E}$ having row sums $\sum_{\ell\in E} m_{k\ell} = i_k$, $k \in E$, and column sums $\sum_{k\in E} m_{k\ell} = j_\ell$, $\ell \in E$. The matrix $P_{\mathrm{mut}}$ has a similar structure to the matrix $\Pi_{\mathrm{mut}}$ with entries (26), but note that $u_{k\ell}$ is replaced by its transpose $u_{\ell k}$ and that $i$ and $j$ belong to $S_N(E)$ instead of $\Delta_N(E)$. For a general mutation matrix $U$ the backward probabilities $p_{ij} = p_{ij}(U)$ can be calculated using Lemma 4.

Proof. Fix $i = (i_k)_{k\in E}$, $j = (j_k)_{k\in E} \in S_N(E)$. Recall that the model is defined forward in time by a reproductive step (involving the offspring sizes $\nu_1, \ldots, \nu_N$) followed by a mutational step (involving the mutation matrix $U$). Backward in time we therefore first have to take into account the mutational step and then the reproductive step. Having this factorization in mind it follows that $p_{ij}(U) =$

Remark 10. The matrix $P$ is in general not stochastic, as already mentioned in the remarks on page 715 of [34]. It is hence not straightforward to define a proper ancestral process in the same way as for the model studied in Section 2.

Discussion and open problems
In Section 2 we have analyzed the ancestry of a multi-type Cannings model with fixed subpopulation sizes and mutation leading to limiting coalescent processes with mutation (see Theorem 1) enjoying the exchangeability and consistency property.
We have also studied (see Section 3) a different but closely related Cannings model with constant total population size but variable subpopulation sizes with an emphasis on its forward structure.Under certain conditions its forward structure can be approximated by a limiting multi-type branching process (Theorem 3).However, questions concerning its ancestral structure (see Remark 10) and duality results linking its forward and backward structure, including algebraic approaches to duality, remain open.
Particular classes of multi-type Cannings models have not been discussed in this paper. Schweinsberg [40] studies the ancestry of a class of single-type Cannings models obtained via sampling without replacement from a supercritical branching process. Huillet et al. [26] study analogous single-type models based on a sampling with replacement strategy. We leave the study of multi-type versions of the models of [40] and [26] for future work.

For $t \ge 0$ and $k \in E$ let $N_t^{(k)}$ denote the number of $k$-blocks of $\Pi_t$. Define $N_t := (N_t^{(k)})_{k\in E}$. Due to a classical criterion of Burke and Rosenblatt [7, Theorem 1],

r +1) = j | X(r) = i, ν (r) N = m)P(ν (r) N = m), where the sum extends over all $m = (m_1, \ldots, m_N) \in \mathbb{N}_0^N$ satisfying $P(\nu_N^{(r)} = m) > 0$. Let $I_k := \{s \in [N] : \text{individual } s \text{ in generation } r \text{ has type } k\}$, $k \in E$. Then, P(X(r + 1) = j | X(r) = i, ν where $m_{k\ell}$ corresponds to the number of children of parents of type $k$ which mutate to type $\ell$. Note that $\sum_{k\in E} m_{k\ell} = j_\ell$, $\ell \in E$, and $\sum_{\ell\in E} m_{k\ell} = \sum_{s\in I_k} m_s$, $k \in E$. The offspring variables $\nu_{1,N}, \ldots, \nu_{N,N}$ are exchangeable and the sets $I_k$, $k \in E$, are pairwise disjoint with $\bigcup_{k\in E} I_k = [N]$ and $|I_k| = i_k$, $k \in E$. Therefore,

Lemma 4. The backward matrix $P = (p_{ij})_{i,j\in S_N(E)}$ (for the model with general mutation matrix $U$) is given by $P = P_{\mathrm{mut}} P_{\mathrm{rep}}$, where $P_{\mathrm{mut}} = (p^{\mathrm{mut}}_{ij})_{i,j\in S_N(E)}$ is the matrix with entries (44) and $P_{\mathrm{rep}} = P(I)$ is the backward matrix for the multi-type Cannings model without mutation having entries (43).
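$$p_{ij}(U) = \sum_M \prod_{k\in E} \Big( i_k! \prod_{\ell\in E} \frac{u_{\ell k}^{m_{k\ell}}}{m_{k\ell}!} \Big)\, p_{dj}(I), \qquad (45)$$
where the sum extends over all matrices $M = (m_{k\ell})_{k,\ell\in E}$ having row sums $\sum_{\ell\in E} m_{k\ell} = i_k$ for all $k \in E$, and $d := (d_\ell)_{\ell\in E}$ is defined via $d_\ell := \sum_{k\in E} m_{k\ell}$ for all $\ell \in E$. Note that $|d| := \sum_{\ell\in E} d_\ell = \sum_{k,\ell\in E} m_{k\ell} = \sum_{k\in E} i_k =: |i|$. In particular, $d \in S_N(E)$. Now split the sum in (45) according to the value of $d$ into
$$p_{ij}(U) = \sum_{d\in S_N(E)} \sum_M \prod_{k\in E} \Big( i_k! \prod_{\ell\in E} \frac{u_{\ell k}^{m_{k\ell}}}{m_{k\ell}!} \Big)\, p_{dj}(I), \qquad (46)$$
where $\sum_M$ extends over all matrices $M = (m_{k\ell})_{k,\ell\in E}$ satisfying $\sum_{\ell\in E} m_{k\ell} = i_k$ for all $k \in E$ and $\sum_{k\in E} m_{k\ell} = d_\ell$ for all $\ell \in E$. Note that the sum $\sum_M$ in (46) is empty if $|d| \ne |i|$. The right-hand side in (46) is equal to $\sum_{d\in S_N(E)} p^{\mathrm{mut}}_{id}\, p^{\mathrm{rep}}_{dj} = (P_{\mathrm{mut}} P_{\mathrm{rep}})_{ij}$. Thus, $P = P_{\mathrm{mut}} P_{\mathrm{rep}}$.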
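The mutational factor in this decomposition can be computed by brute force for small states. The sketch below is an illustration under an assumption: it reads the inner sum in (46) as $p^{\mathrm{mut}}_{id} = \sum_M \prod_{k} \big( i_k! \prod_{\ell} u_{\ell k}^{m_{k\ell}} / m_{k\ell}! \big)$ over matrices $M$ with row sums $i_k$ and column sums $d_\ell$. The names `compositions` and `p_mut` are hypothetical helpers, and the enumeration is only feasible for a few types and small counts.

```python
from itertools import product
from math import factorial


def compositions(n, k):
    """Yield all k-tuples of nonnegative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def p_mut(i, d, U):
    """Assumed entry p^mut_{id}: sum over matrices M with row sums i, column sums d.

    U[l][k] is u_{lk}; note the transposed index order compared with the
    forward mutational matrix, as pointed out in the text.
    """
    K = len(i)
    total = 0.0
    # each row k of M is a composition of i[k] into K parts
    for rows in product(*(list(compositions(ik, K)) for ik in i)):
        col_sums = tuple(sum(rows[k][l] for k in range(K)) for l in range(K))
        if col_sums != tuple(d):
            continue
        term = 1.0
        for k in range(K):
            term *= factorial(i[k])
            for l in range(K):
                term *= U[l][k] ** rows[k][l] / factorial(rows[k][l])
        total += term
    return total


if __name__ == "__main__":
    U = [[0.9, 0.1], [0.2, 0.8]]
    i = (2, 1)
    row_sum = sum(p_mut(i, d, U) for d in compositions(sum(i), 2))
    print(row_sum)  # equals prod_k (sum_l u_{lk})^{i_k}, here 1.1**2 * 0.9
```

By the multinomial theorem the row sum of this matrix is $\prod_{k\in E} (\sum_{\ell\in E} u_{\ell k})^{i_k}$, which involves the column sums of $U$ and therefore need not equal 1. This is one elementary way to see why the resulting backward matrix need not be stochastic.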