2023 Bernoulli factories and duality in Wright–Fisher and Allen–Cahn models of population genetics

Mathematical models of genetic evolution often come in pairs, connected by a so-called duality relation. The most seminal example are the Wright–Fisher diﬀusion and the Kingman coalescent, where the former describes the stochastic evolution of neutral allele frequencies in a large population forwards in time


Introduction
The Bernoulli factory problem is to construct a realisation of a Bernoulli(f (p)) random variable (or an f (p)-coin) using an almost surely finite number of independent p-coins, where f : [0, 1] → [0, 1] is a known function but p ∈ [0, 1] is unknown.The special case f (p) = 1/2 was formulated and solved by John von Neumann in 1951 [von51].Later, Keane and O'Brien provided a necessary and sufficient condition for a given function f to have a Bernoulli factory [KO94].In brief, f has a Bernoulli factory if and only if it is identically equal to zero or one, or it is continuous and polynomially bounded, i.e. min{f (p), 1 − f (p)} ≥ min{p, 1 − p} n (1) for all p ∈ [0, 1] and some n ∈ N.However, the proof of Keane and O'Brien is only partly constructive: it relies on a recursively defined sequence whose explicit solution is intractable.Constructions of algorithms have relied on approximation of f by Bernstein polynomials, which are naturally associated with p-coins [HNP11, LKPR11, NP05], or on other series expansions of f with non-negative coefficients [Men19].
Many seminal models of population genetics rely on the random propagation of alleles from one generation to the next.A prototypical example is the Wright-Fisher model, in which a population of fixed size N ∈ N evolves in discrete generations.Individuals carry one of two alleles, a or A, and each individual inherits the allele of a parent which it samples independently and uniformly from the previous generation.The mechanism of sampling alleles by sampling parents ensures that the model carries information of the forward-in-time evolution of allele frequencies, as well as the backward-in-time genealogies of samples of individuals.Frequently, these two modelling perspectives satisfy a duality relation which renders both models more tractable than they would be in isolation.We direct interested readers to e.g.[Dur08] for an introduction to Wright-Fisher and genealogical models in population genetics.
In the absence of mutation, inheriting an allele from a uniformly sampled parent models neutral evolution, where the conditional mean allele frequency in a generation equals that in the previous generation.Nonneutral models in which the mean is not constant can be obtained by sampling offspring alleles as f (p)-coins when the allele frequency in the previous generation is p, and f models so-called frequencydependent selection.In order to retain the aforementioned backward-in-time genealogical picture and its associated duality, it is desirable to generate the f (p)-coins in a given generation by sampling parental alleles from the previous generation.Since the allele frequency in the parental generation is p, parental alleles can be thought of as p-coins, motivating a connection to Bernoulli factories.Our contribution is to use a Bernoulli factory to extend two standard models of population genetics to more general settings than has been done previously.They are (i) the non-neutral Wright-Fisher diffusion and its ancestral selection graph dual [CHS22, EGT10, GS18, KN97], for which the function f models frequency-dependent selection as described above, and (ii) the Allen-Cahn PDE which models stationary allele frequencies in a spatial continuum, in which f appears as external forcing [EGL22].
The remainder of the manuscript is organised as follows.In Section 2, we review the Bernoulli factory of [KO94] which turns out to be convenient for our purposes.In Sections 3 and 4 we introduce the Wright-Fisher diffusion with frequency-dependent selection and the Allen-Cahn equation.In each case, we also demonstrate how Bernoulli factories facilitate the construction of very large classes of these models.Section 5 concludes with a discussion on connections to earlier results, as well as some potential extensions.

The Keane-O'Brien factory
Let f : [0, 1] → [0, 1] be continuous and polynomially bounded as in (1).Let Xn (p) be the sample mean of n independent p-coins.Define sequences of functions f k and integers η(f, k) as follows: set f 1 (p) := f (p), and where each η(f, k) is finite, independent of p, and large enough that By [KO94, page 218] such an η(f, k) exists, and moreover, each f k is polynomially bounded.
Let L ∼ Geo(1/4) be independent of all p-coins.Then is a Bernoulli factory for f [KO94, pages 217-219] based on the series expansion which converges uniformly in p. Pseudocode for this Bernoulli factory is shown in Algorithm 1.We will Algorithm 1 Bernoulli factory for continuous f : [0, 1] → [0, 1] satisfying (1).Require: Return 1. 5: else 6: Return 0. 7: end if refer to Bernoulli factories based on (2) as Keane-O'Brien factories.They have two features which which will turn out to be essential.
Remark 2. The random variable L, and hence the number of p-coins η(f, L) needed to determine the allele of a child, is independent of the realisations of those coins.This will facilitate the construction of ancestral graphs containing all possible ancestors before the alleles carried by any of those ancestors are known.
Any Bernoulli factory for which Remark 2 holds can be thought of as a mixture of finite factories, that is, a mixture of Bernoulli factories with a deterministic number of coins each.

The frequency-dependent Wright-Fisher diffusion
Consider a population of N individuals {Z (N ) k } k≥0 evolving in discrete time, where the state of the system at time k is Z be the frequency of the A allele in generation k.Each individual in generation k + 1 samples its allele conditionally independently given , where the probability of an A allele is ) with for a fixed constant σ > 0, a continuous function f : [0, 1] → [0, 1] satisfying (1), and where we assume the population size is large enough that N ≥ σ.
Concretely, the realisation of each allele is implemented as follows.With probability 1 − σ/N , a given individual samples one parent and inherits its allele.With the complementary probability, the individual samples an independent copy of L, followed by η(f, L) parents chosen uniformly at random with replacement.Conditional on L and ).A realisation of the process of children sampling parents is depicted in Figure 1.k } k≥0 with N = 5 and 4 generations.The second generation from the bottom includes a merger of three lineages, as well as an individual which samples three parents, two of which ended up overlapping.The generation at the top includes two binary mergers.Alleles for all individuals in all generations could be generated given alleles for the generation at the top, as well as a rule for deciding the allele of the three-parent child.
Sampling parents with replacement is essential for the connection with Bernoulli factories, as each parent can then be thought of as an i.i.d.coin on the two-point space of alleles {a, A}.However, sampling with replacement also permits a child to choose the same individual as a parent many times, which is unnatural from a biological point of view.Formally, the η(f, L) parents are merely a mechanism for generating alleles for children in a parent-driven way which turns out to facilitate duality with a genealogical process.A biologically interpretable genealogy with one parent per child can be recovered by selecting, for each multi-parent child, one of the child-parent connections to denote the "real" parent, with all other child-parent connections designated "virtual" and hence non-biological.The way in which the real parent is chosen can be arbitrary as long as it has no impact on the distribution of the child allele, and is independent of the child-parent connections of other individuals.However, retention of virtual parents is essential for constructing both the genealogical and allele frequency processes simultaneously, and hence obtaining the duality result in Theorem 3.
Our aim is to verify convergence of the allele frequency process { Ỹ (N ) k } k≥0 in the infinite population limit.To that end, we define the continuous-time process (Y Theorem 1.Let f be polynomially bounded and Lipschitz continuous on [0, 1].Then, in the Skorokhod topology, (Y which is well-defined for any bounded, measurable function h : [0, 1] → R. To show the claimed convergence, let h be a C 2 ([0, 1]) function with bounded third derivatives, which is a convergence-determining class.Expanding h around y on the right-hand side yields up to terms which are of lower order since h ′′′ is bounded.Evaluating the binomial expectations yields as N → ∞.The proof of weak convergence is completed by using the pseudo-Poisson approximation of [Kal02, Theorem 19.28] with time step 1/N , because the limiting diffusion is Feller.as the number of lineages which are ancestral to the sample k generations in the past.The number of lineages decreases by one when two lineages find a common ancestor, decreases by more than one if multiple lineages find common ancestors (which will happen with negligible probability when N → ∞), or increase by η(f, L) − 1 whenever a lineage samples η(f, L) ancestors.In the pre-limiting particle system, any number of these events can co-occur in one generation, particularly when n ≥ 3.But transitions other than isolated binary mergers and single multifurcations turn out to vanish in a suitably rescaled infinite population limit.To that end, we define the continuous time Markov jump process whose existence we prove next.
Theorem 2. The limit in (4) exists, and (A t ) t≥0 has generator Proof.The pre-limiting, discrete-time process {A k } k≥0 undergoes a myriad of transitions involving subsets of individuals finding common ancestors, as well as individuals branching into many potential ancestors, potentially in the same generation.However, only transitions with a per-generation probability Θ(1/N ) will contribute to the time-rescaled limit with a finite rate.Events with probability o(1/N ) will not occur in the limit at all, while events with probability ω(1/N ) will appear at a dense set of times.However, it will turn out that the latter only result in identity transitions in A t , and hence do not affect the limit.
The probability of two lineages originating from a common ancestor one generation earlier, with neither being involved in a branching event, is implying that two lineages will merge to a common ancestor at rate 1 in the limit.A triple merger, or more than one simultaneous merger, has probability at most and hence will not appear in the limit.
A single individual branches into η(f, k) ancestral lineages with probability σ N while any event involving more than two lineages branching in one generation has probability at most Hence, only isolated branching events appear in the limit.It is also clear from ( 5) and (6) that the probability of at least one merger and branching event in one generation is o(1/N ).
All other transitions involve no mergers or branching events, and hence do not affect the limiting ancestral process.Noting that there are n 2 pairs of individuals to merge with probability (5) and n individuals to branch with probabilities (6) yields the claimed generator G.
Duality between the Wright-Fisher diffusion (3) and the ancestral process (4) is a relation where y ∈ [0, 1] and n ∈ N are respective initial conditions, and h is a duality function, the specification of which will require some exposition.
We follow [CHS22] and define the random function P t (y) as the conditional probability that all n leaves at time zero in the ancestral process carry allele A, given that the A allele frequency at time t in the past is y ∈ [0, 1], and given the realisation of the ancestral process A 0:t := (A s ) s∈[0,t] started from the n lineages.For example, in the absence of branching events we have P t (y) = y At , while when n = 1, a single branching event into η(f, k) ancestors and no mergers in the history A 0:t yields where f k is as specified in Section 2. Other patterns of merger and branching events will result in a more complicated Bernstein polynomial of degree A t , where V t (j) is the random Bernstein coefficient which equals the probability that the n leaves all carry allele A, given that j of A t roots do.As in [CHS22, Definition 2.12 and Proposition 2.13], the vector (V t ) t≥0 := (V t (0), . . ., V t (n)) t≥0 is also a Markov jump process with transitions at rate nσP(L = k), and at rate n 2 .Also, let be the vector of order n Bernstein polynomials, and for vectors u and v of common length n + 1, let be the usual inner product.
Proof.The proof is a small adaptation of that of [CHS22, Theorem 2.14] to our more general setting.Duality between the coalescing mechanism of the ancestral process (8) and the diffusion coefficient of the Wright-Fisher diffusion (3) is standard, and we omit it to focus on establishing the same relation for the branching mechanism (7) and the drift term in (3).
Following [CHS22, Section 4.3], let Y y k ∼ Bin(k, y) and K n k,i ∼ Hyp(n+ k − 1, k, i) be independent random variables, where Hyp(a, b, c) denotes the hypergeometric distribution with b draws without replacement from a population of size a containing c successes.Then, for v ∈ R n+1 we have where L ∼ Geo(1/4) and the third equality is due to (2).We also have as can be seen by considering the event that the left-hand side takes value (j, k), which necessitates that = j and Y y n+η(f,L)−1 = k + j on the right-hand side.Using the shorthand η = η(f, L), the probability of the event on the left can be written as which is the claimed probability of the event on the right.Intuitively, (10) expresses the fact that flipping η coins and n − 1 coins in separate groups is statistically identical to flipping n + η − 1 coins and randomly allocating them into groups of size η and n − 1 afterwards.Using (10), the drift of the Wright-Fisher diffusion (3) satisfies which is precisely (7) applied to the v-argument of H(y, v) = B dim(v)−1 (y), v .

The Allen-Cahn model of spatial genetics
The Allen-Cahn equation on a Lipschitz domain Ω ⊆ R d is for a given λ > 0 and f : [0, 1] → [0, 1], and subject to suitable initial and boundary conditions.In [EGL22], the authors consider a model of the spatially structured frequency u(x, t) of an allele A governed by where ε > 0, ν > 0, ∂ n denotes the normal derivative at the boundary, and u 0 : Ω → [0, 1] (see also [Goo18]).They construct a solution to (12) by using a particle system in which a single particle started at a location x ∈ Ω undergoes a Brownian motion with speed 2, branches into three independent copies at rate (1 + εν)/ε 2 , and particles are reflected from the boundary.At a given end time t > 0, a leaf particle at location z ∈ Ω samples one of two alleles, a and A, with respective probabilities (1 − u 0 (z), u 0 (z)).All particles sample their alleles independently.Then, particles propagate their alleles rootwards along the realisation of the Brownian tree.The allele of an internal branch is decided by a majority vote among its three children unless exactly one child carries allele A, in which case the parent branch carries A with probability 2νε/(3 + 3νε), and a otherwise.The probability that the root particle x ∈ Ω carries allele A under these dynamics solves (12) [EGL22, Proposition 2.4].
The ingredients of the particle system can be read off from (12).The Laplacian ∆ is the generator of Brownian motion run at speed 2, the branching rate (1 + εν)/ε 2 is an upper bound for the right-hand side, and which demonstrates that the nonlinearity in (12) has an interpretation as the voting system described above.It would be straightforward to adapt the proof of [EGL22, Proposition 2.4] to any other polynomial right-hand side of the form for some m ∈ N, λ > 0, and coefficients p j ∈ [0, 1] by setting λ as the branching rate into m particles, and suitably adapting the voting scheme.Our contribution is to prove an analogous result for (11), subject to the same initial and boundary conditions as (12), when f is merely continuous and polynomially bounded.
Consider a branching Brownian motion reflected off the boundary ∂Ω, with branching rate λ, started from a single particle at x ∈ Ω at time 0. At a branching event, the number of offspring is given by η(f, L), where L ∼ Geo(1/4).At a terminal time t > 0, a leaf particle at z ∈ Ω samples allele a (resp.A) with probability 1 − u 0 (z) (resp.u 0 (z)), and each leaf carries out this choice independently.Alleles are propagated rootwards along the tree: a branch with η(f, k) offspring, j of whom carry allele A, carries allele A if f k (j/η(f, k)) ≥ 1/2, and carries allele a otherwise.Let F (x, t) be the allele carried by the root particle at position x ∈ Ω when the branching process is run until time t > 0.
Theorem 4. Suppose f is continuous and polynomially bounded as in (1).Viewed as a function of x ∈ Ω, the probability P(F (x, t) = A) solves (11) subject to the initial and boundary conditions in (12).
Proof.The proof is an adaptation of that of [EGL22, Proposition 2.4] to our more general setting.To verify that q(x, t) := P(F (x, t) = A) solves (11) on the interior of Ω, let S denote the first branching time of the initial particle and (W t ) t≥0 be a Brownian motion.For small h > 0, q(x, t + h) = P(F (x, t + h) = A|S > h)P(S > h) + P(F (x, t + h) = A|S ≤ h)P(S ≤ h) where the subscript in E x denotes the starting point of (W t ) t≥0 and the last equality follows via (2).Since f is continuous, regularity of the heat semigroup and the tower law (viewing q(x, t) as the expectation of an indicator function) yield Hence, The boundary condition is inherited from reflecting Brownian motion [BH91] since branching events occur at a finite rate and hence will not take place on the boundary.

Discussion
In [CHS22], the authors prove analogues of Theorems 1-3 for the case for a fixed m ∈ N, positive coefficients {β ℓ } m ℓ=2 , and a sequence of [0, 1]-valued coefficients {{p i,ℓ } ℓ i=0 } m ℓ=2 .Earlier work by González Casanova and Spanò also covered the case where {π j } ∞ j=0 is a probability mass function [GS18].Our results cover all Lipschitz continuous, polynomially bounded functions f : [0, 1] → [0, 1], which includes both of these classes as special cases.Furthermore, [CHS22, Section 2.10] mentions that their approach should extend to the m = ∞ case.Our Bernoulli factory approach demonstrates that this is true, and also that there is no further difficulty in handling our more general class of drift functions.
The works of [GS18] and [CHS22] are more general than our results in two ways: they tackle sequences of drift functions f N → f as N → ∞, and they incorporate jumps in the limiting forward-in-time Wright-Fisher diffusion, along with multiple mergers into the reverse-time ancestral process.Both of these generalisations could be incorporated into our model at the cost of increased technicality.We have chosen to omit them to focus on the class of drift functions, which is our main interest.
The link our work establishes between individual-based models and diffusive scaling limits provides rigorous justification for a range of diffusive approximations which have been obtained for non-neutral finite-population models [TN06,LL07].In addition, we provide an associated genealogical description and a formal duality relation between the two processes.Genealogical processes of Wright-Fisher diffusions with frequency-dependent selection are also considered in [CG04], though their approach is to condition on the allele frequency trajectory and hence avoid the need for branching events in the ancestral process.Their setting is formulated for generic drift functions of the form β(x)x(1 − x), though in practice they focus on Bernstein polynomials of degree no more than two.
Approaches based on moment duality, with duality function h(x, n) = x n , have been used to obtain series expansions of Wright-Fisher diffusion transition functions [BEG00], and [CHS22] uses the Bernstein duality in (9) and the Bernstein coefficient process (V t ) t≥0 to study fixation for drifts of the form (13).It is unclear whether similar results can be usefully obtained in our setting in practice because the sequences {f k } k≥1 and {η(f, k)} k≥1 are difficult to compute.Hence, so are the Bernstein coefficients.
The construction of the solution to the Allen-Cahn equation (11) from a branching Brownian motion is also an example of duality between these two processes.This result could also be generalised by replacing the Laplacian with a more general second order differential operator, provided that it has suitable regularity and generates a diffusion with tractable reflecting behaviour at boundaries.
The key advantage of the Keane-O'Brien factory is that it covers the whole class of continuous, polynomially bounded functions f .Other factories could be used to obtain different ancestral processes, Wright-Fisher diffusions, and constructions of the solution to the Allen-Cahn equation.However, it is essential that the number of coins needed by the factory is independent of the realisations of those coins, or at least that there is a almost surely finite upper bound on the number of coins regardless of realisations.Otherwise, the trick of keeping track of all resulting branches to fill in alleles (or votes) later cannot work, because the number of branches cannot be determined until earlier alleles have been resolved.As far as we are aware, the only other somewhat general Bernoulli factory with this independence property is that of [Men19], which is identical to the branching mechanism used in [GS18] and applies to exactly the same functions.Seminal Bernoulli factories, such as that of Nacu and Peres [NP05] based on Bernstein polynomial approximation of f , inherently link the decision to keep flipping coins to the outcomes of earlier coins.
Finally, there are multivariate analogues of Bernoulli factories, in which independent, m-sided dice with probability mass function (p 1 , . . ., p m ) are used to construct a v-sided die with mass function f (p 1 , . . ., p m ).To date, attention has focused on domains which exclude boundaries, and where the coordinates of f are rational functions [Mor21, M LNW22], or where v = 1 [PLS23].We believe a suitable extension of such dice enterprises to cases including boundaries could be used to obtain analogues of the convergence and duality results presented here in the case with more than two non-neutral alleles.

Funding and data sharing statements
JK was supported by EPSRC research grant EP/V049208/1.K L was supported by the Royal Society through the Royal Society University Research Fellowship.Data sharing is not applicable to this article as no new data were generated or analysed.

Remark 3 .
The requirement of a Lipschitz drift could be slightly relaxed by using the more cumbersome conditions for a Feller semigroup given in [XYZ19, Theorem 3.3].They cover drifts satisfying a condition akin to |f (y) − f (z)| ≤ −C|y − z| log(|y − z|), or small variations thereof, where C > 0 is a constant.See [XYZ19, equations (14) and (15), as well as Remark 2.3] for a precise class of non-Lipschitz functions which can be handled.Next we consider a sample of n ∈ N individuals from a given generation (which we arbitrarily set as generation 0) in the pre-limiting Wright-Fisher model {Z (N ) k } k≥0 .We define the ancestral process A (N ) k