1 Introduction

The coupled Wright–Fisher diffusion was introduced by Aurell et al. (2019) with the purpose of analysing networks of loci in recombining populations of bacteria, or more precisely, detecting couples of loci co-evolving under strong selective pressure when the linkage disequilibrium is low across the genome. The model includes parent dependent mutation, interlocus selection and free recombination. Mutation is assumed to occur independently at each locus, while selection consists of first and second order selective interaction among loci.

This particular type of assumptions on the selection and recombination structure are suitable for example for some populations of bacteria, as showed in Skwark et al. (2017), where the same type of assumptions are made. In Skwark et al. (2017), it is explained that the high amount of homologous recombination in populations of Streptococcus Pneumoniae, which results in low linkage disequilibrium across the genome, makes this population ideal for detecting genes that evolve under shared selection pressure. On the contrary, in other populations of bacteria, e.g. Streptococcus Pyogenes, the low amount of homologous recombination makes it difficult to separate couplings attributable to recombination from those attributable to selection and thus the assumptions above are not suitable to study such populations.

The mathematical idea corresponding to these biological characteristics, is that the recombination is high enough to be approximated with infinite recombination, which would make the processes at each locus independent, and it is thus the selection only that causes the coupling between the diffusions at the different loci.

Furthermore, it is assumed that selection acts on the individual loci and on pairs of loci. The pairwise selection can be thought of as a network, where the vertices represent the loci and the edges the possible interactions, as shown in Aurell et al. (2019). Of course, the possible set of interactions could, in principle, be more complex than a network, but considering pairwise interactions turns out to be useful to reveal certain types of co-evolutionary patterns, see Skwark et al. (2017).

The model considers L different loci where, at each locus, a number of variants (alleles) are possible. The allele types at locus l are labelled by \(1,\ldots ,M_l\), thus assuming that the type space at each locus is finite. The population is haploid. The coupled Wright–Fisher diffusion is obtained as the weak limit of a sequence of discrete Wright–Fisher models characterised by the assumption that the evolution of the population at one locus is conditionally independent of the other loci given that the previous generation at each locus is known, see Aurell et al. (2019) for details. It is based on quasi-linkage equilibrium where the fitness coefficients, see Sect. 2, are inspired by a Potts model, see Gao et al. (2019), Neher and Shraiman (2011), and generalise the classical additive fitness under weak selection, see e.g. Bürger (2000, Ch. II), to the multi-locus case. With two loci and without the fist order selection terms, the coupled Wright–Fisher diffusion is reduced to a haploid version of the model with weak selection, loose linkage in Ethier and Nagylaki (1989).

Here we state the definition of the diffusion as the solution of a system of stochastic differential equations, without reference to the underlying discrete model. The coupled Wright–Fisher diffusion, \({\mathbf {X}} = \{{\mathbf {X}}(t), t\ge 0\}\), represents the evolution of the vector of all frequencies of allele types at each locus. Let

$$\begin{aligned} {\mathbf {X}}^{(l)}(t)=(X_1^{(l)}(t),\ldots ,X_{M_l}^{(l)}(t))^\mathrm {T}\end{aligned}$$

represent the vector of frequencies at locus l, with \(X_i^{(l)}(t)\) being the frequency of allele type i at locus l, then

$$\begin{aligned} {\mathbf {X}}(t)=({\mathbf {X}}^{(1)}(t)^\mathrm {T},\ldots ,{\mathbf {X}}^{(L)}(t)^\mathrm {T})^\mathrm {T}. \end{aligned}$$

The process \({\mathbf {X}} \) is the strong solution to the system of stochastic differential equations

$$\begin{aligned} \mathrm {d}{\mathbf {X}}(t)= \mu ({\mathbf {X}}(t)) \mathrm {d}t + D({\mathbf {X}}(t)) \nabla V({\mathbf {X}}(t)) \mathrm {d}t + D^{\nicefrac {1}{2}}({\mathbf {X}}(t)) \mathrm {d}{\mathbf {W}}(t), \end{aligned}$$
(1)

where V is a specific quadratic function encoding the structure of the interactions, \(\nabla V\) its gradient, while the mutation vector \(\mu \) and the diffusion matrix D have the following block structure,

$$\begin{aligned} \mu ({\mathbf {x}})= \left( \begin{matrix} \mu ^{(1)}({\mathbf {x}}^{(1)})\\ \vdots \\ \mu ^{(L)}({\mathbf {x}}^{(L)}) \end{matrix} \right) , \quad D({\mathbf {x}})= \left( \begin{matrix} D^{(1)}({\mathbf {x}}^{(1)})&{} &{} \\ &{} \ddots &{} \\ &{} &{} D^{(L)}({\mathbf {x}}^{(L)}) \end{matrix} \right) , \end{aligned}$$

with \(\mu ^{(l)}:{\mathbb {R}}^{M_l}\rightarrow {\mathbb {R}}^{M_l}\) and \(D^{(l)}:{\mathbb {R}}^{M_l}\rightarrow {\mathbb {R}}^{M_l \times M_l}\). The functions V, \(\mu \) and D are described in detail in the next section. The process \({\mathbf {W}}=(({\mathbf {W}}^{(1)})^\mathrm {T},\ldots ,({\mathbf {W}}^{(L)})^\mathrm {T})^\mathrm {T}\) is a multidimensional Brownian motion with \({\mathbf {W}}^{(l)}\) having the dimension of \({\mathbf {X}}^{(l)}\).

The system of SDEs (1) consists of L systems of equations for \({\mathbf {X}}^{(1)},\ldots ,{\mathbf {X}}^{(L)}\), coupled by the drift term \(D\,\nabla V\). Note that, if \(\nabla V = 0\), there is no interaction among the loci and the coupled Wright–Fisher diffusion consists of L independent Wright–Fisher diffusions, that is, each \({\mathbf {X}}^{(l)}\) solves

$$\begin{aligned} \mathrm {d}{\mathbf {X}}^{(l)}(t)= \mu ^{(l)}({\mathbf {X}}^{(l)}(t)) \mathrm {d}t + D^{(l)\nicefrac {1}{2}}({\mathbf {X}}^{(l)}(t))\, \mathrm {d}{\mathbf {W}}^{(l)}(t), \end{aligned}$$

which is the SDE for a single-locus, multi-type Wright–Fisher diffusion with mutations. In fact, the coupling of the loci is entirely due to selective interactions that are described by the drift term \(D\,\nabla V\). Without the interaction drift term, the diffusion in this paper, with \(L=2\), is reduced to the independent-loci model in Ethier and Griffiths (1990). That is, the weak limit of a sequence of multi-locus neutral Wright–Fisher diffusions with recombination rate going to infinity. In the multi-locus case, the same diffusion appears also in Griffiths et al. (2016, Sect. 3.3) as an example under free recombination.

An interesting feature of the coupled Wright–Fisher diffusion, addressed by Aurell et al. (2019) as one of the main motivations for its introduction, is its stationary density which appeared, in a more general form, as a conjecture by Kimura over half a century ago. Kimura (1955) suggests a Wright–Fisher model for multi-locus and multi-allelic genetic frequencies and conjectures that the stationary density is of the form \(\pi e^m\), where \(\pi \) is the product of Dirichlet densities and m is a generic mean fitness term. The coupled Wright–Fisher diffusion is constructed so that the quadratic function V could replace the generic m. Indeed, under the assumption of parent independent mutations, the stationary density, p, of the coupled Wright–Fisher diffusion is known up to a normalising constant Z, and corresponds to the one conjectured by Kimura with \( m = 2V \),

$$\begin{aligned} p \propto \pi e^{2 V}, \end{aligned}$$
(2)

see Sect. 2 for the definition of \(\pi \) and V. In fact, the form of the stationary density, under parent independent mutations, relies on the fact that the covariance of the diffusion defines a Svirezhev–Shahshahani metric on the simplex, with respect to which the drift is a gradient, see Bürger (2000, Appendix E.3).

In this paper a dual process for the coupled Wright–Fisher diffusion is studied. In population genetics, Markov duality has proven its effectiveness in combining information from two processes related to the same population: a diffusion process modelling the evolution of frequencies of genetic types forward in time and a reverse-time jump process modelling the ancestral history of a sample of individuals taken at the present time. The simplest and most well known duality relationship in this context is the moment duality between the Wright–Fisher diffusion and the block counting process of the Kingman coalescent.

The strength of Markov duality is that it provides a tool to analyse properties of the population by combining knowledge about the forward-in-time process and the backward-in-time process. Even when both processes are complicated, as often happens when mutation, recombination or selection mechanisms are involved, some known properties of one process can be used to analyse unknown properties of the other process and vice versa, leading to further insights about the population.

Several duality relationships have been established between various generalisations of the Wright–Fisher diffusion and the associated time reversed ancestral processes generalising the coalescent process. For example, when the selection mechanism is taken into account, the ancestral process associated to the Wright–Fisher diffusion is the ancestral selection graph (ASG), see Krone and Neuhauser (1997), Neuhauser and Krone (1997), which is closely related to the dual process in this paper when only one locus is considered, see Sect. 4. Unlike the Kingman coalescent, which has a tree structure, the ASG is branching and coalescing: the ancestral tree is replaced by an ancestral graph containing true and virtual lineages and embedding the genealogy of the sample of individuals. For a complete survey on duality for Markov processes, see Jensen and Kurt (2014), and for a brief overview of duality in population genetics see Griffiths et al. (2016) and the references therein.

In this paper, the main result concerns the derivation of a dual process for the coupled Wright–Fisher diffusion. The results show that, in this model, the dual process corresponds to the block counting process of L coupled ASGs, one for each locus, evolving simultaneously. Coalescence, mutation and single-branching, which is due to selection acting on the single loci, occur at different times in the different ASGs, whereas branching that is due to selection acting on pairs of loci, occurs simultaneously in two ASGs. The latter type of branching is referred to as double-branching in this paper. The main result in this paper is Theorem 1, which provides a description of the transition rates of the pure jump Markov process, \({\mathbf {N}} = \{{\mathbf {N}}(t), t\ge 0\}\), that is dual to the coupled Wright–Fisher diffusion, \({\mathbf {X}} \), through the duality relationship

$$\begin{aligned} {\mathbb {E}}\left[ F({\mathbf {X}}(t),{\mathbf {n}})|{\mathbf {X}}(0)={\mathbf {x}}\right] = {\mathbb {E}}\left[ F({\mathbf {x}},{\mathbf {N}}(t))|{\mathbf {N}}(0)={\mathbf {n}}\right] , \end{aligned}$$
(3)

where F is a duality function, to be determined. The derivation uses a generator approach as in Griffiths et al. (2016) and Etheridge and Griffiths (2009). It is based on the duality relationship of the infinitesimal generators

$$\begin{aligned} {\mathcal {L}}F(\cdot ,{\mathbf {n}})({\mathbf {x}})={\mathcal {L}}^D F({\mathbf {x}},\cdot \,)({\mathbf {n}}), \end{aligned}$$
(4)

where \({\mathcal {L}}\) is the generator of the coupled Wright–Fisher diffusion and \({\mathcal {L}}^D\) the unknown generator of the dual process. By proposing an appropriate duality function F, the generator \({\mathcal {L}}^D\) of the dual process can be identified, from which transition rates of the dual process are obtained. Under mild conditions, which are verified in Sect. 6, the method of duality (Ethier and Kurtz 1986, Ch. 4), also used in e.g. Barbour et al. (2000), Etheridge and Griffiths (2009) and Mano (2009), ensures that the duality relationship of the generators (4) implies (3).

Understanding the structure of an ancestral process, \({\mathbf {N}}\), which is dual to a diffusion of the Wright–Fisher type, \({\mathbf {X}}\), plays a significant role in population genetics inference. As is often the case, the available data consist of observations of the genetic types of a sample of individuals at the present time, \({\mathbf {N}}(0)={\mathbf {n}}\), whereas the evolution of the process is not observed. This results in the likelihood function being intractable when the size of the population is large. In order to compute the likelihood, one could, in theory, condition on the genealogical history of the sample and then integrate over all possible histories that are compatible with the sample. However, the domain of integration is so large that, in practice, numerical integration methods are useless even for intermediate sized populations. Simulation-based methods are generally preferred. As carefully explained by Stephens in Stephens (2007), naive Monte Carlo methods based on simulating the histories forward in time produce next to useless approximations of the likelihood for problems involving samples of a more than few individuals. This is due to the fact that only very few simulations contribute significantly to the approximation, while the contribution of the remaining simulations is negligible. Simulation- and likelihood-based techniques that have proven to work for these problems are Markov chain Monte Carlo, importance sampling and sequential Monte Carlo. All these methods rely on knowing, to some extent, the structure of the ancestral process in order to approximate its backward dynamics, see e.g. Griffiths and Tavaré (1994), Koskela et al. (2015, 2018), Stephens (2007), Stephens and Donnelly (2000, 2003) for details.

From the duality relation (3), it is also possible to derive an expansion of the transition distribution of the diffusion \({\mathbf {X}}\), see Barbour et al. (2000), Etheridge and Griffiths (2009), Griffiths et al. (2016), in terms of the limit of the transition densities of the dual process \({\mathbf {N}}\). In the absence of mutation, the duality relation (3) can also be used to determine fixation probabilities. That is, the probability that the frequency of a given allele at a given loci is equal to 1. Such probabilities may be studied by taking the limit, as \(t \rightarrow \infty \), in (3) and considering the recurrence/transience properties of the dual process \({\mathbf {N}}\), see e.g. Foucart (2013), Griffiths et al. (2016), González Casanova and Spanó (2018), Mano (2009) for studies of the Wright–Fisher process with selection and frequency dependent selection, the multi-locus Wright–Fisher process with recombination, the \(\varLambda \)-Wright–Fisher process with selection and the \(\varXi \)-Wright–Fisher process with frequency dependent selection, respectively.

The paper is outlined as follows. In Sect. 2 a background on the coupled Wright–Fisher diffusion is provided. Section 3 outlines the general generator approach to derive a dual process. In Sect. 4 the case of one locus, two allele types and parent independent mutations is considered. In this case the dual process is related to the ancestral selection graph, moreover, explicit formulas for the stationary density of the diffusion and the transition rates of the dual process are obtained. The main result is provided in Sect. 5, and proved in Sect. 6, where a dual process is derived in the general multi-locus setting. The final Sect. 7 provides additional details in the case of two loci, two alleles, selection and parent independent mutations, more precisely, the transition rates of the dual process are expressed in terms of beta and confluent hypergeometric functions.

2 Preliminaries on the coupled Wright–Fisher diffusion

In this section the coupled Wright–Fisher diffusion is introduced and the explicit expression for its infinitesimal generator is provided. The notation in this section differs slightly from that in Aurell et al. (2019), where the frequency of the last allele type at each locus is omitted, being a function of the other frequencies, whereas in this paper an expanded version of the diffusion is considered, which includes all the frequencies. Since the frequencies sum up to one the descriptions are equivalent. For our purpose we find the expanded version more convenient to work with.

For a given integer \(L\ge 1\), the number of loci, let \(M_1, \ldots , M_L\) be positive integers representing the number of alleles at each locus. Put \(M=\sum _{l=1}^{L}M_l\). A vector \({\mathbf {x}}\in [0,1]^{M}\) is interpreted as the concatenation of L vectors with lengths \(M_1,\ldots ,M_L\), i.e. \({\mathbf {x}}= (({\mathbf {x}}^{(1)})^\mathrm {T}, \ldots , ({\mathbf {x}}^{(L)})^\mathrm {T})^\mathrm {T}\) with \({\mathbf {x}}^{(l)}\in [0,1]^{M_l}\), \( l=1,\ldots L \), and the coordinate i in vector \({\mathbf {x}}^{(l)}\) is denoted by \(x _{i}^{(l)}\). Similarly, a matrix \(A\in {\mathbb {R}}^{M\times M}\) consists of \(L^2\) blocks with dimensions \((M_l\times M_r)_{l,r=1,\ldots ,L}\). The block at position (lr) is denoted by \(A^{(lr)}\) and its component at position (ij) is denoted by by \(A_{ij}^{(lr)}\). Furthermore \({\mathbf {e}}_{i}^{(l)}\) denotes the unit vector in \({\mathbb {R}}^M\) with the \(i^{th}\) component of its lth building vector being equal to 1.

In the following, each of the terms appearing in (1) will be described, starting from the interaction drift term. The quadratic function \(V:[0,1]^M\rightarrow {\mathbb {R}}\) is given by

$$\begin{aligned} V({\mathbf {x}})= {\mathbf {x}}^\mathrm {T}{\mathbf {h}}+\frac{1}{2}{\mathbf {x}}^\mathrm {T}J {\mathbf {x}}, \end{aligned}$$

where \({\mathbf {h}}\in {\mathbb {R}}^{M}_+\) and \(J \in {\mathbb {R}}^{M\times M}_+\) is a symmetric block matrix with the blocks on the diagonal equal to zero matrices, i.e. \(J^{(ll)} =0 \in {\mathbb {R}}^{M_l\times M_l}\) and \(J^{(lr)}=(J^{(rl)})^\mathrm {T}\) for all \(l,r=1,\ldots ,L\). The vector \({\mathbf {h}}\) and matrix J contain the selection parameters, expressing, respectively, the one-locus selection and the selective interaction among pairs of loci. In order to clarify the role of the selection parameters in terms of fitness, we may express the fitness coefficient of the haplotype \(\sigma =(i_1,\ldots ,i_L)\) as

$$\begin{aligned} w_\sigma = 1+ \sum _{l=1}^L h_{i_l}^{(l)} + \sum _{l=1}^L\sum _{\begin{array}{c} r=1\\ l<r \end{array}}^L J_{i_l i_r}^{(rl)}. \end{aligned}$$
(5)

Note that \(\nabla V({\mathbf {x}})={\mathbf {h}}+J{\mathbf {x}}\), since the matrix J is symmetric. Let \(g({\mathbf {x}})=D({\mathbf {x}})\nabla V({\mathbf {x}})\). Then, the components of \(g({\mathbf {x}})\) are

$$\begin{aligned} g_i^{(l)}({\mathbf {x}})= \sum _{k=1}^{M_l} d_{ik}^{(l)}({\mathbf {x}}^{(l)})\tilde{h}_k^{(l)}({\mathbf {x}}), \quad \text {with} \quad \tilde{h}_k^{(l)}({\mathbf {x}})=h_k^{(l)}+ \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{m=1}^{M_r} J_{km}^{(l r)}x_m^{(r)}. \end{aligned}$$
(6)

The drift function \(\mu \) models the mutations. It is assumed that mutations occur independently at each locus, in particular, at the \(l^{th}\) locus the mutation rate is \(\frac{\theta _l}{2}\) and the probability matrix of mutations is \(P^{(l)}=(P_{ij}^{(l)})_{ i,j=1\ldots ,M_l}\). The transition rates of mutations from type i to type j at locus l are thus \(u_{ij}^{(l)}=\frac{\theta _l}{2}P_{ij}^{(l)}\). As in the standard Wright–Fisher model with parent dependent mutations, the components of the drift function are defined by

$$\begin{aligned} \mu _i^{(l)}({\mathbf {x}}^{(l)})= \sum _{j=1}^{M_l} [u_{ji}^{(l)}x_j^{(l)}-u_{ij}^{(l)}x_i^{(l)}]. \end{aligned}$$
(7)

Finally, the components of the diagonal block \(D^{(l)}({\mathbf {x}}^{(l)})\) of the diffusion matrix \(D({\mathbf {x}})\) are defined by

$$\begin{aligned} d_{ij}^{(l)}({\mathbf {x}}^{(l)})= x_i^{(l)}(\delta _{ij}-x_j^{(l)}) \quad \text {with}\quad \delta _{ij}= {\left\{ \begin{array}{ll} 1 \text { if } i=j, \\ 0 \text { if } i\ne j, \end{array}\right. } \end{aligned}$$
(8)

which is characteristic for Wright–Fisher processes.

Having defined \(\mu , D, \) and V, a compact definition of the coupled Wright–Fisher diffusion can be given, in terms of its infinitesimal generator. The coupled Wright–Fisher diffusion \(\{{\mathbf {X}}(t)\}_{t\ge 0}\) is a M-dimensional diffusion process on the state space

$$\begin{aligned} {\mathcal {S}}= \left\{ {\mathbf {x}}\in [0,1]^M \text { s.t. } \sum _{i=1}^{M_l}x_i^{(l)}=1, \quad \forall l=1,\ldots ,L, \right\} , \end{aligned}$$

with generator

$$\begin{aligned} \begin{aligned}&{\mathcal {L}}f({\mathbf {x}})= \sum _{l=1}^{L}\left[ \sum _{i=1}^{M_l} \left( \mu _i^{(l)}({\mathbf {x}}^{(l)})+g_i^{(l)}({\mathbf {x}})\right) \frac{\partial f}{\partial x_i^{(l)} }({\mathbf {x}}) +\frac{1}{2}\sum _{i,j=1}^{M_l} d_{ij}^{(l)}({\mathbf {x}}^{(l)}) \frac{\partial ^2 f}{\partial x_i^{(l)} \partial x_j^{(l)} }({\mathbf {x}}) \right] , \end{aligned} \end{aligned}$$
(9)

where \(\mu \), g and d are given by (7), (6) and (8), respectively. The generator \({\mathcal {L}}\) is defined on the domain \( C^2({\mathcal {S}})\).

Before proceeding with the derivation of the dual process, the stationary density (2) is explicitly presented. Consider representing the coupled Wright–Fisher diffusion on the state space

$$\begin{aligned} {\bar{\mathcal {S}}}= \left\{ {\bar{\mathbf {x}}}\in [0,1]^{M-L} \text { s.t. } \sum _{i=1}^{M_l-1}{\bar{x}}^{(l)}_i \le 1 \quad \forall l=1,\ldots ,L \right\} , \end{aligned}$$

where

$$\begin{aligned} {\bar{{\mathbf {x}}}} = ({{{\bar{x}}}}^{(1)}_1, \ldots , {{{\bar{x}}}}^{(1)}_{M_1-1}, \dots , {{{\bar{x}}}}^{(L)}_1, \dots , {{{\bar{x}}}}^{(L)}_{M_L-1})^\mathrm {T}\in \mathcal {{{\bar{S}}}} \end{aligned}$$

is identified with

$$\begin{aligned} {\mathbf {x}}= \left( {{{\bar{x}}}}^{(1)}_1, \dots , {{{\bar{x}}}}^{(1)}_{M_1-1}, 1-\sum _{i=1}^{M_1 -1} {{{\bar{x}}}}^{(1)}_i, \dots , {{{\bar{x}}}}^{(L)}_1, \dots , {{{\bar{x}}}}^{(L)}_{M_L-1}, 1-\sum _{i=1}^{M_L -1} {{{\bar{x}}}}^{(L)}_i\right) ^\mathrm {T}\in {\mathcal {S}}. \end{aligned}$$

If there are no interactions among loci, the coupled Wright–Fisher diffusion consists of L independent Wright–Fisher diffusions and the stationary density is well known when the mutations are parent independent. Wright himself proved that the stationary distribution of a single-locus, multi-type Wright–Fisher diffusion with parent independent mutations is Dirichlet, see Wright (1949). Therefore, the stationary density of independent Wright–Fisher diffusions is the product of Dirichlet densities. More precisely, let

$$\begin{aligned} \pi (\bar{{\mathbf {x}}})=\prod _{l=1}^{L}\pi _l({\bar{{\mathbf {x}}}}^{(l)}), \quad \text {with} \quad \pi _l({\bar{{\mathbf {x}}}}^{(l)})=\prod _{i=1}^{M_l-1}({\bar{x}}_i^{(l)})^{2 u_i^{(l)}-1} \left( 1-\sum _{i=1}^{M_l-1}{\bar{x}}_i^{(l)} \right) ^{2 u_{M_l}^{(l)}-1}, \end{aligned}$$

where \(\pi (\bar{{\mathbf {x}}})\) is the non-normalised stationary density of a coupled Wright–Fisher diffusion with no interaction among loci. In the presence of interaction and assuming parent independent mutations, i.e. \(u_{ij}^{(l)}=u_j^{(l)}\), \(i,j=1,\ldots ,M_l, l=1,\ldots ,L\), Aurell et al. (2019) prove that there is an additional exponential factor in the stationary density, that is

$$\begin{aligned} p({\bar{\mathbf {x}}})=\frac{1}{Z} \pi ({\bar{\mathbf {x}}}) e^{2 V({\bar{\mathbf {x}}})}, \end{aligned}$$
(10)

with V defined on \({\bar{\mathcal {S}}}\) by naturally defining the missing frequencies as one minus the sum of the other frequencies at the same locus. The form of the stationary density is explicit up to a normalising constant. In general, it is difficult to compute the normalising constant Z explicitly, but under additional assumptions it can be computed numerically, as demonstrated in Sects. 4 and 7.

3 Outline of the derivation of a dual process

To derive a process that is dual to the coupled Wright–Fisher diffusion, a generator approach will be used as in Griffiths et al. (2016), where the authors find a dual process for a multi-locus Wright–Fisher diffusion with recombination. In this section the method will be explained, in general terms.

Let \({\mathcal {L}}\) be the generator of the diffusion process (9) and \({\mathcal {L}}^D\) be the unknown generator of a dual process. Suppose that the following relationship holds

$$\begin{aligned} {\mathcal {L}}F(\cdot ,{\mathbf {n}})({\mathbf {x}})= {\mathcal {L}}^DF({\mathbf {x}},\cdot )({\mathbf {n}}), \quad {\mathbf {x}}\in {\mathcal {S}}, \quad {\mathbf {n}}\in {\mathbb {N}}^M, \end{aligned}$$
(11)

for some duality function F that needs to be determined. Using the relationship (11) the transition rates of a dual process can be identified from its generator. To pursue this approach, it is necessary to compute the left hand side of (11) by applying the generator \({\mathcal {L}}\) to the duality function F, considered as a function of \({\mathbf {x}}\), and rewrite it into the form

$$\begin{aligned} {\mathcal {L}}F(\cdot ,{\mathbf {n}})({\mathbf {x}})= \sum _{{\hat{{\mathbf {n}}}}}q({\mathbf {n}},{\hat{{\mathbf {n}}}}) \left[ F({\mathbf {x}},{\hat{{\mathbf {n}}}})-F({\mathbf {x}},{\mathbf {n}})\right] , \end{aligned}$$
(12)

for some non-negative coefficients \(q({\mathbf {n}},{\hat{{\mathbf {n}}}})\), \({\hat{{\mathbf {n}}}}\in {\mathbb {N}}^M\), \({\hat{{\mathbf {n}}}}\ne {\mathbf {n}}\). In light of the duality relationship, expression (12) can be interpreted as the generator \({\mathcal {L}}^D\) applied to the duality function F, considered as a function of \({\mathbf {n}}\). Consequently, the dual process obtained in this way is a pure jump Markov process on the discrete space \( {\mathbb {N}}^M \) with transition rate matrix \(Q=(q(\cdot ,\cdot ))\), the off-diagonal elements being the non-negative coefficients in (12) and the diagonal elements being chosen so that the sum of each row is 0. The alleged duality relationship is validated once the transition rates and the proper duality function are determined.

Consider the following proposal for the duality function, F. The inspiration for the proposal comes from the duality function for the one-locus Wright–Fisher diffusion with mutations, see e.g. Etheridge and Griffiths (2009) and Griffiths et al. (2016). It can be generalised to the multi-locus setting by taking

$$\begin{aligned} F({\mathbf {x}},{\mathbf {n}})= \frac{1}{k({\mathbf {n}})} \prod _{l=1}^{L}\prod _{i=1}^{M_l} (x_i^{(l)})^{n_i^{(l)}}, \end{aligned}$$
(13)

for some function \(k:{\mathbb {N}}^M\rightarrow {\mathbb {R}}{\setminus }\{0\}\) that is determined in the following. Note that the duality function \(F(\cdot ,{\mathbf {n}})\) defined in (13) belongs to \({\mathcal {C}}^\infty ({\mathcal {S}})\), for all \({\mathbf {n}}\in {\mathbb {N}}^M\), and thus it belongs to the domain of \({\mathcal {L}}\). Let \(\tilde{\mathbf {X}}\) be distributed according to the stationary distribution of the diffusion process \(\{{\mathbf {X}}(t)\}_{t\ge 0}\), when such a distribution exists. Then \({\mathbb {E}}\left[ {\mathcal {L}}F(\tilde{\mathbf {X}},{\mathbf {n}})\right] =0\). Therefore, by taking expectation under the stationary distribution in (12), it follows that

$$\begin{aligned} \sum _{{\hat{\mathbf {n}}}}q({\mathbf {n}},{\hat{{\mathbf {n}}}}) {\mathbb {E}}\left[ F(\tilde{\mathbf {X}},{\hat{{\mathbf {n}}}})-F(\tilde{\mathbf {X}},{\mathbf {n}})\right] =0, \end{aligned}$$

which implies that \({\mathbb {E}}\left[ F(\tilde{\mathbf {X}},\cdot )\right] \) must be constant. The constant can be taken to be equal to 1, and consequently,

$$\begin{aligned} k({\mathbf {n}})= {\mathbb {E}}\left[ \prod _{l=1}^{L}\prod _{i=1}^{M_l} (\tilde{X}_i^{(l)})^{n_i^{(l)}}\right] . \end{aligned}$$
(14)

Note that the existence of a stationary distribution for the diffusion is needed in order to define the function k. Thus, in the following, it is assumed that such a distribution exists. Furthermore, in order for the duality function F to be well defined, the function k needs to be non-zero, which holds if

$$\begin{aligned} {\mathbb {P}}(\tilde{X}_i^{(l)}=0)=0, \quad i=1,\ldots , M_l, l=1,\dots L. \end{aligned}$$
(15)

In many cases it is possible to verify that a stationary distribution exists and fulfils (15). For example, as shown in the previous sections, when the mutations are parent independent, the stationary density is known, see (10), and \(k({\mathbf {n}})\ne 0 \) for all \({\mathbf {n}}\in {\mathbb {N}}^M\). More generally, condition (15) is satisfied when the stationary distribution has a density with respect to the Lebesgue measure. Even if a stationary density is not known in an explicit form, classical techniques, see e.g. Khasminskii (1980), may be used to show its existence and properties, using the Fokker–Planck equation in Aurell et al. (2019).

A relevant case, in which (15) is not verified, is the case of no mutations, \(\theta =0\). Nevertheless, it is still possible to derive a dual process in this case by defining the function k in a simpler way that does not rely on a stationary distribution. The derivation of the dual process actually becomes simpler than the one outlined in this section. The case of no mutations is treated separately in Sect. 5, Corollary 1. Elsewhere in the paper it is assumed that a stationary distribution exists and satisfies (15).

To find the transition rates of the dual process, it remains to obtain an expression of the form (12). In fact, it is sufficient to obtain an expression of the form

$$\begin{aligned} {\mathcal {L}}F(\cdot , {\mathbf {n}})({\mathbf {x}})= \sum _{{\hat{{\mathbf {n}}}}\ne {\mathbf {n}}}q({\mathbf {n}},{\hat{{\mathbf {n}}}})F({\mathbf {x}},{\hat{{\mathbf {n}}}}) +q({\mathbf {n}},{\mathbf {n}})F({\mathbf {x}},{\mathbf {n}}), \end{aligned}$$
(16)

with the requirement that \(q({\mathbf {n}},{\hat{{\mathbf {n}}}})\) is positive for \( {\hat{{\mathbf {n}}}}\ne {\mathbf {n}}\) (it will be soon clear that \(q({\mathbf {n}},{\mathbf {n}})\) is thus negative). Once (16) is obtained, it is possible to derive expression (12) as follows. Rewriting (16) yields

$$\begin{aligned} {\mathcal {L}}F(\cdot , {\mathbf {n}})({\mathbf {x}})= \sum _{{\hat{{\mathbf {n}}}}\ne {\mathbf {n}}}q({\mathbf {n}},{\hat{{\mathbf {n}}}}) \left[ F({\mathbf {x}},{\hat{\mathbf {n}}}) -F({\mathbf {x}},{\mathbf {n}})\right] +\left( \sum _{{\hat{{\mathbf {n}}}}\ne {\mathbf {n}}}q({\mathbf {n}},{\hat{{\mathbf {n}}}})+q({\mathbf {n}},{\mathbf {n}})\right) F({\mathbf {x}},{\mathbf {n}}). \end{aligned}$$
(17)

Keeping in mind that \({\mathbb {E}}\left[ {\mathcal {L}}F(\cdot ,{\mathbf {n}})(\tilde{\mathbf {X}})\right] =0\) and that \({\mathbb {E}}\left[ F(\tilde{\mathbf {X}},\cdot )\right] \) is constant, one can apply the expectation with respect to the stationary distribution to get

$$\begin{aligned} \sum _{{\hat{{\mathbf {n}}}}\ne {\mathbf {n}}}q({\mathbf {n}},{\hat{{\mathbf {n}}}})+q({\mathbf {n}},{\mathbf {n}})=0. \end{aligned}$$
(18)

Therefore (17) implies (12) and it remains to write \({\mathcal {L}}F\) as in (16) by finding the positive coefficients \(q({\mathbf {n}},{\hat{{\mathbf {n}}}})\). Furthermore, (18) can be used to find a recursion formula for the function k.

Throughout the rest of the paper, the emphasis will be on obtaining an expression of the type (16). This approach is first illustrated in a simpler case (single locus), in order to lighten the formulas and highlight the ideas, and is subsequently used in the general case of the coupled Wright–Fisher diffusion. The simpler case turns out to be closely related to a well known model: the ancestral selection graph.

4 The ancestral selection graph

When only one locus is considered, the coupled Wright–Fisher diffusion is simply a one-locus Wright–Fisher diffusion with selection. Let \(L=1\) , \(M_1=2\) and assume that mutations are parent independent, i.e. \(u_{ij}=u_j \; \text {for } i,j=1,2\). The matrix of pairwise selection parameters is the zero matrix and the quadratic function V becomes linear

$$\begin{aligned} V({\mathbf {x}})=h_1 x_1 +h_2 x_2. \end{aligned}$$

Let j(i) be the index opposite to i,

$$\begin{aligned} j(i)= {\left\{ \begin{array}{ll} 2 &{} \text {if } i=1\\ 1 &{} \text {if } i=2 \end{array}\right. }. \end{aligned}$$

Then, the drift terms can be written as follows

$$\begin{aligned} \begin{aligned} \mu _i({\mathbf {x}})&= u_i x _{j(i)} -u_{j(i)}x_i, \\ g_i({\mathbf {x}})&= h_i x_i(1-x_i) - h_{j(i)} x_i x_{j(i)}, \quad i = 1,2. \end{aligned} \end{aligned}$$

The diffusion process solving (1) under the assumptions in this section is a two-types Wright–Fisher diffusion with selection and parent independent mutations. It is known that the genealogical process for this type of Wright–Fisher diffusion is embedded in a graph with coalescing and branching structure, the ancestral selection graph (ASG), studied by Krone and Neuhauser (1997) and Neuhauser and Krone (1997). In the ASG, first the coalescing-branching structure is constructed leaving types aside, then types and mutations are superimposed on it. In contrast, here it is assumed that the types of individuals in the sample \({\mathbf {n}}\) are known and mutations are included in the dual process rather than superimposed afterwards. Our approach is similar to the one in Etheridge and Griffiths (2009), where the authors derive a dual process for the finite population size Moran model and use it to find the limiting transition rates of the dual process for the diffusion.

Following the outline in Sect. 3, a dual process is derived as follows. By applying the generator \({\mathcal {L}}\) to the duality function F in (13), rewriting the derivatives of F, and rearranging the terms yields

$$\begin{aligned} {\mathcal {L}}F(\cdot ,{\mathbf {n}}) ({\mathbf {x}})&= \sum _{i=1,2} (u_i x _{j(i)} -u_{j(i)}x_i ) \frac{n_i}{x_i} F({\mathbf {x}},{\mathbf {n}}) \\&\quad + \sum _{i=1,2} x_i(h_i - h_i x_i - h_{j(i)}x_{ j(i)}) \frac{n_i}{x_i} F({\mathbf {x}},{\mathbf {n}}) \\&\quad + \frac{1}{2} \sum _{i=1,2} x_i(1-x_i) \frac{n_i(n_i-1)}{(x_i)^2} F({\mathbf {x}},{\mathbf {n}}) -x_1 x_2 \frac{n_1n_2}{x_1 x_2}F({\mathbf {x}},{\mathbf {n}}) \\&= \sum _{i=1,2} \frac{n_i(n_i-1)}{2} \frac{1}{x_i} F({\mathbf {x}},{\mathbf {n}}) + \sum _{i=1,2} u_i n_i \frac{x_{j(i)}}{x_i} F({\mathbf {x}},{\mathbf {n}}) \\&\quad - \sum _{i=1,2} h_i (n_i +n _{j(i)}) x_i F({\mathbf {x}},{\mathbf {n}})\\&\quad - \left\{ \frac{n}{2}(n-1) +\sum _{i=1,2}n_i u_{j(i)}-\sum _{i=1,2}n_i h_i \right\} F({\mathbf {x}},{\mathbf {n}}), \end{aligned}$$

where \(n=n_1+n_2\). To obtain an expression of the form (16) the expression in the last display can be rewritten as follows. First replace \(x_i=1-x_{j(i)}\) to obtain positive coefficients for the selection terms, then use the identities, for \(i=1,2\),

$$\begin{aligned} \frac{1}{x_i} F({\mathbf {x}}, {\mathbf {n}})&= \frac{k(\mathbf {n-e}_i)}{k({\mathbf {n}})} F({\mathbf {x}},\mathbf {n-e}_i), \end{aligned}$$
(19)
$$\begin{aligned} \frac{x_{j(i)}}{x_i} F({\mathbf {x}},{\mathbf {n}})&= \frac{k(\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i)}{k({\mathbf {n}})} F({\mathbf {x}},\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i), \end{aligned}$$
(20)
$$\begin{aligned} x_i F({\mathbf {x}},{\mathbf {n}})&= \frac{k(\mathbf {n+e}_i)}{k({\mathbf {n}})} F({\mathbf {x}},\mathbf {n+e}_i), \end{aligned}$$
(21)

where \({\mathbf {e}}_i\), \(i=1,2\), are the unit vectors in \({\mathbb {N}}^2\). Finally, it yields,

$$\begin{aligned} \begin{aligned} {\mathcal {L}}F(\cdot ,{\mathbf {n}})({\mathbf {x}})&= \sum _{i=1,2} \frac{n_i(n_i-1)}{2} \frac{k(\mathbf {n-e}_i)}{k({\mathbf {n}})} F({\mathbf {x}},\mathbf {n-e}_i) \\&\quad + \sum _{i=1,2} u_i n_i \frac{k(\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i)}{k({\mathbf {n}})} F({\mathbf {x}},\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i) \\&\quad + \sum _{i=1,2} h_{j(i)} n \frac{k(\mathbf {n+e}_i)}{k({\mathbf {n}})} F({\mathbf {x}},\mathbf {n+e}_i) \\&\quad - \left\{ \frac{n}{2}(n-1) +\sum _{i=1,2}n_i u_{j(i)} +\sum _{i=1,2} n_{j(i)} h_i \right\} F({\mathbf {x}},{\mathbf {n}}), \end{aligned} \end{aligned}$$
(22)

which is the desired expression. As demonstrated in Sect. 3 the transition rates of a dual process can be identified directly from this expression. Therefore the dual process for the Wright–Fisher diffusion considered in this section, with respect to F, is the pure jump Markov process on the state space \({\mathbb {N}}^2\), with transition rates as follows. The dual process, in state \({\mathbf {n}}\), jumps to state

  • \(\mathbf {n-e}_i\), \(i=1,2\), s.t. \( n_i\ge 2\), at rate

    $$\begin{aligned} q({\mathbf {n}},\mathbf {n-e}_i) = \frac{n_i(n_i-1)}{2} \frac{k(\mathbf {n-e}_i)}{k({\mathbf {n}})}; \end{aligned}$$

    [coalescence]

  • \(\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i\), \(i=1,2\), s.t. \( n_i\ge 1\), at rate

    $$\begin{aligned} q({\mathbf {n}},\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i) = u_i n_i \frac{k(\mathbf {n+e}_{j(i)}-{\mathbf {e}}_i)}{k({\mathbf {n}})}; \end{aligned}$$

    [mutation]

  • \(\mathbf {n+e}_i\), \(i=1,2\), at rate

    $$\begin{aligned} q({\mathbf {n}},\mathbf {n+e}_i) = h_{j(i)} n \frac{k(\mathbf {n+e}_i)}{k({\mathbf {n}})}. \end{aligned}$$

    [branching]

As anticipated, the dual process just described corresponds to the limiting process in Etheridge and Griffiths (2009), which is the block counting process of the ancestral selection graph with types and mutations included in the backward evolution. From the transition rates q, it is observed that three types of events are possible for the dual process: mutation, coalescence and branching. The first two appear also in the Kingman coalescent, while the latter is a virtual addition to the true genealogical process that is characteristic of the ASG. Seen forward in time, branching represents the event that two potential parents are chosen and only the one carrying the advantageous allele reproduces. Backward in time, when a branching happens, the individual splits into two individuals: its true parent and its virtual (potential) parent.

To complete the identification of the transition rates, \(q({\mathbf {n}},{\mathbf {n}})\) is defined as the coefficient of \(F({\mathbf {x}},{\mathbf {n}})\) in (22),

$$\begin{aligned} q({\mathbf {n}},{\mathbf {n}})=-\frac{n}{2}(n-1) -\sum _{i=1,2}n_i u_{j(i)} -\sum _{i=1,2} n_{j(i)} h_i . \end{aligned}$$

The equality (18) ensures that the sum of each row of the transition matrix is equal to zero. Furthermore, by rewriting (18) in terms of the function k, a recursion formula can be obtained as in Krone and Neuhauser (1997, Thm 5.2). The formula, which we omit, is not useful in general to compute k explicitly, and even in the simpler case of no selection, where the formula, in principle, could be used, it is computationally too expensive for practical purposes.

In general it is not possible to find a closed-form expression for k and thus for the transition rates. However, when the mutations are parent independent, as in this example, the stationary density is explicitly known up to a normalising constant Z and thus k can be written as an integral with respect to the stationary density

$$\begin{aligned} k({\mathbf {n}})= \frac{1}{Z} \int _0^1 \!\!\!\! x^{n_1+ 2u_1-1}(1-x)^{n_2+2u_2-1} e^{ 2[h_1 x + h_2 (1-x)]} dx. \end{aligned}$$

The integral above cannot be computed analytically but it is related to the confluent hypergeometric function of the first kind, the Kummer function, which can be efficiently computed numerically. The idea of using the Kummer function originates from Aurell et al. (2019) and Krone and Neuhauser (1997), where it has been used to find, respectively, a series representation for the normalising constant and a representation for the expected allele frequency. Let \({}_{1}F_1\) be the confluent hypergeometric function, then, using its integral representation, it yields

$$\begin{aligned} k({\mathbf {n}})&= \frac{1}{Z} e^{2 h_2} B(n_1+2u_1,n_2+2u_2) {}_{1}F_1\left( n_1+2u_1,n_1+2u_1+n_2+2u_2,2(h_1-h_2)\right) , \\ Z&= e^{2 h_2} B(2u_1,2u_2) {}_{1}F_1(2u_1 , 2u_1 + 2u_2 , 2(h_1-h_2)), \end{aligned}$$

where B is the Beta function. See Abramowitz and Stegun (1970) for a complete collection of definitions and properties of confluent hypergeometric functions.

5 A multi-locus dual process

In this section a dual process for the coupled Wright–Fisher diffusion is derived in the general multi-locus setting, \(L \ge 1\) and \(M_l \ge 2\), \(l=1, \ldots , L\).

Theorem 1

Let \({\mathbf {X}}\) be the coupled Wright–Fisher diffusion with generator (9), where \(\mu \), g and d are given by (7), (6) and (8), respectively. Assume a stationary distribution for the diffusion exists and satisfies (15). Let k be given by (14) and let the duality function F be given by (13). Then there exists a dual process \({\mathbf {N}}\) for \({\mathbf {X}}\), in the sense of (3), with respect to the duality function F, where \({\mathbf {N}}\) is the pure jump Markov process on the state space \({\mathbb {N}}^M\) with the following transition rates. From the current state, \({\mathbf {n}}\in {\mathbb {N}}^M{\setminus }\{0\}\), \({\mathbf {N}}\) jumps to

  • \({\mathbf {n}}- {\mathbf {e}}_i^{(l)}, i=1,\dots M_l, l=1,\ldots , L,\) s.t. \(n_i^{(l)}\ge 2\), at rate

    $$\begin{aligned} q({\mathbf {n}}, {\mathbf {n}}- {\mathbf {e}}_i^{(l)} )= \frac{n_i^{(l)}(n_i^{(l)}-1)}{2} \frac{k({\mathbf {n}}- {\mathbf {e}}_i^{(l)})}{k({\mathbf {n}})} ; \end{aligned}$$

    [coalescence]

  • \({\mathbf {n}}- {\mathbf {e}}_i^{(l)} + {\mathbf {e}}_j^{(l)} , i,j=1,\dots M_l, l=1,\ldots , L,\) s.t. \(i\ne j,n_i^{(l)}\ge 1\), at rate

    $$\begin{aligned} q({\mathbf {n}}, {\mathbf {n}}- {\mathbf {e}}_i^{(l)}+ {\mathbf {e}}_j^{(l)} )= n_i^{(l)}u_{ji}^{(l)} \frac{k({\mathbf {n}}- {\mathbf {e}}_i^{(l)}+ {\mathbf {e}}_j^{(l)} )}{k({\mathbf {n}})} ; \end{aligned}$$

    [mutation]

  • \({\mathbf {n}}+ {\mathbf {e}}_j^{(l)}, j=1,\dots M_l, l=1,\ldots , L,\) at rate

    $$\begin{aligned} q({\mathbf {n}}, {\mathbf {n}}+ {\mathbf {e}}_j^{(l)} )= \left( n^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)} +\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{i=1}^{M_r}n_i^{(r)}J_{ji}^{(lr)} \right) \frac{k({\mathbf {n}}+ {\mathbf {e}}_j^{(l)})}{k({\mathbf {n}})} ; \end{aligned}$$

    [single-branching]

  • \({\mathbf {n}}+ {\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)}, j=1\dots M_l, h=1\dots ,M_r, l,r=1,\ldots , L, r>l,\) at rate

    $$\begin{aligned} q({\mathbf {n}}, {\mathbf {n}}+ {\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)} )= \left( (n^{(l)}+n^{(r)}) \sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}\sum _{\begin{array}{c} m=1\\ m\ne h \end{array}}^{M_r} J_{km}^{(lr)} \right) \frac{k({\mathbf {n}}+ {\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)})}{k({\mathbf {n}})} ; \end{aligned}$$

    [double-branching] where \(n^{(l)}=\left\Vert {\mathbf {n}}^{(l)}\right\Vert _1=\sum _{i=1}^{M_l}n_i^{(l)}\). Furthermore,

    $$\begin{aligned} q({\mathbf {n}},{\mathbf {n}}) = - \sum _{l=1}^{L} \left( \frac{1}{2}n^{(l)}(n^{(l)}-1) + \sum _{\begin{array}{c} i,j=1\\ i\ne j \end{array}}^{M_l} n_i^{(l)}u_{ij}^{(l)} + \sum _{i=1}^{M_l}\sum _{\begin{array}{c} k=1\\ k\ne i \end{array}}^{M_l} n_{i}^{(l)}h_{k}^{(l)} +\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n^{(l)}J_{km}^{(lr)} \right) . \end{aligned}$$

Note that the mutation and coalescence jumps involve one locus at the time. The coalescence and mutation rates are similar to the transition rates of the Kingman coalescent process with mutations, the only difference being the function k which, despite having the same structure, is based on a different stationary density and depends on all the loci, not only on the one where the jump takes place. The single-branching rate not only contains the single-locus selection parameters in a form that generalises the rates in Sect. 4, but it also contains the two-locus selection parameters to include the effect of the pairwise interaction on the single locus. The single-branching also involve only one locus at the time. Finally, the double-branching rate reflects the particular structure of pairwise interactions of the coupled Wright–Fisher diffusion and it is, to the best of our knowledge, a novel type of transition rate appearing in genealogical processes related to Wright–Fisher diffusions. The double-branching represents simultaneous branching at two different loci. As anticipated in the introduction, the dual process can thus be interpreted as the block counting process of L coupled ASGs. Furthermore, if \(J=0\), the loci are independent since \(\nabla V={\mathbf {h}}\), and thus double-branching does not occur and the dual process consists of L independent ASGs, as in Etheridge and Griffiths (2009) with \(-2 \sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)} \) corresponding to \(\sigma _j \).

The explicit parts of the transition rates, not depending on the function k, have a very natural interpretation. As in the simpler case studied in Sect. 4, the basic principle is that weak types branch at a higher rate. The difference is that, while in the simpler case there are only two types, a viable type and a weaker type, here there are many types and many loci all influencing each other’s branching rates. To understand this behaviour in greater detail, some terms will be investigated more thoroughly. The term

$$\begin{aligned} n^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)}, \end{aligned}$$

arises purely from the one-locus selection and contributes to the rate of adding a gene of type j at locus l. It depends on the one-locus viability of the other allele types (all except type j) at locus l, the higher their viability, the higher the rate of adding type j, and of course it is also directly proportional to the number, \(n^{(l)}\), of genes at locus l.

The rate of adding a couple of genes of type j at locus l and of type h at locus r is related to the term

$$\begin{aligned} (n^{(l)}+n^{(r)}) \sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}\sum _{\begin{array}{c} m=1\\ m\ne h \end{array}}^{M_r} J_{km}^{(lr)}. \end{aligned}$$

It depends on the viability of the other couples of allele types (all except couple jh) at loci l and r, the higher their viability, the higher the rate of adding type j and h at locus l and r, respectively. Again the rate is directly proportional to the number, \(n^{(l)}+n^{(r)}\), of genes at loci l and r.

Although the interpretation of some parts of the transition rates is straightforward, the function k remains implicit, similar to the simpler Kingman coalescent process and the ancestral selection graph with parent dependent mutations. When the mutations are parent independent, the stationary density is known up to a normalising constant and k can be expressed as an integral that sometimes can be easily computed numerically, see Sects. 4 and 7, where a series representation of k involving Kummer and Beta functions will be given. Nevertheless, even when the stationary distribution is not explicitly known, but still exists, Theorem 1 provides information on the structure of the transition rates of the dual process that may be useful. As explained in the introduction, many established inference methods for populations under various generalisations of the Wright–Fisher diffusion rely on approximating the backward dynamics of the associated genealogical process. Deriving a dual process for the coupled Wright–Fisher diffusion is central to further investigations concerning the genealogy of a sample and possibly provides a basis for the construction of inference methods inspired by the existing methods described in the introduction.

In general, if the transition rates are not known explicitly, it might seem difficult to provide bounds for the dual process. However, its growth is controlled by a much simpler Markov chain. Indeed, the process \( \left\Vert {\mathbf {N}}(t)\right\Vert _1\) is a jump process on \({\mathbb {N}}{\setminus }\{0\}\), with possible jumps: \(+2\), representing double-branching, \(+1\), representing single-branching and \(-1\), representing coalescence. As is typical of genealogical processes appearing in population genetics, the rate of negative (coalescent) jumps is at least quadratic and the rate of positive (branching) jumps is at most linear, as shown in details in (38) in the “Appendix”. This allows to construct a monotone coupling to bound the jump chain of \(\left\Vert {\mathbf {N}}\right\Vert _1\) by a simpler Markov chain which is reported in the “Appendix” as it could be useful for future work.

This section concludes with the extension of Theorem 1 to the case of no mutations, Corollary 1, and with two applications of the duality relationship. The first one is useful to derive an expansion of the transition density of the diffusion and the second one to study, in the absence of mutations, fixation/extinction probabilities of allele types.

The duality relation (3), which follows from Theorem 1, can be rewritten as

$$\begin{aligned} {\mathbb {E}}\left[ S({\mathbf {X}}(t),{\mathbf {n}})|{\mathbf {X}}(0)={\mathbf {x}}\right] = {\mathbb {E}}\left[ S({\tilde{{\mathbf {X}}}},{\mathbf {n}})\right] \sum _{{\mathbf {m}}\in {\mathbb {N}}^M} p_{{\mathbf {n}},{\mathbf {m}}}(t)F({\mathbf {x}},{\mathbf {m}}), \end{aligned}$$

where \(S({\mathbf {x}},{\mathbf {n}})\) is the probability mass function of L independent multinomial random variables with parameters \({\mathbf {x}}^{(1)},\ldots ,{\mathbf {x}}^{(L)}\) and \(n^{(1)},\ldots ,n^{(L)}\), and \(p_{{\mathbf {n}},{\mathbf {m}}}(t)\) are the transition densities of the dual process \({\mathbf {N}}\). By applying sample inversion as \(n\rightarrow \infty \), as in Etheridge and Griffiths (2009); Griffiths et al. (2016), an expansion for the transition density of \({\mathbf {X}}\) can be obtained in terms of the limit of the transition densities of \({\mathbf {N}}\), the stationary density of \({\mathbf {X}}\) and the duality function F. This corresponds to identifying the distribution of \({\mathbf {X}}(t)\) from its moments. The derivation of the expansion is essentially similar to the one in Griffiths et al. (2016), a rigorous proof is left to future work.

In the case of no mutation, \(\theta _l=0\), \(l=1,\ldots ,L\), the boundaries are absorbing for the diffusion \({\mathbf {X}}\). Any distribution that puts all its mass on one allele type for each locus is an invariant distribution for the diffusion but does not satisfy assumption (15). Nevertheless, as anticipated in Sect. 3, the derivation of a dual process in this case is simpler than in the presence of mutations, as there is no need of relying on invariant distributions to define the duality function. In fact, the duality function can be defined explicitly by defining the function k to be equal to a product of multivariate Beta functions,

$$\begin{aligned} k({\mathbf {n}})=\prod _{l=1}^L B({\mathbf {n}}^{(l)}) \quad \text {with} \quad B({\mathbf {n}}^{(l)})=\frac{\prod _{i=1}^{M_l} \varGamma (n_i^{(l)})}{\varGamma (n^{(l)})} . \end{aligned}$$
(23)

To get an intuition on why k is defined in this way, note that in the neutral one-locus model with parent independent mutations \(k({\mathbf {n}})=\frac{B({\mathbf {n}}+2{\mathbf {u}})}{B(2 {\mathbf {u}})}\), where \({\mathbf {u}}\) is the vector of mutation transition rates with \(u_j=\theta P_{ij}\), and, as \(\theta \rightarrow 0\), \(k({\mathbf {n}})\) converges to \(B({\mathbf {n}})\). Using definition (23) for k, the transition rates of the dual process can be derived from those in Theorem 1, see Sect. 6 for more details, to obtain the following

Corollary 1

Let \({\mathbf {X}}\) be defined as in Theorem 1 and assume \(\theta _l=0\), \(l=1,\ldots ,L\). Then there exists a dual process \({\mathbf {N}}\) for \({\mathbf {X}}\), in the sense of (3), with respect to the duality function \(F({\mathbf {x}},{\mathbf {n}})=\prod _{l=1}^L\frac{1}{B({\mathbf {n}}^{(l)})}\prod _{i=1}^{M_l} (x_{i}^{(l)})^{n_i^{(l)}}\). \({\mathbf {N}}\) is the pure jump Markov process on the state space \({\mathbb {N}}^M\) with transition rates

  • \( q({\mathbf {n}}, {\mathbf {n}}- {\mathbf {e}}_i^{(l)} )= \frac{n_i^{(l)}(n^{(l)}-1)}{2} , \quad i=1,\dots M_l, l=1,\ldots , L ; \)

  • \( q({\mathbf {n}}, {\mathbf {n}}+ {\mathbf {e}}_j^{(l)} )= n_j^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)} +\frac{n_j^{(l)}}{n^{(l)}}\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{i=1}^{M_r}n_i^{(r)}J_{ji}^{(lr)} , \quad j=1,\dots M_l, l=1,\ldots , L; \)

  • \( q({\mathbf {n}}, {\mathbf {n}}+ {\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)} )= (n^{(l)}+n^{(r)}) \frac{n_j^{(l)}n_h^{(r)}}{n^{(l)}n^{(r)}} \sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}\sum _{\begin{array}{c} m=1\\ m\ne h \end{array}}^{M_r} J_{km}^{(lr)} , \quad j=1\dots M_l, h=1\dots ,M_r,\) \( l,r=1,\ldots , L, r>l. \)

Note that, due to the absence of mutations, the transition rates in this case are fully explicit. By Corollary 1, the duality relationship (3) can be rewritten as

$$\begin{aligned} {\mathbb {E}}\left[ \prod _{l=1}^L\prod _{i=1}^{M_l} X_{i}^{(l)}(t)^{n_i^{(l)}}|{\mathbf {X}}(0)={\mathbf {x}}\right] = {\mathbb {E}}\left[ \prod _{l=1}^L \frac{B({\mathbf {n}}^{(l)})}{B({\mathbf {N}}^{(l)}(t))} \prod _{i=1}^{M_l} (x_{i}^{(l)})^{N_i^{(l)}(t)} |{\mathbf {N}}(0)={\mathbf {n}}\right] , \end{aligned}$$
(24)

letting \(t \rightarrow \infty \) enables the study of fixation/extinction probabilities of allele types, as in e.g. Etheridge and Griffiths (2009), Foucart (2013), González Casanova and Spanó (2018), Griffiths et al. (2016) and Mano (2009).

6 Proofs of the main results

6.1 Proof of Theorem 1

Following the outline in Sect. 3, a dual process is derived as follows. By applying the generator \({\mathcal {L}}\) to the duality function F in (13) each term in the expression for \({\mathcal {L}}F\) is treated separately. As in Sect. 4, the terms arising from mutation and diffusion can be easily rewritten in the required form. Summing the mutation terms over allele types at locus l yields

$$\begin{aligned} \begin{aligned} \sum _{i=1}^{M_l} \mu _i^{(l)}({\mathbf {x}}^{(l)}) \frac{\partial F}{\partial x_i^{(l)} }({\mathbf {x}},{\mathbf {n}}) = \sum _{\begin{array}{c} i,j=1\\ i\ne j \end{array}}^{M_l} n_i^{(l)}u_{ji}^{(l)} \frac{x_j^{(l)}}{x_i^{(l)}}F({\mathbf {x}},{\mathbf {n}}) - \sum _{\begin{array}{c} i,j=1\\ i\ne j \end{array}}^{M_l} n_i^{(l)}u_{ij}^{(l)} F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$

Using identity (20) at locus l the mutation terms can be rewritten in the desired form

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^{M_l} \mu _i^{(l)}({\mathbf {x}}^{(l)}) \frac{\partial F}{\partial x_i^{(l)} }({\mathbf {x}},{\mathbf {n}}) \\&\quad = \sum _{\begin{array}{c} i,j=1\\ i\ne j \end{array}}^{M_l} n_i^{(l)}u_{ji}^{(l)} \frac{k({\mathbf {n}}-{\mathbf {e}}_i^{(l)}+{\mathbf {e}}_j^{(l)})}{k({\mathbf {n}})} F({\mathbf {x}},{\mathbf {n}}-e_i^{(l)}+e_j^{(l)}) - \sum _{\begin{array}{c} i,j=1\\ i\ne j \end{array}}^{M_l} n_i^{(l)}u_{ij}^{(l)} F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$
(25)

For the diffusion terms, the diagonal and off-diagonal terms are written separately as

$$\begin{aligned} d_{ii}^{(l)} ({\mathbf {x}}^{(l)}) \frac{\partial ^2F}{\partial x_i^{(l) 2} } ({\mathbf {x}},{\mathbf {n}})&= n_i^{(l)}(n_i^{(l)}-1) \frac{1}{x_i^{(l)}} F({\mathbf {x}},{\mathbf {n}}) - n_i^{(l)}(n_i^{(l)}-1)F({\mathbf {x}},{\mathbf {n}}), \\ d_{ij}^{(l)} ({\mathbf {x}}^{(l)})\frac{\partial ^2F}{\partial x_i^{(l)} \partial x_j^{(l)} }({\mathbf {x}},{\mathbf {n}})&= -n_i^{(l)}n_j^{(l)}F({\mathbf {x}},{\mathbf {n}}), \quad i \ne j. \end{aligned}$$

Summing the diffusion terms at locus l and rearranging yields

$$\begin{aligned} \frac{1}{2}\sum _{i,j=1}^{M_l} d_{ij}^{(l)}({\mathbf {x}}^{(l)}) \frac{\partial ^2F}{\partial x_i^{(l)} \partial x_j^{(l)} } ({\mathbf {x}},{\mathbf {n}})&= \sum _{i=1}^{M_l} \frac{n_i^{(l)}(n_i^{(l)}-1)}{2} \frac{1}{x_i^{(l)}} F({\mathbf {x}},{\mathbf {n}})\\&\quad - \frac{1}{2}n^{(l)}(n^{(l)}-1)F({\mathbf {x}},{\mathbf {n}}). \end{aligned}$$

Now use identity (19) at locus l to obtain

$$\begin{aligned} \begin{aligned} \frac{1}{2}\sum _{i,j=1}^{M_l} d_{ij}^{(l)}({\mathbf {x}}^{(l)}) \frac{\partial ^2F}{\partial x_i^{(l)} \partial x_j^{(l)} } ({\mathbf {x}},{\mathbf {n}})&= \sum _{i=1}^{M_l} \frac{n_i^{(l)}(n_i^{(l)}-1)}{2} \frac{k({\mathbf {n}}-{\mathbf {e}}_i^{(l)})}{k({\mathbf {n}})} F({\mathbf {x}},{\mathbf {n}}-{\mathbf {e}}_i^{(l)})\\&\quad - \frac{1}{2}n^{(l)}(n^{(l)}-1)F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$
(26)

Next, consider the interaction terms. Using the definition of \(g^{(l)}\) and rewriting the derivatives of F yields,

$$\begin{aligned} \begin{aligned} \sum _{i=1}^{M_l} g_i^{(l)}({\mathbf {x}}) \frac{\partial F}{\partial x_i^{(l)} }({\mathbf {x}},{\mathbf {n}})&= \underbrace{ \sum _{i=1}^{M_l} \sum _{k=1}^{M_l} n_{i}^{(l)}h_{k}^{(l)}(\delta _{ik}-x_{k}^{(l)})F({\mathbf {x}},{\mathbf {n}}) }_{S_1}\\&\quad + \underbrace{ \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{i=1}^{M_l}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n_{i}^{(l)}J_{km}^{(lr)}(\delta _{ik}-x_{k}^{(l)})x_m^{(r)}F({\mathbf {x}},{\mathbf {n}}) }_{S_2}. \end{aligned} \end{aligned}$$
(27)

Note that the first group of sums, \(S_1\), contains the one-locus selection parameters while the second, \(S_2\), contains the pairwise selection parameters. Each of them will be treated separately. The one-locus selection term can be rearranged into

$$\begin{aligned} \begin{aligned} S_1= \sum _{i=1}^{M_l} n_{i}^{(l)}h_{i}^{(l)}F({\mathbf {x}},{\mathbf {n}}) - \sum _{i=1}^{M_l} \sum _{k=1}^{M_l} n_{i}^{(l)}h_{k}^{(l)}x_{k}^{(l)}F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$

As in Sect. 4, the fact that the sum of the frequencies at each locus equals one is used. Since \(x_{k}^{(l)}=1- \sum _{\begin{array}{c} j=1\\ j\ne k \end{array}}^{M_l}x_{j}^{(l)}\), the terms can be rearranged to obtain

$$\begin{aligned} \begin{aligned} S_1= -\sum _{i=1}^{M_l} \sum _{\begin{array}{c} k=1\\ k\ne i \end{array}}^{M_l} n_{i}^{(l)}h_{k}^{(l)}F({\mathbf {x}},{\mathbf {n}}) + \sum _{j=1}^{M_l} \left( n^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)} \right) x_{j}^{(l)}F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$

The second part of (27) can be expressed as

$$\begin{aligned} \begin{aligned} S_2 = \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{i=1}^{M_l}\sum _{m=1}^{M_r} n_{i}^{(l)}J_{im}^{(lr)}x_m^{(r)}F({\mathbf {x}},{\mathbf {n}}) - \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n^{(l)}J_{km}^{(lr)}x_{k}^{(l)}x_m^{(r)}F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$

This time the equality

$$\begin{aligned} -x_k^{(l)}x_m^{(r)}= -1+ \sum _{j=1}^{M_l} \sum _{h=1}^{M_r} (1-\delta _{hm}\delta _{jk})x_j^{(l)}x_h^{(r)}, \end{aligned}$$
(28)

will be used. To see that (28) holds, the fact that the frequencies sum up to one at each locus is used multiple times, as follows,

$$\begin{aligned} \begin{aligned} -x_k^{(l)}x_m^{(r)}&= -x_k^{(l)}\left( 1-\sum _{\begin{array}{c} h=1\\ h\ne m \end{array}}^{M_r}x_h^{(r)} \right) \\&= -x_k^{(l)} + \sum _{h=1}^{M_r}(1-\delta _{hm})x_k^{(l)}x_h^{(r)}\\&= -1 + \sum _{\begin{array}{c} j=1\\ j\ne k \end{array}}^{M_l}x_j^{(l)} \cdot \sum _{h=1}^{M_r}x_h^{(r)} +\sum _{h=1}^{M_r}(1-\delta _{hm})x_k^{(l)}x_h^{(r)}\\&= -1 +\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} (1-\delta _{jk}) x_j^{(l)}x_h^{(r)} +\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} \delta _{jk}(1-\delta _{hm}) x_k^{(l)}x_h^{(r)}\\&= -1+ \sum _{j=1}^{M_l} \sum _{h=1}^{M_r} (1-\delta _{hm}\delta _{jk})x_j^{(l)}x_h^{(r)}. \end{aligned} \end{aligned}$$

Applying (28) in the expression for \(S_2\) and rearranging the terms, leads to

$$\begin{aligned} \begin{aligned} S_2&= - \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n^{(l)}J_{km}^{(lr)}F({\mathbf {x}},{\mathbf {n}}) + \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{m=1}^{M_r} \left( \sum _{i=1}^{M_l} n_{i}^{(l)}J_{im}^{(lr)}\right) x_m^{(r)}F({\mathbf {x}},{\mathbf {n}})\\&\quad + \sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} \left( n^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}\sum _{\begin{array}{c} m=1\\ m\ne h \end{array}}^{M_r} J_{km}^{(lr)} \right) x_{j}^{(l)}x_h^{(r)}F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$

Summing over l and putting similar terms together yields

$$\begin{aligned} \begin{aligned}&\sum _{l=1}^{L} \sum _{i=1}^{M_l} g_i^{(l)}({\mathbf {x}}) \frac{\partial F}{\partial x_i^{(l)} }({\mathbf {x}},{\mathbf {n}}) \\&\quad = - \left( \sum _{l=1}^{L}\sum _{i=1}^{M_l}\sum _{\begin{array}{c} k=1\\ k\ne i \end{array}}^{M_l} n_{i}^{(l)}h_{k}^{(l)} +\sum _{l=1}^{L}\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n^{(l)}J_{km}^{(lr)} \right) F({\mathbf {x}},{\mathbf {n}}) \\&\qquad + \sum _{l=1}^{L}\sum _{j=1}^{M_l} \left( n^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)} +\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{i=1}^{M_r}n_i^{(r)}J_{ji}^{(lr)} \right) x_{j}^{(l)}F({\mathbf {x}},{\mathbf {n}}) \\&\qquad + \sum _{l=1}^{L}\sum _{\begin{array}{c} r=1\\ r> l \end{array}}^{L}\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} \left( (n^{(l)}+n^{(r)}) \sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}\sum _{\begin{array}{c} m=1\\ m\ne h \end{array}}^{M_r} J_{km}^{(lr)} \right) x_{j}^{(l)}x_h^{(r)}F({\mathbf {x}},{\mathbf {n}}). \end{aligned} \end{aligned}$$

Use the identities (21) at locus l and

$$\begin{aligned} \begin{aligned} x_{j}^{(l)}x_h^{(r)}F({\mathbf {x}},{\mathbf {n}})= \frac{k({\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)})}{k({\mathbf {n}})} F({\mathbf {x}},{\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)}) \end{aligned} \end{aligned}$$

for the mixed terms involving loci l and r, in order to rewrite the selection terms in the desired form

$$\begin{aligned}&\sum _{l=1}^{L} \sum _{i=1}^{M_l} g_i^{(l)}({\mathbf {x}}) \frac{\partial F}{\partial x_i^{(l)} }({\mathbf {x}},{\mathbf {n}}) =\nonumber \\&\quad - \left( \sum _{l=1}^{L}\sum _{i=1}^{M_l}\sum _{\begin{array}{c} k=1\\ k\ne i \end{array}}^{M_l} n_{i}^{(l)}h_{k}^{(l)} +\sum _{l=1}^{L}\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n^{(l)}J_{km}^{(lr)} \right) F({\mathbf {x}},{\mathbf {n}}) \nonumber \\&\quad + \sum _{l=1}^{L}\sum _{j=1}^{M_l} \left( n^{(l)}\sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}h_{k}^{(l)} +\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{i=1}^{M_r}n_i^{(r)}J_{ji}^{(lr)} \right) \frac{k({\mathbf {n}}+{\mathbf {e}}_j^{(l)})}{k({\mathbf {n}})} F({\mathbf {x}},{\mathbf {n}}+{\mathbf {e}}_j^{(l)}) \nonumber \\&\quad + \sum _{l=1}^{L}\sum _{\begin{array}{c} r=1\\ r> l \end{array}}^{L}\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} \left( (n^{(l)}+n^{(r)}) \sum _{\begin{array}{c} k=1\\ k\ne j \end{array}}^{M_l}\sum _{\begin{array}{c} m=1\\ m\ne h \end{array}}^{M_r} J_{km}^{(lr)} \right) \frac{k({\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)})}{k({\mathbf {n}})} F({\mathbf {x}},{\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)}). \nonumber \\ \end{aligned}$$
(29)

The terms arising from mutation (25), diffusion (26) and selection (29) are now written in form (16). It is finally possible to identify the transition rates of the dual process.

In order to complete the proof, the method of duality is applied, more precisely, Corollary 4.4.13 in Ethier and Kurtz (1986), which amounts to verifying the following conditions:

$$\begin{aligned} F({\mathbf {X}}(t),{\mathbf {n}}) -\int _0^t {\mathcal {L}}F(\cdot ,{\mathbf {n}})({\mathbf {X}}(s))ds \quad \text {and} \quad F({\mathbf {x}},{\mathbf {N}}(t)) - \int _0^t {\mathcal {L}}^D F({\mathbf {x}},\cdot )({\mathbf {N}}(s))ds \end{aligned}$$

are martingales and, for each \(T > 0\), there exists an integrable random variable \(\varGamma _T\) such that

$$\begin{aligned}&\sup _{0\le s,t\le T} \left| F({\mathbf {X}}(s),{\mathbf {N}}(t))\right| \le \varGamma _T, \end{aligned}$$
(30)
$$\begin{aligned}&\sup _{0\le s,t\le T} \left| {\mathcal {L}}F(\cdot ,{\mathbf {N}}(t))({\mathbf {X}}(s))\right| =\sup _{0\le s,t\le T} \left| {\mathcal {L}}^D F({\mathbf {X}}(s),\cdot )({\mathbf {N}}(t))\right| \le \varGamma _T, \end{aligned}$$
(31)

almost surely. First note that, as discussed in Sect. 3, \(F(\cdot ,{\mathbf {n}})\) belongs to the domain of \({\mathcal {L}}\), for all \({\mathbf {n}}\in {\mathbb {N}}^M\), and the integrability conditions (30)–(31) ensure that \(F({\mathbf {x}},\cdot )\) belongs to the domain of \({\mathcal {L}}^D\), for all \({\mathbf {x}}\in {\mathcal {S}}\), and that the processes \(F({\mathbf {X}}(t),{\mathbf {n}}) -\int _0^t {\mathcal {L}}F(\cdot ,{\mathbf {n}})({\mathbf {X}}(s))ds\) and \(F({\mathbf {x}},{\mathbf {N}}(t)) -\int _0^t {\mathcal {L}}^D F({\mathbf {x}},\cdot )({\mathbf {N}}(s))ds\) are integrable. The martingale property trivially holds for both processes.

In order to complete the proof, following Barbour et al. (2000) to verify (30)–(31), it is sufficient to show that there exists a function \(H: {\mathbb {N}}^M\rightarrow [0,\infty )\) such that

$$\begin{aligned} F({\mathbf {x}},{\mathbf {n}}) + \left| \mathcal {L^D}F({\mathbf {x}},\cdot )({\mathbf {n}})\right| \le H({\mathbf {n}}), \quad \forall ({\mathbf {x}},{\mathbf {n}})\in {\mathcal {S}}\times {\mathbb {N}}^M, \end{aligned}$$
(32)

and \(\left\{ H({\mathbf {N}}(t\wedge \tau _j)), 0\le t \le T, j\ge 1\right\} \), where \(\tau _j=\inf \{s\ge 0 : \left\Vert {\mathbf {N}}(s)\right\Vert _1\ge j \}\), is uniformly integrable, for all initial conditions, \({\mathbf {N}}(0)={\mathbf {n}}\in {\mathbb {N}}^M\), and all \(T\ge 0\). First, bounds for F and \({\mathcal {L}}^D F\) are provided.

The definition (14) of k and Jensen’s inequality yield

$$\begin{aligned} k({\mathbf {n}})= {\mathbb {E}}\left[ \prod _{l=1}^{L}\prod _{i=1}^{M_l} (\tilde{X}_i^{(l)})^{n_i^{(l)}}\right] \ge {\mathbb {E}}\left[ \left( \prod _{l=1}^{L}\prod _{i=1}^{M_l} \tilde{X}_i^{(l)}\right) ^{\left\Vert {\mathbf {n}}\right\Vert _1}\right] \ge {\mathbb {E}}\left[ \prod _{l=1}^{L}\prod _{i=1}^{M_l} \tilde{X}_i^{(l)}\right] ^{\left\Vert {\mathbf {n}}\right\Vert _1}. \end{aligned}$$

Denote \({\mathbb {E}}\left[ \prod _{l=1}^{L}\prod _{i=1}^{M_l} \tilde{X}_i^{(l)}\right] ^{-1}\) by a. Because of assumption (15), the latter expectation is non-zero and a is well defined, furthermore, \(a>1\). Consequently, using the definition (13) of F, it follows that

$$\begin{aligned} \begin{aligned} F({\mathbf {x}},{\mathbf {n}}) \le a ^{\left\Vert {\mathbf {n}}\right\Vert _1}. \end{aligned} \end{aligned}$$
(33)

Moreover,

$$\begin{aligned} \left| {\mathcal {L}} F(\cdot ,{\mathbf {n}})({\mathbf {x}}) \right| = \left| {\mathcal {L}}^D F({\mathbf {x}},\cdot )({\mathbf {n}}) \right|&\le \sum _{{\hat{{\mathbf {n}}}}} |q({\mathbf {n}},{\hat{{\mathbf {n}}}})| |F({\mathbf {x}},{\hat{{\mathbf {n}}}})-F({\mathbf {x}},{\mathbf {n}})| \nonumber \\&\le \sum _{{\hat{{\mathbf {n}}}}} |q({\mathbf {n}},{\hat{{\mathbf {n}}}})| \left[ a ^{\left\Vert {\mathbf {n}}\right\Vert _1+2} + a ^{\left\Vert {\mathbf {n}}\right\Vert _1} \right] \nonumber \\&\le b'' \left\Vert {\mathbf {n}}\right\Vert _1^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1} , \end{aligned}$$
(34)

since \(\left\Vert {\hat{{\mathbf {n}}}}\right\Vert _1\le \left\Vert {\mathbf {n}}\right\Vert _1+2\), \(\sum _{{\hat{{\mathbf {n}}}}} |q({\mathbf {n}},{\hat{{\mathbf {n}}}})|= 2 |q({\mathbf {n}},{\mathbf {n}})|\), and \(|q({\mathbf {n}},{\mathbf {n}})|\le b''' \left\Vert {\mathbf {n}}\right\Vert _1^2\), for some \(b'''>0\) and \(b''=4b'''a^2\). By (33) and (34), it is implied that inequality (32) holds true with \(H({\mathbf {n}})= b' \left\Vert {\mathbf {n}}\right\Vert _1^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1}\), for some \(b'>0\). Consider now the following representation formula for the expectation of H applied to the stopped dual process,

$$\begin{aligned} {\mathbb {E}}\left[ H({\mathbf {N}}(t\wedge \tau _j))| {\mathbf {N}}(0)={\mathbf {n}}\right] = H({\mathbf {n}}) +{\mathbb {E}}\left[ \int _0^{t\wedge \tau _j} {\mathcal {L}}^D H ({\mathbf {N}}(s)) ds\right] . \end{aligned}$$
(35)

Using the inequalities (38) of the “Appendix”,

$$\begin{aligned} \begin{aligned} {\mathcal {L}}^D H ({\mathbf {n}})&= b' \sum _{l=1}^{L}\sum _{i=1}^{M_l} q({\mathbf {n}},{\mathbf {n}}-{\mathbf {e}}_i^{(l)}) \left[ (\left\Vert {\mathbf {n}}\right\Vert _1-1)^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1-1} - \left\Vert {\mathbf {n}}\right\Vert _1^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1} \right] \\&\quad + b' \sum _{l=1}^{L}\sum _{i=1}^{M_l} q({\mathbf {n}},{\mathbf {n}}+{\mathbf {e}}_i^{(l)}) \left[ (\left\Vert {\mathbf {n}}\right\Vert _1+1)^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1+1} - \left\Vert {\mathbf {n}}\right\Vert _1^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1} \right] \\&\quad + b' \sum _{l=1}^{L}\sum _{\begin{array}{c} r=1\\ r> l \end{array}}^{L}\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} q({\mathbf {n}},{\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)}) \left[ (\left\Vert {\mathbf {n}}\right\Vert _1+2)^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1+2} -\left\Vert {\mathbf {n}}\right\Vert _1^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1} \right] \\&\le b' \left\Vert {\mathbf {n}}\right\Vert _1 a ^{\left\Vert {\mathbf {n}}\right\Vert _1-1} \{ c(\left\Vert {\mathbf {n}}\right\Vert _1-1) \left[ (\left\Vert {\mathbf {n}}\right\Vert _1-1)^2- a \left\Vert {\mathbf {n}}\right\Vert _1 ^2\right] \\&\quad +a d \left[ a (\left\Vert {\mathbf {n}}\right\Vert _1+1)^2 + a^2 (\left\Vert {\mathbf {n}}\right\Vert _1+2)^2 -2 \left\Vert {\mathbf {n}}\right\Vert _1^2 \right] \} \\&\le b , \end{aligned} \end{aligned}$$

where the last inequality holds since \(a>1\), for some \(b>0 \), and all \({\mathbf {n}}\in {\mathbb {N}}^M\). The inequality in the last display, together with (35), implies

$$\begin{aligned} {\mathbb {E}}\left[ H({\mathbf {N}}(t\wedge \tau _j))| {\mathbf {N}}(0)={\mathbf {n}}\right] \le b' \left\Vert {\mathbf {n}}\right\Vert _1^2 a ^{\left\Vert {\mathbf {n}}\right\Vert _1} + b T. \end{aligned}$$
(36)

Inequalities (32) and (36) finally ensure the method of duality can indeed be applied, which guarantees that the duality relationship between the generators (11), proved in this section, implies the duality among the processes in the sense of (3). This completes the proof of Theorem 1.

6.2 Proof of Corollary 1

Assume \(\theta _l=0\), \(l=1,\ldots ,L\), and let k be defined as in (23). The rewriting of the diffusion and selection terms in (26) and (29) remains valid, even if (15) is not satisfied in this case. Furthermore, it is possible to explicitly calculate, for \(l,r=1,\dots L, r\ne l, i,j=1,\ldots ,M_l, h=1,\ldots ,M_r\),

$$\begin{aligned} \frac{k({\mathbf {n}}-{\mathbf {e}}_i^{(l)})}{k({\mathbf {n}})}=\frac{n^{(l)}-1}{n_i^{(l)}-1}, \quad \frac{k({\mathbf {n}}+{\mathbf {e}}_j^{(l)})}{k({\mathbf {n}})} =\frac{n_j^{(l)}}{n^{(l)}}, \quad \frac{k({\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)})}{k({\mathbf {n}})} =\frac{n_j^{(l)}n_h^{(r)}}{n^{(l)}n^{(r)}} . \end{aligned}$$

Replacing these ratios in (26) and (29) yields an expression of the form (16) and provides the expression for the transition rates. As outlined in Sect. 3 an expression of the form (16) implies that the duality relationship between the generators of the diffusion and its dual process holds if (18) is satisfied. Since a stationary distribution that satisfies (15) does not exist, the argument in Sect. 3 for proving (18) cannot be applied. However, direct calculation shows that

$$\begin{aligned} \begin{aligned}&\sum _{l=1}^{L}\sum _{i=1}^{M_l} q({\mathbf {n}},{\mathbf {n}}-{\mathbf {e}}_i^{(l)}) + \sum _{l=1}^{L}\sum _{j=1}^{M_l} q({\mathbf {n}},{\mathbf {n}}+{\mathbf {e}}_j^{(l)}) + \sum _{l=1}^{L}\sum _{\begin{array}{c} r=1\\ r>l \end{array}}^{L}\sum _{j=1}^{M_l}\sum _{h=1}^{M_r} q({\mathbf {n}},{\mathbf {n}}+{\mathbf {e}}_j^{(l)}+{\mathbf {e}}_h^{(r)}) \\&\quad = \sum _{l=1}^{L} \left( \frac{1}{2}n^{(l)}(n^{(l)}-1) + \sum _{i=1}^{M_l}\sum _{\begin{array}{c} k=1\\ k\ne i \end{array}}^{M_l} n_{i}^{(l)}h_{k}^{(l)} +\sum _{\begin{array}{c} r=1\\ r\ne l \end{array}}^{L}\sum _{k=1}^{M_l}\sum _{m=1}^{M_r} n^{(l)}J_{km}^{(lr)} \right) , \end{aligned} \end{aligned}$$

which implies (18). Finally, the method of duality, using Corollary 4.4.3 in Ethier and Kurtz (1986) as in the proof of Theorem 1, ensures the duality relationship between the processes holds.

7 Two loci, two alleles, with pairwise selection and parent independent mutation

In this section a particular example will be considered, where there are two loci, \(L=2\), and two allele types at each locus, \(M_1=M_2=2\). The pairwise interactions are represented by the matrix

$$\begin{aligned} J= \left( \begin{array}{llll} 0 &{} 0 &{} J_1 &{} 0 \\ 0 &{} 0 &{} 0 &{} J_2 \\ J_1 &{} 0 &{} 0 &{} 0 \\ 0 &{} J_2 &{} 0 &{} 0 \end{array} \right) , \end{aligned}$$

and there is no single-locus selection, \({\mathbf {h}}=0\). Furthermore, parent independent mutations are assumed.

In this special case, the function k, in (14), and consequently the transition rates of the dual process can be computed rather efficiently. The main difficulty in the computation is that the normalising constant of the stationary density (10) is unknown. It may be noted that computing the normalising constant and the function k are closely related problems. In fact, by defining

$$\begin{aligned} I(a_1,a_2,b_1,b_2)= \int _0^1 \!\! \int _0^1 \!\!\!\! x^{a_1-1}(1-x)^{a_2-1} y^{b_1-1}(1-y)^{b_2-1} e^{ 2[J_1 xy + J_2 (1-x) (1-y) ]} dx dy, \end{aligned}$$

the normalising constant can be written as

$$\begin{aligned} Z=I(2u_1^{(1)},2u_2^{(1)},2u_1^{(2)},2u_2^{(2)}), \end{aligned}$$

and the function k as

$$\begin{aligned} k({\mathbf {n}})= \frac{1}{Z} I(n_1^{(1)}+2u_1^{(1)},n_2^{(1)}+2u_2^{(1)},n_1^{(2)}+2u_1^{(2)},n_2^{(2)}+2u_2^{(2)}). \end{aligned}$$

The integral I cannot be computed analytically, but it is possible to find a series representation of it in terms of Beta and Kummer functions, which can be truncated to numerically evaluate the function k. The following formula is derived by a straightforward, albeit cumbersome, application of definitions and properties of Kummer functions

$$\begin{aligned}&I(a_1,a_2 ,b_1,b_2) \\&\quad = e^{2J_2} B(a_1,a_2) \sum _{n=0}^{\infty } \frac{[a_1]_n}{[a_1+a_2]_n} \frac{(-2J_2)^n}{n!} \\&\qquad \times \sum _{k=0}^n \left( {\begin{array}{c}n\\ k\end{array}}\right) \left( -\frac{J_1+J_2}{J_2}\right) ^k B(k+b_1,b_2) {}_1F_1(k+b_1,k+b_1+b_2,-2J_2), \end{aligned}$$

where B is the Beta function, \({}_1F_1\) is the Kummer function and \([a]_n=a(a+1)\cdots (a+n-1)\), for \(n>0\), \([a]_0=1\).

As an illustration, the stationary density of independent Wright–Fisher diffusions, with \(J_1 = J_2 = 0\), is compared to the stationary density of the coupled Wright–Fisher diffusion, with \(J_1 = J_2 = 2\), in Fig. 1. Both distributions have mutation rates \(u_1^{(1)}=u_2^{(1)}=u_1^{(2)}=u_2^{(2)}=0.8\). On the left hand side the mutation strength keeps the mass of the stationary distribution in the centre of the unit square. In contrast, on the right hand side, while the mutation strength still tends to keep the mass in the centre, the selection strength moves the mass towards the points (0, 0) and (1, 1), which represent the most viable couples of allele types, i.e., 1, 1 and 2, 2.

Fig. 1
figure 1

Stationary density of a coupled Wright–Fisher diffusion for two loci, two alleles, with no interaction (left) and non-zero interaction (right). Mutation parameters: \(u_1^{(1)}=u_2^{(1)}=u_1^{(2)}=u_2^{(2)}=0.8\). Double-locus selection parameters: \(J_1=J_2=0\) (left), \(J_1=J_2=2\) (right)