The Expected Sample Allele Frequencies from Populations of Changing Size via Orthogonal Polynomials

In this article, discrete and stochastic changes in (effective) population size are incorporated into the spectral representation of a biallelic diffusion process for drift and small mutation rates. A forward algorithm inspired by Hidden-Markov-Model (HMM) literature is used to compute exact sample allele frequency spectra for three demographic scenarios: single changes in (effective) population size, boom-bust dynamics, and stochastic fluctuations in (effective) population size. An approach for fully agnostic demographic inference from these sample allele spectra is explored, and sufficient statistics for step-wise changes in population size are found. Further, convergence behaviours of the polymorphic sample spectra for population size changes on different time scales are examined and discussed within the context of inference of the effective population size. Joint visual assessment of the sample spectra and the temporal coefficients of the spectral decomposition of the forward diffusion process is found to be important in determining departure from equilibrium. Stochastic changes in (effective) population size are shown to shape sample spectra particularly strongly.


Introduction
Population demography is particularly difficult to infer reliably, even without considering its interplay with different modes of selection [38]. Here, we consider an analytically tractable framework for modelling and inferring demography that is efficient in terms of its use of the often limited information within site frequency data. Previously, Vogl and Bergman [80] proposed a representation of a Moran diffusion process for drift and small mutation rates that allows for exact computation of the expected sample allele frequency spectrum of a single population across time. This method is particularly suited for calculation of non-constant population size dynamics through approximation by sequences of piecewise constant sizes. This will form the backbone of our article. We develop, analyse, and discuss three specific demographic models: i) single deterministic shifts in population size, which approximate growth/shrinkage models, ii) series of alternating fixed population sizes, which approximate boom-bust models, and iii) stochastic changes in population size. The deterministic demographic models i) and ii) are hardly new in this context [71,47,89,90], however, our assumption of small mutation rates further simplifies the exact calculation of expected sample allele frequencies in these cases. To the best of our knowledge, the stochastic demographic model iii) is novel; it utilises the theory of autoregression processes commonly used in econometrics (see [37,Chapter 7]). Further, we find sufficient statistics for inferring changing population sizes (under simplifying assumptions). This enables us to propose a small-scale but reliable inference framework and contribute to the discussion around the limitations of inference of demography using spectral methods [56].
Both coalescent and diffusion approaches have been used to generate the expected distribution of allele frequencies (aka the site frequency spectrum) of population samples [67,24,22,26,11]. Coalescent approaches yield accurate closed form solutions for small to moderate sample sizes, but for large samples sizes numerical problems generally arise. Coalescent approaches involve multiple steps: Starting with all observed lineages at the present time and looking back, each expected coalescent time, i.e., the duration for which the number of lineages remains constant, is determined. Conditional on the coalescent time and number of lineages present, the expected number of mutations on each thereby constructed branch of the genealogy is calculated. The expected sample allele frequencies follow from summing mutations along the bifurcating lineages. This process is cumbersome particularly with changing population sizes: To circumvent numerical pitfalls, computationally intense simulations are often employed [e.g., 30,16]. Even computationally more efficient analytical solutions for simplified scenarios can require slow high-precision numerics [50]. Coalescent methods based on the spectral decomposition of the transition matrix describing the increase of lineages across generations (aka the "matrix coalescent", [87]) do not require such libraries for step-wise constant population sizes [87,62] and exponential growth scenarios [62]. Other growth patterns have also been well-approximated analytically [8].
Incorporation of non-constant population sizes into diffusion equations is considered comparatively conceptually straightforward; however, closed form solutions for expected sample allele spectra are available only when considering concrete sampling schemes in combination with the backward diffusion process [26]. Analytically tractable solutions to the diffusion density itself have long been studied in population genetics: Kimura [40] showed that the biallelic Wright-Fisher diffusion for population allele frequencies can be re-parameterised to the so-called oblate spheroidal differential equation. The general solution to the system is an infinite sum over: i) A spatial component constituted by the eigenvectors that depends solely on the allele frequencies. It is given by a class of weighted orthogonal polynomials with a normalisation constant that is determined by the initial population conditions. ii) A temporal component modulating the 'decay' of polymorphic allele frequencies over time by the corresponding eigenvalues. Kimura often assumed the infinite sites mutation model [42], where each novel mutation hits a previously unmutated site. Formal treatment of neutral parent-independent mutation diffusions by others followed [69,25], and a more direct mathematical approach was introduced by Song and Steinrücken [71]: If the generator associated with the biallelic diffusion process fulfills certain mathematical requirements, a spectral decomposition can be applied to find an analytical representation of the allele transition density (extension to the multiallelic case in [72]). Applications to diffusion processes with recurrent mutations and selection of arbitrary strength have been demonstrated (which goes beyond Kimura, who models only weak selection). Note that the spectral decomposition of the diffusion generator for populations with deterministic, piecewise constant population sizes under selection and with mutations segregating as per the infinite sites model has also been derived [90]. To obtain explicit results for the expected sample allele frequencies from the transition densities for populations obtained using either of the above methods, binomial sampling at the extant time is required [26]. This sampling, coupled with a system of ordinary differential equations on the moments of the sample spectrum derived from the time-reversed Wright-Fisher diffusion [11], also provides analytical equations for the sample site frequency spectrum while by-passing exact calculation of the transition density itself [89,90]. In Bergman et al. [3], analytical formulae for the transition densities are used to obtain the sample allele spectra via a dynamic programming approach: The continuous allele proportions of biallelic Moran diffusion processes are treated as latent/hidden variables, and realisations of discrete sample allele frequencies at predetermined intervals across time are treated as emission/observed variables. The logic of the classic forward-backward algorithm from Hidden Markov Model (HMM) literature [65] can then be applied for inference: e.g., the probability of observed sample allele frequencies at any point in time given assumptions on the population parameters can be determined via the forward algorithm, and the combination of the forward and backward algorithms can be utilised to calculate the joint probabilities of population and sample allele proportions at any point in time. Recall that standard Moran/Wright-Fisher diffusions are considered dual processes to the coalescent in the sense that the expected allele proportions that these models yield are identical, once the state spaces are made comparable by an appropriate function. It can, e.g., be shown that the neutral biallelic Moran diffusion model with recurrent mutations (which is equivalent to parent-independent mutations) in the first half of [3] is dual to the coalescent with respect to the binomial sampling scheme [10,7,61]. From the HMM theory it follows that the hidden probabilities of population allele proportions can be computed at a cost polynomial in the number of observations via finite mixtures of binomial likelihoods (note that this is for constant population sizes, [61]). To make the resulting densities tractable, they can be expressed via spectral decomposition up to the order of the sample size. Then the temporal aspect of the transition density can be described through the binomial mixtures, which ties to the long-established duality between the spectral representation and the diffusion [27,75,28]. However, the finite (rather than infinite) expansion also reduces the computational burden of the system. Note that similar reasoning is implicit in the moment method of Zivkovic and Stephan [89], Zivkovic et al. [90], who forgo exact calculation of the transition densities.
In the second half of [3], the same HMM approach is applied to the spectral representation of the diffusion limit of the so-called boundary-mutation Moran model [81,80]. It is assumed that the expected heterozygosity is less than about 2.5 · 10 −2 [81], which is true for most eukaryotes [48]. Mutations segregating in a sample of moderate size can then be modelled to occur only at previously monomorphic sites and normalised so they enter the polymorphic region at a constant equilibrium rate in relation to drift. While "back-mutations" are generally allowed, they are excluded while the allele is polymorphic.
In this article, we extend the spectral representation of the boundary-mutation Moran model for drift and small mutation rates [80] (recap in Section 2.1) to both deterministic and stochastic piecewise constant population sizes (see Sections 3.1, 3.3, 3.4, 3.5). For changes occurring on different time scales, we determine analytical expressions for the resulting sample site frequency spectra via the HMM-inspired method [81,80] (recap in Section 2.2). These enable us to assess the convergence behaviour of the sample spectra vs the expected convergence of the effective population size for these differing time scales. Further, we propose an agnostic inference framework for inferring past (effective) population sizes from these sample spectra (see Section 3.2). Throughout, we wish to emphasise that the use of boundary-mutations simplifies the spectral representation of the transition densities (which we discuss fully in Section 2.3): Changes in population size, and thus the overall scaled mutation rate, affect only the temporal component. Therefore, only a single full spectral decomposition is required rather than one per population size change as in the more general mutation models used both in the first half of Bergman et al. [3] and in the moment methods of Zivkovic and Stephan [89], Zivkovic et al. [90].

Methods: General Mathematical Framework
Let us begin by recapitulating the central aspects of the boundary-mutation Moran model, its limiting diffusion process and the corresponding spectral representation of the transition density in Section 2.1, but see also Section 8).
Then, we will review explicit calculation of sample allele frequencies using the HMM-inspired forward-backward algorithm in Section 2.2. This enables us to introduce our general framework for incorporating non-constant population sizes in Section 2.3, which is fundamental to the specific demographic models and inference approaches in the next Section 3.

The Spectral Representation of the Boundary-Mutation Diffusion
The discrete, neutral, biallelic Moran model is a model for genetic drift [53,52,54] that assumes a monoecious, haploid population of effective size N e with alleles of a focal and a non-focal type. The type is determined either by polarisation into ancestral and derived alleles (as in the infinite sites model) or by allele groupings in a biallelic system, e.g., by contrasting the bases adenine and thymine vs the bases cytosine and guanine. In each Moran reproduction step, a randomly chosen individual is replaced by the offspring of another randomly chosen individual. The expected lifetime of an individual haplotype, N e, therefore represents the generation time. The neutral boundary-mutation Moran model [81] was derived as a first order expansion in the overall scaled mutation rate θ of the general biallelic Moran model with separately parameterised drift and mutation terms [2]. The scaled overall mutation rate is defined as θ = N eµ, where N e is the effective population size and µ is the overall point-wise mutation rate per generation. The mutation rate in the boundary-mutation model is normed so that the expected heterozygosity is independent of the population size. It is assumed that the polymorphic sites are subject only to drift (and selection, should it be incorporated in the model), and that back-mutations can only occur at the boundaries. These can be biased, with α towards the focal type and β = 1 − α from the focal type. Simulations show that this approximation holds well if the expected equilibrium heterozygosity of the population 2αβθ < 0.025 [81], and therefore holds for the protein coding genes of most eukaryotes [48]. (Note that with low scaled mutation rates, the approximation remains reliable even with incorporation of selection [82].) The boundary-mutation model can be extended to multiple alleles [68,5,6,83]. Samples from such multi-allelic diffusion models with similarly low-scaled overall mutation rates are also well approximated by the boundary-mutation Moran model because the probability of sites being more than biallelic is also very low [5,6].
Generally, in order to approximate discrete population genetic models by a diffusion process, time and space must be re-scaled. Time in the haploid Moran model t is re-scaled to N e 2 steps, which corresponds to N e generations. Recall that drift is twice as strong in the Moran model (with a variance of 2x(1−x) N e per generation) compared to the corresponding Wright-Fisher model. Furthermore, diploid individuals are usually considered in the Wright-Fisher model, meaning that time there would be measured in 2N e generations and the scaled mutation rate would be θ = 4N eµ. The alleles frequencies of the population must also be re-scaled to proportions xǫ[0, 1] at each point on the now continuous time axis. The transition matrix is then appropriately replaced by the transition rate density φ(x | t). Generally, this diffusion process describes the trajectory of the population allele proportions within the polymorphic region x ǫ ]0, 1[, subject to drift (and mutation, and selection). Appropriate boundary conditions are required to ensure that the behaviour of allele proportions at x = 0, x = 1 reflects the underlying assumptions on mutation patterns and that probability is conserved. Song and Steinrücken [71] determine the spectral decomposition of the "full" generator associated with the forward biallelic (Wright-Fisher) diffusion process with recurrent mutations operating on x ǫ [0, 1] (i.e., operating on the boundaries as well as the so-called "diffusion part"). Analogously to the discrete case, the spectral representation of the boundary-mutation (Moran) diffusion process [80] was derived as the θ → 0 limit of this spectral decomposition. In this limit, the forward eigenvectors then become (compare [80,Eqs. 32,37]: where δ(.) is Dirac's delta. The U n (x) here are the modified Gegenbauer polynomials (see [60,Eq. 18.5.77]): These are orthogonal with respect to the weight function w(x) = x(1 − x) for any orders of expansion n, m > 2: where δ n,m is Kronecker's delta, and ∆ n = (n−1) n(2n−1) for n ≥ 2 is the proportionality constant; later, we will additionally require ∆ 0 = ∆ 1 = 1.
The corresponding eigenvalues of the system converge to λ 0 = λ 1 = 0, λ n = n(n − 1) for n ≥ 2. Note that the modified Gegenbauer polynomials here are a special case of the Jacobi polynomials that make up the eigenvectors in the biallelic, recurrent mutation model [71]. Kimura [40] utilised Gegenbauer polynomials to find an analytical solution for the Komolgorov forward (Fokker-Planck) diffusion equation for pure drift, modelling the polymorphic trajectory of mutations between occurrence and fixation.
The only population genetic force considered in the above system is drift. Mutations arising exclusively at the boundaries are re-introduced by setting the eigenvalue λ 1 to θ with 0 < θ << 1. Mutations enter the polymorphic region at rates αθb 0 (x | t) and βθb 1 (x | t) respectively, where b 0 (x | t) and b 1 (x | t) represent the probability masses already fixed at the boundaries at time t or expected to fix there imminently by drift. This induces the following system of linear differential equations for the temporal mutation dynamics [80,Eqs. 43,50,51]: where: implying A n = 0 for odd n, and Observe that the system is time-homogeneous for n = {0, 1} and time-inhomogeneous for n ≥ 2. Combining spatial and temporal dynamics, the transition rate density of the forward boundary-mutation diffusion equation has the following analytic solution [80,Eq. 42,37]: Note that this forward system (as well as the corresponding backward system not shown here) is generally more tractable after diagonalisation (see [3,). However, the separate treatment of time-homogeneous and timeinhomogeneous differential equations in Eqs. (4) is more convenient for the purposes of this article.

The Forward-Backward Algorithm
Using the explicit representation of the transition rate density, population allele proportions evolve forwards in time starting from a continuous initial state distribution ρ(x) at time t a , t a ≤ t via [3,Eq. 70]: The coefficients are determined from the mapping of the initial state or ancestral distribution onto an orthogonal polynomial state space. At the extant time t = 0, a finite sample of the current population allele frequencies is observed. The vector of sample allele occupancies Y is the emitted distribution in the HMM framework: elements of this are denoted as Y k , where k Y k = K is the total sample size. The probability of observing a realised count of y k focal alleles based on current population allele frequencies is given by the K−order orthogonal polynomial expansion of a binomial distribution [3,Eqs. 73]: with spatial coefficients: The calculation of the coefficients can be simplified by noting that the binomial distribution with sample size K corresponds to a polynomial of power K, which can be expressed easily as a sum of Gegenbauer polynomials (see Eqs. (17), (18); this also avoids numeric problems.
To obtain the probability of observing y k focal alleles in a sample of size K under the assumption that the population of interest has evolved from an ancestral population at time t a up to the time of sampling at t = 0 according to the biallelic boundary-mutation Moran model, the forward pass of population allele trajectories and the binomial sampling scheme are combined [3,Eqs. 47,74]: This probability can be interpreted as a mixture of binomial distributions with success (and failure) probabilities determined by the current population allele frequencies as given by the forward pass of the boundary-mutation diffusion model (see also Appendix B, Section 8). Note that, due to orthogonality, only a K-order sum is required here rather than an infinite expansion. Directly implemented as above, this probability can be evaluated in a matter of seconds in the open source statistics software R [64]; however, exceeding a sample size of K = 24 incurs numerical instability. With the high precision floating point library MPFR [19], reliable results can be obtained up to K = 37. Clearly, specialised programs are still lacking for application to larger samples. However, direct implementation of these formulae suffices for model-validating simulations and application to small or down-sampled data sets as required for this article.

Piecewise Constant (Effective) Population Size
For the remainder of this article, we will continue to assume a single population evolving forward in time according to the boundary-mutation Moran model. However, from now on time will be segmented into epochs indexed by j ǫ {0, .., J}. Each epoch ends with time point t j . Let us then define the epoch lengths as s j = t j − t j−1 with s 0 = −∞. We may take t J = 0, as our samples usually come from the present. We will assume that the (effective) population size remains constant at N e j within epoch j and then instantaneously switches to N e j+1 at the onset of epoch j + 1. Within each epoch, the population converges towards its current expected equilibrium. Let us first consider the homogeneous boundary equations: Thus, the boundary terms converge to b 1 = α and b 0 = β in all epochs, irrespective of the population size. We can therefore assume that the boundary terms b 1 = α and b 0 = β are constant. From this, it follows that B (α,θ) n in Eq. (4) is always zero.
In equilibrium, the solution of the polymorphic equations reduces to [80, Appendix A2]: where Ξ n = − 4n−2 n−1 . From here on, let us denote θ * j = αβθ j as the biascomplemented overall scaled mutation rate of the population in epoch j. Note that θ * j also corresponds to the equilibrium solution of the boundary-mutation Moran diffusion model for each epoch j (see Appendix A, Section 7.1 for a discussion of its interpretation).
The time-inhomogeneous interior equations per epoch then have the following general form for n ǫ {(1, .., N ) : 2n} and j ≥ 1: With starting condition τ (α,θ) n (t = t j−1 ), the solution of this differential equation is: While b 0 and b 1 remain constant, the frequencies in the monomorphic classes in a sample may vary over time with the overall scaled mutation rate θ j = 4N e j µ.
To calculate the expected sample allele frequencies of a population of piecewise constant effective population size using the above framework, the initial state distribution must be specified and the decomposition of the transition rate density must be determined for each epoch within the forward pass in Eq. (12); the binomial sampling/emitted distribution remains the same. The polymorphic sample allele frequency spectrum can then be determined via: And the sample allele frequencies at the boundaries follow accordingly: Eqs. (16)(17)(18) will be the basis for the analyses in Section 3.

Model 1: Single Shift in Population Size
Now consider a population that undergoes a single change in (effective) population size, meaning there are only two epochs: At time t = t 0 , the overall bias-complemented ancestral mutation rate of the population θ * 0 switches instantaneously to θ * 1 . The population allele frequencies thus evolve from the old equilibrium attained before t 0 towards a new one after the shift according to Eq. (16) with τ (α,θ) n (t = t 0 ) = θ * 0 : The marginal probabilities of sample allele frequencies can then be determined as outlined in Section 2.3. Note that if θ * 0 ≪ θ * 1 , the solution above can be approximated by the solution of a simple one-parameter model (see Section 7.2).

Results and Discussion for Model 1
By calculating the marginal probabilities for each possible segregating value of the focal allele, an expected biallelic polymorphic sample site frequency spectrum can be constructed and assessed for departure from equilibrium. Visual representations that contrast an observed or simulated distribution of segregating sites with a neutral spectrum are simple and informative [57,1]: Fig. (1) shows a series of expected polymorphic sample spectra drawn from a growing (panel 1A) and a shrinking (panel 1B) population at successive time points after the shift in (effective) population size. The log ratio of these expected sample spectra vs a sample assumed to be in equilibrium at the ancestral overall-biased scaled mutation rate (see [81,Eq. 13]) are depicted in Fig. (1) panels 2A and 2B; these second plots emphasise the distinct demographic signatures. In Fig. (1) panels 1A and 2A, which show the samples from a growing population, the presence of excess rare alleles is recognisable. Conversely, Fig. (1) panels 1B and 2B show an increased proportion of intermediate frequency alleles for the samples from a shrinking population. These are well-established phenomena in population genetics and a battery of neutrality tests have been constructed to detect them [73,23,22,18,44], among them Tajima's D. For such tests, the site frequency spectrum is construed as the minor allele frequency distribution and is given by either the number of polymorphic sites ζ n at frequency n N , where nǫ[1, N − 1] when an outgroup is available, or by the number of polymorphic sites ζ n,N −n at frequencies n N and N −n N when ancestral and derived alleles cannot be distinguished. The classic neutrality tests contrast two estimators for the overall scaled mutation rate θ that are sensitive to deviations from neutrality in different regions of the allele frequency spectrum. These estimators are constructed by differently weighted linear combinations ofθ n = nζ n [1]. In Fig. (1) panels 1C and 2C, a version of Tajima's D modified for the boundary-mutation model, which we will call the D-statistic (see Section 7.3 for its derivation), is inferred for the series of samples: This clearly shows how the signal of the demographic event first becomes clearer in the samples over time, and then fades to undetectable as a new equilibrium is approached (compare also the change in deviation from equilibrium over time in Fig. (1) panels 1A, 1B and 2A, 2B. It is also apparent that populations lose polymorphism much faster than they accrue it (compare [58]).

An Agnostic Inference Approach
Let us now assume a population that undergoes multiple shifts in (effective) population size over time, yielding a history of J demographic epochs, indexed with 0 ≤ j ≤ J, and let us assume the time points t j as given. The general solution of the time-inhomogeneous equations at t = t J is then for n ǫ {(1, .., N ) : 2n}: where r j is shorthand for the scaled epoch length It has previously been noted that sampled spectra are not fully informative of a population's demographic history [56]: With orthogonal polynomial expansions, the demographic history is mapped onto a function space spanned by e −λnrj , which implies several things: Because the exponential function decays rapidly, increasing orders of expansion have a decreasing influence on the shape of the spectrum. Furthermore, demographic histories orthogonal to the function space remain "hidden" -in effect, this means past events can cancel each other out and produce a 'simplified' historical trajectory. In other words, an infinite number of demographic histories can actually produce the same set of spatial and temporal coefficients and therefore the same observed spectrum.
Let us return to the Eq. (20). In order to infer all θ j and therefore all the past (effective) population sizes, (i) the left hand side of the equation, i.e. the temporal dynamics, must be determined from the sampled spectrum, and (ii) the r j , i.e. the scaled time points of demographic events, must also be specified in some way. Consider a sampled spectrum of size K comprised of a total of L loci, with L (0) and L (1) loci fixed for the respective alleles as well as L (0,1) polymorphic loci, so that L = (L (0) + L (0,1) + L (1) ). In the context of the biallelic boundary-mutation Moran model, these observed counts of loci are sufficient statistics for the probabilities of the respective monomorphic and polymorphic events, meaning that they contain all the information from the data that can be used to construct estimators of these events [83]. It follows thatα = is the minimum variance, unbiased maximum likelihood estimator for the mutation bias [78,83]. Hence, the mutation bias can be immediately estimated from the observed spectrum. Then the polymorphic spectrum can be symmetrised, yielding unbiased sample allele occupancy probabilities for k = 1, ..., K − 1. These can be plugged into the left side of the polymorphic equation for expected sample allele frequencies given the past population history from Eq. (17). The purely spatial binomial coefficients on the right hand side of this equation can also be readily determined, so only the temporal coefficients τ (α,θ) n (t = t J ) remain unknown. Seen for all the inferred polymorphic allele frequencies simultaneously, Eq. (17) clearly describes a consistent system of K − 1 equations that can be uniquely solved for the K − 1 temporal coefficients. These are required for the left hand side of the system of equations in Eq. (20).
As the number and duration of the past population epochs is generally unknown, specification of the scaled epoch lengths r j must be ad hoc. Recall that only the even-order solutions of the temporal equations are non-zero since the mutation bias is assumed constant. For example, a sample spectrum of size K = 6 is shaped by the polynomials of degree n = {2, 4, 6}. Furthermore, for Eq. (20) to constitute a unique solution, the number of epochs J plus one (the additional one being for the ancestral state) must be equal to the number of informative polynomial coefficients. In our example with K = 6, this means we can uniquely infer the current as well as two past overall bias-complemented overall mutation rates. A convenient agnostic placement of the time points at which to make these inferences is inspired by the half-life of exponential decay: Settingr j = log (2) λ2j+1 , each epoch is placed where half the change in overall-scaled mutation compared to the preceding epoch is expected to have occurred based on the eigenvalues of the observed spectrum. And within epoch, the amount of change captured by each order of expansion is proportional to the corresponding eigenvalue (see Fig. 2). There is no guarantee that these r j are the true scaled time points at which the demographic events occurred; they are simply "optimally" placed for characterising the effect of exponential change. Note that only the lowest (or to a lesser extent the second lowest) order of expansion from the first epoch significantly influences the shape of the spectrum at the extant time, whereby all (or almost all) orders of expansion from later epochs influence the shape of the spectrum at the extant time, albeit less drastically.
Altogether, we can conclude that if we assume the same number of (effective) population sizes -current and historical -as there are even coefficients in the polynomial expansion of the observed site frequency spectrum, and if the demographic events that caused the changes in size are assumed to be optimally placed in the above sense, unique solutions for all current and historical bias-complemented overall mutation rates θ j can be found from the system of equations in Eq. (20). These resulting estimators for all the θ j are derived from sufficient statistics via two consistent systems of equations with unique solutions (one-to-one mappings of parameter spaces); therefore the resulting estimators for the θ j are unique minimum unbiased estimators (see [29,Chapter 10]).
Both the accuracy and potential for wider application of this method are, as yet, limited by numerical instability: For our previous example with sample size K = 6, a total number of loci in the order of 10 8 is required for the accuracy of all three estimators to be adequate (see Fig. 3). This is, however, the upper limit of loci in a site frequency spectrum that can be simulated using the open software program R. It is clear that the feasibility of large-scale use of this method hinges on specialised implementation, but also on large data sets.
In Fig. (1) panels 1D and 2D, the sample allele occupancy probabilities from the single-shift examples of the previous subsection are hypergeometrically downsampled to K = 4 and used to generate spectra of 2 · 10 8 independent loci, from which both the extant and the ancestral overall bias-complemented scaled mutation rate are inferred. This novel inference approach infers the true values most accurately a touch sooner after the change in (effective) population size than the D-statistic detects the greatest signal of departure from equilibrium (compare Fig. (1) panels 1C and 2C). Note the pattern of inferring slightly more extreme values for the ancestral mutation rate before and for the current mutation rate after this point to compensate for the positioning of the hypothetical epoch split time.  , and (C) respectively. Sample spectra comprised of 1e 6 , 1e 7 , and 1e 8 loci are simulated according to the allele occupancy probabilities of each equilibrium distribution. Then, the current as well as two past overall Furthermore, bias-complemented scaled mutation rates are inferred from each spectrum. This procedure is repeated three times, and the resulting estimators are represented in dependence of the number of loci, with "+", "x", and "o" denoting the most to least recent overall-biased mutation rates for each iteration of the procedure. The horizontal lines mark the true values , and (C) respectively.

Model 2: Deterministically Alternating Population Size
In this subsection, we construct a caricature model of boom-bust population size dynamics: To do so, we assume a population that has haploid (effective) size N 0 for an epoch of length s 0 (this will be the bust epoch), and then switches in-stantaneously to having haploid (effective) size N 1 for an epoch of length s 1 (this will be the boom epoch). Together, these two epochs form a cycle that repeats indefinitely. Classic population genetic results apply to the overall (effective) size of this population: If the epochs are short compared to generation length (which is the inverse population size in the diffusion setting), the long-term effective size can be approximated by the harmonic mean across the two epochs [88,41]. The heterozygosity of populations undergoing deterministic, cyclical changes in (effective) size has been studied extensively for various trajectories of rapid population growth followed by instantaneous collapse; the harmonic mean approximation holds both without mutations and when novel mutations enter at a low rate per locus per generation [58,55]. Both the ratio between the maximum and minimum heterozygosity within each cycle as well as the ratio between the maximum heterozygosity and the harmonic average heterozygosity within each cycle increase with the severity and duration of the population collapse [58,55]. We will examine both the expected sample polymorphic site frequency spectrum of populations undergoing boom-bust life cycles as well as the temporal part of their spectral decomposition.
The harmonic mean of the (effective) population size across each cycle is: Diffusion time within each epoch can be scaled relative to this harmonic mean so that the scaled duration of the epochs become s 0 i.e., in this subsection only, time is scaled in units of the harmonic mean effective population size. Similarly, the harmonic mean of the overall bias-complemented scaled mutation rates is defined as: Then the solutions to the linear differential equations for the temporal dynamics of the boom and bust phases can be written as the following for j ǫ {(1, .., N ) : 2j}: As the number of cyclical iterations increases with j → ∞, the temporal dynamics of the even and odd epochs evolve towards two separate solutions at t j and t j+1 , which we will denote as ϑ s0,n = τ (α,θ) n (t = t j ) and ϑ s1,n = τ (α,θ) n (t = t j−1 ), respectively. These solutions follow the system of equations: This system is easily solved: and equivalently: Essentially, rapidly alternating boom and bust phases ultimately result in the temporal coefficients of both epochs converging towards the harmonic mean of the individual overall bias-complemented scaled mutation rates. The effective population size is then accurately captured by N h , as anticipated. Conversely, it follows immediately from Eq. (22) that for long epoch lengths , the temporal coefficients of the epochs converge to their individual equilibrium solutions: Then the arithmetic mean of ϑ s0,n and ϑ s1,n is the appropriate approximation of the expected long-term effective population size, again as anticipated.

Results and Discussion for Model 2
In Fig. 4 panels 1A-1C, the expected polymorphic sample spectra from boom and bust epochs are plotted relative to the harmonic mean for relatively long (A), intermediate (B), and short (C) epoch lengths compared to the average overall bias-complemented scaled mutation rate. Recall that this is easily achieved by plugging the long-term solution of the temporal dynamics of the boom (Eq. 23) and bust epochs (Eq. 24) into Eq. (17). Technically, according to the aforementioned critera, both scenarios A and B qualify as intermediate: In A, only the expansion order n = 2 can be approximated by the harmonic mean with the remainder likely well-approximated by the arithmetic mean. Meanwhile in B, only the expansion order n = 2 fulfills the approximation criteria of the harmonic mean but several higher order expansion are no longer accurately represented by the arithmetic mean. In panel 1A, the sample spectra from the boom epochs show the u-shape characteristic of population growth with an excess of low and dearth of high frequency alleles; similarly, the bust epochs show a flat, inverted u-shape that consistently lies below the harmonic equilibrium, characteristic of population contraction. Only the lowest order temporal coefficients have started converging towards the harmonic equilibrium (see panel 2A). The result of the D-statistics reflect these patterns (see Table. 1). In our intermediate scenario B, the sample spectrum of the boom epoch has a clear wshape; the sample spectrum of the bust epoch simply shows a more pronounced inverse u-shape with the peak hitting the harmonic mean. For these epoch lengths, the lowest order temporal coefficients are almost equal to the harmonic mean, and some but not all the higher order coefficients have started converging (see panel 2B). In Nawa and Tajima [57], visual assessment of the spectrum of segregating sites reveals w-shapes for populations recovering from a bottleneck: the number of intermediate alleles increase faster than the number of minor alleles, leading to confused signals in the D-statistic (see [57, Table 2] alongside our Table 1). This w-shape becomes more pronounced with bottlenecks that are further back in the population history, are comparatively short, or are less extreme in terms of collapse in population size. Bottlenecks with the opposite characteristics lead to such an extreme w-shape that it becomes an inverted u-shape. The bust epoch can therefore be considered a mild, recurrent bottleneck. In scenario C, where the epoch lengths are relatively short, all but the highest orders of expansion can be approximated by the harmonic mean. Both sample spectra deviate from the harmonic mean only by a small enrichment or diminution in the proportion of singletons respectively (see Table 1).
In Table 2, the results of applying our agnostic inference approach from Section (3.2) to sample spectra drawn from both the boom and the bust phases are presented for each of the previous epoch length scenarios. In order to do this, the spectra are hypergeometrically downsampled to size K = 4 so that the current and one past overall-biased mutation rate can be inferred. The results gained from the sample spectra drawn from the bust phase are reasonably informative of the underlying boom-bust population cycle: The results for long epoch lengths are near the individual equilibrium overall bias-complemented epoch lengths, and convergence towards the harmonic mean is noticeable as epoch lengths decrease. The results from the bust phase are less informative: Because the collapse in (effective) population size is so rapid (with exp − λ n t for all j when n > 2), the spectrum contains little to no historical information. So while the estimators for the current overall bias-complemented scaled mutation rates are reasonable, the past estimates (inferred from not very far back in time) are no more than an indication that the (effective) population size was formerly larger.   Generally, deterministic boom-bust type models are applicable to organisms that experience major, semi-regular environmental events on a time scale that is long compared to their generation lengths. For example, the El Nino currents in the equatorial pacific occur roughly every four years and are known to impact heterozygosity in some insect populations [20]. With the annual growth and subsequent collapse of naturally occurring populations of Drosophila in temperate climates, however, generation times are short compared to the scale of seasonal variation in the environment, making seasonal adaptation possible [49]. This comparatively rapid turnover in generations implies that the difference between the (effective) sizes of boom and bust epochs will often be of several orders of magnitude, so N e 0 /N e 1 ≪ 1. Realistic values may be N e 0 /N e 1 ≤ 10 −4 with θ * h = 0.01 and epoch lengths of roughly 10 N h . From such large ratios between N e 0 : N e 1 , it follows that θ * h θ * 1 ≈ 0. Note that then the long-term temporal coefficients of the boom and bust epochs (Eqs. 23,24) can be approximated by: and Furthermore, observe that θ * h = θ * s0,n s0+s1 s0 ). Substituting this, the difference between the long-term dynamics of the epochs becomes: The difference between the long-term temporal mutation dynamics of the boom and bust epochs is therefore independent of the overall bias-complemented scaled mutation rate of the boom epoch. Rather, for each order n of the temporal expansion, the corresponding eigenvalue is scaled by a term comprised of the overall bias-complemented scaled mutation rate of the bust epoch, the cycle length, and the duration of the boom epoch relative to the bust epoch. While this effect was apparent from past simulations [55], we have now shown it analytically. The accuracy of this final expression depends on the conditions ahead of Eq. (28) and Eq. (29). If necessary, upper bounds for accuracy can be determined via the error expansion of the Taylor series. Note that one can see from these last calculations that unrealistically large sample sizes may be necessary to be able to detect the signatures of the boom-bust dynamics in the spectra of Drosophila, since the number of generations per year will be very much lower than N e h .

Autoregression Model for Temporal Dynamics
In Sections 3.1,3.3, populations are subject to major and predictable changes in (effective) size. However, most fluctuations in (effective) population size are responses to demographic [46] or environmental [45] stochasticity that is not necessarily regular over long time periods. While these fluctuations can be crucial to the fate of the populations, they are generally not easily quantifiable. In the following, stochastic variation in the overall bias-complemented scaled mutation rate is incorporated into the system of temporal equations of the forwards population process from Eq. (14).
Let us denote the long-term solution for the temporal coefficient for each epoch τ (α,θ) n (s j > t > s j−1 ) as ϑ j,n Ξ n . Furthermore, let us fix the relative scaled epoch length c = s j 4αβµ. The long-term temporal coefficients then solve the following equation for n ǫ {(1, .., N ) : 2n} and j > 0 (compare Eqs. 14,22): This equation has the general form γ j = γ j−1 m j + ǫ j (1 − m j ) and is therefore a first order autoregression model. Because |m j | < 1, the process is also stationary, i.e. its moments are independent of j ([37, Chapter 7]). In the classic case, ǫ j enters the process without a coefficient and is a serially uncorrelated, zero-mean and constant variance stochastic process (i.e. white noise).
Here, it corresponds to θ * j , and we must characterise its distribution in order to make statements about ϑ j,n . Note that the occurrence of the overall biascomplemented scaled mutation rates θ * j in Eq. (31) is reminiscent of an inverse gamma distributed variable. Therefore, we assume for j > 0: The range of the hyperparameter a is restricted to ensure existence of the mean and the variance.

Characterising the Solutions to the Temporal Dynamics
Next, we will use the distribution of the overall scaled mutation parameter θ * j together with the properties of the first-order autoregression processes to determine the mean, variances, and covariances of the solutions of the longterm temporal coefficients ϑ j,n .
Mean. Returning to the autoregression process in Eq. (31) and invoking stationarity, we have for n ǫ {(1, .., N ) : 2n} and j > 0: Recall that E θ * = ∞ 0 p(θ * )d(θ * ), where p(θ * ) is the density of the inverse gamma distribution from Eq. (32). By straightforward application of the substitution and integration by parts rules for evaluation of each integral, we obtain for n ǫ {(1, .., N ) : 2n} and j > 0: Using L'Hôpital's rule, the limiting results for short and long relative epoch lengths (scaled by the eigenvalues) can be determined: When epochs are relatively short, the expected value of the long-term temporal coefficients converges towards the harmonic mean of the inverse gamma distribution that was chosen to model the distribution of the overall bias-complemented scaled mutation parameter. When epochs are relatively long, convergence is towards the arithmetic mean of this inverse gamma distribution. Similar results have been obtained for the heterozygosity and the effective size of populations with stochastic cycles of varying size within the framework of the (Wright-Fisher) diffusion [33,35,34]: There, predetermined potential population sizes can easily be modelled as finite states of a Markov Chain. Both the effective size of the population and its heterozygosity generally converge towards the harmonic means when the autocorrelation of transitions between population size states is low. In contrast, high (positive) autocorrelation generates convergence to the arithmetic mean (which is always greater than the harmonic mean), and, when the scenario permits negative autocorrelation, this can lead to results that lie below the harmonic mean.
Variance. The method for calculating the variance of the solutions to the longterm temporal coefficients ϑ j,n is analogous to that for determining the mean. Starting again from Eq. (31) and using the stationarity of the process for n ǫ {(1, .., N ) : 2n} and j > 0: Evaluating integrals as before, one obtains for n ǫ {(1, .., N ) : 2n} and j > 0: Clearly, the variance can then be determined by the law of total variance; its limiting values for both short and long relative scaled epoch lengths follow directly, for n ǫ {(1, .., N ) : 2n} and j > 0: Correlations and Covariances. Note that we have implicitly assumed that the long-term coefficients ϑ j+r,n are pairwise-uncorrelated for r > 0, since there is serial independence between the overall bias-complemented scaled mutation parameters θ * j of successive epochs. Using these properties together with the law of total variance, the first order covariances (i.e., those between ϑ j+1,n and ϑ j,n ) are easily determined, for n ǫ {(1, .., N ) : 2n} and j > 0: Note this is simply the variance times the correlation coefficient. Higher orders of covariance are similarly, for n ǫ {(1, .., N ) : 2n} and j > 0: Clearly, the limits for short and long scaled relative epoch lengths are:

Results and Discussion for Model 3
In Fig. (5), the behaviour of the theoretical mean, variance, and first order covariance of the temporal population coefficients is exemplified for varying relative epoch lengths c and select expansion orders n. For low orders of expansion n, the moments of the temporal mutation coefficients are close to the lower limits (i.e., the harmonic means) even for rather long relative epoch lengths c; convergence towards the lower limits is then very rapid when c → 0. For the higher expansion orders n, the trajectory of the moments of the temporal coefficients shows a more gradual convergence from the higher to lower limits (i.e., the arithmetic to the harmonic mean) as the relative epoch lengths decrease. Note that the analytical solutions examined here correspond almost precisely to results from simulations; snippets of these comparisons are presented in Appendix A, Section 7.4.
Also apparent in Fig. (5) are three little 'x' marks -at these epoch lengths, as well as at one additional epoch length that exceeds the limit of the x-axis, we determined the expected polymorphic allele frequencies of a sample of size K = 24 were according to procedure described in Section 2.3 and using the expected value of the temporal coefficients from Eq. (34). These are shown relative to both the arithmetic and harmonic equilibrium spectra in Fig. (6). Scenario A corresponds to relatively long epoch lengths; at this point the moments of the temporal mutation coefficients have not yet fully converged to the harmonic mean even for the expansion order n = 2 (see Fig. 5) and the polymorphic sample spectrum cannot be distinguished from the arithmetic equilibrium spectrum. Scenarios B and C show epoch lengths between those at which expansion orders n = 2 and n = 4 fully converge to the harmonic mean but are close to these respective required lengths (see Fig. 5); the expected polymorphic sample spectra for both scenarios clearly lie below the arithmetic equilibrium spectrum -in fact, the intermediate frequency alleles have almost (B) or fully (C) reached harmonic mean frequency while the high/low frequency alleles exceed it. With the very short epoch length of scenario D, the moments have fully converged to the harmonic mean for almost all orders of expansion (note that the highest are not shown for the sake of visibility) and the expected polymorphic allele spectrum is almost identical to the harmonic equilibrium spectrum except for excess singletons. Note that scenario B has epoch lengths of almost the same magnitude as the average overall bias-complemented scaled mutation rate, and scenario D has epoch lengths of the same magnitude as the square of the average overall bias-complemented scaled mutation rate. In general, the variance of the expected polymorphic sample allele spectra visibly decreases with relative epoch length.
Interestingly, the D-statistic picks up a clear signature of population collapse for scenario A with the relatively long comparative epoch lengths, and equally strong signatures of population expansion for the increasingly short relative epoch lengths of scenarios B-D (see Table 3). Overall, it is apparent that compared to samples of the same size drawn from a population evolving according to a boom-bust model with the same harmonic mean of the overall mutation parameter across epochs, the samples from populations subject to stochastic changes in size generally harbour signatures of more distinct and consistent demographic change. Traces of this are still present even for very short epoch lengths.
After hypergeometrically downsampling all the sample spectra to K = 6, our agnostic inference approach from Section 3.2 can be applied to infer the current and two historical overall bias-complemented scaled mutation rates (see Table 3). Generally, these vary around the expected means per scenario.    . Expected sample allele spectra of size K = 24 were generated for each scenario at the end pf epoch J. Visualised here are the log-ratios of the expected sample site frequency spectra vs the spectrum generated by the harmonic means of θ * j in black, and the log-ratio of the expected sample site frequency spectra vs the spectrum determined by the arithmetic mean of θ * j in grey. The dashed lines represent the spectra obtained by adding/subtracting one standard deviation of ϑ J,n to the mean.   Table 4: Inference of temporal coefficients for sampled spectra from Fig. (6). The "-" in the naming of epochs indicates once step back in time.

Model 4: Serially Correlated Population Size and Stochastic Epoch Lengths 3.5.1. Rewriting the Autoregression Model
In Section 3.4, the relative epoch length c = s j 4αβµ is assumed constant. Incorporating stochasticity here is relatively straightforward; introduce for j > 0: Note that cp(ν) is then an exponential distribution with mean c, such that for j > 0: Setting η j = 1 θ * j , the autoregression process for the solution to the long-term temporal dynamics can be rewritten to yield: for nǫ{(1, .., N ) : 2n} and j > 0: ϑ j,n Ξ n = ϑ j−1,n Ξ n e −λncνj ηj + ν * j Ξ n 1 − e −λncνj ηj .
Moments. As in Section 3.4, the moments of the temporal mutation parameters, which correspond to the moments of the autoregressive process in Eq. (44), can readily be determined: Once again, stationarity of the process can be invoked and then the expectations can be taken with respect to first ν and then η. It is easily seen from there that E ϑ (ϑ j,n ) is identical to before. E ϑ (ϑ j,n ) 2 differs only slightly in that the final three summands of the numerator in Eq. (37) are each multiplied by factor 2 (observe that this corresponds with the terms multiplied with E η (η 2 p(η|ν)) from Eq. 46). The variance, covariance and correlation are therefore inflated compared to the model with fixed relative epoch lengths (see also Appendix A, Section 7.4).

Conclusions
In this article, the spectral representation of the transition density of the diffusion approximation of the biallelic boundary-mutation Moran model for drift and low-scaled overall mutation rates is utilised to describe population allele trajectories evolving forwards in time. This model separates the dynamics of mutations, which are restricted to the monomorphic boundaries, from those of drift, which determine the polymorphic interior. The eigenfunctions of this process can therefore be represented using a combination of (i) Gegenbauer polynomials, a class of orthogonal polynomials that describe the spatial component of polymorphic allele frequency trajectories subject to only drift, (ii) temporal coefficients that scale the polymorphic spatial component by the effect induced by the boundary mutations, (iii) and boundary terms to balance the probability flux and take into account that mutations enter the polymorphic region at a constant equilibrium rate. The corresponding eigenvalues are λ 0 = 0, which is associated with the equilibrium distribution, λ 1 = θ, and λ n = n(n − 1) for n ≥ 2. Hence the spectral decomposition comprises both the monomorphic and polymorphic parts of the allele frequency spectrum (compare [71,90]); note that classic diffusion and coalescent models only explicitly describe the polymorphic interior. Incorporating piecewise changes in (effective) population size by proxy of changes in the overall scaled bias-complemented scaled mutation rate into the forward diffusion of the boundary-mutation diffusion model requires only one full spectral decomposition to: firstly, determine the spatial component of the process, which remains the same over time, and secondly, solve a system of equations for the temporal coefficients, which differ by time epoch. This is simpler and computationally advantageous to changing the basis function for every piecewise change in (effective) population size, as done in general mutation (Wright-Fisher) diffusion models in the past [47,89,90].
In this article, we calculate the temporal coefficients as well as the exact sample allele frequency spectra for populations with single deterministic shifts in (effective) size, deterministic boom-bust life cycles, and stochastic changes in (effective) size. To the best of our knowledge, the stochastic models are novel. Throughout this article, we present conjoint assessment of the temporal population dynamics and the shape of the polymorphic sample spectra (more specifically, the shape of the log ratio of the sample spectrum vs an equilibrium distribution). Visual inspection of sample spectra has been promoted over summary statistics for a clearer understanding of which regions of sample spectra deviate from equilibrium and to what extent [57,1], and we see the advantages again here: In our evaluation of the boom-bust model with intermediate epoch lengths in Fig. (4), the polymorphic sample spectrum from the boom phase has a w-shape that the D-statistic interprets as a collapsing (effective) population size (see Table 1). However, this signal is actually caused by the rapid accrual of intermediate frequency alleles vs high/low frequency alleles after recurrent, intermediately spaced bust phases or mild-effect bottlenecks that fail to eradicate all the population's standing variation [compare 57, Table 2]. In our evaluation of sample spectra from populations with stochastic (effective) population size, we see that the D-statistic is blind to the differing effect of the time scale on which the changes occur (Table 3 and Table 6). It picks up on the excess high/low frequency alleles vs intermediate as soon as the epochs become shorter and interprets this as population growth, but not on the decrease of excess intermediate alleles for shortening epoch lengths.
As informative as the polymorphic sample spectra are per se, we argue that assessing the level of convergence of the temporal coefficients of the source populations leads to a more nuanced understanding of the population dynamics. In practical terms, carrying out forward population simulations and spectral decompositions as part of experimental planning can help inform exactly what sample sizes/orders of expansion are needed to detect departure from equilibrium or, conversely, for what sample size/orders of expansion equilibrium can safely be assumed. Even retrospectively, decomposition of estimated transition rate matrices from population samples can be informative of precisely how far which regions of the sample spectrum are from equilibrium and how far, for e.g., one has to down-sample to obtain accurate equilibrium estimates of standard population genetic parameters, or if this is even possible.
Recall that the effective size of a population in time-and space-discrete Wright-Fisher and Moran models without mutation is often defined via the first non-unit eigenvalue λ N e of the transition probability matrix for the respective neutral models. This is equal to 1 − 1 2N or 1 − 2 N 2 , respectively [15, p. 127]. This eigenvalue determines the rate of decline in population heterozygosity (H) over time: In the Wright-Fisher model, heterozygosity at the discrete time point N e H 0 , which can be approximated by H T = H 0 e − T 2N in the limit of large population sizes [14] (but see also [9]). In the Moran model, the equivalent approximation is H T = H 0 e − 2T N 2 per Moran step. In the limit of large population sizes, the genealogical tree of a sample drawn from a diploid Wright-Fisher or a haploid Moran model is given by Kingman's coalescent [43] (with equilibrium coalescent times scaled by 2N and N 2 generations, respectively). Arguably the most prevalent definition of the effective size of a population is through the linear re-scaling of the observed population size required for the limiting genealogical tree to still correspond to Kingman's coalescent even when there are temporal or spatial variations in population size [59]. In the limit of large N , Wright-Fisher/Moran models are approximated by diffusion models; in the case of pure drift, the first non-zero eigenvalue of the corresponding transition rate densities converge towards the coalescent effective population size. In particular, for populations of short-scale, cyclical variations in size [84,85,63] or for subdivided populations with high migration rates and large demes [32], convergence of this eigenvalue is towards the harmonic mean across time intervals or demes. When deterministic or stochastic changes in (effective) population size or migration rates occur on a shorter time scale than the coalescent events, the coalescent effective population size corresponds to the harmonic mean across epochs or demes [36,70,31]. However, when changes occur on the same time scale as coalescent events, the limiting ancestral genealogy of the sample is a time-stochastic version of Kingman's coalescent [39]; and changes on longer time scales than coalescent events are negligible in terms of sample genealogy. In the latter two cases, the coalescent effective population size is usually considered undefined [70] (but see Sano et al. [66], where non-linear scaling across time is permitted so that the effective population size lies between the harmonic and arithmetic means and converges towards the arithmetic mean in these cases respectively). Regardless of whether the effective population size is assessed via heterozygosity, leading eigenvalue, or re-scaling to Kingman's coalescent rate, only the variation in a sample of size K = 2 is considered. In contrast, the temporal coefficients considered in this article are coefficients of all even orders of expansion n ≥ 2 of the population transition rate density into orthogonal polynomials (as the odd ones are zero when mutation bias is constant). They are of the form c 1 + c 2 e −λnf (n,N ) , and describe the decline of the effect of changes in overall-biased mutation rate on polymorphism over time. Classic population genetics would guide us to using the temporal coefficient of order n = 2 with corresponding eigenvalue λ 2 to gauge the effective size of the population. Recall that this is the first order of expansion that is informative of population polymorphism, even though λ 1 is the second eigenvalue and the first non-zero eigenvalue, but is only influenced by mutation. This temporal coefficient also converges first: So when it has converged to the harmonic mean overall biascomplemented scaled mutation rate, the sample allele frequency spectra may still deviate from equilibrium, as shown repeatedly in this article.
The deviations from equilibrium in sample spectra drawn from populations of stochastically varying (effective) size across short epoch lengths are of particular interest (see Fig. 6, scenarios C and D): The spectra exhibit excess high and low frequency alleles. This pattern has recently been noted in genomewide data of many species; their genealogies are often considered more suitably modelled by multiple-merger coalescents than the classic Kingman coalescent [21]. Multiple-merger coalescents reflect skewed offspring distributions across generations, whether these are caused by population substructure, demography, recurrent selection, biased gene conversion, etc. (Note that misleading signals of multiple-merger genealogies can by caused by poor sampling coverage and erroneous allele polarization.) It is well-known that extreme discrete changes in (effective) population size, such as population growth after a bottleneck, can generate sample spectra with excess high and low frequency alleles, which one can either make an effort to distinguish from "true" multiple-merger genealo-gies or also chose to model as such. There appears to be less awareness that stochastic changes in (effective) population size also generate such sample allele spectra. The spectra drawn from such populations are known to converge towards equilibrium at the harmonic mean when epoch lengths are short and the genealogy of these samples accordingly converges towards Kingman's coalescent. While epoch lengths below the same order of magnitude as the overall-biased mutation rate or even on the order of magnitude of its square may conventionally be considered "safely small" to assume equilibrium, especially because not only the leading but the next couple eigenvalues will have converged, slight enrichment of allele frequencies at the edges of the spectrum are still apparent in our Fig. 6, scenarios C and D. Fluctuations in population size therefore strongly impact the shape of sample spectra, even when limiting results may be assumed to hold, and may explain some of the empirical observations.
Finally, we have shown that inference of population history from sample spectra can be performed using sufficient statistics within the framework of the boundary-mutation Moran model. Recall that inference using orthogonal polynomial methods considers only demographic histories that can be projected onto an orthonormal basis function [56]; this is never definitive without prior knowledge of historical events. However, by further specifying the properties of the basis function by determining a convenient agnostic placement of the timing of hypothetical demographic events we have elaborated on what may be called a "sensible" approach to inference of population history that makes the best possible use of available data without specific assumptions on the demographic events other than their maximum number [compare to 56].

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements
The authors would like to thank Burçin Yıldırım, Juraj Bergman, Conrad Burden, Joachim Hermisson, and Sandra Peer for conversations and input during this project, and are grateful to Carolin Kosiol for feedback.
CV's research is supported by the Austrian Science Fund (FWF): DK W1225-B20; LCM's by the School of Biology at the University of St.Andrews.

Understanding the Equilibrium of the Boundary-Mutation Diffusion
It can clearly be seen from the spectral representation of the transition rate density of the boundary-mutation Moran diffusion in Eq. (7), that the solution to the forward diffusion has singular components involving Dirac delta functions at the boundaries (compare [51,Appendix B]). These ensure the conservation of probability (i.e., that the distribution integrates to 1) at each point in time. The Gegenbauer polynomials in the interior are responsible for the flow of probability, and this can be normalised. The mutation terms, which induce the capacity of temporal variation in the system, provide a weighting of the flow of probabilities; classically this is done so the boundary weights are associated with the probabilities of fixation and loss [51](see also [77]), but we instead ensure the maintenance of expected equilibrium heterozygosity.
Due to the singular components of the transition rate density, the equilibrium density of the boundary-mutation diffusion model does not have a clear interpretation as a probability distribution; in past work, it has often been represented by the functional [79,80,Eqs. 23,52 respectively]: However, it has not yet been rigorously shown that a sample drawn from this functional follows a well-defined probability distribution, in particular the discrete stationary distribution of the boundary-mutation Moran model. In order to do this, we note that the function g N (x N , y k ) = αβθ x y k −1 is bounded from above by the interval [0, 1], but converges monotonically towards its point-wise limit g(x, y k ) = αβθ x y k −1 (1 − x) K−y k −1 on [1, N ] for lim N →∞ . Hence, according to the monotone convergence theorem [4,Theorem 16.2], the operations of integration and taking the limit can be reversed, and the integration can be taken over the full integral [0, 1] according to Beppo Levi's lemma [4,Theorem 16.3]; so: Using this, the probability of observing between 1 and K − 1 segregating focal alleles in a binomial sample from the above functional is given by: And the probability of observing only focal alleles in the sample is: The probability of observing no focal alleles follows analogously. The equilibrium distribution of the discrete boundary-mutation Moran model is thus recovered [81](Eq. 13), which also corresponds to the first order Taylor series expansion of the beta-binomial equilibrium distribution of the general mutation model [79]. Note that in order for the boundary probabilities to be positive, the sample size K must be less than roughly exp( min(α,β) αβθ ); this generally holds in practice.

Population Explosion
Note that if θ * 0 ≪ θ * 1 , the expression in the exponent of the solution to Eq. (19) tends to zero. A first order Taylor expansion around it yields:

D-statistic
Neutrality tests based on the infinite sites model are pervasive in population genetics. Formulated for site frequency data, they take the general form [44]: whereθ W andθ Z are unbiased estimators for the overall scaled mutation rate that differ in their weighting of polymorphic sites. In the case of Tajima's D [74],θ W is Watterson's (or Ewens' and Watterson's) estimator [12,13,86], which counts the expected number of segregating sites, andθ Z is a statistic that counts the number of pairwise differences between ancestral and derived nucleotides [73]. Note thatθ Z weights intermediate frequency alleles higher than θ W . Therefore recent positive selection or population expansion, which increase the frequency of rare alleles, lead to values D < 0; balancing selection and population collapse elevate the relative numbers of intermediate allele frequencies and therefore lead to values D > 0.
Here, we reformulate Tajima's D for the context of the biallelic boundarymutation Moran model: Let us assume a sample site frequency spectrum of size K comprised of L loci, and recall that y k denoted the realised count of focal alleles. Then,θ and, recalling that no distinction between ancestral and derived alleles can be made:θ Note that the variance in the denominator of the statistic D is conventionally calculated assuming complete linkage between sites. Here, as generally in this article, we assume recombination rates high enough for linkage to be negligible. Additionally, the overall scaled mutation rates are low. Thus, mutation events can then be seen as realisations of i.i.d. Poisson distributed random variables. Therefore:  Table 5: A population is assumed evolve forwards in time according to the autoregression model from Eq. (31), with the overall bias-complemented scaled mutation rate θ * j randomly fluctuating across epochs according to θ * j iid ∼ IG(a = 6, b = 0.01 * (a − 1)). The arithmetic mean of θ * j is then 0.01 and the harmonic mean 0.008333. After J = 1e 6 epochs, the mean, variance, covariance, and correlations for the temporal coefficients ϑ j,n for j between 1 and J and select n are evaluated and compared with their theoretical counterparts (rounded to 6 post-decimal digits).  Table 6: A population is assumed to be evolving forwards in time according to the autoregression model from Eq. (31), with the overall bias-complemented scaled mutation rate θ * j randomly fluctuating across epochs according to θ * j iid ∼ IG(a = 3, b = 0.01 * (a − 1)). The arithmetic mean of θ * j is then 0.01 and the harmonic mean 0.006667. After J = 1e 6 epochs, the mean, variance, covariance, and correlations for the temporal coefficients ϑ j,n for j between 1 and J and select n are evaluated and compared with their theoretical counterparts (rounded to 6 post-decimal digits).

Mean
Variance Covariance Correlation  Table 7: A population is assumed to be evolving forwards in time according to the autoregression model from Eq. (31), with the overall bias-complemented scaled mutation rate θ * j randomly fluctuating across epochs according to θ * j iid ∼ IG(a = 6, b = 0.005 * (a − 1)). The arithmetic mean of θ * j is then 0.005 and the harmonic mean 0.004167. After J = 1e 6 epochs, the mean, variance, covariance, and correlations for the temporal coefficients ϑ j,n for j between 1 and J and select n are evaluated and compared with their theoretical counterparts (rounded to 6 post-decimal digits).

Appendix B
In this section, we illustrate the connection between the spectral representation of the general biallelic reversible mutation and the biallelic boundarymutation diffusion models with embedded jump processes. It is meant as a rough sketch to further the understanding that while the former is dual to the coalescent with respect to the binomial sampling scheme ( [10,7,61]), the boundary-mutation model has a different dual process because of how mutations are incorporated into the diffusion operator. Sections 8.1, and 8.2 are general summaries drawn from [3], while Section 8.3 makes relationships that are implicit in [3] explicit. A method equivalent to that of [10] is used to find the dual jump processes of the general reversible mutation and boundary-mutation diffusion models in Sections 8.4, and 8.5 respectively, and the resulting transition probabilities of the jump processes are tabulated for the first several steps as a visual aid.

Forward and Backward Diffusion Equations
Let us assume a biallelic system with the proportion of the focal allele denoted as x (0 ≤ x ≤ 1) and t given in units of N . The Kolmogorov forward diffusion equation (FPDE) operating on an appropriate transition rate density φ(x | t) is given as: with boundary conditions determined by the specifics of mutation and selection (and drift). The corresponding backward Kolmogorov partial differential equation (BPDE) is: again with appropriate boundary conditions. Generally, the so-called forwards and backwards operators can be written as: with Q(x) = x(1 − x) representing the effect of genetic drift, and the following adjoint relationship holds: Assume that at the current time t = 0, a binomial sample of size K is taken conditional on x: Assume furthermore that at a historical time t = s, a probability measure ρ(x, t = s) is given, where 1 0 ρ(x | t = s) dx = 1. Set φ(x, t = s) = ρ(x | t = s) and ψ(x, t = 0) = Pr(y | K, x, t = 0); then the marginal likelihood of sample allele frequencies is: where s ≤ t ≤ 0. Note that the marginal likelihood is independent of t. Note that ρ here is an abbreviation for the starting condition φ(x, t = s) = ρ(x | t = s).
Assume furthermore an eigensystem of orthogonal polynomials, where the index n (0 < n < ∞) orders both the eigenvalues λ n , so that λ 0 = 0, λ n > 0 for n ≥ 1, and λ n < λ n+1 , as well as the polynomials B n (x), so that the order of the polynomial B n (x) is n. Note that this is only the case for specific P (x) and specific boundary conditions. Substituting the B n (x) into the BPDE, we obtain: The corresponding forward system can be written as: Note that these relationships require a diagonalised system, which is not what was used in the main text. Assume furthermore that the following orthogonality relationship holds: The binomial sampling probability Pr(y | K, x, t = 0) = K y x y (1 − x) K−y can be expressed as a sum of the B n (x) up to order K: and the initial distribution ρ(x, t = s) can be expressed as a sum of the F n (x):

Forward via Forward Operator
From Eq. (56), the full marginal likelihood of sample allele counts at t = 0 is then: Note that since this scheme moves from t = s forward in time to t = 0, only the FPDE is needed; it is also only necessary to use an expansion up to the sample size K. This system of eigenfunctions is applicable to both the general and the boundary-mutation diffusion models for genetic drift. It requires a change of base with every change of population genetic parameters.

Backward via Jump Process
Instead of going forward in time with the forward operator, as in the main part of the article, we now move backward in time to illustrate the well-documented connections with pure death processes. The binomial likelihood is expanded into a series of backward polynomials to give the probability of the sample conditional on x at time t: The starting condition is also expanded into the forward eigenfunctions to obtain a probability measure at any time s ≤ t ≤ 0: The overall likelihood is obtained by multiplying the forward and backward functions and integrating out the population allele proportion; using the orthogonality relationship (and that t − s = 0) this can be simplified to: Pr(y | K, ρ, t) = Recall that the order of the polynomials B n (x) is n and the eigenvalues are strictly increasing with n, where λ 0 = 0 and λ 1 converge slowly. Thus the orthogonal polynomials are lost over time in a simple death process at a rate increasing with their order. The binomial likelihood Eq. (55) constitutes a polynomial, which when substituted into the backward operator results in a system of first order linear differential equation as a temporal part and polynomials of descending order as spatial part, which we write as (P K , P K−1 , . . . , P n , . . . , P 0 ). In particular, the spatial component can be interpreted as a system of binomial likelihoods of decreasing order. Substituting into the BPDE and observing the boundary conditions results in a recursive system of the following form: L * Pr(i, n) = λ n t i,n | i,n−1 Pr(i, n − 1) + t i,n | i−1,n−1 Pr(i, n) − Pr(i, n) , (66) where (i, n) denotes the sample allele configurations of i focal alleles in a sample of size n. Let e in be the solution of this spatial recursive system, e.g., e in = 1 for the pair (i = y, n = K), e in = t y,n | y−1,K−1 for the pair (i = y − 1, n = K − 1), e in = t y,n | y−1,K−1 t y−1,K−1 | y−2,K−1 for the pair (i = y − 2, n = K − 2), and e in = t y,n | y−1,K−1 t y−1,K−1 | y−1,K−2 + t y,K | y,K−1 t y,K−1 | y−1,K−2 for the pair (i = y − 1, n = K − 2).
Summing over all n and i, we obtain a function that is identical to ψ(x, t): ψ(x, t) = K n=0 Pr(n | K, t)e in Pr(i | n, x) Similarly as with backward orthogonal polynomials, the likelihood of allele configurations can be calculated at any t by multiplying with ψ(x, t) = Pr(x | t, ρ, s) to obtain: Pr(i, n | ρ, t) = where:

General Reversible Biallelic Mutation Model
The backward diffusion operator with the general reversible mutation model is given by: Substituting the binomial likelihood Pr(i | n, x) = n i x i (1 − x) n−i for ψ(x, t), one gets: Note that the rate of the process is λ n = n(n − 1 + θ). Together with the temporal part, this defines a recursive system of ordinary differential equations, which can be solved at any time t ≤ 0.
Multiplying with the stationary distribution: Pr(x | β, θ) = Γ(θ) Γ(αθ)Γ(βθ) x αθ−1 (1 − x) βθ−1 (74) and integrating, the probabilities of a sample configuration (i, n) conditional on the sample size n and the transition probabilities (from a sample (i − 1, n − 1) to (i − 1, n) and (i, n)) can be determined (see Table 8).These transition probabilities result in an identical probability distribution to that determined in [10], where an ordered likelihood is used (the transition from the ordered to the unordered probabilities corresponds to hypergeometric sampling)

Boundary-mutation Model
The interior of the boundary-mutation diffusion model is purely determined by drift: and the boundary conditions balance the probability flow outwards due to drift: and inwards due to mutation: Note that the eigenvalues are λ 0 = 0, λ 1 = θ, and λ n = n(n − 1) for n ≥ 2. The corresponding backward diffusion is given by: which corresponds to the pure drift model.
Once again substituting the binomial likelihood Pr(i | n, x) = n i x i (1−x) n−i for the backward diffusion operator, we obtain: There are several things to be noted here: For i ≤ 1 or n − i ≤ i or both, the appropriate terms in the above equation become zero. The idea of the boundarymutation model is that the flow of probability caused by mutation and drift balance at the boundaries. Furthermore, mutation rates should be adjusted so the first order expansion of the probability of obtaining a heterozygous sample when K = 2 is identical to that in the general mutation model (compare [82,Eqs. 3,4]). For i = 1 and n ≥ 2, the first term in the last line of Eq. (79), which would then be i−1 n−1 Pr(i = 0 | n − 1, x), needs to be replaced with where H n−2 = n−2 i=1 1 n is the harmonic number; the analogous replacement is required for n − i = 1 and n ≥ 2. For i = 1 and n = 1, these terms must be set 1 − β and β, respectively.
Multiplying the conditional probabilities with the stationary distribution of the boundary-mutation diffusion model (Eq. 47), the probabilities of a sample of (i, n) conditional on the sample size n and the transition probabilities (from a sample (i − 1, n − 1) to (i − 1, n) and (i, n)) can be calculated (Table 9).

Numerics of jump processes vs orthogonal polynomials
We note that calculating the marginal likelihood involves summation over (K + 1)(y k + 1) states with the jump process. Modeling changing population sizes with a jump process is even more difficult [see 17, for an application to the infinite site model without recombination]. Contrast this with the polynomial approaches: a decomposition into K + 1 polynomials and a change of base with any change of mutation parameters is needed with the Jacobi polynomials, while with the approach in this article only (K − 1)/2 polynomials and no change of base is needed, which makes our approach numerically favorable.