Coupling adaptive molecular evolution to phylodynamics using fitness-dependent birth-death models

Beneficial and deleterious mutations cause the fitness of lineages to vary across a phylogeny and thereby shape its branching structure. While standard phylogenetic models do not allow mutations to feedback and shape trees, birth-death models can account for this feedback by letting the fitness of lineages depend on their type. To date, these multi-type birth-death models have only been applied to cases where a lineage’s fitness is determined by a single character state. We extend these models to track sequence evolution at multiple sites. This approach remains computationally tractable by tracking the genotype and fitness of lineages probabilistically in an approximate manner. Although approximate, we show that we can accurately estimate the fitness of lineages and site-specific mutational fitness effects from phylogenies. We apply this approach to estimate the population-level fitness effects of mutations in Ebola and influenza virus, and compare our estimates with in vitro fitness measurements for these mutations.


Introduction
The fitness effects of new mutations is a key determinant of a population's evolutionary potential to adapt over time. Studies exploring the distribution of fitness effects (DFE) in a wide range of organisms have revealed that, while many mutations are neutral, a smaller but significant fraction have substantial effects on fitness (Sanjuán et al., 2004;Eyre-Walker and Keightley, 2007;Visher et al., 2016). These findings have spurred interest in molecular evolutionary models that consider how non-neutral mutations shape sequence evolution and patterns of genetic diversity. Such models range in complexity from simple models assuming that selection operates uniformly across all sites (Muse and Gaut, 1994;Goldman and Yang, 1994;Yang and Nielsen, 2008) to parameter rich models with site-specific fitness effects (Halpern and Bruno, 1998;Lartillot and Philippe, 2004;Rodrigue et al., 2010;Hilton and Bloom, 2018). While all of these models assume sequences evolve along an underlying phylogenetic tree representing their shared common ancestry, all also assume that the mutation process driving sequence evolution is independent of the other evolutionary processes giving rise to the tree. This independence assumption implies that mutations do not feedback and affect the fitness of lineages in the tree, such that lineages carrying highly beneficial mutations are just as likely to survive and produce sampled descendants as lineages riddled with deleterious mutations ( Figure 1A).
While questionable in terms of biological realism, independence between the tree generating process and the mutation process allows for tractable statistical models. Assuming independence, the joint likelihood of a phylogenetic tree T and the sequence data S at the tips of the tree having evolved as observed can be factored into two distinct components: LðS; T j; Þ ¼ LðSjT ; ÞpðT jÞ: (1) The likelihood of the sequence data LðSjT ; Þ conditional on the tree and the mutational parameters can be computed efficiently for most continuous-time Markov models of sequence evolution (Felsenstein, 1981). The probability density pðT jÞ of the tree T given the parameters generating the tree can likewise be computed under widely used coalescent (Griffiths and Tavaré, 1994;Pybus et al., 2000) or birth-death models (Rannala and Yang, 1996;Stadler, 2009). In Bayesian phylogenetics, pðT jÞ is normally thought of as the prior distribution over trees rather than a likelihood, because the tree itself is inferred from the sequence data. Standard phylogenetic models assume that there is an underlying process by which individuals replicate and give rise to a phylogeny. Mutations occur along the lineages of the tree, generating the sequence data observed at the tips. The mutation process is assumed to be independent of tree generating process, such that mutations do not impact the branching structure of the tree. (B) The MFBD allows us to relax this assumption, such that mutations at multiple sites feedback and shape both the tree and sequence data. (C) Under the original multi-type birth-death model we track D n;i ðtÞ, the probability density that a lineage n at time t in state i produces the subtree descending from n and the observed tip states. We also track E i , the probability that a lineage produces no sampled descendants and is therefore unobserved. (D) In the MFBD model we instead track D n;k;i ðtÞ, the probability that a lineage n in state i at site k produces the subtree and the observed tip states at site k.
Because the fitness of a lineage f n will depend on its genotype at all sites, we use the marginal site probabilities ! to compute the probability that a lineage has a certain genotype, such as ACT (Approximation 1). We can then marginalize over the fitness of each genotype weighted by its approximate genotype probability to compute the fitness f n of a lineage (Approximation 2). Finally, we need to know the probability E n that a lineage left no other sampled descendants, which we approximate using the probability E u that a lineage with same expected fitness u leaves no sampled descendants (Approximation 3). The schematic in A was reproduced from the original figure by Louis du Plessis (https://github.com/Taming-the-BEAST/TechnicalLectureSources/tree/master/BeastIntro2018) with permission under a Creative Commons license. DOI: https://doi.org/10.7554/eLife.45562.002 The assumption of independence between the mutation and tree generating processes may be unproblematic in certain scenarios, such as if mutations are truly neutral or do not contribute to substantial fitness differences among lineages. A common argument invoked in defense of ignoring non-neutral mutations is that macroevolutionary tree generating processes like speciation and extinction play out on longer timescales than the substitution process fixing or removing mutations within a population (Bustamante, 2005). In this case, fitness variation drives the substitution process within a population but does not ultimately drive the formation of a phylogeny at the species level. But such separation-of-timescales arguments do not hold when segregating mutations contribute to substantial fitness variation between lineages in a phylogeny, such as for rapidly evolving microbes where several different mutant strains can co-circulate. In these cases, the tree generating and mutation processes occur on the same timescale, and the fitness effects of mutations can feedback and shape the branching structure of a phylogeny (Kaplan et al., 1988;Nicolaisen and Desai, 2012;Neher and Hallatschek, 2013). Ignoring non-neutral evolution in this case may introduce biases into phylogenetic inference. But perhaps more importantly, fitness differences among lineages can be correlated with ancestral genotypes, providing information about the molecular basis of adaptive evolution we would otherwise ignore.
We therefore explore an approach that couples molecular sequence evolution to the tree-generating process using multi-type birth-death (MTBD) models. Under this approach, mutations can directly impact the fitness of a lineage in the phylogeny by altering its birth or death rate ( Figure 1B). For a single evolving site or other character state, the joint likelihood of the phylogeny together with the observed tip states can be computed exactly under the MTBD model (Maddison et al., 2007;Stadler and Bonhoeffer, 2013;Kühnert et al., 2016). However, this approach is impractical for more than a few non-neutrally evolving sites due to the need to track all possible genotypes as separate types in the state space of the model. We therefore explore an approximate birth-death model that considers how mutations at multiple sites contribute to a lineage's overall fitness, without the need to track all possible genotypes in sequence space. This approach allows us to infer the fitness effects of individual mutations and the fitness of any particular lineage at any time (based on its inferred ancestral genotype) from the branching structure of a phylogeny. Because our approach is particularly relevant to rapidly adapting microbial pathogens, we apply it to Ebola and influenza virus sequence data in order to quantify the fitness effects of naturally occurring amino acid substitutions.

Materials and methods
The MTBD at a single evolving site At a single evolving site, the multi-type birth-death (MTBD) model of Stadler and Bonhoeffer (2013) can be used to compute the joint likelihood LðS; T j; Þ of the sequence or character state data S and phylogenetic tree T in a way that couples the mutation process with changes in fitness along a lineage. Let D n ðtÞ represent the probability density (i.e. the likelihood) that the subtree descending from lineage n evolved between time t and the present exactly as observed ( Figure 1C). Further, let D n;i ðtÞ represent this probability density conditional on lineage n being in state i out of M possible states at time t. Here the state of a lineage refers to a particular allele or character state (e.g. nucleotide or amino acid) at a single site. We reserve the term genotype to refer to a particular configuration of states across multiple sites in a sequence.
The density D n;i ðtÞ can be computed going backwards in time from the present (t ¼ 0) to time t along a lineage by numerically solving a system of ordinary differential equations: Here, l i is the birth rate and d i is the death rate of lineages in state i, and thus reflect a lineage's fitness. Mutations between states i and j occur at a rate g i;j , independently of birth events. Each term in (Equation 2) describes how D n;i changes through time by accounting for all of the different events that could have occurred along the lineage. The first term (a) considers the change in probability density given that no birth, death or mutation event occurred. The second term (b) considers the probability of a birth event that went unobserved because one of the child lineages produced no sampled descendants (this event has probability E i ðtÞ, see below). The third term (c) reflects the probability that the lineage mutated from state i to j. E i ðtÞ represents the probability that a lineage in state i is not sampled and has no sampled descendants. This probability can be computed at any time t by solving a second set of ODEs: The first term (a) reflects the probability that a lineage dies and is not sampled, where s i is the probability that a lineage in state i is sampled upon dying. Terms b-d have similar interpretations as in (Equation 2).
At a tip lineage n, we initialize D n;i ðtÞ ¼ d i s i if the lineage was sampled upon death at time t. Alternatively, if n was sampled at the present time t ¼ 0 before dying, then D n;i ðtÞ ¼ i , where i is the probability that an individual in state i was sampled at present. At a branching event, the probability density D a;i of the parent lineage a in state i giving rise to two descendent lineages n and m is updated as: The factor of two enters because either lineage m or n could have given birth and we must consider both possible events.
At the root, we can compute the probability density of the entire tree by summing over all possible root states: where q i is the prior probability that the root is in state i at time t root . Including the term 1 À E i ðt root Þ in the denominator conditions the birth-death process on giving rise to at least one sampled individual. D n represents the probability that the entire tree and the tip states S evolved as exactly as observed. It is therefore equivalent to the joint likelihood LðS; T j; Þ we seek where ¼ fgg and ¼ fl; d; sg.
In theory, this approach could be extended to evolution at any number of sites as long as we track D n;i ðtÞ for all possible genotypes i. Unfortunately, this approach has limited utility because the number of possible genotypes in sequence space scales exponentially with the number of sites L (i.e. 4 L possible genotypes for nucleotide sequences), making the MTBD model impractical for modeling evolution at more than a few sites.

The marginal fitness birth-death model
While the fitness of a lineage will generally depend on its genotype across multiple sites, tracking evolution in the space of all possible genotypes is, as just discussed, computationally infeasible. We therefore seek an approach that considers how mutations at multiple sites determine the fitness of a lineage without the need to track D n;i for all possible genotypes. In the approach described below and outlined in Figure 1D, we therefore track molecular evolution at each site, computing the probability that each site occupies each state, and then approximate the probability of a lineage being in any particular genotype based on these site probabilities. To compute the expected fitness of a lineage, we can then sum, or marginalize, over the fitness of each genotype weighted by its approximate probability. We therefore refer to this approach as the marginal fitness birth-death (MFBD) model.
First, in order to couple a lineage's fitness with the birth-death process, we will assume that the birth rate l n of any lineage n scales according to the fitness f g of its genotype: where l 0 is the base birth rate assigned to a particular reference genotype (e.g. the wildtype). A lineage's death rate can also be coupled to its fitness, but for simplicity we will assume a lineage's fitness is reflected only in its birth rate l n .
Let G be the set of all possible genotypes in sequence space and g k be the state of genotype g at site k. To make it clear when we are considering evolution in genotype space rather than at a particular site, we will write the probability density D n;i as D n;g when i refers to a particular genotype. Furthermore, let D n;k;i be the probability density of the subtree descending from lineage n given that site k is in state i. By definition, where the sum is over all genotypes in G with site k in state i. We can derive a difference equation for D n;k;i from D n;g in a straightforward manner: Taking the limit as Dt ! 0, we get a new system of differential equations for D n;k;i ðtÞ: d dt D n;k;i ðtÞ ¼ X fg2G:gk ¼ig ½ À ðf g l 0 þ X M j¼1 X fg 0 2G:g 0 k ¼jg g g;g 0 þ dÞD n;g ðtÞ þ2f g l 0 E n;g ðtÞD n;g ðtÞ þ X M j¼1 X fg 0 2G:g 0 k ¼jg g g;g 0 D n;g 0 ðtÞ Unfortunately, (Equation 9) would still require us to track D n;g ðtÞ for all possible genotypes, precisely what we wish not to do. We show below that, if we can approximate f g and E n;g for any given lineage, we can write (Equation 9) in terms of only D n;k;i (see (Equation 19)) and therefore do not need to track each genotype.

Approximating the fitness of a lineage
We begin by approximating the fitness f n of a lineage n. Even if we do not know the exact genotype of a lineage at a particular time, we can compute the lineage's expected fitness by summing over the fitness of each genotype f g weighted by the probability ! n;g that lineage n is in genotype g: The same logic can be extended to compute the expected marginal fitness Eðf n;k;i Þ of a lineage n that at site k is in state i: Eðf n;k;i Þ ¼ X fg2G:gk ¼ig f g ! n;g : Computing Eðf n;k;i Þ using (Equation 11) requires knowledge of the genotype probabilities ! n;g , which would again require us to track evolution in genotype space. We therefore introduce our major assumption: that we can approximate genotype probabilities using only the marginal site probabilities ! n;k;i that site k is in state i. We describe how we compute ! n;k;i below. For now, we make the approximation that! n;g ¼ Q L k¼1 ! n;k;gk P g2G Q L k¼1 ! n;k;gk : This approximation assumes that all sites evolve independently of one another, which is not generally true because mutations at different sites are linked together in genotypes with shared ancestral histories, creating correlations among sites that we ignore.
Using the approximate genotype probabilities! n;g , we can in turn approximate the expected marginal fitness of a lineage:f n;k;i ¼ X fg2G:gk ¼ig f g!n;g : If the fitness effects of each site act multiplicatively to determine the overall fitness of a lineage, we can computef n;k;i as:f where s ki is the fitness effect of site k being in state i. This formulation off n;k;i is useful if the number of sites L is large and the number of genotypes we need to sum over in (Equation 13) is therefore also extremely large.

Approximating the probability of no sampled descendants
The E n;g ðtÞ term in (Equation 9) represents the probability that a lineage n alive at time t in the past is not sampled and leaves behind no sampled descendants. E n;g ðtÞ therefore necessarily depends on the fitness of unobserved lineages descending from n and how fitness along these lineages evolves through changes in their genotype. Because it is often easier track evolution in one dimensional fitness space rather than high-dimensional sequence space (Kepler and Perelson, 1993;Tsimring et al., 1996), we simplify this problem by tracking a proxy for E n;g ðtÞ though fitness space.
Let E u be the probability that a lineage with expected fitness u leaves no sampled descendants. While fitness can take on a continuous range of values, we track these probabilities only for a discrete set of points V in fitness space. We can track E u for u 2 V by modifying (Equation 3) to obtain: We can then substitute E n;g ðtÞ in ( Equation 9) with E u for the fitness value u closest to f g orf n;k;i in fitness space.
Tracking evolution in fitness space requires us to specify rates g u;v for how lineages transition between fitness classes u and v. Let G u be the set of genotypes with expected fitness closest to u out of all fitness values in V. We approximate g u;v as: where ij is the mutation rate between genotypes i and j. In other words, we compute the average rate of transitions out of fitness class u into v by summing over all possible transitions between genotypes contained within each fitness class. Note that if each genotype falls in a unique fitness class such that jG u j ¼ 1 for all u 2 V, then E u is computed exactly. In the Results, we compare using the approximate transition rates above to compute E u versus an even simpler approximation where we assume no transitions between fitness classes along unobserved lineages, which has been assumed in earlier multi-type birth-death models (Rabosky et al., 2014;Barido-Sottani et al., 2018).

Computing the marginal site densities D n;k;i
Recall that (Equation 9) provided an exact way to track the marginal site densities D n;k;i based on the genotype densities D n;g . To efficiently evaluate D n;k;i without the need to track D n;g for all genotypes, we apply the three approximations made above. First, we approximate the genotype probabilities! n;g based on the marginal site probabilities. Second, we marginalize over the fitness of each genotype (weighted by its genotype probability) to computef n;k;i and then substitutef n;k;i for f g for all genotypes where g k ¼ i below. Third, we approximate E n;g by E u for a single fitness value u closest tof n;k;i . Making these approximations in (Equation 9) leads to: Assuming that the mutation rate from i to j at site k does not depend on the genetic background, we can substitute P M j¼1 P fg 0 2G:g 0 k ¼jg g g;g 0 with P M j¼1 g i;j , where g i;j is the per site mutation rate. We can likewise substitute P M j¼1 P fg 0 2G:g 0 k ¼jg g g;g 0 D n;g 0 ðtÞ with P M j¼1 g i;j P fg 0 2G:g 0 k ¼jg D n;g 0 ðtÞ. Making these substitutions and rearranging the sums in (Equation 17), we have: Recalling that D n;k;i ¼ P fg2G:gk ¼ig D n;g (and by extension D n;k;j ¼ P fg2G:gk¼jg D n;g ), we have: The significance of (Equation 19) is twofold. First, we can track sequence evolution at each site individually without tracking all genotypes. Second, givenf n;k;i , we can track the overall fitness of a lineage by marginalizing over the fitness effects of all possible mutations at other sites. We can therefore track sequence evolution at each site while simultaneously taking into account the coupled fitness effects of mutations at all other sites on a lineage's fitness. Computingf n;k;i still requires us to approximate the genotype probabilities using (Equation 12), which in turn requires the marginal site probabilities ! n;k;i . In our notation, ! n;k;i represents the conditional probability pðijT n ; S n Þ that lineage n is in particular state i, where T n represents the subtree descending from n with tip sequences S n represents the inverse conditional probability density pðT n ; S n jiÞ. We can therefore apply Bayes theorem to compute ! n;k;i given D n;k;i : ! n;k;i ¼ pðijT n ; S n Þ ¼ pðT n ; S n jiÞqðiÞ P M i pðT n ; S n jiÞqðiÞ ¼ D n;k;i qðiÞ P M i D n;k;i qðiÞ : The qðiÞ terms represent the prior probability that the lineage is in state i. Here we make a simplification in assuming that the tree ancestral and sister to lineage n has no information regarding ! n;k;i , and thus assume a uniform prior on qðiÞ ¼ 1=M. The qðiÞ terms therefore cancel above.
Because the fitness of a lineage depends on the state of all sites, we must solve (Equation 19) for all sites simultaneously as one coupled system of differential equations. This requires updating D n;k;i at each time step, which suggests the following iterative procedure.
At a tip n observed to be in genotype g, we initializef n;k;i as f g if g k ¼ i or elsef n;k;i ¼ 0,D n;k;i ¼ ds or , and ! n;k;i ¼ 1 if g k ¼ i, else ! n;k;i ¼ 0. Then at each time step backwards through time from time t to time t þ Dt, for each site and state we: 1. Update D n;k;i by numerically integrating (Equation 19) over time step Dt. 2. Update the marginal site probabilities ! n;k;i using (Equation 20) 3. Update the expected marginal fitness valuesf n;k;i using (Equation 13) or (Equation 14).

Computing the full joint likelihood
We can now compute the joint likelihood of the tree and sequence data if we track D n;k;i at each site back to the root. At the root, D n;k;i ðt root Þ represents pðT ; S k j; ; iÞ, the probability density of the entire tree T and the observed sequence data S k as site k, conditional on site k being in state i at the root. To be precise, D n;k;i only approximates pðT ; S k j; ; iÞ because we computed D n;k;i using the expected marginal fitness of a lineagef n;k;i based on approximate genotype probabilities. We therefore introduce an additional auxiliary variable F representing the entire set of expected fitness valuesf n;k;i computed over all lineages, sites and states. Using this notation, D n;k;i ðt root Þ ¼ pðT ; S k j; ; F ; iÞ. By summing over all possible root states at site k (and conditioning on survival), we can then compute: Likewise, we can compute the conditional probability density pðS k jT ; ; ; F Þ of the sequence data at site k given the tree: We already know pðT ; S k j; ; F Þ from above but now need the tree density pðT j; ; F Þ. This can easily be computed using a birth-death process where the birth rate of each lineage at any time t is always rescaled by its expected fitnessf n ðtÞ contained within F .
We can now compute the joint density pðT ; S 1:L j; Þ for all sites. Because each site is conditionally independent of all other sites given F , we can factor pðT ; S 1:L j; ; F Þ into a product of densities for S k at each site and the density of the entire tree T : We can thus approximate the joint likelihood of the sequence data and the phylogeny pðT ; S 1:L j; Þ as pðT ; S 1:L j; ; F Þ. This allows us to consider how selection shapes sequence evolution at each site while simultaneously considering how the fitness effects of mutations at multiple sites act together to shape the phylogeny. As (Equation 23) makes clear though, the goodness of our approximation depends on how well the fitness values in F are approximated, which in turn depends on how well we can approximate genotypes based on the marginal site probabilities. We explore the goodness of these approximations in the Results section.

Implementation
We first implemented the marginal fitness birth-death (MFBD) model in Matlab version R2017b. The Matlab implementation was used to test how well the MFBD model can approximate likelihoods and genotype probabilities relative to the exact multi-type birth death model tracking all possible genotypes for a simple model with only four genotypes. For statistical inference, the MFBD was implemented as an add-on package for BEAST 2 (Bouckaert et al., 2014) named Lumiè re, which extends the existing BDMM package for multi-type birth-death models (Kühnert et al., 2016). BEAST two is a general software platform that allows a wide range of evolutionary models including birth-death models to be fit to phylogenetic trees while jointly inferring the phylogeny using Bayesian MCMC sampling. The BEAST 2 implementation of Lumiè re therefore allows the joint posterior distribution of all parameters in the MFBD model and the phylogeny to be estimated from sequence data. Source code for Lumiè re and the Matlab implementation are freely available at https://github. com/davidrasm/Lumiere.

Simulations
To test the statistical performance of our approach, mock phylogenies and sequence data were simulated under a birth-death-mutation-sampling process using a variant of the Gillespie stochastic simulation algorithm (Gillespie, 2007) that recorded the ancestry of all individuals in the population. A binary sequence was associated with each lineage and allowed to mutate with a constant per-site mutation rate g. Mutations could alter the fitness of a lineage by either increasing or decreasing its birth rate according to site-specific fitness effects. At death events, lineages were sampled with probability s, in which case they were included in the mock phylogeny. Code for these simulations is available at https://github.com/davidrasm/Lumiere/tree/master/sim (Rasmussen, 2019; copy archived at https://github.com/elifesciences-publications/Lumiere).

Ebola analysis
We used the Lumiè re implementation of the MFBD model to estimate the fitness effects of amino acid mutations previously identified to increase the infectivity of Ebola virus in human cell lines (Diehl et al., 2016;Urbanowicz et al., 2016). We reanalyzed a set of 1610 whole genome EBOV sequences sampled from Guinea, Sierra Leone and Liberia in 2014 to 2016. The sequence alignment along with the time-calibrated molecular phylogeny we used for our analysis were downloaded from https://github.com/ebov/space-time/tree/master/Data. (Urbanowicz et al., 2016) measured the fitness effects of 17 viral genotypes carrying 18 different amino acid mutations in either single, double or triple mutant backgrounds relative to the Makona genotype first sampled at the beginning of the epidemic. Because our methods cannot estimate fitness effects of mutations at very low frequencies, we only analyzed 9 of these mutations that were present in at least 10 of the 1610 viral samples. Preliminary analysis revealed that these mutations fall within eight unique genetic backgrounds because of the way mutations are nested within other single or double mutant lineages in the phylogeny. Because the data of Urbanowicz et al. (2016) strongly suggest that epistatic interactions between mutations affect viral fitness, we estimated the genotypic fitness f g of these eight major genotypes rather than site-specific fitness effects s. We therefore used the MFBD to track sequence evolution at each site, but used (Equation 13) to marginalize over these genotypes when approximating the fitness of a lineage.
We estimated the fitness of each genotype relative to the Makona genotype, assuming a uniform ½0; 2 prior distribution on these fitness values. For the other parameters in the model, we assumed a fixed death or removal rate d of 0.1667 per day based on earlier estimates (Gire et al., 2014;Stadler et al., 2014). Sampling was modeled as occurring upon removal, with the sampling proportion s set to zero before March 2014, when the first sample was collected. After March 2014, we assumed a fixed sampling proportion of 0.056, reflecting the fact that the dataset included samples from 1610 individuals out of the 28,652 probable cases reported by the WHO (WHO, 2016). Lastly, we assumed a constant amino acid mutation rate over all sites with an exponential prior on both the forward and backward mutation rate with a mean rate of 2 Â 10 À3 per site per year. We also ran a second analysis where we included the geographic locations of lineages (Guinea, Sierra Leone and Liberia) as an additional evolving character state in our model. In this analysis, we estimated the effect of geographic location on transmission rates in Sierra Leone and Liberia relative to the base transmission rate in Guinea. Both analyses can be reproduced in Lumiè re with the XML input files available at https://github.com/davidrasm/Lumiere/tree/master/ebola.

Influenza H3N2 analysis
We used the Lumiè re implementation of the MFBD model to estimate the fitness effects of amino acid mutations in the hemagglutinin (HA) protein of human influenza virus subtype H3N2. In order to ensure our fitness estimates were directly comparable to the mutational fitness effects previously estimated by Lee et al. (2018), we focused our analysis on viral samples in the same antigenic cluster as the A/Perth/16/2009 strain studied by Lee et al. for two reasons. First, Lee et al. (2018) showed that the fitness effects of amino acid mutations in HA vary depending on the genetic background, with greater fitness differences between more divergent strains. We therefore only considered strains with low genetic divergence from A/Perth/16/2009. Second, the deep mutational scanning experiments were performed in cell culture, and therefore do not reflect the antigenic component of viral fitness in the human population. Only considering a single antigenic cluster therefore minimizes the effect of antigenic mutations.
To further minimize additional background variation in fitness due to geography, we only considered samples collected in the United States from January 2009 to the end of 2012. Overall, we downloaded 2150 sequences from the Influenza Research Database (https://www.fludb.org/) that met these criteria. Nucleotide sequences of the HA segment were aligned in Muscle (Edgar, 2004) and a maximum likelihood phylogenetic tree was estimated in RAxML (Stamatakis, 2014) using a GTR + Gamma substitution model. To get a time-calibrated phylogeny, branch lengths in the ML tree were converted into units of real calendar time with Least Squares Dating v0.3  using a previously estimated molecular clock rate for the HA segment of H3N2 of 5.72 x 10 À3 substitutions per site per year (Rambaut et al., 2008).
In our first analysis, we estimated mutational fitness effects from the H3N2 phylogeny under the MFBD model assuming that fitness effects are multiplicative across sites, as in (Equation 14). Because of the large number of naturally occurring mutations in the HA sequences, we limited our analyses to the 17 most abundant amino acid mutations that were present in more than 10% of the sampled sequences. To compare our estimates of population-level fitness effects to fitness effects measured in vitro, we converted the relative amino acid preferences at each site from the deep mutational scanning experiments to mutational fitness effects: where p k;i is the relative preference for amino acid i at site k. To compute these fitness effects, we used the averaged relative amino acid preferences reported in Dataset S3 of Lee et al. (2018).
In our second analysis, we used the relative preference data from the deep mutational scanning experiments to predict the population-level fitness of viral lineages. For this analysis, we considered all naturally occurring mutations in the HA protein that were present in at least 10 samples. In all, the fitness effects of 67 mutations distributed across 56 sites were included. To map relative amino acid preferences across multiple sites to population-level fitness, we assume that the mutational fitness effects computed from the relative amino acid preferences are additive on a log 2 scale, such that the fitness f n of a lineage is: Here, a is a linear scaling term that allows us to calibrate population-level fitness in terms of the sum of the site-specific fitness effects. We also include the scaling exponent k to account for curvature in the fitness landscape, as might be expected to arise if mutations interact globally through synergistic (k>1) or antagonistic (k<1) epistatic effects across sites (Elena et al., 2010). A complete list of the HA mutations considered, their fitness effects predicted by DMS and the XML input file needed to reproduce our analysis are available at https://github.com/davidrasm/Lumiere/tree/master/influenzaH3N2.

The four genotype model
We first consider a simple model of molecular evolution in order to compare the marginal fitness birth-death (MFBD) model against the exact multi-type birth-death (MTBD) model tracking all genotypes. Specifically, we consider a binary evolving sequence of length L ¼ 2 where all mutations are deleterious and carry a selective fitness cost s. Fitness effects of individual mutations act multiplicatively, such that the double mutant has fitness ð1 À sÞ 2 . With this simple model, it is therefore possible to track the evolutionary dynamics of all four genotypes (G ¼ f00; 01; 10; 11g) under both models. Figure 2A shows a phylogeny simulated under the four genotype model, colored according to the genotype of each lineage. We computed the joint likelihood that this tree and observed tip genotypes evolved under a range of different fitness values s for both the exact MTBD and approximate MFBD models ( Figure 2B). The likelihood profiles under both models peak around the true value of s and closely match at lower values of s, but begin to diverge at higher values. The probability of a single hypothetical lineage being in each genotype approximated under the MFBD model is also shown against the exact genotype probabilities computed under the MTBD in ( Figure 2C).
Because the MFBD approximates the probability of a lineage being in each genotype based on the marginal sites probabilities, we also compared how well the MFBD model approximates the genotype probability densities D n;g relative to the exact multi-type birth-death model. Recall that D n;g provides the probability that the subtree descending from a lineage n has evolved exactly as observed, and therefore forms the foundation of all likelihood calculations under our model. Averaged over all genotypes, the error introduced by approximating D n;g under the MFBD model is greatest at intermediate mutation rates (Figure 3A). When there is no selection (s ¼ 0), the MFBD introduces no error, but the error increases as the strength of selection increases ( Figure 3B). We can also consider a variant of the four genotype model where each of the single mutant genotypes is neutral with s ¼ 0 but an epistatic interaction between the two sites causes the double mutant to be deleterious with some fitness cost . Again, the error introduced by the MFBD grows as the strength of the epistatic fitness effect increases ( Figure 3C).
Taken together, these results suggest that the MFBD model introduces error in D n;g by ignoring correlations among sites due to the fact that selection acts at the level of genotypes, especially when epistasis is strong. The additional correlations between sites induced by selection then causes the genotype probabilities to deviate from those expected based on the marginal site probabilities. Conversely, at very high mutation rates, correlations between sites quickly break down so that sites evolve effectively independently of one another, such that the error introduced by the MFBD also decreases as the mutation rate becomes very high. Overall, the magnitude of the error introduced by approximating the genotype probabilities is small, especially when we can compare the MFBD model against a more naive approximation that tracks sequence evolution at each site completely independent of all other sites by setting the expected marginal fitnessf n;k;i ¼ s instead of using (Equation 14). This approximation completely ignores how the fitness of a lineage depends on mutations at other sites, and the error in D n;g is generally considerably greater than under the MFBD model ( Figure 3A-C; dashed lines). Moreover, even when the error introduced by the MFBD model is relatively large, the model still tracks the dynamics of D n;g backwards through time along a lineage well (Figure 3D-F; for parameter values marked by the black asterisks in A-C).
The MFBD model also approximates E n , the probability that a lineage has no sampled descendants, using a discretized fitness space and is therefore another source of potential error. Mirroring the results for D n;g , the error introduced by this approximation peaks at intermediate mutation rates while it increases monotonically with the strength of selection and epistatic fitness effects ( Figure 4A-C). Interestingly, tracking how lineages transition between fitness classes in fitness space does not improve the approximation relative to simply ignoring changes in fitness along unobserved lineages ( Figure 4A-C; dashed lines). The overall magnitude of error introduced by approximating E n is also small, although using a discretized fitness space does lead to some jaggedness in the dynamics of E n ( Figure 4D-F). However, only when selection is very strong (s>0:8) does tracking E n in fitness space result in significant errors, and then only in the more distant past ( Figure 4E). In this case, a lineage's fitness in the distant past may be a poor predictor of its probability of leaving sampled descendants at a time point in the distant future because the fitness of the lineage and its descendants may greatly change over time in a way that is difficult to predict without considering the exact mutational pathways through which a lineage can move in sequence space.

Estimating site-specific fitness effects
Next, we simulated phylogenies under a model where the fitness effect of the mutant allele at each site is drawn independently from a distribution of fitness effects (DFE) in order to test how well we can estimate site-specific fitness effects. Because there can be considerable uncertainty surrounding these fitness effects, we now estimate the posterior distribution of fitness effects using Bayesian MCMC. The accuracy and precision of the estimated fitness effects varies considerably across sites, as shown for a representative phylogeny with five evolving sites in Figure 5.
In order to better understand this variability, we simulated 100 phylogenies with randomly drawn fitness effects at either 2, 5 or 10 evolving sites. Overall, the estimated posterior median fitness effects are well correlated with their true values, although the strength of this correlation decreases as the number of sites increases ( Figure 6A-C). Coverage of the 95% credible intervals on the other hand increased from 71.0 to 72.8% to 77.4%.
While there is no systematic directional bias, fitness effects are underestimated for sites at which the mutant allele is at low frequency among sampled individuals and overestimated for sites where  the mutant allele is at high frequencies. This however appears to be an intrinsic feature of estimating fitness effects from the branching structure of a phylogeny, as the same phenomena is observed under the exact MTBD model with two sites and four genotypes ( Figure 6D), and the estimates made under the approximate MFBD model are highly correlated with estimates made under the exact MTBD model ( Figure 6E). Across all sites and simulations, accuracy decreased when the mutant allele at a given site was at low or high frequencies, and there was considerably more uncertainty for sites where the mutant allele was at very low frequencies ( Figure 6F). Thus, while the MFBD model generally performs well at estimating site-specific fitness effects, the accuracy and precision of these estimates varies greatly depending on the frequency of a given mutation in a phylogeny.

Ebola virus adaptation to humans
The Ebola virus glycoprotein (GP) binds to cells during viral cell entry and is therefore thought to be a key determinant of viral fitness in different hosts. Previously, (Urbanowicz et al., 2016) analyzed a large set of naturally occurring amino acid mutations in the GP isolated from patients during the 2013-16 epidemic in Western Africa. The effect of these GP mutations on fitness were then experimentally determined using infectivity assays in cell culture. Several mutant genotypes dramatically  In all simulations, sites where the Effective Sample Size of the MCMC samples was below 100 (less than 5% of all sites across simulations) were discarded. The death rate was fixed at d ¼ 0:05 but the birth, mutation and sampling rates were randomly drawn for each simulation from a prior distribution: l Uniform(0.1,0.2); g Exponential(0.01); s Uniform(0,1). Only the birth rate was jointly inferred with the site-specific fitness effects. DOI: https://doi.org/10.7554/eLife.45562.007 increased viral infectivity relative to the Makona genotype isolated during the earliest stages of the epidemic. However, the effect of these mutations on viral transmission and fitness at the host population level have not yet been determined. We therefore applied the MFBD model to a large dataset of 1610 Ebola virus (EBOV) genomes sampled during the 2013-16 epidemic to infer the populationlevel fitness effects of these GP mutations.
We analyzed 9 out of the 18 amino acid mutations analyzed by Urbanowicz et al. (2016) that were present in at least 10 of the 1610 viral samples. These nine mutations fall in eight different genetic backgrounds or genotypes (Figure 7). Because Urbanowicz et al. (2016) found evidence for epistatic interactions between several of these mutations, we estimated the fitness of these eight genotypes rather than site-specific mutational fitness effects. Table 1 shows the relative fitness of these genotypes estimated at the population-level versus their fitness in cell culture.
Mapping the genotypes and fitness of lineages inferred under the MFBD model onto the phylogeny allows us to reconstruct the series of events by which EBOV adapted to humans (Figure 7). Shortly after the epidemic started in 2013, the A82V mutation occurred and gave rise to lineage B, which then spread to Sierra Leone, Liberia and Mali. (Urbanowicz et al., 2016) found that the A82V mutation increases infectivity by 2-3 fold in cell culture. At the population-level, this mutation appears to have a less dramatic effect, increasing transmissibility by only 5% relative to the Makona genotype. The P330S mutation appears to have temporarily decreased the fitness of the main surviving clade in lineage A, although mutations N107D and G480D later rescue the fitness of this lineage, consistent with the findings of Urbanowicz et al. (2016). Meanwhile, the R410S mutation occurred within lineage B but did not have an immediate effect on fitness. However, R410S appears to epistatically interact with mutation K439E, which occurs twice along the same lineage carrying the R410S mutation and in this genetic background increases infectivity 2-3 fold in cell culture. We estimate that the A82V+R410S+K439E genotype had the highest population-level fitness, but only increased fitness by 14% relative to the Makona genotype. Three other mutations, R29K, T230A and I371V, also occurred in the A82V genetic background, but were not estimated to have further increased the fitness of the A82V genotype.
Because the A82V mutation occurred along a lineage that spread from Guinea to Sierra Leone and several of the genotypes we considered were also geographically restricted, we performed a second analysis to check whether our estimates of genotype fitness were confounded by geographic differences in transmission rates. In this model, we accounted for geographic effects by including location (Guinea, Sierra Leone or Liberia) as an additional evolving character state or 'site' in the model. We found no evidence that transmission rates differed by location; relative transmission rates were 1.01 (95% CI: 0.97-1.05) in Sierra Leone and 0.99 (95% CI: 0.94-1.04) in Liberia compared with Guinea. All mutant genotypes had higher estimated fitness relative to the Makona genotype under the model with geographic effects due to a lower estimated fitness of the Makona genotype. However, the rank order of genotypic fitness values is consistent across models (Table 1). Overall, the population level fitness of all eight genotypes agree with their fitness in cell culture in terms of the sign or direction of their effects, but these genotypes had much greater fitness relative to the Makona genotype in cell culture than at the population level.

Influenza H3N2 fitness variation
We also applied the MFBD model to estimate the fitness effects of mutations in the hemagglutinin (HA) protein of human influenza virus subtype H3N2. Lee et al. (2018) recently estimated the relative preference for each amino acid residue at all sites in the HA protein in cell culture using a reverse genetics approach known as deep mutational scanning (DMS). The fitness effect of mutating one amino acid to another is expected to correlate strongly with the relative preference for each amino acid in these experiments. We therefore sought to compare the population-level fitness effects of naturally occurring mutations estimated under the MFBD model with their fitness measured in vitro by DMS.
To minimize the effect of antigenic mutations, which would not be reflected in the DMS experiments, we limited our analysis to viral lineages in the same antigenic cluster as the A/Perth/16/2009 strain studied by Lee et al. (2018). We first estimated the fitness effects of the 17 most abundant mutations that reached a frequency of 10% or greater among viruses sampled in the United States between 2009 and 2012. We found no apparent relationship between the estimated populationlevel fitness effects of these mutations and their in vitro effects, although there is agreement that most of these mutations are nearly neutral ( Figure 8A). Although the 95% credible intervals on our estimates are narrow, these results need to be interpreted with extreme caution because our MCMC algorithm never converged on a stable posterior distribution for several mutations (Effective Sample Size < 10) due to strong correlations between mutations in their estimated fitness effects. This is likely due to the fact that many of these mutations occur only once in the phylogeny and share the same ancestry and therefore genetic background as other mutations in the phylogeny ( Figure 8B).
While our population-level estimates did not correlate with the in vitro data, the fitness effects predicted by DMS correlate strongly with the maximum frequency that naturally occurring mutations reach in the human population (Lee et al., 2018). We therefore sought to test whether using the DMS experimental data to inform the MFBD model about the fitness effects of mutations, rather than estimating them independently from the phylogeny, would result in a better fit of the model to the H3N2 phylogeny and sequence data. Doing so requires a fitness model that aggregates mutational fitness effects across sites and then maps this combined fitness to the population-level fitness of a lineage. In our model, we sum the mutational fitness effects predicted by the relative amino acid preferences across all sites to get a composite predictor of fitness: DMS ¼ P L k log 2 pk;i pk;l . We then use (Equation 25) to map DMS to overall population-level fitness.
Fitting our model to the H3N2 phylogeny allows us to calibrate how the mutational fitness effects based on relative preferences scale to population-level fitness. Overall, large changes in DMS , resulting from mutations to more or less preferred amino acid residues, have a relatively small impact on population-level fitness. Population-level fitness grows slowly and roughly linearly with mutations to more preferred amino acids (Figure 9; inset). Nevertheless, when mutational fitness effects are aggregated across all sites, there are substantial fitness differences between lineages (Figure 9). Relative to a hypothetical lineage bearing the consensus sequence, fitness ranges from 0.84 to 1.04 across lineages with many lineages having a relative fitness less than one, indicating a slightly deleterious mutation load. Accounting for these fitness differences results in the MFBD model informed by the DMS data fitting the H3N2 phylogeny substantially better (Log likelihood: À4184) than a model assuming all mutations are neutral (Log likelihood: À7510). As would be expected, lineages predicted to be more fit also tend to persist between influenza seasons. Most notably, a lineage with higher than average fitness circulates in 2009 and 2010 during the H1N1 pandemic. This lineages carries the T228A mutation, which is predicted to have a large beneficial effect in the DMS  experiments. It is therefore tempting to speculate that this mutation may have conferred an advantage that helped seasonal H3N2 compete with the pandemic H1N1 virus.

Discussion
Many assumptions are made in phylogenetics to model molecular evolution in a statistically tractable way. Historically, one of the most pervasive yet biologically questionable of these assumptions has been that sequences evolve neutrally along lineages, such the mutations do not feedback and alter the branching process shaping the phylogeny. Our marginal fitness birth-death (MFBD) model allows us to relax this core assumption in order to consider how non-neutral evolution at multiple sites affects sequence evolution, the fitness of lineages, and the overall branching structure of a phylogeny. While our approach is not exact in that it approximates genotype probabilities by assuming sites evolve independently when computing the marginal fitness of a lineage, we have shown that this approximation generally works well and only produces significant errors in rather extreme situations, such as the four genotype model with very strong selection or epistasis. While an earlier approach based on birth-death models allowed for lineage-specific fitness values to be inferred from the branching pattern of a phylogeny (Neher et al., 2014), this approach did not connect fitness back to the mutational process nor allow for the fitness effects of individual mutations or genotypes to be estimated. Using our approach, we demonstrated that the fitness effects of specific mutations can be estimated from simulated phylogenies under the MFBD with accuracy comparable to an exact multi-type birth death model. The MFBD model therefore provides a new, statistically powerful way of incorporating adaptive molecular evolution into phylodynamics. The MFBD model allows us to exploit phylogenetic information about adaptive evolution that most methods for inferring selection from patterns in sequence data ignore. Currently, codon-substation models (Goldman and Yang, 1994;Muse and Gaut, 1994) and the related class of mutation-selection models (Yang and Nielsen, 2008) are by far the most widely used approach for inferring selection. These approaches rely on comparing sequence substitution patterns such as the dN/dS ratio of non-synonymous to synonymous substitutions across sites. These approaches can be very powerful when sequences from highly divergent taxa are compared, such that enough time has elapsed for multiple substitutions at a single site to have accumulated between lineages. But on the shorter timescales relevant to evolution within a population, substitution patterns like the dN/dS ratio are relatively insensitive to selection pressures and may produce misleading inferences of selection (Kryazhimskiy and Plotkin, 2008). For example, a highly beneficial non-synonymous mutation that occurs in a single lineage and then spreads through a population may produce a very low dN/ dS ratio, indicative of purifying selection rather than adaptive evolution. In contrast, comparing the evolutionary dynamics of lineages with and without the mutation allows us to infer if that mutation confers a competitive advantage. Thus, considering the branching pattern of phylogenies provides additional information about molecular evolution not visible from substitution patterns in sequence data alone.
While new technologies increasingly allow researchers to quantify mutational fitness effects in vitro or even in vivo (Zanini and Neher, 2013;Thyagarajan and Bloom, 2014), how fitness measured in the lab translates to fitness in nature is largely unknown. This is especially pertinent for emerging pathogens whose epidemic potential often depends on new adaptive mutations (Antia et al., 2003;Longdon et al., 2014). Phylodynamic approaches like the MFBD model that can quantify fitness at the host population level are therefore greatly needed, as they offer a means to assess the epidemiological significance of mutant lineages. Extrapolating from our experience with Ebola, where the population-level fitness effects of each mutant genotype we considered matched the sign of their effect in cell culture, we suspect that fitness measured in the lab will generally agree with fitness in nature. This seems reasonable, as mutations that increase replication or cellular infectivity within hosts should generally promote transmissibility between hosts (e.g. Quinn et al., 2000;Fraser et al., 2007). But at the same time, there is no reason to believe that transmission rates will increase linearly or even monotonically with increasing within-host growth rates. We therefore expect that the magnitude of fitness effects might often greatly differ across scales, as we found for the A82V glycoprotein mutation in Ebola. While A82V doubles infectivity in cell culture, we estimated that it only increases transmissibility at the population level by 5% (95% CI: 4-7%). Interestingly, (Diehl et al., 2016) found that A82V only slightly increases viral titers in Ebola patients, which is likely a much better proxy for transmissibility than cellular infectivity, lending support to our more moderate estimates at the population level.
For influenza, we were unable to reliably estimate the fitness effects of individual mutations from the H3N2 phylogeny. We believe that these inference problems likely stem from the fact that many of these mutations occur only once in the phylogeny and in the same genetic background as other mutations in the HA protein. The shared phylogenetic ancestry of mutations creates an identifiability problem akin to the problem of collinearity in more standard regression-type models. In either case, the individual effects of highly correlated variables are difficult or impossible to infer. Nevertheless, including the mutational fitness effects predicted by deep mutational scanning experiments improved the fit of the MFBD model to the H3N2 phylogeny by thousands of log likelihood units. Accounting for these fitness effects in the MFBD model also revealed substantial variation in population-level fitness among viral lineages within a single antigenic cluster. Most lineages were reconstructed to have a slightly deleterious mutation load, consistent with earlier reports that background variation in fitness arising from deleterious mutations, not just antigenic mutations, plays a large role in determining which H3N2 lineages ultimately persist (Illingworth and Mustonen, 2012;Luksza and Lässig, 2014;Koelle and Rasmussen, 2015). Moreover, the fitness variation uncovered by our analysis likely represents only the 'tip of the iceberg', since there are likely mutations in other genomic segments besides HA with large fitness effects (Raghwani et al., 2017), which we did not consider.
The influenza analysis highlights some of the inevitable difficulties encountered when inferring mutational fitness effects from phylogenies. Increasing the number of sites under consideration also increases the complexity of the genetic background in which mutations occur due to the increased probability of mutations being linked to other mutations rather than occurring in isolation. This leads to strong correlations between the fitness effects of different sites in an increasingly high dimensional parameter space, making statistical inference challenging, especially using MCMC methods. Spurious correlations may also arise due to additional, unmodeled sources of fitness variation. For example, if a mutation occurs coincidently with another beneficial mutation or the mutation occurs by chance along a lineage spreading through a higher fitness environment, it will likely be inferred to increase fitness even if it is actually neutral. In the future, the MFBD should therefore be extended to account for unmodeled sources of fitness variation. For example, each lineage could be assigned a random fitness effect representing the unmodeled components of fitness variation. These random effects could then be modeled as a continuous trait evolving along lineages, such that more closely related lineages would be expected to have similar fitness and overly large changes between closely related lineages would be penalized. Such a model would then allow us to say whether a fitness effect attributed to a particular mutation could be equally well explained by random effects arising from unmodeled fitness variation. Until such a principled approach is implemented, the fitness effects of individual mutations need to be interpreted carefully unless they occur in multiple genetic backgrounds and confounding sources of fitness variation can be accounted for, as we tried to do for Ebola by including potentially confounding geographic fitness effects.
In spite of these shortcomings, we believe the MFBD model offers a powerful means to explore many questions not previously possible with strictly neutral phylodynamic models. Even if the fitness effects of individual mutations are not identifiable, it may still be possible to infer the distribution of fitness effects across sites, a key determinant of adaptive evolution that has only been explored in a few systems (Eyre-Walker and Keightley, 2007). The MFBD model can also be used to compare the fitness of a mutation or lineage across different environments, such as in different hosts of a pathogen. Finally, the MFBD is not limited to exploring sequence evolution, as the model is generalizable to any discrete character state, including phenotypic, geographic or environmental characters. Thus, more generally, our model can be thought of as a multi-trait, multi-type birth-death model that can be used to explore how different molecular and non-molecular characters interact to shape the overall fitness of lineages in a phylogeny.