The evolutionary dynamics of influenza A virus within and between human hosts

The rapid evolution of influenza viruses has led to reduced vaccine efficacy, widespread drug resistance, and the continuing emergence of novel strains. Broadly speaking, evolution is [...] the rate of 4 x 10^-6 mutations per nucleotide per replication cycle and a within-host effective population size of 33 (given a generation time of 6 hours). This is consistent with the estimates above. As we have recently estimated that 13% of mutations in influenza A virus are neutral (47), we estimated that the true in vivo mutation rate would be approximately 8-fold higher than our neutral rate, on the order of 3-4 x 10^-5. This in vivo mutation rate is close to our recently [...]

We find that seasonal influenza A viruses replicate within and spread among human hosts with [...] indicate that intrahost populations of influenza virus behave like much smaller populations. We [...] which selective pressures are likely to be particularly strong (e.g. due to drug treatment or [...]

We used both a simple presence-absence model and a more complex beta binomial model to [...] fact that within-host diversity was low, and there were very few minority iSNV shared among individuals in a transmission chain. While our methods for variant calling may be more conservative than those used in similar studies, it is unlikely that our small bottleneck is an artifact of this stringency. The beta binomial model accounts for false negative iSNV (i.e. variants that are transmitted but not detected in the donor), which can lead to underestimated transmission bottlenecks (28). Our formulation of this model incorporates empirically determined sensitivity and specificity metrics to account for both false negative iSNV and false positive iSNV (34). Furthermore, if rare, undetected iSNV were shared between linked individuals, we would expect to see transmission of more common iSNV (frequency 5-10%), which we can detect with high sensitivity.
In our dataset, however, the majority of minority iSNV above 5% were not [...] only other study of natural human infection (11,28). While there are significant differences in the [...] were often shared in both household pairs and randomly assigned community pairs (11).

Accurately modeling and predicting influenza virus evolution requires a thorough understanding [...], it would be possible for the highly pathogenic H5N1 virus to develop the requisite 4-5 mutations to become transmissible through aerosols during a single acute infection of a human host (50, 52). Although the dynamics of emergent avian influenza and human-adapted seasonal viruses likely differ, our work suggests that fixation of multiple mutations over the course of a single acute infection is unlikely.

While it seems counterintuitive that influenza evolution is dominated by drift on local scales and positive selection on global scales, these models are not necessarily in conflict. Within individuals we have shown that the effective population is quite small, which suggests that [...]

[...] the case definition; the first collection was a self- or parent-collected nasal swab collected at illness onset. Subsequently, a combined nasal and throat swab (or nasal swab only in children < 3 years of age) was collected at the onsite research clinic by the study team. Families with very young children (< 3 years of age) were followed using home visits by a trained medical assistant. Active illness surveillance and sample collection for cases were conducted October through [...] record. In the current cohort, serum specimens were also collected twice yearly during fall [...] School, and all human subjects provided informed consent.

Identification of influenza virus
Respiratory specimens were processed daily to determine laboratory-confirmed influenza [...] We amplified cDNA corresponding to all 8 genomic segments from 5 μl of viral RNA using the [...] continuous function (e.g. (55-60)). Here, we also included the limitations in our sensitivity to detect rare iSNV by integrating over regions of this probability density that were either below our [...] of Kimura's original work is below. The time-dependent derivative of this probability has been defined using the forward Kolmogorov equation and the solution is here adapted from Kimura, 1955 (43).
(1)

[...] summing over the first 50 terms. When we added an additional 50 terms (100 in total) we found no appreciable change in the final log likelihoods. [...] sum of the probability that the variant is lost by generation t (i.e. the other allele is fixed), the probability that it is not detected due to the limit of detection (i.e. [...]) and the probability the variant is not detected due to low sensitivity for [...] an allele at the second time is then

(2)

The first term in equation 2 is adapted from Kimura, 1955 as [...] where q is defined as above. (Note that this is simply the probability of fixation for a variant at initial frequency q.) As in equation 1 the infinite sum was approximated with a partial sum of 50 terms. The probability of the allele drifting below our limit of detection can be found by integrating [...] python package scipy (61).
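The truncated-series calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the standard form of Kimura's (1955) series for the probability that an allele at initial frequency q is fixed by generation t, with Kimura's original diploid time scaling t/(4N), which may differ from the scaling used in this study. The function names `hyp_poly` and `prob_fixed_by` are hypothetical. The terminating hypergeometric function F(1-i, i+2; 2; x) is evaluated through the equivalent Gegenbauer polynomial recurrence instead of a library call, so that only the standard library is needed.

```python
import math


def hyp_poly(i, x):
    """Evaluate F(1-i, i+2; 2; x), a polynomial of degree i-1 in x.

    Uses the identity F(1-i, i+2; 2; x) = 2 * C_{i-1}(z) / (i * (i + 1)),
    where z = 1 - 2x and C_n is the Gegenbauer polynomial with
    parameter 3/2, computed by its three-term recurrence.
    """
    z = 1.0 - 2.0 * x
    c_prev, c_curr = 0.0, 1.0  # C_{-1} and C_0
    for n in range(1, i):
        c_prev, c_curr = c_curr, (z * (2 * n + 1) * c_curr - (n + 1) * c_prev) / n
    return 2.0 * c_curr / (i * (i + 1))


def prob_fixed_by(q, t, n_e, terms=50):
    """Probability that an allele at initial frequency q is fixed by generation t.

    Partial sum of Kimura's series. ASSUMPTION: time is scaled as
    t / (4 * n_e), following Kimura's diploid formulation.
    """
    total = q
    for i in range(1, terms + 1):
        decay = math.exp(-i * (i + 1) * t / (4.0 * n_e))
        sign = -1.0 if i % 2 else 1.0
        total += (2 * i + 1) * q * (1.0 - q) * sign * hyp_poly(i, q) * decay
    return total


# Doubling the number of terms should leave the result essentially unchanged
# once t is not small relative to n_e.
p50 = prob_fixed_by(0.5, 60, 30, terms=50)
p100 = prob_fixed_by(0.5, 60, 30, terms=100)
print(p50, abs(p50 - p100))
```

For very large t the result approaches q, the classical fixation probability of a neutral allele. For t much smaller than the population size the series converges slowly, and a partial sum of any fixed length should be treated with caution.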
Finally, the probability of an iSNV being present at the second time point, but escaping detection, [...] the false negative rate for that range. Here, we assumed the entire range had the same [...]. We also assumed perfect sensitivity above 10%.

The diffusion approximation treats changes in frequency as a continuous process because it [...] can be determined, by applying a maximum likelihood method developed by Williamson and Slatkin 1999 (44). In this model, the true allele frequencies move between discrete states (i.e. [...] original application, allele counts were used, and sampling error was added to the model as a [...] available from next generation sequencing and estimate sampling error as a normal distribution with mean equal to the observed frequency and a standard deviation equal to that observed in [...]

As in Williamson and Slatkin 1999, we assume a uniform prior on the initial state. Because we know that our specificity is near perfect (Supplemental Table 1) and we restrict our analysis to only polymorphic sites, the probability of any initial state is given by [...] Using the scalar and cumulative properties of matrix multiplication equation 6 reduces to [...]

The probability of both alleles being transmitted is given by [...] The log likelihood of λ for the data set is given by [...] are missed due to poor sensitivity. Because the beta binomial model is aware of the frequency [...] This is the probability density that the transmitted allele is found in the recipient at a frequency [...] transmitted variants is then given by [...] has the same sensitivity as the lower bound. The likelihood of λ for iSNV that are not observed in the recipient is then given by summing [...] transmission pair.
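As a rough illustration of how transmission outcomes constrain bottleneck size, the sketch below assumes that founding genomes are drawn independently from the donor population, so that an allele at donor frequency p is the only one transmitted through a bottleneck of size nb with probability p^nb. This binomial-sampling form is an assumption made here for illustration. The function names, the fixed (rather than λ-distributed) bottleneck size, and the toy observations are all hypothetical and are not taken from the study.

```python
import math


def transmission_probs(p, nb):
    """Return (only A1, only A2, both) transmission probabilities.

    ASSUMPTION: nb founders are sampled independently from a donor in
    which allele A1 has frequency p and allele A2 has frequency 1 - p.
    """
    only_a1 = p ** nb
    only_a2 = (1.0 - p) ** nb
    return only_a1, only_a2, 1.0 - only_a1 - only_a2


def log_likelihood(nb, observations):
    """Log likelihood of a fixed bottleneck size nb.

    observations: list of (donor_frequency_of_A1, outcome) tuples, where
    outcome is 'a1', 'a2', or 'both' (which alleles the recipient carries).
    """
    index = {"a1": 0, "a2": 1, "both": 2}
    total = 0.0
    for p, outcome in observations:
        prob = transmission_probs(p, nb)[index[outcome]]
        if prob <= 0.0:
            return float("-inf")  # outcome impossible under this nb
        total += math.log(prob)
    return total


# Hypothetical data: in every pair only the donor's major allele was found.
toy = [(0.9, "a1"), (0.8, "a1"), (0.6, "a1"), (0.7, "a1")]
best_nb = max(range(1, 51), key=lambda nb: log_likelihood(nb, toy))
print(best_nb)
```

With these toy observations the likelihood decreases as nb grows, so the grid search selects the smallest bottleneck. A fuller treatment would place a distribution over nb and account for imperfect detection, as the text describes.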
We then determined the probability of only transmitting one allele (A [...] For the presence/absence model, the probabilities for each possible outcome are given by [...] is given by [...] are defined as in equations 18 and 19 respectively, but with [...] Again, the probability of observing both alleles is [...]

The diffusion approximation to the Wright-Fisher model allows us to make predictions on the allele frequency spectrum of a population given a mutation rate and an effective population size.
The probability of observing a mutation at frequency p_t given an initial frequency of 0 can be approximated as in (2) [...] where μ is the mutation rate. In this model mutation increases an allele's frequency from 0 but after that initial jump, drift is responsible for allowing the mutation to reach its observed [...] in numerical integration, we assumed that any variant present at less than 0.1% was essentially at 0%. As in the other within-host models, we can account for nonpolymorphic sites by adding the likelihood that no mutation is present [...] present in the reference strain from 2014-2015). The probability of not observing a mutation is given by [...] where we follow the same convention as in equation 5 for determining the false negative rate. [...] all possible sites in the data set.

Annotated computer code for all analyses can be accessed at [...]

Triangles signify mutations that were found in more than one individual in a given season.