Influenza A virus evolution and spatio-temporal dynamics in Eurasian wild birds: a phylogenetic and phylogeographical study of whole-genome sequence data

Low pathogenic avian influenza A viruses (IAVs) have a natural host reservoir in wild waterbirds and the potential to spread to other host species. Here, we investigated the evolutionary, spatial and temporal dynamics of avian IAVs in Eurasian wild birds. We used whole-genome sequences collected as part of an intensive long-term Eurasian wild bird surveillance study, and combined this genetic data with temporal and spatial information to explore the virus evolutionary dynamics. Frequent reassortment and co-circulating lineages were observed for all eight genomic RNA segments over time. There was no apparent species-specific effect on the diversity of the avian IAVs. There was a spatial and temporal relationship between the Eurasian sequences and significant viral migration of avian IAVs from West Eurasia towards Central Eurasia. The observed viral migration patterns differed between segments. Furthermore, we discuss the challenges faced when analysing these surveillance and sequence data, and the caveats to be borne in mind when drawing conclusions from the apparent results of such analyses.

Low pathogenic avian influenza A viruses (IAVs) have a natural host reservoir in wild waterbirds and the potential to spread to other host species. Here, we investigated the evolutionary, spatial and temporal dynamics of avian IAVs in Eurasian wild birds. We used whole-genome sequences collected as part of an intensive long-term Eurasian wild bird surveillance study, and combined this genetic data with temporal and spatial information to explore the virus evolutionary dynamics. Frequent reassortment and co-circulating lineages were observed for all eight genomic RNA segments over time. There was no apparent species-specific effect on the diversity of the avian IAVs. There was a spatial and temporal relationship between the Eurasian sequences and significant viral migration of avian IAVs from West Eurasia towards Central Eurasia. The observed viral migration patterns differed between segments. Furthermore, we discuss the challenges faced when analysing these surveillance and sequence data, and the caveats to be borne in mind when drawing conclusions from the apparent results of such analyses.

INTRODUCTION
Low pathogenic avian influenza (LPAI) viruses have been isolated from more than 136 species of wild birds, most commonly from ducks, but also from other Anseriformes (geese and swans) and Charadriiformes (mainly gulls, waders and terns) (Alexander, 2000;Olsen et al., 2006;Webster et al., 1992). These bird groups have diverse annual life cycles and many are highly migratory, thereby potentially affecting spatial and temporal dynamics of avian influenza virus (AIV) at different geographical scales. Many species also frequent habitats where there is potential for direct or indirect contact with domestic birds (Webster et al., 1992), primarily ducks and geese, with the concurrent risk of cross-species transmission of AIVs into domestic animals. This incursion of virus from the wild bird reservoir may have several animal and human health implications, including the risk of emergence of highly pathogenic avian influenza (HPAI) viruses and threat to food security. It also provides a means by which AIV might be brought into closer proximity to humans (Newman et al., 2012). For Eurasia, waterbird migration can be broadly divided in five flyways: East Atlantic flyway, Black Sea-Mediterranean flyway, East Africa-West Asia flyway, Central Asia flyway and the East Asia Australian flyway. It should be noted that these flyways are oversimplifications and numerous exceptions exist (Hoyo et al., 2011;Munster et al., 2007;Olsen et al., 2006;van de Kam et al., 2004). Bird migration along the Central Asian flyway was reported to correlate with outbreaks of HPAI H5 and emphasized the need for bird surveillance (Newman et al., 2012). Despite widespread surveillance (Olson et al., 2014), there remain substantial unanswered questions about the spatial, temporal and ecological role of the host populations in defining the genetic structure of AIVs, and in inferring the role wild birds might play in trans-locating AIV from one geographical region to another. Such information is key for considering measures to reduce the risk of pathogen emergence from wildlife host reservoirs.
Previous work on identifying predictors of HPAI virus H5N1 occurrence have shown that human population size, duck density, rice cropping intensity, wild bird migration and poultry trade all contribute to virus prevalence and potential for detection (Takekawa et al., 2010(Takekawa et al., , 2013. Ideally, we would also want to use such spatial risk map approaches to better understand the ecology of LPAI viruses in wild birds, prior to any transmission to domestic birds. The challenges to such analyses are large as there are numerous host species with different ecological dynamics covering broad and far-reaching areas in short time frames and differences in intrinsic reservoir capacities. The prevalence of AIVs in their natural hosts depends on geographical location, seasonality, immune processes and species (Munster et al., 2007;Olsen et al., 2006;Vijaykrishna et al., 2013). The ecological drivers of these prevalence fluctuations and how they affect viral genetic diversity are less well-characterized (Fouchier & Munster, 2009;Latorre-Margalef et al., 2014;van Dijk et al., 2014). Previous studies to investigate patterns in the genetic diversity among wild bird AIVs have focused predominantly on North America, partly because of the existence of larger longitudinal AIV surveillance datasets in wild birds. Studies on North American wild birds documented a high rate of genome reassortment (Dugan et al., 2008), and a significant viral clustering by time and location of sampling (Chen & Holmes, 2009). Other work suggested that ducks in Alberta were representative of the total AIV diversity in North American Anseriformes and, whilst there might be spatial segregation to a particular migratory flyway over short time frames, the long-term persistence of AIV was independent of bird flyways with migration between populations throughout North America (Bahl et al., 2013). Extensive surveillance studies of AIV in ducks and shorebirds in North America have permitted analyses of reassortment rates, selection pressures and patterns of genetic diversity, but until recently there has only been limited whole-genome sequence data available for AIVs in Eurasia, Africa, South America and Oceania. AIVs found in Eurasian wild birds are predominantly genetically distinct from those of wild birds in the Americas (Dugan et al., 2008;Krauss et al., 2004;Obenauer et al., 2006), representing major geographical/continental lineages. Wild bird migratory flyways are different in Eurasia; thus patterns characterized for the Americas could differ substantially from those in Eurasia.
To explore the evolutionary and ecological dynamics of AIV in Eurasian wild birds, we used whole-genome sequences of AIVs isolated from several Anseriformes species sampled in West Eurasia along the East Atlantic flyway as part of an intensive wild bird surveillance study. These full-genome sequences were combined with genetic data of AIVs isolated throughout Eurasia. This large-scale study describes gene reassortment and viral migration within Eurasia in the light of wild bird migration and supports new directions in wild bird AIV surveillance.

RESULTS AND DISCUSSION
To study the spatio-temporal dynamics of AIVs in wild birds in Eurasia, more than 100 virus isolates collected from 1999 to 2007 were selected for full genomic sequencing of the coding regions. These virus isolates represented a diverse range of wild bird hosts, and included different subtypes and sampling locations predominantly within West Eurasia (Table 1). In addition, AIV full-genome sequences spanning NA1-NA9 and HA1-HA12 were retrieved from GenBank (Table S1, available in the online Supplementary Material). To focus on evolution of LPAI viruses in wild birds, we excluded all sequences from domestic birds and all sequences related to poultry outbreaks, particularly HPAI H5N1, H7 and H9.
Although AIVs have been isolated from more than 136 species of birds, the role of each of these species in maintaining virus diversity and virus spread is unclear. Differences in AIV prevalence and in prevalence of haemagglutinin (HA) subtypes and HA/neuraminidase (NA) subtype combinations have been observed among wild bird species Munster et al., 2007). However, it is possible that for specific host species certain AIV subtypes are endemic, allowing for genetic evolution and diversification of the virus, whereas in other host species 1 (1) *The number of newly submitted sequences is given within parentheses.

N. S. Lewis and others
this AIV subtype is more likely to be a transient pathogen and does not become established. Here, the role of host species on influenza virus diversity was investigated using maximumlikelihood (ML) trees coloured by the bird species group from which the virus was sampled ( Fig. 1; also see Fig. S1 for ML trees of all segments with strain names and Table 1 for host categories). Overall, no clear species-specific patterns could be identified. The observed genetic diversity did not seem to originate from a particular host nor were there genetic lineages limited to a single species. Nevertheless, our sequence dataset was biased with respect to bird species as the majority of AIVs included in our study were isolated from dabbling ducks (Table 1). Dabbling ducks more frequently harbour AIVs and therefore they are a 'target species group' for surveillance (Olsen et al., 2006). Due to the over-representation of dabbling ducks, we cannot exclude that the lack of species-specific patterns in the tree topology is an artefact. Most of the Eurasian shorebird sequences appeared to cluster together in the ML tree of the HA gene, suggesting a species-specific niche wider than the H13 and H16 niche, which has been reported previously for gulls in Eurasia and North America (Hinshaw et al., 1982;Krauss et al., 2004;Lewis et al., 2013) (Fig. S1). It should be noted that in our dataset most shorebird sequences were sampled in Oceania and were much older compared with the other Eurasian AIV sequences. Thus, our findings suggest that there is no strong species effect associated with virus diversity, similar to the results described previously for North American AIV (Chen & Holmes, 2009 To investigate how the genetic diversity partitioned according to geographical location, ML trees were coloured by four discrete regions; West (i.e. West Eurasia), East (i.e. East Eurasia), Central (i.e. Central Eurasia) and Oceania (Fig. S2). These four geographical regions also approximate migratory flyways: West Eurasia lies within the East Atlantic flyway, Central Eurasia lies within the Black Sea-Mediterranean and Central Asian flyway, and East Eurasia and Oceania represent the East Asian-Australasian flyways. Despite overlap in migratory flyways among these four regions, viruses sampled from one geographical region and from a particular time period were most closely related to other viruses sampled from the same geographical region and could be related to different migration patterns. To further investigate the spatial and temporal processes, BEAST was used to infer Bayesian phylogenetic trees in which all viruses were assigned to the four discrete regions (Figs. 2 and S3).
The Bayesian analysis revealed that for all internal segments, except for NS, the most recent common ancestors (MRCAs) containing these segments circulated *72-108 years ago (Table 2, Fig. 2; see Fig. S3 for Bayesian trees of all segments with strain names). This recent ancestry is 2000 1990 1980 1970 1960 1950 1940 1930 1940 1950 1960 1970 1980 1990 2000  suggestive of hemispheric sweeps of all genetic diversity in fairly recent times, as suggested previously by others (Worobey et al., 2014). The genetic diversity for the HA, NA and NS gene segments was maintained, corresponding with MRCAs much older than those of the other gene segments (PB2, PB1, PA, NP and M). However, the genetic diversity within each HA and NA subtype and NS allele was similar to that of the internal segments. For HA and NA, it was proposed that immunity in previously exposed bird populations allows the maintenance of multiple subtypes (Worobey et al., 2014). It has also been described that NS alleles A and B differentially suppress innate immune responses (Munir et al., 2011), perhaps allowing for maintenance of both alleles. Despite generally short times to the MRCA for the internal segments, multiple lineages co-circulated within the same years at the same locations. In our dataset, there was a high sample density of West Eurasian AIVs isolated between 2002 and 2009. However, despite this high sampling density, the genetic diversity found in West Eurasia did not completely represent the genetic diversity of AIVs throughout Eurasia during that time period. For example, for PB1 there is a lineage containing AIVs isolated from East and Central Eurasia and Oceania of which the common ancestor to the most closely related AIV from West Eurasia circulated w20 years ago. Despite probable host population and ecological differences between Eurasia and North America (Munster et al., 2007), we found similar nucleotide substitution rates for Eurasian AIV strains compared with previous studies including both North American and Eurasian AIV sequences (Chen & Holmes, 2010).
To test if more closely related viruses were more likely to share the same location than expected by chance alone (Parker et al., 2008), Bayesian trees were analysed for evidence of taxa association by location of sampling using the Bayesian Tip-significance (BaTS) package. When the entire period of sampling was analysed, we found strong clustering by location for all regions and all gene segments (data not shown). Although this indicated that there was a spatial component to the dataset and regional maintenance of a particular clade, it could also have been due to a bias in sampling during a particular time period. To reduce this bias, the same analysis was performed using time periods of 5 years (Table 3). Despite the shorter time period, there was significant clustering of viruses isolated from the same location. Ideally, 1 year would be most relevant to the annual life cycle -and thus annual migration -of the host;  Significant clustering of sequences from the four geographical regions was investigated by coding the regional location from which the virus was sampled onto the tips (taxa) of 900 posterior sampled trees, generating 100 null distributions, and testing the significance of the observed data. P#0.05 indicates significant geographical clustering, whilst P.0.05 indicates that traits were randomly distributed across the phylogeny. Significant values are given in italics. Only datasets with at least three sequences were included. this dataset comprised insufficient data for statistical power to analyse just 1 year. This illustrates one of the confounders with these data. When one attempts to reduce potential sampling bias or inconsistent sampling effort throughout the region, and capture diversity on a timescale that is relevant to the host species, one likely reduces statistical power. See Table 1 for further details on sampling by species, time and sampling site.
The Bayesian analysis was also used for ancestral state reconstruction of geographical location and to estimate the rate of viral migration among the geographical regions (Fig. 3, Table S2) (Lemey et al., 2009). The highest rates of viral migration were observed from Eastern Eurasia to Oceania for PB2, Western to Eastern Eurasia for PB1, HA, NP, NA and M, and Eastern to Western Eurasia for PA and NS. Such a lack of consistent and directional spatial pattern among gene segments was also observed for North American strains (Bahl et al., 2013). The inconsistent directionality observed here was likely due to differences in sampling bias and the high rates of reassortment. For HA and NA, the different subtypes are highly divergent and ancestral state reconstruction will also include modelling on the long inter-subtype branches potentially influencing the results. It should also be noted that due to sparse sampling before 1999, migration events inferred for older viruses were much more susceptible to sample bias.
We assessed reassortment by rooting the Bayesian maximum clade credibility (MCC) nucleotide substitution trees by older Australian strains and making tanglegrams (Fig. 4). Tanglegrams enable visualization of the locations of particular taxa within the PB2 tree and each of the trees of the other segments. In the absence of reassortment, the taxa should have a nearly horizontal linkage. The tanglegram patterns indicate that there was extensive reassortment, but without completely distorting clustering between sequences of the same geographical region. Viruses of a particular subtype do not necessarily have the same genetic makeup, even for a particular species, location or year. For NS, we observed co-circulation of the A and B alleles, and similar to HA and NA, these two alleles were not associated with separate lineages for other segments. Differences in reassortment rates between the internal segments of AIVs belonging to different subtypes have been reported for Eurasian AIVs (Lu et al., 2014). In particular, internal segments belonging to subtypes H1-H4 reassort with a lower rate compared with H5 and H9. It should be noted that this dataset included poultry AIV and poultryoutbreak-related AIV sequences, likely influencing reassortment rates (Lu et al., 2014).
Overall genetic diversity of AIVs in Eurasian Anseriformes can be captured by the genetic diversity found in dabbling ducks. AIVs isolated from dabbling ducks in Alberta are a good representation of the genetic diversity of AIVs circulating in North America (Bahl et al., 2013). In contrast, AIVs from West Eurasia, East Eurasia, Central Eurasia or Oceania do not represent the genetic diversity of the whole of Eurasia well. The genetic diversity of AIVs is shaped by many factors such as immunogenicity of the host, reassortment, migration patterns and life span of the hosts as well as virus durability in aquatic environments (Roche et al., 2014). The influence of heterosubtypic immunity is seen on the prevalence of both HA groups and on the level of HA clades in recaptured wild ducks (Latorre-Margalef et al., 2013). The incidence and prevalence of AIVs shows clear seasonal patterns due to host-pathogen interactions. The influx of immunologically naïve juveniles in summer and the arrival of susceptible migrants in autumn as well as moult aggregations are also likely drivers of AIV infection dynamics in temperate Eurasian latitudes (Jehl, 1990;van Dijk et al., 2014). Whether these disease dynamics patterns can be generalized over multiple subpopulations in different latitudes within Eurasia remains to be investigated. In some North American flyways, resident birds can also act as reservoirs of virus diversity and although migratory birds introduce AIV in these wild bird populations, these viruses do not necessarily become the predominantly circulating viruses . Whilst this might be true at sites in Eurasia where resident and migratory bird populations overlap, in many areas there is likely less opportunity for resident maintenance. Therefore, virus diversity is more likely driven by migration.
Here, we map the long-term spatial-temporal dynamics of the whole-genome of AIV in Eurasia. Despite in-depth wild bird surveillance in Eurasia, it is clear from this study that to assess the implication of migration patterns on the genetic diversity of AIV in Eurasia future whole-genome sequencing should be directed towards increased numbers of samples within a short time frame in locations along the different flyways. Such high-resolution studies have been performed in North America and West Eurasia, and are currently being actively pursued in the rest of Eurasia.
Incorporating of metadata such as host species, location and date of sampling, age, sex, and migratory status will illuminate future host-focused studies by including the impact of ecological factors like individual species diversity and life cycle on AIV genetic diversity.

METHODS
Dataset and genomic sequencing. Over a period of 15 years, 186 054 samples from 440 different bird species were analysed for the presence of AIVs. Positive isolates were subtyped and sequenced.
In collaboration with the National Institutes of Health and the J. Craig Venter Institute, *83 full or nearly full genomes and 30 partial genomes of AIVs have been submitted to GenBank.
The coding complete genomes of the influenza viruses were sequenced using a high-throughput next-generation sequencing pipeline at the J.  We show only the tree comparison for each segment with PB2 as the reference topology, but reassortment patterns were similar when other gene segments were used as the reference.
ML analysis. Phylogenetic trees for each segment were reconstructed with PhyML version 3.0 (Guindon & Gascuel, 2003), using the general time reversible (GTR) nucleotide substitution model with a proportion of invariant sites and a C distribution of among-site rate variation, all estimated from the data (determined by ModelTest as the appropriate nucleotide substitution model). GARLI version 0.96 (Zwickl, 2006) was run on the best tree from PhyML for 1 million generations to optimize tree topology and branch lengths.
Temporal phylogeny and relative genetic diversity. To identify potential errors in sequence data annotation that might have affected the clock estimation, we used the reconstructed ML nucleotide trees in Path-O-Gen version 1.3 (http://tree.bio.ed.ac.uk/software/pathogen) to generate linear regression plots of the years of sampling versus root-to-tip distance. We did not observe any anomalies in the eight segment datasets, which all exhibited a clock-like behaviour (Drummond et al., 2003).
We estimated rates of evolutionary change (nucleotide substitutions per site per year) and times of circulation of the MRCA (years) with BEAST version 1.7.3 using time-stamped sequence data with a relaxedclock Bayesian Markov chain Monte Carlo (MCMC) method (Drummond & Rambaut, 2007;Drummond et al., 2005Drummond et al., , 2006. For all analyses, the uncorrelated log-normal relaxed molecular clock and a C site heterogeneity model with four C categories was used in combination with the GTR nucleotide substitution model. A normal rate prior with a mean of 0.0033 substitutions per site per year (SD50.0016) was used (Bahl et al., 2011). These analyses were conducted with a Bayesian Skyline coalescent model, a random starting tree and a constant rate of migration. We performed at least three independent analyses of at least 100 million MCMC chains to ensure convergence and combined these analyses after removal of the burnin of 10% using LogCombiner version 1.7.3. Finally, the MCMC chains were summarized to reconstruct the MCC trees using TreeAnnotator version 1.7.3. Trees were visualized and coloured with the FigTree program version 1.4.0 (http://tree.bio.ed.ac.uk/software/ figtree/).
Phylogeography. We grouped our country-level dataset into West Eurasia, Central Eurasia, East Eurasia and Oceania because of insufficient sampling density to reconstruct exact sampling location of ancestral viruses. Discrete state ancestral reconstruction of viral sampling locations and migration rates between geographical regions were estimated with an asymmetrical state transition model. Given the large number of states, a Bayesian stochastic search variable selection (BSSVS) was employed to reduce the number of parameters to those with significantly non-zero transition rates (Lemey et al., 2009). From the BSSVS results, a Bayes factor (BF) test could be applied to assess the support for individual transitions between discrete geographical states. The BF was deemed statistically significant if the combined independent analyses resulted in a binary indicator w0.5 and a BFw6. Therefore, our minimal critical cut-offs for statistical support were 8jBFj100 indicating support and BFw100 indicating strong support (Bahl et al., 2011;Lemey et al., 2009).
The migration routes that had the strongest support as indicated by the highest BF (Lemey et al., 2009) were determined using SPREAD (Bielejec et al., 2011). In addition, significant clustering of sequences from the four geographical regions was investigated by coding the regional location from which the virus was sampled onto the tips of 900 posterior sampled trees, generating 100 null distributions, and testing the significance of the observed data using BaTS package (Parker et al., 2008). Pv0.05 indicated significant clustering, whilst Pw0.05 indicated that traits were randomly distributed across the phylogeny.
Reassortment analyses. To visualize similarities and differences between the phylogenies, and investigate reassortment, tanglegrams were generated using the nucleotide substitution MCC trees generated by BEAST and TreeMap version 1.0 (http://taxonomy.zoology.gla.ac. uk/rod/treemap.html). These tanglegrams consisted of two rooted phylogenetic trees of which taxa that corresponded to each other in the two trees were connected. In the absence of reassortment, one would expect to see nearly horizontal linkage connecting one taxa to another.