Avian influenza at both ends of a migratory flyway: characterizing viral genomic diversity to optimize surveillance plans for North America

Although continental populations of avian influenza viruses are genetically distinct, transcontinental reassortment in low pathogenic avian influenza (LPAI) viruses has been detected in migratory birds. Thus, genomic analyses of LPAI viruses could serve as an approach to prioritize species and regions targeted by North American surveillance activities for foreign-origin highly pathogenic avian influenza (HPAI). To assess the applicability of this approach, we conducted a phylogenetic and population genetic analysis of 68 viral genomes isolated from the northern pintail (Anas acuta) at opposite ends of the Pacific migratory flyway in North America. We found limited evidence for Asian LPAI lineages on wintering areas used by northern pintails in California, in contrast to a higher frequency on breeding locales in Alaska. Our results indicate that the number of Asian LPAI lineages observed in Alaskan northern pintails, and the nucleotide composition of LPAI lineages, are not maintained through fall migration. Accordingly, our data indicate that surveillance of Pacific Flyway northern pintails to detect foreign avian influenza viruses would be most effective in Alaska. North American surveillance plans could be optimized through an analysis of LPAI genomics from species that demonstrate evolutionary linkages with European or Asian lineages and in regions that have overlapping migratory flyways with areas of HPAI outbreaks.


Introduction
Current surveillance programs for detection of highly pathogenic avian influenza (HPAI) viruses focus on monitoring avian populations for mortality events and sampling live wild and domestic birds (Munster et al. 2007; Komar and Olsen 2008). Viruses isolated from avian samples can then be characterized genetically to determine their subtype and pathogenicity. Previous molecular surveys have revealed substantial sequence divergence among continental populations of low pathogenic avian influenza (LPAI) viruses in Europe, Asia, and North America (Ito et al. 1995; Widjaja et al. 2004) and, although less studied, South America (Pereda et al. 2008). This broad-scale genetic differentiation among continents makes it possible to identify intercontinental virus exchange by phylogenetic assignment (Widjaja et al. 2004) and also indicates that intercontinental dispersal of LPAI viruses is relatively rare (Webster et al. 2007).
However, there is growing evidence of genetic exchange between North American and Eurasian strains of LPAI viruses via reassortment in northern pintails (Anas acuta) and dunlin (Calidris alpina) in Alaska (Koehler et al. 2008; Wahlgren et al. 2008), ruddy turnstones (Arenaria interpres) and herring gulls (Larus argentatus) along the Atlantic Coast of North America (Marakova et al. 1999), mallards (Anas platyrhynchos) in Minnesota (Jackwood and Stallknecht 2007), and waterfowl in Alberta, Canada. Furthermore, intercontinental genetic exchange appears to be bidirectional. RNA segments from North American avian influenza viruses have been observed in guillemots (Uria aalge) in Europe (Wallensten et al. 2005) and waterfowl in Asia (Liu et al. 2004); North American RNA segments were found in a South American influenza virus isolated from cinnamon teal (Anas cyanoptera; Spackman et al. 2007); and some Asian lineages of the N8 RNA segment were more similar to North American virus isolates from northern pintails than to other Asian reference samples (Koehler et al. 2008).
On the basis of these studies, it appears that the relative frequency of occurrence of Asian origin RNA segments in North American LPAI isolates could be used to identify species and regions where the relative risk of introduction of a foreign HPAI virus is highest. For example, nearly half of avian influenza viruses isolated from northern pintails in Alaska contained at least one Asian RNA segment (Koehler et al. 2008). Conversely, Krauss et al. (2007) found little evidence of Eurasian origin RNA in Alberta, Canada, and Delaware Bay (New Jersey), United States, and concluded that foreign virus introduction was likely to be rare. The higher frequency of Asian lineage RNA segments in virus isolates from northern pintails probably results from the proximity of Alaska to Asian sources of LPAI viruses and the movement of this species between Asia and Alaska.
Genomic analyses of LPAI viruses could serve as a powerful approach to evaluate whether North American surveillance for HPAI has targeted the bird species and regions most likely to exhibit transcontinental virus exchange. To assess the applicability of such an approach, we conducted a phylogenetic and population genetic analysis of LPAI viral genomes isolated from the northern pintail at opposite ends of a migratory flyway in North America (Fig. 1). The connectivity of the Pacific migratory flyway between Alaska and California is well documented for the northern pintail through banding and satellite telemetry data (Bellrose 1980; Miller et al. 2005; Nicolai et al. 2005). We used genomic sequencing and phylogenetic analysis of northern pintail LPAI viruses collected from wintering areas in California to determine the frequency of Asian lineage viruses present and to examine the similarity of RNA segments between Alaska and California isolates.

Materials and methods
Sampling, virus isolation and sequencing
A total of 3045 samples, collected from live and hunter-killed wild northern pintail ducks, was subjected to virus isolation in embryonated eggs as previously described. Samples were collected during the fall and winter months of 2006 and 2007 from Siskiyou, Butte, Fresno, Kern, and Imperial counties across California (Fig. 1). A total of 30 LPAI viruses was isolated from California samples and compared with 38 viruses isolated from samples collected across Alaska (Fig. 1) by Koehler et al. (2008).
Viral RNA was extracted from allantoic fluid with the MagMAX AI/NDV RNA extraction kit (Ambion Inc., Austin, TX, USA). All eight RNA segments were amplified with the one-step RT-PCR kit (Qiagen, Inc., Valencia, CA, USA) using a combination of previously published primers (Zou 1997; Hoffmann et al. 2001; Phipps et al. 2004; Bragstad et al. 2005; Chan et al. 2006; Obenauer et al. 2006; Li et al. 2007; Koehler et al. 2008) and primers designed specifically for this study, which are available from the authors upon request. PCR products were either gel purified and extracted using the QIAquick gel extraction kit (Qiagen, Inc.) or treated with ExoSap-IT (USB Inc., Cleveland, OH, USA) without additional purification before sequencing. Cycle sequencing was performed with the same primers used for PCR and BigDye Terminator version 3.1 mix (Applied Biosystems, Foster City, CA, USA). Samples were analyzed on an Applied Biosystems 3730xl automated DNA sequencer (Applied Biosystems).

Phylogenetic relationships
To address our first objective of placing each RNA segment from each California isolate into either an Asian or a North American clade, we compared sequences obtained from California isolates with reference sequences (see Table S1) from the GenBank database at the National Center for Biotechnology Information (Bao et al. 2008). For each RNA segment, five reference samples isolated from dabbling ducks from across the United States were selected to represent North American lineages, and 4–35 reference samples isolated from water birds in East Asia (China, Japan, and South Korea) were selected to represent Asian lineages. All sequences were aligned using Sequencher version 4.7.
We used paup* version 4.0b (Swofford 2003) to generate neighbor-joining trees using the branch-and-bound method with 10 000 bootstrap replicates. For each RNA segment analysis, we incorporated the best approximating model of nucleotide evolution as determined by Akaike's Information Criterion in modeltest version 3.06 (Posada and Crandall 1998). Estimates of the proportion of invariable sites (I) and the gamma distribution parameter (G), calculated in modeltest, were also incorporated. For nearly all genes, the best approximating model was the General Time Reversible model (Lanave et al. 1984) plus I and G. For N6 and H7, better fit to the data was achieved with the Kimura (1981) 3-parameter model plus I. For Bayesian analysis, we used MrBayes version 3.1.2 (Ronquist and Huelsenbeck 2003) to estimate posterior probabilities of clade support. Each analysis was run for 1 × 10^6 generations or until the average standard deviation of split frequencies was ≤1.00. We also verified that the potential scale reduction factor was 1.00 as another indicator of convergence (see Ronquist and Huelsenbeck 2003). Average posterior probabilities of the 50% majority-rule consensus tree topologies were estimated by sampling likelihood parameters every 100 generations. Trees were visualized with TreeView (Page 1996). Following the construction of phylograms, we inferred that a viral reassortment event had occurred between Asian and North American viruses when a lineage from a northern pintail in North America was most closely related to Asian reference sequences.
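The reassortment criterion above is a decision rule applied to tree topology: a North American pintail segment whose closest relatives are Asian references is treated as Asian-derived. As a rough, non-phylogenetic illustration of that rule only, the sketch below assigns an aligned query segment to the continental label of its nearest reference by uncorrected p-distance and flags an Asian nearest neighbor as a candidate reassortant. The function names, the toy sequences, and the substitution of a simple p-distance for the neighbor-joining and Bayesian trees used in the study are all assumptions for illustration; actual assignments should rest on well-supported clades.

```python
from typing import Dict, List, Optional, Tuple

VALID_BASES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID_BASES or y not in VALID_BASES:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def assign_lineage(query: str, references: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return (continental label, distance) of the reference closest to the query."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, seqs in references.items():
        for ref in seqs:
            d = p_distance(query, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    if best_label is None:
        raise ValueError("no reference sequences supplied")
    return best_label, best_dist


def is_candidate_reassortant(query: str, references: Dict[str, List[str]]) -> bool:
    """Hypothetical screening rule: flag a North American isolate's segment
    whose nearest reference is Asian for closer phylogenetic inspection."""
    label, _ = assign_lineage(query, references)
    return label == "Asian"


if __name__ == "__main__":
    # Toy, made-up sequences; not real influenza data.
    refs = {
        "North American": ["ACGTACGTAC", "ACGTACGTAA"],
        "Asian": ["TCGTTCGTTC", "TCGTTCGATC"],
    }
    print(assign_lineage("ACGTACGTTC", refs))          # ('North American', 0.1)
    print(is_candidate_reassortant("TCGTTCGTTA", refs))  # True
```

A nearest-neighbor distance ignores branch support and rate variation, which is why the study relied on model-based trees; the sketch is meant only to make the "most closely related to Asian references" rule concrete.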

Genetic diversity
To characterize genetic diversity and differentiation of North American lineages of LPAI viruses between Alaska and California northern pintails, we first collapsed all sequences into individual alleles (haplotypes) and computed their frequencies in each sampling area. We focused these analyses on the six non-surface glycoprotein RNA segments (M, NP, NS, PA, PB1, and PB2) because not all HA and NA subtypes were represented in both locations. We tabulated general measures of diversity, including the percentage of unique sequences, sequence diversity (h), and the mean number of pairwise differences (π) within each sampling area, using arlequin version 3.0.1 (Excoffier et al. 2005). We evaluated the level of population differentiation between Alaska and California by computing the effective number of distinct subpopulations, D_ST, and the true level of differentiation, D (Jost 2008). Traditional measures of population differentiation, such as F-statistics, can yield misleadingly low values when the groups compared contain numerous rare alleles, as occurs with highly polymorphic loci such as virus sequence data (Jost 2008). However, because estimators such as D_ST may fail to reveal other patterns in allelic data, we also calculated in arlequin, for comparative purposes, an F_ST analogue that incorporates a model of nucleotide evolution (Φ_ST). In these assessments of population diversity and differentiation, we restricted our analysis to North American lineages and excluded the Alaska lineages identified by Koehler et al. (2008) as related to Asian clades. To visually examine the diversity of North American lineages, we constructed phylograms for the six internal gene segments using the neighbor-joining and Bayesian methods described above.
Figure 1 Breeding (green) and wintering (blue) distributions of the northern pintail. General migration routes (arrows) are shown for northern pintails that winter in Eastern Asia and Western North America based on banding and satellite telemetry studies (Bellrose 1980; Miyabayashi …)
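The within-area summaries listed in the genetic diversity methods (collapsing sequences into haplotypes, percentage of unique sequences, sequence diversity h, and mean number of pairwise differences π) can be sketched in a few lines. This is not arlequin's implementation. It assumes aligned, gap-free sequences of equal length; it uses Nei's unbiased formula h = n/(n − 1) · (1 − Σ p_i²); it defines the percentage of unique sequences as distinct haplotypes divided by sample size (an assumption, since the text does not define it); and it computes π as the average number of differing sites over all sequence pairs. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def collapse_haplotypes(seqs: List[str]) -> Counter:
    """Collapse identical sequences into haplotypes with their counts."""
    return Counter(seqs)


def diversity_summary(seqs: List[str]) -> Dict[str, float]:
    """Haplotype count, % unique, Nei's h, and mean pairwise differences (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    counts = collapse_haplotypes(seqs)
    # Assumed definition: distinct haplotypes as a percentage of sample size.
    pct_unique = 100.0 * len(counts) / n
    # Nei's unbiased haplotype (sequence) diversity.
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    h = n / (n - 1) * (1.0 - sum_p2)
    # Mean number of differing sites over all n(n - 1)/2 sequence pairs.
    pair_diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    pi = sum(pair_diffs) / len(pair_diffs)
    return {"n_haplotypes": len(counts), "pct_unique": pct_unique, "h": h, "pi": pi}


if __name__ == "__main__":
    toy = ["AAAA", "AAAA", "AAAT", "ATAT"]  # made-up sequences
    print(diversity_summary(toy))
```

For the toy sample, three haplotypes occur at frequencies 0.5, 0.25, and 0.25, so h = (4/3)(1 − 0.375) ≈ 0.83, and the six pairs differ at 0, 1, 2, 1, 2, and 1 sites, giving π = 7/6.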

Results
Phylogenetic relationships
We found no evidence of Asian lineage genes among the 177 California RNA segments of the M, NP, NS, PA, PB1, and PB2 genes. In all cases, sequences from California isolates for these genes clustered with North American reference samples and formed clades that were distinct from Asian reference samples and well supported by both neighbor-joining and Bayesian methods (data not shown). Similarly, we observed a clear separation between Asian reference samples and California northern pintail virus sequences for all HA and NA subtypes (Fig. 2), with four exceptions (H6, N8, N1, and N2). In phylogenetic trees for the H6 (Fig. 3) and N8 (Fig. 4) subtypes, there was evidence for multiple differentiated clades. In the H6 tree, we observed four groups (Fig. 3); group 1 was composed of a mixture of North American and Asian lineages, including three California and four Alaska northern pintail isolates that were identified as derived from reassortment events (Koehler et al. 2008). Group 3 of the H6 tree contained only Asian reference samples, and group 4 was composed of two North American reference samples. Similarly, group 1 of the N8 tree (Fig. 4) contained sequences from North American and Asian isolates, including three isolates from California northern pintails and 15 from Alaska, whereas group 3 contained only Asian lineages. A similar pattern was observed in the N2 tree (not shown), in which all sequences, including 10 California segments, clustered into one large group of North American and Asian lineages that was distinct from an outgroup of swine sequences. Thus, a total of 16 (6.6%) California segments from the H6, N8, and N2 genes were observed in mixed-lineage clades containing sequences from both Asia and North America.
Lastly, we observed three Asian reference samples (AJ410565/A/duck/Hong Kong, AB292405/A/duck/Hong Kong, and AB298280/A/duck/Hokkaido) within the North American clade of lineages for the N1 gene (not shown) that included a California isolate from this study (FJ520158/A/pintail/California).

Genetic diversity
In comparing genetic diversity of LPAI isolates between Alaska and California, we observed a substantial number and diversity of sequences for the M, NP, NS, PA, PB1, and PB2 genes (Table 1). These six RNA segments exhibited a great deal of sequence diversity, and no identical sequences were shared between Alaska and California (Table 1). Sequence diversity (h) was similar between areas, but the mean number of pairwise differences (π) among sequences within each area was not consistent between areas. However, these indices are underestimated for Alaska because the Asian reassortment events identified by Koehler et al. (2008) were excluded from the diversity summaries.
The lack of shared LPAI virus sequences for the six internal genes between Alaska and California yielded maximum values for both the effective number of distinct subpopulations (D_ST = 2.0) and true differentiation (D = 1.0) as defined by Jost (2008). However, these metrics are potentially misleading because they do not account for the small number of mutations that separate some lineages of each RNA segment. Although the mean number of pairwise differences among sequences for each of the six RNA segments analyzed was large (Table 1), the range of those differences also included very low values (≤4). Phylograms of the six non-surface glycoprotein RNA segments show similar levels of genetic diversity within Alaska and California virus isolates (Fig. 5). As a result, levels of population differentiation estimated with the F-statistic analogue Φ_ST were low to moderate, with an estimated 1.5–15% of the total genetic variation occurring between Alaska and California (Table 1). The greatest level of differentiation was observed for the PA segment (Φ_ST = 0.156), and a phylogram of this segment revealed a greater degree of region-specific clustering of lineages than other RNA segments (Fig. 5). One clade of the PA segment was composed only of Alaskan isolates and three Asian reference samples (Fig. 6), and similar lineages were not observed among California wintering samples. Moderate levels of Φ_ST in the NS and PB2 RNA segments were also associated with small clusters of Alaska- and California-specific lineages (Fig. 5).
Figure 4 Phylogenetic relationship of sequences from the N8 subtype. North American isolates of northern pintails from Alaska (Koehler et al. 2008) and California (this study) are indicated with black circles. North American reference samples taken from GenBank are shown with a black triangle. Four equine reference samples serve as an outgroup. Bayesian posterior probabilities and levels of neighbor-joining bootstrap support >70% between major clades are shown on branches. Numbered vertical bars delineate major clades.
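The maximum values D_ST = 2.0 and D = 1.0 reported for the six internal genes follow directly from Jost's (2008) definitions whenever two areas share no haplotypes. The sketch below computes, from haplotype frequencies with equal weight per area, the within-area heterozygosity H_S, the total heterozygosity H_T, the effective number of distinct subpopulations (1 − H_S)/(1 − H_T), Jost's D = [(H_T − H_S)/(1 − H_S)] · [k/(k − 1)] for k areas, and, for contrast, Nei's G_ST = (H_T − H_S)/H_T. It uses naive plug-in frequencies rather than the small-sample estimators Jost also describes; the function names and haplotype labels are hypothetical; and G_ST stands in for a classical F-statistic, not the distance-based Φ_ST the authors computed in arlequin.

```python
from collections import Counter
from typing import Dict, List


def _hap_freqs(seqs: List[str]) -> Dict[str, float]:
    """Relative frequency of each haplotype within one sampling area."""
    counts = Counter(seqs)
    n = len(seqs)
    return {hap: c / n for hap, c in counts.items()}


def jost_differentiation(pops: List[List[str]]) -> Dict[str, float]:
    """H_S, H_T, effective number of subpopulations, Jost's D, and Nei's G_ST."""
    k = len(pops)
    if k < 2 or any(len(p) == 0 for p in pops):
        raise ValueError("need at least two non-empty sampling areas")
    freqs = [_hap_freqs(p) for p in pops]
    # Mean within-area homozygosity (equal weight per area).
    j_s = sum(sum(p * p for p in f.values()) for f in freqs) / k
    # Homozygosity of the pooled population (equal weight per area).
    haplotypes = set().union(*freqs)
    pooled = [sum(f.get(hap, 0.0) for f in freqs) / k for hap in haplotypes]
    j_t = sum(p * p for p in pooled)
    h_s, h_t = 1.0 - j_s, 1.0 - j_t
    return {
        "H_S": h_s,
        "H_T": h_t,
        # Effective number of distinct subpopulations (q = 2 beta diversity).
        "D_beta": j_s / j_t,
        # Jost's D; 1 - h_s equals j_s, which is always positive here.
        "D": (h_t - h_s) / (1.0 - h_s) * (k / (k - 1)),
        # Nei's G_ST; defined as 0 when every area is fixed for one shared haplotype.
        "G_ST": (h_t - h_s) / h_t if h_t > 0 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels: ten equally common haplotypes per area, none shared.
    alaska = [f"AK{i}" for i in range(10)]
    california = [f"CA{i}" for i in range(10)]
    print(jost_differentiation([alaska, california]))
```

With ten equally common, unshared haplotypes per area, the effective number of subpopulations is 2 and D reaches its maximum of 1, while G_ST is only 1/19 (about 0.05). This illustrates why heterozygosity-based statistics can understate differentiation at highly polymorphic loci, and also why D alone ignores how few mutations may separate lineages.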

Discussion
Host migration ecology and viral population genomics
To our knowledge, this is the first survey of LPAI variation in a single species at opposite ends of a migratory flyway. The northern pintail is known to migrate between North America and Eurasia (Miller et al. 2005; Nicolai et al. 2005), and LPAI viruses isolated from northern pintails in Alaska, an area where migratory birds from both North American and Asian flyways intersect (Fig. 1), contain a higher frequency of Asian lineage genes than observed elsewhere in North America (Koehler et al. 2008). Genetic evidence suggests that, even though migratory connectivity between Alaska and California is well documented for northern pintails (Bellrose 1980; Miller et al. 2005; Nicolai et al. 2005), the number of Asian LPAI lineages and the exact nucleotide composition of North American LPAI segments are not maintained through fall migration. Thus, LPAI genetic data indicate that Asian lineage viruses have been introduced to northern pintails in Alaska, but Asian-related lineages were difficult to detect farther from Alaska along an established migratory flyway. We speculate that the most likely mechanisms contributing to these findings are the rapid nucleotide mutation rate and the reassortment among RNA segments that are common features of influenza A viruses (Macken et al. 2006; Dugan et al. 2008). As a result, the genomes of viruses isolated from birds in Alaska and California appear similarly diverse and differ by only a small number of mutations. In a study of wild mallards (Anas platyrhynchos), Latorre-Margalef et al. (2008) reported that the mean maximum duration of avian influenza virus infection was 8.3 days. Therefore, birds likely acquire entirely new LPAI infections during the fall migratory period and after they arrive on wintering grounds. However, Latorre-Margalef et al. (2008) also noted that the duration of virus shedding varied widely (2–34 days).
We observed some region-specific lineages in Alaska (NS and PA segments) and California (PB2), suggesting that population structures of host and viruses may be similar on breeding and wintering areas. That is, the sample of LPAI virus isolates from Alaska may come from northern pintails that winter in both Asia and North America. Conversely, the sample of LPAI isolates from California may come from northern pintails that breed in Alaska and other locales. However, we acknowledge that independent data from satellite telemetry and banding studies are needed to assess this possibility, especially because only the NS, PA, and PB2 segments showed evidence of population genetic differentiation. The mechanisms of mutation and the resulting differences in population structure are, however, less applicable to our results for the HA and NA subtypes. Some evidence for transcontinental exchange was noted in four HA and NA subtypes, but such polymorphism is characteristic of these segments (Clark and Hall 2006) and likely results from selection to adapt to a wide variety of host species (Dugan et al. 2008).

Frequency of Asian lineages and LPAI subtype diversity
We observed no Asian lineages among the 177 sequences of the M, NP, NS, PA, PB1, and PB2 genes isolated from viruses of northern pintails wintering in California, in contrast to 12 (5.4%) Asian lineages among the 222 segments of these same genes in LPAI viruses isolated from northern pintails in Alaska (Koehler et al. 2008). Additionally, 11 of the 14 HA and NA subtype segments showed no evidence for Asian lineages among California samples, as North American and Asian sequences were well differentiated with high support. Four surface glycoprotein subtypes in California isolates (H6, N8, N1, and N2), representing 16 (26%) segments, yielded phylogenetic trees that suggest reassortment between continental gene pools of avian influenza viruses, followed by persistence and evolution of these mixed-lineage clades in Asia and North America. The H6 tree shows a complex pattern in which North American and Asian lineages do not cluster as distinct clades. Mixed-lineage topologies have been observed previously for the H6 subtype (see Webby et al. 2003; Spackman et al. 2005; Dugan et al. 2008). For the H6 tree, we included a slightly different suite of North American and Asian reference samples than either Spackman et al. (2005) or Koehler et al. (2008). One possible conclusion is that the northern pintail isolates from Alaska (n = 4) and California (n = 3) can be considered outsider events, along with at least three North American reference samples. However, these results may also reflect the persistence of Asian-like lineages in North American water birds following initial introduction events.
Figure 5 Phylograms depicting genetic diversity among sequences from the six non-surface glycoprotein RNA segments isolated from northern pintails in Alaska (black circles) and California (white circles). Unlabelled roots of each tree are equine reference samples included as outgroups. Bayesian posterior probabilities and levels of neighbor-joining bootstrap support >70% between major clades are shown on branches for three segments (NS, PA, and PB2) that exhibit patterns of Alaska- or California-specific lineages (see text).
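The contrast between 0 of 177 California and 12 of 222 Alaska internal-gene segments with Asian lineages can be framed as a 2 × 2 table. The sketch below computes a one-sided Fisher's exact p-value directly from the hypergeometric distribution using only the standard library. This test is not reported in the study and is included only to show the arithmetic behind such a comparison; it treats segments as independent, which is questionable because six segments come from each isolate, and the Alaska counts come from Koehler et al. (2008). The function name is hypothetical.

```python
from math import comb


def fisher_one_sided_lower(hits_b: int, total_b: int, hits_a: int, total_a: int) -> float:
    """One-sided Fisher's exact test that group B is depleted of positives.

    Returns P(X <= hits_b), where X is the number of positives falling in
    group B when all positives are distributed at random between groups
    (hypergeometric null with fixed margins).
    """
    n_all = total_a + total_b
    k_all = hits_a + hits_b
    denom = comb(n_all, total_b)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    numer = sum(comb(k_all, i) * comb(n_all - k_all, total_b - i)
                for i in range(hits_b + 1))
    return numer / denom


if __name__ == "__main__":
    # California (group B): 0 Asian-lineage segments of 177.
    # Alaska (group A): 12 Asian-lineage segments of 222.
    p = fisher_one_sided_lower(hits_b=0, total_b=177, hits_a=12, total_a=222)
    print(f"one-sided Fisher exact p = {p:.2g}")
```

Working with exact integer binomial coefficients avoids floating-point underflow in the large combinatorial terms; only the final ratio is converted to a float.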
The tree for the N8 subtype is similar to that of Koehler et al. (2008), in which most Asian reference samples cluster as a single group separate from a clade that is a mixture of North American (including Alaska and California samples) and Asian lineages. The N2 tree is similarly a heterogeneous mixture of North American and Asian lineages, with all California samples appearing as derived from Asian and North American reference samples (not shown). These results did not change following the inclusion of two LPAI viruses of the N2 subtype with completely Asian genotypes (A/Chicken/Nanchang/4-301/2001 and A/Wild Duck/Nanchang/2-0480/2000; Obenauer et al. 2006). Thus, this phylogram does not appear to result from the use of Asian reference samples that may have arisen from reassortment events. These results may also arise from the persistence of Asian-like lineages in North American water birds following initial introduction events, or vice versa (e.g., our N1 results), and suggest selective pressures for maintaining genetic diversity in the NA surface glycoprotein to evade host immunity (Dugan et al. 2008).
In addition to sequence diversity, we also observed substantial variation in the subtype distribution of the transmembrane glycoproteins HA and NA between northern pintail isolates from Alaska and California. These frequency distributions also differ substantially from those observed among northern pintail LPAI viruses collected and isolated in Japan during the same period as our analysis (Jahangir et al. 2008). Categorizing virus isolates by subtype is imperative for surveillance programs to identify potentially pathogenic strains, such as H2, H5, and H7, but may be of limited use for assessing reassortment and virus exchange among continents because these lineages are not completely separated geographically (e.g., Figs 3 and 4). Indeed, the HA and NA segments were excluded from a recent methodological review of how best to determine rates of reassortment and virus turnover (Macken et al. 2006).

North American LPAI genomic diversity
Low pathogenic avian influenza genetic diversity was high, and the lack of shared sequences between Alaska and California at first suggests that they could be considered differentiated subpopulations of LPAI viruses. Such results are expected for a sample drawn from a large, diverse population characterized with a highly polymorphic molecular marker (see Jost 2008). However, the Pacific migratory flyway between Alaska and California is well documented for the northern pintail. During 4 years of marking wintering northern pintails in the Central Valley of California, Miller et al. (2005) noted that approximately 50–70% of adult females were present in Alaska during the breeding season. Additionally, 64–85% of northern pintails banded in Alaska during the breeding season were recovered in California wintering areas (Bellrose 1980; Nicolai et al. 2005). Thus, both reassortment and rapid nucleotide evolution, which are hallmarks of avian influenza A viruses (Clark and Hall 2006), probably contribute substantially to the differentiation we observed between breeding and wintering areas. Summarizing LPAI genome data across several additional years of sampling is needed to determine whether avian influenza viruses in northern pintails exhibit an evolutionary pattern similar to the migration model proposed by Nelson et al. (2007) for the H3N2 human influenza strain. We observed a small number of distinct phylogenetic clades in the PA, NS, and PB2 segments. Similar patterns for the PA segment were observed in wild ducks in Canada (Hatchette et al. 2004) and Japan (Liu et al. 2004). For the PA and NS segments, clades of mostly Alaskan isolates may suggest an admixture in our sample […]
Figure 6 Phylogenetic relationship of sequences from the PA segment of avian influenza viruses isolated from northern pintails in Alaska (AK; Koehler et al. 2008) and California (CA; this study) in comparison to 36 reference samples from Asia (shaded ovals) and five from North America (NA). Circled numbers indicate the frequency of outsider events in each clade, including the five reassortment events identified by Koehler et al. (2008) for this RNA segment. The tree is rooted using two equine influenza isolates from North America. Bayesian posterior probabilities and levels of neighbor-joining bootstrap support >70% between major clades are shown on branches.
[…] but further optimization of this list, and timing of sampling, could be based on analysis of LPAI viral genetics as summarized in this study. A list of priority surveillance species will probably differ across regions of North America but can be validated by a similar genetic analysis. Thus, a molecular prioritization is needed not only for Alaska but also for species along the North Atlantic coast of North America, especially for those that exhibit initial evidence for transcontinental exchange (e.g., ruddy turnstones and herring gulls; Marakova et al. 1999). With few exceptions, genetic evidence for transcontinental virus exchange in studies of North American LPAI has come from samples collected in coastal regions that are closest to Europe or Asia (i.e., Alaska and the North Atlantic). These probably represent the first or primary areas of contact for foreign viruses, yet only 28% of birds tested for HPAI H5N1 in the United States to date have been from Alaska and the North Atlantic coast (Highly Pathogenic Avian Influenza Early Detection Data System 2008). The lower frequency of transcontinental lineages in areas of the continent farther from these regions, such as California for northern pintails, is probably caused by functional dilution over this distance, virus mutation rates, reassortment, and the short duration of virus infections.
Thus, based on genetic evidence, we conclude that one strategy would be to target surveillance efforts on species in coastal regions that are geographically close to current sources of the highly pathogenic H5N1 virus.

Conclusions
We found limited evidence for the persistence of Asian lineage gene segments from breeding areas to a major wintering ground for the northern pintail, a species that is known to migrate between Asia and North America and that has shown a high frequency of Asian lineages in Alaska (Koehler et al. 2008). Thus, phylogenetic analysis of LPAI genomic variation can be used as an indicator of species and regions that should be targeted for HPAI surveillance. For example, if there is no evidence of transcontinental LPAI exchange for a certain species or regional pathway, then the introduction of HPAI via that route seems unlikely. However, species and/or regions that do show genetic evidence of exchange with European or Asian LPAI lineages would be more likely pathways for the introduction of HPAI via wild migratory birds. Because of the pronounced diversity of HA and NA genes, determining reassortment events from these segments is difficult. Therefore, we suggest that whole-genome sequencing of LPAI viruses be used to determine levels of connectivity among continental virus populations in migratory birds, thereby informing future surveillance programs.