Evolution of New Genotype of West Nile Virus in North America

Previous studies of North American isolates of West Nile virus (WNV) during 1999–2005 suggested that the virus had reached genetic homeostasis in North America. However, genomic sequencing of WNV isolates from Harris County, Texas, during 2002–2009 suggests that this is not the case. Three new genetic groups have been identified in Texas since 2005. The southwestern US genotype (SW/WN03) has spread from the Arizona/Colorado/northern Mexico region to California, Illinois, New Mexico, New York, North Dakota, and the Texas Gulf Coast. Thus, WNV continues to evolve in North America, as demonstrated by selection of this new genotype, and continued surveillance of the virus is essential.

West Nile virus (WNV) is a mosquito-borne flavivirus belonging to the Japanese encephalitis serogroup and maintained in an enzootic cycle between mosquitoes (primarily Culex spp.) and birds. Mammals such as horses and humans act as dead-end hosts. Most human infections are asymptomatic; West Nile fever develops in ≈20% of infected patients and neuroinvasive disease develops in <1% (1).
WNV was first isolated in Uganda in 1937 and was generally associated with sporadic outbreaks of mild, febrile illness until the 1990s, when several epidemics of neuroinvasive disease were reported in northern Africa, eastern Europe, and Russia (2–4). In 1999, WNV was first isolated in North America from human and bird samples during an outbreak of encephalitic disease in New York.
After this outbreak, WNV rapidly spread across the United States north to Canada and south to the Caribbean region, Mexico, and Central and South America.
By 2002, the original WNV genotype isolated in New York, known as NY99, was displaced by a new genotype, designated the North American (NA) or WN02 genotype (hereafter termed NA/WN02 genotype) (5,6). This genotype is characterized by 13 conserved nt changes, 1 of which results in an amino acid substitution, V159A, in the envelope (E) protein. The NA/WN02 genotype is believed to have become dominant in North America because of its ability to more efficiently disseminate in mosquitoes than the original NY99 virus genotype (6–8).
Beasley et al. (9) first identified the NA/WN02 genotype in Texas in 2002, and further studies showed that this genotype had spread throughout the Upper Texas Gulf Coast and to other regions in the United States (5). Additional studies examined phenotypic changes in WNV isolates from the Upper Texas Gulf Coast region during 2003 and identified co-circulation of small-plaque, temperature-sensitive, mouse-attenuated and large-plaque, non-temperature-sensitive, mouse-virulent strains (10–12). Subsequent studies of the E gene of viruses isolated through 2006 suggested that since the emergence of the NA/WN02 genotype, WNV in North America is either genetically homeostatic (13) or its growth rate is decreasing (14).
We examined genetic variation in selected WNV strains since 2005 from the Upper Texas Gulf Coast region, in particular, Harris County, Texas, USA (Houston metropolitan area). We report the isolation of genetic variants that demonstrate the continuing evolution of WNV in North America. We also show that the southwestern US genotype first identified in Arizona, Colorado, and northern Mexico in 2003 (termed SW/WN03 genotype) has now spread to the Upper Texas Gulf Coast region.

Virus Isolates
Virus isolates were obtained from the World Reference Center for Emerging Viruses and Arboviruses at the University of Texas Medical Branch (UTMB) in Galveston, Texas. All new isolates used in this study were originally made from mosquito pools or the brains of naturally infected birds cultured in Vero cells at UTMB. Each isolate was given a second passage in Vero cells to generate a working stock and stored at -80°C.

Reverse Transcription-PCR
Viral RNA was extracted from 140 μL of infected Vero cell supernatant by using the QIAamp Viral RNA Mini Kit (QIAGEN, Valencia, CA, USA) per the manufacturer's directions. Full-genome sequencing was performed by consensus overlapping sequencing of PCR products with primers based on the published sequence of WNV NY-99 flamingo 382-99 (GenBank accession no. AF196835). Reverse transcription-PCR was performed by using the Titan One Tube RT-PCR Kit (Roche Applied Science, Indianapolis, IN, USA) (primers and PCR conditions are available by request). PCR products were subjected to electrophoresis on 1% agarose gels and purified by using the QIAquick Gel Extraction Kit (QIAGEN).

Sequencing and Analysis
Purified PCR products were sequenced in both directions by using the Protein Chemistry or Molecular Genomics Core Laboratories at UTMB. Sequences were edited and assembled by using ContigExpress in the VectorNTI program suite (Invitrogen, Carlsbad, CA, USA). Full-length coding sequences were aligned with all published full-length North American WNV isolate sequences available in GenBank (as of November 2010) and isolate WNV IS-98 STD by using MUSCLE in Seaview version 4 (15). The final open reading frame (ORF) alignment contained 244 sequences of 10,299 nt (3,433 aa residues). A second alignment, also made by using MUSCLE, contained 33 sequences from the Upper Texas Gulf Coast region; it comprised 11,030 nt covering the entire ORF and portions of the 5′ and 3′ untranslated regions (UTRs).
Phylogenetic trees were inferred by using the neighbor-joining (NJ) method in the Phylip package (16) and the maximum-likelihood (ML) method in PhyML (17). MODELTEST, in conjunction with PAUP, was used to identify generalized time reversible + I + Γ4 as the best-fit nucleotide substitution model to be used in phylogenetic analyses (18,19). To assess robustness of the inferred trees, we used 1,000 bootstrap replicates with the NJ method. The ML method used 100 bootstrap replicates for the entire North American alignment and 1,000 bootstrap replicates for the Upper Texas Gulf Coast region alignment. IS-98 STD was used as outgroup for the entire North American WNV alignment, and NY99 was used as outgroup for the Upper Texas Gulf Coast region alignment.
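To illustrate the bootstrap step described above, the following minimal Python sketch builds one pseudoreplicate by resampling alignment columns with replacement. It is not the Phylip or PhyML implementation; the function name `bootstrap_alignment` and the toy sequences are hypothetical, and tree inference on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudoreplicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that results are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Sample column indices with replacement. The same indices are applied to
    # every sequence so that each sampled column stays intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Toy data (hypothetical, not sequences from this study).
toy = {"NY99": "ATGGTCAAG", "iso_A": "ATGGCCAAG", "iso_B": "ATGGCCAAA"}
replicate = bootstrap_alignment(toy, random.Random(1))
```

In practice, a tree would be inferred from each of the 100 or 1,000 replicates, and the support for a branch would be the fraction of replicate trees that contain it.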

Recombination Detection and Selection Analysis
Screening for recombination was performed on the first 9,999 nt of the North American WNV ORF alignment by using single-break point analysis on the Datamonkey server (20–22). This screening verified absence of recombination in sequences before running the selection analyses. The first 9,999 nt were selected because of constraints on sequence length by the programs used.
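The truncation described above keeps the reading frame intact because 9,999 is a multiple of 3. A minimal Python sketch of such a codon-aware trim follows; the helper name `trim_to_codons` and its `max_nt` parameter are hypothetical, and the placeholder sequence stands in for a real ORF.

```python
def trim_to_codons(seq, max_nt=9999):
    """Trim an in-frame ORF to at most max_nt nucleotides, keeping whole codons."""
    usable = min(len(seq), max_nt)
    usable -= usable % 3  # drop any trailing partial codon
    return seq[:usable]


orf_length_nt = 10299           # ORF alignment length reported in the text
assert orf_length_nt % 3 == 0   # corresponds to 3,433 codons

# Placeholder sequence of the right length (not a real WNV ORF).
trimmed = trim_to_codons("A" * orf_length_nt)
n_codons = len(trimmed) // 3
```

Under these assumptions the trimmed alignment spans 3,333 codons, which matches the codon count given for the recombination screen in the Results.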

Viral Isolates
When compared with NY99, the prototype WNV strain for North America, the 17 WNV isolates we analyzed in this study had 38–60 nt (0.35%–0.54%) differences; most changes were synonymous. Nine of the 13 conserved nt changes characteristic of the NA/WN02 genotype were found in all newly sequenced isolates (Table 2).
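The distinction between synonymous and nonsynonymous changes can be made concrete with a short Python sketch that translates a reference codon and a variant codon by using the standard genetic code. The function name `classify_codon_change` is hypothetical, and the codons in the example are illustrative only; they show a valine-to-alanine change of the kind denoted V159A but are not taken from the sequenced isolates.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (kind, ref_aa, alt_aa) for a codon change.

    kind is "synonymous" if both codons encode the same amino acid,
    otherwise "nonsynonymous". RNA codons (with U) are accepted.
    """
    ref = ref_codon.upper().replace("U", "T")
    alt = alt_codon.upper().replace("U", "T")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Illustrative codons only (assumed, not read from the WNV genome).
second_position_change = classify_codon_change("GUC", "GCC")
third_position_change = classify_codon_change("GUC", "GUU")
```

Here a U-to-C change at the second codon position replaces valine with alanine, whereas a change at the third position of the same codon leaves the amino acid unchanged.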

Upper Texas Gulf Coast Region Isolates, 2002-2009
WNV was first detected in the Upper Texas Gulf Coast region in 2002 (9). During 2002–2004, isolates from the Upper Texas Gulf Coast region were divided genetically into 3 groups (groups 1–3) (12) and showed 0.30%–0.40% divergence compared with NY99. Isolates from 2005–2009 (groups 4–6; see below for their definitions) have significantly greater divergence (0.40%–0.70%; p<9.4 × 10⁻⁹) from NY99 (Table 3). When compared with isolates from 2002–2004 (groups 1–3), recent Upper Texas Gulf Coast isolates from 2005–2009 (groups 4–6) had nucleotide divergence ranging from 0.50% to 0.80%. Because of the high number of synonymous nucleotide mutations, the deduced amino acid sequences of all isolates exhibited a higher level of conservation.
[Table column headings only; data rows not recovered. Genome regions: E, NS2A, NS3, NS4B, NS5, 3′ UTR. Nucleotide positions: 660, 1442, 2466, 3774, 4146, 4803, 6138, 6238, 6426, 6996, 7938, 9352, 10851. Reference strain: NY99, 1999.]
A second phylogenetic analysis was undertaken that used isolates from this study and all published full-length WNV sequences from North America available in GenBank (Figure 2). NJ and ML methods produced trees with similar topology. Within the larger tree, there were analogous groupings of previously and newly sequenced Harris County isolates, as shown in Figure 1.

Selection Pressures
Recombination analysis using single-break point analysis was performed on the first 3,333 codons of the ORF of the North American WNV alignment to rule out recombination before performing selection pressure analysis. As expected, no evidence of recombination was detected.

Discussion
To date, most genetic and phylogenetic studies of WNV have focused on partial genome sequencing, primarily of the E protein gene. Although studies of the E protein gene are helpful in understanding the evolution of WNV in North America, they provide few phylogenetically informative sites; analysis of genomic sequences is more informative (5,26–32). Similarly, although many studies have examined the evolutionary dynamics of WNV soon after its introduction into North America, only 1 published study has examined isolates since 2006 (33). To our knowledge, none have been published that examined isolates from 2007 or more recently. For these reasons, we examined evolution of WNV by using genomic sequences from 1999–2009. We focused on the Upper Texas Gulf Coast region because of availability of multiple isolates from the same localities each year since the first detection of the virus in Texas in 2002. These isolates were obtained as part of an ongoing surveillance program of WNV activity in Harris County.
The isolates sequenced in this study demonstrate that the NA/WN02 genotype has been maintained during 2002–2009 in Harris County. All 17 isolates sequenced contained 9 of 13 nt changes associated with the NA/WN02 genotype reported by Davis [...] had a U at nt position 6426, and 2 isolates had a C at nt position 9352.
Although isolates sequenced in our study display a high degree of similarity, they have major differences. It appears that >3 genetic groups of isolates were co-circulating in Harris County over the study period. Thus, there is continued genetic diversity of WNV over time, at least in the Upper Texas Gulf Coast region, rather than the genetic homeostasis in North America, which was proposed on the basis of using E gene sequences of viruses isolated through 2005 (13). [...] Our data indicate that this genotype is spreading into new areas. It has been identified in California, Illinois, New Mexico, New York, North Dakota, and Texas since 2003. Two of these clusters, the California cluster and the cluster we have called the SW/WN03 genotype, are further supported by using selection analysis. This analysis has shown that there is potential for positive selection at E-V431I in the California cluster and at both of the amino acid substitutions (NS4A-A85T, NS5-K314R) in the SW/WN03 genotype. This finding further provides evidence of the potential role of this emerging genotype.
Potential roles of single amino acid substitutions within the WNV genome should also be noted. The single amino acid change, E-V159A, which occurred in the NA/WN02 genotype, was shown to decrease the extrinsic incubation period of the virus in mosquitoes, which enabled that genotype to displace the NY99 genotype (6). Brault et al. (34) reported that the NS3-T249P substitution increased virulence in American crows. The NS3-T249P substitution has undergone positive selection but the E-V159A change has not, yet both cause phenotypic changes. We speculate that positive selection of NS4A-A85T and NS5-K314R induces a phenotypic change in WNV.
Previous studies in our laboratory that focused on the E protein gene concluded that WNV is experiencing a genetic stasis or decrease in its growth rate after establishment of the NA/WN02 genotype (13). However, none of these studies have phylogenetically examined the entire genome of WNV. This study of genomic sequences demonstrates evolution of WNV, at least in the Upper Texas Gulf Coast region, and potential emergence of a new genotype in the southwestern United States (SW/WN03 genotype). Further experiments are needed to investigate potential phenotypic changes that occur in conjunction with the noted genotype changes and to determine if the