Genetic Variation among Temporally and Geographically Distinct West Nile Virus Isolates, United States, 2001, 2002

Analysis of partial nucleotide sequences of 22 West Nile virus (WNV) isolates collected during the summer and fall of 2001 and 2002 indicated genetic variation among strains circulating in geographically distinct regions of the United States and continued divergence from isolates collected in the northeastern United States during 1999 and 2000. Sequence analysis of a 2,004-nucleotide region showed that 14 isolates shared two nucleotide mutations and one amino acid substitution when they were compared with the prototype WN-NY99 strain, with 10 of these isolates sharing an additional nucleotide mutation. In comparison, isolates collected from coastal regions of southeast Texas shared the following differences from WN-NY99: five nucleotide mutations and one amino acid substitution. The maximum nucleotide divergence of the 22 isolates from WN-NY99 was 0.35% (mean = 0.18%). These results show the geographic clustering of genetically similar WNV isolates and the possible emergence of a dominant variant circulating across much of the United States during 2002.



West Nile virus (WNV) is a member of the genus Flavivirus (family Flaviviridae) and belongs to the Japanese encephalitis virus serocomplex. Until 1999, the geographic distribution of the virus was limited to Africa, the Middle East, India, and western and central Asia, with occasional epidemics in Europe (1,2). By December 2002, however, the distribution of the virus had expanded to include 44 states of the continental United States and southern regions of 5 Canadian provinces from Saskatchewan to Nova Scotia (3). Over the course of 3 years, the virus has traversed North America, presumably from New York City, where it was first isolated during the summer of 1999 (4-7). Partial nucleotide and complete genome sequence analysis of several WNV strains isolated in the northeastern United States during 1999 and 2000 showed that these isolates were most closely related to a WNV strain isolated from the brain of a dead goose in Israel in 1998 (6,8,9). The subsequent establishment of WNV across the eastern and midwestern regions of North America from 1999 through 2001 set the stage for the rapid and widespread movement of the virus across the remainder of the continent during the summer of 2002, resulting in the highest number of annual case reports and deaths attributed to WNV in humans, equines, and birds documented since the discovery of the virus in North America. Surveillance programs initiated by public health agencies, research institutions, and diagnostic laboratories have resulted in the collection of hundreds of WNV isolates across the United States and Canada from various sources, including mosquitoes, humans, equines, birds, and a number of other vertebrate species (3).
Phylogenetic comparisons of partial and complete nucleotide sequences from isolates collected in the northeastern United States during 1999 and 2000 demonstrated a high degree of genetic similarity to the prototype New York strain, WN-NY99 (GenBank accession no. AF196835), with nucleotide identities of >99.8% and amino acid identities of >99.9% (9-12). Although these studies have confirmed that northeastern isolates collected in 1999 and 2000 showed limited genetic divergence from WN-NY99, to date little published information has described the continuing divergence of WNV as its temporal and spatial distribution have expanded (13). To assess the extent to which WNV has evolved since its introduction in North America, we analyzed the partial nucleotide and deduced amino acid sequences of WNV isolates collected during the summer and fall of 2001 and 2002 and compared them to a homologous sequence region of WN-NY99. Collaborations between the University of Texas Medical Branch (UTMB) and a number of U.S. public health agencies have allowed 22 isolates of WNV to be collected, representing several geographically distinct U.S. regions. Phylogenetic comparisons of a 2,004-nucleotide region encoding the entire premembrane and envelope proteins (prM-E) of each isolate have shown the most divergent variants of WNV in North America to date and provide evidence of the possible emergence of a dominant variant circulating in many regions of the United States. Furthermore, our results indicate geographic clustering of distinct variants within and between states and reinforce previous evidence supporting the likelihood of multiple introductions of virus into the state of Texas (13).

Collection and Virus Isolation
Isolates were collected from five states: Illinois, Alabama, Louisiana, Colorado, and Texas. Isolates from Texas were collected from nine counties representing regions across the entire state (Figure 1). All isolates were collected from September 2001 to October 2002. After being confirmed WNV-positive by state public health laboratories, virus or tissues were sent to UTMB for submission into the World Arbovirus Reference Collection. Each sample was given one passage in Vero cells to derive viruses for use in these studies. Virus samples represented a variety of sources, including mosquito pools, bird brain, human cerebrospinal fluid (CSF), and a dog kidney. Of the 18 isolates sequenced in this study (Table 1), 11 were isolated from mosquito pools by the Texas Department of Health (TDH); 2 from a mosquito pool and dog kidney homogenate by the Illinois Natural History Survey (INHS); 2 from passerine brain homogenates by the University of Alabama at Birmingham; 1 from a red-tailed hawk brain homogenate by the Centers for Disease Control and Prevention, Division of Vector-Borne Infectious Diseases (CDC-DVBID), Fort Collins, Colorado; 1 from a mosquito pool in Louisiana, courtesy of CDC-DVBID; and 1 from the CSF of a patient who died of West Nile encephalitis at UTMB.

RNA Extraction, Reverse Transcription, and Polymerase Chain Reaction
Viral RNA was extracted directly from 140 µL of infected Vero or BHK cell culture supernatants by using the QIAamp viral RNA extraction kit (Qiagen, Valencia, CA). Reverse transcription (RT) was performed in a 50-µL volume containing 5 µL of viral RNA, 1 µL of random hexamer primer, 10 µL of 5X RT buffer, 4 µL of 10 mM dNTPs, 0.4 µL of cloned RNase inhibitor, 0.5 µL of Moloney murine leukemia virus (MMLV) reverse transcriptase, and 29.1 µL of high-performance liquid chromatography (HPLC) water. Polymerase chain reaction (PCR) was performed in a 25-µL volume containing 2.0 µL of cDNA template from RT, 1.0 µL of forward primer, 1.0 µL of reverse primer, 2.5 µL of 10X PCR buffer, 0.5 µL of 10 mM dNTPs, 0.5 µL of 1 U/µL Taq polymerase, and 17.5 µL of HPLC water. Three previously described primer pairs were used to amplify the entire prM-E genes of WNV (13). PCR products were gel-purified by using the QIAquick kit (Qiagen), according to the manufacturer's protocol, and the resulting template was directly sequenced by using the amplifying primers. The WN1751/WN2504A PCR product derived from WNV isolate Galveston County, TX-3 was cloned into pGEM-T Easy (Promega Corporation, Madison, WI), and 10 clones were sequenced to determine the degree of nucleotide sequence divergence within a single isolate collected from the southeast coast of Texas. Sequencing reactions were performed in the UTMB Biomolecular Resource Facility's DNA sequencing laboratory by previously described methods (13). Analysis and assembly of sequencing data were performed by using the Vector NTI Suite software package (Informax, Frederick, MD). Nucleotide and deduced amino acid sequences of the entire prM-E genes from each isolate were aligned by using the AlignX program in the Vector NTI Suite and compared with previously published sequences of isolates from southeast Texas collected from June to August of 2002 (13).
All isolates were then compared with isolates collected in the northeastern United States during 1999, 2000, and 2001. A phylogenetic tree was constructed with the maximum parsimony algorithm in PAUP (Version 4.0b10) (Sinauer Associates, Sunderland, MA) to show the genetic relationships of these isolates with other North American WNV isolates in GenBank for which the homologous 2,004-nucleotide region had been sequenced.
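The RT and PCR recipes given above can be checked arithmetically. The following is a minimal sketch in Python, not part of the original protocol: the component volumes are transcribed from the text, while the dictionary keys and the helper names `total_volume` and `final_strength` are illustrative assumptions. It confirms that each mix sums to its stated reaction volume and that both buffers are diluted to 1X.

```python
# Component volumes in microlitres (uL), transcribed from the methods text.
# Dictionary keys and helper names are illustrative, not from the source.
RT_MIX = {
    "viral RNA": 5.0,
    "random hexamer primer": 1.0,
    "5X RT buffer": 10.0,
    "10 mM dNTPs": 4.0,
    "RNase inhibitor": 0.4,
    "MMLV reverse transcriptase": 0.5,
    "HPLC water": 29.1,
}

PCR_MIX = {
    "cDNA template": 2.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "10X PCR buffer": 2.5,
    "10 mM dNTPs": 0.5,
    "Taq, 1 U/uL": 0.5,
    "HPLC water": 17.5,
}


def total_volume(mix):
    """Sum of all component volumes, rounded to 0.1 uL."""
    return round(sum(mix.values()), 1)


def final_strength(stock_strength, component_volume, reaction_volume):
    """Strength of a stock after dilution into the full reaction (C1*V1 = C2*V2)."""
    return stock_strength * component_volume / reaction_volume


print(total_volume(RT_MIX))    # stated RT reaction volume: 50 uL
print(total_volume(PCR_MIX))   # stated PCR reaction volume: 25 uL
print(final_strength(5, RT_MIX["5X RT buffer"], 50.0))     # RT buffer, in X
print(final_strength(10, PCR_MIX["10X PCR buffer"], 25.0))  # PCR buffer, in X
```

The same dilution helper applies to the dNTP stock, although the text does not say whether 10 mM refers to each nucleotide or to the total, so the sketch leaves that interpretation open.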

Results
Nucleotide sequences representing a 2,004-nucleotide region of the complete prM-E genes of WNV (nucleotides 466-2,469) of the 18 isolates collected in 2001 and 2002 (GenBank AY428514-AY428531), plus 4 southeast Texas strains (13), were compared with a homologous sequence region of the prototype WNV, WN-NY99 (Tables 1 and 2). Nucleotide mutations occurred at 33 positions (9 in prM, 24 in E) with a total of 7 amino acid substitutions (2 in prM, 5 in E). The maximum nucleotide divergence of the 22 isolates from WN-NY99 was 0.35%, with an average nucleotide divergence of 0.18%.
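The divergence figures above follow from simple counts over the sequenced region. The sketch below, in Python, is an illustration rather than the authors' procedure (the alignments were made in Vector NTI). The region coordinates and the count of seven mutations in the most divergent isolates come from the text; the function names and the short toy sequences are hypothetical.

```python
# Coordinates of the prM-E region on the WN-NY99 genome, as given in the text.
REGION_START = 466
REGION_END = 2469


def region_length(start, end):
    """Length of a region given inclusive 1-based coordinates."""
    return end - start + 1


def count_differences(reference, query):
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(reference, query) if a != b)


def percent_divergence(n_differences, length):
    """Nucleotide divergence as a percentage of the compared length."""
    return 100.0 * n_differences / length


length = region_length(REGION_START, REGION_END)
print(length)                                   # 2004
print(round(percent_divergence(7, length), 2))  # 0.35

# Toy example only: these short strings are not real WNV sequence.
toy_ref = "AUGGCUACCGUA"
toy_query = "AUGGCCACCGUG"
print(count_differences(toy_ref, toy_query))    # 2
```

By the same arithmetic, the reported mean of 0.18% corresponds to roughly 3.6 nucleotide differences per isolate over the 2,004-nucleotide region.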
Several of the nucleotide mutations identified in this study were shared by many isolates (Tables 1 and 2; Figure 2). Two nucleotide mutations at residues 1,442 (conservative amino acid substitution of Val to Ala at position E159) and 2,466 were shared by 14 of the 22 isolates, with 10 of these 14 isolates sharing an additional noncoding nucleotide mutation at residue 660. Five different nucleotide mutations (at residues 969, 1,192 [amino acid substitution of Thr to Ala at position E76], 1,356, 2,154, and 2,400) were shared by seven isolates, all of which were collected from coastal regions of southeast Texas. The isolate from Louisiana differed from WN-NY99 at only one nucleotide (residue 807) over the region studied and did not share any nucleotide mutations with other isolates from this study. In comparison, all other nucleotide mutations identified in this study were not shared by nucleotide sequences reported previously from isolates collected in the northeastern United States during 1999, 2000, or 2001 (9-12). Because these mutations were unique to isolates sequenced during this study, our results did not show a closer genetic relationship to isolates from 2001, 2000, or 1999. However, the two isolates in this study that were collected in 2001 (Alabama-1; Alabama-2) did share two nucleotide mutations (residues 1,442 and 2,466) with 12 of the other isolates collected in 2002. Construction of a phylogenetic tree by maximum parsimony analysis (Figure 3) illustrates the genetic proximity of isolates from this study.
To determine whether nucleotide mutations that define the southeast coastal Texas variant were uniform throughout the quasispecies population of a select isolate, the WN1751/WN2504A PCR product derived from WNV isolate Galveston Co., TX-3, was cloned into pGEM-T Easy. Ten clones were sequenced to obtain homologous regions of 700 nucleotides, which were then compared with the Galveston Co., TX-3, consensus sequence.
This region contained the U to C mutation at nucleotide 2,154 and the U to C mutation at nucleotide 2,400. Five of the 10 clones were identical to the consensus sequence, while the other five clones each had one or two nucleotide changes from the consensus sequence, for a total of eight nucleotide changes (Table 3).
Although our studies have compared a larger portion of the genome than earlier studies of partial nucleotide sequences, we have identified individual isolates with as many as seven nucleotide mutations and three amino acid substitutions, with a maximum divergence of 0.35% from the homologous region of the prototype North American WNV, WN-NY99. The nucleotide mutations identified in this study were not shared by previously sequenced isolates from 1999, 2000, or 2001 (9-12) and represent new nucleotide changes in the North American WNV population. Since these changes were not shared with other previously reported WNV sequences, the isolates analyzed in this study did not show a greater genetic similarity with northeastern isolates from 1999, 2000, or 2001. However, several of these nucleotide changes (660, 969, 1,356, 2,154, 2,400, and 2,466) are observed in other Old World WNV strains from both lineage I and lineage II (Table 4). Each of these changes represents a noncoding mutation, either C to U or U to C, in the third codon position within the open reading frame; nucleotides at these positions may revert to the nucleotides observed in the more ancestral Old World strains.
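The placement of the six silent changes within their codons can be checked from the genome coordinates alone. The sketch below, in Python, rests on one assumption that the text implies but does not state outright: because the sequenced region (nucleotides 466-2,469) encodes the entire prM and E proteins, nucleotide 466 is taken to be the first position of a codon. The function name `codon_position` is illustrative.

```python
# Assumption: nucleotide 466, the first base of the prM-E region described in
# the text, is the first position of a codon in the open reading frame.
FRAME_START = 466


def codon_position(nucleotide, frame_start=FRAME_START):
    """Return 1, 2, or 3: the position of a genome nucleotide within its codon."""
    return (nucleotide - frame_start) % 3 + 1


# Silent changes that the text reports as shared with Old World strains.
silent_sites = [660, 969, 1356, 2154, 2400, 2466]

# Changes that the text reports as amino acid substitutions.
coding_sites = {1442: "E159 Val to Ala", 1192: "E76 Thr to Ala"}

for site in silent_sites:
    print(site, codon_position(site))

for site, change in coding_sites.items():
    print(site, change, codon_position(site))
```

Under this assumption, all six silent sites fall at the third codon position, while the two coding changes fall at the first (1,192) and second (1,442) positions. That pattern matches the standard genetic code, in which Thr (ACN) to Ala (GCN) is a first-position change and Val (GUN) to Ala (GCN) is a second-position change.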
Our results also suggest the geographic clustering of genetically distinct variants. Seven of the 22 isolates, all of which were collected from coastal regions of southeast Texas, share five nucleotide mutations unique to these isolates. To date, little genetic evidence supports or refutes the hypothesis that WNV becomes established in an enzootic transmission cycle in a particular geographic area rather than being reintroduced into that area each year when the transmission season begins. Similarly, because of the limited published data detailing the year-to-year genetic changes observed in WNV, whether the virus is becoming endemic in particular regions of the United States remains to be established. This question will be answered in part by determining baseline phylogenetic results of specific variants in a geographic area and by analyzing isolates collected in sequential transmission seasons.
Although the isolates analyzed in this study do not represent the entire temporal and geographic distribution of WNV in North America, at least some nucleotide mutations have been conserved among WNV strains circulating across the continent. If indeed the conservation of these mutations is the result of selective pressure, such as the continued capacity to replicate in both arthropod and vertebrate hosts, rather than random mutations occurring as a consequence of genetic drift, one would expect these mutations to be conserved in virus isolates collected in other regions of North America. Further investigation concerning the genetic composition of viruses from additional regions of North America will define the extent to which dominant variants have emerged. If dominant variants do continue to emerge across the United States, phylogenetic analyses will help researchers monitor the spread of WNV in North America and may provide explanations for the rapid and widespread movement of this newly emerging virus in North America. Similarly, identifying the genetic composition of WNV isolates from other regions of the United States and Canada, as well as comparing these isolates with isolates collected in 2003, will continue to define evolutionary relationships of WNV circulating in North America and facilitate predictions concerning the primary mechanisms of transmission and spread of the virus.