Genetic Determinants of Virulence in Pathogenic Lineage 2 West Nile Virus Strains

The most likely determinants are mutations in the nonstructural proteins involved in viral replication and protein cleavage.

We determined complete genome sequences of lineage 2 West Nile virus (WNV) strains isolated from patients in South Africa who had mild or severe WNV infections. These strains had previously been shown to produce either highly or less neuroinvasive infection in mice and to induce host genes similar to those induced by corresponding highly or less neuroinvasive lineage 1 strains. Phylogenetic and amino acid comparison of highly and less neuroinvasive lineage 2 strains showed that the nonstructural genes, especially the nonstructural protein 5 gene, were the most variable. All South African lineage 2 strains possessed the envelope-protein glycosylation site previously postulated to be associated with virulence. Major deletions were present in the 3′ noncoding region of 2 lineage 2 strains previously shown to be less neuroinvasive or non-neuroinvasive relative to the highly neuroinvasive strains sequenced in this study.

West Nile virus (WNV) is endemic to Africa, Asia, Europe, and Australia and was introduced into the Western Hemisphere in 1999. In the Northern Hemisphere, an apparent increase in human case-fatality rates, neurologic infections, and horse and bird deaths due to WNV has raised the question of whether WNV strains with increased pathogenicity have emerged in the Northern Hemisphere, or whether the virulence of the virus and the severity of the disease are underestimated in South Africa.
Two major phylogenetic lineages of WNV have been demonstrated: lineage 1 includes viruses from North Africa, Europe, Asia, the Americas, and Australia (Kunjin virus); lineage 2 consists exclusively of viruses from southern Africa and Madagascar. The increase in illness and death from WNV lineage 1 strains relative to lineage 2 strains led to the supposition that lineage 1 strains are highly pathogenic while lineage 2 strains endemic to Africa are of low virulence (1,2). However, it was subsequently demonstrated in South Africa that lineage 2 strains may also cause severe disease (3). Furthermore, experiments using mice demonstrated marked differences in neuroinvasive phenotype that did not correlate with lineage, which suggests that highly and less neuroinvasive phenotypes exist in both lineages (2,4). Host gene expression studies indicated that similar genes are induced by highly neuroinvasive lineage 1 and 2 strains (4). Therefore, the perceived virulence of WNV in recent epidemics probably reflects high medical alertness, active surveillance programs, and the emergence and reemergence of existing strains of WNV in locations with immunologically naive populations (3). Recently, a lineage 2 strain was isolated from a goshawk fledgling that died of encephalitis in Hungary, which suggests that lineage 2 strains may also be spread by migratory birds outside of Africa (5).
Lineage 1 viruses that have phenotypes of reduced virulence in mice and inefficient growth in culture have been identified in Mexico. Mutations leading to loss of envelope (E) protein glycosylation, together with mutations in the nonstructural (NS) protein genes, may be associated with attenuation of these viruses (6). Comparisons between the prototype Uganda strain (B956) and a variant of this strain obtained by molecular mutation (B956D117B3) showed changes in the E and NS genes, which resulted in reduced virulence in mice (7). These attenuations could not, however, be correlated with clinical disease in humans because these strains were either isolated from birds or modified in culture.
The NS4B protein, which is predicted to be involved in viral replication and in evasion of host innate immune defenses (8), may play an important role in determining the virulence phenotype (6,8-10). Substitution of serine for cysteine at position 102 (Cys102Ser) produced a temperature-sensitive phenotype at 41°C and attenuated the neuroinvasive and neurovirulent phenotypes in mice (8). An adaptive mutation (E249G) in the NS4B gene resulted in reduced RNA synthesis in host cells (9). An in vitro study compared infectious clones of the NY99 strain, which is highly virulent in American crows, with a Kenya strain (KEN-3829), which is less virulent for American crows. After 72 h at 44°C, viral RNA production by the KEN-3829 strain was reduced 6,500-fold, compared with a 17-fold reduction for the NY99 strain. This finding suggested that efficient replication at high temperatures, such as those that occur in American crows, could be an important virulence factor that determines the pathogenic phenotype of the NY99 strain (10).
To further investigate the molecular determinants of virulence of lineage 2 WNV strains, we sequenced the genomes of highly and less neuroinvasive lineage 2 strains that were isolated from patients in South Africa and that had previously been characterized with respect to gene expression and pathogenicity (4). These complete genome sequences of highly neuroinvasive lineage 2 WNV strains enable comprehensive comparison with highly and less neuroinvasive lineage 1 strains.

Virus Strains
South African WNV isolates SPU116/89, SA93/01, SA381/00, and H442 were obtained from the Special Pathogens Unit, National Institute for Communicable Diseases, South Africa, as freeze-dried mouse brain passages 2-4. They were propagated by a single passage in Vero cells for this study.

RNA Amplification
Viral RNA was extracted from cell culture supernatant with the QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. For cDNA synthesis, 10 μL of RNA and 0.4 μg of random hexanucleotides (Roche Diagnostics, Mannheim, Germany) were incubated at 65°C for 10 min and then cooled on ice. Next, 1× Expand Reverse Transcriptase buffer, 100 mmol/L dithiothreitol, 200 μmol of each deoxynucleotide triphosphate, 20 U RNase inhibitor, and 50 U Expand Reverse Transcriptase (Roche Diagnostics) were added, and the mixture was incubated at 30°C for 10 min, followed by 1 h at 43°C. For PCR amplification, 10 μL of the cDNA reaction was added to a PCR master mix containing 3.75 U of Expand High Fidelity Polymerase and 30 pmol of each specific primer (primer sequences available on request) and cycled as follows: 94°C for 2 min; 35 cycles of 94°C for 15 s, the primer-specific annealing temperature for 30 s, and 72°C for 2 min; and a final extension at 72°C for 7 min. Expand Long Template PCR Polymerase (Roche Diagnostics) was used for products >2 kb with 300 μmol of each dNTP, 1× buffer, and 30 pmol of each specific primer, cycled at 94°C for 2 min; 10 cycles of 94°C for 10 s, 50°C for 30 s, and 68°C for 3 min; 30 cycles of 94°C for 15 s, 50°C for 30 s, and 68°C for 5 min plus 5 s per cycle; and a final extension at 72°C for 7 min.
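The long-template cycling program is easiest to check when it is written out cycle by cycle. The Python sketch below encodes it as data. It assumes, since the text does not say so explicitly, that the 68°C extension lasts 5 min in the first of the 30 later cycles and lengthens by 5 s in each subsequent cycle; ramp times are ignored, and the names `Step`, `long_template_program`, and `phase2_extension_times` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float  # hold temperature, degrees Celsius
    seconds: int   # hold time in seconds


def phase2_extension_times(cycles: int = 30, base_s: int = 300,
                           increment_s: int = 5) -> list[int]:
    """Assumed 68 C extension per phase-2 cycle: 5 min, then +5 s per later cycle."""
    return [base_s + increment_s * i for i in range(cycles)]


def long_template_program() -> list[list[Step]]:
    """One inner list per block: initial denaturation, each cycle, final extension."""
    program: list[list[Step]] = [[Step(94, 120)]]        # 94 C, 2 min
    for _ in range(10):                                  # phase 1: 10 cycles
        program.append([Step(94, 10), Step(50, 30), Step(68, 180)])
    for ext in phase2_extension_times():                 # phase 2: 30 cycles
        program.append([Step(94, 15), Step(50, 30), Step(68, ext)])
    program.append([Step(72, 420)])                      # 72 C, 7 min
    return program


if __name__ == "__main__":
    program = long_template_program()
    total_s = sum(step.seconds for block in program for step in block)
    ext = phase2_extension_times()
    print(f"phase-2 extension: {ext[0]} s to {ext[-1]} s")  # 300 s to 445 s
    print(f"total hold time: {total_s / 60:.1f} min")       # 254.4 min
```

Under this reading the last phase-2 cycle extends for 445 s; if the increment instead starts in the first phase-2 cycle, `phase2_extension_times` would need a base of 305 s.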

DNA Sequencing
PCR products were purified with the Wizard SV gel and PCR clean-up system (Promega, Southampton, UK). DNA cycle sequencing was performed with the BigDye Terminator V3.1 kit and analyzed on an ABI PRISM 3100/3130 genetic analyzer (both from Applied Biosystems, Foster City, CA, USA).

Sequence Analysis
Genome editing and assembly were performed with Vector NTI 9.1.0 (Invitrogen, Carlsbad, CA, USA); multiple sequence alignment, with ClustalW (11); and amino acid analysis, with GeneDoc for Windows (12). Amino acid changes considered to have a potential effect on protein secondary structure included substitutions between hydrophilic and hydrophobic amino acids (in either direction) and substitutions involving cysteine, glycine, or proline residues (12).
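The screening rule above can be applied mechanically to substitutions written in the article's notation (for example, Ala79Thr). The Python sketch below does this under a stated assumption: "hydrophobic" is taken to mean residues with a positive Kyte-Doolittle hydropathy index (I, V, L, F, C, M, A), because the text does not name the scale used. The function names are hypothetical.

```python
from __future__ import annotations

import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# ASSUMPTION: positive Kyte-Doolittle hydropathy = hydrophobic.
HYDROPHOBIC = frozenset("IVLFCMA")
# Any gain or loss of these residues is flagged, per the criteria in the text.
SPECIAL = frozenset("CGP")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Parse notation such as 'Ala79Thr' into ('A', 79, 'T')."""
    m = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", code)
    if m is None:
        raise ValueError(f"unrecognized substitution: {code!r}")
    ref, pos, alt = m.groups()
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


def potentially_structural(code: str) -> bool:
    """True if the change switches polarity class or involves Cys, Gly, or Pro."""
    ref, _, alt = parse_substitution(code)
    if ref == alt:
        return False
    polarity_switch = (ref in HYDROPHOBIC) != (alt in HYDROPHOBIC)
    return polarity_switch or ref in SPECIAL or alt in SPECIAL


for sub in ["Ala79Thr", "Ser160Ala", "Arg298Gly", "Thr70Pro", "Ala105Val"]:
    print(sub, potentially_structural(sub))
```

With this scale, the NS3, NS4B, and E substitutions that the text discusses as potentially structural are flagged, while Ala105Val (hydrophobic to hydrophobic) is not. A different hydrophobicity scale could classify borderline residues such as Gly, Tyr, or Trp differently.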
Amino acid comparisons are relative to SA381/00, and residue numbering refers to positions in the SA381/00 sequence. Neighbor-joining trees were drawn with MEGA version 3.1 (13) by using the Kimura 2-parameter distance model and bootstrap analysis with 1,000 replicates. Nucleotide and amino acid p-distances (the number of pairwise nucleotide or amino acid differences divided by the total number of nucleotides or amino acids in the sequenced region) were calculated with MEGA version 3.1. Signalase cleavage prediction scores were calculated with AnalyzeSignalase 2.03 (14).
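The two distance measures named above can be computed directly from a pair of aligned sequences: the p-distance is the proportion of differing sites, and the Kimura 2-parameter distance is d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences. The Python sketch below is an illustration, not MEGA's implementation; it assumes that sites with a gap or an ambiguous base in either sequence are skipped (pairwise deletion), which may differ from the gap option actually used.

```python
from __future__ import annotations

import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def comparable_sites(seq1: str, seq2: str) -> list[tuple[str, str]]:
    """Aligned site pairs where both sequences carry an unambiguous base."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
            if a in BASES and b in BASES]


def is_transition(a: str, b: str) -> bool:
    """A<->G or C<->T change."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def p_distance(seq1: str, seq2: str) -> float:
    sites = comparable_sites(seq1, seq2)
    return sum(a != b for a, b in sites) / len(sites)


def k2p_distance(seq1: str, seq2: str) -> float:
    sites = comparable_sites(seq1, seq2)
    n = len(sites)
    p = sum(is_transition(a, b) for a, b in sites) / n                  # transitions
    q = sum(a != b and not is_transition(a, b) for a, b in sites) / n  # transversions
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    s1 = "ACGTACGTAC"
    s2 = "ACGTACGTAT"  # one C->T transition in 10 sites
    print(p_distance(s1, s2))                     # 0.1
    print(f"{1 - p_distance(s1, s2):.1%} similarity")
```

Similarity is 1 minus the p-distance, so an overall p-distance of 0.0278 corresponds to 97.2% similarity.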

Strain Characteristics
Four lineage 2 WNV strains isolated from patients in South Africa who had mild or severe WNV infections were selected for genome sequencing. Phenotypic pathogenicity data for these strains (H442, SPU116/89, SA93/01, SA381/00) in humans and mice are summarized in Table 1.
Detailed clinical data for all 4 strains have been described by Burt et al. (3), and mouse neuroinvasiveness experiments and gene expression data for H442, SPU116/89, and SA381/00 have been described by Venter et al. (4). Strain SA93/01 has been shown to be highly neuroinvasive in a mouse model (M. Venter, unpub. data), similar to strains SPU116/89 and H442, whereas SA381/00 has been classified as having a low neuroinvasive phenotype in mice. H442 and SA381/00 caused fever, rash, myalgia, and arthralgia in human patients; SA93/01 caused nonfatal encephalitis in 2, and SPU116/89 caused fatal hepatitis (3).
The 4 South African strains were compared with strains that were known to be highly or less neuroinvasive in mice or that had been reported to be highly pathogenic or attenuated. Lineage 2 strains for which both full genome sequences and neurovirulence data in mice were available included isolate B956D117B3 (21) and Madagascar strain AnMg798. B956D117B3 is a passaged clone, of reduced virulence (7), derived from the prototype strain B956, which was originally associated with fever in a patient and was neurotropic in mice (22); AnMg798 is not neuroinvasive (2). Lineage 1 strains included the highly pathogenic and neuroinvasive NY385-99 strain (2), the attenuated non-neuroinvasive strain TM171-03 isolated in Mexico in 2003 (19), hamster-passaged attenuated clones of NY-385-99 (clone TYP-9376 and clone 9317B) (18), and the non-neuroinvasive Kunjin virus strain MRM61C (Table 1).

Phylogenetic Analysis
Phylogenetic analysis confirmed that the South African strains described here belong to lineage 2 (Figure 1). SA93/01 and SPU116/89 clustered together, whereas H442 and SA381/00 were on separate branches within lineage 2, whether full genome sequences or individual E, NS3, and NS5 genes were analyzed (data not shown). Although the Indian strain clustered with lineage 1, p-distance analysis showed that it was as distant from the lineage 1 strains (20% differences) as from the lineage 2 strains (21%-22%), compared with <5% differences within lineage 1C and 12% differences between lineages 1A and 1B. It was therefore designated lineage 5, as suggested by Bondre et al. (23).

Genome Sequences and Distance Analysis
The complete genome sequences of strains H442, SPU116/89, SA381/00, and SA93/01 were deposited in GenBank (accession nos. EF429197-200). The termini were amplified with primers designed from other lineage 2 full genome sequences. Assuming that the 5′ and 3′ termini are identical in length to those of other published strains, these genomes were 11,052 nt (SPU116/89, SA381/00, and SA93/01) and 11,051 nt (H442) long. The South African strains had overall nucleotide p-distances of 0.0278 (97.2% similarity) to each other (Table 2) and <1% amino acid differences over the complete genome, despite having been isolated as many as 50 years apart. Among the individual proteins of the lineage 2 strains, the highest percentages of amino acid differences were in the NS proteins, especially NS5. Table 3 shows the differences between individual proteins of the South African lineage 2 strains. The 2 strains from North America, New York (NY-385-99) and Mexico (TM171-03), were similarly conserved. In contrast, the Madagascar strain, AnMg798, differed by >3% at the amino acid level from all lineage 2 strains from South Africa and from the Uganda strain B956D117B3, and lineage 1 and 2 strains differed from each other by >6% at the amino acid level.

Amino Acid Differences between Highly and Less Neuroinvasive Strains
Few amino acid differences were observed between the structural proteins of the South African strains (Figure 2). SA381/00 had only 1 difference in the premembrane (prM) protein, at position 105 (Ala105Val), relative to the highly neuroinvasive strains (Figure 2). Two differences (Ala54Gly and Thr70Pro) could result in structural changes in the E protein of H442, which was isolated 50 years earlier than strains SPU116/89, SA93/01, and SA381/00. The attenuated lineage 2 strain B956D117B3 and the non-neuroinvasive Madagascar strain AnMg798 contained differences in the glycosylation site of the E protein relative to the South African strains (residues 154-157 deleted in B956D117B3, and Ser156Pro in AnMg798). Either of these changes would prevent glycosylation. Further substitutions between hydrophilic amino acids and proline or glycine residues, with potential structural implications, were found in AnMg798 at positions 156, 199, and 230.
The NS3, NS4A/B, and NS5 proteins were the most variable viral proteins. In strain SA381/00, the least virulent of the 4 strains, a hydrophilic-to-hydrophobic substitution (Ser160Ala) and an Arg298Gly substitution could alter the structure of the NS3 protein. The highly pathogenic strain SPU116/89 had a hydrophobic-to-hydrophilic substitution (Ala79Thr) in NS4B relative to the other strains. Other amino acid changes with potential structural implications were found in strain B956D117B3 at positions 18 and 145 of the NS4A gene and position 14 of the NS4B gene, and in strain AnMg798 at positions 14 and 27 of the NS4B gene (Figure 2).
The NS5 protein was the most variable. At several positions, the South African strains associated with mild infections (SA381/00 and H442) and the 2 other lineage 2 strains associated with reduced virulence in mice (AnMg798 and B956D117B3) shared the same amino acids, which differed from those of the strains that caused severe disease (SPU116/89 and SA93/01). These included hydrophilic versus hydrophobic amino acids at position 614, and hydrophobic (mild) versus hydrophilic (pathogenic) amino acids at positions 625 and 626 of the NS5 protein. SPU116/89, isolated from a patient with necrotic hepatitis, had amino acid changes that affect the hydrophobicity of the NS5 protein relative to all other strains at positions 197, 623, 635, 641, and 643.
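The comparison described above, finding positions where every mild-phenotype strain shares one residue and every severe-disease strain shares a different one, amounts to a column scan over an alignment. The Python sketch below shows that scan; the sequences and strain labels in the example are hypothetical toy placeholders, not the real NS5 alignment, and the reported positions are alignment columns that would still need mapping to SA381/00 numbering if the alignment contained gaps.

```python
from __future__ import annotations


def discriminating_positions(mild: dict[str, str],
                             severe: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (1-based column, mild residue, severe residue) for every column where
    all mild strains share one residue and all severe strains share another."""
    seqs = list(mild.values()) + list(severe.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        mild_res = {s[i] for s in mild.values()}
        severe_res = {s[i] for s in severe.values()}
        # Both groups must be internally uniform and differ from each other.
        if len(mild_res) == 1 and len(severe_res) == 1 and mild_res != severe_res:
            hits.append((i + 1, mild_res.pop(), severe_res.pop()))
    return hits


# Hypothetical toy alignment (not real WNV sequence data).
mild = {"mild_1": "MKLVT", "mild_2": "MKLVS"}
severe = {"severe_1": "MKSVT", "severe_2": "MKSVS"}
print(discriminating_positions(mild, severe))  # [(3, 'L', 'S')]
```

Column 5 is not reported because the mild group itself is not uniform there. Each hit could then be screened for a change in hydrophobicity.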

Noncoding Regions
Approximately 98.6% identity existed between the 5′ noncoding regions of SA381/00 and the 3 remaining South African strains, which were 100% identical to each other. For the 3′ noncoding regions, the overall identity was 98.5% (99% between SA381/00 and H442 and 98% between SPU116/89 and SA381/00). Notable nucleotide differences in the 3′ noncoding regions were a 2-nt deletion at nt 10439 and nt 10440 in strain H442 and a 76-bp deletion from nt 10404 through nt 10479 in the attenuated strain B956D117B3, which was not present in the prototype strain (B956) or in any of the South African strains. Strain AnMg798 had deletions overlapping those of strain B956D117B3 at positions 10411 to 10487 and from 10501 to 10512 and 10951 (Figure 2, panel B). The sequence of the AnMg798 strain is incomplete in GenBank and ends at position 10866 (16).
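Because the deletion coordinates are inclusive nucleotide ranges, their lengths and overlap follow from simple interval arithmetic. The Python sketch below uses the coordinates given in the text for B956D117B3 (10404-10479) and the first AnMg798 deletion (10411-10487), assuming both are expressed in the same SA381/00-based numbering; the helper names are illustrative.

```python
from __future__ import annotations


def span_length(start: int, end: int) -> int:
    """Length of an inclusive nucleotide interval."""
    return end - start + 1


def overlap(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int] | None:
    """Inclusive overlap of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


b956d117b3 = (10404, 10479)
anmg798 = (10411, 10487)
print(span_length(*b956d117b3))       # 76
print(span_length(*anmg798))          # 77
shared = overlap(b956d117b3, anmg798)
print(shared, span_length(*shared))   # (10411, 10479) 69
```

The check confirms that the 10404-10479 range corresponds to the stated 76-bp deletion and that the two deletions share 69 nt.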

Envelope-Protein Glycosylation Motif
The E protein glycosylation motif previously identified in lineage 1 at positions 154-156 (NYS) (6) was present in all 4 South African strains. However, as a result of a proline substitution at position 156, the site was not predicted to be glycosylated in strain AnMg798. The glycosylation motif is completely deleted in strains B956 and B956D117B3 (Table 1).
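The NYS motif is an instance of the general N-linked glycosylation sequon N-X-S/T, in which X is any residue except proline. A minimal Python scan for that sequon is sketched below. In the example, only the NYS motif at E-protein positions 154-156 and the Ser156Pro change come from the text; the flanking residues are hypothetical. A sequon marks only a potential site, because actual glycosylation also depends on protein context.

```python
from __future__ import annotations

import re

# Lookahead so that overlapping sequons (e.g., "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sequons(protein: str, offset: int = 1) -> list[tuple[int, str]]:
    """Return (position of N, motif) for each N-X-S/T sequon with X != P.

    `offset` is the residue number of the first character of `protein`.
    """
    return [(m.start() + offset, protein[m.start():m.start() + 3])
            for m in SEQUON.finditer(protein)]


# Hypothetical fragments starting at E-protein residue 152.
print(glycosylation_sequons("GHNYSK", offset=152))  # [(154, 'NYS')]
print(glycosylation_sequons("GHNYPK", offset=152))  # [] (Ser156Pro removes S/T)
```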

Cleavage Sites
Signalase prediction algorithms were used to analyze the signal peptidase cleavage sites (14); no differences in predicted cleavage efficiency were found between the highly and less pathogenic strains (Table 4). The only notable difference, indicated by the Student t test probability calculated in Table 4, was in the capsid (C)-prM cleavage region, where lineage 2 strains were predicted to be cleaved more efficiently than lineage 1 strains. Only slight differences were apparent at the prM-E site, and no differences between lineage 1 and 2 strains were apparent in any other cleavage region.
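Table 4 reports Student t test probabilities for the predicted cleavage scores. As an illustration only, the Python sketch below computes the classic two-sample Student t statistic with pooled variance and its degrees of freedom. The input scores are hypothetical rather than the Table 4 values, and the text does not state which t test variant was used. Converting t to a probability requires the t distribution, for example via `scipy.stats.ttest_ind`, whose default assumes equal variances.

```python
from __future__ import annotations

import math
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least 2 values")
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


if __name__ == "__main__":
    lineage2_scores = [1.0, 2.0, 3.0]  # hypothetical cleavage scores
    lineage1_scores = [4.0, 5.0, 6.0]  # hypothetical cleavage scores
    t, df = student_t(lineage2_scores, lineage1_scores)
    print(f"t = {t:.3f}, df = {df}")   # t = -3.674, df = 4
```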

Discussion
Phylogenetic and p-distance analyses suggested that relationships between WNV strains were influenced by geographic rather than temporal factors (Figure 1; Tables 2, 3). Four South African strains isolated over 50 years differed from each other by an average of only 3% of nucleotides but differed from the AnMg798 (Madagascar) strain by 21%.
The WNV genome consists of a 5′ noncoding region, a single open reading frame encoding 3 viral structural proteins (C, M, and E) and 7 NS proteins, and a 3′ noncoding region. The E and membrane (M) proteins are associated with host range, tissue tropism, replication, assembly, and the stimulation of B- and T-cell immune responses; replication functions are associated with the NS proteins, which may also modulate responses to viral infection (6). The E protein is the viral hemagglutinin that mediates virus-host cell binding and elicits most of the virus-neutralizing antibodies and the serotype specificity of the virus (1,24,25).
226 Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 14, No. 2, February 2008

In this study, differences between highly and less neuroinvasive lineage 2 strains were identified in the noncoding regions, which may affect enzyme binding sites and replication efficiency (Figure 2, panel B). It has been postulated that the 3′ stem-loop structure may function as a translation suppressor (26) and that nucleotide sequence variation in the 3′ noncoding region of different dengue virus strains may have evolved as a function of transmission or replication ability in different mosquito and nonhuman primate/human host cycles (27). A 76-bp deletion in the 3′ noncoding region is present in strains B956D117B3 and AnMg798 relative to the South African strains. This deletion is not present in the original neurotropic mouse brain isolate of the B956 Uganda strain, which has recently been resequenced (7). Strain B956D117B3, a descendant of the original B956 isolate, has been shown to be less virulent than the original B956 strain. The absence of this deletion in all of the neuroinvasive lineage 2 strains isolated from clinical cases warrants further investigation of the role of this region in the pathogenicity of WNV.

The genetic stability observed in the surface E and M proteins of lineage 2 strains suggests an absence of immune-driven selection. Only the H442 strain, isolated 50 years before the other strains, had 2 substitutions in the E gene with potential structural implications (Figure 2). The absence of a putative glycosylation site at positions 154-156 of the E protein (NYS) has previously been associated with reduced virulence in mice (19). This glycosylation motif was present in all the South African strains, including the less neuroinvasive strain SA381/00. However, B956D117B3, the attenuated derivative of the prototype lineage 2 strain, and the non-neuroinvasive lineage 1 and 2 strains MRM61C and AnMg798 were not predicted to be glycosylated. This finding further emphasizes that glycosylation of the E protein is not the only factor determining virulence.
Most substitutions were found in the NS proteins, in particular NS3, NS4A/B, and NS5. The NS3 protein is part of the protease complex, which is important for cleavage of the polyprotein and may affect virulence; it has been suggested that less efficient cleavage delays virus assembly and release, enabling the host immune system to clear the infection (28). The NS3 protein of the less neuroinvasive strain, SA381/00, had substitutions that change hydrophobicity, which could lead to structural changes that affect function and, by implication, virulence. The highly neuroinvasive strain SPU116/89 had a mutation (Ala79Thr) that may alter the hydrophobicity of the NS4B protein relative to the other strains, with potential structural and functional implications for the viral replicase complex, of which NS4B is a component (25).
Most amino acid differences occurred in the NS5 protein, which is associated with cytoplasmic RNA replication because it contains RNA-dependent RNA polymerase, S-adenosylmethionine methyltransferase, and importin β-binding motifs (28). Deletions in the NS5 protein abolish replication (29), which suggests that amino acid substitutions may affect replication efficiency and, hence, virulence. Temperature-sensitive strains with reduced virulence for mice, isolated in Texas, also contained mutations in the NS proteins (30). In addition, organ tropism of strains has been associated with mutations in the NS5, NS2, and E proteins (18). The 2 lineage 2 strains that caused mild disease in patients (H442 and SA381/00) had several substitutions between hydrophobic and hydrophilic amino acids in the NS5 protein relative to the other 2 strains. SPU116/89, isolated from a patient with necrotic hepatitis, had several amino acid changes that may affect the hydrophobicity of NS5 and result in structural and functional changes with implications for replication efficiency, tissue tropism, and pathogenicity.
Flavivirus polyproteins are cleaved either by a host signal peptidase or by a virus-encoded serine protease consisting of the NS3 protease and its NS2B cofactor (NS2B-NS3) (29). Proteolytic processing of the C-prM and NS4A/B junctions occurs efficiently only after upstream cleavage of the signal sequence by the cytoplasmic viral protease. The efficiency of signal peptidase cleavage at the NH2 termini of the prM and NS4 proteins is increased by coexpression of the viral NS2B-NS3 protease and the structural polyprotein region (31). Mutagenesis of the signal sequence of the yellow fever virus prM protein indicated that mutations that enhance cleavage by the signal peptidase almost totally suppress production of infectious virions (31). Signal peptidase cleavage of the prM protein results in the production of membrane-anchored forms of the C protein, which may be deleterious for replication if they are poor substrates for the viral protease. Signal peptidase-mediated cleavage at the NH2 terminus of the prM protein does not occur efficiently, whereas cleavage at the NH2 terminus of the E protein does. Inadequate prM protein production in turn impairs the formation and secretion of prM-E heterodimers, and constructs with inadequate prM production have shown a lack of immunogenicity in vaccination studies (32).
In the present study, all highly and less pathogenic lineage 2 strains, as well as the lineage 1 strains, were predicted to be cleaved with similar efficiency at most sites. At the C-prM site, lineage 2 strains are predicted to be cleaved slightly more efficiently than lineage 1 strains; at the prM-E site, the reverse is true. How these differences in cleavage efficiency affect pathogenicity is unclear and may warrant further investigation.
The high number of cases of neurologic infections in recent epidemics in the United States may be attributed to the rapid distribution of a single highly neuroinvasive strain in a highly susceptible population. The comparatively low number of WNV fever or neurologic cases reported in South Africa, despite the wide distribution of the virus and the presence of neuroinvasive strains, may reflect inadequate surveillance and a lack of medical awareness of the disease potential of arboviruses. Moreover, the importance of WNV in South Africa may be overshadowed by the presence and effect of other diseases such as HIV/AIDS. Nevertheless, the epidemic potential and effect that WNV may have on a large population of immunocompromised HIV-infected persons necessitates improved surveillance of arbovirus infections of persons in southern Africa.
In conclusion, these full genome sequences provide insight into the molecular factors that may differentiate pathogenic from mild lineage 2 WNV strains. Mutations in the NS proteins involved in viral replication and protein cleavage are the most likely determinants of differences in pathogenicity.