Genetic characterization of Japanese encephalitis virus genotype II strains isolated from 1951 to 1978

Japanese encephalitis virus (JEV), the prototype member of the JEV serocomplex, genus Flavivirus, family Flaviviridae, is the most significant arthropod-borne encephalitis worldwide in terms of morbidity and mortality. At least four genotypes (GI–GIV) of the virus have been identified; however, to date, the genomic nucleotide sequence of only one GII virus has been determined (FU strain, Australia, 1995). This study sequenced three additional GII strains of JEV isolated between 1951 and 1978 in Korea, Malaysia and Indonesia, respectively, and compared them with the FU strain, as well as with virus strains representing the other three genotypes. Based on nucleotide and amino acid composition, the genotype II strains were the most similar to GI strains; however, these two genotypes are epidemiologically distinct. Selection analyses revealed that the strains utilized in this study are under predominantly purifying selection, and evidence of positive selection was detected at aa 24 of the NS4B protein, a protein that functions as an alpha/beta interferon signalling inhibitor.


INTRODUCTION
Japanese encephalitis virus (JEV) is a member of the JEV serocomplex within the genus Flavivirus, family Flaviviridae. Worldwide, JEV is the most important arthropod-borne viral encephalitis, with 50 000 cases and 10 000 deaths reported yearly, although this may be a significant underestimate due to a lack of surveillance and underreporting (Barrett, 2008). Whilst most human infections are asymptomatic, approximately 25 % of symptomatic infections are fatal, 50 % develop severe, often long-term, neurological and/or psychiatric sequelae, and 25 % resolve fully (Barrett, 2008). The virus is transmitted in an enzootic cycle involving waterbirds, domestic pigs and rice paddy-breeding mosquitoes, principally Culex tritaeniorhynchus. Humans and other non-avian vertebrates are only infected incidentally and are considered to be dead-end hosts because they fail to produce viraemias of sufficient titres to infect mosquitoes. The prototype strain of JEV was isolated in Japan in 1935 (Lewis et al., 1947) and the virus has since been found throughout East and South-east Asia with the geographical borders of virus activity extending north to maritime Siberia (Grascenkov, 1964), west to Pakistan (Igarashi et al., 1994) and south-east to Australia (Hanna et al., 1996).
JEV virions are approximately 50 nm in diameter and contain a host-derived lipid envelope that surrounds the nucleocapsid. Enclosed within the nucleocapsid is one molecule of positive-sense ssRNA of approximately 11 kb. This molecule contains 59 and 39 untranslated regions (UTRs) and a single ORF that encodes a 3432 aa polyprotein, which is co-and post-translationally cleaved by viral and host proteases into three structural proteins: the capsid protein (C), the precursor (prM) of the membrane protein, and the envelope protein (E), as well as seven non-structural proteins (NS): NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5 (Chambers et al., 1990). At least four genotypes of JEV have been identified based on nucleotide sequence information derived from a 240 nt region spanning a portion of the C and prM genes, as well as E gene nucleotide sequence data. Genotype I (GI) comprises strains isolated in northern Thailand, Cambodia, Korea, China, Japan, Vietnam, Taiwan and Australia between 1967 and the present; it has been the subject of numerous genetic studies due to its recent geographical expansion and displacement of GIII of the virus (Nga et al., 2004;Nitatpattana et al., 2008). JEV GII includes strains that have been isolated sporadically between 1951 and 1999 in Korea, southern Thailand, Malaysia, Indonesia, Papua New Guinea and northern Australia. Historically, the south-eastern limit of JEV activity was considered to lie west of the Wallace line (Williams et al., 2000). However, in 1995, an outbreak due to GII of JEV and consisting of three human cases, two of which were fatal, occurred on Badu Island in Australia's Torres Strait. GIII has been the source of numerous Japanese encephalitis epidemics throughout history, and includes strains isolated in mostly temperate regions of Asia between 1935 and the present. Curiously, GIV includes strains isolated only from mosquitoes on three islands encompassing the Indonesian archipelago between 1980 and 1981 (Chen et al., 1990(Chen et al., , 1992. The database of wild-type JEV strains for which there is ORF nucleotide sequence information consists of relatively few isolates, and surprisingly only one of these strains belongs to GII (FU strain, Australia, 1995). The absence of genomic information on other GII strains has prevented us from obtaining a firm understanding of the genetic variation and phylogenetic relationships among strains of JEV; it also has precluded an investigation to determine whether selection and codon usage bias exist among strains of the virus. Therefore, we determined the nucleotide sequence of the ORF of three additional GII strains of JEV that were isolated from specimens collected in Korea, Indonesia and Malaysia between 1951 and 1978 and compared them with sequence data derived from the GII strain that was isolated in Australia in 1995, as well as with 28 JEV strains representative of the other three genotypes, which were isolated throughout East and South-east Asia between 1935 and 2007. In addition, selection and codon usage analyses were performed on the ORF nucleotide sequence dataset.

Phylogenetic relationships among the four genotypes of JEV and among GII strains of the virus
The nucleotide sequence of the ORF of the three additional strains of GII of JEV were determined [Bennett, (Korea, c.1951);WTP-70-22 (Malaysia, 1970);and JKT 654 (Indonesia, 1978)]. These were compared with 29 wildtype homologous JEV sequences and one sequence of Murray Valley encephalitis virus (MVEV) that were retrieved from GenBank for a final dataset comprising 33 virus strains, each of which was 10 308 nt (3436 aa) in length (Table 1). All three methods of phylogenetic inference [Bayesian, neighbour-joining (NJ) and maximum-likelihood (ML)] confirmed previous studies and identified four major clades (representing GI-GIV), as well as revealing identical tree topologies and virus groupings (Fig. 1). Genotypes of JEV in this study were defined as strains that differed in nucleotide composition by more than 9.1 % over the entire ORF.
Nine of the 32 JEV strains belonged to GI, four strains belonged to GII (including the three strains sequenced in this study), 18 strains belonged to GIII and one strain belonged to GIV. GIV of JEV diverged from MVEV first, followed by GIII, GII and lastly GI.
Genetic divergence among the four genotypes of JEV Pairwise comparisons of both nucleotide and amino acid sequences were used to determine the genetic relatedness of GII strains to the other three genotypes of JEV ( A number of amino acid substitutions were specific for individual genotypes (genotype-specific amino acid substitutions), and some defined individual nodes within the phylogeny (node-defining amino acid substitutions) (see Supplementary Table S1, available in JGV Online). Genotype IV possessed 144 genotype-specific amino acid substitutions (20 in the C protein, 15 in the prM protein, 22 in the E protein, 17 in the NS1 protein, 11 in the NS2A protein, 15 in the NS3 protein, seven in the NS4A protein, eight in the NS4B protein and 29 in the NS5 protein), GIII had 17 genotype-specific substitutions (three in the C protein, two in the E protein, two in the NS1 protein, three in the NS2A protein, three in the NS3 protein, one in the NS4B protein and three in the NS5 protein), GI+GII had 27 node-defining substitutions (three in the C protein, one in the prM protein, three in the E protein, three in the NS1 protein, four in the NS2A protein, one in the NS2B protein, four in the NS3 protein, three in the NS4B protein and five in the NS5 protein), GII had 23 genotype-specific substitutions (one in the C protein, one in the prM protein, four in the NS1 protein, six in the NS2A protein, one in the NS2B protein, two in the NS3 protein, one in the NS4A protein and seven in the NS5 protein) and GI had 21 genotype-specific substitutions (one in the C protein, three Genetic characterization of GII strains of JEV in the prM protein, one in the E protein, four in the NS1 protein, one in the NS2A protein, two in the NS2B protein, three in the NS3 protein, one in the NS4A protein and five in the NS5 protein).

Genetic diversity within GII of JEV
The intra-genotypic nucleotide and amino acid divergence were calculated to determine the extent of genetic variation within each genotype (Table 2; only the intra-genotypic divergence within GII is shown). The intra-genotypic nucleotide sequence divergence ranged from 0.7 % (XJP613|SH17M-07) to 3.2 % (K94P05|KV1899) among GI strains, from 2.5 % (Bennett|WTP-70-22) to 4.3 % (JKT654|FU) among GII strains, and from 0.1 % (CH1392|T1P1) to 4.0 % (Beijing-1|GP78) among GIII strains. The amino acid sequence divergence ranged from 0.3 % (JEV/sw/Mie/40/2004|SH17M-07 and HEN0701|JEV/ sw/Mie/41/2002) to 2.8 % (K94P05|KV1899) among GI strains, 0.5 % (Bennett|WTP-70-22, Bennett|JKT654 and WTP-70-22|JKT654) to 0.9 % (JKT654|FU) among GII strains, and 0.3 % (04940-4|57434, JaGAr01|CH1392, JaGAr01|JaOArS982 and JaGAr01|JaTH160) to 2.1 % (TC|GP78) among GIII strains. Table 3 shows the amino acid substitutions within the ORF that were specific to the four strains of GII of JEV. Compared with the Bennett strain, the WTP-70-22 strain exhibited 255 nucleotide differences resulting in 17 aa substitutions, the JKT654 strain exhibited 309 nucleotide differences resulting in 15 aa substitutions and the FU Positive selection operating at a single site within the NS4B protein gene of JEV Given that recombination in a nucleotide sequence alignment can adversely affect the results of selection analyses, we performed recombination analyses on the JEV sequence alignment prior to selection analyses using RDP, GENECONV, Chimaera, MaxChi and Bootscan methods implemented within the RDP3 program. Preliminary analyses of the nucleotide sequences yielded ten statistically significant (P,0.05) potential recombination events that were detected by at least two of the recombination detection methods. However, after manual verification of the potential recombination events, using phylogenetic and recombination signal analysis features in the RDP3 program, only four of the ten potential recombination events were confirmed (see Supplementary Table S2, available in JGV Online). All confirmed recombination events identified K94P05 (Korea, 1994, mosquito, GI) as the recombinant and JaOArS982 (Japan, 1982, mosquito, GIII) as the minor parent, whilst the major parent was identified as a different GI strain in each of the four recombination events. As the K94P05 strain was not available in our collection to resequence, we were unable to determine whether the Bayesian phylogeny based on nucleotide sequence information derived from the ORF of the JEV strains. The tree was rooted with MVEV (strain MVE-1-51), which is a member of the Japanese encephalitis serogroup, but this has been omitted from the figure to allow better visualization of branch lengths. Horizontal branch lengths are proportional to the genetic distance between strains. Bar, number of nucleotide substitutions per site. GI-GIV are indicated to the right of the tree. All nodes within the phylogeny were supported by a posterior probability of 1.0 unless otherwise indicated to the right of the node. The strains sequenced in this study are indicated by asterisks.
Genetic characterization of GII strains of JEV strain represented a true recombination event that occurred in nature or whether the nucleotide sequence of this strain that was submitted to GenBank represents a laboratory or sequence assembly error. Therefore, the K94P05 strain was removed from all subsequent analyses, given the fact that recombinant sequences can confound the results of evolutionary studies.
Evidence of non-neutral selection was assessed using 44 datasets that were created according to protein and genotype (Table 4). The ratio of non-synonymous (d N ) to synonymous (d S ) nucleotide substitutions over the first 9999 nt of the ORF for all strains was estimated to be 0.035, which suggested predominantly purifying selection. Estimates for separate genes for all strains ranged from 0.014 for the NS2B protein to 0.147 for the C protein.
When the ORF was examined, estimates of the d N /d S ratio ranged from 0.020 for GII to 0.067 for GI. This pattern persisted when the ORF dataset was parsed into individual proteins: GII consistently had the lowest d N /d S ratio when compared with the other genotypes in seven of the ten individual genes examined, whilst GI and GIII consistently had higher d N /d S ratios when compared with GII in all of the individual genes examined. These data suggested that GI and GIII viruses are under stronger purifying selection compared with GII viruses.
The maximum number of codons that exhibited patterns of negative selection when the all ORF strains were examined was 1432, using the random effects likelihood (REL) method, and the maximum number of codons that exhibited patterns of positive selection was ten using the REL method. However, only one codon, at site 2296 of the ORF (aa 24 of the NS4B protein), was identified as being under strong positive selection using three of the four selection detection methods. When the selection analysis was performed according to gene for all of the strains, aa 24 of the NS4B protein was again identified as being under strong positive selection using two of the four selection detection methods: fixed effects likelihood (FEL) (P50.03) and internal FEL (IFEL) (P50.04).

Individual genes within the ORF of JEV strains exhibit differences in codon usage
To determine whether codon usage differed among JEV strains according to genotype and individual gene, the G+C content at the first and second codon positions (GC12), the G+C content at the synonymous third codon positions (GC3) and the codon usage index (N C ) values were determined for all 32 virus strains over the ORF for all genotypes, as well as according to genotype and gene ( Table 5). The mean GC12 value was 0.51 and ranged from 0.47 (GII|prM gene, GIII|prM gene and GIV|prM gene) to 0.54 (GIII|NS4B gene, GI|NS2B gene and GIV|NS2B gene).
The GC3 values ranged from 0.46 (GII|NS2B gene) to 0.59 (GII|NS2A gene and GIV|NS2A gene), with a mean value of 0.52. The mean N C value was 55.81 and ranged from 45.07 (GII|NS4A gene) to 61.00 (GI|NS2B gene, GIV|prM gene and GIV|NS4A). Therefore, the extent of codon usage bias was small, although some variation was present.
To determine whether the observed codon usage bias was controlled by mutational pressure or natural selection, the observed codon usage bias was compared with the bias that would be expected under the null hypothesis that mutation pressure is the sole determinant. First, GC3 was plotted against GC12 for all genotypes and genes (overall) (Fig.  2a), as well as according to genotype (Fig. 2b) and gene (Fig. 2c). A marginally statistically significant negative correlation between GC3 and GC12 was observed overall (r520.33, P50.04), suggesting that patterns of base composition may be the result of mutational pressure. However, no statistically significant correlations between GC3 and GC12 were observed according to genotype (r50.76, P50.24) or gene (r520.43, P50.22), indicating that patterns of base composition are most probably not the result of mutational pressure, as the effects are not present at all codon positions. Secondly, GC3 was plotted against N C for all genotypes and genes (Fig. 2d), as well as according to genotype (Fig. 2e) and gene (Fig. 2f). For each plot, a curve was drawn representing the expected codon usage if G+C compositional constraints alone were A. J. Schuh, R. B. Tesh and A. D. T. Barrett responsible for the codon usage bias. In the plot of GC3 versus N C over all genotypes and gene, the vast majority of points lay under the curve, whilst a few points lay on the curve and above the curve (Fig. 2d). When GC3 was plotted against N C according to genotype, all points were positioned slightly below the curve, suggesting that the actual codon usage indices were close to the values expected from their G+C content and, furthermore, that no difference exists among the four genotypes of JEV (Fig.  2e). However, in the plot of GC3 versus N C according to gene, some points overlapped the curve (NS2B and prM genes), whilst others lay further away (NS2A gene), suggesting that base composition as well as other factors may contribute to the observed codon usage bias (Fig. 2f).

DISCUSSION
The known geographical range of JEV activity was historically limited to East and South-east Asia and Indonesia. Yet, in recent years, the range of virus activity has expanded west into Pakistan (Igarashi et al., 1994) and   south-east into Australia (Hanna et al., 1996). In April of 1995, an outbreak of three human cases resulting in two fatalities occurred on Badu Island, which is located in the Torres Strait between mainland Queensland, Australia, and Papua New Guinea (Anon., 1995). Subsequent investigations revealed that the outbreak was caused by GII of the virus (Hanna et al., 1996) and was probably introduced from Papua New Guinea by wind-blown mosquitoes or migratory birds (Johansen et al., 2000). GII strains continued to be isolated from islands located in Australia's Torres Strait until 1998 (Pyke et al., 2001), and a single GII strain was isolated in Malaysia in 1999 (CNS138-11, Malaysia, 1999) (Solomon et al., 2003). However, in January 2000, a new genotype of JEV (GI) emerged in the Torres Strait (Pyke et al., 2001). As GII of JEV has not been isolated since 1999, it remains unknown whether this genotype is still circulating or has become extinct. It is important to note that, following the collection of the prototype strain of GI of JEV in 1967 (M859/Cambodia/1967/Mosquito), another GI strain was not isolated until 1979, and this genotype has recently replaced GIII as the most frequently isolated genotype throughout a number of Asian countries (Nga et al., 2004;Nitatpattana et al., 2008). To understand better the genetic variation, phylogenetic relationships and evolution among strains of JEV, the nucleotide sequences of the ORF of three GII strains collected between 1951 and 1978 were determined and compared with sequence information derived from a strain collected during the 1995 JEV outbreak in Australia (FU, Australia, 1995), as well as with strains representing the other three genotypes of the virus.
It is noteworthy that West Nile virus (WNV), a closely related mosquito-borne flavivirus within the Japanese encephalitis serocomplex, also exists in endemic (Africa, Australia and Papua New Guinea) and epidemic (Europe) transmission cycles, as does JEV. GI and GIII of JEV typically occur in epidemic regions (i.e. north of southern Thailand), whilst GII and GIV occur mostly in endemic regions (i.e. south of southern Thailand) (Solomon et al., 2003). The phylogeny presented here illustrates that, although GIII viruses have been introduced several times into the same locations (e.g. Taiwan), they are also maintained year to year, suggesting overwintering of the virus [e.g. Taiwan, 1958-1965 (HV1, TC andTL strains) and Taiwan, 1990-1997]. The phylogenetic tree also illustrated that all four of the Indian strains [GP78 (1978[GP78 ( ), 014178 (2001[GP78 ( ), 04940-4 (2002 and 57434 (2005)] grouped together within GIII, suggesting that JEV may have been introduced successfully into India only once and maintained there for at least 27 years. Furthermore, to the best of our knowledge, no other genotypes of the virus have ever been isolated from India. However, relatively few strains of JEV from India have been sequenced and this assertion may need to be revisited once additional nucleotide sequence data from this country become available. Interestingly, like JEV, WNV appears to have been introduced successfully into India only once  Genetic characterization of GII strains of JEV (May et al., 2011). It remains elusive why both JEV and WNV have been successfully introduced into India on only one occasion.
Nucleotide and amino acid pairwise comparisons over the ORF, as well as phylogenetic analyses, revealed that GII was the most similar to GI. Interestingly, GI has been isolated in East Asia, South-east Asia, Indonesia and northern Australia, whereas GII has been isolated on only two occasions in geographical areas where Japanese encephalitis is considered to be epidemic, i.e. Korea (Bennett strain, c.1951) and Australia (Badu island, FU strain, 1995). JEV utilizes a variety of vector species and hosts; geographical differences in vector or host availability could explain why the geographical range of GII of JEV is/was limited to tropical regions where JEV is endemic. Furthermore, it remains unclear whether GI has replaced GII in Australia and Papua New Guinea or whether both virus genotypes are circulating. To the best of our knowledge, no JEV isolates from these countries have been found or sequenced recently.
Correlation of the topology of the phylogeny with amino acid alignments of the JEV strains further illustrated the close evolutionary relationship between GI and GII of the virus; GI and GII shared 27 node-defining amino acid substitutions, whilst GII possessed 23 genotype-specific substitutions and GI had 21 genotype-specific substitutions. The low magnitude of genetic variation among the four geospatiotemporally distributed GII strains of JEV is consistent with the other three genotypes and indicates that the overall genomic tolerance for mutation is minimal. The NS2B protein was conserved among all GII strains. The central region of the NS2B protein of flaviviruses is a cofactor of the serine protease of the NS3 protein and is thus necessary for the activation of this protease (Mastrangelo et al., 2007).
Selection analyses revealed that JEV strains are under predominantly purifying selection, which results in virusencoded proteins being conserved over time due to selective pressure against deleterious variants. Evidence of strong positive selection was detected at aa 24 of the NS4B protein in the ORF nucleotide sequence file by three of the four detection methods and in the NS4B gene nucleotide sequence files by two of the four detection methods. This signifies that mutations at this amino acid over the evolutionary history of JEV have tended to be beneficial, rather than neutral or deleterious. The NS4B protein of flaviviruses has been found to inhibit the alpha/beta interferon (IFN-a/b) signalling cascade at the level of signal transducer and activator of transcription (STAT) phosphorylation, suggesting that amino acid mutations at aa 24 of this protein may be important in the antagonization of the host IFN response to the virus (Muñoz-Jordán et al., 2003. Aa 22 and 24 of the WNV NS4B protein, which are analogous to JEV residues 22 and 24, respectively, have been shown to control IFN-a/b resistance in HeLa cells expressing subgenomic replicons lacking the structural genes, although no effect was shown on the expression of full-length infectious genomes (Evans & Seeger, 2007). Although potentially interesting, it should be noted that the amino acids at residues 22 and 24 of the NS4B protein are different for WNV and JEV, so extrapolation must be made with care.
The results of the overall codon usage analysis revealed that patterns of base composition may be the result of mutational pressure; however, when the data were parsed according to genotype and gene, we found that patterns of base composition were probably not the result of mutational pressure as the effects were not present at all codon positions. Also, we found little variance in the extent of codon usage bias among the genotypes of JEV; however, individual genes of the virus varied to a small degree in their extent of codon usage bias.
In conclusion, this study elucidated the genetic variation, phylogenetic relationships, operation of selection and patterns of codon usage among strains of JEV, including three newly sequenced strains of GII that were temporally isolated in geographically distinct regions. Compared with other flaviviruses such as yellow fever virus, the nucleotide Table 5. GC12, GC3 and N C values according to genotype and gene  Genetic characterization of GII strains of JEV viruses at the University of Texas Medical Branch, Galveston, TX, USA (Table 1).
Cell culture, virus growth, RNA extraction, RT-PCR, DNA purification and sequencing. Viruses were inoculated onto confluent monolayers of Aedes albopictus C6/36 mosquito cells maintained at 28 uC in Dulbecco's modified essential medium containing 2 % FBS supplemented with 5 % tryptose phosphate buffer. Culture supernatant was harvested between 4 and 7 days postinoculation. Viral RNA was extracted from the culture supernatant using a QIAamp Viral RNA Mini kit (Qiagen). The entire ORF (10 296 nt) of the viruses was amplified using the Titan One Tube RT-PCR System (Roche) and overlapping primer pairs (see Supplementary Table S3, available in JGV Online). We did not amplify the UTRs, as there is extensive variation within the 39UTR of JEV strains, at least some of which seems to depend on the passage history of the viruses, whilst the 59UTR is highly conserved among strains (Thurner et al., 2004) and cannot be used for phylogenetic analyses. The PCR products were excised from an ethidium bromidestained agarose gel and purified using a QIAquick Gel Extraction kit (Qiagen). Purified DNA was sequenced directly by standard methods at the Recombinant DNA Laboratory at the University of Texas Medical Branch at Galveston, TX, USA, using the PCR primers and sequencing primers (Supplementary Table S3).
Phylogenetic and sequence analyses. The nucleotide sequences of the three viruses were assembled and edited using ContigExpress (Vector NTI; Invitrogen) and deposited in GenBank. Newly generated JEV ORF sequences, as well as all available ORF sequences derived from wild-type, time-stamped strains retrieved from GenBank in September 2009 (29 strains) along with that of MVEV (strain MVE-1-51, used as an outgroup in the phylogenetic analyses) were aligned using AlignX (Vector NTI; Invitrogen) (Table 1). BioEdit (Hall, 1999) was used to adjust the nucleotide sequence alignment manually to ensure that it was consistent with the ORF, to create individual nucleotide sequence alignments according to genotype and gene, to generate a deduced amino acid sequence alignment and to perform nucleotide and amino acid pairwise comparisons.
Phylogenetic relationships were inferred using NJ and ML methods in the PHYLIP package (Felsenstein, 1989) and the Bayesian method in MrBayes v3.1.2 (Ronquist & Huelsenbeck, 2003). MODELTEST (Posada & Crandall, 1998) in conjunction with PAUP (Swofford, 1998) was used to identify the best-fit nucleotide substitution model that would be utilized in the phylogenetic analyses; the GTR+I+C 4 model (general time-reversible+proportion of invariable sites+ gamma-distributed rates in four categories, respectively) was chosen from a total of 56 models using three model selection frameworks (hierarchical likelihood ratio tests, Akaike information criterion and Bayesian information criterion). The NJ analysis employed the nucleotide substitution model and F84 distance matrix algorithm, whilst the ML analysis utilized the GTR+I+C 4 nucleotide substitution model in conjunction with a heuristic search algorithm; the robustness of the NJ phylogeny was evaluated by bootstrap resampling with 1000 replicates, whilst the robustness of the ML phylogeny was evaluated by bootstrap resampling with 100 replicates. The Bayesian method used the GTR+I+C 4 nucleotide substitution model; the nucleotide data were partitioned by first, second and third codon positions separately and the substitution parameters were allowed to vary across partitions. The analysis used three hot chains and one cold chain, and ran twice for 1.6 million generations with 10 % burn-in; the 'sump' command and Tracer v1.4.1 (Rambaut & Drummond, 2005) were used to ensure model convergence. The robustness of the Bayesian phylogeny was evaluated by determining posterior probabilities for each of the nodes within the tree.
Recombination and selection analyses. The presence of recombination in a nucleotide sequence alignment can confound attempts to estimate selection pressures; therefore, recombination analyses were performed as a prerequisite to the selection analyses. Recombination among the sequences was analysed using the RDP (Martin & Rybicki, 2000), GENECONV (Padidam et al., 1999), Chimaera (Posada & Crandall, 2001), MaxChi (Smith, 1992) and Bootscan (Martin et al., 2005a) methods implemented in the RDP3 vb41 program (Martin et al., 2005b). Common program settings for all methods were to perceive sequences as linear, to require phylogenetic evidence, to refine breakpoints and to check alignment consistency. The highest acceptable P value was set at 0.05, after considering Bonferroni correction for multiple comparisons. All method-specific program settings remained at their default values. Only recombination events that were identified by at least two methods were considered as potential recombination events. The breakpoint positions and recombinant sequence inferred for every detected potential recombination event were verified manually using phylogenetic and recombination signal analysis features in RDP3.
Evidence of non-neutral selection was examined by calculating the d N /d S ratio using a NJ phylogeny and the general reversible nucleotide substitution model available through the Datamonkey web server . Four methods were used to detect non-neutral selection: single-likelihood ancestor counting (SLAC), FEL, REL and IFEL (Kosakovsky . A set of 44 nt sequence alignment files that were parsed according to protein and genotype were used in the analyses. Codon usage analysis. The analysis of codon usage patterns was performed on a set of 44 nt sequence alignment files, which were created by dividing the ORF dataset according to genotype and gene, using the CodonW software package (http://codonw.sourceforge.net/). GC12, GC3 and N C were calculated for each virus in all of the datasets and the means determined. The reported value of N C is always between 20 (when only one codon is used for each amino acid) and 61 (when all codons are used equally). The relationship between GC3 and GC12 was examined to determine the relative effects of mutation pressure versus natural selection on codon composition. Pearson's r correlation coefficient was used to measure the linear relationship between the two interval variables, GC12 and GC3, by employing the PASW Statistics 18 software program. To examine the influence of G+C content on codon usage, the relationship of N C and GC3 was plotted. This was compared with the N C value that would result if G+C content were solely responsible for the codon biases, calculated as N C 52+GC3+(29/[(GC3) 2 +(12GC3) 2 ]) (Wright & Bibb, 1992).