Molecular Evolution of Peste des Petits Ruminants Virus

Sequence data will increase understanding of virus evolution, adaptability, and pathogenicity.

with PPRV (9). The huge effect on small ruminant production has resulted in PPRV emerging as a global animal health concern.
The molecular epidemiology of PPRV, which is based on sequence comparison of a small region of the fusion (F) gene (322 nt) or the nucleoprotein (N) gene (255 nt), has identified 4 distinct lineages (I-IV) of PPRV (2). However, this analysis has not generated much information on the evolution and dispersal of PPRV lineages. Lineage I PPRV had gone undetected for 19 years being detected in Senegal in 1994. Lineage IV PPRV, which was believed initially to be confined to India and the Middle East, now has a wider geographic presence and appears to be evolving rapidly. Many aspects of PPRV evolution, such as ancestral virus location, divergence and time of origin, and historical and geographic patterns of spread, are poorly understood (10). A better understanding of the evolution of PPRV would enable prediction of how these viruses will lead to further outbreaks and epidemics and provide data for control strategies.
Advanced sequencing technologies have enabled molecular epidemiologic studies of viruses in which whole gene and complete genome data are used to enhance and clarify the evolutionary dynamics of viral infectious disease (11). We analyzed genome data for all 4 lineages of PPRV. This analysis will enable a more precise evolutionary and phylogenetic assessment of the relationships between lineages by reducing the associated estimation errors and increased higher confidence in estimates.

Complete Genome Sequencing of PPRV
Complete genome sequencing of 7 PPRV isolates (4 from lineage III and 3 from lineage IV) was performed according to the methods described by Muniraju et al. (12). Detailed information for each of the isolates is shown in Table 1.

Sequence Datasets
In addition to the 7 complete genomes sequences of PPRV generated in this study, another 7 complete genome sequences were obtained from GenBank (Table 1). However, of these 14 full genome sequences, Nigeria 1975/1 and Sungri 1996 represent vaccine strains generated after extensive serial passage of virus. Therefore, the evolutionary rate and time to most recent common ancestor (TMRCA) were compared with and without inclusion of vaccine strains. The complete genome sequences of 2 clinical isolates each from RPV (GenBank accession nos. AB547189 and X98291) and MV (accession nos. AF266288 and JF791787) and 12 PPRV isolates, excluding vaccine strains, (Table 1) were used for estimation of evolutionary rate and TMRCA. Furthermore, the coding and noncoding sequences of individual structural genes of PPRV (excluding vaccine strains) were used in this study.
Partial N gene sequences of PPRV (nucleotide positions 1253-1507) that have a detailed history of collection date and place were obtained from GenBank (available up to August 2013). These partial sequences were aligned by using the ClustalW algorithm in BioEdit software v7.2.0. (21) and edited to remove unreliable sequences/regions. Furthermore, the identical sequences originating from the same geographic location, host, and year were excluded to avoid redundancy in subsequent analysis. The final dataset (partial N gene) contained 159 sequences sampled over a period of 45 years .

Selection Analysis
The nucleotide and amino acid sequence differences between the PPRV lineages for 12 complete genome sequences were estimated by using BioEdit software v7.2.0. Analyses of selection pressure in individual PPRV genes was performed by obtaining mean ratios of nonsynonymous (dN) to synonymous (dS) substitutions per site. The dN/dS was calculated by using codon-based  (20) maximum likelihood approaches with the single-likelihood ancestor method implemented in hypothesis testing using the phylogenies package (22) (http://www. datamonkey.org).

Bayesian Time-Scaled Phylogenetic Analysis
Molecular evolutionary rate and divergence times were co-estimated. A Bayesian maximum clade credibility (MCC) phylogenetic tree was constructed by using Bayesian Markov chain Monte Carlo (MCMC) analysis and Bayesian evolutionary analysis sampling trees (BEAST) software package v1.8.0 (23), and BEAST runs were performed by using the CIPRES Science Gateway (24). For each sequence dataset, the best-fit nucleotide substitution model was determined on the basis of Akaike information criterion scores using JModel Test software v2.1.4 (25). An input file for BEAST analysis was obtained by using Bayesian evolutionary analysis utility software v1.8.0, in which sequences were tip dated according to the year of collection. Four molecular clock models (strict, uncorrelated lognormal distribution, uncorrelated exponential distribution [UCED], and random) were tested alongside different demographic models (nonparametric Bayesian skyline plot and the parametric constant and exponential growth), and the best models were selected by means of a Bayes factor (BF) test (26) using marginal likelihoods values (2lnBF>2) obtained from Tracer v1.5 software (http:// beast.bio.ed.ac.uk/tracer).
For each analysis, 2 independent MCMC chains were run to get a final output of 10,000 trees (ESS >200 for all the parameters estimated) and were assessed for their proper mixing, convergence, and consistency by Tracer v1.5 with 10% burn in. The 2 individual runs were combined by using LogCombiner v1.8.0 in the BEAST software package. The nucleotide substitution rate (substitutions/site/year) and the TMRCA (year) values were obtained from Tracer v1.5. The posterior tree distributions were summarized by using TreeAnnotator (http://beast.bio.ed.ac.uk/treeannotator) and exclusion of the first 10% of the trees as burn in. Phylogenetic MCC tree with median node heights were visualized in FigTree software v1.4.0 (http://www.molecularevolution.org/software/phylogenetics/figtree). Furthermore, the demographic history of PPRV was studied by using partial N gene dataset and less restrictive Bayesian skyline plot (BSP) models in which the changing profile of genetic diversity is plotted against time.

Phylogeographic Reconstruction
Bayesian phylogeographic analysis was performed by using complete PPRV genome sequence and partial N gene sequence datasets, and isolates were annotated according to their location (longitude and latitude). Partial N gene data were chosen instead of F gene data because of increased divergence reported for the N gene (2). For complete genome datasets, sequences from 14 viruses were considered, including 2 vaccine strains (Nigeria 1975/1 andSungri 1996) to represent all PPRV-endemic areas. Phylogeographic diffusion along the posterior sets of trees and relationships between these locations were identified by using the Bayesian stochastic search variable selection procedure in BEAST v1.8.0 (27). Discrete phylogeographic analysis was performed by using the continuous time Markov chain with the flexible Bayesian skyride tree prior.

Sequence Analysis
All 7 PPRV complete genomes are 15,948 nt and conform to the rule of 6 as described for all other morbillivirus genomes (28). The genome organization of the isolates was the same as that of other PPRV strains. Phylogenetic analysis of the complete genome sequences of PPRV clustered the sequences into 4 lineages. The complete genomes of PPRV isolates from Ethiopia 1994, Oman 1983, UAE 1986, and Uganda 2012 sequenced in this study belonged to lineage III and the isolates Sungri 1996, Morocco 2008, and Ethiopia 2010 belong to lineage IV. Comparison of the 12 aligned complete genome sequences showed that nucleotide differences ranged from 0.1% to 11.9%, and amino acid differences ranged from 0.1% to 7.2% ( Table 2).
The dN/dS for coding regions of the various genes of PPRV (n = 12) for all 4 lineages ranged from 0.06 to 0.45 (Table 3). The dN/dS per site across the coding region of different genes of PPRV genome are shown in Figure 1. The highest dN/dS ratio was observed in the phosphoprotein gene, followed by the hemagglutinin, N, F, large polymerase, and matrix (M) genes. The relative nucleotide substitution rates at all 3 codon positions of the structural genes of PPRV showed that substitutions were more frequent at the third codon position (Table 3) as expected.

Evolutionary Rate Estimates
Complete genome sequences of 12 PPRV and partial N gene dataset (n = 159) were analyzed by using the coalescent-based Bayesian MCMC approach. The general time-reversible nucleotide substitution model with a gamma distribution for rate variation was selected on the basis of Akaike information criterion scores. Bayes factor test with marginal likelihood comparisons showed that the relaxed UCED clock model best fitted the PPRV complete genome and partial N gene datasets ( Table 4). The 2lnBF value was >78 between UCED and strict clocks and 2-6 between UCED/uncorrelated lognormal distribution and UCED/ random clocks, which provided strong evidence for the UCED clock model. There was no difference between different demographic models compared within the UCED clock model (2lnBF <2). However, the exponential demographic model was chosen because it provided a narrow margin of 95% highest posterior density (HPD) estimates.
Accordingly, the UCED and exponential growth model have been directly used for the individual PPRV gene dataset and the PPRV/RPV/MV complete genome dataset to estimate the TMRCA and substitution rate per site per year. When we used the UCED and exponential growth models, we found that the mean evolutionary substitution rate of the PPRV complete genome was estimated to be 9.09 × 10 -4 (95% HPD 2.13 × 10 -4 -1.64 ×10 -3 ). When 2 complete genome sequences of vaccine strains were added into this analysis, the same models (general time-reversible nucleotide substitution model with a gamma distribution, UCED, and the exponential growth demographic models) were best fitted, and the mean substitution rate/site/year was reduced to 7.86 × 10 -4 (95% HPD 2.17 × 10 -4 -1.4 ×10 -3 ). Furthermore, the evolutionary nucleotide substitution rate for combined PPRV/RPV/MV complete genomes was 1.89 × 10 -3 (95% HPD 5.55 × 10 -4 -3.31 × 10 -3 ). Analysis of individual genes of the PPRV coding region dataset, coding and noncoding region datasets, and partial N gene dataset are shown in Table 4.

Temporal Dynamics
A Bayesian time-scaled MCC tree based on complete PPRV genomes was constructed ( Figure 2) by using the UCED model with exponential growth demography. The estimated median TMRCA of PPRV for all 4 lineages and divergence of lineage III PPRV were found to be ≈1904 (95% HPD 1730-1966). Lineage I diverged in ≈1939 (95% Results of TMRCA analysis using complete coding and coding and noncoding regions of individual PPRV genes are shown in Table 4. If one considers coding and noncoding sequences of individual genes in the analysis, a difference in TMRCA was found only for the M gene (i.e., 1944, 95% HPD 1879-1973). The TMRCA of PPRV/RPV/ MV was estimated to be ≈1616 (95% HPD 1072-1859), and the TMRCA for PPRV was estimated to be 1931 (95% HPD 1858-1956) (Figure 3).

Population Demography of PPRV
The demographic history of PPRV was investigated by using the partial N gene sequence dataset according to the BSP method implemented in BEAST. The BSP with an assumed piecewise-constant model has facilitated estimation of effective population size through time. The BSP  showed that the population did not show much genetic diversity (effective number of infections) until the mid-1990s when diversity started to increase. Toward the first decade of the 21st century, the population size appeared to reach a peak and then showed a small decrease until the most recent sampling in 2012 (Figure 4). The HPD interval size for the plot is narrow, which indicates strong support for this population trend.

Phylogeographic Analysis
To estimate the geographic origin of PPRV, we summarized the results of Bayesian phylogeographic analyses by visualizing the annotated MCC tree ( Figure 5). The complete genome sequence data used in this analysis incorporated all 14 isolates, including the vaccine strains, from 10 discrete locations so as not to leave out any reported virus-endemic area. The root state posterior probabilities for all the locations ranged between 9.02% and 12.69%; Nigeria and the Ivory Coast (now Côte d'Ivoire) receiving marginally higher support, 12.69% and 10.53%, respectively, than the rest of the locations ( Figure 5).
Because the geographic origin of PPRV could not be localized to a single country by using 14 complete genome sequences, further phylogeographic analysis was performed by using 159 partial N gene sequences collected from 30 locations during 1968-2012. The root state posterior probabilities of PPRV ranged from 0.11% to 17.20%, and Nigeria (17.20%), Ghana (14.28%), and Sierra Leone (11.68%) showed the highest marginal support ( Figure 6). The highest marginal support of root state posterior probabilities indicated that the geographic origin of lineage I PPRV was Senegal (27.44%), that of lineage II PPRV was Nigeria (27.00%), that of lineage III PPRV was Sudan (30.73%), and that of lineage IV PPRV was India (36.00%).

Discussion
We sequenced complete genomes of 4 lineage III and 3 lineage IV isolates of PPRV. We used these genomes and other available genomes to assess the evolutionary substitution rate, TMRCA, and divergence of PPRV lineages and the geographic origin of PPRV.
The measure of selective pressures acting across the PPRV genome showed only purifying (stabilizing) selection occurring across the genome and no evidence of positive selection. The conservation of amino acid residues was further confirmed by the fact that the relative substitution rates at the third codon position of all the genes were higher than those for the first and second codon positions. The observed upper limit of 11.9% nt divergence (7.2% aa divergence) among PPRVs is consistent with the low level of antigenic divergence observed because despite lineage differentiation, only a single serotype exists for PPRV. Homologous recombination events are generally rare or absent in negative-sense RNA viruses (30) and thus could not have been evaluated in this study.
From a genetic perspective, substitution rates are critical parameters for understanding virus evolution, given that restrictions in genetic variation within a population of viruses can lead to lower adaptability and pathogenicity (31). Our analyses estimated a range of PPRV nucleotide substitution rates throughout the complete genome of 1.64 × 10 -3 -2.13 × 10 -4 substitutions/site/year, which is similar to that predicted for other paramyxoviruses (10 -3 -10 -4 substitutions/site/year) (32)(33)(34)(35). Despite low levels of antigenic divergence, as shown by existence of a single serotype, genome plasticity of PPRV might explain its ability to emerge and adapt in new geographic regions and hosts, as reported extensively across vast areas in recent years. The TMRCA of PPRV obtained from complete genome sequence was estimated to be during 1904 (95% HPD . Similarly, the estimated TMRCA obtained from individual gene sequence, partial N gene sequence of PPRV, and combined PPRV/RPV/MV complete genome sequences was during 1910-1944. That the predicted TMRCA for PPRV was during the early 20th century is reasonable because the first recorded description of PPRV was made in 1942 (36). The delay of a few decades before identification of PPRV as a distinct viral entity after its initial detection can likely be attributed to confusion in differentiation between PPRV and RPV, a virus for which extensive cross-neutralization is observed after vaccination and natural infection, and lack of differentiating diagnostic tools. Substitution rates were consistent across each gene for PPRV. However, greater substitution rates were observed in the GC rich regions of the F and M genes. Similarly, the substitution rate was greater, as predicted because of the variability seen at the nucleotide level, in the highly variable region of the N gene sequence (255 nt) for PPRV. The TMRCA estimation was not possible for lineage I and II viruses (vaccine strain was omitted) because only 1 complete genome sequence was available for each lineage. Therefore, more complete genome sequences are required to study evolutionary and phylogenetic relationships for these lineages.
Biased estimates in substitution rate and TMRCA were observed by using datasets that included tissue culture-passaged, attenuated vaccine strain complete genome sequences, in which slower evolutionary substitution rates and earlier TMRCA were predicted. Similar observations were reported for PPRV/RPV/MV N gene sequence analyses, in which a slower and biased nucleotide substitution rate was observed when vaccine strain sequences (33) were included in the analysis and faster substitution rates and later TMRCA predictions were suggested when vaccine strain sequence data were excluded (34).
Spatial and temporal dynamics of RNA viruses are often reflected by their phylogenetic structure (37). Potential divergence events for different PPRV lineages were inferred by using rooted, time-measured phylogenetic trees with higher confidence from the PPRV complete genome sequence dataset. The inferred phylogeny supports the initial divergence of lineage III isolates, followed by lineage I isolates; lineage II and IV isolates were predicted to have diverged from each other at a later time. The inference of divergence events presented facilitated a better understanding of historical divergence of PPRV and offered further opportunities to study viral demographic history and dispersal events.
The demographic analysis of PPRV with the BSP indicated historically constant genetic variability of PPRV over time. This finding could be a reflection of the use of RPV vaccine in small ruminants to protect animals against PPRV through the 1990s, which might have affected the evolution and spread of PPRV. In the early 21st century, genetic diversity of PPRV has gradually increased, which reflects frequent outbreak reports. The increased genetic diversity may be a driver for selection pressures within individual lineages and might result in extinction events, as suggested by an absence of lineage I virus. In recent years, as efforts have increased to actively control and eradicate PPRV, a decrease in genetic diversity has been observed.
Phylogeographic reconstruction with spatial and temporal information of virus isolates has enabled an understanding of the historic emergence and dispersal patterns involved in virus evolution (38). Although PPRV existed earlier than its first description in Ivory Coast in 1942 (39), PPRV was later reported in Senegal, Chad, Togo, Benin, Ghana, Nigeria, Oman, Sudan, Saudi Arabia, India, Jordan, Israel, Ethiopia, Kenya, Uganda, and Pakistan (40). Our phylogeographic analysis indicated that Nigeria was the geographic origin of the most recent common ancestor of PPRV because of the highest root location state probability. Furthermore, geographic origins of the most recent common ancestor of PPRV lineages I, II, and III were predicted to be across Africa; lineage IV likely emerged in India. In conclusion, these findings suggest that the origin of PPRV was in western Africa, which then spread to eastern Africa, the Middle East, and Asia. However, although these predictions are suggestive of a potential origin for PPRV, caution must be exercised in their interpretation because estimates of geographic origin rely on available datasets, and these datasets need enhancing to provide greater confidence for phylogenetic assessment. As more sequence data become available for PPRV and the other morbilliviruses, ancestral origins of each virus and intraspecies differentiation might become more clear.