Analysis of molecular evolution of nucleocapsid protein in Newcastle disease virus

The present study investigated the molecular evolution of nucleocapsid protein (NP) in different Newcastle disease virus (NDV) genotypes. The evolutionary timescale and rate were estimated using the Bayesian Markov chain Monte Carlo (MCMC) method. The p-distance, Bayesian skyline plot (BSP), and positively selected sites were also analyzed. The MCMC tree indicated that NDV diverged about 250 years ago with a rapid evolution rate (1.059 × 10−2 substitutions/site/year) and that different NDV genotypes formed three lineages. The p-distance results reflected the great genetic diversity of NDV. BSP analysis suggested that the effective population size of NDV has been increasing since 2000 and that the basic reproductive number (R0) of NDV ranged from 1.003 to 1.006. The abundance of negatively selected sites in the NP and the mean dN/dS value of 0.07 indicated that the NP of NDV may have undergone purifying selection. However, the predicted positively selected site at position 370 was located in the known effective epitopic region of the NP. In conclusion, although NDV evolved at a high rate and showed great genetic diversity, the structure and function of the NP had been well conserved. However, R0>1 suggests that NDV might have been causing an epidemic since the time of radiation.


INTRODUCTION
Newcastle disease virus (NDV) is responsible for virulent diseases in birds, thereby causing great economic losses on poultry industry [1]. NDV belongs to the Avulavirus genus, Paramyxoviridae family [2]. The paramyxoviruses have been classified into 10 subtypes, APMV-1-APMV-10 [3], and NDV belongs to APMV-1 [4]. The main structure of NDV consists of the large protein (L), the hemagglutinin-neuraminidase (HN) protein, the fusion protein (F), the matrix protein (M), the phosphoprotein (P), and the nucleocapsid protein (NP) [5]. Given their low virulence, lentogenic NDV strains cause only mild respiratory or enteric infections. By contrast, velogenic NDV strains usually cause high mortality in birds. Mesogenic isolates are intermediate-virulence NDV strains that primarily cause respiratory disease [6].
The NP primarily regulates RNA transcription, replication, and assembly. Mebatsion et al. identified that an immuno-dominant epitope on NP can be replaced or deleted by a foreign epitope. A mutation-permissive region identification on NP gene provides a potential approach to www.impactjournals.com/oncotarget/ Oncotarget, 2017, Vol. 8, (No. 57), pp: 97127-97136

Research Paper
insert protective epitopes. This may be explored as target to design NDV vaccines [7]. Latest studies have reported that the NP of NDV is an important antigen in serologic assays because of its highly conserved sequences and high immunogenicity [8]. However, the molecular evolution of NP gene in NDV and whether minor antigenic changes occur in this gene remain to be elucidated. Therefore, we conducted a detailed evolutionary analysis of NP gene and predicted amino acid changes in NP three dimensional structure to provide a theoretical basis for the development of NP-based NDV vaccines.
Commonly, the genotypes and subtypes of NDV are divided into two classes, including class I (genotype I, lentogenic) and class II (contains at least genotypes II-XI, mesogenic or velogenic). Class I viruses are mainly isolated from shorebirds and waterfowl. While class II viruses are typically found spreading in poultry species and wild birds, and they have been divided into at least 10 genotypes (II-XI) [9]. Notably, NDV may evolve into virulent strain through the natural transmission between different hosts and induce major outbreaks [10]. The host shift and virulence change of NDV are closely relatedly to its evolution rate, selection pressure, and phylogenesis [11]. The pathogen's basic reproductive number (R 0 ) of a pathogen is an important epidemiological parameter that summarizes the transmission potential of a disease in a given population [12]. Despite these findings, the molecular evolution of NP gene based on genotype, evolution, and virulence has not been studied. In this study, we estimated the evolution rate, population dynamics, and selective pressure of the NP in different genotypes of NDV retrieved from public databases (characterized in our study) by the Bayesian Markov Chain Monte Carlo (MCMC) method. MCMC is a powerful algorithm for computing integrals efficiently in kinds of dimensions with one constant factor. The factor is used without parameter estimation. Therefore, MCMC offers a powerful means to compute the posterior probability density function for each model parameter in Bayesian parameter estimation [13].

Genetic diversity analysis of the NDV strains
The collected sequences were easily divided into different genetic groups: class I (lentogenic strains) and class II (velogenic or mesogenic strains). In order to estimate the genetic distance among these NDV strains, we analyzed their pairwise distance value (p-distance). Results showed that the average p-distances within classes I and II were 0.4 and 5.09, respectively. These values were significantly lower than that between the two groups, i.e., roughly 10.79 ( Figure 3). These results suggest that the NDV NP gene may be undergo great genetic divergence.

Estimation of positive and negative selection sites of the NDV NP gene
The positively selected sites in NP gene of NDV were comprehensively estimated using following methods: fixed effects likelihood (FEL), conservative single likelihood ancestor counting (SLAC), mixed effects model of evolution (MEME), and internal fixed effects likelihood (IFEL). The mean dN/dS (non-synonymous rate/ synonymous rate) value was estimated using the MEME method. Tables 1 and 2 show that only three positive selection sites were predicted using the FEL method and seven positive selection sites were found by the MEME method. Furthermore, all three methods indicated that most sites on the NP gene were under negative selection. A site was accepted as negative or positive selection sites if it was estimated by two methods simultaneously. Therefore, the sites at positions 43, 133, and 370 were considered positively selected. Interestingly, the site at position 370 was located in the identified epitope (aa335-aa489) region ( Figure 4) [7]. The estimated mean dN/dS value of the NP gene was 0.07 (95% confidence interval 0.068-0.072). These results support that the NP protein of NDV may have undergone purifying selection due to structural and functional constraints.

Population dynamic analysis of present NDV strains
The population dynamics of present NDV strains based on NP gene was assessed by Bayesian skyline plot (BSP) analysis ( Figure 5). Results showed that the effective population size of NDV infection number has kept relatively stable before the early 1980s but undergone a biphasic exponential growth from the 1980s to 2000s. There was one peak around the year 2000. Besides, the estimation of mean exponential growth rate was 0.144 (95% HPD 0.03-0.17) and the epidemic doubling time was about 4.8 years (credibility limits 4.0-23.1). The calculation of R 0 for different NDV durations with rational carrier infectiousness was 1.003 for only 0.3 months' duration to more than 1.006 for 0.5 months' duration.

DISCUSSION
Since the first isolation, NDV strains have been segregated into two classes that are characterized by polymorphism mainly restricted to the HN genes. However, further studies of other proteins are warranted to clarify NDV phylogenetics. The NP plays very important role in early stage of NDV life cycle. This gene is involved in the viral infection and replication so it undertakes   important biologic functions. Previous studies indicated that NP could effectively induce NDV-specific antibody production in chickens [7]. Therefore, we analyzed molecular evolution of this highly conserved gene of NDV in the known genotypes (I-XI).
Results of time-scaled evolutionary tree showed that NDV diverged approximately 250 years ago. The first division of present NDV genotypes into three lineages was dating back to 1926 (95% HPD 1910(95% HPD -1942 2). According to previous reports, there were three pandemics of NDV and led to enormous economic losses in the world. The first panzootic started in Southeast Asia in the mid-1920s and it took about 30 years to spread to other regions [14]. The second panzootic of NDV might also originate in Asia, however it took less years than the first panzootic to conquer the same territory in the 1960s [15]. It was mainly introduced from Indonesia to the United States and Europe. Straight after, the third NDV panzootic broke out in England in the 1970s and affected chickens and racing pigeons [16]. Furthermore, Toyoda et al. [17] screened over 200 NDV strains and compared physical maps of these strains. They found that these NDV isolates, circulating before the 1970s, could be divided into three major genetic groups. The present NDV strains were estimated to evolve rapidly with mean evolutionary rate of 1.059 × 10 −2 substitutions/site/year by Bayesian MCMC method (95% HPD:4.187×10 −3 -1.74×10 −2 ). The value is as high as that of highly variable RNA viruses. In a previous estimation of the evolutionary rates of NDV, Miller [18] reported that the rates for the F genomes of low virulence are 2.28 × 10 − 4 (strict) and 2.92 × 10 − 4 (relaxed) while the rates for virulent F genomes are 1.32 × 10 − 3 (strict) and 1.70 × 10 − 3 (relaxed). The high evolutionary rates may accelerate for viruses containing the virulent phenotypes. In addition, the average p-distances within class I (lentogenic strains) and class II (velogenic or mesogenic strains) were 0.4 and 5.09, respectively. The intergenogroup p-distance value was 10.79, which indicates that NDV had a wide genetic divergence ( Figure  3). In fact, class I and class II are mainly divisions of NDV strains in the world and these promote extensive genetic diversities among poultry. Latest reports showed that nine distributed worldwide genotypes were further divided in waterfowls in class I strains. By contrast, class II strains has comprised 18 genotypes when they are isolated over time [9,19]. Deserve to be mentioned, the typically circulating viruses in poultry species and wild birds belong to class II strains and genotypes IV-VIII are the predominant genotypes in worldwide poultry [20].
We used four methods in the NDV evolutionary selection pressure analysis. Although SLAC method is usually used for large alignments analysis compared with the other three methods, it detects selected sites at external branches of phylogenetic tree. In comparison, IFEL is more intensive for investigating selected sites Commonamino acid changes estimated by 4 methods are indicated in bold type. dN/dS, non-synonymous rate/synonymous rate; CI, confidence interval. Cut-off p-value < 0.05. ○, None positively selected sites.
along internal branches. Furthermore, the two methods take both nonsynonymous and synonymous rate variations into account in estimation process and could be efficiently parallelized [21]. Meanwhile, MEME can estimate episodic selective pressure [22]. The ratio of substitution rates at non-synonymous and synonymous sites is a common criterion to evaluate the evolutionary selection pressures on a protein. In general, if the normalized rate of nonsynonymous substitutions (dN) is greater than the normalized rate of synonymous substitutions (dS), that is dN/dS>1, the analyzed protein is considered under positive selection pressure. Conversely, if dN/dS<1, the protein is considered under negative selection and evolves slowly [23,24]. Negative selection often removes deleterious mutations in biological functional protein to maintain the long-term stability during evolution [25]. The mean dN/dS ratio of 0.07 and a large amount of negative selection sites demonstrated that the NP gene has undergone purifying selection, during which disadvantageous nonsynonymous mutations in NDV have been eliminated from the population (Tables 1 and 2). Miller also reported that value corresponding to the NP gene was 0.074. As a rule, the negative selection pressure on protein may prevent the functional deterioration of viral NP. The NP gene displayed strongly negative selection probably due to NP are involved in NDV replication and this biological function needs to be relatively stable. This phenomenon was also observed in conserved regions of NDV M protein [6]. Although NPs of paramyxoviruses are relatively conserved, the monoclonal antibody reactivity could be detected antigenic differences among isolated NDV strains [8,26]. NP of negative-strand RNA virus possesses strong immunogenic in nature. It has been used for diagnosing NDV, measles, rabies, and vesicular stomatitis virus as the antigen for several years [7]. Generally, a strong antigenicity protein of virus may undergo high levels of selection pressure, which can lead to an abundant of positive selection sites appear in antigenic viral protein [27]. In the present research, three positive selection sites were predicted in the deduced NP protein of NDV. Importantly, the amino Cut-off p-value < 0.05. acid at position 370 was located in the known effective epitope region of the NP (Figure 4). Thus the amino acid substitution at position 370 probably influence the host immunity efficacy against NDV. This phenomenon is in line with other viral systems that the special region of a protein may show higher rate of nonsynonymous substitutions than that of synonymous substitutions regardless of the general dN/dS ratio. Although few positively selected sites were identified for the NDV NP gene and insufficient evidence indicated the amino acid substitutions on effective epitope region substantially improve fitness of NDV in vivo, these analysis results may explain the reason of immunity failure of NDV vaccines. Recombination detection showed no evidence of recombination among present NDV NP strains. Consequently, these nucleotide sequences were used for generating phylogenetic trees to conduct population dynamic analysis. Our demographic analysis showed that NDV grew at a rate of 0.114 year −1 , and the R 0 ranged from 1.003 to more than 1.006 due to different durations of plausible poultry carrier infectiousness of NDV. This result indicates that NDV might have caused epidemics since the time of their radiation. Furthermore, BSP result suggested that the effective population size of NDV infection number kept relatively stable until the early 1980s but underwent biphasic exponential growth from the 1980s to 2000s ( Figure 5). This observation suggested that the number of genetically distinct but related NDV strains was expanding during these time. In fact, a relationship is found between epidemics of NDV and the effective population size. Many studies have reported NDV outbreaks in Brazil, Kazakhstan, and Pakistan since 1998 [28][29][30]. Moreover, the NDV that caused epizootics in many countries including Germany, the Netherlands, Belgium, Italy, and Spain in the late 1980s has been classified into a novel genotype, namely genotype VII. Thus, genotype VII NDV possibly originated from the Far East [31].
In conclusion, the present NDV including at least 11 genotypes formed three different linages due to rapid evolution and wide genetic divergence. The NP is considered relatively conserved among isolates of different virus genotypes. Although most positions were under purifying selection, a few positions located in specific regions were subject to positive selection. Furthermore, NDV has undergone an exponential population expansion, which could cause outbreaks in the host. These findings help us comprehensively understand the molecular evolution of NDV and prevent the spread of NDV by enhanced active surveillance.

Strains and alignments
In this study, sequences of more than 98 % homologies were removed from present dataset. A total of 170 strains were collected.169 full-length sequences coding for the NP of different genotype NDV strains were collected. Furthermore, APMV-2 strain (England 7702, GenBank accession No. HM159993) was added to the dataset as the outgroup. All collected sequences are available in GenBank. Clustal W is used to align all sequences [32]. The nucleotide sequences correspond to positions 1-1752 of the NP gene in NDV strain (Pakistan/2010/Uppsala, GenBank accession No. JN682209). Supplementary Table 1 showed the detailed information of present NDV strains.

Phylogenetic analysis by the bayesian MCMC method
The Bayesian MCMC method in BEAST software v1.7.5 was used to analyze the time-scaled phylogenetic tree and evolutionary rate of NDV [33,34]. The suitable evolutionary model that corresponds with our data was chosen by JModelTest software v2.1.7 [35]. As a result, HKY model was previously selected as the best nucleotide substitution model [36]. In our study, we tested four clock models (random, exponential, lognormal, and strict models) and two demographic models (exponential growth and constant size). Akaike's information criterion (AICM) were calculated to compare the clock and demographic models through MCMC [37] using Tracer software v1.6 (http://beast.community/tracer). After the comparison, we chose the model with the lowest AICM value (Supplementary Table 2). Consequently, the sequences were analyzed using an HKY model under an uncorrelated lognormal relaxed clock and an exponential growth model. R 0 was calculated through the equation R 0 = rD + 1, where r is the exponential growth rate and D is the average durations of rational carrier infectiousness [38]. The D used for present NDV R 0 calculation was 0.3-0.5 month. Then, the relation λ= ln(2)/r was used to calculate the epidemic doubling time [39]. The MCMC chains were run for 10,000,000 steps and sampled every 10,000 steps. The effective sample size (ESS) after a 10% burn-in was considered to assess the convergence by Tracer software v1.6. Only parameters with ESS values of >200 were accepted. LogCombiner (available in BEAST package) combined different BEAST runs from each file after a 10% burn-in. The estimation uncertainty was indicated by showing 95% highest posterior density (HPD) intervals. Tree Annotator software v 1.7.4 generated tree with the maximum clade credibility after a 10% burn-in and FigTree software v1.3.1 was used to view the phylogenetic tree. Moreover, a BSP analysis was proceeded to estimate changes of effective population size of NDV through the time using BEAST software.

P-distance values calculation
In order to analyze the phylogenetic relationships and genotypes definition of NDV, the frequency/p-distance values of the intergenotypes and outgroup were calculated. In total of 170 strains were analyzed using MEGA 6.0 software [40].

Positive and negative selection sites estimation
To comprehensively analyze the selection pressure on the NP of NDV, we calculated nonsynonymous (dN) and synonymous (dS) substitution rates at every codon by four methods: FEL, SLAC, MEME, and IFEL. We employed more than one method for accurate calculations through Datamonkey [41]. Positive selection sites (dN>dS) and negative selection sites (dN<dS) were ascertained by the p-value of < 0.05. The tertiary structure model of the NDV NP was built by SWISS MODEL [42], and the positively selected sites were showed on a tertiary structure. Notable, the Genetic Algorithm Recombination Detection method [43] was employed to detect recombination of used NDV strains through Datamonkey. In consequence, the dataset included nonrecombinant sequences.

Author contributions
J.Z.L. and Y.X.L. designed the study. W.T.F. and Y.L.X, X.N.Z. contributed analytic tools. W.T.F., Y.R.Z. and P.C. analyzed data. J.Z.L. and W.T.F., Y.L.X wrote the paper. Z.Q.C. and P.Z. revised the manuscript. All authors reviewed the results and approved the final version of the manuscript.

FUNDING
The project was supported by the National Key R&D Program (2016YFD0501208, 2016YFD0501007), Shandong Natural Science Foundation of China