Genetic diversity and molecular evolution of Ornithogalum mosaic virus based on the coat protein gene sequence

Ornithogalum mosaic virus (OrMV) has a wide host range and affects the production of a variety of ornamentals. In this study, the coat protein (CP) gene of OrMVwas used to investigate the molecular mechanisms underlying the evolution of this virus. The 36 OrMV isolates fell into two groups which have significant subpopulation differentiation with an FST value of 0.470. One isolate was identified as a recombinant and the other 35 recombination-free isolates could be divided into two major clades under different evolutionary constraints with dN/dS values of 0.055 and 0.028, respectively, indicating a role of purifying selection in the differentiation of OrMV. In addition, the results from analysis of molecular variance (AMOVA) indicated that the effect of host species on the genetic divergence of OrMV is greater than that of geography. Furthermore, OrMV isolates from the genera Ornithogalum, Lachenalia and Diuri tended to group together, indicating that OrMV diversification was maintained, in part, by host-driven adaptation.


INTRODUCTION
RNA viruses, many of which threaten human health or agricultural safety, form measurably evolving populations as a result of their high mutation rate and short generation times. Molecular evolution studies are useful in understanding the molecular bases of the adaptation, geographical expansion, and process of emergence of RNA viruses, which are key to the design of management measures (Lauring & Andino, 2010;Moya et al., 2000).
Ornithogalum mosaic virus (OrMV) is one of the most important viral pathogens of floricultural crops, causing severe leaf symptoms as well as flower deformation of the affected plants (Burger, Brand & Rybicki, 1990). Under natural conditions, OrMV has a wide host range, infecting plants of the genera Gladiolus, Iris, Ornithogalum and Diuris (Burger & Von Wechmar, 1989;Kaur et al., 2011;Wylie et al., 2013). In addition, OrMV can infect saffron corms (Crocus sativus) as described in our previous report (Liao et al., 2017).
OrMV is a member of the genus Potyvirus, which includes more than 100 viral species. The biology and molecular biology of OrMV have not been studied intensively. However, it is known that, similar to some well-characterized potyviruses, OrMV has a single-stranded, positive-sense RNA genome, encoding a single polyprotein which is cleaved into 10 mature proteins by three virus-specific proteases (King et al., 2011). Additionally, a short polypeptide (PIPO) is expressed by a +2 nucleotide frame shifting from the P3 crison, resulting in a P3-PIPO fusion product dedicated to movement of the virus in planta (Chung et al., 2008;Wei et al., 2010). Although only five complete genomes of OrMV have been determined, CP sequences of 36 OrMV isolates are publically available from GenBank. In this study, CP sequences were used to investigate the genetic diversity of OrMV and investigate the evolutionary forces responsible for the diversity. Our results will improve understanding of viral genetic variation and adaptive evolution, which may be helpful in developing sustainable management strategies for control of OrMV.

Virus isolates and sequence alignment
CP gene sequences with known geographic locations and host origins were obtained from GenBank database using its Batch Entrez facility (Table S1). Multiple sequence alignments were performed with MUSCLE codon algorithm (Edgar, 2004) implemented in MEGA5 (Tamura et al., 2011).

Phylogenetic network and recombination analyses
Two different approaches were used to investigate the occurrence of recombination events in CP sequences. First, the aligned CP gene sequences of 36 OrMV isolates were analysed using the Neighbor-Net method in SplitsTree 4.13.1 (Huson, 1998). In contrast to traditional bifurcating phylogenetic trees, SplitsTree constructs phylogenetic networks which allows for reticulation in the evolutionary relationships among taxa. Such reticulation could highlight the presence of recombination and if present, we followed up with a second analysis.
Second, sequences involved in the recombination and breakpoints were determined by using RDP4 suite (Martin et al., 2015), which incorporates the algorithms RDP, GENECONV, BOOTSCAN, MAXCHI, CHIMAERA, SISCAN, and 3SEQ. For each putative recombination breakpoint, a Bonferroni correction P-value (with a cutoff point at P < 0.01) was calculated. All isolates recognized were considered probable recombinants, supported by at least four different algorithms in RDP4 with an associated P-value of < 1.0 × 10 −4 . Simultaneously, the recombinants were further confirmed by GARD (Kosakovsky Pond et al., 2006) implemented in the Datamonkey web interface (Delport et al., 2010). The reliability of recombination breakpoints was evaluated using a KH test. To avoid false identification, only recombination breakpoints supported both by RDP4 and GARD were considered to be recombinants.

Genetic diversity and population subdivision
To investigate the genetic variation of the CP gene of OrMV, haplotype diversity (H d ) and nucleotide diversity (π ) were calculated using DnaSP 5.0 (Librado & Rozas, 2009). Hudson's estimates of K ST and S nn were used to determine the presence of subdivision in populations (Hudson, 2000;Hudson, Boos & Kaplan, 1992). Genetic differentiation among populations was also evaluated by F ST using Arlequin 3.5 (Excoffier & Lischer, 2010). The ranges of differentiation and corresponding F ST values were as follows: a moderate degree of differentiation, 0.05 to 0.15; a large degree, 0.15 to 0.25; and a substantial degree, >0.25 (Balloux & Lugon-Moulin, 2002). In addition, analysis of molecular variance (AMOVA) was conducted using Arlequin 3.5 (Excoffier & Lischer, 2010), with counties and host species as grouping factors to test for the effects of country and host on the genetic diversity of OrMV. The statistical significance of ϕ-statistics was tested based on 1023 permutations (default).

Phylogenetic analysis
After the potential recombinants were excluded, the phylogenetic relationships were reconstructed using the Maximum Likelihood (ML) approach implemented in MEGA5 (Tamura et al., 2011). For the ML analysis, substitution saturation was measured by Xia's test implemented in DAMBE 5.3.8 (Xia, 2013). The best-fitting of nucleotide substitution model was determined using MrModeltest (Nylander, 2008). ML analysis was performed under the GTR+Γ 4 model using the corrected Akaike Information Criterion and the robustness of the ML tree topology was assessed with 1,000 bootstrap replicates.

Bayesian tip-association significance testing for the geographic and host species
To determine the potential geographic and host-origin effects on OrMV CP diversification, Bayesian Tip-association significance (BaTS) testing was performed in BEAST 2.4.6 (Bouckaert et al., 2014). Three statistics of phylogeny-trait association were computed: association index (AI ), parsimony score (PS) and maximum monophyletic clade (MC) calculated from the posterior set of trees generated by BEAST 2.4.6 (Bouckaert et al., 2014). The statistical significance against the null distribution of trees was assessed by comparing it with the randomized trees generated from 10,000 reshufflings of tip characters. All P-values <0.05 from the three statistics, with low AI and PS scores and a high MC score, were considered significant, indicating a strong phylogeny-trait association.

Test for natural selection
Two different types of analyses were performed to test for natural selection using the CODEML algorithm (Yang, 2007) implemented in EasyCodeML (https://www.github. io/bioeasy/EasyCodeml). Firstly, the branch model was used to identify CP genes with a null model assuming that the entire tree has been evolving at the same rate (one-ratio model) and an alternative model allowing foreground branch to evolve under a different rate (two-ratio model). Multiple testing was corrected by applying the false discovery rate (FDR) method (Storey & Tibshirani, 2003) implemented in R. The CP gene of OrMV was considered as evolving with a significantly faster rate in foreground branch if the FDR-adjusted P-value less than 0.05 and a higher ω values (ω = dN/dS, synonymous to non-synonymous substitution rates) in the foreground branch than the background branches. Secondly, the site model was used to identify nucleotide sites in the CP-coding region that were likely to be involved in OrMV evolution. For the site model, six codon substitution models described as M0, M1a, M2a, M3, M7, and M8, were investigated. The M1a model assumes two categories of sites (ω 0 < 1, ω 1 = 1), whereas the M2a model adds a third set of sites (ω 2 > 1) to the M1a model. The M3 model, with three categories of sites, allows ω to vary among sites by defining a set number of discrete site categories, each with its own ω value. The M7 model partitions all the sites into ten different categories with ω < 1 and fits a beta distribution to ω. In the M8 model, an 11th category is added to the M7 model allowing ω values >1. For each nested model, the likelihood ratio test (LRT) was conducted by comparing twice the difference in log-likelihood values (2∆LnL) against a x 2 -distribution, with degrees of freedom equal to the difference in the number of parameters between models. Only a P-value of 0.05 or less in the all LRTs was considered to be significant. Additionally, pairwise dN/dS ratios were estimated using the yn00 program of PAML (Yang & Nielsen, 2000). Isolates that dS >2× the mean dS estimated from all isolates, as well as isolate pairs for which dS estimates approached 0, were removed as advised by Finseth, Bondra & Harrison (2014).

Recombination analyses
Recombination is an important source of genetic variability in viruses. To investigate the role of recombination in the evolution of OrMV, the split-decomposition network analysis with the CP gene sequences of 36 OrMV isolates was performed. A phylogenetic network showing reticulation was obtained (Fig. 1), indicating conflicting phylogenetic signals that are possibly attributed to recombination among viral genomes. The sequences were then checked for recombination using the RDP4 package (Martin et al., 2015). Four unique recombination events were detected by at least three independent methods implemented in the RDP suite (Table S2). However, only one isolate, Glad-8, was identified as a recombinant, with a breakpoint in the nucleotide 256, as confirmed by GARD analysis with a high level of confidence (both LHS and RHS p-values <0.01). The recombinant was excluded from the phylogenetic and selection analyses below.

Genetic diversity and population subdivision
OrMV isolates could be divided into two subgroups reflecting two different origins of OrMV or representing two divergent OrMV populations (Fig. 1). The haplotype diversity for both subgroup 1 and subgroup 2 was 1.000, whereas the nucleotide diversity for these two subgroups was 0.106 and 0.017, respectively. Haplotype diversity and nucleotide diversity for all OrMV isolates were 1.000 and 0.156, respectively, indicating a high genetic diversity in OrMV populations and among subpopulations (Table S3A)

Notes.
Significance thresholds: *** P < 0.001. tests of population differentiation were significant (Table S3B), indicating a great genetic differentiation between clade groups of OrMV.
To evaluate the role of geography and host specificity in shaping the population structure of OrMV, geographic regions and host genus were respectively used as a grouping factor to analyze the isolates of OrMV. When geographic regions were used as grouping factors, AMOVA tests revealed significant variation among geographic groups, making up 15.85% of the total variation, (Φ ST = 0.159, P-value <0.001) ( Table 1). Similar results were obtained when host species was used as a grouping factor. Significant subpopulation differentiation was observed among groups (Φ ST = 0.297, P-value <0.001), which accounted for nearly 30% of the total variation of OrMV. Taken together, it seems that the effect of host species on the genetic variance of OrMV is greater than that of geography although both host species and geographic effects contributed to the genetic variance of OrMV.

Phylogenetic analyses and BaTS testing
The ML phylogenetic trees based on the CP gene sequences showed that the 35 recombination-free OrMV isolates were grouped into two distinct clades with high bootstrap supports ( Fig. 2A), consistent with the results of phylogenetic network analysis. With the exception of an isolate from Australia, no significant signal for geographic structure in the diversity of the CP gene was observed when the OrMV isolates were grouped by their geographic origins (P MC > 0.05, Table 2). However, when the OrMV isolates were grouped by their host origins, a signal was found with more host-specific clustering than expected by chance, particularly for Ornithogalum, Lachenalia and Diuris (P MC < 0.05, Table 2). The BaTS results indicated that OrMV CP diversification could be maintained in part by host-driven adaptation.

Selection pressures
To investigate the differences in selective pressures behind the two clades (clade A and B) of OrMV, a two-ratio branch model test was performed using PAML, in which different ω values were assigned to the two clades. A LRT indicated that the one-ratio model should be rejected (p < 0.05, Table S4A); hence, selective pressures differed between the two clades. The mean ω values for clades A and B were 0.055 and 0.028 ( Fig. 2A, Table S4A), respectively, indicating that clade B was subjected to stronger purifying selection than clade A. Furthermore, the results from pair-wise analyses showed that there are differences between the distribution of dN/dS values between clade A and clade B (Fig. 2B). In the site model, there were no codons identified as being under positive selection and purifying selection was detected at the majority of polymorphic sites in the CP gene (Table S4B). Sliding-window analysis for sites under purifying selection was plotted in Fig. S1. Although the dN/dS values were below 1.00 for both clades, the dN/dS values of clade A were generally higher than those of clade B, indicating the CP gene in clade B had a stronger purifying selection pressure than those in clade A, in agreement with previous results from the branch model analysis. MC (Crocosmia) 1 n/a n/a n/a MC (Crocus) 1 n/a n/a n/a Notes. AI , association index; PS, parsimony score; MC, maximum monophyletic clade; HPD, highest probability density interval; n/a, no data available because of insufficient sample size (n < 2). Significance thresholds: *0.01 < p < .05. ** 0.001 < p < 0.01. *** p < 0.001.

DISCUSSION
Recombination plays an important role in the evolutionary history of plant viruses, including potyviruses (Moreno et al., 2004) (Gao et al., 2016aGao et al., 2017;Ohshima et al., 2007), luteoviruses (Pagán & Holmes, 2010) and cucumoviruses (Nouri et al., 2014). was maintained partially by host-driven adaptation. However, isolates included in this analysis were relatively limited both in geography and host species. Further studies with larger, multiple-location and multiple-host-species sampling are needed to confirm our results and generalize the findings.