Phylodynamics and Codon Usage Pattern Analysis of Broad Bean Wilt Virus 2

Broad bean wilt virus 2 (BBWV-2), which belongs to the genus Fabavirus of the family Secoviridae, is an important pathogen that causes damage to broad bean, pepper, yam, spinach and other economically important ornamental and horticultural crops worldwide. Previously, only limited reports have shown the genetic variation of BBWV2. Meanwhile, the detailed evolutionary changes, synonymous codon usage bias and host adaptation of this virus are largely unclear. Here, we performed comprehensive analyses of the phylodynamics, reassortment, composition bias and codon usage pattern of BBWV2 using forty-two complete genome sequences of BBWV-2 isolates together with two other full-length RNA1 sequences and six full-length RNA2 sequences. Both recombination and reassortment had a significant influence on the genomic evolution of BBWV2. Through phylogenetic analysis we detected three and four lineages based on the ORF1 and ORF2 nonrecombinant sequences, respectively. The evolutionary rates of the two BBWV2 ORF coding sequences were 8.895 × 10−4 and 4.560 × 10−4 subs/site/year, respectively. We found a relatively conserved and stable genomic composition with a lower codon usage choice in the two BBWV2 protein coding sequences. ENC-plot and neutrality plot analyses showed that natural selection is the key factor shaping the codon usage pattern of BBWV2. Strong correlations between BBWV2 and broad bean and pepper were observed from similarity index (SiD), codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analyses. Our study is the first to evaluate the phylodynamics, codon usage patterns and adaptive evolution of a fabavirus, and our results may be useful for the understanding of the origin of this virus.


Introduction
Broad bean wilt virus 2 (BBWV-2) belongs to the genus Fabavirus of the Comovirinae subfamily, Secoviridae family. BBWV-2 is an important pathogen causing extensive damage in broad bean, pepper, yam, spinach and other economically important horticultural and ornamental crops worldwide [1][2][3][4][5][6], and it is transmitted by aphids in a nonpersistent manner. The virions of BBWV-2 contain two proteins (large and small coat proteins) which form an icosahedral particle. BBWV-2 comprises bipartite positive-sense single-stranded RNA molecules with a genome size of approximately 6 and 4 kb. RNA1 encodes a single large polyprotein with functional proteins involved in genome replication and expression that are produced by proteolytic cleavage. Similarly, RNA2 also encodes a single large polyprotein, which is proteolytically processed into three functional proteins, including a movement protein and two coat proteins.
Sixty-one triplet codons encode all 20 amino acids, and thus several codons encode the same amino acid. This phenomenon is termed synonymous codons [7,8]. Generally, the unequal preference for specific codons over other synonymous codons by various organisms or even in different gene groups of the same genome creates a bias in codon

Virus Isolates
Forty-four full-length RNA1 and 47 full-length RNA2 sequences of BBWV-2 were retrieved from GenBank. The details of those sequences, such as host origins, collection time, locations and geographical, are shown in Supplementary Materials Table S1.

Recombination Analysis
Forty-four full-length RNA1 and 47 full-length RNA2 sequences of BBWV-2 were aligned using CLUSTAL X2 [32]. TRANSALIGN software (supplied kindly by Prof. Georg Weiller, Australian National University, Canberra, Australia) was used to support a degapped alignment of the encoded amino acids. Putative recombination events of the aligned BBWV-2 sequences were identified by BOOTSCAN, CHIMAERA, GENECONV, MAXCHI, RDP, SISCAN and 3SEQ programs [33][34][35][36][37][38][39] in the RDP4 software package [40]. The phylogenetic approach was used to verify the parent/donor assignments in the RDP4 package. And these analyses were calculated by different detection programs with default settings. The putative recombinants were supported by at least three different methods in the RDP4 package with an associated p-value of <1.0 × 10 −6 .

Phylogenetic and Evolution Dynamic Analysis
The phylogenetic relationships of the two ORF sequences of BBWV-2 were determined using the maximum-likelihood (ML) method in PhyML v3.0 [41] and the neighbour-joining (NJ) method implemented in MEGA vX [42]. The best-fitting models of the two datasets for the ML tree were selected by jModeltest v0.1.1 [43] according to the Akaike Information Criterion score. GTR with a proportion of invariable sites and a gamma distribution (GTR+I+г4) provided the best fit for both ORF coding sequences. For the ML analysis, branch support was calculated by a bootstrap analysis based on 1000 pseudoreplicates; meanwhile, for the NJ analysis, Kimura's two-parameter [44] option was used to evaluate 1000 bootstrap replications. The lineages were defined based on bootstrap values. When the node with high bootstrap values on both ML and NJ tree, we consider it as a defined lineage. The ML or NJ trees were displayed with TreeView [45]. Generally, two segments Viruses 2021, 13,198 3 of 21 of the same BBWV-2 isolate should be divided into same lineage based on ORF1 and ORF2 trees. We considered that reassortment occurred when the two segments of BBWV-2 isolate divided into different lineages based on ORF1 and ORF2 trees. The pairwise nucleotide sequence identity scores were represented as a distribution plot using SDT version 1.2 software (available from http://web.cbio.uct.ac.za/SDT) [46].
All nonrecombinant sequences obtained were used to estimate the evolutionary rate and timescale by BEAST v1.10.4 software [47]. Bayes factors were used to select the bestfitting molecular-clock models. The strict molecular clocks, uncorrelated lognormal, and uncorrelated exponential models [48] were also compared with the exponential growth, logistic growth, constant population size, Bayesian skyline plot, and expansion growth demographic models. A total of 6 × 10 8 -step MCMC chains were explored every 10 4 steps, and the first 10% of samples were removed as burn-in. Tracer v1.7 [49] was used to check the estimation of the relevant evolutionary parameters. To check the temporal signal, ten data-randomized replicates of the data were produced. The mean estimate from the original data out of 95% CIs of the date-randomized replicates is considered the criterion for clear temporal structure [50,51].

Nucleotide Composition Analysis
In total, five nonbiased codons, including three termination codons (UAA, UGA, and UAG), AUG (encoding only Met), and UGG (encoding only Trp), were removed, and the component parameters of both BBWV2 ORF sequences were calculated. The total content of AU and GC and the entire nucleotide composition (A, U, G and C %) of the two ORF data sets were calculated by BioEdit [52]. The nucleotide composition at the third codon position of the two BBWV-2 ORF sequences (A3, U3, G3 and C3%) were determined using the CodonW 1.4.2 package. EMBOSS explorer (http://www.bioinformatics.nl/embossexplorer/) was used to calculate the GC content at the 1st, 2nd and 3rd codon positions (GC1, GC2, GC3) and GC12 (the mean of GC1 and GC2).

Effective Number of Codon (ENC) Analysis
ENC values ranging from 20 (only one synonymous codon is used, an extreme codon usage bias) to 61 (the synonymous codons are used equally, no bias), indicating the degree of codon usage bias [53], were calculated by CodonW v1.4.2 software. ENC values were estimated as follows: where F k (k = 2, 3, 4, 6) is the average values for F k , while k is the k-fold degenerate amino acids. Here, F k is calculated as follows: where n is the total occurrence number of the codon for the corresponding amino acid; meanwhile, where n i represents the total number of the i-th codon for that amino acid. Here, the ENC was assessed to compute the absolute codon usage bias of both BBWV-2 ORF sequences regardless of the number of amino acids and the gene lengths. Generally, ENC values ≤ 35 indicate strong codon bias. It is accepted that the smaller the ENC value, the stronger the codon preference.

ENC-Plot Analysis
To investigate the role of mutation pressure in codon usage bias, a ENC-plot (ENC value in the ordinate against GC3s value in the abscissa) analysis was used. When the Viruses 2021, 13,198 4 of 21 codon usage bias is only determined by the mutation pressure factor, the points will lie on or around the standard curve. Otherwise, other factors also contribute, for example, natural selection. The expected ENC was calculated using the following formula: where s means the composition of GC3s.

Relative Synonymous Codon Usage (RSCU) Analysis
The ratio between the observed usage frequency and the expected usage frequency is termed as the RSCU value of a codon [54]. RSCU values were calculated as follows: where RSCU ij represents the value of the i-th codon for the j-th amino acid, and g ij means the observed number of i-th codons for the j-th amino acid which has "n i " kinds of synonymous codons. And the RSCU values of 1 indicate no bias for the codon, whereas codons with RSCU values more than 1.6 and smaller than 0.6 are considered to be "overrepresented" and "under-represented", respectively. The RSCU values of the two BBWV2 ORF sequences were calculated using MEGA X software.

Principal Component (PCA) Analysis
A multivariate statistical method (principal component analysis) was used to identify the correlations between samples and variables. After removing the codons UAA, UAG, UGA, UGG, and AUG, each strain of two ORF data sets was represented as a 59-dimensional vector, where each dimension corresponds to each sense codon's RSCU value [21,55]. A PCA analysis was performed using Origin 8.0 (OriginLab, Northampton, MA, USA).

Parity Rule 2 Analysis (PR2)
A PR2 plot analysis was performed to calculate the effects of mutation and natural selection on the codon usage of the two BBWV2 ORF sequences. The PR2 plot graphs A3/(A3 + U3) in the ordinate against G3/(G3 + C3) in the abscissa [21,55]. The centre of the plot (the slope is 0.5) indicates no bias between natural selection and mutation pressure.

Neutrality Analysis
The influence of mutation and natural selection bias on codon usage were analysed using a neutrality plot. Neutrality plot graphs GC3 in the abscissa and GC12 in the ordinate. The mutational force was indicated using the slope of the regression line which plotted between the GC12 and GC3 contents [21,55]. A slope of the regression lines on or around 1.0 indicates no or weak selection pressure. However, the codon usage bias was clearly influenced by natural selection when the regression curves deviated from the diagonal line.

Codon Adaptation Index (CAI) Analysis
The CAI value, which ranged from 0 to 1, is calculated by the CAIcal SERVER (http://genomes.urv.cat/CAIcal/RCDI/). It is used to predict the adaptation of the two ORFs of BBWV2 to their host. All above BBWV2 isolates were compared to each host. In general, the higher the CAIs, the stronger the adaptability to the host.

Similarity Index (SiD) Analysis
The SiD analysis was employed to evaluate the influence of the codon usage bias of hosts on the two BBWV2 ORFs. The SiD values was estimated as follows: where a i is the RSCU values of 59 synonymous codons of the BBWV2 coding sequences, and b i represents the identical codons' RSCU values of the host. SiD [D(A, B)] value, which ranged from 0 to 1.0, represents the potential effect of the entire codon usage of hosts on the BBWV-2 genes. Normally, higher SiD values indicate that the viruses' host plays a significant role in its codon usage.

Gravy and Aroma Statistics
A Gravy value ranging from −2 to 2 indicates the effect of protein hydrophobicity on codon usage bias. It is determined by CodonW (v1.4.2). Meanwhile, aroma value represents the influence of aromatic hydrocarbon proteins on codon usage bias.

Statistical Analysis
The relationships between the GC3s, GC ENC, Aroma, and Gravy and the first two principal component axes were measured using a Spearman's rank correlation analysis. A p value < 0.01 (**) shows an extremely significant relationship while 0.01 < p < 0.05 (*) represents a significant relationship. All of the above statistical analyses were estimated by Origin 8.0.

Recombination and Phylogenetic Analysis
Recombination can influence the topology of a phylogenetic tree and overall codon usage patterns at either the genome or gene level [56,57]. Thus, we first detected the presence of potential recombinants in the 44 full-length RNA1 and 47 full-length RNA2 sequences. Two and six clear recombinants from 44 full-length RNA1 and 47 full-length RNA2 sequences were observed (Table 1), respectively, and these recombinants were excluded from further analysis. The nonrecombinant BBWV-2 coding sequences mainly isolated from broad bean (RNA1 n = 6, RNA2 n = 5), pepper (RNA1 n = 18, RNA2 n = 18), spinach (RNA1 n = 3, RNA2 n = 3), and yam (RNA1 n = 4, RNA2 n = 4) were used in the following phylogenetic and codon usage analyses.
The phylogenetic analyses were conducted using ML methods based on the two ORFs' nonrecombinant sequences (Figure 1), respectively. Three and four lineages were formed based on the ORF1 and ORF2 coding sequences, respectively ( Figure 1). Four isolates in lineage I of the ORF1 ML tree were clustered into lineage IV in the ML tree of ORF2 ( Figure 1). These lineages did not reflect clear host and geographical origins. The ML trees of the ORF 1 and 2 coding sequences were compared using PATRISTIC software. The distance plots of the ORF1 distances against the ML trees of the ORF2 genes showed distinct lineages ( Figure 2). Similar, our time-scaled maximum clade credibility (MCC) tree also indicated three and four lineages based on the ORF1 and ORF2 coding sequences, respectively (Supplementary Materials Figure S1). The pairwise identity of ORF1 and ORF2 were approximately 77.59-100% and 78.17-100%, respectively. The phylogenetic analyses were conducted using ML methods based on the two ORFs' nonrecombinant sequences (Figure 1), respectively. Three and four lineages were formed based on the ORF1 and ORF2 coding sequences, respectively ( Figure 1). Four isolates in lineage I of the ORF1 ML tree were clustered into lineage IV in the ML tree of ORF2 ( Figure 1). These lineages did not reflect clear host and geographical origins. The ML trees of the ORF 1 and 2 coding sequences were compared using PATRISTIC software. The distance plots of the ORF1 distances against the ML trees of the ORF2 genes showed distinct lineages ( Figure 2). Similar, our time-scaled maximum clade credibility (MCC) tree also indicated three and four lineages based on the ORF1 and ORF2 coding sequences, respectively (Supplementary Materials Figure S1). The pairwise identity of ORF1 and ORF2 were approximately 77.59-100% and 78.17-100%, respectively.

Reassortment Analysis
Generally, reassortment can influence the rapid genomic and phenotypic changes for viruses with segmented genomes by coinfecting different viral strains exchanges entire segments. Our ML and time-scaled phylogenies distinguished three phylogenetic groups for Segment 1 (ORF1) and four groups for Segment 2 (ORF2) with high bootstrap or posterior support (Figure 1 and Supplementary Materials Figure S1). Ten isolates appear to

Reassortment Analysis
Generally, reassortment can influence the rapid genomic and phenotypic changes for viruses with segmented genomes by coinfecting different viral strains exchanges entire segments. Our ML and time-scaled phylogenies distinguished three phylogenetic groups for Segment 1 (ORF1) and four groups for Segment 2 (ORF2) with high bootstrap or posterior support (Figure 1 and Supplementary Materials Figure S1). Ten isolates appear to be reassorted (23.8%) ( Figure 1A) among the 42 full-length BBWV-2 isolates. For example, the AB1 isolate (MH447988) from South Korea was clustered into lineage I in the ORF1 ML tree, whereas it was divided into lineage III in the ORF2 ML tree ( Figure 1A). All four isolates in lineage IV in the ORF2 ML tree were clustered into Lineage I in the ORF1 ML tree ( Figure 1A). In addition, five isolates (the RP3, BB5, P3, and P2 isolates from South Korea and the Anhui isolate from China) in lineage III for the segment 1 tree were clustered into lineage I for the segment 2 tree ( Figure 1A). Furthermore, we performed reassortment analysis by RDP software using 38 BBWV2 artificially concatenated nonrecombinant sequences. This results also supported that ten isolates (IP, BB2, AB1, RP7, RP3, ME, BB5, P3, P2, and AH) appear to be reassorted (Supplementary Materials  Table S2).

Evolutionary Dynamic Analysis
A Bayesian phylogenetic method in BEAST v1.10.4 [47] was used here to estimate the evolutionary rates and node ages of BBWV-2 based on the two ORFs' nonrecombinant sequences. The expansion growth demographic model was supported as the best model for both ORF sequences based on a comparison of marginal likelihoods that were calculated using the harmonic-mean estimator in Tracer v 1.5.1. The relaxed-clock model provided a better fit than the strict-clock model, indicating the presence of rate variation among groups. Both ORF datasets of BBWV2 passed the date-randomization tests [50,51] Figure S1B) for the ORF1 and ORF2 coding sequences, with effective sample size (ESS) 957 and 561, respectively.

Nucleotide Bias Analysis
The nucleotide compositions of the two ORF sequences were calculated to assess the influence of compositional constraints on BBWV-2 s codon usage. Nucleotides U and A were most abundant with a mean composition of 28.42 ± 0.29% and 28.27 ± 0.20% (Supplementary Materials Table S3) for the ORF1 sequences, respectively, compared with G (26.21 ± 0.22%) and C (17.10 ± 0.25%). Similarly, nucleotides U (29.06 ± 0.34%) and A (27.94 ± 0.35%) were also most abundant in the ORF2 coding sequences, followed by G (24.28 ± 0.34%) and C (18.71 ± 0.29%) (Supplementary Materials Table S4). In terms of the third position's nucleotide composition of synonymous codons, U 3S, A 3S , G 3S and C 3S in both ORF sequences were consistent with the nucleotide composition despite the similar value of A 3S (33.13 ± 0.89%) and G 3S (33.64 ± 1.01%) in the ORF1 sequences (Supplementary Materials Tables S3 and S4). In addition, the composition of AU (56.69 ± 0.27% and 57.00 ± 0.42%) was also higher than that of GC (43.31 ± 0.27% and 43.00 ± 0.42%) in both ORF sequences of BBWV-2 (Supplementary Materials Tables S3 and S4). In all, these results suggest an AU-rich composition of BBWV-2 coding sequences.
An RSCU analysis was performed to estimate the codon usage pattern of the ORF1 and ORF2 coding sequences of BBWV-2 ( Table 2). For the ORF1 coding sequences, 14 of 18 preferred codons were A/U-ended (both A-and U-ended: 7) ( Table 2) (Table 2). These results suggest that A-and U-ended codons were preferred in the BBWV-2 coding sequences. Within these preferred codons, three had a RSCU value > 1.6, with the highest being UUG for both the ORF1 (2.82) and ORF2 (2.66) coding sequences of BBWV-2, indicating extreme over-presentation. The remaining preferred codons of RSCU values were all more than 0.6 and smaller than 1.6. Moreover, no optional synonymous codons were under-represented (RSCU < 0.6) from the BBWV-2 coding sequences. In addition, the RSCU values of the BBWV-2 coding sequences in terms of hosts also indicated that A/U-ended codons were more frequent than G/C-ended codons (  Tables S3 and S4). For ORF1, the highest mean ENC value of BBWV-2 was found in spinach while the lowest was in yam (Supplementary Materials Figure S3A). Meanwhile, the highest mean ENC value was found in pepper for ORF2 sequences (Supplementary Materials Figure S3B). The mean ENC values of the two ORF coding sequences were more than 35 (Supplementary Materials  Tables S3 and S4) (Supplementary Materials Figure S3), indicating a conserved and stable genomic composition in the BBWV2 coding sequences.

Trends in Codon Usage Variations
To investigate the synonymous codon usage variation in the two ORF sequences of BBWV-2, we performed a principal component analysis. The first four principal axes (axes 1-4) for the two ORF sequences of BBWV-2 accounted more than 50% ( Figure 3A,B). The results also showed that Axis 1 was the major factor affecting codon usage of the two ORF sequences for BBWV-2. We also explored the distribution of the two ORF sequences in different hosts according to the RSCU values on the first two axes. Overlapping among the different hosts for the two BBWV-2 coding sequences was observed from the PCA analysis, indicating distinct codon usage trends ( Figure 3C,D). However, only three yam and three spinach sequences were included in this analysis, so these results require further confirmation.  Tables S3 and S4). For ORF1, the highest mean ENC value of BBWV-2 was found in spinach while the lowest was in yam (Supplementary Materials Figure S3A). Meanwhile, the highest mean ENC value was found in pepper for ORF2 sequences (Supplementary Materials Figure S3B). The mean ENC values of the two ORF coding sequences were more than 35 (Supplementary Materials Tables S3 and S4) (Supplementary Materials Figure S3), indicating a conserved and stable genomic composition in the BBWV2 coding sequences.

Trends in Codon Usage Variations
To investigate the synonymous codon usage variation in the two ORF sequences of BBWV-2, we performed a principal component analysis. The first four principal axes (axes 1-4) for the two ORF sequences of BBWV-2 accounted more than 50% ( Figure 3A,B). The results also showed that Axis 1 was the major factor affecting codon usage of the two ORF sequences for BBWV-2. We also explored the distribution of the two ORF sequences in different hosts according to the RSCU values on the first two axes. Overlapping among the different hosts for the two BBWV-2 coding sequences was observed from the PCA analysis, indicating distinct codon usage trends ( Figure 3C,D). However, only three yam and three spinach sequences were included in this analysis, so these results require further confirmation.

ENC-Plot Analysis
A ENC-GC3s plot analysis was performed to assess the forces influencing the BBWV2 codon usage pattern. In general, points falling below the expected curve indicate that the codon usage is affected by natural selection rather than mutation. On the other hand, data points falling onto the expected curve which indicate mutational pressure. In the two plots of BBWV-2 sequences, all isolates regardless of host clustered together below the expected ENC curve (Figure 4), indicating that the influence of natural selection dominated that of mutation pressure.

ENC-Plot Analysis
A ENC-GC3s plot analysis was performed to assess the forces influencing the BBWV2 codon usage pattern. In general, points falling below the expected curve indicate that the codon usage is affected by natural selection rather than mutation. On the other hand, data points falling onto the expected curve which indicate mutational pressure. In the two plots of BBWV-2 sequences, all isolates regardless of host clustered together below the expected ENC curve (Figure 4), indicating that the influence of natural selection dominated that of mutation pressure.

Neutrality Plot
The influence of mutation and natural selection on BBWV-2 codon usage was assessed using a neutrality analysis ( Figure 5). Generally, nucleotide changes at the third position of the codon do not influence the changes in amino acids, so they are considered only a mutational force. Meanwhile, a nucleotide change causing a change in amino acid is considered a selection force. Among the two ORF sequences, a negative correlation was observed between the GC12 and GC3 values for BBWV-2 ( Figure 5). The slopes of the linear regression were −0.0415 and −0.0242 for the ORF1 and ORF2 coding sequences (Figure 5), respectively. These results indicate that mutation pressure accounted for 4.15% and 2.42% of the selection force for the ORF1 and ORF2 coding sequences, whereas natural selection accounted for 95.85% and 97.58%, respectively. Thus, the neutrality analysis indicated that natural selection dominated the forces shaping the codon usage pattern of BBWV-2.

Neutrality Plot
The influence of mutation and natural selection on BBWV-2 codon usage was assessed using a neutrality analysis ( Figure 5). Generally, nucleotide changes at the third position of the codon do not influence the changes in amino acids, so they are considered only a mutational force. Meanwhile, a nucleotide change causing a change in amino acid is considered a selection force. Among the two ORF sequences, a negative correlation was observed between the GC12 and GC3 values for BBWV-2 ( Figure 5). The slopes of the linear regression were −0.0415 and −0.0242 for the ORF1 and ORF2 coding sequences ( Figure 5), respectively. These results indicate that mutation pressure accounted for 4.15% and 2.42% of the selection force for the ORF1 and ORF2 coding sequences, whereas natural selection accounted for 95.85% and 97.58%, respectively. Thus, the neutrality analysis indicated that natural selection dominated the forces shaping the codon usage pattern of BBWV-2.

Parity Analysis
To assess whether highly biased genes exhibited biased codon selection i BBWV2 coding regions, we performed a PR2 bias plot analysis. Generally, the the plot (A = T and G = C) is the place where both coordinates are 0.5, and it place where no bias is present in the selection (substitution rates) or mutation Here, the nucleotides A and C are less commonly used than U and G in the two coding sequences (Figure 6). These give a novel perspective on the genetic dive BBWV-2 and explore factors shaping its codon usage patterns.

Parity Analysis
To assess whether highly biased genes exhibited biased codon selection in the two BBWV2 coding regions, we performed a PR2 bias plot analysis. Generally, the centre of the plot (A = T and G = C) is the place where both coordinates are 0.5, and it is also the place where no bias is present in the selection (substitution rates) or mutation force [12]. Here, the nucleotides A and C are less commonly used than U and G in the two BBWV-2 coding sequences ( Figure 6). These give a novel perspective on the genetic divergence of BBWV-2 and explore factors shaping its codon usage patterns.

Parity Analysis
To assess whether highly biased genes exhibited biased codon selection in the BBWV2 coding regions, we performed a PR2 bias plot analysis. Generally, the cent the plot (A = T and G = C) is the place where both coordinates are 0.5, and it is als place where no bias is present in the selection (substitution rates) or mutation force Here, the nucleotides A and C are less commonly used than U and G in the two BBW coding sequences ( Figure 6). These give a novel perspective on the genetic divergen BBWV-2 and explore factors shaping its codon usage patterns.  Furthermore, to calculate the influence of natural selection pressure on BBWV-2 codon usage bias, a linear regression analysis between the ARO and GRAVY values and GC3S, GC, ENC, and the first two principal axes values were also performed here. The correlation analysis based on ORF1 sequences showed that GRAVY is significantly negatively correlated with Axis 1. AROMO showed a significant positive correlation with ENC and a significant negative correlation with Axis 1 (Table 3). For the ORF2 sequences, our correlation analysis indicated that GRAVY is significantly negatively correlated with Axis 1 and Axis 2, and AROMO showed a significant negative correlation with ENC (Table 3). These results indicate that the general average aromaticity and hydropathicity are correlated to the codon usage variation in BBWV-2, indicating the influence of natural selection pressure on the BBWV-2 s codon usage pattern.

Codon Usage Adaptation in BBWV-2
CAI values were assessed to determine the adaptation and codon usage optimization of BBWV-2 to its hosts. Generally, sequences with higher CAI values are considered to be more adapted to hosts than sequences with low values. Here, the mean CAI values of the ORF1 sequences were 0.749, 0.771, 0.768 and 0.747 for broad bean, pepper, spinach and yam, respectively (Figure 7). The mean CAI values of the ORF2 coding sequences were 0.751, 0.770, 0.766 and 0.745 for the broad bean, pepper, spinach and yam, respectively ( Figure 7). These results indicate that the BBWV-2 genes have codon usage preferences that are closer to pepper than to other hosts. Then, to show the cumulative effects of codon biases on a single gene's expression, we also performed an RCDI analysis. The means of the RCDI values for both ORF sequences were highest for yam, followed by broad bean, spinach and pepper (Figure 7). These results also indicate that the BBWV-2 genes have codon usage preferences that are closer to pepper than to other hosts. Moreover, we performed an SiD analysis to understand how the codon usage patterns of broad bean, pepper, spinach and yam affect the BBWV-2 codon usage pattern ( Figure 8). Among the two ORF coding sequences, the highest SiD values were all observed in broad bean ( Figure 8) while the lowest values were found in pepper, although these SiD values for broad bean, pepper, spinach and yam were very low. These results indicate that during BBWV-2 evolution broad bean and pepper probably had a greater impact on the virus than spinach and yam.

Discussion
The evolutionary analysis and genetic variation of BBWV-2 from broad bean or pepper have been described based on analyses of partial or complete genome sequences [2,25]. In this study, our results provide significant insight into the evolutionary patterns of BBWV-2. For segmented viruses, rapid genomic changes were driven by both recombination and reassortment [58,59]. Previously, recombination was proven to be the factors shaping the evolution of BBWV-2 [2]. Here, our current findings suggest that genetic exchange by reassortment also had a significant influence on the genomic composition of BBWV-2.

Discussion
The evolutionary analysis and genetic variation of BBWV-2 from broad bean o per have been described based on analyses of partial or complete genome seq [2,25]. In this study, our results provide significant insight into the evolutionary p of BBWV-2. For segmented viruses, rapid genomic changes were driven by both re nation and reassortment [58,59]. Previously, recombination was proven to be the shaping the evolution of BBWV-2 [2]. Here, our current findings suggest that gen change by reassortment also had a significant influence on the genomic composi BBWV-2.

Discussion
The evolutionary analysis and genetic variation of BBWV-2 from broad bean or pepper have been described based on analyses of partial or complete genome sequences [2,25]. In this study, our results provide significant insight into the evolutionary patterns of BBWV-2. For segmented viruses, rapid genomic changes were driven by both recombination and reassortment [58,59]. Previously, recombination was proven to be the factors shaping the evolution of BBWV-2 [2]. Here, our current findings suggest that genetic exchange by reassortment also had a significant influence on the genomic composition of BBWV-2.
A phylogenetic analysis of BBWV-2 showed six divergent evolutionary lineages based on partial genomic sequences [2]. In the present study, our phylogenetic analysis based on ML and MCC found three and four lineages based on the ORF1 and ORF2 protein coding sequences, respectively. The evolutionary rates of the two BBWV-2 ORF coding sequences were 8.895 × 10 −4 and 4.560 × 10 −4 subs/site/year, respectively, similar to tobacco mosaic virus (TMV) [60] but slightly slower than the previously reported plant RNA viruses such as PVM [61], turnip mosaic virus [62], odontoglossum ringspot virus (ORSV) [63], and potato virus Y [64].
The codon usage pattern has a significant influence on virus evolution, such as adaption, evolution, evasion from the host's immune system, and survival [55,[65][66][67][68]. Presently, only several reports have described the influence of codon usage in the evolution of plant viruses. Here, we firstly assessed the codon usage pattern and composition of BBWV2 based on the complete genome. Normally, AU-rich genomes tend to contain codons ending with A and U, while genomes with a GC-rich composition tend to contain codons ending with G and C. In this study, our nucleotide composition results show that codons ended with A and U are more frequent in BBWV-2 coding sequences. Generally, the preferred codons have been mostly determined by compositional constraints (A and U in this case in the two BBWV2 ORF coding regions), which also supports the presence of mutation pressure. And, the RSCU analysis also indicated that A/U-ended codons were more frequent than G/C-ended codons.
Normally, RNA viruses have low codon usage bias to perform efficient replication in the host by lowering the competition with the host genes [55,[65][66][67][68]. Previous reports showed low codon usage bias from several plant viruses such as CTV, PRSV, PVM, RSV and SCMV [19][20][21][22][23][24]. In our study, a lower codon usage pattern of the BBWV2 genome (the ENC values higher than 35) was also found, indicating a low degree of preference. The ENC-plot, PR2, neutrality plot and regression analyses between the ARO and GRAVY values and ENC, GC, GC3S, and the first two principle axes values significantly showed that BBWV-2 is influenced by natural selection and mutation pressure to variable degrees. Consistent with PVM [23], both the ENC-plot and neutrality plot analyses showed that natural selection is the key factor shaping the codon usage pattern of BBWV-2.
The evolution and dynamics of infectious diseases are influenced by host-parasite interactions [69][70][71]. For viruses, several reports showed that codon usage patterns have a significant effect on the viruses' host-specific adaption [23,55,67,68]. In this study, our CAI analysis showed that both BBWV-2 genes have codon usage preferences that are closer to pepper than to other hosts. In addation, the RCDI analysis showed that the lowest codon usage deoptimization also occurred for the BBWV-2 isolates for pepper followed by spinach. Generally, low RCDI values indicate strong adaptation to a host [72]. Thus, both CAI and RCDI analyses were consistent supported that BBWV-2 were most strongly adapted to pepper than broad bean, spinach and yam. However, the SiD value for the BBWV-2 isolates from broad bean was higher than those observed for yam, spinach and pepper, indicating that the selection pressure of broad bean on BBWV-2 isolates was greater than that of yam, spinach and pepper, possibly due to BBWV-2 originated from broad bean.
In conclusion, the codon usage patterns and host adaptability of BBWV-2 were studied for the first time to investigate its evolutionary changes. Reassortment also had a significant influence on the genomic evolution of BBWV-2. The ENC-plot and neutrality plot analyses showed that natural selection is the key factor shaping the codon usage pattern of BBWV-2. A strong correlation between BBWV-2 and both broad bean and pepper was observed from the CAI, RCDI and SiD analyses. Our study furthers the understanding of the evolutionary changes of BBWV-2, which may provide a better understanding of the origin of BBWV-2.
Supplementary Materials: The following are available online at https://www.mdpi.com/1999-491 5/13/2/198/s1, Figure S1. Bayesian maximum-clade-credibility tree inferred from trees calculated from the ORF1 (A) and ORF2 (B) sequences of broad bean wilt virus 2. Horizontal blue bars represent the 95% credibility intervals of the estimates of node ages. The tree topology was chosen to maximize the product of node posterior probabilities. Only posterior probability values above 0.95 are shown.
Year before present; 2017. Figure S2. Estimates of nucleotide substitution rates. Mean estimates and the 95% highest posterior density interval (HPD) are shown. They were estimated from the polyprotein sequences of nonrecombinant sugarcane mosaic virus. The first value is based on the original data, whereas the remaining ten values are from date-randomized replicates in each set of estimates. The 95% HPD of the estimates from the date-randomized replicates did not overlap with the mean posterior estimate from the original data set. Moreover, the lower tails of the credibility intervals were long and tended towards zero. These features suggest sufficient temporal structure in the original data sets for rate estimation. Figure S3 ENC values for the ORF1 (A) and ORF2 (B) sequences of broad bean wilt virus 2. Table S1: The BBWV2 isolates using in this study, Table S2. Reassortment analysis by RDP software using 38 BBWV2 artificially concatenated sequences. Table  S3. Codon data of broad bean wilt virus 2 ORF1. Table S4. Codon data of broad bean wilt virus 2 ORF2. References [26][27][28][29][73][74][75][76]  Data Availability Statement: All data presented in this study are available on request from the corresponding authors.