Genetic analysis of tomato brown rugose fruit virus reveals evolutionary adaptation and codon usage bias patterns

Tomato brown rugose fruit virus (ToBRFV) poses a significant threat to tomato production worldwide, prompting extensive research into its genetic diversity, evolutionary dynamics, and adaptive strategies. In this study, we conducted a comprehensive analysis of ToBRFV at the codon level, focusing on codon usage bias, selection pressures, and evolutionary patterns across multiple genes. Our analysis revealed distinct patterns of codon usage bias and selection pressures within the ToBRFV genome, with varying levels of genetic diversity and evolutionary constraints among different genes. We observed a transition/transversion bias of 2.07 across the entire ToBRFV genome, with the movement protein (MP) gene exhibiting the highest transition/transversion bias and SNP density, suggesting potential evolutionary pressures or a higher mutation rate in this gene. Furthermore, our study identified episodic positive selection primarily in the MP gene, highlighting specific codons subject to adaptive changes in response to host immune pressures or environmental factors. Comparative analysis of codon usage bias in the coat protein (CP) and RNA-dependent RNA polymerase (RdRp) genes revealed gene-specific patterns reflecting functional constraints and adaptation to the host's translational machinery. Our findings provide valuable insights into the molecular mechanisms driving ToBRFV evolution and adaptation, with implications for understanding viral pathogenesis, host-virus interactions, and the development of control strategies. Future research directions include further elucidating the functional significance of codon usage biases, exploring the role of episodic positive selection in viral adaptation, and leveraging these insights to inform the development of effective antiviral strategies and crop protection measures.


Genetic variation analysis
To assess genetic variation between ToBRFV isolates, several analyses were conducted. We aligned all sequences using Clustal Omega version 1.2.2 11. Then, the number of single nucleotide polymorphisms (SNPs) was determined for each gene and for the entire genome using the "Find Variation/SNPs" tool in Geneious Prime 2023 with default parameters (https://www.geneious.com), highlighting regions of genetic diversity. Furthermore, the lengths of the CP, MP, and RdRp genes were compared to identify variations in gene size and potential functional implications.
To identify and visualize SNPs in amino acid sequences, we utilized the Biopython library (version 1.84) in Python 3 to analyze the multiple sequence alignment (MSA) data. The MSA was stored in a FASTA file. The analysis was performed using a custom Python script, which first loaded the alignment data using the AlignIO module from Biopython. The script then iterated over the alignment columns to identify positions where at least one SNP was present, defined as columns containing more than one unique amino acid residue. For each SNP position identified, the script recorded the position and the number of unique amino acids observed at that position. This information was used to generate a bar plot using the Matplotlib library (version 3.7.2), with the x-axis representing the SNP position and the y-axis representing the number of unique amino acids (i.e., SNP frequency) at each position. The plot was designed to facilitate easy interpretation of the SNP distribution across the amino acid sequence. The resulting plot was saved as a PDF file with a resolution of 300 DPI for inclusion in this manuscript. The Python code used for this analysis is in Supplementary data 2.
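The column scan described above can be sketched as follows. This is a minimal illustration, not the script from Supplementary data 2: the function name `variable_columns`, the toy alignment, and the decision to ignore gap and unknown characters are all assumptions made here. It works on plain aligned strings so that it runs without Biopython.

```python
from collections import Counter

# Assumption (not stated in the manuscript): gaps and unknown residues
# are ignored when counting distinct amino acids in a column.
IGNORED = {"-", "X", "?"}


def variable_columns(rows):
    """Return [(position, n_unique_residues)] for columns with more than one residue.

    `rows` is a list of equal-length aligned amino acid strings.
    Positions are 1-based, matching the usual alignment coordinates.
    """
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    result = []
    for index in range(length):
        residues = Counter(row[index].upper() for row in rows)
        for symbol in IGNORED:
            residues.pop(symbol, None)
        if len(residues) > 1:
            result.append((index + 1, len(residues)))
    return result


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MSYTI", "MSYSI", "MAYTI", "MSY-I"]
print(variable_columns(toy_alignment))  # [(2, 2), (4, 2)]
```

With Biopython, the rows could be obtained as `[str(record.seq) for record in AlignIO.read(path, "fasta")]`, and the resulting pairs could be passed to Matplotlib's `bar` and `savefig(..., dpi=300)` to produce the plot described in the text.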

Evolutionary divergence analysis
Average evolutionary divergence over sequence pairs within and between groups was calculated using MEGA 11. This analysis provided quantitative measures of genetic distance, allowing us to assess the degree of divergence among ToBRFV isolates. By comparing divergence levels within and between groups, we gained insights into the genetic relationships and population structure of ToBRFV.
The number of base substitutions per site from averaging over all sequence pairs within each group and between groups is shown. Analyses were conducted using the Maximum Composite Likelihood model. This

Genetic distance matrix
The genetic distances between ToBRFV sequences from various geographic locations were calculated and organized into a matrix using MEGA 11. This matrix served as the input data for the Principal Component Analysis (PCA). To explore the independence of genetic distances from geographic distances, PCA was performed on the genetic distance matrix using code written in Python 3 (Supplementary data 2). The genetic distance matrix was organized as a nested list, where each row and column corresponded to a country, and each value represented the genetic distance between the respective countries. The countries included in the analysis were Jordan, Peru, Israel and the State of Palestine, Germany, Mexico, Italy, United Kingdom, Canada, Greece, Netherlands, Egypt, USA, China, Turkey, and Belgium. The nested list was converted into a pandas DataFrame for ease of manipulation and analysis. The DataFrame had countries as both row and column labels. PCA was performed using the PCA class from the sklearn.decomposition module in Python. The analysis was conducted to reduce the dimensionality of the data to two principal components, which explained the most variance in the genetic distances. The results of the PCA were visualized using a scatter plot, where each point represented a country. The countries were color-coded for clarity, and the first two principal components were plotted to illustrate the genetic relationships.
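The PCA step above can be illustrated with a small dependency-free sketch. The study itself used pandas and the PCA class from sklearn.decomposition; the code below is a stand-in that extracts the first two components by power iteration. The function name `top_two_components`, the labels, and the toy distance values are hypothetical and are not data from this study.

```python
import math


def top_two_components(matrix, iterations=500):
    """Return (scores, explained_variance_ratio) for the first two principal components.

    Each row of `matrix` is treated as one observation (here: one group's
    distances to every group), as happens when a distance matrix is passed to PCA.
    """
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in matrix]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    total_variance = sum(cov[i][i] for i in range(m))

    components, ratios = [], []
    for _component in range(2):
        # Deterministic start vector; a fixed iteration count is used
        # instead of a convergence test to keep the sketch short.
        start = [1.0 / (j + 1) for j in range(m)]
        length = math.sqrt(sum(x * x for x in start))
        vec = [x / length for x in start]
        for _step in range(iterations):
            product = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in product))
            if norm < 1e-15:
                break
            vec = [x / norm for x in product]
        eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(m))
                         for i in range(m))
        components.append(vec)
        ratios.append(eigenvalue / total_variance if total_variance > 0 else 0.0)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(m)]
               for i in range(m)]

    scores = [[sum(r[j] * comp[j] for j in range(m)) for comp in components]
              for r in centered]
    return scores, ratios


# Hypothetical 4 x 4 distance matrix: groups A and B are close, C and D are close.
labels = ["A", "B", "C", "D"]
distances = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [1.0, 0.9, 0.1, 0.0],
]
scores, ratios = top_two_components(distances)
for label, (pc1, pc2) in zip(labels, scores):
    print(label, round(pc1, 3), round(pc2, 3))
print("explained variance ratios:", [round(r, 4) for r in ratios])
```

The sign of each component is arbitrary, so only the relative positions of the points are meaningful. The explained variance ratios correspond to the percentages that are reported for PC1 and PC2 when the scatter plot is interpreted.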

Selective pressure analysis
To investigate selective pressures acting on ToBRFV genes, the ratio of nonsynonymous to synonymous substitutions (dN/dS) was calculated for the CP, MP, and RdRp genes. This analysis, performed using the Datamonkey web server (https://www.datamonkey.org/meme/) 12, assessed the relative contributions of positive selection, purifying selection, and neutral evolution to the genetic diversity of ToBRFV.
MEME (Mixed Effects Model of Evolution) estimates a site-wise synonymous rate (α) and a two-category mixture of non-synonymous rates (β− with proportion p−, and β+ with proportion [1 − p−]), and uses a likelihood ratio test to determine whether β+ > α at a site. The estimates aggregate information over a proportion of branches at a site, so the signal is derived from episodic diversification, which is a combination of the strength of selection (effect size) and the proportion of the tree affected. A subset of branches can be selected for testing as well, in which case an additional (nuisance) parameter will be inferred from the non-synonymous rate on branches not selected for testing.

Recombination analysis of ToBRFV isolates
We investigated the potential for recombination events among ToBRFV isolates. We performed a whole genome recombination analysis using GARD, a genetic algorithm for recombination detection (https://www.datamonkey.org/gard) 13. All available complete genome sequences of ToBRFV isolates from this study were included in the analysis. Default parameters were employed for the recombination detection algorithms.

Codon usage bias analysis
A total of 215 sequences encoding the MP, CP, and RdRp of ToBRFV were retrieved from NCBI in FASTA format. These sequences underwent preprocessing, including sequence trimming, quality assessment, and the removal of non-coding regions. Codon usage bias was analyzed using the R programming language (R-Studio version 9) with the coRdon package (Supplementary Data 2) 14. Codon frequencies were calculated to determine the prevalence of each codon across the dataset. Relative Synonymous Codon Usage (RSCU) values were estimated to assess the non-uniform usage of synonymous codons, while the Codon Adaptation Index (CAI) was computed to measure bias towards codons associated with highly expressed genes. The Effective Number of Codons (ENC) was calculated to evaluate overall codon usage bias. Additionally, GC content, including GC3S (GC content at synonymous third positions), was determined. Descriptive statistics, including mean, median, standard deviation, and range, were calculated for CAI, ENC, GC content, and GC3S, and their distributions were visualized using histograms. The results were then exported as CSV files for further interpretation, with a focus on their implications for gene expression and evolutionary dynamics within the ToBRFV coding regions.
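To make two of the metrics above concrete, the following sketch computes RSCU and GC3S for a single coding sequence. It is written in Python for illustration only; the study used the coRdon package in R, and the function names and toy sequences here are hypothetical. RSCU is taken as the observed codon count divided by the count expected if all synonymous codons of that amino acid were used equally, and GC3S is assumed to exclude Met, Trp, and stop codons.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into codons, dropping any incomplete trailing codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rscu(seq):
    """Return {codon: RSCU} for amino acids that occur and have synonymous codons."""
    counts = Counter(c for c in split_codons(seq) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous alternatives (Met, Trp)
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


def gc3s(seq):
    """GC fraction at third positions of codons that have synonymous alternatives."""
    third = [c[2] for c in split_codons(seq)
             if CODON_TABLE.get(c, "*") not in "MW*"]
    return sum(base in "GC" for base in third) / len(third) if third else 0.0


# Hypothetical toy sequence: four proline codons (3 x CCG, 1 x CCC).
toy = "CCGCCGCCGCCC"
print(rscu(toy)["CCG"], rscu(toy)["CCC"])  # 3.0 1.0
print(gc3s("ATGCCGCCATGG"))                # 0.5
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage, values above 1 indicate over-representation, and values below 1 indicate under-representation.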

Comprehensive phylogenetic analysis of ToBRFV isolates
The circular phylogenetic tree representing 215 ToBRFV isolates based on their amino acid sequences revealed a complex network of evolutionary relationships. Set at a scale of 0.01, the tree allowed for a detailed analysis of genetic distances between sequences. Color-coded branches highlighted distinct phylogenetic clades, with the majority represented in blue, a specific subset (including Tomato mosaic virus amino acid sequences of CP, MP, and RdRp as outgroup) in yellow, Iranian isolates in dark blue, and red dots indicating bootstrap values for each node (Fig. 1). This visualization emphasized the remarkable genetic diversity within the ToBRFV population and provided a platform for discussing the observed evolutionary patterns and divergence in our study.
Interestingly, the phylogenetic analysis showed that isolates from different geographic locations were randomly distributed throughout the tree, without clear subgrouping based on geographical origin. This may suggest that the virus has spread globally in recent years, leading to increased genetic similarity among isolates from various regions. Our comprehensive tree construction, incorporating all coding regions of the virus (CP, MP, and RdRp), facilitated a thorough examination of the relationships between virus isolates. Notably, the Iranian isolates formed a distinct cluster, indicating the need for further investigation into their genetic characteristics and potential implications for virus spread and management.
Comparing our findings with previous studies that assessed genetic differentiation and migration patterns among ToBRFV populations from Europe, Asia, Africa and America, our results revealed differing trends. While Güller et al. reported high gene flow among geographic populations based on CP and MP gene domains, our study highlighted a lack of clear genetic differentiation between isolates from different regions. The absolute values of Fst among geographic populations being less than 0.33 and the high migration rates (> 1) observed in our study support the notion of extensive gene flow among ToBRFV populations from European, Asian, and American variants. Furthermore, the low values of Kst* and Z* metrics, and the non-significant p-values in pairwise comparisons for both gene regions, suggest a lack of significant genetic divergence between geographic populations. Particularly, the Snn metric results < 0 indicate minimal genetic differences between the populations, further supporting the concept of ongoing gene flow and genetic similarity among ToBRFV isolates globally 15.

SNP distribution analysis reveals differential evolutionary pressures on ToBRFV genes
The SNP distribution plot provides a visual comparison of amino acid changes across the three genes of ToBRFV: MP, CP, and RdRp. Each column represents the location of an amino acid change within the respective gene sequence. The absolute number of amino acid changes is markedly higher in the RdRp gene; however, when the number of changes is normalized by gene length, MP is the most variable gene (0.218 SNPs per unit of gene length), suggesting greater variability and potential for evolutionary adaptation (Table 2). This

Ancestral relationships of ToBRFV isolates
Our study utilized whole genome nucleotide sequences of ToBRFV (215 isolates) to construct a circular phylogenetic tree, offering insights into the virus's evolutionary history. The tree, with a scale set at 0.01 to reflect genetic distances between sequences, indicates significant evolutionary events. The analysis revealed two distinct groups in green representing ancestors and Jordan isolates, while the yellow section denoted TMV as the outgroup. This phylogenetic analysis not only unveiled ancestral relationships but also suggested potential evolutionary pathways of the virus, illuminating its genetic makeup and transmission dynamics (Fig. 3). Confirming the findings of Salem et al. 2, our results supported Jordan as the origin of the virus, emphasizing the importance of investigating transmission dynamics, possibly through seed dispersal.
Interestingly, the Iranian isolate formed a clade distinct from the other isolates, signaling the presence of a potentially new strain or a sequencing artifact 16. The Iranian isolate differed from the other isolates mainly in the RdRp gene, where more SNPs were found, whereas its CP and MP genes were more similar to those of the other isolates. This observation underscores the necessity for comprehensive genomic screening to identify any novel changes that could lead to the emergence of more destructive biotypes or strains. Further investigation and sequencing of this isolate are recommended to elucidate its unique genetic characteristics and implications for viral spread and management strategies.
Moreover, insights from previous studies shed light on the emergence and evolutionary history of ToBRFV. Analysis by Yan et al. revealed a distinct clustering of tobamoviruses 17, with ToBRFV and TMV forming sister branches, while ToMV and ToMMV grouped together. Recombination events involving other tobamoviruses, as suggested by Salem et al. 2, may have contributed to the origin of ToBRFV. The detection of a recombination event involving strains of TMV and ToMMV further underscores the complex evolutionary dynamics shaping the genetic diversity of ToBRFV 6,18. Our analysis using the GARD algorithm did not find any evidence of recombination events among the ToBRFV isolates included in this study. A total of 2647 models were examined at a rate of 0.73 models per second. The alignment contained 638 potential breakpoints, leading to a search space of 638 models with up to 1 breakpoint. The genetic algorithm explored 414.89% of this search space (2647/638).
Additionally, findings by Esmaeilzadeh et al. propose Peru as a potential center for the emergence of ToBRFV 5, emphasizing the importance of studying isolates from diverse geographical locations. The sequencing of a ToBRFV genome from tomato seeds in Peru (MW314111) provided valuable insights into the genetic diversity of the virus, suggesting a potential origin in South America rather than the Middle East 5. However, because we used a larger set of whole genome nucleotide sequences, we believe our result (Fig. 3) is more reliable; it is also consistent with the origin of the first report of the virus 2.

Tomato brown rugose fruit virus diversity
Table 1 presents the estimates of average evolutionary divergence over sequence pairs within the geographic groups. These values represent the genetic variation within each group, with a higher number indicating greater divergence. Isolates were grouped by country. Jordan, the United Kingdom, Canada, and Mexico exhibit relatively low divergence values (0.0016 to 0.0017), suggesting a high degree of genetic similarity among the ToBRFV sequences within these populations. Peru, Israel and the State of Palestine, the USA, and Belgium show moderate divergence (0.0022 to 0.0027), indicating a fair amount of genetic variation. Germany, the Netherlands, and China have higher divergence values (0.0029 to 0.0034), which could reflect a more diverse set of sequences or a longer period of viral evolution within these groups. Italy and Greece stand out with the highest divergence values (0.0039 and 0.0041, respectively), pointing to significant genetic diversity within the ToBRFV sequences from these countries. Egypt shows an exceptionally low divergence value (0.0001), which might suggest a recent introduction of the virus or a very stable viral population with little genetic change.
Notably, Turkey has a divergence value of 0, indicating no detectable variation among the sampled sequences, which could be due to a very recent spread or a highly conserved virus population. The genetic data reflect a bottleneck caused by eradication efforts, indicating that the virus is still undergoing geographical expansion 4. Despite the geographic diversity, ToBRFV isolates from different regions exhibit a high level of interrelatedness, with low genetic diversity and random mutations across genomes, attributed to the introduction of infected seeds 4. These divergence estimates are crucial for understanding the genetic landscape of ToBRFV across different regions. They provide insights into the virus's spread, mutation rates, and potential adaptation to diverse environmental conditions or host varieties. This information can be instrumental in developing targeted strategies for monitoring and controlling the spread of ToBRFV.
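The within-group and between-group averages discussed above can be illustrated with a short sketch. Note the simplification: MEGA 11 was used with the Maximum Composite Likelihood model, whereas this example uses the uncorrected p-distance (proportion of differing sites), which is an assumption made here to keep the code self-contained. The function names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, counting only positions where both are A/C/G/T."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_within_group(seqs):
    """Average distance over all sequence pairs in one group (None if fewer than 2)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return None
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between_groups(group_a, group_b):
    """Average distance over all pairs with one sequence taken from each group."""
    total = sum(p_distance(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


# Hypothetical aligned fragments, not sequences from this study.
group_1 = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
group_2 = ["ACGTTCGT", "ACGTTCGT"]

print(mean_within_group(group_1))             # about 0.0833
print(mean_within_group(group_2))             # 0.0, all sampled sequences identical
print(mean_between_groups(group_1, group_2))  # about 0.1667
```

A within-group value of 0, as in the second toy group, simply means that every sampled sequence in that group is identical over the compared sites.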
The study aimed to analyze the transition/transversion bias and density of single nucleotide polymorphisms (SNPs) across different regions of the ToBRFV genome. The obtained data revealed interesting patterns in genetic variation among ToBRFV genes (Table 2). Our analysis indicated a transition/transversion bias of 2.07 across the entire ToBRFV genome, with an SNP density of 0.198 per gene length. This suggests a higher frequency of transitions compared to transversions, in line with observations in RNA viruses 4. Moreover, the CP gene exhibited a slightly higher bias of 2.43, indicating a similar SNP density to the overall genome but with a preference for transitions. In contrast, the MP gene displayed a significantly higher transition/transversion bias of 3.5 and the highest SNP density at 0.218, suggesting potential evolutionary pressures or a higher mutation rate in this gene 19.
Interestingly, the RdRp gene showed a bias of 2.73 and an SNP density of 0.195, hinting at a higher rate of transitions compared to the CP gene. These varying biases across ToBRFV genes may reflect distinct evolutionary dynamics and constraints in each gene, underscoring the importance of considering gene-specific factors in mutational processes 20.
Comparing our findings with previous studies, we observed a high level of genetic similarity among ToBRFV sequences, with up to 43 SNPs identified 7. The CP gene emerged as the most conserved region, displaying low genetic variation and high conservation levels, possibly linked to elicitor recognition mechanisms in host plants 21.
In contrast, the MP gene exhibited the highest nucleotide diversity, consistent with its role in overcoming plant resistance mechanisms such as the Tm-2² gene. Notably, specific amino acids in the MP gene have been identified as critical for evading host resistance, emphasizing the significance of genetic variation in viral adaptation 22.
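As a rough illustration of the two quantities compared above, the sketch below counts transitions (A↔G, C↔T) and transversions between two aligned sequences and computes an SNP density as variable columns divided by alignment length. The values in Table 2 come from MEGA and Geneious; in particular, a model-based transition/transversion bias is not the same as the raw count ratio shown here. The function names and sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(base_a, base_b):
    """Return 'transition', 'transversion', or None when the bases are identical."""
    if base_a == base_b:
        return None
    if {base_a, base_b} <= PURINES or {base_a, base_b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_counts(reference, other):
    """Count transitions and transversions between two aligned nucleotide sequences."""
    transitions = transversions = 0
    for a, b in zip(reference.upper(), other.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        kind = substitution_type(a, b)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


def snp_density(rows):
    """Fraction of alignment columns that contain more than one nucleotide."""
    length = len(rows[0])
    variable = 0
    for index in range(length):
        bases = {row[index].upper() for row in rows} & set("ACGT")
        if len(bases) > 1:
            variable += 1
    return variable / length


# Hypothetical aligned fragments, for illustration only.
ref = "ACGTACGT"
alt = "GCGTATGA"
ts, tv = ts_tv_counts(ref, alt)
print(ts, tv, ts / tv)                      # 2 1 2.0
print(snp_density([ref, alt, "ACGTACGT"]))  # 0.375
```

A raw ratio above 1 indicates that transitions outnumber transversions in the compared sequences.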
The analysis of the average evolutionary divergence between geographic groups of ToBRFV isolates revealed insightful observations regarding the genetic relationships and variability among different populations. Our findings indicate distinct patterns of divergence within various regions, shedding light on the evolutionary dynamics of ToBRFV. Notably, the PCA of the genetic distance data, as illustrated in the provided plot (Fig. 4), reveals significant differences among geographic groups. The PCA scatter plot shows the distribution of countries based on the first two principal components, with PC1 (33.67% variance) and PC2 (12.07% variance) together capturing 45.74% of the total variance in the genetic distances. Countries such as Jordan and Turkey are positioned close to each other, indicating low genetic distances and suggesting a close genetic relationship or recent common ancestry. In contrast, Peru, Germany, Italy, Greece, and Belgium are spread out further along the principal components, displaying higher genetic distances. This suggests a greater degree of genetic variation, possibly due to prolonged separate evolution or adaptation to diverse environments. Israel and the State of Palestine, Mexico, the United Kingdom, Canada, the Netherlands, Egypt, and the USA exhibit moderate divergence. These countries balance genetic similarity and diversity, reflecting intermediate positions on the PCA plot. China, positioned distinctly on the PCA plot, shows moderate to high divergence, suggesting a unique evolutionary path or a diverse set of isolates within the region.
Our results align with previous research indicating low gene flow and limited genetic variability in ToBRFV populations. It has been observed that ToBRFV diverges from neutral evolutionary theory, indicating the virus is not undergoing natural selection and that accumulated mutations are low-frequency and random. The divergence from neutrality is most likely caused by a population expansion of ToBRFV, supported by the absence of any structuring in the phylogenetic tree. These insights are crucial for ensuring the continued efficacy of current diagnostic tools 4.

Codon-level analysis reveals episodic positive selection in the ToBRFV MP gene
Our study conducted a selection analysis on specific codons within the ToBRFV genome to investigate episodic positive selection and evolutionary dynamics. Utilizing a likelihood ratio test (LRT), we identified episodic positive selection primarily in the MP gene, with significant findings at codons 123 and 192 (Fig. 5). These findings suggest that certain codons within the ToBRFV genome are subject to episodic positive selection, which may be indicative of adaptive changes in response to host immune pressures or other environmental factors. The detection of these sites is crucial for understanding the evolutionary dynamics of the virus and could have implications for antiviral strategies.
For codon 123 (GTt > ACt), belonging to the first set of codons analyzed, we observed a non-synonymous rate (beta) of 1568.25 with a weight of 1.00, indicating strong positive selection. The LRT value of 8.712 (p = 0.0057) confirmed episodic selection at this site. Similarly, codon 192 exhibited a significant non-synonymous rate of 228.83, with an LRT value of 4.490 (p = 0.0491), signifying episodic positive selection (Table 3).
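The relationship between the LRT values and p-values reported above can be checked with a small sketch. It assumes that the MEME test statistic is referred to a 0.33 : 0.30 : 0.37 mixture of chi-square distributions with 0, 1, and 2 degrees of freedom. These weights are taken from the published description of the MEME method as we understand it, not from this manuscript, so they should be treated as an assumption. The function names are hypothetical.

```python
import math


def lrt_statistic(log_likelihood_null, log_likelihood_alt):
    """Likelihood ratio test statistic: twice the gain in log-likelihood."""
    return 2.0 * (log_likelihood_alt - log_likelihood_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def meme_p_value(lrt, weights=(0.33, 0.30, 0.37)):
    """p-value under an assumed mixture of chi-square(0), chi-square(1), chi-square(2).

    The chi-square(0) component is a point mass at zero, so it contributes
    nothing to the tail for any positive LRT value.
    """
    if lrt <= 0:
        return 1.0
    _w0, w1, w2 = weights
    return w1 * chi2_sf_df1(lrt) + w2 * chi2_sf_df2(lrt)


# LRT values reported for MP codons 123 and 192.
for codon, lrt in [(123, 8.712), (192, 4.490)]:
    print(codon, round(meme_p_value(lrt), 4))
```

Under this assumption the sketch gives approximately 0.0057 for codon 123 and approximately 0.049 for codon 192, in close agreement with the reported p-values; small differences are expected because the reported LRT values are rounded.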
These findings suggest that certain codons in the ToBRFV genome undergo adaptive changes in response to environmental factors or host pressures, which are crucial for understanding the virus's evolutionary dynamics and have implications for vaccine design and antiviral strategies (Table 3).
Comparing our results to previous studies, Güller et al. and Esmaeilzadeh et al. reported strong purifying (negative) selection on the MP and CP gene domains of ToBRFV. Our study corroborates these findings, indicating that negative selection is the predominant force shaping the evolution of the ToBRFV genome. Additionally, Çelik et al. highlighted strong negative evolutionary constraints on the ORFs of ToBRFV 5,15,20. However, we found two codon sites in MP under positive selective pressure, suggesting potential adaptive changes at these sites.
Furthermore, our study aligns with the findings of Hak and Spiegelman and Yan et al. regarding specific residues in the ToBRFV MP gene involved in evading host resistance mechanisms. Hak and Spiegelman identified residues in the central region of MP critical for escaping recognition, while Yan et al. demonstrated the importance of residues H67, N125, K129, A134, I147, and I168 in evading Tm-2²-mediated resistance 19,22. Our identification of codon positions 123 and 192 in the central region of MP is consistent with the previous studies and suggests that these residues are also important in resistance breaking. This indicates that the evolutionary pressures on the MP gene may be driving adaptations that enhance the virus's ability to overcome host defenses. Our findings support the notion that specific residues in the MP gene are under positive selection and play a crucial role in resistance breaking. The identification of these adaptive changes highlights the importance of further investigating these codons to understand their role in the virus's ability to evade host resistance mechanisms. Future studies should focus on the functional implications of these residues to develop strategies for managing ToBRFV infections.

Codon usage bias analysis of the tomato brown rugose fruit virus genome
The comparative analysis of the codon usage bias in the MP, CP, and RdRp genes of ToBRFV revealed distinct patterns of codon usage bias and adaptation to the host's translational machinery (Fig. 6, Supplementary Tables 1-3). The codon usage patterns observed in our study align with previous research findings in plant viruses 23,24. Our results indicate that there are significant differences in codon preferences among the genes, reflecting the functional importance and evolutionary pressures experienced by each gene.
In our study, the GC content in the CP gene was centered around 55%, suggesting a moderate GC bias that could influence codon usage and protein structure (Fig. 6). The Effective Number of Codons (ENC) values for CP indicated a moderate level of codon bias, reflecting the balance between mutational pressure and natural selection, as similarly observed in various plant RNA viruses 25. The Codon Adaptation Index (CAI) peaks in CP at around 0.7, indicating a moderate level of adaptation to the host's translational machinery 26. In contrast, Gómez et al. reported that PVY genes had lower CAI values compared to their hosts, suggesting that PVY is less adapted to its hosts than ToBRFV, which highlights the variability in codon adaptation among different plant viruses 27. For the MP gene, the GC content peaked at about 50%, slightly lower than CP, potentially impacting the amino acid composition of the protein. The ENC distribution for MP suggested a similar level of codon bias as CP, while the CAI peaks at around 0.8 indicated a higher adaptation to the host's translational efficiency compared to CP 26.
In the RdRp gene, the GC content peaked slightly above MP, hinting at a potential for a more stable RNA structure. The broader distribution of ENC values for RdRp suggested less codon usage bias, indicating a more diverse set of codons used for encoding amino acids. The CAI for RdRp, although similar to MP, was slightly lower, suggesting a lesser degree of adaptation to the host's translational machinery compared to MP 26.
Our findings support the idea that codon usage patterns are influenced by a combination of factors, including mutational bias and translational selection, rather than solely translational selection 23,24,28. He et al. highlighted that these factors, along with gene length, secondary protein structure, and selective transcription, play significant roles in shaping codon usage bias in plant RNA viruses 25. This is corroborated by the ENC-GC3S plot analysis in PVY, which showed clustering below the expected curve, indicating the influence of GC content on codon usage 27. The observed differences in codon preferences among genes within the same genome may be attributed to varying evolutionary pressures and functional constraints, similar to the findings in Potato Virus M 28. While Cheeran et al. demonstrated that natural selection predominantly shapes codon usage bias in TMV genes, our analysis of ToBRFV suggests a nuanced interplay between mutational bias and translational selection, reflecting unique evolutionary trajectories in different viral genomes 29.
Understanding these patterns of codon usage bias and adaptation to the host's translational machinery in ToBRFV genes can provide valuable insights into the virus's evolution and host-virus interactions. He et al. demonstrated that host selection pressure significantly influences the codon usage patterns of plant RNA viruses, suggesting that ToBRFV may similarly evolve to optimize its replication and survival in its host 25. This is further supported by the work of Gómez et al., who demonstrated that PVY strains also show similar codon usage preferences, underscoring shared evolutionary mechanisms among plant-infecting viruses 27.
The RSCU values for the CP gene of ToBRFV were rigorously analyzed, uncovering significant codon bias patterns. These patterns closely resemble those found in potato virus Y (PVY) strains, which also exhibit a preference for codons ending in A or U 27. RSCU serves as a metric for comparing the observed frequency of codons to the expected frequency under equal usage of synonymous codons. Notably, codons associated with Phenylalanine (Phe), Leucine (Leu), Serine (Ser), and Proline (Pro) exhibited varying degrees of bias towards specific codons, as evidenced by their RSCU values. For instance, the Proline codons CCC and CCG demonstrated substantial bias with RSCU values of 1.00464 and 2.450116, respectively, indicating a distinct preference for these codons. Similarly, Histidine (His) and Glutamine (Gln) codons displayed a pronounced bias towards CAC and CAA, respectively, with RSCU values approximating 2, suggestive of their preferential usage within the CP gene. Moreover, the Arginine (Arg) codons, particularly CGC and AGA, manifested noticeable bias, underscored by their elevated RSCU values, indicative of a predilection for these codons within the CP gene of ToBRFV (Table 4). Moreover, the analysis of the w_cai column, reflecting the relative adaptiveness of each codon, unveiled crucial insights into the codon usage patterns of ToBRFV. These findings are in line with Gómez et al., who reported that PVY strains also showed a high preference for A/U-ending codons, suggesting a common evolutionary strategy among plant viruses to optimize codon usage for host adaptation 27. Notably, values closer to 1 in the w_cai column signify higher adaptiveness, thus delineating the efficiency of gene expression and protein synthesis within ToBRFV. This comprehensive analysis not only enhances our understanding of codon usage dynamics in ToBRFV but also bears significant implications for gene expression regulation and the formulation of targeted control strategies against the virus.
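The relative adaptiveness values (the w_cai column) and their connection to CAI can be illustrated as follows. This sketch uses the common definition in which w for a codon is its count in a reference set divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the codons of a gene. The reference set used in the study is not specified here, so the reference sequence, the function names, and the 0.5 pseudocount for unobserved codons are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into codons, dropping any incomplete trailing codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs):
    """Return {codon: w}, where w = count / count of the most used synonymous codon."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s)
                     if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        families[amino_acid].append(codon)

    weights = {}
    for amino_acid, family in families.items():
        if amino_acid == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no information on bias
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon in family:
            # Assumed convention: unobserved codons get a pseudocount of 0.5
            # so that a single rare codon cannot force CAI to zero.
            weights[codon] = max(counts[codon], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of w over the codons of `seq` that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set: proline codons only (3 x CCG, 1 x CCC).
w = relative_adaptiveness(["CCGCCGCCGCCC"])
print(w["CCG"], round(w["CCC"], 4))  # 1.0 0.3333
print(round(cai("CCGCCC", w), 4))    # 0.5774
```

A codon with w equal to 1 is the most used codon for its amino acid in the reference set, which is why values closer to 1 are read as higher adaptiveness.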
Comparing our findings with those of He et al., who investigated the codon usage patterns of the CP genes of Narcissus late season yellows virus (NLSYV), Narcissus yellow stripe virus (NYSV), and Narcissus degeneration virus (NDV), a recurring preference for A/U in the third codon position across narcissus viruses was observed 30. This comparative analysis sheds light on the shared characteristics and distinctive features of codon usage among different viruses, providing valuable insights into the evolutionary mechanisms and selective pressures governing codon bias in viral genomes. The codon usage patterns in ToBRFV show a mix of U- and G-ending codons, similar to what has been observed in Potato Virus M 28. This preference, despite the GC-rich or AU-rich composition, suggests the influence of mutation pressure on codon selection.
Therefore, the meticulous analysis of RSCU values and codon adaptation index in the CP gene of ToBRFV offers the essential groundwork for deciphering the intricate interplay between codon bias, gene expression, and viral evolution.These findings not only enrich our understanding of viral genome dynamics but also hold promise for informing future antiviral strategies and therapeutic interventions.
In this study, we conducted a comprehensive RSCU analysis for the MP and RdRp genes of ToBRFV (Tables 5  and 6).RSCU provides valuable insights into the preferences and biases in codon usage, shedding light on the genetic characteristics of the virus.The RSCU values, presented in Tables 5 and 6, reveal distinct patterns of codon usage in both genes, which could have significant implications for gene expression efficiency, protein synthesis, and the development of control strategies against ToBRFV.
For the RdRp gene of ToBRFV, our analysis reveals distinct codon usage preferences. Phenylalanine (Phe) codons favor TTT over TTC, while Leucine (Leu) codons exhibit no bias between TTA and TTG. Serine (Ser) codons display a moderate bias, with TCT showing the highest RSCU value. Proline (Pro) codons show a strong bias towards CCA and CCG, and Histidine (His) and Glutamine (Gln) codons exhibit a strong bias towards CAC and CAA, respectively. Moreover, Arginine (Arg) codons display notable biases, particularly for CGA and CGC. Comparing our findings with previous studies on cucurbit-infecting tobamoviruses 26, we observe similarities in codon usage preferences, particularly in the preference of U over C in most synonymous third codon positions. This consistency across tobamoviruses underscores the potential for targeted interventions in viral gene control and attenuation. Strategies such as deoptimizing synonymous codons less used by both the virus and host may offer avenues for reducing viral gene expression and virulence. However, considerations of codon-specific biases, such as increasing codons ending with CpA to align with host preferences, must be carefully evaluated to avoid adverse effects on viral survival.
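A preference for U over C at third codon positions can be checked with a simple count. The Python sketch below tallies the nucleotide found at the third position of every in-frame codon; the function name and the toy sequence are illustrative assumptions, and the count deliberately includes all codons (also those for Met and Trp), so it is a coarser measure than a per-amino-acid RSCU comparison.

```python
from collections import Counter


def third_position_composition(seq):
    """Return the fraction of U, C, A and G at third codon positions."""
    seq = seq.upper().replace("T", "U")  # report in the RNA alphabet
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    thirds = Counter(seq[i + 2] for i in range(0, usable, 3))
    total = sum(thirds[b] for b in "UCAG")  # ambiguous bases are not counted
    if total == 0:
        raise ValueError("sequence contains no complete codon")
    return {b: thirds[b] / total for b in "UCAG"}


# Toy example (not ToBRFV data): codons UUU, UUC, GCU, GCA.
comp = third_position_composition("TTTTTCGCTGCA")
print(comp["U"], comp["C"])  # 0.5 0.25
```

A U fraction that exceeds the C fraction, as in this toy case, is the pattern described for most synonymous third positions in tobamoviruses.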
Comparing our findings with those of Cheeran et al. on Tobacco Mosaic Virus (TMV), we observe commonalities in codon usage preferences, particularly in the prevalence of high-frequency codons ending with nucleotide T, indicative of shared evolutionary pressures shaping viral codon bias 29.
Moreover, our study contributes to understanding the mutation pressures shaping codon usage in ToBRFV genes. A preference for A/G has been reported in Turnip mosaic virus (TuMV) protein-coding regions, indicative of mutation pressures 31. This underscores the dynamic nature of codon usage and its implications for viral evolution and adaptation. The analysis of codon usage bias in ToBRFV genes, alongside comparative studies such as that of Potato Virus M 28, provides deeper insights into the evolutionary dynamics of plant viruses. These insights are essential for developing effective antiviral strategies and improving our understanding of virus-host interactions.
In conclusion, our analysis provides valuable insights into the codon usage patterns of ToBRFV, which could inform strategies for controlling viral gene expression and virulence. Further research into the functional implications of codon biases and their interplay with host factors is warranted to deepen our understanding of virus-host interactions and aid in the development of effective control measures against ToBRFV.

Fig. 1. Phylogenetic relationships of ToBRFV amino acid sequences of coat protein (CP), movement protein (MP) and RNA-dependent RNA polymerase (RdRp). The tree was constructed using the maximum likelihood method with 1000 bootstrap replicates and a threshold of 60. Yellow highlight: Tomato mosaic virus amino acid sequences of CP, MP and RdRp as outgroup. Red circles show the bootstrap support for each node. Dark blue: Iranian isolate.

Fig. 2. Distribution of amino acid changes across ToBRFV genes: (a) movement protein (MP), (b) coat protein (CP), and (c) RNA-dependent RNA polymerase (RdRp). Amino acid changes are shown on the x-axis and their frequency on the y-axis, with colored spots for each isolate. Isolates are listed on the left of the plot.

Fig. 3. Phylogenetic tree with ancestral analysis of ToBRFV whole genome nucleotide sequences, constructed using the maximum likelihood method with 1000 bootstrap replicates and a threshold of 60. Ancestor and sister groups are highlighted in green. Yellow highlight: Tomato mosaic virus whole genome sequences as outgroup. Red circles show the bootstrap support for each node. Dark blue: Iranian isolate.

Fig. 4. Principal component analysis (PCA) of Tomato brown rugose fruit virus (ToBRFV) genetic distances between geographic regions. The plot visualizes the first two principal components, which capture 45.74% of the total variance in the genetic distance data. Each point represents a country, and the distance between points reflects the genetic divergence between the corresponding ToBRFV isolates.

Fig. 5. Detection of positive selection in the movement protein (MP) of Tomato brown rugose fruit virus (ToBRFV) using dN/dS ratios. The figure shows the results of the Mixed Effects Model of Evolution (MEME) analysis, highlighting sites under positive selection. The x-axis enumerates the sites, while the y-axis displays the Likelihood Ratio Test (LRT) values, which are indicative of positive selection when elevated. Peaks in the LRT values, represented by vertical bars, pinpoint the locations where the non-synonymous to synonymous substitution ratio (dN/dS) exceeds one, suggesting adaptive evolutionary changes.

Fig. 6. Codon usage bias analysis for Tomato brown rugose fruit virus (ToBRFV) genes: movement protein (MP), coat protein (CP), and RNA-dependent RNA polymerase (RdRp). The histograms compare the GC content, Effective Number of Codons (ENC), and Codon Adaptation Index (CAI) across these genes.

Table 1. Estimates of average evolutionary divergence over sequence pairs within groups.

Table 2. Maximum likelihood estimates of transition/transversion bias and number of single nucleotide polymorphisms (SNPs) in genes.

Table 3. Evidence of episodic selection at specific codon sites in the movement protein (MP) gene of Tomato brown rugose fruit virus (ToBRFV). All three genes were analyzed, and positive selection was detected only in the MP gene.

In conclusion, our multi-faceted analysis of ToBRFV provides valuable insights into its genetic diversity, evolutionary dynamics, and adaptive strategies. Through comprehensive phylogenetic analysis, we revealed a complex network of evolutionary relationships among ToBRFV isolates, indicating extensive global spread and significant genetic diversity. Our findings suggest ongoing gene flow among ToBRFV populations from different geographic regions, with limited genetic differentiation and notable genetic similarity observed across diverse populations. Furthermore, SNP distribution analysis unveiled differential evolutionary pressures on ToBRFV genes, emphasizing the importance of considering gene-specific factors in mutational processes. At the codon level, our study identified distinct patterns of codon usage bias and selection pressures within the ToBRFV genome. We observed varying levels of genetic diversity and evolutionary constraints among different genes, with episodic positive selection primarily observed in the MP gene. This indicates adaptive changes in response to host immune pressures or environmental factors. Comparative analysis of codon usage bias in the CP and RdRp genes provided further insights into functional constraints and adaptation to the host's translational machinery. These findings underscore the importance of understanding codon usage dynamics in the context of viral evolution and host-virus interactions. Overall, our study enhances the understanding of ToBRFV evolution, host-virus interactions, and the molecular mechanisms driving viral adaptation. These insights can inform the development of targeted strategies for monitoring and controlling the spread of ToBRFV, as well as guide future research into viral pathogenesis and the development of antiviral interventions. Continued collaboration and surveillance efforts are essential to stay ahead of emerging viral threats and safeguard global tomato production.

Table 4. Relative synonymous codon usage (RSCU) analysis for the coat protein (CP) gene of Tomato brown rugose fruit virus (ToBRFV). RSCU is a measure of codon bias that compares the observed frequency of codons to the expected frequency if all synonymous codons for the same amino acid were used equally. An RSCU value of 1 indicates no bias, values greater than 1 indicate a positive bias, and values less than 1 indicate a negative bias.

Table 5. Relative synonymous codon usage (RSCU) analysis for the movement protein (MP) gene of Tomato brown rugose fruit virus (ToBRFV). RSCU is a measure that indicates the relative frequency of synonymous codons used for encoding each amino acid. An RSCU value of 1 suggests no bias, above 1 indicates a preference for that codon, and below 1 indicates avoidance.

Table 6. Relative synonymous codon usage (RSCU) analysis for the RNA-dependent RNA polymerase (RdRp) gene of Tomato brown rugose fruit virus (ToBRFV). RSCU values greater than 1 indicate a bias towards a particular codon, while values less than 1 suggest avoidance.