Systematic analysis of the codon usage patterns of African swine fever virus genome coding sequences reveals its host adaptation phenotype

African swine fever (ASF) is a severe haemorrhagic disease caused by the African swine fever virus (ASFV), transmitted by ticks, resulting in high mortality among domestic pigs and wild boars. The global spread of ASFV poses significant economic threats to the swine industry. This study employs diverse analytical methods to explore ASFV’s evolution and host adaptation, focusing on codon usage patterns and associated factors. Utilizing phylogenetic analysis methods including neighbour-joining and maximum-likelihood, 64 ASFV strains were categorized into four clades. Codon usage bias (CUB) is modest in ASFV coding sequences. This research identifies multiple factors – such as nucleotide composition, mutational pressures, natural selection and geographical diversity – contributing to the formation of CUB in ASFV. Analysis of relative synonymous codon usage reveals CUB variations within clades and among ASFVs and their hosts. Both Codon Adaptation Index and Similarity Index analyses confirm that ASFV strains are highly adapted to soft ticks (Ornithodoros moubata) but less so to domestic pigs, which could be a result of the long-term co-evolution of ASFV with ticks. This study sheds light on the factors influencing ASFV’s codon usage and fitness dynamics, enriching our understanding of its evolution, adaptation and host interactions.


INTRODUCTION
The African swine fever virus (ASFV) is the only member of the family Asfarviridae, representing the sole example of dsDNA arboviruses.Its genome spans approximately 170 to ~190 kb and encompasses 160-234 ORFs [1,2].Among them, the amino acid sequence of p72 (B646L) exhibits high conservation across different strains, serving as the basis for ASFV genotyping [3].Historical records trace the initial transmission of ASFV from Africa to Lisbon, Portugal, in 1957, with the first ASFV strain isolated from soft ticks (Ornithodoros erraticus) in Spain in 1960 [4].ASFV infection in natural hosts rarely leads to clinical symptoms and manifests as persistent carriage of the virus [5].Conversely, infection of domestic swine (Sus domesticus) precipitates a high-fatality affliction characterized by mortality rates of 90-100 % among young pigs.African swine fever (ASF), the resultant disease, constitutes a grave, swiftly progressing haemorrhagic ailment typified by elevated fever, pervasive haemorrhagic manifestations and necrosis of lymphoid tissues [6].This disease is defined by the World Organization for Animal Health (WOAH, previously known as OIE) as one of the most devastating animal diseases [7].An exhaustive understanding of the replication and evolution of this pivotal virus is therefore indispensable.Synonymous codons, encoding the identical amino acid, exhibit non-random and non-uniform usage across genomes.This pervasive phenomenon, known as codon usage bias (CUB), entails dissimilar frequencies of synonymous codon utilization [8].Synonymous codon variation prevails at multiple tiers: within amino acid codons, among genes in genomes and across different species [9].Studies elucidating codon usage have identified influential factors, including secondary protein structure, natural selection processes [10], mutational pressures, replication efficiency and selective transcription, among others [11].CUB is commonly used to assess translation efficiency in proteins.The relationship between codon optimization and ribosome translation speed remains contentious [12].Nonetheless, a substantial body of evidence underscores the impact of codon bias on translation fidelity [13], elongation dynamics [14], protein folding [15] and mRNA stability [16].
Genomic CUB holds crucial implications for virus genetic evolution and host adaptation [17,18].Virus replication and protein synthesis rely on host organelles.Thus, the interaction between viral and host codon usage influences viral survival, adaptation and evolution.Previous research has highlighted a dynamic balance between viral and host CUB for optimal evolution [19].The differential usage of the synonymous codons might be crucial for our understanding of viral biology, in particular the interplay between viruses and the immune response of their hosts [20].Understanding viral codon patterns provides details of molecular evolution and viral gene expression regulation.Efficient viral protein expression is vital for inducing immunity during vaccine

Impact Statement
We analysed complete genomic information from 64 African swine fever virus (ASFV) strains representing nine genotypes, encompassing all genotypes currently available in the NCBI database.Our study revealed that natural selection and mutation pressure primarily drive the formation of codon usage biases in ASFV.There are differences in codon usage preferences among different genotypes of ASFV, leading to clustering results based on codon usage patterns being similar to that with p72 genotyping.It is noteworthy that ASFV strains isolated from Africa exhibit distinct codon usage preferences.Moreover, there is a significant disparity in codon usage biases between the ASFV genome and those of its hosts.In comparison to domestic pigs and wild boars, ASFV's codon bias is closer to ticks, possibly as a result of the long-term co-evolution between ASFV and soft ticks.
design.Studying viral codon bias can enhance protein expression optimization, aiding vaccine development.Thus, understanding ASFV codon evolution is essential for these reasons.
In this study, we conducted a comprehensive analysis of codon usage and composition in the complete genome sequences of 64 ASFV strains with full coding sequence (CDS) annotation information (at least 120, averaging 174) from the NCBI database.We aimed to assess the codon bias of ASFV by comparing the genomic base composition and codon usage patterns.Additionally, we wanted to explore potential factors influencing the formation of this bias, such as mutation pressure and natural selection.By comparing Codon Adaptation Index (CAI) and Similarity Index (SiD) values, we aimed to investigate the differences between viral and host biases, looking into the adaptability of ASFV to its various hosts.

Sequence data acquisition and processing
For this investigation, 64 complete ASFV genomes sourced from the National Center for Biotechnology Information (NCBI) GenBank database (https://www.ncbi.nlm.nih.gov/) were utilized.The ORFs of each genome were sequentially concatenated and spliced into full-lengthCDSs for subsequent data analysis experiments.A total of 11071 CDSs, corresponding to 3 488 280 codons, were subjected to analysis.Comprehensive details regarding the 64 chosen ASFV strains, including strain name, serial number, isolation time and location, as well as source literature, are presented in Table S1,available in the online version of this article.

Phylogenetic analysis
To investigate the genetic evolutionary relationships among the chosen strains, phylogenetic analysis was conducted based on the p72 protein's amino acid sequences of the 64 selected ASFV strains.Initial multiple sequence alignments were executed via Clustal X (v2.1), followed by the establishment of phylogenetic trees employing the neighbour-joining (NJ) and maximum-likelihood (ML) algorithms with MEGA X (v7.0.26) and IQ-tree (v2.2.0) [21], respectively.The models employed were No. of difference and Q.bird+F+G4, both determined as optimal nucleotide substitution models through software calculations for these sequences.The reliability of the phylogenetic tree was evaluated by bootstrap methods with 5000 replicates.Visualization of the phylogenetic trees was accomplished using iTOL (https://itol.embl.de/).

Analysis of nucleotide composition
Subsequent compositional attributes were computed using CodonW (v1.4.2) and Emboss for the CDSs within ASFV genomes: (i) individual nucleotide content within CDSs and the occurrence frequency of each nucleotide at the third position of synonymous codons (A 3s , C 3s , U 3s and G 3s ); (ii) collective GC content and the frequency of G+C nucleotides at each position of synonymous codons (GC 1s , GC 2s and GC 3s ); and (iii) count of synonymous codons (L_syn) and total amino acid count (L_aa); (iv) Effective Codon Number (ENC) values, along with the grand average of hydropathy (GRAVY) and protein aromaticity (AROMA).The GRAVY value signifies the summation of hydropathy values for all amino acids in a sequence, divided by the residue count [22].The AROMA value represents the prevalence of aromatic amino acids -namely, Phe, Tyr and Trp -within a given amino acid sequence.The analysis excluded five codons, including the three stop codons (UAA, UAG and UGA), the initiator codon AUG (Met) and the tryptophan-encoding codon UGG.CodonW (v1.4.2) was used to compute the dinucleotide occurrence within ASFV CDSs.The following equation was utilized: Dinucleotide frequencies refer to the occurrence of the 16 distinct dinucleotides, calculated individually across each of the three potential codon positions.Dinucleotides displaying a relative abundance >1.25 were identified as overrepresented, while those with a relative abundance <0.78 were characterized as underrepresented [23].

Effective codon number analysis
ENC values span from 20 to 61 and serve as a metric for quantifying the extent of CUB within ASFV encoding sequences.An ENC value of 61 signifies uniform usage of all codons, making it evident that a lower ENC value implies a more pronounced codon bias.Conventionally, an ENC value of ≤35 suggests substantial codon bias in the gene [24].
The ENC-GC 3s plot effectively captures the level of codon bias and the primary influences contributing to its development [25].Specifically, in cases where sequence codon usage is entirely random, data points align with the anticipated curve.However, when codon usage is impacted by factors such as natural selection, the ENC value deviates markedly from the expected curve.Computation of the ENC expectation curve is conducted as follows: where s represents the value of GC 3s (%).

Neutrality plot analysis and Parity Rule 2 analysis
Neutrality plots were constructed using the GC3 and corresponding GC12 values (averaging GC1 and GC2) of the viral CDS.These plots depict the influence of genome composition and natural selection on the viral genome.A diagonal fit of the regression curve (slope=1) suggests that CUB is exclusively shaped by mutation pressure.Conversely, a regression curve slope approaching zero signifies that natural selection predominantly drives codon bias determination [26,27].
Parity Rule 2 (PR2) plot analysis was conducted to examine the influence of mutation and selection pressure on gene codon usage [28,29].The PR2 plot is a straightforward scatter plot wherein the AU bias [A3/ (A3+U3)] on the third position of quadruple degenerate codons serves as the ordinate, and the GC bias [G3/ (G3+C3)] acts as the abscissa.

Principal component analysis
Principal component analysis (PCA) is a multivariate statistical method used to reduce the dimensionality of multiple variables to facilitate comparison and reflect the relationship between samples.In this study, the codon utilization patterns within CDSs of diverse ASFV strains were scrutinized.Each synonymous codon value is represented as a one-dimensional vector, defined as a correlation variable.Subsequently, the 59-dimensional vector, resulting from synonymous codon values, is dimensionally reduced into an uncorrelated variable known as a principal component.

Relative synonymous codon usage analysis
To juxtapose the codon usage preferences of ASFV with those of its host organisms, the relative synonymous codon usage (RSCU) values of codons within ASFV CDSs were computed and compared against those of the wild boar (Sus scrofa), domestic pig (Sus scrofa domestica) and two soft ticks (Ornithodoros moubata and Ornithodoros savignyi).The reference datasets for host RSCU were sourced from the Codon Usage Database (http://www.kazusa.or.jp/codon/) [30].The RSCU index was calculated as follows: where X ij is defined as the count of occurrences of the ith codon of the jth amino acid, and n i represents the number of synonymous codons.The RSCU value represents he observed frequency of a specific codon in a genetic sample and its anticipated frequency of usage.An RSCU value of 1.0 indicates no CUB for the corresponding amino acid.Synonymous codons exhibiting RSCU values >1.6 and <0.6 are classified as over-and underrepresented codons, respectively [31].

Codon adaptation index analysis
CAI analysis reveals virus adaptation in codon usage to that of the corresponding host(s), and a higher CAI signifies a more robust adaptation to the host cell environment.It is a quantitative technique predicting gene expression levels based on its CDS [32].CAI values range from 0 to 1, with 1 indicating that ASFV's codon usage frequency matches that of the reference set.CAI analysis of ASFV genes was executed using COUSIN [33].The synonymous codon usage patterns of viral hosts (Sus scrofa, Sus scrofa domestica, Ornithodoros moubata and Ornithodoros savignyi) and control group (Homo sapiens) were used as references [30].

Similarity index
The SiD indicates the effect of host codon usage on virus codon usage patterns [34].Similar to the aforementioned CAI analysis, these datasets encompass RSCU values for ASFV, its host organisms and the control group.The cumulative effect of a distinct host's overall codon usage patterns on the constitution of virus-wide codon usage is quantified through the definition of the similarity index D (A, B): where a i is the RSCU value of the 59 synonymous codons in the ASFV-encoded sequence, and b i denotes the RSCU value of the corresponding codon in the host.Consequently, R (A, B) is formulated as the cosine of the angle between the vectors in spaces A and B, indicating the extent of resemblance between ASFV and the overall codon usage pattern of the host.D (A, B) serves to gauge the potential influence of the host's overarching codon usage on ASFV, with values ranging between 0 and 1.

Codon pair bias
Codon pair bias (CPB) refers to the non-random usage of synonymous codon pairs within protein-coding regions of genomes [35].CPB is quantified by calculating a codon pair score (CPS) for each pair, comparing its frequency to the expected chance frequency based on the individual codon frequencies in a set of sequences: ) where F(AB) is the frequency of codon pair AB, F(A) the frequency of codon A, F(B) the frequency of codon B, F(XY) is the frequency of amino acid pair XY, F(X) the frequency of amino acid X and F(Y) the frequency of amino acid Y.The CPB score is calculated using the following formula: where k represents the total number of codon pairs in the given sequence.This section refers to Python script statements that were adapted from the CPB calculation portion in Codonshuffle [36].

Statistical analysis
Pearson correlation and linear regression analyses were conducted using R Language (v4.2.2) and SPSS 27.A P-value of <0.01 (**) indicates a highly significant correlation, while 0.01<P<0.05(*) signifies a significant correlation.Given that SiD and CAI values deviate from a normal distribution, a comparative assessment was executed through one-way ANOVA, alongside the nonparametric Wilcoxon and Mann-Whitney test, to ascertain potential notable distinctions between mean values across different groups.A P-value of <0.001 (***) indicates an extremely significant difference.

ASFV strain information and phylogenetic analysis
Comprehensive information about the 64 designated ASFV strains, encompassing strain names, accession numbers, isolation dates and locations, along with references, is presented in Table S1.To elucidate the interrelationships among the chosen ASFV strains, we initiated our investigation with a phylogenetic analysis of the p72 protein sequences from these 64 strains.This analysis was executed using the NJ and ML methods.
By sequencing the B646L gene encoding the main structural capsid protein p72, ASFV has been categorized into 24 distinct genotypes (GTs) [37].The strains selected for the study encompass nine of these GTs, primarily originating from GT I and GT II.All selected strains possess complete genome sequencing information and relatively comprehensive CDS annotation data.The evolutionary analysis results show that the topological structure of the evolutionary trees reconstructed using the NJ and ML methods are highly similar.Thus, only the ML phylogenetic tree is displayed here (Fig. 1).On the basis of lineage, we segregated the 64 ASFV strains into four distinct clades, forming the basis for subsequent analysis.Clade 1 and Clade 2 represent GT I and GT II respectively.Clade 3 includes GTs IX, X and XIV, while Clade 4 includes GTs VIII, IV, XX and XXII.Notably, the Zambia 1983 strain (MN318203) was previously attributed to GT I, yet our findings indicate a closer association with other South African GTs (XXII, VIII, etc.), in agreement with related investigations conducted by Forth et al. [38].

Nucleotide composition
To determine the potential impact of nucleotide constraints on codon usage, nucleotide composition analysis of 64 ASFV was performed (Table S2).A predominance of A (31.6%) and T (28.4%) was observed in ASFV coding sequences, whereas C (20.4%) and G (19.6%) were less abundant.Similarly, the nucleotide composition at the third position of synonymous codons indicated higher proportions of A 3s ( The relative abundance of 16 dinucleotides in the 64 ASFV CDSs was calculated to assess their characteristics, which can help differentiate between DNA genomes from different strains.This measure also partly reflects the species-specific aspects of DNA replication, modification and repair mechanisms [39].Our calculations revealed that the distribution of relative dinucleotide abundance in ASFV coding sequences was non-random (Fig. S1 and Table S3).Dinucleotides AA, AT and TT were overrepresented (ρxy≥1.25,with values of ρxy=1.855±0.178,ρxy=1.453±0.017and ρxy=1.512±0.096,respectively), while dinucleotides CG, GG and GT were underrepresented (ρxy≤0.78,with values of ρxy=0.574±0.018,ρxy=0.737±0.009and ρxy=0.694±0.066,respectively).Here, ρxy represents the frequency values of the corresponding dinucleotides.These findings collectively suggest that the ASFV genome exhibits a distinct dinucleotide usage pattern.
TpA and UpA dinucleotides are typically underrepresented in the genomes of both DNA and RNA viruses [40].The reduction in UpA dinucleotides can enhance the stability of nucleic acid sequences [41].However, in ASFV, we did not observe a low frequency of TpA usage; instead, we observed a low frequency of CpG usage.This observation may be directly related to the fact that the viral genome itself is rich in A and T but lacks G and C. Previous studies have shown that CpG deficiency in a virus is associated with the immune evasion characteristic [42].Based on the computationally derived genomic nucleotide composition-related findings, further analysis of the factors influencing CUB and its formation for ASFV codons was conducted.

Effective codon number
To assess codon usage bias in different ASFV isolates, we calculated the ENC value.The calculated ENC values for ASFV coding sequences ranged from 56.774 to 57.894, with an average of 57.111±0.223(Table S2).An ENC value >35 indicates a low CUB in the ASFV complete genome [25].This finding aligns with previous studies on other porcine viruses, such as porcine circoviruses (54.71±0.87)[43], porcine parvoviruses (57.433±0.192)[44] and classical swine fever virus (52.69±0.47)[45].The limited codon preference observed in these viruses may promote effective replication in diverse host cells [46].
Additionally, when considering ENC values across different clades, we observed that ASFV strains in clade 4 exhibited significantly higher values than others.To delve deeper into the factor leading to the low CUB of the ASFV genes, we analysed the relationship between the ENC value and the percentage of G or C in the third codon position (GC 3s ).The ENC plot (Fig. 2) showed that all spots clustered slightly below on the left side of the expected curve.This indicated that both mutational bias and natural selection play substantial roles in codon selection in ASFV genes.

Neutrality plot analysis and Parity Rule 2 analysis
Neutrality plot analysis, plotting GC 12s against GC 3s values, is used to investigate the influence of natural selection and mutation pressure on codon usage [26].Neutrality plot analysis revealed distinct k-values for different ASFV clades: Clade 1-ASFV had k=0.479 (52.1 % natural selection), while Clade 2-ASFV had k=0.452 (54.8 % natural selection)(Fig.3a).It indicated that natural selection and mutation pressure both participate in ASFV CUB.Furthermore, to ascertain whether codon bias is exclusive to highly biased genes, a PR2 plot was constructed to illustrate the association between A and U content and G and C content within four-fold degenerate codon families (alanine, arginine, glycine, leucine, proline, serine, threonine and valine) (Fig. 3b).
The plot revealed that ASFV tended to use A and U more frequently than G and C at the third site of four-fold degenerate  codons.These results also indicate that codon preference in ASFV is influenced by a combination of mutation pressure and other factors, including natural selection.Kunec and Osterrieder found that codon bias is essentially a result of dinucleotide bias [23].
To investigate whether this theory holds true for ASFV, an analysis of the RSCU values for the CDS of the 64 ASFV strains was subsequently conducted.

Principal component analysis
To assess the influence of geographical distribution on ASFV's codon usage patterns, we conducted a PCA based on RSCU values calculated below, focusing on ASFV strains isolated from different geographical regions.The first axis of this analysis explained 95.1 % of the total variation, with the second axis accounting for 2.1 %.With the exception of one point in Clade 2 that slightly overlapped with Clade 3, all data points were distinctly clustered into four groups, representing the four ASFV clades.This spatial distribution of ASFV isolates reflected their geographical origins (Fig. 4).Specifically, Clade 1-ASFV was associated with the European region, Clade 2-ASFV with the Asian region, Clade 3-ASFV with the East African region and Clade 4-ASFV with the South African region.This outcome is congruent with the preceding evolutionary analysis.Consequently, it appears that geographical diversity, coupled with factors such as climatic conditions, the presence of natural hosts in the infection region and host susceptibility, have collectively played a role in moulding the codon usage observed in ASFV CDSs.

Relative synonymous codon usag
The patterns of synonymous codon usage in ASFV coding sequences were assessed through RSCU analysis (Fig. 5).A/U-ending codons are preferred in ASFV coding sequences.Among the 18 most favoured codons in the ASFV coding sequence, 15 have A/U endings (five with A-endings and 10 with U-endings).The clustering analysis results of RSCU values closely paralleled the phylogenetic analysis outcomes for p72 sequences (Fig. 1).Clade 4-AFSV exhibited different preferences in codon usage compared to other strains.Within the figure, we have identified the 10 codons displaying the most significant disparities and their respective amino acids, highlighting the prominent distinctions in Pro, Leu, Ser, etc. (Fig. 5).
The results indicated that ASFV has a preference for A/U-terminating codons, which corresponds to the high AU content in these viruses.In contrast, host organisms prefer codons ending with G/C.ASFV does not exhibit strong CUB, with only one codon, UUC (encoding Phe), having an RSCU value <0.6, indicating it is underrepresented.In Clade 4-ASFV, two codons, CUA (Leu) and UCA (Ser), were underrepresented, while one codon, UCC (Ser), was overrepresented (Table 1).The RSCU values 59 synonymous codons are presented.Codons that are preferred, overrepresented (RSCU >1.6) and underrepresented (RSCU <0.6) are indicated in bold, with a red background for overrepresented codons and a green background for underrepresented codons.Table 1.Continued CpG content is usually the lowest in DNA virus and negative sense single-stranded viruses, and low CpG content affects the CUB of viruses containing CpG codons.For instance, in the Nipah virus, the RSCU values of the eight codons containing CpG were underrepresented [47].A similar underrepresentation of CpG-containing nucleotides had been observed in the genome of the equine influenza virus and Flaviviridae viruses [48,49].However, it is worth noting that despite the low content of CpG in ASFV (ρxy=0.574±0.018),we did not observe similar underrepression of CpG-containing codons in ASFV.
To visualize the differences between ASFV strains and their hosts, as well as the representation of each codon, we visualized the data in Table 1 graphically (Fig S2).Compared to the hosts, ASFV did not exhibit strong bias in codon usage, and the majority of codon frequencies were within the range of weak bias (0.78-1.25).With a few exceptions, ASFV tended to use codons that differ from the four host preference codons.For example, when encoding double degenerate codon amino acids such as Cys, Asp, Lys, Asn, etc., ASFV exhibits completely opposite codon usage frequencies compared to all other hosts.The results showed that the codon usage patterns and selection of preferred codons in ASFV CDSs is antagonist to its hosts for the majority of the codons.Overall, there were significant differences in the usage patterns of ASFV genomic codons compared to the four hosts, but compared to wild boar and domestic pig, ASFV's codon usage patterns were closer to those of soft ticks.To validate this conclusion, we conducted adaptive analysis on ASFV and its different hosts using various algorithms, as detailed below.

Codon adaptation index
CAI analysis primarily reflects viral adaptability in codon usage patterns relative to their corresponding hosts, with higher CAI values indicating strong adaptation to the host's cellular machinery [50].Our findings suggest that ASFV exhibits greater adaptability to the codon usage patterns of soft ticks compared to pigs.This is evident from the significantly higher average CAI values of ASFV strains against ticks (0.615623±0.007678for O. moubata and 0.601701±0.007794for O. savignyi) compared to pigs (0.559958±0.004621for Sus scrofa and 0.500247±0.003697for Sus domesticus, P<0.001).Additionally, within soft ticks and wild boars, Clade 4-ASFV showed higher adaptability compared to other strains.For instance, in O. savignyi, the mean CAI value of Clade 4-ASFV (0.623908±0.001045)was higher than that of Clade 1-ASFV (0.601046±0.001959),Clade 2-ASFV (0.596776±0.002515)and Clade 3-ASFV (0.0603103±0.000685).The variation in CUB of ASFV among different clades followed a decreasing trend, from natural hosts such as ticks to wild boars and then to domestic pigs (Fig. 6 and Table S4).This suggests that the adaptability of the main prevalent ASFV strains (GT I and GT II) to the codon usage patterns of domestic pigs has been increasing throughout the evolutionary process.

index
The SiD analysis represents an alternative methodology used to scrutinize the resemblance in codon utilization patterns between ASFV and its respective hosts.Additionally, it serves to expound upon the distinct impacts of various hosts on the evolutionary trajectory of ASFV codon usage patterns, discerning those hosts that have the utmost influence.The results revealed that the domestic pig genome had the maximum effect on ASFV codon usage, while the soft tick genome had the minimum effect (Fig. 7).Specifically, the average SiD value of ASFV in domestic pigs (0.088562±0.000910)was significantly higher than that in the other three hosts (P<0.001),whereas the average SiD value was lowest in O. moubata (0.053315±0.004191),indicating that ASFV's codon usage pattern is highly adaptable to O. moubata (Table S5).Similar to the CAI analysis (Fig. 6), we observed a similar phenomenon where Clade 4 ASFV exhibits adaptation to the domestic pig host similar to other strains.However, in other hosts, Clade 4 ASFV still shows significant differences compared to other strains.

Correlation analysis
To further investigate the impact of mutation pressure on the codon usage pattern of the ASFV CDS, we evaluated the correlation between the nucleotide compositions (A, T, G, C), the third site of the synonymous codons (A 3s , U 3s , G 3s , C 3s , and GC 3s ), ENC values, GRAVY values and AROMA values (Table 2).There were significant positive correlations observed between A and A3 (r=0.984,95 % BCa CI [0.94, 0.99], P<0.01), C and C3 (r=0.950,95 % BCa CI [0.86, 0.98], P<0.01), G and G 3s (r=0.664,95 % BCa CI [0.46, 0.79], P<0.01), and GC and GC 3s (r=0.928,95 % BCa CI [0.87, 0.96], P<0.01) in ASFVs.Here, BCa represents the bias-corrected and accelerated method used for calculating confidence intervals in statistics.These results demonstrated that the CUB of the ASFV was influenced by nucleotide compositions, further confirming the mutational pressure on the formation of codon usage patterns in ASFV CDSs.GRAVY and AROMA values were considered as indicators of natural selection.In ASFV, both the GRAVY and AROMA values were significantly correlated with A 3s , C 3s , U 3s , G 3s , ENC and GC 3s (P<0.01).
The correlation between CAI and ENC reflects the balance between selection and mutation [51].A high correlation (r → 1) indicates that selection pressure dominates, while a low correlation suggests that bias is mainly due to mutations (r close to 0) [52].Similarly, we examined CAI values for four different hosts: wild boar (CAI1), domestic pig (CAI2), soft tick (CAI3) and African soft tick (CAI4), which are representtive hosts of ASFV (Table 3).In the case of wild boar and both types of soft ticks, the CAI values of ASFV displayed a highly significant positive correlation with GRAVY and AROMA values.However, for domestic pigs, there was only a significant correlation observed with GRAVY values (r=0.353,95 % BCa CI [-0.49, -0.26], P<0.05).These results indicate that natural selection, as reflected in GRAVY and AROMA values, significantly shapes ASFV gene expression in wild boars and soft ticks.In contrast, its impact is somewhat reduced in domestic pigs.
The strong correlation between ENC and CAI values highlights that the ASFV genome experiences substantial natural selection pressure in the wild boar (r=0.836,95 % BCa CI [0.71, 0.90], P<0.01), soft tick (r=0.920,95 % BCa CI [0.80, 0.97], P<0.01) and African soft tick (r=0.936,95 % BCa CI [0.83, 0.98], P<0.01), but this influence is less pronounced in domestic pigs.Furthermore, the CAI values of the wild boar group are notably correlated with those of the other three hosts.This is consistent with earlier research suggesting that wild boar, serving as common hosts in both the forest and domestic pig cycles, have a pivotal role in shaping ASFV evolution, adaptation and transmission pathways.

DISCUSSION
In this comprehensive study, we have investigated synonymous codon usage within the CDSs of 64 ASFV strains.Our investigation aimed to unravel the molecular evolution of ASFV under the influence of various host and environmental factors.
The ASFV coding genome exhibited a lower frequency of G and C nucleotides compared to A and T. This correlated significantly with the composition of the third base of synonymous codons (A3, C3, U3, G3).Such correlations suggested that nucleotide  composition has constrained ASFV's codon preferences.The ENC value, indicative of codon bias, was strongly linked to nucleotide frequencies, further emphasizing the impact of nucleotide composition on ASFV's codon bias.The PR2 diagram illustrates an imbalance in the frequencies of A3, U3, G3 and C3 codons in ASFV, indicating that natural selection plays a significant role in moulding the virus's codon preferences.Furthermore, analysis using ENC and neutral plots supports the notion that factors beyond mutation pressure contribute to ASFV's codon usage patterns, aligning with previous research findings [53].The neutrality plot analysis also indicated that natural selection and mutation pressure both participate in ASFV CUB.As expected, PCA based on RSCU values effectively differentiated ASFV across various clades.Clade 4-ASFV exhibits significant differences compared to other strains in terms of both RSCU values and CPB values (Fig. S3).This distinction to some extent mirrors the virus's diverse transmission routes and host spectrum, potentially contributing to the evolution of ASF.Altogether, these findings provide robust evidence that ASFV's codon usage patterns are shaped by a combination of factors, encompassing mutation pressure and natural selection.
The ENC values for all 64 ASFV CDSs fluctuated around 57.122±0.235,indicating a low bias in these viruses, which potentially improves the replication of ASFV in the different hosts by reducing competition during translation.Further investigation is needed to determine whether the same applies to ASFV.According to the values observed in this study, ASFV exhibited a phenomenon known as CUB in its CDSs.Among the 18 preferred codons, five ended with A, ten with U and three with C, signifying a preference for A/U-ended codons.Studies have shown that the unmethylated CpGs of viral pathogens can be recognized by Toll like receptor 9 (TLR9) in the host cells, triggering a cascade of immune responses [54].In investigations of codon bias in other viruses, a consistent link has been observed between low CpG content and the inadequate representation of CpG-containing codons [55,56], contributing to CUB and viral immune evasion mechanisms [49].The CpG content of the Eurasian epidemic ASFV strain showed a decreasing trend compared to the Clade 4-ASFV strain originating in South Africa (Table S3), resembling observations in the transmission of influenza viruses [57].This suggests that ASFV may employ low CpG content as an immune evasion strategy.However, despite the low CpG content observed in ASFV coding sequences, codons containing CpG or GpC did not exhibit reduced expression.Obviously, CUB of ASFV was not very significant, with no other codons exceeding the range of 0.6-1.6,except for the UUC codon encoding Phe where the RSCU value was defined as low expression.These results suggest that dinucleotides have a limited impact on ASFV's codon usage.This could be attributed to the analysis using whole genetic data rather than individual CDSs, as previous studies have demonstrated that using the same synonymous codons as hosts can enhance the efficient translation of corresponding amino acids [58].It appears that ASFV adjusts the translation environment by employing codons uncommonly used by the host, ultimately benefiting its efficient replication and propagation.
ASFV-infected ticks maintain a high, persistent infection level, similar to uninfected ticks, with the exception that infected females die after laying a second egg mass [59].ASFV replicates efficiently in tick midgut cells [60], without inducing apoptosis, a common response seen in primary alveolar macrophages cells infected with ASFV or in insect gut cells infected with other high-titre viruses [61,62].Adult warthogs infected with ASFV present as a latent infection.These phenomena may be related to the adaptability of virus codons, but because there is no codon usage of warthog in the codon database, we are unable to conduct further research.Therefore, we selected four hosts [wild boar (Sus scrofa), domestic pig (Sus domesticus) and two soft ticks (O.moubata and O. savignyi)] in the database for comparison.The results (Fig. 2) showed that all four hosts have significantly different codon usage preferences from ASFV.In the CAI and sSiD analyses, we included human (Homo sapiens) as a control group.The control group composed of human hosts aligns with the expected results, exhibiting the lowest CAI values and relatively higher SiD values.We found that ASFV had statistically significantly higher average SiD values for wild boars and domestic pigs compared to ticks, while CAI analysis presented completely opposite results.This result indicates that, compared to other hosts, domestic pigs exert stronger selective pressure on the codon usage pattern of ASFV.Within domestic pigs, the reduced divergence of Clade 4-ASFV from other ASFVs may indirectly confirm this evolutionary process.
From a genetic, physiological and immunological perspective, the significant differences between ticks and pigs suggest that ASFV's ability to avoid using the optimal host codons might contribute to its adaptation to four different hosts simultaneously.This may include, but is not limited to, reducing recognition by the host's innate immune system and facilitating host-jumping behaviour.Furthermore, the presence and distribution of preferred and non-preferred codons are considered factors guiding proper protein folding [63].The utilization of non-preferred codons may potentially enhance the replication accuracy of ASFV within host cells.For multi-host viruses such as ASFV, the usage pattern of codons may vary among different hosts.For instance, SARS-CoV-2 are highly adapted to the human codon usage pattern [64], and Nipah virus (NiV) shows high adaptation to both bats and humans [47], whereas Marburg virus (MARV) displays antagonism with both human hosts and Rousettus aegyptiacus (Egyptian fruit bat) [52].The Canine parvovirus 2 (CPV-2), originating from feline panleukopenia virus (FPV), exhibits higher CAI values in cats.However, newly emerging strains show an increasing trend in CAI values for dogs [65].Similar observations have been noted in influenza viruses such as H1N1/09 and canine influenza H3N2 [66].The preference of ASFV for tick-like codon usage may result from prolonged co-evolution with soft ticks, and it is speculated that over time, an adaptation to pig codon usage, similar to the evolution observed in influenza viruses, may occur.
ASFV has spread globally and has improved its adaptability to pigs through complex adaptive evolutionary processes, further posing a risk for global transmission and subsequent outbreaks.Consequently, more stringent measures are imperative to combat ASFV effectively.The disparities in codon usage patterns between ASFV and its hosts exert comprehensive effects, influencing the virus's adaptability, replication efficiency and transmission, and its capacity to evade immune responses.In summary, the data from the current study contribute to understanding the evolution of whole ASFVs and their host adaptation.

Fig. 1 .
Fig.1.Phylogenetic trees based on ASFV p72 protein sequences were reconstructed by using the maximum-likelihood method.A bootstrap test value of 5000 was employed, with bootstrap support values annotated only when they were ≥60.The branch lengths solely depict the topological arrangement rather than the evolutionary distances, with each terminus aligned.GenBank accession numbers of strains, along with isolation years, locations and genotypes, are displayed.The findings delineate the division of the 64 ASFV strains into four clades, marked with different colours on the outer ring of the phylogenetic tree.

Fig. 2 .
Fig.2.ENC plot analysis (GC 3s against ENC) of the 64 ASFV complete CDSs.The curve represents the codon usage bias (CUB) solely due to mutation bias (absence of selective pressure).The distinctive distribution of the strains is shown, with clades and isolation sites distinguished by shape and colour, respectively.

Fig. 4 .
Fig. 4. PCA was conducted on all selected ASFV strains using their CDS RSCU values.The ASFV strains are visualized in a scatter plot based on the values of the first two principal components (PC).Each point on the plot represents one ASFV strain, with Clade 1-ASFV, Clade 2-ASFV, Clade 3-ASFV and Clade 4-ASFV represented in red, green, blue and purple, respectively.

Fig. 5 .
Fig. 5. Cluster analysis (heat map) of RSCU values among 64 ASFV coding regions.The results of RSCU value cluster analysis closely mirrored those of the p72 sequence evolution analysis, thus reinforcing the outcomes of the phylogenetic analysis.Clade 4-ASFV exhibited distinct codon preferences compared to other strains, with 10 codons displaying significant differences, highlighted with red stars, and annotated with amino acid names.Cluster formation was based on the Euclidean distance and WPGMA methods.

Fig. 6 .
Fig. 6.CAI analysis was conducted on ASFV genome CDSs in relation to the four hosts, with humans (Homo sapiens) employed as the control group.To compare mean CAI values among the four hosts, we utilized one-way ANOVA and the Wilcoxon test.Asterisks highlight highly significant differences in CAI values between different hosts (***P<0.001).

Fig. 7 .
Fig. 7. SiD analysis of ASFV genome CDSs in relation to the four hosts, with humans (Homo sapiens) employed as the control group.One-way ANOVA and Wilcoxon tests were used to compare the mean SiD values for the hosts.Asterisks indicate highly significant differences in SiD values between different hosts (***P<0.001).

Table 1 .
Synonymous codon usage of the ASFV complete CDSs and four hosts

Table 3 .
The correlation between ENC, AROMA and GRAVY values of ASFV and CAI values for adaptability to four host species *P<0.05; **P<0.01.Unless otherwise noted, bootstrap results are based on 1000 bootstrap samples.