Global Population Structure of Apple Mosaic Virus (ApMV, Genus Ilarvirus)

The gene sequence data for apple mosaic virus (ApMV) in NCBI GenBank were analyzed to determine the phylogeny and population structure of the virus at a global level. The phylogenies of the movement protein (MP) and coat protein (CP) genes, both encoded by RNA3, were shown to be identical and consisted of three lineages, but they did not closely correlate with those of P1 and P2, suggesting the presence of recombinant isolates. Recombination Detection Program (RDP v.4.56) detected significant recombination signals in the P1 region of K75R1 (KY883318) and Apple (HE574162) and in the P2 region of Apple (HE574163) and CITH GD (MN822138). Several diversity parameters suggested that the isolates in group 3 were more divergent from one another than the isolates in groups 1 and 2. The neutrality tests assigned positive values to P1, indicating that only this region is experiencing balancing selection or population contraction. Comparisons of the three phylogroups yielded high fixation index (FST) values, confirming genetic separation and a lack of gene flow among them. Additionally, approximately 500 bp spanning the partial MP + ‘intergenic region’ + partial CP coding regions of two Turkish isolates from apple and seven from hazelnut were sequenced, and their phylogenetic positions were found to fall within groups 1 and 3, respectively.

The genetic variation of ApMV has mostly been investigated by analyzing the CP gene of isolates from various hosts and geographic locations. Initial analysis of the CP gene of ApMV isolates from Korea revealed three evolutionary groupings distinguished by plant hosts [16]. However, later studies on ApMV isolate sequences from diverse geographical origins throughout the world suggested the existence of two to five evolutionary groups. Seven Indian isolates formed a distinct group separated from the other four groups [2]. Apple and rose isolates from Poland and Belarus were positioned together in one of three constructed phylogroups [5]. These older data maintained some correlation between phylogroups and isolate origins, but, perhaps due to frequent global trading, recent reports found no clear association of phylogroups with hosts or locations [6,17-19]. Despite this knowledge, detailed diversity and evolutionary analyses involving all regions of the ApMV genome have never been performed, making it difficult to reach a general agreement on the exact nature of the virus phylogeny and evolution.
A clear understanding of the population structure of a plant virus could be the key to its accurate detection and effective control [20,21]. Therefore, genome sequences of all global ApMV isolates in the National Center for Biotechnology Information (NCBI) GenBank were analyzed in this study to advance our understanding of the virus at the molecular level. Additionally, Turkey's 684,000 tons of hazelnut production [22] accounts for 69% of the total world production. The country is also the leading hazelnut exporter with a world market share of 61%, followed by Italy, the USA, and Azerbaijan. The presence of ApMV in different host species planted in different regions of the country needs to be molecularly identified as a first step for disease control since considerable yield reduction, up to 42%, has been observed in hazelnut [23,24]. Thus, this study could also clarify the position of known Turkish isolates among global populations.
Possible recombination signals on the P1, P2, MP, and CP sequences were searched using the RDP, GENECONV, BootScan, MaxChi, ChiMaera, SiScan, and 3Seq algorithms with a Bonferroni-corrected p value of <0.05 implemented in Recombination Detection Program (RDP v.4.56) [26]. Only those supported by at least five of the algorithms were included as credible signals.
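The support threshold described above can be illustrated with a minimal Python sketch. It assumes that the per-method p values for each candidate event have already been exported from RDP into a dictionary; the function name `is_credible_event`, the dictionary layout, and the example p values are hypothetical and are not part of RDP itself.

```python
# Method names as listed in the text; the data layout below is an assumption.
METHODS = ("RDP", "GENECONV", "BootScan", "MaxChi", "ChiMaera", "SiScan", "3Seq")


def is_credible_event(p_values, alpha=0.05, min_methods=5):
    """Return True if at least `min_methods` algorithms support the event.

    `p_values` maps a method name to its corrected p value, or to None
    when that method did not detect the event.
    """
    supporting = [
        m for m in METHODS
        if p_values.get(m) is not None and p_values[m] < alpha
    ]
    return len(supporting) >= min_methods


# Hypothetical event: five methods report p < 0.05, two do not.
example = {"RDP": 1e-6, "GENECONV": 3e-4, "BootScan": 2e-5,
           "MaxChi": 1e-3, "ChiMaera": 4e-3, "SiScan": None, "3Seq": 0.2}
print(is_credible_event(example))
```

Raising `min_methods` makes the filter stricter, at the cost of possibly discarding genuine events that only a few algorithms can detect.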

Phylogenetic and Comparative Nucleotide and Amino Acid Analyses
Four phylogenetic trees based on the nucleotide (nt) sequences of the P1, P2, MP, and CP genes were built using the Neighbor-joining (NJ) algorithm in MEGA X software v.10.2.4. The Kimura 2-parameter model [27], with a uniform rate among sites and complete deletion of missing data, had the lowest Bayesian information criterion score and was therefore selected as the best nt substitution model for all four alignments. The statistical significance of isolate clusters was determined using 1000 bootstrap replicates. The percentage pairwise identities of the P1, P2, MP, and CP coding regions at the nt and amino acid (aa) levels were calculated using Sequence Demarcation Tool software (SDT v1.2) [28].
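For readers unfamiliar with the distance model named above, the following Python sketch shows how a Kimura 2-parameter distance can be computed for a pair of aligned sequences after complete deletion of gapped or ambiguous columns. It is an illustration only, not the MEGA X implementation; the function names and the toy sequences are hypothetical, and sequences are assumed to be uppercase and already aligned.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def complete_deletion(seqs):
    """Drop every alignment column that is not A/C/G/T in all sequences."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two gap-free aligned sequences."""
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G or C<->T is a transition; any other change is a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(a)    # proportion of transitional differences
    q = transversions / len(a)  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignment: one transition (A->G) in ten retained sites.
aligned = complete_deletion(["AAAAACCCCC-", "GAAAACCCCCT"])
print(round(k2p_distance(*aligned), 4))
```

A matrix of such distances is the input that a Neighbor-joining algorithm clusters into a tree.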

Genetic Diversity, Polymorphism, and Neutral Selection Analyses
The diversity of the nt sequences of the P1, P2, MP, and CP genes was evaluated using DnaSP software v.6.12.03 [29] based on the confidence intervals of population genetics-related parameters: the number of haplotypes (h), haplotype diversity (Hd), average number of nt differences between sequences (k), nt diversity per site (π), total number of mutations (η), and number of variable sites (S). Selection pressure was estimated from the ratio of nonsynonymous to synonymous substitutions (ω = dN/dS). The neutrality tests implemented in DnaSP v.6.12.03, namely Tajima's D [30] and Fu and Li's D* and F* [31], were performed with a window length of 100 sites and a step size of 25 sites to measure genetic divergence in the individual coding sequences of the ApMV genome.
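The parameters listed above can be made concrete with a short Python sketch that computes h, Hd, S, k, and π from a gap-free alignment and then Tajima's D from the standard formula. This is a simplified illustration, not DnaSP's implementation: it ignores the sliding window, treats S as the number of variable columns, and uses hypothetical function names and toy sequences.

```python
import math
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return h, Hd, S, k, and pi for equal-length, gap-free sequences."""
    n, length = len(seqs), len(seqs[0])
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # variable sites
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    counts = Counter(seqs)  # haplotype frequencies
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return {"h": len(counts), "Hd": hd, "S": s, "k": k, "pi": k / length}


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, and mean differences k."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy sample of four sequences (hypothetical data).
sample = ["AAAA", "AAAT", "AATT", "AAAA"]
stats = diversity_stats(sample)
print(stats, tajimas_d(len(sample), stats["S"], stats["k"]))
```

A negative D means that k is smaller than expected from S (an excess of rare variants), whereas a positive D means the opposite, which is how the signs reported for the different genome regions are read.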

Gene Flow and Genetic Differentiation among Populations
The KS*, KST*, Z*, Snn, and FST values [32,33] were estimated using DnaSP software v.6.12.03. These values were used to assess the genetic differentiation of the MP and CP gene nt sequences among the compared phylogroups. When there is no genetic divergence among the compared populations, the KST* value approaches zero [34]. Low genetic isolation translates into a small Z* value [32]. The Snn value ranges from a minimum of 0.5 (the compared populations are identical) to a maximum of 1 (the populations are distinctly separated) [33]. The FST value ranges from 0, which describes strictly identical populations, to 1, which describes totally distinct populations [32,34]. Infrequent gene flow and large genetic isolation between tested populations are usually indicated by an FST value of at least 0.33 [35,36].
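To show how a fixation index relates to sequence differences, the Python sketch below uses one common sequence-based estimator, FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb is the mean between populations. Whether this matches the exact estimator and options used in DnaSP for this study is an assumption; the function names and toy populations are hypothetical.

```python
from itertools import combinations, product


def n_diffs(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def fst(pop1, pop2):
    """FST = 1 - Hw/Hb for two populations with at least two sequences each."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    between = [n_diffs(a, b) for a, b in product(pop1, pop2)]
    hb = sum(between) / len(between)
    return 1 - hw / hb


# Two hypothetical, clearly separated groups.
group_a = ["AAAA", "AAAT"]
group_b = ["TTTT", "TTTA"]
value = fst(group_a, group_b)
print(round(value, 3), value >= 0.33)
```

In this toy case the between-group differences far exceed the within-group differences, so the estimate lies well above the 0.33 threshold quoted in the text.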

Acquisition of Partial Sequences of Turkish Isolates and Their Phylogeny
Fifteen hazelnut leaf samples from Ordu province and five apple leaf samples from Ankara province, Turkey, were collected during the summer of 2022 following recent reports of plants with virus-like symptoms in several plantations. Total RNA was extracted from the samples using NucleoZOL, a one-phase RNA purification reagent, following the manufacturer's procedures (Macherey-Nagel GmbH & Co. KG, Düren, Germany).
The recovered nt sequences of the Turkish isolates were aligned with those of 14 isolates in NCBI GenBank that have full RNA3 sequences. The alignment was trimmed to the length of the Turkish isolate sequences. A phylogenetic tree based on the alignment was constructed to investigate their positions in the global phylogrouping, using the Neighbor-joining (NJ) algorithm in MEGA X software v.10.2.4 with Kimura 2-parameter as the best-supported nt substitution model [27], a uniform rate among sites, and complete deletion of missing data. The statistical significance of isolate branching was determined using 1000 bootstrap replicates. Differences in the partial aa sequences of the aligned isolates were examined using BioEdit v.7.2.5 [39].

Phylogenetic and Recombination Analyses
The constructed MP and CP phylogenetic trees showed exactly the same topology, in which global isolates were separated into three main lineages. Fourteen isolates with complete RNA3 sequences were consistently clustered in the same groups in the respective trees ( Figure 1; Supplementary Table S1). However, the phylogenies of the P1 and P2 genes could not be resolved in the same way, probably due to recombination events that were found in the P1 region of K75R1 (KY883318) and Apple (HE574162) and the P2 region of Apple (HE574163) and CITH GD (MN822138) ( Table 1). Recombinant isolates were removed from subsequent analyses to improve the accuracy of the results.

Comparative Nucleotide and Amino Acid Analyses
According to SDT analysis, the MP and CP genes of ApMV isolates maintained very high nt sequence identities of 88-100% and 87-100%, respectively (Table 2). However, the relatively few nt changes in some isolates (Negret 2 for MP and CP, and TrapezK5, TrapezK, and almond for CP) resulted in aa changes, so these isolates shared low aa identities with the other tested isolates.
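The identity percentages reported above can be understood through a small Python sketch of a pairwise identity score computed as 1 - M/N, where M is the number of mismatches and N is the number of aligned positions at which neither sequence has a gap. This is believed to correspond to the kind of score SDT reports, but that correspondence, the function name, and the toy sequences are assumptions for illustration.

```python
def percent_identity(a, b, gap="-"):
    """Percentage identity of two aligned sequences, ignoring gapped positions."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    mismatches = sum(x != y for x, y in compared)
    return 100.0 * (1 - mismatches / len(compared))


# Hypothetical aligned fragments: one mismatch in eight ungapped positions.
print(percent_identity("ATGGCTAA-", "ATGGCAAAT"))
```

The same function can be applied to aligned aa sequences; a single nt change that alters a codon lowers the aa identity proportionally more than the nt identity, because a coding region has three times fewer aa positions than nt positions.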

Genetic Diversity and Polymorphism Analyses
The analyses showed large genomic variation in the sequences of the P1 and P2 genes of the few currently known isolates, as indicated by the very high Hd, S, η, and k values obtained for both regions. The MP gene recorded lower Hd, S, η, and π values than the CP gene, suggesting that this region is highly conserved compared to the CP (Table 3). However, far fewer MP sequences than CP sequences were available for comparison. For both MP and CP, isolates in group 3 exhibited greater diversity among themselves than isolates in groups 1 and 2 (Table 3). Therefore, the obtained data could also indicate that group 3 consists of divergent isolates that might be evolutionarily adapted to infect certain hosts, especially hazelnut, since no hazelnut isolate was positioned in the other two groups.

Neutral Selection Analysis
The three parameters of the neutrality test invariably assigned negative numbers, some with statistically significant p values, to three phylogroups of the MP and CP genes, which are in RNA3, and gave positive numbers to the P1 gene, which is in RNA1 (Table 4).

Gene Flow and Genetic Differentiation among Populations
Analysis was performed only on the MP and CP genes since there was no clear phylogrouping of the P1 and P2 genes owing to the lack of recombinant-free sequences currently listed in NCBI GenBank. The comparisons among groups 1, 2, and 3 for both MP and CP yielded high and statistically significant KS*, KST*, Z*, and Snn values and a fixation index (FST) > 0.33, which supported the division of the constructed phylogenetic trees into three lineages. However, the FST values for the MP and CP gene sequences suggested more divergence between groups 1 and 2 than between groups 1 and 3 or groups 2 and 3 (Table 5).

Table 5. Genetic differentiation estimates for lineages of ApMV, based on comparisons of the MP and CP genomic region sequences.

Phylogeny of Novel Turkish Isolates
All 20 samples were positive for ApMV infection according to a molecular test using RT-PCR (Figure 2). Infected hazelnut showed distinctive 'oak-leaf pattern' symptoms, whereas infected apple exhibited white discoloration symptoms (Figure 3). Sequencing of nine positive RT-PCR products gave around 500 nts of the ApMV genome, covering the partial MP + 'intergenic region' + partial CP gene (spanning nts 637-1156 in RNA3; reference isolate: accession no. NC_003480). Some Turkish isolates showed different nt and aa lengths due to indel mutations (Figure 4). Newly obtained sequences were registered in NCBI GenBank with accession nos. OP374166-OP374174.
[RT-PCR gel figure, lane labels as recovered: OP374173 from apple; 2, negative control (uninfected sample); 3, OP374166 from hazelnut; 4, OP374167 from hazelnut; 5, OP374168 from hazelnut; 6, OP374169 from hazelnut; 7, OP374170 from hazelnut; 8, OP374171 from hazelnut.]
The constructed phylogenetic trees positioned two Turkish isolates from apple and six other apple isolates in group 1, whereas seven Turkish isolates from hazelnut clustered with four other hazelnut isolates identified in Poland in group 3 (Figure 5). Therefore, there was some degree of correlation between ApMV phylogeny and host species. The results were in line with those of the polymorphism analysis, which indicated that group 3 comprises evolutionarily successful isolates capable of acquiring hazelnut as an additional host. The molecular study also verified the capability of the new primer pair to amplify a partial genome of genetically diverse isolates belonging to different hosts and phylogroups.

Discussion
ApMV is a major concern in stone fruits and other high-value crops cultivated around the world, but its complete phylogeny is still unclear. In the current work, all complete sequences of the P1, P2, MP, and CP regions of the ApMV genome currently listed in NCBI GenBank were analyzed to better understand the population structure and evolutionary course of the virus. ApMV has also been studied quite well in Turkey, and some isolates have been sequenced. However, sequences of hazelnut isolates, which were reported to be genetically divergent [6], are not yet available. Therefore, hazelnut and apple isolates from Ordu and Ankara provinces obtained in this study were partially sequenced and then studied to determine their molecular characteristics.
The RDP analysis in this study not only confirmed the recombination in the P2 region of MN822138 recently reported by [11] but also found strong signals in the P2 region of HE574163 and in the P1 region of HE574162 and KY883318. However, no recombination event was found in the sequences of the MP and CP genes. Phylogenetic trees further showed that while MP and CP shared an identical phylogeny, P1 and P2 did not. Given the frequent signals in the few known sequences of P1 and P2, it is likely that recombination plays a more important role than reassortment in shaping ApMV evolution, particularly in these two regions. However, more sequences of P1 and P2 are clearly needed to draw a strong conclusion on this issue.
There was no strong agreement on ApMV phylogeny as most of the phylogenetic trees of the virus were constructed using the Neighbor-joining (NJ) method [2,5,6,18], while at least one applied Maximum-likelihood [17]. Thus, these proposed trees sometimes produced rather distinctive branches. To resolve this, phylogenetic trees constructed in the current study employed the most commonly used NJ method. Furthermore, unrooting the trees from any out-group seemed to increase the ease and accuracy of phylogrouping. As a result, the trees based on the complete sequences of MP and CP regions (both are in RNA3) showed exactly the same topology in which both trees were divided into three main phylogroups (groups 1, 2, and 3) (Figure 1). Prominently, the trees showed no correlation between the groups and hosts and origin, indicating a wide host range and distribution of any ApMV strain. Phylogrouping proposed in this study could be a strong base for future molecular study of ApMV since all GenBank isolates from different hosts and countries were involved in the analysis.
Both the MP and CP regions maintained relatively high sequence identity among the isolates according to the SDT study. However, changes in the nt sequences of several isolates that were mostly recovered from hazelnut and belong to a subcluster in group 3 (Negret 2 for MP and CP, and TrapezK5, TrapezK, and almond for CP) were translated into many non-synonymous substitutions; thus, their aa sequences shared very low identity with those of other isolates. For this reason, separate polymorphism and neutral selection analyses were performed on the CP sequences of the 10 isolates in this subgroup within group 3, named 'Hazelnut' (all the available hazelnut isolates already belong to group 3 in the MP comparison). In the CP comparison, the 10 isolates in the Hazelnut subcluster alone were shown to have higher variance among them, according to all diversity parameters, and to have experienced weaker selection pressure (ω = 0.971) than all members of groups 1 and 2 (Table 3). The Hazelnut subgroup also received negative values from the three neutrality tests (Table 4). Therefore, the results further distinguished the divergence of these isolates that adapted to infect hazelnut.
In this study, seven hazelnut isolates from Turkey were partially sequenced to complement our knowledge of the only known hazelnut isolates, which were obtained in Poland [6]. The phylogenetic analysis showed that even though the new hazelnut and apple isolates were collected from nearby provinces within the same country, the Turkish hazelnut isolates were much more closely related to the Polish hazelnut isolates than to the Turkish apple isolates. In fact, the Turkish apple isolates clustered with apple isolates from other countries (Figure 5). Similar to the Polish isolates, the Turkish hazelnut isolates had nt deletions in the RNA3 sequence at positions nts 1003-1005 and 1089-1091 (reference isolate: accession no. NC_003480), each of which translated into the deletion of one aa residue at the corresponding position. However, all seven Turkish isolates also carried ten unique additional nt deletions at positions nts 1098-1100 and 1106-1112, which resulted in the further removal of a total of four aa residues (Figure 4). Most of these changes in RNA3 between the apple and hazelnut isolates occurred in the non-coding intergenic region (Figure 4) and are thus unlikely to affect isolate fitness.
Overall, all ApMV phylogroups were under weak evolutionary constraint, which could be one of the reasons for the high genetic diversity among isolates in all three groups. In line with this finding, it can also be suggested that the different ApMV populations, all of which were consistently assigned negative neutrality values, are experiencing further expansion (population growth due to lack of subdivision) or bottleneck selection. Interestingly, these results contrast with those reported for prunus necrotic ringspot virus (PNRSV), another member of the genus Ilarvirus that also infects stone fruits worldwide, which was found to be under very strong purifying pressure and received positive neutrality values for some of its populations [40].
The high and statistically significant KS*, KST*, Z*, and Snn values obtained in all group comparisons confirmed that the clustering of isolates into three phylogroups in both the MP and CP trees was correct. Furthermore, the estimation of genetic separateness based on FST values > 0.33 showed that members of these groups were genetically distinct and that there was a lack of gene flow among them.

Conclusions
The P1, P2, MP, and CP regions in the genomes of all the isolates registered in NCBI GenBank were analyzed to study ApMV phylogeny and to understand the global population structure of the virus. The MP and CP trees shared the same topology, in which isolates were clustered into three major groups (1, 2, and 3) without correlation with host species or countries. Two new Turkish isolates from apple and seven from hazelnut were positioned within groups 1 and 3, respectively. However, the clustering cannot yet be confirmed for the P1 and P2 phylogenies due to a lack of available data. For MP and CP, the different diversity parameters indicated relatively higher genetic diversity among isolates in group 3 than among those in groups 1 and 2. P1 was the only genome region showing signs of balancing selection or population contraction. It is clear that additional molecular data are still needed to complete the analysis of the P1 and P2 regions.

Data Availability Statement:
The new data presented in this study are openly available in the NCBI GenBank database, with accession numbers OP374166-OP374174.