The genetic structure and recombination analyses of Sweetpotato leaf curl virus (SPLCV) population in China

Sweetpotato leaf curl virus (SPLCV) is a virus species that causes substantial yield losses in sweetpotato. In this study, 24 complete genome sequences of SPLCV, identified from sweetpotato samples collected in China, were classified into four strains. Of the isolates analyzed, 11.9% were recombinants, and ten recombination patterns were detected. In addition to the C3, V1, IR and C1 regions, both ends of the V2 gene were also identified as recombination hot spots in SPLCV. Negative (purifying) selection was detected in all six SPLCV genes except C4, which was under positive (diversifying) selection for adaptive protein evolution. The mismatch distributions of all SPLCV subgroups and groups were multiple-peaked or ragged, indicating that the SPLCV population has existed for a long time. Gene flow among Shandong, Hebei and Jiangsu Provinces in China was frequent, as was gene flow among the SPLCV groups from China, Brazil and South Korea. Recombination, selection pressure and gene flow promote the evolution of SPLCV in China.


Introduction
Sweetpotato (Ipomoea batatas) is a creeping herbaceous plant and a staple crop widely cultivated throughout the world (Clark et al. 2012). China is considered the largest producer of sweetpotato in the world, accounting for about 75% of world production (Gao et al. 2000). However, sweetpotato production in China is affected by several serious diseases that are still not effectively controlled (Gao et al. 2000; Clark et al. 2002; Xie et al. 2013). More than thirty different viruses, including both RNA and DNA viruses, have been found in sweetpotato.
Studies of the molecular evolutionary history of viruses, including mutation, recombination, selection and adaptation, help us understand important features of their biology, such as changes in virulence and geographical range, the appearance of new mutants, and the emergence of new epidemics. This information is essential for designing strategies to control viruses (Rubio et al. 2001; Ohshima et al. 2002; Tomitaka and Ohshima 2006; Tomitaka et al. 2007; Li et al. 2017; Huang et al. 2019). Recombination and mutation are the main factors in virus evolution. Mixed infections of Ipomoea with different virus isolates or strains could give rise to new viruses or strains (Albuquerque et al. 2012; Lozano et al. 2009). In SPLCV, recombination breakpoints have been found in the intergenic region (IR) and in the V1 and C1 genes (Albuquerque et al. 2012; Zhang and Ling 2011).

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s41348-020-00348-4) contains supplementary material, which is available to authorized users.
In China, SPLCV was first isolated in Taiwan (Luan et al. 2006), and it poses a serious threat to sweetpotato cultivation. In this study, we cloned 24 new complete genome sequences of SPLCV and analyzed the genetic structure of the SPLCV population in China. These results help to understand the evolution of SPLCV and to propose effective control strategies.

Plant samples and DNA extraction
Sweetpotato leaf samples were collected from the sweetpotato germplasm bank of China (Xuzhou, Jiangsu), whose plants were grown in a field in Xuzhou during 2013-2015. Samples showing clear symptoms, including leaf curling, vein thickening, mottling and chlorosis, were collected, immediately frozen in liquid nitrogen and stored at −80 °C until DNA extraction. Total genomic DNA was extracted from the sweetpotato leaf tissues using the TaKaRa Universal Genomic DNA Extraction Kit (TaKaRa Biotechnology (Dalian, China) Co., Ltd.).

Cloning and sequencing of complete genome
The presence of SPLCV in the samples was determined using the specific primers of Li et al. (2004): the virion-sense primer PW285-1b (positions 1795-1828), 5′-TAA TTC GAA CTG CAG TTC CGT ATT TAC AGT T-3′, and the complementary-sense primer PW285-2b (positions 2308-2278), 5′-GCT AGA GGA GGC CTG CAG ACT GCT AAC GAC G-3′. Subsequently, a primer pair amplifying the nearly full-length genome was used: LCVF, 5′-CAT GAC ATT TTC AGC GGC CCA GTC-3′; LCVR, 5′-GAC CCA TAT CCA TTG TTA TTG-3′. All reactions were carried out in 50 µl volumes with TaKaRa PrimeSTAR GXL DNA Polymerase (TaKaRa Biotechnology (Dalian, China) Co., Ltd.). The cycling conditions were 94 °C for 3 min; 30 cycles of 94 °C for 15 s, 55 °C for 15 s and 68 °C for 40 s; and a final extension at 68 °C for 10 min. Finally, 0.5 µl of TaKaRa Taq (5 U/µl) was added, and the mixture was immediately homogenized and incubated at 72 °C for 10 min. PCR products were purified from agarose gels, ligated into the pMD18-T cloning vector (TaKaRa Biotechnology (Dalian, China) Co., Ltd.) at 16 °C overnight and transformed into Epicurian coli DH5α supercompetent cells. Circular genomic DNA was amplified by rolling-circle amplification (RCA) using the Illustra TempliPhi 100 Amplification Kit (GE Healthcare) as described by Haible et al. (2006) and Liu et al. (2017). The amplified product was digested with the restriction enzyme BamHI, and the digestion products were separated on a 1% agarose gel; the expected product size was 2.8 kb. Positive clones were identified by PCR, and more than three positive clones were sequenced by Sangon Biotech (Shanghai, China) Co., Ltd.
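The BamHI step relies on a simple property of RCA products: rolling-circle amplification produces long tandem repeats of the circular genome, so an enzyme that cuts the genome once releases unit-length linear copies, which is why a single band of about 2.8 kb is expected. The Python sketch below illustrates this with a hypothetical 100-nt toy genome. It assumes exactly one BamHI site (G^GATCC) per genome and ignores strand structure and incomplete digestion; the function names are illustrative only.

```python
# Sketch: digesting an RCA concatemer with a single-cut enzyme yields
# genome-length fragments. Assumption: the toy genome is hypothetical and
# carries exactly one BamHI site.

BAMHI_SITE = "GGATCC"
CUT_OFFSET = 1  # BamHI cuts between G and GATCC on the top strand


def rca_concatemer(circular_genome: str, copies: int) -> str:
    """Model an RCA product as tandem copies of the circular template."""
    return circular_genome * copies


def digest_fragment_lengths(dna: str, site: str = BAMHI_SITE,
                            offset: int = CUT_OFFSET) -> list[int]:
    """Cut a linear molecule at every occurrence of `site`; return fragment lengths."""
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = dna.find(site, pos + 1)
    bounds = [0] + cuts + [len(dna)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    toy_genome = "A" * 50 + BAMHI_SITE + "T" * 44  # 100-nt stand-in for a ~2.8 kb genome
    print(digest_fragment_lengths(rca_concatemer(toy_genome, 4)))
    # [51, 100, 100, 100, 49]
```

Only the two end fragments are shorter than the genome; every internal fragment equals the genome length, mirroring the single genome-sized band.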
The nucleotide and deduced amino acid sequences were edited and aligned using the EditSeq and MegAlign programs of DNASTAR, followed by pairwise similarity assessment. The accession numbers and assigned abbreviations of these begomovirus isolates are listed in Table 1.

Sequence comparison and molecular variability
Pairwise nucleotide and amino acid identities were calculated using SDT (Species Demarcation Tool, available from http://web.cbio.uct.ac.za/SDT; Muhire et al. 2013), the ClustalW alignment program and DNAMAN software. Multiple sequence alignments were carried out using Clustal X version 1.81 (Thompson et al. 1997).
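For readers unfamiliar with identity scores, the following Python sketch computes pairwise identity from an existing alignment as 1 − M/N, where N is the number of columns in which neither sequence has a gap and M is the number of mismatches among those columns. This gap-ignoring convention is an assumption made for illustration; SDT also performs its own alignment, and its exact gap handling should be checked in the SDT documentation. The toy alignment is hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Identity = 1 - M/N for two rows of the same alignment.

    N counts columns where neither sequence has a gap ('-');
    M counts mismatches among those columns. Gap columns are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        mismatches += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 1 - mismatches / compared


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise identities for every pair of named, aligned sequences."""
    return {(p, q): pairwise_identity(alignment[p], alignment[q])
            for p, q in combinations(alignment, 2)}


if __name__ == "__main__":
    toy = {"iso1": "ATGC-ATGCA", "iso2": "ATGCTATGAA", "iso3": "ATCC-ATGCA"}
    for pair, ident in identity_matrix(toy).items():
        print(pair, f"{ident:.1%}")
```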
The ratio of nonsynonymous (dN) to synonymous (dS) substitutions per site (ω = dN/dS) was calculated to assess the selection pressure acting on the six SPLCV genes. A ratio > 1 indicates positive (diversifying) selection, a ratio = 1 indicates neutral evolution, and a ratio < 1 indicates negative (purifying) selection. A Z-test was then used to test the type of selection pressure, with P < 0.05 considered statistically significant. All selection tests used the Nei-Gojobori method as implemented in MEGA 6 (Jawad et al. 2018).
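The decision rule above can be sketched in Python as follows. The sketch assumes dN and dS (for example, Nei-Gojobori proportions after the Jukes-Cantor correction) and their variances have already been estimated; MEGA can estimate variances by bootstrapping, whereas here they are simply supplied, and Var(dN − dS) is approximated as Var(dN) + Var(dS), which assumes independence. Function names and example values are hypothetical; this is not MEGA's code.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differences: d = -3/4 ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega_label(d_n: float, d_s: float) -> str:
    """Classify omega = dN/dS with the rule stated in the text."""
    omega = d_n / d_s
    if omega > 1:
        return "positive"
    if omega < 1:
        return "negative"
    return "neutral"


def z_test(d_n: float, d_s: float, var_n: float, var_s: float) -> tuple[float, float]:
    """Z = (dN - dS) / sqrt(Var(dN) + Var(dS)); returns (Z, two-sided P)."""
    z = (d_n - d_s) / math.sqrt(var_n + var_s)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail probability
    return z, p


if __name__ == "__main__":
    d_n, d_s = jukes_cantor(0.04), jukes_cantor(0.15)        # hypothetical p-distances
    z, p = z_test(d_n, d_s, var_n=0.0001, var_s=0.0009)       # hypothetical variances
    print(omega_label(d_n, d_s), round(z, 2), p < 0.05)
```

The two checks are complementary: `omega_label` applies the ω rule as stated, while the Z-test decides whether the departure from dN = dS is significant at P < 0.05.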

Recombination and phylogenetic analysis
The neighbor-joining (NJ), minimum evolution (ME) and unweighted pair group method with arithmetic mean (UPGMA) methods implemented in MEGA 6 were used for phylogenetic analyses (Jawad et al. 2018).
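As an illustration of the NJ algorithm, the following is a textbook Saitou-Nei neighbor-joining sketch in Python operating on a precomputed distance matrix. It is not MEGA's implementation (MEGA additionally handles model-based distances, bootstrapping and its own tie-breaking); the five-taxon matrix in the example is a standard teaching example, not SPLCV data.

```python
def neighbor_joining(names: list[str], d: list[list[float]]) -> tuple[str, list[tuple]]:
    """Textbook neighbor joining on a symmetric distance matrix.

    Returns a Newick string (the unrooted tree written with an arbitrary
    basal split) and the joins as (node_i, node_j, length_i, length_j).
    """
    nodes = list(names)
    dist = {(a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, b] for b in nodes) for a in nodes}  # row sums
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        # Q(i, j) = (n - 2) d(i, j) - r_i - r_j; join the pair that minimises Q
        i, j = min(pairs, key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        len_i = dist[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = dist[i, j] - len_i
        u = f"({i}:{len_i:g},{j}:{len_j:g})"  # new internal node, named by its subtree
        joins.append((i, j, len_i, len_j))
        for k in nodes:
            if k not in (i, j):
                dist[u, k] = dist[k, u] = (dist[i, k] + dist[j, k] - dist[i, j]) / 2
        dist[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{dist[a, b]:g});", joins


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    tree, _ = neighbor_joining(taxa, matrix)
    print(tree)  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```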
We searched for evidence of recombination and for recombination sites in an alignment of all 160 complete or nearly complete SPLCV sequences using the Recombination Detection Program version 4.95 (RDP4.95), which implements the RDP, GENECONV, BootScan, Maximum Chi-square (MaxChi), Chimaera, 3Seq and Sister Scanning (SiScan) methods. A highest acceptable P value of 0.01 was applied. Only sequences supported by the seven programs listed above, and involving fragments sharing ≥ 97% sequence identity with their parental sequences, were regarded as 'clear' recombinants (Albuquerque et al. 2012).

In the mismatch distribution analysis, a multiple-peaked or ragged distribution was taken to indicate long-term demographic stability, while a smooth distribution indicated that the population had a star-like phylogeny due to the accumulation of low-frequency mutations caused by a recent expansion. DnaSP 5.10 was also used to examine genetic differentiation between populations with three permutation-based statistical tests (Ks*, Z and Snn), and to estimate the level of gene flow between populations with Fst, the interpopulational component of genetic variation, or the standardized variance in allele frequencies across populations (Librado and Rozas 2009; Li et al. 2017). Fst ranges from 0 for undifferentiated populations to 1 for fully differentiated populations. Normally, an absolute value of Fst > 0.33 suggests infrequent gene flow, while an absolute value of Fst < 0.33 suggests frequent gene flow.
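The 'clear recombinant' criterion can be expressed as a simple filter. The Python sketch below assumes each candidate event has already been summarised as a record of per-method P values and the identities of the recombinant fragments to their major and minor parents; this record layout is hypothetical and is not RDP4's export format. Reading the criterion as "every one of the seven methods detects the event at P ≤ 0.01 and both parental fragments reach 97% identity" is our interpretation of the text above.

```python
from dataclasses import dataclass, field

RDP4_METHODS = ("RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera", "SiScan", "3Seq")


@dataclass
class RecombinationEvent:
    """One candidate event (hypothetical record layout, not RDP4's output format)."""
    recombinant: str
    p_values: dict[str, float] = field(default_factory=dict)  # method -> P value
    major_parent_identity: float = 0.0  # identity of fragment to major parent
    minor_parent_identity: float = 0.0  # identity of fragment to minor parent


def is_clear(event: RecombinationEvent, alpha: float = 0.01,
             min_identity: float = 0.97) -> bool:
    """True if every method detects the event at P <= alpha and both parental
    fragments share >= min_identity with the recombinant. Methods missing from
    p_values count as not detecting the event."""
    detected = all(event.p_values.get(m, 1.0) <= alpha for m in RDP4_METHODS)
    close = min(event.major_parent_identity, event.minor_parent_identity) >= min_identity
    return detected and close


def recombinant_fraction(events: list[RecombinationEvent], n_isolates: int) -> float:
    """Share of isolates carrying at least one clear event."""
    clear = {e.recombinant for e in events if is_clear(e)}
    return len(clear) / n_isolates
```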

Genome characterization and sequence comparison
A survey of sweetpotato seedlings in Jiangsu, Anhui, Shandong, Hebei and Zhejiang Provinces during 2012-2015 revealed an upward trend in SPLCV occurrence. Furthermore, SPLCV incidence in the seedbed exceeded 90% for some cultivars in 2014. The 24 full genome sequences of SPLCV (GenBank accession numbers MH602247 to MH602270; Table 1), obtained from samples collected in Jiangsu Province of China in 2013-2015, ranged from 2810 to 2844 nucleotides in length. All genomes had the typical organization of monopartite begomoviruses, with two ORFs in the virion-sense strand (V1 and V2) and four ORFs in the complementary-sense strand (C1, C2, C3 and C4), and all contained the conserved nonanucleotide sequence 5′-TAATATT↓AC-3′ in the IR.
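Locating the conserved nonanucleotide is a routine step when annotating begomovirus genomes, because virion-strand replication is initiated by a nick within it (TAATATT↓AC). A minimal Python sketch, assuming an uppercase DNA string for one strand only: the genome is treated as circular, so a motif spanning the arbitrary start of a GenBank record is still found. The example sequences are toy strings, and the names are illustrative.

```python
NONANUCLEOTIDE = "TAATATTAC"  # conserved origin motif, written 5'-TAATATT^AC-3'
NICK_OFFSET = 7               # the nick falls after the seventh base (TAATATT^AC)


def nick_positions(genome: str) -> list[int]:
    """0-based positions of the first base 3' of each nick in a circular genome.

    The sequence is extended by len(motif) - 1 bases from its start so that a
    motif spanning the linearisation point is still found. Only the given
    strand is scanned.
    """
    g = genome.upper()
    k = len(NONANUCLEOTIDE)
    wrapped = g + g[: k - 1]
    hits = []
    start = wrapped.find(NONANUCLEOTIDE)
    while start != -1 and start < len(g):
        hits.append((start + NICK_OFFSET) % len(g))
        start = wrapped.find(NONANUCLEOTIDE, start + 1)
    return hits


if __name__ == "__main__":
    print(nick_positions("ACCTAATATTACGG"))  # [10]
    print(nick_positions("TTACGGGTAATA"))    # [2], motif spans the record start
```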
Comparison of nucleotide and amino acid identities for the six ORFs and the complete genomes of SPLCV showed that V2 had the highest identities at the nucleotide level, while V1 had the highest identities at the amino acid level (Table 2). Based on nucleotide identity, four strains of SPLCV were identified among the Chinese isolates in this study: nine isolates of SPLCV-Sh, four of SPLCV-US, six of SPLCV-Fu and five of SPLCV-Hn (a new strain in China reported in 2017) (Table 1).
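Strain assignment from nucleotide identity amounts to a threshold rule. In the Python sketch below, the 94% genome-wide identity cut-off is an assumption taken from the begomovirus strain demarcation criterion proposed by the ICTV Geminiviridae Study Group, not a value stated in this paper; the reference names and identity values are hypothetical.

```python
STRAIN_THRESHOLD = 0.94  # assumed begomovirus strain cut-off (genome-wide nt identity)


def assign_strain(identities: dict[str, float], reference_strain: dict[str, str],
                  threshold: float = STRAIN_THRESHOLD) -> str:
    """Assign a query to the strain of its closest reference if identity >= threshold.

    identities: reference isolate -> genome-wide nucleotide identity with the query.
    reference_strain: reference isolate -> strain name.
    """
    closest = max(identities, key=identities.get)
    if identities[closest] >= threshold:
        return reference_strain[closest]
    return "unassigned (candidate new strain)"


if __name__ == "__main__":
    refs = {"refA": "SPLCV-Sh", "refB": "SPLCV-US"}  # hypothetical references
    print(assign_strain({"refA": 0.962, "refB": 0.910}, refs))
```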

Phylogenetic analysis
A phylogenetic tree was constructed with the NJ method from the complete genome nucleotide sequences of 76 SPLCV isolates from China (including the 24 isolates sequenced in this study). In the tree, the SPLCV isolates clustered into two lineages, SPLCV-FUH and SPLCV-Sh. The SPLCV-Sh lineage contained all strain-Sh isolates. The SPLCV-FUH lineage included all strain-Fu, strain-US and strain-Hu isolates and could be further divided into three sublineages consistent with the proposed strain classification: subgroup SPLCV-Fu (all strain-Fu isolates), subgroup SPLCV-US (all strain-US isolates) and subgroup SPLCV-Hu (all strain-Hu isolates) (Fig. 2).

Genetic differentiation and gene flow between and within SPLCV populations were examined with three permutation-based statistical tests (Ks*, Z and Snn) and two measures of gene flow (Fst and Nm). The results revealed no genetic differentiation within the isolates from Hebei, Jiangsu or Shandong Province, whereas the isolates from Jiangsu and Hebei differed genetically across the complete genome (Table 4). The absolute values of Fst between and within the SPLCV populations from Hebei, Jiangsu and Shandong were all below 0.33, indicating frequent gene flow between and within these populations. Nm values greater than 1 also supported this conclusion. To understand genetic differentiation and gene flow between China and other countries, we further examined isolates from China, Brazil and South Korea. Isolates from different countries were significantly differentiated genetically, whereas gene flow between and within the SPLCV populations from the three countries was frequent.
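The Fst and Nm reasoning above can be made concrete with Hudson's estimator, Fst = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean between populations. The Python sketch below is not DnaSP's implementation: it weights the two within-population means equally, and it assumes the haploid island-model relation Nm = (1 − Fst)/(2Fst). The sequences are hypothetical toy data, and each population needs at least two sequences.

```python
from itertools import combinations, product


def n_diffs(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop: list[str]) -> float:
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Fst = 1 - Hw/Hb, with Hw the average of the two within-population means."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    between = [n_diffs(a, b) for a, b in product(pop1, pop2)]
    hb = sum(between) / len(between)
    return 1 - hw / hb


def nm_haploid(fst: float) -> float:
    """Haploid island-model estimate of migrants per generation: (1 - Fst) / (2 Fst)."""
    return (1 - fst) / (2 * fst)


def gene_flow(fst: float, cutoff: float = 0.33) -> str:
    """Apply the |Fst| cut-off described in the text."""
    return "infrequent" if abs(fst) > cutoff else "frequent"


if __name__ == "__main__":
    fst = hudson_fst(["AAAAAA", "AAAAAT"], ["TTTAAA", "TTTAAT"])  # toy populations
    print(round(fst, 3), round(nm_haploid(fst), 3), gene_flow(fst))
```

Under this haploid relation, Fst = 1/3 gives Nm = 1, so the 0.33 cut-off corresponds approximately to one migrant per generation, consistent with reading Nm > 1 as frequent gene flow.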

Population dynamics
Tajima's D, Fu and Li's D* and Fu and Li's F* values for the SPLCV-US and SPLCV-Fu subpopulations and the SPLCV-Sh population from China were negative, suggesting that these subpopulations and populations were expanding; however, none of these values was statistically significant (Table 5). Haplotype diversity (0.997 to 1.000) and nucleotide diversity (0.03327 to 0.04938) differed little among groups and subgroups. The mismatch distribution of the SPLCV genome was multiple-peaked or ragged in all subgroups and groups (Fig. 3), suggesting that these subpopulations have existed for a long time.
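Both statistics in this paragraph are built from pairwise comparisons. The Python sketch below computes the mismatch distribution (the number of sequence pairs differing at k sites, whose shape is then read as smooth or ragged) and Tajima's D with the standard constants from Tajima (1989). It assumes aligned, gap-free sequences of equal length and is not DnaSP's implementation; Fu and Li's statistics and significance testing are omitted. The star-shaped toy sample is hypothetical and, as expected for an excess of singletons, gives a negative D.

```python
import math
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs: list[str]) -> Counter:
    """{number of differing sites: number of sequence pairs}."""
    return Counter(sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2))


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D from aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)  # segregating sites
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    pairs = n * (n - 1) / 2
    pi = sum(k * c for k, c in mismatch_distribution(seqs).items()) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    star = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA"]  # toy star-like sample
    print(dict(mismatch_distribution(star)))
    print(round(tajimas_d(star), 3))
```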

Discussion
In this paper, we studied the genetic structure of the SPLCV population by analyzing the complete genome sequences of 76 isolates from China and 84 isolates from other countries. Our results showed that (1) about 11.9% of the SPLCV isolates characterized were recombinants; (2) the C4 gene of Chinese SPLCV isolates was under positive selection; (3) frequent gene flow was detected between SPLCV populations from China, Brazil and South Korea; and (4) the SPLCV subpopulations and populations have existed for a long time.
SPLCV infection of all symptomatic plants was confirmed by PCR, and the isolates were classified into four strains according to previous reports and ICTV criteria (Adams et al. 2016). V2 at the nucleotide level and V1 at the amino acid level were relatively conserved, with higher identities, whereas C4 showed lower identity; these differences might be explained by their respective functions. The C4 gene is related to symptom formation and is also responsible for the induction of hyperplasia (Latham et al. 1997; Park et al. 2010a, b). Expression of the C4 protein of either Beet curly top virus or Beet severe curly top virus in transgenic Arabidopsis plants increased the expression levels of the cell cycle-related genes CYCs, CDKs and PCNA and suppressed ICK1 and the retinoblastoma-related gene RBR1, resulting in activation of host cell division (Park et al. 2010). C4 is also involved in virus movement in plants and acts as a suppressor of post-transcriptional gene silencing (Park et al. 2010a, b). The lower nucleotide and amino acid identities of C4 might therefore be explained by its multiple functions; further studies are necessary to verify this hypothesis. Nucleotide identity and phylogenetic analyses showed that the four strains found in the sweetpotato germplasm bank are the same as those found in the field, so the SPLCV diversity observed in samples from the sweetpotato germplasm bank of China is also representative of commercial fields. The strains could also be distinguished in the phylogenetic tree, and the isolates were consistently grouped in accordance with the strain classification based on nucleotide identity: all strain-Sh isolates clustered in lineage SPLCV-Sh, while subgroups SPLCV-Fu (all strain-Fu isolates), SPLCV-US (all strain-US isolates) and SPLCV-Hu (all strain-Hu isolates) clustered into lineage SPLCV-FUH. Recombination and mutation are the main drivers of virus evolution.
Recombinants may have increased pathogenicity, an extended host range or the ability to overcome resistance in crop varieties (García-Arenal and McDonald 2003). Earlier studies reported a number of recombinants and recombination hot spots in begomoviruses. For example, C1, IR, V1 and the interface between ORFs V1 and C3 have been identified as recombination hot spots in SPLCV and other begomoviruses, while the V2 ORF and the third quarter of the V1 ORF have been identified as recombination cold spots in begomoviruses (Albuquerque et al. 2012; Prasanna and Rai 2007; Lefeuvre et al. 2007). In this study, we analyzed the recombination breakpoints detected within the SPLCV dataset and identified the C3 gene as an additional hot spot, as well as recombination breakpoints at both ends of V2. Field disease surveys showed that leaf curl is becoming increasingly severe in seedbeds, which might be attributable to the new recombinants.

The selection pressure acting on viral genes, estimated from dN/dS ratios, can make a virus more or less able to adapt to a new host plant or environment (Lefeuvre et al. 2009; Tian et al. 2011; Cuellar et al. 2015; Fiallo-Olivé et al. 2014). Positive selection (dN/dS > 1) may give a virus greater fitness in a new host or environment. However, rapid divergence driven by positive selection has rarely been demonstrated (Nielsen et al. 1999). Our results showed that negative (purifying) selection dominated the evolution of the SPLCV genes examined. The exception was C4, whose dN/dS ratio of 1.46 (> 1) indicates positive (diversifying) selection and suggests that SPLCV may be better able to adapt to a new host or environment. Strengthened quarantine to prevent cross-infection is therefore necessary.
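Claims about recombination hot spots start from tallying where breakpoints fall. The Python sketch below counts breakpoints per annotated region, allowing for overlapping ORFs. The region coordinates are hypothetical placeholders, not SPLCV annotations, and raw counts alone do not establish a hot spot; that requires comparison with a null distribution, such as the permutation-based breakpoint distribution analysis available in RDP4.

```python
from collections import Counter

# Hypothetical region coordinates (1-based, inclusive) for illustration only;
# they are NOT SPLCV annotations. Overlapping ORFs are allowed.
EXAMPLE_REGIONS = [("V2", 100, 450), ("V1", 260, 1030), ("C3", 1050, 1450)]


def regions_hit(position: int, regions) -> list[str]:
    """Names of all regions that contain a breakpoint position."""
    return [name for name, start, end in regions if start <= position <= end]


def tally_breakpoints(breakpoints: list[int], regions) -> Counter:
    """Count breakpoints per region; positions outside every region are 'unannotated'."""
    counts = Counter()
    for pos in breakpoints:
        counts.update(regions_hit(pos, regions) or ["unannotated"])
    return counts


if __name__ == "__main__":
    hypothetical_breakpoints = [105, 300, 440, 1200, 2000]
    print(tally_breakpoints(hypothetical_breakpoints, EXAMPLE_REGIONS))
```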
Five of the 24 isolates (about 21%) belonged to the Hn strain, first reported in China in 2017. Like the other SPLCV groups and subgroups, SPLCV-Hn may have a multiple-peaked or ragged mismatch distribution, suggesting that it has been present for a long time even though it was discovered only recently.
The gene flow analysis showed frequent gene flow between and within isolates from Shandong, Hebei and Jiangsu. Similar results were obtained for isolates from China, Brazil (Albuquerque et al. 2011; Paprotka et al. 2010) and South Korea. SPLCV is transmitted in a persistent manner and is spread through vines and tubers used as sweetpotato propagation material, and also through seed, which could readily explain this result.
In summary, recombination, gene flow and selection pressure promote the evolution of SPLCV in China.