Introduction

The sweet potato [Ipomoea batatas (L.) Lam] is a dicotyledonous perennial plant and a member of the family Convolvulaceae. It is currently widely grown in tropical and temperate regions in the world1. The sweet potato is regarded as one of the most healthful foods in the world due to its high amounts of beta-carotene, carbohydrates, and dietary fiber and diverse micronutrients2,3. It is a very fast-growing vine plant and is usually vegetatively propagated by cutting a piece of a runner of sweet potato approximately 30 cm in length4. According to the FAO, in 2017, China was the largest sweet potato producer (72,031,782 tons), followed by several African countries, such as Malawi (5,472,013 tons), the United Republic of Tanzania (4,244,370 tons), and Nigeria (4,013,786 tons). In Korea, the sweet potato has been cultivated since the end of the 18th century, and it is the third most important crop plant in South Korea based on production (331,514 tons)5. In general, the tuberous roots and stems of the sweet potato are used as different food materials in Korea.

Due to their vegetative propagation, most sweet potato cultivars are severely infected by various pathogens. Of the known pathogens, more than 30 different viruses infecting sweet potato have been identified so far6. The most frequently identified viruses are DNA viruses such as Sweet potato leaf curl virus (SPLCV) in the genus Begomovirus in the family Geminiviridae, followed by RNA viruses in the family Potyviridae7. Examples of well-known RNA viruses infecting sweet potato include Sweet potato feathery mottle virus (SPFMV), Sweet potato virus C (SPVC), Sweet potato virus G (SPVG), Sweet potato virus 2 (SPV2), Sweet potato latent virus (SPLV), and Sweet potato mild mottle virus (SPMMV) in the family Potyviridae; Sweet potato chlorotic fleck virus (SPCFV) in the family Flexiviridae; and Sweet potato chlorotic stunt virus (SPCSV) in the genus Crinivirus in the family Closteroviridae6. SPLCV is transmitted by vegetative propagation and the insect vector whitefly (Bemisia tabaci)8, whereas potyviruses infecting sweet potato are also transmitted by various insect vectors including Myzus persicae (Sulzer)9. Many previous studies have demonstrated that multiple viruses such as begomoviruses and potyviruses frequently coinfect most sweet potato cultivars grown in fields, resulting in the reduction of sweet potato yield10,11,12. Moreover, the coinfection of potyviruses in sweet potato has been found to be common10. Vegetative propagation and insect vectors might be two main reasons for the coinfection of viruses in sweet potato.

In Korea, several studies have examined viruses infecting sweet potato13,14,15,16. According to the previous studies, many sweet potato germplasm collections and sweet potato cultivars grown in fields in Korea are often coinfected by different viruses14. At least eight different virus species infecting sweet potato have been reported based on RT-PCR assays16. The most frequently identified viruses infecting sweet potato in Korea are two potyviruses, SPFMV and SPVC14,16.

With the rapid development of next-generation sequencing (NGS) technologies, NGS-based approaches enable us to facilitate the identification of viruses, virus diagnostics, and the assembly of virus genomes17,18,19,20. For instance, the NGS technique is widely applied in the study of viruses infecting sweet potato by RNA-Sequencing (RNA-Seq)17,21 and small RNA-Seq22,23. Using the Illumina MiSeq platform, two DNA viruses including Sweet potato mosaic virus (SPMaV) and Sweet potato leaf curl Sao Paulo virus (SPLCSPV) in the genus Begomovirus and two RNA viruses including SPFMV and SPCSV were identified in South Africa17. In China, 11 virus species including SPVG and SPFMV have been identified from 219 samples in 10 regions in southern China by small RNA-Seq23.

Although studies associated with viruses infecting sweet potato have been intensively conducted, there has not yet been a comprehensive study using NGS techniques for sweet potato viromes. As shown previously in our studies, RNA-Seq is a very powerful tool to study viromes in diverse plants24,25,26,27. Here, we conducted a study of sweet potato viromes by RNA-Seq. For that, 10 different libraries were prepared from leaf samples collected from eight regions in Korea and two different sweet potato cultivars. Comprehensive bioinformatics analyses revealed the complexity of sweet potato viromes in different geographical regions and different cultivars. Moreover, we identified two novel viruses infecting sweet potato and two virus variants that might be ancestors of SPFMV and SPVG.

Results

Collection of leaf samples for sweet potato virome study

In order to identify viruses infecting sweet potato in Korea, we collected leaf samples from eight different geographical regions in 2016 and 2017 (Table 1 and Fig. 1a). All sweet potato plants were grown in fields, as shown in Fig. 1b. We collected leaf samples showing disease symptoms (Fig. 1c,d). Samples were pooled based on the collected regions. In addition, we collected two representative cultivars referred to as “Beni Haruka” and “Hogammi,” which were widely cultivated in Yeoju, Korea. A total of 357 samples were subjected to RNA-Seq. Ten different libraries for RNA-Seq were prepared and paired-end sequenced by the HiSeq2000 system. For simplicity, we named the libraries based on geographical regions and cultivar names (Table 2). For instance, a library from “Hogammi” cultivated in Yeoju was referred to as YJ-H.

Table 1 Detailed information of sweet potato samples for sweet potato viromes in Korea.
Figure 1
figure 1

Geographical regions of collected sweet potato samples in Korea and viral disease symptoms in sweet potato plants. (a) A map displaying eight different geographical regions in Korea in which sweet potato samples were collected. Each region is indicated by a different color with the abbreviated region name. Full names of geographical regions can be found in Table 1. (b) Sweet potato plants grown in field covered by poly mulch sheets to protect soil. (c) Sweet potato plants displaying leaf malformation and purpling. (d) Leaf spots on sweet potato leaf.

Table 2 Summary of raw sequence data for RNA-Seq and SRA accession number.

Transcriptome assembly and virus identification

Raw sequence reads from 10 libraries were individually de novo assembled using the Trinity program. The number of assembled contigs (transcripts) ranged from 65,866 (HN) to 187,838 (NS) (Table 3). The contigs obtained from each library were used for a BLASTN search against the plant virus reference genome sequences derived from the viral genome database (https://www.ncbi.nlm.nih.gov/genome/viruses/). Based on the BLASTN search using assembled contigs, we obtained a total of 646 virus-associated contigs representing 10 different sweet potato viruses from 10 sweet potato transcriptomes (Table 4). Based on the number of virus-associated contigs, SPFMV (249 contigs) was the dominant virus infecting sweet potato followed by SPVC (129 contigs), SPLCV (78 contigs) (Fig. 2a), while based on virus-associated reads, SPVC (202,975 reads) was the major virus infecting sweet potato followed by SPFMV (198,658 reads) and SPVE (Sweet potato virus E) (178,532 reads) (Fig. 2b). We examined the proportion of identified viruses in each library based on virus-associated contigs (Fig. 2c) and reads (Fig. 2d). According to virus-associated contigs, SPFMV was the major virus in seven libraries including HN, IC, IS, NS, YJ, YJ-B, and YJ-H (Fig. 2c). Based on virus-associated reads, SPVC was the major virus in four libraries including GJ, GS, HN, and IS, while SPFMV was the dominant virus in NS, YJ-B, and YJ-H (Fig. 2d). SPVE and SPVG were the dominant viruses in IC and YJ, respectively. From the YA library, only SPSMV was identified.

Table 3 Summary of de novo transcriptome assembly by Trinity.
Table 4 Information of virus-associated contigs for viruses identified in sweet potato in Korea.
Figure 2
figure 2

Identification of viruses infecting sweet potato and proportion of identified viruses in each library. Pie chart illustrating proportion of identified viruses from all 10 libraries based on virus-associated contigs (a) and virus-associated reads (b). Bar graphs illustrating proportion of identified viruses in each library based on virus-associated contigs (c) and virus-associated reads (d). (e) Number of identified libraries for individual virus. (f) Number of identified viruses in each library. (g) Proportion of virus-associated reads as compared to total sequence reads in each library.

We examined the number of libraries for the individual identified virus (Fig. 2e). SPSMV was the most frequently identified virus (nine libraries), whereas SPCFV was identified from a single library. SPFMV (eight libraries), SPLCV (seven libraries), and SPVC (seven libraries) were also frequently identified. The number of identified viruses in each library ranged from one (YA) to eight (IC) (Fig. 2f). In the case of libraries from the YJ region, five (YJ-H) to seven (YJ-B) viruses were identified.

Next, we examined the proportion of virus-associated reads in each library (Fig. 2g). The proportion of virus-associated reads ranged from 0.001% (YA) to 0.577% (GS). Libraries from the GS, IC, and YJ regions contained many virus-associated reads compared to other regions.

Identification of two novel viruses infecting sweet potato

SPVE isolate GS with a single-stranded RNA virus of 10,890 nt was assembled from the GS library. SPVE isolate GS contained two ORFs encoding a polyprotein and pretty interesting potyviridae ORF (PIPO) (Fig. 3a). A BLASTN search against the NT database revealed that SPVE isolate GS shared 86% coverage and 77% nucleotide identity with SPVC isolate ECK17 (KT069223.1) identified from South Africa (Fig. 3b). The phylogenetic tree based on polyprotein amino acid sequences showed that SPVE isolate GS belongs to the same clade as five SPVC isolates, which are members of the genus Potyvirus in the family Potyviridae (Fig. 3c). We identified SPVE from GS and YJ-B; however, the genome of SPVE isolate YJ-B was incomplete. The phylogenetic tree based on complete genome sequences for SPVE, SPVC, and SPFMV showed that SPVE is distantly related with SPVC and SPFMV. Species demarcation criteria for the genus Potyvirus for the complete ORF are <76% nucleotide identity and <82% amino acid identity. Complete ORF of SPVE showed 73.88% nucleotide identity and 79.60% amino acid identity, demonstrating that SPVE is a new species in the genus Potyvirus (Table S1 and Fig. 3d).

Figure 3
figure 3

Genome organization and phylogenetic analyses of two novel viruses, SPVE and SPVF. (a) Genome organization of SPVE isolate GS. Polyprotein cleavage sites were also indicated with respective protein names. (b) BLASTN results using complete genome of SPVE against NCBI’s NT database. (c) Maximum likelihood phylogenetic tree of polyprotein amino acid sequences for SPVC, SPVE, SPFMV, SPVG, SPV2, and SPLV. (d) Maximum likelihood phylogenetic tree of genome sequences for SPVC, SPFMV, SPVE, and SPFMV. Two genome sequences for SPVE isolates GS and YJ-B were included in the phylogenetic construction. (e) Genome organization of SPVF isolate GS. (f) BLASTN results using complete genome of SPVF against NCBI’s NT database. (g) Maximum likelihood phylogenetic tree of RdRp amino acid sequences for SPCFV, SPYMV, SPVF, Melon yellowing-associated virus (MYaV), Elderberry carlavirus B (EBCVB), and Poplar mosaic virus (PopMV). (h) Maximum likelihood phylogenetic tree of CP amino acid sequences for SPCFV, SPVF, MYaV, and Garlic latent virus (GLV). (i) Maximum likelihood phylogenetic tree of genome sequences for SPCFV, SPYMV, and SPVF. EBCVA was used as an outgroup. Two genome sequences for SPVF isolates GS and IC were included in the phylogenetic construction. For the phylogenetic tree construction, all available protein or genome sequences homologous to SPVE or SPVF were retrieved from GenBank based on BLASTP and BLASTN searches, respectively. Accession number, isolate name, and virus name were described. Orange color indicates SPVE or SPVF. We used bootstrap replication values of 1,000, and bootstrap values over 70% are shown.

We identified two novel viruses tentatively named SPVE and Sweet potato virus F (SPVF) from the sweet potato transcriptomes (Fig. 3). The SPVF isolate GS with a single-stranded RNA genome of 9,122 nucleotides (nt) was assembled from the GS library. SPVF isolate GS contained six open reading frames (ORFs) encoding RNA-dependent RNA polymerase (RdRp), three triple gene block (TGB) proteins, one coat protein (CP), and one nucleic acid-binding protein (NABP) (Fig. 3e). A BLASTN search against the nucleotide (NT) database of the National Center for Biotechnology Information (NCBI) revealed that SPVF isolate GS shared 67% coverage and 76% nucleotide identity with the SPCFV isolate UN210 (KP115607.1) identified from Korea (Fig. 3f). The phylogenetic trees based on RdRp (Fig. 3g) and CP (Fig. 3h) amino acid sequences showed that SPVF belongs to the same clade as SPCFV, which is a member of the genus Carlavirus in the family Betaflexiviridae. We identified SPVF from the GS and IC libraries; however, the genome of SPVF isolate IC was incomplete. The phylogenetic tree using genome sequences for SPVF isolates GS and IC and three SPCFV isolates showed that two SPVF were grouped together. Species demarcation criteria for the genus Carlavirus are 72% nt identity or 80% aa identity between their respective CP or polymerase genes. RNA-dependent RNA polymerase (RdRp) and coat protein (CP) for SPVF showed 78.41% and 81.89% amino acid identity, respectively, demonstrating that SPVF is a new species in the genus Carlavirus (Table S1 and Fig. 3i).

Complete genomes of viruses infecting sweet potato and phylogenetic analyses

From sweet potato transcriptomes, we assembled genomes of several viruses infecting sweet potato. We assembled 12 nearly complete genomes covering ORFs for seven viruses including SPFMV, SPLCV, SPLV, SPVC, SPVE, SPVF, and SPVG (Table 5). Except for SPLCV and SPVF, all viruses belonged to the genus Potyvirus. In addition, we obtained several partial sequences for the identified sweet potato viruses.

Table 5 Detailed information for 12 assembled virus genomes from sweet potato transcriptomes.

In order to reveal the phylogenetic relationships of the identified viruses, we generated phylogenetic trees based on genome sequences (Table S2 and Fig. 4). The phylogenetic tree of SPFMV showed two well-defined groups, Group A and Group B (Fig. 4a). Group A contained three SPFMV isolates, IS, YJ, and IC, while Group B included a single SPFMV isolate GS. Interestingly, two SPFMV isolates, YJ-H and YJ-B, were genetically distant from other known SPFMV isolates. According to the phylogenetic tree, SPFMV isolate YJ-B is a common ancestor of known SPFMV isolates. Genome sequences of seven SPVC isolates in this study were used for phylogenetic tree construction (Fig. 4b). The phylogenetic tree of SPVC showed three defined groups. All SPVC isolates in this study belonged to Group A containing isolates from Korea, China, and Australia. The phylogenetic tree of SPVG revealed two groups of SPVG isolates (Fig. 4c). According to the phylogenetic tree, two SPVG isolates, IS and WT325, in Group B were distantly related to those in Group A. The phylogenetic tree using genome sequences of SPLV isolates revealed that three SPLV isolates were very closely related (Fig. 4d). In the case of SPV2, all four SPV2 isolates in this study were closely related (Fig. 4e). The phylogenetic tree of SPLCV displayed three defined groups (Fig. 4f). Group A included two SPLCV isolates, YJ and YJ-B, whereas Group C contained only SPLCV isolate GS. In particular, SPLCV isolate IC was genetically distant from other SPLCV isolates.

Figure 4
figure 4

Phylogenetic analyses of SPFMV, SPVC, SPVG, SPLV, SPV2, and SPLCV isolates. (a) Maximum likelihood phylogenetic tree of genome sequences for 33 SPFMV isolates including six isolates from this study indicated by orange color. SPVC was used as an outgroup. (b) Maximum likelihood phylogenetic tree of genome sequences for 28 SPVC isolates including seven isolates in this study indicated by orange color. SPFMV was used as an outgroup. (c) Maximum likelihood phylogenetic tree of genome sequences for 12 SPVG isolates including two isolates in this study indicated by orange color. SPV2 was used as an outgroup. (d) Maximum likelihood phylogenetic tree of genome sequences for eight SPLV isolates including three isolates in this study indicated by orange color. Plum pox virus (PPV) was used as an outgroup. (e) Maximum likelihood phylogenetic tree of genome sequences for 14 SPV2 isolates including four isolates in this study indicated by orange color. SPVG was used as an outgroup. (f) Maximum likelihood phylogenetic tree of genome sequences for 14 SPLCV isolates including four isolates in this study indicated by orange color. SPGVaV was used as an outgroup. For phylogenetic analyses, we retrieved only complete genome sequences for each virus from GenBank based on a BLASTN search. We used not only complete viral genome sequences with respective accession numbers but also nearly complete viral genome sequences (Table S2) without accession numbers in this study. Accession number, isolate name, and virus name were described. Orange color indicates virus genomes obtained from this study. We used bootstrap replication values of 1,000, and bootstrap values over 70% are shown.

Mapping of raw sequence reads and mutation rates of identified viruses

Viruses evolve faster than other organisms and exhibit strong genetic variation within the infected host. To determine virus mutations, we mapped raw sequence reads on the 12 virus genomes in which nearly complete genomes were assembled from this study (Figs. 5 and 6). Using SAMtools, we conducted variant calling for individual virus genomes resulting in the identification of single nucleotide polymorphisms (SNPs).

Figure 5
figure 5

Viral genome assembly and SNP analyses for six assembled viruses including SPVF, SPVE, SPVG, SPLCV, and SPLV. Genome organization of assembled virus genome, mapping results of sequence reads on the assembled virus genome, and positions of identified SNPs for SPVF isolate GS (a), SPVE isolate GS (b) SPVG isolate IS (c), SPLCV DNA A isolate GS (d), SPVG isolate YJ (e), and SPLV isolate YJ-B (f) were visualized. Based on assembled virus-associated contigs and mapping of sequence reads on the reference virus genome, viral genomes were obtained. Positions of individual ORFs were indicated. In addition, mapping results on the assembled individual virus genome were used for SNP identification. The positions of identified SNPs in each virus genome were visualized by the Tablet program.

Figure 6
figure 6

Viral genome assembly and SNP analyses for three SPFMV isolates and three SPVC isolates and mutation analyses of 12 virus genomes. Genome organization of assembled virus genome, mapping results of sequence reads on the assembled virus genome, and positions of identified SNPs for SPFMV isolate IC (a), SPVC isolate GS (b) SPFMV isolate YJ (c), SPVC isolate IC (d), SPFMV isolate YJ-B (e), and SPVC isolate IS (f) were visualized. Number of identified SNPs (g) and frequency of SNPs for the 12 virus genomes. Based on assembled virus-associated contigs and mapping of sequence reads on the reference virus genome, viral genomes were obtained. Positions of individual ORFs were indicated. In addition, mapping results on the assembled individual virus genome were used for SNP identification. The positions of identified SNPs in each virus genome were visualized by the Tablet program.

Firstly, we examined the mapping patterns of sequence reads on the individual virus. Many reads were mapped on the whole region of the virus without gaps, enabling us to obtain the nearly complete genome (Figs. 5 and 6). Most viruses such as SPVF, SPVG, SPLV, SPFMV, and SPVC showed that the number of mapped reads increased from the 5′ region to the 3′ region of the virus genome (Figs. 5 and 6). In particular, SPLCV DNA A having a circular DNA genome exhibited different mapping patterns from other viruses. For example, a relatively small number of reads were mapped on the 5′ and 3′ regions and the region between AV1 and AC3 (Fig. 5d). The number of mapped reads was reduced from the AC3 region to the AC4 region.

We next examined the number of SNPs in each virus genome (Fig. 6g). Three SPFMV isolates showed a large number of SNPs ranging from 512 (Isolate YJ) to 1,036 (Isolate YJ-B) followed by three SPVC isolates ranging from 345 (Isolate IS) to 399 (Isolate GS). SPLV isolate YJ-B had the smallest number of SNPs among the 12 virus genomes. We calculated the frequency of SNPs according to each virus genome (Fig. 6h). Again, three SPFMV isolates showed a high frequency of SNPs ranging from 4.71% (Isolate YJ) to 9.67% (Isolate YJ-B). SPVE, SPVF, and SPVG showed low frequencies of SNPs.

We examined the positions of SNPs in each virus genome (Figs. 5 and 6). In general, the identified SNPs were randomly distributed on the virus genome. In particular, two SPVG isolates had many SNPs on the 3′ end region of the virus.

Development of molecular methods to diagnose viruses infecting sweet potato

In this study, we identified a total of 10 different viruses infecting sweet potato. Based on the virus-associated reads in each library, SPCFV (16 reads) was identified from only one library (IS), while SPFMV (198,658 reads) was identified from eight libraries (Table 6). It is important to confirm RNA-Seq results and to develop molecular diagnostic methods for viruses infecting sweet potato. For that, we designed primer pairs for 10 viruses. RT-PCR primers for nine viruses were designed for RNA viruses with a single RNA genome, while PCR primers were designed for SPLCV with a single circular DNA genome based on the obtained virus sequences (Table S2 and Fig. 7a). A primer pair amplifying a partial actin gene was used as a positive control. We extracted total RNA and DNA from the same samples used for RNA-Seq. Amplified PCR products from each sample were visualized by gel electrophoresis (Figs. 7b and S1). In general, PCR results were correlated with those of RNA-Seq (Fig. 7b). For example, SPFMV was identified from nine libraries (all except YA) by RT-PCR, whereas SPFMV was identified from eight libraries (all except GS and YA) by RNA-Seq (Fig. 7b). In general, the PCR and RNA-Seq results were identical for SPV2, SPLCV, SPVF, SPSMV, and SPLV. In the case of SPVC, SPVE, and SPCFV, RT-PCR identified an additional virus infection compared to RNA-Seq. Based on genome sequence and phylogenetic analyses, SPVG isolate IS was distantly related with the other SPVG isolates. Therefore, we designed an additional primer pair for SPVG isolate IS. The primer pair for SPVG IS with a size of 702 bp could amplify SPVG isolate IS with high specificity.

Table 6 Information of virus-associated reads for viruses identified in sweet potato in Korea.
Figure 7
figure 7

Confirmation of 11 identified viruses infecting sweet potato by RT-PCR. (a) Position of amplicon on individual virus genome was indicated by gray bar with respective size of amplicon. The detailed information of primer pairs can be found in Table S3. (b) Agarose gel electrophoresis results by RT-PCR with newly designed primer pairs. Full-length gels of RT-PCR results can be found in Fig. S1 in the Supplementary Information. Actin gene of sweet potato was used as positive control. We used the same total RNA for both NGS and RT-PCR. Green color indicates RT-PCR primer pairs for two novel viruses, SPVE and SPVF, as well as a variant of SPVG.

Discussion

The word “virome” is derived from two words, “virus” and “genome,” which suggests it is similar to a viral genome28. Strictly speaking, a virome should be defined as all virus-associated nucleic acids existing in a specific organism, tissue, or environment29. A virome study should include viral genome information for not only a single virus but also multiple viruses. Moreover, a virome study reveals not only the presence of viruses but also the complexity of viral genomes, viral replication, viral mutation, and the change of viral genomes in a certain condition, as shown previously24,26,27.

Research associated with viruses infecting sweet potato has been intensively conducted in many countries14,17,23,30. However, there have been few comprehensive studies associated with sweet potato viromes. For instance, a previous study examined sweet potato viromes in three different sweet potato cultivars in China by NGS, revealing a total of 15 different viruses infecting sweet potato21. In addition, a large-scale sweet potato virome study is currently being conducted by the small RNA-Seq of 1,750 sweet potato samples collected from 12 countries in Africa, where the sweet potato is one of the main crops (http://bioinfo.bti.cornell.edu/virome/index).

In Korea, several studies have been carried out to understand viruses infecting sweet potato. For example, a nationwide survey was carried out to examine viruses infecting sweet potato from 2011 to 201414. Based on multiplex RT-PCR assays, there were at least eight different viruses infecting sweet potato in Korea16. Furthermore, viral genomes of 18 isolates representing five potyviruses in Korea were determined15. As compared to previous studies associated with viruses infecting sweet potato in Korea, our sweet potato virome study provides valuable additional information associated with viruses infecting sweet potato in Korea. We revealed a total of 10 different sweet potato viromes representing eight different geographical regions and two major sweet potato cultivars in Korea. Our study demonstrated that each geographical region and cultivar displayed a unique sweet potato virome composed of different viruses, as shown in other plant viromes27,31. For instance, three viruses were identified from the GJ, HN, and NS regions; however, SPVC was dominantly present in GJ and HN, while SPFMV was the dominant virus in NS. The high proportion of SPLCV, which is transmitted by the whitefly, in the NS region compared to other regions suggests that insect vectors could be environmental factors determining the individual sweet potato virome. Samples for YJ-B and YJ-H were collected from the same region, Yeoju; however, the lists and proportions of infected viruses were very different, showing cultivar-specific sweet potato viromes.

Here, we identified a total of 10 different virus species infecting sweet potato. Our RNA-Seq-based approaches identified novel viruses and new virus variants more successfully than a PCR-based viral genome study, which could amplify highly conserved viral sequences15. In particular, the rare similarity of the 5′ regions in SPVE and SPVF with other homologous viruses highlights the superiority of NGS techniques followed by bioinformatics analyses. Although SPVF in the genus Carlavirus was closely related to SPCFV, the BLAST results and phylogenetic analyses revealed that SPVF is a new species. Two SPVF isolates, GS and IC, in the same group were different from other SPCFV isolates. Moreover, our phylogenetic analyses demonstrated that Sweet potato yellow mottle virus (SPYMV) isolate Yeongdeok (KR072674.1) should be a member of SPCFV. Moreover, two SPVE isolates, GS and YJ-B, were closely related to SPVC. However, the phylogenetic tree using polyprotein sequences of potyviruses infecting sweet potato demonstrated that SPVE was a new species in the genus Potyvirus.

Our sweet potato virome study indicated that SPFMV was the dominant virus followed by SPVC and SPVE in the examined regions in Korea. In addition, these three viruses are regarded as the main viruses infecting sweet potato in the world. We identified all previously reported viruses infecting sweet potato in Korea except Sweet potato golden vein-associated virus (SPGVaV) in the genus Begomovirus, which is rare in Korea16,32. Moreover, we did not identify SPCSV in the genus Crinivirus, which has been reported in many countries including China and Uganda33,34. Therefore, it is important to prevent the introduction of SPCSV to Korea.

Plant tissues and developmental stages are important factors for plant virome studies. Our previous studies demonstrated that viral RNA was enriched in grape fruits and lily flowers24,35. Of course, the optimal tissues for plant virome studies depend on the plant species. In our study, the proportion of virus-associated reads in the whole transcriptome was very low, indicating that leaf tissues might not be appropriate samples for the detection of viruses infecting sweet potato. As shown in a previous study36, we carefully suggest using samples from the fibrous and tuberous roots of sweet potato for virus detection. Based on our experience studying viruses infecting plants with a large genome size such as the hexaploid sweet potato (4.4 Gb)36, selection of the proper tissue enriched with viruses is necessary for a successful virome study.

In this study, we used messenger RNA from total RNA for the library preparation using oligo-d(T). Eight RNA viruses in this study possessed polyadenylate (poly(A)) tails (all except SPLCV and SPSMV). As shown in other previous studies21,35, virus genomes with poly(A) tails can be easily assembled. It is also not surprising that genomes of DNA viruses such as SPLCV can also be assembled from transcriptome data, as shown in our previous studies26. In many cases, the number of mapped sequence reads on the viral genomes with poly(A) tails was increased from the 5′ region to the 3′ region. This result is somehow correlated with the result of poly(A) tails preferentially attaching to the transcripts close to the poly(A) tails.

We examined the mutation rates for the 12 assembled virus isolates. Of them, SPFMV showed a high mutation rate of up to 9.67%. Furthermore, a recent study has found possible recombination within the Nla-Pro, CP, and P1 genes using available SPFMV genomes30. In addition, SPLCV also exhibited a high frequency of mutations of up to 4.52%. Similarly, several Korean SPLCV recombinants have been identified13. Thus, our result suggests that mutation and recombination contribute to the genetic diversity of SPFMV and SPLCV isolates. Although several potyviruses were coinfected, the mutation rate was the highest in SPFMV followed by SPVC, SPVG, and SPLV. SPLV showed the lowest mutation rate among the examined potyviruses. This result suggests SPFMV might play an important role in viral disease symptoms in coinfected sweet potato plants.

Here, we examined sweet potato viromes for two popular sweet potato cultivars in Korea, Beni Haruka and Hogammi. Beni Haruka originates from Japan, while Hogammi was recently developed by the Rural Development Administration (RDA) in Korea. As they are popular for roasting, many growers cultivate both cultivars. In the case of Beni Haruka, at least seven different viruses were identified, whereas five different viruses were identified from Hogammi. Interestingly, both cultivars were cultivated in the same field in Yeoju, Korea; however, Beni Haruka was more severely infected by viruses than Hogammi. The difference in the number of infected viruses between the two cultivars might be correlated with the cultivation period. That is, Beni Haruka has been cultivated for a long time, while Hogammi was introduced to Korean growers only three years ago by the RDA.

It is now possible to de novo assemble many viral genomes from transcriptome data37. Similarly, we obtained 12 complete virus genomes and 18 nearly complete genomes for eight viruses. A total of 30 assembled viral genomes were further used for phylogenetic analyses. Our phylogenetic analyses showed that many isolates of SPVC, SPLV, SPV2, and SPLCV in this study were grouped together with other known isolates from Korea. In general, it is likely that virus genomes are highly correlated with geographical regions. However, two isolates of SPFMV and a single isolate of SPVG were genetically distant from other known isolates. For example, two SPFMV isolates, YJ-H and YJ-B, were revealed as common ancestors of other known SPFMV isolates. Furthermore, SPVG isolate IS in this study and SPVG isolate WT325 from Taiwan were genetically distant from other SPVG isolates, suggesting they are new variants of SPVG.

As shown in other previous studies, the coinfection of multiple viruses in sweet potato resulted in the dramatic reduction of sweet potato production by up to 50%38. Unfortunately, most sweet potato cultivars in Korea were severely coinfected by many viruses14. There are several possible reasons explaining how viruses infect sweet potato plants. The first is vegetative propagation. Most growers purchase sweet potato sprouts from seedling markets, and they are often already infected by diverse viruses. Surprisingly, a recent study showed that most sweet potato germplasm (83.8%) in the Bioenergy Research Center of the RDA in Korea, which is used to develop new cultivars or provide sweet potato cuttings, was already infected by multiple viruses14. Furthermore, a recent study demonstrated that SPLCV can be transmitted by seeds, suggesting the newly developed sweet potato cultivars might also be highly infected by viruses8. Thus, it seems that the sweet potato virus control in Korea should be conducted from the early stages of breeding.

Based on the above evidence, it is necessary to develop virus-free sweet potato cultivars in order to prevent the damages caused by viral diseases, as suggested previously38. For that, knowledge of the viruses infecting sweet potato and diagnostic methods are needed. In this study, we provide a complete list of viruses infecting sweet potato in Korea and diagnostic methods, which could be valuable information for the development of a virus-free sweet potato in Korea in the near future. Moreover, we suggest that the development of a virus-free sweet potato in Korea should be carried out by not only national institutes such as the RDA but also universities and companies. We have to provide various reasonable choices to growers to promote competition among different developers of virus-free sweet potatoes. Dependence on a single national institute does not guarantee the production of a high-quality virus-free sweet potato, as we have seen.

Methods

Collection of sweet potato samples

We collected sweet potato leaf samples from eight different geographical regions in Korea in May 2016 and May 2017 (Table 1 and Fig. 1a). The eight regions are the main sweet potato producing areas in Korea. Most collected samples showed viral disease symptoms; however, we also collected leaf samples without any visible disease symptoms. In order to examine viruses infecting sweet potato in different geographical regions, leaf samples were pooled according to geographical regions and used for total RNA extraction. In addition, leaf samples from two major sweet potato cultivars referred to as “Beni Haruka” and “Hogammi” were also collected to compare sweet potato viromes in different sweet potato cultivars. As a result, a total of 357 samples were used for 10 different libraries.

Total RNA extraction and library preparation for RNA-Seq

Leaf samples were pooled and then frozen in the presence of liquid nitrogen. The frozen leaf samples were ground with a pestle and mortar. We used the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) to extract total RNA for RNA-Seq based on the manufacturer’s instructions. The quality and quantity of extracted total RNA were measured by gel electrophoresis followed by using an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, U.S.A.). Using the extracted total RNA, we generated 10 different RNA-Seq libraries using the NEBNext Ultra RNA Library Prep Kit for Illumina in accordance with the manufacturer’s instructions (NEB, Ipswich, Massachusetts, U.S.A.). Briefly, poly(A)-tailed mRNA was extracted by using poly-T oligo-attached magnetic beads. We synthesized the first strand of cDNA from the extracted mRNA followed by a second strand of cDNA. The 3′ ends of the DNA fragments were adenylated. After adapter ligation, we performed PCR amplification to selectively enrich DNA fragments with adapters and amplify the large amount of DNA in the library. We again measured the quality and quantity of each library using the 2100 Bioanalyzer. The six prepared libraries from 2016 were paired-end sequenced by Theragen (Suwon, South Korea) using the HiSeq2300 platform, and the four libraries from 2017 were paired-end sequenced by Macrogen Co. (Seoul, South Korea) using the HiSeq2000 platform.

De novo transcriptome assembly and virus identification

In the bioinformatics analyses including the de novo transcriptome assembly and BLAST search, a workstation with two 20-core CPUs and 256 GB of RAM installed with Ubuntu 16.04.4 LTS was used. Based on the results of our previous studies31, we only used the Trinity program (version 2.0.2, released January 22, 2015) with default parameters for de novo transcriptome assembly39. Raw sequence reads from each library were de novo assembled using Trinity. We generated our own plant viral genome database by selecting only viruses infecting plant species from the viral reference database of the NCBI (https://www.ncbi.nlm.nih.gov/genome/viruses/). To identify virus-associated contigs, the assembled contigs (transcriptome) from each library were blasted against the plant viral genome database using MEGABLAST40 with a cutoff E-value of 1e–6. The obtained virus-associated contigs were again blasted against NCBI’s NT database to remove sweet potato host sequences and other contaminated sequences. Ultimately, only virus-associated contigs were used for the study of sweet potato viromes. To calculate the number of virus-associated reads for identified viruses, the BBMap program was used (https://sourceforge.net/projects/bbmap/).

Virus genome assembly and annotation

Based on the BLAST results, the virus-associated contigs in each library were aligned on the identified virus reference genomes using the ClustalW program implemented in the MEGA7 program41. Several nearly complete viral genome sequences were obtained without any further alignment. Poly(A) tail sequences at the 3′ terminal were deleted. We again aligned raw sequence reads on the identified virus genome sequences to fill the missing gaps of the virus genome using a Burrows–Wheeler Aligner (BWA) program with default parameters42. To predict the ORFs in each virus genome, we used the ORF Finder program (https://www.ncbi.nlm.nih.gov/orffinder/). In addition, we manually checked the identified ORFs and the 5′ and 3′ untranslated regions (UTRs) by comparing the corresponding reference virus genome. Twelve complete viral genome sequences covering whole ORFs were deposited in NCBI’s GenBank database with respective accession numbers (Table 5).

Phylogenetic analyses of identified viruses

For phylogenetic tree analyses, we used the virus genome sequences assembled in this study (Table S2) as well as the available viral genome sequences from GenBank for eight viruses. We used RdRp amino sequences, CP amino sequences, and a complete genome sequence for SPVF and its homologous sequences, whereas we used a polyprotein amino acid sequence and a complete genome sequence for SPVE and its homologous sequences. In the case of six viruses (i.e., SPFMV, SPVC, SPVG, SPLV, SPV2, and SPLCV), complete genome sequences obtained in this study as well as available complete viral genome sequences from GenBank were used. Six SPFMV genomes, seven SPVC genomes, two SPVG genomes, three SPLV genomes, four SPV2 genomes, and four SPLCV genomes were used for the construction of phylogenetic trees in this study. For the individual virus, viral sequences were aligned using the ClustalW program. The aligned nucleotide sequences or amino acid sequences were used for phylogenetic tree construction using the MEGA7 program with the maximum likelihood method and 1,000 bootstrap replicates41.

Identification of SNPs for 12 assembled virus genomes

For virus SNP identification, it is important to use the assembled virus genome sequences in each library as reference virus genome sequences to increase SNP specificity. We analyzed single SNPs for the 12 assembled virus genomes as described previously27. The raw sequence reads in individual libraries were aligned on the assembled viral genome using the BWA program with default parameters, resulting in the Sequence Alignment Map (SAM) files. The SAM files were converted into Binary Alignment Map (BAM) files using SAMtools43. After that, the sorted BAM files were used to generate the Variant Call Format (VCF) file format using the mpileup function of SAMtools for SNP calling. Finally, we called SNPs using BCFtools implemented in SAMtools. The positions of identified SNPs and mapped reads on each viral genome were visualized by the Tablet program44.

RT-PCR assay

In order to confirm the results of RNA-Seq, we carried out RT-PCR. For that, we designed 12 RT-PCR primer pairs. An actin gene of sweet potato was used as a positive control. In the case of SPVG, two different primer pairs, SPVG and SPVG_IS, were designed. The SPVG_IS-specific primer pair can amplify the variant of SPVG identified from the IS region. Detailed information of designed primer pairs can be found in Table S3. The regions of each virus amplified by RT-PCR can be found in Fig. 7a. We used the same total RNA from the pooled samples as a template RNA for the RT-PCR assay. RT-PCR was performed using the DiaStar OneStep RT-PCR Kit (SolGent, Daejeon, Korea). As described previously31, the RT-PCR conditions were 50 °C for 30 min, 95 °C for 15 min, followed by 30 cycles at 95 °C for 20 sec, 50 °C to 56 °C for 40 sec (the annealing temperature can be varied depending on the Tm values of primers), and 72 °C for 1 min, with a final extension at 72 °C for 5 min. We checked the amplified RT-PCR products by gel electrophoresis followed by EtBr staining. Furthermore, we cloned the amplified RT-PCR product in the pGEM-T-Easy Vector (Promega, Wisconsin, US) followed by Sanger sequencing to confirm the sequences of amplified PCR products.

To confirm complete genome sequences of SPVE and SPVF, we carried out RT-PCR using newly designed primers (Table S4 and Fig. S2). The amplified RT-PCR products were visualized by gel electrophoresis (Fig. S2). We confirmed amplicon sequences by cloning into the pGEM-T-Easy Vector and Sanger sequencing.