Viral metagenomics reveals sweet potato virus diversity in the Eastern and Western Cape provinces of South Africa

a Biotechnology Platform, Agricultural Research Council, Private Bag X5, Onderstepoort 0110, South Africa b Department of Life and Consumer Sciences, University of South Africa, Private Bag X6, Florida 1710, South Africa c ICRISAT (ICRAF Campus), PO Box 39063-00623, Nairobi, Kenya d Vegetable and Ornamental Plant Institute, Agricultural Research Council, Private Bag X293, Pretoria 0001, South Africa e School of Molecular and Cell Biology, University of the Witwatersrand, Private Bag 3, Johannesburg PO Wits 2050, South Africa


Introduction
Sweet potato (Ipomoea batatas (L.) Lam.) is a dicotyledonous perennial plant belonging to the morning glory family Convolvulaceae. It produces edible, highly nutritious tubers and is ranked as the 3rd most important root crop and the 7th most important staple crop in the world (Valverde et al., 2007; Clark et al., 2012). Sweet potato is attractive to resource-poor farmers because it is easy to grow, high yielding, drought and heat tolerant, and crowds out weeds (Kays, 2004). In countries such as Zambia, South Africa (SA), Uganda, Kenya, Nigeria and Tanzania, women are the primary growers of sweet potato and use it to generate income. Sweet potatoes have a high content of carbohydrates and dietary fibre and are rich in vitamins A, C and B6; because of these nutritional benefits, they are used for poverty alleviation (Van Jaarsveld et al., 2005). A number of DNA and RNA viruses accumulate in the crop as a result of vegetative propagation (Valverde et al., 2007) and pose a serious threat to sweet potato production (Valverde et al., 2007; Kreuze et al., 2009; Tesfaye et al., 2011). Viral diseases have been reported to reduce the yield and quality of sweet potato storage roots by 30-100% in countries such as the United States (Valverde et al., 2007), Peru (Cuellar et al., 2008), SA (Domola et al., 2008), Kenya (Ateka et al., 2004) and Ethiopia (Tesfaye et al., 2011).
In the last two decades, virus detection methods have shifted from traditional techniques to metagenomic approaches coupled with high throughput sequencing (Boonham et al., 2014). Viral metagenomics has been used to identify novel viruses in plants (Kreuze and Fuentes, 2008; Idris et al., 2014). The approach is considered unbiased because no prior knowledge of the virus is necessary and neither virus-specific primers nor antibodies are required. Consequently, novel viruses, if present, can be detected, identified and quantified in a single experiment (Studholme et al., 2011). However, in the absence of reference sequences, virus detection by high throughput sequencing requires de novo assembly of the new virus genomes, which can be challenging. A metagenomic approach also allows the entire microbial community within a sample to be described, even in mixed viral infections, thus simplifying diagnostics (Idris et al., 2014). In most cases, virus sequences form only a small proportion of the total nucleic acids in a metagenomic sample, making the removal of host sequences before or after sequencing critical (Stobbe and Roossinck, 2014). For this reason, enrichment methods such as the isolation of dsRNA (Clark et al., 2012) and small interfering RNAs (siRNAs) (Kreuze et al., 2009) have been employed to detect DNA and RNA viruses from different hosts (Kashif et al., 2012). The availability of next-generation sequencing (NGS) platforms, such as those supplied by Illumina (Illumina Inc., San Diego, CA, USA), has further revolutionized viral metagenomics studies. NGS technologies generate large amounts of data rapidly at reduced cost, and many bioinformatic tools have been developed to handle the data analysis (Massart et al., 2014). This study was carried out to establish the current status of sweet potato viruses in two South African provinces, the Western Cape (WC) and the Eastern Cape (EC).
NGS of a symptomatic sweet potato field sample revealed, for the first time in SA, a mixed infection of six viruses: SPMaV, SPLCSPV, SPFMV, SPCSV, SPVG and Sweet potato virus C (SPVC).

Description of symptoms
Sweet potato plants collected from the field were maintained in the glasshouse and observed for symptom development over a period of 6 months. Plants exhibiting symptoms typical of viral infection, such as upward curling of the leaves, chlorotic spots, vein clearing and purple ring spots, were selected for sequencing. Symptom severity was scored on a 1-5 scale (Mwanga et al., 2001; Domola et al., 2008), where 1 = no virus symptoms, 2 = mild symptoms (chlorotic spots), 3 = moderate symptoms (chlorosis, chlorotic spots and vein clearing), 4 = severe symptoms (chlorotic spots, leaf curl and leaf puckering/necrosis) and 5 = very severe symptoms (chlorotic spots, leaf curl, mottling and stunting).

RNA library preparation and sequencing
After a period of 6 months, 10 symptomatic and 7 asymptomatic plants were randomly selected for further analysis (

Sequence assembly for RNA libraries
Adaptor removal and quality trimming were performed for each of the 17 RNA libraries using Fastq-mcf (Aronesty, 2013), and read quality was assessed using FastQC. Trimming used a quality threshold of 30 and Phred+33 quality encoding. Reads shorter than 50 base pairs (bp) or longer than 180 bp were discarded. The trimmed reads were aligned to the sweet potato chloroplast sequence and sweet potato expressed sequence tags (ESTs) to subtract host sequences. The unmapped reads were assembled into contigs using the CLC Genomics Workbench de novo assembly tool with default parameters. The contigs were subjected to BLASTn and BLASTx searches against viral sequences downloaded from the NCBI database. To generate consensus sequences for phylogenetic analysis, reads and contigs matching sweet potato viruses were mapped to the full-length reference genomes using the alignment settings length fraction = 0.5 and similarity = 0.9. The newly assembled sequences for SPVG, SPFMV, SPVC, SPCSV RNA1 and SPCSV RNA2 were deposited in GenBank under accession numbers KT069224, KT069222, KT069223, KX932096 and KT069221, respectively.
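The trimming and length-filtering step can be illustrated with a minimal Python sketch. It assumes that reads are available as (sequence, quality) pairs with Phred+33 encoding, trims low-quality bases from the 3′ end only and keeps reads of 50-180 bp inclusive. This is a simplification for illustration, not the fastq-mcf algorithm, and the function names are hypothetical.

```python
from typing import Iterable, Iterator


def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, quality: str, threshold: int = 30) -> tuple[str, str]:
    """Remove trailing bases whose quality is below `threshold` (3' end only)."""
    scores = phred33_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


def trim_and_filter(reads: Iterable[tuple[str, str]], threshold: int = 30,
                    min_len: int = 50, max_len: int = 180) -> Iterator[tuple[str, str]]:
    """Yield trimmed reads whose length lies within [min_len, max_len]."""
    for seq, quality in reads:
        seq, quality = trim_3prime(seq, quality, threshold)
        if min_len <= len(seq) <= max_len:
            yield seq, quality


if __name__ == "__main__":
    good, poor = "I", "5"  # Phred+33: 'I' is Q40, '5' is Q20
    reads = [
        ("A" * 60, good * 55 + poor * 5),   # trimmed to 55 bp, kept
        ("C" * 60, good * 40 + poor * 20),  # trimmed to 40 bp, discarded
        ("G" * 200, good * 200),            # 200 bp, longer than 180 bp, discarded
    ]
    for seq, _ in trim_and_filter(reads):
        print(len(seq))  # prints 55
```

For the RCA libraries described below, the same sketch would be called with max_len=150.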

Data analysis for 17 RCA libraries
The libraries underwent quality trimming using Fastq-mcf (Aronesty, 2013), with a quality threshold of 30 and Phred+33 quality encoding. Reads shorter than 50 bp or longer than 150 bp were treated as low quality and discarded. The trimmed reads generated from the RCA libraries were aligned to the partial sweet potato chloroplast genome (accession number KF242475) and the sweet potato mitochondrial DNA (accession number FN421476) using the CLC Bio Genomics Workbench (version 7.5.1) (CLC bio, Aarhus, Denmark) to filter out host sequences. The unmapped reads were collected and assembled into contigs using the CLC Bio Genomics Workbench de novo assembly tool. The resulting contigs were subjected to BLASTn and BLASTx searches against viral databases downloaded from NCBI. The full-length reference sequences of the viruses detected in the BLAST searches were retrieved and used in subsequent reference-guided assemblies. Reads and contigs matching sweet potato viruses were mapped to the full genomes of the closest hits in the CLC Bio Genomics Workbench, using the mapping settings length fraction = 0.7 and similarity = 0.9. The newly assembled sequences for SPLCSPV and SPMaV were deposited in GenBank under accession numbers KX859238 and KX859239, respectively.
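The two mapping settings (length fraction 0.5 for the RNA libraries and 0.7 for the RCA libraries, with similarity 0.9 in both cases) can be read as two thresholds applied to each read alignment. The sketch below is a simplified, hypothetical interpretation of these CLC Genomics Workbench parameters (the fraction of the read that aligns, and the identity within the aligned part); it is not CLC's implementation, and the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ReadAlignment:
    read_length: int     # total length of the read (bp)
    aligned_length: int  # bases of the read that align to the reference
    matches: int         # identical bases within the aligned region


def accept(aln: ReadAlignment, length_fraction: float, similarity: float) -> bool:
    """Accept a read if enough of it aligns and the aligned part is similar enough."""
    if aln.read_length == 0 or aln.aligned_length == 0:
        return False
    enough_aligned = aln.aligned_length / aln.read_length >= length_fraction
    similar_enough = aln.matches / aln.aligned_length >= similarity
    return enough_aligned and similar_enough


if __name__ == "__main__":
    # Hypothetical read: 60 of 100 bp align, at 95% identity.
    aln = ReadAlignment(read_length=100, aligned_length=60, matches=57)
    print(accept(aln, length_fraction=0.5, similarity=0.9))  # True  (RNA-library settings)
    print(accept(aln, length_fraction=0.7, similarity=0.9))  # False (RCA-library settings)
```

Under this reading, the stricter length fraction used for the RCA libraries excludes partially aligned reads that the RNA-library settings would still accept.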

Phylogenetic analysis
The complete genome sequences of SPFMV, SPVG, SPVC, SPCSV, SPLCSPV and SPMaV were retrieved from the NCBI database and used for multiple sequence alignments (MSA) and phylogenetic analysis in MEGA 6.06 (Tamura et al., 2013). Phylogenetic trees were generated using the neighbour-joining method, with evolutionary distances computed using the Jukes-Cantor method and bootstrap tests of 1000 replicates.
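For readers unfamiliar with the distance-based method, the following Python sketch computes Jukes-Cantor distances, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, and builds a neighbour-joining tree from them. It is a minimal illustration under stated assumptions (gapped columns are dropped pairwise, ties are broken by input order) and does not reproduce MEGA's implementation or its bootstrap procedure; the function names and toy sequences are hypothetical.

```python
import math
from itertools import combinations


def jukes_cantor(a: str, b: str) -> float:
    """JC69 distance d = -3/4 * ln(1 - 4p/3) between two aligned sequences.

    p is the proportion of differing sites; columns with a gap in either
    sequence are ignored (pairwise deletion, an assumption of this sketch).
    """
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # correction undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def neighbour_joining(dist: dict[str, dict[str, float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it in Newick format."""
    d = {name: dict(row) for name, row in dist.items()}  # working copy
    while len(d) > 2:
        n = len(d)
        r = {name: sum(row.values()) for name, row in d.items()}  # row sums

        def q(pair: tuple[str, str]) -> float:
            i, j = pair
            return (n - 2) * d[i][j] - r[i] - r[j]

        i, j = min(combinations(d, 2), key=q)  # pair with the smallest Q value
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = d[i][j] - li                                  # branch length to j
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"                 # new internal node
        # Distances from the new node to every remaining node.
        d[u] = {k: (d[i][k] + d[j][k] - d[i][j]) / 2 for k in d if k not in (i, j)}
        for k, value in d[u].items():
            d[k][u] = value
        for node in (i, j):
            del d[node]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
    a, b = d
    return f"({a},{b}:{d[a][b]:.4f});"


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not data from this study).
    seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "ACGAACTTAA", "D": "TCGAACTTGA"}
    dist = {x: {y: jukes_cantor(seqs[x], seqs[y]) for y in seqs if y != x} for x in seqs}
    print(neighbour_joining(dist))
```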

RT-PCR and PCR confirmation of viruses identified by NGS
Full-length genome sequences of closely related isolates of SPVG, SPFMV, SPVC, SPCSV, SPLCSPV and SPMaV were downloaded and used in multiple sequence alignments (MSA). The alignments were submitted to the IDT PrimerQuest Tool (http://eu.idtdna.com/primerquest/home/index) for primer design, targeting conserved regions such as the coat protein gene. The primer sequences used in the study are listed in Table 1. Because both DNA and RNA viruses were identified in sample KT10, genomic DNA (gDNA) was isolated from this sample using the QIAGEN DNeasy Plant Mini Kit (QIAGEN Inc., Valencia, CA, USA) following the manufacturer's instructions. Total RNA was also isolated from sample KT10 using the RNeasy Plant Mini Kit (QIAGEN, Valencia, CA, USA) and converted to cDNA using the TaKaRa PrimeScript 1st strand cDNA synthesis kit (TaKaRa, Japan). Polymerase chain reactions (PCRs) were performed using the TaKaRa EmeraldAmp® GT PCR Master Mix (TaKaRa, Japan) following the manufacturer's instructions. Each PCR consisted of 12.5 μl of the master mix, 0.5 μM of each primer (10 mM), 2 μl of template DNA and 9 μl of nuclease-free water. The recommended thermal cycling conditions were used: initial denaturation at 94°C for 5 min; 35 cycles of 94°C for 30 s, annealing at 57-63°C for 45 s and extension at 72°C for 2 min; and a final extension at 72°C for 10 min. The amplicons were visualised by electrophoresis on a 2% agarose gel and sent to Inqaba Biotechnical Industries (Pty) Ltd. for direct Sanger sequencing. Sequences were analysed using the Sequence Scanner Software v2.0 (Applied Biosystems), and the edited sequences were subjected to BLASTn and BLASTx searches to determine their identities.
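The primer volume implied by a 0.5 μM final concentration follows from C1V1 = C2V2. The sketch below assumes a 25 μl reaction (12.5 μl of a 2× master mix) and a 10 μM primer working stock; neither the total volume nor this stock concentration is stated explicitly in the text (which gives the stock as 10 mM), so both values are assumptions for illustration.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, total_volume_ul: float) -> float:
    """Volume of stock needed, V1 = C2 * V2 / C1 (both concentrations in the same unit)."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final reaction")
    return final_conc * total_volume_ul / stock_conc


if __name__ == "__main__":
    # Assumed: 0.5 uM final, 10 uM working stock, 25 ul total reaction volume.
    primer_ul = stock_volume_ul(final_conc=0.5, stock_conc=10.0, total_volume_ul=25.0)
    print(primer_ul)  # 1.25 (ul of each primer)
```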

Field symptoms associated with viral infection
A variety of symptoms were observed on sweet potato plants in the field. Symptoms included upward curling of leaves, purple-edged vein feathering and purple ring spots in samples KT10, F11, KF1 and L18; chlorotic spots in KT6, P2, Z24 and KZ17; and vein clearing in Z24, while other samples were asymptomatic (Table 2 and Fig. 1). Symptom severity scores for the field samples are given in Table 2. The most severe symptoms were observed on sample KF1, which was collected from the WC province; the whole plant died one week after its leaves were harvested. Samples F11 and KT10, also collected from the WC province, displayed moderate symptoms characterised by purple ring spots and leaf curling, respectively. The phenotypic data suggest that sweet potato viruses were more prevalent in the WC, since most of the samples collected from this province were symptomatic. Interestingly, unidentified whitefly species were observed on sweet potato leaves during sample collection in the WC province. These could be vectors of several of the viruses associated with the recorded symptoms (e.g. leaf curling is associated with begomoviruses). Only two of the six samples from the EC province (KZ17 and Z24) displayed noticeable virus symptoms (Table 2). Furthermore, at each of the four sampling locations in the EC province, sweet potato leaves were either symptomless or displayed mild symptoms during field collection.

Sequence data and de novo assembly
The 17 individually labelled RNA libraries were sequenced to generate approximately 7 gigabases (Gb) of data. The primary data consisted of over 19 million sequences. After quality control, 56% of the data was retained for further analysis, and 68% of the retained data aligned to sweet potato chloroplast and EST sequences. For each sample, sequence reads that did not map to host sequences were assembled to generate contigs varying in number and size (Table 3). The de novo assemblies generated large numbers of contigs, with the largest contigs in the range of 5-10 kilobases (kb), while N50 values ranged from 300 to 486 bp (Table 3).
Fig. 2. Genome coverage achieved by de novo assembly. (a) Total contigs assembled for SPVC aligned along the full genome. (b) Contigs aligned to SPFMV covering the partial polyprotein sequence; the complete sequences of the HC-Pro, P3, 6K1, CI, 6K2, NIa-VPg, NIa-Pro and NIb genes and the partial CP gene were obtained. (c) Four contigs aligned to SPVG from sample F11. (d) Contigs aligned to the p6, hsp70h, p60, p8, CP, mCP and partial p28 proteins of the SPCSV RNA2 segment. (e) Four contigs spanning the p227 and RdRp proteins of the SPCSV RNA1 segment. (f, g) The total contigs generated for the DNA viruses SPMaV (f) and SPLCSPV (g) achieved 80-100% genome coverage.
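The contrast between kilobase-scale largest contigs and N50 values of a few hundred base pairs reflects how N50 is defined: the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of this calculation (function name hypothetical) is shown below; with one 10 kb contig and many short contigs, the N50 is still determined by the short contigs.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contig lengths (0 if there are no contigs)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of all assembled bases
            return length
    return 0


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
    print(n50([10000] + [300] * 100))         # 300, despite the 10 kb contig
```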

Detection of RNA viruses
The assembled contigs were analysed by matching them to viral sequences in the NCBI database using BLASTn and BLASTx. Overall, the majority of the assembled contigs showed no similarity to viral sequences, but significant matches to known sweet potato viruses were found in many samples from symptomatic plants (e.g. KT10, KF1, KT6, F11, L18, L9, P2, F4, Z24 and KZ17). Notably, five samples from asymptomatic plants (M19, FH14, M11, K10 and F22) had no detectable viral sequences in this analysis. Low counts of virus sequences were detected in two asymptomatic samples (L11 and P14) collected from the WC province. The viral sequences identified in each sample are shown in Table 4a. Viral sequences accounted for 1-2% of the total sequence data generated from the RNA libraries. In many cases, the RNA virus genomes were assembled into a single contig supported by a large number of reads (Fig. 2a and Table 4a). In other instances, levels of infection were lower and only fragmentary assemblies of the genomes were achieved; however, when aligned with the reference genomes, the combined contigs represented almost complete genome sequences (Fig. 2 and Table 4a).
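Genome coverage from fragmentary assemblies can be estimated by merging the reference intervals covered by the aligned contigs and dividing the number of covered positions by the genome length. The sketch below assumes that contig placements are already known as 0-based, half-open [start, end) coordinates on the reference; the placements in the example are hypothetical and not taken from this study.

```python
def covered_positions(intervals: list[tuple[int, int]]) -> int:
    """Count reference positions covered by at least one [start, end) interval."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlap or adjacency: extend the block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def breadth_of_coverage(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of the reference genome covered by the aligned contigs."""
    return covered_positions(intervals) / genome_length


if __name__ == "__main__":
    contigs = [(0, 4000), (3500, 7000), (7500, 10000)]  # hypothetical placements
    print(breadth_of_coverage(contigs, 10800))  # about 0.88
```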
The BLAST search results revealed the presence of both the RNA1 and RNA2 segments of SPCSV in 3 samples. The total assembled contigs for SPCSV RNA1 represented a maximum of 72% of the East African strain reference sequence (accession number AJ428554), and those for RNA2 represented a maximum of 92% of the m2-74 RNA2 sequence (accession number HQ291260). The two largest contigs generated for RNA1 were from samples KT10 and L18, at lengths of 7965 bp and 7984 bp, respectively. Alignment and pairwise comparison showed that these two contigs were identical except at their 5′ and 3′ ends, where the sequences were incomplete. The RNA1 sequences shared 76% nucleotide (nt) similarity with the Ugandan reference sequence, while the RNA2 segment shared 97% nt identity with the Peruvian m2-74 isolate. The markedly different levels of identity of the two segments to their closest references suggest that they are of distinct origin. The two contigs matching RNA1 of SPCSV were merged to generate a reference contig, which was extended with the PRICE genome assembler using default parameters (Ruby et al., 2013). The final contig generated using PRICE was 8572 bp.
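Percent identity values such as the 76% (RNA1) and 97% (RNA2) reported above depend on how alignment gaps are counted. As an illustration only, the sketch below computes identity over an existing pairwise alignment, treating a gap opposite a base as a mismatch and ignoring columns that are gaps in both sequences; BLAST and MEGA may use different conventions, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Drop columns that are gaps in both sequences; keep all other columns.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(x == y for x, y in columns)  # a gap never equals a base
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```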
SPFMV was detected in 8 samples. The largest SPFMV contig, 10540 bp, was assembled from sample L18; it covered 97% of the reference genome and shared 94% sequence identity with the SPFMV 10-O strain (accession number AB439206). Over 80% genome coverage was achieved for the de novo assembly of SPFMV in 7 samples (Table 4a), and over 50,000 sequence reads were assembled into 45 contigs. According to the de novo assembly, SPVC was the second most prevalent virus in the samples analysed. It was detected in 6 samples, and the longest contig (10442 bp; Table 4a), assembled from sample KZ17, had a sequence depth of 477-fold. This contig shared 95% nt identity with the SPVC isolate from Argentina (accession number KF386015), and a maximum genome coverage of 96% was achieved. BLAST search results also revealed the presence of SPVG in samples KT10 and F11. The largest SPVG contig, 6291 bp, was assembled from sample F11. A total of 4 contigs were generated for SPVG, and when these were aligned to the reference genome, a consensus of 10,577 bp representing 97% of the genome was generated (Fig. 2c). The consensus sequence shared 98% nt identity with an isolate from Argentina (accession number JQ824374). PCR amplification of the coat protein genes of the 4 RNA viruses was successful (Fig. 5), and Sanger sequencing of the amplicons confirmed their identity. The optimised PCR assay could be used in future studies for simple detection of SA virus isolates, especially SPCSV, which could not previously be detected using primers available in the literature.

Detection of DNA viruses
Interestingly, gene transcripts of the ssDNA begomoviruses (SPLCSPV and SPMaV) and of sweet potato caulimo-like virus (SPCV) were detected in the RNA dataset (Table 4a), possibly because transcripts of these DNA viruses were co-purified with the total RNA. High genome coverage was achieved for SPLCSPV detected in sample KT10, while only fragmentary assemblies were generated for SPMaV, which was detected in samples KT10 and KF1. We therefore conducted RCA and deep sequencing to generate full-length genomes of the DNA viruses. The individually labelled RCA libraries generated over 1.1 Gb of sequence data. Fifty-nine percent of the paired-end data (approximately 2.3 million reads) was retained after adaptor and quality trimming and used for downstream analysis. Over 800 thousand reads mapped to host sequences, and the remaining 65% of the clean reads were used for de novo assembly. The two DNA viruses detected in the RNA datasets (SPLCSPV and SPMaV) were also detected in the RCA libraries (Table 5). The de novo assembly generated a total of 11 contigs identified as SPMaV and 21 contigs matching SPLCSPV (Table 5 and Fig. 2f, g). The consensus sequence for SPMaV shared 98% nt similarity with the SA SPMaV isolate PstI-01 (accession number JQ621843), while the SPLCSPV consensus sequence shared 99% nt similarity with the SA SPLCSPV isolate PstI-012 (accession number JQ621844). DNA viruses were detected in only 4 of the 17 plants that underwent RCA and deep sequencing (samples KT10, KF1, KT6 and L18) (Table 5), and begomoviruses were not detected in the plants collected from the EC. Both SPMaV and SPLCSPV were detected previously in SA in the Limpopo province in 2012 and are reported here for the first time in the WC province. A PCR assay for the two begomoviruses yielded the expected band sizes of approximately 322 bp and 314 bp for SPLCSPV and SPMaV, respectively (Fig. 5), and Sanger sequencing confirmed their identity.

De novo assembly efficiency
When contigs were aligned to full-length viral genomes, gaps were observed in the consensus sequences; only SPVC was assembled with no gaps (Fig. 2a). In this study, near full-length virus genomes could be assembled de novo from datasets of 75 to 300 Mb (e.g. samples L18, F11, KT10, KZ17, Z24 and KF1) (Table 4a), resulting in high overall genome coverage and sequence depth. Where large amounts of data were generated but there was a low viral sequence count (e.g. P14) or no virus was detected (M19, FH14, M11, K10 and F22), we concluded that viruses were either absent or present at very low concentrations. The sequence data are consistent with the phenotypic data: samples M19, FH14, M11, K10, F22 and P14 showed no visible symptoms prior to sequencing (Table 2 and Fig. 1). The de novo assembly approach is effective for virus discovery and for the assembly of near-complete viral sequences. It is also efficient for assembling divergent viral sequences, for which reference-guided assembly could be limiting.

Reference sequence-guided assembly
The reference-guided assembly showed that a total of 43,224 sequence reads originated from SPFMV (Table 4b), 41,265 reads were specific to SPVC and 12,879 reads to the crinivirus SPCSV, while only 3862 reads mapped to SPVG. Four new South African RNA virus genomes were generated from the reference-guided assembly. The SPCSV RNA2 segment sequence was 8210 bp long with a sequence depth of 104-fold (accession number KT069221); the SPFMV genome sequence was 10803 bp long with a sequence depth of 186-fold (accession number KT069222); and the new SPVG isolate was 10739 bp long with a sequence depth of 39-fold (accession number KT069224). The new South African SPVC genome was 1 nt longer than the reference sequence (10794 bp) and was assembled with a sequence depth of 457-fold (accession number KT069223). Few sequence reads aligned to the Peruvian and Ugandan SPCSV RNA1 sequences, possibly because of high variability in the South African sequence. The longest assembled contig for SPCSV RNA1 (accession number KX932096) was generated from the de novo assembly. In the RCA data, more reads corresponded to SPLCSPV (114,211) than to SPMaV (69,950) (Table 5). Reference-guided assembly generated full-length genome sequences for both begomoviruses: the new SPLCSPV sequence was 2769 bp long (accession number KX859238) and the SPMaV sequence was 2781 bp long (accession number KX859239). Sequence depth ranged from 75 to 826-fold for SPMaV and from 140 to 1932-fold for SPLCSPV. Sweet potato caulimo-like virus (SPCV) was also detected in the RCA dataset; however, only a short sequence with gaps was assembled (Table 5).
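Sequence depth is often summarised as a single fold-value (aligned bases divided by genome length), whereas a range such as 75-826-fold presumably reflects the minimum and maximum of the per-base depth profile. The sketch below computes that profile with a difference array and reports the minimum, mean and maximum; the read placements are hypothetical, and the 0-based, half-open [start, end) coordinate convention is an assumption.

```python
def depth_profile(alignments: list[tuple[int, int]], genome_length: int) -> list[int]:
    """Per-base read depth from [start, end) read placements on the reference."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # a read starts covering here
        diff[end] -= 1    # and stops covering here
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def summarise(depth: list[int]) -> tuple[int, float, int]:
    """Return (minimum, mean, maximum) depth across the genome."""
    return min(depth), sum(depth) / len(depth), max(depth)


if __name__ == "__main__":
    reads = [(0, 100), (50, 150)]  # hypothetical read placements
    print(summarise(depth_profile(reads, 150)))
```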

Co-infections and mixed virus infections in sweet potato
The sequence data revealed co-infections of the potyviruses SPVC and SPFMV in samples P2 and Z24, collected from the WC and EC provinces, respectively. Sequence reads matching SPFMV, SPVC and SPVG were detected in sample F11 from the WC. Three plant samples from the WC (KF1, KT6 and L18) showed evidence of a virus complex including SPLCSPV, SPMaV, SPFMV and SPCSV (Tables 4a, 4b and 5). A mixed infection of 6 viruses was detected in sample KT10 from the WC (Table 4a). The sequence data show that sweet potato viruses occurred mostly as co-infections and mixed infections in plant samples from both provinces.

Phylogenetic analysis of RNA viruses
Near-complete sequences of SPVC, SPVG, SPFMV and SPCSV RNA2 generated from the reference-guided assembly were used for phylogenetic analysis. The phylogenetic trees assigned SPCSV RNA2 to the East African (EA) strain group (Fig. 3a). The SPFMV sequence grouped with the ordinary strains (Fig. 3b). The SPVC sequence from the EC clustered with the SPVC group and shared 95% nt identity with the isolate from Argentina. The SPVG sequence from the WC clustered with isolates from Taiwan, the USA and South Korea (Fig. 3b). A phylogenetic tree of begomoviruses (Fig. 4) placed the two DNA viruses from the coastal WC province of SA with the SPLCSPV and SPMaV isolates detected previously at Waterpoort, Limpopo province, South Africa in 2012 (Esterhuizen et al., 2012).

Discussion
In this study, we detected 6 different sweet potato viruses in various combinations in the EC and WC provinces of SA. A variety of known symptoms were observed in the infected field samples, depending on the sweet potato cultivar and virus combination. This has also been shown in other studies; for example, a combination of SPCSV and SPFMV caused severe symptoms on susceptible cultivars (Gibson et al., 1998). This study reports, for the first time in SA, the detection of two begomoviruses and four RNA viruses in a single plant. Multiple infections of SPFMV, SPCSV, SPLCSPV and SPMaV (in samples KT10, KF1, KT6 and L18 from the WC province) resulted in severe symptoms, including upward curling of leaves, chlorotic spots, mottling and necrosis. The occurrence of multiple viruses in single plants, and a correlation between mixed infections and symptom severity, have been reported in sweet potato (Mukasa et al., 2006; Tugume et al., 2016). However, co-infection with six viruses has not been reported previously. It is therefore necessary to investigate how sweet potato cultivars respond to infection by these six viruses, because viruses other than SPCSV and SPFMV (such as SPMaV, SPLCSPV, SPVC and SPVG) could be contributing to severe disease symptoms.
Consistent with previous reports, the NGS data show that SPFMV remains the most common sweet potato virus in SA, occurring wherever sweet potato is grown. Other studies have also reported SPFMV to be the most widespread sweet potato virus to date (Valverde et al., 2007; Rännäli et al., 2009; Clark et al., 2012). Infection with SPFMV alone often causes no symptoms (Kreuze and Fuentes, 2008), and SPCSV causes mild symptoms in single infections, but co-infection with the two viruses often causes severe symptoms (Kreuze et al., 2009). This synergistic interaction results in SPVD, the most devastating disease of sweet potato (Ateka et al., 2004; Kreuze et al., 2009; Cuellar et al., 2011). SPFMV comprises distinct strain groups: O, EA, RC and C (Untiveros et al., 2010). In this study, phylogenetic analysis (Fig. 3) clustered the SPFMV isolates into two distinct groups: the O strain and the C strain, now classified as a new Potyvirus species (SPVC) (Untiveros et al., 2010). Mixed infections of the O and C strains have previously been detected in sweet potato from other regions of South Africa (Rännäli et al., 2009; Sivparsad and Gubba, 2012). The SPFMV-O strain detected in the WC province shared 94% nt identity with the isolate from Japan, and SPVC shared 95% nt identity with the isolate from Argentina. The full genome sequence of the South African SPVG isolate shared 98% nt identity with the isolate from Argentina. The limited genetic variability among SPVG isolates worldwide suggests that the virus has been distributed geographically in infected vegetative material, and that diagnostic tests were either insufficiently sensitive or unavailable to detect it when this material was imported or exported.
Co-infection of the potyviruses SPVG and SPFMV (both common (C) and ordinary (O) strains) has previously been reported in French Polynesia, New Zealand, Zimbabwe and South Africa (Rännäli et al., 2009). RNA viruses are prone to variation; studies investigating the evolution and adaptability of these viruses are therefore necessary to develop effective diagnostic assays and disease control strategies (Rubio et al., 2013). Because this is the first study to generate near-complete reference sequences of South African isolates, we were also able to design oligonucleotide primers for diagnostic assays for all the viruses detected.
The SPCSV RNA2 segment assembled in this study is highly conserved and shares 97% nt identity with the Peruvian isolate, whereas the RNA1 segment shares only 76% nt identity with the Ugandan isolate. This result suggests possible reassortment (Hou and Gilbertson, 1996; Savory et al., 2014) between RNA1 from an "unknown" variant of an East African isolate and RNA2 from a West African isolate. It is also possible that reassortment occurred between SPCSV RNA1 and another sweet potato virus species. Because few full-length sequences of the SPCSV RNA1 and RNA2 segments are available in GenBank, screening for reassortment is challenging. Reassortment events between RNA segments of two different or closely related viruses have been documented in viruses infecting other crops, including tomato (Chen et al., 2009) and banana (Hu et al., 2007). Reassortment between virus species, especially closteroviruses in the family Closteroviridae, increases genetic variability and accelerates evolution (Rubio et al., 2013). The genetic diversity observed in SPCSV, which belongs to the genus Crinivirus within the family Closteroviridae, may have arisen from mixed viral infections and from migration (exchange of sweet potato cuttings) between distant geographical areas (Rubio et al., 2013). Alternatively, reassortment may have occurred in RNA1 with viruses from natural wild hosts. SPFMV, which often co-exists with SPCSV, was detected in 22 Ipomoea spp., Hewittia sublobata and Lepistemon owariensis in Uganda (Tugume et al., 2008). SPCSV has been found in complexes with viruses such as SPVG, SPV2, SPLCV, SPMMV and cucumber mosaic virus (CMV), where it enhances replication and ultimately increases virus titres by approximately 1000-fold (Valverde et al., 2007; Kreuze and Fuentes, 2008).
The interaction of SPCSV with other viruses also exacerbates viral symptoms (Mukasa et al., 2006; Kreuze and Fuentes, 2008), and SPCSV has been documented in many cases to play a major role in enhancing disease severity (Mukasa et al., 2006; Valverde et al., 2007). The Hsp70 gene sequence on RNA2 of SPCSV from KwaZulu-Natal (KZN) province in SA belongs to the West African (WA) strain group (Sivparsad and Gubba, 2012), whereas phylogenetic analysis of the fully sequenced segments from this study assigned SPCSV RNA2 from the WC province to the East African (EA) group. This finding suggests that SA SPCSV isolates from different sweet potato growing regions may be genetically diverse. Our study also demonstrates the need to sequence full-length SPCSV segments in southern and northern Africa in order to further examine the genetic diversity of SPCSV and to identify geographic regions where reassortment could occur, as this could lead to the emergence of new strains and increased disease severity.
SPLCSPV and SPMaV have been reported to co-infect sweet potato plants in SA (Esterhuizen et al., 2012). Based on our phylogenetic analysis, begomoviruses may be more widespread in the country, necessitating screening for these viruses in all 9 provinces of SA. Plants infected with begomoviruses are often symptomless; however, symptoms such as yellow veins, leaf distortion and chlorosis (Kreuze and Fuentes, 2008) and upward curling of young leaves (Valverde et al., 2007) can be associated with infection. In our study, sweet potato plants infected with begomoviruses displayed upward curling of old and young leaves in the field, and the sequence data confirmed that plants displaying leaf curl symptoms were indeed infected with begomoviruses. Furthermore, the sequence data revealed begomoviruses co-existing with a crinivirus (SPCSV) and a potyvirus (SPFMV). Synergistic interactions between begomoviruses and SPCSV were reported recently (Cuellar et al., 2015). Studies have also shown that single infection with SPCSV or SPFMV causes mild or no symptoms, while co-infection with both viruses causes severe symptoms, resulting in sweet potato virus disease (SPVD) (Cuellar et al., 2008). Our results suggest that SPLCSPV and SPMaV isolates could contribute significantly to sweet potato disease severity; further experiments are required to confirm this. Sequence reads identified as sweet potato caulimo-like virus (SPCV) were detected in both the RNA and DNA datasets. Future studies should screen more material from the Western Cape and other provinces in SA for this virus. Disease control strategies, including the use of virus-free cuttings and vector control, should be implemented, especially on commercial farms in the WC province, to prevent further spread of the viruses and crop decline.
It is necessary to conduct a nationwide survey employing NGS in order to a) screen for viruses, b) assemble full-length genomes and c) gain a better understanding of virus diversity and virus complexes.
In conclusion, this study describes a metagenomic approach employing high throughput deep sequencing for the detection of RNA and DNA viruses in sweet potato without a priori knowledge. This approach provides a comprehensive profile of the viral community in a sample. A survey of two provinces detected 6 viruses in South Africa, including a distinct SPCSV RNA1 sequence. The results also indicate that SPCSV, together with SPFMV and begomoviruses, remains a major contributor to SPVD.

Conflict of interest
The authors declare that they have no competing interests.