Phylogenomic relationship and evolutionary insights of sweet potato viruses from the western highlands of Kenya

Sweet potato is a major food security crop within sub-Saharan Africa, where 90% of Africa's production occurs. One of the major limitations of sweet potato production is viral infection. In this study, we used a combination of whole genome sequences from a field isolate obtained from Kenya and those available in GenBank. Sequences of four sweet potato viruses, Sweet potato feathery mottle virus (SPFMV), Sweet potato virus C (SPVC), Sweet potato chlorotic stunt virus (SPCSV) and Sweet potato chlorotic fleck virus (SPCFV), were obtained from the Kenyan sample. SPFMV sequences both from this study and from GenBank were found to be recombinant. Recombination breakpoints were found within the Nla-Pro, coat protein and P1 genes. The SPCSV, SPVC, and SPCFV viruses from this study were non-recombinant. Bayesian phylogenomic relationships across whole genome trees showed variation in the number of well-supported clades; within SPCSV (RNA1 and RNA2) and SPFMV two well-supported clades (I and II) were resolved. The SPCFV tree resolved three well-supported clades (I–III) while four well-supported clades were resolved in SPVC (I–IV). Similar clades were resolved within the coalescent species trees. However, there were disagreements between the clades resolved in the gene trees compared to those from the whole genome tree and coalescent species trees. The exception was the coat protein gene tree, which resolved clades similar to the genome and coalescent species trees in SPCSV and SPCFV, although not in SPFMV and SPVC. In addition, we report variation in selective pressure within sites of individual genes across all four viruses; overall all viruses were under purifying selection. We report the first complete genomes of SPFMV, SPVC, SPCFV, and a partial SPCSV from Kenya as a mixed infection in one sample.
Our findings provide a snapshot of the evolutionary relationships of sweet potato viruses (SPFMV, SPVC, SPCFV, and SPCSV) from Kenya and assess whether selection pressure has an effect on their evolution.


INTRODUCTION
Sweet potato is grown in over nine million hectares (Food and Agriculture Organization of the United Nations (FAO), 2016) with 97% of global production confined to China and Africa (FAOSTAT, 2006). In Africa, 90% of the production occurs around the Lake Victoria region and in the western highlands of Kenya (Ewell, 1960;Loebenstein, 2010). Sweet potato is considered to be a food security crop and is grown within smallholder agro-ecosystems. It is intercropped with legumes such as beans (Phaseolus vulgaris), cowpea (Vigna unguiculata) and groundnut (Arachis hypogaea L.) particularly within smallholder farms in Africa. However, there is a two-fold difference in production levels between smallholder farms in Africa compared to Asia, and America (Loebenstein, 2010). One major reason for these differences is the spread of viral diseases within the cropping system. There are two primary modes of viral transmission within sweet potato. Sweet potato is vegetatively propagated, and through this there is the possibility of spreading viruses from the parent stock. The second mode of transmission is through viruliferous aphids in particular: Aphis gossypii, Myzus persicae, Aphis craccivora and Lipaphis erysimi and some whiteflies (Bemisia tabaci, Trialeurodes vaporariorum) (Tugume, Mukasa & Valkonen, 2008;Navas-Castillo, Lopez-Moya & Aranda, 2014).
The agro-ecosystems in the western highlands of Kenya are characterised by a heterogeneous cropping system (Tittonell et al., 2007;Wainaina et al., 2018), which allows for virus movement between crops during the growing season. To date, there have been limited efforts to identify the diversity and phylogenomic relationships of plant viruses in this system. In addition, it is not known what the roles of recombination and selective pressure are in the evolution of these viruses. In this study, we used a high throughput sequencing approach to identify plant viruses within sweet potato, and sought to answer the question 'What is the phylogenomic relationship of sweet potato viruses present in the western highlands of Kenya, and what evolutionary states are they under?' Here, we report the first complete genomes of SPFMV, SPVC and SPCFV, and a partial SPCSV, from the western highlands of Kenya. In addition, we investigate the role of recombination and selective pressure across the complete genome in driving the evolution of these viruses.
These four viruses have previously been reported within east Africa, including Kenya (Ateka et al., 2004). However, detection was dependent on either immunoassay enzyme linked immunosorbent assays (ELISA) or polymerase chain reaction (PCR) amplification of the partial coat protein (CP) gene (Ateka et al., 2004;Miano, LaBonte & Clark, 2008;Opiyo et al., 2010). So far, there have been no complete genomes of these viruses reported from Kenya. Findings from this study will provide the basis for improving molecular diagnosis through better informed primer design and testing for a broader range of various virus strains within eastern Africa. In addition, the new genomes from this region will further contribute to the evolutionary analysis of this and other related sweet potato viruses.

Field collection
Ethical approval to conduct this study was obtained from the University of Western Australia (RA/4/1/7475). In addition, permission to access all privately owned farms was obtained through signed consent forms by the head of each household. Sampling was carried out in the western highlands of Kenya over two cropping seasons (2015 and 2016) during the long season from April-August. Fieldwork activities were coordinated through the Cassava Diagnostics Project Kenyan node. We sampled 120 farms within this period as part of a larger field survey. A total of six viral symptomatic sweet potato samples were collected. The main viral symptoms observed on the leaves sampled in the fields were purple ringspots with leaf crinkling. For each symptomatic sample, two leaves were collected. One leaf of each sample was stored in silica gel, while the second leaf sample was stored using the paper press method (Almakarem et al., 2012). All samples were then transported to the BecA-ILRI hub laboratories in Nairobi, Kenya for virus testing.

Nucleic acid extraction and PCR screening of viruses
From each individual leaf, RNA was extracted using the Zymo RNA miniprep kit (Zymo, Irvine, CA, USA) according to the manufacturers' specifications. Extractions were then lyophilised and shipped to the University of Western Australia for further processing.
Lyophilised RNA was subsequently reconstituted with nuclease free water. From an aliquot of the RNA, cDNA was prepared using Promega master mix (Promega, Madison, WI, USA) as described by the manufacturer. Subsequently, PCR was carried out using the Bioneer master mix (Bioneer, Daedeok, Republic of Korea) with two sets of primers: the universal Potyvirus primers LegPotyF 5′-GCWKCHATGATYGARGCHTGGG-3′ and LegPotyR 5′-AYYTGYTYMTCHCCATCCATC-3′ (Webster, 2008), and the Carlavirus primers 5′-GTTTTCCCAGTCACGAC-3′ and 5′-ATGCCXCTXAXXCCXCC-3′ (Chen, Chen & Adams, 2002). Bean common mosaic virus was used as the Potyvirus positive control, and a non-template control of nuclease free water was used as the negative control.
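The universal Potyvirus primers above are degenerate, so each one stands for a pool of concrete oligonucleotides. As an illustration only, the following Python sketch expands the standard IUPAC ambiguity codes in LegPotyF into a regular expression and counts the implied variants. The function names and the toy target sequence are hypothetical and not part of the study, and the sketch does not handle the non-IUPAC "X" that appears in the Carlavirus primer as printed.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression string."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


def degeneracy(primer):
    """Number of distinct concrete sequences the primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


LEGPOTY_F = "GCWKCHATGATYGARGCHTGGG"  # forward primer as given in the text
pattern = re.compile(primer_to_regex(LEGPOTY_F))

# Hypothetical toy target: one concrete primer variant flanked by two bases.
toy_target = "AA" + "GCATCAATGATTGAAGCATGGG" + "TT"
match = pattern.search(toy_target)
print(degeneracy(LEGPOTY_F), match.start() if match else None)
```

A check like this only tests exact matching of the forward strand; real primer binding tolerates mismatches and also involves the reverse complement, neither of which is modelled here.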
cDNA library preparation and RNA-Seq sequencing
A cDNA library was prepared from a sweet potato sample that was positive after the initial PCR screening using the Illumina TruSeq stranded total RNA sample preparation kit with plant Ribo-Zero as described by the manufacturer (Illumina, San Diego, CA, USA). All libraries containing the correct insert size fragments and quantity were sent to Macrogen Korea for sequencing. Libraries were normalised based on concentration and then pooled before sequencing. Paired-end sequencing (2 × 150 bp) was done in rapid run mode using a single flow cell on the Illumina HiSeq 2500 at Macrogen, Korea. However, four of the samples that were sent for sequencing failed at the quality control step of preparation and therefore did not proceed to sequencing. One of the remaining samples produced very low coverage, so we were unable to confidently undertake any analysis on that data. This left a single sample with good quality sequence for analysis.

Assembly and mapping of RNA-Seq reads
Raw reads were trimmed and assembled using CLC Genomics Workbench (CLCGW ver 7.0.5) (Qiagen, Hilden, Germany). Reads were trimmed using the following parameters: the quality score limit was set to 0.01, the maximum number of ambiguities was set to two, and reads shorter than 100 nt were discarded. Contigs were assembled using the de novo assembly function on CLCGW essentially as described by Kehoe et al. (2014a) and Wainaina et al. (2018). Reference-based mapping was then carried out using complete reference genomes retrieved from GenBank. Mapping parameters were set as follows: minimum overlap 10%, minimum overlap identity 80%, allow gaps 10% and fine-tuning iteration up to 10 times. The consensus contig from the mapping was aligned using MAFFT (Katoh & Standley, 2016) to the de novo contig of interest. The resulting alignments were manually inspected for ambiguities, which were corrected with reference to the original assembly or mapping. The ORF prediction and annotation of the final sequences were done in Geneious 8.1.8 (Biomatters, Auckland, New Zealand). Sequences were referred to as nearly complete if the entire coding region was present, and complete if the entire genome including untranslated regions was present.
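The length and ambiguity thresholds stated above can be expressed as a simple acceptance check. The Python sketch below is a simplified illustration and is not CLC Genomics Workbench's trimming algorithm: it omits the quality-score trimming entirely and only tests whether an already trimmed read has at least 100 nt and at most two ambiguous bases. The function names and toy reads are hypothetical.

```python
def passes_filters(read, min_length=100, max_ambiguous=2):
    """Return True if a read meets the length and ambiguity thresholds."""
    # Any symbol other than A, C, G or T is counted as ambiguous.
    ambiguous = sum(1 for base in read.upper() if base not in "ACGT")
    return len(read) >= min_length and ambiguous <= max_ambiguous


def filter_reads(reads, **thresholds):
    """Keep only the reads that pass both thresholds."""
    return [read for read in reads if passes_filters(read, **thresholds)]


# Hypothetical toy reads: 120 nt clean, 80 nt clean, 123 nt with three N's.
toy_reads = ["ACGT" * 30, "ACGT" * 20, "N" * 3 + "ACGT" * 30]
kept = filter_reads(toy_reads)
print(len(kept))
```

Only the first toy read is kept: the second is too short and the third carries more than two ambiguous bases.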
Bayesian phylogenetic analysis, coalescent species tree estimation using a coalescent framework and pairwise identity analyses
Bayesian inference was used to estimate the phylogenetic relationships for SPVC, SPFMV, SPCSV, and SPCFV. These analyses were carried out on the complete genomes and separately on individual genes. The most suitable evolutionary models were determined by jModelTest (Darriba et al., 2012). Bayesian analysis of the nearly complete genomes was carried out using ExaBayes 1.4.1 (Aberer, Kobert & Stamatakis, 2014) while individual genes were analysed using MrBayes 3.2.2 (Ronquist et al., 2012). MrBayes was run for 50 million generations on four chains, with trees sampled every 1,000 generations using GTR+I+G as the evolutionary model. In each of the runs, the first 25% (12,500) of the sampled trees were discarded as burn-in. In the ExaBayes run, each gene segment was assigned an independent evolutionary model. ExaBayes was run for 50 million generations on four chains. In each run, the first 25% of the sampled trees were discarded as burn-in. Convergence and mixing of the chains were evaluated using Tracer v1.6 (Rambaut et al., 2014) and trees were visualised using FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Species tree estimation using the complete genome was carried out using Singular Value Decomposition (SVD) Quartets (Chifman & Kubatko, 2014) with a coalescent framework to estimate the species tree for SPFMV, SPCSV, SPVC, and SPCFV. The SVDQ analysis used all quartets, and species tree branches were considered supported at a bootstrap support of >50%. The species tree was visually compared to the gene trees from MrBayes and the complete genome tree from ExaBayes. Pairwise identities of the complete and partial sequences from Kenya and of the GenBank sequences were determined using Geneious 8.1.9 (Biomatters, Auckland, New Zealand).
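The sampling arithmetic implied by the MrBayes settings stated above (50 million generations, one tree sampled every 1,000 generations, 25% burn-in) can be checked with a few lines of Python. This is a sketch of the arithmetic only, under the assumption that the initial state at generation zero is not counted; the function name is hypothetical.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (trees sampled, trees discarded as burn-in, trees retained)."""
    sampled = generations // sample_every
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


sampled, discarded, kept = mcmc_sample_counts(50_000_000, 1_000, 0.25)
print(sampled, discarded, kept)  # 50000 12500 37500
```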

RESULTS
RNA-Seq on total plant RNA resulted in 12,667,976 reads, which after trimming for quality came to 10,995,262 reads. De novo assembly produced 9,269 contigs from one sample (Table 1). Plant virus contigs were identified after BLASTn searches with lengths of between 8,427 and 16,157 nt, and had an average coverage of 1,339-11,890 times. Genome sequences with complete ORFs and complete untranslated regions (UTRs) were considered to be full genomes. However, genome sequences that lacked parts of the 5′ and 3′ UTR regions were considered to be near complete genomes. The final sequences, 9,414-16,157 nt in length, were obtained from the consensus of the de novo assembly and the mapped consensus reads. The four sweet potato viruses obtained from this study are summarised in Table 1, and whole genome sequences retrieved from GenBank for analysis are summarised in Table S1. All viral sequences generated from this study were deposited in GenBank with the following accession numbers: SPVC (MH264531), SPCSV (RNA1, MH264532), SPCSV (RNA2, MH264533), SPCFV (MH264534), and SPFMV (MH264535).
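For orientation, the proportion of reads surviving quality trimming follows directly from the two read counts reported above. The short Python sketch below only restates that arithmetic; the function name is hypothetical.

```python
def trimming_summary(raw_reads, trimmed_reads):
    """Return (reads removed, percentage of reads retained)."""
    removed = raw_reads - trimmed_reads
    retained_pct = 100 * trimmed_reads / raw_reads
    return removed, retained_pct


removed, retained_pct = trimming_summary(12_667_976, 10_995_262)
print(f"removed {removed:,} reads; retained {retained_pct:.1f}%")
```

With the reported counts, roughly 1.67 million reads were removed and about 86.8% were retained.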

Analysis of recombination
Among the viral sequences from this study and those from GenBank, SPFMV was found to be recombinant at position 9, 9,964-10,482 nt within the CP region (Table 2). Moreover, the SPFMV sequences retrieved from GenBank were also found to be recombinant within the P1, Nla-Pro and CP gene regions (Table 2). The P1, Nla-Pro and CP genes were the hot spots of recombination.
Bayesian Phylogenetic relationship, coalescent species tree estimation and percentage pairwise identity
Bayesian phylogenomic analysis among the sweet potato viruses was carried out across the whole genome in the case of SPVC, SPFMV, and SPCFV and within RNA1 and partial RNA2 in the case of SPCSV. Within SPCSV (RNA1 and RNA2) two well-supported clades were resolved, identified as clades I-II (Figs. 1 and 2). The Kenyan sequences clustered within clade II and were closely associated with two Uganda sequences and one sequence from China in both trees. Four well-supported clades identified as clades I-IV were resolved within the SPVC phylogenomic trees (Fig. 3). The Kenyan sequences clustered within clade II with sequences from Peru, Spain, and East Timor (Fig. 3). Three well-supported clades were resolved within the SPCFV phylogenomic tree, identified as clades I-III (Fig. 4). The Kenyan sequence clustered within clade III with two Ugandan sequences. Within the SPFMV phylogenomic tree comprising both recombinant and non-recombinant sequences, two clades were resolved and identified as clades I-II (Fig. 5A). The Kenyan sequences clustered in clade I. Phylogenomic analysis using only non-recombinant SPFMV sequences resolved two well-supported clades that were associated with the main SPFMV strains, the russet crack (RC) clade I and the ordinary (O) clade II (Fig. 5B). The Kenyan sequence was excluded from this phylogenomic tree since it was recombinant. Moreover, phylogenetic analysis on the two genes where the recombination breakpoint was identified resolved two clades for the CP gene tree (Fig. 5C) and three clades for the Nla-Pro gene tree (Fig. 5D). Within the CP gene tree, the recombinant sequence formed a distinct sub-clade identified as 1a within the larger clade I, while in the Nla-Pro gene tree the recombinant sequence clustered in clade II (Fig. 5D).
The CP gene is used as the primary target region for many virus diagnostic molecular markers, and the tree for this region resolved similar clades to both the concatenated genome tree and the coalescent species tree (Figs. S1-S4; Tables S2A-S2B) in SPCSV and SPCFV but not in SPVC and SPFMV (Tables S2A-S2B). Percentage pairwise identities between the Kenyan sequences and the GenBank sequences varied across the viruses. Within SPCSV, identities ranged between 83% and 99% for RNA1 and between 70% and 98% for RNA2. The closest matches to the Kenyan sequence were two Uganda sequences (AJ428554.1 and NC_004123.1) and a sequence from China (KC146843.1), with nucleotide identities of between 98.7% and 98.8%. Within SPVC, nucleotide identity ranged between 91% and 98%. The closest match to the Kenyan sequence was a sequence from Spain (KU511269) with 93.3% identity. Percentage nucleotide identity within SPCFV ranged between 72% and 96%. The closest nucleotide identity matches to the Kenyan sequence were sequences from Uganda (NC_006550 and AY461421) with percentage identity of 96.5%. Percentage nucleotide identity within SPFMV ranged between 87% and 98%. The closest nucleotide identity match to the Kenyan sequence was a sequence from China (KY296450).
However, the rates of purifying selection (dN/dS < 1) were not homogeneous across genes. Genes that were under relatively weaker purifying selection were the P1 gene in both SPVC and SPFMV (Figs. 6A and 6D). On the other hand, the triple block 3 and nucleic acid binding protein genes in SPCFV (Fig. 6B) and the CP genes in all four viruses were under strong purifying selection with dN/dS ratios of ∼0.1 (Figs. 6A-6D). Purifying selection results in minimal changes to amino acids within the respective genes, which results in slow rates of evolution within these genes.
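Two quantities reported above, percentage pairwise identity and the dN/dS ratio, can be illustrated with a minimal Python sketch. It is not the Geneious or selection-analysis implementation used in the study: the identity function assumes two pre-aligned sequences of equal length and skips columns where both carry a gap, and the classifier simply applies the usual convention that dN/dS below 1 indicates purifying selection, a value of 1 neutrality, and a value above 1 positive selection. All names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical columns between two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100 * matches / compared


def classify_selection(dn_ds):
    """Label a dN/dS ratio using the conventional thresholds."""
    if dn_ds < 1:
        return "purifying"
    if dn_ds > 1:
        return "positive"
    return "neutral"


# Hypothetical toy alignment: five comparable columns, four identical.
print(percent_identity("ACGT-A", "ACGA-A"))  # 80.0
print(classify_selection(0.1))               # purifying
```

Under this convention, a gene with dN/dS near 0.1 and a gene with dN/dS near 0.8 are both under purifying selection, but the second is under a weaker constraint.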

DISCUSSION
One of the major limitations for sweet potato production, especially within smallholder agro-ecosystems in Kenya, is viral disease. Among these viral diseases is the SPVD attributed to the co-infection of SPFMV and SPCSV that act in synergy to exacerbate symptoms. In this study, we identified a mixed infection involving four viruses; SPFMV, SPCSV, SPVC, and SPCFV. We report the first complete genome of SPFMV, SPVC, SPCFV, and partial SPCSV from Kenya. The SPFMV and SPVC genomes are the first from sub-Saharan Africa. Moreover, we conducted phylogenomic relationship analysis of these genomes. In addition we identified recombination events and selective pressure as acting on the virus genomes and potential drivers for their evolution in Kenya and globally.
High throughput RNA sequencing (RNA-Seq) on sweet potato
High throughput RNA sequencing (RNA-Seq) was used to identify the complete and partial genomes of sweet potato viruses from a viral symptomatic sweet potato. We report the first complete genomes of SPVC (10,392 nt), SPFMV (10,482 nt), SPCFV (9,414 nt), and partial SPCSV (16,157 nt) (Tugume, Mukasa & Valkonen, 2016). Prevalence of SPFMV was reported to be 89%, while that of SPCSV was 55%, in Kenya using ELISA. In Uganda, the prevalence levels were between 1.3% for SPFMV and 5.4% for SPCSV based on next-generation sequencing.
In this study, we build on these findings using a whole genome sequencing approach rather than single gene loci. Sweet potato feathery mottle virus and SPVC belong to the family Potyviridae, and are spread by viruliferous aphids and through infected cuttings within sweet potato (Ateka et al., 2004). In addition, a Carlavirus SPCFV and partial Crinivirus, SPCSV were also identified (Table 1) vectors. Previous studies have reported aphid and whitefly-transmitted viruses in crops within the western region (Legg et al., 2006, 2014;Mangeni et al., 2014;Wainaina et al., 2018) and the Lake Victoria region (Tugume et al., 2010a;Adikini et al., 2015;Adikini et al., 2016). Moreover, farming practices within smallholder farms, which include partial harvesting of mature sweet potato, are thought to help maintain the virus within the agroecosystem. The advantage of this practice is that it allows the crop to remain underground, where it stores well (Loebenstein, 2010), providing a sustainable food source for the farmers. However, a major drawback of these practices is that these sweet potato crops may act as viral reservoirs that then become a viral source that aids dissemination to non-infected host plants by insect vectors during the cropping season. This phenomenon results in the continuous circulation of viruses within the agroecosystems.

Recombination in sweet potato viruses
Survival of plant viruses is dependent on their ability to be successfully transmitted to suitable host plants. Survival within the host plant is dependent on the ability of the virus to evade the host plant resistance system, while at the same time maintaining its genetic vigour to allow for replication. One mechanism that viruses utilise for their survival is recombination, a key driver of virus evolution through which beneficial traits are acquired while deleterious ones are removed. Within the Potyviridae, recombination is highly prevalent (Varsani et al., 2008;Elena, Fraile & García-Arenal, 2014;Ndunguru et al., 2015;Tugume, Mukasa & Valkonen, 2016;Wainaina et al., 2018). Moreover, co-infection of multiple viruses, in particular within sweet potato, can result in well-adapted viruses and has been reported in other countries (Tugume et al., 2010a;Maina et al., 2018a, 2018b). Analysis of recombination on both the new sequences and those retrieved from GenBank identified 11 recombinant sequences in SPFMV (Table 2), which included the Kenyan sequence. The three other viruses identified (SPCSV, SPVC, and SPCFV) from Kenya were not recombinant. The SPVC sequences from GenBank were recombinant but are well described and discussed elsewhere (Maina et al., 2018b). Within SPFMV, recombination was mainly found within P1, Nla-Pro and the CP region of the genome. These findings are consistent with previous SPFMV reports (Maina et al., 2018a, 2018b). The CP region is a hot spot of recombination mainly due to the selective pressure from the host immune system. As a strategy to evade the host immune system, the viral CP is constantly changing. On the other hand, the P1 gene is postulated to be the driver for diversity of the Potyviruses. This resulted in evolutionary branching of other members of the Potyviruses such as the ipomoviruses and tritimoviruses (Valli, López-Moya & García, 2007).
The main driver of recombination within the P1 region is postulated to be the interaction between the N-terminal region of the P1 gene and the host plant (Valli, López-Moya & García, 2007). It is therefore common to have both intragenus and intergenus recombination within P1, thus facilitating better host adaptation. Similarly, we postulate this could also be the primary reason for the recombination events within Nla-Pro. Nla-Pro is associated with the proteolytic activities within members of the family Potyviridae. In addition, it regulates the potyviral proteins at different stages of infection, thus ensuring successful viral colonisation (Ivanov et al., 2014).

Phylogenomic relationship between sweet potato viruses
Phylogenetic analyses were carried out between the complete genomes from Kenya and reference GenBank sequences. In both SPCSV (RNA1 and RNA2; Figs. 1 and 2) and SPCFV (Fig. 4), Kenyan and Ugandan sequences clustered together in well-supported clades. The percentage nucleotide similarity to the Uganda sequences was over 96%. We suggest the clustering of Uganda and Kenya sequences could be due to movement of infected plant cuttings across the borders of Kenya and Uganda. Communities living in this region have a shared kinship that transcends the geopolitical borders and often there is exchange of vegetative planting material. Moreover, there is inadequate phytosanitary screening across the borders for plant cuttings. Previous studies have reported both virus and vector movement through plant cuttings along these border regions (Legg et al., 2011). In addition, this mode of virus spread has also been reported in other vegetatively propagated crops such as cassava (Legg et al., 2014;Alicai et al., 2016). Sweet potato virus C sequences from this study clustered with the South American (Peru), Spanish, and one East Timor sequence in a single well-supported clade (Clade II) (Fig. 3); the closest match was a sequence from Spain (KU511269) with 93% nucleotide identity. SPVC is likely to have been introduced into eastern Africa together with sweet potato itself, through trade and through the British colonialists and missionaries. The Portuguese traders transported sweet potato from South America to Africa through Mozambique and Angola around the 15th century (Loebenstein, 2010). The British colonialists subsequently followed them in 1662. We hypothesize that SPVC may then have jumped into the native vegetation, and has been maintained within the agro-ecosystem since that time.
More recently, international trade between Kenya, Europe and parts of South America, is a possible route for the continued introduction of SPVC into the western highlands of Kenya. More SPVC genomes sequenced across more geographical regions will in future provide an opportunity to better understand the evolutionary dynamics of this virus.
The phylogenomic relationship of SPFMV sequences is possibly distorted due to the presence of recombinant SPFMV sequences (Table 2). Recombination has been implicated in misrepresenting the true phylogenetic relationship of viruses (Schierup & Hein, 2000;Posada & Crandall, 2002;Varsani et al., 2008). In this study, SPFMV sequences both from this study and GenBank were found to be recombinant (Table 2). Recombinant sequences formed a distinct clade on both the CP and Nla-Pro gene trees (Figs. 5C-5D) and whole genome tree (Figs. 5A-5B). A significant feature of recombination on the phylogenetic tree is the splitting of sequences into recombinant versus non-recombinant clades, which was observed (Figs. 5A, 5C, and 5D). Thus any inference in the clustering of SPFMV sequences, in particular, with recombinant sequences present is likely to be inaccurate. The SPFMV phylogenomic tree with non-recombinant sequences resolved two clades associated with two of the three main phylogroups present in SPFMV associated with the SPFMV strains RC and O (Kreuze et al., 2000;Maina et al., 2018a) (Fig. 5B).
Single gene loci are used in routine molecular diagnostics and subsequent analysis of the phylogenetic relationship of viruses. A majority of the gene trees across all four viruses were discordant with the concatenated genome tree, except for the CP gene, which is the primary diagnostic marker (Colinet et al., 1995). There was concordance between the number of clades resolved from the concatenated whole genome tree, the coalescent species tree, and the CP gene trees in SPCSV (RNA1 and RNA2) and SPCFV (Table S2B); however, this was not the case in SPFMV and SPVC (Table S2B). The discordance between the gene trees and the species trees could be attributed to incomplete lineage sorting (ILS), gene gain and loss, horizontal gene transfer (HGT) and gene duplication (Maddison, 1997). It is probable that some of these factors account for the differences between the gene and species trees. These findings support the use of the CP as an ideal diagnostic marker for molecular diagnostics within SPCSV and SPCFV. Our findings are comparable to previous virus whole-genome studies. However, they also differ from findings in other viruses within the Potyviridae, for example within ipomoviruses such as the cassava brown streak virus and Uganda cassava brown streak virus (Alicai et al., 2016). A probable cause of these differences could be the divergence of the ipomoviruses from other members of the family Potyviridae. Therefore, it is necessary to evaluate all gene trees against the coalescent species tree and concatenated genome tree of individual viruses. This will aid in determining which of the genes reflects the true phylogenetic relationship of the virus based on the sequences. This approach is more stringent, and provides a robust analysis to choose a suitable gene region from which to create new diagnostic tools. This is imperative for the control and management of plant viral infections.

Selection pressure analysis between genes of the sweet potato viruses
Selective pressure across genes of RNA viruses varies across viral families and genes (Duffy, Shackelton & Holmes, 2008). Though RNA viruses undergo rapid evolutionary rates, this is dictated by several factors such as viral populations, inter versus intra-host variation, and population sizes (Duffy, Shackelton & Holmes, 2008). Across all the viral sequences (Figs. 6A-6D) the CP genes were under strong purifying selection (dN/dS ∼ 0.1). This strong purifying selection is evident in a majority of vector-transmitted viruses, due to the fitness trade-off phenomenon (Chare & Holmes, 2004). The fitness trade-off states that, due to the limited number of insect vectors and the specificity between RNA viruses and the insect vectors that transmit them, the evolution of the RNA viruses is constrained by their insect vectors (Power, 2000;Chare & Holmes, 2004). Because deleterious mutations occurring within the RNA viruses could potentially affect their transmission, they are removed through purifying selection (Chare & Holmes, 2004). Purifying selection is more pronounced within the CP as previously reported (Chare & Holmes, 2004;Alicai et al., 2016;Wainaina et al., 2018). This further supports the hypothesis of the fitness trade-off phenomenon, in particular within plant RNA viruses with insect vectors.
On the other hand, within SPFMV and SPVC from the family Potyviridae we identified the P1 gene region to be under the least selection pressure (Figs. 6A and 6D). This indicates that though purifying selection was evident within the P1 gene, it was to a lesser extent compared to the CP gene. P1 is associated with viral adaptation to the host plant (Shi et al., 2007;Salvador et al., 2008;Tugume et al., 2010b), and it interferes with the host plant RNA induced silencing complex (Tugume et al., 2010b). This helps to ensure that viruses can evade the host immune response. This increases the chances for the virus to establish itself and survive within the host plant. Mutations that may facilitate survival of the virus are therefore tolerated within the P1 region. Overall, all genes within the SPCFV were under strong purifying selection.

CONCLUSION
We used high throughput sequencing on viral symptomatic sweet potato plants collected within the western highlands of Kenya. We identified co-infection of SPCSV, SPFMV, SPVC, and SPCFV and obtained the first complete genomes of SPFMV, SPVC, and SPCFV, and a partial genome of SPCSV, from Kenya. Moreover, the SPCSV and SPCFV sequences from Kenya were closely matched to sequences from Uganda, with nucleotide similarity of above 96%. Inadequate phytosanitary measures and a porous border between Kenya and Uganda are likely factors that contribute to and further exacerbate the problem. The SPVC whole genome from this study clustered with sequences from South America. We postulate that SPVC may have been introduced into eastern Africa from the initial sweet potato cultivars from South America. SPVC was subsequently maintained within native vegetation and by vegetative propagation after the initial viral jump. Evolutionary insights based on recombination events and selective pressure analysis revealed the following: among the four viruses, only SPFMV sequences from this study were found to be recombinant, especially within the P1, Nla-Pro and CP genes. Recombinant SPFMV sequences formed a distinct clade on both the whole genome tree and the gene trees, particularly within the Nla-Pro and CP genes. Selection pressure varied across genes in all four viruses. The CP gene was under strong purifying selection in all viruses, while the P1 gene in SPFMV and SPVC was under weaker purifying selection. Our findings provide a snapshot of viruses present within sweet potato, and a more extensive study within the western highlands of Kenya would most likely reveal more extensive viral infections within this region.
Future studies should be conducted within the Lake Victoria region and the western highlands of Kenya, to identify all possible sweet potato viruses and potential viral reservoirs within this region. A combination of sequencing using the Oxford Nanopore sequencing technology, ELISA, and loop-mediated isothermal amplification may provide faster and more cost-effective approaches for the detection of multiple viruses within symptomatic sweet potato. This is especially important within east Africa where multiple viral infections are prevalent in most vegetatively propagated crops. Moreover, the availability of more viral sequences within this region will allow for further viral evolution studies to be conducted. This information will be crucial in determining when the viruses undergo changes and what the drivers of these changes are within the agro-ecosystems.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
James M. Wainaina is supported by an Australian Award Scholarship from the Department of Foreign Affairs and Trade (DFAT), and this work is part of his PhD research. Pawsey Supercomputing Centre provided supercomputer resources for data analysis with funding from the Australian Government and the Government of Western Australia. Laboratory and sequencing costs were paid for through a Rising Star grant from the Faculty of Science, University of Western Australia, to Laura M. Boykin. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: Australian Award Scholarships from the Department of Foreign Affairs and Trade (DFAT). Australian Government and the Government of Western Australia. Faculty of Science University of Western Australia.
Timothy Makori performed the experiments, contributed reagents/materials/analysis tools, authored or reviewed drafts of the paper, approved the final draft. Monica A. Kehoe conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft. Laura M. Boykin conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.

Field Study Permissions
The following information was supplied relating to field study approvals (i.e., approving body and any reference numbers): Ethical approval to conduct this study was obtained from the University of Western Australia (RA/4/1/7475). In addition, permission to access all privately owned farms was obtained through signed consent forms by the head of each household. Sweet potato samples were collected as part of a larger field survey in the western highlands of Kenya over two cropping seasons (2015 and 2016) during the long season.