The first transcriptomes from field-collected individual whiteflies (Bemisia tabaci, Hemiptera: Aleyrodidae)

Background: Bemisia tabaci species (B. tabaci), or whiteflies, are the world’s most devastating insect pests. They cause billions of dollars (US) of damage each year, and are leaving farmers in the developing world food insecure. Currently, all publicly available transcriptome data for B. tabaci are generated from pooled samples, which can lead to high heterozygosity and skewed representation of the genetic diversity. The ability to extract enough RNA from a single whitefly has remained elusive due to their small size and technological limitations.
Methods: In this study, we optimised the single whitefly RNA extraction procedure, and sequenced the transcriptome of four individual adult Sub-Saharan Africa 1 (SSA1) B. tabaci. Transcriptome sequencing resulted in 39-42 million raw reads. De novo assembly of trimmed reads yielded between 65,000-162,000 transcripts across B. tabaci transcriptomes.
Results: Bayesian phylogenetic analysis of mitochondrial cytochrome oxidase I (mtCOI) grouped the four whiteflies within the SSA1 clade. BLASTn searches on the four transcriptomes identified five endosymbionts; the primary endosymbiont Portiera aleyrodidarum and four secondary endosymbionts: Arsenophonus, Wolbachia, Rickettsia, and Cardinium spp. that were predominant across all four SSA1 B. tabaci samples with prevalence levels between 54.1-75%. Amino acid alignments of the NusG gene of P. aleyrodidarum for the SSA1 B. tabaci transcriptomes of samples WF2 and WF2b revealed an eleven amino acid residue deletion that was absent in samples WF1 and WF2a. Comparison of the protein structure of the NusG


Introduction
Members of the whitefly Bemisia tabaci (Hemiptera: Aleyrodidae) species complex are classified as the world's most devastating insect pests. There are 34 species globally 1 and the various species in the complex are morphologically identical. They transmit over 100 plant viruses 2,3, become insecticide resistant 4, and ultimately cause billions of dollars in damage annually for farmers. The adult whiteflies are promiscuous feeders, and will move between virus-infected crops and native weeds that act as viral inoculum 'sources', and deposit viruses on alternative crops that act as viral 'sinks' while feeding.
The crop of importance for this study was cassava (Manihot esculenta). Cassava supports approximately 800 million people in over 105 countries as a source of food and nutritional security, especially within rural smallholder farming communities 5. Cassava production in Sub-Saharan Africa (SSA), especially the East Africa region, is hampered by both DNA and RNA viruses.
Whitefly-transmitted viruses cause cassava mosaic disease (CMD), leading to 28-40% crop losses with estimated economic losses of up to $2.7 billion per year in SSA 6. The CMD pandemics in East Africa, and across other cassava producing areas in SSA, were correlated with B. tabaci outbreaks 7. Relevant to this study are two RNA viruses of the family Potyviridae: Cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV), both devastating cassava in East Africa. Bemisia tabaci species have been hypothesized to transmit these RNA viruses with limited transmission efficiency 8-10. Recent studies have shown that there are multiple species of these viruses 11, which further strengthens the need to obtain data from individual whiteflies, as pooled samples could contain different species with different virus composition and transmission efficiency. In addition, CBSV has been shown to have a higher rate of evolution than UCBSV 12, increasing the urgency of understanding the role played by the different whitefly species in the system.
Endosymbionts and their role in B. tabaci
Viral-vector interactions within B. tabaci are further influenced by bacterial endosymbionts forming a tripartite interaction. B. tabaci has one of the highest numbers of endosymbiont bacterial infections, with eight different vertically transmitted bacteria reported 13-16. They are classified into two categories, primary (P) and secondary (S) endosymbionts, many of which are in specialised cells called bacteriocytes, while a few are also found scattered throughout the whitefly body. A single obligate P-symbiont, Portiera aleyrodidarum, is systematically found in all B. tabaci individuals. Portiera has a long co-evolutionary history with all members of the Aleyrodinae subfamily 15. In this study, we further explore genes within the P. aleyrodidarum retrieved from individual whitefly transcriptomes, including the transcription termination/antitermination protein NusG. NusG is a highly conserved regulatory protein that suppresses RNA polymerase pausing and increases the elongation rate 17,18. However, its importance within gene regulation is species specific; in Staphylococcus aureus it is dispensable 19,20.
The S-endosymbionts are not systematically associated with hosts, and their contribution is not essential to host survival and reproduction. Seven facultative S-endosymbionts, Wolbachia, Cardinium, Rickettsia, Arsenophonus, Hamiltonella defensa and Fritschea bemisiae have been detected in various B. tabaci populations 13,21-24. The presence of S-endosymbionts can influence key biological parameters of the host. Hamiltonella and Rickettsia facilitate plant virus transmission with increased acquisition and retention by whiteflies 22. This is done by protection and safe transit of virions in the haemolymph of insects through chaperonins (GroEL) and protein complexes that aid in protein folding and repair mechanisms 19.
Application of next generation sequencing in pest management of B. tabaci
The advent of next generation sequencing (NGS) and specifically transcriptome sequencing has allowed the unmasking of this tripartite relationship of vector-viral-microbiota within insects 24-28. Furthermore, NGS provides an opportunity to better understand the co-evolution of B. tabaci and its bacterial endosymbionts 26. The endosymbionts have been implicated in influencing species complex formation in B. tabaci through conducting sweeps on the mitochondrial genome 27. Applying transcriptome sequencing is essential to reveal the endosymbionts and their effects on the mitogenome of B. tabaci, and predict potential hot spots for changes that are endosymbiont induced.
Several studies have explored the interaction between whitefly and endosymbionts 29,30 and have resulted in the identification of candidate genes that maintain the relationship 31,32. This has been explored as a source of potential RNAi pesticide control targets 29,32,33. RNAi-based pest control measures also provide opportunities to identify species-specific genes for target gene sequences for knock-down. However, to date all transcriptome sequencing has involved pooled samples, obtained through rearing several generations of isolines of a single species to ensure high quantities of RNA for subsequent sequencing. This remains a major bottleneck, in particular within Arthropoda, where collected samples are limited due to small morphological sizes 34,35. In addition, the development of isolines is time consuming, and colonies often die off, mainly due to inbreeding depression 33.
It is against this background that we sought to develop a method for single whitefly transcriptomes to understand the virus diversity within different whitefly species. We did not detect viral reads, probably an indication that the sampled whiteflies were not carrying any viruses, but as proof of concept of the method, we validated the utility of the data generated by retrieving the microbiota P-endosymbionts and S-endosymbionts that have previously been characterised within B. tabaci. In this study we report the endosymbionts present within field-collected individual African whiteflies, as well as characterisation and evolution of the NusG genes present within the P-endosymbionts.

Whitefly sample collection and study design
In this study, we sampled whiteflies in Uganda and Tanzania from cassava (Manihot esculenta) fields. In Uganda, fresh adult whiteflies were collected from cassava fields at the National Crops Resources Research Institute (NaCRRI), Namulonge, Wakiso district, which is located in central Uganda at 32°36'E and 0°31'N, and 1134 meters above sea level. The whiteflies obtained from Tanzania were collected on cassava in a countrywide survey conducted in 2013. The samples used in this study, WF2 (Uganda) and WF1, WF2a and WF2b (Tanzania), were collected on CBSD-symptomatic cassava plants. In all cases, the whitefly samples were kept in 70% ethanol in Eppendorf tubes until laboratory analysis. The whitefly samples served a two-fold function: firstly, to optimise a single whitefly RNA extraction protocol and secondly, to unmask RNA viruses and endosymbionts within B. tabaci as a proof of concept. In addition, data obtained from a Nextera DNA library prep of a Brazilian sample (156_NW2) was also used in this study. The whitefly was collected from a New World 2 colony in Brazil on Euphorbia heterophylla and kept in 70% ethanol in Eppendorf tubes until laboratory analysis.
Extraction of total RNA from single whitefly
RNA extraction was carried out using the ARCTURUS® PicoPure® kit (Arcturus, CA, USA), which is designed for formalin-fixed paraffin-embedded (FFPE) tissue samples. Briefly, 30 µl of extraction buffer was added to an RNase-free microcentrifuge tube containing a single whitefly, which was ground using a sterile plastic pestle. To the cell extract an equal volume of 70% ethanol was added. To bind the RNA onto the column, the RNA purification columns were spun for two minutes at 100 x g, immediately followed by centrifugation at 16,000 x g for 30 seconds. The purification columns were then subjected to two washing steps using wash buffer 1 and 2 (ethyl alcohol). The purification column was transferred to a fresh RNase-free 0.5 ml microcentrifuge tube, with 30 µl of RNase-free water added to elute the RNA. The column was incubated at room temperature for five minutes, and subsequently spun for one minute at 1,000 x g, followed by 16,000 x g for one minute. The eluted RNA was returned into the column and re-extracted to increase the concentration. Extracted RNA was treated with DNase using the TURBO DNA-free kit, as described by the manufacturer (Ambion, Life Technologies, CA, USA). The RNA was concentrated in a vacuum centrifuge (Eppendorf, Germany) at room temperature for 1 hour; the pellet was resuspended in 15 µl of RNase-free water and stored at -80°C awaiting analysis. RNA was quantified, and the quality and integrity assessed, using the 2100 Bioanalyzer (Agilent Technologies, CA, USA). Dilutions of up to x10 were made for each sample prior to analysis in the Bioanalyzer.
cDNA and Illumina library preparation
Total RNA from each individual whitefly sample was used for cDNA library preparation using the Illumina TruSeq Stranded Total RNA Preparation kit as described by the manufacturer (Illumina, CA, USA). Subsequently, sequencing was carried out using the HiSeq 2000 (Illumina) on the rapid run mode generating 2 x 50 bp paired-end reads. Base calling, quality assessment and image analysis were conducted using the HiSeq Control Software v1.4.8 and Real Time Analysis v1.18.61 at the Australian Genome Research Facility (Perth, Australia).
BLAST analysis of transcripts and annotation
BLAST searches of the transcripts under study were carried out on the NCBI non-redundant nucleotide database using the default cut-off on the Magnus Supercomputer at the Pawsey Supercomputing Centre, Western Australia. Transcripts identical to known bacterial endosymbionts were identified and the number of genes from each identified endosymbiont bacterium determined.
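The tallying step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the BLASTn results were saved as tab-separated text with the columns qseqid, sseqid, pident, evalue, bitscore and stitle, and the function name, the genus list and the example hits are hypothetical.

```python
import csv
import io
from collections import Counter

# Genera of interest; matched as substrings of the BLAST subject title (assumption).
ENDOSYMBIONT_GENERA = ("Portiera", "Arsenophonus", "Wolbachia", "Rickettsia", "Cardinium")


def count_endosymbiont_transcripts(handle, genera=ENDOSYMBIONT_GENERA):
    """Count transcripts whose best hit (highest bit score) names one of `genera`.

    `handle` yields tab-separated rows with the assumed column order:
    qseqid, sseqid, pident, evalue, bitscore, stitle.
    """
    best = {}  # transcript id -> (bit score, subject title) of its best hit so far
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        qseqid, bitscore, stitle = row[0], float(row[4]), row[5]
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, stitle)

    counts = Counter()
    for _, stitle in best.values():
        for genus in genera:
            if genus.lower() in stitle.lower():
                counts[genus] += 1
                break  # one genus per transcript
    return counts


# Hypothetical hits for three transcripts (illustrative values only).
EXAMPLE_HITS = (
    "t1\tsubj1\t99.0\t1e-50\t200\tCandidatus Portiera aleyrodidarum, complete genome\n"
    "t1\tsubj2\t90.0\t1e-20\t100\tWolbachia endosymbiont, partial sequence\n"
    "t2\tsubj3\t98.0\t1e-40\t180\tArsenophonus sp., 16S ribosomal RNA gene\n"
    "t3\tsubj4\t97.0\t1e-30\t150\tBemisia tabaci mitochondrion\n"
)

example_counts = count_endosymbiont_transcripts(io.StringIO(EXAMPLE_HITS))
```

Keeping only the best-scoring hit per transcript avoids counting one transcript towards several genera; other tie-breaking or filtering rules (for example an e-value threshold) would be equally reasonable choices.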

Phylogenetic analysis of whitefly mitochondrial cytochrome oxidase I (COI):
The phylogenetic relationship of mitochondrial cytochrome oxidase I (mtCOI) of the whitefly samples in this study was inferred using a Bayesian phylogenetic method implemented in MrBayes (version 3.2.2) 35. The optimal substitution model was selected using the Akaike Information Criterion (AIC) implemented in jModelTest 2 36.
Sequence alignment and phylogenetic analysis of NusG gene in P. aleyrodidarum across B. tabaci species
The NusG gene sequence of the P-endosymbiont P. aleyrodidarum from the SSA1 B. tabaci in this study was aligned with those from other B. tabaci species, Trialeurodes vaporariorum and Aleurodicus dispersus using MAFFT (version 7.017) 37. jModelTest 2 36 was used to search for phylogenetic models, with the Akaike information criterion selecting the optimal model to be implemented in MrBayes 3.2.2. The MrBayes run was carried out using the command "lset nst=6 rates=gamma" for 50 million generations, with trees sampled every 1000 generations. In each of the runs, the first 25% (2,500) trees were discarded as burn-in.
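As an illustration of the model-selection step, the AIC for a model with k free parameters and maximised log-likelihood ln L is AIC = 2k - 2 ln L, and the model with the lowest AIC is preferred. The sketch below is not the jModelTest implementation; the log-likelihood values are hypothetical, and k counts substitution-model parameters only (branch lengths are omitted, which shifts every score by the same constant on a fixed topology and so leaves the ranking unchanged). In MrBayes, "lset nst=6 rates=gamma" corresponds to a GTR model with gamma-distributed rate variation.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(candidates):
    """Return (best model name, {name: AIC}) for {name: (lnL, k)} candidates."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods; k = free substitution-model parameters.
CANDIDATES = {
    "JC": (-5300.0, 0),     # equal rates and base frequencies
    "HKY": (-5200.0, 4),    # kappa + 3 free base frequencies
    "GTR+G": (-5150.0, 9),  # 5 rates + 3 frequencies + gamma shape
}

best_model, aic_scores = select_model(CANDIDATES)
```

With these illustrative numbers the parameter-rich model wins because its gain in likelihood outweighs the penalty of 2 per extra parameter; with a smaller gain the simpler model would be selected instead.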
Analysis and modelling the structure of the NusG gene
The structures for Portiera aleyrodidarum BT and B. tabaci SSA1 whitefly were predicted using Phyre2 38 with 100% confidence and compared to known structures of NusG from other bacterial species. All models were prepared using PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4).

RNA extraction and NGS optimised for individual B. tabaci samples
In this study, we sampled four individual adult B. tabaci from cassava fields in Uganda (WF2) and Tanzania (WF1, WF2a, WF2b). Total RNA extraction from single whiteflies yielded high quality RNA with concentrations ranging from 69 ng to 244 ng that were used for library preparation and subsequent sequencing with Illumina HiSeq 2000 on a rapid run mode. The number of raw reads generated from each single whitefly ranged between 39,343,141 and 42,928,131 (Table 1). After trimming, the reads were assembled using Trinity, resulting in 65,550 to 162,487 transcripts across the four SSA1 B. tabaci individuals (Table 1).
Comparison of endosymbionts within the SSA1 B. tabaci samples
Comparison of the diversity of bacterial endosymbionts across individual whitefly transcripts was conducted with BLASTn searches on the non-redundant nucleotide database and by identifying the number of genes from each bacterial endosymbiont (Supplementary Table 1). We identified five main endosymbionts: the primary endosymbiont P. aleyrodidarum and four secondary endosymbionts, Arsenophonus, Wolbachia, Rickettsia sp. and Cardinium spp. (Table 2). P. aleyrodidarum predominated in all four SSA1 B. tabaci study samples, with incidences of 74.8%, 71.2%, 54.1% and 58.5% for WF1, WF2, WF2a and WF2b, respectively. This was followed by Arsenophonus, Wolbachia, Rickettsia sp. and Cardinium spp., which occurred at an average of 18.0%, 5.9%, 1.6% and <1%, respectively, across all four study samples.
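The percentages above are simple proportions of endosymbiont-assigned transcripts, and the cross-sample figures are arithmetic means. A minimal sketch of both calculations is given below; the per-sample Portiera percentages are the ones reported in the text, while the function name and the transcript counts in the second example are hypothetical.

```python
from statistics import mean


def percent_composition(counts):
    """Convert {endosymbiont: transcript count} into {endosymbiont: percentage}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no endosymbiont transcripts to summarise")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Portiera incidences per sample, as reported in the text above.
PORTIERA_PERCENT = {"WF1": 74.8, "WF2": 71.2, "WF2a": 54.1, "WF2b": 58.5}
portiera_mean = mean(PORTIERA_PERCENT.values())

# Hypothetical transcript counts for one sample (illustrative values only).
EXAMPLE_COUNTS = {"Portiera": 150, "Arsenophonus": 36, "Wolbachia": 12, "Rickettsia": 2}
example_percent = percent_composition(EXAMPLE_COUNTS)
```

Averaging the four reported Portiera values in this way gives a mean of about 64.7% across the SSA1 samples.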
Phylogenetic analysis of single whitefly mitochondrial cytochrome oxidase I (COI)
B. tabaci is recognized as a species complex of 34 species based on the mitochondrial cytochrome oxidase I 1,39,40. We therefore used cytochrome oxidase I (COI) transcripts of the four individual whiteflies to ascertain B. tabaci species status and their phylogenetic relation using reference B. tabaci COI GenBank sequences found at www.whiteflybase.org. All four COI sequences clustered within the Sub-Saharan Africa 1 (SSA1) clade (data not shown).
Sequence alignment and Bayesian phylogenetic analysis of NusG gene
Nucleotide and amino acid sequence alignments of NusG in P. aleyrodidarum were conducted for several whitefly species including B. tabaci (SSA1, Mediterranean (MED), Middle East Asia Minor 1 (MEAM1) and New World 2 (NW2)), T. vaporariorum (greenhouse whitefly) and Aleurodicus dispersus. The alignment identified 11 missing amino acids in the NusG sequences for the SSA1 B. tabaci samples WF2 and WF2b, T. vaporariorum and Aleurodicus dispersus. However, all 11 amino acids were present in samples WF1 and WF2a, MED, MEAM1 and NW2 (Figure 1). Bayesian phylogenetic analysis of the NusG sequences of P. aleyrodidarum from the different whitefly species clustered all four SSA1 B. tabaci (WF1, WF2, WF2a and WF2b) within a single clade together with ancestral B. tabaci from GenBank (Figure 2). The SSA1 clade was supported by a posterior probability of 1, with T. vaporariorum and Aleurodicus forming clades at the base of the phylogenetic tree (Figure 2).
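A deletion of this kind shows up in a multiple sequence alignment as a run of gap characters in some sequences. The sketch below shows one way such runs could be located in already-aligned sequences; it is an illustration only, the sequence names and residues are invented (they are not the NusG sequences from this study), and the helper names are hypothetical.

```python
def gap_runs(aligned_seq, gap="-"):
    """Return (start, length) for each run of gap characters (0-based columns)."""
    runs = []
    start = None
    for i, ch in enumerate(aligned_seq):
        if ch == gap:
            if start is None:
                start = i  # a new gap run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # gap run extends to the end of the alignment
        runs.append((start, len(aligned_seq) - start))
    return runs


def sequences_with_gap(alignment, length):
    """Names of aligned sequences containing a gap run of exactly `length` columns."""
    return [
        name
        for name, seq in alignment.items()
        if any(run_len == length for _, run_len in gap_runs(seq))
    ]


# Toy alignment with invented residues: one sequence lacks an 11-residue stretch.
TOY_ALIGNMENT = {
    "toy_full": "MKTAWYVLHT" + "SGEKRPLDNQT" + "FSGYEQLV",
    "toy_deleted": "MKTAWYVLHT" + "-" * 11 + "FSGYEQLV",
}

deleted = sequences_with_gap(TOY_ALIGNMENT, 11)
```

Reporting the start column together with the run length makes it straightforward to check whether different samples share the same deletion boundaries.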

Structure analysis of Portiera NusG genes
Structures of the NusG protein sequence of the primary endosymbiont P. aleyrodidarum in the four SSA1 B. tabaci samples were predicted using Phyre2 with 100% confidence, and compared to known structures of NusG from other bacterial species (Shigella flexneri, Thermus thermophilus and Aquifex aeolicus; PDB entries 2KO6, 1NZ8 and 1M1H, respectively) and Spt4/5 from yeast (Saccharomyces cerevisiae; PDB entry 2EXU) 19,41,42. The 11-residue deletion was found in a loop region that is variable in length and structure across bacterial species, but is absent from archaeal and eukaryotic species (Figure 3 and Figure 4A). The effect of the deletion appears to be a shortening of the loop in NusG from the African whiteflies (WF2 and WF2b). A model of bacterial RNA polymerase (orange surface representation; PDB entry 2O5I) bound to the N-terminal domain of the T. thermophilus NusG shows that the loop region is not involved in the interaction between NusG and RNA polymerase (Figure 4B).

Discussion
In this study, we optimised a single whitefly RNA extraction method for field-collected samples. We subsequently successfully conducted transcriptome sequencing on individual Sub-Saharan Africa 1 (SSA1) B. tabaci, revealing unique genetic diversity in the bacterial endosymbionts as proof of concept. This is the first time a single whitefly transcriptome has been produced.
NusG deletion and implications within P. aleyrodidarum in SSA B. tabaci
We report the presence of the primary endosymbiont P. aleyrodidarum and several secondary endosymbionts within SSA1 transcriptomes. Furthermore, P. aleyrodidarum in SSA1 B. tabaci was observed to have a deletion of 11 amino acids in the NusG gene, which is associated with cellular transcriptional processes in other bacterial species. On the other hand, P. aleyrodidarum from NW2, MED and SSA1 (WF2a, WF1) B. tabaci species did not have this deletion (Figure 1). The deleted 11 amino acids were identified in a loop region of the N-terminal domain of the NusG protein, resulting in a shortened loop in the SSA1 WF2b sample. This loop region has high variability in both structure and length across bacterial species, and is absent from archaeal and eukaryotic species.
NusG is highly conserved and a major regulator of transcription elongation. It has been shown to directly interact with RNA polymerase to regulate transcriptional pausing and Rho-dependent termination 19,20,43,44. Structural modelling of NusG bound to RNA polymerase indicated that the shortened loop region seen in the WF2b sample is unlikely to affect this interaction. Rho-dependent termination has been attributed to the C-terminal (KOW) domain region of NusG; therefore a shortening of the loop region in the N-terminal domain is also unlikely to affect transcription termination. Yet, there has been no function attributed to this loop region of NusG, and thus the effect of variability in this region across species is unknown. However, the deletion could represent the result of evolutionary species divergence. Further sequencing of the gene is required across the B. tabaci species complex to gain further understanding of the diversity.
Why the single whitefly transcriptome approach?
The sequencing of the whitefly transcriptome is crucial for understanding whitefly-microbiota-viral dynamics and for circumventing the bottlenecks posed in sequencing the whitefly genome. The genome of whitefly is highly heterozygous 43. Assembly of heterozygous genomes is complex due to the de Bruijn graph structures predominantly used 44. To deal with the heterozygosity, previous studies have employed inbred lines, obtained from rearing a high number of whitefly isolines 34,45. However, rearing whitefly isolines is time consuming and colonies may often suffer contamination, leading to collapse and failure to raise the high numbers required for transcriptome sequencing.
We optimised the ARCTURUS® PicoPure® kit (Arcturus, CA, USA) protocol for individual whitefly RNA extraction with the dual aim of determining, firstly, whether we could obtain sufficient quantities of RNA from a single whitefly for transcriptome analysis and secondly, whether the optimised method would reveal whitefly microbiota as proof of concept. Using our method, the quantities of RNA obtained from field-collected single whitefly samples were sufficient for library preparation and subsequent transcriptome sequencing. Across all transcriptomes over 30M reads were obtained. The number of transcripts was comparable to those reported in other arthropod studies from field collections 32. However, we did not observe any difference in assembly qualities 32; probably due to the fact that our field-collected samples had degraded RNA based on RIN, and thus direct comparison with 32 was inappropriate.
Degraded insect specimens have been used successfully in previous studies 46. This is significant, considering that the majority of insect specimens are usually collected under field conditions and stored in ethanol at concentrations ranging from 70 to 100% 47-49, rendering the samples liable to degradation. However, to ensure good preservation of insect specimens to be used for mRNA and total RNA isolation in molecular studies, and other downstream applications such as histology and immunocytochemistry, it is advisable to collect the samples in an RNA stabilizing solution such as RNAlater. The solution stabilizes and protects cellular RNA in intact, unfrozen tissue and cell samples without jeopardizing the quality or quantity of RNA obtained after subsequent RNA isolation. The success of the method provided an opportunity to unmask vector-microbiota-viral dynamics in individual whiteflies in our study, and will be useful for similar studies on other small organisms.

Endosymbionts diversity across individual SSA1 B. tabaci transcriptomes
In this study, we identified bacterial endosymbionts (

Conclusions
Our study provides a proof of concept that single whitefly RNA extraction and transcriptome sequencing is possible, and that the method is optimised and applicable to a range of small insect transcriptome studies. It is particularly useful in studies that wish to explore vector-microbiota-viral dynamics at individual insect level rather than pooling of insects. It is useful where genetic material is both limited and of low quality, which is applicable to most agriculture field collections. In addition, the single whitefly transcriptome technique described in this study offers new opportunities to understand the biology, and relative economic importance, of the several whitefly species occurring in ecosystems within which food is produced in Sub-Saharan Africa, and will enable the efficient development and deployment of sustainable pest and disease management strategies to ensure food security in developing countries.

Data availability
The datasets used and/or analyzed during the current study are available from GenBank: SRR5110306, SRR5110307, SRR5109958, KY548924, MG680297.

Competing interests
No competing interests were disclosed.

Supplementary material
Using this method, the authors have sequenced the transcriptome of four individuals (one from Uganda and three from Tanzania) collected on cassava leaves with symptoms said to be produced by Cassava brown streak virus (CBSV), a (+)ssRNA virus belonging to the Potyviridae family, probably transmitted by B. tabaci (still questioned in the literature).
Indeed, this is an important technical feat. No doubt, it may help follow the movement of whiteflies and the diseases they transmit. In this context, it is interesting to note that the sequence and the functional analysis of the transcriptome of single cells has been published in several instances (recently reviewed by Liu and Trapnell, 2016). This paper is a bit disappointing, to me. The title claims that the authors have produced the first transcriptome of a single whitefly. This is true on its face value; and the results have been posted in GenBank (although, raw data). It should be relatively easy to identify transcripts since the sequence of B. tabaci MEAM1 and MED are known. Nonetheless, this reviewer is expecting some valuable information on gene expression in the whole animal. Is the lack of data in the paper due to a low number of reads of transcripts of cellular genes? Could the author identify, say, transcripts of housekeeping genes or genes involved in sugar metabolism or else? This would underline the power and the limits of the one-insect-one-transcriptome analysis.
Instead, the authors have chosen to focus on the whitefly primary (P) and secondary (S) endosymbionts, especially on the NusG gene of the primary endosymbiont Portiera aleyrodidarum (4 figures). This gene might be interesting but it is more a structural than a functional study, which to my point of view lessens the importance of the study.
The supplementary Table 1 is interesting but does not tell us the endosymbiont composition of the four individuals scrutinized. Is it Portiera (P), and Arsenophonus, Wolbachia and Rickettsia? Also, the title of Table 1 is not clear; what is the meaning of "number of genes in endosymbionts bacteria"? Is it the number of genes with homologies to others?
It is interesting that Hamiltonella sequences have not been found, knowing that this is the symbiont that produces GroEL, which binds to the CP of begomoviruses, and facilitates the transit of the virus in the hemolymph. It is also interesting that CBSV sequences were not found, although the whiteflies have been collected on symptomatic plants. Is it that, after all, B. tabaci is not the vector of this virus?
Altogether, I expected much more from the title. I suggest to lower expectations of the reader by amending the title to something like "Analysis of the endosymbiont transcriptome from individual whiteflies". I recommend publication after relating to the points mentioned above.

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? No
Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 02 Feb 2018
Laura Boykin, University of Western Australia, Perth, Australia
Your comments have greatly improved our manuscript and an updated version is now available for review. Thank you.

Reviewer 2 Comments
Comments
In this paper, Sseruwagi et al. present a method of RNA preparation, which is suitable for the Illumina-based RNASeq analysis of the transcriptome of a single whitefly.
Using this method, the authors have sequenced the transcriptome of four individuals (one from Uganda and three from Tanzania) collected on cassava leaves with symptoms said to be produced by Cassava brown streak virus (CBSV), a (+)ssRNA virus belonging to the Potyviridae family, probably transmitted by B. tabaci (still questioned in the literature).
Indeed, this is an important technical feat. No doubt, it may help follow the movement of whiteflies and the diseases they transmit. In this context, it is interesting to note that the sequence and the functional analysis of the transcriptome of single cells has been published in several instances (recently reviewed by Liu and Trapnell, 2016). This paper is a bit disappointing, to me. The title claims that the authors have produced the first transcriptome of a single whitefly. This is true on its face value; and the results have been posted in GenBank (although, raw data). It should be relatively easy to identify transcripts since the sequence of B. tabaci MEAM1 and MED are known. Nonetheless, this reviewer is expecting some valuable information on gene expression in the whole animal. Is the lack of data in the paper due to a low number of reads of transcripts of cellular genes? Could the author identify, say, transcripts of housekeeping genes or genes involved in sugar metabolism or else? This would underline the power and the limits of the one-insect-one-transcriptome analysis.

Response:
The initial experiment was meant to determine if we could obtain sufficient RNA and conduct RNAseq analysis on individual field-collected B. tabaci. Our primary aim was to unravel the microbiota within individual transcriptomes. Conducting gene expression analysis is still a challenge with the current method, mainly due to variation in starting RNA concentration of the whiteflies. Secondly, we did not achieve adequate ribosomal depletion, which may have hindered successful gene expression analysis. However, based on this method (ongoing) we have indeed identified nuclear genes and single copy orthologs.

Comments
Instead, the authors have chosen to focus on the whitefly primary (P) and secondary (S) endosymbionts, especially on the NusG gene of the primary endosymbiont Portiera aleyrodidarum (4 figures). This gene might be interesting but it is more a structural than a functional study, which to my point of view lessens the importance of the study.

Response:
We focused on the NusG mainly due to the unique deletion observed on what should be highly conserved proteins that are reported to be crucial in bacterial replication. It highlights the unique features of the endosymbionts from SSA species of B. tabaci compared to other putative species of B. tabaci and further highlights the difference within this species within the species complex. These findings were possible because we studied individual whitefly transcriptomes, and may probably not have been discovered by transcriptomes generated from pooled isolines.

Comments
The supplementary Table 1 is interesting but does not tell us the endosymbiont composition of the four individuals scrutinized. Is it Portiera (P), and Arsenophonus, Wolbachia and Rickettsia? Also, the title of Table 1 is not clear; what is the meaning of "number of genes in endosymbionts bacteria"? Is it the number of genes with homologies to others?

Response:
We have revised and clarified the legends and content of both Tables 1 and 2 and supplementary Table 1.

Comments
It is interesting that Hamiltonella sequences have not been found, knowing that this is the symbiont that produces GroEL, which binds to the CP of begomoviruses, and facilitates the transit of the virus in the hemolymph.

Response:
Candidatus Hamiltonella defensa has been reported to be absent in whiteflies in Africa. Our study found very negligible numbers of contigs in only one of the whiteflies (WF2a) studied. However, the literature (lines 117 to 120 in this paper) indicates that Rickettsia spp. is also involved in virus transmission, and is among the predominant endosymbionts detected in our study. We have added the results of Hamiltonella and Rickettsia to clarify the reviewer's concerns.

Comment
It is also interesting that CBSV sequences were not found, although the whiteflies have been collected on symptomatic plants. Is it that, after all, B. tabaci is not the vector of this virus?

Response:
RNA viruses such as CBSVs are picked up and kept for short periods in the whitefly stylet, unlike the DNA viruses, which build up and persist in the midgut and are more likely to be detected if present in the whitefly under study. It is also possible that the whiteflies were not viruliferous, considering that less than 10% of field-collected whiteflies are viruliferous despite feeding on infected plants.
Secondly, a recent publication (Ateka E, Alicai T, Ndunguru J, Tairo F, Sseruwagi P, Kiarie, S., et al. (2017) Unusual occurrence of a DAG motif in the Ipomovirus Cassava brown streak virus and implications for its vector transmission. PLoS ONE 12(11): e0187883 reported the presence of a DAG motif within CBSVs indicating they could be aphid-transmitted viruses rather than by whiteflies.

Comment
Altogether, I expected much more from the title. I suggest lowering the reader's expectations by amending the title to something like "Analysis of the endosymbiont transcriptome from individual whiteflies".

Response:
We appreciate the reviewer's comment regarding the title, but we prefer the current title as our study is the first transcriptome generated from field-collected whiteflies; the analysis pipelines can be investigated and expanded upon in future studies.

Kai-Shu Ling, US Vegetable Laboratory, USDA-ARS (United States Department of Agriculture - Agricultural Research Service), Charleston, SC, USA

Sseruwagi and colleagues in this manuscript described a method to effectively generate a high-throughput RNA-seq dataset using purified total RNA extracted from each individual field-collected adult whitefly, Bemisia tabaci, which generated 39-42 million raw reads per library using Illumina sequencing.
Because the genome sequence of cassava whitefly B. tabaci SSA-1 is not yet available, a high number of contigs (65,000-162,000) was generated from each library through de novo assembly of cleaned reads. Functional prediction to profile the generated transcripts of B. tabaci SSA1 was not performed. However, sequences relating to the mitochondrion cytochrome I oxidase (mtCOI) gene were identified from each of the four RNA-seq libraries. Phylogenetic analysis of mtCOI confirmed their close relationship to the cassava whitefly SSA1 clade. In addition, these RNA-seq datasets also contained sequences relating to five B. tabaci endosymbiont bacteria. Although the authors claimed to have transcriptomes for these endosymbionts, extensive analysis to functionally profile the identified RNA sequences of these endosymbionts was not conducted in the current study. Individual analysis through amino acid alignment of the identified Nus G gene sequences in the primary endosymbiont Portiera aleyrodidarum from the four RNA-seq datasets revealed an eleven amino acid residue deletion in two of the four individual whitefly libraries. Although this finding is interesting, a validation test would be necessary to confirm the missing sequences in those individuals through Sanger sequencing of amplicons generated using Nus G specific primers on the original RNA preparations. It is also surprising that not a single sequence relating to cassava-infecting viruses was detected, although these whiteflies were supposedly collected from cassava plants infected with cassava brown streak virus, which has a poly-A tail in its RNA genome. It would be of interest to test the original RNA preparations to determine which viruses may be present in these individual whiteflies.

Specific comments and suggestions:
Title: As mentioned in the general comments, this is more like a method paper on doing RNA sequencing with little RNA extracted from individual whiteflies, not an extensive transcriptome analysis. I would suggest changing the title to something like this: Effective RNA sequencing using little RNA extracted from field-collected individual whiteflies (Bemisia tabaci) useful for transcriptome analysis.

IN ABSTRACT:
Page 1: change "65,000-162,000 transcripts" to "contigs".
Page 1: the compound sentence starting with "BLASTn searches …" is too long and its meaning is not clear; it needs to be rewritten.

IN INTRODUCTION:
Page 3: need to modify the sentence ending with "…is hampered by both DNA and RNA (transmitted) virus", either by deleting "transmitted" or by changing the sentence to "is hampered by whitefly-transmitted DNA and RNA viruses".
Page 3: In the sentence starting "Relevant to this study are two RNA Potyviruses: …", "Potyviruses" should be replaced with "ipomoviruses, in the family Potyviridae".
Page 3: in the same paragraph as above, you may want to elaborate a little bit more on virus
Page 9: You mention that it is useful to study vector-microbiota-viral dynamics, but why was not a single viral sequence read detected in these datasets? Although there are some advantages in using RNA sequencing of individual whiteflies, you may also want to point out that there is still some room for improvement.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Partly
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly

Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Virology, vector (whitefly) biology, epidemiology and genomics
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 02 Feb 2018
Laura Boykin, University of Western Australia, Perth, Australia
Thank you for your comments; they have greatly improved our manuscript. A new version is ready for review and our responses to each comment are listed below.

Reviewer 1 Specific comments and suggestions:
Title: As mentioned in the general comments, this is more like a method paper in doing RNA sequencing on little RNA extracted from individual whitefly, not an extensive transcriptome analysis. I would suggest changing the title to something like this: Effective RNA sequencing using little RNA extracted from field-collected individual whiteflies (Bemisia tabaci) useful for transcriptome analysis.

Response:
We appreciate the reviewer's comment regarding the title, but we prefer the current title as our study is the first transcriptome generated from field-collected whiteflies; the analysis pipelines can be investigated and expanded upon in future studies.

IN ABSTRACT:
detected in these RNA-seq datasets in field-collected whiteflies. Rather than speculating that these individual whiteflies did not carry the target viruses, why not do some tests by RT-PCR to confirm the lack of target viruses in these RNA preparations?

Response:
Because very little RNA was available (~17 µL), all of it was used in the cDNA library preparation; thus no further experiments could be done after library preparation. Additionally, because we were analysing a single whitefly, the viral titre may have been too low to detect. It is also known that RNA viruses such as CBSVs are picked up and retained for only short periods in the whitefly stylet, unlike DNA viruses, which build up and persist for longer in the midgut and are therefore more likely to be detected if present in the whitefly under study. Studies have shown that less than 10% of field whiteflies are viruliferous in any sample, and therefore it is possible that we missed the infected individuals during sampling. The suggestion would be to use freshly collected samples in the future.

IN METHODS:
Page 4: Since whiteflies were collected from CBSD-symptomatic cassava plants in Uganda and Tanzania, it might still be possible to conduct RT-PCR tests to determine the presence of viruses in the purified whitefly RNA preparations.

Response:
It may not be possible to detect CBSVs even with RT-PCR, for the reasons provided above.
Page 4: What was the rationale for using the Brazilian whitefly sample? Also, it appears to me that there was no analysis of sequences from this Brazilian dataset in the results section.

Response:
We have added further information regarding the Brazilian whitefly to the end of the first paragraph in the methods section.

IN RESULTS:
Page 5: There were 65,550 to 162,487 contigs generated from individual RNA-seq datasets. How could all these sequences be assigned? Any ideas on what proportion of the sequences belong to the whitefly B. tabaci genome, and what proportion to endosymbionts?

Response:
We have added an additional line to Table 1 to show the number of endosymbiont contigs. Thank you for this suggestion.
Page 5: The citation to Table 2 seems to point to the listing of five endosymbionts; however, the content of Table 2 shows the origin of the whiteflies collected.

Response:
Corrected Table 2 with the prevalence of the endosymbionts included.
Page 5: What do these incidences (74.8%, 71.2%, 54.1% and 58.5%) mean? The meaning is not clear; is it the proportion of total endosymbiont sequences assigned to P. aleyrodidarum?

Response:
For clarity, the absolute counts of the contigs associated with each endosymbiont have been provided rather than percentages and are linked directly to Table 2.
Page 5: Stating "(data not shown)" is not a good idea; it would be better to present the data as a supplementary file.