Transcriptional analysis of phloem-associated cells of potato

Numerous signal molecules, including proteins and mRNAs, are transported through the architecture of plants via the vascular system. As the connection between leaves and other organs, the petiole and stem are especially important in their transport function, which is carried out by the phloem and xylem, especially by the sieve elements in the phloem system. The phloem is an important conduit for transporting photosynthate and signal molecules like metabolites, proteins, small RNAs, and full-length mRNAs. Phloem sap has been used as an unadulterated source to profile phloem proteins and RNAs, but unfortunately, pure phloem sap cannot be obtained in most plant species. Here we make use of laser capture microdissection (LCM) and RNA-seq for an in-depth transcriptional profile of phloem-associated cells of both petioles and stems of potato. To expedite our analysis, we have taken advantage of the potato genome that has recently been fully sequenced and annotated. Out of the 27 k transcripts assembled that we identified, approximately 15 k were present in phloem-associated cells of petiole and stem with greater than ten reads. Among these genes, roughly 10 k are affected by photoperiod. Several RNAs from this day length-regulated group are also abundant in phloem cells of petioles and encode for proteins involved in signaling or transcriptional control. Approximately 22 % of the transcripts in phloem cells contained at least one binding motif for Pumilio, Nova, or polypyrimidine tract-binding proteins in their downstream sequences. Highlighting the predominance of binding processes identified in the gene ontology analysis of active genes from phloem cells, 78 % of the 464 RNA-binding proteins present in the potato genome were detected in our phloem transcriptome. As a reasonable alternative when phloem sap collection is not possible, LCM can be used to isolate RNA from specific cell types, and along with RNA-seq, provides practical access to expression profiles of phloem tissue. The combination of these techniques provides a useful approach to the study of phloem and a comprehensive picture of the mechanisms associated with long-distance signaling. The data presented here provide valuable insights into potentially novel phloem-mobile mRNAs and phloem-associated RNA-binding proteins.


Background
Plants are sessile organisms and unlike animals have no neural network or circulatory system. Phloem and xylem are the main tissues that facilitate nutrient and signal transport in the whole-plant body. With the evolution in size and complexity, the need for an efficient longdistance transport system has steadily increased over time for land plants [1]. The result of these changes has led to the development of more specialized and complicated cell types in both the phloem and xylem. Xylem is composed of parenchyma cells, fibers and long tracheary elements that transport water and soluble mineral ions from the root to other organs. Tracheary elements and fibers are enucleate, non-living cells that maintain only a cell wall. In comparison, phloem is composed of living cell types, including sieve elements, parenchyma cells, and supportive cells, such as fibers and sclereids. Parenchyma cells include both specialized companion cells and unspecialized phloem parenchyma cells. Sieve elements lose most of their organelles and are enucleate.
All their metabolic functions are carried out by the companion cells but profiles of phloem proteins suggest that translation may occur within the sieve element system [2]. RNAs are transcribed and translated in companion cells and small RNAs, mRNAs and proteins are then actively transported into sieve elements through the plasmodesmata [3].
Phloem is the conduit for transport of photosynthates, mainly sucrose, from leaf to sink tissues. Signal molecules also take advantage of this information highway to communicate between different organs. These molecules can be hormones [4], small RNAs [5], fulllength mRNAs [6][7][8][9][10][11][12] and proteins like FT [13,14]. From numerous studies of phloem-mobile signals, it is clear that these molecules can be delivered in either an acropetal or basipetal direction. Two examples illustrate how such phloem-mobile signals regulate development [15]. Under photoperiodic conditions inductive for flowering, FLOWERING LOCUS T (FT) is expressed in the leaf and transported in protein form through the sieve element system to the shoot apex where, in conjunction with FLOWERING LOCUS D (FD), it activates the floral pathway [16]. Several studies have identified FT in the shoot apex or phloem exudate of plants induced for flowering [13,17,18]. In this system, FD provides spatial control of flowering and FT provides temporal control. As another example, using heterografting experiments, fulllength StBEL5 mRNA of potato was verified to move in a downward direction from leaf to stolon and root [9,19]. This long-distance phloem transport of StBEL5 is enhanced under short days and controlled by untranslated regions of the transcript. Movement of StBEL5 mRNA was correlated with increased tuber yields and root growth [9,19]. Both of these long-distance signaling systems utilize photoperiodic cues to activate movement of the developmental signal from source to sink organs.
The mechanism of non-cell autonomous movement and its regulation are still unclear, but RNA-binding proteins (RBP) identified from phloem sap of pumpkin bind to mobile mRNAs to regulate their movement [2,20]. A polypyrimidine tract-binding (PTB) protein of pumpkin (RBP50) was identified as the core protein of a RNA/protein complex that transports RNA. Further evidence suggests that similar RBPs in potato function to facilitate both stability and long-distance transport of select mobile RNAs [21]. Transcription of these RBPs was observed in companion cells of the phloem [20]. To elucidate the potential for long-distance signaling through the sieve element system, several profiles of phloem proteins and RNAs have been undertaken. Analysis of the proteome of phloem sap of pumpkin revealed over 1200 proteins present in the sieve element system [2]. Through both phloem cell microdissection and analysis of phloem sap, we now know that the transcriptome of phloem includes thousands of full-length mRNAs with a diverse range of potential functions [22,23]. Phloem sap, in particular, has been used as an efficient source to study uncontaminated phloem proteins and RNAs [2,20,24]. Results from the most widely used model system for phloem sap analysis, the cucurbits, have been compromised, however, due to the existence of two separate phloem sources each with unique protein and RNA sets [25]. In most plant species pure phloem sap cannot be obtained. As a reasonable alternative, laser capture microdissection (LCM) makes it possible to isolate RNA from specific cell types and provides practical access to expression profiles of phloem tissue [26][27][28][29]. In previous studies, transcripts of seven of the StBEL genes of potato, including the mobile mRNA, StBEL5, were identified in RNA extracted from phloem cells using LCM/RT-PCR [30]. Combining LCM and RNA-seq has proven to be an invaluable tool for profiling highresolution transcription in specific cells [28,[31][32][33][34]. Here we make use of LCM and RNA-seq for an in-depth transcriptional profile of phloem-associated cells of both petioles and stems of potato. The combination of these techniques has provided a valuable approach to the study of phloem tissue and a comprehensive snapshot into the mechanisms associated with long-distance signaling.

Analysis of a LCM phloem transcriptome
To gain insight into the function of the numerous genes actively involved in transport and signaling throughout the phloem system, transcriptomes of phloem-associated cells (PAC) were profiled from both the petiole and the lower stem of short day-grown potato plants. The petiole was selected because of its proximity to the leaf, an important source of a wide range of light-activated and photosynthate-related signals. The lower stem was selected because of its proximity to the strong tuber sink. RNA was isolated from phloem tissue samples dissected from paraffin-embedded petiole (Short day [SD] Petiolephloem) and stem sections (SD Stem-phloem) using the LCM method [30]. Because of our interest in the short day (SD) activated process of tuber formation in potato, both samples come from SD-grown plants. The sample collected by LCM contains not only sieve elements, but also the companion cells, phloem parenchyma cells and other cells associated with the phloem. Making use of a phloem-specific marker, StPTB1 [35], phloem cells that were harvested can be observed in scattered bundles in the petiole (Fig. 1a-c) and in outer regions of discrete vascular bundles in the stem (Fig. 1d). Based on this morphology, the transcriptome profiled from the LCMderived samples represents the transcriptome of PAC. RNA yields from LCM-derived samples are commonly very low. To obtain a working concentration, extracted RNA was amplified and three replicates per tissue type were analyzed.
The number of reads contained in each library was greater than 2.9 × 10 7 and only 25 to 46.9 % reads of the reads were mapped to the genome uniquely, where both pairs of reads mapped in the expected direction and with the expected distance between them (concordantly). Most of the other reads were mapped to multiple locations in the genome. Of the approximately 39 k genes contained in the potato genome, 15 to 23 k genes were detected in the three replicates of phloem-associated cells (either petiole-PAC or stem-PAC) (Additional file 1: Table S1). Most of the genes that were expressed in only one or two of the replicates are present at very low abundance (<10 reads, for example). The six libraries were normalized with the upper quantile and analyzed with a Generalized Linear Model using QuasiSeq. P-values were calculated with QLSpline based on negative binomial distribution. Q-values were given by adjusting the p-value for familywise error rate (Additional file 2: Figure S1). After removing the effect of the sequencing method, a mean value was calculated for petiole-PAC and stem-PAC to indicate their measured level. All of the genes identified in the transcriptomes of this report with mean read values are listed in Additional file 3: Table S2. Excluding the low read hits (<10 reads), 15,167 genes can be identified in either petiole-PAC or stem-PAC transcriptome (Group 2 under the whole genome column, Table 1). The number of expressed genes are comparable to the 14,242 and 13,775 genes identified in the vascular bundles of cucumber and watermelon, respectively [36].

Comparison of Petiole-PAC and stem-PAC transcriptomes
Out of the 26,898 genes that exhibited any expression in either petiole-PAC or stem-PAC, 2087 were identified as differentially expressed (DE) genes between petiolephloem and stem-phloem, with q-values less than 0.05 (Additional file 4: Table S3). Most of these genes are expressed at low abundance levels (<10 mean value in both petiole-PAC and stem-PAC). Only 573 DE genes have a mean value >10 in either petiole-PAC or stem-PAC (Additional file 4: Table S3). At the 500 read cutoff level, the number of DE genes between petiole-PAC and stem-PAC is only 162 ( Table 1, Group 1). This suggests that the petiole PAC and stem PAC have very similar transcriptomes.
To visualize functional relationships in these expression profiles, an ontological analysis was performed. Gene ontology (GO) categories of all the genes in potato were obtained from the GO database using Blast2GO, with parameters of 20 hits and an e-value of 10e −6 . potato genome were matched with at least one GO term. The 573 DE genes were mapped to 3791 GO terms including 736 molecular functions, 2579 biological processes, and 476 cellular components (Fig. 2). As we would expect with active transport and loading through the phloem cells, binding function is one of the most prominent functions in the genes expressed in PAC. Out of the 736 molecular functions, 81, 62, and 126 were classified as "nucleotide binding", "protein binding", and "binding", respectively (Fig. 2). "DNA binding" and "RNA binding" have lower numbers, 33 and 16 respectively, but these functions have very important roles in transcription, mRNA stabilization, localization and transport (Fig. 2, arrows). The signaling-related biological processes and molecular functions were more closely examined for all unique and DE genes related to signal transport, a prominent function of phloem (Table 1). Specifically, the transcripts encoding proteins functional in signaling and regulation, such as light-induced signaling, photoperiodism, floral induction, hormone-related signaling and transcription factors, were explored due to their importance in long-distance transport. Among these DE "signal" genes abundantly expressed, 10 are classified as signalingrelated, 20 as light-related, 5 as hormone-related, and 11 as flowering-related (Group 1, Table 1). Eight genes are classified as transcription factors and six encode for RNAbinding proteins. DE genes from signaling, light, hormone, flowering, and RNA-binding GO categories are listed in Table 2.

Unique and DE genes of petiole-PAC and stem-PAC
Even though petiole-and stem-PAC transcriptomes are similar, there are several transcripts that were expressed uniquely in each. The differentially and uniquely expressed genes indicate the slight difference between petiole phloem and stem phloem. With 10 reads as a mean threshold, approximately 11 k of the genes are expressed in both petiole-PAC and stem-PAC (Table 1, whole genome column, Group 2). There are 1412 and 2710 unique genes expressed in petiole-PAC and stem-PAC, respectively (Table 1, Group 2). Few of these are highly expressed exemplified by the fact that there are only ten petiole-PAC unique genes with greater than 500 reads (Table 3), and only 26 in stem-PAC (Table 4). Several of these highly expressed PAC genes have been functionally characterized in other organisms, such as AP2, an ERF domain-containing transcription factor [37,38], sucrose synthase [39,40], and several other DNA-or RNAbinding proteins. The pentatricopeptide repeat-containing protein (PPR) is a RNA-binding protein essential for RNA editing in chloroplasts and mitochondria [41,42]. PPR proteins help to restore fertility to cytoplasmic malesterile plants [43] and are involved in organelle biogenesis [44]. Also included in the signaling category, are  [46]. Gene ontology categories analyzed for the DE genes were also considered for the uniquely expressed genes in petiole-PAC and stem-PAC (Table 1, Group 2). With 10 reads as a cut-off, the genes uniquely expressed in the stem-PAC are approximately two-fold more in number than the genes uniquely expressed in the petiole-PAC in each category (Table 1, Group 2). Whereas, when considering only the most abundant RNAs (>500 average reads), the number of uniquely expressed genes from both sources is comparable (Table 1, Group 3). These latter abundant transcripts are plausibly important regulators of phloem function or mobile transcripts present in the sieve elements. In summary, there are approximately 1000 genes in these groups (signaling, light, hormone, flowering, transcription factors, RNA-binding) including both unique and common (Table 1, Group 3, red ovals). A complete list of these genes can be found in Additional file 5: Table S4.
Among these genes in the select GO categories of Table 1, many are important regulators of development that may provide insights into the differences between petiole-PAC and stem-PAC. Pseudo-response regulator 5 (PGSC0003DMG400000584) is a well-characterized gene, which has an important role in circadian rhythm [47]. It is classified as a DE petiole-PAC gene, with 1211 reads in the petiole-PAC and only 366 reads in the stem-PAC. BEL33 (PGSC0003DMG400024267) is a flowering-related gene in the TALE (Three Amino Acid Loop Extension) superfamily [48]. Its expression level in stem-PAC is almost twice that of petiole-PAC levels. Gibberellin_receptor GID1 (PGSC0003DMG400028559) is a GA receptor that regulates hormone responses [49]. It is expressed in petiole-PAC with a mean of 1017 reads, five times more than levels in stem-PAC. BRI1 protein is a leucine-rich repeat protein localized to the membrane with an extracellular brassinosteroid receptor domain and intracellular kinase domain [50] and functions in controlling the autonomous flowering pathway [51]. The ccr4-associated protein has mRNA deadenylation activity and is functional in defense [52]. The chromo-domain protein, LHP1, is involved in epigenetic silencing of target genes such as flowering genes. Genetic experiments have shown that LHP1 can affect flowering time and vegetative growth [53].  [93]. Genes with less than ten reads in both petiole-PAC and stem-PAC were removed because of their low abundance. GO terms of the whole genome were analyzed with Blast2GO. The GO categories identified for each gene were made comparable by converting each GO term to the same level in the GO structure to permit. This was done with goslimviewer in AgBase [96] (http://agbase.msstate.edu/cgi-bin/tools/goslimviewer_select.pl)  RNA-binding proteins (RBP) interact with transcripts to mediate numerous aspects of RNA metabolism [54] and function as chaperones that facilitate the long-distance transport of phloem-mobile mRNAs [20]. In the transcriptome of PAC, 364 out of the 464 RBPs in the potato genome were detected (Additional file 6: Table S5). Several of these RBPs have been documented in the literature. The pumpkin ortholog of StPTB1 and StPTB6 (Table 5) was identified as the core protein in a mobile nucleoprotein complex in pumpkin phloem [20]. StPTB1 and  StPTB6 play important roles in regulating the movement of the mobile RNA, StBEL5 in potato [55]. IF2 is the potato ortholog of Nova, a KH domain RBP, that binds to StPTB1 and StPTB6 [56]. A PTB7-like protein was also identified in pumpkin phloem and its orthologs in Arabidopsis have been implicated in alternative splicing [2,57]. Several of the RBPs identified using the 3′ UTR of StBEL5 as bait in the yeast three-hybrid system [58] were also detected in the petiole-and stem-PAC transcriptomes ( Table 5). These include sucrose synthase, eIF5A, and a glycine-rich RBP. Four pumilio proteins containing a Puf domain that interacts with 3′ UTRs in target RNAs were detected. All four were relatively abundant in both petiole and stem profiles. Pumilio has only recently been discovered in plants but is widespread in numerous species and functions in diverse aspects of mRNA metabolism that regulate development and defend against viruses [59][60][61].
To validate expression patterns of select RBPs without the bias of amplification or the imbedding techniques inherent in the LCM protocol, a sample of RNA profiles from the Potato Genome Sequencing Consortium [62] was assembled in eight different organs for 26 RBPs (Fig. 3). These were selected on the basis of their relative abundance in petiole-PACs, their binding affinity for the 3′ UTR of the mobile RNA, StBEL5 [58], or the presence of their protein orthologs in pumpkin phloem sap [2]. Consistent with their abundance in petiole-PAC, most of these proteins exhibited significant levels of transcripts in petioles (Fig. 3, red arrows). Among the samples, three RNA helicases, one PTB protein, and two glycine-rich RBPs were profiled. Plant helicases have a known function in regulating the size exclusion limit of plasmodesmata [63]. Several RBPs exhibited spikes in transcript levels in specific organs ( Fig. 3; Additional file 7: Table S6). RBP6, RBP RZ-1, and glycine-rich protein 2 were relatively abundant in young tubers. RBP-45 and one of the Dead-box RNA helicases exhibited select relative abundance of transcripts in stolons. Two of the "RNA-binding proteins" and the FUS-interacting ser/ arg-rich protein-1 all spiked in roots. These observed abundance levels may reflect a putative organ-specific function for select RBPs.  [58] Abundant transcripts in phloem-associated cells Transcriptome profiling identified a plethora of genes that are highly expressed in petiole PAC or stem PAC. The highest average total reads for one specific gene were 154,207 in petiole PAC and 274,582 in stem PAC.
In the petiole-PAC library, 1209 genes had >1000 reads and 2626 genes had >500. In the stem-PAC library, 1288 genes had >1000 reads and 2822 genes had >500. There are 3593 genes with >500 reads in either petiole-phloem or stem-phloem associated cells (Additional file 8: Table S7). To compare the genes highly expressed in PAC with other genes in the whole genome, these 3593 genes were regarded as genes with abundant transcripts and were analyzed for attributes based on gene annotations. Approximately 2971 GO terms were involved with the abundant transcripts in PAC. The GO categories identified for each gene were made comparable by converting each GO term to the same level in the hierarchy to permit clustering into GOslim categories. This was done with goslimviewer using AgBase software. Twenty-five cellular components, 44 biological processes and 26 molecular functions were applied (Fig. 4). The most abundant GO categories represented were "binding" and "nucleic acid binding" (Fig. 4). By comparing to the whole genome with GOseq [64], the gene ontology of the abundant transcripts revealed 511 categories over-represented with adjusted p-values smaller than 0.05. In the 510 over-represented GO categories there are 101 molecular functions, 73 cellular components and 336 biological processes. Many binding-related functions were verified as over-represented categories with adjusted p-values smaller than 0.05. Ion binding functions, such as zinc ion binding, copper ion binding, cobalt ion binding and calcium ion binding, were all over-represented (Fig. 5a). RNA-binding and proteinbinding functions were also over-represented in active PAC genes. These binding functions likely contribute to facilitating the transport processes of phloem. The PAC-abundant transcripts are also involved with numerous important biological processes, including both response and developmental activities (Fig. 5b-c). The top 32 over-represented molecular function and biological processes related to signaling included responses to both light quality and quantity (Fig. 5c). These signals are commonly perceived in the leaf, and phloem in the leaf veins and petiole serve as the conduit to deliver A B Fig. 3 Expression profiles of select RNA-binding proteins mined from RNA-seq data from the Potato Genome Database from the Tuberosum RH89-039-16 haplotype [62]. RNA profiles from eight organs are presented and medium (a) and high (b) abundance values are shown in FPKMs (fragments per kb per million mapped reads). Selection of these RNAs was based on their relative abundance in petiole-PACs, their binding affinity for the 3′ UTR of the mobile RNA, StBEL5* [58], or the presence of their protein orthologs in pumpkin phloem sap** [2]. Red arrows designate petiole samples. Accession numbers for these genes are listed in Additional file 7: Table S6 these signals to other organs. There are also several flowering-related GO categories over-represented in the abundant transcripts, including photoperiodism, flowering, ovule development, regulation of flower development, and flower morphogenesis (Fig. 5c). Among the 175 GO categories related to signaling, as listed in Table 1, 34 of them are over-represented in the PAC-abundant transcripts (all listed in Fig. 5c).

The effect of photoperiod on the transcriptome of petioles
Day length regulates numerous aspects of plant development and is especially important in potato for controlling tuber formation. The perception of length of daylight/darkness generates leaf-derived signals that are traansported throughout the whole plant through petiole/stem vascular connections [9,13,14]. To identify genes that are regulated in petioles in response to photoperiod, we sequenced RNA samples from petioles of the photoperiod-responsive species Solanum tuberosum ssp. andigena under long-day (LD) and SD photoperiods using RNA-seq. Four replicate samples for each were isolated from the petiole tissues harvested from plants grown under long-and short-day conditions. The reads were mapped to the potato genome with GSNAP, and the number of concordant unique reads was counted for each gene using HT-seq (Additional file 9: Table S8). The number of reads contained in each library was greater than 1 × 10 7 and approximately 94 % of the paired reads were mapped to the genome in a concordant and unique manner. Of the 39,028 genes contained in the potato, approximately 25 k genes were detected in the whole petiole samples (LD Petiole and SD Petiole) with at least one read aligning to the gene (Additional file 9: Table  S8). Representing only a few cell types from the petiole and stem organs, RNA-seq results of PAC scored significantly fewer genes than the whole petiole sample (Additional file 1: Table S1). The genes expressed in phloem are very likely detected in the whole petiole samples depending on the depth of the petiole profile. Beyond PAC, there are many other cell types in the petiole (e.g., collenchyma, sclerenchyma, palisade parenchyma, and epidermis), so a proportion of genes will likely only be detected in the whole petiole. In theory, the genes with higher reads in PAC represent genes with a significant putative function in the phloem. Of course, this set of genes will also be detected in the whole petiole samples, albeit at a lower proportion of reads (Additional file 3: Table S2). Reads of the eight libraries were normalized using the upper quantile normalization approach. A generalized linear model was applied to the LD Petiole and SD Petiole results with the R package "QuasiSeq" to analyze the photoperiod effect. All other conditions were kept consistent and the libraries were sequenced with multiplexing tag in the same lane, so photoperiod effect is the only fixed effect that was considered (Additional file 10: Figure S2).
Eleven-thousand, nine-hundred, fifty-seven genes were identified as significantly DE with q-values less than 0.05 (Additional file 11: Table S9). Means of the normalized reads of the four replicate libraries for both long-and short-day treatments were used to indicate their measured level. Among the 20,564 genes with at least 10 reads in either LD or SD petiole, 517 of them are uniquely expressed in LD, and only 388 of them are uniquely expressed in SD. Most of these uniquely expressed genes exhibited low abundance read values. The most abundant transcript uniquely expressed in LD has only 558 reads  Table S10). Eighthundred and twenty of the DE genes activated by SD have a log2-fold change more than 1, and 128 genes out of the 820 have reads more than 500 under SD (Additional file 13: Table S11). Transcripts that are regulated by photoperiod and are relatively abundant (>380 reads) in petiole-PAC may be indicative of genes involved in signaling or transport mediated by day length. Included in this list are genes encoding for the Agamous-like MADS-box protein/ AGL8 ortholog, a circadian clock-associated FKF1 protein, Pseudo-response regulator 5, the AP2 ERF-domain protein, an ethylene receptor, a NAC-domain protein, and a nodulin MtN3 family protein (Additional file 12: Table  S10 and Additional file 13: Table S11). As an example, the AGL8 ortholog of potato is induced by the StFT-like tuberization signal, SP6A [14], and is involved in controlling meristem and tuber development by regulating cytokinin levels [65]. Several notable DE photoperiod genes were selected to verify their relative expression levels with qRT-PCR (Fig. 6a). Four were up-regulated by SD (Fig. 6b) and four by LD (Fig. 6c). All eight exhibited expression patterns consistent with their comparable RNA-seq data.

Gene ontology of photoperiod-regulated genes of the petiole
To visualize functional relationships in this diverse expression profile, the 11,957 DE genes were also analyzed A B C   Fig. 7; Additional file 14: Figure S3 and Additional file 15: Figure S4). As expected, "circadian rhythm" is identified as an over-represented GO category as well as several lightrelated GO terms (Fig. 7). Among the molecular functions, binding is the most over-represented function, including binding to ATP, protein, nucleotide, DNA, RNA, and several kinds of ions (Additional file 14: Figure S3).

RNA-binding motifs
Mobility of mRNA, stability and control of translation are facilitated by RNA-binding proteins associated with them. RBPs commonly bind to conserved elements in the 3′ un-translated region (UTR) of the RNA. To assess the frequency of select RBP motifs in potato transcripts, downstream sequence (DSS) from the stop codon was screened for the presence of RBP motifs. As a reference, in Arabidopsis, the average length of the 3′ UTR is 248 nt [66]. Three known RBP target elements were searched, including those for polypyrimidine tract-binding proteins (PTB), Pumilio and Nova, a KH-domain protein [67] (Additional file 16: Figure S5A-C). Pumilio was selected because of its relative abundance in PAC (Table 5), its functional relevance, and its widespread role in RNA metabolism [68]. PTB and Nova were selected because of their prominence in binding to RNAs and because both were detected in phloem sap of cucumber suggesting they are both phloem mobile [2]. The Pumilio binding motif has been confirmed as UGUAu/c/aAUA where the 5th nucleotide can be U, C, or A [69], whereas Nova's is modeled as u/c/aCAUUUCAc/u [67]. PTB proteins bind to RNA at four RNA recognition motifs (RRM). Each RRM can interact with a cytosine/uracil (CU) motif ranging from 3 to 6 nt. In our search, the PTB motif was defined as a cluster of four CU runs each, at least, 4 nt in length within the designated DSS. Biochemical analysis of interactions of target RNA to the binding pockets of PTB protein domains demonstrated that binding to PTBs is not sequence-specific and that many RNA fragments readily bind to them [70].
Using the MEME suite [71] and BEDTools [72], Pumilio, Nova, and PTB binding motifs were initially searched across the genome and extracted through 1000 nt of DSS (Additional file 16: Figure S5A-C). Because of their frequent occurrences, all three motifs were again searched through either 500 (Pumilio and Nova) or 200 (PTB) nt of DSS (Table 6). Any transcripts that contained at least four CU runs within this 200-nt DSS were identified as potential PTB targets. Forty-six hundred RNAs were identified with a Pumilio binding motif, 3000 RNAs with a Nova binding motif, and more than 3000 RNAs contain the PTB motif ( Table 6, Group 1). Ham et al. [20] demonstrated that the pumpkin PTB protein, RBP50, binds specifically to UUCUCUCUccuUCUU sequences present within a Fig. 7 Over-represented GO terms related to light signaling in DE photoperiod genes. GO terms involved with DE genes between long-and short-day treatments were analyzed with GOseq [64] to compare their enrichment in DE genes relative to the whole genome. The p-value was adjusted with BH method [97]. The over-represented genes were defined with adjusted p-value smaller than 0.05. The ratio of each GO term is calculated by comparing the number of genes involved with each GO term with number of genes in the whole group subclass of phloem-mobile, polyadenylated transcripts. On this basis and to ensure more exclusiveness, we screened DSSs of the 31,742 PTB RNA pool for this motif. From this screen, 422 RNAs were identified that contained the RBP50 motif. Included in this list were RNAs encoding a Gag-pol polyprotein (HIV-related), an integrase core domain-containing protein (HIV-related), ethylene response factors, a SET-domain protein, a LOB-domain protein, a tuber-specific element-binding protein, and numerous other TFs, signaling and receptor-like proteins.
After examining the frequency of distribution of these motifs in DSS, only the Pumilio motif demonstrated any significant enrichment. This enrichment occurred in the first 200 nt of DSS. In contrast, the motifs for PTB and Nova were randomly distributed across all DSS examined (Additional file 16: Figure S5A-C). Many of the DSSs analyzed here contained multiple binding motifs. Among the 4629 RNAs containing a Pumilio binding motif, there are 383 with two Pumilio binding motifs, 43 with three, 6 with four and 1 with seven. Among the 3042 RNAs containing multiple Nova binding motifs, there are 176 with two motifs, 10 with three, and 1 with four motifs. There are also different motifs existing in the same DSS (Additional file 16: Figure S5D). Three hundred and twenty DSSs screened contained all three types of RBP motifs. Approximately, 6000 transcripts contained two different RBP binding motifs of Pumilio, Nova or PTB. Unique among the three motifs we searched, Pumilio motif-targeted transcripts were over-represented in 13 gene ontology categories (Additional file 17: Figure S6). Included among these categories were "sequence-specific DNA binding transcription factor activity", "protein autoubiquitination", "DNA-dependent regulation of transcription" and "DNA-dependent negative regulation of transcription". All these functions and biological processes are important in regulating their targets and downstream genes. By its binding to the transcripts of regulatory genes, Pumilio protein indirectly regulates the activity of a wide range of genes.

Accumulation of RBP targets in PAC
The RBP targets abundantly expressed in PAC are of especial interest. Present in sieve elements, phloem-mobile transcripts associated with RBPs are good candidates for long-distance signals. Among the PAC-abundant transcripts (>500 in petiole-PAC or stem-PAC), 529 contain the Pumilio binding motif, 290 contain the Nova binding motif, and 3223 contain the PTB binding motif ( Table 6, Group 1). When considering the photoperiod effect, these numbers are reduced even more at 338, 168, and 1943, respectively (Table 6, Group 2). As an over-represented GO group in Pumilio-targeted transcripts, the 1090 "sequence-specific DNA binding transcription factor activity" related genes in the potato genome (Additional file 17: Figure S6) were also screened for Nova and PTB binding motifs. Among the transcripts of transcription factors in PAC (>10 reads in petiole-PAC or stem-PAC), 103 of them are Pumilio targets, 39 are Nova targets and 442 are PTB targets (red text, Table 6, Group 3; Additional file 18: Table S12). Only 28, 6, and 130 of these transcripts were abundant in PAC (>500 reads), respectively, with Pumilio, Nova and PTB binding motifs.
Several TF families exhibited enrichment of these RBP binding motifs in their DSSs (Table 7). Auxin responsive factors (ARF) and AUX/IAA are two different components in auxin-mediated transcription regulation, as transcription factor and transcriptional repressors, respectively [73]. AUX/IAA transcripts have been reported to be phloem mobile into roots [12] and eighteen RNAs in this class contain at least one RBP binding motif. One notable RNA is Auxin response factor 2 (PGSC0003DMG400014179). This RNA accumulates to high levels in both petiole-PAC and stem-PAC and both PTB and Pumilio motifs are present in its DSS. Long-distance movement of transcripts encoding both StBEL1 and StKNOX type TFs has also been previously reported [9,21]. Only two of the 13 StBEL RNAs, StBEL22 and −30, contained no RNAbinding motifs for the three RBPs we searched. DSS of  STH1 contains all three RNA-binding motifs. All six KN1-like RNAs contain PTB motifs and three of the six are relatively abundant in PACs (Table 7). Other abundant PAC TFs containing RNA motifs include members of the NAC-domain [74] and WRKY [75] families. Two NAC domain RNAs, Nam 9, and two WRKY transcription factors contain both PTB and Pumilio binding motifs in their DSS (Table 7).

Discussion
Phloem RNA derived from laser capture microdissection Compared to a genomic study, transcriptome analysis is more informative as it provides a snapshot of processes of physiology and development. RNA levels can be affected by three factors, the rates of transcription, degradation and processes of movement. For phloem-associated cells, the dynamics of transcript levels are even more important, since phloem is the conduit for allocation of photosynthate. In addition to the transcripts that maintain the metabolism and function of phloem, there is also a unique set of non-cell-autonomous mRNAs moving through the phloem as potential long-distance signals. Because of the limitations inherent in the harvest of potato phloem sap, isolation of phloem cells can be readily accomplished by using the well-developed technique of LCM. Early applications of LCM to extract RNA from phloem cells lacked depth and were inefficient. LCM of rice phloem cells yielded only 413 clones that exhibited a high level of redundancy [26]. Refinement of the technique coupled with high-resolution next generation sequencing technology has facilitated expression analysis of select target cells characterized by a very high level of resolution and reproducibility.
In this study, we sequenced the transcriptome of petiole-PAC and stem-PAC using RNA-seq. The raw output is approximately 10 7 reads for each sample. Out of the approximately 40 k genes in the potato genome, roughly 15 k genes were expressed in PAC of petiole and stem. Our numbers are comparable to the 14,242 and 13,775 active genes identified in the vascular bundles of cucumber and watermelon, respectively [36]. Through statistical analysis, petiole-and stem-PAC exhibited very similar transcriptomes with just slight differences. The genes DE in common between them and the unique genes in each were associated with important GO categories in both signaling and developmental regulation. Approximately 50 DE genes were grouped into signaling-related GO categories, including light, hormone, and flowering related categories. GO categories for binding functions were proportionately overrepresented for transcripts abundant in PAC, including both DNA and RNA binding. A propensity for binding and signaling categories for genes expressed in PAC would reflect the dynamic functions of the phloem as a conduit for transporting sucrose and a range of signaling molecules [76].

Previous expression profiles of phloem
Previous work has established the foundation for RNA profiling of phloem utilizing both LMPC-derived cells and sap harvested from Arabidopsis and melon [22,23]. Transcripts present in sieve elements were identified from melon phloem sap, which readily bleeds from stem cuts [22]. In this study, 1830 unique ESTs were sequenced and mapped to 986 unique transcripts. Using gene functional analysis, 15 % of these genes encoded proteins related to signal transduction. Using these ESTs as a query, 124 potato orthologs were identified from the potato genome. Onehundred and four of them were expressed in either petioleor stem-PAC. Unfortunately, the profile of the phloem ESTs in melon lacked much depth and expression levels were not verified quantitatively. Using RNA-seq, approximately 10 4 -fold more fragments can be generated and sequenced compared to the EST sequencing approach. This enhanced resolution provides more quantitative sequence information and more insights into the function of the profiled RNAs.
The study on Arabidopsis phloem compared the profiles of LMPC-derived phloem tissue and leaf phloem exudate and in this way, provided a hint of the identity of mobile transcripts present in the phloem [23]. Approximately 2400 transcripts were identified in the phloem exudate by microarray, and 90 of them were categorized as functional in signaling pathways. Seventy-six genes in the potato genome were identified as orthologs of these 90 putative mobile transcripts. Seventy of the 76 genes were expressed (>10 reads) in either the petiole-or stem-phloem libraries (Additional file 19: Table S13). Twenty-eight exhibited more than 1000 reads in the petiole-PAC library. These orthologs, including 14-3-3 proteins, MAP kinases, lightrelated proteins (e.g., one AUX/IAA RNA), and calciumresponsive signals (Additional file 19: Table S13), represent potential mobile mRNAs in potato. The comparison of profiles in the Deeken et al. study [23] is invaluable but because most plants do not readily yield phloem sap, our current approach using LCM technology coupled with RNA-seq exhibits numerous advantages. It is consistent with previous methods but provides wider applicability, cell specificity, and excellent in-depth sequence resolution. On the downside, the LCM protocol is labor-intensive, may yield small amounts of RNA, and opens the possibility of harvesting cells outside the target tissues. A major advantage of using sap over PACs for examining phloem signaling is the enhanced specificity provided by phloem sap analysis. A PAC profile provides greater overall coverage but less specificity for signal transcripts that move through the sieve element system.

Accumulation patterns for established phloem-mobile mRNAs
RNAs concentrated in PAC can be specifically located in the sieve element, companion cells or parenchyma cells.
Depending on stability and transport dynamics, mobile mRNAs may be concentrated in sieve elements. Through RNA-seq, thousands of abundant phloem transcripts can be profiled, but to verify their mobility requires heterografting experiments with different but related species [6] or RNA movement assays [77]. Another option is to generate stably transformed plants that express the test RNA with a non-plant sequence tag and heterograft with wild type plants [9]. All three approaches are labor-intensive and time consuming. The in silico analysis approach for identifying conserved motifs in 3′ UTRs presented in this study (Tables 6 and 7) has the clear potential to more efficiently predict candidate transcripts for long-distance mobility.
Non-cell-autonomous mRNAs may move into the sieve element system through plasmodesmata connecting companion cells to sieve elements [78]. In this model, any RNAs from the leaf, transported long distance, may be detected in both petiole-and stem-PAC. As discussed previously, there are hundreds of full-length mRNAs present in phloem sap but only a few of these have been confirmed to move and even fewer are associated with a phenotype [15]. This short list includes LeT6 in tomato, GAI in pumpkin, tomato, and Arabidopsis, IAA18/28 in Arabidopsis, and POTH1 and StBEL5 of potato. The potato orthologs of these mobile RNAs along with StBEL5 and POTH1 are detected in the phloem-associated cells of both petiole and stem in relatively abundant levels ( Table 8). Because there are reports of the transport of FT mRNA [11,79], the FT/SP6A orthologs of potato are also included. The complete absence of any accumulation of transcripts for StFT genes suggests that these RNAs are not phloem-mobile. Evidence indicates that SP6A is translated in leaves and moves through the phloem to underground stolons in protein form [14]. STH15 and StBEL5 exhibited the greatest levels of accumulation in PAC (Table 8). Of course, abundance levels of any specific RNA would be determined by the rate of transcription, the stability of the RNA, and the degree of its mobilization.
GA INSENSITIVE (GAI) is exceptional in that longdistance movement of its mRNA has been established in several plant species including cucumber, tomato, pumpkin [8,20], apple [80], and Arabidopsis [81]. It was the first mobile RNA identified and CU-rich sequences in its transcript facilitate binding to CmRBP50, a PTB protein of pumpkin. Accumulation of AtGAI across a graft union can affect leaf architecture [8]. IAA18/28 was verified to cross graft unions and move into root tips to regulate root architecture [12]. STH15 is the ortholog of LeT6 of tomato and STM of Arabidopsis. Mobility assays of LeT6 confirmed upward movement of is transcript associated with a leaf phenotype in tomato [10]. Both POTH1 and StBEL5 have been associated with tuber development [82,83]. Movement of StBEL5 is induced by short days and regulated by its UTRs. The UTRs of POTH1, StBEL5, and StGAI interacted with a potato PTB protein [21]. Because several of these mobile RNAs interact with the same RBP, it is conceivable that multiple RNAs are transported in the same RNP complex. For example, the mobile RNA/RBP50 complex of pumpkin contained six RNAs including CmGAI and CmSTM. All six of these RNAs contained CU-rich PTB motifs. These CU-rich motifs were also observed in the UTRs of StBEL5, POTH1, StGAI, and STH15.

The role of RNA-binding proteins
RBPs mediate numerous aspects of RNA metabolism including mRNA capping, rate of degradation, translation, localization and transport. For long-distance mobilization of mRNAs, RBPs associated with them are especially important in stabilizing and localizing the mRNAs, while repressing translation during the process. Our analysis revealed numerous transcripts encoding RBPs in both petiole-and stem-PAC (Additional file 6: Table S5) and it is very likely that a subgroup of these are functional in the execution of mRNA transport via the sieve elements. The glycine-rich protein 7 (GRP7) was the most abundant RBP in our libraries. Seven KH domain proteins (including Nova), four Pumilio proteins, and all six potato PTBs were identified (Table 5).
Surprisingly, one of the most abundant RBPs was Pumilio1. Pumilio proteins are post-transcriptional regulators containing Puf domains (Pumilo and FBF) that recognize RNA sequences present in the 3′ UTR of target RNAs. Pumilio functions in cytoplasmic de-adenylation, translational repression, RNA localization and decay, maintenance of germline stem cell identity, translation initiation, and rRNA processing and ribosome biogenesis [68]. Puf proteins repress translation of target RNAs during establishment of polarity in the developing embryo of Drosophila and during the localization of Ash1 mRNA to the distal tip of the budding cell [84,85]. They bind to RNAs at a motif containing a conserved UGUR (where R is a purine). Despite its importance, only a scarcity of information is available on the function of these RBPs in plants [59,61]. Whereas the Pumilio protein, APUM23, functions in polarity formation in Arabidopsis [61], the role of any Puf proteins in vascular biology is completely unknown.
As previously mentioned, a PTB protein was identified as the core protein in a mobile RNA/protein complex in the phloem of pumpkin [20]. RBP50 has two orthologs in the potato genome, designated StPTB1 and StPTB6. The PTB family of RNA-binding proteins are functional in a wide range of posttranscriptional processes including RNA stability [86], splicing regulation [87], localization [88], translation control [89], and long-distance transport [20]. There are two subfamilies of plant PTBs. One is represented by StPTB1, StPTB6, CmRBP50, and AtPTB3 and these are speculated to be involved in long-distance movement. A second subfamily of PTB proteins, represented by AtPTB1 and −2 and the StPTB7 types, function in alternative splicing [57,90]. The KH-domain protein, Nova, binds to both StPTB1 and −6 [56] and a Nova ortholog was identified in pumpkin phloem sap [2]. Alba was included because it interacts with the mobile RNA POTH1 [21]. Identification of RBPs and their target RNAs in potato PAC provides a valuable experimental framework for testing interactions between proteins and RNAs that may be functional in long-distance signaling processes. For example, screening for binding elements in RNA sequences comparable to the approach implemented for the PTBs, Nova, and Pumilio in this study could be readily performed for any RBP of interest.

Conclusions
Our results confirm that the combination of laser capture microdissection and RNA-seq provides an invaluable and in-depth approach to the study of phloem biology and a comprehensive picture of the mechanisms associated with long-distance signaling and transport. Out of the roughly 39 k genes in the potato genome, approximately 15 k were expressed in PAC of petiole and stem, numbers that are comparable to the number of genes identified in the vascular bundles of cucumber and watermelon [36]. Our GO analysis indicates that signaling and binding processes are important biological  [14] activities associated with phloem cells. The high proportion of RBPs in the expression profiles and the high percentage of transcripts containing binding motifs for three prominent RBPs in their downstream sequences suggest an important role for RNA binding in vascular tissue. The results of this study illustrate the potential of RNA profiling for providing insights into long-distance transport processes mediated by environmental cues that are associated with the sieve element system.

Plant material
All RNA samples were from the photoperiod-responsive genotype S. tuberosum ssp. were disinfected prior to use by submerging in chloroform followed by air-drying. For microdissection, the PALM® Laser Microbeam instrument (Bernried, Germany) was employed. A pulsed UV nitrogen laser beam is projected through the objective lens to a narrow diameter (<1.0 μm) that ablates the target without heating adjacent material. Cells were selected using the graphic tools of the PALM RoboSoftware. Laser pressure catapulting (LPC), with a high photonic pressure force, was used to capture the target phloem cells into the lid of a LPC-microfuge tube containing 25 μl extraction buffer from a Picopure RNA kit (Arcturus Engineering, Mountain View, CA, USA). For each sample an area of approximately 1.5 × 10 6 μm 2 comprised of approximately 5000 cells was collected. To minimize degradation, total harvest time was restricted to 1 h per sample. After cell collection, tubes were inverted and the cap end was vortexed in several short spurts to release cells. Contents of the upright tube were pulsed in a microfuge, incubated for 30 min in a water bath at 42°C, centrifuged at 800 × g for 2 min and stored at −80°C until RNA isolation.

RNA isolation, library preparation and sequencing
RNA was isolated from microdissected cells with the PicoPure RNA isolation kit (Arcturus Engineering, Mountain View, CA, USA), incorporating an on-column treatment step with RNase-Free DNase (Qiagen, Valencia, CA, USA, Cat #79254). Finally, RNA was quantified with an Agilent 2100 Bioanalyzer using reagents from the manufacturer's RNA 6000 Pico kit and then stored at −80°C. Purified RNAs were amplified using Ovation® RNA-Seq system (NuGEN). cDNA libraries were prepared using 2.0 μg of amplified cDNAs and sequenced at the DNA Facility, Iowa State University. For the LD/SD petiole experiment, 2 to 3 cm petiole sections near the junction of the petiole and stem were excised and harvested for both photoperiod conditions. Total RNA was extracted from petioles of long-day or short-day grown S. tuberosum ssp. andigena using RNeasy mini kit (Qiagen). After validation of the quality of RNAs using the 2100 Bioanalyzer, approximately 3.0 μg of total RNA were used for library preparation and sequenced using the HiSeq2500 (Illumina) at the DNA Facility, Iowa State University. The set of LCM isolated samples was sequenced with either Genomic DNA/cDNA/BAC GA II 75-Cycle or mRNA-Seq HiSEQ High Output 100-Cycle. The set of whole petiole samples was sequenced with mRNA-Seq HiSEQ High Output 100-Cycle P.E.

Processing of reads
All the reads were processed as output in fastq format. These reads were aligned to the potato genome (PGSC_DM_v4.03_pseudomolecules.fasta & PGSC_DM_ V403_genes.gff ) from the potato genome database http://solanaceae.plantbiology.msu.edu/pgsc_download. shtml) with GMAP and GSNAP (http://research-pub. gene.com/gmap/). Parameters were set as default. The number of concordant unique reads in each library was counted with HTseq (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html). The disparity of reads that were mapped to the genome in a concordant and unique manner between the LCM-PACs (25 to 47 %, Additional file 1: Table S1) and the whole petiole (~94 %, Additional file 9: Table S8) samples is most likely explained by the fact that the LCM-derived RNA was collected in picogram amounts and subsequently amplified, whereas the whole petiole RNA was not [92]. It would appear that amplification produced reads that map to multiple locations disproportionately over the uniquely mapped reads. Because samples were only compared between the same tissue types (stem PAC vs. pet PAC or LD pet vs. SD pet), no sample bias was introduced.

Statistical analysis
All the libraries were normalized with the 0.75 quantile to eliminate the difference caused by sample scale and sequencing depth. The LCM-derived libraries were sequenced with two different sequencing methods, paired-end and single-end sequencing. Both the difference coming from petiole vs. stem organs and the difference coming from different sequencing methods were considered in the Generalized Linear Model. With the R package "QuasiSeq" [93] (http:// cran.rproject.org/web/packages/ QuasiSeq/index.html), quasi-negative binomial deviances of each gene were computed, and the normalized count data was fitted with a quasi-likelihood model. DE genes were selected with adjusted p-values less than 0.2. The p-value was adjusted with the method from Nettleton et al. [94]. Effect from single-end sequencing method is removed based on the derived coefficient. LD-and SD-petiole samples were sequenced with a multiplexing tag in the same lane, so the photoperiod effect is the only effect to be analyzed in comparison. The count of each gene in each sample type is the mean value of the normalized reads of the three or four replicates.

GO analysis and GOSeq
Gene ontology categories of all the genes in potato were obtained from the GO database using Blast2GO (http://www.blast2go.com/b2glaunch/start-blast2go), with parameters of 20 hits and an e-value of 10e −6 (Additional file 20: Table S14). This analysis was performed on the iPlant platform. Gene ontology analysis was performed with GOseq [64] to identify overrepresented GO terms in the DE genes. A probability weighting function (PWF) was generated based on transcript length and was applied to eliminate the bias arising from this parameter. The transcript length was obtained from the longest transcript sequence available from the potato genome database (PGSC_DM_v3.4_transcript-update_representative.fasta) for each gene. When the number of over-represented GO terms was too large to visualize, GO terms were reduced with GOslim (http://agbase.msstate.edu/cgi-bin/tools/goslimviewer_ select.pl).

Motif search
For the motif search, the potato genome (V4.03) and annotations from the Potato Genome Sequencing Consortium [62,95] were utilized. The position weight matrices for Nova and Pumilio motifs were obtained from Jiang et al. [67]. We searched the entire potato genome for the presence of these motifs using MEME suite [71]. Because the exact length of the 3′ UTR is currently unavailable, we chose an arbitrary fixed length of 1000 nt for the UTR. For each gene, the 1000 bp region downstream from the end of the coding sequence was considered as its 3′ UTR in this analysis. Finally, using the 'intersectBed' tool (BED-Tools v2.18.2) [72], we extracted the genes containing Nova or Pumilio motifs in the designated 3′ UTRs. The position of motifs was also identified using the 'inter-sectBed' tool.

Real time RT-PCR
RNA preparation and RT-qPCR were performed as previously described [48]. Primers are listed in Additional file 21: Table S15.